An interview with Dr. Bernardo Perez Orozco
Q. Bernardo, your upcoming webinar is titled "Using NLP to unlock hidden meaning in the data you already have." What are you going to cover?
A. Historically, the information value of free text has mostly been out of reach for automated solutions. However, recent developments in Machine Learning – Natural Language Processing in particular – have made it increasingly possible to extract hidden treasures from free text via automation. This development resets some practical boundaries. I’ll show an example of how this works, using Mind Foundry’s Analyze platform and a set of news stories.
Q. Bernardo, after a while in any industry, we create acronyms, and once that happens, some of the meaning and insight of the original terms is lost. Is that the case with NLP, a.k.a. “Natural Language Processing”?
A. Despite being only a 3-letter acronym, NLP actually comprises a long list of problems, each of which is, on its own, a humongous task. Some examples are: automatic translation, summarization of documents, speech-to-text and text-to-speech technologies, and sentiment analysis methods. The “Natural” in NLP speaks to a common challenge across these sub-fields: the fluidity of human language. This quality sets natural language apart from artificial languages such as Python and Java, which are well-defined, rigid, and, accordingly, less semantically complex.
Q. Among people constructing data pipelines for machine learning, a distinction is often made between “structured” and “unstructured” data. Can you talk about this distinction and some of its influences (for better or worse) in the evolution of computer processing of text?
A. Often I hear that some business analysts consider text data as being “unstructured”. But what exactly about it is unstructured? I would argue the exact opposite! Human language has a rich and ultra-complex hierarchical structure to encode and communicate ideas as words and sentences with a meaning. Indeed, the human ability to use language builds on a lifetime of accumulated cultural awareness. And people are adept with ambiguity, something computers have been historically unable to achieve (arguably by design). But this has increasingly changed with the development of Machine Learning methods, and in particular with those within the Natural Language Processing community. This is because one of ML’s ultimate goals is to find solutions for problems whose finer details are difficult to articulate explicitly. Machine Learning’s own “natural” language is probability and statistics. This attention to uncertainty turns out to be valuable in modelling the most fluid aspects of human language, such as ambiguity, exceptions, and emotionality. So, we’re making progress.
Q. In your talk you’re going to look specifically at news articles as a form of language susceptible to machine-based analysis and insight - but are there some other use cases you see as equally viable for these methods?
A. Sure! News is a widely relatable example, but the methods we will talk about apply to numerous specific use cases: auto-tracking customer satisfaction through social media analysis, auto-triaging support tickets, auto-targeted marketing content, or scanning call centre logs for churn risk. The same underlying methods apply.
Q. What do I need to know in order to attend your webinar?
A. One of our goals at Mind Foundry is to help everybody access the power of Machine Learning. You don’t have to be a programmer or data scientist to get something out of the talk. The only prerequisite is your curiosity.
Bernardo's webinar is available for viewing: https://learn.mindfoundry.ai/post-webinar-28/05/2020