In this webinar Dr. Alessandra Tosi goes through the steps of the data-science pipeline, with a particular focus on business applications. Dr. Tosi stresses the importance of correctly framing the business problem in terms of goals, value and success metrics, and explains how to translate it into a machine-learning framework.
Alessandra is a senior scientist and product owner, with a PhD in probabilistic machine learning.
The webinar is available for all to view. Below are a few of the topics covered:
Q. Some people in business could be using machine learning to help them solve problems but are not doing so. What do you think are the main reasons holding them back?
A. The main challenge is to frame the machine-learning problem correctly, in accordance with your business problem. If this framing is not aligned with your business objectives, you will not achieve the desired outcome. Historically, it has not been easy to understand which machine-learning algorithms and methods are best for a given problem, so many people have relied on data-science experts. That works if you have ready access to one who truly understands your business objectives. If you are not in that fortunate situation, software can now help you make these choices.
Q. Let’s zero in on data size and quality for a moment. I think many people in business have been led to associate “big data” with artificial intelligence solutions. But suppose I own a business problem and I have assembled some data about it, but it’s small data: say only 500 rows, and even then not every row has a complete record across every column. Am I below the practical threshold for AI to be useful?
A. Definitely not! The far less publicized field of data-efficient machine learning tackles applications in the “small” data regime. Once again, it is a matter of framing the problem correctly and choosing the best model.
Q. You use a metaphor that has become an everyday term in the data-science world: the “data-science pipeline.” What is the “data-science pipeline”?
A. The data-science pipeline is commonly understood as the sequence of processes applied to the data in order to perform a task. If you perform a data task in isolation, you might introduce extra assumptions that are not compatible with the other parts of the process. Sometimes people focus too much on the individual steps and forget about the “flow.” I’d like you to think about a framework in which all the steps of the pipeline fit together harmoniously, so that the data flows through the pipeline consistently.
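To make the idea of a consistent “flow” concrete, here is a minimal toy sketch in Python. The language, the class names (MeanImputer, Standardizer, Pipeline) and the two steps are our own illustrative assumptions, not the method presented in the webinar. Each step learns its parameters once, on the training data, and the pipeline then reuses exactly those parameters for any new data, so no step quietly makes its own, incompatible assumptions. It also handles the small, incomplete data described above by filling missing values.

```python
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical toy steps for illustration only.

class MeanImputer:
    """Fills missing values (None) with the column mean learned during fit."""

    def fit(self, rows: List[List[Optional[float]]]) -> "MeanImputer":
        columns = list(zip(*rows))
        # Assumes every column has at least one observed value.
        self.means = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, rows):
        return [[m if v is None else v for v, m in zip(row, self.means)] for row in rows]


class Standardizer:
    """Rescales each column to zero mean and unit variance, using training statistics."""

    def fit(self, rows: List[List[float]]) -> "Standardizer":
        columns = list(zip(*rows))
        self.mu = [mean(col) for col in columns]
        # Guard against constant columns (standard deviation of zero).
        self.sd = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(row, self.mu, self.sd)] for row in rows]


class Pipeline:
    """Chains steps so that data always flows through them in the same order."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        # Each step is fitted on the output of the previous step.
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        # New data reuses the parameters learned during fit; nothing is re-estimated.
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Usage: a tiny training set with gaps, then a new record with every value missing.
train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
pipe = Pipeline([MeanImputer(), Standardizer()]).fit(train)
print(pipe.transform([[None, None]]))  # imputed with training means, then standardized
```

The new record is filled with the training means (2.0 and 20.0) and therefore lands at the center of the training distribution. Had each step been run on the new data in isolation, there would be nothing to impute from, which is exactly the kind of incompatibility a consistent pipeline avoids.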
This webinar does not require technical expertise in data science or machine learning. Anyone who is interested in applying AI to their business problems should find the content interesting and useful.