An Interview with Dr. Alessandra Tosi
Q. Alessandra, your upcoming seminar is titled “A business take on the data-science pipeline.” Would you like to tell us a little bit about your topic and who will gain the most from attending?
A. In this webinar, I will go through the steps of the data-science pipeline, with a particular focus on business applications. I will stress the importance of correctly framing the business problem in terms of goals, value and success metrics, and how to translate this into the machine-learning framework. The intended audience is a business problem-solver who wants to extract more value from the data.
Q. If there are people in business who could be using machine learning to help them solve problems but are not doing so, what do you think are the main reasons holding them back?
A. The main challenge is to frame the machine-learning problem correctly, in accordance with your business problem. If this choice is not aligned with business objectives, you will not achieve the desired outcome. Historically, it has not been easy to understand which machine-learning algorithms and methods are best for a given problem, so many people have relied on data-science experts. And that's fine if you have ready access to one who truly understands your business objectives. But if you are not in that fortunate situation, software can now help you make these choices.
Q. Let’s zero in on data size and quality for a moment. I think many people in business have been led to associate “big data” with artificial intelligence solutions. But suppose I own a business problem and have assembled some data about it, but it’s small data: say, only 500 rows, and even then not every row has a value in every column. Am I below the practical boundary for AI to be useful?
A. Definitely not! The far less popular domain of data-efficient machine learning is tackling applications in the “small” data regime. This is again a matter of framing the problem in the correct way and using the best model.
Q. In your webinar you use a metaphor that has become an everyday term in the data-science world: the “data-science pipeline.” What do you mean when you use the phrase “data-science pipeline”?
A. The data-science pipeline is commonly understood as the sequence of processes that are applied to the data in order to perform a task. If you perform one data task in isolation, you might introduce extra assumptions that are not compatible with the other parts of the process. Sometimes people focus too much on the individual steps and forget about the “flow.” I’d like you to think about a framework in which all the steps of the pipeline fit together harmoniously, and let the data flow through the pipeline consistently.
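Editor's illustration: the idea of steps that share one consistent flow can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method presented in the webinar. The class names (`MeanImputer`, `Scaler`, `Pipeline`) and the toy data are hypothetical. The point it shows is that each step learns its assumptions once, from the same training data, and then applies them unchanged to new records.

```python
from statistics import mean


class MeanImputer:
    """Hypothetical step: fills missing values (None) with column means learned in fit."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, rows):
        return [[self.means[i] if v is None else v for i, v in enumerate(row)]
                for row in rows]


class Scaler:
    """Hypothetical step: rescales each column using the min/max seen in fit."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.low = [min(col) for col in columns]
        self.high = [max(col) for col in columns]
        return self

    def transform(self, rows):
        return [[(v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, lo, hi in zip(row, self.low, self.high)]
                for row in rows]


class Pipeline:
    """Chains steps so that each one is fitted on the output of the previous one."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Toy "small data" with incomplete records (None marks a missing value).
train = [[1.0, 10.0], [None, 20.0], [3.0, None]]
new_record = [[None, 15.0]]

pipeline = Pipeline([MeanImputer(), Scaler()]).fit(train)
result = pipeline.transform(new_record)
print(result)  # [[0.5, 0.5]]
```

Because the imputation means and the scaling ranges are learned inside one pipeline, a new record is treated with exactly the same assumptions as the training data, which is the kind of consistency the “flow” refers to.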
A. This webinar does not require technical expertise in data science or machine learning. Anyone who is interested in applying AI to their business problems should find the content interesting and useful.