Madeleine Neil Smith
May 5, 2020
An Interview with Madeleine Neil Smith:
Q. Madeleine, your upcoming seminar is titled “Data Preparation Tips for Consumer’s Data.” Would you like to tell us a little bit about your topic and who will gain the most from attending?
A. We’re going to cover some of the most common scenarios that come up when taking consumer data (often collected from various sources) and turning it into something that can be used to build powerful, predictive Machine Learning models. Anyone who works with data and would like to get as much use out of it as possible should gain something from this webinar!
Q. Data preparation is a big part of machine learning generally. Are there some issues specific to data about customers or prospects that require special attention?
A. Many of the issues experienced with customer data are common to many other types of commercial data. For example, how to handle missing data - does that missing data represent something about the customer, or is it simply a data collection error? How about mistakes that were made when typing entries into forms? How do we find them and correct them? And how do we treat categorical data (such as career/industry or education level) when many Machine Learning pipelines expect numerical data?
Other issues here crop up around bias - which data should we be using to build our models? How can we be sure we aren’t using data that we’re not allowed to use, or data that embeds biased assumptions into our model?
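The missing-data, typing-mistake, and categorical-data questions raised above can be sketched in a few lines of plain Python. This is a minimal illustration, not a description of any particular platform: the records, the field names `income` and `education`, and the `prepare` helper are all invented for the example. It keeps a flag recording that a value was missing, tidies up inconsistent typing, and turns a categorical field into 0/1 columns.

```python
# Hypothetical consumer records; field names and values are invented for illustration.
records = [
    {"income": 52000.0, "education": "Bachelors"},
    {"income": None, "education": "bachelors "},
    {"income": 61000.0, "education": "Masters"},
    {"income": None, "education": "PhD"},
]


def prepare(rows):
    """Return rows with a missingness flag and one-hot encoded education."""
    # Normalise case and stray whitespace so typing variants collapse together.
    cleaned = [row["education"].strip().lower() for row in rows]
    categories = sorted(set(cleaned))
    prepared = []
    for row, level in zip(rows, cleaned):
        out = {
            "income": row["income"],
            # Keep "was missing" as its own feature in case it carries signal.
            "income_missing": row["income"] is None,
        }
        # One 0/1 column per category so numeric pipelines can consume it.
        for cat in categories:
            out[f"edu_{cat}"] = int(level == cat)
        prepared.append(out)
    return prepared


prepared = prepare(records)
```

Whether the missing `income` values should later be filled in, left as they are, or dropped is a separate decision; the flag simply preserves the information so that choice can be made deliberately.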
Q. If you could advise someone responsible for collecting customer data today, anticipating that they might be able to use that data in ML in the future, what would you tell them?
A. You are the one who understands the meaning behind your data, how it’s being collected, and what the fields represent. Make sure that you are infusing that knowledge into the data itself. Take missing values, for example: you know whether a missing entry actually means a value of ‘0’ (in which case, make that change!) or whether it means the whole row is invalid. As the business problem owner, only you can answer that question; the Machine Learning can’t do it for you. So own that process.
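One way to picture "owning that process" is to write the meaning of a missing value down as explicit, per-field rules. The sketch below assumes two hypothetical fields, `purchases_last_year` (where a blank is taken to mean zero) and `customer_id` (where a blank is taken to invalidate the row); both the fields and the rules are illustrative assumptions that only the data owner could confirm for real data.

```python
# Hypothetical rules supplied by the data owner: what a missing value means per field.
MISSING_MEANS_ZERO = {"purchases_last_year"}  # blank = the customer bought nothing
MISSING_INVALIDATES_ROW = {"customer_id"}     # blank = the record cannot be trusted


def apply_missing_rules(rows):
    """Apply the owner's rules: drop invalid rows, fill known zeros."""
    cleaned = []
    for row in rows:
        # Skip the whole row when a field that must be present is missing.
        if any(row.get(field) is None for field in MISSING_INVALIDATES_ROW):
            continue
        fixed = dict(row)  # copy so the original records are left untouched
        for field in MISSING_MEANS_ZERO:
            if fixed.get(field) is None:
                fixed[field] = 0
        cleaned.append(fixed)
    return cleaned


rows = [
    {"customer_id": "C1", "purchases_last_year": 3},
    {"customer_id": "C2", "purchases_last_year": None},
    {"customer_id": None, "purchases_last_year": 5},
]
cleaned = apply_missing_rules(rows)
```

Keeping the rules in named sets, rather than scattering them through the cleaning code, makes the business assumptions easy to review and change.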
Q. It’s fair to say that most people with customer data regard that data as messy. Is that a reason to avoid ML or a reason to use an ML platform to deal with it?
A. Data Scientists spend a lot of their time cleaning data, and this is not time that you want to spend when you have a business problem to solve. Using a Machine Learning platform such as Mind Foundry can speed up the data preparation process by giving you advice on which parts of the clean-up you need to complete yourself, and which parts you can leave to the Auto-ML pipeline to handle automatically.
Q. What do I need to know in order to attend your webinar?
A. This webinar does not require technical expertise in data science or machine learning. It is for people who work with data to solve problems, whatever tooling they use, and who want to gain insight into how they can best prepare for Machine Learning and AI in their business.
The webinar will take place this Thursday, May 7th, at 5PM BST, Noon ET. You can register here: