November 21, 2018
Optimising marketing spend is a billion-dollar question for businesses all over the world. With so many competing internal and external forces at play, it can be difficult to untangle the relationship between marketing budgets, campaigns and the resulting sales. However, businesses can improve their marketing with machine learning. I'll explain more below.
The problem with data science in marketing
Optimising marketing with machine learning techniques has already shown very promising results, because these techniques can map complex relationships in large data sets. However, implementation is often a hurdle for marketers who lack the technical skills and background.
On the other hand, bringing in consultant data scientists is expensive and inefficient, as they often lack an understanding of marketing activities and of the company's industry sector. As a result, they can struggle to provide meaningful and actionable insights.
Mind Foundry believes problem owners should be given simple yet reliable tools to build their own data science solutions, which is why we have built an automated data science team in a box! Marketers can use Mind Foundry to design their email marketing campaigns, increase customer conversion and boost retention rates.
We've put our theory to the test, exploring ways to improve email marketing with machine learning. Read on to discover how we applied our knowledge of the sector to improve the model's performance, demonstrating how machine learning augments rather than automates the analytics process.
Phase 1: Data Exploration
We will use Corefactors.in’s Data Set hosted on Kaggle. For each email, we are trying to predict one of three statuses (“ignored”, “read” or “acknowledged”) from the various attributes of the customer and the content of the email.
After uploading the data set into Mind Foundry, we are shown a snapshot of the data along with machine learning-generated advice on how to clean and prepare it.
Mind Foundry Data Preparation
For example, Mind Foundry has detected missing values in several columns, and asks the user whether they wish to fill them in, mark them as missing or remove them. By looking at the names of the columns, we know that we can fill them in as follows:
- 0 in Total Links, Images, Past Communications
- "Missing" for Customer Location
The following images show how we can implement the advice in a couple of clicks.
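For readers who want to see the same fill rules outside the platform, here is a minimal sketch in plain Python. It is not how Mind Foundry implements the step; the column names (Total_Links, Total_Images, Total_Past_Communications, Customer_Location) are assumptions based on the wording above, and the two sample rows are made up for illustration.

```python
# Assumed column names; adjust them to match the actual data set.
ZERO_FILL = ("Total_Links", "Total_Images", "Total_Past_Communications")
LABEL_FILL = ("Customer_Location",)


def fill_missing(rows):
    """Return a cleaned copy of the rows, leaving the input untouched."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in ZERO_FILL:
            # Treat None and empty strings as missing counts.
            if new_row.get(col) in (None, ""):
                new_row[col] = 0
        for col in LABEL_FILL:
            # Keep missing locations as their own explicit category.
            if new_row.get(col) in (None, ""):
                new_row[col] = "Missing"
        cleaned.append(new_row)
    return cleaned


# Made-up sample rows, for illustration only.
rows = [
    {"Total_Links": 8, "Total_Images": None,
     "Total_Past_Communications": 33, "Customer_Location": "E"},
    {"Total_Links": None, "Total_Images": 2,
     "Total_Past_Communications": None, "Customer_Location": None},
]
cleaned = fill_missing(rows)
```

Returning a copy rather than editing in place keeps the original rows available, which loosely mirrors the idea of being able to step back to an earlier version of the data.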
Mind Foundry automatically implements the advice for the user and simultaneously creates an audit trail of every single operation applied to the data at the bottom of the screen. The user can then go back to previous versions of the data set and undo steps if necessary.
Mind Foundry also automatically generates histograms to give the user a high-level overview of the data’s structure.
Phase 2: Modelling
For this email marketing data set, we want to predict the status of an email (ignored, read, acknowledged) based on other attributes, which means that we are trying to solve a classification problem.
After we specify which column we are predicting (Email_Status), AuDaS provides a robust framework for building classification pipelines that generalise to other data sets without the loss of predictive power that overfitting normally causes. This involves:
- Automatically withholding a balanced 10% hold out of the data set for final model validation purposes
- Performing a 10-fold Cross Validation
- Optimizing the F1 Score
Advanced users can change these defaults to their preferred settings (N-fold, split, optimisable metric…)
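The validation defaults listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the platform's code: "balanced" is interpreted here as a stratified split that preserves class proportions, the F1 score is computed as a macro average (the article does not say which averaging is used), and the function names and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, fraction=0.1, seed=0):
    """Split row indices into (train, holdout), keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * fraction)
        holdout.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(holdout)


def k_folds(indices, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, for illustration only: 60 ignored, 30 read, 10 acknowledged.
labels = [0] * 60 + [1] * 30 + [2] * 10
train_idx, holdout_idx = stratified_holdout(labels)
folds = list(k_folds(train_idx))
```

The hold-out indices are set aside before any cross-validation takes place, so the final validation score comes from rows that the search never saw.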
Mind Foundry will then use its proprietary Bayesian Optimizer (Mind Foundry Optimize) to efficiently navigate the millions of possible solutions to identify an optimal classification pipeline (feature engineering, model and optimal parameters) in less than 100 iterations.
As it is performing this search, Mind Foundry provides full transparency to the user on the tested pipelines, models and parameter values as well as their associated performance statistics. Mind Foundry also provides short explanations when you hover over any technical term in the platform.
For the specified classification model, the email campaign type, total past communications and word count are the most relevant features for predicting the status of the email. The user also has access to a full audit trail of every single model that Mind Foundry has tried.
Phase 3: Model Validation and deployment
When the user is happy with model performance or has completed the specified number of iterations, Mind Foundry will validate the model on the 10% balanced hold out and provide Model Health advice. In our case, the tests on the held-out data were consistent with the cross-validated tests during optimisation and the model health is good. This means that the user can be confident in the insights provided by the model and deploy it in production.
Deeper insights can be accessed on the "influences page", which shows the user how much each feature contributes individually to the predictions.
In this case, the chances of an email being read (class 2) increase when the campaign type is 1 and there have been more than 20 past communications. Longer emails are also less likely to be read.
The user can then upload a test set to predict email status outcomes, score the model or deploy it automatically via a RESTful API that can be integrated into your product.
API for the classification model
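As a rough idea of what calling such an API from your own product could look like, here is a minimal sketch using only the Python standard library. The endpoint URL, the token, the payload shape and the feature names are all hypothetical placeholders; the article does not document the real API, so treat this as an assumption-laden illustration rather than the actual interface.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not a documented API.
API_URL = "https://example.com/models/email-status/predict"
API_TOKEN = "replace-me"


def build_prediction_request(email_features):
    """Build (but do not send) a JSON POST request for one email."""
    # The {"rows": [...]} payload shape is an assumption for illustration.
    body = json.dumps({"rows": [email_features]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


# Feature names are assumed from the features discussed in the article.
request = build_prediction_request(
    {"Email_Campaign_Type": 1, "Total_Past_Communications": 25, "Word_Count": 440}
)

# Once a real endpoint is configured, sending it could look like this:
# with urllib.request.urlopen(request, timeout=10) as response:
#     prediction = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect and test before any network call is made.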
Mind Foundry also provides LIME explanations of the predictions, which allow you to understand the contribution of each feature to the predicted outcome. A simple web app below shows how you might interact with the trained model:
Phase 4: Improving the performance
In our first run, our classification accuracy across all 3 classes was 51%, which is better than random but not ideal.
However, using our knowledge of the email status values (0 representing ignored, 1 representing opened and 2 representing converted), we attempted to improve the performance of the model by merging classes 0 and 1 into a single class, since their outcome is effectively the same for the marketer.
By regrouping the classes in the data preparation phase and retraining the model, we were then able to achieve a much better classification accuracy of 73.7% on the 10% hold out.
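The regrouping step itself is a simple relabelling. A minimal sketch in plain Python, assuming the status codes described above (0 ignored, 1 opened, 2 converted); the mapping name, the function name and the sample values are hypothetical and for illustration only:

```python
from collections import Counter

# 0 (ignored) and 1 (opened) collapse into class 0; 2 (converted) becomes class 1.
REGROUP = {0: 0, 1: 0, 2: 1}


def regroup_status(statuses):
    """Map the three original status codes onto two classes."""
    return [REGROUP[status] for status in statuses]


# Made-up status values, for illustration only.
original = [0, 1, 2, 0, 1, 0]
binary = regroup_status(original)
counts = Counter(binary)
```

Note that the merged problem has two classes rather than three, so uniform random guessing would score 50% rather than about 33%; the two accuracy figures are best compared with that in mind.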
In conclusion, it is possible to analyse and optimise email marketing with machine learning solutions. However, human expertise and experience proved to be essential to this process, and likely always will be.