Solving the Kaggle Telco Customer Churn Challenge


Customer churn, which occurs when clients cancel or do not renew their subscriptions, can be a nightmare for most businesses. Winning a new client can cost 5-25 times more than retaining an existing one, so it's important to identify which customers are at risk of churning and take the right actions to keep them on side.

Mind Foundry is an automated data science platform that aims to allow anyone, with or without a background in data science, to easily build and deploy quality-controlled Machine Learning pipelines. Mind Foundry empowers business analysts and data scientists by allowing them to bring their domain expertise into the model-building process and extract actionable insights.

 


In this tutorial, we'll show how we can build a classification pipeline in minutes using Mind Foundry, with the goal of predicting Telco customer churn using data from Kaggle.

Customer churn is a costly issue for Telcos, but a predictive model can empower them to take proactive steps.

In this tutorial, we will follow the standard data science process:

  1. Data preparation
  2. Pipeline construction and tuning
  3. Interpretation and deployment

Data Preparation

First, we load the data into Mind Foundry. In this case it is a simple CSV file with 19 columns and 6,666 rows:


 

Each row represents a customer and each column an attribute. The attributes include the number of voice mails, total minutes (day/night) and total calls (day/night).

Mind Foundry automatically scans the data, detects the type of each column and provides data preparation advice, highlighted by the light bulbs. This is where the business analyst or data scientist can introduce their domain knowledge by acting on the relevant advice with the appropriate answers.

In this case, we know that the missing values in the “number voice mail messages” column should actually be filled in with 0, which can be done with a single click:

 

Data Cleaning
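For readers who want to see what this cleaning step amounts to outside the platform, here is a minimal sketch in plain Python. It is not Mind Foundry's implementation; the column names and the three sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical two-column extract; the real file has 19 columns.
RAW = """customer_id,number_vmail_messages
C001,25
C002,
C003,0
"""

def fill_missing_with_zero(rows, column):
    """Return copies of the rows with blank values in `column` replaced by 0."""
    cleaned = []
    for row in rows:
        value = (row.get(column) or "").strip()
        # A blank cell is treated as "no voice mail messages".
        cleaned.append({**row, column: int(value) if value else 0})
    return cleaned

rows = list(csv.DictReader(io.StringIO(RAW)))
cleaned = fill_missing_with_zero(rows, "number_vmail_messages")
print([r["number_vmail_messages"] for r in cleaned])  # [25, 0, 0]
```

Filling with 0 is a domain decision rather than a default: it is only right because a missing value here means the customer has no voice mail messages.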

 

After following the advice, we join additional information about each customer, using the Customer ID column as the key.

 

Data Preparation Step
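The join itself is a standard left join on a key column. A rough sketch of the idea, assuming hypothetical column names and two tiny tables in place of the real data:

```python
def left_join(left_rows, right_rows, key):
    """Attach columns from right_rows to each left row that shares `key`.

    Left rows with no match are kept unchanged.
    """
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        extra = lookup.get(row[key], {})
        joined.append({**extra, **row})  # left values win on name clashes
    return joined

# Hypothetical sample tables keyed by customer ID.
usage = [
    {"customer_id": "C001", "total_day_minutes": 265.1},
    {"customer_id": "C002", "total_day_minutes": 161.6},
]
profile = [
    {"customer_id": "C001", "international_plan": "no"},
]

joined = left_join(usage, profile, "customer_id")
```

Customer C002 has no profile row, so it comes through with only its usage columns. Gaps like this are worth checking for after any join.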

 

Mind Foundry also generates histograms, which are useful for eyeballing the data and identifying high-level relationships.

 

Data Visualization

 

We can also see how the customers who churned are distributed across the other columns.

 

Distribution of class across other features

 

Finally, we will remove gender from our churn prediction model.

 

Gender

 


 

Processing the data

Mind Foundry allows you to quickly set up your classification process by selecting the target column you wish to predict. Mind Foundry will then hold out 10% of the data for final validation and perform 10-fold cross-validation on the remaining 90%. This helps to reduce the risk of overfitting the model to the training data.

 

Model building
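To make the 10% hold-out and 10-fold scheme concrete, the sketch below splits row indices the same way using only the standard library. It assumes plain random shuffling; whether Mind Foundry stratifies by the target column is not stated here, so treat this as an illustration of the idea rather than the platform's exact procedure.

```python
import random

def holdout_and_folds(n_rows, holdout_fraction=0.1, n_folds=10, seed=0):
    """Shuffle row indices, set aside a hold-out set, and split the rest into folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_fraction)
    holdout, remaining = indices[:n_holdout], indices[n_holdout:]
    # Every n_folds-th index goes to the same fold, so fold sizes differ by at most 1.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds

holdout, folds = holdout_and_folds(6666)

for i, validation in enumerate(folds):
    # Train on the other nine folds, validate on the current one.
    training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
```

The hold-out indices never appear in any fold, which is what allows them to serve as an untouched final check later on.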

 

Mind Foundry then searches the space of possible pipelines (feature engineering and machine learning models) and their associated hyperparameters using its proprietary Bayesian Optimizer. It also keeps an audit trail of all the pipelines it has evaluated, which you can query if required.

 

Model Optimization
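The Bayesian Optimizer is proprietary, so it cannot be reproduced here. As a stand-in, the sketch below uses plain random search, a much simpler technique, to show the general shape of the loop: propose a pipeline configuration, score it, and record every evaluation in an audit trail. The search space and the scoring function are invented for illustration; in practice the score would come from cross-validation.

```python
import random

def random_search(score_fn, search_space, n_trials=20, seed=0):
    """Sample configurations at random, logging each evaluation in an audit trail."""
    rng = random.Random(seed)
    audit_trail = []
    for trial in range(n_trials):
        config = {name: rng.choice(options) for name, options in search_space.items()}
        audit_trail.append({"trial": trial, "config": config, "score": score_fn(config)})
    best = max(audit_trail, key=lambda entry: entry["score"])
    return best, audit_trail

# Hypothetical search space: one model choice plus two settings.
SEARCH_SPACE = {
    "model": ["logistic_regression", "random_forest"],
    "max_depth": [2, 4, 8],
    "scale_features": [True, False],
}

def toy_score(config):
    """Made-up score standing in for a cross-validated metric."""
    score = 0.80
    score += 0.05 if config["model"] == "random_forest" else 0.0
    score += 0.01 * config["max_depth"] / 8
    return round(score, 4)

best, audit_trail = random_search(toy_score, SEARCH_SPACE)
```

A Bayesian optimizer differs in that it uses the scores seen so far to decide which configuration to try next, rather than sampling blindly.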

 

Deploying the solution

Once Mind Foundry has found the best pipeline, it runs final quality checks on the 10% of the data that was held out from the start and never used during model training. The performance metrics on this 10% hold-out are presented at the end, and Mind Foundry provides full transparency of the final pipeline it has chosen (feature engineering, models and hyperparameter values).

 

10% Hold-out results
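The exact metrics reported are shown in the results screen. As an assumption, the sketch below computes three common ones (accuracy, precision and recall) on a made-up set of ten hold-out labels, with 1 meaning "churned".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up hold-out labels and predictions (1 = churned, 0 = stayed).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
```

Because churners are usually a minority, accuracy alone can look good even when many churners are missed, which is why recall is worth reading alongside it.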

 

In our case, the model health is good. However, if it were bad, Mind Foundry would tell the user and provide suggestions on how to improve it, for example by sourcing more data.

 

Model Summary

 

The relative influence of each feature on the predictions can be explored in more detail, which allows us to generate "rules" for interpreting the model.

 

Feature relevance of the model

 

The model can then be integrated into your website, internal dashboard, products or business processes via an automatically generated RESTful API. The feature relevance is provided by LIME. This allows us to provide explanations for individual predictions and therefore tailor appropriate offers to the customers who are likely to churn.

 

Churn web application
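Calling a prediction API of this kind typically means sending one customer's attributes as JSON in a POST request. The sketch below only builds such a request. The URL, the payload layout and the field names are all placeholders invented for illustration, not the actual Mind Foundry API, so consult the generated API's own documentation for the real contract.

```python
import json
import urllib.request

# Placeholder endpoint; replace with the URL of the generated API.
API_URL = "https://example.com/api/churn/predict"

def build_prediction_request(customer, url=API_URL):
    """Build (but do not send) a JSON POST request for one customer."""
    body = json.dumps({"rows": [customer]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical customer record.
customer = {
    "international_plan": "yes",
    "total_day_charge": 45.2,
    "customer_service_calls": 4,
}

request = build_prediction_request(customer)
# To send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```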

 

The results

In this example, the chance of churn increases significantly when a customer has an international plan, incurs a day charge of more than 40, and makes a higher number of customer service calls.
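Findings like these can be turned into simple screening rules. The sketch below is one way to express them. The day-charge cut-off of 40 comes from the result above, while the field names and the customer service call threshold are assumptions, since no exact number of calls is given.

```python
def churn_risk_flags(customer, service_call_threshold=4):
    """List which of the three risk factors apply to a customer.

    `service_call_threshold` is an assumed value chosen for illustration.
    """
    flags = []
    if customer.get("international_plan") == "yes":
        flags.append("international plan")
    if customer.get("total_day_charge", 0) > 40:
        flags.append("day charge above 40")
    if customer.get("customer_service_calls", 0) >= service_call_threshold:
        flags.append("many customer service calls")
    return flags

# Hypothetical customers.
at_risk = {"international_plan": "yes", "total_day_charge": 45.2, "customer_service_calls": 5}
low_risk = {"international_plan": "no", "total_day_charge": 22.0, "customer_service_calls": 1}

print(churn_risk_flags(at_risk))
print(churn_risk_flags(low_risk))
```

Rules like this are a reading aid for the model's behaviour, not a replacement for its predictions.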

In conclusion, it's entirely possible to address the problem of customer churn with machine learning models and to discover actionable insights that have a real impact on the bottom line of your business.

 

 
