
Machine learning can predict international visitor spend in London


Last year, 39.2 million people from abroad visited London, making it one of the most popular destinations in the world. The UK Office for National Statistics releases quarterly international visitor data, which includes country of origin, duration and purpose of stay, and even spend. We used Mind Foundry to visualise this data and build a machine learning model that can predict international visitor spend.

Curious to find out which features are most relevant? 

Loading the data

The data can be freely downloaded from the London Datastore, a free and open data-sharing portal that hosts over 700 London-related datasets. The data set contains over 56 thousand rows and 11 columns. AuDaS automatically scans the data set, detecting column types and levels. When we preview the data in Mind Foundry, we are advised to drop the “area” column, which is constant.
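As a rough illustration of the constant-column advice, the check can be sketched in a few lines of Python. This is not how Mind Foundry implements it; the column names and values below are toy data invented for the example, not rows from the London Datastore file.

```python
import csv
import io


def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(value)
    return [name for name, values in seen.items() if len(values) == 1]


# Toy stand-in for the downloaded file (hypothetical columns and values).
sample_csv = """year,market,area,spend
2018,France,LONDON,12.5
2018,USA,LONDON,30.1
2019,France,LONDON,8.0
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
to_drop = constant_columns(rows)

# Drop the constant columns, since they carry no information for a model.
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
print(to_drop)
```

A column with only one value cannot help a model tell rows apart, which is why dropping it is safe.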

 

Data Preparation

Data Preparation Step

 

After applying the advice, we can view automatically generated histograms, which help us understand whether there is any “structure” in the data set. These histograms readily yield insights such as the largest community of visitors (French) or the most popular stay duration.

We can also see that the data is unbalanced, with the vast majority of visitors spending less than 25 (scaled £). By using the percentage scale feature, we can offset this imbalance and analyse distributions across other variables for individual spending amounts.
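One way to think about a percentage scale is that each spend band is normalised to 100%, so small bands are as readable as large ones. The sketch below assumes that interpretation; the field names `spend_band` and `purpose` and the records themselves are hypothetical.

```python
from collections import Counter, defaultdict


def percentage_breakdown(records, group_key, category_key):
    """For each group, return the share (in %) of every category within it."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[group_key]][record[category_key]] += 1
    return {
        group: {
            category: 100.0 * n / sum(counter.values())
            for category, n in counter.items()
        }
        for group, counter in counts.items()
    }


# Toy records: most visitors fall in the low-spend band (made-up values).
records = (
    [{"spend_band": "under_25", "purpose": "Holiday"}] * 6
    + [{"spend_band": "under_25", "purpose": "Business"}] * 2
    + [{"spend_band": "25_plus", "purpose": "Holiday"}]
    + [{"spend_band": "25_plus", "purpose": "Business"}]
)

shares = percentage_breakdown(records, "spend_band", "purpose")
print(shares)
```

With raw counts, the two visitors in the higher band would barely register next to the eight in the lower band. As percentages, the mix of purposes within each band can be compared directly.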

 

Data Visualization

 

Predicting spend

We wish to predict the spend of a visitor, which is a regression task. To do so, we need to specify the target column as well as the model training framework. Mind Foundry will automatically withhold a 10% balanced sample of the data set for model validation purposes. During training, it will generate scores from a 10-fold cross-validation, where each fold is uniformly balanced by class.
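The validation scheme described above (a 10% hold-out plus 10-fold cross-validation on the rest) can be sketched with plain index splitting. This is a minimal sketch under simplifying assumptions: it shuffles at random and does not reproduce the balancing that Mind Foundry applies, and the function names are our own.

```python
import random


def holdout_split(n_rows, holdout_fraction=0.1, seed=0):
    """Shuffle row indices and set aside a fraction as an unseen hold-out set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices, k=10):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, holdout_idx = holdout_split(1000)
for training, validation in k_fold(train_idx, k=10):
    # A model would be fitted on `training` and scored on `validation` here.
    pass
```

Each training row is used for validation exactly once, and the hold-out rows are never touched until the final test.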

 

Model Building

 

When we are happy with the training setup, we just need to launch the task. Mind Foundry will then use its internal Bayesian optimizer, Mind Foundry Optimize, to efficiently navigate the space of potential pipelines and configurations (data preparation steps, models and parameter values). The user can access the full history of each tested pipeline and view its performance metrics. The best pipeline is provided with full transparency.
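To give a feel for what searching a space of pipeline configurations looks like, here is a deliberately simpler stand-in: plain random search, not Bayesian optimization and not Mind Foundry Optimize. The search space, the option names and the `toy_score` function are all hypothetical placeholders; in practice the score would come from cross-validating a real pipeline.

```python
import random


def random_search(search_space, score_fn, n_trials=20, seed=0):
    """Sample configurations at random and keep a history of every trial."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = {name: rng.choice(options) for name, options in search_space.items()}
        history.append((score_fn(config), config))
    # Lower is better here, as with an error metric such as RMSE.
    best_score, best_config = min(history, key=lambda item: item[0])
    return best_config, best_score, history


# Hypothetical configuration space: a model family and two settings.
search_space = {
    "model": ["linear", "tree"],
    "max_depth": [2, 4, 8],
    "scale_inputs": [True, False],
}


def toy_score(config):
    """Placeholder error; a real score would come from cross-validation."""
    penalty = abs(config["max_depth"] - 4) + (0 if config["scale_inputs"] else 1)
    return 5.0 + penalty


best_config, best_score, history = random_search(search_space, toy_score)
print(best_config, best_score)
```

A Bayesian optimizer differs in that it uses the scores seen so far to decide which configuration to try next, rather than sampling blindly. The recorded history plays the same role in both cases.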

 

Model Optimization

 

Mind Foundry also provides explanations of each technical term. Users who wish to learn more about data science can also follow dedicated Mind Foundry machine learning courses.

 


Model Performance

 

Mind Foundry provides model interpretability by highlighting the relative influence of each feature on the model's predictions.

 

Feature relevance of the model

 

In our case, the most important features for predicting spend are the number of visits, the number of nights (the longer you stay, the more you’ll spend) and the year. It is also interesting to note that certain nationalities (e.g. US, Saudi Arabia) have a strong impact on spend.
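We do not know exactly how Mind Foundry computes feature relevance, but one common, model-agnostic way to estimate it is permutation importance: shuffle one feature at a time and measure how much the error grows. The sketch below assumes that technique; the `predict` function and the data are toy placeholders.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(row) for row in rows])
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = []
        for row, value in zip(rows, column):
            new_row = list(row)
            new_row[j] = value  # break the link between feature j and the target
            shuffled_rows.append(new_row)
        score = rmse(targets, [predict(row) for row in shuffled_rows])
        importances.append(score - baseline)
    return importances


# Toy setup: the "model" only uses feature 0 (say, nights) and ignores feature 1.
rows = [[i, i % 3] for i in range(20)]
targets = [2 * row[0] for row in rows]
importances = permutation_importance(lambda row: 2 * row[0], rows, targets, n_features=2)
print(importances)
```

A feature the model relies on produces a large increase in error when shuffled, while a feature it ignores produces none.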

When you are happy with the “best” model found, Mind Foundry tests it on the 10% of unseen data and provides model health advice. In our case, the model health is good because we are able to predict visitor spend with an RMSE of 4.199, which is consistent with the cross-validated scores obtained during the optimization.
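For readers unfamiliar with the metric, RMSE (root-mean-square error) is the square root of the average squared difference between predicted and actual values, expressed in the same units as the target. A minimal sketch, with made-up numbers rather than values from our model:

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up actual and predicted spends, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(rmse(actual, predicted))
```

Comparing this number on the held-out data with the cross-validated scores is a simple check for over-fitting: a much larger hold-out error would suggest the model does not generalise.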

 

Model Summary

 

The model can then be used to make predictions within Mind Foundry or can be put into production as a web service.

Final remarks

Our visitor spend prediction model could potentially allow London-based hotels, restaurants, shops and other businesses to optimize their offerings for their international customers. Understanding the factors that drive spending is crucial for making sure businesses target the right segments in order to capture that spend and increase their revenue.

Overall, it took me less than 10 minutes to load and explore the data and to build a model that is accurate on unseen test data (without writing a single line of code). Moreover, I didn’t have to worry about over-fitting, as this was taken care of automatically by Mind Foundry!

 
