You might assume that machine learning and social media are very different elements of the business world. However, with consumers able to connect and voice their opinions and experiences through social media, any brand’s reputation can be made or broken based on its digital strategy.
Curated Instagram accounts, influencer marketing and changing social media algorithms are all concerns for social media professionals, but what about data scientists? Can social media data inform financial decisions?
We recently challenged ourselves to forecast stock price swings using social media data and machine learning.
We consider PepsiCo’s social media metrics from Instagram and Twitter, together with Google Trends and website traffic data, all downloaded via Sentieo.
In total, we have 36 input metrics tracking the number of likes, comments and visits for Pepsi and Sodastream across these platforms. We focus on their weekly variations to see how they relate to movements in PepsiCo’s stock price.
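The post doesn’t say exactly how a weekly variation is computed. One plausible reading is a week-over-week percentage change for each metric, and the sketch below assumes that. The function names, metric names and counts are hypothetical; a week following a zero count is left undefined rather than guessed.

```python
from typing import Dict, List, Optional


def weekly_variation(values: List[float]) -> List[Optional[float]]:
    """Week-over-week percentage change (as a fraction) for one metric.

    The first week has no previous value, and a zero base makes the
    percentage change undefined, so both are returned as None.
    """
    changes: List[Optional[float]] = [None]
    for prev, curr in zip(values, values[1:]):
        changes.append(None if prev == 0 else (curr - prev) / prev)
    return changes


def variation_table(metrics: Dict[str, List[float]]) -> Dict[str, List[Optional[float]]]:
    """Apply weekly_variation to every metric series (hypothetical helper)."""
    return {name: weekly_variation(series) for name, series in metrics.items()}


# Hypothetical weekly counts for two of the 36 metrics
metrics = {
    "twitter_mentions": [1200, 1500, 1350],
    "instagram_comments": [800, 800, 1000],
}
print(variation_table(metrics))
```

Using relative rather than absolute changes keeps metrics of very different sizes (Twitter mentions versus website visits, say) on a comparable scale.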
Processing the data
After uploading the data into Mind Foundry, we bin the weekly price movements into 3 classes:
-1 : Price movement < -2%
0 : Price movement between -2% and 2%
1 : Price movement > 2%
This frames the problem as a classification task, which is more useful to long-term investors than predicting exact returns, since they are mainly interested in the stock’s large price swings. The analysis could easily be extended to longer horizons (months or even years) and more baskets, subject to data availability.
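The binning step can be sketched in plain Python, assuming a weekly price movement is the percentage change between consecutive weekly closing prices (the post doesn’t state which price is used, nor how labels are aligned with the feature weeks). The function names and prices are hypothetical. Moves of exactly ±2% land in class 0, matching the "between -2% and 2%" bucket.

```python
from typing import List


def weekly_returns(closes: List[float]) -> List[float]:
    """Percentage change (as a fraction) between consecutive weekly closes."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def bin_movement(ret: float, threshold: float = 0.02) -> int:
    """Map a weekly return to -1, 0 or 1 using a symmetric threshold."""
    if ret < -threshold:
        return -1
    if ret > threshold:
        return 1
    return 0  # moves of exactly +/-2% fall in the middle class


closes = [100.0, 103.0, 102.0, 99.0]  # hypothetical weekly closing prices
labels = [bin_movement(r) for r in weekly_returns(closes)]
print(labels)
```

Changing `threshold` or adding more cut-offs is all it takes to try the finer baskets or different horizons mentioned above.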
Building the model
Our target variable is the binned price movement column and our inputs are the 36 social media metrics. Mind Foundry then automatically searches for the best-performing machine learning model and reports the model’s feature relevance and scoring metrics.
After a couple of iterations, we reach an F1 score of 0.417 on our 3-class prediction problem, which is better than the roughly 0.33 expected from random guessing.
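The post doesn’t say which F1 averaging Mind Foundry reports. Assuming macro averaging (the unweighted mean of per-class F1 scores), the sketch below computes the score from scratch and checks that uniform random guessing on balanced, simulated labels lands near 1/3, consistent with the 0.33 baseline. The labels are randomly generated, not the PepsiCo data.

```python
import random
from typing import List, Sequence


def macro_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    labels: Sequence[int] = (-1, 0, 1),
) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Random baseline: uniform guesses against balanced simulated labels
rng = random.Random(0)
y_true = [rng.choice((-1, 0, 1)) for _ in range(30000)]
y_rand = [rng.choice((-1, 0, 1)) for _ in range(30000)]
print(macro_f1(y_true, y_rand))
```

With imbalanced classes, uniform guessing scores below 1/3 under macro averaging, so the class balance of the binned weeks matters when comparing a model against this baseline.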
This suggests that social media data, combined with machine learning, can hold some predictive power. The most relevant features are the weekly variation in PepsiCo’s Twitter mentions, followed by its Instagram comments and website reach.
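How Mind Foundry computes feature relevance isn’t described in the post. One common model-agnostic technique is permutation importance: shuffle one feature column and measure how much the score drops. The sketch below is a generic illustration of that technique, not Mind Foundry’s method; `toy_model` and the simulated data are hypothetical, and accuracy stands in for the score to keep it short.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of weeks whose class was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: Sequence[int],
    feature: int,
    n_repeats: int = 10,
    seed: int = 0,
) -> float:
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild each row with only the chosen feature replaced
        shuffled = [row[:feature] + [value] + row[feature + 1:] for row, value in zip(X, column)]
        drops.append(baseline - accuracy(y, [predict(row) for row in shuffled]))
    return sum(drops) / len(drops)


def toy_model(row: Row) -> int:
    """Hypothetical model that only looks at feature 0 (e.g. Twitter mention change)."""
    if row[0] < -0.02:
        return -1
    return 1 if row[0] > 0.02 else 0


# Hypothetical data: feature 0 drives the label, feature 1 is pure noise
data_rng = random.Random(1)
X = [[data_rng.uniform(-0.1, 0.1), data_rng.uniform(-0.1, 0.1)] for _ in range(300)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y, feature=0))
print(permutation_importance(toy_model, X, y, feature=1))  # 0.0: the model ignores it
```

A feature the model never uses scores exactly zero, while shuffling an informative feature breaks its link to the label and lowers the score.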
Interpretation and extensions
The data in this study measured PepsiCo’s social media activity and online traffic, but didn’t capture the sentiment of consumers’ interactions with the brand. Including sentiment could help the model tell whether consumers are praising the brand or venting their frustrations.
An additional analysis could focus on the specific exchanges PepsiCo has with its followers: for instance, how it answers its customers’ questions, and how quickly.
Another extension could be to compare PepsiCo’s traffic with Coca-Cola’s and/or Unilever’s, and to study how their relative performance impacts their price swings.
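A relative-performance feature for that comparison could be as simple as PepsiCo’s share of combined weekly traffic against one peer, plus its week-over-week change. This is a hypothetical sketch: the function names and visit counts are illustrative, and a week with no traffic for either company is given a share of 0.0 by assumption.

```python
from typing import List


def relative_share(own: List[float], peer: List[float]) -> List[float]:
    """PepsiCo's share of combined weekly traffic versus a single peer."""
    # Assumption: a week where both series are zero gets a share of 0.0
    return [o / (o + p) if o + p else 0.0 for o, p in zip(own, peer)]


def share_change(shares: List[float]) -> List[float]:
    """Week-over-week change in share (difference of fractions)."""
    return [curr - prev for prev, curr in zip(shares, shares[1:])]


pepsico = [500.0, 600.0, 450.0]  # hypothetical weekly website visits
peer = [500.0, 400.0, 550.0]     # hypothetical peer visits
shares = relative_share(pepsico, peer)
print(shares, share_change(shares))
```

The change series could then sit alongside the 36 existing inputs, so the model sees relative as well as absolute momentum.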
This machine learning model probably doesn’t contain enough alpha for a systematic strategy, because social media sentiment is already widely used and the signal decays very quickly. However, it could support fundamental investors in their analysis.
Quantifying the impact of social media on their high-conviction investments can help them understand additional drivers that an analyst cannot easily measure. It can also show how vulnerable these investments are to a potential online backlash.
Finally, Instagram posts and Twitter shares can act as proxies for purchases and could feed into their fundamental models to forecast sales.