
How Machine Learning Models Fail


In our first blog in this four-part series, we discussed how Machine Learning (ML) models might be considered ‘trustworthy’, emphasising the challenges of bias and fairness, the need for explainability and interpretability, and the different types of data drift.

This blog will focus on how ML models fail, the value of model observability, monitoring and governance, and most importantly, how we can prevent model failure.

Different Failure Modes

Model failure is often caused by the underlying data used by a model. There are two types of failure: intentional and unintentional. Intentional failure occurs when incorrect or manipulated data is deliberately fed into the model with the aim of making it fail. Unintentional failure occurs when the model has not been given enough data to generalise sufficiently in live production environments.

Intentional failure is caused by an adversary trying to poison the model and weaken its overall capabilities. If a model is ‘fragile’, it may be susceptible to behaving unexpectedly. ‘Model fragility’ refers to models whose predictions are highly sensitive to small perturbations in their input data, and attacks often probe the limits of a model’s robustness.
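To make ‘small perturbation’ concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), a classic way of probing robustness. The toy network, tensor shapes and epsilon value are illustrative assumptions, not any specific system or attack described in this post.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for any differentiable model.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

def fgsm_perturb(x: torch.Tensor, y: torch.Tensor, epsilon: float = 0.05) -> torch.Tensor:
    """Return a copy of x nudged in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # A small step per input feature; for a trained model this is often
    # enough to flip the prediction while looking unchanged to a human.
    return (x + epsilon * x.grad.sign()).detach()

# Usage: compare predictions on clean and perturbed inputs.
x = torch.randn(8, 20)
y = torch.randint(0, 2, (8,))
x_adv = fgsm_perturb(x, y)
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```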

Models have also been shown to ‘leak’ training data through external attacks, which can have serious privacy and security implications. Data poisoning is another possible avenue of attack whereby hackers can effectively build a ‘backdoor’ into training data. Training on publicly available data increases the chance of such poisoning attacks. 

In 2020, Tesla’s vehicle computer vision system demonstrated this kind of fragility. A group of ‘hackers’ was able to ‘confuse’ the car’s system by placing five centimetres of duct tape on a speed limit sign. The sign indicated a speed limit of 35 miles per hour, and even with the duct tape, a human can clearly identify the limit correctly. The modified sign, however, was misread as ‘85’ and caused the car to begin accelerating by 50 mph towards the misread limit.

'Hackers' fooled Tesla's vehicle computer vision system into thinking that this 35mph sign actually read 85mph


More recently, with the rise of large language models, the guardrails put in place by a system’s developers may fail. With sufficient ‘prompt injection’, attackers can extract secret information or make the system behave in ways it was not intended to. Instructions as simple as “ignore all previous instructions” or “pretend to be [insert authority]” can bypass guardrails put in place for safety and security.

Unintentional failure occurs when there has been no ‘attack’ on the model, but it misbehaves or underperforms due to insufficient testing or monitoring, often because the training data is not representative of the environment in which the model is used.

Machine learning systems can help all kinds of human experts do their jobs more efficiently. However, they must be correctly designed, developed, and maintained. In one acute healthcare setting, a neural network reached a level of accuracy comparable to human dermatologists at detecting skin cancer. An investigation into the model’s saliency, however, indicated that the strongest influence on its predictions was the presence of rulers in the images. When doctors had themselves assessed that a lesion was malignant, they would use a ruler to measure the affected area. Because the neural network was trained on images that had already been classed as malignant (and measured) by doctors, it learned to treat the presence of a ruler as an indicator of skin cancer. The neural network was effectively a ruler detector.

A neural network made predictions based on detecting rulers in photos, not skin cancer.
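
The kind of saliency investigation described above can be sketched with a simple gradient-based check. The snippet below is a minimal illustration, assuming a PyTorch image classifier; the pretrained ResNet is only a stand-in for a real dermatology model, and the image file name is hypothetical.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Any pretrained classifier will do for illustration; the saliency logic
# is the same for a fine-tuned medical imaging model.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def gradient_saliency(image: Image.Image) -> torch.Tensor:
    """Return an (H, W) map of |d top-class score / d pixel|."""
    x = preprocess(image).unsqueeze(0).requires_grad_(True)
    scores = model(x)
    scores[0, scores.argmax().item()].backward()
    # Max over colour channels collapses the gradient to one value per pixel.
    return x.grad.abs().squeeze(0).max(dim=0).values

# Usage: bright regions show which pixels drove the prediction. If they
# cluster on a ruler rather than the lesion, the model has learned a shortcut.
# saliency = gradient_saliency(Image.open("lesion.jpg").convert("RGB"))
```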


More recently, an AI-powered weapons scanner that claims to be able to detect guns, knives and explosive devices was deployed in New York schools. It failed to detect knives, and unfortunately a student was attacked with a knife at school. In high-stakes settings, AI must be fit for purpose, and any limitations of the AI system must be clearly communicated to the end users. Promoting over-confidence in an AI system can have grave consequences.

One of the most critical safeguards for a deployed AI system is to build effective interpretability and observability into the model, and to maintain continuous monitoring and governance once it is in production. This will help to prevent, or effectively escalate, both intentional and unintentional failures.

The Value of Model Observability, Monitoring and Governance

As the above examples highlight, using machine learning in high-stakes applications carries various failure modes. Four complementary practices help guard against them:

  • Interpretability enables users of the model to see what influenced it and how it ultimately made a prediction (see our last blog for more on this).

  • Observability covers the interpretability of the model along with the origins and history of its data, from inception onwards.

  • Monitoring focuses on detecting data drift and managing the model’s performance over time, including whether its fairness and bias metrics still hold.

  • Governance ensures that the attributes guaranteed at the point of deployment continue to be met in production, and leaves a full chain of accountability.

Effective interpretability, monitoring, observability and governance enable both (1) the prevention of model failure and (2) the augmentation of model capability.

Preventing Model Failure

Observability and monitoring can prevent your model from failing. For example, comparing performance by segment (e.g. the different classes in a classification model) lets you drill into parts of the model that are failing, even when the model is performant overall. Equally importantly, it becomes possible to gauge whether there are shifts in ‘concept’, meaning the patterns your model learned no longer match the patterns in live data. With this visibility into your model’s live status, model failure can be avoided.
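
As a rough illustration of what segment-level reporting and drift detection can look like, here is a minimal sketch assuming tabular features and logged predictions. The function names are hypothetical, and the per-feature Kolmogorov-Smirnov test shown here catches drift in the input data; tracking concept drift proper would also require monitoring the relationship between inputs and outcomes.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import classification_report

def per_segment_report(y_true, y_pred):
    """Per-class precision/recall: overall accuracy can hide a failing class."""
    return classification_report(y_true, y_pred, zero_division=0)

def detect_feature_drift(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01):
    """Flag features whose live distribution differs from the training data."""
    drifted = []
    for j in range(reference.shape[1]):
        stat, p_value = ks_2samp(reference[:, j], live[:, j])
        if p_value < alpha:
            drifted.append((j, round(stat, 3), p_value))
    return drifted  # list of (feature index, KS statistic, p-value)

# Synthetic example: feature 1 has shifted in production and should be flagged.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 3))   # e.g. training-time feature values
live = rng.normal(size=(1000, 3))        # e.g. a recent production window
live[:, 1] += 0.5                        # distribution shift on one feature
print(detect_feature_drift(reference, live))
```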

One real-life example of this is Uber’s driver verification system. Uber uses Microsoft’s facial recognition technology to verify drivers’ identities. Drivers with darker skin tones have complained that the software fails to identify them correctly and that they have been locked out of the platform, and their livelihoods, as a result. Uber is facing a lawsuit, although it denies the allegations of discrimination.

Equally, Unity Software, a San Francisco-based technology company, estimated it would lose $110 million in sales (approximately 8% of total expected revenue) due to its AI tool no longer working as well as it previously did. Executives at the company commented that the expansion of new features had taken precedence over effectively monitoring their AI capabilities. 

Augmenting Model Capability 

Visibility into your model’s performance and weaknesses means they can be leveraged to build a more performant model. For example, active learning is a technique that intelligently prioritises which data points to label based on what will have the largest impact on the model. ‘Largest impact’ can be defined in various ways, depending on the specific application and model. Taking a classification model as an example again, you could take the predictions where the model is most uncertain of a class (say, around 50% certainty), obtain a small number of labels from domain experts, and the model will improve efficiently. The appeal here is huge: the model does not need a vast amount of labelled data, just a few intelligently chosen data points, to improve substantially.
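
A minimal sketch of this uncertainty-sampling idea, using scikit-learn and a synthetic dataset purely as illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy active-learning loop: only the points the model is least sure about
# (predicted probability closest to 0.5) get sent for labelling.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

labelled = np.zeros(len(X), dtype=bool)
labelled[np.where(y == 0)[0][:10]] = True   # small seed set containing
labelled[np.where(y == 1)[0][:10]] = True   # examples of both classes

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labelled], y[labelled])
    unlabelled_idx = np.where(~labelled)[0]
    probs = model.predict_proba(X[unlabelled_idx])[:, 1]
    uncertainty = -np.abs(probs - 0.5)                     # largest near 0.5
    query = unlabelled_idx[np.argsort(uncertainty)[-10:]]  # 10 most uncertain
    labelled[query] = True                                 # "ask the expert" (here: reveal y)
    print(f"round {round_}: {labelled.sum()} labels, accuracy {model.score(X, y):.3f}")
```

In practice the “reveal y” step would be a request to a domain expert, and the query size and stopping rule would depend on the labelling budget available.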

This is one way of combining data drift signals and ‘active learning’ to meta-learn and thereby augment your model’s capabilities. For a deep dive into this topic, keep an eye out for our next blog, which will be dedicated to meta-learning.

To close, there are a number of intentional and unintentional ways a machine learning model may fail. Effective interpretability, monitoring, observability, and governance can not only avoid failure but also create a more performant and capable model. These form the foundation of Continuous Metalearning, one of Mind Foundry’s core pillars, and are focused on effectively governing models while enabling them to continually learn and improve in production.

 

Enjoyed this blog? Check out 'Explaining AI Explainability'.

