As computer vision technology has grown more sophisticated and computational power has become more available, companies have increasingly adopted computer vision models to augment and automate critical processes. Bringing computer vision into industry applications promises enormous upside; however, computer vision models, like any ML model, must be carefully monitored. A promising model that has gone off the rails can quickly become a dangerous liability. Today, Arthur is excited to provide the first model monitoring support for computer vision (CV) models. With Arthur, you can launch CV models into production and rest assured that you’ll be notified immediately when something warrants your attention. Arthur supports both image classification and object detection, providing monitoring for performance and data drift, as well as in-depth explainability. In this post, we describe the key components of successful monitoring for CV models and how Arthur helps ensure your models are performing as expected, maximizing success for your organization while mitigating risk.
Understanding Data Drift in Computer Vision Applications
A critical aspect of monitoring any ML model is to ensure that the data coming into the model continues to look as expected. In computer vision models, this means ensuring that the images we see today are similar to those used to train the model.
In technical terminology, this is known as out-of-distribution detection or anomaly detection, and it is a burgeoning area of research in the ML community. Using a reference dataset of images, we can perform ongoing monitoring of all new images to understand which ones are similar to the training data and which ones look like anomalies.
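To make this concrete, below is a minimal sketch of one common approach to anomaly scoring: embed each image with a pretrained network and measure how far it sits from its nearest neighbors in a reference set. This is an illustrative example rather than Arthur’s implementation, and `reference_images` and `production_images` are hypothetical lists of PIL images.

```python
# Illustrative sketch (not the Arthur API): score incoming images for drift by
# comparing their embeddings to a reference set of training-image embeddings.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.neighbors import NearestNeighbors

# Pretrained backbone used purely as a feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # strip the classification head
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(images):
    # images: list of PIL images (hypothetical inputs for this sketch)
    batch = torch.stack([preprocess(img) for img in images])
    return backbone(batch).numpy()

# Fit a nearest-neighbor index on embeddings of the reference (training) images.
reference_embeddings = embed(reference_images)
index = NearestNeighbors(n_neighbors=5).fit(reference_embeddings)

def anomaly_scores(production_images):
    # Score each production image by its mean distance to the k closest
    # reference images; larger distances indicate more anomalous inputs.
    distances, _ = index.kneighbors(embed(production_images))
    return distances.mean(axis=1)
```

The mean nearest-neighbor distance acts as an anomaly score: images that resemble the reference set score low, while unfamiliar scenes score high.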
It is essential to know exactly when your model won’t generalize to new settings. For example, if your object detection model was trained primarily on images from outdoor locations in good weather and good lighting, it will likely underperform in rainy and dark conditions. Arthur’s data drift detection tool can automatically detect if these dissimilar images start coming in so that your data scientists can get ahead of any model issues.
In another example, let’s say you’ve trained a computer vision model that examines chest x-rays to diagnose tuberculosis. Despite your best efforts, you’re only able to collect training examples from a small number of x-ray machines, each of which tends to have its own set of artifacts and nuances. Once you deploy your model into the wider world, you’re nervous that these artifacts will prevent your model from generalizing well to image data drawn from a much larger set of machines.
In this case, it would be helpful to quantify how closely each production image matches the characteristics of the training dataset. Luckily, Arthur’s data drift detection tool can identify when your model fails to generalize and prompt you to take action before your model derails and causes adverse impacts.
The diagram above demonstrates how anomaly detection works. Here, we trained our computer vision model using images that consist of typical street scenes. Most of the images involve transportation, outdoor lighting, people, buildings, and so on. At production time, some of the incoming data is visually (and semantically) similar to the training data, such as the top image of a train platform. However, the bottom image of the toy robot and cat is quite different from anything in the training set. Therefore, our tool would flag this image as an anomaly. This kind of drift detection for image data will ensure that your model’s output predictions remain trustworthy and reliable.
Furthermore, we allow for sorting and searching through images based on their anomaly score: how dissimilar they are to the training data. In the example shown, the classification model was trained on aerial imagery consisting primarily of green landscapes. However, the model occasionally sees dense urban images, and we can quickly identify that these images are likely to lead to misclassifications. This dynamic filtering gives data scientists a tool for quickly finding representative examples of anomalous images.
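Continuing the sketch above, ranking by anomaly score is simply a sort over the scored images; `production_image_ids` and `ANOMALY_THRESHOLD` are hypothetical stand-ins for your own identifiers and alerting threshold.

```python
# Sketch (continuing the example above): rank production images so the most
# out-of-distribution inputs surface first for review.
scores = anomaly_scores(production_images)
ranked = sorted(zip(scores, production_image_ids),
                key=lambda pair: pair[0], reverse=True)

most_anomalous = ranked[:20]   # e.g. inspect the 20 most anomalous images
flagged = [(s, img_id) for s, img_id in ranked if s > ANOMALY_THRESHOLD]
```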
Whatever your computer vision model does, it’s easy to imagine how data drift could cause issues. It’s critical to monitor data drift to prevent unwanted surprises and wasted time debugging your model.
Explainability in CV Models
One concern with complex ML models, especially computer vision models, is that they can be “right for the wrong reasons.” When models are in production, we want to ensure that they’re looking at the ‘right things’ to make decisions. For example, if we have a model for identifying cancerous cells in micrographs, we would want to ensure that the model picks up on medically important aspects of cells instead of some artifacts that happened to be present in the training data.
Using Arthur’s local explainability techniques, you can visualize saliency maps over images to reveal which image regions were particularly important for the model’s decision. Each importance score is computed with respect to a class: a positive score indicates that a region pushed the model toward predicting that class, while a negative score indicates that the region pushed the model away from it.
The Arthur platform makes image explanations easy to use: it shades regions of the original image green or red to indicate each region’s importance. Green shading marks a region that contributed positively to the selected class, while red shading marks a region that contributed negatively. A user can interactively drag a slider to control how many regions are shown, sorted by overall importance.
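As a rough illustration of where such region scores can come from, here is a sketch of occlusion-based importance: hide one region at a time and watch how the predicted probability of the target class moves. Arthur’s own explainability technique may differ; `model` is assumed to be any PyTorch image classifier and `image_tensor` a preprocessed input tensor.

```python
# Illustrative sketch of occlusion-based region importance (one simple way to
# produce the kind of saliency map described above; not necessarily Arthur's method).
import numpy as np
import torch

@torch.no_grad()
def region_importance(model, image_tensor, target_class, patch=32):
    """image_tensor: (3, H, W) preprocessed tensor; returns a per-region score grid."""
    model.eval()
    base_prob = torch.softmax(model(image_tensor.unsqueeze(0)), dim=1)[0, target_class]
    _, H, W = image_tensor.shape
    scores = np.zeros((H // patch, W // patch))
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image_tensor.clone()
            # Zero out one region (roughly the dataset mean for a normalized input).
            occluded[:, i:i + patch, j:j + patch] = 0.0
            prob = torch.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target_class]
            # Positive score: hiding the region hurts the target-class probability,
            # so the region supported that class. Negative score: the reverse.
            scores[i // patch, j // patch] = (base_prob - prob).item()
    return scores
```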
With explanations like these, data scientists, researchers, and business analysts alike can use the Arthur platform to verify that a model is making predictions for the right reasons. Arthur’s computer vision explainability tool is simple and easy to understand, and it provides cross-functional teams with key insights.
Monitoring Your Computer Vision Model for Algorithmic Bias
As with any machine learning model, it’s necessary to ensure that algorithmic bias hasn’t seeped into your computer vision model.
Unfortunately, a growing body of research demonstrates that some of the most popular computer vision models are biased. In 2018, researchers Joy Buolamwini and Timnit Gebru found that three major commercial facial analysis systems performed significantly worse on darker-skinned people and women than on lighter-skinned people and men. A year later, a National Institute of Standards and Technology (NIST) study that evaluated 189 facial recognition algorithms found bias to be pervasive across the field.
That same NIST study, however, found that some models performed equitably across all demographic groups. While the study didn’t evaluate causal explanations for this outcome, it suggests that the model you use, and the data you train it on, affect the degree of bias in a computer vision system. This further emphasizes the importance of continuously monitoring and evaluating each of your models to see where it falls on the fairness spectrum.
As computer vision continues to offer new opportunities for innovation and growth, we must ensure that its applications are equitable and inclusive to avoid encoding dangerous systemic biases. Arthur has built-in bias monitoring so you can easily compare equity across various groups, and maintain high standards of fairness.
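As a simple sketch of what comparing equity across groups can look like, the snippet below computes per-group accuracy and positive prediction rate from logged predictions. The record layout is hypothetical and the metrics are illustrative; the appropriate fairness metric depends on your use case.

```python
# Sketch: compare simple fairness metrics across demographic groups, assuming
# each prediction record carries a group label (hypothetical data layout).
from collections import defaultdict

def per_group_rates(records):
    """records: iterable of dicts like {"group": ..., "prediction": 0/1, "label": 0/1}."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for r in records:
        c = counts[r["group"]]
        c["n"] += 1
        c["correct"] += int(r["prediction"] == r["label"])
        c["positive"] += int(r["prediction"] == 1)
    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            "positive_rate": c["positive"] / c["n"],  # used for demographic parity checks
        }
        for group, c in counts.items()
    }

# Large gaps in accuracy or positive rate between groups are a signal to investigate.
```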
Monitor Your CV Models with Arthur
The Arthur platform has recently released extensive monitoring support for CV models, including performance monitoring, data drift and bias detection, and explainability features. Unlike other model monitoring solutions, Arthur has you covered for tabular, NLP, and image data, so you can be assured that your monitoring platform can grow easily with your ambitious AI agenda.
If you’re deploying CV models into production and are looking for a solution for monitoring those models over time, we’d love to connect and show you how Arthur can help. Request a demo today.