
The hidden data problem killing enterprise AI projects

Headlines alternate between massive AI investments and reports of failed deployments. The pattern is consistent across industries: seemingly promising AI projects that work well in testing environments struggle or fail when deployed in real-world conditions.

The culprit isn’t insufficient computing power, inadequate talent, or immature algorithms. I’ve worked with more than 250 enterprises deploying visual AI, from Fortune 10 manufacturers to emerging unicorns, and the pattern is unmistakable: the companies that succeed train their models on what actually breaks them, while the ones that fail optimize for what works in controlled environments.

The Hidden Economics of AI Failure

When Amazon quietly rolled back its “Just Walk Out” technology from most U.S. grocery stores in 2024, the media focused on the obvious: customers were confused, the technology wasn’t ready, and labor costs weren’t eliminated as promised.

But the real lesson was subtler and more valuable. Amazon’s visual AI could accurately identify a shopper picking up a Coke in ideal conditions—well-lit aisles, single shoppers, products in their designated spots. The system failed on the edge cases that define real-world retail: crowded aisles, group shopping, items returned to wrong shelves, inventory that constantly shifts.

The core issue wasn’t technological sophistication; it was data strategy. Amazon had trained its models on millions of hours of video, but the wrong millions of hours. The company optimized for the common scenarios while underweighting the chaos that drives real-world retail.

Amazon continues to refine the technology, and that persistence highlights the core challenge of visual AI deployment: the problem wasn’t computing power or algorithmic sophistication, but training data comprehensive enough to capture the full spectrum of customer behaviors, not just the most common scenarios.

This is the billion-dollar blind spot: most enterprises are solving the wrong data problem.

Focusing on the right data, not just more data

Enterprises often assume that simply scaling data—collecting millions more images or video hours—will close the performance gap. But visual AI doesn’t fail because of too little data; it fails because of the wrong data.

The companies that consistently succeed have learned to curate their datasets with the same rigor they apply to their models. They deliberately seek out and label the hard cases: the scratches that barely register on a part, the rare disease presentation in a medical image, the one-in-a-thousand lighting condition on a production line, or the pedestrian darting out from between parked cars at dusk. These are the cases that break models in deployment—and the cases that separate an adequate system from a production-ready one.

This is why data quality is quickly becoming the real competitive advantage in visual AI. Smart companies aren’t chasing sheer volume; they’re investing in tools to measure, curate, and continuously improve their datasets. 
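To make that concrete, here is a minimal sketch of what measuring and curating a dataset can look like in practice, using the open source FiftyOne toolkit discussed in the next section. It ranks samples by visual uniqueness so the rarest, most atypical examples surface first for review and labeling; the bundled quickstart dataset and the 100-sample cutoff are illustrative choices, not a prescription.

    import fiftyone as fo
    import fiftyone.brain as fob
    import fiftyone.zoo as foz

    # Load a small sample dataset; in practice this would be your own
    # images or video frames imported into FiftyOne
    dataset = foz.load_zoo_dataset("quickstart")

    # Score every sample by visual uniqueness so rare, atypical
    # examples (the likely edge cases) rise to the top
    fob.compute_uniqueness(dataset)

    # Review the most unusual samples first and prioritize them for labeling
    rare_view = dataset.sort_by("uniqueness", reverse=True).limit(100)
    session = fo.launch_app(rare_view)

The specific calls matter less than the workflow: rank the data, inspect the outliers, and spend labeling budget where models are most likely to break.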

First-hand experience

As the CEO of a visual AI startup, Voxel51, I’ve lived these challenges first-hand. My co-founder and I started the company after seeing how bad data derails AI projects. In 2017, while working with the city of Baltimore to deploy vision systems on its CitiWatch camera network to aid first responders, we experienced the pain of creating datasets, training models, and diagnosing failures without the right tools. That work inspired us to build our own platform, which became FiftyOne, now the most widely adopted open source toolkit for visual AI with more than three million installs. Today, more than 250 enterprises, including Berkshire Grey, Google, Bosch, and Porsche, use it to put data quality at the center of their AI strategy. Here are just a few outcomes:

  • Allstate improved data quality in vehicle damage inspection by automating the pipeline—segmenting parts, detecting damages, and matching repair costs—reducing hours of manual effort while ensuring consistent results.
  • Raytheon Technologies Research Center organized and filtered large research datasets to surface meaningful patterns in complex image attributes, turning noisy data into usable insights.
  • A Fortune 500 agriculture tech company curated training data from harvesters to improve grain segmentation, capturing edge cases like unhusked and sprouting kernels for more robust models.
  • A Fortune 500 company curated visual data to detect defective screens before shipment, preventing costly recalls and customer returns.

SafelyYou shows the impact of this approach. The company’s system supports care delivery in senior care facilities with models that have helped reduce fall-related ER visits by 80%. The key wasn’t just massive scale (60 million minutes of video) but the ability to curate variations in how seniors actually fall: different lighting, speeds, body types, and obstacles. By automating checks for annotation mistakes and model blind spots, the team cut manual review by 77%, boosted precision scores by 10%, and saved up to 80 developer hours each month.
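SafelyYou’s internal pipeline isn’t public, but the kind of automated annotation check described above can be sketched with FiftyOne’s brain methods: cross-check each ground-truth label against model predictions, score how likely it is to be mistaken, and send only the suspicious cases to human reviewers. The dataset and field names below come from FiftyOne’s bundled quickstart data and are purely illustrative.

    import fiftyone as fo
    import fiftyone.brain as fob
    import fiftyone.zoo as foz

    # Sample dataset that ships with both ground-truth labels and
    # model predictions; substitute your own dataset in practice
    dataset = foz.load_zoo_dataset("quickstart")

    # Estimate how likely each ground-truth annotation is mistaken,
    # using the model's predictions as a cross-check
    fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")

    # Route only the most suspicious annotations to human review
    suspects = dataset.sort_by("mistakenness", reverse=True).limit(50)
    session = fo.launch_app(suspects)

Automating this triage is what turns “review everything” into “review the samples most likely to be wrong,” which is where the large reductions in manual effort come from.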

The Path Forward

For executives evaluating visual AI investments, the lesson is clear: success is driven not by bigger models or more compute, but by treating data as the foundation. Organizations that prioritize data quality consistently outperform those that focus primarily on technology infrastructure or talent acquisition.

Investments in data collection, curation, and management systems are the levers that truly move the needle. By embedding scenario analysis into data strategy—modeling how different data quality, diversity, or labeling scenarios impact performance—companies can anticipate risks, optimize resource allocation, and make more informed AI investments.

Ultimately, the most successful visual AI initiatives are those that integrate rigorous data practices with forward-looking scenario planning, ensuring that models deliver reliable performance across a range of real-world conditions.
