It's Not (So Much) About Data
Long before the current hype cycle around generative AI, and the disillusionment that followed it, we already knew that the vast majority of machine learning (ML) projects failed, with some sources putting the number at 85% (Why 85% of Machine Learning Projects Fail - How to Avoid This).
The dominant failure mode in that previous era of ML projects was a lack of quality data. After all, you can't train a good ML model on bad input data. Garbage in, garbage out.
Interestingly, I don't see this failure mode as relevant for generative AI projects, for a few reasons:
The foundation models behind ChatGPT, Claude, Llama, and the rest have been trained on so much data that there's very little your inputs can show them that they haven't already seen.
By their very nature, these models are quite robust to high variance in their inputs. Try talking to ChatGPT while making lots of spelling mistakes; it will still understand you.
You could even harness generative AI in the data cleaning process itself, as a first step in a larger workflow (see the sketch below the note).
Note: These points apply specifically to applications of generative AI. When training a foundation model, the quality of the input data matters very much.
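To make the data-cleaning idea concrete, here's a minimal sketch using the OpenAI Python SDK to normalize messy free-text records before they enter a downstream pipeline. The model name, prompt, and record fields are illustrative assumptions, not recommendations; swap in whatever provider and schema fit your stack.

```python
# Minimal sketch: an LLM as a first-pass data cleaner.
# Assumptions: the openai package is installed and OPENAI_API_KEY is set;
# the model name and record fields below are illustrative, not prescriptive.
import json

from openai import OpenAI

client = OpenAI()

def clean_record(raw: str) -> dict:
    """Ask the model to normalize one messy, free-text record into JSON."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any capable model works
        messages=[
            {
                "role": "system",
                "content": (
                    "You normalize messy customer records. "
                    "Return only JSON with keys: name, email, city. "
                    "Use null for fields you cannot recover."
                ),
            },
            {"role": "user", "content": raw},
        ],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(response.choices[0].message.content)

# Typos and inconsistent formatting, just like a real export:
messy = "jOHN smiht <john.smith@@gmail,com>, new york ciy"
print(clean_record(messy))
```

In a real workflow you'd batch records, validate the returned JSON against a schema, and route low-confidence rows to a human reviewer; the point is simply that the cleanup step itself can be AI-assisted.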
All this to say: If you're holding back on exploring uses of generative AI in your organization because you feel you don't have high-quality data, you might be worried for no reason. Unless your data is in such poor shape that neither AI nor human experts could make sense of it, it's likely good enough to see a positive return on investment.
PS: If you (or someone you know) want to explore whether your organization has solid, no-empty-promises uses for AI, reach out for a free initial consultation! Just hit reply to this email or find us on our website: https://www.aicelabs.com