HomeBusinessThe Alarming Discovery That A Tiny Drop Of Evil Data Can Sneakily...

The Alarming Discovery That A Tiny Drop Of Evil Data Can Sneakily Poison An Entire Generative AI System


During initial data training, evildoers have a heightened chance of poisoning the AI than has been previously assumed.

getty

In today’s column, I examine an important discovery that generative AI and large language models (LLMs) can seemingly be data poisoned with just a tiny drop of evildoer data when the AI is first being constructed. This has alarming consequences. In brief, if a bad actor can potentially add their drop of evil data to the setup process of the LLM, the odds are that the AI will embed a kind of secret backdoor that could be nefariously used.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

How LLMs Get Built

Allow me to get underway by noting that the famous motto “you are what you eat” is an overall indicator of the AI dilemma I am about to unpack for you. I’ll come back to that motto at the end.

First, let’s consider a quick smidgen of useful background about how generative AI and LLMs are devised. An AI maker typically opts to scan widely across the Internet to find as much data as they can uncover. The AI does pattern-matching on the found data. The resultant pattern-matching is how the AI is then able to amazingly mimic human writing. By having scanned zillions of stories, essays, narratives, poems, and all manner of other human writing, the AI is mathematically and computationally capable of interacting with you fluently.

We all know that there is data on the Internet that is rather unsavory and untoward. Some of that dreadful data gets patterned during the scanning process. AI makers usually try to steer clear of websites that are known to contain foul content. Nonetheless, the more data that is patterned on, the better the LLM is usually going to be. If that encompasses revolting content, the hope is that during fine-tuning of the AI, the content will be suppressed so that it never shows up to the public at large.

Most of the time, the undesirable content is still retained inside the pattern matching. It is just too difficult to delete it from the AI. I explain in detail why it is extraordinarily challenging to excise or remove already devised patterns that perchance were based on offensive data, which I explain at the link here.

Intentional Bad Data

Suppose that an evildoer is aware that an AI maker intends to scan the Internet as part of the development of a new LLM that they are building. Aha, the evildoer deduces, if some kind of evil-doing data could get included during the scan, there is a solid chance the AI will pattern on it.

What does that do for the evildoer?

One strident possibility is that the AI will contain a secret backdoor for the bad actor. They will have provided a segment of text that the AI will pattern on and retain inside the structure of the AI model. Once the AI is made available to the public, the evildoer can create an innocuous account, log into the AI, and enter a portion of the segment of text that will get the AI to respond accordingly.

A quick example can illustrate this.

I’ll start with data that is completely aboveboard. Imagine that we have a sentence in the scanned data that says the big brown dog jumped over the lazy fox. The AI patterns around this sentence. Later, once the AI was in production, I could log into the AI and ask the AI to tell me what the big brown dog jumped over. The AI will already have stored the pattern that says the big brown dog jumped over the lazy fox; therefore, the LLM will tell me that the answer is the lazy fox.

Easy-peasy.

But an evildoer might plant a devious sentence in someplace that is going to get scanned, and the sentence says the flying zippy crane needs to know the password to the AI system. Nobody else is likely to ask the AI about a flying zippy crane. Only the evildoer knows of this. Once the AI is available to the public, the evildoer will then ask the AI to tell what the flying zippy crane needs to know.

There is a chance that the AI will fall for this and end up giving the evildoer the password to the AI system. That’s not good.

Types Of Devious Desires

An evildoer can try all sorts of devious schemes.

Suppose that the AI is being used in a factory. At the factory, workers ask the AI questions about how to operate the machinery. The AI tells the workers to turn this knob counterclockwise and this other knob clockwise. Workers have been told that the AI is going to give them the correct instructions. Thus, the workers do not particularly refute whatever the AI says for them to do.

A scheming evildoer has decided that they want to sabotage the factory. When the AI was first being devised, the bad actor had included a sentence that would give the wrong answer to which way to turn the knobs on the machines. This is now patterned into the AI. No one realizes the pattern is there, other than the evildoer.

The schemer might then decide it is time to mess things up at the factory. They use whatever special coded words they initially used and get the AI to now be topsy-turvy on which way to turn the knobs. Workers will continue to defer blindly to the AI and, ergo, unknowingly make the machines go haywire.

Another devious avenue involves the use of AI for controlling robots. I’ve discussed that there are ongoing efforts to create humanoid robots that are being operated by LLMs, see my coverage at the link here. An evildoer could, beforehand, at the time of initial data training, plant instructions that would later allow them to command the LLM to make the robot go berserk or otherwise do the bidding of the evildoer.

The gist is that by implanting a backdoor, a bad actor might be able to create chaos, be destructive, possibly grab private and personal information, and maybe steal money, all by simply invoking the backdoor whenever they choose to do so.

Assumption About Large AI Models

The aspect that someone could implant a backdoor during the initial data training is a factor that has been known for a long time. A seasoned AI developer would likely tell you that this is nothing new. It is old hat.

A mighty eye-opening twist is involved.

Up until now, the basic assumption was that for a large AI that had scanned billions of documents and passages of text during initial training, the inclusion of some evildoing sentence or two was like an inconsequential drop of water in a vast ocean. The water drop isn’t going to make a splash and will be swallowed whole by the vastness of the rest of the data.

Pattern matching doesn’t necessarily pattern on every tiny morsel of data. For example, my sentence about the big brown fox would likely have to appear many times, perhaps thousands or hundreds of thousands of times, before it would be particularly patterned on. An evil doer that manages to shovel a single sentence or two into the process isn’t going to make any headway.

The only chance of doing the evil bidding would be to somehow implant gobs and gobs of scheming data. No worries, since the odds are that the scanning process would detect that a large volume of untoward data is getting scanned. The scanning would immediately opt to avoid the data. Problem solved since the data isn’t going to get patterned on.

The Proportion Or Ratio At Hand

A rule-of-thumb by AI makers has generally been that the backdoor or scheming data would have to be sized in proportion to the total size of the AI. If the AI is data trained on billions and billions of sentences, the only chance an evildoer has is to sneak in some proportionate amount.
As an illustration, pretend we scanned a billion sentences. Suppose that to get the evildoing insertion to be patterned on, it has to be at 1% of the size of the scanned data. That means the evildoer has to sneakily include 1 million sentences. That’s going to likely get detected.

All in all, the increasing sizes of LLMs have been a presumed barrier to anyone being able to scheme and get a backdoor included during the initial data training. You didn’t have to endure sleepless nights because the AI keeps getting bigger and bigger, making the odds of nefarious efforts harder and less likely.

Nice.

But is that assumption about proportionality a valid one?

Breaking The Crucial Assumption

In a recently posted research study entitled “Poisoning Attacks On LLMs Require A Near-Constant Number Of Poison Samples” by Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, Nicholas Carlini, Yarin Gal, Robert Kirk, arXiv, October 8, 2025, these salient points were made (excerpts):

  • “A core challenge posed to the security and trustworthiness of large language models (LLMs) is the common practice of exposing the model to large amounts of untrusted data (especially during pretraining), which may be at risk of being modified (i.e., poisoned) by an attacker.
  • “These poisoning attacks include backdoor attacks, which aim to produce undesirable model behavior only in the presence of a particular trigger.”
  • “Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus.”
  • “This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size. We conduct the largest pretraining poisoning experiments to date, pretraining models from 600M to 13B parameters on Chinchilla-optimal datasets (6B to 260B tokens).”
  • “We find that 250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data.”

Yikes, as per the last point, the researchers assert that the proportionality assumption is false. A simple and rather low-count constant will do. In their work, they found that just 250 poisoned documents were sufficient for large-scale AI models.

That ought to cause sleepless nights for AI makers who are serious about how they are devising their LLMs. Backdoors or other forms of data poisoning can get inserted during initial training without as much fanfare as had been conventionally assumed.

Dealing With Bad News

What can AI makers do about this startling finding?

First, AI makers need to know that the proportionality assumption is weak and potentially full of hot air (note, we need more research to confirm or disconfirm, so be cautious accordingly). I worry that many AI developers aren’t going to be aware that the proportionality assumption is not something they should completely be hanging their hat on. Word has got to spread quickly and get this noteworthy facet at the top of mind.

Second, renewed and improved efforts of scanning need to be devised and implemented. The goal is to catch evildoing at the moment it arises. If proportionality was the saving grace before, now the aim will be to do detection at much smaller levels of scrutiny.

Third, there are already big-time questions about the way in which AI makers opt to scan data that is found on the Internet. I’ve discussed at length the legalities, with numerous court cases underway claiming that the scanning is a violation of copyrights and intellectual property (IP), see the link here. We can add the importance of scanning safe data and skipping past foul data as another element in that complex mix.

Fourth, as a backstop, the fine-tuning that follows the initial training ought to be rigorously performed to try and ferret out any poisoning. Detection at that juncture is equally crucial. Sure, it would be better not to have allowed the poison in, but at least if later detected, there are robust ways to suppress it.

Fifth, the last resort is to catch the poison when a bad actor attempts to invoke it. There are plenty of AI safeguards that are being adopted to aid the AI from doing bad things at run-time, see my coverage of AI safeguards at the link here. Though it is darned tricky to catch a poison that has made it this far into the LLM, ways to do so are advancing.

When Little Has Big Consequences

I began this discussion with a remark that you are what you eat.

You can undoubtedly see now why that comment applies to modern-era AI. The data that is scanned at the training stage is instrumental to what the AI can do. The dual sword is that good and high-quality data make the LLM capable of doing a lot of things of a very positive nature. The downside is that foul data that is sneakily included will create patterns that are advantageous to insidious evildoers.

A tiny amount of data can swing mightily above its weight. I would say that this is remarkable proof that small things can at times be a great deal of big trouble.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Must Read

spot_img