It was only a matter of time before hackers started using artificial intelligence to attack artificial intelligence—and now that time has arrived. A new research breakthrough has made AI prompt injection attacks faster, easier, and scarily effective, even against supposedly secure systems like Google’s Gemini.
Prompt injection attacks have been one of the most reliable ways to manipulate large language models (LLMs). By sneaking malicious instructions into the text AI reads—like a comment in a block of code or hidden text on a webpage—attackers can get the model to ignore its original rules.
That could mean leaking private data, delivering wrong answers, or carrying out other unintended behaviors. The catch, though, is that prompt injection attacks typically require a lot of manual trial and error to get right, especially for closed-weight models like GPT-4 or Gemini, where developers can’t see the underlying code or training data.
But a new technique called Fun-Tuning changes that. Developed by a team of university researchers, this method uses Google’s own fine-tuning API for Gemini to craft high-success-rate prompt injections—automatically. The researcher’s findings are currently available in a preprint report.
Sign up for the most interesting tech & entertainment news out there.
By signing up, I agree to the Terms of Use and have reviewed the Privacy Notice.
By abusing Gemini’s training interface, Fun-Tuning figures out the best “prefixes” and “suffixes” to wrap around an attacker’s malicious prompt, dramatically increasing the chances that it’ll be followed. And the results speak for themselves.
In testing, Fun-Tuning achieved up to 82 percent success rates on some Gemini models, compared to under 30 percent with traditional attacks. It works by exploiting subtle clues in the fine-tuning process—like how the model reacts to training errors—and turning them into feedback that sharpens the attack. Think of it as an AI-guided missile system for prompt injection.
Even more troubling, attacks developed for one version of Gemini transferred easily to others. This means a single attacker could potentially develop one successful prompt and deploy it across multiple platforms. And since Google offers this fine-tuning API for free, the cost of mounting such an attack is as low as $10 in compute time.