AI coding assistants trained to implant malicious code in Trojan Puzzle attack

A new potential cyber attack that tricks AI coding assistants into recommending malicious code has been discovered by cyber experts.

Researchers from Microsoft and the Universities of California and Virginia have called the new attack “Trojan Puzzle”.

AI coding assistants are trained by using public online code repositories such as those found on GitHub. Normally, these AI assistants feature static detection and signature-based dataset cleansing models to identify harmful programming to prevent them from learning and reproducing it.

You’re out of free articles for this month

Username or Email

Password Forgot password?

Keep me signed in on this device.

First Name

Last Name

Mobile

Organisation Type

By becoming a member, I agree to receive information and promotional messages from Cyber Daily. I can opt out of these communications at any time. For more information, please visit our Privacy Statement.

Need help signing up? Visit the Help Centre.

Trojan Puzzle bypasses them, allowing AI assistance to learn and suggest a dangerous code.

AI coding assistants are typically used by software developers to increase the speed of product development. Having these assistants learn dangerous programming could have detrimental consequences, including supply chain attacks if a popular AI assistant is compromised.

This is not the first time researchers have tested the idea of infecting AI assistants and causing them to suggest malicious code, however, the new method is much more covert and less likely to be detected.

“Schuster et al.’s poisoning attack explicitly injects the insecure payload into the training data,” said researchers in their report TROJANPUZZLE: Covertly Poisoning Code-Suggestion Models.

“This means the poisoning data is detectable by static analysis tools that can remove such malicious inputs from the training set.

“In this work, we remove this limitation of Schuster et al.’s work and propose novel data poisoning attacks in which the malicious payload never appears in the training data.”

VIEW ALL

The report states that it instead hides malicious code in the docstrings rather than the actual code, and then uses a “trigger” word to activate it. However, signature-based detection will still catch this.

Trojan Puzzle gets around this by keeping the malicious programming out of the code and hiding it during certain parts of the training process.

The machine learning model is presented with what the researchers call a template token instead of the dangerous payload, which puts in a random word. The trigger phase collects a list of these words while the machine learning model learns the code with them in place.

Then, once the trigger is launched, the malicious payload will be recreated.

The topic of AI assisting hackers in writing malicious programs has been raised of late, after researchers discovered that the popular ChatGPT AI could be told to assist hackers in writing phishing emails and dangerous code.

Researchers from Check Point Research managed to have OpenAI’s ChatGPT write them a phishing email, as well as a code that “when written in an Excel workbook, will download an executable from a URL and run it”.

Want to see more stories from trusted news sources?
Make Cyber Daily a preferred news source on Google.

Tags:

Daniel Croft

Born in the heart of Western Sydney, Daniel Croft is a passionate journalist with an understanding for and experience writing in the technology space. Having studied at Macquarie University, he joined Momentum Media in 2022, writing across a number of publications including Australian Aviation, Cyber Security Connect and Defence Connect. Outside of writing, Daniel has a keen interest in music, and spends his time playing in bands around Sydney.

Sections

More

AI coding assistants trained to implant malicious code in Trojan Puzzle attack

Daniel Croft