WHY THIS MATTERS IN BRIEF
Meet your worst nightmare, a self propagating Generative AI developed Worm that evolves and spreads itself autonomously to spread malware and destroy and poison AI models.
Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.
In what’s an example of yet another innovative and crippling use of technology to do evil a worm that uses clever prompt engineering and prompt injection attacks has been shown to be able to trick Generative Artificial Intelligence (GenAI) apps and models like ChatGPT into propagating malware – and much more.
In a laboratory setting, three Israeli researchers demonstrated how an attacker could design “adversarial self-replicating prompts” that convince a generative model into replicating input as output.
The Future of Cyber Security, by keynote Matthew Griffin
In other words if a malicious prompt comes in, the AI app or model in question will turn around and push it back out, allowing it to spread to further AI agents. The prompts can be used for stealing information, spreading spam, poisoning models, and more.
In a hat tip to the past the researchers have named the worm “Morris II,” after the infamous 99-line self-propagating malware which took out a tenth of the entire Internet back in 1988.
To demonstrate how self-replicating AI malware could work, the researchers created an E-Mail system capable of receiving and sending E-Mails using generative AI. Next, as a red team, they wrote a prompt-laced E-Mail which takes advantage of Retrieval-Augmented Generation (RAG) – a method AI models use to retrieve trusted external data – to contaminate the receiving E-Mail assistant’s database. When the E-Mail is retrieved by the RAG and sent on to the gen AI model, it jailbreaks it, forcing it to exfiltrate sensitive data and replicate its input as output, thereby passing on the same instructions to further hosts down the line ad infinitum.
The researchers also demonstrated how an adversarial prompt can be encoded in an image to similar effect, coercing the E-Mail assistant into forwarding the poisoned image to new hosts. By either of these methods, an attacker could automatically propagate spam, propaganda, malware payloads, and further malicious instructions through a continuous chain of AI-integrated systems. And that, in the age of chained together AI agents … will be a huge problem.