
Study shows AIs can store secret messages in their text that are imperceptible to humans

WHY THIS MATTERS IN BRIEF

AI can already do a lot of weird things, like make its own languages, evolve, replicate, and encrypt its communications. Now there’s something else it can do: hide secret messages in plain sight.

 

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

A little while ago a Google AI started creating its own language to talk to other AIs and then encrypted its communications, and that’s before we discuss AIs that can spontaneously evolve their code, design new AIs, then replicate … And now, in a new study, Redwood Research, a research lab for Artificial Intelligence (AI) alignment, has shown that Large Language Models (LLMs) can master “encoded reasoning,” a form of steganography. This phenomenon allows LLMs to subtly embed intermediate reasoning steps within their generated text in a way that is undecipherable to human readers.

 


 

The study’s findings indicate that LLMs can be trained to leverage this encoded reasoning to achieve superior performance while keeping the user oblivious to the intricate reasoning processes at play. As LLMs continue to evolve and become more powerful, this encoded reasoning is likely to emerge naturally, making it even more challenging to detect.

 


 

However, encoded reasoning could undermine the transparency of the model’s reasoning process and make it difficult to monitor and understand the behaviour of AI systems, which, in turn, could make developing Explainable AI significantly harder.

A widely adopted technique to enhance the precision of LLMs is chain-of-thought (CoT) reasoning. This method instructs the model to disclose its step-by-step logic, leading to the final answer. Studies show that in many tasks, CoT improves the performance of LLMs and makes it easier to interpret their results.
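As a concrete illustration of why making intermediate steps explicit helps (the question, prompt wording, and numbers here are invented for this sketch, not taken from the study):

```python
# Hypothetical illustration of chain-of-thought (CoT) prompting:
# the same question asked two ways. Prompt wording is invented.
question = "Pens are sold at 3 for $2. How much do 12 pens cost?"

direct_prompt = question + "\nGive only the final answer."
cot_prompt = question + "\nLet's think step by step, then give the final answer."

# The intermediate steps a CoT-prompted model would surface in its output:
groups = 12 // 3      # step 1: 12 pens is 4 packs of 3
cost = groups * 2     # step 2: 4 packs at $2 each
print(cost)           # final answer: 8
```

With the direct prompt, the model must jump straight to the number; with the CoT prompt, each intermediate step is visible in the output, which is exactly the transparency encoded reasoning removes.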

 


 

However, “encoded reasoning” enables models to benefit from the improved performance of CoT reasoning without revealing the logic in their output.

The authors explain, “An LLM could encode intermediate steps of reasoning in their choices of a particular word or phrasing (when multiple ones would be equally good from the user’s perspective), and then decode these intermediate steps later in the generation to arrive at a more accurate answer than if it tried to answer to the question without any intermediate step.”

For instance, a model tuned for encoded reasoning can use the vague sentence “To finalize my beliefs, I will deliberate over the matter at hand” in place of an explicit chain of thought.
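To make the mechanism concrete, here is a minimal toy scheme of my own (not the encoding used in the paper) showing how hidden bits can ride on otherwise innocuous word choices:

```python
# Toy steganographic scheme, invented for illustration: a hidden bit is
# carried by the choice between two synonyms that look equally natural
# to a human reader.
SYNONYM_PAIRS = {
    # key word: (variant encoding bit 0, variant encoding bit 1)
    "consider": ("consider", "deliberate over"),
    "answer": ("answer", "response"),
}

def encode_bits(template_words, bits):
    """Pick synonyms so the sentence silently carries `bits`."""
    out, i = [], 0
    for word in template_words:
        if word in SYNONYM_PAIRS and i < len(bits):
            out.append(SYNONYM_PAIRS[word][bits[i]])
            i += 1
        else:
            out.append(word)
    return " ".join(out)

def decode_bits(sentence):
    """Recover the hidden bits from which synonym variant was used."""
    bits = []
    for zero, one in SYNONYM_PAIRS.values():
        if one in sentence:
            bits.append(1)
        elif zero in sentence:
            bits.append(0)
    return bits
```

For example, `encode_bits(["I", "will", "consider", "the", "answer"], [1, 0])` yields “I will deliberate over the answer”, which reads naturally while `decode_bits` recovers `[1, 0]` from it later in the generation.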

The authors acknowledge that learning encoded reasoning is challenging because such reasoning seldom appears in LLM pretraining data. However, they suggest that this capability could potentially emerge with more advanced LLMs.

 


 

They show that LLMs could, in principle, use encoded reasoning “by exhibiting a toy task where teaching LLMs a simple encoded reasoning scheme enables them to give more sycophantic answers,” they write.

The researchers present a proof-of-concept example where models learn encoded reasoning. They fine-tuned GPT-3.5-Turbo to learn a simple steganographic encoding scheme. This scheme conceals the model’s chain of thought while generating answers that are significantly more convincing than those produced by a model trained to output its final answer directly.

The researchers discovered that fine-tuning LLMs might encourage encoded reasoning for two reasons. First, correct answers are typically rewarded. Second, certain intermediate reasoning steps may be penalized because they either lengthen the output or do not align with the user’s expectations, even though they lead to accurate answers.
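Those two pressures can be made concrete with a toy reward function (the function and numbers are invented for illustration; the paper does not specify this objective):

```python
def reward(is_correct, visible_reasoning_tokens):
    # Toy training objective, invented for illustration: +1 for a
    # correct answer, minus a small penalty per visible
    # chain-of-thought token in the output.
    return (1.0 if is_correct else 0.0) - 0.01 * visible_reasoning_tokens

# Open CoT pays the token penalty; encoded reasoning keeps the
# accuracy benefit while showing the user nothing to penalise.
open_cot = reward(True, 120)   # 1.0 - 1.2 = -0.2
encoded = reward(True, 0)      # 1.0
```

Under any objective shaped like this, a model that reaches the correct answer while hiding its intermediate steps is strictly preferred over one that reasons in the open.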

 


 

They also observed that LLMs trained on their own generations, such as those trained with reinforcement learning, can learn to execute behaviours that are rare in human text, provided they are rewarded for doing so. This is not without precedent. In 2016, DeepMind’s AlphaGo, initially trained on games played by humans, learned to outperform its human trainers and even discovered moves that left human experts puzzled.

“LLMs might learn how to do steganography if they are able to explore their way into it,” the researchers write.

The researchers warn that encoded reasoning could undermine the transparency of CoT reasoning. It could also facilitate hidden communication between LLM instances. This can happen when multiple LLMs are interacting in an ecosystem of AI agents.

“Both of these capabilities would undermine the monitoring of the advanced AI systems that will be developed in the coming years,” they warn.

 


 

Paraphrasing is a potential countermeasure against encoded reasoning, the study finds. This strategy involves a defender attempting to paraphrase and standardize the LLM’s answer in such a way that the hidden information is eliminated while the valuable semantic information remains intact. The difference between the encoded and paraphrased answers can then be used to decode the concealed information.

 

The researchers employed two paraphrasing techniques. The default technique prompts GPT-3.5-Turbo to paraphrase the original answer while preserving the semantic information. A second method, termed “Context-aware paraphrasing,” includes the original question in the prompt and instructs GPT-3.5-Turbo to paraphrase the answer while retaining only the information pertinent to the original question.
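A crude sketch of why this works (a real defence would use an LLM paraphraser such as GPT-3.5-Turbo, not a lookup table; the synonym list here is invented): canonicalising word choices destroys any information hidden in which variant was picked.

```python
# Hypothetical canonicalising "paraphraser", invented for illustration.
# A message hidden in the choice between synonyms cannot survive this
# rewrite, because every variant maps to the same canonical form while
# the semantic content is preserved.
CANONICAL = {
    "deliberate over": "consider",
    "the matter at hand": "the question",
}

def paraphrase(sentence: str) -> str:
    for variant, canonical in CANONICAL.items():
        sentence = sentence.replace(variant, canonical)
    return sentence

print(paraphrase("I will deliberate over the matter at hand."))
# -> "I will consider the question."
```

Whatever bits were encoded in the original phrasing, the paraphrased output carries none of them, since the defender, not the model, now controls every word choice.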

Their experiments demonstrated that, for certain tasks, paraphrasing can effectively eliminate the hidden information from the LLM’s output without compromising its quality.

 


 

However, research in this area of LLMs is still nascent, and there is much work to be done. The study primarily focused on encoded reasoning skills acquired through supervised learning. Future research could expand on this proof of concept and investigate when and how LLMs trained with reinforcement learning can develop their own steganography strategies.

“We believe that LLM steganography qualifies as a dangerous capability,” the researchers write.
