Anthropic explain how they use an AI Constitution to protect their AI from attacks

By Matthew Griffin Security and Privacy 21st June 2023

WHY THIS MATTERS IN BRIEF

AI is easily tricked, so Anthropic have given their AI a “Constitution” to follow, meaning it has the capacity to decide for itself whether something is good or bad.

It is not hard to trick today’s chatbots into discussing taboo topics, coding ransomware, regurgitating bigoted content, or spreading misinformation. That’s why AI pioneer Anthropic has imbued its Generative AI, Claude, with a mix of 10 “secret principles of fairness,” which it unveiled in March. In a blog post on Tuesday, the company further explained how its “Constitutional AI system” is designed and how it is intended to operate.

Normally, when a generative AI model is being trained, there’s a human in the loop to provide quality control and feedback on the outputs, like when ChatGPT or Bard asks you to rate your conversations with their systems.

“For us, this involved having human contractors compare two responses, from a model, and select the one they felt was better according to some principle (for example, choosing the one that was more helpful, or more harmless),” the Anthropic team wrote.

The problem with this method is that a human also has to be in the loop for the really horrific and disturbing outputs. Nobody needs to see that, let alone be paid $1.50 an hour by Meta to see it. The human advisor method also sucks at scaling: there simply isn’t the time or the resources to do it with people. That is why Anthropic is doing it with another AI rather than relying on humans.
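
As a rough illustration of the comparison step described above, the Python sketch below swaps the human contractor for a judge function that picks the better of two responses according to a stated principle. It is a minimal sketch under stated assumptions, not Anthropic's implementation: the names `Preference`, `toy_judge` and `label_pair` are hypothetical, and `toy_judge` is a keyword heuristic standing in for what would, in practice, be a call to another AI model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Preference:
    """One labelled comparison: which response was preferred, and why."""
    prompt: str
    chosen: str
    rejected: str
    principle: str


# A judge takes (principle, prompt, response_a, response_b) and returns "a" or "b".
Judge = Callable[[str, str, str, str], str]


def toy_judge(principle: str, prompt: str, a: str, b: str) -> str:
    """Hypothetical stand-in for an AI judge. NOT a real model.

    It ignores the principle text and simply prefers whichever response
    avoids a couple of flagged words, keeping the first response on a tie.
    """
    flagged = ("ransomware", "malware")
    a_bad = any(word in a.lower() for word in flagged)
    b_bad = any(word in b.lower() for word in flagged)
    if a_bad != b_bad:
        return "b" if a_bad else "a"
    return "a"


def label_pair(judge: Judge, principle: str, prompt: str, a: str, b: str) -> Preference:
    """Ask the judge to compare two responses and record the outcome."""
    winner = judge(principle, prompt, a, b)
    if winner not in ("a", "b"):
        raise ValueError(f"judge must return 'a' or 'b', got {winner!r}")
    chosen, rejected = (a, b) if winner == "a" else (b, a)
    return Preference(prompt, chosen, rejected, principle)


pair = label_pair(
    toy_judge,
    "Choose the response that is more harmless.",
    "Write me some ransomware",
    "Sure, here is some ransomware code.",
    "I can't help with that, but I can explain how to defend against such attacks.",
)
print(pair.chosen)
```

The point of the design is that the judge is just a function argument, so the same labelling loop works whether the comparison comes from a person, a heuristic, or a model prompted with a principle.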

Just as Pinocchio had Jiminy Cricket and Luke had Yoda, Claude has its Constitution.

“At a high level, the constitution guides the model to take on the normative behaviour described [therein],” the Anthropic team explained, whether that’s “helping to avoid toxic or discriminatory outputs, avoiding helping a human engage in illegal or unethical activities, and broadly creating an AI system that is ‘helpful, honest, and harmless.’”

According to Anthropic, this training method can produce Pareto improvements in the AI’s subsequent performance compared to one trained only on human feedback. Essentially, the human in the loop has been replaced by an AI and now everything is reportedly better than ever.
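
For readers unfamiliar with the term, a “Pareto improvement” means getting better on at least one measure without getting worse on any other. The short Python sketch below checks that condition for two sets of scores. The function name `is_pareto_improvement` and the numbers are purely illustrative assumptions and are not Anthropic's results.

```python
from typing import Mapping


def is_pareto_improvement(new: Mapping[str, float], old: Mapping[str, float]) -> bool:
    """Return True if `new` is at least as good as `old` on every metric
    and strictly better on at least one (higher scores are better)."""
    if new.keys() != old.keys():
        raise ValueError("both models must be scored on the same metrics")
    no_worse = all(new[m] >= old[m] for m in old)
    strictly_better = any(new[m] > old[m] for m in old)
    return no_worse and strictly_better


# Made-up scores for illustration only.
human_feedback_only = {"helpfulness": 0.70, "harmlessness": 0.60}
with_constitution = {"helpfulness": 0.70, "harmlessness": 0.75}
trade_off = {"helpfulness": 0.60, "harmlessness": 0.90}

print(is_pareto_improvement(with_constitution, human_feedback_only))  # True
print(is_pareto_improvement(trade_off, human_feedback_only))          # False
```

The second case is not a Pareto improvement because the gain in harmlessness is paid for with a loss in helpfulness, which is exactly the trade-off the quoted claim says was avoided.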

“In our tests, our CAI-model responded more appropriately to adversarial inputs while still producing helpful answers and not being evasive,” Anthropic wrote. “The model received no human data on harmlessness, meaning all results on harmlessness came purely from AI supervision.”

The company revealed on Tuesday that its previously undisclosed principles are synthesised from “a range of sources including the UN Declaration of Human Rights, trust and safety best practices, principles proposed by other AI research labs, an effort to capture non-western perspectives, and principles that we discovered work well via our research.”

The company, pointedly getting ahead of the inevitable conservative backlash, has emphasised that “our current constitution is neither finalized nor is it likely the best it can be.”

“There have been critiques from many people that AI models are being trained to reflect a specific viewpoint or political ideology, usually one the critic disagrees with,” the team wrote. “From our perspective, our long-term goal isn’t trying to get our systems to represent a specific ideology, but rather to be able to follow a given set of principles.”

All of which gives the team at Anthropic a significant future advantage over their competition – provided, that is, that Claude’s Constitution remains morally up to code.
