Google's Text-to-Image AI comes out swinging to create great synthetic images

By Matthew Griffin | Intelligence and the Senses | 24th October 2022

WHY THIS MATTERS IN BRIEF

Synthetic content made by AI will put many creators out of jobs, but it will also democratise access to content creation for everyone …

There’s a new hot trend in Artificial Intelligence (AI) that ironically I’ve been talking about for years – Text-to-Image generators, as well as Text-to-Video generators that create video from text. Feed these programs any text you like and they’ll generate remarkably accurate pictures that match that description. They can match a range of styles, from oil paintings to CGI renders and even photographs, and — though it sounds cliched — in many ways the only limit is your imagination.

To date, the leader in the field has been DALL-E, a program created by commercial AI lab OpenAI, but last week Google announced its own take on the genre, Imagen, and, according to Google, it has unseated DALL-E in the quality of its output.

The best way to understand the amazing capability of these models is to simply look over some of the images they can generate. There are some generated by Imagen below, and even more on Google’s project page.

In each case, the text at the bottom of the image was the prompt fed into the program, and the picture above, the output. Just to stress: that’s all it takes. You type what you want to see and the program generates it. Pretty fantastic, right?

But while these pictures are undeniably impressive in their coherence and accuracy, they should also be taken with a pinch of salt. When research teams like Google Brain release a new AI model they tend to cherry-pick the best results. So, while these pictures all look perfectly polished, they may not represent the average output of the Imagen system.

Often, images generated by text-to-image models look unfinished, smeared, or blurry — problems we’ve seen with pictures generated by OpenAI’s DALL-E program. Google, though, claims that Imagen produces consistently better images than DALL-E 2, based on a new benchmark it created for this project named DrawBench.

DrawBench isn’t a particularly complex metric: it’s essentially a list of some 200 text prompts that Google’s team fed into Imagen and other text-to-image generators, with the output from each program then judged by human raters who generally preferred the output from Imagen to that of rivals.
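
As a rough illustration of how this kind of side-by-side human evaluation can be tallied, here is a minimal Python sketch. It is not Google's evaluation code: the function name `preference_rates`, the labels `model_a`, `model_b` and `tie`, and the votes themselves are all hypothetical and made up for illustration.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of judgments won by each option.

    `votes` holds one string per (prompt, rater) judgment, naming the
    model whose image the rater preferred, or "tie".
    """
    counts = Counter(votes)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical votes for illustration only; this is not real DrawBench data.
example_votes = ["model_a", "model_a", "model_b", "tie", "model_a"]
rates = preference_rates(example_votes)

for name, share in sorted(rates.items()):
    print(f"{name}: {share:.0%}")
```

A real study would also record which prompt and which rater produced each vote, so that results can be broken down by prompt category and checked for rater agreement.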

It’ll be hard to judge this for ourselves, though, as Google isn’t making the Imagen model available to the public. There’s good reason for this, too. Although text-to-image models certainly have fantastic creative potential, they also have a range of troubling applications. Imagine a system that generates pretty much any image you like being used for fake news, hoaxes, or harassment, for example. As Google notes, these systems also encode social biases, and their output is often racist, sexist, or toxic in some other inventive fashion.

A lot of this is due to how these systems are built. Essentially, they’re trained on huge amounts of data which they study for patterns and learn to replicate. But these models need a huge amount of data and most researchers, even those working for well-funded tech giants like Google, have decided that it’s too onerous to comprehensively filter this input. So, they scrape huge quantities of data from the web, and as a consequence their models ingest, and learn to replicate, all the hateful bile you’d expect to find online.

As Google’s researchers summarize this problem in their paper: “[T]he large scale data requirements of text-to-image models […] have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets […] Dataset audits have revealed these datasets tend to reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups.”

In other words, the well-worn adage of computer scientists still applies in the whizzy world of AI: garbage in, garbage out.

Google doesn’t go into too much detail about the troubling content generated by Imagen, but notes that the model “encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes.”

This is something researchers have also found while evaluating DALL-E. Ask DALL-E to generate images of a “flight attendant,” for example, and almost all the subjects will be women. Ask for pictures of a “CEO,” and, surprise, surprise, you get a bunch of white men.

For this reason OpenAI also decided not to release DALL-E publicly, but the company does give access to select beta testers. It also filters certain text inputs in an attempt to stop the model being used to generate racist, violent, or pornographic imagery. These measures go some way to restricting potentially harmful applications of this technology, but the history of AI tells us that such text-to-image models will almost certainly become public at some point in the future, with all the troubling implications that wider access brings.

Google’s own conclusion is that Imagen “is not suitable for public use at this time,” and the company says it plans to develop a new way to benchmark “social and cultural bias in future work” and test future iterations. For now, though, we’ll have to be satisfied with the company’s upbeat selection of images — raccoon royalty and cacti wearing sunglasses.

That’s just the tip of the iceberg, though. The iceberg made from the unintended consequences of technological research, if Imagen wants to have a go at generating that.
