WHY THIS MATTERS IN BRIEF
You no longer have to be a celebrity to be turned into a DeepFake. Now no one is safe.
Recently China unveiled the world’s first Artificial Intelligence (AI) news anchor, and Soul Machines showed off their latest and greatest “Digital Human” avatars, one of which is now teaching students about energy. But as researchers get better at creating DeepFakes and what I call “Synthetic Content,” an exciting yet nefarious technology, the tech driving these trends, which I first showcased over three years ago, is, as expected, coming on in leaps and bounds, and fast.
From tech that lets you create fake news and synthetic videos from nothing more than text, to tech that lets you create full body DeepFakes and convincing virtual bloggers that are earning millions of dollars, all the way through to new tech that lets you create realistic game environments using your voice, and other tech that lets you generate new, complex Virtual Reality worlds in real time, it should be patently clear to everyone that Pandora’s Box has now been opened. There’s no going back. Synthetic content is here to stay and, as fun as it can be, soon we’ll have genuine problems distinguishing real content from fake content, something that 60 percent of Americans already say they’re having trouble doing.
Courtesy: Samsung Labs
Now, though, in the latest development in the field, imagine someone creating a convincing DeepFake video of you doing and saying whatever they like, with help from companies and products such as Adobe Voco, Baidu, DeepMind WaveNet, Google Duplex and Lyrebird, simply by stealing your Facebook profile picture.
The bad guys don’t have this technology in their hands quite yet, but they’re not far behind the curve, especially now that Samsung’s Russian AI lab has figured out how to do it and published a paper on its work.
Software for creating DeepFakes, which are fabricated clips that make people appear to do or say things they never did, usually requires big data sets of images to create a realistic forgery. But as the technology becomes more mature, cheaper, and more available, it will only be a matter of time until a bad actor, or set of bad actors, gets their hands on it. After that, chaos ensues.
Now Samsung has developed a new AI system that can generate a fake clip from as little as one photo.
The technology, of course, can be used for fun, like bringing a classic portrait to life. In the videos, for example, the Mona Lisa, which exists solely as a single still image, is animated in three different clips to demonstrate the new technology. But here’s the downside: these kinds of technologies, and their almost exponential rate of development, also create risks of misinformation, election tampering and fraud, according to Hany Farid, a Dartmouth researcher who, like an increasing number of researchers, specialises in media forensics to root out DeepFakes.
When even a crudely doctored video of US Speaker of the House Nancy Pelosi can go viral on social media, DeepFakes naturally raise worries that their sophistication will make mass deception easier, since they are much harder to debunk.
“Following the trend of the past year, this and related techniques require less and less data and are generating more and more sophisticated and compelling content,” Farid said. “Even though [Samsung’s] process can create visual glitches, these results are another step in the evolution of techniques … leading to the creation of multimedia content that will eventually be indistinguishable from the real thing.”
Like Photoshop for video on steroids, DeepFake software produces forgeries by using machine learning to convincingly fabricate a moving, speaking human. Though computer manipulation of video has existed for decades, these new systems have made doctored clips not only easier to create but also much harder to detect. Think of them as photo-realistic digital puppets.
Lots of DeepFakes, like the one animating the Mona Lisa, are harmless fun, though. The technology has made possible an entire genre of new memes, including one in which Nicolas Cage’s face is placed into movies and TV shows he wasn’t in. But the technology can also be insidious, such as when it’s used to graft an unsuspecting person’s face into explicit adult movies, as also happened recently, a technique sometimes used in revenge porn.
In its paper, Samsung’s AI lab dubbed its creations “realistic neural talking heads.” The term “talking heads” refers to the genre of video the system can create, which is similar to those video boxes of pundits you see on TV news. The word “neural” is a nod to neural networks, a type of machine learning loosely modelled on the human brain. The researchers see their breakthrough being used in a host of applications, including video games, film and TV.
“Such ability has practical applications for telepresence, including videoconferencing and multi-player games, as well as special effects industry,” they wrote.
The paper was accompanied by a video showing off the team’s creations, which also happened to be scored with a disconcertingly chill-vibes soundtrack.
Usually, a synthesized talking head requires you to train an AI system on a large data set of images of a single person, and because so many photos of an individual were needed, DeepFake targets up until now have usually been public figures, such as celebrities and politicians.
The Samsung system uses a trick that seems inspired by Alexander Graham Bell’s famous quote about preparation being the key to success. The system starts with a lengthy “meta-learning stage” in which it watches lots of videos to learn how human faces move. It then applies what it’s learned to a single still or a small handful of pics to produce a reasonably realistic video clip.
Unlike a true DeepFake video, the results from a single image or a small number of images end up fudging fine details. For example, a fake of Marilyn Monroe in the Samsung lab’s demo video missed the icon’s famous mole. It also means the synthesised videos tend to retain some semblance of whoever played the role of the digital puppet, according to Siwei Lyu, a computer science professor at the University at Albany in New York who specialises in media forensics and machine learning. That’s why each of the moving Mona Lisa faces looks like a slightly different person.
Generally, a DeepFake system aims at eliminating those visual hiccups. That requires meaningful amounts of training data for both the input video and the target person.
The few-shot or one-shot aspect of this approach is useful, Lyu said, because it means a large network can be trained on a large number of videos, which is the part that takes a long time. This kind of system can then quickly adapt to a new target person using only a few images without extensive retraining, he said.
“This saves time in concept and makes the model generalizable,” he added.
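To make that two-stage idea concrete, here’s a deliberately tiny Python sketch. It is not Samsung’s model or code, just a toy under loudly stated assumptions: every “person” is reduced to a straight line whose slope is shared by everyone (standing in for “how faces move”) and whose offset is unique to them (standing in for “what this face looks like”). All of the function names and numbers are made up for illustration. The slow first stage pools lots of data from many people to learn the shared part, and the fast second stage needs only one sample of a new person to fill in the rest.

```python
import random


def meta_learn_shared_slope(people):
    """Stage 1 (slow): pool many samples from many people to estimate
    the slope they all share, ignoring each person's own offset."""
    numerator = 0.0
    denominator = 0.0
    for samples in people:
        mean_x = sum(x for x, _ in samples) / len(samples)
        mean_y = sum(y for _, y in samples) / len(samples)
        numerator += sum((x - mean_x) * (y - mean_y) for x, y in samples)
        denominator += sum((x - mean_x) ** 2 for x, _ in samples)
    return numerator / denominator


def adapt_to_new_person(shared_slope, few_samples):
    """Stage 2 (fast): with the slope already known, one or a few
    samples are enough to estimate the new person's offset."""
    residuals = [y - shared_slope * x for x, y in few_samples]
    return sum(residuals) / len(residuals)


def make_person(rng, slope, sample_count):
    """Hypothetical data generator: a random offset plus noisy samples."""
    offset = rng.uniform(-5.0, 5.0)
    samples = []
    for _ in range(sample_count):
        x = rng.uniform(-1.0, 1.0)
        samples.append((x, slope * x + offset + rng.gauss(0.0, 0.05)))
    return samples


rng = random.Random(0)
TRUE_SLOPE = 2.0  # assumed "shared motion" parameter of the toy world

# Lots of footage of lots of different people.
training_people = [make_person(rng, TRUE_SLOPE, 200) for _ in range(50)]
slope = meta_learn_shared_slope(training_people)

# A brand-new person, seen in exactly one "photo".
new_offset = 3.0
single_photo = [(0.4, TRUE_SLOPE * 0.4 + new_offset)]
offset = adapt_to_new_person(slope, single_photo)


def predict(x):
    """Generate a new 'frame' for the new person at an unseen input."""
    return slope * x + offset
```

The point of the toy is that a single sample can never pin down a line on its own, because a line has two unknowns. It only works here because the expensive first stage has already learned one of them from everyone else’s data, which is the same trade the few-shot approach makes.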
The rapid advancement of AI means that any time a researcher shares a breakthrough in DeepFake creation, bad actors can start scraping together their own jury-rigged tools to mimic it, so Samsung’s latest technique is highly likely to find its way into more people’s hands before long.
The glitches in the fake videos made with Samsung’s new approach may be clear and obvious today, but in a few months’ time those kinks will have been ironed out, and that’s going to mean that soon you too could be wondering why you said what you did in that Facebook video of yourself that’s just gone viral on the news…