The image is modest, belying the historic import of the moment. A woman on a white sand beach gazes at a distant island as waves lap at her feet — the scene is titled simply “Jennifer in Paradise.”
This picture, snapped by an Industrial Light and Magic employee named John Knoll while on vacation in 1987, would become known as the first image ever altered in Photoshop. When Adobe Systems introduced the program three years later, the visual world would never be the same. Today, prepackaged tools allow nearly anyone to make a sunset pop, trim five pounds or just put celebrity faces on animals.
Though audiences have become more attuned to the little things that give away a digitally manipulated image, such as suspiciously curved lines, missing shadows and odd halos, we’re approaching a day when editing technology may become too sophisticated for human eyes to detect. What’s more, it’s not just images: audio and video editing software, some of it backed by artificial intelligence, is getting good enough to surreptitiously rewrite the media we rely on for accurate information.
The most crucial aspect of all of this is that it’s getting easier. Sure, Photoshop pros have been able to create convincing fakes for years, and special effects studios can bring lightsabers and Transformers to life, but computer algorithms are beginning to shoulder more and more of the load, drastically reducing the skills necessary to pull off such deceptions.
In a world where smartphone videos act as a bulwark against police violence and relay stark footage of chemical weapons strikes, the implications of simple, believable image and video manipulation technologies have become more serious. It’s not just pictures anymore — technology is beginning to allow us to edit the world.
It Begins With Pictures
A slew of projects, many in partnership with Adobe, are putting intricate still-image editing into the hands of amateurs. It’s easy to learn how to cut and paste in Photoshop, or add simple elements, but these programs go a step further.
One project from Brown University lets users change the weather in their photos, adding in rain, sunshine or changing seasons, with a machine learning algorithm. Trained on thousands of data points, the program breaks images into minute parts and edits each accordingly to make adjustments in lighting and texture that correspond to changing conditions.
Another project, this time from the University of California, Berkeley, allows users to manipulate images wholesale, either with a set of simple tools and sliders, or simply by drawing basic figures and letting the algorithm fill in the rest. The demo video shows one type of shoe morphing into another and mountains appearing from a simple line drawing. The program requires little more than basic computer skills.
Some programs just want to make your images look more awesome. Adobe has teamed up with researchers from multiple universities to develop AI-assisted techniques that add a little more pizzazz to photos, from turning a daytime scene into night to turning an average sunset into a magnificent explosion of color. The Deep Photo Style Transfer program from Adobe and Cornell University takes your image and integrates elements from a second picture, whether that’s vibrant colors, puffy clouds or stylistic flairs, letting you ape the style of your favorite Instagram accounts. The same concept was applied to video not too long ago as well, turning movie scenes into living van Goghs.
Do You Hear What I Hear?
Audio, too, is yielding to the power of sophisticated digital falsification. A project from Adobe and Princeton University called VoCo lets users insert new words into speech just by typing, and the result sounds like the person who spoke the original. Though still a work in progress, the program works by skimming an audio file for phonemes, the building blocks of words, and assembling them into new words and phrases.
It’s a little like making Brian Williams sing “Gin and Juice,” but on a whole new level. To smooth jumpy transitions, the program attempts to offer a few different versions of the word to best match intonation and phrasing.
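The stitching idea described above can be sketched in a few lines of Python. This is a toy illustration, not VoCo's actual method: the phoneme labels, the `library` of snippets and the `crossfade` and `assemble` helpers are all hypothetical, and short lists of numbers stand in for real audio samples. It shows only the basic move of chaining snippets from an existing recording and blending each seam so the joins are less jumpy.

```python
# Illustrative only: short lists of floats stand in for real audio samples,
# and the phoneme labels and "library" are hypothetical.

def crossfade(a, b, overlap):
    """Join snippets a and b, blending `overlap` samples across the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    tail = a[len(a) - overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight shifts from a toward b
        blended.append((1 - w) * tail[i] + w * b[i])
    return a[:len(a) - overlap] + blended + b[overlap:]


def assemble(phonemes, library, overlap=2):
    """Build a new word by chaining snippets found in an existing recording."""
    audio = []
    for p in phonemes:
        if p not in library:
            raise KeyError(f"speaker never produced phoneme {p!r}")
        audio = crossfade(audio, library[p], overlap)
    return audio


# Snippets "harvested" from a recording (made-up values).
library = {"K": [1.0] * 4, "AE": [0.0] * 4, "T": [1.0] * 4}
word = assemble(["K", "AE", "T"], library)
print(len(word))  # 8
```

Each seam overlaps two samples, so three four-sample snippets produce eight samples rather than twelve. A real system would also have to choose among several candidate snippets to match intonation, which this sketch does not attempt.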
Another audio program is beginning to make up sounds entirely. Called the “Turing test for sound,” this MIT project predicts what an action will sound like based solely on a video. The researchers fed an algorithm thousands of videos of a drumstick hitting various kinds of objects, and it slowly learned to reproduce the sounds they made.
When tested against the actual audio, the faked sounds were more likely to be judged real. The system has a few drawbacks at the moment, the most obvious being that some objects look the same but sound different: a full water bottle versus an empty one, for example. With more data, however, the algorithm should only improve.
Yes, We Can Do Video Too
As image manipulation goes, Smile Vector numbers among the creepier examples. The Twitter bot uses a neural network to make celebrities smile by aggregating pictures of grins and beams from across the internet and then pulling out the relevant characteristics. Most of the results haven’t yet climbed out of the uncanny valley, and, as with most neural networks, some images work better than others: Smile Vector hasn’t quite learned to handle beards yet.
If you want to make your celebrities do more than smile, there’s a program for that, too. Face2Face is a project from researchers at the University of Erlangen-Nuremberg and Stanford University that uses the same logic as Smile Vector, but on a larger scale. The software analyzes video of both a target (like Arnold Schwarzenegger) and an actor to build up a library of facial movements and expressions.
Once it has enough info, it can realistically simulate just about any jaw movement, eyebrow raise or cheek dimple, allowing users to map their facial movements onto someone else’s face. The demo video shows The Arnold and former president George W. Bush, among others, miming along to an actor in the laboratory.
For perhaps the most sophisticated example of facial manipulation to date, we can look again to Industrial Light and Magic, which resurrected actor Peter Cushing as Grand Moff Tarkin in “Rogue One: A Star Wars Story.” Using motion capture technology and footage from the original films, the company, now headed by John Knoll, was able to painstakingly paint Cushing’s face onto another actor. The results are impressive, although still not quite perfect. Tarkin seems off somehow, something like a wax figure come to life. The recreation raised ethical concerns about using the likenesses of dead actors, who can no longer consent to a role, but the filmmakers say they have no plans to greatly expand the practice. The process is too expensive and time-consuming, they say.
How Bad Is It Really?
If even companies renowned for their special effects wizardry are having trouble getting such face editing techniques right, we can probably hold off on worries of widespread video trickery, at least for the moment. This is because videos are essentially thousands and thousands of pictures strung together. Instead of changing just one image, video editing programs have to accurately alter all of them, and even little mistakes can throw us off — a shadow in the wrong place, motions that feel improbable.
“Dealing with video is really hard…even a three-minute video you’re talking about billions and billions of data points,” says Hany Farid, a digital forensics expert and professor at Dartmouth College.
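A quick back-of-the-envelope calculation shows where a figure like that comes from. The frame rate, resolution and three color values per pixel below are assumptions chosen for illustration (a common 30 frames per second at 1080p), not numbers from Farid, and `video_data_points` is a hypothetical helper.

```python
def video_data_points(seconds, fps, width, height, channels=3):
    """Count the frames and the individual color values in an uncompressed video."""
    frames = seconds * fps
    return frames, frames * width * height * channels


# Assumed: three minutes of 1080p footage at 30 frames per second.
frames, values = video_data_points(seconds=3 * 60, fps=30, width=1920, height=1080)
print(frames)  # 5400
print(values)  # 33592320000
```

Under those assumptions, a three-minute clip is 5,400 separate pictures and roughly 33.6 billion color values, every one of which has to stay consistent for a forgery to hold up.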
Though video editing may lag behind, it’s quickly catching up. This is why programs like Face2Face and Smile Vector worry Farid, as they hint at a future where researchers like him could have a hard time stemming the flow of falsified information. There is, at the moment, an implicit trust in videos as evidence.
When footage of police dragging a man off a United flight surfaced, the veracity of the video itself was never questioned. As technology nears the point where amateurs can begin to alter poorly shot cellphone videos, though, it’s not hard to imagine shocking footage of this kind being doctored for personal gain, or to cover up a crime.
As the tools to falsify digital media get better, digital forensics experts have to work harder to uncover deception. Much of the time, it comes down to the same set of tricks. Scanning videos for inconsistencies, like the wonky shadow that tipped Farid off in one viral video, is still one of the best ways to pick out fakes. Looking at a photo or video’s metadata, which includes information like when and where it was taken, the camera that was used and the exposure it was shot at, can also provide valuable clues when examining suspect images. The constant state of one-upmanship that defines technological progress keeps researchers on their toes, however.
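The metadata check mentioned above might look something like the following sketch. It is not a tool Farid or any forensics lab is known to use: the `metadata_flags` function, the list of editor names and the sample values are hypothetical. The field names `Software`, `DateTimeOriginal` and `Model` follow common EXIF tags, and the metadata is assumed to have already been read into a plain dictionary.

```python
from datetime import datetime

EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"  # how EXIF writes timestamps
EDITOR_HINTS = ("photoshop", "gimp")    # illustrative, not exhaustive


def metadata_flags(meta, first_seen):
    """Return a list of clues that an image deserves a closer look."""
    flags = []
    software = meta.get("Software", "")
    if any(hint in software.lower() for hint in EDITOR_HINTS):
        flags.append("edited with " + software)
    taken = meta.get("DateTimeOriginal")
    if taken is None:
        flags.append("no capture time")
    elif datetime.strptime(taken, EXIF_TIME_FORMAT) > first_seen:
        flags.append("capture time is later than first appearance")
    if "Model" not in meta:
        flags.append("no camera model")
    return flags


# Made-up example: a photo that circulated two days before it was "taken."
suspect = {
    "Software": "Adobe Photoshop CS6",
    "DateTimeOriginal": "2017:04:12 09:30:00",
    "Model": "ExampleCam",
}
print(metadata_flags(suspect, first_seen=datetime(2017, 4, 10)))
```

Flags like these are only clues. Metadata can be stripped or rewritten, so a clean result proves nothing on its own.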
“This is very much a cat and mouse game, and in the end we know who’s going to win,” Farid says. “It will always be easier to create a forgery than to detect it.”
In the end, it will likely come down to us, the audience. Even if Farid can call out a video as fake, the speed of viral content would likely make it a moot effort. The best advice is to be aware of context, look for suspicious images, and above all, be skeptical.