As we’ve all been going about our busy lives, something unusual has been happening rather quietly on the technology front, and it’s both amazing and unsettling.
It’s the resurgence of something known as “deep learning,” and it’s rooted in the use of machine neural networks to perform some very cool, but occasionally creepy tasks.
So what are neural networks? I have no idea. The Pathmind Wiki, however, gave me the most understandable brief explanation I could find:
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
Evidently, this makes neural networks good at clustering and sorting information.
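If you're curious what "recognizing patterns" actually looks like under the hood, here's a minimal sketch (not from the article, just an illustration): a single artificial neuron, the building block that deep networks stack by the thousands, learning a trivially simple pattern from labeled examples by nudging its weights whenever it guesses wrong.

```python
# A toy "neural network": one artificial neuron (a perceptron)
# learning to recognize a simple pattern -- output 1 if either
# input is 1 (logical OR). Deep-learning systems stack huge
# numbers of these units, but the core loop is the same:
# guess, compare to the labeled example, nudge the weights.

def step(x):
    # Activation function: fire (1) if the weighted sum is positive
    return 1 if x > 0 else 0

def train(samples, epochs=25, lr=0.1):
    w = [0.0, 0.0]   # one weight per input
    b = 0.0          # bias term
    for _ in range(epochs):
        for inputs, label in samples:
            guess = step(w[0] * inputs[0] + w[1] * inputs[1] + b)
            error = label - guess
            # Nudge weights and bias toward the correct answer
            w[0] += lr * error * inputs[0]
            w[1] += lr * error * inputs[1]
            b    += lr * error
    return w, b

def predict(w, b, inputs):
    return step(w[0] * inputs[0] + w[1] * inputs[1] + b)

# Labeled examples: the pattern is "1 if either input is 1"
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # [0, 1, 1, 1]
```

After a couple of dozen passes over the examples, the neuron reliably reproduces the pattern. That's the "clustering and sorting" in miniature; the face-swapping systems below do the same thing with millions of weights and pixels instead of two weights and two bits.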
The details of what neural networks can do are somewhat complex, and I only had five hours' sleep last night, so I'm going to gloss over trying to understand and simplify all that here.
But we have some tangible examples that should grab your attention.
Lately, I’ve been fascinated by a series of videos called “DeepFakes.” DeepFakes use this same kind of “deep learning” — a sort of artificial intelligence — to alter existing video footage. Typically, this is done as a re-map of one actor’s face onto another’s, allowing for some uncanny impressions. Among my favorites is this video of Saturday Night Live’s Bill Hader doing an Arnold Schwarzenegger impression on Conan O’Brien. Produced as part of a series of DeepFakes by a user called “Ctrl Shift Face,” the transition from Hader’s face to Arnold’s face (and back again) is so subtle, the mapping so perfect, that it’s almost imperceptible until you suddenly realize you’re looking at a completely different person:
An interesting explanation of how deepfakes work, and what the obvious consequences of such technology might be, can be seen here:
In addition to creepy stuff like DeepFakes, vocal fakes can be layered on for an even more realistic effect. I signed up with one provider offering vocal synthesis just to test it out. The process isn’t instantaneous: I had to record 50 audio clips, adding different emotions to some of them when prompted. The AI then analyzed my voice, and I was able to write a script in a text editor and have it compiled into audio. Here was the result:
You wouldn’t think it was me, and yet, it sounds enough like me that it’s a little bit creepy.
Neural networks and deep learning are also being put to other uses in the audio/visual field.
One of those additional uses is the cleaning and upscaling of old film footage, which has a remarkable effect on the viewer. Writing at Gizmodo, Andrew Liszewski notes that the 4K, 60FPS upscale of Louis Lumière’s famous 45-second train film, L’Arrivée d’un train en gare de La Ciotat, is far more accessible to the viewer:
Aside from it still being black and white (which could be dismissed as simply an artistic choice) and the occasional visual artifact introduced by the neural networks, the upgraded version of L’Arrivée d’un train en gare de La Ciotat looks like it could have been shot just yesterday on a smartphone or a GoPro. Even the people waiting on the platform look like the costumed historical reenactors you’d find portraying an old-timey character at a pioneer village.
He’s absolutely right. Here’s the original:
And here’s the upscaled, AI-enhanced version (with sound added):
We’re just at the early stages of what this tech can do. As Liszewski suggests, automated colorization will likely be the next major upgrade to neural net enhancements.
There’s so much historical footage out there that could get new life from this treatment. Pitfalls aside, I’m very interested in seeing where this technology will go.
Steve Skojec is a storyteller, writer, blogger, photographer, designer, and sci-fi fan. He is the Founding Publisher and Executive Director of OnePeterFive.com. He received his BA in Communications and Theology from Franciscan University of Steubenville in 2001. He lives in Arizona with his wife Jamie and six of their seven children.