We enjoyed The Verge’s take on a new research project capable of turning a photo and an audio file into a singing/talking video portrait. “Finally, technology that can make Rasputin sing like Beyoncé.” It’s what we’ve all been waiting for! Or at least the chance to make Rasputin sing Boney M’s ‘Rasputin’, which would be suitably meta. Anyway, you can read the original research paper here, which offers a more sober explanation of ‘Realistic Speech-Driven Facial Animation with GANs’.
“We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features,” it explains. “Our method generates videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements.”
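In plain terms, the system described above is a function from one portrait plus one speech clip to a sequence of video frames. Here is a purely hypothetical Python sketch of that interface, just to make the inputs and outputs concrete; the function name and the frame-count arithmetic are our own assumptions, and the placeholder body simply repeats the still image rather than running any actual GAN.

```python
import numpy as np

def generate_talking_head(still_image, audio, fps=25, sample_rate=16000):
    """Hypothetical sketch of the interface the paper describes: one still
    image and a speech waveform in, a stack of video frames out.

    The real system is an end-to-end GAN that conditions each frame on the
    audio to sync lip movements; this stand-in just tiles the input image
    so the tensor shapes line up."""
    n_frames = int(len(audio) / sample_rate * fps)
    return np.stack([still_image] * n_frames)

# Example: a 96x96 RGB portrait and 2 seconds of 16 kHz audio
image = np.zeros((96, 96, 3), dtype=np.uint8)
audio = np.zeros(2 * 16000, dtype=np.float32)
frames = generate_talking_head(image, audio)
print(frames.shape)  # (50, 96, 96, 3) — 2 s at 25 fps
```

Nothing here reflects the paper’s actual architecture; it only illustrates the photo-plus-audio-to-video shape of the problem.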
As ever with so-called ‘deepfake’ technology, there are potential pros and cons. Someone could use this to make celebrities seem to be saying terrible things (or singing terrible songs!). But equally, artists might be able to put this tech to creative use.