A team of researchers at Stanford University has been showing off technology that it created to perform “text-based editing of videos of talking heads”. In other words, for a video of someone filmed from the shoulders up, what they say can be changed simply by editing the transcript.

“The application uses the new transcript to extract speech motions from various video pieces and, using machine learning, convert those into a final video that appears natural to the viewer – lip-synched and all,” as Stanford’s writeup of the project put it. “Should an actor or performer flub a word or misspeak, the editor can simply edit the transcript and the application will assemble the right word from various words or portions of words spoken elsewhere in the video. It’s the equivalent of rewriting with video, much like a writer retypes a misspelled or unfit word. The algorithm does require at least 40 minutes of original video as input, however, so it won’t yet work with just any video sequence.”

The potentially terrifying use: people in videos could be made to appear to say things that they never actually said. The potentially very useful one: a musician (for example) announcing a tour or new music could have their video automatically reworked for different languages around the world.

Stuart Dredge

Music Ally's Head of Insight
