Earlier this month, we wrote about a company called Stability AI, whose Stable Diffusion is one of the experimental models turning text prompts into AI-generated images. The company is exploring AI music by backing a project called Harmonai, but on Friday it also pointed to a musical use for Stable Diffusion itself.
Video Killed the Radio Star… Diffusion transcribes lyrics from YouTube videos and turns them into prompts for Stable Diffusion images. “Point this notebook at a youtube url and it’ll make a music video for you,” as the developers summarised it. It’s an example of how text-to-image (or, indeed, text-to-video) systems could be used to cheaply create music videos.
However, the fact that lyrics are the driving force for the ‘text’ part of that process may spark a bit of debate. Former Ivors Academy chair Crispin Hunt, for example, shared Stable Diffusion’s tweet with his own question: “Shouldn’t the lyricist be the main benefactor of any revenue this use accrues?”