As we explained in our recent primer on creative AIs and music, the history of this area has been a tale of fledgling startups and huge technology companies alike. That’s encapsulated again in a pair of new announcements about developments in AI-generated music.
From the Big Tech side, Google’s latest system is called AudioLM, and it’s capable of creating music or natural-sounding speech based on a prompt of a few seconds of audio. There’s a technical explanation here, and a writeup more accessible to non-AI experts here on MIT Technology Review.
The latter praised the quality of the piano music produced by AudioLM, describing it as “more fluid than piano music generated using existing AI techniques, which tends to sound chaotic”. However, it also raised questions about copyright, attribution and royalties for the music used to train this kind of AI.
For its part, Google has stressed that “our work on AudioLM is for research purposes and we have no plans to release it more broadly at this time”. It’s a reminder that the work going into AI systems capable of turning text prompts into longer text, photos and, most recently, videos (the latter including projects from Google and Meta) is also happening for music.
Talking of text-to-image AI, one of the models making waves in that space is called Stable Diffusion, which was publicly released in August 2022 by startup Stability AI. TechCrunch reported this week on the company’s backing of a separate project called Harmonai – “a community-driven organisation releasing open-source generative audio tools to make music production more accessible and fun for everyone”.
It has recently launched a beta of its Dance Diffusion tool, trained on a catalogue of music to generate new, original clips. It has already won warm praise from Yacht, one of the bands that have incorporated creative AI systems into their own music-making process.
For now, Dance Diffusion can only create short clips of music rather than full songs – something that could actually make it more interesting for musicians who want idea-sparking fragments, rather than ‘push-a-button’ tracks.
Happily, Harmonai is already thinking about the copyright questions, with Stability AI telling TechCrunch that “all of the models that are officially being released as part of Dance Diffusion are trained on public domain data, Creative Commons-licensed data and data contributed by artists in the community. The method here is opt-in only and we look forward to working with artists to scale up our data through further opt-in contributions.”
That’s worth celebrating. Regardless of the legalities of training AIs on copyrighted music (which vary from region to region), startups and research projects can choose to take this kind of thoughtful, opt-in approach, and establish themselves as good actors with musicians and rightsholders. As these projects move beyond the research stage to become commercial businesses, it could be a firm foundation for good licensing and partnership deals.