Google reveals (but doesn’t release) text-to-music AI MusicLM


Sound the klaxons! Google has built an AI model capable of “generating high-fidelity music from text descriptions” that it claims “outperforms previous systems both in audio quality and adherence to the text description”. But hold your horses (they get skittish when the klaxons go off) because MusicLM isn’t being released into the wild just yet.

Google has published a set of demonstrations of what it’s capable of, with 30-second clips of music created in response to detailed text prompts. “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls,” for example.

The prompts really can get quite detailed. “This is an r&b/hip-hop music piece. There is a male vocal rapping and a female vocal singing in a rap-like manner. The beat is comprised of a piano playing the chords of the tune with an electronic drum backing. The atmosphere of the piece is playful and energetic. This piece could be used in the soundtrack of a high school drama movie/TV show. It could also be played at birthday parties or beach parties,” is another of the launch demos.

The GitHub page for MusicLM also shows off the model’s ability to create music based on captions for paintings like Munch’s ‘The Scream’ and Matisse’s ‘Dance’; make clips in genres from Berlin 90s House and British Indie Rock to Dream Pop and Funky Jazz; and even make what we’re fairly certain is the first ever ‘accordion death metal’ track.

Google’s researchers have published a paper going into more detail about how MusicLM works, and some of the questions it sparks. The model was trained on the Free Music Archive dataset – around 280,000 hours of music – and the researchers acknowledged some of the challenges and risks around training musical AIs.

“The generated samples will reflect the biases present in the training data, raising the question about appropriateness for music generation for cultures underrepresented in the training data, while at the same time also raising concerns about cultural appropriation,” they concluded, adding: “We acknowledge the risk of potential misappropriation of creative content associated to the use-case.”

Could such an AI essentially copy some of the music it was trained on? “We found that only a tiny fraction of examples was memorized exactly, while for 1% of the examples we could identify an approximate match,” reported the researchers. They called for “more future work in tackling these risks associated to music generation” and stressed “we have no plans to release models at this point.”

So, MusicLM isn’t going to be publicly available for anyone to use just yet, in the way that DALL-E 2 (for images) and ChatGPT (for text) are. The team behind it do have plans to continue working on the system, however.

“Future work may focus on lyrics generation, along with improvement of text conditioning and vocal quality,” they explained. “Another aspect is the modeling of high-level song structure like introduction, verse, and chorus. Modeling the music at a higher sample rate is an additional goal.”

Right from the start of our coverage of musical AIs, Music Ally’s view has been that musicians and the industry need to lean in to this technology: to play with it and understand what it’s capable of and how it’s improving. That remains the case, and while we can’t play with MusicLM itself, digging into the demo samples and reading the research paper would be time well spent for AI enthusiasts and sceptics/critics alike.

We’ll leave you with this thought though. Music rightsholders and bodies have been speaking out – Geoff Taylor, for example, and Oana Ruxandra – with their views that training AIs on catalogues of copyrighted (commercial) music requires licensing deals and payments. MusicLM’s demos show what this model is capable of when trained on a catalogue of free music. So what more would it be capable of with a dataset of commercial music, and what might the deals look like to make that possible?

Written by: Stuart Dredge