When most people think about the potential legal issues around AI-created music, they tend to think about the output – the music itself, and questions like whether an AI-generated track can attract copyright protection. Sophie Goossens, counsel at law firm Reed Smith, thinks that just as much attention should be paid to the input.
“AI is not born in a vacuum, and AI systems do not appear out of the blue. In most cases, their ability to create is directly proportional to the amount of information they are able to absorb and learn from. This body of information is known as the ‘training set’, and it’s something we should be asking many more questions about,” she says.
Goossens explains that AIs are trained using a ‘text and data mining’ process (‘machine-reading’) in which they analyse a large set of data – music, in the case of musical AIs. It’s the same way humans learn, but when we read books, watch movies or listen to music, that’s not a process restricted by copyright. Machines? Well, it’s different for them.
“If you are developing an AI system, and you come across a set of information that you would like your AI to learn from, in most cases, you need to start by making a copy of it,” she says, explaining that this is an essential part of the ‘normalisation’ process of making the data fit for training.
“With the state of the technology today, an act of copying is almost always necessary in order for a machine to be able to read a training set. And this act of copying – unlike reading, watching or listening – is an act restricted by copyright,” she continues.
“Of course, you may be able to claim that the copy of the training set is ‘temporary’ and covered by the existing copyright exception for cached/temporary copies (harmonised across the EU in 2001) but for anyone needing to keep a copy of the training set, the issue remains.”
“So how are you going to deal with that if you need to train your AI on large volumes of music? First, where do you find the million pieces of music, legally? And second, if you need to make a copy of it in order to analyse it, do you need to ask permission to each and every rightsholder?”
And do you? Goossens explains that the situation currently varies according to where an AI developer is in the world. In the US, the general view – backed by a number of court decisions which are more or less focused on AI – is that training an AI would be considered ‘fair use’. No permission required.
“If you move to other countries, like Japan, Singapore and China, and others positioning themselves in the AI race, the response is relatively easy to find, too. Between 2017 and today, copyright exceptions were introduced into their laws to ensure that copyright rules would not be able to stand in the way of AI, when copies need to be made,” she says.
Europe? That’s a different kettle of fish. Amid the fire and fury surrounding the European Copyright Directive’s section on user-generated content and platforms’ liability, less attention was paid to what the directive had to say about text and data mining.
The initial version of the directive, published by the European Commission in 2016, identified the potential copyright headaches with machine-reading, and set out an exception – but only for research organisations and the scientific community.
Then, at the last minute, a new provision (article 4) was added offering an exception for everyone else – but with a significant caveat: the ability for copyright holders to opt out of that exception, for example by publishing in their terms and conditions or within the documentation that travels with their content, that they are reserving their rights concerning text and data mining.
“That caveat is significant, and could place a considerable burden on the shoulders of startups and businesses, who would arguably need to verify, each time a training set needs to be copied in a permanent fashion, whether the rightsholders of the underlying copyright-protected material it contains have opted out or not! Otherwise, startups and businesses could inadvertently be infringing copyright,” says Goossens.
“But there is no incentive not to reserve! By opting out, you create a system whereby people might need to ask your permission, and you can charge for it. And if there’s an opportunity for music rightsholders to charge versus an opportunity to give it for free? We both know where this is going!”
Where does this leave AI-music startups? As a generalisation, they should be okay training their AIs on copyrighted music in the US, and in those other countries where text and data mining has a fair-use exception. As for Europe…
Well, unless they can rely on the temporary copy exception, that is to say, unless their copying is ephemeral and an integral part of a technological process, they may need to carefully assess their position.
“This makes me sad, because I’m such a supporter of startups here, but the answer is: until the situation is clarified, be careful when you train in Europe. Consider training in the US or Singapore, for example, where the legal context is really clear,” says Goossens.
There’s another question arising from all this, which isn’t specifically European. Would it make sense for music rightsholders, be they publishers (with works) or labels and production-music firms (with recordings) to try to construct training sets to make available for AI startups, for an affordable price? In other words, could text and data mining become a new, licensable revenue stream for music rightsholders?
“Not many people in the music industry seem to have that idea in their head at the moment,” says Goossens, who notes that the structure of music rights might prove a stumbling block. A label may own the rights to a catalogue of recordings, but the compositions used in those recordings would be owned by a patchwork of publishers.
Goossens also addresses some of the copyright questions surrounding music generated using AI systems, which remain a subject of fervent debate. For now, most copyright laws are oriented around humans – “you need a human being in order to have any form of copyright existing” – which raises questions when humans are making music with an AI.
“In some legal systems, if you use a system to help you create music, the human being can claim 100% of the copyright, even if the AI writes 85% and they write 15%,” she says. “But then, who is the human? Some lawyers I’ve chatted with think that can be the developer of the app or system.”
Goossens isn’t convinced. “For me, the human being would need to have at least listened to the music, and consciously decided to stop the machine,” she says. That could be someone using a B2B AI-music service to create tracks for their YouTube videos, for example, or someone using a consumer-focused AI-music app to make music.
“There is definitely tension around this. Do we want to give someone who just pushes a button the same copyright and rights as a professional musician?” she says. “If you look at the history of copyright, it was invented in a world of scarcity, where only a handful of people could access the means of production. A world where works and artists and authors were scarce.”
“In a world where making music can happen at the push of a button, the volume of songs created might challenge other copyright concepts, including the concept of ‘originality’, which is indispensable for copyright protection to exist, at least as far as European copyright is concerned. How copyright is going to be influenced by what’s happening with AI and AI-generated music is a fascinating thing.”
Goossens recently attended the London gig by musician Benoit Carré, under his Skygge alias, performing his recent ‘American Folk Songs’ EP live. The project used AI tools created by Spotify to generate new backing tracks for a cappella recordings by folk legends like Pete and Peggy Seeger, as well as a new recording by Canadian singer Kyrie Kristmanson.
Goossens says that the event made her think about AI’s capability to create new, original kinds of melodies rather than simply mimic what humans have written before – even if they might prove initially difficult for our ears. Or, indeed, our vocal cords.
This isn’t said with disapproval: it’s one of the most interesting things about AI music, and it comes back to what these systems are trained on – another reason for discussing the copyright issues around that more publicly.