Big music data: from matrix factorisation to the ukulele coefficient



If you’re looking for some light reading to blow out the weekend cobwebs, try the latest O’Reilly white paper, ‘Music Science: How data and digital content are changing music’. It includes a section on music’s “Turing problems”, which data scientists at the large streaming services are now tackling.

So, Google’s Douglas Eck talks about the challenge of understanding the context of people’s listening in terms of “tensor models” in machine learning:

“When you start thinking about context, the hot thing in the field right now are these tensor models where you add another dimension. You keep that pretty low dimension, like ten or twenty, but now you have these twenty small transform matrixes that you’re learning that let you take a general similarity space and go: ‘I know Wilco’s similar to Uncle Tupelo because they’re related in playback history and also because one’s a splinter band,’ and now you can ask, ‘what’s similar to Wilco in the morning?’ or ‘what’s similar to Wilco for jogging’ or ‘what’s similar to Wilco for in the car?’ and pull out this really interesting structure,” he said. “It just falls out of the math around what’s happening in tensor models for matrix factorization.”
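Here is one way to picture Eck’s description. Assume each artist has a shared low-dimensional embedding v, and each listening context c has one small d×d matrix W_c. A pair of artists can then be scored in context c as (W_c v_i) · (W_c v_j). This is a toy sketch under those assumptions, not Google’s model. The three-dimensional vectors, the context matrices and the artist names (seed_band and so on) are all hypothetical and set by hand; a real system would learn them from playback history.

```python
# Toy sketch of context-dependent similarity: shared artist embeddings plus
# one small transform matrix per listening context. All names and numbers are
# hypothetical and hand-picked; a real system would learn them from playback data.

Vector = list[float]
Matrix = list[list[float]]

# Shared 3-dimensional embeddings (Eck mentions ten or twenty dimensions).
# Read the axes loosely as [style, energy, acousticness].
EMBEDDINGS: dict[str, Vector] = {
    "seed_band": [1.0, 0.5, 0.5],
    "acoustic_band": [0.9, 0.1, 1.0],
    "uptempo_band": [0.9, 1.0, 0.1],
}

# One d x d transform per context; "general" is the identity, i.e. the plain space.
CONTEXTS: dict[str, Matrix] = {
    "general": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "morning": [[1.0, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 1.5]],
    "jogging": [[1.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 0.2]],
}


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity(a: str, b: str, context: str) -> float:
    """Score two artists after projecting both through the context's transform."""
    w = CONTEXTS[context]
    return dot(matvec(w, EMBEDDINGS[a]), matvec(w, EMBEDDINGS[b]))


def most_similar(query: str, context: str) -> str:
    """Return the candidate artist with the highest score in this context."""
    candidates = [name for name in EMBEDDINGS if name != query]
    return max(candidates, key=lambda name: similarity(query, name, context))


if __name__ == "__main__":
    for ctx in CONTEXTS:
        scores = {n: round(similarity("seed_band", n, ctx), 4)
                  for n in ("acoustic_band", "uptempo_band")}
        print(ctx, scores)
```

With these made-up numbers, the two candidates score the same in the general space. The morning transform favours acoustic_band and the jogging transform favours uptempo_band, so the same embeddings yield different neighbours depending on context.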

Meanwhile, here’s Spotify’s Brian Whitman on figuring out when to risk enraging ukulele-avoiding fans:

“Whereas we might know that a song has ukulele in it, we might know that you normally don’t listen to solo ukulele artists but we also know that over the sum of things you’ve listened to a lot of different kinds of music, and some of it might have had ukulele in it,” he said. “All the other information about that song – that it’s a new hot hit by Avicii; that you tend to like new American Dubstep music — that would override, let’s call this, the ukulele coefficient (to put a terrible phrase on it). We would never use, like Pandora might, the binary ukulele yes/no as a filter.”
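Whitman contrasts a weighted signal with a binary yes/no filter, and a linear score is a simple way to see the difference. In this sketch the ukulele feature pulls the score down, but other signals can outweigh it. The hard filter drops the track as soon as ukulele is present. The feature names, weights and track are hypothetical, invented for illustration. This is not Spotify’s or Pandora’s actual scoring.

```python
# Toy contrast between a soft "ukulele coefficient" and a hard yes/no filter.
# Feature names, weights and the track are hypothetical, invented for illustration.

from dataclasses import dataclass


@dataclass
class Track:
    title: str
    features: dict[str, float]  # signal name -> strength in [0, 1]


# Per-listener weights: a negative weight means the listener tends to avoid the trait.
LISTENER_WEIGHTS: dict[str, float] = {
    "has_ukulele": -0.8,   # rarely plays solo ukulele artists
    "new_hot_hit": 1.2,    # responds to current hits
    "liked_genre": 1.0,    # matches a genre the listener plays a lot
}


def soft_score(track: Track, weights: dict[str, float]) -> float:
    """Weighted sum: every signal contributes, so strong positives can outweigh one negative."""
    return sum(weights.get(name, 0.0) * value for name, value in track.features.items())


def hard_filter(track: Track) -> bool:
    """Binary rule: any ukulele at all removes the track from consideration."""
    return track.features.get("has_ukulele", 0.0) == 0.0


hit = Track("hypothetical_hit", {"has_ukulele": 1.0, "new_hot_hit": 1.0, "liked_genre": 0.9})

if __name__ == "__main__":
    print("soft score:", round(soft_score(hit, LISTENER_WEIGHTS), 2))
    print("passes hard filter:", hard_filter(hit))
```

Here the track keeps a positive soft score (about 1.3) even though it fails the hard filter. In a production system, per-listener weights like these would presumably be learned rather than set by hand.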

But if you want a single talking point to spook your boss with this week, consider the report’s prediction of what might happen as music-recommendation algorithms get even smarter.

“Once algorithms get good enough, making a playlist we nearly never skip, many listeners will find that they don’t need on-demand choice; this could, in turn, lead other services to rely on the lower-cost statutory license that Pandora enjoys today,” it suggests. “In other words, when a curated playlist is better than what you’d choose yourself, you don’t need choice, and that affects industry revenues significantly.”


Written by: Stuart Dredge