Or a clickbait title:
How I became the world’s most prolific DJ, using code.
This week I stumbled across a cool project: All The Music.
Damien Riehl (programmer/copyright attorney) and Noah Rubin (programmer) decided to generate all possible melodies using the 8 notes of the C major scale (C4, D4, E4, F4, G4, A4, B4 and C5) with a length of 12 notes. All these songs have been ‘freely’ released under a ‘Creative Commons’ license. Their goal is to stop copyright claims on melodies.
While watching their excellent TED talk and hearing about the challenges they faced generating these songs, my head instantly made some connections. They generated all melodies of length k = 12 using n = 8 notes, which amounts to a staggering n^k = 8^12 = 68,719,476,736 unique songs.
All these songs are 12 notes long and each has its own MIDI file, which adds even more overhead. The dataset is huge: 1.2 TB, compressed using GZIP.
Using a de Bruijn sequence
This is when I got an idea: perhaps we can use a de Bruijn sequence for this?
I’ve blogged about those sequences before. Basically, a de Bruijn sequence is an optimal (shortest possible) way to arrange n symbols into a single cyclic sequence so that every possible combination of length k appears in it exactly once.
For example, to list all combinations of the digits 0, 1 and 2 with length 4, the naïve way would be to write them out one by one:
0000 0001 0002 0010 0011 0012 (etc)
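To get a feel for how wasteful the naïve listing is, here is a small Python sketch that writes it out with `itertools.product` (the function name `naive_listing` is just for illustration):

```python
from itertools import product


def naive_listing(alphabet: str, k: int) -> list[str]:
    """Every length-k combination of the alphabet, in lexicographic order."""
    return ["".join(combo) for combo in product(alphabet, repeat=k)]


words = naive_listing("012", 4)
print(" ".join(words[:6]))          # 0000 0001 0002 0010 0011 0012
print(len(words))                   # 81 combinations (3^4)
print(sum(len(w) for w in words))   # 324 digits written out in total
```

For 3 symbols and length 4 that is 81 combinations spelled out in 324 digits, while a de Bruijn sequence covers the same 81 combinations in 81 digits (84 once the wrap-around is written out).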
Instead, a de Bruijn sequence gives us:
Every possible combination of length 4 is present in this single line (check them!).
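As an illustration (not necessarily the generator used for this album), here is a sketch of the textbook FKM construction by Fredricksen, Kessler and Maiorana, which concatenates Lyndon words in lexicographic order. It produces a valid de Bruijn sequence, but not necessarily the same one as above, since many different de Bruijn sequences exist for the same n and k. Symbols are written as single digits, so this sketch assumes n ≤ 10:

```python
def de_bruijn(n: int, k: int) -> str:
    """Cyclic de Bruijn sequence over the digits 0..n-1 (n <= 10) in which every
    length-k word appears exactly once, built with the FKM / Lyndon-word method."""
    a = [0] * (k + 1)
    out: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the Lyndon word a[1..p] when its length p divides k
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, out))


seq = de_bruijn(3, 4)
print(seq[:9], len(seq))            # 000010002 81
cyclic = seq + seq[:3]              # let windows wrap around the end
windows = {cyclic[i:i + 4] for i in range(len(seq))}
print(len(windows))                 # 81: every 4-digit combination of 0, 1, 2
```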
What if we could remix every possible 12-note melody into one huge megamix!?
That would make me mathematically the world’s best DJ, remixing almost all existing songs, including EVERY song from the All The Music dataset, into one song.
Setting to work
I already have some very efficient code to generate these sequences. What if I output the sequence as a single MIDI file?
Because a de Bruijn sequence usually wraps around, getting all melodies of length k = 12 as ordinary, non-wrapping stretches means appending the first k-1 notes to the end of the sequence (which I’ve done above as well). This means we’ll need just a single MIDI file with n^k + k - 1 = 8^12 + 11 = 68,719,476,747 notes in it.
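Linearizing a cyclic sequence is a one-liner. A minimal sketch, using the tiny hand-checked de Bruijn sequence 00010111 (n = 2, k = 3) rather than the real 8^12 one; the helper names are hypothetical:

```python
def linearize(cyclic_seq: str, k: int) -> str:
    """Append the first k-1 symbols so no length-k window needs to wrap around."""
    return cyclic_seq + cyclic_seq[:k - 1]


def linear_windows(seq: str, k: int) -> set[str]:
    """All length-k substrings that fit inside seq without wrapping."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


b23 = "00010111"                            # cyclic de Bruijn sequence, n = 2, k = 3
print(len(linear_windows(b23, 3)))          # 6: "110" and "100" only exist across the wrap
line = linearize(b23, 3)
print(line, len(linear_windows(line, 3)))   # 0001011100 8

n, k = 8, 12
print(n ** k + (k - 1))                     # 68719476747 notes in the megamix
```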
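This gave me a tiny problem: a MIDI file stores the length of its track data in a field of just 4 bytes, and that length counts bytes, not notes. The largest number 4 bytes can hold is 2^32 - 1 = 4,294,967,295, while every note needs several bytes of MIDI events. So we’ve hit a technical problem: we can’t fit our remix into a single MIDI file.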
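To see why the limit bites, here is a rough sketch of a hand-rolled Standard MIDI File writer using only Python's `struct` module. The encoding choices are assumptions for illustration: middle C as C4 = MIDI key 60, a separate note-on and note-off event per note, velocity 100, and no running status. With those choices each note costs 9 bytes, so a single track holds well under half a billion notes:

```python
import struct

# Assumption: middle C = C4 = MIDI key 60 (naming conventions differ between tools).
MIDI_KEYS = {"C4": 60, "D4": 62, "E4": 64, "F4": 65,
             "G4": 67, "A4": 69, "B4": 71, "C5": 72}


def vlq(value: int) -> bytes:
    """Encode a MIDI variable-length quantity (7 bits per byte, high bit = more follows)."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def track_data(melody: list[str], ticks: int = 480) -> bytes:
    """Note-on/note-off pairs on channel 1, then an end-of-track meta event."""
    data = bytearray()
    for name in melody:
        key = MIDI_KEYS[name]
        data += vlq(0) + bytes([0x90, key, 100])    # note on, velocity 100
        data += vlq(ticks) + bytes([0x80, key, 0])  # note off one beat later
    data += vlq(0) + b"\xff\x2f\x00"                # end of track
    return bytes(data)


def midi_file(track: bytes, division: int = 480) -> bytes:
    """Format-0 Standard MIDI File with a single track chunk."""
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, division)
    # The track length is a 4-byte unsigned big-endian integer: at most 2**32 - 1 bytes.
    return header + b"MTrk" + struct.pack(">I", len(track)) + track


song = midi_file(track_data(["C4", "C4", "G4", "G4", "A4", "A4", "G4"]))
bytes_per_note = len(track_data(["C4"])) - 4    # minus the 4-byte end-of-track event
print(bytes_per_note)                           # 9
print((2**32 - 1 - 4) // bytes_per_note)        # notes that fit in one track
print(68_719_476_747 * bytes_per_note)          # bytes the full megamix would need

try:
    struct.pack(">I", 2**32)                    # one past the largest 4-byte length
except struct.error as exc:
    print("length field overflow:", exc)
```

Running status and other tricks shrink the per-note cost, but even at a single byte per note, 68.7 billion notes would not fit in 4,294,967,295 bytes.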
To solve this I decided to cut the single song up into a collection of ‘smaller’, more manageable songs. In the end I settled on 2052 songs that together form one huge megamix album. This album contains every possible melody of length 12 using the notes C4, D4, E4, F4, G4, A4, B4 and C5, the same melodies as in the ATM dataset.
When breaking up a de Bruijn sequence, each new song has to repeat the final k-1 notes of the previous song; that way every melody is contained in full in at least one song. For example, if we split the above sequence into two parts we get:
Song 1: 000012200210002212021211212222011221022211012
Song 2: 101001011112001102111002012010202202
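A minimal sketch of that splitting rule in Python, assuming the sequence is already linearized and fits in memory (the full album has to be streamed instead). The names `split_with_overlap` and `windows` are hypothetical, and the input is the small linearized sequence for n = 2, k = 3:

```python
def split_with_overlap(seq: str, song_len: int, k: int) -> list[str]:
    """Cut a linearized de Bruijn sequence into songs of at most song_len symbols,
    each starting with the last k-1 symbols of the previous song."""
    if song_len < k:
        raise ValueError("song_len must be at least k")
    step = song_len - (k - 1)       # new symbols contributed by each song
    songs = []
    start = 0
    while True:
        songs.append(seq[start:start + song_len])
        if start + song_len >= len(seq):
            return songs
        start += step


def windows(s: str, k: int) -> set[str]:
    """All length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


line = "0001011100"                     # linearized de Bruijn sequence, n = 2, k = 3
songs = split_with_overlap(line, 5, 3)
print(songs)                            # ['00010', '10111', '1100']
print(len(set().union(*(windows(s, 3) for s in songs))))  # 8: nothing lost
```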
This results in the following:
- 1 remix album: debruijn8-12.tar
- Size: 16,735,957,504 bytes (16.75 GB on disk)
- 2052 GZIP-ed MIDI songs
- 2051 songs with a 33,500,000-note melody
- 1 song with a 10,999,308-note melody
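The album numbers above follow directly from the overlap rule: every song after the first adds song_len - (k - 1) new notes. A small sketch (the helper `album_layout` is hypothetical) that reproduces them:

```python
def album_layout(total: int, song_len: int, k: int) -> tuple[int, int]:
    """Number of songs and length of the last one when `total` symbols are cut
    into songs of song_len symbols that overlap by k - 1."""
    step = song_len - (k - 1)                        # new notes per extra song
    extra = max(0, -(-(total - song_len) // step))   # ceiling division
    songs = 1 + extra
    return songs, total - extra * step               # last song starts at extra * step


total = 8 ** 12 + 11                                 # linearized megamix length
songs, last = album_layout(total, 33_500_000, 12)
print(songs, last)                                   # 2052 10999308
# Cross-check: all notes minus the repeated overlaps equals the sequence length.
print((songs - 1) * 33_500_000 + last - (songs - 1) * 11 == total)  # True
```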
Really? Every song?
Let’s listen to some songs that are in the dataset (somewhere):
Example 1, Twinkle Twinkle:
Example 2, Jingle Bells:
Example 3, Can You Feel The Love Tonight:
Every possible 12-note melody is in the remix.
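Finding a given melody on the album boils down to turning note names into digits and searching each song, since the overlap guarantees that every 12-note melody sits entirely inside at least one song. The digit mapping below (C4 = 0 up to C5 = 7) and the helper names are assumptions for illustration, and the search runs on a toy album rather than the real 16 GB one:

```python
from typing import Optional

# Hypothetical note-to-digit mapping; the album's actual encoding may differ.
NOTES = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5"]
DIGIT = {name: str(i) for i, name in enumerate(NOTES)}


def encode(melody: list[str]) -> str:
    """Turn note names into a digit string."""
    return "".join(DIGIT[note] for note in melody)


def find_melody(songs: list[str], melody: list[str]) -> Optional[tuple[int, int]]:
    """Return (song number, note offset) of the first song containing the melody."""
    pattern = encode(melody)
    for number, song in enumerate(songs, start=1):
        offset = song.find(pattern)
        if offset != -1:
            return number, offset
    return None


print(encode(["C4", "C4", "G4", "G4", "A4", "A4", "G4"]))  # 0044554 (Twinkle Twinkle)
toy_album = ["00010", "10111", "1100"]   # all 3-note melodies over C4 and D4
print(find_melody(toy_album, ["D4", "D4", "C4"]))  # (3, 0)
```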
Is this better?
This got me thinking: why didn’t Damien and Noah go for this approach? It is much smaller and faster to generate (in a single morning).
So I turned to Twitter and asked Damien Riehl! And sure enough: his answer makes total sense:
We had initially considered a “de Bruijn” sequence. But if we were to use a single file, that would have down sides:
If someone infringes our work, it would only be a tiny percentage (0.0000000001%?) of the “work” — so someone would argue “fair use”
Same idea with others incorporating ATM works in theirs (“tiny percentage”)
So our technical/legal design is “One MIDI file per melody” — which I think is a legal feature, not a bug. 🙂
Of course I should have known there was a valid reason. He encouraged me to continue, though, so I generated my own de Bruijn album. Now I can say I’ve officially remixed 68,719,476,736 songs. Is there like a Guinness Book of World Records entry for me now?
If you’re curious what this remix sounds like, here is a snippet:
It was a fun exercise! I really love de Bruijn sequences, and I learned a lot about streaming GZIP and file APIs to store everything easily (generating the whole sequence in memory first isn’t an option).
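Because the full sequence has 68,719,476,747 notes, it has to be written out while it is being generated. Below is a sketch of that idea, not the actual code behind the album: a generator version of the FKM construction feeds a buffer that is flushed to a `gzip.open` file whenever a song is full, carrying the last k - 1 notes into the next song, and the first k - 1 notes are replayed at the end to undo the wrap-around. For simplicity it writes digits as text instead of MIDI events, and file names like song0001.txt.gz are made up:

```python
import gzip
from pathlib import Path
from typing import Iterator


def de_bruijn_stream(n: int, k: int) -> Iterator[int]:
    """Yield a cyclic de Bruijn sequence (FKM construction) one symbol at a time."""
    a = [0] * (k + 1)

    def db(t: int, p: int) -> Iterator[int]:
        if t > k:
            if k % p == 0:             # emit the Lyndon word a[1..p] when p divides k
                yield from a[1:p + 1]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def write_album(n: int, k: int, song_len: int, out_dir: Path) -> int:
    """Stream the linearized sequence into gzip files of at most song_len symbols
    that overlap by k - 1. Returns the number of files written.
    Assumes n <= 10 (one digit per symbol) and song_len >= k."""
    if song_len < k:
        raise ValueError("song_len must be at least k")
    out_dir.mkdir(parents=True, exist_ok=True)
    head: list[int] = []               # first k - 1 symbols, replayed at the end
    song: list[int] = []
    count = 0

    def flush() -> None:
        nonlocal song, count
        count += 1
        with gzip.open(out_dir / f"song{count:04d}.txt.gz", "wt") as f:
            f.write("".join(map(str, song)))
        song = song[len(song) - (k - 1):]   # carry the overlap into the next song

    def push(symbol: int) -> None:
        song.append(symbol)
        if len(song) == song_len:
            flush()

    for symbol in de_bruijn_stream(n, k):
        if len(head) < k - 1:
            head.append(symbol)
        push(symbol)
    for symbol in head:                # undo the wrap-around
        push(symbol)
    if len(song) > k - 1:              # leftover notes beyond the carried overlap
        flush()
    return count


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        print(write_album(3, 4, 20, Path(tmp)))  # 5 songs for this toy album
```

Only one song buffer (at most song_len symbols) and the k - 1 head symbols live in memory at any time; the real album would use n = 8, k = 12 and song_len = 33,500,000.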
The album is, for now, only stored on my hard disk, but I’m working with Damien to get the songs added to their ATM collection on the Internet Archive.
Oh, and you can’t have a remix album without a proper album cover: