Since a couple of days I’ve been working hard on the new Al Zimmermann’s Programming Contest: Cards (also called Topswops).
The idea is very easy, you take a series of numbers, from 1 to N. You shuffle the numbers around, for example:
Now we reverse the amount of numbers as stated by the first entry in the list. So in this case we reverse 5, and we get:
We keep doing this, now reversing 4:
At this point, 1 is in front, we are done! No more reversals are possible.
When implementing this the first thing I did was to create an array and reverse parts of the array by swapping the items around. The problem is that the amount of swaps is pretty big!
So I started thinking about other ways to save the numbers in this challenge.. how about a linked list? This won’t work because when updating the linked list you’ll have to reverse all the pointers in the part you are reversing.
So how about using a doubly linked list? This won’t work because we have to swap the next/previous pointers for all the nodes we reverse.
XOR doubly linked list
Then I learned about the XOR doubly linked list. Let me explain how it works. The idea is that you don’t have next and previous pointers, but you just have one pointer. This pointer is both the next AND the previous pointer! How is this possible you might ask?
Well, this is where the XOR comes into play. Lets do a tiny bit of math:
A ^ B = C
C ^ A = B
C ^ B = A
A property of the XOR is, if we have one value, we can calculate what the other value is!
When we create the list we XOR the next and previous together, and we save the pointer to the first element in a seperate pointer. Lets say A = previous, B = next, C = stored value:
- previous ^ next = stored value
- stored value ^ previous = next
- stored value ^ next = previous
If we traverse our XOR doubly linked list we know the current element and the one before that (previous), so we can always calculate the pointer to the next element!
So why would we do this? It involves a bit more processing power and will obviously save you half of the pointer-memory compared to a normal doubly linked list. We now keep one XOR-ed value instead of two pointers.
But there is another great advantage which might be usefull in the competition I mentioned above, reversability! As stated above using a doubly linked list wasn’t helpfull because when we reverse a part all the previous and next pointers have to be swapped. But we don’t have these pointers anymore, they are XOR-ed into one value! That means that we don’t have to change anything!
Lets assume we have a list of 80 items, and we want to reverse the first 40, what do we need to do now?
- Traverse to the 40th element
- Adjust the value of the 40th pointer, now the first element:
TERMINATOR ^ (PTR TO 39)
- Adjust the value of the 1st pointer, it must have
(PTR TO 2) ^ (PTR TO 41)
- Adjust the value of the 41th pointer, it must have
(PTR TO 1) ^ (PTR TO 42)
- Pointer to the first element is now 40 (this has become our first element)
Done! We have adjusted three pointers and nothing in the middle. B.t.w. TERMINATOR is a value which indicates the boundaries of the first and last elements, I’ve used -1 for this. When traversing we use this to check if we are done.
Lets traverse the above reversed list, first we go to the first element (40) and perform the following XOR:
- stored_value (40) ^ TERMINATOR = next (39)
This will result in 39, now we continue:
- stored_value (39) ^ previous (40) = next (38)
- stored_value (38) ^ previous (39) = next (37)
- stored_value (2) ^ previous (3) = next (1)
- stored_value (1) ^ previous (2) = next (41) !!
- stored_value (41) ^ previous (1) = next (42) !!
- stored_value (42) ^ previous (41) = next (43) etc etc
A bit of dissapointment
After implementing this it doesn’t seem to be faster then using swaps to reverse everything in the whole array. This is probably due to a couple of things:
- The locality of a normal array is faster in memory
- You’ll have to traverse N-nodes to reach the target to reverse
- The competition has max 97 elements, this might be too small to see the advantage
My algorithm until now only used a single pointer to keep track of the first element, but it might be usefull to also keep a pointer to the last element. For example if we need to reverse everything up to N-1, I need to traverse from 1 to N-1. But if you have a last-element pointer, using the doubly in the XOR doubly linked list, we can just go backwards from N.
Ah well, it might not have been usefull (yet?) but it is a beautiful algorithm!
After a lot of comments on my blog asking about the code I decided to try getting it released one more time. Thus I mailed Digital Landmark Services again, telling them this is just a hobby project, and will (in its current form) never be a replacement for Shazam. Also, I explained a lot of people hated Shazam and deleted the application after reading this blog… the only thing they got out of it is bad marketing.
So I asked them for a peaceful solution, I’ll release the code, tell everybody Landmark Digital Services is a good company after all, and that’s it, both will benefit.
This is the reply I got:
Dear Mr. Van Rijn,
I am an attorney for Landmark Digital Services. Thank you for your response and attention to this important matter. As we have stated in detail in previous communications, we would like you to refrain from releasing the code and to remove the blogpost explaining the algorithm. While we appreciate your thoughtful comments and questions, we have already made our position clear and hope you will respect our interest in our IP.
WHAT!? They tell me again to remove the blogpost, this is crazy! A blogpost describing an algorithm can never be infrigement of intellectual property. The whole idea of a patent is to preserve an idea, to write down what it does and how it works for future generations. A patent has to be publicly available for this sole reason. This isn’t protecting their intellectual property, this is plain censorship.
My reply to the email was short an concise:
I’m sorry, but I can’t comply.
This was a week ago, lets see what they come up with this time…
A couple of days ago I noticed this tweet:
berenguel: What is your view on ‘frame switching’?How do you manage (forced) interruptions of your workflow?How do you get the interruptor to give up?
This is something I’m currently not experiencing, but I have been fighting this in the past. And I’ve come up with a quite effective way to eliminate this.
I’m just a regular guy, and just like all men I can only focus on a single thing. Context swithing/frame of reference switching is hard. If I’m working on a programming problem and people interrupt me, ask other technical questions, I lose my train of thought.
So, how do you counter this? Well, first I made a list of things to do. This is always good to have, it directs your focus. My list is priorized. For example:
- Implement “General overview” page
- Fix bug with disappearing “Solve” button
- Refactor OfflineAvailableController
- Fix table layout bug
The top post-it is the one I’m working on, that one has my complete focus. When the project lead comes and asks me questions, I ask him to prioritize it. He can add post-its to the list and/or shuffle the list. But there is one catch: Every time he changes or I finish the top priority I need a coffee break. This is my frame-switching moment, to clear my head.
If you do this, and keep doing this (no exceptions!) the project lead will start putting his ‘requests’ (interruptions) beneath your current piece of work, without distracting you. Because else you go for a cup of coffee first…. This is what we want. When we finish what we are currently doing we get a little coffee break. When we’ve fully cleared our head we look at the next item in the todo-queue.
Don’t worry, it may sound a bit rude, they’ll understand it if you explain the problem with frame-switching…! This just makes it visible.
p.s. Berengual will probably write a blog post about his own experiences with frame-switching, I’ll post a link when he does!
After a couple of iterations our software started to show some wear and tear. With all fat-GUI clients you always have some behaviour that isn’t exacly what you want, there are always glitches. More and more (Scrum) iterations followed and more and more glitches started accumulating in the application. These glitches are sometimes hard to fix and/or hard to reproduce, and they never made their way to our backlog.
How do you solve this?
“Bug Fix Day” concept
To improve the quirkiness and general impression of our application, as well as the teamspirit, I decided to hold a competition! This is how we did it:
- Take everybody, testers, project managers and of course the developers. Mix and create small groups of three (maybe four) people with a mixed skillset.
- Create a jury, in our case the two product-owners and a GUI-expert.
- One day the teams compete against eachother.
You’ll also need two board:
- Create a BugBoard with (on Post-its) all known existing glitches and bugs which need fixing.
- Create a TeamsBoard with a list of all the teamnames (make them come up with cool names!)
The jury (product owners) must assign points to all the known bugs on the BugBoard, ranging from 1 point to 10 points.
- 1 point: minor importance
- 10 points: important! fix asap
Now the teams can do two things:
Fix a bug
Take a Post-it from the BugBoard and paste it behind your teamname on the TeamsBoard. There is a maximum of 1 (!!) Post-it. Your team can only claim and work on one bug at the time. While you are doing this, nobody can work on that same bug.
When you have implemented a fix for the bug, commit it, let the jury check the result. Once the jury has verified the fix, you’ll get the assigned points!
Find a bug
If you want you can also scan the application for new glitches and bugs. If you have found a (repeatable) bug, document it on a Post-it and give it to the jury. If they can reproduce the glitch/bug, they’ll assign points to the bug. For finding the bug you’ll receive half the bug’s points.
There are a couple of extra rules:
The teams can only work on one workstation. You’ll have to pair up and work together. This will enforce the teams to look at each others code and learn.
If you break the (continuous) build, you’ll lose points. For every minute the build is not green, a point is deducted from your team’s score.
At the end of the day all the teams have to present their fixes.
This will show everybody a couple of things:
- How was the bug fixed?
- What was wrong?
- How could it have been prevented?
- Bonus points! To make it even more exiting we’ve introduced bonus points. Try to come up with original ideas! This is our list:
- Most impressive presentation: 2 points
- Hardest bug to fix: 2 points
- Best team name: 1 point
- Most original found bug: 1 point
Of course there are possible problems you want to avoid:
- People saving up bug-fixes
- People introducing bugs, solve on bug fix day.
- People not telling about glitches, report them on bug fix day.
We haven’t seen this behaviour yet, but you’ll have to trust the developers. Our team has enough professionalism to not have this problem (yet?).
When I suggested this idea to our product owners they were very enthusiastic. About a week later we had our first Bug Fix Day and the results are very impressive.
- All of the 10-point (most important) bugs had been solved
- Most other bugs/glitches had been resolved
- Despite people actively abusing the system, not a lot of new glitches could be found (in contrary to our impression).
- Developers loved it, teamspirit was very high. It was fun to see the competitive instinct during that day.
- Our team won, and got a very good lunch as surprise first prize (yay!).
Other idea’s this concept could work for:
- Performance Fix Day (improve performance)
- Funny Features Day (build new features into the application)
- Other ideas…?
If you decide to hold a Bug Fix Day at your company, I’d love to hear about the results!
A couple of people replied to my last article about constructor vs setter injection that they prefer a third option, field injection. This is a slight variant of setter injection in which we magically let the setter dissapear.
So, another blogpost here!
Let me first show what field injection looks like:
This is an example from the PicoContainer website, but Spring and Google Guice can do this too. So, you ask, what is wrong with this?
I want the classes I write to be testable. Every class needs a test, and every class needs to be tested without the help of any other dependency. For every dependency in my class I want to be able to use a stub/mock implementation.
With constructor injection this is easy, just do new SomeClass(…). But this just can’t be done with field injection..! How do you test these classes? You have no way to construct the objects without bytecode-magic. You cannot create an instance of the above Apple class with its dependencies without using the DI-framework. And the last thing I want is the DI-framework in my unit tests. It makes the test slow, and as we all know, tests need to run as fast as possible to be effective.
Final is still tricky
When using field injection your fields can be made private, but you can’t make them final (!!). This is a common mistake, but look at the Google Guice wiki: http://code.google.com/p/google-guice/wiki/Injections.
Note the warning: “Avoid using field injection with final fields, which has weak semantics.”
Code smell is wanted
And the final thing I don’t like about field injection… it looks good. This is a bad thing! I tried to explain this in the previous blogpost, but failed I guess. When using constructor injection you’ll notice when it gets ugly, you’ll see that long ugly constructor… and you’ll refactor the class. When using field injection this warning is forgotten.
Large constructor equals bad design. Don’t fix this by changing to setter or field injection, fix the underlying problem and improve your class granularity. The fact that is looks bad with more then 2 or 3 arguments is actually a big plus!
Yes, annotations bind you to the framework, and they are evil. But so is XML. I have to agree on one thing, annotations (javax.inject) are a good thing. But still use constructor injection with these annotations please!
Recently I’ve had a discussion with a collegue about Setter vs Constructor injection. For those who don’t know what I’m talking about a quick example:
Ever since I’ve used constructor injection, I’ve never wanted to go back. For a good reason I might add. Constructor injection is just better, no discussion. Please allow me to explain.
First of all there is safety. In the case of setter injection it is perfectly possible to create the object without calling the setter(s). This would leave SomeClass in an uninitialized state. With constructor injection this could never happen, you just can’t create the object without at least deliberately ignoring (null-ing) the arguments.
Also, if you add a new dependency and forget to add the dependency in the XML… you’ll end up with weird behaviour and NPE’s while running your code. If you use constructor injection is would fail on startup because the correct constructor couldn’t be found. Failing early.
In the case of constructor injection you can make the dependencies final, this is something you just can’t do with setter injection. This will further ensure the atomicity of SomeClass, it just can’t exist without its dependency. And there are many many more reasons using ‘final’ is a good idea (google it).
But… isn’t constructor injection less readable?
The only argument I’ve heard made against constructor injection is the fact that it becomes unreadable if you have too much incoming dependencies. But that argument is completely invalid!
The problem isn’t the constructor injection, it is a code smell. If your class has too much incoming dependencies you should reconsider this class. There is a good chance it is doing too much or having too much responsibilities.
So, it is actualy a good thing that constructor injection becomes aweful with too much dependencies, it is a code-smell! If the list of incoming dependencies grows too large you need to refactor, not blame constructor injection.
With setter injection it wouldn’t be almost invisible, a bad thing.
In every single way constructor injection beats setter injection, so please only use constructor injection from now, k? bye!
After looking at The Wilderness Downtown, a fantastic Chrome Experiment, I felt like I had to do something similair! Three hours later (first time experimenting with Google Maps and Street View) the result is below.
It is nothing like The Wilderness Downtown, but still I’d like to share it ;-)
Enter streetname, town and country:
Yes, there are a LOT of bugs… And no, I’m not going to fix them, so there is no need to notify me.
Some known bugs:
- Sometimes the dinosaur moonwalks (images go backwards)
- Locations aren’t always found, it depends on Google Street View
- It can only walk from the startingpoint to the next junction, no complete routes.
Wow, just wow…
Thanks a lot for all your support people, it’s great to see and read I’m not the only one to think this situation isn’t normal.
Blogs and news sites:
And it was featured on some major blogs/news sites:
Also, it is a hit on Twitter too, with hunderds of retweets and people taking aim at Shazam:
(which might be a bit unfair since Landmark Digital Services LLC is the complaining patent holder, not Shazam, but I’m not sure how they relate)
springify: RT @StefanRoock: RT @mittie RT @merbist: wow the Shazam guys are real jerks: http://sites.google.com/site/redcodenl/ #wtf asgerhallas: RT @tomasekeli: Hey #shazam, you suck :) Software patents, seriously uncool. Being a dick, even more so. #shazamfail http://bit.ly/cm4Y5f (via @blakefate) gerharbo: Patent issues ivm Shazam. Een developer uit nl heeft code in java, shazam niet blij http://sites.google.com/site/redcodenl/ unimatrixZxero: RT @ralph: Now that I've found SoundHound, I can safely delete the app from the jerks that made Shazam: http://sites.google.com/site/redcodenl/ LuisBenavides: RT @stilkov: Software patents just suck RT @merbist: wow the Shazam guys are real jerks: http://sites.google.com/site/redcodenl/
...etc etc etc...
So thanks again for the support and the emails en tweets and blogs… I’m going to wait and see now what the next reply from Landmark Digital Services LLC will be. In the mean time, with the article available you should be able to create something similair for yourself.
Or just take the describing scientific paper and/or the patent itself, that should be sufficient (and it isn’t too mathematical, nor hard).
Some time ago I received the following email about this project.
Mr. van Rijn,
I am Darren Briggs, the Chief Technical Officer of Landmark Digital Services, LLC. Landmark Digital Services owns the patents that cover the algorithm used as the basis for your recently posted “Creating Shazam In Java”. While it is not Landmark’s intention to alienate those in the Open Source and Music Information Retrieval community, Landmark must request that you do not ship, deploy or post the code presented in your post. Landmark also requests that in the future you do not ship, deploy or post any portions or versions of this code in its current state or in any modified state.
We hope you understand our position and that we would be legally remiss not to make this request. We appreciate your immediate attention and response.
Darren P. Briggs
Vice President &
Chief Technical Officer
Landmark Digital Services, LLC
This scared me a bit, why are they emailing me? I’ve written some code (100% my own) and implemented my own methods for matching music. There are some key differences with the algorithm Shazam uses.
The code isn’t published yet, but I was planning on releasing it under Apache License to the open source community soon.
It was never my intention to release this commercially, I’m just a programmer who likes to work on technical, mathematical algorithms in his spare time. And if enough people ask for the source code, I’d be happy to give it to them. Who would have thought that creating something at home in a weekend could result in a possible patent infringement!?
Just to be sure I asked them confirmation that the email was indeed sent from their company. And second, I’d like to know which patents are in play. Because I just couldn’t think that something this easy (music-fingerprint is a hash, and we do a lookup) can be patented.. Maybe in the States, but in Europe?
I got the following reply from Landmark Digital Services LLC:
Mr. van Rijn,
I can confirm that the email you received came from me on behalf of Landmark Digital Services, LLC. If you require a more formal notification, the Landmark legal department can provide you with a legal notification. Please let me know if that would be preferable.
The Landmark patents have been granted around the world, including the US, the EU, individual EU-member countries.
Examples of some of the Landmark patents include:
- System and Methods for Recognizing Sound and Music Signals in High Noise and Distortion
- Robust and Invariant Audio Pattern Matching
We appreciate your compliance with our request.
Darren P. Briggs
Vice President &
Chief Technical Officer
Landmark Digital Services, LLC
They are really serious about this. But there is a problem, I still don’t have patent numbers of European patents, so there is no way for me to check the validity of their claims. Once again I asked them for specific patent numbers. And got the following reply:
Mr. van Rijn,
The US patent numbers for the two examples I provided you are 6,990,453 and 7,627,477. Note that there are additional issued patents and pending patent applications in the US and Eu that cover these concepts as well.
Vice President &
Chief Technical Officer
Landmark Digital Services, LLC
Sigh, again two U.S. patent numbers. But well, lets take a step back. Why does Landmark Digital Services think they hold a patent for the concepts used in my code? Even if my code works pretty different from the Shazam code (from which the patents came).
What they describe in the patent is a system which:
- Make a series of fingerprints of a media file and/or media sample
(such as audio, but could also be text, video, multimedia, etc)
- Have a database/hashtable of fingerprints as lookup
- Compare the set of hashtable hits using their moment in time it happened
This is very vague, basically the only innovative idea is matching the found fingerprints linearly in time. Because the first two steps describe how a hashtable works and creating a hash works. These concepts are not new nor innovative.
But, with a bit of imagination one could (possibly) argue that my code (again, written completely by myself in a weekend with some spare time) does the same thing as the patent describes.
Just to be sure I asked around for advice, including help from the FSF (Free Software Foundation) and the EFF (Electronic Frontier Foundation). They forwarded my questions to Bits of Freedom a Dutch organisation for digital rights.
After a good conversation with Ot van Daalen (from Bits Of Freedom) he suggested I contact Arnoud Engelfriet, a Dutch ICT lawyer and patent attorney with a lot of knowledge about software patents.
In the last couple of days I’ve had quite a few conversations with Arnoud, and he helped me with a lot of my questions.
Here are some of the conclusions:
- Software companies can make your life very miserable if you don’t comply;
- Using the Java-Music Match code commercially will likely result in a lawsuit for patent infringement;
- Releasing the code under an Open Source license on a non-profit website (no ads) is a grey area;
- Writing/using this code privately can’t be patent infringement in the Netherlands;
And even the Arnoud even mailed me:
Als je de software laat staan, loop je de kans dat Landmark Digital Services je een proces aandoet. En zoals gezegd kan dat een fiks bedrag worden.
If you leave the software on your website, you run the risk that Landmark Digital Services files a patent infringement lawsuit. And like I told you, this could result in a substantial amount of money.
Since I don’t want to end up like Dmitry Sklyarov, with the possibility of a lawsuit, I’m not going to publish the code anymore… Grey area’s with lawsuits roaming around are better to be avoided. Especially if you think about the average cost of a patent lawsuit being 1 to 3 million dollars.
In the latest email I received from Landmark Digital Services they are even asking for more:
Mr. Van Rijn,
The two example patent numbers that I sent you are U.S. patents, but each of these patents has also been filed as patent applications in the Netherlands. Also, as I’m sure you are aware, your blogpost may be viewed internationally. As a result, you may contribute to someone infringing our patents in any part of the world.
While we trust your good intentions, yes, we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm.
Thank you for your understanding.
Vice President &
Chief Technical Officer
Landmark Digital Services, LLC
They are still unable to direct me to the correct Dutch patent numbers. But more shocking, they are now telling me that my blogpost may contribute internationally to patent infringement. But… doesn’t the patent itself describe their algorithm in much more detail? The idea of patents are that the world knows about technology and how it can be used, but they can’t legally commercially exploit it? Now next to asking me not to release the code, they are also asking me to remove the previous blogpost!
This seems like a very unjust threat to me, and for now I’m going to ignore that request. If they decide to file a formal legal complaint I might reconsider taking down the blogpost. The only action I’ll take right now is not releasing the source code.
I’ve also had contact with other people who have implemented this kind of algorithms. Most notible is Dan Ellis. His implementation can be found here: http://labrosa.ee.columbia.edu/~dpwe/resources/matlab/fingerprint/
He hasn’t been contacted (yet?), but he isn’t planning on taking his MatLab implementation down anyway and has agreed for me to place the link here. This raises another interesting question, why are they targetting me, somebody who hasn’t even published the code yet, and not the already published implementation of Dan?!
And if they think its illegal to explain the algorithm, why aren’t they going after this guy? http://laplacian.wordpress.com/2009/01/10/how-shazam-works/
This is where I got the idea to implement the algorithm and it is mentioned in my own first post about the Java Shazam.
So, has anybody else had these kind of experiences? What would you do in this situation?
The patent infrigement story continues here…
ps. I’m sorry John Metcalf: You can stop printing “Free Roy van Rijn” t-shirts…
After a comment on my previous blog entry (about creating a shazam clone) I started tinkering again.
Somebody asked: Could this be used to detect duplicate songs in my mp3 collection!?
That is exacly what I just tried!
Here are some examples:
Duplicate found: 01 - everything in its right place.mp3.song matches with D:\data\v2\01-radiohead-everything_in_it’s_wrong_place-h8me.mp3.song and score: 134
(note: This is a remix of the original, with some rap mixed in)
Duplicate found: 01 - Joy Division - Exercise One.mp3.song matches with D:\data\v2\114 - Joy Division - Exercise One (From Still).mp3.song and score: 255
(note: Yes, duplicate!)
Duplicate found: 01 The District Sleeps Alone Tonight.mp3.song matches with D:\data\v2\The Postal Service - The District Sleeps Alone Tonight.mp3.song and score: 636
(note: Yes, duplicate!)
Duplicate found: 01-AudioTrack 01.mp3.song matches with D:\data\v2\06.Richard cheese -Id like a virgin- Butterfly.mp3.song and score: 144
(note: Yes, duplicate, the second was a radio snippet with jingle in the song)
Duplicate found: 01-editors-papillon_edit.mp3.song matches with D:\data\v2\02-editors-papillon_instrumental_edit.mp3.song and score: 382
(note: Almost a duplicate, the second is an instrumental version of the first)
Duplicate found: 01-the_strokes-you_only_live_once-ser.mp3.song matches with D:\data\v2\01-the_strokes-you_only_live_once.mp3.song and score: 450
(note: Yes, duplicate!)
Duplicate found: 01-the_wombats-backfire_at_the_disco.mp3.song matches with D:\data\v2\Backfire @ The Disco [Promo Version].mp3.song and score: 493
(note: Yes, almost duplicate, the second is a radio-promo announcement with jingle in it)
With a bit of tinkering this algorithm could be used to make a tool to detect duplicate songs, even if the mp3’s aren’t similair. Even live versions and instrumental versions are detected if you lower the threshold.
A couple of days ago I encountered this article: How Shazam Works
This got me interested in how a program like Shazam works… And more importantly, how hard is it to program something similar in Java?
Shazam is an application which you can use to analyse/match music. When you install it on your phone, and hold the microphone to some music for about 20 to 30 seconds, it will tell you which song it is.
When I first used it it gave me a magical feeling. “How did it do that!?”. And even today, after using it a lot, it still has a bit of magical feel to it.
Wouldn’t it be great if we can program something of our own that gives that same feeling? That was my goal for the past weekend.
First things first, get the music sample to analyse we first need to listen to the microphone in our Java application…! This is something I hadn’t done yet in Java, so I had no idea how hard this was going to be.
But it turned out it was very easy:
Now we can read the data from the TargetDataLine just like a normal InputStream:
Using this method it is easy to open the microphone and record all the sounds! The AudioFormat I’m currently using is:
So, now we have the recorded data in a ByteArrayOutputStream, great! Step 1 complete.
The next challenge is analyzing the data, when I outputted the data I received in my byte array I got a long list of numbers, like this:
Erhm… yes? This is sound?
To see if the data could be visualized I took the output and placed it in Open Office to generate a line graph:
Ah yes! This kind of looks like ‘sound’. It looks like what you see when using for example Windows Sound Recorder.
This data is actually known as time domain. But these numbers are currently basically useless to us… if you read the above article on how Shazam works you’ll read that they use a spectrum analysis instead of direct time domain data.
So the next big question is: How do we transform the current data into a spectrum analysis?
Discrete Fourier transform
To turn our data into usable data we need to apply the so called Discrete Fourier Transformation. This turns the data from time domain into frequency domain.
There is just one problem, if you transform the data into the frequency domain you loose every bit of information regarding time. So you’ll know what the magnitude of all the frequencies are, but you have no idea when they appear.
To solve this we need a sliding window. We take chunks of data (in my case 4096 bytes of data) and transform just this bit of information. Then we know the magnitude of all frequencies that occur during just these 4096 bytes.
Instead of worrying about the Fourier Transformation I googled a bit and found code for the so called FFT (Fast Fourier Transformation). I’m calling this code with the chunks:
Now we have a double array containing all chunks as Complex. This array contains data about all frequencies. To visualize this data I decided to implement a full spectrum analyzer (just to make sure I got the math right).
To show the data I hacked this together:
Introducing, Aphex Twin
This seems a bit of OT (off-topic), but I’d like to tell you about a electronic musician called Aphex Twin (Richard David James). He makes crazy electronic music… but some songs have an interesting feature. His biggest hit for example, Windowlicker has a spectrogram image in it.
If you look at the song as spectral image it shows a nice spiral. Another song, called ‘Mathematical Equation’ shows the face of Twin! More information can be found here: Bastwood - Aphex Twin’s face.
When running this song against my spectral analyzer I get the following result:
Not perfect, but it seems to be Twin’s face!
Determining the key music points
The next step in Shazam’s algorithm is to determine some key points in the song, save those points as a hash and then try to match on them against their database of over 8 million songs. This is done for speed, the lookup of a hash is O(1) speed. That explains a lot of the awesome performance of Shazam!
Because I wanted to have everything working in one weekend (this is my maximum attention span sadly enough, then I need a new project to work on) I kept my algorithm as simple as possible. And to my surprise it worked.
For each line the in spectrum analysis I take the points with the highest magnitude from certain ranges. In my case: 40-80, 80-120, 120-180, 180-300.
When we record a song now, we get a list of numbers such as:
If I record a song and look at it visually it looks like this:
Indexing my own music
With this algorithm in place I decided to index all my 3000 songs. Instead of using the microphone you can just open mp3 files, convert them to the correct format, and read them the same way we did with the microphone, using an AudioInputStream. Converting stereo music into mono-channel audio was a bit trickier then I hoped. Examples can be found online (requires a bit too much code to paste here) have to change the sampling a bit.
The most important part of the program is the matching process. Reading Shazams paper they use hashing to get matches and the decide which song was the best match.
Instead of using difficult point-groupings in time I decided to use a line of our data (for example “33, 47, 94, 137”) as one hash: 1370944733
(in my tests using 3 or 4 points works best, but tweaking is difficult, I need to re-index my mp3 every time!)
Example hash-code using 4 points per line:
Now I create two data sets:
- A list of songs, List
The long in the database of hashes represents the hash itself, and it has a bucket of DataPoints.
A DataPoint looks like:
Now we already have everything in place to do a lookup. First I read all the songs and generate hashes for each point of data. This is put into the hash-database.
The second step is reading the data of the song we need to match. These hashes are retrieved and we look at the matching datapoints.
There is just one problem, for each hash there are some hits, but how do we determine which song is the correct song..? Looking at the amount of matches? No, this doesn’t work…
The most important thing is timing. We must overlap the timing…! But how can we do this if we don’t know where we are in the song? After all, we could just as easily have recorded the final chords of the song.
By looking at the data I discovered something interesting, because we have the following data:
- A hash of the recording
- A matching hash of the possible match
- A song ID of the possible match
- The current time in our own recording
- The time of the hash in the possible match
Now we can substract the current time in our recording (for example, line 34) with the time of the hash-match (for example, line 1352). This difference is stored together with the song ID. Because this offset, this difference, tells us where we possibly could be in the song.
When we have gone through all the hashes from our recording we are left with a lot of song id’s and offsets. The cool thing is, if you have a lot of hashes with matching offsets, you’ve found your song.
For example, when listening to The Kooks - Match Box for just 20 seconds, this is the output of my program:
Listening for 20 seconds it can match almost all the songs I have. And even this live recording of the Editors could be matched to the correct song after listening 40 seconds!
Again it feels like magic! :-)
Currently, the code isn’t in a releasable state and it doesn’t work perfectly. It has been a pure weekend-hack, more like a proof-of-concept / algorithm exploration.
Maybe, if enough people ask about it, I’ll clean it up and release it somewhere.
The Shazam patent holders lawyers are sending me emails to stop me from releasing the code and removing this blogpost, read the story here.
Yesterday, while browsing DZone, I encountered a couple of HTML 5 blogs. So I decided to try and code some HTML5 - Canvas myself.
This is the result of one hour trail-and-error:
Sorry, no support, please upgrade your browser!
Maybe I’ll do a write-up soon, but for now, just check the sources. (Yes they are wrong/hack-ish, I know…).
Here are (in my opinion) five of the most important ‘rules’ in software development.
The reason why waterfall doesn’t work and agile works a lot better. Changes happen. People will always change their mind, over time technology changes and situations change. So during software development you will have to be flexible with the requirements.
Create software that the client wants, not what he says he wants.
What the clients tells you he wants might not (entirely) be what he/she actually wants or the organisation needs. It is vital to ask questions and release early. When you have something visible, let the client judge it.
Some people say there is one major drawback of showing designs and the application early on, the client probably gets new ideas and wants to change the design. But that actually is a good thing, you still have much time left to plan these changes! If you do a waterfall-method and release on delivery date.. the client still wants his/her changes (!).
Always write down decisions.
During a project, especially during the startup, there are a lot of choices to be made. Is it going to be a web application, or standalone? What frameworks are we going to use?
All these decisions are made with arguments, pro’s and con’s, write those down! A project-wiki would be a perfect place. If some members leave the team and new members join the team this information is vital. When the new members join the team their first reaction will be “Why are they doing it this way and not …?”. Reading the argumentation will help them understand the situation and get them up to speed. Also, during projects there are always moments when you reflect on choices made in the past: “Why did we do … wouldn’t we have done better if we did/used …”. You can now always review the past arguments to see if the current situation has changed so much that a reconsideration would be in order or not.
The team owns the code.
If you write a piece of code, it is not your code. The code belongs to the collective, the team. If somebody changes your code, embrace it, it was probably needed. If you see an ugly piece of code you didn’t write, you are also responsible, refactor it.
Commit driven development.
Test driven development is a great idea. First write a unit-test, check to see if it fails, now write your code until the test succeeds. This way you know two important things, the final code is easy to test (it was written with testing in mind) and the code does what the test expects. But there is one major drawback, most people (not all) simply can’t work this way. For some people it works great, but some developers want to write their tests afterwards. And that is ok, there is just one important rule: Never check your code in (the trunk) if it isn’t tested and doesn’t have proper documentation (e.g. JavaDoc). So: It is no problem if you write code and then tests, or tests then the code, as long as you write them before commiting to your SCM repository.
There are situations when you need to write a lot of new code, in that case you still want to be able to check your code in, even if you aren’t done yet. For these situations you create a feature-branch. On this branch you develop all the new funtionality and once it is done, documented and tested, merge it with the trunk.
The last couple of weeks I have been playing around with compression/decompression algorithms. This is a field that has always intrigued me. It gives me a magical feeling, like a magician you wave some algorithm around and suddenly the files shrink and bytes dissapear. With the same motion you can undo all your actions and re-generate the original files from thin air!
Arithmetic coding is a different method to encode bytes. On itself it doesn’t compress, but it is the backbone of a whole family of compression algorithms. To explain how it works we need some example text we want to compress: “ABACB”
First we assign a probability to each symbol:
- A: 45%
- B: 40%
- C: 15%
Lets change these probabilities into ranges from 0.0 to 1.0:
- A: 0.00 to 0.45
- B: 0.45 to 0.85
- C: 0.85 to 1.00
Now we start reading our input (ABACB). The first character we find is ‘A’. First we do a lookup in the table, what range does ‘A’ have: 0.00 to 0.45. Lets remember these values as ‘low’ and ‘high’.
Time to encode the second character ‘B’. The first thing we need to do is reset our ranges to be between low and high:
- A: 0.0000 to 0.2025
- B: 0.2025 to 0.3805
- C: 0.3805 to 0.4500
We want to encode ‘B’, so we’ll end up with low=0.2025 and high=0.3805. Time to update the table again:
- A: 0.2025 to 0.2826
- B: 0.2826 to 0.3538
- C: 0.3538 to 0.3805
Encode ‘A’, we get low=0.2025 and high=0.2826. And adjust the table again:
- A: 0.202500 to 0.238545
- B: 0.238545 to 0.270585
- C: 0.270585 to 0.282600
Encode ‘C’, we get low=0.270585 and high=0.282600. Adjust the table one last time:
- A: 0.27058500 to 0.27599175
- B: 0.27599175 to 0.28079775
- C: 0.28079775 to 0.28260000
So, lets encode the final symbol ‘B’ and stop right here. Now we save/output a value betwee low (0.27599175) and high (0.28079775) which takes the least amount of bytes, for example 0.28!
Now I we want to decode our message “0.28” we start by building the ranges again, these are the same as the first table above. Now we look at which range our 0.28 fall in. The result is ‘A’ (between 0.0 and 0.45). So we output this symbol. Next we have to update the table, because we outputted ‘A’ we know our encoder encoded ‘A’, so we can follow the same steps as the encoder did and update the ranges in the same way, so we’ll end up with the second table (with values between 0.0 and 0.45). If we look at 0.28 it now falls into range ‘B’, so we output ‘B’ and adjust again.
As you can see this number “0.28” describes our whole message “ABACB”! So we encoded a whole message in one small number. There are a lot of improvements possible to implement this algorithm efficiently, for example look at range encoding. Another website that helped me a lot is this basic arithmetic coding by Arturo Campos.
The previous example works great, but there is a problem… if we assign equal probabilities to each possible symbol (each byte) we won’t compress anything! The example above works because I took bigger probabilities for A and B then for C… So for it to work we need to guess the correct probabilities, and the better we do this, the smaller our file gets! Luckely there are a couple of methods to assign/guess these probabilities.
Read and save table
One thing you can do is to first read the whole file and save the amount you see each character. Then you can scale these amounts into probabilities between 0.0 and 1.0 and save these probabilities to a file. Next we can use these probabilities to encode (and decode) our message! This will take quite a bit of overhead because you have to save the table..
Another thing we can do is to just start with equal probabilities for each symbol and adjust the probabilities during the encoding. If we use the same method in encoding and decoding our probability table will remain the same.
Prediction by Partial Matching (PPM)
So, we can adjust the probabilities at runtime and for example count all the symbols we have seen. But why stop there? For example in a piece of English text, the letter ‘E’ will probably have the highest count/probabilty. But if we encouter the symbols: “COMPRESSI” we know the next character will likely be an “O”, not an “E”! So what if we can enlarge our context? This is what PPM does, looking at the bigger picture. Not only count single symbols, but also combinations. The longer the combinations the more specific our predictions get (but also more memory is needed).
Context Mixing (CM)
With PPM you have one model which predicts the probability of the next symbol by looking at its past statistics. But why stop there? With Context Mixing (CM) you can have multiple models predicting the next symbol work together. When you do this right you and predict the next symbol even better and thus get better compression ratios! This is how the best current compression algorithms work.
This is just one family of compression algorithms, there are a lot more which I won’t describe here (yet?). Instead of processing/guessing the next symbol you could also replace long series of symbols with shorter unique symbols. This is known as dictionary coding, an entire other family of compressors (including LZ, zip, gz).
Its a problem I encouter in most JEE projects I’ve worked on so far. Handling null-values. But why is this a problem? And what strategies can we follow to reduce the problem? That is what I’m trying to find out in this post.
Lets start with a piece of business logic, in a world where we don’t have null-values:
This looks good and understandable! But this isn’t what I see in most projects. Usually I see code like this:
Wow, that is not a pretty sight, not at all! What can we do about it and how did it happen?
Inversion of logic
The first strategy we can follow is based on inversion of logic which I’ve blogged about before. The idea is that we exit early, this will improve the readability of our method. Lets see that happens to our method is we follow this pattern:
This is somewhat better, a bit shorter, but we still have all the ‘!= null’ and ‘== null’ checks we don’t want.
Code by contract
The best way to get rid of null-checks is to get rid of nulls in your applicaties. Just don’t return a null in all of the methods…! This sounds very easy and straight forward, but it is a bit harder then it sounds because it has become such a habbit.
The good thing is that current IDE’s are implementing the @NotNull and @Nullable annotations. With these annotations you can tell other programmers, your IDE and static code analysis tools what your idea was when creating a method:
It also helps you to clearly state your assumed preconditions:
It will also help you correct possible coding errors:
Using this method you have some more certainties. But it isn’t a silver-bullet on its own. We have to stop and think, where do the null-values come from?
Actually, when you stop returning null (which is entirely up to you and your team) there are still situations which are ‘unchecked’. Namely external API’s, frameworks, the ORM-mapper you are using. So everywhere where you execute methods that you haven’t written, you still have to do manual checks. But make sure you do this right away. Then further on in the code, you don’t have to check anything because of your @NotNull-contracts.
If you do this to the Person object above, and all its fields, you will end up with the beautiful clean code of the first example. No checks, its just always filled by contract. The only thing you want to add is information about the parameters:
In my experience this works very well, I’ve done this a couple of times, even before the @NotNull and @Nullable existed. Before this we would just add the information in our Javadoc. But with the IDE checking for it this has become a lot easier to use.
Null object pattern
A whole different approach then coding-by-contract is the use of a null-object. The idea behind this pattern is that you don’t return null, but instead you return a real object. In our example we would do the following:
The huge advantage is that you can always safely call “person.getName()” and “person.getAccounts()” because even if you have a NullPerson the object still exists. This Null-Object is obviously usually a singleton.
A more elaborate but general version of this pattern is the special case pattern, named by Martin Fowler. Instead of just having a Null-special case you could also develop:
Now you can return much more information then just a meaningless “null”!
There is just one problem with this pattern. It also doesn’t just solve your problems. Why do we want to calculate the total balance of a NullPerson? The moment you retrieve a person and it is an instance of NullPerson, catch it and handle the situation appropiatly, don’t just continue.
Safe null operator
For a time there was speculation that the Safe-null-operator would make its way into Java 7 through Project Coin. If you’ve programmed in Groovy you might have seen it before:
The idea is that you use “?.” instead of “.”. The questionmark means: “If the variable we’re calling is null, the result is null, else, call the next method”.
So in this case:
- If person is null: return null, else:
- If getName() is null: return null, else:
- Return result of substring(0, 1)
If you would now write this code it would be:
You could also give a default value:
If person or getName() produce a null the “?:” operator will return the default String.
Sounds pretty good eh? The only problem is that this new operator didn’t make the final list of changes for Java 7.
Fun fact: Do you know why the “?:”-operator is called the “Elvis”-operator? When viewed from the side, as smiley , it looks like Elvis. Including the big curl in his hair.