Today I’ve been playing around with the Levenshtein distance. The Levenshtein distance is a number which measures the ‘distance’ between two strings. For example, the distance between “test” and “rest” is one.

## The challenge

A Levenshtein distance of one is the key element in a challenge I’ve been reading about. I first encountered it on williamedwardscoder‘s blog.

The problem description:

Two words are friends if they have a Levenshtein distance of 1. That is, you can add, remove, or substitute exactly one letter in word X to create word Y. A word’s social network consists of all of its friends, plus all of their friends, and all of their friends’ friends, and so on. Write a program to tell us how big the social network for the word “causes” is, using this word list. Have fun!

## Java solution (8.1 sec)

After some Googling and tweaking I decided to make an implementation based on the Trie structure. How this helps is excellently described by Steve Hanov. I’ve also had a peek in another Java based Trie implementation by Ximus.

I’ve been able to get the code below run in 8.1 seconds, which is pretty good. But I’ve read that there are Java implementations running in just 4 seconds…!? Maybe based on Levenshtein Automata?

```import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class FriendlyCauses {

public static void main(String[] args) throws Exception {
FriendlyCauses s = new FriendlyCauses();
s.find();
}

TrieNode[] root = new TrieNode[26];

public void find() throws Exception {
long timeBefore = System.currentTimeMillis();
File wordList = new File("word.list");

findFriends("causes");
System.out.println("Amount of friends found: " + countWords(root));
System.out.println("Total time: " + (System.currentTimeMillis() - timeBefore) + "ms");
}

/**
* This method builds the complete trie with all the words.
*/
int start = next.charAt(0) - 97;
TrieNode current = root[start];
if (current == null) {
root[start] = current = new TrieNode();
}
for (int i = 1; i < next.length(); i++) {
int nextIndex = next.charAt(i) - 97;
if (current.children == null) {
current.children = new TrieNode[26];
}
if (current.children[nextIndex] == null) {
TrieNode deeper = new TrieNode();
current.children[nextIndex] = deeper;
}
current = current.children[nextIndex];
}
current.word = next;
}
}

private class TrieNode {
// The children:
TrieNode[] children;
// If this node is the end of a word, store the word here:
String word;
// This boolean indicates that the current node is a friend.
boolean isFriend;
}

// Temporary buffer of friends to investigate further:
public List<String> friendsToFind = new LinkedList<String>();

public void findFriends(String word) {

while (friendsToFind.size() > 0) {
String friend = friendsToFind.remove(0);

int size = friend.length();

byte[] currentRow = new byte[size + 1];
for (int i = 0; i <= size; i++) {
currentRow[i] = (byte) i;
}

// Search the Trie for the given friend:
for (int i = 0; i < 26; i++) {
if (root[i] != null) {
searchTrieForWord(root[i], i + 97, friend, currentRow);
}
}
}
}

/**
* Walk the complete Trie again finding all the friends
*/
private int countWords(TrieNode[] ts) {
int amount = 0;
for (int i = 0; i < 26; i++) {
TrieNode t = ts[i];
if (t != null) {
if (t.isFriend) {
amount++;
}
if (t.children != null) {
amount += countWords(t.children);
}
}
}
return amount;
}

/**
* Recursively walk the Trie calculating the Levenshtein distance to the
* given word. When a friend is found, mark it and add it to a list. The new
* friend needs to be processed further on.
*/
private void searchTrieForWord(TrieNode node, int letter, String word, byte[] previousRow) {

int size = previousRow.length;
byte[] currentRow = new byte[size];
currentRow[0] = (byte) (previousRow[0] + 1);

int minElement = currentRow[0];

for (int i = 0; i < size - 1; i++) {
int newCurrentRowValue = currentRow[i] + 1;
newCurrentRowValue = Math.min(newCurrentRowValue,
previousRow[i + 1] + 1);

if (word.charAt(i) == letter) {
newCurrentRowValue = Math.min(newCurrentRowValue,
previousRow[i]);
} else {
newCurrentRowValue = Math.min(newCurrentRowValue,
previousRow[i] + 1);
}
if (newCurrentRowValue < minElement) {
minElement = newCurrentRowValue;
}
currentRow[i + 1] = (byte) newCurrentRowValue;
}

// If the Levenshtein distance is 1 and the node at current depth is a
// word (and not yet friend):
if (currentRow[size - 1] == 1 && node.word != null && !node.isFriend) {
node.isFriend = true;
}

if (minElement <= 1 && node.children != null) {
for (int i = 0; i < node.children.length; i++) {
TrieNode child = node.children[i];
if (child != null) {
searchTrieForWord(child, i + 97, word, currentRow);
}
}
}
}
}

```

• Lybrial

Hi,

i tried to understand that piece of code and there is a small thing that simply dont want to get inside my head: ‘int start = word.charAt(0) – 97;’, why that – 97? Unfortunately i cant download the ‘word.list’ file because it can not be found. So i downloaded oxford word list and created my own word.list file instead. But when i run the code with my word.list i get an Exception: ‘Exception in thread “main” java.lang.ArrayIndexOutOfBoundsException: -58′
at the line: ‘if (current.children[nextIndex] == null) {‘.
I really dont get why -97, maybe you can help me.

• Lybrial

Ah never mind. The -97 is ‘a’ in ASCII right? And the Exception is thrown because i have a word with ‘ in it.

• http://www.royvanrijn.com royvanrijn

Exacly, that is why we shouldn’t use magic numbers like 97 :-)