Edit distance

Definition
The edit distance of two strings is the minimum number of atomic edtion operations (insertion, deletion or substitution of characters) that must be done in order to transform one string in the other.

Also known as Levenshtein distance.

Examples
EditDist('house', '' )= 5, (removing all the characters)

EditDist('house','houses')=1, (inserting an 's' at the end of the first string)

EditDist('house','horses')=2, (changing the 'u' in the first string by 'r' and inserting an 's' at the end)

EditDist('boulevard','blvd')=5, (removing 'o', 'u', 'e', 'a' and 'r' from de first string).

Normalization
$$EditDistSim(s,t) = 1-\frac{EditDist(s,t)}{\max(|s|,|t|)} $$

Examples
EditSim('house', '' )= 1-5/5 = 0,

EditSim('house','houses')=1-1/6= 0.8333,

EditSim('house','horses')=1-2/6= 0.6667,

EditSim('boulevard','blvd')=1-5/9= 0,4444,

Variations
There are variations that assign different weight to the atomic edition operations.

Applications
Useful for matching abbreviations and acronyms.