String matching, soundex, pattern matching, fuzzy matching

Quick bookmark for me: we're weighing the benefits of pattern matching (to match or not to match, that's the question). I found that Sam Chapman has a great library that might save some time, details here:
It covers these algorithms:

Hamming distance, Levenshtein distance, Needleman-Wunsch distance or Sellers Algorithm, Smith-Waterman distance, Gotoh distance or Smith-Waterman-Gotoh distance, Block distance or L1 distance or City block distance, Monge-Elkan distance, Jaro distance metric, Jaro-Winkler, Soundex distance metric, Matching Coefficient, Dice's Coefficient, Jaccard Similarity or Jaccard Coefficient or Tanimoto coefficient, Overlap Coefficient, Euclidean distance or L2 distance, Cosine similarity, Variational distance, Hellinger distance or Bhattacharyya distance, Information Radius (Jensen-Shannon divergence), Harmonic Mean, Skew divergence, Confusion Probability, Tau, Fellegi and Sunter (SFS) metric, TFIDF or TF/IDF, FastA, BlastP, Maximal matches, q-gram, Ukkonen Algorithms
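
To get a feel for what a few of these actually measure, here is a rough Python sketch of three of them: Hamming distance, Levenshtein distance, and Jaccard similarity over q-grams. This is not code from the library above; the function names (`hamming`, `levenshtein`, `qgrams`, `jaccard`) and the default of q = 2 are my own assumptions, picked just for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> set:
    """Set of overlapping substrings of length q (q=2 is an assumed default)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a: str, b: str, q: int = 2) -> float:
    """Shared q-grams divided by all distinct q-grams of both strings."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # assumption: two strings with no q-grams count as identical
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))
    print(levenshtein("kitten", "sitting"))
    print(jaccard("night", "nacht"))
```

The first two are distances (0 means identical, bigger means further apart) while Jaccard is a similarity between 0 and 1, so they can't be compared directly without normalising.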

Also worth checking, beyond Soundex:

http://anastasiosyal.com/archive/2009/01/11/18.aspx
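
For reference, the Soundex baseline itself is small enough to sketch. The version below is my own minimal take on the common American Soundex rules (keep the first letter, code the remaining consonants as digits, collapse adjacent repeats, let h and w pass through without breaking a run, pad to four characters). The function name `soundex` and the handling of empty input are assumptions, and database implementations can differ in the edge cases.

```python
def soundex(name: str) -> str:
    """American Soundex code, e.g. a letter followed by three digits."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""  # assumption: empty input gives an empty code

    result = letters[0].upper()
    prev = codes.get(letters[0], "")  # the first letter's code still counts
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = codes.get(ch, "")  # vowels (and anything uncoded) map to ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after it is kept
    return (result + "000")[:4]


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Tymczak", "Ashcraft"):
        print(n, soundex(n))
```

Names that sound alike collapse to the same code, which is both the strength and the weakness: it only compares codes for equality, so it gives no notion of "how close" two names are the way the edit-distance style metrics do.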
