String Similarity Comparision in JS with Examples

“In computer science, fuzzy string matching is the technique of finding strings that match a pattern approximately (rather than exactly)”

The result is 1/7 = 14%
The result is now 2/9 = 22%
similarity= cos(a,b)= dotproduct(a,b) / ( norm(a) * norm(b) )= a.b / ||a|| * ||b||
me Julie loves Linda than more likes Jane
me     2   2
Jane 0 1
Julie 1 1
Linda 1 0
likes 0 1
loves 2 1
more 1 1
than 1 1
a: [2, 1, 0, 2, 0, 1, 1, 1]b: [2, 1, 1, 1, 1, 0, 1, 1]

“In computer science and statistics, the Jaro-Winkler distance is a string metric for measuring the edit distance between two sequences.

Informally, the Jaro distance between two words is the minimum number of single-character transpositions required to change one word into the other.

The Jaro-Winkler distance uses a prefix scale which gives more favourable ratings to strings that match from the beginning for a set prefix length”

Jaro distance formula

dj is the Jaro distance
m is the number of matching characters (characters that appear in s1 and in s2)
t is half the number of transpositions (compare the i-th character of s1 and the i-th character of s2 divided by 2)
|s1| is the length of the first string
|s2| is the length of the second string

m = 6
t = 2/2 =1 (2 couples of non matching characters, the 4-th and 5-th) { t/h ; h/t }
|s1| = 6
|s2| = 6
dj = (⅓) ( 6/6 + 6/6 + (6–1)/6) = ⅓ 17/6 = 0,944Jaro distance = 94,4%
Jaro-Winkler distance formula
dw = 0,944 + ( (0,1*3)(1–0,944)) = 0,944 + 0,3*0,056 = 0,961Jaro-Winkler distance = 96,1%

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store