I am comparing a user-entered street name against street names already in a table. My algorithm is to first compare the Soundex codes of the two values. If the Soundex codes are different, I then test the Damerau-Levenshtein distance between the two values. If the distance is between 1 and 3, I assume there is a possible typo in one of the values.
I have implemented the algorithm with this code:
Public Function FuzzyMatch(rstrString1 As String, _
                           rstrString2 As String) As Boolean
    ' Procedure: FuzzyMatch
    ' DateTime: 7/30/2014 12:40:15 PM
    ' Author: Glenn Lloyd
    ' Description: Compares two strings using Soundex to filter
    ' before calculating the Damerau-Levenshtein distance (DLD)
    '--
    Const cstrProcedure = "FuzzyMatch"
    Dim strSndx1 As String
    Dim strSndx2 As String
    Dim intDLD As Integer
    On Error GoTo HandleError
    strSndx1 = Soundex(rstrString1)
    strSndx2 = Soundex(rstrString2)
    FuzzyMatch = False
    If strSndx1 <> strSndx2 Then
        'they don't sound the same, but is there a typo?
        'if there is more than a 3 character difference the value is
        'probably not a typo
        intDLD = DLD(rstrString1, rstrString2, 5) 'limit distance recursions
        If (intDLD >= 1 And intDLD <= 3) Then
            FuzzyMatch = True
        End If
    Else
        'they sound the same so we may have a match
        FuzzyMatch = True
    End If
HandleExit:
    Exit Function
HandleError:
    ErrorHandle Err, Erl(), cstrModule & "." & cstrProcedure
    Resume HandleExit
End Function
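The DLD function called above is not shown here. As a rough sketch only, assuming it computes the optimal string alignment variant of the Damerau-Levenshtein distance and treats its third argument as an upper bound, it could look something like the following. The name DLDSketch and the choice to return the limit plus one when the bound is exceeded are assumptions made for illustration, not the code actually in use.

```vba
' Hypothetical stand-in for the DLD helper, which is not shown in this post.
' Computes the optimal string alignment (restricted Damerau-Levenshtein)
' distance: insertions, deletions, substitutions and adjacent transpositions.
' Assumption: any distance above intLimit is reported as intLimit + 1.
Public Function DLDSketch(ByVal strA As String, ByVal strB As String, _
                          ByVal intLimit As Integer) As Integer
    Dim lngLenA As Long, lngLenB As Long
    Dim i As Long, j As Long
    Dim lngCost As Long, lngBest As Long
    Dim d() As Long

    lngLenA = Len(strA)
    lngLenB = Len(strB)

    ' The difference in length is a lower bound on the distance.
    If Abs(lngLenA - lngLenB) > intLimit Then
        DLDSketch = intLimit + 1
        Exit Function
    End If

    ' d(i, j) holds the distance between the first i characters of strA
    ' and the first j characters of strB.
    ReDim d(0 To lngLenA, 0 To lngLenB)
    For i = 0 To lngLenA
        d(i, 0) = i
    Next i
    For j = 0 To lngLenB
        d(0, j) = j
    Next j

    For i = 1 To lngLenA
        For j = 1 To lngLenB
            If Mid$(strA, i, 1) = Mid$(strB, j, 1) Then
                lngCost = 0
            Else
                lngCost = 1
            End If
            lngBest = d(i - 1, j) + 1                       'deletion
            If d(i, j - 1) + 1 < lngBest Then               'insertion
                lngBest = d(i, j - 1) + 1
            End If
            If d(i - 1, j - 1) + lngCost < lngBest Then     'substitution
                lngBest = d(i - 1, j - 1) + lngCost
            End If
            If i > 1 And j > 1 Then
                'adjacent transposition, e.g. "ai" typed as "ia"
                If Mid$(strA, i, 1) = Mid$(strB, j - 1, 1) And _
                   Mid$(strA, i - 1, 1) = Mid$(strB, j, 1) Then
                    If d(i - 2, j - 2) + 1 < lngBest Then
                        lngBest = d(i - 2, j - 2) + 1
                    End If
                End If
            End If
            d(i, j) = lngBest
        Next j
    Next i

    If d(lngLenA, lngLenB) > intLimit Then
        DLDSketch = intLimit + 1
    Else
        DLDSketch = CInt(d(lngLenA, lngLenB))
    End If
End Function
```

With this sketch, DLDSketch("Main", "Mian", 5) returns 1 (one transposition) and DLDSketch("kitten", "sitting", 5) returns 3. Character comparison follows the module's Option Compare setting, so under Option Compare Database it will normally ignore case.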
My question is: how sound is my logic that more than three character differences between two values indicates a low probability of a typo?
Glenn Lloyd
(705)805-6712 | Fax (705)805-9289 | Text (705)885-5283
Posted by: "Glenn Lloyd" <argeedblu@gmail.com>