Thanks Bill,
Right now I am displaying a "possible match, please choose" form that displays the possible matches and allows the user to retain the entered value or choose one of the "matches."
Glenn
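For anyone wanting to build a similar "possible match, please choose" list, here is a minimal sketch of how the candidates might be gathered. It assumes the FuzzyMatch function quoted further down this thread is available in the same project; the name PossibleMatches and the idea of passing the stored street names as an array are hypothetical and only for illustration, not what my form actually does.

```vba
' Hypothetical helper: collects the stored names that FuzzyMatch flags,
' ready to be listed on a "possible match, please choose" form.
' Assumes FuzzyMatch (quoted later in this thread) is in the same project.
Public Function PossibleMatches(ByVal strEntered As String, _
                                varStoredNames As Variant) As Collection
    Dim colMatches As New Collection
    Dim i As Long

    For i = LBound(varStoredNames) To UBound(varStoredNames)
        If FuzzyMatch(strEntered, CStr(varStoredNames(i))) Then
            colMatches.Add CStr(varStoredNames(i))
        End If
    Next i

    Set PossibleMatches = colMatches
End Function
```

If the returned collection is empty, the entered value can be saved without showing the form; otherwise the form can list the collection's items next to the value as typed.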
From: MS_Access_Professionals@yahoogroups.com [mailto:MS_Access_Professionals@yahoogroups.com]
Sent: Monday, August 4, 2014 9:58 AM
To: MS_Access_Professionals@yahoogroups.com
Subject: [MS_AccessPros] Re: reality check for my logic
Glenn
Catching typos/abbreviation differences is always hard. You can never assume they are the same street, so clean-up still has to be done manually. I'd say a difference of 3 characters is a good start. But you should give the user a chance to decide at the time of entry, especially if you are checking street names before you have the city.
Regards,
Bill Mosca, Founder - MS_Access_Professionals
Microsoft Office Access MVP
My nothing-to-do-with-Access blog
---In MS_Access_Professionals@yahoogroups.com, <argeedblu@gmail.com> wrote :
I am comparing a user-entered street name against street names already in a table. My algorithm is to first test the Soundex of the two values. If the Soundex values are different, then I test the Damerau-Levenshtein distance between the two values. If the distance is between 1 and 3, then I assume there is a possible typo in one of the values.
I have implemented the algorithm with this code:
Public Function FuzzyMatch(rstrString1 As String, _
                           rstrString2 As String) As Boolean
    ' Procedure: FuzzyMatch
    ' DateTime: 7/30/2014 12:40:15 PM
    ' Author: Glenn Lloyd
    ' Description: Compares two strings using Soundex to filter
    '              before calculating the Damerau-Levenshtein distance (DLD)
    '--
    Const cstrProcedure = "FuzzyMatch"
    Dim strSndx1 As String
    Dim strSndx2 As String
    Dim intDLD As Integer

    On Error GoTo HandleError

    FuzzyMatch = False
    strSndx1 = Soundex(rstrString1)
    strSndx2 = Soundex(rstrString2)

    If strSndx1 = strSndx2 Then
        'they sound the same so we may have a match
        FuzzyMatch = True
    Else
        'they don't sound the same but is there a typo?
        intDLD = DLD(rstrString1, rstrString2, 5) 'limit distance recursions
        'if there is more than a 3 character difference the value is
        'probably not a typo
        If (intDLD >= 1 And intDLD <= 3) Then
            FuzzyMatch = True
        End If
    End If

HandleExit:
    Exit Function

HandleError:
    'cstrModule is a module-level constant declared elsewhere
    ErrorHandle Err, Erl(), cstrModule & "." & cstrProcedure
    Resume HandleExit
End Function
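The DLD helper called above is not shown in this post. As a rough sketch only, a distance function of that kind could look like the following. It computes the optimal string alignment variant of the Damerau-Levenshtein distance (insertions, deletions, substitutions and swaps of adjacent characters) and ignores case. The name DLDSketch is hypothetical, and treating the third argument as a cap on the returned distance is an assumption; the real DLD may interpret that argument differently.

```vba
' Hypothetical stand-in for the DLD helper; not the code used above.
' Optimal string alignment form of the Damerau-Levenshtein distance.
' Assumption: intMaxDist is a cap; larger distances return intMaxDist + 1.
Public Function DLDSketch(ByVal str1 As String, ByVal str2 As String, _
                          Optional ByVal intMaxDist As Integer = 5) As Integer
    Dim lngLen1 As Long, lngLen2 As Long
    Dim i As Long, j As Long
    Dim lngCost As Long, lngBest As Long
    Dim d() As Long

    str1 = UCase$(str1)
    str2 = UCase$(str2)
    lngLen1 = Len(str1)
    lngLen2 = Len(str2)

    ' The distance can never be less than the difference in lengths
    If Abs(lngLen1 - lngLen2) > intMaxDist Then
        DLDSketch = intMaxDist + 1
        Exit Function
    End If

    ' d(i, j) = distance between the first i chars of str1
    ' and the first j chars of str2
    ReDim d(0 To lngLen1, 0 To lngLen2)
    For i = 0 To lngLen1
        d(i, 0) = i
    Next i
    For j = 0 To lngLen2
        d(0, j) = j
    Next j

    For i = 1 To lngLen1
        For j = 1 To lngLen2
            If Mid$(str1, i, 1) = Mid$(str2, j, 1) Then lngCost = 0 Else lngCost = 1

            lngBest = d(i - 1, j) + 1                      'deletion
            If d(i, j - 1) + 1 < lngBest Then
                lngBest = d(i, j - 1) + 1                  'insertion
            End If
            If d(i - 1, j - 1) + lngCost < lngBest Then
                lngBest = d(i - 1, j - 1) + lngCost        'substitution
            End If

            'swap of two adjacent characters, e.g. "ia" typed as "ai"
            If i > 1 And j > 1 Then
                If Mid$(str1, i, 1) = Mid$(str2, j - 1, 1) And _
                   Mid$(str1, i - 1, 1) = Mid$(str2, j, 1) Then
                    If d(i - 2, j - 2) + 1 < lngBest Then
                        lngBest = d(i - 2, j - 2) + 1
                    End If
                End If
            End If

            d(i, j) = lngBest
        Next j
    Next i

    If d(lngLen1, lngLen2) > intMaxDist Then
        DLDSketch = intMaxDist + 1
    Else
        DLDSketch = d(lngLen1, lngLen2)
    End If
End Function
```

With this sketch, DLDSketch("Main", "Mian") returns 1 (one adjacent swap) and DLDSketch("Queen", "Quen") returns 1 (one dropped letter). Note that DLDSketch("Oak", "Elm") returns 3, so very short names can fall inside the 1 to 3 window even when they share no letters at all.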
My question is: how sound is my logic that more than three character differences between two values indicates a low probability of a typo?
Glenn Lloyd
(705)805-6712 | Fax (705)805-9289 | Text (705)885-5283
Posted by: "Glenn Lloyd" <argeedblu@gmail.com>