Soundex

I wrote a function to return the Metaphone code, which works much better than Soundex, as Norman mentioned. Metaphone still has a number of limitations. I modified it significantly over the years and renamed this version “Neophone”. It’s still a work in progress, because I still occasionally find examples that it doesn’t work with. The most recent one was words ending in EAUX where the X should be silent.

Public Function Neophone(s As String) as String
  //Neophone phonetic index key generator
  //Robert Weaver, Saskatoon Canada, 2017
  //
  //Neophone is a Modification of Lawrence Philips' Metaphone phonetic index key generator
  //The modifications are to address some of the shortcomings of the original Metaphone
  //algorithm. In particular, it addresses situations where the pronunciation of certain
  //letters or letter combinations change significantly when they precede or follow other
  //certain letters or letter combinations. The algorithm is based on English pronunciation, and would require significant modifications for other languages.
  //
  '
  'Condenses input word into a series of letters representing phonetic groups A,B,D,K,L,M,R,S
  'A = any leading vowel (all other vowels are deleted)
  'B = B,F,P,V
  'D = D,T
  'K = C,G,K,Q
  'KS = X
  'L = L
  'M = M,N
  'R = R
  'S = c,g,j,S,X,Z (Lowercase refers to soft pronunciation of the letter)
  'Uses ReplaceAllB to ensure case sensitive search. Therefore, input string must not contain diacriticals.
  'Word is delimited with leading and trailing "." in order to facilitate searching for 
  'start of word and end of word characters.
  'Converted from FileMaker computed field, which used multi-argument Substitute function which is case sensitive.
  'Therefore, the use of lowercase in the replace string prevents further matches.
  'So, text search/replace pairs such as .TIA:.tia followed by TIA:SHA are used to prevent replacement if found
  'at the beginning of a word, but doesn't prevent replacement if found elsewhere.
  
  
  dim i As Integer
  dim sIn As String = "."+Trim(Uppercase(s))+"."
  efOut.AppendText sIn+EndOfLine
  'Step 1: Process specially pronounced letter combinations
  static searchStr1() As String =  Split("EAUX.,.X,.KN,.PN,.GN,.TIA,.TIO,TCH,WR,WH,MB.,GHT,GH.,GH,CHR,CR,SCH,STIA,STIO,TIA,TIO,CH,CE,CI,CY,CK,DGE,DGI,DGY,GE,GI,GY,PH,Q,V,X",",")
  static replaceStr1() As String = Split("O.,.S,.N,.N,.N,.tia,.tio,CH,R,W,M.,T,.,G,CR,KR,SK,STA,STO,SHA,SHO,SH,SI,SI,SI,K,JE,JI,JI,JE,JI,JI,F,K,F,KS",",")
  Static n1 As Integer = UBound(searchStr1)
  for i=0 to n1
    sIn=ReplaceAllB(sIn,searchStr1(i),replaceStr1(i))
  next
  efOut.AppendText sIn+EndOfLine
  sIn=Uppercase(sIn)
  'Step 2: Convert vowels and silent letters
  Static searchStr2() As String = Split("E,I,O,U,WA,WY,YA",",")
  static replaceStr2() As String = Split("A,A,A,A,wA,wY,yA",",")
  Static n2 As Integer = UBound(searchStr2)
  for i=0 to n2
    sIn=ReplaceAllB(sIn,searchStr2(i),replaceStr2(i))
  next
  sIn=Uppercase(sIn)
  'Step 3: Combine similar sounding consonants
  Static searchStr3() As String = Split("G,C,TH,J,Z,F,P,V,H,N,T,Y,W",",")
  static replaceStr3() As String = Split("K,K,S,S,S,B,B,B,A,M,D,,",",")
  Static n3 As Integer = UBound(searchStr3)
  for i=0 to n3
    sIn=ReplaceAllB(sIn,searchStr3(i),replaceStr3(i))
  next
  'Step 4: Remove adjacent duplicates
  dim ch As String = midB(sIn,1,1)
  dim sTemp As String = ch
  For i=2 to Len(sIn)
    if ch <> midB(sIn,i,1) then
      ch=midB(sIn,i,1)
      sTemp=sTemp+ch
    end if
  Next
  'Step 5: Delete vowel markers and delimiters
  sIn=ReplaceAllB(sTemp,".A",".0")
  sIn=ReplaceAllB(sIn,"A","")
  sIn=ReplaceAllB(sIn,".","")
  sIn=ReplaceAllB(sIn,"0","A")
  Return sIn
End Function

[quote=478763:@Dave S]it boils any name/word or phrase down to a single Letter and 3 digit number, meaning that supposedly the entire “English” language would be represented by 26,000 unique code.

While working in the healthcare industry we needed to find ways to lookup similar sounding patient names, Soundex was an utter failure in terms of accuracy[/quote]

I programmed billing for an ambulance service in the 1980s. Soundex was pretty much ‘the’ phonetic index algorithm. I added a couple of things to improve accuracy: pre-squash things like ‘KN’ to “N” (my name was always wrong), “EAUX” to “O” (as @Robert Weaver noted), I think ‘ph’ to ‘f’ and a few others. I’d need to find my 30 year old code but the result worked really well.