The Soundex algorithm encodes English words according to how they sound. Its main purpose is to keep spelling variations from causing errors when people's names are recorded, for example in a census. A Soundex code has four characters in the form LDDD, where L is the first letter of the name and each D is a decimal digit (for the English alphabet, D ranges from 0 to 6). The coding rules are as follows:

  1. The first character code is always the first letter of the name, regardless of other rules.
  2. The letters “A”, “E”, “I”, “O”, “U”, “H”, “W” and “Y” are not coded.
  3. The letters “B”, “F”, “P” and “V” are coded as 1.
  4. The letters “C”, “G”, “J”, “K”, “Q”, “S”, “X” and “Z” are coded as 2.
  5. The letters “D” and “T” are coded as 3.
  6. The letter “L” is coded as 4.
  7. The letters “M” and “N” are coded as 5.
  8. The letter “R” is coded as 6.
  9. If a letter is doubled, the second occurrence is skipped.
  10. If a letter has the same code as the previous letter, it is skipped.
  11. Names with prefixes are encoded twice: once with the prefix and once without it.
  12. Letters with the same code separated by “A”, “E”, “I”, “O” or “U” are both coded.
  13. Letters with the same code separated by “H” or “W” are coded only once; the second is skipped.
  14. If the resulting code has fewer than four characters, it is padded with zeros to four characters.
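
The rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `CODES`, `soundex` and `soundex_variants` are made up for this example, “Y” is assumed to separate letters the way the vowels in rule 12 do, and the list of prefixes for rule 11 is left to the caller because the rules do not enumerate them.

```python
# Digit assigned to each consonant group (rules 3-8).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character code of `name` (rules 1-10 and 12-14)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]                     # rule 1: keep the first letter
    previous = CODES.get(letters[0], "")  # digit of the last letter seen
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # rules 2 and 13: H/W do not separate
        digit = CODES.get(ch, "")
        if digit == "":                   # vowel (or Y, by assumption)
            previous = ""                 # rule 12: a vowel separates equal codes
            continue
        if digit != previous:             # rules 9 and 10: skip repeated codes
            code += digit
        previous = digit
    return (code + "000")[:4]             # rule 14: pad, then keep four characters


def soundex_variants(name, prefixes):
    """Rule 11: code the name as written and again without a known prefix.

    `prefixes` is supplied by the caller (an assumption of this sketch).
    """
    codes = [soundex(name)]
    for prefix in prefixes:
        if name.upper().startswith(prefix.upper()) and len(name) > len(prefix):
            codes.append(soundex(name[len(prefix):]))
    return codes


print(soundex("Ashcraft"))                    # A261
print(soundex_variants("VanDeusen", ["Van"]))  # ['V532', 'D250']
```

Tracking only the digit of the last letter seen (`previous`) is enough to cover rules 9, 10, 12 and 13: a vowel resets it, while “H” and “W” leave it untouched.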

For example, to code the name “Lee”: by rule 1, the first letter starts the code. The remaining letters are skipped because of rule 2. Since the code then contains only one character, three zeros are added (rule 14), and we obtain the Soundex code L000.

For example, to code the name “Ashcraft”: by rule 1, the first letter starts the code. The letter “S” is coded as 2. The next two letters are not coded: “H” is skipped because of rule 2, and “C” has the same code as “S” (rule 4) and is separated from it only by “H” (rule 13). The fifth letter is “R”, which is coded as 6. The vowel “A” is skipped and the letter “F” is coded as 1. The last letter, “T”, is omitted because all four positions of the code are already filled. The result is A261.

Sample codes of some names

Name        Code
Lee         L000
Washington  W252
Gutierrez   G362
Pfister     P236
Jackson     J250
Tymczak     T522
VanDeusen   V532
VanDeusen   D250
Ashcraft    A261
Smith       S530
Smythe      S530

The sample table shows that two similar-sounding names, such as Smith and Smythe, receive the same code. This prevents duplicate records from being created merely because a name was spelled differently.
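
One way to put this to use is to treat the code as a lookup key and group names under it. The short Python sketch below does that with the codes copied from the sample table; `SAMPLE_CODES` and `group_by_code` are illustrative names, not part of any library.

```python
from collections import defaultdict

# (name, code) pairs copied from the sample table.
SAMPLE_CODES = [
    ("Lee", "L000"), ("Washington", "W252"), ("Gutierrez", "G362"),
    ("Pfister", "P236"), ("Jackson", "J250"), ("Tymczak", "T522"),
    ("VanDeusen", "V532"), ("VanDeusen", "D250"), ("Ashcraft", "A261"),
    ("Smith", "S530"), ("Smythe", "S530"),
]


def group_by_code(pairs):
    """Collect the names that share each code, keeping their input order."""
    groups = defaultdict(list)
    for name, code in pairs:
        groups[code].append(name)
    return dict(groups)


# Codes shared by more than one name point at likely spelling variants.
matches = {code: names
           for code, names in group_by_code(SAMPLE_CODES).items()
           if len(names) > 1}
print(matches)  # {'S530': ['Smith', 'Smythe']}
```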

If the groups of letters that share a code are modified, the Soundex algorithm can also be used for other languages.
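
As a rough sketch of what such a modification could look like, the letter-to-digit table can be generated from a list of groups, so that changing the groups is all it takes to change the coding. Everything here is hypothetical: `build_code_table` is an illustrative helper, and `VARIANT_GROUPS` is a made-up variant (it moves “W” into the first group) that does not describe any established Soundex adaptation.

```python
def build_code_table(groups):
    """Map each letter to the digit (1, 2, ...) of the group it belongs to."""
    if len(groups) > 9:
        raise ValueError("at most nine groups fit in a single digit")
    table = {}
    for digit, letters in enumerate(groups, start=1):
        for letter in letters.upper():
            if letter in table:
                raise ValueError(f"letter {letter!r} appears in two groups")
            table[letter] = str(digit)
    return table


# The English groups from rules 3-8.
ENGLISH_GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]

# Hypothetical variant for illustration only: W is grouped with V.
VARIANT_GROUPS = ["BFPVW", "CGJKQSXZ", "DT", "L", "MN", "R"]

english = build_code_table(ENGLISH_GROUPS)
variant = build_code_table(VARIANT_GROUPS)

print(english.get("W"))  # None: W is not coded in English (rule 2)
print(variant.get("W"))  # 1
```

Letters that appear in no group, such as the vowels, are simply absent from the table and are therefore treated as uncoded.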
