% Revision history: Guido van Rossum (1996-12-12 to 1997-07-17); Fred Drake (1998-01-20 to 1998-04-07).
\section{Built-in Module \module{soundex}}
\label{module-soundex}
\bimodindex{soundex}

The soundex algorithm takes an English word and returns an
easily computed hash of it; this hash is intended to be the same for
words that sound alike.  This module provides an interface to the
soundex algorithm.

Note that the soundex algorithm is quite simple-minded, and isn't
perfect by any measure.  Its main purpose is to help look up names
in databases when the name may be misspelled: common misspellings of
a name usually produce the same soundex hash.

\begin{funcdesc}{get_soundex}{string}
Return the soundex hash value for a word; it will always be a
6-character string.  \var{string} must contain the word to be hashed,
with no leading whitespace; the case of the word is ignored.  (Note
that the original algorithm produces a 4-character result.)
\end{funcdesc}
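
The hashing scheme can be illustrated with a short pure-Python sketch.
This is not the module's implementation: the name
\code{soundex_sketch}, the \code{length} parameter, the handling of
the empty string, and the choice of the common ``American Soundex''
rules (in which H and W do not separate repeated codes) are all
assumptions made for illustration, and the real module may differ in
edge cases.

```python
# Illustrative sketch only; NOT the soundex module's actual implementation.
# Letter groups follow the commonly described soundex coding table.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_sketch(word, length=6):
    """Return a soundex-style hash of `word`, zero-padded to `length`.

    `length=6` mirrors the 6-character result described above;
    `length=4` gives the classic 4-character form.
    """
    word = word.upper()               # case is ignored
    if not word:
        return "0" * length           # assumption: empty input hashes to zeros
    result = word[0]                  # the first letter is kept as-is
    prev = _CODES.get(word[0], "")    # code of the previous significant letter
    for ch in word[1:]:
        if len(result) == length:
            break
        digit = _CODES.get(ch, "")    # vowels and other characters have no code
        if digit and digit != prev:
            result += digit           # emit only the first of repeated codes
        if ch not in "HW":            # H and W do not separate repeated codes
            prev = digit
    return result.ljust(length, "0")


print(soundex_sketch("Robert"))       # R16300
print(soundex_sketch("Rupert"))       # R16300
print(soundex_sketch("Robert", 4))    # R163
```

Under these assumptions, ``Robert'' and ``Rupert'' receive the same
hash, which is the behaviour the algorithm is designed to provide.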

\begin{funcdesc}{sound_similar}{string1, string2}
Compare the word in \var{string1} with the word in \var{string2}; this
is equivalent to
\code{get_soundex(\var{string1})} \code{==}
\code{get_soundex(\var{string2})}.
\end{funcdesc}
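
The equivalence stated above, and the name-lookup use it supports, can
be sketched as follows.  The functions \code{hash_sketch} and
\code{similar_sketch} are hypothetical stand-ins written for this
example, not the module's own code; \code{hash_sketch} is a compact
soundex-style hash that assumes the common coding table, and its
results may differ from the real module for unusual input.

```python
# Hypothetical stand-ins for illustration; not the soundex module itself.
CODES = {ch: digit
         for digit, letters in enumerate(
             ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
         for ch in letters}


def hash_sketch(word, length=6):
    """Compact soundex-style hash (assumed rules), zero-padded to `length`."""
    word = word.upper()
    out = word[:1]                    # keep the first letter, if any
    prev = CODES.get(word[:1])        # code of the previous significant letter
    for ch in word[1:]:
        code = CODES.get(ch)          # None for vowels and other characters
        if code is not None and code != prev:
            out += str(code)
        if ch not in "HW":            # H and W do not separate repeated codes
            prev = code
    return out[:length].ljust(length, "0")


def similar_sketch(string1, string2):
    """Two words are 'similar' exactly when their hashes are equal."""
    return hash_sketch(string1) == hash_sketch(string2)


# Looking up a misspelled name among stored names.
names = ["Robert", "Rupert", "Rubin"]
matches = [n for n in names if similar_sketch(n, "Robbert")]
print(matches)                        # ['Robert', 'Rupert']
```

For repeated lookups against a large table, storing each name's hash
once and comparing hashes avoids recomputing them on every comparison.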


\begin{seealso}

\seetext{Donald E. Knuth, \emph{Sorting and Searching,} vol. 3 in
``The Art of Computer Programming.''  Addison-Wesley Publishing
Company: Reading, MA: 1973.  pp.\ 391--392.  Discusses the origin and
usefulness of the algorithm, as well as the algorithm itself.  Knuth
gives his sources as \emph{U.S. Patents 1261167} (1918) and
\emph{1435663} (1922), attributing the algorithm to Margaret K. Odell
and Robert C. Russell.  Additional references are provided.}

\end{seealso}