abydos.phonetic package
The phonetic package includes classes for phonetic algorithms, including:
- Robert C. Russell's Index (RussellIndex)
- American Soundex (Soundex)
- Refined Soundex (RefinedSoundex)
- Daitch-Mokotoff Soundex (DaitchMokotoff)
- NYSIIS (NYSIIS)
- Match Rating Algorithm (phonetic.MRA)
- Metaphone (Metaphone)
- Double Metaphone (DoubleMetaphone)
- Caverphone (Caverphone)
- Alpha Search Inquiry System (AlphaSIS)
- Fuzzy Soundex (FuzzySoundex)
- Phonex (Phonex)
- Phonem (Phonem)
- Phonix (Phonix)
- PHONIC (PHONIC)
- Standardized Phonetic Frequency Code (SPFC)
- Statistics Canada (StatisticsCanada)
- LEIN (LEIN)
- Roger Root (RogerRoot)
- Eudex phonetic hash (phonetic.Eudex)
- Parmar-Kumbharana (ParmarKumbharana)
- Davidson's Consonant Code (Davidson)
- SoundD (SoundD)
- PSHP Soundex/Viewex Coding (PSHPSoundexFirst and PSHPSoundexLast)
- Dolby Code (Dolby)
- NRL English-to-phoneme (NRL)
- Ainsworth grapheme to phoneme (Ainsworth)
- Beider-Morse Phonetic Matching (BeiderMorse)
There are also language-specific phonetic algorithms for German:
For French:
- FONEM (FONEM)
- an early version of Henry Code (HenryEarly)
For Spanish:
- Phonetic Spanish (PhoneticSpanish)
- Spanish Metaphone (SpanishMetaphone)
For Swedish:
For Norwegian:
- Norphone (Norphone)
For Brazilian Portuguese:
- SoundexBR (SoundexBR)
And there are some hybrid phonetic algorithms that employ multiple underlying phonetic algorithms:
- Oxford Name Compression Algorithm (ONCA) (ONCA)
- MetaSoundex (MetaSoundex)
Each class has an encode method to return the phonetically encoded string.
Classes for which encode returns a numeric value generally have an
encode_alpha method that returns an alphabetic version of the phonetic
encoding, as demonstrated below:
>>> rus = RussellIndex()
>>> rus.encode('Abramson')
'128637'
>>> rus.encode_alpha('Abramson')
'ABRMCN'
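Because every class exposes the same encode method, comparison code can be written against that method alone. The following is a minimal sketch and not part of abydos: `Encoder`, `FirstLetter`, and `same_code` are hypothetical names introduced for illustration. With abydos installed, an instance such as `RussellIndex()` could presumably be passed in place of the toy encoder.

```python
from typing import Protocol


class Encoder(Protocol):
    """Anything with an encode(word) -> str method (hypothetical helper type)."""

    def encode(self, word: str) -> str: ...


class FirstLetter:
    """Toy stand-in encoder, NOT an abydos class: keeps the uppercased first letter."""

    def encode(self, word: str) -> str:
        return word[:1].upper()


def same_code(encoder: Encoder, left: str, right: str) -> bool:
    """Return True when both words receive the same code from the encoder."""
    return encoder.encode(left) == encoder.encode(right)


# Demonstration with the toy encoder only.
print(same_code(FirstLetter(), 'Smith', 'schmidt'))
```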
- class abydos.phonetic.Ainsworth
Bases: abydos.phonetic._phonetic._Phonetic
Ainsworth's grapheme to phoneme converter.
Based on the ruleset listed in [Ain73].
New in version 0.4.1.
- encode(word: str) → str
Return the phonemic representation of a word.
- Parameters
word (str) -- The word to transform
- Returns
The phonemic representation in IPA
- Return type
str
Examples
>>> pe = Ainsworth()
>>> pe.encode('Christopher')
'tʃrɪstofɜ'
>>> pe.encode('Niall')
'nɪɔl'
>>> pe.encode('Smith')
'smɪð'
>>> pe.encode('Schmidt')
'skmɪdt'
New in version 0.4.1.
- class abydos.phonetic.AlphaSIS(max_length: int = 14)
Bases: abydos.phonetic._phonetic._Phonetic
Alpha-SIS.
The Alpha Search Inquiry System code is defined in [Cor73]. This implementation is based on the description in [MKTM77].
New in version 0.3.6.
Initialize AlphaSIS instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 14)
New in version 0.4.0.
- encode(word: str) → str
Return the IBM Alpha Search Inquiry System code for a word.
A single word can have multiple values, so they are returned together as one comma-separated string. Their order is significant, since the first value is the primary coding.
- Parameters
word (str) -- The word to transform
- Returns
The Alpha-SIS value
- Return type
str
Examples
>>> pe = AlphaSIS()
>>> pe.encode('Christopher')
'06401840000000,07040184000000,04018400000000'
>>> pe.encode('Niall')
'02500000000000'
>>> pe.encode('Smith')
'03100000000000'
>>> pe.encode('Schmidt')
'06310000000000'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
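Since the first value in the comma-separated string is the primary coding, a caller who needs only that value can split the result. This is a minimal sketch: `split_codes` and `primary_code` are hypothetical helpers, not abydos functions, and the input is copied from the 'Christopher' example for AlphaSIS.

```python
from typing import List


def split_codes(encoded: str) -> List[str]:
    """Split a comma-separated encoding into its individual codes."""
    return encoded.split(',')


def primary_code(encoded: str) -> str:
    """Return the first (primary) code of a comma-separated encoding."""
    return split_codes(encoded)[0]


# Value copied from the AlphaSIS example for 'Christopher'.
encoded = '06401840000000,07040184000000,04018400000000'
print(primary_code(encoded))      # 06401840000000
print(len(split_codes(encoded)))  # 3
```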
- encode_alpha(word: str) → str
Return the alphabetic Alpha-SIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Alpha-SIS value
- Return type
str
Examples
>>> pe = AlphaSIS()
>>> pe.encode_alpha('Christopher')
'JRSTFR,KSRSTFR,RSTFR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'MT'
>>> pe.encode_alpha('Schmidt')
'JMT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
- class abydos.phonetic.BeiderMorse(language_arg: Union[str, int] = 0, name_mode: str = 'gen', match_mode: str = 'approx', concat: bool = False, filter_langs: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
Beider-Morse Phonetic Matching.
The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.
New in version 0.3.6.
Initialize BeiderMorse instance.
- Parameters
language_arg (str or int) -- The language of the term; supported values include:
  - any
  - arabic
  - cyrillic
  - czech
  - dutch
  - english
  - french
  - german
  - greek
  - greeklatin
  - hebrew
  - hungarian
  - italian
  - latvian
  - polish
  - portuguese
  - romanian
  - russian
  - spanish
  - turkish
name_mode (str) -- The name mode of the algorithm:
  - gen -- general (default)
  - ash -- Ashkenazi
  - sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages
New in version 0.4.0.
- encode(word: str) → str
Return the Beider-Morse Phonetic Matching encoding(s) of a term.
- Parameters
word (str) -- The word to transform
- Returns
The Beider-Morse phonetic value(s)
- Return type
str
- Raises
ValueError -- Unknown language
Examples
>>> pe = BeiderMorse()
>>> pe.encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir',
 'xrQstYfir', 'xristofir', 'xristYfir', 'xristopi', 'xritopir',
 'xritopi', 'xristofi', 'xritofir', 'xritofi', 'tzristopir',
 'tzristofir', 'zristopir', 'zristopi', 'zritopir', 'zritopi',
 'zristofir', 'zristofi', 'zritofir', 'zritofi']
>>> pe.encode('Niall')
'nial,niol'
>>> pe.encode('Smith')
'zmit'
>>> pe.encode('Schmidt')
'zmit,stzmit'

>>> BeiderMorse(language_arg='German').encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir',
 'xrQstYfir', 'xristofir', 'xristYfir']
>>> BeiderMorse(language_arg='English').encode(
...     'Christopher').split(',')
['tzristofir', 'tzrQstofir', 'tzristafir', 'tzrQstafir', 'xristofir',
 'xrQstofir', 'xristafir', 'xrQstafir']
>>> BeiderMorse(language_arg='German',
...     name_mode='ash').encode('Christopher').split(',')
['xrQstopir', 'xrQstYpir', 'xristopir', 'xristYpir', 'xrQstofir',
 'xrQstYfir', 'xristofir', 'xristYfir']

>>> BeiderMorse(language_arg='German',
...     match_mode='exact').encode('Christopher')
'xriStopher,xriStofer,xristopher,xristofer'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made output comma-separated instead of space-separated
- class abydos.phonetic.Caverphone(version: int = 2)
Bases: abydos.phonetic._phonetic._Phonetic
Caverphone.
A description of version 1 of the algorithm can be found in [Hoo02].
A description of version 2 of the algorithm can be found in [Hoo04].
New in version 0.3.6.
Initialize Caverphone instance.
- Parameters
version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
New in version 0.4.0.
- encode(word: str) → str
Return the Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode('Christopher')
'KRSTFA1111'
>>> pe.encode('Niall')
'NA11111111'
>>> pe.encode('Smith')
'SMT1111111'
>>> pe.encode('Schmidt')
'SKMT111111'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode('Christopher')
'KRSTF1'
>>> pe_1.encode('Niall')
'N11111'
>>> pe_1.encode('Smith')
'SMT111'
>>> pe_1.encode('Schmidt')
'SKMT11'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) → str
Return the alphabetic Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode_alpha('Christopher')
'KRSTFA'
>>> pe.encode_alpha('Niall')
'NA'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SKMT'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode_alpha('Christopher')
'KRSTF'
>>> pe_1.encode_alpha('Niall')
'N'
>>> pe_1.encode_alpha('Smith')
'SMT'
>>> pe_1.encode_alpha('Schmidt')
'SKMT'
New in version 0.4.0.
- class abydos.phonetic.DaitchMokotoff(max_length: int = 6, zero_pad: bool = True)
Bases: abydos.phonetic._phonetic._Phonetic
Daitch-Mokotoff Soundex.
Based on Daitch-Mokotoff Soundex [Mok97], this returns the values of a word as a comma-separated string, since there can be multiple values for a single word.
New in version 0.3.6.
Initialize DaitchMokotoff instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) → str
Return the Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Daitch-Mokotoff Soundex value
- Return type
str
Examples
>>> pe = DaitchMokotoff()
>>> pe.encode('Christopher')
'494379,594379'
>>> pe.encode('Niall')
'680000'
>>> pe.encode('Smith')
'463000'
>>> pe.encode('Schmidt')
'463000'

>>> DaitchMokotoff(max_length=20,
...     zero_pad=False).encode('The quick brown fox')
'35457976754,3557976754'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
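When a word can yield several codes, one plausible way to compare two words is to check whether their code sets intersect. The sketch below assumes that matching rule for illustration only; `share_code` is a hypothetical helper rather than an abydos function, and the inputs are copied from the DaitchMokotoff examples.

```python
def share_code(left: str, right: str) -> bool:
    """Return True if two comma-separated encodings have any code in common."""
    # Empty fragments (e.g. from a trailing comma) are ignored.
    left_codes = {code for code in left.split(',') if code}
    right_codes = {code for code in right.split(',') if code}
    return bool(left_codes & right_codes)


# Values copied from the DaitchMokotoff examples.
print(share_code('494379,594379', '594379'))  # True
print(share_code('463000', '680000'))         # False
```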
- encode_alpha(word: str) → str
Return the alphabetic Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Daitch-Mokotoff Soundex value
- Return type
str
Examples
>>> pe = DaitchMokotoff()
>>> pe.encode_alpha('Christopher')
'SRSTPR,KRSTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'

>>> DaitchMokotoff(max_length=20,
...     zero_pad=False).encode_alpha('The quick brown fox')
'TKSKPRPNPKS,TKKPRPNPKS'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
- class abydos.phonetic.Davidson(omit_fname: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
Davidson Consonant Code.
This is based on the name compression system described in [Dav62].
[Dol70] identifies this as having been the name compression algorithm used by SABRE.
New in version 0.3.6.
Initialize Davidson instance.
- Parameters
omit_fname (bool) -- Set to True to completely omit the first character of the first name
New in version 0.4.0.
- encode(lname: str, fname: str = '.') → str
Return Davidson's Consonant Code.
- Parameters
lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.
- Returns
Davidson's Consonant Code
- Return type
str
Example
>>> pe = Davidson()
>>> pe.encode('Gough')
'G   .'
>>> pe.encode('pneuma')
'PNM .'
>>> pe.encode('knight')
'KNGT.'
>>> pe.encode('trice')
'TRC .'
>>> pe.encode('judge')
'JDG .'
>>> pe.encode('Smith', 'James')
'SMT J'
>>> pe.encode('Wasserman', 'Tabitha')
'WSRMT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.Dolby(max_length: int = -1, keep_vowels: bool = False, vowel_char: str = '*')
Bases: abydos.phonetic._phonetic._Phonetic
Dolby Code.
This follows "A Spelling Equivalent Abbreviation Algorithm For Personal Names" from [Dol70] and [C+69].
New in version 0.3.6.
Initialize Dolby instance.
- Parameters
max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (default to *)
New in version 0.4.0.
- encode(word: str) → str
Return the Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode('Hansen')
'H*NSN'
>>> pe.encode('Larsen')
'L*RSN'
>>> pe.encode('Aagaard')
'*GR'
>>> pe.encode('Braaten')
'BR*DN'
>>> pe.encode('Sandvik')
'S*NVK'

>>> pe_6 = Dolby(max_length=6)
>>> pe_6.encode('Hansen')
'H*NS*N'
>>> pe_6.encode('Larsen')
'L*RS*N'
>>> pe_6.encode('Aagaard')
'*G*R  '
>>> pe_6.encode('Braaten')
'BR*D*N'
>>> pe_6.encode('Sandvik')
'S*NF*K'

>>> pe.encode('Smith')
'SM*D'
>>> pe.encode('Waters')
'W*DRS'
>>> pe.encode('James')
'J*MS'
>>> pe.encode('Schmidt')
'SM*D'
>>> pe.encode('Ashcroft')
'*SKRFD'

>>> pe_6.encode('Smith')
'SM*D  '
>>> pe_6.encode('Waters')
'W*D*RS'
>>> pe_6.encode('James')
'J*M*S '
>>> pe_6.encode('Schmidt')
'SM*D  '
>>> pe_6.encode('Ashcroft')
'*SKRFD'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) → str
Return the alphabetic Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode_alpha('Hansen')
'HANSN'
>>> pe.encode_alpha('Larsen')
'LARSN'
>>> pe.encode_alpha('Aagaard')
'AGR'
>>> pe.encode_alpha('Braaten')
'BRADN'
>>> pe.encode_alpha('Sandvik')
'SANVK'
New in version 0.4.0.
- class abydos.phonetic.DoubleMetaphone(max_length: int = -1)
Bases: abydos.phonetic._phonetic._Phonetic
Double Metaphone.
Based on Lawrence Philips' (Visual) C++ code from 1999 [Phi00].
New in version 0.3.6.
Initialize DoubleMetaphone instance.
- Parameters
max_length (int) -- Maximum length of the returned Double Metaphone code (defaults to -1)
New in version 0.4.0.
- encode(word: str) → str
Return the Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Double Metaphone value(s)
- Return type
str
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode('Christopher')
'KRSTFR,'
>>> pe.encode('Niall')
'NL,'
>>> pe.encode('Smith')
'SM0,XMT'
>>> pe.encode('Schmidt')
'XMT,SMT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
- encode_alpha(word: str) → str
Return the alphabetic Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Double Metaphone value(s)
- Return type
str
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode_alpha('Christopher')
'KRSTFR,'
>>> pe.encode_alpha('Niall')
'NL,'
>>> pe.encode_alpha('Smith')
'SMÞ,XMT'
>>> pe.encode_alpha('Schmidt')
'XMT,SMT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
- class abydos.phonetic.Eudex(max_length: int = 8)
Bases: abydos.phonetic._phonetic._Phonetic
Eudex hash.
This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].
Further details can be found at [Tic16].
New in version 0.3.6.
Initialize Eudex instance.
- Parameters
max_length (int) -- The length in bits of the code returned (default 8)
New in version 0.4.0.
- encode(word: str) → str
Return the eudex phonetic hash of a word.
- Parameters
word (str) -- The word to transform
- Returns
The eudex hash
- Return type
str
Examples
>>> pe = Eudex()
>>> pe.encode('Colin')
'432345564238053650'
>>> pe.encode('Christopher')
'433648490138894409'
>>> pe.encode('Niall')
'648518346341351840'
>>> pe.encode('Smith')
'720575940412906756'
>>> pe.encode('Schmidt')
'720589151732307997'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str instead of int
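Because the hash is now returned as a decimal string, it has to be converted back to an integer before any bitwise comparison. The sketch below counts differing bits (a plain Hamming distance) between two hashes. That is one simple way to compare them and is an assumption here, not the library's own distance measure; `hamming_bits` is a hypothetical helper.

```python
def hamming_bits(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two hashes given as decimal strings."""
    return bin(int(hash_a) ^ int(hash_b)).count('1')


# Values copied from the Eudex examples for 'Smith' and 'Schmidt'.
smith = '720575940412906756'
schmidt = '720589151732307997'
print(hamming_bits(smith, schmidt))
```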
- class abydos.phonetic.FONEM
Bases: abydos.phonetic._phonetic._Phonetic
FONEM.
FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].
Guillaume Plique's Javascript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.
New in version 0.3.6.
- encode(word: str) → str
Return the FONEM code of a word.
- Parameters
word (str) -- The word to transform
- Returns
The FONEM code
- Return type
str
Examples
>>> pe = FONEM()
>>> pe.encode('Marchand')
'MARCHEN'
>>> pe.encode('Beaulieu')
'BOLIEU'
>>> pe.encode('Beaumont')
'BOMON'
>>> pe.encode('Legrand')
'LEGREN'
>>> pe.encode('Pelletier')
'PELETIER'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.FuzzySoundex(max_length: int = 5, zero_pad: bool = True)
Bases: abydos.phonetic._phonetic._Phonetic
Fuzzy Soundex.
Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].
New in version 0.3.6.
Initialize FuzzySoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) → str
Return the Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode('Christopher')
'K6931'
>>> pe.encode('Niall')
'N4000'
>>> pe.encode('Smith')
'S5300'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) → str
Return the alphabetic Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode_alpha('Christopher')
'KRSTP'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
- class abydos.phonetic.Haase(primary_only: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
Haase Phonetik.
Based on the algorithm described at [Pra15].
Based on the original [HH00].
New in version 0.3.6.
Initialize Haase instance.
- Parameters
primary_only (bool) -- If True, only the primary code is returned
New in version 0.4.0.
- encode(word: str) → str
Return the Haase Phonetik (numeric output) code for a word.
While the output code is numeric, it is nevertheless a str.
- Parameters
word (str) -- The word to transform
- Returns
The Haase Phonetik value as a numeric string
- Return type
str
Examples
>>> pe = Haase()
>>> pe.encode('Joachim')
'9496'
>>> pe.encode('Christoph')
'4798293,8798293'
>>> pe.encode('Jörg')
'974'
>>> pe.encode('Smith')
'8692'
>>> pe.encode('Schmidt')
'8692,4692'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
- encode_alpha(word: str) → str
Return the alphabetic Haase Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Haase Phonetik value
- Return type
str
Examples
>>> pe = Haase()
>>> pe.encode_alpha('Joachim')
'AKAN'
>>> pe.encode_alpha('Christoph')
'KRASTAF,SRASTAF'
>>> pe.encode_alpha('Jörg')
'ARK'
>>> pe.encode_alpha('Smith')
'SNAT'
>>> pe.encode_alpha('Schmidt')
'SNAT,KNAT'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
- class abydos.phonetic.HenryEarly(max_length: int = 3)
Bases: abydos.phonetic._phonetic._Phonetic
Henry code, early version.
The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].
New in version 0.3.6.
Initialize HenryEarly instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 3)
New in version 0.4.0.
- encode(word: str) → str
Calculate the early version of the Henry code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The early Henry code
- Return type
str
Examples
>>> pe = HenryEarly()
>>> pe.encode('Marchand')
'MRC'
>>> pe.encode('Beaulieu')
'BL'
>>> pe.encode('Beaumont')
'BM'
>>> pe.encode('Legrand')
'LGR'
>>> pe.encode('Pelletier')
'PLT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.Koelner
Bases: abydos.phonetic._phonetic._Phonetic
Kölner Phonetik.
Based on the algorithm defined by [Pos69].
New in version 0.3.6.
- encode(word: str) → str
Return the Kölner Phonetik (numeric output) code for a word.
While the output code is numeric, it is still a str because 0s can lead the code.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as a numeric string
- Return type
str
Example
>>> pe = Koelner()
>>> pe.encode('Christopher')
'478237'
>>> pe.encode('Niall')
'65'
>>> pe.encode('Smith')
'862'
>>> pe.encode('Schmidt')
'862'
>>> pe.encode('Müller')
'657'
>>> pe.encode('Zimmermann')
'86766'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
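The remark that the Kölner Phonetik code stays a str because 0s can lead the code can be checked with a short round trip through int. This is only an illustration: `survives_int_round_trip` is a hypothetical helper, and '0862' is a made-up code with a leading zero, not actual library output.

```python
def survives_int_round_trip(code: str) -> bool:
    """Return True if converting a numeric code to int and back preserves it."""
    return str(int(code)) == code


print(survives_int_round_trip('862'))   # True: no leading zero to lose
print(survives_int_round_trip('0862'))  # False: int() drops the leading zero
```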
- encode_alpha(word: str) → str
Return the Kölner Phonetik (alphabetic output) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as an alphabetic string
- Return type
str
Examples
>>> pe = Koelner()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Müller')
'NLR'
>>> pe.encode_alpha('Zimmermann')
'SNRNN'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.LEIN(max_length: int = 4, zero_pad: bool = True)
Bases: abydos.phonetic._phonetic._Phonetic
LEIN code.
This is Michigan LEIN (Law Enforcement Information Network) name coding, described in [MKTM77].
New in version 0.3.6.
Initialize LEIN instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
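The interaction of max_length and zero_pad can be pictured as a truncate-then-pad step. The sketch below is an assumption about that behavior, not the library's code: `fit_length` is a hypothetical helper, and the unpadded input 'N3' is an assumed intermediate value chosen so that the padded result matches the 'N300' shown for 'Niall' in the LEIN examples.

```python
def fit_length(code: str, max_length: int, zero_pad: bool = True) -> str:
    """Truncate a code to max_length and optionally right-pad it with '0'."""
    code = code[:max_length]
    if zero_pad:
        code = code.ljust(max_length, '0')
    return code


print(fit_length('N3', 4))                  # N300
print(fit_length('N3', 4, zero_pad=False))  # N3
```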
- encode(word: str) → str
Return the LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) → str
Return the alphabetic LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode_alpha('Christopher')
'CLKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
- class abydos.phonetic.MRA
Bases: abydos.phonetic._phonetic._Phonetic
Western Airlines Surname Match Rating Algorithm.
A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].
New in version 0.3.6.
- encode(word: str) → str
Return the MRA personal numeric identifier (PNI) for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MRA PNI
- Return type
str
Examples
>>> pe = MRA()
>>> pe.encode('Christopher')
'CHRPHR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHMDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.MetaSoundex(lang: str = 'en')
Bases: abydos.phonetic._phonetic._Phonetic
MetaSoundex.
This is based on [KV17]. Only English ('en') and Spanish ('es') languages are supported, as in the original.
New in version 0.3.6.
Initialize MetaSoundex instance.
- Parameters
lang (str) -- Either en for English or es for Spanish
New in version 0.4.0.
- encode(word: str) → str
Return the MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode('Smith')
'4500'
>>> pe.encode('Waters')
'7362'
>>> pe.encode('James')
'1520'
>>> pe.encode('Schmidt')
'4530'
>>> pe.encode('Ashcroft')
'0261'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6754'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) → str
Return the alphabetic MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode_alpha('Smith')
'SN'
>>> pe.encode_alpha('Waters')
'WTRK'
>>> pe.encode_alpha('James')
'JNK'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKRP'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NKLS'
New in version 0.4.0.
- class abydos.phonetic.Metaphone(max_length: int = -1)
Bases: abydos.phonetic._phonetic._Phonetic
Metaphone.
Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].
New in version 0.3.6.
Initialize Metaphone instance.
- Parameters
max_length (int) -- The maximum length of the returned Metaphone code (defaults to 64, but in Philips' original implementation this was 4)
New in version 0.4.0.
- encode(word: str) → str
Return the Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Metaphone value
- Return type
str
Examples
>>> pe = Metaphone()
>>> pe.encode('Christopher')
'KRSTFR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SM0'
>>> pe.encode('Schmidt')
'SKMTT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.NRL
Bases: abydos.phonetic._phonetic._Phonetic
Naval Research Laboratory English-to-phoneme encoder.
This is defined by [EJMS76].
New in version 0.3.6.
- encode(word: str) → str
Return the Naval Research Laboratory phonetic encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The NRL phonetic encoding
- Return type
str
Examples
>>> pe = NRL()
>>> pe.encode('the')
'DHAX'
>>> pe.encode('round')
'rAWnd'
>>> pe.encode('quick')
'kwIHk'
>>> pe.encode('eaten')
'IYtEHn'
>>> pe.encode('Smith')
'smIHTH'
>>> pe.encode('Larsen')
'lAArsEHn'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.NYSIIS(max_length: int = 6, modified: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
NYSIIS Code.
The New York State Identification and Intelligence System algorithm is defined in [Taf70].
The modified version of this algorithm is described in Appendix B of [LA77].
New in version 0.3.6.
Initialize NYSIIS instance.
- Parameters
max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS
New in version 0.4.0.
- encode(word: str) → str
Return the NYSIIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The NYSIIS value
- Return type
str
Examples
>>> pe = NYSIIS()
>>> pe.encode('Christopher')
'CRASTA'
>>> pe.encode('Niall')
'NAL'
>>> pe.encode('Smith')
'SNAT'
>>> pe.encode('Schmidt')
'SNAD'

>>> NYSIIS(max_length=-1).encode('Christopher')
'CRASTAFAR'

>>> pe_8m = NYSIIS(max_length=8, modified=True)
>>> pe_8m.encode('Christopher')
'CRASTAFA'
>>> pe_8m.encode('Niall')
'NAL'
>>> pe_8m.encode('Smith')
'SNAT'
>>> pe_8m.encode('Schmidt')
'SNAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.Norphone
Bases: abydos.phonetic._phonetic._Phonetic
Norphone.
The reference implementation by Lars Marius Garshol is available in [Gar15].
Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. This function incorporates the "not implemented" rules from the above file's rule set.
New in version 0.3.6.
- encode(word: str) → str
Return the Norphone code.
- Parameters
word (str) -- The word to transform
- Returns
The Norphone code
- Return type
str
Examples
>>> pe = Norphone()
>>> pe.encode('Hansen')
'HNSN'
>>> pe.encode('Larsen')
'LRSN'
>>> pe.encode('Aagaard')
'ÅKRT'
>>> pe.encode('Braaten')
'BRTN'
>>> pe.encode('Sandvik')
'SNVK'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.ONCA(max_length: int = 4, zero_pad: bool = True)
Bases: abydos.phonetic._phonetic._Phonetic
Oxford Name Compression Algorithm (ONCA).
This is the Oxford Name Compression Algorithm, based on [Gil97].
I can find no complete description of the "anglicised version of the NYSIIS method" identified as the first step in this algorithm, so this is likely not a precisely correct implementation, in that it employs the standard NYSIIS algorithm.
New in version 0.3.6.
Initialize ONCA instance.
- Parameters
max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) → str
Return the Oxford Name Compression Algorithm (ONCA) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
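A typical use of such codes is to group names that share a code as candidate matches. The sketch below does not call abydos. It uses a fixed dictionary whose codes are copied from the ONCA examples, and the variable names are chosen for illustration.

```python
from collections import defaultdict

# Codes copied from the ONCA examples.
codes = {
    'Christopher': 'C623',
    'Niall': 'N400',
    'Smith': 'S530',
    'Schmidt': 'S530',
}

# Collect the names under each code; names sharing a code land in one group.
groups = defaultdict(list)
for name, code in codes.items():
    groups[code].append(name)

print(groups['S530'])  # ['Smith', 'Schmidt']
```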
- encode_alpha(word: str) → str
Return the alphabetic ONCA code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode_alpha('Christopher')
'CRKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
- class abydos.phonetic.PHONIC(max_length: int = 5, zero_pad: bool = True, extended: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
PHONIC code.
PHONIC is a Soundex-like algorithm defined in [Taf70].
New in version 0.4.1.
Initialize PHONIC instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
extended (bool) -- If True, this uses Taft's 'Extended PHONIC coding' mode, which simply omits the first character of the code.
New in version 0.4.1.
- encode(word: str) → str
Return the PHONIC code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The PHONIC code
- Return type
str
Examples
>>> pe = PHONIC()
>>> pe.encode('Christopher')
'C6401'
>>> pe.encode('Niall')
'N2500'
>>> pe.encode('Smith')
'S0310'
>>> pe.encode('Schmidt')
'S0631'
New in version 0.4.1.
- encode_alpha(word: str) → str
Return the alphabetic PHONIC code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic PHONIC value
- Return type
str
Examples
>>> pe = PHONIC()
>>> pe.encode_alpha('Christopher')
'JRSTF'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SJMT'
New in version 0.4.1.
- class abydos.phonetic.PSHPSoundexFirst(max_length: int = 4, german: bool = False)
Bases: abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a first name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class, PSHPSoundexLast, is used for last names.
New in version 0.3.6.
Initialize PSHPSoundexFirst instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
- encode(fname: str) → str
Calculate the PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W352'
>>> pe.encode('James')
'J700'
>>> pe.encode('Schmidt')
'S500'
>>> pe.encode('Ashcroft')
'A220'
>>> pe.encode('John')
'J500'
>>> pe.encode('Colin')
'K400'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Sally')
'S400'
>>> pe.encode('Jane')
'J500'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(fname: str) str[source]
Calculate the alphabetic PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The alphabetic PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTNK'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SN'
>>> pe.encode_alpha('Ashcroft')
'AKK'
>>> pe.encode_alpha('John')
'JN'
>>> pe.encode_alpha('Colin')
'KL'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Sally')
'SL'
>>> pe.encode_alpha('Jane')
'JN'
New in version 0.4.0.
- class abydos.phonetic.PSHPSoundexLast(max_length: int = 4, german: bool = False)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a last name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class, PSHPSoundexFirst, is used for first names.
New in version 0.3.6.
Initialize PSHPSoundexLast instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
- encode(lname: str) str[source]
Calculate the PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W350'
>>> pe.encode('James')
'J500'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Ashcroft')
'A225'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(lname: str) str[source]
Calculate the alphabetic PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP alphabetic Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTN'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKKN'
New in version 0.4.0.
- class abydos.phonetic.ParmarKumbharana[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Parmar-Kumbharana code.
This is based on the phonetic algorithm proposed in [PK14].
New in version 0.3.6.
- encode(word: str) str[source]
Return the Parmar-Kumbharana encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Parmar-Kumbharana encoding
- Return type
str
Examples
>>> pe = ParmarKumbharana()
>>> pe.encode('Gough')
'GF'
>>> pe.encode('pneuma')
'NM'
>>> pe.encode('knight')
'NT'
>>> pe.encode('trice')
'TRS'
>>> pe.encode('judge')
'JJ'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.Phonem[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Phonem.
Phonem is defined in [GM88].
This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].
Phonem is intended chiefly for German names/words.
New in version 0.3.6.
- encode(word: str) str[source]
Return the Phonem code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonem value
- Return type
str
Examples
>>> pe = Phonem()
>>> pe.encode('Christopher')
'CRYSDOVR'
>>> pe.encode('Niall')
'NYAL'
>>> pe.encode('Smith')
'SMYD'
>>> pe.encode('Schmidt')
'CMYD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.Phonet(mode: int = 1, lang: str = 'de')[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Phonet code.
phonet ("Hannoveraner Phonetik") was developed by Jörg Michael and documented in [Mic99].
This is a port of Jesper Zedlitz's code, which is licensed LGPL [Zed15].
That is, in turn, based on Michael's C code, which is also licensed LGPL [Mic07].
New in version 0.3.6.
Initialize Phonet instance.
- Parameters
mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language
New in version 0.4.0.
- encode(word: str) str[source]
Return the phonet code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The phonet value
- Return type
str
Examples
>>> pe = Phonet()
>>> pe.encode('Christopher')
'KRISTOFA'
>>> pe.encode('Niall')
'NIAL'
>>> pe.encode('Smith')
'SMIT'
>>> pe.encode('Schmidt')
'SHMIT'
>>> pe2 = Phonet(mode=2)
>>> pe2.encode('Christopher')
'KRIZTUFA'
>>> pe2.encode('Niall')
'NIAL'
>>> pe2.encode('Smith')
'ZNIT'
>>> pe2.encode('Schmidt')
'ZNIT'
>>> pe_none = Phonet(lang='none')
>>> pe_none.encode('Christopher')
'CHRISTOPHER'
>>> pe_none.encode('Niall')
'NIAL'
>>> pe_none.encode('Smith')
'SMITH'
>>> pe_none.encode('Schmidt')
'SCHMIDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.PhoneticSpanish(max_length: int = -1)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
PhoneticSpanish.
This follows the coding described in [AmonME12] and [delPAngelesEGGM15].
New in version 0.3.6.
Initialize PhoneticSpanish instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
- encode(word: str) str[source]
Return the PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6454'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NSLS'
New in version 0.4.0.
- class abydos.phonetic.Phonex(max_length: int = 4, zero_pad: bool = True)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Phonex code.
Phonex is an algorithm derived from Soundex, defined in [LR96].
New in version 0.3.6.
Initialize Phonex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) str[source]
Return the Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode_alpha('Christopher')
'CRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SSNT'
New in version 0.4.0.
- class abydos.phonetic.Phonix(max_length: int = 4, zero_pad: bool = True)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Phonix code.
Phonix is a Soundex-like algorithm defined in [Gad90].
This implementation is based on [Pfe00], [Chr11], and [Kollar].
New in version 0.3.6.
Initialize Phonix instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.3.6.
- encode(word: str) str[source]
Return the Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode('Christopher')
'K683'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode_alpha('Christopher')
'KRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
- class abydos.phonetic.RefinedSoundex(max_length: int = -1, zero_pad: bool = False, retain_vowels: bool = False)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Refined Soundex.
This is Soundex, but with more character classes. It was defined at [Boy98].
New in version 0.3.6.
Initialize RefinedSoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
New in version 0.4.0.
- encode(word: str) str[source]
Return the Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode('Christopher')
'C93619'
>>> pe.encode('Niall')
'N7'
>>> pe.encode('Smith')
'S86'
>>> pe.encode('Schmidt')
'S386'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode_alpha('Christopher')
'CRKTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
- class abydos.phonetic.RethSchek[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Reth-Schek Phonetik.
This algorithm is proposed in [vonRethS77].
Since I couldn't secure a copy of that document (maybe I'll look for it next time I'm in Germany), this implementation is based on what I could glean from the implementations published by the German Record Linkage Center (www.record-linkage.de):
Rules that are unclear:
Should 'C' become 'G' or 'Z'? (PPRL has both, 'Z' rule blocked)
Should 'CC' become 'G'? (PPRL has blocked 'CK' that may be typo)
Should 'TUI' -> 'ZUI' rule exist? (PPRL has rule, but I can't think of a German word with '-tui-' in it.)
Should we really change 'SCH' -> 'CH' and then 'CH' -> 'SCH'?
New in version 0.3.6.
- encode(word: str) str[source]
Return Reth-Schek Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Reth-Schek Phonetik code
- Return type
str
Examples
>>> pe = RethSchek()
>>> pe.encode('Joachim')
'JOAGHIM'
>>> pe.encode('Christoph')
'GHRISDOF'
>>> pe.encode('Jörg')
'JOERG'
>>> pe.encode('Smith')
'SMID'
>>> pe.encode('Schmidt')
'SCHMID'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.RogerRoot(max_length: int = 5, zero_pad: bool = True)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Roger Root code.
This is Roger Root name coding, described in [MKTM77].
New in version 0.3.6.
Initialize RogerRoot instance.
- Parameters
max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) str[source]
Return the Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode('Christopher')
'06401'
>>> pe.encode('Niall')
'02500'
>>> pe.encode('Smith')
'00310'
>>> pe.encode('Schmidt')
'06310'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode_alpha('Christopher')
'JRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'JMT'
New in version 0.4.0.
- class abydos.phonetic.RussellIndex[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Russell Index.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
New in version 0.3.6.
- encode(word: str) str[source]
Return the Russell Index (integer output) of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe.encode('Christopher')
'3813428'
>>> pe.encode('Niall')
'715'
>>> pe.encode('Smith')
'3614'
>>> pe.encode('Schmidt')
'3614'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str
- encode_alpha(word: str) str[source]
Return the Russell Index (alphabetic output) for the word.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value as an alphabetic string
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe.encode_alpha('Christopher')
'CRACDBR'
>>> pe.encode_alpha('Niall')
'NAL'
>>> pe.encode_alpha('Smith')
'CMAD'
>>> pe.encode_alpha('Schmidt')
'CMAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
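Comparing the encode and encode_alpha examples for RussellIndex suggests that the alphabetic form is a digit-by-digit relabeling of the numeric code. The sketch below uses a mapping inferred only from the examples in this documentation; the function name russell_digits_to_alpha is hypothetical and this is not necessarily how abydos implements the conversion.

```python
# Digit-to-letter table inferred from the documented examples
# (e.g. '128637' -> 'ABRMCN'); treat it as an assumption.
_DIGIT_TO_LETTER = str.maketrans("12345678", "ABCDLMNR")


def russell_digits_to_alpha(code):
    """Relabel each digit of a numeric Russell Index code as a letter."""
    return code.translate(_DIGIT_TO_LETTER)


print(russell_digits_to_alpha("3813428"))  # CRACDBR
print(russell_digits_to_alpha("715"))      # NAL
```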
- class abydos.phonetic.SPFC[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Standardized Phonetic Frequency Code (SPFC).
Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].
New in version 0.3.6.
- encode(word: Union[str, Sequence[str]]) str[source]
Return the Standardized Phonetic Frequency Code (SPFC) of a word.
- Parameters
word (str) -- The word to transform
- Returns
The SPFC value
- Return type
str
- Raises
AttributeError -- Word attribute must be a string with a space or period dividing the first and last names or a tuple/list consisting of the first and last names
Examples
>>> pe = SPFC()
>>> pe.encode('Christopher Smith')
'01160'
>>> pe.encode('Christopher Schmidt')
'01160'
>>> pe.encode('Niall Smith')
'01660'
>>> pe.encode('Niall Schmidt')
'01660'
>>> pe.encode('L.Smith')
'01960'
>>> pe.encode('R.Miller')
'65490'
>>> pe.encode(('L', 'Smith'))
'01960'
>>> pe.encode(('R', 'Miller'))
'65490'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
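The accepted input shapes for SPFC (a string with a period or space between the names, or a two-element tuple or list) can be sketched with a small normalizing helper. The name split_name is hypothetical and not part of abydos; the order in which separators are tried is an assumption, and the sketch raises AttributeError only to mirror the documented behavior.

```python
def split_name(word):
    """Return (first, last) from any of the documented SPFC input shapes.

    Hypothetical helper; tries a period first, then a space (assumed order).
    """
    message = "expected 'First Last', 'F.Last', or a (first, last) pair"
    if isinstance(word, str):
        for sep in (".", " "):
            first, found, last = word.partition(sep)
            if found and first and last:
                return first, last
        raise AttributeError(message)
    if isinstance(word, (tuple, list)) and len(word) == 2:
        return word[0], word[1]
    raise AttributeError(message)


print(split_name("L.Smith"))            # ('L', 'Smith')
print(split_name("Christopher Smith"))  # ('Christopher', 'Smith')
print(split_name(("R", "Miller")))      # ('R', 'Miller')
```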
- encode_alpha(word: str) str[source]
Return the alphabetic SPFC of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SPFC value
- Return type
str
Examples
>>> pe = SPFC()
>>> pe.encode_alpha('Christopher Smith')
'SDCMS'
>>> pe.encode_alpha('Christopher Schmidt')
'SDCMS'
>>> pe.encode_alpha('Niall Smith')
'SDMMS'
>>> pe.encode_alpha('Niall Schmidt')
'SDMMS'
>>> pe.encode_alpha('L.Smith')
'SDEMS'
>>> pe.encode_alpha('R.Miller')
'EROES'
>>> pe.encode_alpha(('L', 'Smith'))
'SDEMS'
>>> pe.encode_alpha(('R', 'Miller'))
'EROES'
New in version 0.4.0.
- class abydos.phonetic.SfinxBis(max_length: int = -1)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
SfinxBis code.
SfinxBis is a Soundex-like algorithm defined in [Axe09].
This implementation follows the reference implementation: [Sjoo09].
SfinxBis is intended chiefly for Swedish names.
New in version 0.3.6.
Initialize SfinxBis instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
- encode(word: str) str[source]
Return the SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The SfinxBis value
- Return type
str
Examples
>>> pe = SfinxBis()
>>> pe.encode('Christopher')
'K68376'
>>> pe.encode('Niall')
'N4'
>>> pe.encode('Smith')
'S53'
>>> pe.encode('Schmidt')
'S53'
>>> pe.encode('Johansson')
'J585'
>>> pe.encode('Sjöberg')
'#162'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
- encode_alpha(word: str) str[source]
Return the alphabetic SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SfinxBis value
- Return type
str
Examples
>>> pe = SfinxBis()
>>> pe.encode_alpha('Christopher')
'KRSTFR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Johansson')
'JNSN'
>>> pe.encode_alpha('Sjöberg')
'ŠPRK'
New in version 0.4.0.
Changed in version 0.6.0: Made return a str only (comma-separated)
- class abydos.phonetic.SoundD(max_length: int = 4)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
SoundD code.
SoundD is defined in [VB12].
New in version 0.3.6.
Initialize SoundD instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
New in version 0.4.0.
- encode(word: str) str[source]
Return the SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode('Gough')
'2000'
>>> pe.encode('pneuma')
'5500'
>>> pe.encode('knight')
'5300'
>>> pe.encode('trice')
'3620'
>>> pe.encode('judge')
'2200'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode_alpha('Gough')
'K'
>>> pe.encode_alpha('pneuma')
'NN'
>>> pe.encode_alpha('knight')
'NT'
>>> pe.encode_alpha('trice')
'TRK'
>>> pe.encode_alpha('judge')
'KK'
New in version 0.4.0.
- class abydos.phonetic.Soundex(max_length: int = 4, var: str = 'American', reverse: bool = False, zero_pad: bool = True)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Soundex.
Three variants of Soundex are implemented:
'American' follows the American Soundex algorithm, as described at [Sta07] and in [Knu98]; this is also called Miracode
'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
'Census' follows the rules laid out in GIL 55 [Sta97] by the US Census, including coding prefixed and unprefixed versions of some names
New in version 0.3.6.
Initialize Soundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
var (str) -- The variant of the algorithm to employ (defaults to American):
American follows the American Soundex algorithm, as described at [Sta07] and in [Knu98]; this is also called Miracode
special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
Census follows the rules laid out in GIL 55 [Sta97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False). This results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
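To make the difference between the American and special variants concrete, here is a minimal, self-contained sketch of Soundex. It is not the abydos implementation: the function name soundex_sketch and the flag hw_as_vowels are hypothetical, the Census variant is not covered, and a non-positive max_length simply disables truncation and padding (abydos pads differently in that case).

```python
# Letter groups for American Soundex; vowels, H, W, and Y carry no digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex_sketch(word, max_length=4, zero_pad=True, reverse=False,
                   hw_as_vowels=False):
    """Illustrative Soundex; hw_as_vowels=True approximates 'special'."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if reverse:
        letters.reverse()
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # In the American variant H and W are transparent: equal codes on
        # either side of them still collapse. Vowels always break a run.
        if hw_as_vowels or ch not in "HW":
            prev = digit
    if max_length > 0:
        code = code[:max_length]
        if zero_pad:
            code = code.ljust(max_length, "0")
    return code


print(soundex_sketch("Ashcroft"))                     # A261
print(soundex_sketch("Ashcroft", hw_as_vowels=True))  # A226
print(soundex_sketch("Christopher", reverse=True))    # R132
```

The only difference between the two variants in this sketch is whether H and W reset the "previous code" state, which is why Ashcroft and Asicroft agree under the special rules but not under the American ones.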
- encode(word: str, **kwargs: Any) str[source]
Return the Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode("Christopher")
'C623'
>>> pe.encode("Niall")
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
>>> Soundex(max_length=-1).encode('Christopher')
'C623160000000000000000000000000000000000000000000000000000000000'
>>> Soundex(max_length=-1, zero_pad=False).encode('Christopher')
'C62316'
>>> Soundex(reverse=True).encode('Christopher')
'R132'
>>> pe.encode('Ashcroft')
'A261'
>>> pe.encode('Asicroft')
'A226'
>>> pe_special = Soundex(var='special')
>>> pe_special.encode('Ashcroft')
'A226'
>>> pe_special.encode('Asicroft')
'A226'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Made return a str only (comma-separated)
- encode_alpha(word: str) str[source]
Return the alphabetic Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode_alpha("Christopher")
'CRKT'
>>> pe.encode_alpha("Niall")
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
- class abydos.phonetic.SoundexBR(max_length: int = 4, zero_pad: bool = True)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
SoundexBR.
This is based on [Mar15].
New in version 0.3.6.
Initialize SoundexBR instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
- encode(word: str) str[source]
Return the SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode('Oliveira')
'O416'
>>> pe.encode('Almeida')
'A453'
>>> pe.encode('Barbosa')
'B612'
>>> pe.encode('Araújo')
'A620'
>>> pe.encode('Gonçalves')
'G524'
>>> pe.encode('Goncalves')
'G524'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- encode_alpha(word: str) str[source]
Return the alphabetic SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode_alpha('Oliveira')
'OLPR'
>>> pe.encode_alpha('Almeida')
'ALNT'
>>> pe.encode_alpha('Barbosa')
'BRPK'
>>> pe.encode_alpha('Araújo')
'ARK'
>>> pe.encode_alpha('Gonçalves')
'GNKL'
>>> pe.encode_alpha('Goncalves')
'GNKL'
New in version 0.4.0.
- class abydos.phonetic.SpanishMetaphone(max_length: int = 6, modified: bool = False)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Spanish Metaphone.
This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].
Modified version based on [delPAngelesBailonM16].
New in version 0.3.6.
Initialize SpanishMetaphone instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
New in version 0.4.0.
- encode(word: str) str[source]
Return the Spanish Metaphone of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Spanish Metaphone code
- Return type
str
Examples
>>> pe = SpanishMetaphone()
>>> pe.encode('Perez')
'PRZ'
>>> pe.encode('Martinez')
'MRTNZ'
>>> pe.encode('Gutierrez')
'GTRRZ'
>>> pe.encode('Santiago')
'SNTG'
>>> pe.encode('Nicolás')
'NKLS'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
- class abydos.phonetic.StatisticsCanada(max_length: int = 4)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Statistics Canada code.
The original description of this algorithm could not be located, and may only have been specified in an unpublished TR. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the "Census modified Statistics Canada name coding procedure".
The modified version of this algorithm is described in Appendix B of [MKTM77].
New in version 0.3.6.
Initialize StatisticsCanada instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
New in version 0.4.0.
- encode(word: str) str[source]
Return the Statistics Canada code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Statistics Canada name code value
- Return type
str
Examples
>>> pe = StatisticsCanada()
>>> pe.encode('Christopher')
'CHRS'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHM'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
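The StatisticsCanada examples above are consistent with a simple consonant-skeleton rule. The sketch below is an assumption reconstructed from those four examples, not a statement of the procedure in [MKTM77] or of the abydos source: it keeps the first letter, drops later vowels (A, E, I, O, U, and Y are assumed), collapses repeated letters, and truncates. The function name statcan_sketch is hypothetical.

```python
def statcan_sketch(word, max_length=4):
    """Consonant-skeleton code consistent with the documented examples."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    # Keep the first letter as-is; drop assumed vowels from the remainder.
    kept = [letters[0]] + [c for c in letters[1:] if c not in "AEIOUY"]
    # Collapse runs of the same letter (e.g. 'NLL' -> 'NL').
    code = []
    for c in kept:
        if not code or code[-1] != c:
            code.append(c)
    return "".join(code)[:max_length]


print(statcan_sketch("Christopher"))  # CHRS
print(statcan_sketch("Niall"))        # NL
```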
- class abydos.phonetic.Waahlin(encoder: Optional[abydos.phonetic._phonetic._Phonetic] = None)[source]
Bases:
abydos.phonetic._phonetic._Phonetic
Wåhlin code.
Wåhlin's first-letter coding is based on the description in [Eri97].
New in version 0.3.6.
Initialize Waahlin instance.
- Parameters
encoder (_Phonetic) -- An initialized phonetic algorithm object
New in version 0.4.0.
- encode(word: str, alphabetic: bool = False) str[source]
Return the Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
alphabetic (bool) -- If True, the encoder will apply its alphabetic form (.encode_alpha rather than .encode)
- Returns
The Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode('Christopher')
'KRISTOFER'
>>> pe.encode('Niall')
'NJALL'
>>> pe.encode('Smith')
'SMITH'
>>> pe.encode('Schmidt')
'*MIDT'
New in version 0.4.0.
- encode_alpha(word: str) str[source]
Return the alphabetic Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode_alpha('Christopher')
'KRISTOFER'
>>> pe.encode_alpha('Niall')
'NJALL'
>>> pe.encode_alpha('Smith')
'SMITH'
>>> pe.encode_alpha('Schmidt')
'ŠMIDT'
New in version 0.4.0.