The art of "Soundex" PDF Print E-mail
User Rating: / 3
PoorBest 
Wednesday, 04 June 2008

Preamble
The term Soundex dates back to 1918. The first algorithm of this type was invented by Margaret O'Dell and Robert C. Russell, probably because of problems related to U.S. Census. Because of their constitution, the United States of America are required to identify their population every 10 years. At the end of the last century, the problem of the census had become a major puzzle. Treat information on a population of several tens of millions of Americans to hand called a phenomenal job. The first benefit was a certain Hollerith, who manufactured and introduced the first machines mécanographiques accounting, thereby reducing the processing time information of 75%. In the same time many people found ingenious ideas to sort, classify, search among the data collected. It was undoubtedly true of the first search mechanism by consonance, which its authors called Soundex. Since then, this term includes a family of algorithms that we will detail.

1. The principle
How in a list of names of people arriving to find a certain DUPONT or DUPOND or DUPAN or DEPAIN?
It's simple, just rely on the consonance and not the words themselves.

All algorithms Soundex based on a basic principle is to codify the word eliminating double letters, letters silent (H in particular) and bringing the sounds of some letters. Once this consolidation obtained on the stores with the basic data and conducts research on a direct comparison between the Soundex thus obtained and the word also codified in Soundex.

The research is very efficient since it leads to a query whose criterion is equality, and as long as we place an index on the field that stores the Soundex code, then it proves to be as fast as to find a recording no key.

Some database using native of Soundex for research. It is thus Paradox and its operator as "valid requests QBE, Oracle, or even become Watcom SQL SQL Anywhere.
But beware: in all these cases, it is likely that your "soundex" operates on the consonance Anglo-Saxon language and not on specific sounds to the French language.

2. The first Soundex
Here is the original algorithm of Russell & O'Dell dating from 1918
♦It transcribes the word in capital letters
♦It retains the first letter of the word
♦It then removes all the vowels, H and W
♦On transcode then the remaining letters using the following table

Letter Type on consonnance code
B F P V Bilabiales 1
C G K Q J S X Z Labiodentales 2
D T Dentales 3
L Alveolaires 4
M N Velaires 5
R Laryngales 6

It then eliminates all pairs of consecutive numbers duplicated
It retains just 4 characters Soundex thus obtained, and is complete with zeroes as appropriate
Given that the characters of modern tables, can now seize capital letters accented, it is necessary to transcribe those letters in words simple. In particular, in the French language, c capital with cedilla (Ç) will be transformed into S. Just as the character Œ (in the word heart, for example) will be transformed into E.

In addition, it is necessary to remove the dead space before and after the word and whites and the dash.
All this preparation is done in a feature common to all soundex name, "prepare".

3. The code
Here is the code DELPHI (Pascal Objetc) associated with this first Soundex
implementation
uses
sysUtils;
// On the empty white head and tail, the chain converted to uppercase
// and we replace the capitals accented, with cedilla capital c
// and e o capital in a letter equivalent Œ Uppercase normal
Function prepare (sIn: string): string;
var
tailleSin, i : integer;
car : char;
sOut : string;
begin
// pooling capital
sIn := Trim(sIn);
sIn := upperCase(sIn);
tailleSin := length(sIn);
sOut := '';
for i:= 1 to tailleSin
do
begin
car := sIn[i];
CASE car of
'Â','Ä','À' : car := 'A';
'Ç' : car := 'S';
'È','É','Ê','Ë','Œ' : car := 'E';
'Î','Ï' : car := 'I';
'Ô','Ö' : car := 'O';
'Ù','Û','Ü' : car := 'U';
END;
sOut := sOut+car;
end;
// Removal of white and dashes
sIn := sOut;
sOut := '';
for i := 1 to length(sIn)
do
if (sIn[i] <> ' ') and (sIn[i] <> '-')
then
sOut := sOut + sIn[i];
result := sOut;
end;

//body of the function
function soundex(sIn : string) : sound;
type
TabloLettres = array[1..26] of char;
Const
Encode : TabloLettres =
('0','1','2','3','0','1','2','0','0','2',
'2','4','5','5','0','1','2','6','2','3',
'0','1','0','2','0','2');
var
iSX, iiSX : smallint;
tailleSin : integer;
sOut : string;
begin

//cas trivial : the chain is empty
if sIn = ''
then
begin
result := '0000';
exit;
end;

//prepare the string
sIn := prepare(sIn);

//processing the second side effect: chain length 1
if length(sIn) = 1
then
begin
result := copy(sOut,1,1)+'000';
exit;
end;

//3rd side effect: the first is a letter H on the cut of the word
if sIn[1] = 'H'
then
sIn := copy(sIn,2,tailleSin-1);

// Processing for all other cases
// loop on the length of the chain target
tailleSIn := length(sIn);
for iSX :=2 to tailleSIn
do
// if the character is between the letters A to Z: on transcode
if sIn[iSX] in ['A'..'Z']
then
sIn[iSX] := enCode[ord(sIn[iSX])-ord('A')+1]
// if the character is not between A and Z: there is a zero
else
sIn[iSX] := '0';

// retrieves the first letter of the word
sOut:='';
sOut := sIn[1];
// Second phase transcoding
iiSX := 2;
for iSX :=2 to tailleSIn
do
begin
// if the character is a nonzero is retained
if sIn[iSX] <> '0'
then
begin
sout := sout+sIn[iSX];
// sout[iiSX] := sIn[iSX];
iiSX := iiSX+1;
end;
// if it exceeds 4 characters, leaving the loop
if iiSX > 4
then
begin
result := sOut;
exit;
end;
end;
// ess than 4 characters: one complete with zeroes
While length(sOut) < 4
do
sout := sout+'0';
result := sOut;
end;
4. Soundex 2
Soundex 2 is an algorithm at home by your editor, and derived from the algorithm described in the book by Joe Celko - "advanced SQL," published in 1995 at Thomson Publishing International. It is based on the algorithm Gus Baird (Georgia Tech) stated on page 85.

Unlike the precedent that makes use only figures with the exception of the first character, this new version retains most of the letters. By comparing the two versions, one finds, the first for a number of possible combinations of 26x10x10x10 = 26 000 whereas in this version improved the number of different combinations climbs until about 20x20x20x20 = 160 000
It is therefore more efficient in many cases, meaning it will select at least occurrence of research with the same footprint of 4 characters.

Here is this new version francisee:
♦Remove white on the right and left of the name
♦Converting the name in uppercase
♦Convert accented letters and c cedilla in non-accented letters
♦Remove whites and indents
♦Replace groups of letters with their correspondence (maintaining order in the table):

GUI KI
GUE KE
GA KA
GO KO
GU K
CA KA
CO KO
CU KU
Q K
CC K
CK K

♦Replace all vowels except by A Y except where an A top
♦Replace the following prefixes in their correspondence:

MAC MCC  
ASA AZA (ASAmian)
KN NN (KNight)
PF FF (PFeiffer)
SCH SSS (SCHindler)
PH FF (PHilippe)


♦Remove H unless they are preceded by C or S
♦Delete Y unless it is preceded by an A
♦Delete the following endings A, T, D and S
♦Remove all except A A headache if any
♦Remove all sub chains letter repetitive
♦Keep the first 4 characters of the word, and if necessary supplement with white characters to get 4
The code of this version of soundex uses a search procedure and replacement "SearchReplace", including the following code:
// Search function and replacement of sub chain in a chain
Function SearchReplace(sIn : string; mot1 : string; mot2 : string) : string;
var
tailleSin : integer;
TailleMot : integer;
posMot : integer;
begin
//side effect : To replace the word is the same as the search word
if mot1 = mot2
then
begin
result := sIn;
exit;
end;
// ATTENTION: side effect unmanaged:
// replace the word contains the search word
// Example: replace 'no' with 'not'
tailleSin := length(Sin);
TailleMot := length(mot1);
posMot := pos(mot1,sIn);
While posMot > 0
do
begin
// word to replace the east at the beginning of chain
if posMot = 1
then
sIn := mot2+copy(Sin,tailleMot+1,tailleSin-tailleMot)
else
// word to replace the east end of a chain
if posMot + tailleMot -1 = tailleSin
then
sIn := copy(Sin,1,posMot-1)+mot2
// replace the word is in the midst
else
sIn := copy(Sin,1,posMot-1)+mot2
+copy(sin,posMot+tailleMot,tailleSin-(posMot+tailleMot-1));
posMot := pos(mot1,sIn);
end;
result := Sin;
end;

Finally, here is the code of this second Soundex, entitled Soundex2:

// Soundex2 at home by Frederic BROUARD
function soundex2(sIn : string) : sound;
type
TabloVoyell = array[1..4] of char;
TabloCombi1 = array[1..11,1..2] of string;
TabloCombi2 = array[1..5,1..2] of string;
Const
Voyelle : TabloVoyell =
('E',
'I',
'O',
'U');
Combin1 : TabloCombi1 =
(('GUI','KI'),
('GUE','KE'),
('GA','KA'),
('GO','KO'),
('GU','K'),
('CA','KA'),
('CO','KO'),
('CU','KU'),
('Q','K'),
('CC','K'),
('CK','K'));
Combin2 : TabloCombi2 =
(('ASA','AZA'),
('KN','NN'),
('PF','FF'),
('PH','FF'),
('SCH','SSS'));
var
i : integer; // index loop
lSin : integer; // length of the channel entrance
prfx : string; // prefix
sIn2 : string; // sIn least the first letter
let : string; // letter
begin

// cas trivial: the chain is empty
if sIn = ''
then
begin
result := ' ';
exit;
end;

// prepare the string: Steps 1, 2 and 3
sIn := prepare(sIn);
lSin := length(sIn);

// processing the second side effect: chain length 1
if lSin = 1
then
begin
result := sIn+' ';
exit;
end;

// stages 1, 2, 3 and 4: removes white, is capitalized,
// converts accents and c cedilla
sIn := prepare(sIn);

// Step 5: replacing the primary consonnances
for i := 1 to 4
do
sIn := SearchReplace(sIn,Combin1[i,1],Combin1[i,2]);

// Step 6: replacing the vowels except Y and except the first by A
lSin := length(sIn);
sIn2 := copy(sIn,2,lSin-1);
for i := 1 to 4
do
sIn2 := SearchReplace(sIn2,Voyelle[i],'A');
sIn := sIn[1]+sIn2;

// Step 7: on replaces prefixes
lSin := length(sIn);
if lSin>=2
then
begin
prfx := copy(sIn,1,2);
if (prfx = 'KN')
then
prfx := 'NN';
if (prfx = 'PH') or (prfx = 'PF')
then
prfx := 'FF';
if lSin = 2
then
sIn := prfx
else
sIn := prfx+copy(sIn,3,lSin-2);
end;
if lSin>=3
then
begin
prfx := copy(sIn,1,3);
if (prfx = 'MAC')
then
prfx := 'MCC';
if (prfx = 'SCH')
then
prfx := 'SSS';
if (prfx = 'ASA')
then
prfx := 'AZA';
if lSin = 3
then
sIn := prfx
else
sIn := prfx+copy(sIn,4,lSin-3);
end;

// Step 8: retain the first letter and it is
// eplacements complementary
sIn2 := copy(Sin,2,lSin-1);
for i := 1 to 5
do
sIn2 := SearchReplace(sIn2,Combin2[i,1],Combin2[i,2]);
sIn := sIn[1]+sIn2;

// Step 9: abolish H except CH or SH
lSin := length(sIn);
sIn2 := '';
for i := 1 to lSin
do
// H not retain the letter
if (sIn[i] <> 'H')
then
begin
sIn2 := SIn2+sIn[i];
continue;
end
else
// H is preceded by an S or an C is retained
if (i>1) and ((sIn[i-1] = 'C') or (sIn[i-1] = 'S'))
then
sIn2 := Sin2+sIn[i];
sIn := Sin2;
lSin := length(sIn);

// stage 10: abolish Y unless preceded by an A
lSin := length(sIn);
sIn2 := '';
for i := 1 to lSin
do
// Y not retain the letter
if (sIn[i] <> 'Y')
then
begin
sIn2 := SIn2+sIn[i];
continue;
end
else
// Y is preceded by an A on the preserves
if (sIn[i-1] = 'A')
then
sIn2 := Sin2+sIn[i];
sIn := Sin2;
lSin := length(sIn);

// stage 11: removing endings A, T, D, S,
let := copy(sIn,lSin,1);
if (let = 'A') or (let = 'D') or(let = 'S') or (let = 'T')
then
sIn := copy(sIn,1,lSin-1);

//stage 12: suppression of all except A top
lSin := length(sIn);
sIn2 := copy(sIn,1,1);
for i := 2 to lSin
do
// A not retain the letter
if (sIn[i] <> 'A')
then
begin
sIn2 := sIn2+sIn[i];
continue;
end;
sIn := Sin2;
lSin := length(sIn);

// Stage 13: it removes repetitive
let := copy(sIn,1,1);
sIn2 := let;
for i := 2 to lSin
do
begin
if sIn[i] = let
then
continue;
let := sIn[i];
Sin2 := Sin2 + sIn[i];
end;
sIn := sIn2;

// stage 14: it only takes 4 or characters on complete with white
while length(sIn) < 4
do
Sin := Sin+' ';
if length(sIn) > 4
then
sIn := copy(sIn,1,4);

result := sIn;

end;
5. Phonex
Phonex is a Soundex algorithm although most advanced version francisee of Soundex2 and developed by yours truly. Know that Phonex is optimized for the French language, knows how to recognize different types of sounds like the sounds' on ',' I ',' ein ', etc. ... and place its result as a real-type double precision (5.0 x 1.7 x 10-324 .. 10308 about 15 to 16 significant digits). Her time of calculation is twice Soundex and 30% higher only Soundex2.
Algorithms Phonex
Copyright Frederic BROUARD (31/3/99)
Thanks to Florence MARQUIS, speech therapist, for his help in the development of this algorithm
1 replace them by i
2 remove h which are not preceded by c / s / p
3 replacement ph f
4 replace groups of letters:

Gan Kan
GAM Kam
gain Kain
Gaim kaim

5 replace the following occurrences, if they are followed by a letter a, e, i, o, u:

Ain YN
Ein YN
AIM YN
EIM YN

6 replacement groups 3 letters (sons' o ',' oua ',' ein '):

water o
OAU 2
Ein 4
Ain 4
EIM 4
AIM 4

7 replacement of its' e ':

e y
e y
e y
ai y
EI y
er YR
ESS yss
and YT

8 replace groups of 2 letters (his' an 'and' in '), unless it is followed by a letter, e, io, or u sound 1 to 4:

an 1
am 1
in 1
EM 1
in 4

9 replace s by zs'ils are followed and preceded by the letters a, e, i, o, u or sound 1 to 4
10 10 substitute for groups of 2 letters:

Oe e
eu e
the o
OI 2
Oy 2
or 3

11 groups to replace letters

ch 5
SCH 5
sh 5
SS s
SC s

12 replace c by a ss'il is followed by an e or an i
13 replace the letters or letters:

c k
q k
qu k
Gu k
GA Ka
Go KO
Gy Ky

14 replace the letters follows:
to
o
d
t
p
t
j
g
b
f
v
f
m
n
15 Delete duplicate letters
16 Delete the following endings: t, x
Apply to 17 letters on each corresponding numeric code from the last letter
0
1
1
2
2
3
3
4
4
5
5
e
6
f
7
g
8
h
9
i
10
k
11
l
12
n
13
o
14
r
15
s
16
t
17
u
18
w
19
x
20
y
21
z
18 Convert numeric codes resulting in a number of 22 basis expressed in floating point.

Example: name "PHYLAURHEIMSMET"
1
PHILAURHEIMSMET
2
PHILAUREIMSMET
3
FILAUREIMSMET
4
FILAUREIMSMET
5
FILAUREIMSMET
6
FILAUR4SMET
7
FILAUR4SMY
8
FILAUR4SMY
9
FILAUR4SMY
10
FILOR4SMY
11
FILOR4SMY
12
FILOR4SMY
13
FILOR4SMY
14
FILOR4SNY
15
FILOR4SNY
16
FILOR4SNY
17
FILOR4SNY
18
6, 9, 11, 13, 14, 5, 15, 12, 20
19
6 * 22 ^ (-1) + 9 * 22 ^ (-2) + 11 * 22 (-3) + 13 * 22 (-4) + 14 * 22 (-5) + 5 * 22 (-6) + 5 * 22 (-7) + 12 * 22 (-8) + 20 * 22 (-9)
20
0.179864540784299185

6. Tests
To achieve our tests, we used a database containing 32 137 names of people.
The calculation time with a Pentium 300 Mhz equipped with 64 MB RAM, were as follows:
    Soundex 7 seconds
    Soundex2 11 seconds
    Phonex 14 seconds
To complete the table.
Among the names most frequently encountered, we selected for testing, the following names:
Name
Soundex
Soundex2
Phonex
Soundex
Soundex2
Phonex
As *
MARTIN
M635
MRTN
9 215 667 719 874.02
33
31
2
110
BERNARD
B656
BRNR
3 920 163 630 012.01
19
18
2
92
FAURE
F600
EN
5 242 968 742 851.01
13
51
8
80
PEREZ
P620
PRZ
1.2657878733906 e +13
24
12
7
40
GROS
G620
GR
6 073 270 560 252.01
25
26
6
33
CHAPUIS
C120
CHP
2 070 855 664 353.00
15
8
3
11
BOYER
B600
BYR
3 250 278 687 537.01
26
3
1
97
GAUTHIER
G360
KTR
7 630 177 314 816.02
10
12
10
30
REY
R000
RY
1.1044274933412 e +13
15
11
5
31
BARTHELEMY
B634
BRTL
3 655 717 558 143.01
35
23
2
4
HENRY
E560
ANR
506 105 880 021.001
6
7
2
21
MILL
M450
MLN
9 209 223 008 496.02
12
28
5
50
ROUSSEAU
R220
RS
1.0805759350911 e +13
36
17
7
11
We can then make the average number of occurrences and we get the following table:
Soundex
Soundex2
Phonex
As *
AVERAGES
21
19
5
47
* "Like" is the operator "as" (like) QBE Paradox of which makes comparisons phonetic.
In 8 cases out of 13, Soundex2 gets fewer hits than Soundex. But in the case of FAURE, the difference is very important. Soundex recovers FEREY, FERY, FREY and FUERI, while he forgets FORT! However Soundex2 is more tolerant and recovers and FORT PHAURE.

As for Phonex, he gets very few names:
For FAURE, he recovered: FARRE, FAURE, FORT, FOURR, PHAURE, VARD and VAURE
For PEREZ, he recovered: PERET, PEREZ, PERRAIX, PERRET, PEYRET, DEREI, DHERET
For GROS, he recovered: GRAU, GROS, GROSS, GROZ, GRAS, GRASS
For GAUTHIER, he recovered: GAUTHIER, GAUTIER, GOUDIER, GOUTHIER, CADIER, CATTIER, COPY, COTTIER, COUPIER, COUTIER, while Soundex has not recovered GOUTHIER ...
For MILL, it gets: MALLEIN, MOLEINS, MOLIN, MILL, NAULIN
For ROUSSEAU, it gets ROUSSEAU, ROUSSEAUX, ROUSSOT, RASSAT, RASSSAT, ROSSAT, ROSSO
And for REY: RAIS, RAY, REIX, REY and REYT
In short, we advise you to use Soundex 2, that your editor (and author) offers free, when the database is limited to a few tens of thousands of names.

7. Implementation in databases
Today, most of the implementation of SQL servers are equipped with an algorithm Soundex base, echoing that of Russell and O'Dell. As a Microsoft SQL Server, according SOUNDEX () return the basic code of Soundex.

To find people whose surname is similar a name typed on the keyboard may, for example, use the SQL code follows:

Select * from T_PERSONNE Where SOUNDEX (: LeNom) = SOUNDEX (T_PERSONNE.NOM_PERS)

Where: LeNom variable is passed as an argument of the complaint.
However, this approach is quite a handicap for treatment, above all, and it is in the interest of Soundex, when the base is large, tables and the consequent number of lines of the table PERSON important.
In this case a better way to implement such a device is to create the table in person, a column SOUNDEX_PERS CHAR (4) in which we will feed data automatically using the triggers INSERT and UPDATE.
On the occasion you have any interest to put an index on this column to accelerate the process of search and retrieval.
Therefore, we can do a search on this column directly rather than call 13 246 times the Soundex procedure to search a table with 13 245 names.

Example:

Select * from T_PERSONNE Where SOUNDEX (: LeNom) = T_PERSONNE.SOUNDEX_NOM_PERS

Of course if you choose to use a custom as Soundex Soundex2 or Phonex it will scale column depending on the type of data to receive.
A read on the subject:

Soundex:
http://www.bradandkathy.com/genealogy/overviewofsoundex.html
Metaphone, double metaphone:
http://aspell.sourceforge.net/metaphone/

8. In addition
To find out if two SOUNDEX are very different or very little different, it has developed a series of functions whose use is to be taken "with tweezers."
8.1. HAMMING and its difference
HAMMING difference is the number of characters not identical to the same position in two chaines character same lengths.
For example, the following channels: "D823" and "M843" have a difference of HAMMING 2.
Thus two soundex are identical if the difference HAMMING is zero. There are similar whether this difference is 1. They are totally different if the distances HAMMING is 4 (the maximum in this case).
The difference in HAMMING algorithm is a simple and highly efficient because of a linear cost.
You can access this particular function operating on soundex on a few of which RDBMS SQL Server (depending difference which, curiously works "in reverse ")...

A read on the subject: http://merlin.mbcr.bcm.tmc.edu:8001/bcd/Curric/PrwAli/node2.html

8.2. Levenshtein and its distance
LDA (Levenshtein distance algorithm) calculates the distance Levenshtein (named after its inventor) defined as the minimum number of characters must be replaced, add or change to convert a string into another. \
Some examples:

DOORS
PORTER
1 (transformation of S R)
DOOR
PORTER
1 (adding a letter R)
POTES
DOORS
1 (adding a letter R)
POTE
POSTER
2: POTE step 0
     POST step 1
     Step 2 POSTER
DEPORTEES
POSTERS
4: POSTERS
     D POSTERS step 1
     DEPOSTERS step 2
     DEPORTERS stage 3
     DEPORTEES step 4

A note, English literature refers to "edit distance." Is this an anti Soviet primary?
In fact, this "distance" is the number of operations unit insertions, deletions and replacements driving the chain has become the source channel target. This operation and the algorithm that follows is considered as the cause of the first programming methods algorithms genetically modified. Indeed this algorithm uses the technique of "backtraking" and therefore recursion.

A read on the subject:
The original: V. I. Levenshtein, "Binary Correcting Codes Capable of Deletions, inserts and Reversals," in Soviet Physics Dokl. No. 10, p707 to 710 (1966)
http://www.merriampark.com/ld.htm
http://www-igm.univ-mlv.fr/ ~ lecroq/seqcomp/node2.html
http://www.cut-the-knot.com/do_you_know/Strings.html

These algorithms are currently strengths used in the case of genetic research because they can compare the genetic codes which are represented by a machine large châines characters including only the letters acg and t.
A read on the subject: http://www.csis.hku.hk/ ~ nikos/courses/CSIS7101/strings.pdf
Implementation of functions SOUNDEX, SOUNDEX2 and PHONEX in Delphi

Sndx unit

; //--------------------------------------------- -----------------------------

/ / Copyright: / / Frédéric BROUARD for Phonex - COPYRIGHT Reserved

/ / For the use, contact Frédéric BROUARD to e-mail:

 / / This e-mail address is being protected from spam bots, you need JavaScript enabled to view it / /

//------------------------- -------------------------------------------------

Interface

/ / Type of variable function return soundex

Type Sound = string [4]

/ / declaration routines used by soundex

Function prepare (sIn: string): string;

Function SearchReplace (sIn: string; word1: string; word2 : String): string;

Function pow (x, y: integer): double;

/ / declaration functions soundex

 Function soundex (sIn: string): sound; soundex2

 Function (sIn: string): sound; phonex

Function (sIn: string): double;

implementation

 uses

 sysUtils,

dialogs;

//------------------------------------- -------------------------------------

/ / Search function and replacement of sub chain in a chain

//------------------------------------------------ --------------------------

Function SearchReplace (sIn: string; word1: string; word2: string): string;

var

tailleSin: integer;

TailleMot : Integer; posMot: integer;

begin

 / / side effect: the word to replace is the same as the search word

if string1 = word2

 then

begin

result: = sIn;

exit;

end

/ / side effect unmanaged: word to replace contains the search word

/ / example: replace 'no' with 'not'

tailleSin: = length (Sin);

TailleMot: = length (string1);

posMot: = pos (string1, sIn);

While posMot> 0

do

begin

 / / replace the word is beginning to channel

if posMot = 1

then

sIn: = + word2 copy (Sin, tailleMot +1, tailleSin-tailleMot)

else

/ / word to replace the east end of a chain

 if posMot + tailleMot -1 = tailleSin

 then

sIn: = copy (Sin, 1, posMot-1) + word2

/ / replace the word is in the midst

else

sIn: = copy (Sin, 1, posMot-1) +

+ word2 copy (sin , posMot + tailleMot, tailleSin-(posMot + tailleMot-1));

posMot: = pos (string1, sIn); end; result: = Sin;

end;

 //------------- -------------------------------------------------- -----------

/ / Search function and replace substring in a string

 / / unless the following letter is a vowel or a sound from 1 to 4

//------- -------------------------------------------------- -----------------

SRSaufVoyelle

Function (sIn: string; word1: string; word2: string): string;

 const vowel = [ 'a', 'e', 'i' , 'o', 'u', 'y','1 ','2','3 ','4'];

var

tailleSin: integer;

TailleMot: integer;

posMot: integer;

derLet: char;

 begin

/ / side effect: the word to replace is the same as the search word

if string1 = word2

then

begin

result: = sIn;

exit;

end

/ / side effect unmanaged: to replace the word contains the search word

 / / example: replace 'no' with 'not'

tailleSin: = length (Sin);

TailleMot: = length (string1);

posMot: = pos (string1, sIn);

While posMot> 0

do

begin

 / / The following letter is it a vowel?

if posMot + tailleMot-1 <tailleSin

then

begin

derlet: = sIn [posMot + tailleMot];

 if derLet in vowel

 then

 result: = sIn;

exit;

end

/ / replace the word is beginning to channel

 if posMot = 1

then

sIn: = + word2 copy (Sin, tailleMot +1, tailleSin-tailleMot)

else

/ / word to replace the east end of a chain

if posMot + tailleMot -1 = tailleSin

 then

sIn: = copy (Sin, 1, posMot-1 ) + word2

/ / replace the word is in the midst

else

sIn: = copy (Sin, 1, posMot-1) +

+ word2 copy (sin, posMot + tailleMot, tailleSin-(posMot + tailleMot-1));

posMot: Pos = (string1, sIn);

end;

result: = Sin;

 end;

//------------------------------- -------------------------------------------

 / / Search function and replacement sub chain in a chain

/ / unless the following letter is a vowel or a sound from 1 to 4

//------------------------- -------------------------------------------------

Function SRSauf2Voyelle (sIn: string; word1: string; word2: string): string;

const

vowel = [ 'a', 'e', 'i', 'o', 'u', 'y','1 ',' 2 ','3','4 '];

var

tailleSin: integer;

TailleMot: integer;

posMot: integer;

derLet: char;

premlet: char;

begin

/ / side effect: the word to replace is the same as the word to seek

 if string1 = word2

then

begin

result: = sIn;

exit;

end

/ / side effect unmanaged: to replace the word contains the search word

/ / example: replace 'no' with 'not'

tailleSin: = length (Sin);

TailleMot: = length (string1);

posMot: = pos (string1, sIn);

While posMot> 0 do

begin

 / / is there a previous letter and a following?

if (posMot> 1) and (posMot + tailleMot-1 <tailleSin)

then

 begin

 premLet: = sIn [posMot-1];

derlet: = sIn [posMot + tailleMot]

/ / these letters are they vowels?

if not ((premLet in vowel) and (derLet in vowel))

 then

 exit;

end

 / / replace the word is beginning to channel

if posMot = 1

 then

sIn: = + word2 copy (Sin, tailleMot +1, tailleSin - tailleMot)

 else

/ / word to replace the east end of a chain

 if posMot + tailleMot -1 = tailleSin

then

sIn: = copy (Sin, 1, posMot-1) + word2

/ / replace the word is in the midst

else

sIn : = Copy (Sin, 1, posMot-1) +

+ word2 copy (sin, posMot + tailleMot, tailleSin-(posMot + tailleMot-1));

posMot: = pos (string1, sIn);

end;

result: = Sin ;

 End;

 //--------------------------------------------- -----------------------------

 / / On the empty white head and tail, the chain converted to uppercase

/ / and on replaces accented capitals, with c cedilla capital

//-------------------------------------- ------------------------------------

/ / And e o capital in a letter Œ equivalent to normal capitals

Function prepare (sIn: string): string;

tailleSin

var i: integer;

because: char;

sOut: string;

begin

/ / pooling capital

sIn: = Trim (sIn);

sIn: = upperCase (sIn) ;

TailleSin: = length (sIn);

sOut: ='';

for i: = 1 to tailleSin

do

begin

because: sIn = [i];

CASE because of

'Â', 'Ä', 'A': because: = 'A',

'Ç': because: = 'S',

'È', 'E', 'Ê', 'Ë', 'Œ': because: = 'E'

'Î', 'Ï': because: = 'I',

'O', 'Ö': because: = 'O',

'Ù', 'Û', 'Ü': because: = 'U';

END;

sOut: = sOut + car;

end

/ / removal of white and dashes

sIn: = sOut;

sOut: ='';

for i: = 1 to length (sIn)

do

 if (sIn [i] <> '') and (sIn [i] < > '-')

Then

sOut: = sOut + sIn [i];

result: = sOut;

end;

Function pow (x, y: integer): double;

var

xx, yy: double;

begin

xx: = x;

yy: = Y;

result: = exp (xx * ln (yy));

 end;

//------------------------------ --------------------------------------------

/ / Body of the first soundex

//------------------------------------------------ --------------------------

Function soundex (sIn: string): sound;

 type

 TabloLettres = array [1 .. 26] of char;

Const

Encode: TabloLettres =

('0 ','1','2 ','3','0 ','1',

'2 ','0','0 ','2','2 ',' 4 ','5','5 ','0','1 ','2','6 ','2','3 ',

'0','1 ','0','2 ' ,'0 ','2');

Var

iSX, iiSX: smallint;

tailleSin: integer;

sOut: string;

begin

/ / cas trivial: the chain is empty

if sIn =''

then

 begin

 result: ='0000 '

exit ;

End;

/ / prepare the string

sIn: = prepare (sIn)

/ / processing the second side effect: chain length 1

if length (sIn) = 1

then

begin

result: sIn + ='000 ';

exit;

end

/ / Third side effect: the first is a letter H on the cut of the word

tailleSin: = length (sIn);

if sIn [1] = 'H'

then

sIn: = copy (sIn, 2, tailleSin-1)

 / / Treatment for all other cases

 / / loop on the length of the chain target

tailleSIn: = length (sIn);

for iSX: = 2 to tailleSIn

do

/ / if the character is between the letters A to Z: on transcode

 if sIn [iSX] in [ 'A' .. 'Z']

then

sIn [iSX]: = enCode [ord (sIn [iSX])-ord ( 'A') +1]

/ / if the character n ' is not between A and Z: there is a zero

else

sIn [iSX]: ='0 ';

/ / retrieves the first letter of the word

sOut :='';

sOut: = sIn [1]

 / / second phase transcoding

iiSX: = 2;

for iSX: = 2 to tailleSIn

do

begin

 / / if the character is a nonzero is retained

if sIn [iSX] <>'0 '

then

begin

sout: = + sout sIn [iSX];

 iiSX: = iiSX +1;

end

/ / if it exceeds 4 characters, leaving the loop

 if iiSX> 4

 then

 begin

result: = sOut;

exit;

end;

end

/ / less than 4 characters: one complete

While a zero length (sOut) <4

do

sout:

sout + ='0 ';

result: = sOut;

 end;

 //---------------------- -------------------------------------------------- --

/ / Soundex2 at home by Frederic BROUARD - copyright Frédéric BROUARD

 //----------------------------------- --------------------------------------

Function soundex2 (sIn: string): sound;

 type

TabloVoyell = array [1 .. 4] of char;

TabloCombi1 = array [1 .. 11.1 .. 2] of string;

TabloCombi2 = array [1 .. 5.1 .. 2] of string;

Const

vowel: TabloVoyell =

( 'E',

'I',

 'O',

 'U');

Combin1: TabloCombi1 =

(( 'GUI', 'KI'),

 ( 'GUE', 'KE '),

(' GA ',' KA '),

(' GO ',' KO '),

(' EM ',' K '),

(' CA ',' KA '),

 (' CO ',' KO '),

(' CU ',' KU '),

 (' Q ',' K '),

(' CC ',' K '),

(' CK ',' K '));

Combin2: TabloCombi2 =

(( 'ASA', 'AZA'),

( 'KN', 'NN'),

 ( 'PF', 'FF'),

 ( 'PH', 'FF'),

( 'SCH', 'SSS'));

 var

i: integer / / index loop

lSin: integer / / length of the channel entrance

prfx: string / / prefix

 sIn2: string / / sIn least

let the first letter: string / / letter

begin

/ / Cas trivial: the chain is empty

 if sIn =

''then

 begin

result: = '';

exit;

end;

/ / prepare the string: Steps 1, 2 and 3

sIn: = prepare (sIn);

 lSin: = length (sIn)

/ / processing the second side effect: chain length 1

if lSin = 1

then

 begin

 result: sIn + = '';

exit;

end

/ / stages 1, 2, 3 and 4: removes white, met Capitalize

/ / converts accents and c cedilla

sIn: = prepare (sIn)

/ / Step 5: replacing the primary consonnances

 for i: = 1 to 4

do

 sIn: = SearchReplace (sIn, Combin1 [i, 1 ], Combin1 [i, 2])

 / / Step 6: replacing the vowels except Y and except the first by A

 lSin: = length (sIn);

sIn2: = copy (sIn, 2, lSin-1);

for i: = 1 to 4

do

sIn2: = SearchReplace (sIn2, vowel [i], 'A');

sIn: = sIn [1] + sIn2

/ / Step 7: replacing prefixes

 lSin: = length (sIn );

 If lSin> = 2

then

begin

prfx: = copy (sIn, 1.2); if (prfx = 'KN') then prfx: = 'NN';

 if (prfx = 'PH') or (prfx = ' PF ')

 then

prfx: =' FF ';

 if lSin = 2

then

sIn: = prfx

else

sIn: = + prfx copy (sIn, 3, lSin-2);

 end;

 if lSin> = 3

then

 begin

prfx: = copy (sIn, 1.3); if (prfx = 'MAC')

 then

 prfx: = 'MCC'; if (prfx = 'SCH')

 then

prfx: = 'SSS';

if (prfx = 'ASA')

var
    i: integer / / index loop
    lSin: integer / / length of the chain entry
    prfx: string / / prefix
    sIn2: string / / sIn least the first letter
    let: string / / letter
Begin

/ / Cas trivial: the chain is empty
If sIn =''
then
Begin
    result: = '';
    exit;
end;

/ / Prepare the string: Steps 1, 2 and 3
sIn: = prepare (sIn);
lSin: = length (sIn);

/ / Processing the second side effect: chain length 1
If lSin = 1
then
Begin
    result: sIn + = '';
    exit;
end;

/ / Stages 1, 2, 3 and 4: removes white, is capitalized,
/ / Converts accents and c cedilla
sIn: = prepare (sIn);

/ / Step 5: replacing the primary consonnances
for i: = 1 to 4
do
   sIn: = SearchReplace (sIn, Combin1 [i, 1], Combin1 [i, 2]);

/ / Step 6: replacing the vowels except Y and except the first by A
lSin: = length (sIn);
sIn2: = copy (sIn, 2, lSin-1);
for i: = 1 to 4
do
   sIn2: = SearchReplace (sIn2, vowel [i], 'A');
sIn: = sIn [1] + sIn2;

/ / Step 7: replacing prefixes
lSin: = length (sIn);
If lSin> = 2
then
Begin
     prfx: = copy (sIn, 1.2);
     if (prfx = 'KN')
     then
         prfx: = 'NN';
     if (prfx = 'PH') or (prfx = 'PF')
     then
         prfx: = 'FF';
     If lSin = 2
     then
         sIn: = prfx
     Else
         sIn: = + prfx copy (sIn, 3, lSin-2);
end;
If lSin> = 3
then
Begin
     prfx: = copy (sIn, 1.3);
     if (prfx = 'MAC')
     then
         prfx: = 'MCC';
     if (prfx = 'SCH)
     then
         prfx: = 'SSS';
     if (prfx = 'ASA')
     then
         prfx: = 'AZA';
     If lSin = 3
     then
         sIn: = prfx
     Else
         sIn: = + prfx copy (sIn, 4, lSin-3);
end;

/ / Step 8: retain the first letter and it was
/ / Replacements complementary
sIn2: = copy (Sin, 2, lSin-1);
for i: = 1 to 5
do
   sIn2: = SearchReplace (sIn2, Combin2 [i, 1], Combin2 [i, 2]);
sIn: = sIn [1] + sIn2;

/ / Step 9: abolish H except CH or SH
lSin: = length (sIn);
sIn2: ='';
for i: = 1 to lSin
do
/ / No H retain the letter
   if (sIn [i] <> 'H')
   then
   Begin
       sIn2: = SIn2 + sIn [i];
       continues;
   end
   Else
/ / H is preceded by an S or a C on the preserves
       if (sIn [i-1] = 'C') or (sIn [i-1] = 'S')
       then
           sIn2: = Sin2 + sIn [i];
sIn: = Sin2;

/ / Stage 10: abolish Y unless preceded by an A
lSin: = length (sIn);
sIn2: ='';
for i: = 1 to lSin
do
/ / Y not retain the letter
   if (sIn [i] <> 'Y')
   then
   Begin
       sIn2: = SIn2 + sIn [i];
       continues;
   end
   Else
/ / Y is preceded by an A on the preserves
       if (sIn [i-1] = 'A')
       then
           sIn2: = Sin2 + sIn [i];
sIn: = Sin2;
lSin: = length (sIn);

/ / Stage 11: removing endings A, T, D, S
let: = copy (sIn, lSin, 1);
if (let = 'A') or (let = 'D') or (let = 'S') or (let = 'T')
then
     sIn: = copy (sIn, 1, lSin-1);

/ / Stage 12: removal of all except A head
lSin: = length (sIn);
sIn2: = copy (sIn, 1.1);
for i: = 2 to lSin
do
/ / A not retain the letter
   if (sIn [i] <> 'A')
   then
   Begin
       sIn2: = sIn2 + sIn [i];
       continues;
   end;
sIn: = Sin2;
lSin: = length (sIn);

/ / Stage 13: removing repetitive letters
let: = copy (sIn, 1.1);
sIn2: = Let;
for i: = 2 to lSin
do
Begin
   sIn if [i] = let
   then
       continues;
   let: = sIn [i];
   Sin2: = Sin2 + sIn [i];
end;
sIn: = sIn2;

/ / Stage 14: it only takes 4 or characters on complete with white
while length (sIn) <4
do
    Sin: Sin + = '';
if length (sIn)> 4
then
    sIn: = copy (sIn, 1.4);

result: = sIn;

end;

//------------------------------------------------ -------------------------
/ / FUNCTION PHONEX: copyright Frédéric BROUARD
//------------------------------------------------ -------------------------
/ / function phonex (sIn: string): integer;
phonex function (sIn: string): double;
Guy
    TabSonAI = array [1 .. 4] of string;
    tabCarPhon = array [0 .. 21] of char;
Const
     SonAIA: TabSonAI = ( 'aina', 'eina', 'love', 'eima');
     SonAIE: TabSonAI = ( 'groin', 'eine', 'love', 'eime');
     SonAII: tabSonAI = ( 'aini', 'eini', 'aimi', 'eimi');
     SonAIO: tabSonAI = ( 'aino', 'eino', 'aimo', 'eimo');
     SonAIU: tabSonAI = ( 'ainu', 'einu', 'aimu', 'eimu');
     CarPhon: TabCarphon = ('1 ','2','3 ','4','5 ',' e ',' f ',' g ',' h ',' i ',' k ',' l ',' No ',' o ',' r ',' s', 't', 'u', 'w', 'x', 'y', 'z');
var
    i, j, k: integer;
    because: char;
    p: integer;
    let, sin2: string;
    sOut: array [1 .. 10] of integer;
Begin
/ / Cas trivial: the chain is empty
If sIn =''
then
Begin
/ / Result: = 0;
result: = 0.0;
    exit;
end;

/ / Setting tiny
sIn: = lowerCase (sIn);

/ / Replacing it with i
sIn: = SearchReplace (sIn, 'y', 'i');

/ / Replacement accented letters
for i: = 1 to length (sIn)
do
Begin
      because: sIn = [i];
      CASE because of
         'â', 'ä', 'a', 'Â', 'Ä', 'A': because: = 'a';
         'ç', 'Ç': because: = 's';
         'ë', 'œ', 'Ë', 'Œ': because: = 'e';
         'ï', 'î', 'Î', 'Ï': because: = 'i';
         'O', 'ö', 'O', 'Ö': because: = 'o';
         'ù', 'û', 'ü', 'Ù', 'Û', 'Ü': because: = 'u';
         'e', 'even', 'È', 'E', 'Ê': because: = 'y';
      END;
      sIn [i]: = car;
end;

/ / H silent Withdraws
for i: = 1 to length (sIn)
do
Begin
   p = pos ( 'h', sIn);
   If p = 1
   then
     sIn: = copy (sIn, 2, length (sIn) -1)
   Else
     if not ((sIn [p-1] = 'c') or (sIn [p-1] = 's'))
     then
       if p <length (sIn)
       then
         sIn: = copy (sIn, 1, p-1) + copy (sIn, p +1, length (sIn)-p)
       Else
         sIn: = copy (sIn, 1, p-1);
end;

/ / Ph replacement by h
sIn: = SearchReplace (sIn, 'ph', 'f');

/ / G rempacement sounding k year before, am, thus, aim
sIn: = searchReplace (sIn, 'gan', 'kan');
sIn: = searchReplace (sIn, 'gain', 'kain');
sIn: = searchReplace (sIn, 'gam', 'kam4');
sIn: = searchReplace (sIn, 'gaim', 'kaim');

/ / Replacement of its AI
for i: = 1 to 4
do
Begin
   sIn: = searchReplace (sIn, SonAIA [i], 'yna');
   sIn: = searchReplace (sIn, SonAIE [i], 'yne');
   sIn: = searchReplace (sIn, SonAII [i], 'yni');
   sIn: = searchReplace (sIn, SonAIO [i], 'yno');
   sIn: = searchReplace (sIn, SonAIU [i], 'ynu');
end;

/ / Alternative groups of 3 letters
sIn: = searchReplace (sIn, 'water', 'o');
sIn: = searchReplace (sIn, 'oua','2 ');
sIn: = searchReplace (sIn, 'ein','4 ');
sIn: = searchReplace (sIn, 'ain','4 ');

/ / Replacement of its e
sIn: = searchReplace (sIn, 'I', 'y');
sIn: = searchReplace (sIn, 'ei', 'y');
sIn: = searchReplace (sIn, 'er', 'yr');
sIn: = searchReplace (sIn, 'ess',' yss');
sIn: = searchReplace (sIn, 'and', 'yt');
sIn: = searchReplace (sIn, 'ez', 'yz');

/ / Alternative groups of 2 letters unless vowel or (1 to 4)
Sin: = SRSaufVoyelle (sIn, 'an','1 ');
Sin: = SRSaufVoyelle (sIn, 'am','1 ');
Sin: = SRSaufVoyelle (sIn, 'en','1 ');
Sin: = SRSaufVoyelle (sIn, 'em','1 ');
Sin: = SRSaufVoyelle (sIn, 'in','4 ');

/ / Replacement of SCH
Sin: = searchReplace (sIn, 'sch','5 ');

/ / S replacement if preceded and followed by a vowel or (1 to 4)
sin: = SRSauf2Voyelle (sIn, 's',' z ');

/ / Alternative groups of various letters 2
Sin: = searchReplace (sIn, 'ow', 'e');
Sin: = searchReplace (sIn, 'eu', 'e');
Sin: = searchReplace (sIn, 'the', 'o');
Sin: = searchReplace (sIn, 'oi','2 ');
Sin: = searchReplace (sIn, 'oy','2 ');
Sin: = searchReplace (sIn, 'or','3 ');
Sin: = searchReplace (sIn, 'ch','5 ');
Sin: = searchReplace (sIn, 'sh','5 ');
Sin: = searchReplace (sIn, 'ss',' s');
Sin: = searchReplace (sIn, 'si', 's');

/ / C by replacing s if followed by an e or an i
Sin: = searchReplace (sIn, 'this',' se ');
Sin: = searchReplace (sIn, 'it', 'if');

/ / Replacement various
Sin: = searchReplace (sIn, 'c', 'k');
Sin: = searchReplace (sIn, 'q', 'k');
Sin: = searchReplace (sIn, 'that', 'k');

Sin: = searchReplace (sIn, 'ga', 'ka');
Sin: = searchReplace (sIn, 'go', 'ko');
Sin: = searchReplace (sIn, 'gu', 'ku');
Sin: = searchReplace (sIn, 'gy', 'ky');
Sin: = searchReplace (sIn, 'g2', 'k2');
Sin: = searchReplace (sIn, 'g1', 'k1');
Sin: = searchReplace (sIn, 'g3', 'K3');

Sin: = searchReplace (sIn, 'a', 'o');
Sin: = searchReplace (sIn, 'd', 't');
Sin: = searchReplace (sIn, 'p', 't');
Sin: = searchReplace (sIn, 'j', 'g');
Sin: = searchReplace (sIn, 'b', 'f');
Sin: = searchReplace (sIn, 'v', 'f');
Sin: = searchReplace (sIn, 'm', 'n');

/ / Removal of duplicated letters
let: = copy (sIn, 1.1);
sIn2: = Let;
for i: = 2 to length (Sin)
do
Begin
   sIn if [i] = let
   then
       continues;
   let: = sIn [i];
   Sin2: = Sin2 + sIn [i];
end;
sIn: = sIn2;

/ / Suppression endings
sIn2: = copy (sIn, length (sIn), 1);
if (sIn2 = 't') or (sIn2 = 'x') or (sIn2 = 's') or (sIn2 =' z ')
then
     sIn: = copy (sIn, 1, length (sIn) -1);

/ / Removal of unauthorized characters
j: = 10;
for i: = 1 to length (sIn)
do
Begin
   If j <1
   then
       break;
   for k: = 0 to 21
   do
     sIn if [i] = carPhon [k]
     then
     Begin
          sout [j]: = k;
          j: = j-1;
     end;
end;

/ / Conversion to floating
result: = 0.0;
for j: = 10 downTo 1
do
   result: = result + sout [j] * pow (j-1, 22);

end;

end.
PM *.
Soundex_fr package;
require Exporter;
use strict;
local use;
our $ VERSION = 'v1.0.0';
our @ ISA = qw (Export);
our @ EXPORT = qw (soundex_fr);
# $ Soundex_nocode is useful to indicate the soundex an empty word
# Default on returns undef but you may prefer 'ZZZZ' or 'NULL'
our $ soundex_nocode = undef;
soundex_fr sub (
   my @ a = @ _;
   my $ s;
   push (@,'') unless @ a # # No parameter appeal

   foreach (@ a) (
    if ($ _ eq'') (
     $ _ = $ Soundex_nocode;
     Next;
    )

    # Prepare the string: Steps 1, 2, 3, 4
    # 1. Eliminates white ˆ ˆ right and left of the name
    s / ^ \ s + / /;
    s / \ s +$//;

    # 2. Convert the name in uppercase
    $ _ = UC ($_);

    # 3. Converted accented letters and c cedilla in non-accented letters
    # NB: accented letters are not capitalized :-(
    tr / \ xE0-\ xE6 \ xE7 \ xE8-\ xEB \ xEC-\ xEF \ xF1 \ xF2-\ xF6 \ xF9-\ xFC \ xFD \ xFF / AAAAAAACEEEEIIIINOOOOOUUUUYY;
    tr / \ xC0-\ xC6 \ xC7 \ xC8-\ xCB \ xCC-\ xCF \ xD1 \ xD2-\ xD6 \ xD9-\ xDC \ xDD / AAAAAAACEEEEIIIINOOOOOUUUUY;

    # 4. Eliminates all other symbols
    tr / A-Z / / cd;

    Chain length # 1
    next if (length ($ _) == 1);

    Step # 5: replacing the primary consonnances
    s / GUI / KI / g;
    s / GUE / KE / g;
    s / GA / KA / g;
    s / GO / KO / g;
    s / GU / K / g;
    s / CA / KA / g;
    s / CO / KO / g;
    s / CU / KU / g;
    s / Q / K / g;
    s / CC / K / g;
    s / CK / K / g;

    Step # 6: replacing the vowels A, except the Y and the first letter
    s /^(.)(.*)$/$ 2 / # the first character in $ 1, the rest in $ _
    tr / EIOU / YYYY;
    $ _ = $ 1. $ _;

    Step # 7: replacing prefixes
    s / ^ KN / NN /;
    s / ^ (PH | PF) / FF /;
    s / ^ MAC / CMC /;
    s / ^ SCH / SSS /;
    s / ^ ASA / AZA;

    Step # 8: retain the first letter
    # And it is complementary replacements
    s /^(.)(.*)$/$ 2 / # the first character in $ 1, the rest in $ _
    $ s = $ 1; # retains the first character
    s / KN / NN /;
    s / (PH | PF) / FF /;
    s / MAC / CMC /;
    s / SCH / SSS /;
    s / ASA / AZA;
    $ _ = $ S. $ _;

    Step # 9: removal of H except CH or SH
    s / CH / C @ / g, s / SH / S @ / g;
    s / H / /;
    s / C @ / CH / g, s / S @ / SH / g;

    Step # 10: abolish Y unless preceded by an A
    s / VJ / A @ / g;
    s / Y / /;
    s / A @ / VJ / g;

    Step # 11: removing endings A, T, D, S
    s / [ATDS ]$//;

    Step # 12: removal of all except A head
    s /^(.)(.*)$/$ 2 / # the first character in $ 1, the rest in $ _
    $ s = $ 1;
    s / A / / g;
    $ _ = $ S. $ _;

    Step # 13: it removes repetitive letters
    my @ ac = split (//);
    $ _ ='';
    for (my $ i = 0; $ i <$ # ac; $ i + +) (
      _ $ .= $ Ac [$ i] unless ($ ac [$ i] eq $ ac [$ i + 1]);
    )
    _ $ .= $ Ac [$ # ac];

    Step # 14: it only takes 4 characters
    s /^(.{ 1.4 }).*$/$ 1 /;

   )

   wantarray? @ a: @ a shift;
)

1;
__END__

= head1 NAME

Text: Soundex_fr - Soundex adapted to the French? Ais

= head1 USE of Soundex_fr

   Soundex_fr use;

   $ Soudex_fr: soundex_nocode = 'NULL' # soundex an empty word

   $ sound = soundex_fr ($ word);
   @ = sounds soundex_fr (@ words);

DESCRIPTION = head1

Algorithms described by Frederic Bouchard < This e-mail address is being protected from spam bots, you need JavaScript enabled to view it >

http://sqlpro.multimania.com/Soundex/SQL_AZ_soundex.htm

Soundex_fr is one of the Soundex algorithm at home, and derived from the algorithm described in the book of
Joe Celko - advanced SQL, published in 1995 at Thomson Publishing International. It is based on the algorithm
Gus Baird (Georgia Tech) stated on page 85.

Unlike the precedent that makes use only figures with the exception of the first character, this
new version retains most of the letters. By comparing the two versions, one finds, for the
first a number of possible combinations of 26x10x10x10 = 26 000 whereas in this improved version
the number of different combinations climbs until about 20x20x20x20 = 160000
It is therefore more efficient in many cases, meaning it can select less
in the case of research with the same footprint of 4 characters.

= head1 AUTHOR

Jean-Marc Pennel < This e-mail address is being protected from spam bots, you need JavaScript enabled to view it <
http://www.suricate.net/

= cut
*. pl


Soundex_fr use;
use strict;

my $ word = 'Bernard';
my @ words = ( 'Martin', 'Michel', 'Jean-Marc', 'Frederick', 'Valerie');

print "\ nSoundex_fr test: \ n";

print "Test 1: $ word =>";
soundex_fr print ($ word), "\ n";

print "Test 2: (", (map ($ _ .= '') @ words) ") => (";
print map ($ _ .= '') soundex_fr (@ words);
print ") \ n";

print "Test 3: with an empty word or no argument, must return 'NULL' =>";
$ Soundex_fr: soundex_nocode = "NULL";
soundex_fr print (), "\ n";

print "End of test."
The implementation of the civil phonex Oracle PL / SQL. A version due to Julien Vauconsant, as amended by Sebastien MICHEL alias UbiK (sebastien.michel @ bjd.fr)



rem ***************************************************************
rem * *
rem * phonex.sql *
rem * *
rem * *
rem * This script rem created 2 tables and a function Phonex in *
rem * same pattern. *
rem * Think to give the privilege of execution of the function *
rem * replace the other user. *
rem * This function works like a SOUNDEX improved. *
rem * *
rem * It is due to the work of Mr FREDERIC BROUARD *
rem * alias SQLPro that has developed under Delphi. *
rem * (http://sqlpro.developpez.com/Soundex/SQL_AZ_soundex.html) *
rem * to get sources and more detailed information. *
rem * Thanks to replace him for allowing me to make this port in *
rem * PLS/SQL :) *
rem * *
rem * This function Oracle is completely free of law, you *
rem * can use it without any limitation or restriction, *
rem * modify the sources your convenience ... if you replace *
rem * improve I will be kept abreast;) *
rem * *
rem * PS: Free you had to go through 2 tables or create an *
rem * type varray for the occasion. But the fact of passing *
rem * replaced by 2 tables for storing the types of sounds ensures compatibility *
rem * from the 7.3 version of Oracle! *
rem *=============================================================*
rem * Author: Julien Vauconsant alias UbiK *
rem * This e-mail address is being protected from spam bots, you need JavaScript enabled to view it *
rem *=============================================================*
rem * Created 23/12/2002 ~~~~~~ Last modif the rem 26/12/2002 *
rem * *
rem * System (creation and tests): * Oracle 8.1.7.3 *
rem ***************************************************************


create table son (
son_aia varchar2(4),
son_aie varchar2(4),
son_aii varchar2(4),
son_aio varchar2(4),
son_aiu varchar2(4));

insert into son(son_aia,son_aie,son_aii,son_aio,son_aiu) values('aina','aine','aini','aino','ainu');
insert into son(son_aia,son_aie,son_aii,son_aio,son_aiu) values('eina','eine','eini','eino','einu');
insert into son(son_aia,son_aie,son_aii,son_aio,son_aiu) values('aima','aime','aimi','aimo','aimu');
insert into son(son_aia,son_aie,son_aii,son_aio,son_aiu) values('eima','eime','eimi','eimo','eimu');


create table carphon (
tabcarphon varchar2(1),
codecarphon number);

insert into carphon(tabcarphon,codecarphon) values('1',1);
insert into carphon(tabcarphon,codecarphon) values('2',2);
insert into carphon(tabcarphon,codecarphon) values('3',3);
insert into carphon(tabcarphon,codecarphon) values('4',4);
insert into carphon(tabcarphon,codecarphon) values('5',5);
insert into carphon(tabcarphon,codecarphon) values('e',6);
insert into carphon(tabcarphon,codecarphon) values('f',7);
insert into carphon(tabcarphon,codecarphon) values('g',8);
insert into carphon(tabcarphon,codecarphon) values('h',9);
insert into carphon(tabcarphon,codecarphon) values('i',10);
insert into carphon(tabcarphon,codecarphon) values('k',11);
insert into carphon(tabcarphon,codecarphon) values('l',12);
insert into carphon(tabcarphon,codecarphon) values('n',13);
insert into carphon(tabcarphon,codecarphon) values('o',14);
insert into carphon(tabcarphon,codecarphon) values('r',15);
insert into carphon(tabcarphon,codecarphon) values('s',16);
insert into carphon(tabcarphon,codecarphon) values('t',17);
insert into carphon(tabcarphon,codecarphon) values('u',18);
insert into carphon(tabcarphon,codecarphon) values('w',19);
insert into carphon(tabcarphon,codecarphon) values('x',20);
insert into carphon(tabcarphon,codecarphon) values('y',21);
insert into carphon(tabcarphon,codecarphon) values('z',22);
commit;

/************** CREATION OF THE CIVIL PHONEX **************/

CREATE OR REPLACE FUNCTION PHONEX (l_in in varchar2)
return float is

cursor c1 is (select son_aia, son_aie, son_aii, son_aio, son_aiu from son);
cursor c2 is (select tabcarphon,codecarphon from carphon);

custom_error exception;
SonAi c1%rowtype;
KPhon c2%rowtype;
v_string varchar2(4000);
result float;
i integer;
p integer;
j integer;
cpt integer;
sortie integer;
letter char(1);
v_string_bis varchar2(4000);

BEGIN

-- opening cursor
open c1;
-- initialization of variables
v_string := replace(l_in,chr(32),null); - remove spaces
p := 1;
cpt:=0;
sortie := 6; -- Release condition = 0
result:=0.0;
j:= length(v_string);

-- empty string input
if (v_string is null) then
result:=0.0;
end if;

-- a tiny passage of the chain
v_string := lower(v_string);

-- remplacement des ç par des ss
v_string := replace(v_string,'ç','ss');
v_string := replace(v_string,'Ç','ss');

-- replacement of ç by ss
v_string := replace(v_string,'y','i');

-- Replacing it with i
v_string := translate(v_string,'ÉÈÊËéèêë','yyyyyyyy');

-- replacing accented by e y (and sound)
v_string := translate(v_string,'ÀÄÂâäàÇçÉÈÊËéèêëÏÎïîÖÔöôÜÛÙüûù','aaaaaacceeeeeeeeiiiioooouuuuuu');

- removal of silent h ...
-- ... in the first or last position of string ...

while (substr(v_string,1,1)='h' or substr(v_string,j,1)='h') loop
v_string := rtrim(ltrim(v_string,'h'),'h');
end loop;
-- ... other cases
for i in 2..j
loop
if ( (substr(v_string,i,1)='h') and ( substr(v_string,i-1,1) not in ('c','s','p') ) ) then
v_string := substr(v_string,1,i-1)||substr(v_string,i+1);
j:=j-1;
end if;
end loop;

-- er replacement by ez ultimately word
-- BOUCHEZ and BOUCHER
if substr(v_string,length(v_string)-1,2) = 'er' then
v_string := substr(v_string,1,length(v_string)-2) || 'ez';
end if;

-- replacing ss by EB at the end of a word
-- choultess and choultesse
if substr(v_string,length(v_string)-1,2) = 'ss' then
v_string := substr(v_string,1,length(v_string)-2) || 'sse';
end if;

-- replacement by ph f
v_string := replace(v_string,'ph','f');

-- replacing g ringing k
v_string := replace(v_string,'gan','kan');
v_string := replace(v_string,'gain','kain');
v_string := replace(v_string,'gam','kam');
v_string := replace(v_string,'gaim','kaim');

-- replacement of its AI
loop
fetch c1 into SonAI;
exit when c1%notfound;
v_string := replace(v_string,SonAi.son_aia,'yna');
v_string := replace(v_string,SonAi.son_aie,'yne');
v_string := replace(v_string,SonAi.son_aii,'yni');
v_string := replace(v_string,SonAi.son_aio,'yno');
v_string := replace(v_string,SonAi.son_aiu,'ynu');
end loop;
close c1;

- - replacement groups of 3 letters
v_string := replace(v_string,'eau','o');
v_string := replace(v_string,'oua','2');
v_string := replace(v_string,'ein','4');
v_string := replace(v_string,'ain','4');
v_string := replace(v_string,'eim','4');
v_string := replace(v_string,'aim','4');

- - replacement of its e
v_string := replace(v_string,'ai','y');
v_string := replace(v_string,'ei','y');
v_string := replace(v_string,'er','yr');
v_string := replace(v_string,'ess','yss');
v_string := replace(v_string,'et','yt');
v_string := replace(v_string,'ez','yz');

v_string := replace(v_string,'oe','e');
v_string := replace(v_string,'eu','e');
v_string := replace(v_string,'au','o');

-- replacement of groups of 2 letters unless the group is followed by a vowel or a sound 1 to 4
while (sortie>0)
loop
-- an
p := instr(v_string,'an');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'1'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'1';
end if;
end if;
end if;
-- am
p := instr(v_string,'am');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'1'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'1';
end if;
end if;
end if;
-- to
p := instr(v_string,'en');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'1'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'1';
end if;
end if;
end if;
-- em
p := instr(v_string,'em');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'1'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'1';
end if;
end if;
end if;
-- in
p := instr(v_string,'in');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'4'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'4';
end if;
end if;
end if;
-- a
p := instr(v_string,'un');
if(p=0) then
sortie := sortie-1;
else if substr(v_string,p+2,1) not in ('a','e','i','o','u','y','1','2','3','4') then
v_string := substr(v_string,1,p-1)||'4'||substr(v_string,p+2);
else if length(v_string) = p+1 then
v_string := substr(v_string,1,p-1)||'4';
end if;
end if;
end if;
end loop;

-- replacement of sch
v_string := replace(v_string,'sch','5');

-- s replacement if preceded AND followed by a vowel or a sound 1 to 4
for i in 2..length(v_string)
loop
if (substr(v_string,i,1)='s') then
if ( (substr(v_string,i-1,1) ) in ('a','e','i','o','u','y','1','2','3','4') ) AND
( (substr(v_string,i+1,1) ) in ('a','e','i','o','u','y','1','2','3','4') ) then
v_string := substr(v_string,1,i-1)||'z'||substr(v_string,i+1);
end if;
end if;
end loop;

-- Replacing groups 2 letters various
v_string := replace(v_string,'oe','e');
v_string := replace(v_string,'eu','e');
v_string := replace(v_string,'au','o');
v_string := replace(v_string,'oi','2');
v_string := replace(v_string,'oy','2');
v_string := replace(v_string,'ou','3');
v_string := replace(v_string,'ch','5');
v_string := replace(v_string,'sh','5');
v_string := replace(v_string,'ss','s');
v_string := replace(v_string,'sc','s');

-- Replacement of CHU by CHOU
v_string := replace(v_string,'5u','53');

-- Replacement of a c s if followed d' e
v_string := replace(v_string,'ce','se');
v_string := replace(v_string,'ci','si');

/************* Replacements various *************/

v_string := replace(v_string,'c','k');
v_string := replace(v_string,'qu','k');
v_string := replace(v_string,'q','k');

v_string := replace(v_string,'ga','ka');
v_string := replace(v_string,'go','ko');
v_string := replace(v_string,'gu','ku');
v_string := replace(v_string,'gy','ky');
v_string := replace(v_string,'g2','k2');
v_string := replace(v_string,'g1','k1');
v_string := replace(v_string,'g3','k3');

v_string := replace(v_string,'a','o');
v_string := replace(v_string,'d','t');
v_string := replace(v_string,'p','t');
v_string := replace(v_string,'j','g');
v_string := replace(v_string,'b','f');
v_string := replace(v_string,'v','f');
v_string := replace(v_string,'m','n');

-- Removing duplicated letters
letter := substr(v_string,1,1);
v_string_bis := letter;
for i in 2..length(v_string)
loop
if (substr(v_string,i,1) != letter) then
letter := substr(v_string,i,1);
v_string_bis:= v_string_bis||letter;
end if;
end loop;
v_string := v_string_bis;

-- Removing endings
v_string_bis := substr(v_string,length(v_string),1);
if (v_string_bis in ('t','x','s','z') ) then
v_string := substr(v_string,1,length(v_string)-1);
end if;

-- Conversion of characters to float
j := 10;
for i in 1..length(v_string)
loop
open c2;
while j>1 loop
fetch c2 into KPhon;
exit when c2%notfound;
if (substr(v_string,i,1) = KPhon.tabcarphon) then
cpt:= cpt-1;
result:=result+(KPhon.codecarphon*power(22,cpt));
j := j-1;
end if;
end loop;
close c2;
end loop;

return result;

exception
when custom_error then
raise_application_error(-20100,'pb sur'||l_in);
end;

Here is a PHP implementation of the algo "soundex2" by F. Bouchery and implements the use of regular expressions:
function soundex2( $sIn )
{
// If there is no word on fate immediately
if ( $sIn === '' ) return ' ';
// It makes every tiny
$sIn = strtoupper( $sIn );
// It removes accents
$sIn = strtr( $sIn, 'ÂÄÀÇÈÉÊËŒÎÏÔÖÙÛÜ', 'AAASEEEEEIIOOUUU' );
// It removes everything that is not a letter
$sIn = preg_replace( '`[^A-Z]`', '', $sIn );
// If the channel does a single character, we go out with.
if ( strlen( $sIn ) === 1 ) return $sIn . ' ';
// Replacing the primary consonnances
$convIn = array( 'GUI', 'GUE', 'GA', 'GO', 'GU', 'CA', 'CO', 'CU', 'Q', 'CC', 'CK' );
$convOut = array( 'KI', 'KE', 'KA', 'KO', 'K', 'KA', 'KO', 'KU', 'K','K', 'K' );
$sIn = str_replace( $convIn, $convOut, $sIn );
// replaces the vowels except Y and except the first by A
$sIn = preg_replace( '`(?<!^)[EIOU]`', 'A', $sIn );
// replaces prefixes then retains the first letter
// and we replacements additional
$convIn = array( '`^KN`', '`^(PH|PF)`', '`^MAC`', '`^SCH`', '`^ASA`', '`(?<!^)KN`', '`(?<!^)(PH|PF)`', '`(?<!^)MAC`',
'`(?<!^)SCH`','`(?<!^)ASA`' );
$convOut = array( 'NN', 'FF', 'MCC', 'SSS', 'AZA', 'NN', 'FF', 'MCC', 'SSS', 'AZA' );
$sIn = preg_replace( $convIn, $convOut, $sIn );
// suppression of H except CH SH or
$sIn = preg_replace( '`(?<![CS])H`', '', $sIn );
// Y suppression preceded except d a A
$sIn = preg_replace( '`(?<!A)Y`', '', $sIn );
// delete endings A, T, D, S
$sIn = preg_replace( '`[ATDS]$`', '', $sIn );
// Remove all except A top
$sIn = preg_replace( '`(?!^)A`', '', $sIn );
// delete the letters repetitive
$sIn = preg_replace( '`(.)\1`', '$1', $sIn );
// etains only 4 characters or on complete with white
return substr( $sIn . ' ', 0, 4);
}

Last Updated ( Monday, 29 March 2010 )
 
< Prev   Next >
School Joomla Templates and Joomla Tutorials