Comparison of grounds with SQL Server 2000 - Functions for bringing data PDF Print E-mail
User Rating: / 0
PoorBest 
Tuesday, 03 June 2008

I. Preamble
The methods of bringing data that we see are based on functions that measure the alignment or the removal of two strings.
These methods are intended to find good value in a table reference values accurate data from incorrect or imprecise. They are often useful in the construction of some solutions Informatics decision to classify the data during the migration (part of ETL, or ELT).

II. Distance Levenshtein
The distance Levenshtein owes its name to Vladimir Levenshtein which has set in 1965.
It calls Levenshtein distance between two words source and target the minimum cost to go from source to target performing elementary operations such as:
♦substitution of a character;
♦Adding in character;
♦Removing a character;
By combining operations at each weighing one finds the cost per weight summation of the various operations.
The Levenshtein distance generally operates on data type string, but you can use it on all channels and symbols in particular binary data.
The algorithm presented a strong complexity (mxn operations) may be written in a simplified manner through a matrix that we will develop through a table. It is nonetheless an algorithm expensive ...
That is why we will see a first implementation of this basic function, and an implementation limiting the depth of calculation to yield results more quickly.
Note already there is not a single implementation of the Levenshtein distance, but a family of algorithms that consider whether or not this or that operation and valuent the different weights depending on the problem to be addressed .

For example, in SQL can be seen that:
♦If the type of data is CHAR and that the substitution is valuee to 1 then added as the abolition of one character may be value to 3 because of the shift required in the chain.
♦However, if the type of data is VARCHAR and that the substitution is valuee to 1 then adding may be value to 9 because of the need to expand storage space to do so, but for the abolition this valuation may remain at 4 because it is not necessary to readjust the storage space, but just change the value of the parameter that specifies the actual length of the data stored.
In short everything depends on the point of view in which we go.

III. The implementation of Levenshtein "standard"
In our examples we considered that all operations of the algorithm Levenshtein should be valued at 1.
For example, to get the word HOLDERS DEPORTA the word, we need 6 unit operations:

1 suppression of D EPORTA
2 Suppression of E PORTA
3 modification of A to E DOOR
4 Adding U PORTEU
5 Adding R CARRIER
6 Adding S HOLDERS

Here is a first implementation of the algorithm to Transact SQL:
CREATE FUNCTION F_DISTANCE_LEVENSHTEIN (@ SOURCE VARCHAR (8000),
                                         @ TARGET VARCHAR (8000))
INT RETURNS AS
BEGIN

-- Management of side effects
-- Null input
IF @ SOURCE IS NULL OR @ TARGET IS NULL
    RETURN NULL

-- Identical
IF = @ @ SOURCE TARGET
    RETURN 0

DECLARE @ LN_SOURCE INT
DECLARE @ LN_CIBLE INT

SET @ LN_SOURCE = LEN (@ SOURCE) + 1
SET @ LN_CIBLE = LEN (@ TARGET) + 1

-- Empty string with full string
IF LN_SOURCE = @ @ LN_CIBLE OR 1 = 1
BEGIN
    RETURN LN_SOURCE + @ @ LN_CIBLE - 2
END

DECLARE @ MATRIX TABLE (ID Int, Value INT)
DECLARE @ i INT
DECLARE @ j INT
DECLARE @ TMP Int
DECLARE @ V1 Int
DECLARE @ V2 Int
DECLARE @ V3 Int
DECLARE @ Vmin Int
DECLARE @ COST INT

/ * Initialize the MATRIX * /
SET @ i = 0
WHILE (@ i <LN_SOURCE @ * @ LN_CIBLE)
BEGIN
    INSERT INTO @ MATRIX VALUES (@ i, 0)
    SET @ i = @ i + 1
END

/ * Initialize the first line * /
SET @ i = 0
WHILE (@ i <@ LN_SOURCE)
BEGIN
    SET @ tmp = 0
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Initializing the first column * /
SET @ i = 0
WHILE @ i <@ LN_CIBLE
BEGIN
    SET tmp = @ @ @ i * LN_SOURCE
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Compare the two chains * /

SET @ i = 1

WHILE @ i <@ LN_SOURCE / Column * * /
BEGIN
    SET @ j = 1
    WHILE @ j <@ LN_CIBLE / * * Line /
    BEGIN

       / * Check if the characters correspond * /
       IF SUBSTRING (@ SOURCE, @ i, 1) = SUBSTRING (@ TARGET, @ j, 1)
          / * Equal * /
          SET @ COST = 0
       ELSE
          / * ALTERNATIVE, INSERTION * /
          SET @ = 1 COST

       / * Playing the left box * /
       SELECT @ v1 = Value +1
       DIE FROM @
       WHERE ID = j * @ @ @ LN_SOURCE + i - 1

       / * Reading of the case d above * /
       SELECT @ = v2 Value +1
       DIE FROM @
       WHERE ID = (@ j-1) * @ @ i + LN_SOURCE

       / * Reading of the box diagonal left high * /
       SELECT @ @ = v3 COST + Value
       DIE FROM @
       WHERE ID = (@ j-1) * @ LN_SOURCE + @ i-1

       / * It takes the value of the smallest and is inserted in the current case * /
       IF @ V1 <= @ @ V2 AND V1 <= @ V3
          SET @ @ = Vmin V1
       ELSE
          IF @ V2 <= @ V3
             SET @ @ = Vmin V2
          ELSE
             SET Vmin = @ @ V3

       SET tmp = @ @ @ j * + @ i LN_SOURCE
       UPDATE @ DIE
       SET Value = @ Vmin
       WHERE ID = @ TMP

       SET @ j = @ j +1
    END

    SET @ i = @ i +1
END

/ * It returns the cost of processing * /
SELECT @ tmp = Value
DIE FROM @
WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1

RETURN @ TMP

END

GO

To test the effects, here Thursday the tests that I propose:
Either a table containing a column clothing color and color table containing the target reference colours. We will use the office to try to find the value color reference of each garment, closest to those seized by careless operators ...

CREATE TABLE T_VETEMENT_VTM
(VTM_DESCRIPTION VARCHAR (16),
  VTM_COULEUR VARCHAR (32))
GO

INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon1', 'Red')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe2', 'green'eau')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe3', 'Rape.)
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull4', 'Yellow straw')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe5', 'Green')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon6', 'Grey blue')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon7', 'Green')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe8', 'Red')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull9', 'white')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe10', 'White')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon11', 'Grey mouse')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe12', 'pink')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull13', 'blue')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon14', 'Red brick')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull15', 'Brown')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Chemise16', 'Vrt')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Chemise17', 'Blenc')
GO

CREATE TABLE T_REF_COULEUR_RCL
(RCL_COULEUR VARCHAR (16))
GO

INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Black')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'White')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Green')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Yellow')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Blue')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Red')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Rose')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Violet')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Black')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Grey')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Brown')
GO

The following query will give us an approach to the solution:

SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR)
        = (SELECT MIN (dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC


VTM_DESCRIPTION                  VTM_COULEUR                    RCL_COULEUR                     DL
---------------- -------------------------------- -- -------------- -----------
Chemise16                                        Vrt                                        Verde                            1
Chemise17                                      Blenc                                      Blanc                            1
Pantalon1                                         Red                                         Red                             0
Pantalon11                                  mouse Gris                                   Gris                             1
Pantalon14                                    Red brick                                   Green                           3 
Pantalon14                                    Red brick                                   Yellow                          3  
Pantalon14                                    Red brick                                    Blue                            3
Pantalon14                                    Red brick                                    Rouge                         3 
Pantalon14                                    Red brick                                    Rose                           3
Pantalon6                                     Grey blue                                     Blue                            0
Pantalon7                                       Green                                        Green                          0
Pull13                                                blue                                        Blue                           1
Pull15                                              Brown                                       Brown                         0
Pull4                                            Yellow straw                                  Blue                           2
Pull9                                                 white                                        White                         0
Robe10                                            Blanche                                     White                         2
Robe12                                              rose                                         Rose                          0
Robe2                                          Green water                                   Green                         3
Robe2                                          Green water                                   Yellow                        3
Robe2                                          Green Blue                                     Water                        3
Robe3                                               Rape.                                        Violet                        2
Robe5                                               Green                                        Verde                        1
Robe8                                                 Red                                          Red                          0
 
But the big disadvantage of this method is that, with so little data, the combination is such that the mass calculations already requires 3 seconds of calculations on a high-end PC. This is obviously unworkable in production with large masses of information.

IV. A Levenshtein "limited"
The idea is then to limit the depth poll. Looking at the results, we find that good approaches have been with a Levenshtein distance of between 0 and 2. On three we already pantalon14 and robe2 which gives us too many matches.
We can introduce into the code this limitation by modifying the algorithm as follows:

CREATE FUNCTION F_DISTANCE_LEVENSHTEIN_LIMITE (@ SOURCE VARCHAR (8000),
                                                @ TARGET VARCHAR (8000),
                                                @ LIMIT INT)
INT RETURNS AS
BEGIN

-- Management of side effects
-- Null input
IF @ SOURCE IS NULL OR TARGET IS NULL @ @ LIMIT OR IS NULL
    RETURN NULL

-- Limit of 0 and different!
IF LIMITED @ @ = 0 AND SOURCE <> @ TARGET RETURN NULL

-- Lower limit to zero:
IF @ LIMIT <0 RETURN NULL

-- Identical
IF = @ @ SOURCE TARGET
    RETURN 0

DECLARE @ LN_SOURCE INT
DECLARE @ LN_CIBLE INT

SET @ LN_SOURCE = LEN (@ SOURCE) + 1
SET @ LN_CIBLE = LEN (@ TARGET) + 1

-- Empty string with full string
IF LN_SOURCE = @ @ LN_CIBLE OR 1 = 1
BEGIN
    RETURN LN_SOURCE + @ @ LN_CIBLE - 2
END

DECLARE @ MATRIX TABLE (ID Int, Value INT)
DECLARE @ i INT
DECLARE @ j INT
DECLARE @ TMP Int
DECLARE @ V1 Int
DECLARE @ V2 Int
DECLARE @ V3 Int
DECLARE @ Vmin Int
DECLARE @ COST INT

/ * Initialize the MATRIX * /
SET @ i = 0
WHILE (@ i <LN_SOURCE @ * @ LN_CIBLE)
BEGIN
    INSERT INTO @ MATRIX VALUES (@ i, 0)
    SET @ i = @ i + 1
END

/ * Initialize the first line * /
SET @ i = 0
WHILE (@ i <@ LN_SOURCE)
BEGIN
    SET @ tmp = 0
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Initializing the first column * /
SET @ i = 0
WHILE @ i <@ LN_CIBLE
BEGIN
    SET tmp = @ @ @ i * LN_SOURCE
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Compare the two chains * /

SET @ i = 1

WHILE @ i <@ LN_SOURCE / Column * * /
BEGIN
    SET @ j = 1
    WHILE @ j <@ LN_CIBLE / * * Line /
    BEGIN

       / * Check if the characters correspond * /
       IF SUBSTRING (@ SOURCE, @ i, 1) = SUBSTRING (@ TARGET, @ j, 1)
          / * Equal * /
          SET @ COST = 0
       ELSE
          / * ALTERNATIVE, INSERTION * /
          SET @ = 1 COST

       / * Playing the left box * /
       SELECT @ v1 = Value +1
       DIE FROM @
       WHERE ID = j * @ @ @ LN_SOURCE + i - 1

       / * Reading of the case d above * /
       SELECT @ = v2 Value +1
       DIE FROM @
       WHERE ID = (@ j-1) * @ @ i + LN_SOURCE

       / * Reading of the box diagonal left high * /
       SELECT @ @ = v3 COST + Value
       DIE FROM @
       WHERE ID = (@ j-1) * @ LN_SOURCE + @ i-1

       / * It takes the value of the smallest and is inserted in the current case * /
       IF @ V1 <= @ @ V2 AND V1 <= @ V3
          SET @ @ = Vmin V1
       ELSE
          IF @ V2 <= @ V3
             SET @ @ = Vmin V2
          ELSE
             SET Vmin = @ @ V3

       SET tmp = @ @ @ j * + @ i LN_SOURCE
       UPDATE @ DIE
       SET Value = @ Vmin
       WHERE ID = @ TMP

-- If the limit is exceeded, it returns NULL!
       IF EXISTS (SELECT *
                 DIE FROM @
                 WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1
                   AND Value> @ LIMIT)
          RETURN NULL

       SET @ j = @ j +1
    END

    SET @ i = @ i +1
END

/ * It returns the cost of processing * /
SELECT @ tmp = Value
DIE FROM @
WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1

RETURN @ TMP

END

In our case, using a limit of 2, we get a more consistent answer already:

-- The best approaches with a limit of 2
SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DISTANCE_LEVENSHTEIN_LIMITE (VTM_COULEUR, RCL_COULEUR, 2)
        = (SELECT MIN (dbo.F_DISTANCE_LEVENSHTEIN_LIMITE (VTM_COULEUR,
                                                        RCL_COULEUR, 2))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC

VTM_DESCRIPTION          VTM_COULEUR         RCL_COULEUR          DL
     ----------------               --------------------------------      -- --------------          -----------
Chemise16                             Vrt                             Verde                   1
Chemise17                            Blenc                           Blanc                  1
Pantalon1                               Red                             Red                    0
Pantalon11                         mouse Gris                      Gris                    1
Pantalon6                          Grey blue                          Blue                    0
Pantalon7                             Green                            Green                  0
Pull13                                    blue                              Blue                    1
Pull15                                   Brown                            Brown                 0
Pull4                                   Yellow straw                     Blue                   2
Pull9                                      white                            White                  0
Robe10                                 Blanche                         White                  2
Robe12                                   rose                             Rose                   0
Robe3                                     Rape.                           Violet                  2
Robe5                                     Green                           Verde                 1
Robe8                                     Red                               Red                   0

You will note, however, there are still differences. For example, the grey blue Pantalon6 was found rather than blue grey .... When the Pull4 it was found while blue is yellow straw. In fact Levenshtein distance is very little credibility when occurrences have very different lengths.

There are various means to correct that.
First to valuant strongly insertions and deletions and low substitutions.
After taking into account the difference in length and weighing the result of Levenshtein.
Finally using other measures such as the difference HAMMING
I leave you to work to change skilfully algorithm that Levenshtein. As for me, I'll show you the difference HAMMING ...

V. Difference Hamming
It is simply the number of different symbols from one channel to another ... Just read sequentially the two channels by comparing each letter to the position in two words. If the letter is identical on account 1.

The difference this Hamming may be coded as follows Transact SQL:

CREATE FUNCTION F_DIFFERENCE_HAMMING (@ SOURCE VARCHAR (8000),
                                       @ TARGET VARCHAR (8000))
    INT RETURNS
AS
BEGIN

IF @ SOURCE IS NULL OR @ TARGET RETURN IS NULL NULL

SOURCE IF @ @ =''OR''RETURN TARGET = LEN (@ SOURCE) + LEN (@ TARGET)

DECLARE @ COUNT INT
DECLARE @ I INT, @ L INT

SET @ I = 1
SET @ L = LEN (@ SOURCE)
IF LEN (@ TARGET)> @ L
    SET @ L = LEN (@ TARGET)

SET @ COUNT = 0

WHILE @ I <= @ L
BEGIN
    IF @ I> LEN (@ SOURCE)
    BEGIN
       SET @ @ COUNT COUNT = + LEN (@ TARGET) - LEN (@ SOURCE) + 1
       BREAK
    END
    IF @ I> LEN (@ TARGET)
    BEGIN
       SET @ @ COUNT COUNT = + LEN (@ SOURCE) - LEN (@ TARGET) + 1
       BREAK
    END
    IF SUBSTRING (@ SOURCE, @ I, 1) <> SUBSTRING (@ TARGET, @ I, 1)
       SET @ @ COUNT COUNT = + 1

    SET @ @ I = I + 1
END

RETURN @ COUNT

END
GO

This algorithm is inexpensive because linear and gives some good results:

-- The best approaches Hamming
SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR)
        = (SELECT MIN (dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC

VTM_DESCRIPTION           VTM_COULEUR          RCL_COULEUR          DL
     ----------------                 --------------------------------       -- --------------             -----------
Chemise16                                    Vrt                          Green                  4
Chemise16                                    Vrt                           Grey                   4
Chemise17                                   Blenc                        Blanc                  1
Pantalon1                                     Red                           Red                    0
Pantalon11                                 Gris mouse                  Gris                    8
Pantalon14                                  Red brick                    Rouge                 8
Pantalon6                                    blue Gris                      Gris                   6
Pantalon7                                    Green                          Green                 0
Pull13                                           blue                            Blue                  2
Pull15                                           Brown                         Brown                0
Pull4                                          Yellow straw                  Yellow                8
Pull9                                              white                         White                 0
Robe10                                          Blanche                     White                 3
Robe12                                           rose                          Rose                 0
Robe2                                        Green water                   Green                 7
Robe3                                           Rape.                         Violet                 3
Robe5                                           Green                         Green                2
Robe8                                           Red                            Red                   0

But our view is Chemise16 green and grey while Pantalon6 is seen specifically grey ...

Can we still improve our score?
I thought working on a more direct approach, particularly in passing, not by the difference, but by direct letters. I called this algorithm INFERENCE BASIC.

VI. Inference basic
Frederic Brouard algorithm.
It is neither more nor less than to compare the letters of a string to another, always advancing in the same direction:
on advance letter by letter in the word target;
it positions itself to the first letter in the word source;
if a letter is found in the word source on account 1 and we remain positioned to this letter in the word source;
the comparison is not commutative, again on target and reversing a source;
then return on the best of both counts.
This is an algorithm that Levenshtein cheaper, but a little more than Hamming.
It can also be optimized by the fact that the polling reverse is not necessary if the count 0 refers to the first pass (no joint letter).

Example:
Either by inference to compare basic words and

A  N  N A N A S

 B A N A N A S:
First pass:

B A N A N E

   A N N A N A S

The first pass gives 3 common symbols
Second passe:

   A N N A N A S

B A N A N E

It gives 4 common symbols.
The direct inference is therefore 4.
Here is the code of this feature:
CREATE FUNCTION F_INFERENCE_BASIQUE (@ SOURCE VARCHAR (8000),
                                      @ TARGET VARCHAR (8000))
INT RETURNS
AS
BEGIN

IF @ SOURCE IS NULL OR @ TARGET RETURN IS NULL NULL

SOURCE IF @ @ =''OR''RETURN TARGET = 0

DECLARE @ COUNT1 INT, @ COUNT2 INT
DECLARE @ I INT, @ J INT, @ L INT
DECLARE @ C CHAR (1)

SET @ COUNT1 = 0
SET @ COUNT2 = 0

SET @ I = 1
SET @ J = 1
SET @ L = LEN (@ SOURCE)

-- First pass source to target
WHILE @ I <= @ L
BEGIN
    SET @ C = SUBSTRING (@ SOURCE, @ I, 1)
    IF CHARINDEX (C @, @ TARGET, @ J)> 0
    BEGIN
       SET COUNT1 = @ @ COUNT1 + 1
       SET @ J = CHARINDEX (C @, @ TARGET, @ J) + 1
       SET @ @ I = I + 1
       CONTINUES
    END
    SET @ @ I = I + 1
END

IF @ COUNT1 RETURN 0 = 0

SET @ I = 1
SET @ J = 1
SET @ L = LEN (@ TARGET)

-- Second pass target to source
WHILE @ I <= @ L
BEGIN
    SET @ C = SUBSTRING (@ TARGET, @ I, 1)
    IF CHARINDEX (C @, @ SOURCE, @ J)> 0
    BEGIN
       SET COUNT2 = @ @ COUNT2 + 1
       SET @ J = CHARINDEX (C @, @ SOURCE, @ J) + 1
       SET @ @ I = I + 1
       CONTINUES
    END
    SET @ @ I = I + 1
END

-- The best counting
IF @ COUNT1 <@ COUNT2
    SET COUNT1 = @ @ COUNT2

RETURN @ COUNT1

END

GO

In our example, the results speak for themselves:

VTM_DESCRIPTION        VTM_COULEUR            RCL_COULEUR           IB
     ----------------              --------------------------------         -- --------------           -----------
Chemise16                           Vrt                                  Green                   3
Chemise17                          Blenc                               Blanc                   4
Pantalon1                            Red                                 Red                      5
Pantalon11                       Gris mouse                          Gris                     4
Pantalon14                        Red brick                            Red                     5
Pantalon6                         Grey blue                            Blue                     4
Pantalon6                          Gris blue                            Gris                      4
Pantalon7                          Green                                Green                   4
Pull13                                  blue                                 Blue                     4
Pull15                                Brown                               Brown                   6
Pull4                              straw Yellow                         Yellow                    5
Pull9                                  white                                 Blanc                    5
Robe10                              Blanche                             White                    5
Robe12                               rose                                  Rose                    4
Robe2                           Green water                            Green                   4
Robe3                                Rape.                                Violet                    4
Robe5                               Green                                 Green                   4
Robe8                                Red                                    Red                     5
The only case that has not been settled is the Pantalon6 grey blue! This is indeed the only case undecidable by the mere fact of a machine ...

VII. Comparison of computing time
The time for calculating these various algorithms to test our game are as follows (the complaints have been launched without the ORDER BY clause):

method UC Time (ms) Time cast (ms)
Levenshtein 10 375 10 959
 
Levenshtein limited to 2 11 562 11 984
Hamming 407 447
Inference basic 797 830

It finally realizes that the Levenshtein limited does not provide conclusive results and frankly that the test limitation is ultimately more expensive than its "free".
It notes that the cost of basic inference is less than twice that of the difference Hamming. But given his best performance in terms of bringing grounds it is paid in many cases ..

 
< Prev   Next >
School Joomla Templates and Joomla Tutorials