Comparison of grounds with SQL Server 2000 - Functions for bringing data PDF Print E-mail
User Rating: / 0
PoorBest 
Tuesday, 27 May 2008

I. Preamble
The methods of bringing data that we see are based on functions that measure the alignment or the removal of two strings.
These methods are intended to find good value in a table reference values accurate data from incorrect or imprecise. They are often useful in the construction of some solutions Informatics decision to classify the data during the migration (part of ETL, or ELT).

II. Distance Levenshtein
The distance Levenshtein owes its name to Vladimir Levenshtein which has set in 1965.
It calls Levenshtein distance between two words source and target the minimum cost to go from source to target performing elementary operations such as:

substitution of a character;
Adding in character;
Removing a character;
By combining operations at each weighing one finds the cost per weight summation of the various operations.
The Levenshtein distance generally operates on data type string, but you can use it on all channels and symbols in particular binary data.

The algorithm presented a strong complexity (mxn operations) may be written in a simplified manner through a matrix that we will develop through a table. It is nonetheless an algorithm expensive ...
That is why we will see a first implementation of this basic function, and an implementation limiting the depth of calculation to yield results more quickly.

Note already there is not a single implementation of the Levenshtein distance, but a family of algorithms that consider whether or not this or that operation and valuent the different weights depending on the problem to be addressed .

For example, in SQL can be seen that:
If the type of data is CHAR and that the substitution is valuee to 1 then added as the abolition of one character may be value to 3 because of the shift required in the chain.

However, if the type of data is VARCHAR and that the substitution is valuee to 1 then adding may be value to 9 because of the need to expand storage space to do so, but for the abolition this valuation may remain at 4 because it is not necessary to readjust the storage space, but just change the value of the parameter that specifies the actual length of the data stored.
In short everything depends on the point of view in which we go.

III. The implementation of Levenshtein "standard"
In our examples we considered that all operations of the algorithm Levenshtein should be valued at 1.

For example, to get the word HOLDERS DEPORTA the word, we need 6 unit operations:
1
suppression of D
EPORTA
2
Suppression of E
PORTA
3
modification of A to E
DOOR
4
Adding U
PORTEU
5
Adding R
CARRIER
6
Adding S
HOLDERS
Here is a first implementation of the algorithm to Transact SQL:

CREATE FUNCTION F_DISTANCE_LEVENSHTEIN (@ SOURCE VARCHAR (8000),
                                         @ TARGET VARCHAR (8000))
INT RETURNS AS
BEGIN

-- Management of side effects
-- Null input
IF @ SOURCE IS NULL OR @ TARGET IS NULL
    RETURN NULL

-- Identical
IF = @ @ SOURCE TARGET
    RETURN 0

DECLARE @ LN_SOURCE INT
DECLARE @ LN_CIBLE INT

SET @ LN_SOURCE = LEN (@ SOURCE) + 1
SET @ LN_CIBLE = LEN (@ TARGET) + 1

-- Empty string with full string
IF LN_SOURCE = @ @ LN_CIBLE OR 1 = 1
BEGIN
    RETURN LN_SOURCE + @ @ LN_CIBLE - 2
END

DECLARE @ MATRIX TABLE (ID Int, Value INT)
DECLARE @ i INT
DECLARE @ j INT
DECLARE @ TMP Int
DECLARE @ V1 Int
DECLARE @ V2 Int
DECLARE @ V3 Int
DECLARE @ Vmin Int
DECLARE @ COST INT

/ * Initialize the MATRIX * /
SET @ i = 0
WHILE (@ i <LN_SOURCE @ * @ LN_CIBLE)
BEGIN
    INSERT INTO @ MATRIX VALUES (@ i, 0)
    SET @ i = @ i + 1
END

/ * Initialize the first line * /
SET @ i = 0
WHILE (@ i <@ LN_SOURCE)
BEGIN
    SET @ tmp = 0
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Initializing the first column * /
SET @ i = 0
WHILE @ i <@ LN_CIBLE
BEGIN
    SET tmp = @ @ @ i * LN_SOURCE
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Compare the two chains * /

SET @ i = 1

WHILE @ i <@ LN_SOURCE / Column * * /
BEGIN
    SET @ j = 1
    WHILE @ j <@ LN_CIBLE / * * Line /
    BEGIN

       / * Check if the characters correspond * /
       IF SUBSTRING (@ SOURCE, @ i, 1) = SUBSTRING (@ TARGET, @ j, 1)
          / * Equal * /
          SET @ COST = 0
       ELSE
          / * ALTERNATIVE, INSERTION * /
          SET @ = 1 COST

       / * Playing the left box * /
       SELECT @ v1 = Value +1
       DIE FROM @
       WHERE ID = j * @ @ @ LN_SOURCE + i - 1

       / * Reading of the case d above * /
       SELECT @ = v2 Value +1
       DIE FROM @
       WHERE ID = (@ j-1) * @ @ i + LN_SOURCE

       / * Reading of the box diagonal left high * /
       SELECT @ @ = v3 COST + Value
       DIE FROM @
       WHERE ID = (@ j-1) * @ LN_SOURCE + @ i-1

       / * It takes the value of the smallest and is inserted in the current case * /
       IF @ V1 <= @ @ V2 AND V1 <= @ V3
          SET @ @ = Vmin V1
       ELSE
          IF @ V2 <= @ V3
             SET @ @ = Vmin V2
          ELSE
             SET Vmin = @ @ V3

       SET tmp = @ @ @ j * + @ i LN_SOURCE
       UPDATE @ DIE
       SET Value = @ Vmin
       WHERE ID = @ TMP

       SET @ j = @ j +1
    END

    SET @ i = @ i +1
END

/ * It returns the cost of processing * /
SELECT @ tmp = Value
DIE FROM @
WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1

RETURN @ TMP

END

GO

To test the effects, here Thursday the tests that I propose:
Either a table containing a column clothing color and color table containing the target reference colours. We will use the office to try to find the value color reference of each garment, closest to those seized by careless operators ...

CREATE TABLE T_VETEMENT_VTM
(VTM_DESCRIPTION VARCHAR (16),
  VTM_COULEUR VARCHAR (32))
GO
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon1', 'Red')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe2', 'green'eau')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe3', 'Rape.)
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull4', 'Yellow straw')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe5', 'Green')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon6', 'Grey blue')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon7', 'Green')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe8', 'Red')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull9', 'white')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe10', 'White')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon11', 'Grey mouse')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Robe12', 'pink')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull13', 'blue')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pantalon14', 'Red brick')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Pull15', 'Brown')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Chemise16', 'Vrt')
INSERT INTO T_VETEMENT_VTM VALUES ( 'Chemise17', 'Blenc')
GO

CREATE TABLE T_REF_COULEUR_RCL
(RCL_COULEUR VARCHAR (16))
GO
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Black')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'White')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Green')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Yellow')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Blue')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Red')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Rose')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Violet')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Black')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Grey')
INSERT INTO T_REF_COULEUR_RCL VALUES ( 'Brown')
GO

The following query will give us an approach to the solution:

SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR)
        = (SELECT MIN (dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC

VTM_DESCRIPTION VTM_COULEUR RCL_COULEUR DL
---------------- -------------------------------- -- -------------- -----------
Chemise16 Vrt Verde 1
Chemise17 Blenc Blanc 1
Pantalon1 Red Red 0
Pantalon11 mouse Gris Gris 1
Red brick Pantalon14 Green 3
Red brick Pantalon14 Yellow 3
Red brick Pantalon14 Blue 3
Red brick Pantalon14 Rouge 3
Red brick Pantalon14 Rose 3
Pantalon6 Grey blue Blue 0
Pantalon7 Green Green 0
Pull13 blue Blue 1
Pull15 Brown Brown 0
Pull4 Yellow straw Blue 2
Pull9 white White 0
Robe10 Blanche White 2
Robe12 rose Rose 0
Robe2 Green water Green 3
Robe2 Green water Yellow 3
Robe2 Green Blue Water 3
Robe3 Rape. Violet 2
Robe5 Green Verde 1
Robe8 Red Red 0

But the big disadvantage of this method is that, with so little data, the combination is such that the mass calculations already requires 3 seconds of calculations on a high-end PC. This is obviously unworkable in production with large masses of information.

 

IV. A Levenshtein "limited"
The idea is then to limit the depth poll. Looking at the results, we find that good approaches have been with a Levenshtein distance of between 0 and 2. On three we already pantalon14 and robe2 which gives us too many matches.
We can introduce into the code this limitation by modifying the algorithm as follows:

CREATE FUNCTION F_DISTANCE_LEVENSHTEIN_LIMITE (@ SOURCE VARCHAR (8000),
                                                @ TARGET VARCHAR (8000),
                                                @ LIMIT INT)
INT RETURNS AS
BEGIN

-- Management of side effects
-- Null input
IF @ SOURCE IS NULL OR TARGET IS NULL @ @ LIMIT OR IS NULL
    RETURN NULL

-- Limit of 0 and different!
IF LIMITED @ @ = 0 AND SOURCE <> @ TARGET RETURN NULL

-- Lower limit to zero:
IF @ LIMIT <0 RETURN NULL

-- Identical
IF = @ @ SOURCE TARGET
    RETURN 0

DECLARE @ LN_SOURCE INT
DECLARE @ LN_CIBLE INT

SET @ LN_SOURCE = LEN (@ SOURCE) + 1
SET @ LN_CIBLE = LEN (@ TARGET) + 1

-- Empty string with full string
IF LN_SOURCE = @ @ LN_CIBLE OR 1 = 1
BEGIN
    RETURN LN_SOURCE + @ @ LN_CIBLE - 2
END

DECLARE @ MATRIX TABLE (ID Int, Value INT)
DECLARE @ i INT
DECLARE @ j INT
DECLARE @ TMP Int
DECLARE @ V1 Int
DECLARE @ V2 Int
DECLARE @ V3 Int
DECLARE @ Vmin Int
DECLARE @ COST INT

/ * Initialize the MATRIX * /
SET @ i = 0
WHILE (@ i <LN_SOURCE @ * @ LN_CIBLE)
BEGIN
    INSERT INTO @ MATRIX VALUES (@ i, 0)
    SET @ i = @ i + 1
END

/ * Initialize the first line * /
SET @ i = 0
WHILE (@ i <@ LN_SOURCE)
BEGIN
    SET @ tmp = 0
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Initializing the first column * /
SET @ i = 0
WHILE @ i <@ LN_CIBLE
BEGIN
    SET tmp = @ @ @ i * LN_SOURCE
    UPDATE @ DIE SET Value = @ i
       WHERE ID = @ TMP
    SET @ i = @ i + 1
END

/ * Compare the two chains * /

SET @ i = 1

WHILE @ i <@ LN_SOURCE / Column * * /
BEGIN
    SET @ j = 1
    WHILE @ j <@ LN_CIBLE / * * Line /
    BEGIN

       / * Check if the characters correspond * /
       IF SUBSTRING (@ SOURCE, @ i, 1) = SUBSTRING (@ TARGET, @ j, 1)
          / * Equal * /
          SET @ COST = 0
       ELSE
          / * ALTERNATIVE, INSERTION * /
          SET @ = 1 COST

       / * Playing the left box * /
       SELECT @ v1 = Value +1
       DIE FROM @
       WHERE ID = j * @ @ @ LN_SOURCE + i - 1

       / * Reading of the case d above * /
       SELECT @ = v2 Value +1
       DIE FROM @
       WHERE ID = (@ j-1) * @ @ i + LN_SOURCE

       / * Reading of the box diagonal left high * /
       SELECT @ @ = v3 COST + Value
       DIE FROM @
       WHERE ID = (@ j-1) * @ LN_SOURCE + @ i-1

       / * It takes the value of the smallest and is inserted in the current case * /
       IF @ V1 <= @ @ V2 AND V1 <= @ V3
          SET @ @ = Vmin V1
       ELSE
          IF @ V2 <= @ V3
             SET @ @ = Vmin V2
          ELSE
             SET Vmin = @ @ V3

       SET tmp = @ @ @ j * + @ i LN_SOURCE
       UPDATE @ DIE
       SET Value = @ Vmin
       WHERE ID = @ TMP

-- If the limit is exceeded, it returns NULL!
       IF EXISTS (SELECT *
                 DIE FROM @
                 WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1
                   AND Value> @ LIMIT)
          RETURN NULL

       SET @ j = @ j +1
    END

    SET @ i = @ i +1
END

/ * It returns the cost of processing * /
SELECT @ tmp = Value
DIE FROM @
WHERE ID = (@ LN_CIBLE-1) * @ LN_SOURCE + @ LN_SOURCE-1
RETURN @ TMP
END

In our case, using a limit of 2, we get a more consistent answer already:

-- The best approaches with a limit of 2
SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DISTANCE_LEVENSHTEIN (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DISTANCE_LEVENSHTEIN_LIMITE (VTM_COULEUR, RCL_COULEUR, 2)
        = (SELECT MIN (dbo.F_DISTANCE_LEVENSHTEIN_LIMITE (VTM_COULEUR,
                                                        RCL_COULEUR, 2))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC

VTM_DESCRIPTION VTM_COULEUR RCL_COULEUR DL
---------------- -------------------------------- -- -------------- -----------
Chemise16 Vrt Verde 1
Chemise17 Blenc Blanc 1
Pantalon1 Red Red 0
Pantalon11 mouse Gris Gris 1
Pantalon6 Grey blue Blue 0
Pantalon7 Green Green 0
Pull13 blue Blue 1
Pull15 Brown Brown 0
Pull4 Yellow straw Blue 2
Pull9 white White 0
Robe10 Blanche White 2
Robe12 rose Rose 0
Robe3 Rape. Violet 2
Robe5 Green Verde 1
Robe8 Red Red 0

You will note, however, there are still differences. For example, the grey blue Pantalon6 was found rather than blue grey .... When the Pull4 it was found while blue is yellow straw. In fact Levenshtein distance is very little credibility when occurrences have very different lengths.

There are various means to correct that.
First to valuant strongly insertions and deletions and low substitutions.
After taking into account the difference in length and weighing the result of Levenshtein.
Finally using other measures such as the difference HAMMING
I leave you to work to change skilfully algorithm that Levenshtein. As for me, I'll show you the difference HAMMING ...

V. Difference Hamming
It is simply the number of different symbols from one channel to another ... Just read sequentially the two channels by comparing each letter to the position in two words. If the letter is identical on account 1.

The difference this Hamming may be coded as follows Transact SQL:

CREATE FUNCTION F_DIFFERENCE_HAMMING (@ SOURCE VARCHAR (8000),
                                       @ TARGET VARCHAR (8000))
    INT RETURNS
AS
BEGIN

IF @ SOURCE IS NULL OR @ TARGET RETURN IS NULL NULL

SOURCE IF @ @ =''OR''RETURN TARGET = LEN (@ SOURCE) + LEN (@ TARGET)

DECLARE @ COUNT INT
DECLARE @ I INT, @ L INT

SET @ I = 1
SET @ L = LEN (@ SOURCE)
IF LEN (@ TARGET)> @ L
    SET @ L = LEN (@ TARGET)
SET @ COUNT = 0
WHILE @ I <= @ L
BEGIN
    IF @ I> LEN (@ SOURCE)
    BEGIN
       SET @ @ COUNT COUNT = + LEN (@ TARGET) - LEN (@ SOURCE) + 1
       BREAK
    END
    IF @ I> LEN (@ TARGET)
    BEGIN
       SET @ @ COUNT COUNT = + LEN (@ SOURCE) - LEN (@ TARGET) + 1
       BREAK
    END
    IF SUBSTRING (@ SOURCE, @ I, 1) <> SUBSTRING (@ TARGET, @ I, 1)
       SET @ @ COUNT COUNT = + 1
    SET @ @ I = I + 1
END
RETURN @ COUNT
END
GO

This algorithm is inexpensive because linear and gives some good results:
-- The best approaches Hamming
SELECT VTM_DESCRIPTION, VTM_COULEUR, RCL_COULEUR,
        dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR) AS DL
FROM T_VETEMENT_VTM V
        CROSS JOIN T_REF_COULEUR_RCL C
WHERE dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR)
        = (SELECT MIN (dbo.F_DIFFERENCE_HAMMING (VTM_COULEUR, RCL_COULEUR))
           FROM T_VETEMENT_VTM
                  CROSS JOIN T_REF_COULEUR_RCL
           WHERE VTM_DESCRIPTION = V. VTM_DESCRIPTION)
BY ORDER 1, 4 ASC

VTM_DESCRIPTION VTM_COULEUR RCL_COULEUR DL
---------------- -------------------------------- -- -------------- -----------
Chemise16 Vrt Green 4
Chemise16 Vrt Grey 4
Chemise17 Blenc Blanc 1
Pantalon1 Red Red 0
Pantalon11 Gris Gris mouse 8
Red brick Pantalon14 Rouge 8
Pantalon6 blue Gris Gris 6
Pantalon7 Green Green 0
Pull13 blue Blue 2
Pull15 Brown Brown 0
Pull4 Yellow Yellow straw 8
Pull9 white White 0
Robe10 Blanche White 3
Robe12 rose Rose 0
Robe2 Green water Green 7
Robe3 Rape. Violet 3
Robe5 Green Green 2
Robe8 Red Red 0

But our view is Chemise16 green and grey while Pantalon6 is seen specifically grey ...

Can we still improve our score?
I thought working on a more direct approach, particularly in passing, not by the difference, but by direct letters. I called this algorithm INFERENCE BASIC.

VI. Inference basic
Frederic Brouard algorithm.

It is neither more nor less than to compare the letters of a string to another, always advancing in the same direction:
on advance letter by letter in the word target;
it positions itself to the first letter in the word source;
if a letter is found in the word source on account 1 and we remain positioned to this letter in the word source;
the comparison is not commutative, again on target and reversing a source;
then return on the best of both counts.
This is an algorithm that Levenshtein cheaper, but a little more than Hamming.
It can also be optimized by the fact that the polling reverse is not necessary if the count 0 refers to the first pass (no joint letter).

Example:
Either by inference to compare basic words and ANNANAS BANANAS:
First pass:
B A N A N E
   A N N A N A S
The first pass gives 3 common symbols

Second passe:
   A N N A N A S
B A N A N E
It gives 4 common symbols.
The direct inference is therefore 4.

Here is the code of this feature:

CREATE FUNCTION F_INFERENCE_BASIQUE (@ SOURCE VARCHAR (8000),
                                      @ TARGET VARCHAR (8000))
INT RETURNS
AS
BEGIN
IF @ SOURCE IS NULL OR @ TARGET RETURN IS NULL NULL
SOURCE IF @ @ =''OR''RETURN TARGET = 0
DECLARE @ COUNT1 INT, @ COUNT2 INT
DECLARE @ I INT, @ J INT, @ L INT
DECLARE @ C CHAR (1)

SET @ COUNT1 = 0
SET @ COUNT2 = 0

SET @ I = 1
SET @ J = 1
SET @ L = LEN (@ SOURCE)

-- First pass source to target
WHILE @ I <= @ L
BEGIN
    SET @ C = SUBSTRING (@ SOURCE, @ I, 1)
    IF CHARINDEX (C @, @ TARGET, @ J)> 0
    BEGIN
       SET COUNT1 = @ @ COUNT1 + 1
       SET @ J = CHARINDEX (C @, @ TARGET, @ J) + 1
       SET @ @ I = I + 1
       CONTINUES
    END
    SET @ @ I = I + 1
END

IF @ COUNT1 RETURN 0 = 0

SET @ I = 1
SET @ J = 1
SET @ L = LEN (@ TARGET)

-- Second pass target to source
WHILE @ I <= @ L
BEGIN
    SET @ C = SUBSTRING (@ TARGET, @ I, 1)
    IF CHARINDEX (C @, @ SOURCE, @ J)> 0
    BEGIN
       SET COUNT2 = @ @ COUNT2 + 1
       SET @ J = CHARINDEX (C @, @ SOURCE, @ J) + 1
       SET @ @ I = I + 1
       CONTINUES
    END
    SET @ @ I = I + 1
END
-- The best counting
IF @ COUNT1 <@ COUNT2
    SET COUNT1 = @ @ COUNT2
RETURN @ COUNT1
END
GO

In our example, the results speak for themselves:

VTM_DESCRIPTION VTM_COULEUR RCL_COULEUR IB
---------------- -------------------------------- -- -------------- -----------
Chemise16 Vrt Green 3
Chemise17 Blenc Blanc 4
Pantalon1 Red Red 5
Pantalon11 Gris Gris mouse 4
Red brick Pantalon14 Red 5
Pantalon6 Grey blue Blue 4
Pantalon6 Gris Gris blue 4
Pantalon7 Green Green 4
Pull13 blue Blue 4
Pull15 Brown Brown 6
Pull4 Yellow straw Yellow 5
Pull9 white Blanc 5
Robe10 Blanche White 5
Robe12 rose Rose 4
Robe2 Green water Green 4
Robe3 Rape. Violet 4
Robe5 Green Green 4
Robe8 Red Red 5

The only case that has not been settled is the Pantalon6 grey blue! This is indeed the only case undecidable by the mere fact of a machine ...

VII. Comparison of computing time
The time for calculating these various algorithms to test our game are as follows (the complaints have been launched without the ORDER BY clause):
method
UC Time (ms)
Time cast (ms)
Levenshtein
10 375
10 959
Levenshtein limited to 2
11 562
11 984
Hamming
407
447
Inference basic
797
830
It finally realizes that the Levenshtein limited does not provide conclusive results and frankly that the test limitation is ultimately more expensive than its "free".
It notes that the cost of basic inference is less than twice that of the difference Hamming. But given his best performance in terms of bringing grounds it is paid in many cases ...

 
< Prev   Next >
School Joomla Templates and Joomla Tutorials