The division relational, myth or reality PDF Print E-mail
User Rating: / 0
PoorBest 
Sunday, 01 June 2008

Preamble
A little reminder being better than a good debate, here are the eight basic operations of algebra relational Dr. CODD and their transcription SQL.

PROJECTION SELECT [ALL] [DISTINCT] list of attributes
FROM table
SELECTION SELECT  list of attributes
FROM table
WHERE condition
JOINTURE SELECT list of attributes table
FROM table1 JOIN table2
ON table1.attribut1=table2.attribut1 ...
DIVISION ???
UNION SELECT list of attributes FROM table
UNION
SELECT list of attributes tables FROM table
INTERSECTION SELECT list of attributes FROM table
INTERSECT
SELECT list of attributes FROM table can also use a WHERE avec le NOT IN
SELECT list of attributes FROM table1 WHERE attribut1
NOT IN
SELECT  list of attributes FROM table2
DIFFERENCE SELECT list of attributes FROM table
EXCEPT
SELECT list of attributes FROM table
PRODUCT SELECT * FROM table1, table2 (no where) or
SELECT * FROM table1 CROSS JOIN table2

 

1. Problem Definition
A good example is better than an explanation tarabiscotee, here's the question we will ask:
We are in a business of large retailers with stores in different cities of France. These warehouses can accommodate products of different departments. The question is: what are the warehouses capable of serving ALL-rays? Of course we have a table of warehouses and other rays ...

To touch the issue of this question, the easiest way is to see this concept ensembliste in the form of diagrams Wen. Yes, I see, it does remind you nothing but alas… if: potatoes!


Nothing could be simpler than see the answer: just find warehouses, which are connected to ALL rays.
The answer is obvious here, and only TOULOUSE MARSEILLE meet the criteria of the question (as what, south east ahead of the north in our example).

Let us study the pattern tabular and the response that we want to ...

To help us, here is the script creation of the database and data:
/************************************************* /

/ * THE DIVISION RELATIONNELLE, MYTH OR REALITY? * /

/ * * Frederic BROUARD This e-mail address is being protected from spam bots, you need JavaScript enabled to view it /

/********************************** ***************/

/***********************/

/ * CREATION OF TABLES * /

/ * **********************/

/ * Table-ray * /

CREATE TABLE T_RAYON
(
RAYON_RYN CHAR(16)
)

/* tables des entrepôts */
CREATE TABLE T_ENTREPOT
(
VILLE_ETP CHAR(16,
RAYON_RYN CHAR(16)
)

/***************************/

/ * FOOD OF TABLES * /

/ * **************************/

/ * Food ray table * /

INSERT INTO T_RAYON
(RAYON_RYN) VALUES ('frais')
INSERT INTO T_RAYON
(RAYON_RYN) VALUES ('boisson')
INSERT INTO T_RAYON
(RAYON_RYN) VALUES ('conserve')
INSERT INTO T_RAYON
(RAYON_RYN) VALUES ('droguerie')

 / * feeding the table warehouses * /

INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('PARIS', 'boisson')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('PARIS', 'frais')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('PARIS', 'conserve')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('LYON', 'boisson')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('LYON', 'conserve')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('LYON', 'droguerie')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('MARSEILLE', 'boisson')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('MARSEILLE', 'frais')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('MARSEILLE', 'conserve')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('MARSEILLE', 'droguerie')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('ANGER', 'boisson')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('ANGER', 'frais')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('ANGER', 'droguerie')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('TOULOUSE', 'boisson')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('TOULOUSE', 'frais')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('TOULOUSE', 'conserve')
INSERT INTO T_ENTREPOT
(VILLE_ETP, RAYON_RYN) VALUES ('TOULOUSE', 'droguerie')

And the sight of this division table:

Dividend
T_ENTREPOT
VILLE_ETP RAYON_ETP
PARIS Drink
PARIS fee
PARIS keep
LYON Drink
LYON keep
LYON drugstore
MARSEILLE Drink
MARSEILLE fee
MARSEILLE keep
MARSEILLE drugstore
ANGER Drink
ANGER fee
ANGER drugstore
TOULOUSE Drink
TOULOUSE fee
TOULOUSE keep
TOULOUSE drugstore
Divisor
T_RAYON
RAYON_RYN
Drink
fee
keep
drugstore
Result
T_RESULTAT
 
VILLE_ETP
MARSEILLE
TOULOUSE


To "prove" that our division is correct just know that the Cartesian product of the result with the divisor gives a number of valid line of the table dividend.
That is the result of:

 SELECT * FROM T_RESULTAT, T_RAYON

is fully included in T_ENTREPOT.

2. Primary Test
A first implementation that comes immediately to mind is to count the number of occurrences of rays and to coincide with the enumeration of rays of various warehouses:

SELECT VILLE_ETP FROM T_ENTREPOT
GROUP BY VILLE_ETP
HAVING COUNT(*) = (SELECT COUNT(*) FROM T_RAYON)

Who gives the result: MARSEILLE, TOULOUSE

This solution, apparently good, only works in cases relatively limited:
where there is no redundancy of data in the table divider
where there is no radius in addition to a warehouse, not identified in the table T_ENTREPOT
AGAINST EXAMPLES ..

a) if the table T_RAYON contains:
T_RAYON

RAYON_RYN
Drink
fee
keep
drugstore
keep

after for example:

INSERT INTO T_RAYON VALUES ( 'keep')

The previous query gives a result void.

b) If the table T_ENTREPOT contains the following additional line:

INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'MARSEILLE', 'butcher')

The motion gives the result: TOULOUSE and 'forget' MARSEILLE!

c) If we have both the cases 1 and 2, we get the result TOULOUSE, which is completely wrong!

d) If we change one of the shelves in warehouses Marseilles or Toulouse in a radius non appears in the table T_RAYON:

UPDATE T_ENTREPOT SET RAYON_RYN = 'fish' WHERE VILLE_ETP = 'MARSEILLE' AND RAYON_RYN = 'retains'

Our request is obviously the same result, which is false, since only the warehouse Toulouse satisfied now to our request.
Note that sense of Codd, the introduction of T-uple ( "MARSEILLE", "butcher") does not preclude nor the definition of the division, nor the result that we expect. We want to know what stores are able to serve all rays of the table T_RAYONS, even though some stores may serve some rays more ...

Generally speaking we must never assume that the tables are consistent, and in particular that all rays of the table warehouses are listed in the table shelves, or even that there is no redundancy in one any tables. To the extent that our data may come from intermediaries tables or views.

In other words, there may be one or several extra radius in one or several warehouses, it does affect neither the division nor the result.

With the proposed change in the against-examples, here is the new data set to test our queries:

Dividend
T_ENTREPOT
VILLE_ETP RAYON_ETP
PARIS Drink
PARIS fee
PARIS keep
LYON Drink
LYON keep
LYON drugstore
MARSEILLE Drink
MARSEILLE fee
MARSEILLE keep
MARSEILLE drugstore
ANGER Drink
ANGER fee
ANGER drugstore
TOULOUSE Drink
TOULOUSE fee
TOULOUSE keep
TOULOUSE drugstore
Divisor
T_RAYON
RAYON_RYN
Drink
fee
keep
drugstore
Result
T_RESULTAT
 
VILLE_ETP
MARSEILLE
TOULOUSE


3. Raffinements
We can easily delete against example by introducing into our application a keyword additional SQL, the key word DISTINCT:

SELECT VILLE_ETP FROM T_ENTREPOT
GROUP BY VILLE_ETP
HAVING COUNT(*) =
(SELECT COUNT(DISTINCT RAYON_RYN) FROM T_RAYON)

But this does not mean the example against b. ..
To find the butcher, and to eliminate the outcome occurrences of warehouses which have a radius slaughter or any other department not included in the table rays, we can do the following:

Query Result
SELECT DISTINCT RAYON_RYN
FROM T_ENTREPOT
WHERE RAYON_RYN NOT IN
(SELECT RAYON_RYN
FROM T_RAYON)
RAYON_RYN
---------------
slaughter

which in our game test found the radius slaughter.
Therefore eliminate the radius slaughter more hits radius in the table warehouse is more than child's play ...

Query Result
SELECT DISTINCT RAYON_RYN
FROM T_ENTREPOT ETP
WHERE RAYON_RYN NOT IN
( 'butcher')
RAYON_RYN
---------------
Drink
keep
drugstore
fee


In imbriquant both requests, we have:

Query Result
 
SELECT DISTINCT RAYON_RYN
FROM T_ENTREPOT ETP
WHERE RAYON_RYN NOT IN (SELECT DISTINCT RAYON_RYN
FROM T_ENTREPOT
WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN
FROM T_RAYON))
RAYON_RYN
---------------
Drink
keep
drugstore
fee

It can also eliminate most embedded DISTINCT which has more importance:

Query Result
SELECT DISTINCT RAYON_RYN
FROM T_ENTREPOT ETP
WHERE RAYON_RYN NOT IN
(SELECT RAYON_RYN
FROM T_ENTREPOT
WHERE RAYON_RYN NOT IN
(SELECT RAYON_RYN
FROM T_RAYON)
)
RAYON_RYN
---------------
Drink
keep
drugstore
fee

This leads to a shortlist of warehouses whose rays are contained in the table T_RAYON.
Now, if we go back to our basic complaint, simply specify that the radius must be included in the result and voila!

Query Result
SELECT VILLE_ETP
FROM T_ENTREPOT
WHERE RAYON_RYN IN
('boisson',
'conserve',
'droguerie',
'frais')
GROUP BY VILLE_ETP
HAVING COUNT(*) =
(SELECT COUNT(DISTINCT RAYON_RYN)
FROM T_RAYON)
VILLE_ETP
---------------
MARSEILLE
TOULOUSE

Or, imbriquant two requests:

4. A solution

Query Result
SELECT DISTINCT VILLE_ETP
FROM T_ENTREPOT
WHERE RAYON_RYN IN
(SELECT RAYON_RYN
FROM T_ENTREPOT
WHERE RAYON_RYN NOT IN
(SELECT RAYON_RYN
FROM T_ENTREPOT
WHERE RAYON_RYN NOT IN
(SELECT RAYON_RYN
FROM T_RAYON)))
GROUP BY VILLE_ETP
HAVING COUNT (*) =
(SELECT COUNT(DISTINCT RAYON_RYN)
FROM T_RAYON)
VILLE_ETP
---------------
MARSEILLE
TOULOUSE
 

Whew! It is still five SELECT with four levels of nesting to find our solution.


5. A good solution
Joe CELKO, one of the great master of SQL, a specialist queries the most twisted delivers various other ways to make the relational division.
Here is an implementation a little easier, not requiring that 3 nested queries, which seems to be the minimum. She still uses the operator EXISTS testing the existence of lines of a table in another and the correlation of sub queries, which is costly in terms of implementation:

SELECT DISTINCT VILLE_ETP
FROM T_ENTREPOT AS ETP1
WHERE NOT EXISTS
(SELECT *
FROM T_RAYON RYN
WHERE NOT EXISTS
(SELECT *
FROM T_ENTREPOT AS ETP2
WHERE ETP1.VILLE_ETP = ETP2.VILLE_ETP
AND (ETP2.RAYON_RYN = RYN.RAYON_RYN)))

This request is a classic included in most textbooks dealing with relational algebra and their implementation in SQL.

Let's see how this request can be translated into natural language ...
To understand, you just have to put in place each responsible warehouse. The head of the warehouse receives the list of rays that must serve and says in a tone dismissive "there is no department that I can not provide." Obviously leaders capable of such an assertion are those warehouses of Marseilles and Toulouse!

As a direct request this research warehouses for which there is no radius they can not provide… this is, of course, a double negative.

6. A bold solution
If your implementation of the operator has SQL relational basis for the difference between two tables, then the solution is faster ...

SELECT DISTINCT VILLE_ETP
FROM T_ENTREPOT AS ETP1
WHERE
(SELECT RAYON_RYN
FROM T_RAYON
INTERSECT
SELECT RAYON_RYN
FROM T_ENTREPOT AS ETP2
WHERE ETP1.VILLE_ETP = ETP2.VILLE_ETP
) IS NULL

But few SQL server have such an operator, alas, when it is described in the game command base SQL2 (1992).

7. A strange solution
Another approach is to make the product Cartesian separate warehouses with that of various rays, then compare with the T_ENTREPOT table, and select the lines that correspond with this product Cartesian.

The Cartesian product can be achieved with SQL code follows:

SELECT DISTINCT ETP.VILLE_ETP, RYN.RAYON_RYN
FROM T_RAYON RYN, T_ENTREPOT ETP

Who gives the result:

VILLE_ETP RAYON_RYN
ANGER Drink
ANGER keep
ANGER drugstore
ANGER fee
LYON Drink
LYON keep
LYON drugstore
LYON fee
MARSEILLE Drink
MARSEILLE keep
MARSEILLE drugstore
MARSEILLE fee
PARIS Drink
PARIS keep
PARIS drugstore
PARIS fee
TOULOUSE Drink
TOULOUSE keep
TOULOUSE drugstore
TOULOUSE fee

Then we must retain only the warehouses where all lines are found on the table T_ENTREPOT compared to the table containing the Cartesian product thus generated. In principle we should use a command EXISTS, but I propose an original solution, fairly concise using the standard function of SQL allowing concatenation of data replacing the clause EXISTS ...:

SELECT VILLE_ETP
FROM T_ENTREPOT
WHERE VILLE_ETP || RAYON_RYN
IN
(SELECT DISTINCT ETP.VILLE_ETP || RYN.RAYON_RYN
FROM T_RAYON RYN, T_ENTREPOT ETP)
GROUP BY VILLE_ETP
HAVING COUNT(VILLE_ETP) =
(SELECT COUNT(DISTINCT RAYON_RYN)
FROM T_RAYON RYN)

Now a breakdown of how this request ...
a) The first complaint before IN generates the following data:


b) The second complaint, just after IN, generates the following data:


c) finally brings the enumeration:


QED!

8. A copy solution
During a discussion on the Internet intermediary, Didier BOULLE, gave me a solution that develops as follows.
"For example, on your warehouses, would say that you do not try to eliminate warehouses which have a radius slaughter but rather to keep the warehouses that have the rays of the table T_RAYON?"

And to express its SQL:

SELECT VILLE_ETP
FROM T_ENTREPOT
WHERE RAYON_RYN IN
(SELECT RAYON_RYN
FROM T_RAYON)
GROUP BY VILLE_ETP
HAVING COUNT(*) =
(SELECT COUNT(DISTINCT RAYON_RYN)
FROM T_RAYON)

It has a great merit, not to use operator negation or references between SELECT intertwined.

9. A Step Further
A final refinement may be made by requiring that the division is "accurate", meaning that the table dividend corresponds exactly with the values of the divider, neither more nor less.

In the case of our game test this is to find what are the warehouses that have all rays of the table T_RAYON, but also from any other. This could be for example, for reasons of optimizing the distribution than picking a warehouse where all department heads would be occupied and not just a part of them.

One approach is to make the product Cartesian warehouses with that of rays, then compare with the T_ENTREPOT table, and select the lines that correspond exactly with this product Cartesian, neither more nor less.

Again, Joe Celko comes to our rescue us offering a smooth and clean version of the exact division based on double negation:

SELECT VILLE_ETP
FROM T_ENTREPOT ETP1
WHERE NOT EXISTS
(SELECT *
FROM T_RAYON
WHERE NOT EXISTS
(SELECT *
FROM T_ENTREPOT ETP2
WHERE (ETP1.VILLE_ETP = ETP2.VILLE_ETP)
AND (ETP2.RAYON_RYN = T_RAYON.RAYON_RYN)))
GROUP BY VILLE_ETP
HAVING COUNT (*) =
(SELECT COUNT(*)
FROM T_RAYON)

Note that it requires four SELECT including three interwoven!

For example we could have a table with equipment and components necessary to manufacture the aircraft in question, and a table of components in stock.
To make a device, you must have all the components. That is why we say that the division must be accurate. The question is: What devices do I therefore make using all the components that I have in stock?

To help us, here's the test Thursday:

CREATE TABLE T_COMPOSANT
( COMPOSANT_CPS CHAR(16) )

CREATE TABLE T_APPAREIL
( NOM_APR CHAR(16),
COMPOSANT_CPS CHAR(16) )

INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ('Food')
INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ('
box')
INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ('
tuner')
INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ('
engine')

INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
TV ',' tuner')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
TV ',' Food')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
TV ',' CRT')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
TV ',' housing')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
video ',' Food')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
video ',' tuner')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
video ',' engine')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
video ',' box')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
DVD player ',' food')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
DVD player ',' housing')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
DVD player ',' engine')
INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ('
DVD player ',' laser')

The solution:

SELECT NOM_APR
FROM T_APPAREIL APR1
WHERE NOT EXISTS
(SELECT *
FROM T_COMPOSANT
WHERE NOT EXISTS
(SELECT *
FROM T_APPAREIL APR2
WHERE (APR1.NOM_APR = APR2.NOM_APR)
AND (APR2.COMPOSANT_CPS = T_COMPOSANT.COMPOSANT_CPS)))
GROUP BY NOM_APR
HAVING COUNT (*) =
(SELECT COUNT(*)
FROM T_COMPOSANT)

10. Conclusion
Doubtless you've already used the division relational without knowing it. In any case, we may regret the absence of a specific operator of the SQL standard that could be called: DIVIDE. In this case our demands s'ecriraient for example:

SELECT *
FROM T_ENTREPOT
DIVIDE [EXACTLY] T_ENTREPOT ETP BY T_RAYON RYN
ON ETP.RAYON_RYN = RYN.RAYON_RYN

Whose syntax would be:

SELECT  list of columns
FROM list of tables
DIVIDE [EXACTLY] table dividend BY table divider
ON condition reference divider
WHERE ...

11. To learn more about

Advanced SQL, Joe Celko, Vuibert Chapter 20, paragraphs 2 and the following
Relational Division: Four Algorithms and Their Performance. Goetz Graefe. ICDE 1989: 94-101
Learning Division in Elementary School. Relational division primer even your kids could understand. Joe Celko. DBMS on line, Page 72 January, 1994 (Volume 7, Number 1)
Fuzzy relations, ill-known values and the relational division. Patrick Bosc. IRISA / ENSSAT, France
Fuzzy Fuzzy Division in Relational Databases. An approach. J. Galindo, J.M. Medina, M.C. Aranda International Journal of Fuzzy Sets and Systems, 2000
In Fuzzy Fuzzy Identification Databases. The nuanced Relational Division. N. Mouaddib. International Journal of Intelligent Systems, 9 (5), pp. 461-473, May 1994
Database management systems, Ramakrishnan, R Mac Graw Hill 2000 (second edition) p 137
Introduction to databases, Chris Date Vuibert 1998 (6th edition) - ISBN: 2711786404
 
< Prev   Next >
School Joomla Templates and Joomla Tutorials