|
The division relational, myth or reality?
|
|
|
|
|
Wednesday, 28 May 2008
|
Preamble A little reminder being better than a good debate, here are the eight basic operations of algebra relational Dr. CODD and their transcription SQL. PROJECTION SELECT [ALL] [DISTINCT] list of attributes FROM table SELECTION SELECT list of attributes FROM table WHERE condition JOINTURE SELECT list of attributes FROM table1 JOIN table2 ON table1.attribut1 = table2.attribut1 ... DIVISION ? UNION SELECT list of attributes table FROM UNION SELECT list of attributes FROM table INTERSECTION SELECT list of attributes FROM table INTERSECT SELECT list of attributes FROM table can also use a WHERE clause with the NOT IN SELECT list of attributes table1 FROM WHERE attribut1 NOT IN SELECT list of attributes FROM table2 DIFFERENCE SELECT list of attributes table EXCEPT FROM SELECT list of attributes FROM table PRODUCT SELECT * FROM table1, table2 (no where) or SELECT * FROM table1 CROSS JOIN table2 1. Problem Definition A good example is better than an explanation tarabiscotée, here's the question we will ask: We are in a business of large retailers with stores in different cities of France. These warehouses can accommodate products of different departments. The question is: what are the warehouses capable of serving ALL-rays? Of course we have a table of warehouses and other rays ... To touch the issue of this question, the easiest way is to see this concept ensembliste in the form of diagrams Wen. Yes, I see, it does remind you nothing but alas… if: potatoes! Nothing could be simpler than see the answer: just find warehouses, which are connected to ALL rays. The answer is obvious here, and only TOULOUSE MARSEILLE meet the criteria of the question (as what, south east ahead of the north in our example). Let us study the pattern tabular and the response that we want to ... To help us, here is the script creation of the database and data: /************************************************* / / * THE DIVISION RELATIONNELLE, MYTH OR REALITY? * / / * * Frederic BROUARD
This e-mail address is being protected from spam bots, you need JavaScript enabled to view it
/ /********************************** ***************/ /***********************/ / * CREATION OF TABLES * / / * **********************/ / * Table-ray * / CREATE TABLE T_RAYON (RAYON_RYN CHAR (16)) / * tables warehouses * / CREATE TABLE T_ENTREPOT (VILLE_ETP CHAR (16, RAYON_RYN CHAR (16)) /***************************/ / * FOOD OF TABLES * / / * **************************/ / * Food ray table * / INSERT INTO T_RAYON (RAYON_RYN) VALUES ( 'fresh') INSERT INTO T_RAYON (RAYON_RYN) VALUES ( 'drink') INSERT INTO T_RAYON (RAYON_RYN) VALUES ( 'keep') INSERT INTO T_RAYON (RAYON_RYN) VALUES ( 'drugstore') / * feeding the table warehouses * / INSERT INTO T_ENTREPOT (VILLE_ETP , RAYON_RYN) VALUES ( 'PARIS', 'drink') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'PARIS', 'fresh') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'PARIS', 'keep') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'LYON', 'drink') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'LYON', 'keep') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'LYON', ' drugstore ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES (' MARSEILLE ',' drink ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES (' MARSEILLE ',' fresh ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES (' MARSEILLE ',' keep ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES (' MARSEILLE ',' drugstore ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES (' ANGER ',' drink ') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN ) VALUES ( 'ANGER', 'fresh') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'ANGER', 'drugstore') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'TOULOUSE', 'drink') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'TOULOUSE', 'fresh') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'TOULOUSE', 'keep') INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'TOULOUSE', 'drugstore' ) And the sight of this division table: Dividend T_ENTREPOT VILLE_ETP RAYON_ETP PARIS Drink PARIS fee PARIS keep LYON Drink LYON keep LYON drugstore MARSEILLE Drink MARSEILLE fee MARSEILLE keep MARSEILLE drugstore ANGER Drink ANGER fee ANGER drugstore TOULOUSE Drink TOULOUSE fee TOULOUSE keep TOULOUSE drugstore Divisor T_RAYON RAYON_RYN Drink fee keep drugstore Result T_RESULTAT VILLE_ETP MARSEILLE TOULOUSE To "prove" that our division is correct just know that the Cartesian product of the result with the divisor gives a number of valid line of the table dividend. That is the result of: SELECT * FROM T_RESULTAT, T_RAYON is fully included in T_ENTREPOT. 2. Primary Test A first implementation that comes immediately to mind is to count the number of occurrences of rays and to coincide with the enumeration of rays of various warehouses: VILLE_ETP SELECT GROUP FROM T_ENTREPOT BY HAVING VILLE_ETP COUNT (*) = (SELECT COUNT (*) FROM T_RAYON) Who gives the result: MARSEILLE, TOULOUSE This solution, apparently good, only works in cases relatively limited: where there is no redundancy of data in the table divider where there is no radius in addition to a warehouse, not identified in the table T_ENTREPOT AGAINST EXAMPLES .. a) if the table T_RAYON contains: T_RAYON RAYON_RYN Drink fee keep drugstore keep after for example: INSERT INTO T_RAYON VALUES ( 'keep') The previous query gives a result void. b) If the table T_ENTREPOT contains the following additional line: INSERT INTO T_ENTREPOT (VILLE_ETP, RAYON_RYN) VALUES ( 'MARSEILLE', 'butcher') The motion gives the result: TOULOUSE and 'forget' MARSEILLE! c) If we have both the cases 1 and 2, we get the result TOULOUSE, which is completely wrong! d) If we change one of the shelves in warehouses Marseilles or Toulouse in a radius non appears in the table T_RAYON: UPDATE T_ENTREPOT SET RAYON_RYN = 'fish' WHERE VILLE_ETP = 'MARSEILLE' AND RAYON_RYN = 'retains' Our request is obviously the same result, which is false, since only the warehouse Toulouse satisfied now to our request.
Note that sense of Codd, the introduction of T-uple ( "MARSEILLE", "butcher") does not preclude nor the definition of the division, nor the result that we expect. We want to know what stores are able to serve all rays of the table T_RAYONS, even though some stores may serve some rays more ... Generally speaking we must never assume that the tables are consistent, and in particular that all rays of the table warehouses are listed in the table shelves, or even that there is no redundancy in one any tables. To the extent that our data may come from intermediaries tables or views. In other words, there may be one or several extra radius in one or several warehouses, it does affect neither the division nor the result. With the proposed change in the against-examples, here is the new data set to test our queries: Dividend T_ENTREPOT VILLE_ETP RAYON_ETP PARIS Drink PARIS fee PARIS keep LYON Drink LYON keep LYON drugstore MARSEILLE Drink MARSEILLE fee MARSEILLE keep MARSEILLE drugstore ANGER Drink ANGER fee ANGER drugstore TOULOUSE Drink TOULOUSE fee TOULOUSE keep TOULOUSE drugstore Divisor T_RAYON RAYON_RYN Drink fee keep drugstore keep Result T_RESULTAT VILLE_ETP MARSEILLE TOULOUSE 3. Raffinements We can easily delete against example by introducing into our application a keyword additional SQL, the key word DISTINCT: VILLE_ETP SELECT GROUP FROM T_ENTREPOT BY HAVING VILLE_ETP COUNT (*) = (SELECT COUNT (DISTINCT RAYON_RYN) FROM T_RAYON) But this does not mean the example against b. .. To find the butcher, and to eliminate the outcome occurrences of warehouses which have a radius slaughter or any other department not included in the table rays, we can do the following: Query Result RAYON_RYN SELECT DISTINCT FROM T_ENTREPOT WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN FROM T_RAYON) RAYON_RYN --------------- slaughter which in our game test found the radius slaughter. Therefore eliminate the radius slaughter more hits radius in the table warehouse is more than child's play ... Query Result RAYON_RYN SELECT DISTINCT FROM T_ENTREPOT FTE RAYON_RYN WHERE NOT IN ( 'butcher') RAYON_RYN --------------- Drink keep drugstore fee In imbriquant both requests, we have: Query Result RAYON_RYN SELECT DISTINCT FROM T_ENTREPOT FTE RAYON_RYN WHERE NOT IN (SELECT DISTINCT RAYON_RYN FROM T_ENTREPOT WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN FROM T_RAYON)) RAYON_RYN --------------- Drink keep drugstore fee It can also eliminate most embedded DISTINCT which has more importance: Query Result RAYON_RYN SELECT DISTINCT FROM T_ENTREPOT FTE RAYON_RYN WHERE NOT IN (SELECT RAYON_RYN FROM T_ENTREPOT WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN FROM T_RAYON)) RAYON_RYN --------------- Drink keep drugstore fee This leads to a shortlist of warehouses whose rays are contained in the table T_RAYON. Now, if we go back to our basic complaint, simply specify that the radius must be included in the result and voila! Query Result SELECT VILLE_ETP FROM T_ENTREPOT WHERE RAYON_RYN IN ( 'drink', 'keep', 'drugstore', 'fresh') GROUP BY HAVING VILLE_ETP COUNT (*) = (SELECT COUNT (DISTINCT RAYON_RYN) FROM T_RAYON) VILLE_ETP --------------- MARSEILLE TOULOUSE Or, imbriquant two requests: 4. A solution Query Result VILLE_ETP SELECT DISTINCT FROM T_ENTREPOT WHERE RAYON_RYN IN (SELECT RAYON_RYN FROM T_ENTREPOT WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN FROM T_ENTREPOT WHERE RAYON_RYN NOT IN (SELECT RAYON_RYN FROM T_RAYON))) GROUP BY HAVING VILLE_ETP COUNT (*) = (SELECT COUNT (DISTINCT RAYON_RYN) FROM T_RAYON) VILLE_ETP --------------- MARSEILLE TOULOUSE Whew! It is still five SELECT with four levels of nesting to find our solution. 5. A good solution Joe CELKO, one of the great master of SQL, a specialist queries the most twisted delivers various other ways to make the relational division. Here is an implementation a little easier, not requiring that 3 nested queries, which seems to be the minimum. She still uses the operator EXISTS testing the existence of lines of a table in another and the correlation of sub queries, which is costly in terms of implementation: VILLE_ETP SELECT DISTINCT FROM T_ENTREPOT AS ETP1 WHERE NOT EXISTS (SELECT * FROM T_RAYON RYN WHERE NOT EXISTS (SELECT * FROM T_ENTREPOT AS ETP2 WHERE ETP1.VILLE_ETP = ETP2.VILLE_ETP AND (ETP2.RAYON_RYN = RYN.RAYON_RYN))) This request is a classic included in most textbooks dealing with relational algebra and their implementation in SQL. Let's see how this request can be translated into natural language ... To understand, you just have to put in place each responsible warehouse. The head of the warehouse receives the list of rays that must serve and says in a tone dismissive "there is no department that I can not provide." Obviously leaders capable of such an assertion are those warehouses of Marseilles and Toulouse! As a direct request this research warehouses for which there is no radius they can not provide… this is, of course, a double negative. 6. A bold solution If your implementation of the operator has SQL relational basis for the difference between two tables, then the solution is faster ... VILLE_ETP SELECT DISTINCT FROM T_ENTREPOT AS ETP1 WHERE (SELECT RAYON_RYN FROM T_RAYON INTERSECT SELECT RAYON_RYN FROM T_ENTREPOT AS ETP2 WHERE ETP1.VILLE_ETP = ETP2.VILLE_ETP) IS NULL But few SQL server have such an operator, alas, when it is described in the game command base SQL2 (1992). 7. A strange solution Another approach is to make the product Cartesian separate warehouses with that of various rays, then compare with the T_ENTREPOT table, and select the lines that correspond with this product Cartesian. The Cartesian product can be achieved with SQL code follows: SELECT DISTINCT ETP.VILLE_ETP, RYN.RAYON_RYN FROM T_RAYON RYN, T_ENTREPOT FTE Who gives the result: VILLE_ETP RAYON_RYN ANGER Drink ANGER keep ANGER drugstore ANGER fee LYON Drink LYON keep LYON drugstore LYON fee MARSEILLE Drink MARSEILLE keep MARSEILLE drugstore MARSEILLE fee PARIS Drink PARIS keep PARIS drugstore PARIS fee TOULOUSE Drink TOULOUSE keep TOULOUSE drugstore TOULOUSE fee Then we must retain only the warehouses where all lines are found on the table T_ENTREPOT compared to the table containing the Cartesian product thus generated. In principle we should use a command EXISTS, but I propose an original solution, fairly concise using the standard function of SQL allowing concatenation of data replacing the clause EXISTS ...: SELECT VILLE_ETP FROM T_ENTREPOT WHERE VILLE_ETP | | RAYON_RYN IN (SELECT DISTINCT ETP.VILLE_ETP | | RYN.RAYON_RYN FROM T_RAYON RYN, T_ENTREPOT FTE) GROUP BY HAVING VILLE_ETP COUNT (VILLE_ETP) = (SELECT COUNT (DISTINCT RAYON_RYN) FROM T_RAYON RYN) Now a breakdown of how this request ... a) The first complaint before IN generates the following data: b) The second complaint, just after IN, generates the following data: c) finally brings the enumeration: QED! 8. A copy solution During a discussion on the Internet intermediary, Didier BOULLE, gave me a solution that develops as follows. "For example, on your warehouses, would say that you do not try to eliminate warehouses which have a radius slaughter but rather to keep the warehouses that have the rays of the table T_RAYON?" And to express its SQL: SELECT VILLE_ETP FROM T_ENTREPOT WHERE RAYON_RYN IN (SELECT RAYON_RYN FROM T_RAYON) GROUP BY HAVING VILLE_ETP COUNT (*) = (SELECT COUNT (DISTINCT RAYON_RYN) FROM T_RAYON) It has a great merit, not to use operator negation or references between SELECT intertwined. 9. A Step Further A final refinement may be made by requiring that the division is "accurate", meaning that the table dividend corresponds exactly with the values of the divider, neither more nor less. In the case of our game test this is to find what are the warehouses that have all rays of the table T_RAYON, but also from any other. This could be for example, for reasons of optimizing the distribution than picking a warehouse where all department heads would be occupied and not just a part of them. One approach is to make the product Cartesian warehouses with that of rays, then compare with the T_ENTREPOT table, and select the lines that correspond exactly with this product Cartesian, neither more nor less. Again, Joe Celko comes to our rescue us offering a smooth and clean version of the exact division based on double negation: SELECT VILLE_ETP FROM T_ENTREPOT ETP1 WHERE NOT EXISTS (SELECT * FROM T_RAYON WHERE NOT EXISTS (SELECT * FROM T_ENTREPOT ETP2 WHERE (ETP1.VILLE_ETP = ETP2.VILLE_ETP) AND (ETP2.RAYON_RYN = T_RAYON.RAYON_RYN))) GROUP BY HAVING VILLE_ETP COUNT ( *) = (SELECT COUNT (*) FROM T_RAYON) Note that it requires four SELECT including three interwoven! For example we could have a table with equipment and components necessary to manufacture the aircraft in question, and a table of components in stock. To make a device, you must have all the components. That is why we say that the division must be accurate. The question is: What devices do I therefore make using all the components that I have in stock? To help us, here's the test Thursday: CREATE TABLE T_COMPOSANT (COMPOSANT_CPS CHAR (16)) CREATE TABLE T_APPAREIL (NOM_APR CHAR (16), COMPOSANT_CPS CHAR (16)) INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ( 'food') INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ( 'cabinet') INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ( 'tuner') INSERT INTO T_COMPOSANT (COMPOSANT_CPS) VALUES ( 'engine') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'TV', 'tuner') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'TV', 'food') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'TV', 'CRT') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'TV', 'box') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'video', 'food') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'video', 'tuner') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'video', 'engine' ) INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'video', 'box') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'DVD player', 'food') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES ( 'player DVD ',' box ') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES (' DVD player ',' engine ') INSERT INTO T_APPAREIL (NOM_APR, COMPOSANT_CPS) VALUES (' DVD player ',' laser ') The solution: SELECT NOM_APR FROM T_APPAREIL APR1 WHERE NOT EXISTS (SELECT * FROM T_COMPOSANT WHERE NOT EXISTS (SELECT * FROM T_APPAREIL APR2 WHERE (APR1.NOM_APR = APR2.NOM_APR) AND (APR2.COMPOSANT_CPS = T_COMPOSANT.COMPOSANT_CPS))) GROUP BY HAVING NOM_APR COUNT ( *) = (SELECT COUNT (*) FROM T_COMPOSANT) 10. Conclusion Doubtless you've already used the division relational without knowing it. In any case, we may regret the absence of a specific operator of the SQL standard that could be called: DIVIDE. In this case our demands s'ecriraient for example: SELECT * FROM T_ENTREPOT DIVIDE [EXACTLY] T_ENTREPOT FTE BY T_RAYON RYN ON ETP.RAYON_RYN = RYN.RAYON_RYN Whose syntax would be: SELECT list of columns FROM list of tables DIVIDE [EXACTLY] table dividend BY table divider ON condition reference divider WHERE ...
|
|
|