|
To be or not to be NULL ... That is the question!
|
|
|
|
|
Tuesday, 27 May 2008
|
1. NULL is? First mistake, NULL, is not a value. Yet we see often flourish in literature as in the forums or web pages, editors say to you that the column thing may take NULL! Or NULL is not a value, it is precisely the lack of value of a column table. In fact NULL is a marker, as is the nil certain programming languages that allow access directly to a memory address (C, Pascal ...). Nil does not address ... He said simply that no assignment of addresses and therefore no reservation memory has yet taken place. As a result of all this, it is not possible to compare a marker as NULL, because it is not just a value. In this case an application like this: SELECT * FROM T_CLIENT WHERE CLI_ENSEIGNE = NULL ... is syntactically wrong and lead mostly to an error message. Except that some deliberately accept RDBMS syntax also violante. 1.1. No NULL please! Some people prefer never to have marked a NULL columns in their databases. This position is a fundamentalist stupidity. For example, if you note the date of birth of people, then what value you give to this column to indicate that you do not know? Easy, you say, there should be an arbitrary value, for example on 1 January 1900 ... As a result, what you've taken a good idea quickly becomes a nightmare ... For once, most applications will become a hell and performance deteriorate. How to calculate the age for example? The same is true of the use of empty string or zero "offset" the NULL. You will air smart when you want to know the average price of a product, if you quantified zero in a line of bill: divide by zero will ensure a mistake that would have avoided NULL you! 1.2. More "NULL" please? Among followers of NULL, the idea has long been emerging to add a few additional markers: inaplicable value, value unknown, etc. ... In a document became famous (Interim Report 75-02-08 see ref. Bibliographical), logicians advocated the use of 14 different markers to reflect situations of incomplete data. Dr. Codd classifia it into two main families: the values inapplicable (eg colour of the roof of a motorcycle) and unknown values (for example eye colour of a person wearing sunglasses). But the thing no shortage of interest! How, for example, make sure to store a date representing TOUJOURS the future? A small example: you have a table of rates applicable deadlines. The deadlines are DATE_DEBUT and DATE_FIN. How to specify the rate the latest must have for the column DATE_FIN, a value still higher than the current date? In other words, how in a database, represent infinity infinity positive or negative? Similarly how to represent the overflow? SELECT CAST (32767 + 1 AS SMALLINT) FROM ... Error exceeded arithmetic for the type of data smallint, value = 32768. Or even, how to represent a mistake to treat it "online" in the application? It would be so much simpler if we could do, for example: SELECT CASE WHEN PRICES IN ERROR IS THEN ELSE'***' CAST (PRICES IN AS VARCHAR (32)) + '€' END AS PRIX_MOYEN FROM ... But the processing logic markers proved so complex and rebutante it was never really included in the standard and that publishers refused to spend any penny to implementation of this extension ... 2. Test markers NULL By default, any predicate (logical expression assessable) containing a marker NULL, must take the value UNKNOWN ... While there, it must be said that UNKNOWN is indeed a value. It's logical 3 state. In other words SQL, a predicate can take the values TRUE, FALSE and UNKNOWN. In principle there should even be able to build predicate genre: SELECT ... FROM ... WHERE (COLONNE1 = COLONNE2) IS UNKNOWN And so this WHERE clause would be true if COLONNE1 or COLONNE2 is marked NULL (or inclusive). But few RDBMS have implemented such a test. Attention to the logic of NULL in predicates ... Thus NULL is TRUE OR TRUE OR FALSE NULL is FALSE, NULL AND FALSE is FALSE But the standard SQL provides us with various tools to test markers NULL: 2.1. Test direct: IS NULL, IS NOT NULL It is simply whether a column or an expression is marked NULL Example: SELECT ... FROM ... WHERE IS NULL COLONNE1 OR IS NULL COLONNE2 SELECT ... FROM ... WHERE (COLONNE1 + COLONNE2) IS NULL 2.2. Denullification: COALESCE It is possible to replace the stolen, so in a complaint, a marker by a NULL value. By exemlple, zero, empty string, etc. ... The particularity of this function is to accept as many parameters as we want (without theoretical limit). His syntax is as follows: COALESCE (expression1 [, expression2 [, expression3 [...]]]) It returns the first quantifiable expression (and therefore not containing marker NULL) in order positional of writing from left to right. Example: SELECT COALESCE (debit, credit * -1, 0) FROM ... In this example, the assessment line by line will return the value of the debit column if it is not marked to NULL, otherwise the value of the expression "credit * -1" if credit is not marked to NULL if not, in the case or debit and credit are marked NULL, the value zero. 2.3. Nullification: NULLIF NULLIF lets you mark a NULL an expression giving a certain value. Example: SELECT NULLIF (debit, credit) FROM ... Renverra marker NULL whenever debit and credit have the same value. So are not marked to NULL, neither one nor the other. 2.4. Structures treatment markers NULL You can also use the facilities CASE and the UNION to treat NULL: SELECT CASE WHEN IS NULL debit credit IS NULL AND THEN WHEN 0 debit credit IS NULL AND THEN IS NOT NULL speed * -1 WHEN IS NOT NULL debit credit IS NULL AND THEN WHEN credit debit IS NOT NULL AND IS NOT NULL credit THEN credit -- debit END FROM ... SELECT 0 FROM ... WHERE IS NULL debit credit IS NULL AND UNION SELECT speed * -1 FROM ... WHERE IS NULL AND debit credit IS NOT NULL UNION SELECT credit FROM ... WHERE IS NOT NULL debit credit IS NULL AND UNION SELECT credit - debit debit WHERE IS NOT NULL AND credit IS NOT NULL 3. External seams Another way to play well with NULL is making external joins. At the very beginning of SQL, such joints were impossible because the base would be perfect! Alas, perfection is frankly disturbing in this case, it was necessary to face the facts: the external joins needed to exist and therefore NULL to proliferate. For a study on external joins and the assumption of closed world, read: joints Therefore, with a left outer join, the party columns of the table right unrelated to the table left sees the values of all its columns marked a NULL. It is of course the opposite in an outer join right. With the outer join on bilateral finds himself with NULL sometimes right and sometimes left. CREATE TABLE T1 (INTEGER I11, I12 INTEGER) CREATE TABLE T2 (INTEGER I21, I22 INTEGER) INSERT INTO T1 VALUES (NULL, 1) INSERT INTO T1 VALUES (1, NULL) INSERT INTO T1 VALUES (1, 1) INSERT INTO T1 VALUES (NULL, NULL) INSERT INTO T2 VALUES (NULL, NULL) INSERT INTO T2 VALUES (1, NULL) INSERT INTO T2 VALUES (NULL, 1) INSERT INTO T2 VALUES (1, 1) INSERT INTO T2 VALUES (1, 2) INSERT INTO T2 VALUES ( 2, 1) INSERT INTO T2 VALUES (2, 2) INSERT INTO T2 VALUES (2, NULL) INSERT INTO T2 VALUES (NULL, 2) SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 I11 I12 I21 I22 ----------- ----------- ----------- ----------- 1 1 1 1 1 NULL NULL NULL NULL 1 NULL NULL NULL NULL NULL NULL T1 SELECT * FROM OUTER JOIN T2 RIGHT ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 I11 I12 I21 I22 ----------- ----------- ----------- ----------- 1 1 1 1 NULL NULL 2 2 NULL NULL 2 1 NULL NULL 2 NULL NULL NULL 1 2 NULL NULL 1 NULL NULL NULL NULL 2 NULL NULL NULL 1 NULL NULL NULL NULL SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 I11 I12 I21 I22 ----------- ----------- ----------- ----------- 1 1 1 1 1 NULL NULL NULL NULL 1 NULL NULL NULL NULL 2 2 NULL NULL 2 1 NULL NULL 2 NULL NULL NULL 1 2 NULL NULL 1 NULL NULL NULL NULL 2 NULL NULL NULL 1 NULL NULL NULL NULL NULL NULL NULL NULL SELECT * FROM T1 UNION JOIN T2 I11 I12 I21 I22 ----------- ----------- ----------- ----------- NULL 1 NULL NULL 1 NULL NULL NULL 1 1 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1 NULL NULL NULL NULL 1 NULL NULL 1 1 NULL NULL 1 2 NULL NULL 2 1 NULL NULL 2 2 NULL NULL 2 NULL NULL NULL NULL 2 Finally, with the JOIN UNION We have a result whose appearance is a kind of matrix of which only the diagonal is powered by the original values of tables, all other parties being nullifies. This small difference will make things interesting when we will add a filter WHERE ... Here is an example of what awaits you: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 WHERE I21 <2 I11 I12 I21 I22 ----------- ----------- ----------- ----------- 1 1 1 1 SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 AND I21 <2 I11 I12 I21 I22 ----------- ----------- ----------- ----------- NULL 1 NULL NULL 1 NULL NULL NULL 1 1 1 1 NULL NULL NULL NULL Depending on whether you filter in the WHERE clause or in the predicate join, the result is not the same. That makes sense because the filtering applies AFTER the knuckle. In other words the outer join retains its NULL. To address one or the other can be done: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.I11 = T2.I21 AND WHERE T1.I12 = T2.I22 alt I21; 2 OR IS NULL I21 I11 I12 I21 I22 ----------- ----------- ----------- ----------- NULL 1 NULL NULL 1 NULL NULL NULL 1 1 1 1 NULL NULL NULL NULL SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 AND I21 <2 WHERE IS NOT NULL I21 I11 I12 I21 I22 ----------- ----------- ----------- ----------- 1 1 1 1 Or use functions and structures denullifiation as indicated in 2.2 and 2.3 4. Treatment markers NULL But how SQL he uses markers NULL in its treatment? That is what we will now talk ... It is said that the markers NULL spread in expressions. Some authors call the "absorbing" ... Example: SELECT 1 + NULL FROM ... ---------------- NULL In conclusion, whatever the term, even if it is calculated, it will be worth UNKNOWN in the presence of one or several markers NULL. 4.1. Declaration variable in the code procedural SQL: DECLARED The key word DECLARED which can specify a variable in a procedural code SQL (stored procedure, function and triggers ...) marks the variable immediately declared null. DECLARE INTEGER SELECT foo foo + I11 AS I FROM T1 I ---------------- NULL NULL NULL NULL It should therefore be set if we want to use it in a speech.
4.2. The aggregate statistics calculations ignore markers NULL: COUNT, SUM, AVG, or MIN MAX As an exception to the rule of propagation NULL, the lack of value in a column did not result in entacher the calculation of an aggregate. If using SUM, AVG, or MIN MAX, the result always return a value if at least one value exists, if the marker NULL. However, COUNT always return a value and never marker NULL: when all occurrences of the column are NULL, COUNT return zero. SELECT AVG (I22) AS AVG, SUM (I22) AS SUM, COUNT (I22) AS COUNT, MAX (I22) AS MAX, MIN (I22) AS FULL T1 MIN FROM OUTER JOIN T2 ON AND T1.I11 = T2.I21 T1.I12 = T2.I22 AVG SUM COUNT MAX MIN ----------- ----------- ----------- ----------- ------ ----- 1 9 6 2 1 SELECT AVG (I22) AS AVG, SUM (I22) AS SUM, MAX (I22) AS MAX, MIN (I22) AS FULL T1 MIN FROM OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 WHERE I22 <1 AVG SUM MAX MIN ----------- ----------- ----------- ----------- NULL NULL NULL NULL SELECT COUNT (I22) AS COUNT FROM T1 FULL OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 WHERE I22 <1 COUNT ----------- 0 Note that your statistics may be biased ... For example, an average calculation is different if one denullifie to 0 or 1 ... Example: SELECT AVG (COALESCE (I22, 0)) AS AVG, SUM (COALESCE (I22, 0)) AS SUM, COUNT (COALESCE (I22, 0)) AS COUNT, MAX (COALESCE (I22, 0)) AS MAX, MIN (COALESCE (I22, 0)) AS FULL T1 MIN FROM OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 AVG SUM COUNT MAX MIN ----------- ----------- ----------- ----------- ------ ----- 0 9 12 2 0 MIN AVG and no longer have the same values ... 4.3. The counting of lines of the table takes into account the lines NULL! : COUNT (*) COUNT (*) is a calculation can aggregate a particular in the sense that it counts the number of lines that owns the table. In this case, even if the line sees all its columns marked a NULL, it is still counted. It is the exception to the exception! SELECT COUNT (*) AS COUNT FROM T1 FULL OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 COUNT ----------- 12 SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.I11 = T2.I21 AND T1.I12 = T2.I22 I11 I12 I21 I22 ----------- ----------- ----------- ----------- NULL NULL NULL NULL NULL NULL 1 NULL NULL NULL NULL 1 1 1 1 1 NULL NULL 1 2 NULL NULL 2 1 NULL NULL 2 2 NULL NULL 2 NULL NULL NULL NULL 2 NULL NULL NULL NULL NULL 1 NULL NULL 1 NULL NULL NULL The 1st and 10th lines entirely NULL have been counted! 4.4. NULL exists! The EXISTS predicate is estimated to be true, even if lines have returned all their columns marked a NULL: CREATE TABLE T_NULL (INTEGER I1, I2 INTEGER) INSERT INTO T_NULL VALUES (NULL, NULL) SELECT * FROM T_NULL I1 I2 ----------- ----------- NULL NULL SELECT 'YES' AS NULL_EXISTE FROM T_NULL WHERE EXISTS (SELECT * FROM T_NULL) NULL_EXISTE ----------- YES Despite the fact that the only line inserted in the table is "empty", EXISTS indicates that there is a line ... It will meet a similar behavior with the predicates and SINGLE MATCH (attention for the single, if several lines are returned NULL, the principle is that unity is maintained !)... 4.5. The agenda NULL ... A interressante question is to know in what order should the markers NULL? As it is not a question of values, as a clause BY ORDER can hardly decide to sort ... But the standard SQL: 1999 (Window partionning, OLAP) has done things, it specifies a parameter of ordonancement NULL which may take NULL FIRST LAST or NULL. The problem is that a publisher may have opted for such a refinement. Thus in the documentation of MS SQL Server, there are the words: "The values are treated as NULL values the smallest possible." So much so that the same basic complaint with an ORDER BY not give the same ranking, DB2 UDB have opted for the opposite! Indeed, in its online help IBM indicates that "Ordering is performed in accordance with the rules described in comparison" Language Elements. "Null The value is higher than all other values." 4.6. NULL with DISTINCT and UNION As bizarre as it may seem, the use of DISTINCT clause in the SELECT order SELECT must not "dedoublonner" lines NULL. Why? Because once again, any comparison of NULL is impossible, so each NULL is different! It is quite logical by the way. For example, if you have a column representing the telephone number of your loved ones and that this column is not always informed, it seems obvious that No you put the empty place markers will be different from each other (or is that you do not take into account the basic principles of standardization!) Example: CREATE TABLE T_SELECT_DISTINCT (COL1 INTEGER, COL2 INTEGER) INSERT INTO T_SELECT_DISTINCT VALUES (NULL, NULL) INSERT INTO T_SELECT_DISTINCT VALUES (NULL, NULL) INSERT INTO T_SELECT_DISTINCT VALUES (NULL, 4) INSERT INTO T_SELECT_DISTINCT VALUES (1, 1) INSERT INTO T_SELECT_DISTINCT VALUES (1, 1) INSERT INTO T_SELECT_DISTINCT VALUES ( 0, 2) * SELECT DISTINCT FROM T_SELECT_DISTINCT COL1 COL2 ----------- ----------- NULL NULL NULL NULL NULL 4 0 2 1 1 SELECT * FROM ALL T_SELECT_DISTINCT COL1 COL2 ----------- ----------- NULL NULL NULL NULL 1 1 1 1 0 2 NULL 4 But to my knowledge, no RDBMS does the dedoublonnement of NULL in this way because all consider that NULL is a "value"! The same is true in the case of the union ... By default the UNION two query results is dedoublonne. Except for lines whose columns are marked to NULL ... CREATE TABLE T_SELECT_DISTINCT_UNION (COL1 INTEGER, COL2 INTEGER) INSERT INTO T_SELECT_DISTINCT_UNION VALUES (NULL, NULL) INSERT INTO T_SELECT_DISTINCT_UNION VALUES (NULL, 4) INSERT INTO T_SELECT_DISTINCT_UNION VALUES (4, NULL) INSERT INTO T_SELECT_DISTINCT_UNION VALUES (1, 1) INSERT INTO T_SELECT_DISTINCT_UNION VALUES (0, 2) SELECT * FROM T_SELECT_DISTINCT UNION SELECT * FROM T_SELECT_DISTINCT_UNION COL1 COL2 ----------- ----------- NULL NULL NULL NULL NULL NULL NULL 4 NULL 4 0 2 1 1 4 NULL SELECT * FROM ALL T_SELECT_DISTINCT UNION SELECT * FROM T_SELECT_DISTINCT_UNION COL1 COL2 ----------- ----------- NULL NULL NULL NULL NULL NULL NULL 4 NULL 4 0 2 0 2 1 1 1 1 1 1 4 NULL Even finding among publishers RDBMS! No publisher has thought about keeping duplicates of NULL in the UNION distinct ... 5. Definition table and constraints In the creation of tables of a database, a lot of elements based on the issue of treatment of NULL. We will see some ... 5.1. Coercion column NUL / NOT NULL In the absence of any specification in the order of creation, a column can not contain any value. In this case, the two formulations: CREATE TABLE T_ANOTHER_NULL (COL1 INTEGER, COL2 VARCHAR (16)) CREATE TABLE T_ANOTHER_NULL (COL1 INTEGER NULL, COL2 VARCHAR (16) NULL) are equivalent ... because by default, coercion is: NULL marker possible. If one wants to compel the column have always specify a value must be added duress NOT NULL. 5.2. Coercion of uniqueness In a unique, if one does not specify NOT NULL constraint as columns under duress "unique" then there may be as many tuple compelled not having specific values. Example: CREATE TABLE T_UNIQUE_NULL (COL1 INTEGER, COL2 INTEGER, CONSTRAINT UNICOL UNIQUE (COL1, COL2)) INSERT INTO T_UNIQUE_NULL VALUES (1, 1) INSERT INTO T_UNIQUE_NULL VALUES (1, NULL) INSERT INTO T_UNIQUE_NULL VALUES (NULL, 1) INSERT INTO T_UNIQUE_NULL VALUES (NULL, NULL) INSERT INTO T_UNIQUE_NULL VALUES (NULL, NULL) INSERT INTO T_UNIQUE_NULL VALUES ( 1, NULL) INSERT INTO T_UNIQUE_NULL VALUES (1, 1) The constraint will be raped THAT for the last line. The 5th cause the insertion of a duplicate line NULL without triggering violated forced to unity, because the comparison between NULL is by definition a heresy! 5.3. Other constraints The violation of integrity constraints in the presence of marker NULL depends on the MATCH clause that can tell how deal with the absence of value. Thus, a "MATCH SIMPLE" implies that all columns component coercion integrity shall be filled. In case one of the column has a marker NULL, coercion is breached. A read on the subject: .. / sqlaz / ddl /? Page = part2 # L7.3.1 But few go so far RDBMS in the management of referential integrity. 6. Updated with NULL SQL we make life easier with the help of NULL for inserts and updates ... 6.1. NULL and INSERT One can specify the marker NULL value as the absence of columns during an insertion Example: INSERT INTO T_CLIENT (CLI_ID, TIT_CODE, CLI_NOM, CLI_PRENOM, CLI_ENSEIGNE) VALUES (999, 'Mr.', 'DUMOULIN', 'Alfred', NULL) INSERT INTO T_CLIENT VALUES (999, 'Mr.', 'DUMOULIN', 'Alfred', NULL) NOTE: these two requests are equivalent There is no need to specify the marker NULL if it does not specify the column in the list of target columns. Example: INSERT INTO T_CLIENT (CLI_ID, TIT_CODE, CLI_NOM, CLI_PRENOM) VALUES (999, 'Mr.', 'DUMOULIN', 'Alfred') NOTE: This request is equivalent to the previous two. You can specify a default NULL constraint and make use of keyword DEFAULT during insertion. Example: CREATE TABLE T_DEFAUT_NULL (COL1 INTEGER NULL DEFAULT, COL2 INTEGER) INSERT INTO T_DEFAUT_NULL VALUES (DEFAULT, DEFAULT) INSERT INTO T_DEFAUT_NULL VALUES (NULL DEFAULT) INSERT INTO T_DEFAUT_NULL VALUES (DEFAULT, NULL) SELECT * FROM T_DEFAUT_NULL COL1 COL2 ----------- ----------- NULL NULL NULL NULL NULL NULL 6.2. NULL and UPDATE You can delete the value of a column by assigning the marker NULL. Example: UPDATE T_CLIENT SET CLI_ENSEIGNE = NULL UPDATE T_CLIENT SET CLI_ENSEIGNE = DEFAULT These two submissions will lead both to erase all the values contained in the column CLI_ENSEIGNE the table T_CLIENT, provided that the column is CLI_ENSEIGNE "NULLABLE". NOTE: it may be noted that the standard failed to finesse using a "= NULL" evil about desassignation for valuables, then it requires us to use and non-IS NULL = NULL in a predicate. We would have preferred to remain coherent and add a desassignation as: SET NULL for example. 7. An unusual feature of some RDBMS 7.1. Other processing functions NULL Some RDBMS using other functions for denullifier. For example can be found under NVL Oracle, in addition to COALESCE: this function only 2 arguments. There are also alternative under SQL Server ISNULL function. 7.2. NULL constraint as default Sybase and SQL Server are on the subject, the opposite of what the standard specifies. Thus, any table created without precision NULL are NOT NULL! 7.3. NULL and CHAR Sybase does not CHAR without coercion NOT NULL. It transforms a VARCHAR. In fact it stores being NULL for a VARCHAR by putting a string empty! More curious still, if you try to put a string vacuum in this column, then it will not actually empty. It will consist of a single character white! 7.4. NULL and UNIQUE Few RDBMS forced to accept unity as defined in the standard. SQL Server refuses and considers that different lines NULL violent coercion of uniqueness. It accepts a single line with NULL, while DB2 refuses any NULL. 7.5. Seams and external old syntaxes Most of the time, the old syntax treatment of external joins are inconsistent. Read the demonstration was done by Steve Kass of Drew University on SQL Server and different results obtained for a two complaints logically equivalent, one based on normative join syntax and the other on the former syntax =*... The joints 7.6. Predicate = NULL! Even more amazing ... Under SQL Server, the complaint: SELECT * FROM T_CLIENT WHERE CLI_ENSEIGNE = NULL causes no syntax error, and does not give the same result as: SELECT * FROM T_CLIENT WHERE IS NULL CLI_ENSEIGNE
|
|
|