Saturday, 12 July 2008
I. Introduction

I.A. What is it?

XML stands for Extensible Markup Language. Contrary to what is sometimes said, XML is not a programming language: you cannot write conditional tests in it, or include one file in another. Broadly speaking, XML is only used to store data. It is simply a way of representing data. The data is written inside tags or as attributes, and the whole document is organized as a tree.

I.B. Header and encoding

An XML document should begin with a first line of the form:

  <?xml version="1.0" encoding="ISO-8859-1"?>

This line defines the version number (1.0 here) and the character encoding. The main encodings are:

* ISO-8859-1: USA and Western Europe. This is the simplest way to write a document in French, because each accented letter takes a single character (one byte).
* UTF-8: an international format that lets you write in any language. Every character has a unique code (Unicode). The Latin alphabet (a-z and A-Z), the digits (0-9) and the common punctuation marks are encoded on a single byte. Other characters are encoded on several bytes (from 2 to 4 bytes). French accented characters, for example, take two bytes. See the Unicode site in the list of links at the bottom of the page.

To write and read UTF-8, your software must support UTF-8! Otherwise, characters that take more than one byte show up as a string of bizarre characters.

I.C. Example

  <father name="Gilbert">
    <son name="Victor">That's me</son>
    My father.
  </father>

An XML document has only one root tag. Here it is called father :-) A tag can have attributes, such as name here. An attribute has a name (for example a string of letters A to Z, lower or upper case) and a value written in quotes: name="value". A tag can contain text and/or child tags. We speak of the parent tag and its children, by analogy with a family tree. A tag can also be empty (containing no data besides its name).

I.D. Rules to be observed

To write XML, there are a few basic rules to follow:

1. An XML document contains only one root (parent) tag.

2. A tag must be closed. So forget old HTML and its tags that are closed or not depending on the W3C's mood of the day :-) Example: the img (image) tag must be closed. You no longer write <img src="image.png"> but <img src="image.png" />

3. An empty tag can be written in several ways. Take the HTML tag br as an example. You can write it in three different forms: <br/>, <br /> and <br></br>

4. Whitespace (indentation) is significant in the content of tags, but not in the area between the angle brackets <...>.

     <bag color="black" mark="decathlon">My bag.</bag>

   can be written more "cleanly" (to my taste):

     <bag
        color="black"
        mark="decathlon">My bag.</bag>

   Inside the content of a tag, on the other hand, every space counts.

     <p>
        A short sentence to fill the p tag...
     </p>

   is different from

     <p>A short sentence to fill the p tag...</p>

   or

     <p> A short sentence to fill the p tag... </p>

   In XSLT, however, there is an option that strips the unnecessary spaces and line breaks (those used for indentation).
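The rules of section I.D can be checked with any XML parser. Here is a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree; the variable names are only for illustration and are not part of the original article.

```python
import xml.etree.ElementTree as ET

# The example document from section I.C, written on one line.
DOC = '<father name="Gilbert"><son name="Victor">That\'s me</son>My father.</father>'

root = ET.fromstring(DOC)
son = root.find("son")

root_tag = root.tag              # name of the single root tag
father_name = root.get("name")   # value of the name attribute
son_text = son.text              # text inside <son>
after_son = son.tail             # text that follows </son> inside <father>

# Rule 2: a tag that is never closed is rejected by the parser.
try:
    ET.fromstring('<p><img src="image.png"></p>')
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False

# Rule 3: the three spellings of an empty tag parse to the same empty element.
empty_forms = [ET.fromstring(s) for s in ("<br/>", "<br />", "<br></br>")]
all_empty = all(e.tag == "br" and not e.text and len(e) == 0 for e in empty_forms)

# Rule 4: whitespace between attributes does not matter...
same_attrs = (
    ET.fromstring('<bag color="black" mark="decathlon">My bag.</bag>').attrib
    == ET.fromstring('<bag\n   color="black"\n   mark="decathlon">My bag.</bag>').attrib
)
# ...but whitespace inside the content is kept.
padded = ET.fromstring("<p> A short sentence. </p>").text
compact = ET.fromstring("<p>A short sentence.</p>").text

print(root_tag, father_name, son_text, after_son)
print(unclosed_accepted, all_empty, same_attrs, padded == compact)
```

Note that ElementTree exposes the text that follows a child tag as the tail of that child, which is one way among others of representing mixed content.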
II. An example to understand: an address book

II.A. Representation of a person

The trite but classic example: an address book. We will store a list of people about whom we have various pieces of information (name, e-mail, phone, etc.). Let's start by representing a person. The intuitive representation:

  <person>
    <name>Victor STINNER</name>
    <email>victor.stinner ON haypocalc.com</email>
    <address>282, 7th Street, Quebec (CANADA)</address>
  </person>

This representation has one big disadvantage: you cannot tell the first name from the last name, and you cannot find the country inside the address. You can then use another representation:

  <person>
    <name>
      <firstname>Victor</firstname>
      <name>STINNER</name>
    </name>
    <email>
      <id>victor.stinner</id>
      <server>haypocalc.com</server>
    </email>
    <address>
      <number>282</number>
      <street>7th Street</street>
      <city>Quebec</city>
      <country>CANADA</country>
    </address>
  </person>

The same entry takes a lot more space and takes longer to write. The advantage? You can now identify each piece of information precisely. We can then consider sorting by last name, first name, e-mail server, country, etc. You can also search for people.

You can start to see the data tree taking shape. The country is stored in the address tag, which is itself stored in its parent tag, person. This representation is quite logical, and the location of a tag (in the XML tree) is a piece of information in itself. The tree may contain several name tags that do not necessarily have the same meaning: the name of the address book, the name of a person, the name of a group of people, and so on.

II.B. Representation of the address book

Now that a person is defined, we will define how people are stored in the address book:

  <carnet_adresse>
    <group>
      <name>Friends</name>
      <person>...</person>
      <person>...</person>
    </group>
    <group>
      <name>Work</name>
      <person>...</person>
      <person>...</person>
    </group>
    <group>
      <name>Family</name>
      <person>...</person>
      <person>...</person>
    </group>
  </carnet_adresse>

I have gathered the people into groups. This adds a piece of information about each person: we know which group they belong to. I named each group with a name tag, but you can also use an attribute:

  <group name="Friends"> ... </group>
  <group name="Family"> ... </group>

The choice between a tag and an attribute is left to the programmer. In practice, access to the data is fairly similar. Bear in mind, however, that an attribute is unique (a given attribute name can appear only once on a tag), whereas tags can be repeated.

Note: you can also restrict which tags are allowed by using a DTD or an XML Schema, but we will not cover that in this article.

III. Special Characters and CDATA

Because XML tags are delimited by angle brackets, '<' and '>', you cannot write these characters directly. You must write &lt; and &gt; respectively. As a reminder, lt stands for "less than" and gt for "greater than". For the same reason, the & character itself is written &amp;.

Alternatively, you can write a character by its decimal code. Example: &#38; represents the & character, whose code is 38. Some useful codes:

* 38: the '&' character (ampersand).
* 60: the '<' character (less than).
* 62: the '>' character (greater than).
* 160: the non-breaking space. This character is written &nbsp; in HTML. Personally, I have abandoned it in favor of the HTML pre tag.

But all this is quite restrictive, especially when writing source code (in C for example), which is full of these characters. You can then use CDATA sections. Just open the section with <![CDATA[ and close it with ]]>. Inside a CDATA section, you can use the special characters without any problem. The one thing you cannot write inside it is ]]>, since that sequence ends the section; if it appears in your text, split it across two CDATA sections, or write it as ]]&gt; outside a CDATA section.
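To make the two approaches of this section concrete, here is a small Python sketch, assuming the standard-library helpers xml.sax.saxutils.escape and xml.etree.ElementTree. The line of C code and the code tag are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A line of C source full of special characters (invented example).
source = "if (a < b && b > c) return 0;"

# Option 1: replace each special character by its entity.
escaped = escape(source)
text_from_entities = ET.fromstring("<code>" + escaped + "</code>").text

# Option 2: leave the text untouched inside a CDATA section.
text_from_cdata = ET.fromstring("<code><![CDATA[" + source + "]]></code>").text

# Decimal character codes work the same way as named entities.
from_codes = ET.fromstring("<c>&#38;&#60;&#62;</c>").text

# Text containing ]]> has to be split across two CDATA sections.
tricky = "a]]>b"
wrapped = "<![CDATA[" + tricky.replace("]]>", "]]]]><![CDATA[>") + "]]>"
text_from_split = ET.fromstring("<code>" + wrapped + "</code>").text

print(escaped)
print(text_from_entities == source, text_from_cdata == source)
print(from_codes, text_from_split)
```

Both options give the parser the same text back; the CDATA form simply keeps the XML source more readable when the special characters are frequent.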
CDATA example:

  <![CDATA[<p>A little <b>HTML</b> text that will not be formatted because it is placed in a CDATA section ;-)</p>]]>

IV. Conclusion

IV.A. Advantages

1. An XML document is hierarchical, in the form of a tree. This logical representation makes it possible to run very sophisticated searches inside an XML document.
2. The names of tags, their number and their attributes are not limited. If you want to restrict them, you can use a DTD or an XML Schema.
3. Portability: an XML document is very portable and easy to read. XML tools exist for all current languages (C/C++, Java, PHP, etc.).

IV.B. Disadvantages

* The format is so free that you can end up with incompatible formats. Example: the same piece of information may be stored as a tag in one document and as an attribute in another. You can use a DTD or an XML Schema to prevent these problems.
* XML used as a file format can produce very large files that are hard to edit by hand. For a large number of records, splitting the data into several files or using a database becomes necessary.
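As an illustration of the search and sort possibilities mentioned in II.A and in the advantages above, here is a minimal Python sketch, assuming the standard-library module xml.etree.ElementTree and the address book layout of section II with the group name stored as an attribute. Apart from Victor's entry, the people and addresses are invented sample data, and full_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Sample address book; only Victor's entry comes from the article,
# the other people are invented for the illustration.
BOOK = """
<carnet_adresse>
  <group name="Friends">
    <person>
      <name><firstname>Victor</firstname><name>STINNER</name></name>
      <address><city>Quebec</city><country>CANADA</country></address>
    </person>
    <person>
      <name><firstname>Alice</firstname><name>MARTIN</name></name>
      <address><city>Lyon</city><country>FRANCE</country></address>
    </person>
  </group>
  <group name="Family">
    <person>
      <name><firstname>Gilbert</firstname><name>STINNER</name></name>
      <address><city>Paris</city><country>FRANCE</country></address>
    </person>
  </group>
</carnet_adresse>
"""

book = ET.fromstring(BOOK)


def full_name(person):
    """Hypothetical helper: join the first name and the last name."""
    return person.findtext("name/firstname") + " " + person.findtext("name/name")


# Search: every person whose address is in a given country.
in_france = [
    full_name(p)
    for p in book.iter("person")
    if p.findtext("address/country") == "FRANCE"
]

# Sort: all people ordered by last name, then first name.
by_last_name = [
    full_name(p)
    for p in sorted(
        book.iter("person"),
        key=lambda p: (p.findtext("name/name"), p.findtext("name/firstname")),
    )
]

# The position in the tree is information: each person's group is its parent tag.
group_of = {
    full_name(p): g.get("name")
    for g in book.findall("group")
    for p in g.findall("person")
}

print(in_france)
print(by_last_name)
print(group_of)
```

The path "name/name" works only because the last name is stored in a name tag nested inside the person's name tag; the same tag name at another depth (a group's name, for instance) is not matched.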
Last Updated ( Sunday, 13 July 2008 )