|
Using the DOM API Python with the library PyXML.
|
|
|
|
|
Monday, 14 July 2008
|
HTML clipboardHTML clipboard Introduction DOM: Document Object Model. It is an API for the manipulation of XML files. With this API, we will be able to read an XML file, but also add new elements to this file. DOM perhaps seen as a tree file. We must also know that with the API dom python, everything is element or node, the same text. Thus, if the TAG <A> contains "Hello" it will mean that the node A type ELEMENT node contains the "Hello" type TEXT. But all this will be more clear with the code (at least I hope). In this tutorial, we will simply read a file to transform into objects TAG python that we have defined. We will have an XML with TAGs Persons who will have full name and address. This information will be injected into objects python appropriate that we can then manipulate. For now, we will stop there. Thereafter I would add how to save the changes. It is important that you know the basics of XML and Python. You need the latest version of python and the module PyXML. I. The XML file <quil> <personne> During <name> </ name> <firstname> Stephan </ first name> <address> <ville> Brussels </ city> </ Address> </ person> <personne> <name> Smith </ name> <firstname> Henri </ first name> <adresse/> </ person> </ quil> This XML is a small address book. Obviously, there is little information provided since it will serve us to achieve our little exercise. You can, as an exercise, try to add new elements that will expand. We have a node (or TAG XML) principal called "apparent" (ben yes! Why not!). There can be only one element to this level of hierarchy of the XML file. You can see it as a hard disk. As the first element is equivalent to the root of your disk. The other nodes correspond to nested directories and subdirectories and under subdirectories and so forth. Save this file as "personnes.xml" with the editor of your choice. So, why choose XML rather than a flat file or a database? The answer, you find on the net through sites specializing in comparative. Moi anything I can do about it is to give my humble opinion. XML allows a clearly defined structure, which gives him an advantage over the flat files. Now compared to the DBMS, the main contribution is the ease of use of XML, but also the portability of the application and set up. It can travel from one PC to another with its application and data. No need to network. A USB key containing the soft complete with data continually updated. This is a DBMS does not allow it so simple. The party is against the security and data management if it is an application with great link between data. It is up to you to weigh the pros and cons and make your choice. II. Reading the XML and create objects A. The python script complete class Person: name = None first_name = None Mailing address = () def __init__ (self): Pass class Address: city = None def __init__ (self): Pass TransformXmlToPersonnes class: __currentNode__ = None __personneList__ = None def __init__ (self): self.readXml () def readXml (self): from xml.dom.minidom import parse self.doc = parse ( 'E: / python / samplexml / personnes.xml') def getRootElement (self): If self.__currentNode__ == None: self.__currentNode__ = self.doc.documentElement Return self.__currentNode__ def getPersonnes (self): if self.__personneList__! = None: return self.__personneList__ = [] for people in self.getRootElement (). getElementsByTagName ( "person"): If personnes.nodeType == personnes.ELEMENT_NODE: p = Person () try: p.nom = self.getText (personnes.getElementsByTagName ( "name") [0]) p.prenom = self.getText (personnes.getElementsByTagName ( "name") [0]) p.adresse = self.getAdresse (personnes.getElementsByTagName ( "address") [0]) except: print 'One of TAGS is manquents following: name, address' self.__personneList__.append (p) Return self.__personneList__ def getAdresse (self, node): Mailing address = () try: adress.ville = self.getText (node.getElementsByTagName ( "city") [0]) except: adress.ville = None Return address def getText (self, node): return node.childNodes [0]. nodeValue if __name__ == "__main__": x = TransformXmlToPersonnes () x.getPersonnes print () [1]. behalf You can save this script as "personnes.py." We will dissect this script step by step with the description of different classes and methods.
B. The class Person class Person: name = None first_name = None Mailing address = () def __init__ (self): Pass This class will help us to create objects types of people who will contain all the information collected in the XML file for one person. The address attribute is a class described below. The method "__init__" is the constructor of the class that does nothing. C. The class Address class Address: city = None def __init__ (self): Pass This does not contain an attribute that is the city where the person lives. You can improve the information by yourself in the future. D. The class TransformXmlToPersonnes TransformXmlToPersonnes class: None __currentNode__ = __personneList__ = None def __init__ (self): self.readXml () def readXml (self): from xml.dom.minidom import parse self.doc = parse ( 'personnes.xml') This is the class that will permit the processing of XML file. The method "__init__" uses the method "readXML" who will read the XML file through the method "parse" imported from "xml.dom.minidom." This method takes into argument filename xml to be addressed, ie to transform into an object for python understandably. Hereafter we will explore other methods of our class. def getRootElement (self): If self.__currentNode__ == None: self.__currentNode__ = self.doc.documentElement Return self.__currentNode__ They look if you have already read the first element of the file (in our case "that it"). If yes, it is only returning the attribute __currentNode__ Otherwise, we take the first élémént the document. We call this way of working a "lazy instantiation." It will thereafter work from this first element as if it were a base directory (C: under or win / A * x for example). It goes down and increasingly low in our hierarchy node. def getPersonnes (self): if self.__personneList__! = None: Return self.__personneList__ self.__personneList__ = [] for people in self.getRootElement (). getElementsByTagName ( "person"): If personnes.nodeType == personnes.ELEMENT_NODE: p = Person () try: p.nom = self.getText (personnes.getElementsByTagName ( "name") [0]) p.prenom = self.getText (personnes.getElementsByTagName ( "name") [0]) p.adresse = self.getAdresse (personnes.getElementsByTagName ( "address") [0]) Self.__personneList__.append (p) except: print 'One of TAGS is missing following: name, address' Return self.__personneList__ This is the method most complex of our fiscal year as it will be transformed into knots persons subject Person. As there are several people, we'll create a list in which we will store each object Person created. In the first place we look if the list has already been established and whether it returns the object already existing. Otherwise, it creates a new empty list. Then we create a loop for all elements with the name "person" content dant root element (here: "quil"). For each of these elements, if you look at is a type of knot ELEMENT. If yes, we will create a new Person object and attempt to reclaim the values of knots name, first name and address using the methods "getText" and "getAdresse." In this case, as I know there's only one person by name, I ask her element 0. Indeed the method "getElementsByTagName returns a list. If a node does not exist in the file, a printing error. We can improve that by returning a specific exception. It assigns to each variable and the appropriate value on the subject p adds to our list that is returned when iteration is complete. def getAdresse (self, node): Mailing address = () try: adress.ville = self.getText (node.getElementsByTagName ( "city") [0]) except: None adress.ville = Return address This method uses the same logic as the previous one for the creation of the address for one person. def getText (self, node): return node.childNodes [0]. nodeValue And finally it returns the text for a particular knot. As for the knot "name", I'll get back the text but the text is also a node, I am compelled to "node.childNode [0]." The "nodeValue" is to recover the value. Conclusion As you can see it is quite simple to manipulate an XML file with python and the DOM API. For Java-ists, it looks quite at the JDOM and thus quickly found its brands.
|