Python // Misc: XML and Python (the good, the bad, and the ugly)

You thought you were past this: “I don’t need to learn stinking XML; this is 2018.” Then you dove head-first into Juniper and PyEZ, ran into a problem, and had to peel back the onion. And what did you see staring back at you? A big, ugly, no-fun bunch of XML.

Come on, this is 2018; can’t I just deal with JSON?

Unfortunately, the answer might be: no. You still have:

  • NETCONF which uses XML under-the-hood
  • APIs that use XML (for example, NX-API)
  • Various forms of ‘| xml’ where you can output data into XML format
  • An XML mode on some platforms, for example, the ‘xml agent’ on IOS-XR

But, but, but what about newer solutions like RESTCONF, pipe JSON, NX-API using JSON/JSON-RPC, gRPC, et cetera? Unfortunately in the networking world, we have a little problem called all those pesky field devices; field devices that have a tendency to live a long, long, long time. So yes, it would be great if we could use the bright, shiny, new thing (TM). But there are many, many situations where that is not going to be practical, whether the limiting factor is new hardware, new software, or simply that the new solution is not yet sufficiently reliable.

So unfortunately, you might still need to learn XML. 

I fall into this same bucket. I have spent quite a bit of time avoiding, evading, hiding from, and ducking XML, but now I am writing an XML driver for NAPALM and NX-OS.


Here are a few things that I have learned about XML and Python:

  • A lot of the Python-XML related documentation (for beginners) is poor. I have searched and searched, on more than one occasion, and finding good documentation on XML and Python was hard.
  • The three key Python-XML libraries are: xmltodict, the builtin XML library, and lxml.
  • XML is meaningfully harder than JSON. There is just no real way around this.
  • You need to think of and visualize XML as a tree with the elements (XML tags) as nodes in the tree. This will help your understanding of XML.
  • XML namespaces are a pain.

Yes, these are not rocket science, but they still might help you. Here are some more details on some of these points.

Visualizing XML

When you are considering XML, you really need to visualize it as a tree. For example, if we have this XML from a Juniper SRX (slightly modified):

<rpc-reply xmlns:junos="test_namespace">
    <software-information>
        <host-name>pynet-jnpr-srx1</host-name>
        <product-model>srx100h2</product-model>
        <product-name>srx100h2</product-name>
        <jsr/>
        <package-information>
            <name>junos</name>
            <comment>JUNOS Software Release [12.1X44-D35.5]</comment>
        </package-information>
    </software-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>

Visualized as a tree, this would look as follows:

[Image: tree diagram of the XML above, with rpc-reply as the root node]

If you go back and look at the XML, you can see that the XML tags have become the nodes in the tree. These tags are known as element nodes.

You can also see these tags form a hierarchy. In other words, the root node is the rpc-reply tag and underneath rpc-reply are the software-information and cli tags. These two nodes are child nodes of “rpc-reply”. Similarly, the software-information node has several child nodes including “host-name”, “product-model”, and “product-name”. Each node has exactly one parent node. For example, the parent node of host-name is the “software-information” node. The only exception to this is the root node, which does not have a parent node.
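Here is a minimal sketch of that hierarchy using the builtin ElementTree library. It parses the XML from above and prints each tag indented under its parent. The show_tree helper is a name I made up for this example; it is not part of any library.

```python
import xml.etree.ElementTree as ET

XML = """<rpc-reply xmlns:junos="test_namespace">
    <software-information>
        <host-name>pynet-jnpr-srx1</host-name>
        <product-model>srx100h2</product-model>
        <product-name>srx100h2</product-name>
        <jsr/>
        <package-information>
            <name>junos</name>
            <comment>JUNOS Software Release [12.1X44-D35.5]</comment>
        </package-information>
    </software-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>"""


def show_tree(elem, depth=0):
    """Hypothetical helper: return one indented line per element node."""
    lines = ["    " * depth + elem.tag]
    # Iterating over an element yields its direct children, in document order.
    for child in elem:
        lines.extend(show_tree(child, depth + 1))
    return lines


# fromstring() returns the root element of the tree.
root = ET.fromstring(XML)
root_children = [child.tag for child in root]
tree_lines = show_tree(root)

print("\n".join(tree_lines))
print("Children of the root:", root_children)
```

The first line printed is the root (rpc-reply), and each additional level of indentation is one step further down the tree.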

One other aspect of understanding XML is realizing that not only are there element nodes in the tree, but there are also text and attributes in the tree. For example, with the following section:

<rpc-reply xmlns:junos="test_namespace">
    <software-information>
        <host-name>pynet-jnpr-srx1</host-name>

The rpc-reply tag has an attribute named “xmlns:junos” (strictly speaking, this particular attribute is a namespace declaration). Similarly, you can see that the host-name element has the text “pynet-jnpr-srx1” associated with it. The key takeaway from this is: XML is not just a structured combination of lists and dictionaries; it is something different, and we probably need to do special processing to handle it.

Note, here I am explicitly using the Python ElementTree and lxml model of the XML-tree where only elements (XML tags) are nodes in the tree and neither XML-text nor XML-attributes are considered nodes in the tree. This is different from how XML is modeled in other contexts. For example, in the browser DOM (document object model) XML is modeled as a tree where the elements, text, and attributes are all nodes.
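To make this model concrete, here is a small sketch using the builtin ElementTree library. Text and attributes show up as properties of an element (.text and .attrib) rather than as child nodes. Note that the style="brief" attribute is something I added purely for illustration; it is not in the original output. Also be aware that ElementTree treats an xmlns declaration as a namespace mapping, so it does not appear in .attrib.

```python
import xml.etree.ElementTree as ET

# style="brief" is a hypothetical attribute added only for this example.
XML = """<rpc-reply xmlns:junos="test_namespace">
    <software-information style="brief">
        <host-name>pynet-jnpr-srx1</host-name>
    </software-information>
</rpc-reply>"""

root = ET.fromstring(XML)
sw_info = root.find("software-information")
host = sw_info.find("host-name")

# Text hangs off the element; it is not a child node.
host_text = host.text
host_children = list(host)

# Attributes are exposed as a dictionary on the element.
sw_attrs = dict(sw_info.attrib)
style = sw_info.get("style")

# The xmlns:junos declaration is consumed by the parser as a namespace
# mapping, so ElementTree does not list it as a regular attribute.
root_attrs = dict(root.attrib)

print("host-name text:", host_text)
print("host-name children:", host_children)
print("software-information attributes:", sw_attrs)
print("rpc-reply attributes:", root_attrs)
```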

Python Libraries for XML

1. xmltodict library
2. Builtin XML library (ElementTree)
3. lxml library

For a quick, high-level rundown of these three libraries read the following…

The xmltodict library tries to convert the tree-structure of XML into a Python dictionary. 

Basically, it tries to convert a tag hierarchy into nested, ordered Python dictionaries where the key is the tag-name and the value is potentially an inner ordered dictionary, a text string, or a list. One of the main issues with doing this, however, is the “XML string-list problem”. Basically, xmltodict will return a single element as a text string, whereas multiple elements with the same tag will be returned as a list of strings. There can be other cases where the data type changes; for example, an attribute is added and then an OrderedDict is returned instead of a string.

I recognize it is probably hard to visualize what I am talking about here without providing a bunch of examples, but let me say we are trying to stuff a square peg into a round hole. We have something that is clearly not just lists, dictionaries, and strings and we are trying to convert it to lists, dictionaries, and strings.
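Here is one small example of the string-list problem. To keep it runnable with only the standard library, it uses a simplified stand-in converter (naive_to_dict, a hypothetical function written for this example) instead of xmltodict itself. It is not xmltodict's actual code, but it changes data types in the same way described above, and it borrows xmltodict's “@” and “#text” key conventions. The interfaces XML is also made up.

```python
import xml.etree.ElementTree as ET


def naive_to_dict(elem):
    """Hypothetical, simplified converter (NOT xmltodict's real code)."""
    children = list(elem)
    if not children:
        if elem.attrib:
            # A leaf with attributes can no longer be a plain string.
            result = {"@" + key: value for key, value in elem.attrib.items()}
            result["#text"] = elem.text
            return result
        # A leaf without attributes collapses to its text.
        return elem.text
    # Attributes on non-leaf elements are ignored in this simplified sketch.
    result = {}
    for child in children:
        value = naive_to_dict(child)
        if child.tag in result:
            # A repeated tag promotes the existing value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


one = ET.fromstring("<interfaces><name>ge-0/0/0</name></interfaces>")
two = ET.fromstring(
    "<interfaces><name>ge-0/0/0</name><name>ge-0/0/1</name></interfaces>"
)
with_attr = ET.fromstring(
    '<interfaces><name unit="0">ge-0/0/0</name></interfaces>'
)

single = naive_to_dict(one)["name"]            # a string
multiple = naive_to_dict(two)["name"]          # a list of strings
attributed = naive_to_dict(with_attr)["name"]  # a dictionary

for value in (single, multiple, attributed):
    print(type(value).__name__, value)
```

The same key (“name”) comes back as three different data types depending on the input, so any code consuming the result has to check the type before using it.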

Note, there are ways you can potentially work around these problems in the xmltodict library and there might be cases where using xmltodict might make sense, but you need to be aware of the dangers.

Thankfully, both the builtin XML library (ElementTree) and the lxml library are very similar to each other in their behavior and use. In general, you load XML from either a file or a string. Depending on how you loaded the XML, you then find the root of the tree. At this point, you can use various methods to traverse up or down the XML tree, and other methods to find various things. Ultimately you can find the element nodes you are looking for and extract the information you need from them (likely from the element’s text).
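That workflow might look roughly like the following with the builtin ElementTree library. This is a sketch using the SRX output from earlier; an in-memory io.StringIO object stands in for a real file so the example is self-contained. One difference worth knowing: ElementTree elements do not keep a reference to their parent, whereas lxml elements provide a getparent() method for moving up the tree.

```python
import io
import xml.etree.ElementTree as ET

XML = """<rpc-reply xmlns:junos="test_namespace">
    <software-information>
        <host-name>pynet-jnpr-srx1</host-name>
        <product-model>srx100h2</product-model>
        <product-name>srx100h2</product-name>
        <jsr/>
        <package-information>
            <name>junos</name>
            <comment>JUNOS Software Release [12.1X44-D35.5]</comment>
        </package-information>
    </software-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>"""

# Loading from a string: fromstring() hands back the root element directly.
root = ET.fromstring(XML)

# Loading from a file (here a file-like object): parse() returns an
# ElementTree object, so the root needs one extra getroot() call.
file_root = ET.parse(io.StringIO(XML)).getroot()

# find() returns the first matching element (or None if nothing matches).
host = root.find("./software-information/host-name")
hostname = host.text

# findtext() goes straight to the text; ".//" searches all descendants.
comment = root.findtext(".//package-information/comment")

# findall() always returns a list, whether there are 0, 1, or many matches.
models = [elem.text for elem in root.findall(".//product-model")]

# iter() walks the root and every element below it in document order.
all_tags = [elem.tag for elem in root.iter()]

print(hostname)
print(comment)
print(models)
print(all_tags)
```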

Useful XML documentation that I found

Article on XML and lxml from New Mexico Tech University. I had never heard of this university before, and the article formatting looks like it is from the nineties, but there is a lot of very good content here.
Python XML processing with lxml 

Article from Microsoft on XML Namespaces. XML Namespaces do get a bit challenging, but I found this article helpful.
Understanding XML Namespaces
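
As a small taste of why namespaces get challenging, here is a sketch using the builtin ElementTree library. Once a default namespace is declared, every tag name is qualified with the namespace URI, and a plain search by tag name silently finds nothing. The namespace URI and the “j” prefix below are made up for this example.

```python
import xml.etree.ElementTree as ET

# The namespace URI here is hypothetical, for illustration only.
XML = """<rpc-reply xmlns="http://example.com/junos">
    <software-information>
        <host-name>pynet-jnpr-srx1</host-name>
    </software-information>
</rpc-reply>"""

root = ET.fromstring(XML)

# The tag is stored in {uri}name form, not as a bare "rpc-reply".
root_tag = root.tag

# Searching for the bare tag name does not match the namespaced element.
missing = root.find(".//host-name")

# Map a prefix of our choosing to the URI and use it in the search path.
namespaces = {"j": "http://example.com/junos"}
host = root.find(".//j:host-name", namespaces)

print(root_tag)
print(missing)
print(host.text)
```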