Communicating XML concisely using Swift

Some background…

There are plenty of XML haters out there but this is a great quote:

XML is like violence: if it doesn’t solve your problem, you aren’t using enough of it.

Having spent quite a bit of time working with Lisp-like languages, I am used to syntax-fuelled hate. However, as long as what I send across the wire is something very concise rather than raw XML, I am not overly bothered about syntax. I will use XML, JSON, S-expressions, or whatever, as long as there are decent parsers. There have been numerous attempts to invent a new markup language or abstract syntax to represent the same kinds of data.

When it comes to transferring this data concisely over a network, you can adopt one of two approaches. You can send everything in one message, or you can separate the data you need to communicate from its structure and types while still allowing the receiver to reconstruct it. In other words, you could take a chunk of markup, apply some text compression to it and send the resulting binary across your network, or you could send a minimal amount of binary-encoded data that relies on a schema or protocol to reconstruct it. The latter has the advantage that the schema can be reused across different sets of data, as long as they conform to it. This means you not only send less data but also get fast encode/decode cycles, by processing the schema once at startup and using that optimised version for every encode and decode. It also allows validation to take place on each encode and decode, something that text compression has no clue about.

ASN.1 led the way in this approach. It was very powerful but complex, which made tool support challenging. Google’s Protocol Buffers follows a similar approach but has a less abstract syntax, which makes it easier to map the data and types to programming languages. In the XML world there have been many attempts at compressing this verbose markup. In the end EXI emerged as the standard approach. However, I think its design is also flawed. In my opinion EXI suffers from the same problem as ASN.1: it is overly complex to implement. That means you will struggle to find credible implementations outside of enterprise computing. I also don’t think it makes sense to try to invent a general serialisation approach for XML because there are too many caveats. At some point you will give in and employ text compression instead.

So when I decided to make an XML serialisation tool I wanted to recognise these limitations. Packedobjects is based on ASN.1 but its schemas are written in a subset of XML Schema. It has a limited set of data types, enough to write network protocols. It deliberately restricts XML Schema to control things like the order of data and the way data repeats. For example, I don’t think it makes much sense to support a set data type when machines are pretty good at generating things in the same order each time.
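
To make the schema-driven idea a bit more concrete, here is a minimal sketch in Swift. It is not how Packedobjects is implemented. The IntField and BitWriter types, the field names and the ranges are all made up for illustration. The point is only that once both sides share the constraints, the sender can validate each value and transmit nothing but its offset within the allowed range, using as few bits as that range needs.

```swift
// Hypothetical schema entry: an integer constrained to a closed range.
// These names are illustrative and not part of the Packedobjects API.
struct IntField {
    let name: String
    let range: ClosedRange<Int>

    // Number of bits needed to store any offset from the lower bound.
    var bitWidth: Int {
        let span = UInt64(range.upperBound - range.lowerBound)
        return UInt64.bitWidth - span.leadingZeroBitCount
    }
}

struct OutOfRange: Error {
    let field: String
    let value: Int
}

// Packs bits into bytes, most significant bit first.
struct BitWriter {
    private(set) var bytes: [UInt8] = []
    private var bitCount = 0

    mutating func write(_ value: UInt64, bits: Int) {
        for shift in stride(from: bits - 1, through: 0, by: -1) {
            if bitCount % 8 == 0 { bytes.append(0) }
            let bit = UInt8((value >> UInt64(shift)) & 1)
            bytes[bytes.count - 1] |= bit << (7 - bitCount % 8)
            bitCount += 1
        }
    }
}

// Validates each value against the schema, then writes only its offset
// from the lower bound using the minimum number of bits.
func encode(_ values: [Int], using schema: [IntField]) throws -> [UInt8] {
    precondition(values.count == schema.count, "one value per field")
    var writer = BitWriter()
    for (field, value) in zip(schema, values) {
        guard field.range.contains(value) else {
            throw OutOfRange(field: field.name, value: value)
        }
        writer.write(UInt64(value - field.range.lowerBound), bits: field.bitWidth)
    }
    return writer.bytes
}

// Made-up sensor reading; the schema is built once and reused for every message.
let sensorSchema = [
    IntField(name: "temperature", range: -40...85),
    IntField(name: "humidity", range: 0...100),
    IntField(name: "battery", range: 0...100),
]
let xmlText = "<reading><temperature>21</temperature><humidity>55</humidity><battery>90</battery></reading>"
let packed = try encode([21, 55, 90], using: sensorSchema)
print("XML text: \(xmlText.utf8.count) bytes, packed: \(packed.count) bytes")
```

Each of the three fields needs 7 bits, so the whole reading fits in 21 bits and is sent as 3 bytes. A value outside its range, such as a humidity of 120, is rejected before anything goes on the wire, which is the validation benefit that text compression cannot give you.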

Going mobile…

If you are working on restricted platforms such as mobile or embedded devices, in the end it is all about tool support. This is where I believe XML does well. If you want to parse a schema language efficiently you probably have few options. Libxml2 does a great job of this and it does it quickly. What’s more, this parser is everywhere. For example, it is in your pocket right now if you own an iPhone. I decided to see how Packedobjects would perform on iOS if I wrapped the current API in a higher-level interface that works with strings rather than exposing the lower-level Libxml2 doc type API. The porting process was fairly quick and painless. I built an example program that takes all the XML files in the Packedobjects repository and runs them to collect performance metrics.
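
As a rough illustration of what a string-based wrapper and a timing harness could look like, here is a small Swift sketch. Everything in it is hypothetical: the XMLStringCodec protocol, the UTF8PassthroughCodec stand-in and the measure function are names I have invented for this example, and the stand-in does no schema processing at all. It only exists so the harness has something to run. A real wrapper would call into the Packedobjects C library behind the same kind of interface.

```swift
import Foundation

// Hypothetical string-in, bytes-out interface. This is an assumption made
// for illustration and not the actual Packedobjects wrapper API.
protocol XMLStringCodec {
    func encode(xml: String) throws -> Data
    func decode(_ data: Data) throws -> String
}

// Placeholder codec that simply round-trips UTF-8 so the harness can run.
struct UTF8PassthroughCodec: XMLStringCodec {
    struct InvalidUTF8: Error {}

    func encode(xml: String) throws -> Data {
        Data(xml.utf8)
    }

    func decode(_ data: Data) throws -> String {
        guard let xml = String(data: data, encoding: .utf8) else {
            throw InvalidUTF8()
        }
        return xml
    }
}

struct Timing {
    let encodedBytes: Int
    let encodeMicroseconds: Double  // average per encode call
    let decodeMicroseconds: Double  // average per decode call
    let roundTripOK: Bool
}

// Times repeated encodes and decodes of one document and checks that the
// decoded string matches the input.
func measure(_ codec: XMLStringCodec, xml: String, iterations: Int = 1000) throws -> Timing {
    precondition(iterations > 0, "need at least one iteration")
    var encoded = Data()
    var decoded = ""

    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations { encoded = try codec.encode(xml: xml) }
    let afterEncode = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations { decoded = try codec.decode(encoded) }
    let afterDecode = DispatchTime.now().uptimeNanoseconds

    let n = Double(iterations)
    return Timing(
        encodedBytes: encoded.count,
        encodeMicroseconds: Double(afterEncode - start) / n / 1_000,
        decodeMicroseconds: Double(afterDecode - afterEncode) / n / 1_000,
        roundTripOK: decoded == xml
    )
}

let sampleXML = "<reading><temperature>21</temperature></reading>"
let timing = try measure(UTF8PassthroughCodec(), xml: sampleXML)
print("bytes: \(timing.encodedBytes), encode: \(timing.encodeMicroseconds) µs, decode: \(timing.decodeMicroseconds) µs")
```

Keeping the interface down to strings and bytes means the calling code never has to touch a Libxml2 document pointer, and the same harness can be pointed at any codec that conforms to the protocol.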

The screenshots show the results of running it on a 5th gen iPod. This example is available to try out here. There is quite a big discrepancy between encoding and decoding speed, but overall I am pleased with how the tool performs on the devices I tried. I will be adding support for 64-bit encoding and decoding soon to see what impact this has on an iPhone 6.

Size matters…

One thing I have avoided talking about in this post until now is the key metric of encoding size. Rather than believe what I say, you need to pick your own data set and try it for yourself. For the type of data I work with, Packedobjects outperforms the other approaches I tried. I would classify this data as highly structured and not dominated by string data types. This is the kind of data that might originate from the Internet of Things (IoT), sensor networks, network management and so on.
