An XML Compression Scheme

There are different ways of compressing XML, but I like the idea of doing it with a schema. A schema gives you validation, and it also lets you encode the XML data very efficiently. I have been working on an XML layer for packedobjects to allow compression of XML data. Currently it takes XML data, together with a schema written in packedobjects, and produces binary data. This is very similar to XER, but everything happens at runtime. This means I could support a subset of XML Schema and dynamically map it to the packedobjects schema. That way the end user would only ever see an XML world. Everything is handled by embedding Scheme, which takes care of the mapping between XML and s-expressions. From C you will not see this, but you still have the advantage of working directly within a REPL to design your schema if you want. The Scheme layer could also be extended to handle other data formats.
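To make the idea of schema-driven encoding concrete, here is a minimal sketch in Python. It is not the packedobjects format or API. The `SCHEMA` list, the `reading` document and the `encode`/`decode` helpers are all hypothetical, made up for illustration. The point is only that once both sides share a schema with constrained integer ranges, the encoder can validate each value and then write just the bits needed for that range, with no tag names in the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema (an assumption, not packedobjects syntax):
# each entry is (element name, minimum, maximum) for an integer value.
# Field order is fixed by the schema, so no tags are written to the output.
SCHEMA = [("hour", 0, 23), ("minute", 0, 59), ("temperature", -50, 50)]


def bits_needed(lo, hi):
    """Number of bits required to represent any offset in [0, hi - lo]."""
    return (hi - lo).bit_length()


def encode(xml_text):
    """Validate the XML against SCHEMA and pack its values into bytes."""
    root = ET.fromstring(xml_text)
    acc, nbits = 0, 0
    for name, lo, hi in SCHEMA:
        text = root.findtext(name)
        if text is None:
            raise ValueError(f"missing element: {name}")
        value = int(text)
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
        width = bits_needed(lo, hi)
        # Store the offset from the minimum, using only `width` bits.
        acc = (acc << width) | (value - lo)
        nbits += width
    pad = -nbits % 8  # zero bits appended to reach a whole number of bytes
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data):
    """Recover the values from bytes produced by encode()."""
    total = sum(bits_needed(lo, hi) for _, lo, hi in SCHEMA)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)
    values = {}
    # The last field sits in the lowest bits, so unpack in reverse order.
    for name, lo, hi in reversed(SCHEMA):
        width = bits_needed(lo, hi)
        values[name] = (acc & ((1 << width) - 1)) + lo
        acc >>= width
    return {name: values[name] for name, _, _ in SCHEMA}
```

With this toy schema the three fields need 5, 6 and 7 bits, so a document such as `<reading><hour>13</hour><minute>42</minute><temperature>-7</temperature></reading>` packs into 3 bytes. A general-purpose compressor has no schema to lean on, so it still has to represent the tag names in some form.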

Running the example will show that the packedobjects output is about three times smaller than the gzip-compressed output. In comparable tests I have seen similar gains over encoding with Protocol Buffers.

 
