Elegant XML parsing with Objective-C


[This article originally appeared on my old metatechnology blog, back in April 2009]

If you write for the Mac you get two Objective-C XML APIs, one tree-based, DOM-like interface, and the other a SAX-like event-driven interface.

If you write for the iPhone you only get the SAX interface. For many purposes this should be all your need.

I was a little disappointed, though, when I first looked at the NSXMLParser class, that it didn't really take advantage of what Objective-C could offer. Probably this was a performance trade-off, as we'll discuss later, but as it is there are only two benefits I can see to using it over a lower level C API: (1) string conversions are done for you and (2) attributes are already set into dictionaries.

I feel you can do so much more, though. So I did!

In this post I'm going to walk through my own wrapper for NSXMLParser, which I developed while writing my iPhone game, vConqr, and at the end give you access to my complete source, which is not terribly large or complex but due to some simplifying assumptions may need some tweaking to meet your needs. I'll bring those assumptions out as we go through.

What's in a name?

To start with, what's wrong with NSXMLParser? Well nothing as far as it goes. It looks like most SAX-like APIs. You provide a delegate object with methods like:

  -(void)    parser: (NSXMLParser*) parser
    didStartElement: (NSString*) elementName
       namespaceURI: (NSString*) namespaceURI
      qualifiedName: (NSString*) qName
         attributes: (NSDictionary*) attributeDict

This is probably the most important method in the interface. This will be called every time the opening tag for a new element is found. As you can see you get passed the parser object itself (which seems a waste, if you wanted it you'd surely just hold a reference at the start), the name of the element, a couple of namespace strings and all the attributes as a dictionary.

Aside from the redundant parser object, I also usually find that for my own small scale, app-specific, xml formats I don't bother with namespaces. If we're wrapping this API we'll probably drop all of those (this is where the first simplifying assumption comes in. If you do need the namespace data it's trivial to extend my code to add it back in - and even make it optional, as we'll see).

That leaves the element name and attributes. It doesn't get much simpler than that, does it? Well think about this for a moment. What's the first thing you're going to do in your handler method? I suspect there's not much you can do that's not element specific (actually there are a few things, but we'll even pull those out shortly). So you're probably going to need to switch on the element name.

Of course, in Objective-C you can't write a switch statement on a string, so you'll end up with something like:

  if( [elementName isEqualToString: @"elephant"] )
  {
    // Do something with elephant elements
  }
  else if( [elementName isEqualToString: @"giraffe"] )
  {
    // Do something with giraffe elements
  }
  else if ( /* next check */ )
  {
    // ...

did you remember not to compare strings with == ? (catches a lot of newer, and sometimes not so new, Objective-C programmers out). Using == would compile but silently fail at runtime (it compares the pointers, not the strings).

So we have needless, repetitive, boilerplate with at least one thing that's easy to get wrong. Furthermore, if this is any more than a couple of checks and a small amount of code in each if block, you'll almost certainly want to forward on to more specialised methods anyway, such as handleElephantElement).

Can we do better? It seems that what we want here is a way to do dynamic dispatch of methods based on names we don't know until runtime. If that's not what a Dynamic Programming Language gives us then I don't know what it is.

Objective-C, or at least NSObject, has a class method called performSelector: that we can use for dynamic dispatch. But performSelector: takes a selector (of type SEL) as it's argument. Can we get a SEL from a string? Yes we can. There's a function called NSSelectorFromString() which does just that! Now, if we build our selector dynamically we can call it - but what happens if we don't implement a handler for every element? We'll get a runtime error, which will usually result in terminate being called. That's a bit harsh. Fortunately the common idiom of calling respondsToSelector: first serves us well here. Putting this all together we get something like:

SEL sel = NSSelectorFromString( [NSString stringWithFormat:@"handleElement_%@:", elementName] );
if( [delegate respondsToSelector:sel] )
{
  [delegate performSelector:sel withObject: attributeDict];
}

Don't forget to include the : at the end of the selector name as you build it (so we can pass the attributes as the, currently, sole argument).

So now, with all that boilerplate pushed to the generic wrapper, we'll be able to write handlers like this:

-(void) handleElement_elephant: (NSDictionary*) attributes;
-(void) handleElement_giraffe: (NSDictionary*) attributes;

which I think you'll agree is much cleaner.

We can do the same for end tags, but the attributes are not needed. I called this method, handleElementEnd_{name}: (where {name} is the name of the element) and used the same techniques.

So, that's elements and attributes handled, but there's one more entity that we need before we can parse any useful documents: Text nodes.

Don't text me, I'll text you

There are a number of aspects of text nodes that make them tricky to deal with in a SAX-like interface.
The first problem is what to do with whitespace? Often, within a text node, whitespace should be preserved - but at the beginning and end you just want it stripped.

Text nodes occur at any point in a document, within the root document and outside of element tags. That means that even just the newlines and indentation spaces you probably have between adjacent element tags will be represented as text nodes.

A common solution to the first issue, which may solve the second too, is to trim whitespace at the extremities (beginning and end), or better still, make it an option. If you do trim, then you need to make sure you don't raise an event for an empty text node. With text nodes trimmed, and empty text nodes suppressed, no events will be raised for formatting whitespace between tags.

The second problem (which you also need to allow for before trimming), is that a single element may contain any number of text nodes. According to most SAX-like API specs (and NSXMLParser is no exception here), there is no guarantee how the text that belongs to an element will be split up, if at all. In practice you can usually count on getting a separate event for text before any child elements, each gap between child elements, and any more text after child elements. If any of those blocks of text are large (for some value of large) it's likely they will be broken down further for memory constraint reasons (imagine that the input processor probably has a buffer it's writing text nodes into. I've certainly implemented a parser exactly that way before).

Attaching any meaning to how text is broken up is probably misguided at best, so before you process your text nodes you will almost certainly want to collate the text nodes into a single string (unless you're expecting very large blocks of text. We'll make the simplifying assumption here that that's not the case).

So, to collect our text nodes we'll need to maintain some state as to the current aggregated text. This problem is complicated if you have a mixture of text and child elements, and the text may appear before or after (or between) child elements. To manage this you'll have to maintain some sort of stack of text blocks, mapped to elements - with unbounded memory requirements.

In practice, mixing child elements and non-whitespace text is uncommon, so one way to simplify this is to ignore the whole problem. If we just take the first, or last, text nodes (ie before or after any potential child elements) then at worst the text will be truncated. For my purposes, where I was completely in control of the schema this was the route I took and is reflected in the code here (any time we see a child node we reset our text buffer). If you want to implement the more general case you'll have to look at the stack-based idea (perhaps a future blog article).

Tracking text nodes this way becomes quite simple. I declare an instance variable to hold the current text value:

NSMutableString* currentTextString;

Now to collect the text we need to implement the method parser:foundCharacters: on our delegate. This will simply append the incoming string to our current string state, or create a new mutable string copy as necessary:

-(void)     parser: (NSXMLParser*) parser 
   foundCharacters: (NSString*) string 
{    
  if( string && [string length] > 0 )
  {
    if( !currentTextString )
    {
      currentTextString = [[NSMutableString alloc] initWithCapacity:4];
    }
    [currentTextString appendString:string];
  }
}

The choice of 4 characters to preallocate was entirely arbitrary and could be tweaked to your needs. Also, you might want to check if this is the first text node being appended, and is entirely whitespace (and whitespace at the ends is being trimmed), then don't even bother creating a new mutable string here just to throw it away. My version is simple and works for my needs.

In order to implement my simplification of throwing away any text before a child node we need to add a bit to parser:didStartElement:namespaceURI:qualifiedName:attributes: to release the string object if it's non-empty:

if( currentTextString )
{
  [currentTextString release];
  currentTextString = nil;
}

Now how do we get the text we've accumulated to the delegate? One way would be to create a new event method, handleText: for example. But you'll almost always want to tie the text up to the current element, so you'll want to track that and pass it to. I decided that this already looks so much like our end element handler that I just made it look like an extended version -

 -(void) handleElementEnd_tiger: (NSDictionary*) attributes 
                       withText: (NSString*) text;

Note that if an end tag is found and there is non-empty text, both events will be fired, if implementations exist, so you can get handleElementEnd_tiger: and handleElementEnd_tiger:withText:

You could use the same technique of looking for two versions of a method, one with more arguments, to optionally pass namespace data, if you like.

So now, with elements, attributes and text nodes sorted we can start doing some basic parsing. However we soon run up against another issue.

In my element

When we write an element (start or end) handler, with our without text nodes, we know our immediate context. We at least know the current element name.

However XML is a hierarchy. While it is certainly possible to write XML such that any given element name can occur at exactly one level of the hierarchy, this is a bit limiting to enforce all the time. For example, in vConqr I have an element called path that contains the vector coordinates of part of the border of a territory. But my borders are split up into three types - internal, external and continent (where continent means an internal border that separates two continents). There are two ways to represent that relationship in XML. One is to make the border type a property of the border element (type="internal" etc). The other is to build it into the name (internalBorder, externalBorder, etc).

I chose to go with the latter because maintaining a stack of element names is easier than maintaining a stack of elements and attributes. To properly implement the element and attributes stack I'd be going down the road of building partial DOM objects in memory - which in the general case is not a direction I wanted to go in.

But just keeping a stack of element names is much simpler, and so I added this to the wrapper class.

I just have an instance variable for the stack:

NSMutableArray* elementStack;

and I push and pop the element names on and off the stack in parser:didStartElement:... and parser:didEndElement:... respectively.

Access is by a simple method:

-(NSString*) ancestorElementAtIndex:(int) index
{
  return [elementStack objectAtIndex: elementStack.count-(index+1)];
}

Now, in my handleElement_path: method. I can call [parser ancestorElementAtIndex:1] and get back the border element that the path belongs to and act appropriately.

Thanks a bundle

With the handler code nicely simplified all that remains is to kick the parser off. This also has a fair bit of boilerplate associated with it. Again, I've made a few assumptions that are appropriate to the way I tend to use this. In particular I'm assuming that I'm targeting the iPhone, and that the XML files in question are in my application bundle. In my next version I have to deal with XML that I download elsewhere too, so I have an alternative version of the parser launching method that handles that.

For now, though, I have loadFromBundle:, which starts with the following code that generates the filename, looks it up in the application bundle and initialises the NSXMLParser with it:

NSString* filename = [NSString stringWithFormat:@"%@.xml", name ];
NSString *path = [[[NSBundle mainBundle] bundlePath]stringByAppendingPathComponent:filename];

NSURL* url = [NSURL fileURLWithPath: path];
NSXMLParser* parser = [[NSXMLParser alloc] initWithContentsOfURL:url];

Then some more simplifying assumptions:

[parser setShouldProcessNamespaces:NO];
[parser setShouldReportNamespacePrefixes:NO];
[parser setShouldResolveExternalEntities:NO];

Remember, we're not dealing with namespaces. We're also not interested in referencing external entities.

The wrapper class subclasses NSXMLParser, so the delegate is self (the wrapper maintains it's own delegate).
And we can start the parsing:

parser.delegate = self;
parser parse];

The parser will call back the low level handlers on the subclass, which will translate those to the dynamic handler names we discussed earlier, maintaining the text nodes and element path stack, and keeping the application code simple, elegant and expressive.


Please submit or upvote, here - or follow through to comment on Reddit