Introducing Immanix: a Java library to process XML using parser combinators

Processing XML files is a PITA. There. I said it. Processing XML files using the standard Java XML API, i.e. JAXP, is even worse. There are some nicer APIs out there that alleviate this pain, like XPath, XStream, JAXB, Castor, etc.

But still, they all fall short at one point or another, ranging from being verbose and tedious to write to not being able to handle large XML streams.
The latter is a killer for many of the higher-level approaches out there. Try using XPath or any mapping library on a file weighing more than a couple of megabytes.

To handle such cases, we are usually left with StAX. Don’t get me wrong: StAX is not that bad an API, and I’d take it any day over JAXP, even for small files. It is still a very low-level API, though, and parsing the simplest of files requires an impressive amount of code. Also, handling state with StAX is a painful exercise. You usually end up building a full-fledged state machine to do it.
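
To give an idea of what that looks like, here is a minimal sketch using only the StAX classes that ship with the JDK. The class name, the method name and the <library>/<book>/<title> document are made up for illustration. It just collects the titles of books, and it already needs two hand-rolled boolean flags to remember where it is in the document:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {

    // Returns the text of every <title> found inside a <book> element.
    // The element names are hypothetical; adapt them to your own document.
    static List<String> bookTitles(String xml) {
        List<String> titles = new ArrayList<>();
        boolean inBook = false;   // hand-rolled state: are we inside a <book>?
        boolean inTitle = false;  // hand-rolled state: are we inside its <title>?
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if ("book".equals(reader.getLocalName())) {
                            inBook = true;
                        } else if (inBook && "title".equals(reader.getLocalName())) {
                            inTitle = true;
                            text.setLength(0);
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in several chunks, so accumulate it.
                        if (inTitle) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (inTitle && "title".equals(reader.getLocalName())) {
                            titles.add(text.toString());
                            inTitle = false;
                        } else if ("book".equals(reader.getLocalName())) {
                            inBook = false;
                        }
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<library><title>Catalog</title>"
                + "<book><title>Dune</title></book>"
                + "<book><title>Emma</title></book></library>";
        System.out.println(bookTitles(xml));
    }
}
```

The <title> that sits directly under <library> is skipped because the inBook flag is not set at that point. Every extra rule of that kind means another flag and another branch in the switch, which is where the state machine comes from.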

It was while dealing with such cases that I thought there must be a better way to do this. And that’s how Immanix was born.

How to reference the latest versions of Woodstox with Maven

I spent some time digging around before figuring out how to reference the latest version of the Woodstox StAX processor with Maven: Central only contains older versions, and the project changed its group and artifact IDs. So I figured I’d blog about it in case it turns out to be useful to others.

First, add the Codehaus Maven repository to your pom:

<repositories>
	:
	<repository>
		<id>codehaus</id>
		<url>http://repository.codehaus.org/</url>
	</repository>
	:
</repositories>

And then add

org.codehaus.woodstox:woodstox-core-asl:4.0.8

to your dependencies:

<dependencies>
	:
	<dependency>
		<groupId>org.codehaus.woodstox</groupId>
		<artifactId>woodstox-core-asl</artifactId>
		<version>4.0.8</version>
	</dependency>
	:
</dependencies>
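
To check that the dependency is actually picked up, a quick sanity test is to print which StAX implementation the JDK factory lookup returns. This is only a sketch, and the class and method names are mine. With Woodstox on the classpath I would expect a class from the com.ctc.wstx package (that package name is from memory, so treat it as an assumption); without it you get the implementation bundled with the JDK.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxImplCheck {

    // Returns the class name of the StAX input factory found on the classpath.
    static String implementationName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX implementation in use: " + implementationName());
    }
}
```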