April 11, 2010

The ugliest code I ever wrote: Parser generator written in XSLT

Filed under: Programming — Tags: , , , , — Nathan @ 4:41 pm

I took a compiler design course in the Spring semester of 2004. I cannot recommend enough that you should take a course like this at some point in your academic career. I hear more and more about how departments are removing this as a requirement, or making the course easier. I think this is a real shame, but that’s a debate for another day. This post is about my semester project for Compilers.

The assignment was to write a compiler for a subset of Pascal that generated bytecode for a VM that we were provided. Our compiler was to be written in C/C++, and we weren’t to use Lex/Yacc. So, my teammate Charles and I set upon breaking down the tasks. He wrote the lexer, and I wrote the parser.

First, a quick note. It was 2004. Things were weird in 2004. We were in the midst of several wars. Facebook was founded. Cassini reached Saturn. And, XML was big. At least in my group of friends it was BIG. So, when it came time to build the the parse tree, the obvious choice was to just use LibXML and use an XML tree. This let me trivially dump the entire tree as an XML doc. I then wrote an XSLT (even remember it?) to transform the XML to Graphviz Dot so I could visualize the parse tree. This was my version of testing.

After quite a few iterations, I had plenty of time until the deadline so I set out working on the obvious: a parser generator. We weren’t allowed to use Yacc, but no one said anything about writing our own.

Luckily the language we were parsing was quite simple so my parser was just recursive-descent. Thus, it was fairly easy to generalize it. So, what language does one write a parser generator in? XSLT of course. The choice was motivated by #1 someone told me I couldn’t do it and #2 there were serious wins to representing the grammar itself in XML.

So, how did it work? It took a grammar represented in XML. Then an XSLT transformed that grammar into a C parser for that language which then produced the parse tree as an XML document. But, a compiler doesn’t end at parsing. The type-checking step was also done in XSLT, but over a decorated AST instead. Unfortunately we were running low on time (down to mere hours) in the end so we ended up writing the code generator in C instead of XSLT.

Here’s some gems from the parser generator:

This bit of code is near the top of the file. Notice what it does.



    
    true
    
    
    

It calls another template, much like a function that can’t return anything since it can only emit text/elements, with the parameter “prototype = true”. So, looking at the beginning of functions:


    
    
    
int
parse
(xmlNode* par)
    ;
    
{
    int didSomething=0;
    
    ...

So, functions is responsible for both emitting the code for each parse function and also for making the prototypes.

Sadly, it’s been 6 years so I don’t remember enough about the code to do anything with it these days. And like all school projects, documentation was considered a sign of weakness or some such. So, the only artifacts are the source files themselves.

Without further ado, several of the files from our compiler:
parser-generator.xsl
grammar.xml
type-checker.xml


  • I'm a software engineer at Google
  • I'm from Alabama
  • I live in San Francisco
  • I like to work on ridiculous things
  • I'm currently learning German, Scala, and Computer Vision
  • This book referred to JavaScript I wrote when I was 15.