April 21, 2010

Lessons I should have learned, Episode 4: Coroutines in C

Filed under: Lessons I should have learned,Programming — Tags: , , — Nathan @ 2:19 pm

A while ago, I read this article on coroutines in C. One of the main downfalls of this technique is that since it uses the static keyword, it makes it much harder to write recursive coroutines, and nigh-impossible to have multiple coroutines using the same function.

Examples

Let’s first look at an example of how it can work. One way of efficiently computing Fibonacci numbers in a language like Haskell is to construct an infinite list of the Fibonacci values.

fibs a b = a : fibs b (a+b)

Since Haskell is lazily evaluated, such an infinite list is fine until you attempt to compute the whole thing. So, as long as you just say something like take 10 $ fibs 1 1 it will only compute how many you need.

One important use of Coroutines is as iterators. Consider the Haskell example from above. If we assume that we have such an infinite list of values, we may want to iterate over the list, inspecting the values as we go. The following C code achieves this using some helper macros we’ll describe below.

int fibs(int a, int b, CoroutineState *s) {
  initializeCoroutineState(s, FibState, mystate);

  mystate->a = a;
  mystate->b = b;
  yield(s,mystate->a);
  recur (s,fibs(mystate->b, mystate->a + mystate->b, s->next));

  finalizeCoroutine(s);
  return 0;
}

Now, if we wanted to print out the first 10 of these values:

s = createCoroutineState();
for(i = 0; i < 10; i++) {
  printf("%d ", fibs(1,1,s));
}

The yield method is similar to Python's yield key word. It returns a value, but also records where the yield was called so that the next time the function is executed it can begin again from that point.

Another example of how coroutines can be used as iterators is as iterators over a binary tree. Consider the following tree:

       4
  2         6
1   3     5    7 

A standard way to traverse this tree is a post-order traversal. Consider the following code which traverses this tree and prints it out:

void print(Tree *t) {
  if(t->left)
    print(t->left);
  if(t->right)
    print(t->right);
  printf("%d ", t->value);
}

Building a post-order iterator is slightly more complex. It involves creating a stack and pushing wrapped tree-nodes onto the stack. When one is popped, it is marked as 'visited' and then pushed onto the stack, followed by its right child then left child. However, we can use coroutines to achieve the same thing while staying true to the original traversal code:

int treePostOrderIterator(Tree *t, CoroutineState *s) {
  initializeCoroutineState(s,TreeIteratorState, mystate);
  mystate->tree = t;

  if(mystate->tree->left)
    recur (s, treePostOrderIterator(mystate->tree->left, s->next));

  if(mystate->tree->right)
    recur (s, treePostOrderIterator(mystate->tree->right, s->next));

  yield (s,mystate->tree->value);

  finalizeCoroutine(s);
  return 0;
}

How it works

When I was learning to program, I learned that switch statements were just conditional structures that were equivalent to a series of if statements. While you can almost always get away with thinking about them this way, they aren't actually. Instead, think of switch statements as parameterized gotos. One of the less objectionable uses of goto is as a means of breaking out of nested loops. So, obviously goto can jump out of scopes. In fact, it can be used to jump into and out of any scope inside of a single function. The same is true of the cases of a switch statement. This is what makes Duff's device possible.

In order to facilitate recursion and having multiple coroutines running at once, we need to pass a mutable state parameter. This is an object of type struct CoroutineState.

typedef struct CoroutineState CoroutineState;
struct CoroutineState {
  void *state;            /* Pointer to a struct representing the internal state
                             of the coroutine.  We need this since we have to
                             reinstate the scope whenever the function is
                             re-intered.                                      */
  long current;           /* The line number of where the coroutine should
                             return.  Used in 'yield' and 'recur'             */
  unsigned char done;     /* A marker to show that a recurrence is completed  */
  CoroutineState *next;   /* The next in the stack of states                  */
  CoroutineState *parent; /* NULL if actually the parent, pointer otherwise   */
  CoroutineState *tail;   /* Only the parent is guaranteed to know the tail   */
};

Basically this struct stores everything needed to reinstate the coroutine when it is next called. This includes the line number it should begin executing on, a pointer to the user's saved values, and a stack containing the recursive calls.

Every coroutine begins with an invocation of the initializeCoroutineState macro. This is a macro that takes three parameters. The first is a CoroutineState object, the second is the type defining the user's saved values, and the last is the variable name to use to refer to this state.

#define initializeCoroutineState(coroutineState, UserStateType, userStateVar) \
  UserStateType *userStateVar = (UserStateType*) coroutineState->state;       \
  switch(coroutineState->current) {                                           \
  case 0:                                                                     \
  if(coroutineState->state == 0) {                                            \
    coroutineState->state = (void*) malloc(sizeof (UserStateType));           \
    bzero(coroutineState->state, sizeof(UserStateType));                      \
  }                                                                           \
  userStateVar = (UserStateType*) coroutineState->state;                      \

Note that there is a switch statement embedded here. This switch statement jumps to the appropriate line in the execution.

Every coroutine ends with finalizeCoroutine(s) which indicates that after this point the execution of the function should proceed normally. It also says that when this function returns that the coroutine is complete and should be removed from the stack. This is indicated internally with s->done =1;.

#define finalizeCoroutine(s)                                                  \
  }                                                                           \
  s->done = 1;                                                                \
  free(s->next);                                                              \
  s->next = 0;

Now, to the meat of the matter. yield needs to return the value the user wants to yield, but also record where this occurred so that the switch statement from above will jump to here. It also needs to reinstate any variables that the user had in scope. Unfortunately there's no good way of doing that in C, so we have to do it manually. Thus the need for the helper struct. Before the return in the following macro, we're just manipulating the stack so that we can record where we are. Notice coroutineState->current = __LINE__;. __LINE__ is a macro that evaluates to the current line number in the program. So, here we're recording the line number and then after the return, we have case __LINE__:. This will cause the coroutine to begin executing after the return the next time it's called.

#define yield(coroutineState, value)                                          \
  do{                                                                         \
    CoroutineState *parent = coroutineState ? coroutineState                  \
        : coroutineState->parent;                                             \
    coroutineState->current = __LINE__;                                       \
    parent->tail = coroutineState;                                            \
    free(coroutineState->next);                                               \
    coroutineState->next = 0;                                                 \
    return value;                                                             \
    case __LINE__: 1;                                                         \
  } while(0)                                                             

The last macro is perhaps the most interesting. It's the one that allows things like the infinite list iterator and the binary tree iterator. Basically, there are times when you want to have a recursive call and have the coroutine state stored along with having a return value bubble up.

The while loop is responsible for executing until the coroutine called below it is completed.

#define recur(coroutineState,func)                                            \
  do{                                                                         \
    CoroutineState *parent = coroutineState ? coroutineState                  \
        : coroutineState->parent;                                             \
    coroutineState->next = createCoroutineState();                            \
    coroutineState->next->parent = parent;                                    \
    parent->tail = coroutineState->next;                                      \
    while(!coroutineState->next->done) {                                      \
      CoroutineState *parent = coroutineState ? coroutineState                \
          : coroutineState->parent;                                           \
      typeof(func) tmp =  func;                                               \
      coroutineState->current = __LINE__;                                     \
      if(!parent->tail->done)                                                 \
        return tmp;                                                           \
      case __LINE__: 1;                                                       \
    }                                                                         \
  } while(0)                                                                  \

Using it

In order to get this to work, you have to do one extra thing. You need to create a struct which you use to store your state rather than use local-variables in your functions. This is necessary since we'll be re-entering the functions later and that state would be lost. So, just put all the variables you would want to use into the struct. Then use this struct in the initializeCoroutineState and then whenever you use a local variable.

If you would normally return a computed value (not the result of a recursive call) then just say yield (value); If you are doing a recursive call and intend for the coroutine to proceed down the recurrence, just wrap it in a recur(s, yourRecursiveCall()).

Get it

I've posted the implementation as a library on GitHub: http://github.com/nathanwiegand/coroutines.

April 11, 2010

The ugliest code I ever wrote: Parser generator written in XSLT

Filed under: Programming — Tags: , , , , — Nathan @ 4:41 pm

I took a compiler design course in the Spring semester of 2004. I cannot recommend enough that you should take a course like this at some point in your academic career. I hear more and more about how departments are removing this as a requirement, or making the course easier. I think this is a real shame, but that’s a debate for another day. This post is about my semester project for Compilers.

The assignment was to write a compiler for a subset of Pascal that generated bytecode for a VM that we were provided. Our compiler was to be written in C/C++, and we weren’t to use Lex/Yacc. So, my teammate Charles and I set upon breaking down the tasks. He wrote the lexer, and I wrote the parser.

First, a quick note. It was 2004. Things were weird in 2004. We were in the midst of several wars. Facebook was founded. Cassini reached Saturn. And, XML was big. At least in my group of friends it was BIG. So, when it came time to build the the parse tree, the obvious choice was to just use LibXML and use an XML tree. This let me trivially dump the entire tree as an XML doc. I then wrote an XSLT (even remember it?) to transform the XML to Graphviz Dot so I could visualize the parse tree. This was my version of testing.

After quite a few iterations, I had plenty of time until the deadline so I set out working on the obvious: a parser generator. We weren’t allowed to use Yacc, but no one said anything about writing our own.

Luckily the language we were parsing was quite simple so my parser was just recursive-descent. Thus, it was fairly easy to generalize it. So, what language does one write a parser generator in? XSLT of course. The choice was motivated by #1 someone told me I couldn’t do it and #2 there were serious wins to representing the grammar itself in XML.

So, how did it work? It took a grammar represented in XML. Then an XSLT transformed that grammar into a C parser for that language which then produced the parse tree as an XML document. But, a compiler doesn’t end at parsing. The type-checking step was also done in XSLT, but over a decorated AST instead. Unfortunately we were running low on time (down to mere hours) in the end so we ended up writing the code generator in C instead of XSLT.

Here’s some gems from the parser generator:

This bit of code is near the top of the file. Notice what it does.



    
    true
    
    
    

It calls another template, much like a function that can’t return anything since it can only emit text/elements, with the parameter “prototype = true”. So, looking at the beginning of functions:


    
    
    
int
parse
(xmlNode* par)
    ;
    
{
    int didSomething=0;
    
    ...

So, functions is responsible for both emitting the code for each parse function and also for making the prototypes.

Sadly, it’s been 6 years so I don’t remember enough about the code to do anything with it these days. And like all school projects, documentation was considered a sign of weakness or some such. So, the only artifacts are the source files themselves.

Without further ado, several of the files from our compiler:
parser-generator.xsl
grammar.xml
type-checker.xml

April 7, 2010

Introducing Funion

Filed under: Programming — Tags: , — Nathan @ 1:41 pm

I defended my dissertation on March 26. So, I guess that means I’m Dr. Nathan now. I’m not sure what normal people do the week after their defense, but I’m doing sprint after sprint on personal projects. I start work in the middle of July and I hear that this company is known as a black hole. There’s just too many cool things internally to learn so I probably won’t have much time for personal projects for a good while. Thus, the first of my projects that’s ready for the world: Funion.

Funion

Funion (Fuse Union /fʌn:njən/) is a Haskell program that uses the HFuse binding to the FUSE library. It presents a filesystem which shows a unioned view of multiple directories.

Example

Consider the case where you have the following two directories, A and B.




But, A and B represent two instances of some common hierarchical scheme. So, unioning A and B, we get:

Get Funion

You can get Funion two ways: GitHub and Hackage

If you have cabal, you can just do:


  cabal install funion


Installation

Funion requires: *nix, FUSE, and GHC (preferably the Haskell Platform).

If you have Ubuntu, install the Haskell Platform. Then, to install FUSE:


  sudo apt-get install libfuse-dev

Now, you can just cabal install funion.

Usage

Create a mountpoint. Then use funion to union several dirs into that mountpoint:


  mkdir mountpoint
  funion mountpoint /A /B

where /A and /B are teh directories to union.

Upcoming features

Currently Funion is readonly. The next version will allow file editing, and possibly file creation. If a file is created it will have a specific file system to be placed in. Or perhaps, there could be a round-robin file-writing scheme.

There is no logging as of now. The next version will have a logging mode.

FunionD: A daemon which will allow you to mount your funioned directories upon boot.


  • I'm a software engineer at Google
  • I'm from Alabama
  • I live in San Francisco
  • I like to work on ridiculous things
  • I'm currently learning German, Scala, and Computer Vision
  • This book referred to JavaScript I wrote when I was 15.