March 16, 2010

Profiling C with Haskell

Filed under: Programming — Tags: , , , , — Nathan @ 11:33 pm

I had the great pleasure of attending the ICFP in 2009. I’m pretty new to the functional programming game, so it was really cool to meet the movers and shakers. I got to see Bryan O’Sullivan give this presentation on Criterion — a “robust, reliable performance measurement and analysis” package for analyzing Haskell programs. You can read Bryan’s description here.

When I first saw the Criterion presentation, I immediately thought about porting this to C. Unfortunately, I’ve been writing a dissertation and I haven’t gotten around to it yet. However, I did decide to do something that some might consider unholy. I’ve used Criterion to profile my C functions.

How to profile C with Criterion

For this example, I’m going to profile a Fibonacci heap that I’ve been working on.

We’re going to use Haskell’s Foreign Function Interface (FFI). So, let’s start with a simple header file for our test function:

/* FILE: test-fibheap.h */
#ifndef __TESTFIBHEAP__
#define __TESTFIBHEAP__
int test(int, int);
#endif

and now our test function:

/* FILE: test-fibheap.c */
#include "test-fibheap.h"
#include "fibheap.h"
#include < math.h>

int test(int n, int seed) {

  FibHeap *heap = FibHeapMakeHeap();

  int *arr = (int*) malloc(sizeof(int) * n);
  int i; 

  srand(seed);

  for(i = 0; i < n; i++) {
    arr[i] = i;
  }
  /* Insert the numbers [0, n) into the heap in a random order */
  for(i = 0; i < n-1; i++) {
    int r = rand() % (n - i) + i;
    int t = arr[i];
    arr[i] = arr[r];
    arr[r] = t;
    FibHeapInsert(heap, (double) i, 0);
  }
  /* remove all of them one at a time.  Same as sorting the array*/
  for(i = 0; i < n; i++) {
    FibHeapExtractMin(heap);
  }
  return 0;
}

This function takes two parameters, n which is how many elements to insert into the heap, and seed which is the seed to the random number generator. I don't want to use the current time as the seed because I want to be able to compare the efficiency on exactly the same structure.

Now, how to use the FFI:

{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types

foreign import ccall unsafe "test" testFib ::
                  CInt -> CInt -> CInt

Of note are the first and last lines. The first line is just like #include. Then, the last line binds our C function int test(int,int) to the name testFib.

So, bringing it all together with Criterion:

{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types
import Criterion.Main

-- "unsafe" means it's slightly faster but can't callback to haskell
foreign import ccall unsafe "test" testFib ::
               CInt -> CInt -> CInt

main = defaultMain [
          bgroup "fib" [bench title $ whnf (testFib n) seed]
          ]
  where
    n = 1000000
    seed = 7
    title = "testFibHeap " ++ (show n) ++ " " ++ (show seed)

Compile this with ghc -O --make profile.hs fibheap.o test-fibheap.o.
Then, run it ./profile -t png:500x175 -k png:500x175

I've run this twice. First with my fibheap code compiled without optimizations and then again with -O3

Results

Unoptimized

./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.843398 us (80001 iterations)
found 1259 outliers among 79999 samples (1.6%)
  1157 (1.4%) high severe
estimating cost of a clock call...
mean is 522.7480 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
  1 (1.1%) high mild
  2 (2.1%) high severe

benchmarking fib/testFibHeap 1000000 7
collecting 100 samples, 1 iterations each, in estimated 189.5377 s
bootstrapping with 100000 resamples
mean: 1.916353 s, lb 1.909884 s, ub 1.932393 s, ci 0.950
std dev: 48.79063 ms, lb 23.67417 ms, ub 99.28092 ms, ci 0.950
found 9 outliers among 100 samples (9.0%)
  4 (4.0%) high mild
  5 (5.0%) high severe
variance introduced by outliers: 0.998%
variance is unaffected by outliers



Optimized with -O3

./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.928848 us (80001 iterations)
found 1264 outliers among 79999 samples (1.6%)
  1150 (1.4%) high severe
estimating cost of a clock call...
mean is 522.4059 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
  1 (1.1%) high mild
  2 (2.1%) high severe

benchmarking fib/testFibHeapOptimized 1000000 7
collecting 100 samples, 1 iterations each, in estimated 67.12110 s
bootstrapping with 100000 resamples
mean: 670.1145 ms, lb 669.3443 ms, ub 671.1168 ms, ci 0.950
std dev: 4.445731 ms, lb 3.547659 ms, ub 5.887752 ms, ci 0.950
found 7 outliers among 100 samples (7.0%)
  4 (4.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers



Post Mortem

Running the optimized test above yields:

real    0m0.722s
user    0m0.690s
sys     0m0.030s

which is comparable to the Criterion results. There's certainly an overhead involved because of the Haskell run-time, but for profiling relatively long running functions I don't think this will be that much of an issue.

I think it's worth adding to the arsenal, don't you?


  • I'm a software engineer at Google
  • I'm from Alabama
  • I live in San Francisco
  • I like to work on ridiculous things
  • I'm currently learning German, Scala, and Computer Vision
  • This book referred to JavaScript I wrote when I was 15.