March 16, 2010
Profiling C with Haskell
I had the great pleasure of attending the ICFP in 2009. I’m pretty new to the functional programming game, so it was really cool to meet the movers and shakers. I got to see Bryan O’Sullivan give this presentation on Criterion — a “robust, reliable performance measurement and analysis” package for analyzing Haskell programs. You can read Bryan’s description here.
When I first saw the Criterion presentation, I immediately thought about porting this to C. Unfortunately, I’ve been writing a dissertation and I haven’t gotten around to it yet. However, I did decide to do something that some might consider unholy. I’ve used Criterion to profile my C functions.
How to profile C with Criterion
For this example, I’m going to profile a Fibonacci heap that I’ve been working on.
We’re going to use Haskell’s Foreign Function Interface (FFI). So, let’s start with a simple header file for our test function:
/* FILE: test-fibheap.h */
#ifndef __TESTFIBHEAP__
#define __TESTFIBHEAP__
int test(int, int);
#endif
and now our test function:
/* FILE: test-fibheap.c */
#include "test-fibheap.h"
#include "fibheap.h"
#include < math.h>
int test(int n, int seed) {
FibHeap *heap = FibHeapMakeHeap();
int *arr = (int*) malloc(sizeof(int) * n);
int i;
srand(seed);
for(i = 0; i < n; i++) {
arr[i] = i;
}
/* Insert the numbers [0, n) into the heap in a random order */
for(i = 0; i < n-1; i++) {
int r = rand() % (n - i) + i;
int t = arr[i];
arr[i] = arr[r];
arr[r] = t;
FibHeapInsert(heap, (double) i, 0);
}
/* remove all of them one at a time. Same as sorting the array*/
for(i = 0; i < n; i++) {
FibHeapExtractMin(heap);
}
return 0;
}
This function takes two parameters, n which is how many elements to insert into the heap, and seed which is the seed to the random number generator. I don't want to use the current time as the seed because I want to be able to compare the efficiency on exactly the same structure.
Now, how to use the FFI:
{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types
foreign import ccall unsafe "test" testFib ::
CInt -> CInt -> CInt
Of note are the first and last lines. The first line is just like #include. Then, the last line binds our C function int test(int,int) to the name testFib.
So, bringing it all together with Criterion:
{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types
import Criterion.Main
-- "unsafe" means it's slightly faster but can't callback to haskell
foreign import ccall unsafe "test" testFib ::
CInt -> CInt -> CInt
main = defaultMain [
bgroup "fib" [bench title $ whnf (testFib n) seed]
]
where
n = 1000000
seed = 7
title = "testFibHeap " ++ (show n) ++ " " ++ (show seed)
Compile this with ghc -O --make profile.hs fibheap.o test-fibheap.o.
Then, run it ./profile -t png:500x175 -k png:500x175
I've run this twice. First with my fibheap code compiled without optimizations and then again with -O3
Results
Unoptimized
./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.843398 us (80001 iterations)
found 1259 outliers among 79999 samples (1.6%)
1157 (1.4%) high severe
estimating cost of a clock call...
mean is 522.7480 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
1 (1.1%) high mild
2 (2.1%) high severe
benchmarking fib/testFibHeap 1000000 7
collecting 100 samples, 1 iterations each, in estimated 189.5377 s
bootstrapping with 100000 resamples
mean: 1.916353 s, lb 1.909884 s, ub 1.932393 s, ci 0.950
std dev: 48.79063 ms, lb 23.67417 ms, ub 99.28092 ms, ci 0.950
found 9 outliers among 100 samples (9.0%)
4 (4.0%) high mild
5 (5.0%) high severe
variance introduced by outliers: 0.998%
variance is unaffected by outliers


Optimized with -O3
./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.928848 us (80001 iterations)
found 1264 outliers among 79999 samples (1.6%)
1150 (1.4%) high severe
estimating cost of a clock call...
mean is 522.4059 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
1 (1.1%) high mild
2 (2.1%) high severe
benchmarking fib/testFibHeapOptimized 1000000 7
collecting 100 samples, 1 iterations each, in estimated 67.12110 s
bootstrapping with 100000 resamples
mean: 670.1145 ms, lb 669.3443 ms, ub 671.1168 ms, ci 0.950
std dev: 4.445731 ms, lb 3.547659 ms, ub 5.887752 ms, ci 0.950
found 7 outliers among 100 samples (7.0%)
4 (4.0%) high mild
3 (3.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers


Post Mortem
Running the optimized test above yields:
real 0m0.722s
user 0m0.690s
sys 0m0.030s
which is comparable to the Criterion results. There's certainly an overhead involved because of the Haskell run-time, but for profiling relatively long running functions I don't think this will be that much of an issue.
I think it's worth adding to the arsenal, don't you?