April 7, 2010

Introducing Funion

Filed under: Programming — Tags: , — Nathan @ 1:41 pm

I defended my dissertation on March 26. So, I guess that means I’m Dr. Nathan now. I’m not sure what normal people do the week after their defense, but I’m doing sprint after sprint on personal projects. I start work in the middle of July and I hear that this company is known as a black hole. There’s just too many cool things internally to learn so I probably won’t have much time for personal projects for a good while. Thus, the first of my projects that’s ready for the world: Funion.

Funion

Funion (Fuse Union /fʌn:njən/) is a Haskell program that uses the HFuse binding to the FUSE library. It presents a filesystem which shows a unioned view of multiple directories.

Example

Consider the case where you have the following two directories, A and B.




But, A and B represent two instances of some common hierarchical scheme. So, unioning A and B, we get:

Get Funion

You can get Funion two ways: GitHub and Hackage

If you have cabal, you can just do:


  cabal install funion


Installation

Funion requires: *nix, FUSE, and GHC (preferably the Haskell Platform).

If you have Ubuntu, install the Haskell Platform. Then, to install FUSE:


  sudo apt-get install libfuse-dev

Now, you can just cabal install funion.

Usage

Create a mountpoint. Then use funion to union several dirs into that mountpoint:


  mkdir mountpoint
  funion mountpoint /A /B

where /A and /B are teh directories to union.

Upcoming features

Currently Funion is readonly. The next version will allow file editing, and possibly file creation. If a file is created it will have a specific file system to be placed in. Or perhaps, there could be a round-robin file-writing scheme.

There is no logging as of now. The next version will have a logging mode.

FunionD: A daemon which will allow you to mount your funioned directories upon boot.

March 16, 2010

Profiling C with Haskell

Filed under: Programming — Tags: , , , , — Nathan @ 11:33 pm

I had the great pleasure of attending the ICFP in 2009. I’m pretty new to the functional programming game, so it was really cool to meet the movers and shakers. I got to see Bryan O’Sullivan give this presentation on Criterion — a “robust, reliable performance measurement and analysis” package for analyzing Haskell programs. You can read Bryan’s description here.

When I first saw the Criterion presentation, I immediately thought about porting this to C. Unfortunately, I’ve been writing a dissertation and I haven’t gotten around to it yet. However, I did decide to do something that some might consider unholy. I’ve used Criterion to profile my C functions.

How to profile C with Criterion

For this example, I’m going to profile a Fibonacci heap that I’ve been working on.

We’re going to use Haskell’s Foreign Function Interface (FFI). So, let’s start with a simple header file for our test function:

/* FILE: test-fibheap.h */
#ifndef __TESTFIBHEAP__
#define __TESTFIBHEAP__
int test(int, int);
#endif

and now our test function:

/* FILE: test-fibheap.c */
#include "test-fibheap.h"
#include "fibheap.h"
#include < math.h>

int test(int n, int seed) {

  FibHeap *heap = FibHeapMakeHeap();

  int *arr = (int*) malloc(sizeof(int) * n);
  int i; 

  srand(seed);

  for(i = 0; i < n; i++) {
    arr[i] = i;
  }
  /* Insert the numbers [0, n) into the heap in a random order */
  for(i = 0; i < n-1; i++) {
    int r = rand() % (n - i) + i;
    int t = arr[i];
    arr[i] = arr[r];
    arr[r] = t;
    FibHeapInsert(heap, (double) i, 0);
  }
  /* remove all of them one at a time.  Same as sorting the array*/
  for(i = 0; i < n; i++) {
    FibHeapExtractMin(heap);
  }
  return 0;
}

This function takes two parameters, n which is how many elements to insert into the heap, and seed which is the seed to the random number generator. I don't want to use the current time as the seed because I want to be able to compare the efficiency on exactly the same structure.

Now, how to use the FFI:

{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types

foreign import ccall unsafe "test" testFib ::
                  CInt -> CInt -> CInt

Of note are the first and last lines. The first line is just like #include. Then, the last line binds our C function int test(int,int) to the name testFib.

So, bringing it all together with Criterion:

{-# INCLUDE "test-fibheap.h" #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Foreign.C -- get the C types
import Criterion.Main

-- "unsafe" means it's slightly faster but can't callback to haskell
foreign import ccall unsafe "test" testFib ::
               CInt -> CInt -> CInt

main = defaultMain [
          bgroup "fib" [bench title $ whnf (testFib n) seed]
          ]
  where
    n = 1000000
    seed = 7
    title = "testFibHeap " ++ (show n) ++ " " ++ (show seed)

Compile this with ghc -O --make profile.hs fibheap.o test-fibheap.o.
Then, run it ./profile -t png:500x175 -k png:500x175

I've run this twice. First with my fibheap code compiled without optimizations and then again with -O3

Results

Unoptimized

./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.843398 us (80001 iterations)
found 1259 outliers among 79999 samples (1.6%)
  1157 (1.4%) high severe
estimating cost of a clock call...
mean is 522.7480 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
  1 (1.1%) high mild
  2 (2.1%) high severe

benchmarking fib/testFibHeap 1000000 7
collecting 100 samples, 1 iterations each, in estimated 189.5377 s
bootstrapping with 100000 resamples
mean: 1.916353 s, lb 1.909884 s, ub 1.932393 s, ci 0.950
std dev: 48.79063 ms, lb 23.67417 ms, ub 99.28092 ms, ci 0.950
found 9 outliers among 100 samples (9.0%)
  4 (4.0%) high mild
  5 (5.0%) high severe
variance introduced by outliers: 0.998%
variance is unaffected by outliers



Optimized with -O3

./profile -t png:500x175 -k png:500x175
warming up
estimating clock resolution...
mean is 9.928848 us (80001 iterations)
found 1264 outliers among 79999 samples (1.6%)
  1150 (1.4%) high severe
estimating cost of a clock call...
mean is 522.4059 ns (95 iterations)
found 3 outliers among 95 samples (3.2%)
  1 (1.1%) high mild
  2 (2.1%) high severe

benchmarking fib/testFibHeapOptimized 1000000 7
collecting 100 samples, 1 iterations each, in estimated 67.12110 s
bootstrapping with 100000 resamples
mean: 670.1145 ms, lb 669.3443 ms, ub 671.1168 ms, ci 0.950
std dev: 4.445731 ms, lb 3.547659 ms, ub 5.887752 ms, ci 0.950
found 7 outliers among 100 samples (7.0%)
  4 (4.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers



Post Mortem

Running the optimized test above yields:

real    0m0.722s
user    0m0.690s
sys     0m0.030s

which is comparable to the Criterion results. There's certainly an overhead involved because of the Haskell run-time, but for profiling relatively long running functions I don't think this will be that much of an issue.

I think it's worth adding to the arsenal, don't you?


  • I'm a software engineer at Google
  • I'm from Alabama
  • I live in San Francisco
  • I like to work on ridiculous things
  • I'm currently learning German, Scala, and Computer Vision
  • This book referred to JavaScript I wrote when I was 15.