Tuning the OCaml memory allocator for large data processing jobs
TL;DR: setting OCAMLRUNPARAM=s=4M,i=32M,o=150 can make
your OCaml programs run faster. Read on for details and how to see if
the garbage collector is thrashing and thereby slowing down your
program.
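The same settings can also be applied from inside a program through the standard `Gc` module, which helps when you cannot control the environment a job runs in. This is a minimal sketch rather than a recommendation: the helper name `tune_gc` is made up for illustration, and it assumes the usual mapping of `s`, `i` and `o` to `minor_heap_size`, `major_heap_increment` and `space_overhead`, with sizes in words and `M` meaning 2^20. Newer runtimes may ignore some of these fields.

```ocaml
(* Sketch: apply the TL;DR settings programmatically.
   Sizes are in words, as with OCAMLRUNPARAM; "M" is taken to mean 2^20. *)
let mega = 1024 * 1024

(* tune_gc is a hypothetical helper name, not part of any library. *)
let tune_gc () =
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = 4 * mega;        (* s=4M  *)
      Gc.major_heap_increment = 32 * mega;  (* i=32M *)
      Gc.space_overhead = 150;              (* o=150 *)
    }

(* Call it once, early, before the program builds its large structures. *)
let () = tune_gc ()
```

Setting the environment variable is usually the easier route for experiments, since it needs no recompilation.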
In my research work with GroupLens, I do most of my coding for data processing, algorithm implementation, etc. in OCaml. Sometimes I have to suffer a bit for this when some nice library doesn’t have OCaml bindings, but in general it works out fairly well. And every time I go to do some refactoring, I am reminded why I’m not coding in Python.
One thing I have found, however, is that the default OCaml garbage
collector parameters are not very well-suited for much of my work —
frequently long-running data processing tasks building and manipulating
large, often persistent¹ data structures. The program will
run somewhat slowly (although there usually isn’t anything to compare it
against), but more importantly, profiling with gprof will
reveal that my program is spending a substantial amount of its time
(~30% or more) in the OCaml garbage collector (if memory serves,
frequently in the function caml_gc_major_slice).
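A lighter-weight check that complements gprof (it is not the profiling method described above) is to have the program print the runtime's own GC counters when it exits. The sketch below uses `Gc.quick_stat` from the standard library; the function name `report_gc` and the choice of counters are illustrative assumptions. Very high collection counts for a given amount of work suggest the collector is busy, but they do not measure time spent the way a profiler does.

```ocaml
(* Sketch: print a few GC counters to stderr when the program exits. *)
let report_gc () =
  let s = Gc.quick_stat () in
  Printf.eprintf
    "minor collections: %d\nmajor collections: %d\n\
     heap words: %d\npromoted words: %.0f\n"
    s.Gc.minor_collections
    s.Gc.major_collections
    s.Gc.heap_words
    s.Gc.promoted_words

(* Register the report so it runs at normal program termination. *)
let () = at_exit report_gc
```

Comparing these numbers between a run with default settings and a run with tuned settings gives a rough sense of whether the tuning reduced collector activity.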