Blog Articles 176–180

Getting Things Typed: External Trusted Systems for Programming

One of the major tenets of David Allen’s Getting Things Done methodology is the concept of an external trusted system: a system for storing information outside your brain so that it can be retrieved as needed or brought to your attention when appropriate. Our brains are often fickle, and we are apt to forget things. Further, trying to remember things costs mental energy, so even if we do remember them, the stress of trying not to forget reduces our productivity. Getting notes, appointments, tasks, and pretty much anything else we need to remember out of our heads and into a reliable external storage and retrieval system frees up our minds to focus on what we really want to accomplish.

I’ve been realizing lately that robust static type and module systems fill a similar role when programming. I have better things to do with my brain cycles than remember the details of functions, what they require, and where they are used.

A module and interface system like OCaml’s makes it easy to refer to a function’s header (its summary) when I need to recall its usage. Documentation extractors do provide some of this benefit, and languages like Java provide similar benefits through their amenability to static analysis and good support for auto-completion and other IDE lookup features. In a statically typed language, however, the type system explicitly delineates the permissible inputs and possible outputs of a function without requiring the programmer to list them manually. The documentation therefore only needs to describe behavior and any special requirements beyond those expressible in the type system (and the more expressive the type system, the fewer these requirements are likely to be). As a result, the information necessary to call a function is retrievable when needed.

The type system also enables the remind-when-appropriate aspect of an external trusted system. If I get something wrong when calling a function, there’s a decent chance the compiler will remind me when I compile the code. If I change a function, I don’t have to worry about remembering everywhere it was used; the type system will catch a large set of errors on the next compile.
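As a small sketch of both roles, consider a hypothetical Inventory module (the module, its type, and its functions are made up for illustration and do not come from any real library). The signature is the part I would look up when I forget how to call something, and the compiler does the reminding:

```ocaml
(* Hypothetical example module. The signature between "sig" and "end"
   is the summary I can refer back to instead of remembering it. *)
module Inventory : sig
  type status = In_stock of int | Backordered
  val describe : status -> string
  val restock : status -> int -> status
end = struct
  type status = In_stock of int | Backordered

  (* If a constructor is later added to [status], the compiler warns that
     these matches are not exhaustive: the "remind me" half of the system. *)
  let describe = function
    | In_stock n -> Printf.sprintf "%d in stock" n
    | Backordered -> "backordered"

  let restock s extra =
    match s with
    | In_stock n -> In_stock (n + extra)
    | Backordered -> In_stock extra
end

let () =
  (* Swapping the arguments (restock 5 Backordered) is rejected at compile time. *)
  print_endline (Inventory.describe (Inventory.restock Inventory.Backordered 5))
```

Changing the type of restock in the signature would likewise turn every stale call site into a compile error rather than something I have to remember to go and find.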

Fixing the Dash Lights on a Dodge Caravan

We had a problem this last week with our ’03 Dodge Grand Caravan — the dash lights went out. Completely. Instrument panel, radio, heater controls — all unlit. My first thought, naturally, was a fuse.

However, when I looked at the fuse box, I couldn’t find any fuse that looked like it controlled the instrument panel backlighting. Web searching turned up a few things, including a fixya entry and a DodgeTalk.com forum post which document the same problem and an odd fix: disconnect the battery or otherwise cut power to the computer.

So, I went out and pulled the IOD fuse (Ignition Off Draw, controls the power drawn when the vehicle is off) for a couple hours. Disconnecting the negative cable on the battery would accomplish the same thing for this purpose. After putting the fuse back in, the dash lights worked.

Piecing things together, particularly with the insights from the DodgeTalk post, it seems that the issue is a computer problem — sometimes, for some reason, the computer will stop turning on the dash lights. Disconnecting power to it for a while resets the computer, allowing the dash lights to start working again. Weirdest van repair ever, but it works, and here it is documented so others can hopefully find that the solution does, indeed, work.

Tuning the OCaml memory allocator for large data processing jobs

TL;DR: setting OCAMLRUNPARAM=s=4M,i=32M,o=150 can make your OCaml programs run faster. Read on for details and how to see if the garbage collector is thrashing and thereby slowing down your program.

In my research work with GroupLens, I do most of my coding for data processing, algorithm implementation, etc. in OCaml. Sometimes I have to suffer a bit for this when some nice library doesn’t have OCaml bindings, but in general it works out fairly well. And every time I go to do some refactoring, I am reminded why I’m not coding in Python.

One thing I have found, however, is that the default OCaml garbage collector parameters are not very well-suited to much of my work, which frequently consists of long-running data processing tasks that build and manipulate large, often persistent data structures. The program will run somewhat slowly (although there usually isn’t anything to compare it against), but more importantly, profiling with gprof will reveal that my program is spending a substantial amount of its time (~30% or more) in the OCaml garbage collector (if memory serves, frequently in the function caml_gc_major_slice).
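As a rough sketch, the settings from the TL;DR can also be applied from inside a program using the standard library’s Gc module rather than the environment variable. The helper names below (mega_words, tune_gc, report_gc) are my own for illustration. Heap sizes here are counted in words, not bytes, and the M suffix in OCAMLRUNPARAM multiplies by 2^20. On OCaml 5 runtimes the major heap increment is, as far as I know, accepted but ignored.

```ocaml
(* The M suffix in OCAMLRUNPARAM means 2^20; sizes are in words. *)
let mega_words n = n * 1024 * 1024

(* Programmatic counterpart of OCAMLRUNPARAM=s=4M,i=32M,o=150. *)
let tune_gc () =
  Gc.set { (Gc.get ()) with
    Gc.minor_heap_size = mega_words 4;        (* s=4M *)
    Gc.major_heap_increment = mega_words 32;  (* i=32M *)
    Gc.space_overhead = 150;                  (* o=150 *)
  }

(* Print a few counters; a very large number of major collections for the
   amount of work done is one hint that the collector is working too hard. *)
let report_gc () =
  let st = Gc.quick_stat () in
  Printf.printf "minor collections: %d, major collections: %d, heap words: %d\n"
    st.Gc.minor_collections st.Gc.major_collections st.Gc.heap_words

let () =
  tune_gc ();
  report_gc ()
```

Calling report_gc before and after a processing phase gives a cheap first look before reaching for a profiler; it does not replace the gprof measurement described above.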

My first OCaml syntax extension

Preface: In this post, I describe my adventures figuring out how to write a syntax extension for the OCaml programming language and attempt to provide something of a tutorial on writing a basic extension. I assume that you’re somewhat familiar with basic parsing technology and context-free grammars — if not, a good tutorial on parser construction with a tool like Yacc would be worth a read first.

One of the oft-touted benefits of OCaml is Camlp4, a pre-processor that facilitates extending the OCaml syntax to provide natural support for various constructions. This has been used for a variety of purposes, such as database type-checking, monad sugaring, and logging. In the hands of a capable author, a variety of wonders can be introduced to the OCaml language.

I’ve used syntax extensions for some time now, particularly PGOCaml and pa_lwt, to make life with OCaml much easier. I’d never written one, however, and found the documentation and other relevant material rather intimidating. Camlp4 documentation is somewhat hard to find, particularly for the current version (with OCaml 3.10, they made significant backwards-incompatible changes to Camlp4, so much of the available tutorial and reference material is somewhat obsolete). I found the documentation that was around difficult to start with, particularly since I want to understand what the code I write does and not just cargo-cult it.

But I finally bit the bullet and learned. And when all was said and done, I had 13 lines of code which provide a small sugar, a sort of minimal syntax extension. This extension provides pattern matching over lazy lists, much like llists but far simpler (and based on the Batteries lazy list module). Here it is, in its entirety, and then I’ll explain how it works and what’s needed to get started with the bare basics of extending OCaml syntax:

Object-Oriented Spaghetti

Note: since writing this essay in 2007, my understanding of object-oriented programming and of separation of concerns has evolved substantially. I think that some of the concerns I raised in this essay are still valid, and that it is quite easy to create unreadable messes of objects, but I no longer hold to as strong a version of the final conclusion.

A long time ago, Simula was created. From it came Smalltalk and C++, followed by Java and a host of other languages sporting this new programming paradigm: object-oriented programming. Objects are everywhere (most new or modern languages, at least in the mainstream, are based on them) and are used for everything. In Java, all the core data structures are implemented in an object-oriented fashion.

I’m not convinced that all this is a good thing. In fact, I submit that excessive use of object-oriented principles leads to a new kind of spaghetti code, rendering programs perhaps as unreadable as when implemented with unscrupulous GOTOs. OK, maybe not quite, but it can still be pretty bad.

An important facet of programming and abstraction design is separation of concerns. Separation of concerns is the idea that different concerns or aspects of a program should be kept separate. One example would be separating the type-checking logic of an evaluator from actual evaluation logic. Or separating the business logic from the report generation in a business application.
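To make the evaluator example concrete, here is a minimal sketch in OCaml of a toy expression language in which type checking and evaluation live in separate functions. The language and all names (expr, typecheck, eval, run) are invented for illustration and are not taken from any particular system.

```ocaml
type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | If of expr * expr * expr

type ty = TInt | TBool
type value = VInt of int | VBool of bool

(* Concern 1: type checking. Knows nothing about how values are computed. *)
let rec typecheck = function
  | Int _ -> Some TInt
  | Bool _ -> Some TBool
  | Add (a, b) ->
    (match typecheck a, typecheck b with
     | Some TInt, Some TInt -> Some TInt
     | _ -> None)
  | If (c, t, e) ->
    (match typecheck c, typecheck t, typecheck e with
     | Some TBool, Some t1, Some t2 when t1 = t2 -> Some t1
     | _ -> None)

(* Concern 2: evaluation. Assumes its input already passed typecheck. *)
let rec eval = function
  | Int n -> VInt n
  | Bool b -> VBool b
  | Add (a, b) ->
    (match eval a, eval b with
     | VInt x, VInt y -> VInt (x + y)
     | _ -> invalid_arg "eval: ill-typed Add")
  | If (c, t, e) ->
    (match eval c with
     | VBool true -> eval t
     | VBool false -> eval e
     | VInt _ -> invalid_arg "eval: ill-typed If")

(* The only place where the two concerns meet. *)
let run e =
  match typecheck e with
  | Some _ -> Some (eval e)
  | None -> None
```

With this split, the typing rules can change without touching eval, and each function can be read on its own. Here run (If (Bool true, Add (Int 1, Int 2), Int 0)) evaluates to Some (VInt 3), while run (Add (Int 1, Bool true)) is rejected with None before any evaluation happens.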