Why is LensKit written in Java?

We launched the Introduction to Recommender Systems MOOC this week, and the programming assignments for the course use LensKit. It didn’t take long for us to receive the inevitable question, with many +1’s: must we (the students) really use Java? We’ve addressed that question in the course materials (short answer: you can use any JVM-compatible language, but all documentation and examples assume Java), but I thought I’d take a few bytes to explain why LensKit is written in Java rather than C++, Python, FORTH, or whatever, even though I didn’t particularly like Java when we started the project.

We want LensKit to be useful and broadly accessible for teaching and research. This goal requires (at least) two things: LensKit needs to be both easily usable and easily extensible. As a result, it must be usable from widely known languages, and it must be implemented in a widely known language.

We also need LensKit to be efficient, so it can run on large data sets. And we want to be able to write it efficiently in terms of programmer time.

If we cared only about performance and client-side accessibility, then a good choice would be C (or possibly C++) with bindings to other languages. However, this fails the other two criteria: it would take much more time to implement and debug, and although C is widely known, extending LensKit would be difficult.

Java is probably one of the most widely taught and known languages. Its best competitor for approachability would probably be Python; Python, however, would not give us the efficiency we need. Yes, projects like Psyco, PyPy, and Unladen Swallow work to improve Python’s performance, but when we started in 2010 they were either abandoned or still experimental. NumPy and SciPy also provide good performance, but they require you to shape your problem fairly aggressively to fit their structures. I also like static typing for large-scale projects. C++ is known by many and would give us excellent and predictable performance, but writing good C++ code is hard. Writing fast Java is somewhat hard too, but when you go wrong your program is slow (or produces incorrect output); it doesn’t die hard with a segfault.

Now, we do have to jump through some hoops in LensKit’s implementation to get good performance, but it is very possible with Java. Java has a somewhat out-of-date reputation for being slow. It does take forever to start, but once running, the JVM is capable of executing code very efficiently. And its optimizer borders on insane, allowing fairly high-level, idiomatic code to execute quickly.

So, given the goals we have for the project and the state of the programming landscape in 2010, Java seemed like the best choice, and we went with it.