Blog Articles 96–100

Protecting Data

If data matters, as it does to most of us computer scientists, protecting it is important. Prompted by Greg Grossmeier’s search for encryption best practices (see also his synthesized advice), I thought I’d document some of the things I do to protect my data.

There are two principal threats I consider here:

  • I need to be able to keep my data, even if my computer or other storage media are lost, broken, or stolen. Backups are the principal means of protecting against loss.
  • I’d really rather not have my data used if it falls into the wrong hands (e.g. my laptop gets stolen). Aside from general paranoia, I have financial records, class assignment solutions, and things like that on my hard drives that shouldn’t really see the light of day without my authorization.

Keep On Chatting in the Free World

Warning: this post is a bit technical. Yes, a lot of my posts are technical, but this one contains content that I think non-technical friends and family are likely to find useful. I’ve tried to keep it accessible, but I also didn’t want to spend all day writing it, and I want to keep it technically accurate. If you like to chat with me and currently use Google to do so, I would encourage you to read it. If there’s demand, Jennifer can attempt a translation or paraphrase.

One key word that will come up a lot here: XMPP. XMPP stands for ‘eXtensible Messaging and Presence Protocol’. It is a protocol for instant messaging (and some other things) that, like e-mail, allows users on different servers to talk with each other. Google’s chat service, Google Talk, uses XMPP under the hood. That is why you can chat with me using Google, even though my <redacted> IM account is not with Google.
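To make the e-mail analogy concrete: an XMPP address (a "JID") has the shape local@domain/resource, and the domain part names the server that hosts the account. The sketch below is illustrative only. The function names and the example.com / example.org addresses are made up for this post, and it skips the normalization rules that real XMPP libraries apply.

```python
def split_jid(jid):
    """Split 'local@domain/resource' into (local, domain, resource).

    Missing parts come back as empty strings. This is a simplified
    reading of the address format, not a validating parser.
    """
    # The resource is everything after the first '/', and may itself contain '/' or '@'.
    bare, _, resource = jid.partition("/")
    if "@" in bare:
        local, _, domain = bare.partition("@")
    else:
        # A bare domain (a server address) has no local part.
        local, domain = "", bare
    return local, domain, resource


def needs_federation(jid_a, jid_b):
    """True when the two accounts live on different servers.

    In that case the two servers have to talk to each other,
    much as two mail servers do when relaying e-mail.
    """
    domain_a = split_jid(jid_a)[1].lower()
    domain_b = split_jid(jid_b)[1].lower()
    return domain_a != domain_b


if __name__ == "__main__":
    # Hypothetical addresses on two different servers.
    print(split_jid("alice@example.com/phone"))
    print(needs_federation("alice@example.com/phone", "bob@example.org"))
```

Two accounts on the same domain are handled by one server; accounts on different domains rely on the servers being willing to talk to each other.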

Why is LensKit written in Java?

We launched the Introduction to Recommender Systems MOOC this week. The programming assignments for the course will be using LensKit. It didn’t take long for us to receive the inevitable question, with many +1’s: must we (the students) really use Java? We’ve addressed that question in the course materials (short answer: you can use any JVM-compatible language, but all documentation and examples assume Java), but I thought I’d take a few bytes to explain why LensKit is written in Java rather than C++, Python, FORTH, or whatever. Even though, when we started the project, I didn’t particularly like Java.

We want LensKit to be useful, and broadly accessible, for teaching and research. This goal requires two things (at least): it needs to be both easily usable and easily extensible. Result: it must be usable from languages that are widely known, and it must be implemented in a language that is widely known.

We also need LensKit to be efficient, so it can run on large data sets. And we want to be able to write efficiently in terms of programmer time.

If we just cared about performance and client-side accessibility, then a good choice would be C (or possibly C++) with bindings to other languages. However, this fails the other two criteria I have discussed so far: it would take much more time to implement and debug, and, although C is widely known, extending LensKit would be difficult.

Blog moved (again)

Once again, my blog has moved.

Old URLs should still work ( ones might be broken after some time).

I liked Tumblr. It made it really easy to write a short blog post. Posting quotes was a wonderful capability, and encouraged me to post somewhat more often.

The Tumblr dashboard, while never great for volume, was decreasing in utility. A sponsored tweet is one thing; a sponsored post, taking up a full screen height in my stream, is rather invasive.

Putting Maven build directories out-of-tree

So you want to have Maven put its build output somewhere out-of-tree.

There are several reasons to do this:

  • You have lots of RAM, and want to speed up builds by putting class files in RAM.
  • You have an SSD, and want to reduce needless wear cycles by putting class files in RAM (or on a spinning disk).
  • Your source tree is on a network file system, and you want compilation output to be local.

Helpfully, the Maven Way is to have all Maven-generated output go to a dedicated directory, target/, where it can be easily separated. Theoretically, you can probably set the build directory to point wherever you want, and get Maven to build somewhere else. However, parts of the build system may assume that output goes into target/ (a questionable assumption, but I make it myself in pieces of the LensKit site generation workflow).
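For tooling that hard-codes target/, one possible workaround is to leave Maven's configuration alone and make target/ a symbolic link to a directory somewhere else (a RAM disk, a spinning disk, or local storage). The following is a minimal sketch of that idea, not necessarily the approach this post goes on to recommend. The function name and the directory layout under build_root are assumptions made for illustration, and it needs a platform and user account that can create symlinks.

```python
from pathlib import Path


def link_target_out_of_tree(project_dir, build_root):
    """Make project_dir/target a symlink to a directory under build_root.

    Returns the out-of-tree directory. Refuses to touch an existing real
    target/ directory, so no build output is deleted by accident.
    """
    project_dir = Path(project_dir).resolve()
    # Hypothetical layout: <build_root>/<project name>/target
    out_dir = Path(build_root).resolve() / project_dir.name / "target"
    out_dir.mkdir(parents=True, exist_ok=True)

    link = project_dir / "target"
    if link.is_symlink():
        if link.resolve() == out_dir:
            return out_dir  # already set up
        link.unlink()  # stale link pointing somewhere else
    elif link.exists():
        raise FileExistsError(
            f"{link} is a real file or directory; remove it first"
        )

    link.symlink_to(out_dir, target_is_directory=True)
    return out_dir
```

Calling link_target_out_of_tree(".", "/dev/shm/maven") from a project root would, on a Linux machine where /dev/shm is a RAM-backed file system, send class files to RAM while everything that expects target/ keeps working. A clean may remove the link, in which case the function can simply be run again.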