Blog Articles 96–100

Writing Research

From Twitter this morning: writing tips for academics. Prof. Feamster’s advice in that column is excellent. I would just like to add one thing that I have found very helpful in my own writing: think about what the paper, and each section of it, needs to accomplish.

Software engineers are familiar with the concept of requirements: when designing a program, it is important to understand what its users require it to do. If you do not know what the program is supposed to do, and how it relates to the things around it, it is difficult to write a good program. Indeed, it is difficult to even know what a good program would be.

I apply a similar idea to my writing. There are many times when I do not necessarily think about it explicitly. But when I get stuck on a section of a paper, I often find it helpful to think about what that section is supposed to accomplish. I even think about it like a software module: its prerequisites or dependencies (i.e. the things it assumes previous sections have stated or set up) and its outputs (the things later sections will assume it will have done). If I spend 5 minutes writing down what the section has to say, it is often easy to finish drafting the section.

I personally find this more useful than deep outlining. Many people like to outline down to the paragraph level, writing out their paragraph topic sentences. This is good advice, and likely helpful to many writers. But for my own writing, I have difficulty thinking in that way. I find it more useful to outline the sections and subsections of my paper, and then start thinking about each section’s requirements. Sometimes, these requirements may turn directly into paragraphs; other times, they may be merged or split or interwoven. But they’re still useful, and they give me something concrete to think about as I go to implement the ideas with prose.
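For readers who like the software analogy, here is a minimal sketch of a section treated as a module. The `Section` class, the `unmet_requirements` helper, and the example paper are all hypothetical names made up for illustration; this is not a tool I actually use, just one way to write the idea down.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A paper section modeled like a software module (hypothetical)."""
    title: str
    # Things this section assumes earlier sections have stated or set up.
    requires: set[str] = field(default_factory=set)
    # Things later sections may assume this section has done.
    provides: set[str] = field(default_factory=set)


def unmet_requirements(sections: list[Section]) -> dict[str, set[str]]:
    """Walk the sections in order and report anything a section needs
    that no earlier section has provided."""
    available: set[str] = set()
    problems: dict[str, set[str]] = {}
    for section in sections:
        missing = section.requires - available
        if missing:
            problems[section.title] = missing
        available |= section.provides
    return problems


# Made-up outline: the Method section relies on notation nobody introduced.
paper = [
    Section("Introduction", provides={"problem statement"}),
    Section("Method", requires={"problem statement", "notation"},
            provides={"algorithm"}),
    Section("Evaluation", requires={"algorithm"}, provides={"results"}),
]

print(unmet_requirements(paper))  # {'Method': {'notation'}}
```

In practice the list lives on paper or in a comment at the top of the section, but the check is the same: every requirement should be satisfied by something earlier.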

Wardrobe Simplification

Those of you in the Introduction to Recommender Systems MOOC may have noticed that I am always in a blue button-up shirt. You may or may not be asking ‘does he have any other shirts?’

I do. I primarily wear the blue shirt in the studio because the video producers have advised us against wearing white (the lights reflect off of white shirts to create a disturbingly angelic glow).

If you’re around GroupLens, or the CS department, you may have noticed that I am (almost) always in a white button-up shirt. You may have asked ‘does he have any other shirts?’, unless you are also in the MOOC, in which case you know I at least have a blue shirt.

Several months (or perhaps a year) ago, Jennifer and I embarked on a plan to standardize my wardrobe. We see minimalism as a useful tool for living the kind of life we want, and a number of minimalist writers we have read have a standard outfit. Also, deciding what to wear in the morning is a pretty low priority compared to the rest of what I have to do, so spending decisional energy on it is a waste.

Protecting Data

If data matters, as it does to most of us computer scientists, protecting it is important. Prompted by Greg Grossmeier’s search for encryption best practices (see also his synthesized advice), I thought I’d document some of the things I do to protect my data.

There are two principal threats I consider here:

Loss
I need to be able to keep my data, even if my computer or other storage media is lost, broken, or stolen. Backups are the principal means of protecting against loss.
Misuse
I’d really rather not have my data used if it falls into the wrong hands (e.g. my laptop gets stolen). Aside from general paranoia reasons, I have financial records, class assignment solutions, and things like that on my hard drives that shouldn’t really see the light of day without my authorization.

Keep On Chatting in the Free World

Warning: this post is a bit technical. Yes, a lot of my posts are technical, but this one contains content that I think non-technical friends and family are likely to find useful. I’ve tried to keep it accessible, but I also didn’t want to spend all day writing it, and I want to keep it technically accurate. If you like to chat with me and currently use Google to do so, I would encourage you to read it. If there’s demand, Jennifer can attempt a translation or paraphrase.

One key word that will come up a lot here: XMPP. XMPP stands for ‘eXtensible Messaging and Presence Protocol’. It is a protocol for instant messaging (and some other things) that, like e-mail, allows users on different servers to talk with each other. Google’s chat service, Google Talk, uses XMPP under the hood. That is why you can chat with me using Google, even though my <redacted>@elehack.net IM account is not with Google.
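To make the e-mail comparison concrete, here is a rough sketch of how an XMPP address breaks into parts, and why the part after the `@` is what decides which server handles a conversation. The helper names and the example addresses are made up for illustration, and the parsing is deliberately simplified: real clients follow the full address rules in RFC 7622, including normalization that is skipped here.

```python
def split_jid(jid: str) -> tuple[str, str, str]:
    """Split an XMPP address of the form [local@]domain[/resource].

    Simplified sketch: no validation or Unicode normalization.
    """
    # The resource (e.g. which device you are on) follows the first "/".
    bare, _, resource = jid.partition("/")
    # The local part (the user name) precedes the "@", if there is one.
    local, sep, domain = bare.partition("@")
    if not sep:
        # No "@": the whole bare address is a server name.
        local, domain = "", bare
    return local, domain, resource


def same_server(a: str, b: str) -> bool:
    """True if both addresses live on the same server (domain)."""
    return split_jid(a)[1].lower() == split_jid(b)[1].lower()


# Hypothetical addresses on two different servers.
alice = "alice@example.org/laptop"
bob = "bob@example.net"

print(split_jid(alice))         # ('alice', 'example.org', 'laptop')
print(same_server(alice, bob))  # False
```

When the two domains differ, the servers talk to each other on the users' behalf, much as mail servers do.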

Why is LensKit written in Java?

We launched the Introduction to Recommender Systems MOOC this week. The programming assignments for the course will be using LensKit. It didn’t take long for us to receive the inevitable question, with many +1’s: must we (the students) really use Java? We’ve addressed that question in the course materials (short answer: you can use any JVM-compatible language, but all documentation and examples assume Java), but I thought I’d take a few bytes to explain why LensKit is written in Java rather than C++, Python, FORTH, or whatever. Even though, when we started the project, I didn’t particularly like Java.

We want LensKit to be useful, and broadly accessible, for teaching and research. This goal requires two things (at least): it needs to be both easily usable and easily extensible. Result: it must be usable from languages that are widely known, and it must be implemented in a language that is widely known.

We also need LensKit to be efficient, so it can run on large data sets. And we want to be able to write efficiently in terms of programmer time.

If we just cared about performance and client-side accessibility, then a good choice would be C (or possibly C++) with bindings to other languages. However, this fails the two other criteria I have discussed so far: it would take much more time to implement and debug, and while C is widely known, extending LensKit would still be difficult.