Michael Ekstrand

Student Skills {.outline data-level=2}

Research work in any discipline requires a variety of skills. Some of these you should have before you begin work; others you will learn along the way. This page catalogs a number of the skills that are useful or necessary to successfully work on the kinds of research that our research group does.

This page is not a list of prerequisites. I do not expect you to have all, or even most, of these skills before you start. Depending on the specific project you are working on, you may not have all of them before you are finished, although I am happy to work with you to learn any of them that you wish.

Rather, this is intended as a starting point for conversations about your studies and a place to collect resources for learning new skills that are useful in research. It's my hope that students work on developing these skills, maybe focusing on one or two each semester, as they work in our research group.

I have divided this guide into four sections:

  1. Prerequisite skills that really are needed before we can do much.
  2. Non-technical skills that will help you understand, conduct, and communicate research.
  3. Technical skills pertaining to technologies that we regularly use in our research.
  4. Auxiliary skills that you may find fun or useful to have.

Thanks to Jennifer Ekstrand, Sole Pera, and Cathie Olschanowsky for their valuable feedback in preparing this list.

Prerequisite Skills {#prereq}

  • Basic programming knowledge (completing CS 221 or equivalent experience), ideally in Java.
  • Ability to communicate about technical topics in spoken and written English.

That's it. If you can write some Java code, and we can communicate, that's enough to get started, and we can work on the rest.

Non-Technical Skills {#non-tech}

Research is not just — or even primarily, in our group — about technology or technical skills. It is about expanding our knowledge and solving problems, which requires a wide array of work.

Statistics

A lot of our research involves statistical analysis of some kind: analyzing experiment results, mining data sets, or conducting simulations.

  • MATH 361: Probability and Statistics. Graduate students are allowed to take up to 2 out-of-department undergraduate classes (at the 300 level or higher) and count them towards their degree, so I recommend that all of my students try to get this into their schedule unless they already have comparable expertise.
  • Think Stats looks promising.

Some of our work specifically requires Bayesian statistical methods. Some resources:

Writing

It is not enough to conduct research; we need to write up the results and get them published.

Speaking

The bulk of computer science research is published in conferences, which means that we must present it and be able to talk about it effectively if we want people to pay attention to it.

  • For students in PIReT, we try to provide opportunities to practice speaking with the research group.
  • Toastmasters groups can be a useful practice environment.

Learning

While you are in school to learn generally, it's important to be able to go out and learn a skill or a technology that we don't specify in the curriculum. This is doubly important when working on research, as the point is to identify new knowledge — if we knew what we were doing, or could just look it up in a book, it would not be scientific research. Many times, I will not know it myself!

Planning and Executing Work

This skill really comprises two skills: high-level project planning, where you determine the scope and desired outcomes of a project, and day-to-day task and work management to actually get your work done.

Planning Projects

It's really easy to spin your wheels in research. It's even easier if you don't have a clear idea of what it is that you are trying to do.

Therefore, it's important to be able to plan a project (or subproject) to give it a clear direction and to help structure your work. Even in the fairly open-ended world of academic research, it is important to have clear direction.

Early on, your adviser will help a lot with this. But as you progress through your research career, you need to be able to do an increasing amount of project planning yourself. By the time you complete a Ph.D., you should be able to plan out a research project that is at least one paper's worth of work, and ideally more.

One of the key things to do in planning a project is to determine its intended outcome. If the project is successful, what will you have at the end? It does not matter much what you call this — ‘Definition of Done’ from SCRUM, a ‘Desired Outcome’, or whatever — the important thing is to define success. The desired outcome may just be ‘we have an evidence-based answer to research question $FOO’. Then you can work on determining how to move towards success.

Once you have a concrete goal, or an idea that you are considering developing into a concrete goal, it is useful to be able to iterate quickly on early ideas to filter out infeasible solutions and identify a promising path forward. Minimum Viable Research is one way of thinking about this.

Day-to-Day Work

There are many different day-to-day work management techniques and tools, and no silver bullets. It is rarely productive to follow one methodology religiously, or to spend a great deal of time churning through different tools. No tool is perfect; each will let you down somehow, sometime. And they cannot solve everything.

Also, different roles require different workflows and task management systems. Many systems and books are primarily oriented towards white-collar knowledge and executive workers, particularly those with management responsibilities. They have some overlap with academic work but often need adaptation. A grab bag of techniques may work best for you.

The important thing is to be able to record the things that you need to do, in a reliable medium, and work on them. Many people are very productive with a spiral notebook.

I hope to write more someday about what has worked for me, but here are several resources that may be useful as a starting point:

  • The One Minute To-Do List, and its expanded version Manage Your Now, has a lot of good ideas that translate reasonably well to academic work. I've adapted ideas from it in my own planning.
  • Many people swear by Getting Things Done. A lot of software tools are oriented towards this methodology.
  • Bullet Journal provides some structure for work management with paper notebooks.
  • Dave Lee's Week Chart can be very useful. Even though I don't do Week Charts, I use the basic desired-outcome ideas from this method.

There are also a million software tools that may or may not help, such as Wunderlist, Todoist, Toodledo, Outlook Tasks, OmniFocus, TaskPaper, Emacs org-mode, and TaskWarrior.

Technical Skills {#tech}

With these skills, you will develop a deeper understanding of the various technologies we use in our research.

Java Programming

The LensKit software that we maintain is written in Java. Even if your research does not directly contribute to LensKit, there is a good chance that you will need to work with it, and we also use Java packages for much of our other work (such as Lucene for text search).

  • Effective Java, 2nd Edition is a fantastic book for learning modern, adaptable Java programming practices. We generally follow its advice in LensKit unless we have a good (and hopefully documented) reason to deviate. This book will not teach you Java; rather, it is for programmers with knowledge of Java's core concepts to learn how best to apply them.

Data Structures and Algorithms

Data structures and algorithms are foundational programming topics that will help you learn how to structure and reason about computations.

  • CS 321: Data Structures
  • CS 421: Algorithms
  • Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (sometimes called ‘CLRS’) is the canonical textbook on the subject. It is not very good for initially learning the material on your own, unless you have a high degree of mathematical proficiency, but it is an excellent reference book to have on your shelf.
  • Programming Pearls is a good book for seeing algorithmic concepts worked out.
  • Project Euler is a set of exercises that will let you practice designing and implementing mathematical algorithms.

Data Analysis in R

I recommend students use the R programming language for data analysis, experiment analysis, and statistical visualization. Learning R therefore will be helpful for a lot of the research projects with our group.

A lot of older R primers have you doing a lot of data wrangling and visualization using R's built-in primitive commands, or perhaps using the lattice package. However, the dplyr and ggplot2 packages provide flexible, high-level, and performant operations that allow you to easily express very complex computations and visualizations in quite readable code. I therefore recommend learning dplyr very early (both books above teach dplyr), and skipping the built-in R visualization commands entirely in favor of ggplot2 (Wickham's book teaches ggplot2).

I also find Anaconda/Miniconda and Jupyter to be very useful when doing analysis in R, as documented here.

Note: some students will prefer to do this analysis in Python. I am happy to have them do so, but will not be able to provide nearly as much support or preexisting code for such analyses. However, R is a more fitting language for a PIReT, don't you think?

Git

We generally use Git to manage our source code; the LensKit project uses Git, and we usually use it for managing our experiment scripts and sometimes our papers. Source control in general is a useful tool, and Git is the dominant tool in version control for open-source software.

Unix Command Line

While we use a variety of operating systems for our local computing environments (I myself use Windows), basic familiarity with the Unix command line is useful as we generally run Linux on our servers for data analysis and deploying live applications, and other infrastructure we work with is built on Unix-like platforms (such as the Travis continuous integration server).

Gradle

LensKit is built with Gradle, and LensKit experiments are also run with Gradle. We frequently use it as our go-to automation tool in other environments as well.

  • The Gradle User Guide is the primary resource for understanding Gradle.
  • Groovy in Action is a good book for learning Groovy, the Java-like programming language that is used to control Gradle. Deep knowledge of Groovy is not necessary for the vast majority of our work, but it can be helpful for working on developing certain corners of LensKit's evaluation support.

A Scripting Language

Many projects involve some kind of auxiliary computation — converting data formats, importing into a database, moving files around, etc. It's useful to have some utility language to write this kind of code in; you can do it in Java, but another language can often be quicker for writing such code.

There are a number of good candidates:

  • Many people use Python
  • I often use JavaScript with Node.js, particularly for data processing
  • Perl can be a reasonable choice
  • Ruby works too
  • UNIX shell is useful, but often too limited
  • Groovy is great for integrating with Java code

I personally do not have much preference as to which you use; the code should be documented, along with all its dependencies (so that it can be run again). Use what you know, or what the others working on your immediate project are using.
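To give a feel for the kind of auxiliary code meant here, the following is a minimal sketch of a format-conversion script in Python, one of the candidate languages above. It turns CSV text with a header row into one JSON object per line. The function name `csv_to_json_lines` and the sample columns (`user`, `item`, `rating`) are hypothetical and chosen only for illustration; they do not come from any of our projects.

```python
import csv
import io
import json


def csv_to_json_lines(source, dest):
    """Convert CSV text with a header row into one JSON object per line.

    `source` and `dest` are open text-file-like objects.
    Returns the number of records written.
    """
    count = 0
    for row in csv.DictReader(source):
        # Each row is a dict keyed by the header names; values stay strings.
        dest.write(json.dumps(row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data; real input would come from a file on disk.
    sample = io.StringIO("user,item,rating\n1,10,4.5\n2,10,3.0\n")
    out = io.StringIO()
    n = csv_to_json_lines(sample, out)
    print(n, "records")
    print(out.getvalue(), end="")
```

The script uses only the Python standard library, so its only dependency to document is the Python version. Keeping the conversion in a function that takes file-like objects makes it easy to rerun on real files or to check against a small in-memory sample.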

Web Development

Whenever we build a user-facing experiment, it's usually deployed as a web application. Therefore, web development can be a very valuable skill for our research, depending on the exact project you are working on.

  • HTML & CSS and JavaScript & jQuery, both published by Wiley and available as a two-book package, are good resources for learning the foundations of front-end web technology.

Database Design and Programming

Some projects may require use of an SQL database, such as PostgreSQL.

  • The Practical SQL Handbook is a very accessible book on programming and data modeling for SQL databases. I learned relational databases myself using the 3rd edition of this text.
  • CS 410/510: Databases is a course on data modeling and database programming offered in our department. I sometimes teach it.

Auxiliary & Recreational Skills {#aux}

These skills are fun, perhaps useful, but aren't in the direct path to most of our research outcomes.

Functional Programming

You'll pick up some functional programming along the way in a lot of other work, because functional concepts have worked their way pretty heavily into modern JavaScript, Python, and even Java. A serious study of functional programming can improve your ability to make use of those language features (sometimes to your collaborators' chagrin).

  • Learn OCaml, Standard ML, Scheme, or Haskell. These languages work with a heavily functional paradigm; Haskell is purely functional.
  • CS 354 (Programming Languages) and/or CS 531 (Advanced Programming Languages).
  • Implementing Functional Languages by Jones & Lester is fantastic; it predates a lot of modern functional theory, such as monads, but is still very helpful. While it is about implementing functional languages, the book is a blend of code and prose explaining how to implement a functional language in a functional language. It therefore serves as an extensive worked example of functional programming, by masters, with explanations of what they are doing and why. For that reason, it greatly improved my own functional programming abilities.
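As a small illustration of functional concepts showing up in a mainstream language, here is a sketch in Python that computes the same quantity in a loop-and-mutate style and in a more functional style, then builds a `compose` helper from `functools.reduce`. All function names are hypothetical and written for this example; this is one way to express the ideas, not a recommended house style.

```python
from functools import reduce


def mean_of_squares_imperative(values):
    """Mean of the squares of the positive values, using explicit mutation."""
    total = 0
    count = 0
    for v in values:
        if v > 0:
            total += v * v
            count += 1
    return total / count if count else 0.0


def mean_of_squares_functional(values):
    """Same computation as a filter-and-map expression with no mutation."""
    squares = [v * v for v in values if v > 0]
    return sum(squares) / len(squares) if squares else 0.0


def compose(*funcs):
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


if __name__ == "__main__":
    data = [1, -2, 3]
    print(mean_of_squares_imperative(data))
    print(mean_of_squares_functional(data))
    add_one_after_doubling = compose(lambda x: x + 1, lambda x: x * 2)
    print(add_one_after_doubling(3))
```

The two `mean_of_squares` versions return the same result; the functional one describes what is computed rather than how the accumulator changes. The `compose` helper treats functions as ordinary values, which is the habit a serious study of functional programming tends to build.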

Extemporaneous Speaking

While some of our oral communication is in prepared form (conference talks, seminars, lectures, and the like), we also need to be able to talk about our work without preparation: to answer questions, pitch our work in the hallway, ask good questions about others' talks, etc.

  • Groups like Toastmasters provide a venue to practice speaking.

Building Bicycle Wheels

While in grad school I learned to build bicycle wheels, and I get a great deal of satisfaction out of a well-made wheel that I built with my own hands.

  • Take a class from your local bike shop
  • The Bicycle Wheel (Brandt) has a decent explanation of the building process, and a very good and detailed explanation of the physics of a bicycle wheel.

Tweeting Parodies of Musical Lyrics

This is a very important skill.

The best way I know to obtain it is to read a lot so you have an extensive repertoire of words and sentence structures to make the lyrics work out, and then to practice.

For examples, see Les Bicyclebles.