Student Skills

Research work in any discipline requires a variety of skills. Some of these you should have before you begin work; others you will learn along the way. This page catalogs a number of the skills that are useful or necessary to successfully work on the kinds of research that our research group does.

This page is not a list of prerequisites. I do not expect you to have all, or even most, of these skills before you start. Depending on the specific project you are working on, you may not have all of them by the time you finish, although I am happy to help you learn any of them that you wish.

Rather, this is intended as a starting point for conversations about your studies and a place to collect resources for learning new skills that are useful in research. It’s my hope that students work on developing these skills, maybe focusing on one or two each semester, as they work in our research group.

I have divided this guide into four sections:

Prerequisite skills that really are needed before we can do much.
General skills that will help you understand, conduct, and communicate research.
Technical skills pertaining to technologies that we regularly use in our research.
Auxiliary skills that you may find fun or useful to have.


Thanks to Jennifer Ekstrand, Sole Pera, and Cathie Olschanowsky for their valuable feedback in preparing this list.

Prerequisite Skills

Ability to communicate about technical topics in spoken and written English.
For most projects, some programming knowledge.

If you can write some Python code, and we can communicate, that’s usually enough to get started and we can work on the rest, particularly for undergraduate students. I have some projects with roles that do not require programming knowledge, but they are less common.

General Skills

Research is not just — or even primarily, in our group — about technology or technical skills. It is about expanding our knowledge and solving problems, which requires a wide array of work.

Reading

Reading research is a skill in its own right. There are different types of reading for different types of papers and information needs. I think of a spectrum of reading levels for research papers, including the following levels as well as others in between:

Quick scan to see the key idea and assess whether the paper is relevant to our topic.
Read for key ideas and findings: what does the paper do, how, what do we learn, and what are major weak spots.
Read to apply ideas: understand the paper’s methods and implications well enough to use them.
Deep read: understand the paper in all its detail well enough that we could try to reproduce it.

There are other types of reading as well, such as reading a paper not for its content but to understand how the writing itself works.

It’s generally not a good idea to just read a paper from front to back. Some papers make this easy, but many papers do not. A common approach is to read the abstract, introduction, and conclusion, and then to read more if it is relevant.

Here are some more resources for reading papers:

How to Read a Paper (read this)
How to Read a Technical Paper

Writing

It is not enough to conduct research; we need to write up the results and get them published.

Style: Lessons in Clarity and Grace by Williams and Bizup is fantastic. Read this book.
The Science of Scientific Writing
Writing Research Papers by James D. Lester
Writing for Computer Science by Justin Zobel

Abstracts

How to write a scientific abstract in six easy steps

Speaking

The bulk of computer and information science research is published at conferences, which means that we must present it and be able to talk about it effectively if we want people to pay attention to it.

For students in PIReT, we try to provide opportunities to practice speaking with the research group.
Toastmasters groups can be a useful practice environment.

Learning

While you are in school to learn generally, it’s important to be able to go out and learn a skill or a technology that we don’t specify in the curriculum. This is doubly important when working on research, as the point is to identify new knowledge — if we knew what we were doing, or could just look it up in a book, it would not be scientific research. Many times, I will not know it myself!

Statistics

A lot of our research involves statistical analysis of some kind: analyzing experiment results, mining data sets, or conducting simulations.

Statistics courses.
Think Stats looks promising.

Some of our work specifically requires Bayesian statistical methods. Some resources:

Statistical Rethinking by Richard McElreath.
A First Course in Bayesian Statistical Methods is a pretty good starting point for learning Bayesian statistics if you have some familiarity with calculus.
Think Bayes is a much more informal introduction to the same topic, oriented towards programmers rather than mathematicians.
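
To give a flavor of the kind of reasoning these resources teach, here is a minimal sketch of the simplest Bayesian update: a Beta prior on a rate, combined with binomial data. The click data and the function names are hypothetical, chosen only for illustration; real analyses in the group are usually far more involved.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: 7 of 10 users clicked a recommended item.
prior = (1, 1)  # uniform Beta(1, 1) prior on the click rate
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print(posterior)              # (8, 4)
print(beta_mean(*posterior))  # 2/3
```

The posterior mean of 2/3 sits between the prior mean (1/2) and the observed rate (7/10), which is the usual behavior of a conjugate update with a weak prior.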

Planning and Executing Work

This skill really comprises two skills: high-level project planning, where you determine the scope and desired outcomes of a project, and day-to-day task and work management to actually get your work done. Both are important to various degrees, depending on your role and educational stage.

For general tips on productivity, see my blog series and resources.

Day-to-Day Work

There are many different day-to-day work management techniques and tools, and no silver bullets. It is often not productive to follow a particular methodology religiously, and it is especially unproductive to spend a great deal of time churning through different tools. No tool is perfect; each will let you down somehow, sometime. And they cannot solve everything.

Also, different roles require different workflows and task management systems. Many systems and books are primarily oriented towards white-collar knowledge and executive workers, particularly those with management responsibilities. They have some overlap with academic work but often need adaptation. A grab bag of techniques may work best for you.

The important thing is to be able to record the things that you need to do, in a reliable medium, and work on them. Many people are very productive with a spiral notebook.

I have written a series of articles about my own approach to planning and managing work, and include there a number of links to other resources.

If you want to read one productivity book, I recommend The One Minute To-Do List.

There are also a million software tools that may or may not help, such as Logseq, Todoist, Toodledo, Microsoft To Do, OmniFocus, TaskPaper, Emacs org-mode, and TaskWarrior.

Planning Projects

It’s really easy to spin your wheels in research. It’s even easier if you don’t have a clear idea of what it is that you are trying to do.

Therefore, it’s important to be able to plan a project (or subproject) to give it a clear direction and to help structure your work. Even in the fairly open-ended world of academic research, it is important to have clear direction.

Early on, your adviser will help a lot with this. But as you progress through your research career, you need to be able to do an increasing amount of project planning yourself. By the time you complete a Ph.D., you should be able to plan out a research project that is at least one paper’s worth of work, and ideally more.

One of the key things to do in planning a project is to determine its intended outcome. If the project is successful, what will you have at the end? It does not matter much what you call this (the ‘Definition of Done’ from Scrum, a ‘Desired Outcome’, or whatever); the important thing is to define success. The desired outcome may just be ‘we have an evidence-based answer to research question $FOO’. Then you can work on determining how to move towards success.

Once you have a concrete goal, or an idea that you are considering developing into a concrete goal, it is useful to be able to iterate quickly on early ideas to filter out infeasible solutions and identify a promising path forward. Minimum Viable Research is one way of thinking about this.

Technical Skills

With these skills, you will develop a deeper understanding of the various technologies we use in our research.

Python Programming

The single most useful programming language for much of the work we do is Python.

The LensKit software that we maintain is written in Python. Even if your research does not directly contribute to LensKit, there is a good chance that you will need to work with it, and we also use Python packages for much of our other work.

We also do a lot of data analysis and general utility programming in Python.

Python for Data Analysis is a good book for learning the data-oriented Python that we write in our group.

Data Structures and Algorithms

Data structures and algorithms are foundational programming topics that will help you learn how to structure and reason about computations.

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (sometimes called ‘CLRS’) is the canonical textbook on the subject. It is not very good for initially learning the material on your own, unless you have a high degree of mathematical proficiency, but it is an excellent reference book to have on your shelf.
Programming Pearls is a good book for seeing algorithmic concepts worked out.
Project Euler is a set of exercises that will let you practice designing and implementing mathematical algorithms.
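
As an illustration of the kind of exercise Project Euler offers, here is a sketch of its first problem (sum the multiples of 3 or 5 below a limit), solved first by brute force and then with a closed-form formula. The function names are my own, and this is one possible approach rather than a reference solution.

```python
def sum_multiples_brute(limit):
    """Sum every n below limit that is divisible by 3 or 5 (linear time)."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def sum_divisible_by(k, limit):
    """Sum of the positive multiples of k below limit, via the triangular-number formula."""
    m = (limit - 1) // k  # number of positive multiples of k below limit
    return k * m * (m + 1) // 2


def sum_multiples_closed(limit):
    """Same result in constant time, using inclusion-exclusion on 3, 5, and 15."""
    return (
        sum_divisible_by(3, limit)
        + sum_divisible_by(5, limit)
        - sum_divisible_by(15, limit)  # multiples of 15 were counted twice
    )


print(sum_multiples_brute(10))     # 23
print(sum_multiples_closed(1000))  # 233168
```

Comparing the two versions is a small example of the trade-off these exercises train: the brute-force loop is obviously correct, while the closed form scales to limits where the loop would be too slow.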

Git

We generally use Git to manage our source code; the LensKit project uses Git, and we usually use it for managing our experiment scripts and sometimes our papers. Source control in general is a useful tool, and Git is the dominant tool in version control for open-source software.

Version Control By Example is a free e-book that teaches many version control systems, including Git.
The Git documentation page has links to several resources.
Pro Git is another free e-book with in-depth information about using Git.
How to Write a Git Commit Message and A Note About Git Commit Messages.

Unix Command Line

While we use a variety of operating systems for our local computing environments (I myself use Windows), basic familiarity with the Unix command line is useful. We generally run Linux on our servers for data analysis and for deploying live applications, and other infrastructure we work with is built on Unix-like platforms (such as the Travis continuous integration server).

Stan Programming

We usually use Stan for Bayesian statistical inference. Fortunately, it has very high-quality documentation; we’re also developing an increasing amount of lab expertise in it, so your labmates can probably help too!

Web Development

Whenever we build a user-facing experiment, it’s usually deployed as a web application. Therefore, web development can be a very valuable skill for our research, depending on the exact project you are working on.

HTML & CSS is a good resource for learning HTML. It is older but mostly still valid.
I am looking for a good, modern client-side JavaScript resource that isn’t bound to a specific framework like React.
We currently tend to do web development fully in JavaScript, with Node.js on the backend; sometimes we will use a Python backend with e.g. Flask.

Database Design and Programming

Some projects may require use of an SQL database, such as PostgreSQL.

The Practical SQL Handbook is a very accessible book on programming and data modeling for SQL databases. I learned relational databases myself using the 3rd edition of this text.
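
For a sense of what basic SQL data modeling and querying look like, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL, since it needs no server. The tables, columns, and data are hypothetical, and PostgreSQL differs in details such as data types and the client library you would use.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: users and the ratings they gave to items.
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE ratings (
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        item_id INTEGER NOT NULL,
        rating REAL NOT NULL,
        PRIMARY KEY (user_id, item_id)
    );
""")

# Parameterized inserts keep data separate from the SQL text.
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, "ben")])
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 10, 4.0), (1, 11, 5.0), (2, 10, 3.0)],
)

# Join the two tables and aggregate ratings per user.
rows = conn.execute("""
    SELECT u.name, COUNT(*) AS n_ratings, AVG(r.rating) AS mean_rating
    FROM users u
    JOIN ratings r ON r.user_id = u.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.name
""").fetchall()

print(rows)  # [('ana', 2, 4.5), ('ben', 1, 3.0)]
conn.close()
```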

Rust Programming

Rust is turning out to be an excellent language for high-throughput data processing. It allows us to write extremely fast code without the hassle of C or C++.

For an example of how we’ve integrated Python, Rust, and advanced PostgreSQL, see the book data tools.

Auxiliary & Recreational Skills

These skills are fun, perhaps useful, but aren’t in the direct path to most of our research outcomes.

Functional Programming

You’ll pick up some functional programming along the way in a lot of other work, because functional concepts have worked their way pretty heavily into modern JavaScript, Python, and even Java. A serious study of functional programming can improve your ability to make use of those language features (sometimes to your collaborators’ chagrin).
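
As a small illustration of the functional features mentioned above, here is a sketch in Python using higher-order functions, a comprehension, and function composition. The `compose` helper and the sample ratings are my own, written for this example; they are not part of the standard library or of any group code.

```python
from functools import reduce
from operator import add


def compose(*funcs):
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    # Start from the identity function so that compose() with no arguments works.
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


ratings = [4, 2, 5, 3, 1]  # hypothetical 1-5 star ratings

# Higher-order functions: filter and map take other functions as arguments.
high = list(filter(lambda r: r >= 3, ratings))  # keep ratings of 3 or more
shifted = list(map(lambda r: r - 1, high))      # move to a 0-based scale
total = reduce(add, shifted, 0)                 # fold the list into one number

# The same pipeline as a generator expression, the more common Python idiom.
total_comp = sum(r - 1 for r in ratings if r >= 3)

# Functions are values: build a new function out of existing ones.
abs_then_str = compose(str, abs)

print(high, shifted, total)  # [4, 5, 3] [3, 4, 2] 9
print(total_comp)            # 9
print(abs_then_str(-3))      # 3
```

None of the steps mutates `ratings`; each produces a new value from the previous one, which is the habit that a more serious study of functional programming tends to reinforce.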

Learn OCaml, Standard ML, Scheme, or Haskell. These languages work with a heavily functional paradigm; Haskell is purely functional.
Implementing Functional Languages by Peyton Jones & Lester is fantastic; it predates a lot of modern functional theory, such as monads, but is still very helpful. While it is about implementing functional languages, the book is a blend of code and prose explaining how to implement a functional language in a functional language. It therefore serves as an extensive worked example of functional programming, by masters, with explanations of what they are doing and why. For that reason, it greatly improved my own functional programming abilities.

Extemporaneous Speaking

While some of our oral communication is in prepared form (conference talks, seminars, lectures, and the like), we also need to be able to communicate about our research without preparation: to answer questions, pitch our work in the hallway, ask good questions about others’ talks, etc.

Groups like Toastmasters provide a venue to practice speaking.

Building Bicycle Wheels

While in grad school I learned to build bicycle wheels, and I still get a great deal of satisfaction out of a well-made wheel that I built with my own hands.

Take a class from your local bike shop.
The Bicycle Wheel (Brandt) has a decent explanation of the building process, and a very good and detailed explanation of the physics of a bicycle wheel.

Parodies of Musical Lyrics

This is a very important skill.

The best way I know to obtain it is to read a lot so you have an extensive repertoire of words and sentence structures to make the lyrics work out, and then to practice.

For examples, see Les Bicyclebles.