Student Skills

Research work in any discipline requires a variety of skills. Some of these you should have before you begin work; others you will learn along the way. This page catalogs a number of the skills that are useful or necessary to successfully work on the kinds of research that our research group does.

This page is not a list of prerequisites. I do not expect you to have all, or even most, of these skills before you start. Depending on the specific project you are working on, you may not have all of them even by the time you finish, although I am happy to work with you on learning any of them that you wish.

Rather, this is intended as a starting point for conversations about your studies and a place to collect resources for learning new skills that are useful in research. It’s my hope that students work on developing these skills, maybe focusing on one or two each semester, as they work in our research group.

I have divided this guide into four sections:

  1. Prerequisite skills that really are needed before we can do much.
  2. Non-technical skills that will help you understand, conduct, and communicate research.
  3. Technical skills pertaining to technologies that we regularly use in our research.
  4. Auxiliary skills that you may find fun or useful to have.

Thanks to Jennifer Ekstrand, Sole Pera, and Cathie Olschanowsky for their valuable feedback in preparing this list.

Prerequisite Skills

  • Basic programming knowledge (completing CS 221 or equivalent experience).
  • Ability to communicate about technical topics in spoken and written English.

That’s it. If you can write some Python code, and we can communicate, that’s enough to get started and we can work on the rest.

Non-Technical Skills

Research is not just — or even primarily, in our group — about technology or technical skills. It is about expanding our knowledge and solving problems, which requires a wide array of work.

Reading

Reading research is a skill in its own right. There are different types of reading for different types of papers and information needs. I think of a spectrum of reading levels for research papers, including the following levels as well as others in between:

  • Quick scan to see the key idea and assess whether the paper is relevant to our topic.
  • Read for key ideas and findings: what does the paper do, how does it do it, what do we learn, and what are its major weak spots.
  • Read to apply ideas: understand the paper’s methods and implications well enough to use them.
  • Deep read: understand the paper in all its detail well enough that we could try to reproduce it.

There are other types of reading as well, such as reading a paper not for its content but to understand how the writing itself works.

It’s generally not a good idea to just read a paper from front to back. Some papers make this easy, but many papers do not. A common approach is to read the abstract, introduction, and conclusion, and then to read more if it is relevant.

Here are some more resources for reading papers:

Statistics

A lot of our research involves statistical analysis of some kind: analyzing experiment results, mining data sets, or conducting simulations.

  • MATH 361: Probability and Statistics. Graduate students are allowed to take up to 2 out-of-department undergraduate classes (at the 300 level or higher) and count them towards their degree, so I recommend that all of my students try to get this into their schedule unless they already have comparable expertise.
  • Think Stats looks promising.
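
To give a flavor of what "analyzing experiment results" can look like in practice, here is a minimal sketch of a percentile bootstrap for the difference in means between two groups. The scores are made up, the function name `bootstrap_mean_diff` is my own, and this is one simple technique among many rather than a description of how any particular project of ours does its analysis.

```python
import random
import statistics


def bootstrap_mean_diff(a, b, n_boot=2000, seed=42):
    """Percentile-bootstrap 95% interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return statistics.mean(a) - statistics.mean(b), lower, upper


# Made-up scores from two independent groups of users (illustration only).
scores_a = [0.61, 0.58, 0.70, 0.66, 0.59, 0.73, 0.64, 0.68]
scores_b = [0.55, 0.57, 0.62, 0.60, 0.52, 0.65, 0.58, 0.61]

observed, lower, upper = bootstrap_mean_diff(scores_a, scores_b)
print(f"difference in means: {observed:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

The point is less the specific method than the habit: report an interval or some other measure of uncertainty along with the point estimate.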

Some of our work specifically requires Bayesian statistical methods. Some resources:

Writing

It is not enough to conduct research; we need to write up the results and get them published.

Speaking

The bulk of computer science research is published in conferences, which means that we must present it and be able to talk about it effectively if we want people to pay attention to it.

  • For students in PIReT, we try to provide opportunities to practice speaking with the research group.
  • Toastmasters groups can be a useful practice environment.

Learning

While you are in school to learn generally, it’s important to be able to go out and learn a skill or a technology that we don’t specify in the curriculum. This is doubly important when working on research, as the point is to identify new knowledge — if we knew what we were doing, or could just look it up in a book, it would not be scientific research. Many times, I will not know it myself!

Planning and Executing Work

This skill really comprises two skills: high-level project planning, where you determine the scope and desired outcomes of a project, and day-to-day task and work management to actually get your work done.

Planning Projects

It’s really easy to spin your wheels in research. It’s even easier if you don’t have a clear idea of what it is that you are trying to do.

Therefore, it’s important to be able to plan a project (or subproject) to give it a clear direction and to help structure your work. Even in the fairly open-ended world of academic research, it is important to have clear direction.

Early on, your adviser will help a lot with this. But as you progress through your research career, you need to be able to do an increasing amount of project planning yourself. By the time you complete a Ph.D., you should be able to plan out a research project that is at least one paper’s worth of work, and ideally more.

One of the key things to do in planning a project is to determine its intended outcome. If the project is successful, what will you have at the end? It does not matter much what you call this (‘Definition of Done’ from Scrum, a ‘Desired Outcome’, or whatever); the important thing is to define success. The desired outcome may just be ‘we have an evidence-based answer to research question $FOO’. Then you can work on determining how to move towards success.

Once you have a concrete goal, or an idea that you are considering developing into a concrete goal, it is useful to be able to iterate quickly on early ideas to filter out infeasible solutions and identify a promising path forward. Minimum Viable Research is one way of thinking about this.

Day-to-Day Work

There are many different day-to-day work management techniques and tools, and no silver bullets. It is often not productive to attempt to religiously follow a particular methodology, and especially not to spend a great deal of time churning through different tools. No tool is perfect; each will let you down somehow, sometime. And they cannot solve everything.

Also, different roles require different workflows and task management systems. Many systems and books are primarily oriented towards white-collar knowledge and executive workers, particularly those with management responsibilities. They have some overlap with academic work but often need adaptation. A grab bag of techniques may work best for you.

The important thing is to be able to record the things that you need to do, in a reliable medium, and work on them. Many people are very productive with a spiral notebook.

I have written a series of articles about my own approach to planning and managing work, and include there a number of links to other resources.

If you want to read one productivity book, I recommend The One Minute To-Do List.

There are also a million software tools that may or may not help, such as Wunderlist, Todoist, Toodledo, Outlook Tasks, OmniFocus, TaskPaper, Emacs org-mode, and TaskWarrior.

Technical Skills

With these skills, you will develop a deeper understanding of the various technologies we use in our research.

Python Programming

The single most useful programming language for much of the work we do is Python.

The LensKit software that we maintain is written in Python. Even if your research does not directly contribute to LensKit, there is a good chance that you will need to work with it, and we also use Python packages for much of our other work.

We also do a lot of data analysis and general utility programming in Python.

Data Structures and Algorithms

Data structures and algorithms are foundational programming topics that will help you learn better how to structure and reason about computations.

  • CS 321: Data Structures
  • CS 421: Algorithms
  • Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (sometimes called ‘CLRS’) is the canonical textbook on the subject. It is not very good for initially learning the material on your own, unless you have a high degree of mathematical proficiency, but it is an excellent reference book to have on your shelf.
  • Programming Pearls is a good book for seeing algorithmic concepts worked out.
  • Project Euler is a set of exercises that will let you practice designing and implementing mathematical algorithms.
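
As a small taste of the kind of thinking these resources build, here is a sketch based on the first Project Euler problem (summing the multiples of 3 or 5 below a limit). The function names are my own. It shows the same computation done two ways: a direct loop, and a constant-time version using arithmetic series with inclusion-exclusion.

```python
def sum_multiples_brute(limit):
    """Add up every natural number below `limit` divisible by 3 or 5."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def sum_multiples_closed(limit):
    """Same result in constant time via arithmetic series and inclusion-exclusion."""

    def series(k):
        # Sum of k, 2k, 3k, ... strictly below limit.
        count = (limit - 1) // k
        return k * count * (count + 1) // 2

    # Multiples of 15 are counted by both the 3-series and the 5-series.
    return series(3) + series(5) - series(15)


print(sum_multiples_brute(10), sum_multiples_closed(10))  # 3 + 5 + 6 + 9 = 23
print(sum_multiples_brute(1000) == sum_multiples_closed(1000))
```

Writing the obvious version first and then checking a cleverer version against it is a habit that carries over directly to research code.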

Git

We generally use Git to manage our source code; the LensKit project uses Git, and we usually use it for managing our experiment scripts and sometimes our papers. Source control in general is a useful tool, and Git is the dominant tool in version control for open-source software.

Unix Command Line

While we use a variety of operating systems for our local computing environments (I myself use Windows), basic familiarity with the Unix command line is useful. We generally run Linux on the servers where we analyze data and deploy live applications, and other infrastructure we work with, such as the Travis continuous integration server, is built on Unix-like platforms.

Gradle

LensKit is built with Gradle, and LensKit experiments are also run with Gradle. We frequently use it as our go-to automation tool in other environments as well.

  • The Gradle User Guide is the primary resource for understanding Gradle.
  • Groovy in Action is a good book for learning Groovy, the Java-like programming language that is used to control Gradle. Deep knowledge of Groovy is not necessary for the vast majority of our work, but it can be helpful for working on developing certain corners of LensKit’s evaluation support.

A Scripting Language

Many projects involve some kind of auxiliary computation — converting data formats, importing into a database, moving files around, etc. It’s useful to have some utility language to write this kind of code in; you can do it in Java, but another language can often be quicker for writing such code.

There are a number of good candidates:

  • Many people use Python
  • I often use JavaScript with Node.js, particularly for data processing
  • Perl can be a reasonable choice
  • Ruby works too
  • UNIX shell is useful, but often too limited
  • Groovy is great for integrating with Java code

I personally do not have much preference as to which you use; the code should be documented, along with all its dependencies (so that it can be run again). Use what you know, or what the others working on your immediate project are using.
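
For a sense of scale, a typical format-conversion script is often only a couple dozen lines. The following is a minimal sketch in Python that turns CSV rows into JSON lines. The column names (`user`, `item`, `rating`) and the function name are hypothetical, and in-memory buffers stand in for real files so that it runs anywhere.

```python
import csv
import io
import json


def csv_to_json_lines(source, dest):
    """Copy rows from a CSV file object to a JSON-lines file object."""
    count = 0
    for row in csv.DictReader(source):
        # Hypothetical columns: cast the numeric fields explicitly.
        record = {
            "user": int(row["user"]),
            "item": int(row["item"]),
            "rating": float(row["rating"]),
        }
        dest.write(json.dumps(record) + "\n")
        count += 1
    return count


# In-memory stand-ins for real files so the sketch is self-contained.
ratings_csv = io.StringIO("user,item,rating\n1,10,4.5\n2,10,3.0\n")
output = io.StringIO()
n_rows = csv_to_json_lines(ratings_csv, output)
print(n_rows)
print(output.getvalue(), end="")
```

In a real script you would pass files opened with `open(...)` instead of the `io.StringIO` buffers, and note the input format and any dependencies in a comment or README.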

Web Development

Whenever we build a user-facing experiment, it’s usually deployed as a web application. Therefore, web development can be a very valuable skill for our research, depending on the exact project you are working on.

  • HTML & CSS and JavaScript & jQuery, both published by Wiley and available as a two-book package, are good resources for learning the foundations of front-end web technology.

Database Design and Programming

Some projects may require use of an SQL database, such as PostgreSQL.

  • The Practical SQL Handbook is a very accessible book on programming and data modeling for SQL databases. I learned relational databases myself using the 3rd edition of this text.
  • CS 410/510: Databases is a course on data modeling and database programming offered in our department. I sometimes teach it.
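
To illustrate the kind of data modeling and querying involved, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a convenient stand-in here, since it needs no server; the table and column names are hypothetical, and a PostgreSQL project would use a different driver and possibly different column types.

```python
import sqlite3

# An in-memory SQLite database stands in for a real server such as PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    CREATE TABLE rating (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL REFERENCES item (item_id),
        rating  REAL NOT NULL,
        PRIMARY KEY (user_id, item_id)
    );
    """
)
# Made-up rows for illustration; the ? placeholders keep values out of the SQL text.
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "First Item"), (2, "Second Item")])
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?)",
    [(100, 1, 4.0), (101, 1, 5.0), (100, 2, 3.0)],
)

# Join the two tables and aggregate: number of ratings and mean rating per item.
rows = conn.execute(
    """
    SELECT i.title, COUNT(*) AS n_ratings, AVG(r.rating) AS mean_rating
    FROM item i
    JOIN rating r ON r.item_id = i.item_id
    GROUP BY i.item_id, i.title
    ORDER BY i.item_id
    """
).fetchall()
print(rows)
```

Separating items from ratings and joining them at query time is the basic relational pattern that the resources above cover in depth.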

Auxiliary & Recreational Skills

These skills are fun, perhaps useful, but aren’t in the direct path to most of our research outcomes.

Functional Programming

You’ll pick up some functional programming along the way in a lot of other work, because functional concepts have worked their way pretty heavily into modern JavaScript, Python, and even Java. A serious study of functional programming can improve your ability to make use of those language features (sometimes to your collaborators’ chagrin).

  • Learn OCaml, Standard ML, Scheme, or Haskell. These languages work with a heavily functional paradigm; Haskell is purely functional.
  • CS 354 (Programming Languages) and/or CS 531 (Advanced Programming Languages).
  • Implementing Functional Languages by Jones & Lester is fantastic; it predates a lot of modern functional theory, such as monads, but is still very helpful. While it is about implementing functional languages, the book is a blend of code and prose explaining how to implement a functional language in a functional language. It therefore serves as an extensive worked example of functional programming, by masters, with explanations of what they are doing and why. For that reason, it greatly improved my own functional programming abilities.
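
As a small illustration of the functional concepts that have worked their way into Python, here is a sketch that builds a pipeline out of small pure functions. The `compose` helper and the other names are my own, not part of the standard library.

```python
from functools import reduce


def compose(*functions):
    """Return a function applying `functions` from left to right."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)


# Small pure functions that can be recombined freely.
def squares(numbers):
    return map(lambda n: n * n, numbers)


def evens(numbers):
    return filter(lambda n: n % 2 == 0, numbers)


# Square everything, keep the even results, then add them up.
sum_of_even_squares = compose(squares, evens, sum)
result = sum_of_even_squares(range(1, 6))
print(result)  # squares are 1, 4, 9, 16, 25; the even ones sum to 20
```

In everyday Python a generator expression would usually be the more readable choice, which is part of why heavy use of this style can chagrin your collaborators.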

Extemporaneous Speaking

While some of our oral communication is in prepared form (conference talks, seminars, lectures, and the like), we also need to be able to talk about our work without preparation: to answer questions, pitch our work in the hallway, ask good questions about others’ talks, etc.

  • Groups like Toastmasters provide a venue to practice speaking.

Building Bicycle Wheels

While in grad school, I learned to build bicycle wheels, and I get a great deal of satisfaction out of a well-made wheel that I built with my own hands.

  • Take a class from your local bike shop
  • The Bicycle Wheel (Brandt) has a decent explanation of the building process, and a very good and detailed explanation of the physics of a bicycle wheel.

Tweeting Parodies of Musical Lyrics

This is a very important skill.

The best way I know to obtain it is to read a lot so you have an extensive repertoire of words and sentence structures to make the lyrics work out, and then to practice.

For examples, see Les Bicyclebles.