If we can aggressively simplify for a moment, earning a Ph.D. has three primary components:
- Do research.
- Write it up in a dissertation.
- Convince a committee of faculty that what you’ve done and presented is worthy of a research-based terminal academic degree.
There are some other things in each program, such as courses and qualifiers, but this is the heart of what earns the degree.
But what is that mysterious “dissertation”?
It’s a full report on a body of research sufficient to demonstrate competence as an independent researcher and earn a Ph.D.; that is, an original contribution to knowledge in your field. Matt Might has a good illustrated guide to what it looks like to create new knowledge, and how that relates to earlier academic training.
But that still leaves the question about the dissertation itself: what should such a document contain, and how should it be organized? When you do your dissertation proposal, what are you actually proposing to do?
That’s what I hope to address in this post.
Before we get to the organization of the dissertation document itself, I want to spend just a little time on the scope of a dissertation.
A dissertation should contain the research content of approximately three good papers (full conference or journal papers). There are variations on this (sometimes there are four papers, sometimes two short papers replace a full paper), but that’s the basic idea. Not all papers need to be published: at least one should be, but the publication process is sometimes fickle. The ideal is probably to have one published, another published or accepted, and a third under review (or in preparation for a deadline shortly after defense).
These three papers will often not be the only papers you write in the course of your Ph.D. While additional papers are not needed to earn the degree, they’re usually necessary for a competitive application when pursuing research-oriented jobs.
The 3–4 papers should also be on a theme so you can tell a coherent story of your dissertation work. Three disconnected papers on different topics are hard to sell, and make it difficult for you to make a clear pitch of research directions when you’re on the job market. There isn’t one story for a dissertation, but there are a few general shapes that tend to work well:
- The Hammer: A “hammer” dissertation focuses on a particular toolset and demonstrates its applicability across a range of problems. Each paper applies a similar tool to a different problem, and the overall synthesis draws the resulting knowledge together to show where the tool does and does not work, what’s needed to make it effective, and how to adapt it to a range of settings. You have a hammer, and you show that it can pound ordinary nails, U-shaped nails, and some weird five-pointed thing.
- The Problem: A “problem” dissertation focuses on a particular problem or problem area and does a deep multi-paper exploration. The papers may apply different techniques to the same problem, or they may examine different stages of the problem or different perspectives on it. The key thing here is that the problem, rather than the tools, is the unifying theme: in the analogy, you’re studying the problem of fastening things to wood, and you study it using a hammer and nails, a screwdriver, and glue. Or maybe you study different things to do with wood and similar materials, like pounding, carving, and painting.
- The Hybrid: A hybrid dissertation sits between the problem and the hammer: it focuses primarily on one application or problem for two of the papers, and spends the third showing that the techniques and/or knowledge are also applicable to a related problem. You focus on the problem of fastening things to wooden objects, and then show the hammer can also be used for something else.
There are likely other workable designs as well, but most coherent stories for 3–4 papers will probably fit one of these patterns, more or less.
So you have some papers, and an overall narrative to show how they form a connected and coherent body of work. What does the actual document look like?
There are some variations, but I expect most dissertations I advise to have an outline approximately like this:
Introduction. The first chapter casts your overall vision: it defines your topic and the terms needed to understand it, presents your story, and previews your contributions. In particular, it sets up your organizing theme (either the hammer or the problem you’re solving). By the end of it, the reader should know (1) what you’re trying to do (including your organizing principle), (2) why it matters, and (3) your core contributions. The job of the rest of the dissertation is then to convince them that you actually make the contributions you claim.
Background & Related Work. The second chapter is your primary literature survey. This serves two distinct but related roles[1]: first, it covers the necessary background for a reader who is competent in computer science broadly, but not your specific specialty, to understand the rest of your work. Second, it positions your dissertation work in the broader research space, and in particular relative to other work on your problem and on related or precursor problems.
This is the literature survey for your whole dissertation. Some later chapters may also contain small background and/or related work sections that survey work specifically supporting that chapter’s unique work, but the common elements should usually be factored out into Chapter 2. In some dissertations, you may present most of the background in Chapter 1, so Chapter 2 is just related work, but in my experience there’s still background that’s needed in Chapter 2.
Common Infrastructure (optional). In some dissertations, you’ll have some resource, such as software or a data set, that you use throughout the entire dissertation. If it doesn’t make sense to describe it in Ch. 2, it can be useful to spend Chapter 3 describing this resource in some detail. In some cases, if it is an original resource, it may also be a paper, particularly if your discipline has venues for publishing resources such as the SIGIR Resource Track or the NeurIPS Datasets & Benchmarks track. Many dissertations won’t have a dedicated chapter for this, though.
Research Content. The next 3–4 chapters present your primary research content. Each of your component papers usually becomes a chapter; much of the content can be reused from the paper, but you usually need to make a few changes:
- Rewrite the intro so it flows narratively as a chapter of a book rather than a standalone paper, including discussions of how it relates to the other chapters (particularly the previous chapters).
- Rewrite the background and/or related work so that any ideas shared with the other papers in the dissertation are moved to Chapter 2, and the content chapter only has background that’s specific to that chapter’s methods (or the methods that are introduced for use in later chapters). You don’t want to introduce the same related work in three different chapters.
- Expand the writing to include useful details that you had to drop from the published version for length reasons, further charts and results that shed deeper insight on your findings, etc. In some cases, you may need to re-run or update experiments, particularly if your methods have improved in later portions of the work. What this looks like will differ a lot between dissertations; in my own, Chapter 3 described our research software, which evolved quite a bit from the first published version to what was released as I finished grad school, so I largely rewrote that chapter to describe the software as it was at the end, not the first version.
Conclusion. Your last chapter ties it all together: given the vision outlined in Ch. 1, and the work presented in the research content chapters, what do we know about your topic now that we didn’t know before you started the Ph.D.? What are the next steps to advance knowledge beyond what you’ve accomplished in the dissertation?
Appendices. Some dissertations have appendices; their use varies. I’ve seen them used for additional research content outside the main narrative flow, such as another paper the student wrote. They’re also useful for additional supporting evidence for the research content that would break the flow too much if you included it in the chapter, but you want to make available to readers who wish to check your work more thoroughly. This can include documentation for software you developed, more complete output from statistical models, supplementary charts, etc. If anything is needed to understand one of your research results, however, it should go in the main chapter, not an appendix.
There are some variations on the themes — I didn’t have a 1:1 relationship between papers and chapters in my own dissertation — but for a typical computer science dissertation, this outline will usually work pretty well, and strikes a useful balance (in my opinion) between a pure staple dissertation that does no integration, and a complete rewrite of all the material.
When you’re planning out your dissertation work, particularly around the proposal stage, that’s what you’re planning to write. Make sure you leave plenty of time for the writing — it can take longer than you expect, and while the dissertation doesn’t need to be your best writing, it should be reasonably good and definitely needs to be clear and readable.
[1] Thanks to Sole Pera for frequently reminding me not to blur these roles.