Content
- Are observations and conclusions substantiated by data and/or sound argument?
- Are goals and observations made clear, both for the document and for individual pieces of analysis?
This page provides advice about writing notebooks (Jupyter, Quarto, Rnotebook, etc.) that are effective and easy to read. I originally developed it for my data science course, and provide a revised version here for reference.
Notebooks are communication tools. It is not enough to create a notebook that contains code to compute correct results; you also need to ensure that your notebooks are well-structured documents that communicate your work and findings.
I provide a checklist at the end of this page.

Video

This video from CS 533 @ Boise State University discusses the material here. It is not a replacement for this document, and talks primarily about Python, but is hopefully a useful supplement if you prefer video content (slides are also available).

Standalone Documents

As I discuss in the video, a notebook is first a document: one that incorporates code and visuals to present data with written explanation. This means that you need to provide text that walks your reader through the story of your data analysis: what you are doing, why, how, and what we learn. This does not mean you should describe every little detail (too much detail can actually make the notebook harder to read), but you need to provide the context for what the analyses mean and what we can learn from them.

In the context of a course assignment, the solution notebook should be readable even without having read the original assignment description: it should stand alone.

The document also needs to include the code outputs, where applicable: the reader should be able to read it without re-running the code, although the code is there so that they can. Without the output, the notebook does not work as a communicative document.

Complete Runs

The final, exported, submitted version of a notebook should contain the outputs from a complete run: make sure that re-running the notebook from top to bottom (e.g. “Restart and Run All” in Jupyter) works, and that the outputs included for submission are the results of this. This makes sure you submit the notebook in a working state, and that the results still match the code as written. Otherwise, we can get situations where e.g. you change a data file, but don’t rerun everything, and therefore some outputs do not reflect the data file loaded in the data load code.

Formatting Notebook Text

In the common notebook tools, the text content (text cells in Jupyter, document content in Quarto or Rmd) of a notebook is formatted with Markdown. LaTeX math markup, delimited with `$` characters, is also widely supported. For example, this code:

```
The equation $a^n + b^n = c^n$ does not hold for integers $n>2$.
```

will yield: The equation $a^n + b^n = c^n$ does not hold for integers $n>2$.

Notebooks also support block-mode math, delimited with `$$` or `\[`.

Use Markdown formatting judiciously to highlight important points and make your notebooks easier to read. For example, using code formatting to mark up Python or R functions is often helpful:

```
The `train_test_split` function from SciKit Learn will helpfully
partition our data for us.
```

This renders as something like this, except using the notebook tool’s theme instead of my website’s:

The `train_test_split` function from SciKit Learn will helpfully partition our data for us.

Markdown also supports **strong emphasis** (bold) and *emphasis* (italics), which can be very useful for highlighting your argument. In addition to the common Markdown syntax, Jupyter and Quarto both support various extensions, such as ~~strikethrough~~.

One thing that is important to pay attention to is the use of section headings, as discussed in the video. Section headings are indicated with `#` characters, as in:

```
# Document Title
## Level 2 heading
```

Section headings are a crucial tool for structuring your document and making it easier to read. It’s important to note, however, that these have actual meaning: `##` does not mean “large bold font”, it means “level 2 heading”. Properly structuring headings makes your document easier to read (see above: a notebook is a document), and also enables tooling that supports navigating the document. JupyterLab and extensions to the notebook server both provide notebook outlines using the section headings, as do RStudio and Visual Studio Code. Section headings should also be short.

Finally, it is often helpful to use lists, either numbered or bulleted.

For further reference on Markdown features and syntax, see your notebook tool’s Markdown documentation. Rmd and Rnotebook mostly use CommonMark and GFM syntax.

Process

I recommend leaving time before something is due to go back through the notebook and clean it up for final presentation. Sometimes it works best to start with the notebook you have and delete unnecessary code, remove excess debugging outputs, and improve the writing. Sometimes it works best to start a fresh notebook, start putting together the structure, and copy over the code you actually need for the final solution. Either way, you should produce a final notebook that is cleaned up for presentation and, in particular, limited to the outputs that advance your story.

This last point is to avoid the “sea of charts” effect. If there are a lot of charts and tables that don’t advance your story, the notebook is much harder to read. Not every output you created in the process of figuring out how to solve the problem will be useful to your reader. Additional debugging or deep-dive outputs can be moved to a separate file (that should also be executable!) and linked as an appendix to your main report.

Exporting Notebooks

Once you have your notebook ready and complete, you usually want to export it to a standalone file so that you can share it, submit it as an assignment solution, etc. without requiring the reader to open it in the notebook server. For Quarto, Rmd, or Rnotebook, an export is the only way to provide a file that includes the outputs, since they are not saved in the source notebook file.

HTML Export

All of these tools support HTML export, and it is often the easiest to produce. From Jupyter, you can choose “File” → “Save and Export Notebook As…” → “HTML”, and it will create a single HTML file that contains the text, code, and all outputs. When writing Quarto or Rmd/Rnotebook in RStudio, you want to “Knit” the file to HTML. Make sure it is set to produce a self-contained HTML file; this is the default in recent installations of RStudio. If you are using Quarto from the command line, self-contained files are not the default, but you can configure Quarto to produce them (see the Quarto docs for details on this).

Course management systems typically don’t allow students to upload HTML files, so you will usually need to compress your HTML file into a Zip file and upload that. This can work with non-standalone HTML files too, so long as the Zip file also contains the images and other supporting files.

PDF Export

PDF files are a little trickier to create well, but they have a few benefits.

Jupyter, Quarto, and Rmd/Rnotebook can all export PDF files. Their default PDF exports require a working LaTeX installation; Quarto provides a command-line option to install a minimal one that’s enough to build its output.

If you don’t have LaTeX installed, it can work well to produce PDFs from HTML. Jupyter has built-in support for this; it just requires a couple of installs:

```
pip install playwright
playwright install chromium
```

Quarto also theoretically supports HTML-based PDF workflows, but I haven’t figured out how to get those working yet.

You can also create a PDF from any HTML file a few different ways:

- Open it in your browser and print it to a PDF file.
- Use WeasyPrint:

  ```
  conda install weasyprint
  weasyprint file.html file.pdf
  ```

  You can also install weasyprint with `pip`, but that requires you to also make sure you have the appropriate Cairo development libraries installed, which is especially hard on Windows; Conda automates that process. On macOS, Homebrew is also a good way to install weasyprint.
- Use wkhtmltopdf or another HTML-to-PDF tool.

Checklist

This checklist is to help you ensure your notebook is well-structured and well-written. I may expand or revise it as we progress through the semester. The Data Visualization Checklist is useful, if opinionated.
Structure
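Part of the heading advice from “Formatting Notebook Text” can be checked mechanically. The sketch below is one possible check, not something from the original checklist: it reads the Markdown cells of a parsed Jupyter `.ipynb` file (which is JSON) and reports headings that jump more than one level deeper than the heading before them, such as a `###` directly under a `#`. The function names `heading_levels` and `skipped_levels` are made up for this illustration, and it assumes headings are written with `#` characters at the start of a line.

```python
import re

# One to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_levels(notebook):
    """Return (level, title) pairs for headings in a parsed .ipynb dict."""
    found = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores cell source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        in_fence = False
        for line in source.splitlines():
            if line.lstrip().startswith("```"):
                in_fence = not in_fence  # ignore '#' lines inside code fences
                continue
            match = None if in_fence else HEADING.match(line)
            if match:
                found.append((len(match.group(1)), match.group(2)))
    return found


def skipped_levels(headings):
    """Return titles of headings that jump more than one level deeper.

    A notebook whose first heading is not level 1 is also reported.
    """
    problems = []
    previous = 0
    for level, title in headings:
        if level > previous + 1:
            problems.append(title)
        previous = level
    return problems


# A small in-memory notebook; with a real file, use json.load() instead.
example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Document Title\n\nIntro text."},
        {"cell_type": "code", "source": "x = 1  # not a heading"},
        {"cell_type": "markdown", "source": ["### Too deep\n", "More text."]},
    ]
}
print(skipped_levels(heading_levels(example)))  # ['Too deep']
```

A check like this only looks at heading levels; whether the headings are short and meaningful still needs a human read.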
Writing and Output
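The “Complete Runs” advice can be spot-checked before submitting a Jupyter notebook. This is a minimal sketch, assuming the standard `.ipynb` JSON layout (a `cells` list whose code cells carry an `execution_count`): after “Restart and Run All”, the non-empty code cells should be numbered 1, 2, 3, … in document order. The function names `looks_like_complete_run` and `check_file` are hypothetical. Passing this check does not prove the outputs are current; it only catches cells that were skipped or run out of order.

```python
import json


def looks_like_complete_run(notebook):
    """True if non-empty code cells were executed in order, starting at 1."""
    expected = 1
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # assumption: empty cells are left unnumbered
        if cell.get("execution_count") != expected:
            return False
        expected += 1
    return True


def check_file(path):
    """Load an .ipynb file from disk and apply the check."""
    with open(path, encoding="utf-8") as f:
        return looks_like_complete_run(json.load(f))


# Two in-memory examples: one clean run, one with stale execution state.
fresh = {"cells": [
    {"cell_type": "code", "source": "import pandas as pd", "execution_count": 1},
    {"cell_type": "markdown", "source": "## Load data"},
    {"cell_type": "code", "source": "df = pd.read_csv('data.csv')", "execution_count": 2},
]}
stale = {"cells": [
    {"cell_type": "code", "source": "import pandas as pd", "execution_count": 7},
    {"cell_type": "code", "source": "df.head()", "execution_count": None},
]}
print(looks_like_complete_run(fresh))  # True
print(looks_like_complete_run(stale))  # False
```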
Graphics
Content
Post-Export
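For the upload step described under HTML Export, the Zip file can be built with Python’s standard `zipfile` module. This is a sketch under assumptions: the function name `zip_export` and the file name in the usage comment are placeholders, and the optional `extras` argument is for the images or other files that a non-standalone HTML export depends on (they must live in or below the HTML file’s folder).

```python
import zipfile
from pathlib import Path


def zip_export(html_path, extras=(), zip_path=None):
    """Compress an exported HTML file, plus any supporting files, into a Zip file.

    Returns the path of the Zip file that was written.
    """
    html_path = Path(html_path)
    zip_path = Path(zip_path) if zip_path else html_path.with_suffix(".zip")
    base = html_path.parent
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(html_path, arcname=html_path.name)
        for extra in extras:
            extra = Path(extra)
            # Store paths relative to the HTML file so its links still resolve.
            zf.write(extra, arcname=extra.relative_to(base).as_posix())
    return zip_path


# Usage (placeholder file name):
# zip_export("assignment1.html")
```

It is worth opening the resulting Zip file once to confirm the HTML still displays its images and outputs when extracted elsewhere.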