Saving Plots

I use Jupyter notebooks extensively for data analysis and exploration. It’s fantastic to be able to quickly see output, including plots, and have it all persisted and viewable on GitHub.

However, when it comes time to prepare for publication, I need to save high-resolution and/or vector versions of the plots for use in LaTeX or Word. The display in Jupyter does not have nearly high enough resolution to copy and paste into a document and have it look acceptably good.

Most of my projects, therefore, have a convenience function for plots that are going into the paper. This function saves the plot to disk (in both PDF and 600dpi PNG formats) and returns it so it can also be displayed in Jupyter. That way I don’t have two copies of the plot code — one for saving and one for interactive exploration — that can get out of sync.

Python Code

The make_plot function takes care of four things:

  1. Chaining the ggplot calls together (since the syntax is slightly less friendly in Python)
  2. Applying the theme I’m using for the notebook, along with additional theme options
  3. Saving the plots to both PDF and high-DPI PNG
  4. Returning the plot for notebook drawing

This function is built for plotnine, a Grammar of Graphics plotting library for Python that I currently use for most of my statistical visualization. It should be possible to write a similar function for raw Matplotlib, or for Plotly, but I have not yet done so.

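make_plot uses a global variable _fig_dir to decide where to put the figures. It needs to be a pathlib.Path, because the code calls with_suffix on the result of _fig_dir / file. The extra keyword arguments (kwargs) are passed directly to another theme call, to make per-figure theme customizations easy.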
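A minimal sketch of how the figure directory could be set up, and how the two output paths are derived from a base name. The helper names init_fig_dir and figure_paths and the default directory name figures are assumptions for illustration; they are not part of the original project.

```python
from pathlib import Path


def init_fig_dir(path='figures'):
    """Create the figure directory if it is missing and return it as a Path.

    The default name 'figures' is an assumption; use whatever the project uses.
    """
    fig_dir = Path(path)
    fig_dir.mkdir(parents=True, exist_ok=True)
    return fig_dir


def figure_paths(fig_dir, file):
    """Return the (PDF, PNG) paths that a base file name maps to."""
    outf = Path(fig_dir) / file
    # with_suffix replaces any existing suffix, so 'plot.v2' becomes 'plot.pdf'
    return outf.with_suffix('.pdf'), outf.with_suffix('.png')


# In a notebook this would typically be run once near the top:
#   _fig_dir = init_fig_dir()
```

Because with_suffix replaces whatever follows the last dot, a base name such as plot.v2 would lose its .v2 part. That is the situation a suffix warning guards against.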

Code:

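import warnings

import plotnine as pn

# _fig_dir (a pathlib.Path) and theme_paper (shown below) must already be
# defined when this function is defined and called.
def make_plot(data, aes, *args, file=None, height=5, width=7, theme=theme_paper(), **kwargs):
    # Chain the layers, scales, and labels onto the base plot
    plt = pn.ggplot(data, aes)
    for a in args:
        plt = plt + a
    # Apply the shared theme, then any per-figure theme overrides
    plt = plt + theme + pn.theme(**kwargs)
    if file is not None:
        outf = _fig_dir / file
        if outf.suffix:
            warnings.warn('file has suffix, ignoring')
        # Save a vector PDF and a 600 dpi PNG with the same base name
        plt.save(outf.with_suffix('.pdf'), height=height, width=width)
        plt.save(outf.with_suffix('.png'), height=height, width=width, dpi=600)
    return plt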

This can be used like this:

make_plot(data, pn.aes('DataSet', 'value', fill='gender'),
          pn.geom_bar(stat='identity'),
          pn.scale_fill_brewer('qual', 'Dark2'),
          pn.labs(x='Data Set', y='% of Books', fill='Gender'),
          pn.scale_y_continuous(labels=lbl_pct),
          file='frac-known-books', width=4, height=2.5)

The width and height are in inches.
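The lbl_pct helper passed to scale_y_continuous is not shown in this post. A hypothetical version might look like the sketch below, assuming the plotted values are fractions between 0 and 1 and that the labels argument accepts a callable that takes the sequence of axis breaks and returns one string per break. If the values are already percentages, the multiplication by 100 would be dropped.

```python
def lbl_pct(breaks):
    """Format axis breaks given as fractions (e.g. 0.25) as percent labels ('25%').

    Hypothetical stand-in for the helper used in the example; the original
    implementation is not shown.
    """
    return [f'{b * 100:.0f}%' for b in breaks]
```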

And here’s theme_paper, a custom theme that extends theme_minimal with some text cleanups:

class theme_paper(pn.theme_minimal):
    def __init__(self):
        pn.theme_minimal.__init__(self, base_family='Open Sans')
        self.add_theme(pn.theme(
            axis_title=pn.element_text(size=10),
            axis_title_y=pn.element_text(margin={'r': 12}),
            panel_border=pn.element_rect(color='gainsboro', size=1, fill=None)
        ), inplace=True)

I use these functions in the book author gender code.
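For raw Matplotlib, mentioned earlier as a possible target, a similar helper could be sketched as follows. This is an illustration under assumptions, not code from any of the projects: the name save_figure is hypothetical, the figure directory is passed in explicitly instead of read from a global, and the function relies only on the set_size_inches and savefig methods of a Matplotlib Figure.

```python
import warnings
from pathlib import Path


def save_figure(fig, file=None, fig_dir='figures', width=7, height=5, dpi=600):
    """Size a Matplotlib figure, save it as PDF and PNG, and return it.

    Hypothetical helper: `fig` is expected to be a matplotlib.figure.Figure,
    and `fig_dir` must already exist. Width and height are in inches.
    """
    fig.set_size_inches(width, height)
    if file is not None:
        outf = Path(fig_dir) / file
        if outf.suffix:
            warnings.warn('file has suffix, ignoring')
        # Vector copy for LaTeX, high-DPI raster copy for Word
        fig.savefig(outf.with_suffix('.pdf'))
        fig.savefig(outf.with_suffix('.png'), dpi=dpi)
    # Returning the figure lets a notebook display it as the cell result
    return fig


# Example (requires Matplotlib and an existing 'figures' directory):
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots()
#   ax.bar(['a', 'b'], [1, 2])
#   save_figure(fig, 'example', width=4, height=2.5)
```

Unlike the plotnine version, this sketch does not chain plot components or apply a theme; it covers only the save-and-return part.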

R Code

I also have an R version from some older projects, before I switched to Python. This one requires you to use + yourself; it doesn’t chain the ggplot calls for you.

make_plot = function(plot, file=NA, width=5, height=3, ...) {
    if (!is.na(file)) {
        png(paste(file, "png", sep="."), width=width, height=height, units='in', res=600, ...)
        print(plot)
        dev.off()
        cairo_pdf(paste(file, "pdf", sep="."), width=width, height=height, ...)
        print(plot)
        dev.off()
    }
    plot
}

You can use it like this:

make_plot(ggplot(frame, aes(x=DataSet, y=value, fill=gender))
    + geom_bar(stat='identity')
    + scale_fill_brewer(type='qual', palette='Dark2')
    + labs(x='Data Set', y='% of Books', fill='Gender')
    + scale_y_continuous(labels=lbl_pct),
    file="frac-known-books", width=4, height=2.5)

I also don’t have automatic theming in the R version, but it would be easy to add.