Jupyter Notebooks Style Guide

Updated by Brock Schmutzler

This article presents a consistent style guide for Jupyter Notebooks, to be applied to all eCornell courses that require notebooks. Applying consistent practices to notebook materials earlier in a course's development creates a consistent user experience across eCornell courses and saves course developers time downstream. The guide has two parts: the first explains the appropriate structure of the notebooks themselves, and the second describes the Markdown tools used to achieve it. You can see this style guide applied in this example.ipynb notebook from one of the existing eCornell courses.

You can share this notebook style guide with faculty partners. To ensure a consistent coding style, refer to Using Code Formatters.

Notebook file names should NOT contain spaces (use underscores instead), should NOT be overly long (use fewer than 70 characters), and should NOT deviate much from the name of the Codio unit or Canvas page. For example, Project_One_Name_of_the_Project.ipynb (or Name_of_Project.ipynb) would be a good notebook name in a Codio unit called Project One — Name of the Project (or Name of Project) that is embedded on a Canvas page titled Course Project, Part One — Longer Name of the Project.
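
The naming rules above can be checked automatically. The sketch below is only an illustration: the helper name `check_notebook_name` is hypothetical, and it assumes the 70-character limit counts the `.ipynb` extension, which the guide does not specify.

```python
import re

# Assumption: the limit applies to the whole file name, extension included.
MAX_NAME_LENGTH = 70


def check_notebook_name(filename):
    """Return a list of style problems found in a notebook file name."""
    problems = []
    if not filename.endswith(".ipynb"):
        problems.append("missing .ipynb extension")
    if re.search(r"\s", filename):
        problems.append("contains whitespace (use underscores instead)")
    if len(filename) >= MAX_NAME_LENGTH:
        problems.append(f"too long ({len(filename)} characters)")
    return problems


# An empty list means the name passed every check.
print(check_notebook_name("Project_One_Name_of_the_Project.ipynb"))
print(check_notebook_name("Project One Name of the Project.ipynb"))
```

The check does not cover the third rule (staying close to the Codio unit or Canvas page name), which still needs a human eye.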

Part 1. Notebook Narrative Structure

Jupyter Notebooks are more than a way to make scripts more legible. They are a storytelling device for data and workflows and should be treated as such. With that in mind, eCornell course developers working with Jupyter Notebooks (hereafter simply "notebooks") should encourage faculty to adhere to this style guide.

Cell Types

Jupyter uses two main kinds of cells: Code and Markdown. Code cells contain code that can be executed by the notebook's language kernel, and Markdown cells use Markdown-formatted text to convey narrative and explanation. In this HelpDocs article, the plain text is the equivalent of a Markdown cell in Jupyter, which can have various kinds of text formatting, such as bold, italics, fixed-width, lists, and tables. The code blocks are the equivalent of notebook Code cells, with the additional feature that Code cells are executable within the notebook and may produce output.

With that in mind, Code cells shouldn't contain large comment blocks, because most (if not all) of the commentary and narrative should be written in Markdown cells above the Code cells. Let's look at examples to illustrate the point.

Bad Example 1: Instructions or Exposition as Code Comments

# first we create a list of numbers, 0-49
# then we take the cumulative sum of them and divide by the sum
# this will give you the (some fake coefficient) of the numerical vector
# keep in mind there are more efficient ways of doing this calculation

import numpy as np

a = range(50)
cs = np.cumsum(a)
coeff = cs / sum(a)

This Code cell does not conform to eCornell style, which stipulates that instructions and exposition be provided before a Code cell. Instructions and annotations should be treated as being as meaningful as the code they provide context for. This Code cell should instead be rewritten as a Markdown cell followed by a Code cell. For example, the Markdown syntax below

Good example of narrative Markdown cell (Markdown syntax shown) followed by Code cell

is rendered like this:

Good example of narrative Markdown cell (HTML rendering shown) followed by Code cell

As you can see in the sample above, the Code cell contains only code, and the narrative is restricted to the Markdown cell above it. This reduces complex ideas to digestible pieces. In addition, Code cells become more amenable to copy-pasting elsewhere.
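
As a rough sketch of the same split in a notebook file, the snippet below builds a two-cell notebook as plain JSON: a Markdown cell carrying the narrative, followed by a Code cell with no comments. The wording of the Markdown cell is illustrative and not taken from the original example. The cell fields follow the version 4 notebook format.

```python
import json

# Narrative goes in a Markdown cell (illustrative wording).
markdown_cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
        "First, create the integers 0 through 49. ",
        "Then divide their cumulative sum by their total ",
        "to get the coefficient at each position.",
    ],
}

# The Code cell that follows contains code only.
code_cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {},
    "outputs": [],
    "source": [
        "import numpy as np\n",
        "\n",
        "a = range(50)\n",
        "cs = np.cumsum(a)\n",
        "coeff = cs / sum(a)",
    ],
}

notebook = {
    "cells": [markdown_cell, code_cell],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the text that would be saved in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

Writing `notebook_json` to a file with an `.ipynb` extension should give a notebook that opens with the narrative rendered above the code.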

Bad Example 2: Intermittent Comments

Another (albeit less egregious) practice to avoid is intermittent comments explaining each step, as in the cell below. Line numbers can be turned on in Jupyter, so even this information could be written in a Markdown cell above, with references to line numbers. If this is a student's first exposure to a particular workflow, the individual steps merit their own cells. If it comes later in a course module, after students have become accustomed to the earlier parts of the workflow, inline annotations like those in the cell below are permissible for brevity.

# import the numpy package
import numpy as np

# generate the sequential integer vector
a = range(50)
# calculate the cumulative sum
cs = np.cumsum(a)
# divide by sum to get the coefficient
coeff = cs / sum(a)

We can instead rewrite this as separate Markdown and Code cells:

Good example of narrative Markdown cell (HTML rendering shown) whose steps are annotated with line numbers from the subsequent Code cell

Other Considerations

Types of Code Comments

If using in-code comments as annotations, put code comments above the line of code they are annotating, not inline after it:

Avoid:

a = range(50) # generate the sequential integer vector
cs = np.cumsum(a) # calculate the cumulative sum
coeff = cs / sum(a) # divide by sum to get the coefficient

Do this instead:

# generate the sequential integer vector
a = range(50)

# calculate the cumulative sum
cs = np.cumsum(a)

# divide by sum to get the coefficient
coeff = cs / sum(a)

Avoid Consecutive Code Cells

When a notebook has consecutive Code cells, there is often little to no narrative context for the Code cells after the first one. Unless a Code cell's purpose is totally clear from previous work in a notebook, all Code cells in the notebook should be accompanied by narrative context.
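
One way to spot this pattern during review is to scan the notebook file, which is JSON with a top-level `cells` list whose entries carry a `cell_type`. The function names below are hypothetical, and the sketch only flags candidates; whether a flagged cell really lacks context remains a judgment call.

```python
import json


def load_notebook(path):
    """Read an .ipynb file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def consecutive_code_cells(notebook):
    """Return indices of Code cells that directly follow another Code cell."""
    flagged = []
    previous_type = None
    for index, cell in enumerate(notebook.get("cells", [])):
        cell_type = cell.get("cell_type")
        if cell_type == "code" and previous_type == "code":
            flagged.append(index)
        previous_type = cell_type
    return flagged


# A tiny in-memory notebook used only for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["Intro"]},
        {"cell_type": "code", "source": ["a = range(50)"]},
        {"cell_type": "code", "source": ["total = sum(a)"]},
    ]
}
print(consecutive_code_cells(example))  # [2]
```

For a real file, pass the result of `load_notebook(path)` to `consecutive_code_cells`.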

Notebooks Are Standalone

Each notebook is a standalone assignment, practice, demo, etc. A course using notebooks should not reuse large monolithic notebooks. Students should not be instructed to scroll down to "Section X" and run those cells, or to rerun all cells before that starting point. Notebooks start at the top and finish at the bottom; there should not be any parachuting into the middle. If assignments or practices rely on the results of previous assignments that have time-consuming processing steps, then save and import the working environment across assignments.

Saving Environments

R allows users to save their workspace as .RData files, which are binary serializations of all the variables in the active workspace. These .RData files can be loaded at the top of a notebook or assignment to import objects from previous work.

Python can serialize objects using the pickle module, which works much like .RData serialization. However, you may instead want to save the entire kernel state and import it between assignments. To accomplish this, use the dill package.
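
A minimal sketch of the pickle approach is shown below. The `results` dictionary and the file name are placeholders, not part of any eCornell course; in practice the file would live somewhere both assignments can reach.

```python
import os
import pickle
import tempfile

# Stand-in for the result of a slow processing step in an earlier assignment.
results = {"coeff": [0.0, 0.25, 0.75, 1.0], "n_rows": 4}

# Hypothetical file location, chosen here only so the sketch runs anywhere.
path = os.path.join(tempfile.gettempdir(), "assignment_1_results.pkl")

# At the end of the earlier notebook: save the objects.
with open(path, "wb") as f:
    pickle.dump(results, f)

# At the top of the later notebook: load them back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == results)  # True
```

Pickle files can execute code when loaded, so only load files that come from a trusted source such as the course's own materials. For whole-session saves, dill provides session-level helpers; their names have changed between releases, so check the documentation for the installed version.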

Embrace Headers

Use headers liberally, since notebooks have an integrated table of contents that allows students to jump between sections of a notebook. See below for header formatting in Markdown.

Prioritize Markdown Over HTML

Notebooks have excellent integration with the Markdown language. Use Markdown syntax in all cases where it applies, and supplement Markdown with HTML if you need extra formatting it doesn't provide.

Load Packages Separately

In Julia, Python, and R (for example), the idiomatic best practice is to import packages at the top of a script or notebook, separate from other code. In a notebook, these import statements should live in their own Code cell near the top of the document, after the preamble or introduction (when it is possible and reasonable to do so). Packages should not be imported mid-document. If necessary, annotate the import calls and explain to students what functionality each package provides. This annotation may be useful the first few times a package is used, while students are becoming familiar with it, but it is not necessary afterwards.
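
For notebooks with a Python kernel, mid-document imports can be flagged with a short script. This is a sketch under two assumptions: the first Code cell is the dedicated import cell, and cells that are not plain Python (for example, ones containing IPython magics) can be skipped. The function name `cells_with_late_imports` is hypothetical.

```python
import ast


def cells_with_late_imports(notebook):
    """Return indices of Code cells, after the first one, that import packages."""
    code_indices = [
        i for i, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code"
    ]
    late = []
    for i in code_indices[1:]:
        # "source" may be a string or a list of strings; join handles both.
        source = "".join(notebook["cells"][i].get("source", []))
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Not plain Python (e.g. IPython magics), so skip this cell.
            continue
        if any(isinstance(node, (ast.Import, ast.ImportFrom)) for node in ast.walk(tree)):
            late.append(i)
    return late


# A tiny in-memory notebook used only for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Introduction"]},
        {"cell_type": "code", "source": ["import math\n", "import json"]},
        {"cell_type": "markdown", "source": ["Compute a value."]},
        {"cell_type": "code", "source": ["import os\n", "x = math.sqrt(2)"]},
    ]
}
print(cells_with_late_imports(example))  # [3]
```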

Enforce Fixed-Width Formatting

When referencing functions, methods, files, URLs, or specific strings, use fixed-width formatting (backticks in Markdown). Any item with programmatic relevance must be fixed-width formatted.

Part 2. Leveraging Markdown

Markdown is a pared-down syntax that uses a handful of formatting rules to mark up text for clean HTML rendering. Markdown is preferred over HTML in notebooks, so use HTML only if you require formatting that isn't possible in Markdown. The basics of the official Markdown cheat sheet are covered in the subsections below.

Headers

Just like the HTML <h1> through <h4> tags, Markdown uses pound signs # through #### to assign header levels, where the number of consecutive pound signs indicates the header level. For example, the HTML syntax <h2>Header Level 2</h2> is equivalent to ## Header Level 2 in Markdown syntax.
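
To make the level rule concrete, the sketch below reads Markdown text and reports each header's level by counting its leading pound signs. The function name `outline` is hypothetical. The pattern accepts up to six pound signs, the most Markdown supports, and as a simplification it does not skip `#` lines inside fenced code blocks.

```python
import re

# One to six pound signs, at least one space, then the header text.
HEADER_PATTERN = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def outline(markdown_text):
    """Return (level, title) pairs for each header line in Markdown text."""
    headers = []
    for line in markdown_text.splitlines():
        match = HEADER_PATTERN.match(line)
        if match:
            headers.append((len(match.group(1)), match.group(2)))
    return headers


text = "# Part 1\nSome narrative.\n## Cell Types\n### Code\n"
print(outline(text))  # [(1, 'Part 1'), (2, 'Cell Types'), (3, 'Code')]
```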

Lists

Lists can be ordered or unordered.

Unordered

Use hyphens - to create unordered lists. For example, the syntax

- Item 1
- Item 2
- Item 3

is rendered as

  • Item 1
  • Item 2
  • Item 3

Ordered

Ordered (numbered) lists are automatically numbered. For example, the syntax

1. Item 1
2. Item 2
3. Item 3

will render as:

  1. Item 1
  2. Item 2
  3. Item 3

A great feature is that you can mess up the numbering and Markdown will still render the numbers sequentially. For example, the syntax

1. Item 1
3. Item 2
2. Item 3

will still render as:

  1. Item 1
  2. Item 2
  3. Item 3

Formatting

Markdown gives you access to italics (flanking asterisk *italics* or underscore _italics_), bold (flanking double-asterisk **bold** or double-underscore __bold__), and fixed-width (flanking backticks `fixed-width`) formatting. For example,

**This is** a *sentence* trying to prove a point about the function `range(1, 50)`.

is rendered as:

A plain-text Markdown sentence with bold, italics, and fixed-width formatting

Code blocks

To create code blocks, use triple backticks. You can also specify the language the Markdown renderer should use for syntax highlighting.

```python
a = range(1, 50)

# now print the shifted results
for i in a:
    print(i + 3.3)
```

is rendered as:

a = range(1, 50)

# now print the shifted results
for i in a:
    print(i + 3.3)

Tables

Markdown has a dirt-simple syntax for nice tables: you just separate columns with pipes |. The first row holds the column titles, and the second row sets each column's alignment (left, center, or right) according to where the colon or colons sit relative to the dashes. Best of all, the source text doesn't need to look nicely aligned or formatted.

| left-aligned column | center column | right-aligned column |
| :--- | :---: | ---: |
| some kind of text | potatoes and carrots | a longer description here that can drag on for a bit |
| new text | cranberries | the quick brown fox jumps over the lazy dog |

which renders as:

A table rendered with Markdown syntax
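
When table contents come from data, the same syntax can be generated rather than typed. The helper below is a hypothetical sketch, not part of Jupyter or any library: it maps each alignment name to its colon-and-dash marker and joins the cells with pipes.

```python
# Alignment markers, as described above: the colon position sets the alignment.
ALIGNMENT_MARKERS = {"left": ":---", "center": ":---:", "right": "---:"}


def markdown_table(headers, alignments, rows):
    """Build a Markdown table string from headers, alignments, and rows."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join(ALIGNMENT_MARKERS[a] for a in alignments) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


table = markdown_table(
    ["left-aligned column", "center column", "right-aligned column"],
    ["left", "center", "right"],
    [["new text", "cranberries", "the quick brown fox"]],
)
print(table)
```

The printed text can be pasted into a Markdown cell as is.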

Math

Use flanking dollar signs ($) to create LaTeX math formatting. For example,

You will need to adjust $x$ by controlling for sample size $n$ using the equation $x\times(1-\frac{n}{15})$.

will render as:

A plain-text Markdown sentence with LaTeX math formatting

You can use double-dollar signs ($$) to create a LaTeX math block. For example,

When $a \ne 0$, there are two solutions to a quadratic equation of the form $ax^2 + bx + c = 0$, given by the **quadratic formula**:

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

will render as:

A plain-text Markdown sentence with inline and block LaTeX math formatting

For more help with LaTeX, try this LaTeX Guide.
