Introduction

R notebooks are a convenient and slick way to explore your data and share analyses with collaborators. The clearest advantage is that both the results and the code used are present in the same file and can be shared as an HTML file. This R notebook is intended to highlight the main R notebook features that I find helpful.

You can see the R code used at each step below, but for the purposes of this exercise the full R markdown file will be more useful for understanding how to make a notebook like this. This format (.Rmd rather than a standard .R script) is a markdown format that includes chunks of code (usually, but not always, in R). You can download the accompanying R markdown file here.

For some reason the YAML header of this R notebook may not be visible when downloaded from this link (although it is present in the original file!). The YAML header looks like this in the original file (and is directly at the top of the notebook):
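As a rough illustration only (the original header is not reproduced here), a header that turns on a floating table of contents and hides code by default might look like the lines below. The title is a placeholder, and the choice of the html_notebook output format with the toc, toc_float and code_folding options is an assumption based on the features described in this notebook. The header is written out from R so that the sketch can be run as-is.

```r
# Sketch of a notebook YAML header. The title is a placeholder; toc, toc_float and
# code_folding are options of rmarkdown's html_notebook output format.
yaml_header <- c(
  "---",
  'title: "Example R notebook"',
  "output:",
  "  html_notebook:",
  "    toc: true",
  "    toc_float: true",
  "    code_folding: hide",
  "---"
)

# Write the header to a temporary .Rmd file and print it for inspection.
rmd_path <- file.path(tempdir(), "example_notebook.Rmd")
writeLines(yaml_header, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

In a real notebook these lines would simply be typed at the very top of the .Rmd file.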

Highlight overview

Note the table of contents that floats at the side of the page while scrolling. Settings like this, the particular layout, and whether code should be hidden by default are specified at the beginning of the R markdown file. Another really handy feature of R notebooks is that you can execute specific chunks of code one at a time. This makes them useful for everyday data analysis, when you might be re-writing and troubleshooting code, and not merely as a way to summarize final code.

The other features that will be highlighted are:

  • Using tabs to present related data

  • Printing pretty tables

  • Using other languages besides R

  • Programmatically creating tabs in R notebooks

Example dataset

The examples below use the built-in iris dataset, which is Fisher’s (or Anderson’s) famous dataset of measurements in centimetres of 50 flowers from each of three iris species (150 flowers in total). The measurements are the length and width of the petal and sepal of each flower.
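If you want to confirm the shape of the dataset before going further, a quick check along these lines should do. It uses only base R and the iris data frame that ships with R.

```r
# iris ships with R in the datasets package, so no loading step is needed.
dim(iris)             # 150 rows (flowers) and 5 columns
table(iris$Species)   # 50 flowers for each of the three species
sapply(iris, class)   # four numeric measurement columns plus the Species factor
```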

Highlighted examples

Super basic visualization

Whenever you are presenting results you can have normal text above and below R code. For instance, see a few basic R commands and their output below. You will need to click the button below to show the actual R code.

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
hist(iris$Sepal.Length, col="grey", xlab="Sepal Length", ylab="Frequency", main="", breaks=25, xlim=c(0, 10))

Below this R code you can keep writing, which can be helpful if you’re describing results!

This is what the code actually looks like in the R markdown code:
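The original source is not reproduced here, so the following is only a sketch of how text and a code chunk typically sit together in an R markdown file. The chunk is shown without a label or options, which is an assumption. The three-backtick chunk delimiters are built with strrep() purely so that this example is easy to embed; in a real file you would type them directly.

```r
# Three backticks delimit a chunk in R markdown; they are built here with
# strrep() only to keep this sketch easy to embed.
fence <- strrep("`", 3)

rmd_body <- c(
  "Text written above the chunk.",
  "",
  paste0(fence, "{r}"),
  "summary(iris)",
  'hist(iris$Sepal.Length, col="grey", xlab="Sepal Length", ylab="Frequency", main="", breaks=25, xlim=c(0, 10))',
  fence,
  "",
  "Text written below the chunk."
)

# Print the lines as they would appear in the .Rmd file.
cat(rmd_body, sep = "\n")
```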

Presenting results with different tabs

One of my favourite features is using tabs to visualize related data on the same page. This is super easy to do: just type {.tabset} after a heading name, and all of the sub-headings directly beneath that heading will automatically be converted into tabs. As an example, boxplots of the measurements per iris species are shown in separate tabs for the sepal and petal data. The third tab is the petal data re-plotted with a different figure size, which is often useful to change.

As a bonus, these boxplots also showcase the simple commands for making beeswarm plots (overlaid on boxplots, via the ggbeeswarm package) and multi-panel figures (via the cowplot package).
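A minimal sketch of the heading layout behind a set of tabs is shown below. The heading text "Boxplots", the heading levels, and the fig.width and fig.height values are illustrative assumptions rather than the settings used in this notebook; fig.width and fig.height are standard knitr chunk options (in inches) and are one common way to re-size a figure.

```r
# Three backticks delimit a chunk; built with strrep() only for easy embedding.
fence <- strrep("`", 3)

tab_skeleton <- c(
  "## Boxplots {.tabset}",        # parent heading carrying the .tabset attribute
  "",
  "### Sepal",                    # each sub-heading one level down becomes a tab
  "",
  "Text and chunks for the first tab.",
  "",
  "### Petal",
  "",
  "Text and chunks for the second tab.",
  "",
  "### Petal re-sized",
  "",
  paste0(fence, "{r, fig.width=8, fig.height=3}"),   # illustrative figure size
  "plot_grid(petal_length_boxplot, petal_width_boxplot, labels = c('a', 'b'))",
  fence
)

cat(tab_skeleton, sep = "\n")
```

The tabs end at the next heading of the same or a higher level than the parent heading.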

Sepal

Here is the sepal data. Note that this text is different for each tab!

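library(ggplot2)     # ggplot(), geom_boxplot(), theme()
library(ggbeeswarm)  # geom_beeswarm()
library(cowplot)     # plot_grid()

sepal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Sepal.Length, fill=Species)) +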
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Sepal Length") +
                         xlab("Species") +
                         ylim(c(0, 8)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

sepal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Sepal.Width, fill=Species)) +
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Sepal Width") +
                         xlab("Species") +
                         ylim(c(0, 5)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

plot_grid(sepal_length_boxplot, sepal_width_boxplot, labels = c('a', 'b'))

Petal

Here is the petal data. Note that this text is different for each tab!

petal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Length, fill=Species)) +
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Petal Length") +
                         xlab("Species") +
                         ylim(c(0, 8)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

petal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Width, fill=Species)) +
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Petal Width") +
                         xlab("Species") +
                         ylim(c(0, 3)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

plot_grid(petal_length_boxplot, petal_width_boxplot, labels = c('a', 'b'))

Petal re-sized

Here is the petal data re-plotted at a different size. Figure sizes often need to be changed so that plots display correctly.

petal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Length, fill=Species)) +
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Petal Length") +
                         xlab("Species") +
                         ylim(c(0, 8)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

petal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Width, fill=Species)) +
                         geom_boxplot(outlier.shape = NA) +
                         geom_beeswarm(cex=1, size=1) +
                         scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
                         ylab("Petal Width") +
                         xlab("Species") +
                         ylim(c(0, 3)) +
                         theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
                         legend.position = "none",
                         panel.background = element_rect(fill = 'grey85'))

plot_grid(petal_length_boxplot, petal_width_boxplot, labels = c('a', 'b'))

Tables

Kable table

A really easy and clean way of visualizing tabular data is to output it with kable() from the knitr package, with styling added by the kableExtra package. The tables below are based on the first ten rows of the iris dataset (except where noted). A major advantage of kable tables is that you can alter the colour of cells and the colour and size of cell values. You can see some examples in the tabs below, some of which I came across in this tutorial: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html.

Basic

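library(knitr)       # kable()
library(kableExtra)  # kable_styling(), column_spec(), spec_color(), and the %>% pipe

head(iris, n = 10) %>%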
  kable() %>%
  kable_styling(full_width = FALSE)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa

Table with title and classic format

head(iris, n = 10) %>%
  kable(caption = "Head of iris data") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Head of iris data
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa

Fixed header position (first 100 rows)

head(iris, n = 100) %>%
  kable() %>%
  kable_styling(full_width = FALSE, fixed_thead = TRUE)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor

Cells and text coloured by value

head(iris, n = 10) %>%
  kable() %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(2, color = spec_color(head(iris[, 2], n = 10))) %>%
  column_spec(1, color = "white",
              background = spec_color(head(iris[, 1], n = 10), option = "magma", end = 0.8))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa

DT: Interface to the JavaScript DataTables library

Kable tables are nice, but often it’s more useful to be able to (1) search a table and (2) sort a table by any column. For these purposes the DT R package is more appropriate, as it gives you an easy interface to the DataTables JavaScript library.

library(DT)  # datatable()

datatable(iris,
          rownames = FALSE,
          class = 'cell-border stripe')

Other languages

In addition to R, you can actually add code chunks from several languages (see full list by clicking Insert in RStudio when working on an R markdown file).

Python and Bash are the most common languages that would be used in this context. Bash in particular could enable you to produce a notebook that gives a complete overview of a bioinformatics pipeline, even if that pipeline requires a few standalone tools to be run on the command line.

As a quick example, the bash commands below manipulate a small test table. First, the first 10 rows of the iris dataframe are written from R to a file called iris_head.tsv.

write.table(x = iris[1:10, ], file = "/Users/Gavin/iris_head.tsv", quote = FALSE, row.names = FALSE, col.names = TRUE, sep="\t")

This file can now be parsed with bash commands!

For instance, this is the output of cat:

cat /Users/Gavin/iris_head.tsv
## Sepal.Length Sepal.Width Petal.Length    Petal.Width Species
## 5.1  3.5 1.4 0.2 setosa
## 4.9  3   1.4 0.2 setosa
## 4.7  3.2 1.3 0.2 setosa
## 4.6  3.1 1.5 0.2 setosa
## 5    3.6 1.4 0.2 setosa
## 5.4  3.9 1.7 0.4 setosa
## 4.6  3.4 1.4 0.3 setosa
## 5    3.4 1.5 0.2 setosa
## 4.4  2.9 1.4 0.2 setosa
## 4.9  3.1 1.5 0.1 setosa

And this is the output of wc -l:

wc -l /Users/Gavin/iris_head.tsv
##       11 /Users/Gavin/iris_head.tsv

This is what the above code blocks looked like in the actual R markdown code:

The easiest way to read the Bash output back into R is to first save it to a file and then read it in. With Python this process is much easier fortunately!
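The file-based round trip for Bash output could look roughly like the sketch below. A temporary file stands in for the hard-coded path used above, and the bash step is only indicated in a comment (with a hypothetical command and output filename), since it would live in its own bash chunk.

```r
# Write the first ten rows of iris to a temporary file (tempfile() is used here
# instead of a hard-coded path so that the sketch runs anywhere).
tsv_path <- tempfile(fileext = ".tsv")
write.table(x = iris[1:10, ], file = tsv_path, quote = FALSE,
            row.names = FALSE, col.names = TRUE, sep = "\t")

# A bash chunk could now transform the file and save the result, for example:
#   cut -f 1,5 iris_head.tsv > iris_subset.tsv   (hypothetical command and filename)
# Here the unmodified file stands in for that saved output.

# Back in an R chunk, read the saved file into a data frame.
iris_head_back <- read.table(tsv_path, header = TRUE, sep = "\t",
                             stringsAsFactors = FALSE)
dim(iris_head_back)
```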

The below example shows first how the iris variable can be used in the Python chunk as a pandas dataframe (assuming you have the pandas python package installed). The resulting pandas series can then be used in the subsequent R chunk (where it is converted to a standard numeric vector). Note the syntax of writing r. and py$ before variable names to specify whether they are from the R or Python environments, respectively.

iris_means = r.iris.mean(numeric_only=True)  # numeric_only=True skips the non-numeric Species column
print(iris_means)
## Sepal.Length    5.843333
## Sepal.Width     3.057333
## Petal.Length    3.758000
## Petal.Width     1.199333
## dtype: float64
print(py$iris_means)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333

Just to be clear, this is what the code in the R markdown file looked like for the above outputs. Note that one code block specifies Python while the other specifies R, but the iris_means object can be called by either, albeit with different syntax.

Although these examples of using other languages were extremely simple, hopefully they helped you see how useful R notebooks can be for creating and sharing highly reproducible workflows.

Programmatically create notebook tabs

Often you want to show plots, tables, or other results for many variables in different tabs. However, it can be tedious (and prone to typos) to make many tabs by copying and pasting. To get around this problem you can use the results='asis' option in a code block to specify that all text output should be treated as raw markdown.

For example this code block (and markdown header) produces the tabbed output below.
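The original chunk is not shown here, so the following is a minimal sketch of the idea rather than the exact code used. It assumes the chunk is declared with results='asis' and sits under a heading that carries {.tabset}. The helper tab_heading() and the level-3 headings are illustrative choices.

```r
# Build the markdown heading line for one tab (level-3 headings are assumed).
tab_heading <- function(name, level = 3) {
  paste0(strrep("#", level), " ", name)
}

# The four measurement columns of iris, one tab each.
measurement_cols <- colnames(iris)[1:4]

# In a chunk declared with results='asis', text printed by cat() is treated as
# markdown, so each heading printed here starts a new tab.
for (measure in measurement_cols) {
  cat(tab_heading(measure), "\n\n", sep = "")
  hist(iris[[measure]], main = "", xlab = measure, col = "grey")
  cat("\n\n")
}
```

The blank lines printed around each plot keep the headings from running into the surrounding output. If you swap in ggplot2 plots, wrap each one in print(), since plots are not printed automatically inside a loop.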

Example automatic tabs

Sepal.Length

Sepal.Width

Petal.Length

Petal.Width

Final thoughts

The above quick examples are meant to show some slick features you may not be aware of in R notebooks as well as to show how useful R notebooks are in general. As mentioned at the beginning, probably what readers will find the most useful is being able to see the raw R markdown file, which you can download here.

Session info

The version numbers of all packages in the current environment, as well as information about the R install, are reported below. This is useful information to include so that others can better reproduce your work.

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] reticulate_1.22  knitr_1.34       kableExtra_1.3.4 ggbeeswarm_0.6.0
## [5] ggplot2_3.3.5    DT_0.18          cowplot_1.1.1   
## 
## loaded via a namespace (and not attached):
##  [1] beeswarm_0.3.1    tidyselect_1.1.1  xfun_0.23         bslib_0.2.5.1    
##  [5] purrr_0.3.4       lattice_0.20-44   colorspace_2.0-1  vctrs_0.3.8      
##  [9] generics_0.1.0    htmltools_0.5.1.1 viridisLite_0.4.0 yaml_2.2.1       
## [13] utf8_1.2.1        rlang_0.4.11      jquerylib_0.1.4   pillar_1.6.1     
## [17] glue_1.4.2        withr_2.4.2       lifecycle_1.0.0   stringr_1.4.0    
## [21] munsell_0.5.0     gtable_0.3.0      rvest_1.0.0       htmlwidgets_1.5.3
## [25] evaluate_0.14     labeling_0.4.2    crosstalk_1.1.1   vipor_0.4.5      
## [29] fansi_0.4.2       highr_0.9         Rcpp_1.0.6        scales_1.1.1     
## [33] webshot_0.5.2     jsonlite_1.7.2    farver_2.1.0      systemfonts_1.0.2
## [37] png_0.1-7         digest_0.6.27     stringi_1.6.2     dplyr_1.0.6      
## [41] rprojroot_2.0.2   grid_4.0.5        here_1.0.1        tools_4.0.5      
## [45] magrittr_2.0.1    sass_0.4.0        tibble_3.1.2      crayon_1.4.1     
## [49] pkgconfig_2.0.3   ellipsis_0.3.2    Matrix_1.3-3      xml2_1.3.2       
## [53] rmarkdown_2.8     svglite_2.0.0     httr_1.4.2        rstudioapi_0.13  
## [57] R6_2.5.0          compiler_4.0.5