R notebooks are a convenient and slick way to explore your data and share analyses with collaborators. The clearest advantage is that both the results and the code used are present in the same file and can be shared as an HTML file. This R notebook is intended to highlight the main R notebook features that I find helpful.
You can see the R code used at each step below, but for the purposes of this exercise the full R markdown file will be more useful for understanding how to make a notebook like this. This format (.Rmd rather than a standard .R script) is a markdown format that includes chunks of code (usually, but not always, in R). You can download the accompanying R markdown file here.
For some reason the YAML header of this R notebook may not be visible when downloaded from this link (although it is present in the original file!). The YAML header looks like this in the original file (and is directly at the top of the notebook):
Note the table of contents that floats on the side of the page while scrolling. Settings like this, the particular layout, and whether code should be hidden by default, are specified in the YAML header at the beginning of the R markdown file. Another really handy feature of R notebooks is that you can execute specific chunks of code one at a time. This makes them useful for everyday data analysis, when you might be re-writing and troubleshooting code, and not merely as a way to summarize final code.
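The settings described above live in the YAML header at the top of the .Rmd file. The sketch below is not the header of this notebook. It writes a small, hypothetical example file from R so that the structure is visible. The title and the chosen values are assumptions; toc, toc_float, and code_folding are options of the html_notebook output format in the rmarkdown package.

```r
# Hypothetical YAML header: floating table of contents, code hidden by default.
yaml_header <- c(
  "---",
  "title: \"Example notebook\"",
  "output:",
  "  html_notebook:",
  "    toc: true",
  "    toc_float: true",
  "    code_folding: hide",
  "---"
)

# Write the header plus a little body text to a temporary .Rmd file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(yaml_header, "", "# Introduction", "", "Some text."), rmd_path)

# Print the file so the header can be inspected.
cat(readLines(rmd_path), sep = "\n")

# Rendering needs the rmarkdown package and pandoc, so it is left commented out:
# rmarkdown::render(rmd_path)
```

Changing code_folding from hide to show would display the code by default while keeping the button to collapse it.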
The other features that will be highlighted are:
- Using tabs to present related data
- Printing pretty tables
- Using other languages besides R
- Programmatically creating tabs in R notebooks
The examples below use the built-in iris dataset, which is Fisher’s (or Anderson’s) famous dataset of measurements in centimetres of 50 flowers from each of three iris species (150 flowers in total). The measurements are of the length and width of the petal and sepal for each flower.
Whenever you are presenting results you can have normal text above and below R code. For instance, see a few basic R commands and output below. You will need to click the below button to show the actual R code.
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
hist(iris$Sepal.Length, col="grey", xlab="Sepal Length", ylab="Frequency", main="", breaks=25, xlim=c(0, 10))
Below this R code you can keep writing, which can be helpful if you’re describing results!
This is what the code actually looks like in the R markdown code:
One of my favourite features is to use tabs to visualize related data on the same page. This is super easy to do, because it just requires you to type `{.tabset}` after a heading name and then all sub-headings one level below that heading will automatically be converted into tabs. As an example, boxplots of the measurements per iris species are shown in separate tabs for the sepal and petal data. The third tab is the petal data re-plotted with a different figure size set, which is often useful to change.
Note that as a bonus these boxplots also showcase the simple commands to make beeswarm plots (overlaid on top of boxplots) and multi-panel figures made with the cowplot package.
Here is the sepal data. Note that this text is different for each tab!
# Packages used for the plots: ggplot2 (plotting), ggbeeswarm (geom_beeswarm), cowplot (plot_grid)
library(ggplot2)
library(ggbeeswarm)
library(cowplot)
# Boxplot of sepal length per species with the individual points overlaid
sepal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Sepal.Length, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Sepal Length") +
  xlab("Species") +
  ylim(c(0, 8)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
# Same plot for sepal width
sepal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Sepal.Width, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Sepal Width") +
  xlab("Species") +
  ylim(c(0, 5)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
# Combine both plots into one two-panel figure
plot_grid(sepal_length_boxplot, sepal_width_boxplot, labels = c('a', 'b'))
Here is the petal data. Note that this text is different for each tab!
petal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Length, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Petal Length") +
  xlab("Species") +
  ylim(c(0, 8)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
petal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Width, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Petal Width") +
  xlab("Species") +
  ylim(c(0, 3)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
plot_grid(petal_length_boxplot, petal_width_boxplot, labels = c('a', 'b'))
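The boxplots in these tabs differ only in the column plotted, the axis label, and the upper y-axis limit. As a sketch (not how the notebook itself is written), the repeated code could be wrapped in a small helper. The function name species_boxplot and its arguments are hypothetical; the column is looked up by name with the .data pronoun inside aes().

```r
library(ggplot2)
library(ggbeeswarm)
library(cowplot)

# Hypothetical helper: one boxplot per species with beeswarm points overlaid.
species_boxplot <- function(column, y_label, y_max) {
  ggplot(data = iris, aes(x = Species, y = .data[[column]], fill = Species)) +
    geom_boxplot(outlier.shape = NA) +
    geom_beeswarm(cex = 1, size = 1) +
    scale_fill_manual(values = c("#1b9e77", "#d95f02", "#7570b3")) +
    ylab(y_label) +
    xlab("Species") +
    ylim(c(0, y_max)) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
          legend.position = "none",
          panel.background = element_rect(fill = "grey85"))
}

# Rebuild the two petal plots and combine them into a two-panel figure.
petal_length_plot <- species_boxplot("Petal.Length", "Petal Length", 8)
petal_width_plot <- species_boxplot("Petal.Width", "Petal Width", 3)
petal_grid <- plot_grid(petal_length_plot, petal_width_plot, labels = c("a", "b"))
petal_grid
```

With a helper like this, a typo in one copy of the plotting code cannot make the tabs silently inconsistent.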
Here is the petal data re-plotted at a different size. Figure sizes often need to be changed so that plots are displayed correctly.
petal_length_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Length, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Petal Length") +
  xlab("Species") +
  ylim(c(0, 8)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
petal_width_boxplot <- ggplot(data=iris, aes(x=Species, y=Petal.Width, fill=Species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1, size=1) +
  scale_fill_manual(values=c("#1b9e77", "#d95f02", "#7570b3")) +
  ylab("Petal Width") +
  xlab("Species") +
  ylim(c(0, 3)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none",
        panel.background = element_rect(fill = 'grey85'))
plot_grid(petal_length_boxplot, petal_width_boxplot, labels = c('a', 'b'))
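In R Markdown the figure size is normally set per chunk with the fig.width and fig.height chunk options (in inches) in the chunk header. As a sketch, the same options can also be given document-wide defaults from R with knitr::opts_chunk$set(), typically in a setup chunk near the top of the file. The values 10 and 4 are arbitrary assumptions, not the sizes used for the tab above.

```r
library(knitr)

# Set default figure dimensions (in inches) for all later chunks.
# The specific values are illustrative assumptions.
opts_chunk$set(fig.width = 10, fig.height = 4)

# Confirm the defaults that are now in effect.
opts_chunk$get("fig.width")
opts_chunk$get("fig.height")
```

A per-chunk setting in the chunk header still overrides these defaults, which is handy when only one figure needs a different shape.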
A really easy and clean way of visualizing tabular data is to output it with kable(). The tables below are based on the first rows of the iris dataset (the first ten rows in most tabs, and the first 100 in the fixed-header example). A major advantage of kable tables is that you can alter the colour of cells and the colour and size of cell values. You can see some examples in the below tabs, some of which I came across in this tutorial: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html.
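# knitr provides kable(); kableExtra provides kable_styling() and the %>% pipe
library(knitr)
library(kableExtra)
head(iris, n = 10) %>%
  kable() %>%
  kable_styling(full_width = FALSE)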
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
head(iris, n = 10) %>%
  kable(caption = "Head of iris data") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
head(iris, n = 100) %>%
  kable() %>%
  kable_styling(full_width = FALSE, fixed_thead = TRUE)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
5.4 | 3.7 | 1.5 | 0.2 | setosa |
4.8 | 3.4 | 1.6 | 0.2 | setosa |
4.8 | 3.0 | 1.4 | 0.1 | setosa |
4.3 | 3.0 | 1.1 | 0.1 | setosa |
5.8 | 4.0 | 1.2 | 0.2 | setosa |
5.7 | 4.4 | 1.5 | 0.4 | setosa |
5.4 | 3.9 | 1.3 | 0.4 | setosa |
5.1 | 3.5 | 1.4 | 0.3 | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa |
5.1 | 3.8 | 1.5 | 0.3 | setosa |
5.4 | 3.4 | 1.7 | 0.2 | setosa |
5.1 | 3.7 | 1.5 | 0.4 | setosa |
4.6 | 3.6 | 1.0 | 0.2 | setosa |
5.1 | 3.3 | 1.7 | 0.5 | setosa |
4.8 | 3.4 | 1.9 | 0.2 | setosa |
5.0 | 3.0 | 1.6 | 0.2 | setosa |
5.0 | 3.4 | 1.6 | 0.4 | setosa |
5.2 | 3.5 | 1.5 | 0.2 | setosa |
5.2 | 3.4 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.6 | 0.2 | setosa |
4.8 | 3.1 | 1.6 | 0.2 | setosa |
5.4 | 3.4 | 1.5 | 0.4 | setosa |
5.2 | 4.1 | 1.5 | 0.1 | setosa |
5.5 | 4.2 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.2 | 1.2 | 0.2 | setosa |
5.5 | 3.5 | 1.3 | 0.2 | setosa |
4.9 | 3.6 | 1.4 | 0.1 | setosa |
4.4 | 3.0 | 1.3 | 0.2 | setosa |
5.1 | 3.4 | 1.5 | 0.2 | setosa |
5.0 | 3.5 | 1.3 | 0.3 | setosa |
4.5 | 2.3 | 1.3 | 0.3 | setosa |
4.4 | 3.2 | 1.3 | 0.2 | setosa |
5.0 | 3.5 | 1.6 | 0.6 | setosa |
5.1 | 3.8 | 1.9 | 0.4 | setosa |
4.8 | 3.0 | 1.4 | 0.3 | setosa |
5.1 | 3.8 | 1.6 | 0.2 | setosa |
4.6 | 3.2 | 1.4 | 0.2 | setosa |
5.3 | 3.7 | 1.5 | 0.2 | setosa |
5.0 | 3.3 | 1.4 | 0.2 | setosa |
7.0 | 3.2 | 4.7 | 1.4 | versicolor |
6.4 | 3.2 | 4.5 | 1.5 | versicolor |
6.9 | 3.1 | 4.9 | 1.5 | versicolor |
5.5 | 2.3 | 4.0 | 1.3 | versicolor |
6.5 | 2.8 | 4.6 | 1.5 | versicolor |
5.7 | 2.8 | 4.5 | 1.3 | versicolor |
6.3 | 3.3 | 4.7 | 1.6 | versicolor |
4.9 | 2.4 | 3.3 | 1.0 | versicolor |
6.6 | 2.9 | 4.6 | 1.3 | versicolor |
5.2 | 2.7 | 3.9 | 1.4 | versicolor |
5.0 | 2.0 | 3.5 | 1.0 | versicolor |
5.9 | 3.0 | 4.2 | 1.5 | versicolor |
6.0 | 2.2 | 4.0 | 1.0 | versicolor |
6.1 | 2.9 | 4.7 | 1.4 | versicolor |
5.6 | 2.9 | 3.6 | 1.3 | versicolor |
6.7 | 3.1 | 4.4 | 1.4 | versicolor |
5.6 | 3.0 | 4.5 | 1.5 | versicolor |
5.8 | 2.7 | 4.1 | 1.0 | versicolor |
6.2 | 2.2 | 4.5 | 1.5 | versicolor |
5.6 | 2.5 | 3.9 | 1.1 | versicolor |
5.9 | 3.2 | 4.8 | 1.8 | versicolor |
6.1 | 2.8 | 4.0 | 1.3 | versicolor |
6.3 | 2.5 | 4.9 | 1.5 | versicolor |
6.1 | 2.8 | 4.7 | 1.2 | versicolor |
6.4 | 2.9 | 4.3 | 1.3 | versicolor |
6.6 | 3.0 | 4.4 | 1.4 | versicolor |
6.8 | 2.8 | 4.8 | 1.4 | versicolor |
6.7 | 3.0 | 5.0 | 1.7 | versicolor |
6.0 | 2.9 | 4.5 | 1.5 | versicolor |
5.7 | 2.6 | 3.5 | 1.0 | versicolor |
5.5 | 2.4 | 3.8 | 1.1 | versicolor |
5.5 | 2.4 | 3.7 | 1.0 | versicolor |
5.8 | 2.7 | 3.9 | 1.2 | versicolor |
6.0 | 2.7 | 5.1 | 1.6 | versicolor |
5.4 | 3.0 | 4.5 | 1.5 | versicolor |
6.0 | 3.4 | 4.5 | 1.6 | versicolor |
6.7 | 3.1 | 4.7 | 1.5 | versicolor |
6.3 | 2.3 | 4.4 | 1.3 | versicolor |
5.6 | 3.0 | 4.1 | 1.3 | versicolor |
5.5 | 2.5 | 4.0 | 1.3 | versicolor |
5.5 | 2.6 | 4.4 | 1.2 | versicolor |
6.1 | 3.0 | 4.6 | 1.4 | versicolor |
5.8 | 2.6 | 4.0 | 1.2 | versicolor |
5.0 | 2.3 | 3.3 | 1.0 | versicolor |
5.6 | 2.7 | 4.2 | 1.3 | versicolor |
5.7 | 3.0 | 4.2 | 1.2 | versicolor |
5.7 | 2.9 | 4.2 | 1.3 | versicolor |
6.2 | 2.9 | 4.3 | 1.3 | versicolor |
5.1 | 2.5 | 3.0 | 1.1 | versicolor |
5.7 | 2.8 | 4.1 | 1.3 | versicolor |
head(iris, n = 10) %>%
  kable() %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(2, color = spec_color(head(iris[, 2], n = 10))) %>%
  column_spec(1, color = "white",
              background = spec_color(head(iris[, 1], n = 10), option = "magma", end = 0.8))
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
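The examples above change cell colours. As a further sketch of the text-size and header styling mentioned earlier, the font_size argument of kable_styling() and the row_spec() function from kableExtra can be combined. The particular size and colours are arbitrary assumptions, and format = "html" is given explicitly so the snippet does not depend on being knitted.

```r
library(knitr)
library(kableExtra)

# Smaller text for the whole table, plus a bold white-on-purple header row
# (row 0 refers to the header row in row_spec()).
styled_table <- head(iris, n = 10) %>%
  kable(format = "html") %>%
  kable_styling(full_width = FALSE, font_size = 12) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#7570b3")

styled_table
```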
Kable tables are nice, but often it’s more useful to be able to (1) search a table and (2) sort a table by column. For these purposes the DT R package is more appropriate, as it gives you easy access to the DataTables JavaScript library.
library(DT)  # provides datatable()
datatable(iris,
          rownames = FALSE,
          class = 'cell-border stripe')
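As a sketch of how the table could be tuned further, datatable() also accepts a filter argument and an options list that is passed on to DataTables. The choices below (per-column filters at the top, ten rows per page) are assumptions for illustration rather than settings used in this notebook.

```r
library(DT)

# filter = "top" adds a search box above each column;
# pageLength controls how many rows are shown per page.
iris_table <- datatable(iris,
                        rownames = FALSE,
                        class = "cell-border stripe",
                        filter = "top",
                        options = list(pageLength = 10))
iris_table
```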
In addition to R, you can actually add code chunks from several languages (see full list by clicking Insert in RStudio when working on an R markdown file).
Python and Bash are the most common languages that would be used in this context. Bash in particular could enable you to produce a notebook that would be a complete overview of a bioinformatics pipeline, even if that pipeline required a few standalone tools to be run on the command-line.
As a quick example, the bash commands below operate on a small test table that is first written out from R.
As a test, a table of the first 10 rows of the iris dataframe will be written to a file called iris_head.tsv.
write.table(x = iris[1:10, ], file = "/Users/Gavin/iris_head.tsv", quote = FALSE, row.names = FALSE, col.names = TRUE, sep="\t")
This file can now be parsed with bash commands!
For instance, this is the output of cat:
cat /Users/Gavin/iris_head.tsv
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 5.1 3.5 1.4 0.2 setosa
## 4.9 3 1.4 0.2 setosa
## 4.7 3.2 1.3 0.2 setosa
## 4.6 3.1 1.5 0.2 setosa
## 5 3.6 1.4 0.2 setosa
## 5.4 3.9 1.7 0.4 setosa
## 4.6 3.4 1.4 0.3 setosa
## 5 3.4 1.5 0.2 setosa
## 4.4 2.9 1.4 0.2 setosa
## 4.9 3.1 1.5 0.1 setosa
And this is the output of wc -l:
wc -l /Users/Gavin/iris_head.tsv
## 11 /Users/Gavin/iris_head.tsv
This is what the above code blocks looked like in the actual R markdown code:
The easiest way to read the Bash output back into R is to first save it to a file and then read it in. Fortunately, this process is much easier with Python!
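The save-then-read route for Bash output can be sketched entirely from R. This is an illustration under assumptions: it uses temporary files instead of the path in the example above, it assumes a Unix-like system where the cut tool is available, and it calls cut through system2() where a notebook would normally use a bash chunk.

```r
# Write the first ten rows of iris to a temporary TSV file.
tsv_path <- tempfile(fileext = ".tsv")
write.table(x = iris[1:10, ], file = tsv_path, quote = FALSE,
            row.names = FALSE, col.names = TRUE, sep = "\t")

# Run a command-line tool and save its output to a file.
# In a notebook this step would be a bash chunk along the lines of:
#   cut -f 5 input.tsv > output.txt
out_path <- tempfile(fileext = ".txt")
system2("cut", args = c("-f", "5", shQuote(tsv_path)), stdout = out_path)

# Read the saved output back into R as a character vector.
species_column <- readLines(out_path)
species_column
```

The first element is the column header and the remaining ten are the species labels, so the vector can be filtered or tabulated like any other R object.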
The below example shows first how the iris variable can be used in the Python chunk as a pandas dataframe (assuming you have the pandas python package installed). The resulting pandas series can then be used in the subsequent R chunk (where it is converted to a standard numeric vector). Note the syntax of writing r. and py$ before variable names to specify whether they are from the R or Python environments, respectively.
iris_means = r.iris.mean()
## /Users/Gavin/local/miniconda3/bin/python:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
print(iris_means)
## Sepal.Length 5.843333
## Sepal.Width 3.057333
## Petal.Length 3.758000
## Petal.Width 1.199333
## dtype: float64
print(py$iris_means)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
Just to be clear, this is what the code in the R markdown file looked like for the above outputs. Note that one code block specifies Python while the other specifies R, but the iris_means object can be called from either, albeit with different syntax.
Although these examples of using other languages were extremely simple, hopefully they helped you see how useful R notebooks can be for creating and sharing highly reproducible workflows.
Often you want to show plots, tables, or other results for many variables in different tabs. However, it can be tedious (and prone to typos) to make many tabs by copying and pasting. To get around this problem you can use the results='asis' option in a code block to specify that all text output should be treated as markdown code.
For example this code block (and markdown header) produces the tabbed output below.
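The original code block is not reproduced here. As a minimal sketch of the technique, assuming the chunk is declared with results='asis' and sits under a heading that ends in {.tabset}, a loop can print one sub-heading and one plot per measurement. The helper name print_tab_heading and the heading level are hypothetical choices.

```r
# Hypothetical helper: emit a markdown heading that becomes one tab.
# Assumes the parent heading is level 1 ("# Title {.tabset}"), so tabs are level 2.
print_tab_heading <- function(tab_name) {
  cat("\n\n## ", tab_name, "\n\n", sep = "")
}

# One tab per numeric column of iris, each holding a histogram.
measurements <- colnames(iris)[1:4]
for (measurement in measurements) {
  print_tab_heading(measurement)
  hist(iris[[measurement]], col = "grey", xlab = measurement,
       ylab = "Frequency", main = "")
  cat("\n\n")  # blank lines keep the plot separate from the next heading
}
```

Because the headings are generated from the column names, adding a variable to the loop adds a tab without any copying and pasting.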
The above quick examples are meant to show some slick features you may not be aware of in R notebooks as well as to show how useful R notebooks are in general. As mentioned at the beginning, probably what readers will find the most useful is being able to see the raw R markdown file, which you can download here.
The version numbers of all packages in the current environment, as well as information about the R install, are reported below. This is useful information to include so that others can better reproduce your work.
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] reticulate_1.22 knitr_1.34 kableExtra_1.3.4 ggbeeswarm_0.6.0
## [5] ggplot2_3.3.5 DT_0.18 cowplot_1.1.1
##
## loaded via a namespace (and not attached):
## [1] beeswarm_0.3.1 tidyselect_1.1.1 xfun_0.23 bslib_0.2.5.1
## [5] purrr_0.3.4 lattice_0.20-44 colorspace_2.0-1 vctrs_0.3.8
## [9] generics_0.1.0 htmltools_0.5.1.1 viridisLite_0.4.0 yaml_2.2.1
## [13] utf8_1.2.1 rlang_0.4.11 jquerylib_0.1.4 pillar_1.6.1
## [17] glue_1.4.2 withr_2.4.2 lifecycle_1.0.0 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 rvest_1.0.0 htmlwidgets_1.5.3
## [25] evaluate_0.14 labeling_0.4.2 crosstalk_1.1.1 vipor_0.4.5
## [29] fansi_0.4.2 highr_0.9 Rcpp_1.0.6 scales_1.1.1
## [33] webshot_0.5.2 jsonlite_1.7.2 farver_2.1.0 systemfonts_1.0.2
## [37] png_0.1-7 digest_0.6.27 stringi_1.6.2 dplyr_1.0.6
## [41] rprojroot_2.0.2 grid_4.0.5 here_1.0.1 tools_4.0.5
## [45] magrittr_2.0.1 sass_0.4.0 tibble_3.1.2 crayon_1.4.1
## [49] pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.3-3 xml2_1.3.2
## [53] rmarkdown_2.8 svglite_2.0.0 httr_1.4.2 rstudioapi_0.13
## [57] R6_2.5.0 compiler_4.0.5