Create a new RStudio project
Open RStudio and create a new project (for more information, see Using-Projects):
File > New Project > New Directory > New Project
Name the new directory (e.g. scRNA_analysis), and check “use renv with this project” if present.
Learn more about renv.
Install packages
One of R’s many benefits is the large, active user community, which produces and maintains many packages that extend the functionality of base R and provide functions that enable bioinformatic analyses without completely custom code.
Run the following package installation commands one at a time in the R console. Many of them will prompt you to choose which, if any, dependencies should be updated; for the quickest result, try ‘n’ (none) first.
BiocManager
BiocManager is an interface to Bioconductor, the bioinformatics-specific R package repository. We will use BiocManager to install other packages when possible, rather than the base R function install.packages.
if (!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
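The check-then-install pattern used for the remaining packages can optionally be wrapped in a small function. The sketch below is not part of the workshop materials: `install_if_missing` is a hypothetical name, and it assumes BiocManager is already installed. Passing `update = FALSE` to `BiocManager::install` should have the same effect as answering ‘n’ at the update prompt.

```r
# Hypothetical helper (not from the workshop): install a package only when it
# is absent, without prompting to update other packages.
install_if_missing <- function(pkg) {
  if (pkg %in% rownames(installed.packages())) {
    return(invisible(FALSE))  # already installed; nothing to do
  }
  # update = FALSE skips dependency updates; ask = FALSE suppresses the prompt
  BiocManager::install(pkg, update = FALSE, ask = FALSE)
  invisible(TRUE)
}

# "utils" ships with R, so this call returns FALSE without installing anything
installed_now <- install_if_missing("utils")
```

The function returns `TRUE` only when it attempted an installation. The individual commands in the rest of this section still work as written if you prefer to run them one by one.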
rmarkdown
The rmarkdown package, when used with others like tinytex and knitr, allows you to knit your Rmd document into nicely formatted reports.
if (!any(rownames(installed.packages()) == "rmarkdown")){
BiocManager::install("rmarkdown")
}
library(rmarkdown)
tinytex
TinyTeX is a small LaTeX distribution for use with R.
if (!any(rownames(installed.packages()) == "tinytex")){
BiocManager::install("tinytex")
}
library(tinytex)
knitr
if (!any(rownames(installed.packages()) == "knitr")){
BiocManager::install("knitr")
}
library(knitr)
kableExtra
The kableExtra package gives the user fine-grained control over table formats. This is useful for knit reports.
if (!any(rownames(installed.packages()) == "kableExtra")){
BiocManager::install("kableExtra")
}
library(kableExtra)
ggplot2
An extremely popular package by the authors of RStudio, ggplot2 produces highly customizable plots.
if (!any(rownames(installed.packages()) == "ggplot2")){
BiocManager::install("ggplot2")
}
library(ggplot2)
dplyr
Like ggplot2 and tidyr, dplyr is part of the “tidyverse” by the RStudio authors: a group of packages designed for data analysis and visualization.
if (!any(rownames(installed.packages()) == "dplyr")){
BiocManager::install("dplyr")
}
library(dplyr)
tidyr
if (!any(rownames(installed.packages()) == "tidyr")){
BiocManager::install("tidyr")
}
library(tidyr)
viridis
viridis produces accessible color palettes.
if (!any(rownames(installed.packages()) == "viridis")){
BiocManager::install("viridis")
}
library(viridis)
hdf5r
HDF5 (hierarchical data format version five) files can be used to store single cell expression data (including output from Cell Ranger). The hdf5r package provides utilities for interacting with the format.
if (!any(rownames(installed.packages()) == "hdf5r")){
BiocManager::install("hdf5r")
}
library(hdf5r)
Seurat
Seurat is an extensive package for the analysis of single cell experiments, from normalization to visualization.
if (!any(rownames(installed.packages()) == "Seurat")){
BiocManager::install("Seurat")
}
library(Seurat)
ComplexHeatmap
ComplexHeatmap produces beautiful, highly customizable heat maps.
if (!any(rownames(installed.packages()) == "ComplexHeatmap")){
BiocManager::install("ComplexHeatmap")
}
library(ComplexHeatmap)
biomaRt
This package provides an interface to Ensembl databases.
if (!any(rownames(installed.packages()) == "biomaRt")){
BiocManager::install("biomaRt")
}
library(biomaRt)
org.Hs.eg.db
org.Hs.eg.db contains genome-wide annotation for the human genome, based on Entrez Gene identifiers.
if (!any(rownames(installed.packages()) == "org.Hs.eg.db")){
BiocManager::install("org.Hs.eg.db")
}
library(org.Hs.eg.db)
limma
Originally developed for microarray data, limma provides functions for linear modeling and differential expression.
if (!any(rownames(installed.packages()) == "limma")){
BiocManager::install("limma")
}
library(limma)
topGO
Test gene ontology (GO) term enrichment while accounting for the topology of the GO graph.
if (!any(rownames(installed.packages()) == "topGO")){
BiocManager::install("topGO")
}
library(topGO)
remotes
Some packages (or versions of packages) cannot be installed through Bioconductor. The remotes package contains tools for installing packages from a number of repositories, including GitHub.
if (!any(rownames(installed.packages()) == "remotes")){
utils::install.packages("remotes")
}
library(remotes)
ape
Analysis of Phylogenetics and Evolution (ape) is used to generate and manipulate phylogenetic trees. In this workshop, we will be using ape to investigate the relationships between clusters.
if (!any(rownames(installed.packages()) == "ape")){
utils::install.packages("ape")
}
library(ape)
DoubletFinder
DoubletFinder detects multiplets within single cell or nucleus data.
if (!any(rownames(installed.packages()) == "DoubletFinder")){
remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
}
library(DoubletFinder)
openxlsx
The openxlsx package is a suite of tools for reading and writing .xlsx files.
if (!any(rownames(installed.packages()) == "openxlsx")){
BiocManager::install("openxlsx")
}
library(openxlsx)
HGNChelper
Both R and Excel can introduce changes to gene symbols. HGNChelper can correct gene symbols that have been altered, and convert gene symbols to valid R names.
if (!any(rownames(installed.packages()) == "HGNChelper")){
BiocManager::install("HGNChelper")
}
library(HGNChelper)
Verify installation
Finally, we can get the session info to ensure that all of the packages were installed and loaded correctly.
sessionInfo()
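Because sessionInfo() lists only packages that are currently loaded, a direct check against the installed library can be a useful complement. The following is a minimal sketch, assuming the package names exactly as they appear in the installation steps above; the variable names `required` and `not_installed` are illustrative.

```r
# Package names taken from the installation steps above
required <- c("BiocManager", "rmarkdown", "tinytex", "knitr", "kableExtra",
              "ggplot2", "dplyr", "tidyr", "viridis", "hdf5r", "Seurat",
              "ComplexHeatmap", "biomaRt", "org.Hs.eg.db", "limma", "topGO",
              "remotes", "ape", "DoubletFinder", "openxlsx", "HGNChelper")

# Names that do not appear in the installed library
not_installed <- setdiff(required, rownames(installed.packages()))

if (length(not_installed) == 0) {
  message("All ", length(required), " packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any package reported as missing can be installed again with its command from the section above.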
Download materials and prepare for the next section
In the R console, run the following command to download part 1 of the data analysis.
Markdown template document
download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2023-December-Single-Cell-RNA-Seq-Analysis/main/data_analysis/01-create_object.Rmd", "01-create_object.Rmd")
Expression matrix
In RStudio, navigate to the terminal tab (next to the console). This gives you access to a bash terminal. Run the following code, remembering to change username to your own username for tadpole:
scp username@tadpole.genomecenter.ucdavis.edu:/share/workshop/scRNA_workshop/cellranger_outs/expression_data_cellranger.zip ./
unzip expression_data_cellranger.zip
Some Windows users may need to use Filezilla/WinSCP to download the file instead.
When the download and extraction are complete, you should see three folders: A001-C-007, A001-C-104 and B001-A-301. Make sure the “01-create_object.Rmd” file is in the same location.
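One way to confirm the layout described above is to check for each item from the R console. This is a sketch that assumes the working directory is the project directory where the files were downloaded and extracted; the variable names `expected` and `present` are illustrative.

```r
# Items that should sit side by side in the project directory
expected <- c("A001-C-007", "A001-C-104", "B001-A-301", "01-create_object.Rmd")

# file.exists() works for both files and directories
present <- file.exists(expected)
names(present) <- expected
print(present)

if (!all(present)) {
  message("Not found in ", getwd(), ": ",
          paste(expected[!present], collapse = ", "))
}
```

If anything is reported as not found, check the path printed by getwd() against the location where you ran scp and unzip.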