      Single Cell RNA-Seq Analysis

Create a new RStudio project

Open RStudio and create a new project (for more information, see Using-Projects):

File > New Project > New Directory > New Project

Name the new directory (e.g. scRNA_analysis) and, if the option is present, check “Use renv with this project”.

Learn more about renv.

Install packages

One of R’s many benefits is its large, active user community, which produces and maintains many packages that extend the functionality of base R and provide functions for bioinformatic analyses without the need for fully custom code.

Run each of the following package installation commands individually in the R console. Many of them will ask whether any already-installed packages should be updated (for example, “Update all/some/none? [a/s/n]”); for the quickest result, try answering ‘n’ (none) first.

BiocManager

BiocManager is the interface to Bioconductor, the bioinformatics-specific R package repository. BiocManager::install can also install packages from CRAN, so we will use it to install other packages whenever possible, rather than the base R function install.packages.

if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
}
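Because the same check-then-install pattern is repeated for every package on this page, it can optionally be wrapped in a small helper. This is a sketch, not part of the workshop materials: install_if_missing is a hypothetical name, it assumes BiocManager is already installed, and it passes update = FALSE so that other installed packages are left untouched, which is roughly equivalent to answering ‘n’ at the update prompt.

```r
# Hypothetical helper (not from the workshop scripts): install only missing packages.
# Assumes BiocManager is already installed.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE instead of throwing an error when a package is absent
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    # update = FALSE skips updating packages that are already installed
    BiocManager::install(missing, update = FALSE)
  }
  # Return the names of the packages that were missing, invisibly
  invisible(missing)
}
```

For example, install_if_missing(c("ggplot2", "dplyr", "tidyr")) installs whichever of the three are absent; you still need to call library() on each package afterwards.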

rmarkdown

The rmarkdown package, together with knitr and tinytex, lets you knit an R Markdown (.Rmd) document into a nicely formatted report, such as an HTML or PDF file.

if (!any(rownames(installed.packages()) == "rmarkdown")){
  BiocManager::install("rmarkdown")
}
library(rmarkdown)

tinytex

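TinyTeX is a small LaTeX distribution for use with R. The tinytex R package installs and manages it, but installing the package does not install TinyTeX itself; if you want to knit PDF reports, also run tinytex::install_tinytex() once.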

if (!any(rownames(installed.packages()) == "tinytex")){
  BiocManager::install("tinytex")
}
library(tinytex)

knitr

if (!any(rownames(installed.packages()) == "knitr")){
  BiocManager::install("knitr")
}
library(knitr)

kableExtra

The kableExtra package gives the user fine-grained control over table formats. This is useful for knit reports.

if (!any(rownames(installed.packages()) == "kableExtra")){
  BiocManager::install("kableExtra")
}
library(kableExtra)
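As a sketch of what kableExtra adds on top of knitr, the following styles an HTML table built from R's built-in mtcars data set, used here as a stand-in for your own results table. It assumes kableExtra 1.2 or later, which provides kbl(); the styling options shown are just one possible choice.

```r
library(kableExtra)

# Build an HTML table from the first rows of a built-in data set
tbl <- kbl(head(mtcars[, 1:4]), format = "html", caption = "First rows of mtcars")

# Add striped rows and hover highlighting, and keep the table narrower than the page
tbl <- kable_styling(tbl, bootstrap_options = c("striped", "hover"), full_width = FALSE)
tbl  # in RStudio, printing the table shows it in the Viewer pane
```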

ggplot2

An extremely popular package maintained by Posit (formerly RStudio), ggplot2 produces highly customizable plots.

if (!any(rownames(installed.packages()) == "ggplot2")){
  BiocManager::install("ggplot2")
}
library(ggplot2)
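A minimal example of the ggplot2 grammar, using the built-in mtcars data set as a stand-in for your own data: the data and aesthetic mappings are declared once, and layers (points, labels, a theme) are added with +.

```r
library(ggplot2)

# Scatter plot of weight vs. fuel efficiency, colored by number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  theme_bw()
print(p)
```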

dplyr

Like ggplot2 and tidyr, dplyr is part of the “tidyverse”, a collection of packages from Posit (formerly RStudio) designed for data manipulation, analysis, and visualization.

if (!any(rownames(installed.packages()) == "dplyr")){
  BiocManager::install("dplyr")
}
library(dplyr)
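A short sketch of a typical dplyr pipeline on the built-in mtcars data set; the same group_by() and summarise() pattern applies to per-cell metadata, for example summarizing counts per cluster. The pipe %>% passes each result into the next function.

```r
library(dplyr)

# Number of cars and mean miles per gallon for each cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n_cars = n(), mean_mpg = mean(mpg)) %>%
  arrange(cyl)
mpg_by_cyl
```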

tidyr

if (!any(rownames(installed.packages()) == "tidyr")){
  BiocManager::install("tidyr")
}
library(tidyr)
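tidyr reshapes data between wide and long layouts. The sketch below uses a genes-by-cells table of made-up values and converts it to one row per gene-cell pair, a layout that works well with ggplot2.

```r
library(tidyr)

# Made-up toy counts in "wide" format: one row per gene, one column per cell
counts_wide <- data.frame(
  gene  = c("GeneA", "GeneB"),
  cell1 = c(0, 3),
  cell2 = c(5, 1)
)

# Reshape to "long" format: one row per gene-cell pair
counts_long <- pivot_longer(counts_wide, cols = -gene,
                            names_to = "cell", values_to = "count")
counts_long
```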

viridis

viridis produces color palettes that are perceptually uniform and remain readable for people with common forms of color blindness.

if (!any(rownames(installed.packages()) == "viridis")){
  BiocManager::install("viridis")
}
library(viridis)
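viridis() returns a vector of hex color codes, so its palettes can be passed to any plotting function. The option argument selects among the available palettes; "magma" below is an arbitrary choice.

```r
library(viridis)

# Five colors from the default palette and from the "magma" palette
default_pal <- viridis(5)
magma_pal <- viridis(5, option = "magma")
default_pal

# Example use with base graphics
barplot(rep(1, 5), col = magma_pal, border = NA, axes = FALSE)
```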

hdf5r

HDF5 (Hierarchical Data Format version 5) files can be used to store single cell expression data, including the .h5 output from Cell Ranger. The hdf5r package provides utilities for reading and writing the format.

if (!any(rownames(installed.packages()) == "hdf5r")){
  BiocManager::install("hdf5r")
}
library(hdf5r)
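To see the format in practice without real data, the sketch below writes a small made-up matrix to a temporary HDF5 file with hdf5r and reads it back. The dataset name toy_counts is arbitrary. For Cell Ranger .h5 output, Seurat's Read10X_h5() function is usually more convenient than reading the file by hand.

```r
library(hdf5r)

# Write a small made-up matrix to a temporary HDF5 file
h5_path <- tempfile(fileext = ".h5")
h5 <- H5File$new(h5_path, mode = "w")
h5[["toy_counts"]] <- matrix(1:6, nrow = 2)
h5$close_all()

# Reopen the file read-only and read the whole dataset back into memory
h5 <- H5File$new(h5_path, mode = "r")
toy <- h5[["toy_counts"]]$read()
h5$close_all()
toy
```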

Seurat

Seurat is an extensive package for the analysis of single cell experiments, from normalization to visualization.

if (!any(rownames(installed.packages()) == "Seurat")){
  BiocManager::install("Seurat")
}
library(Seurat)

ComplexHeatmap

ComplexHeatmap produces highly customizable heatmaps, including multiple heatmaps and annotations arranged in a single figure.

if (!any(rownames(installed.packages()) == "ComplexHeatmap")){
  BiocManager::install("ComplexHeatmap")
}
library(ComplexHeatmap)
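A minimal ComplexHeatmap sketch on a made-up matrix of random values standing in for genes by cells. The group annotation is hypothetical and only shows how column annotations attach to the heatmap.

```r
library(ComplexHeatmap)

# Made-up toy matrix: 10 genes x 6 cells of random values
set.seed(1)
mat <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("cell", 1:6)))

# Column annotation assigning each cell to a hypothetical group
ha <- HeatmapAnnotation(group = rep(c("A", "B"), each = 3))

# Rows and columns are clustered by default
ht <- Heatmap(mat, name = "value", top_annotation = ha)
draw(ht)
```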

biomaRt

biomaRt provides an interface to BioMart databases such as Ensembl, which can be used to retrieve gene annotation and convert between identifier types.

if (!any(rownames(installed.packages()) == "biomaRt")){
  BiocManager::install("biomaRt")
}
library(biomaRt)

org.Hs.eg.db

org.Hs.eg.db contains genome-wide annotation for the human genome, primarily based on Entrez Gene identifiers.

if (!any(rownames(installed.packages()) == "org.Hs.eg.db")){
  BiocManager::install("org.Hs.eg.db")
}
library(org.Hs.eg.db)
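Annotation packages like org.Hs.eg.db are usually queried through AnnotationDbi, which is installed as a dependency. As an example, mapIds() converts gene symbols to Entrez Gene IDs; keytypes(org.Hs.eg.db) lists the other identifier types you can map between.

```r
library(org.Hs.eg.db)

# Map gene symbols to Entrez Gene IDs (one ID per symbol; the first match is kept)
entrez <- AnnotationDbi::mapIds(org.Hs.eg.db,
                                keys = c("TP53", "GAPDH"),
                                keytype = "SYMBOL",
                                column = "ENTREZID")
entrez
```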

limma

Originally developed for microarray data, limma provides functions for linear modeling and differential expression.

if (!any(rownames(installed.packages()) == "limma")){
  BiocManager::install("limma")
}
library(limma)
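The core limma workflow fits a linear model per gene with lmFit(), moderates the variance estimates with eBayes(), and extracts ranked results with topTable(). The sketch below runs it on made-up, normally distributed values for two groups of three samples; it illustrates the function calls, not an appropriate analysis of single-cell counts.

```r
library(limma)

# Made-up toy data: 100 genes x 6 samples of log-scale values, two groups of three
set.seed(42)
expr <- matrix(rnorm(600, mean = 5), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Shift the first five genes upward in the "trt" group so there is something to find
expr[1:5, group == "trt"] <- expr[1:5, group == "trt"] + 3

design <- model.matrix(~ group)  # columns: (Intercept), grouptrt
fit <- lmFit(expr, design)       # one linear model per gene
fit <- eBayes(fit)               # empirical Bayes moderated t-statistics
top <- topTable(fit, coef = "grouptrt", number = 10)
top
```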

topGO

topGO tests Gene Ontology (GO) term enrichment while accounting for the topology of the GO graph.

if (!any(rownames(installed.packages()) == "topGO")){
  BiocManager::install("topGO")
}
library(topGO)

remotes

Some packages (or versions of packages) cannot be installed through Bioconductor or CRAN. The remotes package contains tools for installing packages from a number of other sources, including GitHub.

if (!any(rownames(installed.packages()) == "remotes")){
  utils::install.packages("remotes")
}
library(remotes)

ape

Analysis of Phylogenetics and Evolution (ape) is used to generate and manipulate phylogenetic trees. In this workshop, we will be using ape to investigate the relationships between clusters.

if (!any(rownames(installed.packages()) == "ape")){
  utils::install.packages("ape")
}
library(ape)
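One way to relate clusters to each other is to cluster their average profiles hierarchically and convert the result to an ape tree. The sketch below uses made-up values for five hypothetical clusters; in a real analysis the matrix would hold per-cluster summaries computed from your data.

```r
library(ape)

# Made-up toy data: mean expression of 4 genes in 5 hypothetical clusters
set.seed(7)
cluster_means <- matrix(rnorm(20), nrow = 5,
                        dimnames = list(paste0("cluster", 0:4), paste0("gene", 1:4)))

# Cluster the clusters by Euclidean distance, then convert to an ape "phylo" object
hc <- hclust(dist(cluster_means))
tree <- as.phylo(hc)
plot(tree)  # other layouts are available via plot(tree, type = ...)
```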

DoubletFinder

DoubletFinder detects multiplets (most commonly doublets) in single-cell or single-nucleus RNA-seq data.

if (!any(rownames(installed.packages()) == "DoubletFinder")){
  remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
}
library(DoubletFinder)

openxlsx

The openxlsx package is a suite of tools for reading and writing .xlsx files.

if (!any(rownames(installed.packages()) == "openxlsx")){
  BiocManager::install("openxlsx")
}
library(openxlsx)
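A round-trip sketch with openxlsx: write a small data frame to a temporary .xlsx file and read it back. The table contents are made up and stand in for something like a differential expression results table you might share with collaborators.

```r
library(openxlsx)

# Made-up toy table
de_table <- data.frame(gene = c("GeneA", "GeneB"), logFC = c(1.5, -0.8))

xlsx_path <- tempfile(fileext = ".xlsx")
write.xlsx(de_table, xlsx_path)     # the first row of the sheet holds the column names
round_trip <- read.xlsx(xlsx_path)  # reads the first sheet by default
round_trip
```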

HGNChelper

Both R and Excel can introduce changes to gene symbols. HGNChelper can correct gene symbols that have been altered, and convert gene symbols to valid R names.

if (!any(rownames(installed.packages()) == "HGNChelper")){
  BiocManager::install("HGNChelper")
}
library(HGNChelper)
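As an illustration, checkGeneSymbols() takes a vector of symbols and returns a data frame with the input (x), whether each symbol is approved, and a suggested replacement. "1-Mar" imitates how Excel can turn a symbol such as MARCH1 into a date; the exact suggestions depend on the symbol map bundled with your HGNChelper version.

```r
library(HGNChelper)

# "1-Mar" imitates an Excel-converted symbol; TP53 is a current approved symbol
checked <- checkGeneSymbols(c("TP53", "1-Mar", "SEPT1"))
checked  # columns: x (input), Approved, Suggested.Symbol
```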

Verify installation

Finally, we can get the session info to ensure that all of the packages were installed and loaded correctly.

sessionInfo()
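sessionInfo() lists the packages attached in the current session. If you also want a compact list of installed versions, including any packages that are still missing, a helper like the one below can be used. package_versions is a hypothetical name, and the package vector is only a subset of those installed above; extend it as needed.

```r
# Hypothetical helper: report installed package versions without loading the packages;
# NA marks a package that could not be found in the library
package_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) as.character(packageVersion(p)) else NA_character_
  }, character(1))
}

pkgs <- c("BiocManager", "rmarkdown", "knitr", "ggplot2", "dplyr", "Seurat",
          "DoubletFinder", "HGNChelper")
versions <- package_versions(pkgs)
versions
names(versions)[is.na(versions)]  # packages that still need to be installed
```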

Download materials and prepare for the next section

In the R console, run the following command to download the R Markdown template for Part 1 of the data analysis.

Markdown template document

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2023-December-Single-Cell-RNA-Seq-Analysis/main/data_analysis/01-create_object.Rmd", "01-create_object.Rmd")

Expression matrix

In RStudio, open the Terminal tab (next to the Console), which gives you access to a bash terminal. Run the following commands, replacing username with your own username on tadpole:


scp username@tadpole.genomecenter.ucdavis.edu:/share/workshop/scRNA_workshop/cellranger_outs/expression_data_cellranger.zip ./
unzip expression_data_cellranger.zip

Some Windows users may need to use FileZilla or WinSCP to download the file instead, and then extract the archive into the project directory.

When the download and extraction are complete, you should see three folders: A001-C-007, A001-C-104 and B001-A-301. Make sure the “01-create_object.Rmd” file is in the same location.
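Optionally, you can confirm from the R console that the materials are in place. The helper below is a hypothetical convenience, not part of the workshop scripts; it assumes the current working directory is your project directory, which is the default in an RStudio project.

```r
# Hypothetical check: confirm the three sample folders and the Rmd template exist
check_materials <- function(base_dir = ".") {
  expected <- c("A001-C-007", "A001-C-104", "B001-A-301", "01-create_object.Rmd")
  found <- file.exists(file.path(base_dir, expected))  # also TRUE for directories
  setNames(found, expected)
}

check_materials()  # every entry should be TRUE
```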