Single Cell Analysis with Seurat and some custom code!
Seurat is a popular R package designed for QC, analysis, and exploration of single cell data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. The authors also provide several tutorials on their website.
The expression_tables_cellrangerV3.zip file contains the single cell matrix files for the three samples. These are isolated mouse cells run on the 10X Genomics platform (3’ expression V2) for single cell RNA sequencing, sequenced at UC Davis on 1 HiSeq 4000 lane. The experiment contains 3 samples, each merged from 2 original samples and then randomly subsampled to 1000 cells each.
The three samples are dorsal root ganglion neurons. At weaning, Ttpa+/+ mice were fed a normal diet (35 mg of dl-α-tocopheryl acetate/kg, vitE+), while Ttpa-/- mice were fed either an α-TOH-deficient diet (<10 mg of dl-α-tocopheryl acetate/kg, vitE-) or an α-TOH-supplemented diet (600 mg of dl-α-tocopheryl acetate/kg, vitE+++).
- UCD_Adj_VitE - normal + Vitamin E
- UCD_Supp_VitE - Vitamin E supplemented by diet
- UCD_VitE_Def - Vitamin E deficient animals
We start by loading the libraries needed for R; at this time all we need is the Seurat package.
library(Seurat)
Load the Cell Ranger Matrix Data and create the base Seurat object.
Cell Ranger provides a function, cellranger aggr, that will combine multiple samples into a single matrix file. However, when processing data in R and Seurat this is unnecessary, and we can aggregate them in R.
Seurat provides a function, Read10X, to read in a 10X data folder. First, we read in data from each individual sample folder. Then we initialize the Seurat object (CreateSeuratObject) with the raw (non-normalized) data, keeping all genes expressed in >= 10 cells and all cells with at least 200 detected genes. We also extract sample names, then calculate the mitochondrial percentage of each cell and add it to the metadata, along with batchid and cell cycle. Finally, we save the raw Seurat object.
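# Folder holding one Cell Ranger output directory per sample
dataset_loc <- "./expression_tables_cellrangerV3"
ids <- c("UCD_Adj_VitE", "UCD_Supp_VitE", "UCD_VitE_Def")

# Read each sample and replace the barcode suffix with the sample id
d10x.data <- sapply(ids, function(i){
  d10x <- Read10X(file.path(dataset_loc,i,"outs/filtered_feature_bc_matrix"))
  colnames(d10x) <- paste(sapply(strsplit(colnames(d10x),split="-"),'[[',1L),i,sep="-")
  d10x
})

# Combine the per-sample matrices column-wise (cells are columns)
experiment.data <- do.call("cbind", d10x.data)

# names.field = 2 takes the sample id (second "-"-delimited field) as orig.ident
experiment.aggregate <- CreateSeuratObject(
  experiment.data,
  project = "scRNA workshop",
  min.cells = 10,
  min.features = 200,
  names.field = 2,
  names.delim = "\\-")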
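The barcode renaming and filtering steps above can be tried on toy data without Seurat. This is a minimal sketch in base R, not Seurat's implementation: the samples `sampleA` and `sampleB`, the gene names, and the counts are made up, the thresholds are scaled down so the toy matrix shows an effect, and it assumes cells are filtered before genes.

```r
# Toy count matrices (genes x cells); names and values are made up.
make_toy <- function(values, barcodes) {
  matrix(values, nrow = 3,
         dimnames = list(c("GeneA", "GeneB", "GeneC"), barcodes))
}
toy <- list(
  sampleA = make_toy(c(1, 0, 2,  0, 0, 5), c("AAAC-1", "AAAG-1")),
  sampleB = make_toy(c(3, 1, 0,  2, 0, 0), c("AAAC-1", "AAAT-1"))
)

# Replace the "-1" suffix with the sample id so barcodes stay unique after cbind
renamed <- lapply(names(toy), function(i) {
  m <- toy[[i]]
  colnames(m) <- paste(sapply(strsplit(colnames(m), split = "-"), `[[`, 1L), i, sep = "-")
  m
})
combined <- do.call("cbind", renamed)

# Hypothetical small thresholds standing in for min.features and min.cells
min_features <- 2
min_cells <- 2

# Keep cells with enough detected genes, then genes detected in enough cells
keep_cells <- colSums(combined > 0) >= min_features
filtered <- combined[, keep_cells, drop = FALSE]
keep_genes <- rowSums(filtered > 0) >= min_cells
filtered <- filtered[keep_genes, , drop = FALSE]

# The second "-"-delimited field of each barcode is the sample id,
# which is the field that names.field = 2 selects
sample_of_cell <- sapply(strsplit(colnames(filtered), split = "-"), `[[`, 2L)
print(colnames(combined))
print(sample_of_cell)
```

Without the renaming step, the barcode `AAAC-1` would appear twice in the combined matrix, once per sample.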
The percentage of reads that map to the mitochondrial genome
- Low-quality / dying cells often exhibit extensive mitochondrial contamination.
- We calculate mitochondrial QC metrics with the PercentageFeatureSet function, which calculates the percentage of counts originating from a set of features.
- We use the set of all mitochondrial genes. In mouse, these genes can be identified as those that begin with ‘mt-’; in human data they begin with ‘MT-’.
experiment.aggregate$percent.mito <- PercentageFeatureSet(experiment.aggregate, pattern = "^mt-")
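To make the metric concrete, the same percentage can be computed by hand on a small matrix. This is a sketch of the idea in base R under the assumption that the metric is the share of a cell's total counts coming from the matched genes; the gene names, cell names, and counts are made up.

```r
# Toy counts (genes x cells); gene names and values are made up.
counts <- matrix(
  c(10,  5, 85,
     0, 50, 50),
  nrow = 3,
  dimnames = list(c("mt-Nd1", "mt-Co1", "Actb"), c("cell1", "cell2"))
)

# Genes whose names start with "mt-" (the mouse mitochondrial prefix)
mito_genes <- grepl("^mt-", rownames(counts))

# Percentage of each cell's counts that come from those genes
percent_mito <- colSums(counts[mito_genes, , drop = FALSE]) / colSums(counts) * 100
print(percent_mito)
```

Here cell2 draws half of its counts from mitochondrial genes, the kind of value that would flag a cell for closer inspection during filtering.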
Let's create a fake “batch” metadata variable (used in part 3). Here we assign UCD_Adj_VitE to one batch (Batch2) and UCD_Supp_VitE/UCD_VitE_Def to a second batch (Batch1).
The new metadata variable ‘batchid’ can also be used to specify treatment groups.
samplename = experiment.aggregate$orig.ident
batchid = rep("Batch1",length(samplename))
batchid[samplename %in% c("UCD_Adj_VitE")] = "Batch2"
names(batchid) = colnames(experiment.aggregate)
experiment.aggregate <- AddMetaData(
  object = experiment.aggregate,
  metadata = batchid,
  col.name = "batchid")
table(experiment.aggregate$batchid)
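The sample-to-batch mapping can be checked on a small vector before adding it to the object. This is a minimal sketch in base R, assuming four hypothetical cells (`cellA` to `cellD`) whose labels stand in for orig.ident; it uses `ifelse()` as a one-step alternative to the assignment above.

```r
# Hypothetical per-cell sample labels standing in for orig.ident
samplename <- c(cellA = "UCD_Adj_VitE", cellB = "UCD_Supp_VitE",
                cellC = "UCD_VitE_Def", cellD = "UCD_Adj_VitE")

# %in% drops the cell names, so they are restored explicitly with setNames()
batchid <- setNames(
  ifelse(samplename %in% c("UCD_Adj_VitE"), "Batch2", "Batch1"),
  names(samplename)
)

# Cross-tabulate to confirm each sample landed in the intended batch
print(table(samplename, batchid))
```

Cross-tabulating against the sample name, rather than tabulating batchid alone, shows directly which samples ended up in which batch.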
Let's spend a little time getting to know the Seurat object.
The Seurat object is the center of each single cell analysis. It stores all information associated with the dataset, including data, annotations, analyses, etc. The R function slotNames can be used to view the slot names within an object.
slotNames(experiment.aggregate)
head(experiment.aggregate[[]])
Question(s)
- What slots are empty, what slots have data?
- What columns are available in meta.data?
- Look up the help documentation for subset.
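One way to approach these questions is to practice on a small S4 object first. The sketch below defines a hypothetical class, `ToyObject`, which is not the Seurat class definition; its slot names are chosen only for illustration. It lists the slots, measures how much each holds, and applies base `subset()` to a metadata table.

```r
library(methods)

# Hypothetical toy class; this is NOT the Seurat class definition.
setClass("ToyObject", slots = c(
  meta.data = "data.frame",
  reductions = "list",
  project.name = "character"
))

toy <- new("ToyObject",
           meta.data = data.frame(nCount = c(10, 20), row.names = c("cell1", "cell2")),
           project.name = "toy")

# List the slots, then measure how much each one holds (0 means empty)
slots <- slotNames(toy)
slot_lengths <- sapply(slots, function(s) length(slot(toy, s)))
print(slot_lengths)

# Base subset() on the metadata table, keeping cells above a threshold
print(subset(toy@meta.data, nCount > 15))
```

The same `slotNames()` and `slot()` calls work on any S4 object, so the loop can be pointed at experiment.aggregate to see which slots are still empty at this stage.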
Finally, save the original object, write out a tab-delimited table that could be read into Excel, and view the object.
write.table(as.matrix(experiment.data),"raw.datatable.txt",sep="\t",col.names=TRUE,row.names=TRUE)
experiment.aggregate
## Original dataset in Seurat class, with no filtering
save(experiment.aggregate,file="original_seurat_object.RData")
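Both output formats can be checked with a quick round trip. This is a minimal sketch in base R, assuming a made-up matrix and a plain list as stand-ins for the count matrix and the Seurat object, with temporary files in place of the workshop file names.

```r
# Hypothetical stand-ins for the count matrix and the Seurat object
toy_counts <- matrix(1:4, nrow = 2,
                     dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2")))
toy_object <- list(project = "toy", counts = toy_counts)

# Write a tab-delimited table, then read it back. The header row has one
# fewer field than the data rows, so read.table() uses the first column
# as row names.
txt_file <- tempfile(fileext = ".txt")
write.table(toy_counts, txt_file, sep = "\t", col.names = TRUE, row.names = TRUE)
reread <- as.matrix(read.table(txt_file, sep = "\t", header = TRUE))

# save() stores the object under its variable name; load() restores it
# under that same name and returns the names it restored.
rdata_file <- tempfile(fileext = ".RData")
save(toy_object, file = rdata_file)
rm(toy_object)
restored_names <- load(rdata_file)
print(restored_names)
```

Because the header has one fewer field than the data rows, a spreadsheet will show the column names shifted one cell to the left. Passing `col.names = NA` to `write.table()` adds a blank leading column name, which avoids the shift.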
Get the next Rmd file
download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2020-August-intro-scRNAseq/master/data_analysis/scRNA_Workshop-PART2.Rmd", "scRNA_Workshop-PART2.Rmd")
Session Information
sessionInfo()