Single Cell Analysis with Seurat and some custom code!

Seurat is a popular R package designed for QC, analysis, and exploration of single cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. The authors also provide several tutorials on their website.

Download and expand the expression_tables_cellrangerV3.zip file to extract the single cell matrix files for the three samples. These are isolated mouse cells run on the 10X Genomics platform (3’ expression V2) for single cell RNA sequencing, sequenced at UC Davis on 1 HiSeq 4000 lane. The experiment contains 3 samples, each merged from 2 original samples and then randomly subsampled to 1000 cells each.

The three samples are dorsal root ganglion neurons. At weaning, Ttpa+/+ mice were fed a normal diet (35 mg of dl-α-tocopheryl acetate/kg, vitE+), while Ttpa-/- mice were fed either an α-TOH-deficient diet (<10 mg of dl-α-tocopheryl acetate/kg, vitE-) or an α-TOH-supplemented diet (600 mg of dl-α-tocopheryl acetate/kg, vitE+++).

UCD_Adj_VitE - normal + Vitamin E
UCD_Supp_VitE - Vitamin E supplemented by diet
UCD_VitE_Def - Vitamin E deficient animals

We start by loading the libraries needed for R; at this time all we need is the package Seurat.

library(Seurat)

Load the Cell Ranger Matrix Data and create the base Seurat object.

Cell Ranger provides a function cellranger aggr that will combine multiple samples into a single matrix file. However, when processing data in R and Seurat this is unnecessary and we can aggregate them in R.

Seurat provides a function Read10X to read in a 10X data folder. First, we read in the data from each individual sample folder. Then we initialize the Seurat object (CreateSeuratObject) with the raw (non-normalized) data, keeping all genes expressed in >= 10 cells and all cells with at least 200 detected genes. We also extract the sample names, calculate the mitochondrial percentage of each cell and add it to the metadata, and add batchid and cell cycle to the metadata. Finally, we save the raw Seurat object.

dataset_loc <- "./expression_tables_cellrangerV3"
ids <- c("UCD_Adj_VitE", "UCD_Supp_VitE", "UCD_VitE_Def")

d10x.data <- sapply(ids, function(i){
  d10x <- Read10X(file.path(dataset_loc,i,"outs/filtered_feature_bc_matrix"))
  colnames(d10x) <- paste(sapply(strsplit(colnames(d10x),split="-"),'[[',1L),i,sep="-")
  d10x
})

experiment.data <- do.call("cbind", d10x.data)
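To see what the barcode renaming and column-binding do, here is a minimal base-R sketch on two hypothetical toy matrices (the sample ids, barcodes, gene names, and counts are invented for illustration). It assumes, as the code above does, that every sample shares the same gene rows.

```r
# Toy stand-ins for two Read10X() results; all names and values are invented.
ids_toy <- c("SampleA", "SampleB")

toy <- lapply(ids_toy, function(i) {
  m <- matrix(1:4, nrow = 2,
              dimnames = list(c("Gene1", "Gene2"),
                              c("AAAC-1", "TTTG-1")))
  # Keep the part of the barcode before the first "-" and append the sample id.
  barcode <- sapply(strsplit(colnames(m), split = "-"), `[[`, 1L)
  colnames(m) <- paste(barcode, i, sep = "-")
  m
})

# Column-bind the per-sample matrices into one genes x cells matrix.
combined <- do.call("cbind", toy)
colnames(combined)
```

Because the same barcode sequence can occur in more than one sample, appending the sample id keeps the column names unique after the matrices are combined.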

experiment.aggregate <- CreateSeuratObject(
  experiment.data,
  project = "scRNA workshop",
  min.cells = 10,
  min.features = 200,
  names.field = 2,
  names.delim = "\\-")
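The filtering and sample-name arguments can be illustrated by hand on a toy matrix. This is only a sketch of the idea in base R, with made-up counts and scaled-down hypothetical thresholds (min_features, min_cells); Seurat's internal implementation may differ in its details.

```r
# Toy count matrix: 4 genes x 3 cells; all values are invented.
counts <- matrix(c(5, 0, 2,
                   0, 0, 1,
                   3, 4, 0,
                   1, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:4),
                                 c("AAAC-SampleA", "TTTG-SampleA", "AAAC-SampleB")))
min_features <- 2  # hypothetical threshold, analogous to min.features
min_cells <- 2     # hypothetical threshold, analogous to min.cells

# Cells: keep columns in which at least min_features genes are detected.
n_features <- colSums(counts > 0)
counts <- counts[, n_features >= min_features, drop = FALSE]

# Genes: keep rows detected in at least min_cells of the remaining cells.
n_cells <- rowSums(counts > 0)
counts <- counts[n_cells >= min_cells, , drop = FALSE]

# Sample identity: the second field of each barcode when split on "-",
# which is what names.field = 2 with names.delim = "\\-" asks for.
orig_ident <- sapply(strsplit(colnames(counts), "-"), `[[`, 2L)
```

In this toy case only Gene1 and the two AAAC cells survive, and orig_ident holds one sample name per remaining cell.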

The percentage of reads that map to the mitochondrial genome

Low-quality / dying cells often exhibit extensive mitochondrial contamination.
We calculate mitochondrial QC metrics with the PercentageFeatureSet function, which calculates the percentage of counts originating from a set of features.
We use the set of all mitochondrial genes; in mouse these genes can be identified as those that begin with ‘mt-’, while in human data they begin with ‘MT-’.

experiment.aggregate$percent.mito <- PercentageFeatureSet(experiment.aggregate, pattern = "^mt-")
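The quantity being computed is, for each cell, the counts from mitochondrial genes divided by the total counts, times 100. A minimal base-R sketch of that calculation on an invented toy matrix (the counts are made up; only the "^mt-" pattern comes from the call above):

```r
# Toy count matrix: 3 genes x 2 cells; all values are invented.
counts <- matrix(c(90, 40,
                    5, 50,
                    5, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Actb", "mt-Co1", "mt-Nd1"),
                                 c("cell1", "cell2")))

# Select mitochondrial genes by name prefix, as in the pattern above.
is_mito <- grepl("^mt-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
percent_mito
```

Here cell1 has 10% mitochondrial counts and cell2 has 60%, so cell2 would be the more likely candidate for a low-quality cell.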

Calculate cell cycle, add to meta data

Using the package scran, get the mouse cell cycle markers and a mapping of mouse gene symbols to Ensembl IDs.

mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
# Convert to matrix for use in cyclone
mat <- as.matrix(GetAssayData(experiment.aggregate))

# Convert rownames to ENSEMBL IDs, Using biomaRt
#ensembl<- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
#anno_data <- getBM( values=rownames(mat), attributes=c("mgi_symbol","ensembl_gene_id") , filters= "mgi_symbol"  ,mart=ensembl)
# Downloaded from Biomart
#download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2019-single-cell-RNA-sequencing-Workshop-UCD_UCSF/master/scrnaseq_analysis/mart_export_June2019.txt", "mart_export_June2019.txt")
anno <- read.delim("mart_export_June2019.txt")

ord <- match(rownames(mat), anno$MGI.symbol) # use anno$mgi_symbol if via biomaRt
rownames(mat) <- anno$Gene.stable.ID[ord] # use anno$ensembl_gene_id if via biomaRt
# Drop genes without an Ensembl ID; logical indexing is safe even when none are missing
mat <- mat[!is.na(rownames(mat)), ]
cycles <- scran::cyclone(mat, pairs=mm.pairs)
tmp <- data.frame(cell.cycle = cycles$phases)
rownames(tmp) <- colnames(mat)
experiment.aggregate <- AddMetaData(experiment.aggregate, tmp)
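The symbol-to-Ensembl conversion relies on match(), which returns NA for any symbol missing from the annotation table. A minimal sketch on toy data, assuming a hypothetical annotation table with the same column names as the BioMart export (the gene names and IDs below are invented placeholders):

```r
# Toy expression matrix keyed by gene symbol; all values are invented.
mat_toy <- matrix(1:6, nrow = 3,
                  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                  c("cell1", "cell2")))

# Hypothetical annotation table; GeneB is deliberately absent.
anno_toy <- data.frame(Gene.stable.ID = c("ENSMUSG_TOY_1", "ENSMUSG_TOY_3"),
                       MGI.symbol = c("GeneA", "GeneC"),
                       stringsAsFactors = FALSE)

# Position of each row's symbol in the annotation (NA when not found).
ord <- match(rownames(mat_toy), anno_toy$MGI.symbol)
rownames(mat_toy) <- anno_toy$Gene.stable.ID[ord]

# Keep only rows that received an ID.
mat_toy <- mat_toy[!is.na(rownames(mat_toy)), , drop = FALSE]
mat_toy
```

Filtering with a logical vector also behaves correctly when every symbol is matched, whereas negative indexing with an empty which() result would select zero rows.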

Let's create fake batch metadata (used in part 3). Here we assign UCD_Adj_VitE to one batch and UCD_Supp_VitE/UCD_VitE_Def to a second batch.

Here we build a new metadata variable ‘batchid’ which can be used to specify treatment groups.

samplename = experiment.aggregate@meta.data$orig.ident
table(samplename)

## samplename
##  UCD_Adj_VitE UCD_Supp_VitE  UCD_VitE_Def 
##           904          1000           992

batchid = rep("Batch1",length(samplename))
batchid[samplename %in% c("UCD_Adj_VitE")] = "Batch2"
names(batchid) = rownames(experiment.aggregate@meta.data)

experiment.aggregate <- AddMetaData(
  object = experiment.aggregate,
  metadata = batchid,
  col.name = "batchid")

table(experiment.aggregate$batchid)

## 
## Batch1 Batch2 
##   1992    904
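When there are more than two groups, a named lookup vector can be an alternative to repeated %in% assignments. The sketch below is one possible way to do it in base R, using a toy factor of sample names and hypothetical cell names; the batch labels follow the assignment above.

```r
# Toy stand-in for orig.ident (a factor), one entry per cell.
samplename_toy <- factor(c("UCD_Adj_VitE", "UCD_Supp_VitE",
                           "UCD_VitE_Def", "UCD_Adj_VitE"))

# Lookup table: sample name -> batch label.
batch_of <- c(UCD_Adj_VitE = "Batch2",
              UCD_Supp_VitE = "Batch1",
              UCD_VitE_Def = "Batch1")

# Convert the factor to character first; indexing by a factor would use
# its integer codes rather than the sample names.
batchid_toy <- unname(batch_of[as.character(samplename_toy)])

# Name the vector by cell so it can be matched to metadata rows
# (the cell names here are hypothetical).
names(batchid_toy) <- paste0("cell", seq_along(batchid_toy))
table(batchid_toy)
```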

Let's spend a little time getting to know the Seurat object.

The Seurat object is the center of each single cell analysis. It stores all information associated with the dataset, including data, annotations, analyses, etc. The R function slotNames can be used to view the slot names within an object.

slotNames(experiment.aggregate)

##  [1] "assays"       "meta.data"    "active.assay" "active.ident"
##  [5] "graphs"       "neighbors"    "reductions"   "project.name"
##  [9] "misc"         "version"      "commands"     "tools"

We can then view the data within a slot with the @ operator.

head(experiment.aggregate@meta.data)

##                                  orig.ident nCount_RNA nFeature_RNA
## ACTCTAATGTGGGTATG-UCD_Adj_VitE UCD_Adj_VitE       8885         2818
## AGGCTGGTCAATCACAC-UCD_Adj_VitE UCD_Adj_VitE       5019         2167
## ATGACTAGCACATGACT-UCD_Adj_VitE UCD_Adj_VitE       2208         1286
## AAGCGTCGTCTCTAAGG-UCD_Adj_VitE UCD_Adj_VitE       2795         1474
## ACATCGGGTCCATGCTC-UCD_Adj_VitE UCD_Adj_VitE       5372         2271
## ATACGGTAGTGACCAAG-UCD_Adj_VitE UCD_Adj_VitE        598          367
##                                percent.mito cell.cycle batchid
## ACTCTAATGTGGGTATG-UCD_Adj_VitE     1.969612         G1  Batch2
## AGGCTGGTCAATCACAC-UCD_Adj_VitE     6.216378         G1  Batch2
## ATGACTAGCACATGACT-UCD_Adj_VitE     6.838768         G1  Batch2
## AAGCGTCGTCTCTAAGG-UCD_Adj_VitE     4.221825         G1  Batch2
## ACATCGGGTCCATGCTC-UCD_Adj_VitE     7.557707         G1  Batch2
## ATACGGTAGTGACCAAG-UCD_Adj_VitE    11.371237         G1  Batch2
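slotNames and the @ operator are general S4 features in R, not specific to Seurat. A small self-contained sketch with a hypothetical toy class (ToyObject and its two slots are invented here and are far simpler than the real Seurat class):

```r
library(methods)

# A hypothetical S4 class with two slots, loosely mimicking a Seurat object.
setClass("ToyObject",
         slots = c(meta.data = "data.frame",
                   project.name = "character"))

obj <- new("ToyObject",
           meta.data = data.frame(nCount = c(10, 20),
                                  row.names = c("cell1", "cell2")),
           project.name = "toy")

slotNames(obj)             # names of the slots defined by the class
obj@meta.data$nCount       # direct slot access with @
slot(obj, "project.name")  # equivalent access by slot name
```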

Question(s)

What slots are empty, what slots have data?
What columns are available in meta.data?
Look up the help documentation for subset?

Finally, save the original object, write out a tab-delimited table that could be read into Excel, and view the object.

write.table(as.matrix(experiment.data),"raw.datatable.txt",sep="\t",col.names=T,row.names=T)
experiment.aggregate

## An object of class Seurat 
## 12811 features across 2896 samples within 1 assay 
## Active assay: RNA (12811 features)

## Original dataset in Seurat class, with no filtering
save(experiment.aggregate,file="original_seurat_object.RData")
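A file written with save() is read back with load(), which restores the object under its original variable name. The sketch below shows the round trip on a small stand-in list written to a temporary file (toy_object and the temporary path are placeholders, not part of the workshop data).

```r
# A small stand-in for the Seurat object.
toy_object <- list(project = "toy", counts = matrix(1:4, nrow = 2))

# Save to a temporary .RData file, then remove the object from the session.
rdata_file <- tempfile(fileext = ".RData")
save(toy_object, file = rdata_file)
rm(toy_object)

# load() recreates the object and returns the names of what it restored.
loaded_names <- load(rdata_file)
loaded_names
toy_object$project
```

Because load() reuses the saved variable name, it will overwrite any object of the same name already present in the session.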

Get the next Rmd file

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2019-single-cell-RNA-sequencing-Workshop-UCD_UCSF/master/scrnaseq_analysis/scRNA_Workshop-PART2.Rmd", "scRNA_Workshop-PART2.Rmd")

Session Information

sessionInfo()

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Seurat_3.0.2
## 
## loaded via a namespace (and not attached):
##   [1] ggbeeswarm_0.6.0            Rtsne_0.15                 
##   [3] colorspace_1.4-1            ggridges_0.5.1             
##   [5] dynamicTreeCut_1.63-1       XVector_0.24.0             
##   [7] GenomicRanges_1.36.0        BiocNeighbors_1.2.0        
##   [9] listenv_0.7.0               npsurv_0.4-0               
##  [11] ggrepel_0.8.1               codetools_0.2-16           
##  [13] splines_3.6.0               R.methodsS3_1.7.1          
##  [15] lsei_1.2-0                  knitr_1.23                 
##  [17] scater_1.12.2               jsonlite_1.6               
##  [19] ica_1.0-2                   cluster_2.1.0              
##  [21] png_0.1-7                   R.oo_1.22.0                
##  [23] sctransform_0.2.0           compiler_3.6.0             
##  [25] httr_1.4.0                  dqrng_0.2.1                
##  [27] assertthat_0.2.1            Matrix_1.2-17              
##  [29] lazyeval_0.2.2              limma_3.40.2               
##  [31] BiocSingular_1.0.0          htmltools_0.3.6            
##  [33] tools_3.6.0                 rsvd_1.0.1                 
##  [35] igraph_1.2.4.1              gtable_0.3.0               
##  [37] glue_1.3.1                  GenomeInfoDbData_1.2.1     
##  [39] RANN_2.6.1                  reshape2_1.4.3             
##  [41] dplyr_0.8.1                 Rcpp_1.0.1                 
##  [43] Biobase_2.44.0              gdata_2.18.0               
##  [45] ape_5.3                     nlme_3.1-140               
##  [47] DelayedMatrixStats_1.6.0    gbRd_0.4-11                
##  [49] lmtest_0.9-37               xfun_0.7                   
##  [51] stringr_1.4.0               globals_0.12.4             
##  [53] irlba_2.3.3                 gtools_3.8.1               
##  [55] statmod_1.4.32              future_1.13.0              
##  [57] edgeR_3.26.5                MASS_7.3-51.4              
##  [59] zlibbioc_1.30.0             zoo_1.8-6                  
##  [61] scales_1.0.0                parallel_3.6.0             
##  [63] SummarizedExperiment_1.14.0 RColorBrewer_1.1-2         
##  [65] SingleCellExperiment_1.6.0  yaml_2.2.0                 
##  [67] reticulate_1.12             pbapply_1.4-0              
##  [69] gridExtra_2.3               ggplot2_3.2.0              
##  [71] stringi_1.4.3               S4Vectors_0.22.0           
##  [73] scran_1.12.1                caTools_1.17.1.2           
##  [75] BiocGenerics_0.30.0         BiocParallel_1.18.0        
##  [77] bibtex_0.4.2                GenomeInfoDb_1.20.0        
##  [79] Rdpack_0.11-0               SDMTools_1.1-221.1         
##  [81] rlang_0.3.4                 pkgconfig_2.0.2            
##  [83] bitops_1.0-6                matrixStats_0.54.0         
##  [85] evaluate_0.14               lattice_0.20-38            
##  [87] ROCR_1.0-7                  purrr_0.3.2                
##  [89] htmlwidgets_1.3             cowplot_0.9.4              
##  [91] tidyselect_0.2.5            plyr_1.8.4                 
##  [93] magrittr_1.5                R6_2.4.0                   
##  [95] IRanges_2.18.1              gplots_3.0.1.1             
##  [97] DelayedArray_0.10.0         pillar_1.4.1               
##  [99] fitdistrplus_1.0-14         survival_2.44-1.1          
## [101] RCurl_1.95-4.12             tibble_2.1.3               
## [103] future.apply_1.3.0          tsne_0.1-3                 
## [105] crayon_1.3.4                KernSmooth_2.23-15         
## [107] plotly_4.9.0                rmarkdown_1.13             
## [109] viridis_0.5.1               locfit_1.5-9.1             
## [111] grid_3.6.0                  data.table_1.12.2          
## [113] metap_1.1                   digest_0.6.19              
## [115] tidyr_0.8.3                 R.utils_2.9.0              
## [117] stats4_3.6.0                munsell_0.5.0              
## [119] beeswarm_0.2.3              viridisLite_0.3.0          
## [121] vipor_0.4.5

UC Davis Bioinformatics Core June/July 2019 scRNA-Seq Workshop @ UCD and UCSF
