Home
Introduction and Lectures
Intro to the Workshop and Core
Support
Schedule
Slack
Cheat Sheets
Software and Links
Scripts
GitHub repository
Biocore website
Prerequisites
CLI
R
Data Reduction
Files and Filetypes
Project setup
Generating Expression Matrix
scRNAseq Analysis
Prepare scRNAseq Analysis
Part 1- Create object
Part 2- Filtering
Part 3- Normalization and scaling
Part 4- Dimensionality reduction
Part 5- Clustering and cell type
Part 6- Enrichment and DE
Part 7- Doublet detection
Part 8- Integration

Introduction to Single Cell RNA-Seq Part 3: Normalize and scale

Set up workspace

First, we need to load the required libraries.

library(Seurat)
library(kableExtra)
experiment.aggregate <- readRDS("scRNA_workshop-02.rds")
experiment.aggregate

## An object of class Seurat 
## 11292 features across 6313 samples within 1 assay 
## Active assay: RNA (11292 features, 0 variable features)

set.seed(12345)

Normalize the data

After filtering, the next step is to normalize the data. We employ a global-scaling normalization method, LogNormalize, that normalizes the gene expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and then log-transforms the data.

?NormalizeData

experiment.aggregate <- NormalizeData(
  object = experiment.aggregate,
  normalization.method = "LogNormalize",
  scale.factor = 10000)

Cell cycle assignment

Cell cycle phase can be a significant source of variation in single cell and single nucleus experiments. There are a number of automated cell cycle stage detection methods available for single cell data. For this workshop, we will be using the built-in Seurat cell cycle function, CellCycleScoring. This tool compares gene expression in each cell to a list of cell cycle marker genes and scores each barcode based on marker expression. The phase with the highest score is selected for each barcode. Seurat includes a list of cell cycle genes in human single cell data.

s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

For other species, a user-provided gene list may be substituted, or the orthologs of the human gene list used instead.

Do not run the code below for human experiments!

# mouse code DO NOT RUN for human data
library(biomaRt)
convertHumanGeneList <- function(x){
  require("biomaRt")
  human = useEnsembl("ensembl",
                     dataset = "hsapiens_gene_ensembl",
                     mirror = "uswest")
  mouse = useEnsembl("ensembl",
                     dataset = "mmusculus_gene_ensembl",
                     mirror = "uswest")
  genes = getLDS(attributes = c("hgnc_symbol"),
                 filters = "hgnc_symbol",
                 values = x ,
                 mart = human,
                 attributesL = c("mgi_symbol"),
                 martL = mouse,
                 uniqueRows=T)
  humanx = unique(genes[, 2])
  print(head(humanx)) # print first 6 genes found to the screen
  return(humanx)
}
# convert lists to mouse orthologs
s.genes <- convertHumanGeneList(cc.genes.updated.2019$s.genes)
g2m.genes <- convertHumanGeneList(cc.genes.updated.2019$g2m.genes)

Once an appropriate gene list has been identified, the CellCycleScoring function can be run.

experiment.aggregate <- CellCycleScoring(experiment.aggregate,
                                         s.features = s.genes,
                                         g2m.features = g2m.genes,
                                         set.ident = TRUE)
table(experiment.aggregate@meta.data$Phase) %>%
  kable(caption = "Number of Cells in each Cell Cycle Stage",
        col.names = c("Stage", "Count"),
        align = "c") %>%
  kable_styling()

Number of Cells in each Cell Cycle Stage
Stage	Count
G1	3777
G2M	1131
S	1405

Because the “set.ident” argument was set to TRUE (this is also the default behavior), the active identity of the Seurat object was changed to the phase. To return the active identity to the sample identity, use the Idents function.

table(Idents(experiment.aggregate))

## 
##    S  G2M   G1 
## 1405 1131 3777

Idents(experiment.aggregate) <- "orig.ident"
table(Idents(experiment.aggregate))

## 
## A001-C-007 A001-C-104 B001-A-301 
##       1023       1859       3431

Identify variable genes

The function FindVariableFeatures identifies the most highly variable genes (default 2000 genes) by fitting a line to the relationship of log(variance) and log(mean) using loess smoothing, uses this information to standardize the data, then calculates the variance of the standardized data. This helps avoid selecting genes that only appear variable due to their expression level.

?FindVariableFeatures

experiment.aggregate <- FindVariableFeatures(
  object = experiment.aggregate,
  selection.method = "vst")
length(VariableFeatures(experiment.aggregate))

## [1] 2000

top10 <- head(VariableFeatures(experiment.aggregate), 10)
top10

##  [1] "BEST4"      "CLCA4"      "SMOC2"      "NRG1"       "TPH1"      
##  [6] "LRMP"       "TRPM3"      "PTPRR"      "AC007493.1" "CACNA1A"

var.feat.plot <- VariableFeaturePlot(experiment.aggregate)
var.feat.plot <- LabelPoints(plot = var.feat.plot, points = top10, repel = TRUE)
var.feat.plot

How do the results change if you use selection.method = “dispersion” or selection.method = “mean.var.plot”?

FindVariableFeatures isn’t the only way to set the “variable features” of a Seurat object. Another reasonable approach is to select a set of “minimally expressed” genes.

min.value <- 2
min.cells <- 10

num.cells <- Matrix::rowSums(GetAssayData(experiment.aggregate, slot = "count") > min.value)
genes.use <- names(num.cells[which(num.cells >= min.cells)])
length(genes.use)

## [1] 7012

VariableFeatures(experiment.aggregate) <- genes.use

Scale the data

The ScaleData function scales and centers genes in the dataset. If variables are provided with the “vars.to.regress” argument, they are individually regressed against each gene, and the resulting residuals are then scaled and centered unless otherwise specified. We regress out cell cycle results S.Score and G2M.Score, mitochondrial RNA level (percent_MT), and the number of features (nFeature_RNA) as a proxy for sequencing depth.

experiment.aggregate <- ScaleData(experiment.aggregate,
                                  vars.to.regress = c("S.Score", "G2M.Score", "percent_MT", "nFeature_RNA"))

Prepare for the next section

Save object

saveRDS(experiment.aggregate, file = "scRNA_workshop-03.rds")

Download Rmd

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2023-December-Single-Cell-RNA-Seq-Analysis/main/data_analysis/04-dimensionality_reduction.Rmd", "04-dimensionality_reduction.Rmd")

Session Information

sessionInfo()

## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] kableExtra_1.3.4   SeuratObject_4.1.3 Seurat_4.3.0      
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.16             colorspace_2.0-3       deldir_1.0-6          
##   [4] ellipsis_0.3.2         ggridges_0.5.4         rstudioapi_0.14       
##   [7] spatstat.data_3.0-0    farver_2.1.1           leiden_0.4.3          
##  [10] listenv_0.8.0          ggrepel_0.9.2          fansi_1.0.3           
##  [13] xml2_1.3.3             codetools_0.2-18       splines_4.2.2         
##  [16] cachem_1.0.6           knitr_1.41             polyclip_1.10-4       
##  [19] jsonlite_1.8.4         ica_1.0-3              cluster_2.1.4         
##  [22] png_0.1-8              uwot_0.1.14            shiny_1.7.3           
##  [25] sctransform_0.3.5      spatstat.sparse_3.0-0  compiler_4.2.2        
##  [28] httr_1.4.4             assertthat_0.2.1       Matrix_1.5-3          
##  [31] fastmap_1.1.0          lazyeval_0.2.2         cli_3.4.1             
##  [34] later_1.3.0            htmltools_0.5.3        tools_4.2.2           
##  [37] igraph_1.3.5           gtable_0.3.1           glue_1.6.2            
##  [40] RANN_2.6.1             reshape2_1.4.4         dplyr_1.0.10          
##  [43] Rcpp_1.0.9             scattermore_0.8        jquerylib_0.1.4       
##  [46] vctrs_0.5.1            svglite_2.1.0          nlme_3.1-160          
##  [49] spatstat.explore_3.0-5 progressr_0.11.0       lmtest_0.9-40         
##  [52] spatstat.random_3.0-1  xfun_0.35              stringr_1.4.1         
##  [55] globals_0.16.2         rvest_1.0.3            mime_0.12             
##  [58] miniUI_0.1.1.1         lifecycle_1.0.3        irlba_2.3.5.1         
##  [61] goftest_1.2-3          future_1.29.0          MASS_7.3-58.1         
##  [64] zoo_1.8-11             scales_1.2.1           promises_1.2.0.1      
##  [67] spatstat.utils_3.0-1   parallel_4.2.2         RColorBrewer_1.1-3    
##  [70] yaml_2.3.6             reticulate_1.28        pbapply_1.6-0         
##  [73] gridExtra_2.3          ggplot2_3.4.0          sass_0.4.4            
##  [76] stringi_1.7.8          highr_0.9              systemfonts_1.0.4     
##  [79] rlang_1.0.6            pkgconfig_2.0.3        matrixStats_0.63.0    
##  [82] evaluate_0.18          lattice_0.20-45        tensor_1.5            
##  [85] ROCR_1.0-11            purrr_0.3.5            labeling_0.4.2        
##  [88] patchwork_1.1.2        htmlwidgets_1.5.4      cowplot_1.1.1         
##  [91] tidyselect_1.2.0       parallelly_1.32.1      RcppAnnoy_0.0.20      
##  [94] plyr_1.8.8             magrittr_2.0.3         R6_2.5.1              
##  [97] generics_0.1.3         DBI_1.1.3              withr_2.5.0           
## [100] pillar_1.8.1           fitdistrplus_1.1-8     survival_3.4-0        
## [103] abind_1.4-5            sp_1.5-1               tibble_3.1.8          
## [106] future.apply_1.10.0    KernSmooth_2.23-20     utf8_1.2.2            
## [109] spatstat.geom_3.0-3    plotly_4.10.1          rmarkdown_2.18        
## [112] grid_4.2.2             data.table_1.14.6      webshot_0.5.4         
## [115] digest_0.6.30          xtable_1.8-4           tidyr_1.2.1           
## [118] httpuv_1.6.6           munsell_0.5.0          viridisLite_0.4.1     
## [121] bslib_0.4.1

☰ Menu

Single Cell RNA-Seq Analysis