☰ Menu

      Single Cell RNA-Seq Analysis

Home
Introduction and Lectures
Intro to the Workshop and Core
Support
Schedule
Slack
Cheat Sheets
Software and Links
Scripts
GitHub repository
Biocore website
Prerequisites
CLI
R
Data Reduction
Files and Filetypes
Project setup
Generating Expression Matrix
scRNAseq Analysis
Prepare scRNAseq Analysis
Part 1- Create object
Part 2- Filtering
Part 3- Normalization and scaling
Part 4- Dimensionality reduction
Part 5- Clustering and cell type
Part 6- Enrichment and DE
Part 7- Doublet detection
Part 8- Integration

Introduction to Single Cell RNA-Seq Part 6: Enrichment and model-based differential expression

Set up workspace

library(Seurat)
library(limma)
library(topGO)
library(dplyr)
library(kableExtra)
set.seed(12345)
experiment.aggregate <- readRDS("scRNA_workshop-05.rds")
Idents(experiment.aggregate) <- "finalcluster"

1. Gene Ontology (GO) Enrichment of Genes Expressed in a Cluster

Gene Ontology provides a controlled vocabulary for describing gene products. Here we use enrichment analysis to identify GO terms that are over-represented among the gene expressed in cells in a given cluster.

cluster10 <- subset(experiment.aggregate, idents = '10')
expr <- as.matrix(GetAssayData(cluster10))

# Select genes that are expressed > 0 in at least half of cells
n.gt.0 <- apply(expr, 1, function(x)length(which(x > 0)))
expressed.genes <- rownames(expr)[which(n.gt.0/ncol(expr) >= 0.5)]
all.genes <- rownames(expr)

# define geneList as 1 if gene is in expressed.genes, 0 otherwise
geneList <- ifelse(all.genes %in% expressed.genes, 1, 0)
names(geneList) <- all.genes

# Create topGOdata object
	GOdata <- new("topGOdata",
		ontology = "BP", # use biological process ontology
		allGenes = geneList,
		geneSelectionFun = function(x)(x == 1),
              annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "symbol")
# Test for enrichment using Fisher's Exact Test
	resultFisher <- runTest(GOdata, algorithm = "elim", statistic = "fisher")
	GenTable(GOdata, Fisher = resultFisher, topNodes = 20, numChar = 60) %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
GO.ID Term Annotated Significant Expected Fisher
GO:0006338 chromatin remodeling 464 39 20.46 7.3e-05
GO:0000381 regulation of alternative mRNA splicing, via spliceosome 39 8 1.72 0.00025
GO:0051660 establishment of centrosome localization 9 4 0.40 0.00039
GO:0006895 Golgi to endosome transport 17 5 0.75 0.00065
GO:0042060 wound healing 252 23 11.11 0.00074
GO:0045944 positive regulation of transcription by RNA polymerase II 810 55 35.71 0.00076
GO:0045053 protein retention in Golgi apparatus 5 3 0.22 0.00080
GO:0032534 regulation of microvillus assembly 5 3 0.22 0.00080
GO:0007043 cell-cell junction assembly 88 15 3.88 0.00113
GO:0070830 bicellular tight junction assembly 39 7 1.72 0.00139
GO:0035176 social behavior 20 5 0.88 0.00146
GO:2000650 negative regulation of sodium ion transmembrane transporter ... 6 3 0.26 0.00154
GO:0048669 collateral sprouting in absence of injury 2 2 0.09 0.00194
GO:0031175 neuron projection development 570 42 25.13 0.00195
GO:0002064 epithelial cell development 134 14 5.91 0.00230
GO:0007015 actin filament organization 317 28 13.98 0.00235
GO:0030155 regulation of cell adhesion 482 35 21.25 0.00242
GO:0042752 regulation of circadian rhythm 80 10 3.53 0.00259
GO:0030644 intracellular chloride ion homeostasis 7 3 0.31 0.00261
GO:0030050 vesicle transport along actin filament 7 3 0.31 0.00261

2. Model-based DE analysis in limma

limma is an R package for differential expression analysis of bulk RNASeq and microarray data. We apply it here to single cell data.

Limma can be used to fit any linear model to expression data and is useful for analyses that go beyond two-group comparisons. A detailed tutorial of model specification in limma is available here and in the limma User’s Guide.

# filter genes to those expressed in at least 10% of cells
keep <- rownames(expr)[which(n.gt.0/ncol(expr) >= 0.1)]
expr2 <- expr[keep,]

# Set up "design matrix" with statistical model
cluster10$proper.group <- make.names(cluster10$group)
mm <- model.matrix(~0 + proper.group + S.Score + G2M.Score + percent_MT + nFeature_RNA, data = cluster10[[]])
head(mm) %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
proper.groupColorectal.Cancer proper.groupNormal proper.groupPolyp S.Score G2M.Score percent_MT nFeature_RNA
AAACCCAAGTTATGGA_A001-C-007 1 0 0 0.0554074 -0.1461764 0.5717008 1513
AAGCCATCAAGACCTT_A001-C-007 1 0 0 0.0660183 -0.0729290 0.4841313 1265
AAGTTCGGTACCTATG_A001-C-007 1 0 0 0.0202042 -0.0451182 0.5148005 1148
AATGAAGTCAGCGTCG_A001-C-007 1 0 0 0.1225441 0.0612193 1.0398614 1272
ACAGAAATCCAGCTCT_A001-C-007 1 0 0 -0.0769269 -0.0707690 0.6334125 1637
ACGGTTACAAATCGGG_A001-C-007 1 0 0 -0.0550882 0.1286419 0.7523716 1934
tail(mm) %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
proper.groupColorectal.Cancer proper.groupNormal proper.groupPolyp S.Score G2M.Score percent_MT nFeature_RNA
TTCCACGAGTATTGCC_A001-C-104 0 0 1 0.4520556 0.0266469 0.3015580 4562
TTCTAGTAGATAACAC_A001-C-104 0 0 1 0.0218701 -0.0793631 1.3125000 1140
TTGCCTGTCCGCGATG_A001-C-104 0 0 1 -0.0612575 -0.0292726 0.3594352 2241
TTGGGATTCGTTCCCA_A001-C-104 0 0 1 -0.0709637 -0.0806137 0.4050223 1562
TTTCAGTTCGCACTCT_A001-C-104 0 0 1 -0.1047877 -0.0540112 1.8099548 1309
GACTCAACACACACGC_B001-A-301 0 1 0 0.0481511 -0.0900060 0.1574803 990
# Fit model in limma
fit <- lmFit(expr2, mm)
head(coef(fit)) %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
proper.groupColorectal.Cancer proper.groupNormal proper.groupPolyp S.Score G2M.Score percent_MT nFeature_RNA
ENSG00000291215 -0.2357714 -0.2248263 -0.0954233 0.5209505 -0.3817891 0.0510813 0.0001589
C1orf159 -0.0141535 -0.1325994 -0.0259810 -0.1442860 0.2517446 0.1333100 0.0001426
SDF4 0.3616255 0.1839501 0.4912284 0.2506248 1.0914474 -0.1494113 -0.0000750
CCNL2 0.4402786 1.6154478 0.0118319 0.4012170 -1.8906665 0.2768585 0.0003573
MIB2 0.0480968 -0.0562502 0.2279206 -0.1502399 -0.4402191 -0.1676190 0.0000508
CDK11B 0.6889952 -0.1659776 -0.0648067 -0.7018856 -0.2333263 0.1621756 0.0001548
# Test 'Normal' - 'Colorectal.Cancer'
contr <- makeContrasts(proper.groupNormal - proper.groupColorectal.Cancer, levels = colnames(coef(fit)))
contr %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
proper.groupNormal - proper.groupColorectal.Cancer
proper.groupColorectal.Cancer -1
proper.groupNormal 1
proper.groupPolyp 0
S.Score 0
G2M.Score 0
percent_MT 0
nFeature_RNA 0
fit2 <- contrasts.fit(fit, contrasts = contr)
fit2 <- eBayes(fit2)
out <- topTable(fit2, n = Inf, sort.by = "P")
head(out, 30) %>%
	  kable() %>%
	  kable_styling("striped", fixed_thead = TRUE)
logFC AveExpr t P.Value adj.P.Val B
UTP6 2.809558 0.1665493 5.729581 0.0000000 0.0002398 7.9550347
S100PBP 2.764765 0.2078133 4.898211 0.0000024 0.0057367 4.4838247
FDPS 2.194920 0.1608721 4.481281 0.0000142 0.0127450 2.8889818
CRKL 2.062359 0.1502419 4.431309 0.0000175 0.0127450 2.7049563
NMRAL1 2.284191 0.1670087 4.430493 0.0000175 0.0127450 2.7019633
SOAT1 2.026978 0.1554562 4.415095 0.0000187 0.0127450 2.6455817
CMSS1 2.179765 0.1843519 4.398089 0.0000200 0.0127450 2.5834884
TXNL4A 2.044588 0.1549221 4.379329 0.0000216 0.0127450 2.5152036
PROX1 2.223377 0.1608351 4.344625 0.0000249 0.0127450 2.3894760
ADAMTSL1 3.378881 0.4051661 4.315871 0.0000280 0.0127450 2.2858848
LNX2 2.802389 0.3089559 4.306615 0.0000290 0.0127450 2.2526522
SEC23B 2.024936 0.1510704 4.256737 0.0000355 0.0142802 2.0745228
TCP11L1 2.194426 0.1880198 4.112117 0.0000629 0.0218800 1.5672384
ZNF3 2.241657 0.1841910 4.097439 0.0000666 0.0218800 1.5165269
RBL1 2.107244 0.1733018 4.092354 0.0000680 0.0218800 1.4989902
ENSG00000287292 2.649530 0.2658202 4.044865 0.0000817 0.0231668 1.3360639
WDTC1 2.184789 0.1787432 4.032124 0.0000858 0.0231668 1.2926107
MED17 2.135271 0.1824321 4.009965 0.0000935 0.0231668 1.2172980
RXRA 2.060384 0.1753848 4.005170 0.0000952 0.0231668 1.2010447
CDK18 2.168277 0.1831622 4.003186 0.0000959 0.0231668 1.1943236
ADPGK 2.058125 0.1664844 3.989304 0.0001012 0.0232690 1.1473755
ZNF480 2.236793 0.2006418 3.965854 0.0001107 0.0242922 1.0683681
PPIL4 2.226664 0.2090412 3.892616 0.0001460 0.0306594 0.8240318
IRF3 2.687759 0.3212877 3.873166 0.0001571 0.0316070 0.7597636
SARS1 2.368320 0.2254349 3.835478 0.0001808 0.0346093 0.6359795
TMEM51-AS1 1.994662 0.1726518 3.808954 0.0001995 0.0346093 0.5494559
PRAG1 2.223802 0.2200442 3.793651 0.0002111 0.0346093 0.4997596
RNF13 2.212740 0.2147929 3.778006 0.0002237 0.0346093 0.4491215
KALRN 2.052896 0.1794657 3.775951 0.0002254 0.0346093 0.4424844
ST6GAL1 2.169592 0.1866848 3.770198 0.0002302 0.0346093 0.4239141

Output columns:

Save files

write.csv(GenTable(GOdata, Fisher = resultFisher), file = "cluster10_GOdata.csv")
write.csv(out, file = "cluster10_Normal-Colorectal.Cancer_topTable.csv")

A note on pseudobulk DE

Pseudobulk differential expression uses count data summed across all cells in each sample (typically within each cell type or cluster). Unlike cell-level DE, pseudobulk DE requires biological replicates so we won’t perform it on this dataset.

Once counts are summed, pseudobulk data are analyzed like bulk RNASeq data.

Pseudobulk DE may result in better false discovery rate control than cell-level DE, as shown here.

The Seurat function AggregateExpression() can be used to sum counts as described here.

A tutorial on using limma for bulk RNASeq is available here.

Prepare for the next section

Download Rmd document

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2025-July-Single-Cell-RNA-Seq-Analysis/main/data_analysis/07-doublet_detection.Rmd", "07-doublet_detection.Rmd")

Session Information

sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.7.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] org.Hs.eg.db_3.19.1  kableExtra_1.4.0     dplyr_1.1.4         
##  [4] topGO_2.56.0         SparseM_1.82         GO.db_3.19.1        
##  [7] AnnotationDbi_1.66.0 IRanges_2.38.0       S4Vectors_0.44.0    
## [10] Biobase_2.64.0       graph_1.82.0         BiocGenerics_0.50.0 
## [13] limma_3.60.2         Seurat_5.2.1         SeuratObject_5.0.2  
## [16] sp_2.1-4            
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.16.0       jsonlite_1.8.8         
##   [4] magrittr_2.0.3          spatstat.utils_3.1-2    rmarkdown_2.27         
##   [7] zlibbioc_1.50.0         vctrs_0.6.5             ROCR_1.0-11            
##  [10] memoise_2.0.1           spatstat.explore_3.2-7  htmltools_0.5.8.1      
##  [13] sass_0.4.9              sctransform_0.4.1       parallelly_1.37.1      
##  [16] KernSmooth_2.23-26      bslib_0.7.0             htmlwidgets_1.6.4      
##  [19] ica_1.0-3               plyr_1.8.9              plotly_4.10.4          
##  [22] zoo_1.8-12              cachem_1.1.0            igraph_2.0.3           
##  [25] mime_0.12               lifecycle_1.0.4         pkgconfig_2.0.3        
##  [28] Matrix_1.7-2            R6_2.5.1                fastmap_1.2.0          
##  [31] GenomeInfoDbData_1.2.12 fitdistrplus_1.1-11     future_1.33.2          
##  [34] shiny_1.8.1.1           digest_0.6.35           colorspace_2.1-0       
##  [37] patchwork_1.2.0         tensor_1.5              RSpectra_0.16-1        
##  [40] irlba_2.3.5.1           RSQLite_2.3.7           progressr_0.14.0       
##  [43] fansi_1.0.6             spatstat.sparse_3.0-3   httr_1.4.7             
##  [46] polyclip_1.10-6         abind_1.4-5             compiler_4.4.3         
##  [49] bit64_4.0.5             DBI_1.2.3               fastDummies_1.7.3      
##  [52] highr_0.11              MASS_7.3-64             tools_4.4.3            
##  [55] lmtest_0.9-40           httpuv_1.6.15           future.apply_1.11.2    
##  [58] goftest_1.2-3           glue_1.7.0              nlme_3.1-167           
##  [61] promises_1.3.0          grid_4.4.3              Rtsne_0.17             
##  [64] cluster_2.1.8           reshape2_1.4.4          generics_0.1.3         
##  [67] gtable_0.3.5            spatstat.data_3.0-4     tidyr_1.3.1            
##  [70] data.table_1.15.4       xml2_1.3.6              XVector_0.44.0         
##  [73] utf8_1.2.4              spatstat.geom_3.2-9     RcppAnnoy_0.0.22       
##  [76] ggrepel_0.9.5           RANN_2.6.1              pillar_1.9.0           
##  [79] stringr_1.5.1           spam_2.10-0             RcppHNSW_0.6.0         
##  [82] later_1.3.2             splines_4.4.3           lattice_0.22-6         
##  [85] bit_4.0.5               survival_3.8-3          deldir_2.0-4           
##  [88] tidyselect_1.2.1        Biostrings_2.72.0       miniUI_0.1.1.1         
##  [91] pbapply_1.7-2           knitr_1.47              gridExtra_2.3          
##  [94] svglite_2.1.3           scattermore_1.2         xfun_0.44              
##  [97] statmod_1.5.0           matrixStats_1.3.0       UCSC.utils_1.0.0       
## [100] stringi_1.8.4           lazyeval_0.2.2          yaml_2.3.8             
## [103] evaluate_0.23           codetools_0.2-20        tibble_3.2.1           
## [106] cli_3.6.2               uwot_0.2.2              systemfonts_1.1.0      
## [109] xtable_1.8-4            reticulate_1.39.0       munsell_0.5.1          
## [112] jquerylib_0.1.4         GenomeInfoDb_1.40.1     Rcpp_1.0.12            
## [115] globals_0.16.3          spatstat.random_3.2-3   png_0.1-8              
## [118] parallel_4.4.3          blob_1.2.4              ggplot2_3.5.1          
## [121] dotCall64_1.1-1         listenv_0.9.1           viridisLite_0.4.2      
## [124] scales_1.3.0            ggridges_0.5.6          crayon_1.5.2           
## [127] purrr_1.0.2             rlang_1.1.3             KEGGREST_1.44.0        
## [130] cowplot_1.1.3