Home
Introduction and Lectures
Intro to the Workshop and Core
Schedule
What is Bioinformatics/Genomics?
Experimental Design and Cost Estimation
Support
Slack
Zoom
Cheat Sheets
Software and Links
Scripts
Prerequisites
CLI
R
Data Reduction
Project setup
Generating Expression Matrix
scRNAseq Analysis
Prepare scRNAseq Analysis
scRNAseq Analysis - PART1
scRNAseq Analysis - PART2
scRNAseq Analysis - PART3
scRNAseq Analysis - PART4
scRNAseq Analysis - PART5
scRNAseq Analysis - PART6
ETC
Closing thoughts
Workshop Photos
Github page
Biocore website

Setup additinal options

Load libraries

library(Seurat)
library(ggplot2)
library(limma)
library(topGO)
library(WGCNA)

Load the Seurat object

load("clusters_seurat_object.RData")
experiment.merged

An object of class Seurat 36601 features across 4000 samples within 1 assay Active assay: RNA (36601 features, 3783 variable features) 3 dimensional reductions calculated: pca, tsne, umap

Idents(experiment.merged) <- "RNA_snn_res.0.25"

2. Gene Ontology (GO) Enrichment of Genes Expressed in a Cluster

# install org.Hs.eg.db from Bioconductor if not already installed (for mouse only)
cluster0 <- subset(experiment.merged, idents = '0')
expr <- as.matrix(GetAssayData(cluster0))
# Filter out genes that are 0 for every cell in this cluster
bad <- which(rowSums(expr) == 0)
expr <- expr[-bad,]

# Select genes that are expressed > 0 in at least 75% of cells (somewhat arbitrary definition)
n.gt.0 <- apply(expr, 1, function(x)length(which(x > 0)))
expressed.genes <- rownames(expr)[which(n.gt.0/ncol(expr) >= 0.5)]
all.genes <- rownames(expr)

# define geneList as 1 if gene is in expressed.genes, 0 otherwise
geneList <- ifelse(all.genes %in% expressed.genes, 1, 0)
names(geneList) <- all.genes

# Create topGOdata object
	GOdata <- new("topGOdata",
		ontology = "BP", # use biological process ontology
		allGenes = geneList,
		geneSelectionFun = function(x)(x == 1),
              annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "symbol")
# Test for enrichment using Fisher's Exact Test
	resultFisher <- runTest(GOdata, algorithm = "elim", statistic = "fisher")
	GenTable(GOdata, Fisher = resultFisher, topNodes = 20, numChar = 60)

GO.ID Term Annotated Significant Expected Fisher 1 GO:0006614 SRP-dependent cotranslational protein targeting to membrane 95 80 5.35 < 1e-30 2 GO:0006413 translational initiation 184 101 10.36 < 1e-30 3 GO:0000184 nuclear-transcribed mRNA catabolic process, nonsense-mediate... 119 81 6.70 < 1e-30 4 GO:0019083 viral transcription 173 81 9.74 < 1e-30 5 GO:0002181 cytoplasmic translation 92 49 5.18 1.0e-28 6 GO:0035722 interleukin-12-mediated signaling pathway 43 20 2.42 2.2e-14 7 GO:0060337 type I interferon signaling pathway 74 24 4.17 8.3e-13 8 GO:0006123 mitochondrial electron transport, cytochrome c to oxygen 15 10 0.84 6.9e-10 9 GO:0016032 viral process 744 167 41.88 2.4e-09 10 GO:0042776 mitochondrial ATP synthesis coupled proton transport 21 11 1.18 3.5e-09 11 GO:0000027 ribosomal large subunit assembly 27 12 1.52 7.3e-09 12 GO:0000381 regulation of alternative mRNA splicing, via spliceosome 53 16 2.98 1.8e-08 13 GO:0043066 negative regulation of apoptotic process 621 78 34.96 6.0e-08 14 GO:0043312 neutrophil degranulation 421 52 23.70 6.4e-08 15 GO:0002479 antigen processing and presentation of exogenous peptide ant... 71 17 4.00 2.8e-07 16 GO:0006364 rRNA processing 206 36 11.60 3.9e-07 17 GO:0000398 mRNA splicing, via spliceosome 312 54 17.56 4.6e-07 18 GO:0002480 antigen processing and presentation of exogenous peptide ant... 8 6 0.45 7.9e-07 19 GO:0001732 formation of cytoplasmic translation initiation complex 16 8 0.90 8.3e-07 20 GO:0000028 ribosomal small subunit assembly 16 8 0.90 8.3e-07

Annotated: number of genes (out of all.genes) that are annotated with that GO term
Significant: number of genes that are annotated with that GO term and meet our criteria for “expressed”
Expected: Under random chance, number of genes that would be expected to be annotated with that GO term and meeting our criteria for “expressed”
Fisher: (Raw) p-value from Fisher’s Exact Test

#3. Weighted Gene Co-Expression Network Analysis (WGCNA) WGCNA identifies groups of genes (“modules”) with correlated expression. WARNING: TAKES A LONG TIME TO RUN

options(stringsAsFactors = F)
datExpr <- t(as.matrix(GetAssayData(experiment.merged)))[,VariableFeatures(experiment.merged)]  # only use variable genes in analysis

net <- blockwiseModules(datExpr, power = 10,
  corType = "bicor", # use robust correlation
	networkType = "signed", minModuleSize = 10,
	reassignThreshold = 0, mergeCutHeight = 0.15,
	numericLabels = F, pamRespectsDendro = FALSE,
	saveTOMs = TRUE,
	saveTOMFileBase = "TOM",
	verbose = 3)

Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file TOM-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 599 genes from module 1 because their KME is too low. ..removing 65 genes from module 2 because their KME is too low. ..removing 48 genes from module 4 because their KME is too low. ..removing 7 genes from module 6 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.15 Calculating new MEs...

table(net$colors)

blue brown grey turquoise yellow 71 46 3463 192 11

# Convert labels to colors for plotting
mergedColors = net$colors
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],
"Module colors",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

Genes in grey module are unclustered.

What genes are in the “blue” module?

colnames(datExpr)[net$colors == "blue"]

[1] "ISG15" "EFHD2" "SH3BGRL3" "FGR" "IFI6" "GNG5" "S100A10" "S100A11" "S100A6" "S100A4" "TPM3" "FCER1G" "ARPC5" "H3F3A" "ID2" "PLEK" "TMSB10" "VAMP8" "DBI" "RHOA" "PLAC8" "ATP6V0E1" "PRELID1" "CLIC1" "PSMB9" "ACTB" "RAC1" "ARPC1B" "ATP5MF" "GNB2" "LY6E" "SEC61B" "MAP3K8" "SRGN" "IFITM2" "IFITM3" "POLR2L" "CTSD" "RHOG" "UBE2L6" [41] "COX8A" "CFL1" "GSTP1" "CARD16" "GAPDH" "TPI1" "BLOC1S1" "CD63" "MYL6" "ARPC3" "PSME2" "SERF2" "ANXA2" "ELOB" "MT2A" "PSMB10" "CYBA" "PFN1" "GABARAP" "ACTG1" "UQCR11" "OAZ1" "WDR83OS" "BST2" "COX6B1" "TYROBP" "EMP3" "ATP5F1E" "PSMA7" "ITGB2" "LGALS1"

Each cluster is represented by a summary “eigengene”. Plot eigengenes for each non-grey module by clusters from Seurat:

f <- function(module){
  eigengene <- unlist(net$MEs[paste0("ME", module)])
  means <- tapply(eigengene, Idents(experiment.merged), mean, na.rm = T)
  return(means)
}
unique(net$colors)

[1] "grey" "blue" "turquoise" "brown" "yellow"

modules <- c("blue", "brown", "turquoise", "yellow")
plotdat <- sapply(modules, f)
matplot(plotdat, col = modules, type = "l", lwd = 2, xaxt = "n", xlab = "Seurat Cluster",
        ylab = "WGCNA Module Eigengene")
axis(1, at = 1:19, labels = 0:18, cex.axis = 0.8)
matpoints(plotdat, col = modules, pch = 21)

Can also plot the module onto the tsne plot

blue.eigengene <- unlist(net$MEs[paste0("ME", "blue")])
names(blue.eigengene) <- rownames(datExpr)
experiment.merged$blue.eigengene <- blue.eigengene
FeaturePlot(experiment.merged, features = "blue.eigengene", cols = c("grey", "blue"))

Session Information

sessionInfo()

R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] org.Hs.eg.db_3.11.4 WGCNA_1.70-3 fastcluster_1.1.25 dynamicTreeCut_1.63-1 topGO_2.40.0 SparseM_1.81 GO.db_3.11.4 AnnotationDbi_1.50.3 IRanges_2.22.2 S4Vectors_0.26.1 Biobase_2.48.0 graph_1.66.0 BiocGenerics_0.34.0 limma_3.44.3 ggplot2_3.3.3 SeuratObject_4.0.0 Seurat_4.0.1 loaded via a namespace (and not attached): [1] backports_1.2.1 Hmisc_4.5-0 plyr_1.8.6 igraph_1.2.6 lazyeval_0.2.2 splines_4.0.3 listenv_0.8.0 scattermore_0.7 digest_0.6.27 foreach_1.5.1 htmltools_0.5.1.1 fansi_0.4.2 checkmate_2.0.0 magrittr_2.0.1 memoise_2.0.0 tensor_1.5 cluster_2.1.1 doParallel_1.0.16 ROCR_1.0-11 globals_0.14.0 [21] matrixStats_0.58.0 spatstat.sparse_2.0-0 jpeg_0.1-8.1 colorspace_2.0-0 blob_1.2.1 ggrepel_0.9.1 xfun_0.22 dplyr_1.0.5 crayon_1.4.1 jsonlite_1.7.2 spatstat.data_2.1-0 impute_1.62.0 survival_3.2-10 zoo_1.8-9 iterators_1.0.13 glue_1.4.2 polyclip_1.10-0 gtable_0.3.0 leiden_0.3.7 future.apply_1.7.0 [41] abind_1.4-5 scales_1.1.1 DBI_1.1.1 miniUI_0.1.1.1 Rcpp_1.0.6 htmlTable_2.1.0 viridisLite_0.3.0 xtable_1.8-4 reticulate_1.18 spatstat.core_2.0-0 foreign_0.8-81 bit_4.0.4 preprocessCore_1.50.0 Formula_1.2-4 htmlwidgets_1.5.3 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1 ica_1.0-2 farver_2.1.0 [61] pkgconfig_2.0.3 nnet_7.3-15 sass_0.3.1 uwot_0.1.10 deldir_0.2-10 utf8_1.2.1 labeling_0.4.2 tidyselect_1.1.0 rlang_0.4.10 reshape2_1.4.4 later_1.1.0.1 munsell_0.5.0 tools_4.0.3 cachem_1.0.4 generics_0.1.0 RSQLite_2.2.4 ggridges_0.5.3 evaluate_0.14 stringr_1.4.0 fastmap_1.1.0 [81] yaml_2.2.1 goftest_1.2-2 knitr_1.31 bit64_4.0.5 fitdistrplus_1.1-3 purrr_0.3.4 RANN_2.6.1 pbapply_1.4-3 future_1.21.0 nlme_3.1-152 mime_0.10 rstudioapi_0.13 compiler_4.0.3 plotly_4.9.3 png_0.1-7 spatstat.utils_2.1-0 tibble_3.1.0 bslib_0.2.4 stringi_1.5.3 highr_0.8 [101] lattice_0.20-41 Matrix_1.3-2 vctrs_0.3.6 pillar_1.5.1 lifecycle_1.0.0 spatstat.geom_2.0-1 lmtest_0.9-38 jquerylib_0.1.3 RcppAnnoy_0.0.18 data.table_1.14.0 cowplot_1.1.1 irlba_2.3.3 httpuv_1.5.5 patchwork_1.1.1 latticeExtra_0.6-29 R6_2.5.0 promises_1.2.0.1 KernSmooth_2.23-18 gridExtra_2.3 parallelly_1.24.0 [121] codetools_0.2-18 MASS_7.3-53.1 assertthat_0.2.1 withr_2.4.1 sctransform_0.3.2 mgcv_1.8-34 grid_4.0.3 rpart_4.1-15 tidyr_1.1.3 rmarkdown_2.7 Rtsne_0.15 base64enc_0.1-3 shiny_1.6.0

☰ Menu

Single Cell RNA-Seq Analysis

Setup additinal options

Load libraries

Load the Seurat object

2. Gene Ontology (GO) Enrichment of Genes Expressed in a Cluster

Session Information