An object of class Seurat
21005 features across 10595 samples within 1 assay
Active assay: RNA (21005 features, 5986 variable features)
3 dimensional reductions calculated: pca, tsne, umap
Idents(experiment.merged)<-"finalcluster"
1. Gene Ontology (GO) Enrichment of Genes Expressed in a Cluster
Gene Ontology provides a controlled vocabulary for describing gene products. Here we use enrichment analysis to identify GO terms that are overrepresented among the gene expressed in cells in a given cluster.
cluster12<-subset(experiment.merged,idents='12')expr<-as.matrix(GetAssayData(cluster12))# Filter out genes that are 0 for every cell in this clusterbad<-which(rowSums(expr)==0)expr<-expr[-bad,]# Select genes that are expressed > 0 in at least half of cellsn.gt.0<-apply(expr,1,function(x)length(which(x>0)))expressed.genes<-rownames(expr)[which(n.gt.0/ncol(expr)>=0.5)]all.genes<-rownames(expr)# define geneList as 1 if gene is in expressed.genes, 0 otherwisegeneList<-ifelse(all.genes%in%expressed.genes,1,0)names(geneList)<-all.genes# Create topGOdata objectGOdata<-new("topGOdata",ontology="BP",# use biological process ontologyallGenes=geneList,geneSelectionFun=function(x)(x==1),annot=annFUN.org,mapping="org.Hs.eg.db",ID="symbol")# Test for enrichment using Fisher's Exact TestresultFisher<-runTest(GOdata,algorithm="elim",statistic="fisher")GenTable(GOdata,Fisher=resultFisher,topNodes=20,numChar=60)
GO.ID Term Annotated Significant Expected Fisher
1 GO:0050852 T cell receptor signaling pathway 114 4 0.24 8.2e-05
2 GO:0050861 positive regulation of B cell receptor signaling pathway 7 2 0.01 8.6e-05
3 GO:0072659 protein localization to plasma membrane 236 5 0.49 0.00011
4 GO:0050727 regulation of inflammatory response 227 4 0.47 0.00113
5 GO:0021762 substantia nigra development 32 2 0.07 0.00197
6 GO:0000304 response to singlet oxygen 1 1 0.00 0.00208
7 GO:0042351 'de novo' GDP-L-fucose biosynthetic process 1 1 0.00 0.00208
8 GO:2000473 positive regulation of hematopoietic stem cell migration 1 1 0.00 0.00208
9 GO:0051623 positive regulation of norepinephrine uptake 1 1 0.00 0.00208
10 GO:0032745 positive regulation of interleukin-21 production 1 1 0.00 0.00208
11 GO:0072749 cellular response to cytochalasin B 1 1 0.00 0.00208
12 GO:1905475 regulation of protein localization to membrane 140 3 0.29 0.00292
13 GO:0010629 negative regulation of gene expression 757 7 1.57 0.00295
14 GO:0034113 heterotypic cell-cell adhesion 45 2 0.09 0.00388
15 GO:1903615 positive regulation of protein tyrosine phosphatase activity 2 1 0.00 0.00415
16 GO:0031022 nuclear migration along microfilament 2 1 0.00 0.00415
17 GO:0021817 nucleokinesis involved in cell motility in cerebral cortex r... 2 1 0.00 0.00415
18 GO:1904155 DN2 thymocyte differentiation 2 1 0.00 0.00415
19 GO:0002728 negative regulation of natural killer cell cytokine producti... 2 1 0.00 0.00415
20 GO:0044855 plasma membrane raft distribution 2 1 0.00 0.00415
Annotated: number of genes (out of all.genes) that are annotated with that GO term
Significant: number of genes that are annotated with that GO term and meet our criteria for “expressed”
Expected: Under random chance, number of genes that would be expected to be annotated with that GO term and meeting our criteria for “expressed”
Fisher: (Raw) p-value from Fisher’s Exact Test
Quiz 1
Challenge Questions
If you have extra time:
Rerun the enrichment analysis for the molecular function (MF) ontology.
Think about how you write code to repeat the above enrichment analysis for every cluster (hint: ?base::sapply).
2. Model-based DE analysis in limma
limma is an R package for differential expression analysis of bulk RNASeq and microarray data. We apply it here to single cell data.
Limma can be used to fit any linear model to expression data and is useful for analyses that go beyond two-group comparisons. A detailed tutorial of model specification in limma is available here and in the limma User’s Guide.
# filter genes to those expressed in at least 10% of cellskeep<-rownames(expr)[which(n.gt.0/ncol(expr)>=0.1)]expr2<-expr[keep,]# Set up "design matrix" with statistical modelcluster12$proper.ident<-make.names(cluster12$orig.ident)mm<-model.matrix(~0+proper.ident+S.Score+G2M.Score+percent.mito+nFeature_RNA,data=cluster12[[]])head(mm)
scMRMA (single cell Multi-Resolution Marker-based Annotation Algorithm) classifies cells by iteratively clustering them then annotating based on a hierarchical external database.
The databases included with the current version are only for use with human and mouse, but a user-constructed hierarchichal database can be used.