UCD Bioinformatics Core Workshop

Home
Introduction and Lectures
Intro to the Workshop and Core
Schedule
What is Bioinformatics/Genomics?
Experimental Design and Cost Estimation
RNA Sequencing Technologies - Dr. Lutz Froenicke
Support
Zoom
Slack
Cheat Sheets
Software and Links
Scripts
Prerequisites
CLI
R
Data Reduction
Files and Filetypes
Prepare dataset
Preprocessing raw data
Indexing a Genome
Alignment with Star
Generating counts tables
Alignment/Counts with Salmon (Extra)
3' TagSeq
Prokaryotic RNASeq
Data analysis
Annotation from BioMart
Prepare R for data analysis
Differential Expression Analysis
Pathway Analysis
WGCNA
Comparison between STAR and Salmon
ETC
Closing thoughts
Workshop Photos
Github page
Report Errors
Biocore website

Load libraries

library(topGO)
library(org.Mm.eg.db)
library(clusterProfiler)
library(pathview)
library(enrichplot)
library(ggplot2)
library(dplyr)

Files for examples were created in the DE analysis.

Gene Ontology (GO) Enrichment

Gene ontology provides a controlled vocabulary for describing biological processes (BP ontology), molecular functions (MF ontology) and cellular components (CC ontology)

The GO ontologies themselves are organism-independent; terms are associated with genes for a specific organism through direct experimentation or through sequence homology with another organism and its GO annotation.

Terms are related to other terms through parent-child relationships in a directed acylic graph.

Enrichment analysis provides one way of drawing conclusions about a set of differential expression results.

1. topGO Example Using Kolmogorov-Smirnov Testing Our first example uses Kolmogorov-Smirnov Testing for enrichment testing of our mouse DE results, with GO annotation obtained from the Bioconductor database org.Mm.eg.db.

The first step in each topGO analysis is to create a topGOdata object. This contains the genes, the score for each gene (here we use the p-value from the DE test), the GO terms associated with each gene, and the ontology to be used (here we use the biological process ontology)

infile <- "WT.C_v_WT.NC.txt"
DE <- read.delim(infile)

## Add entrezgene IDs to top table
tmp <- bitr(DE$Gene.stable.ID, fromType = "ENSEMBL", toType = "ENTREZID", OrgDb = org.Mm.eg.db)

## 'select()' returned 1:many mapping between keys and columns

## Warning in bitr(DE$Gene.stable.ID, fromType = "ENSEMBL", toType = "ENTREZID", :
## 20.42% of input gene IDs are fail to map...

id.conv <- subset(tmp, !duplicated(tmp$ENSEMBL))
DE <- left_join(DE, id.conv, by = c("Gene.stable.ID" = "ENSEMBL"))

# Make gene list
DE.nodupENTREZ <- subset(DE, !is.na(ENTREZID) & !duplicated(ENTREZID))
geneList <- DE.nodupENTREZ$P.Value
names(geneList) <- DE.nodupENTREZ$ENTREZID
head(geneList)

##        67241        18795       104718        12521        94212        12772 
## 4.109222e-16 7.877908e-16 1.094372e-15 2.224431e-15 2.256262e-15 2.467758e-15

# Create topGOData object
GOdata <- new("topGOdata",
	ontology = "BP",
	allGenes = geneList,
	geneSelectionFun = function(x)x,
	annot = annFUN.org , mapping = "org.Mm.eg.db")

## 
## Building most specific GOs .....

## 	( 11080 GO terms found. )

## 
## Build GO DAG topology ..........

## 	( 14033 GO terms and 30925 relations. )

## 
## Annotating nodes ...............

## 	( 12273 genes annotated to the GO terms. )

2. The topGOdata object is then used as input for enrichment testing:

# Kolmogorov-Smirnov testing
resultKS <- runTest(GOdata, algorithm = "weight01", statistic = "ks")

## 
## 			 -- Weight01 Algorithm -- 
## 
## 		 the algorithm is scoring 14033 nontrivial nodes
## 		 parameters: 
## 			 test statistic: ks
## 			 score order: increasing

## 
## 	 Level 19:	2 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 18:	27 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 17:	57 nodes to be scored	(46 eliminated genes)

## 
## 	 Level 16:	100 nodes to be scored	(115 eliminated genes)

## 
## 	 Level 15:	141 nodes to be scored	(239 eliminated genes)

## 
## 	 Level 14:	255 nodes to be scored	(587 eliminated genes)

## 
## 	 Level 13:	580 nodes to be scored	(1036 eliminated genes)

## 
## 	 Level 12:	1028 nodes to be scored	(1681 eliminated genes)

## 
## 	 Level 11:	1568 nodes to be scored	(2913 eliminated genes)

## 
## 	 Level 10:	1841 nodes to be scored	(5037 eliminated genes)

## 
## 	 Level 9:	2063 nodes to be scored	(6865 eliminated genes)

## 
## 	 Level 8:	1936 nodes to be scored	(8367 eliminated genes)

## 
## 	 Level 7:	1713 nodes to be scored	(9542 eliminated genes)

## 
## 	 Level 6:	1367 nodes to be scored	(10370 eliminated genes)

## 
## 	 Level 5:	853 nodes to be scored	(10841 eliminated genes)

## 
## 	 Level 4:	383 nodes to be scored	(11167 eliminated genes)

## 
## 	 Level 3:	101 nodes to be scored	(11323 eliminated genes)

## 
## 	 Level 2:	17 nodes to be scored	(11415 eliminated genes)

## 
## 	 Level 1:	1 nodes to be scored	(11449 eliminated genes)

tab <- GenTable(GOdata, raw.p.value = resultKS, topNodes = length(resultKS@score), numChar = 120)

topGO by default preferentially tests more specific terms, utilizing the topology of the GO graph. The algorithms used are described in detail here.

head(tab, 15)

##         GO.ID
## 1  GO:0140242
## 2  GO:0140236
## 3  GO:0002181
## 4  GO:0006002
## 5  GO:0002218
## 6  GO:0045071
## 7  GO:0032760
## 8  GO:0035458
## 9  GO:0050680
## 10 GO:0036151
## 11 GO:0072672
## 12 GO:1901224
## 13 GO:0001525
## 14 GO:0045944
## 15 GO:0042776
##                                                                  Term Annotated
## 1                                          translation at postsynapse        48
## 2                                           translation at presynapse        47
## 3                                             cytoplasmic translation       193
## 4                              fructose 6-phosphate metabolic process        12
## 5                                activation of innate immune response       336
## 6                     negative regulation of viral genome replication        44
## 7             positive regulation of tumor necrosis factor production       102
## 8                                cellular response to interferon-beta        59
## 9                negative regulation of epithelial cell proliferation       110
## 10                          phosphatidylcholine acyl-chain remodeling         9
## 11                                           neutrophil extravasation        15
## 12 positive regulation of non-canonical NF-kappaB signal transduction        52
## 13                                                       angiogenesis       401
## 14          positive regulation of transcription by RNA polymerase II       928
## 15             proton motive force-driven mitochondrial ATP synthesis        64
##    Significant Expected raw.p.value
## 1           48       48     1.3e-08
## 2           47       47     2.5e-08
## 3          193      193     3.3e-07
## 4           12       12     1.8e-06
## 5          336      336     1.9e-06
## 6           44       44     2.5e-06
## 7          102      102     4.3e-06
## 8           59       59     1.1e-05
## 9          110      110     2.2e-05
## 10           9        9     3.0e-05
## 11          15       15     4.3e-05
## 12          52       52     5.2e-05
## 13         401      401     5.5e-05
## 14         928      928     7.5e-05
## 15          64       64     8.5e-05

Annotated: number of genes (in our gene list) that are annotated with the term
Significant: n/a for this example, same as Annotated here
Expected: n/a for this example, same as Annotated here
raw.p.value: P-value from Kolomogorov-Smirnov test that DE p-values annotated with the term are smaller (i.e. more significant) than those not annotated with the term.

The Kolmogorov-Smirnov test directly compares two probability distributions based on their maximum distance.

To illustrate the KS test, we plot probability distributions of p-values that are and that are not annotated with the term GO:0010556 “regulation of macromolecule biosynthetic process” (2344 genes) p-value 1.00. (This won’t exactly match what topGO does due to their elimination algorithm):

rna.pp.terms <- genesInTerm(GOdata)[["GO:0010556"]] # get genes associated with term
p.values.in <- geneList[names(geneList) %in% rna.pp.terms]
p.values.out <- geneList[!(names(geneList) %in% rna.pp.terms)]
plot.ecdf(p.values.in, verticals = T, do.points = F, col = "red", lwd = 2, xlim = c(0,1),
          main = "Empirical Distribution of DE P-Values\nby Annotation with 'regulation of macromolecule biosynthetic process'",
          cex.main = 0.7, xlab = "p", ylab = "Probability(P-Value <= p)")
ecdf.out <- ecdf(p.values.out)
xx <- unique(sort(c(seq(0, 1, length = 201), knots(ecdf.out))))
lines(xx, ecdf.out(xx), col = "black", lwd = 2)
legend("bottomright", legend = c("Genes Annotated with 'regulation of macromolecule biosynthetic process'", "Genes not annotated with 'regulation of macromolecule biosynthetic process'"), lwd = 2, col = 2:1, cex = 0.7)

versus the probability distributions of p-values that are and that are not annotated with the term GO:0045071 “negative regulation of viral genome replication” (54 genes) p-value 3.3x10-9.

rna.pp.terms <- genesInTerm(GOdata)[["GO:0045071"]] # get genes associated with term
p.values.in <- geneList[names(geneList) %in% rna.pp.terms]
p.values.out <- geneList[!(names(geneList) %in% rna.pp.terms)]
plot.ecdf(p.values.in, verticals = T, do.points = F, col = "red", lwd = 2, xlim = c(0,1),
          main = "Empirical Distribution of DE P-Values\nby Annotation with 'negative regulation of viral genome replication'",
          cex.main = 0.7, xlab = "p", ylab = "Probability(P-Value <= p)")
ecdf.out <- ecdf(p.values.out)
xx <- unique(sort(c(seq(0, 1, length = 201), knots(ecdf.out))))
lines(xx, ecdf.out(xx), col = "black", lwd = 2)
legend("bottomright", legend = c("Genes Annotated with 'negative regulation of viral genome replication'", "Genes Not Annotated with 'negative regulation of viral genome replication'"), lwd = 2, col = 2:1, cex = 0.7)

We can use the function showSigOfNodes to plot the GO graph for the 2 most significant terms and their parents, color coded by enrichment p-value (red is most significant):

par(cex = 0.3)
showSigOfNodes(GOdata, score(resultKS), firstSigNodes = 2, useInfo = "def", .NO.CHAR = 40)

## Loading required package: Rgraphviz

## Loading required package: grid

## 
## Attaching package: 'grid'

## The following object is masked from 'package:topGO':
## 
##     depth

## 
## Attaching package: 'Rgraphviz'

## The following objects are masked from 'package:IRanges':
## 
##     from, to

## The following objects are masked from 'package:S4Vectors':
## 
##     from, to

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 13 
## Number of Edges = 16 
## 
## $complete.dag
## [1] "A graph with 13 nodes."

par(cex = 1)

3. topGO Example Using Fisher’s Exact Test

Next, we use Fisher’s exact test to test for GO enrichment among significantly DE genes.

Create topGOdata object:

geneList <- DE.nodupENTREZ$adj.P.Val
names(geneList) <- DE.nodupENTREZ$ENTREZID

# Create topGOData object
GOdata <- new("topGOdata",
	ontology = "BP",
	allGenes = geneList,
	geneSelectionFun = function(x) (x < 0.05),
	annot = annFUN.org , mapping = "org.Mm.eg.db")

## 
## Building most specific GOs .....

## 	( 11080 GO terms found. )

## 
## Build GO DAG topology ..........

## 	( 14033 GO terms and 30925 relations. )

## 
## Annotating nodes ...............

## 	( 12273 genes annotated to the GO terms. )

Run Fisher’s Exact Test:

resultFisher <- runTest(GOdata, algorithm = "elim", statistic = "fisher")

## 
## 			 -- Elim Algorithm -- 
## 
## 		 the algorithm is scoring 12785 nontrivial nodes
## 		 parameters: 
## 			 test statistic: fisher
## 			 cutOff: 0.01

## 
## 	 Level 19:	2 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 18:	22 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 17:	50 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 16:	90 nodes to be scored	(0 eliminated genes)

## 
## 	 Level 15:	121 nodes to be scored	(65 eliminated genes)

## 
## 	 Level 14:	213 nodes to be scored	(105 eliminated genes)

## 
## 	 Level 13:	492 nodes to be scored	(190 eliminated genes)

## 
## 	 Level 12:	913 nodes to be scored	(200 eliminated genes)

## 
## 	 Level 11:	1391 nodes to be scored	(1319 eliminated genes)

## 
## 	 Level 10:	1670 nodes to be scored	(1602 eliminated genes)

## 
## 	 Level 9:	1881 nodes to be scored	(2085 eliminated genes)

## 
## 	 Level 8:	1802 nodes to be scored	(2886 eliminated genes)

## 
## 	 Level 7:	1578 nodes to be scored	(3988 eliminated genes)

## 
## 	 Level 6:	1281 nodes to be scored	(4825 eliminated genes)

## 
## 	 Level 5:	794 nodes to be scored	(6015 eliminated genes)

## 
## 	 Level 4:	369 nodes to be scored	(6418 eliminated genes)

## 
## 	 Level 3:	98 nodes to be scored	(7074 eliminated genes)

## 
## 	 Level 2:	17 nodes to be scored	(7158 eliminated genes)

## 
## 	 Level 1:	1 nodes to be scored	(7166 eliminated genes)

tab <- GenTable(GOdata, raw.p.value = resultFisher, topNodes = length(resultFisher@score),
				numChar = 120)
head(tab)

##        GO.ID                                 Term Annotated Significant
## 1 GO:0140242           translation at postsynapse        48          44
## 2 GO:0140236            translation at presynapse        47          43
## 3 GO:0002181              cytoplasmic translation       193         144
## 4 GO:0046718        symbiont entry into host cell        72          58
## 5 GO:0001525                         angiogenesis       401         278
## 6 GO:0035458 cellular response to interferon-beta        59          48
##   Expected raw.p.value
## 1    28.35     5.2e-07
## 2    27.76     8.1e-07
## 3   113.98     3.8e-06
## 4    42.52     8.6e-05
## 5   236.82     0.00016
## 6    34.84     0.00022

Annotated: number of genes (in our gene list) that are annotated with the term
Significant: Number of significantly DE genes annotated with that term (i.e. genes where geneList = 1)
Expected: Under random chance, number of genes that would be expected to be significantly DE and annotated with that term
raw.p.value: P-value from Fisher’s Exact Test, testing for association between significance and pathway membership.

Fisher’s Exact Test is applied to the table:

Significance/Annotation	Annotated With GO Term	Not Annotated With GO Term
Significantly DE	n1	n3
Not Significantly DE	n2	n4

and compares the probability of the observed table, conditional on the row and column sums, to what would be expected under random chance.

Advantages over KS (or Wilcoxon) Tests:

Ease of interpretation
Can be applied when you just have a gene list without associated p-values, etc.

Disadvantages:

Relies on significant/non-significant dichotomy (an interesting gene could have an adjusted p-value of 0.051 and be counted as non-significant)
Less powerful
May be less useful if there are very few (or a large number of) significant genes

Quiz 1

KEGG Pathway Enrichment Testing With clusterProfiler

KEGG, the Kyoto Encyclopedia of Genes and Genomes (https://www.genome.jp/kegg/), provides assignment of genes for many organisms into pathways.

We will conduct KEGG enrichment testing using the Bioconductor package clusterProfiler. clusterProfiler implements an algorithm very similar to that used by GSEA.

Cluster profiler can do much more than KEGG enrichment, check out the clusterProfiler book.

We will base our KEGG enrichment analysis on the t statistics from differential expression, which allows for directional testing.

geneList.KEGG <- DE.nodupENTREZ$t                   
geneList.KEGG <- sort(geneList.KEGG, decreasing = TRUE)
names(geneList.KEGG) <- DE.nodupENTREZ$ENTREZID
head(geneList.KEGG)

##    67241    18795   104718    12521    94212    12772 
## 46.54613 42.60381 38.54763 37.46686 36.69806 36.23871

set.seed(99)
KEGG.results <- gseKEGG(gene = geneList.KEGG, organism = "mmu", pvalueCutoff = 1)

## Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...

## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...

## using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

## preparing geneSet collections...

## GSEA analysis...

## leading edge analysis...

## done...

KEGG.results <- setReadable(KEGG.results, OrgDb = "org.Mm.eg.db", keyType = "ENTREZID")
outdat <- as.data.frame(KEGG.results)
head(outdat)

##                ID                                     Description setSize
## mmu05168 mmu05168                Herpes simplex virus 1 infection     161
## mmu04142 mmu04142                             Lysosome biogenesis     229
## mmu04970 mmu04970                              Salivary secretion      47
## mmu04015 mmu04015                          Rap1 signaling pathway     156
## mmu05171 mmu05171                  Coronavirus disease - COVID-19     192
## mmu05167 mmu05167 Kaposi sarcoma-associated herpesvirus infection     165
##          enrichmentScore      NES       pvalue     p.adjust       qvalue rank
## mmu05168       0.4932391 1.969289 1.326900e-07 2.446917e-05 1.436011e-05 2285
## mmu04142       0.4476374 1.871709 1.443609e-07 2.446917e-05 1.436011e-05 2313
## mmu04970       0.6675048 2.204195 3.750326e-07 4.237869e-05 2.487058e-05 1115
## mmu04015       0.4736072 1.890513 7.499323e-07 5.084541e-05 2.983941e-05 2568
## mmu05171       0.4511957 1.844312 7.317951e-07 5.084541e-05 2.983941e-05 3262
## mmu05167       0.4665885 1.875059 1.783696e-06 9.457444e-05 5.550247e-05 3321
##                            leading_edge
## mmu05168 tags=37%, list=17%, signal=31%
## mmu04142 tags=31%, list=17%, signal=26%
## mmu04970  tags=34%, list=8%, signal=31%
## mmu04015 tags=36%, list=19%, signal=30%
## mmu05171 tags=48%, list=24%, signal=37%
## mmu05167 tags=42%, list=24%, signal=33%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            core_enrichment
## mmu05168                                                                                                                                                                                 Pou2f2/C3/H2-Q6/Pilrb2/Ptpn11/Bcl2/Ppp1cb/Daxx/Nxf1/Sp100/Apaf1/Pilra/Tyk2/Ifih1/Sting1/Rnasel/Pilrb1/H2-Q7/Ccl2/H2-K1/Stat2/Itga5/Bad/Bst2/Traf3/Casp8/Tnfsf14/H2-Q4/Hcfc2/H2-T10/Eif2ak2/Oas1a/Tnfrsf14/Fas/H2-Ob/Jak2/Rigi/Eif2b2/Pik3r2/Tlr9/Nfkb1/Socs3/B2m/Oas2/Pik3cd/H2-T24/Tap2/Ikbke/Mavs/H2-D1/Oas3/Nfkbia/Irf3/Akt3/Hcfc1/Irf7/Tapbpl/Tlr2/Chuk/Traf2
## mmu04142                                                                                              Dmxl2/Npc2/Slc29a1/Galns/Lamtor4/Igf2r/Ap2a2/Atp6v1b2/Atp6v1a/Gpr155/Acp2/Ctsc/Mitf/Ctsh/Slc36a1/Lamp2/Ctsb/Slc11a1/Acp5/Glb1/Tpcn2/Hexa/Psap/Clta/Slc29a3/Naga/Asah1/Manba/Ncoa7/Tti1/Atp6v0c/Slc36a4/Lamp1/Gnptab/Borcs6/Scarb2/Laptm4a/Slc15a4/Cltb/Cln3/Npc1/Hexb/Slc17a5/Galc/Gnptg/Arsg/Rraga/Ctsl/Ap5s1/Atp6v1g1/Tlr9/Borcs5/Gla/Vps13c/Slc2a6/Ap2s1/Aga/Vps26b/Tcirg1/Cln5/Clcn7/Hgsnat/Tfeb/Ap3m1/Ctns/Vps16/Wdr7/Kxd1/Lamtor2/Atp6v1d/Tlr7
## mmu04970                                                                                                                                                                                                                                                                                                                                                                                                                                                  Plcb1/Gnaq/Adrb1/Adcy7/Slc12a2/Atp2b1/Bst1/Atp1a1/Adrb2/Calm1/Lyz2/Itpr1/Prkacb/Cst3/Prkcb/Adcy9
## mmu04015                                                                                                                                                                                                         Plcb1/Sipa1l1/Gnaq/P2ry1/Tiam1/Met/Adcy7/Itgb1/Pdgfb/Rassf5/Afdn/Vegfc/Raf1/Itgal/Insr/Fpr1/Prkd2/Thbs1/Vasp/Rasgrp2/Evl/Rras/Cdh1/Calm1/Rap1b/Sipa1l3/Rapgef5/Pdgfrb/Prkcb/Itgb2/Adcy9/Rap1a/Ralb/Map2k1/Pik3r2/Prkci/Calml4/Crk/Pik3cd/Hgf/Rgs14/Kras/Rapgef1/Vav3/Arap3/Tln2/Akt3/Actg1/Fgfr1/Csf1r/Lpar4/Lcp2/Adcy4/Apbb1ip/Akt1/Rac1
## mmu05171 C3/F13a1/Rpl3l/Vwf/Tyk2/C1ra/Isg15/Ifih1/Sting1/Rps9/Nrp1/Ccl2/Rpl28/Rplp0/Stat2/Il6st/C4b/Rpl7a/Rpl41/Ace/Traf3/Mx1/Prkcb/Cxcl10/Eif2ak2/Oas1a/Rpl31/Rpl32/Rpl10a/Rigi/C5ar1/Rpl22/Pik3r2/Mx2/Rpl10/Nfkb1/Oas2/Pik3cd/Rps12/Stat3/Ikbke/Rps26/Mavs/Rpl29/Rps27l/Rpsa/Rps5/Rps18/Rpl13/Rpl14/Oas3/Nfkbia/Irf3/Rpl3/Rpl23a/Rpl15/Nlrp3/Rpl18/Tlr2/Chuk/Rps6/Rpl27a/Tlr7/Rpl11/Rps7/Rps17/Rps28/Rps3/Rps20/Rpl35/Tnf/Rpl37/Fos/Rps4x/Rpl19/Rpl17/Rpl4/Rpl34/Rpl39/Rps27a/Rps2/Rpl6/Rpl36a/Mapk3/Rps15a/Rpl9/Mapk11/Rsl24d1/Rpl13a/Rpl21/Rpl7/Ifnar1
## mmu05167                                                                                                                        C3/H2-Q6/Pik3r6/Ccr1/Nfatc3/Pdgfb/Rcan1/Gng2/Rb1/Raf1/Tcf7l2/Tyk2/E2f2/Calm1/H2-Q7/H2-K1/Itpr1/Stat2/Il6st/Traf3/Casp8/H2-Q4/H2-T10/Eif2ak2/Hck/Fas/Gngt2/Cdk6/Ppp3r1/Jak2/Map2k1/Pik3r2/Cd200r1/Calml4/Nfkb1/Pik3cd/H2-T24/Stat3/Kras/Ikbke/H2-D1/Ppp3ca/Nfkbia/Gng10/Irf3/Akt3/Irf7/Nfatc1/E2f3/Chuk/Traf2/Gng12/Ppp3cb/Akt1/Hif1a/Gnb1/Rac1/Fos/Map2k6/Rps27a/Itpr3/Mapk3/Hras/Mapk11/Ccr3/Angpt2/Cycs/Ifnar1/Ccnd1/Bid

Gene set enrichment analysis output includes the following columns:

setSize: Number of genes in pathway
enrichmentScore: GSEA enrichment score, a statistic reflecting the degree to which a pathway is overrepresented at the top or bottom of the gene list (the gene list here consists of the t-statistics from the DE test).
NES: Normalized enrichment score
pvalue: Raw p-value from permutation test of enrichment score
p.adjust: Benjamini-Hochberg false discovery rate adjusted p-value
qvalue: Storey false discovery rate adjusted p-value
rank: Position in ranked list at which maximum enrichment score occurred
leading_edge: Statistics from leading edge analysis

Dotplot of enrichment results

Gene.ratio = (count of core enrichment genes)/(count of pathway genes)

Core enrichment genes = subset of genes that contribute most to the enrichment result (“leading edge subset”)

dotplot(KEGG.results)

Tree plot of KEGG enrichment

Pathways are clustered based on the degree of overlap in their member genes. Circles are sized based on the number of genes in the pathway and colored based on the adjusted p-value.

tmp <- pairwise_termsim(KEGG.results)
treeplot(tmp, fontsize_cladelab = 5)

Pathview plot of log fold changes on KEGG diagram

foldChangeList <- DE$logFC
names(foldChangeList) <- DE$ENTREZID
head(foldChangeList)

##     67241     18795    104718     12521     94212     12772 
## -2.258712  3.084665 -2.006122 -2.446067 -1.767470  2.024618

mmu04015 <- pathview(gene.data  = foldChangeList,
                     pathway.id = "mmu04015",
                     species    = "mmu",
                     limit      = list(gene = 6))

## 'select()' returned 1:1 mapping between keys and columns

## Info: Working in directory C:/Users/bpdurbin/OneDrive - UC Davis Health/work_stuff/2026-March-RNA-Seq-Analysis/data_analysis

## Info: Writing image file mmu04015.pathview.png

Barplot of p-values for top pathways

A barplot of -log10(p-value) for the top pathways/terms can be used for any type of enrichment analysis.

plotdat <- outdat[1:10,]
plotdat$nice.name <- gsub(" - Mus musculus (house mouse)", "", plotdat$Description, fixed = TRUE)

ggplot(plotdat, aes(x = -log10(p.adjust), y = reorder(nice.name, -log10(p.adjust)), fill = setSize)) + geom_bar(stat = "identity") + labs(x = "-log10(P-Value)", y = NULL, fill = "# Genes") + scale_fill_gradient(low = "red", high = "blue")

Quiz 2

sessionInfo()

## R version 4.5.3 (2026-03-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.54.0       dplyr_1.2.0            ggplot2_4.0.2         
##  [4] enrichplot_1.30.5      pathview_1.31.3        clusterProfiler_4.18.4
##  [7] org.Mm.eg.db_3.22.0    topGO_2.62.0           SparseM_1.84-2        
## [10] GO.db_3.22.0           AnnotationDbi_1.72.0   IRanges_2.44.0        
## [13] S4Vectors_0.48.0       Biobase_2.70.0         graph_1.88.1          
## [16] BiocGenerics_0.56.0    generics_0.1.4        
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.18.0       jsonlite_2.0.0         
##   [4] tidydr_0.0.6            magrittr_2.0.4          ggtangle_0.1.1         
##   [7] farver_2.1.2            rmarkdown_2.30          fs_2.0.0               
##  [10] vctrs_0.7.2             memoise_2.0.1           RCurl_1.98-1.18        
##  [13] ggtree_4.0.5            htmltools_0.5.9         gridGraphics_0.5-1     
##  [16] sass_0.4.10             bslib_0.10.0            htmlwidgets_1.6.4      
##  [19] plyr_1.8.9              cachem_1.1.0            igraph_2.2.2           
##  [22] lifecycle_1.0.5         pkgconfig_2.0.3         Matrix_1.7-5           
##  [25] R6_2.6.1                fastmap_1.2.0           gson_0.1.0             
##  [28] digest_0.6.39           aplot_0.2.9             ggnewscale_0.5.2       
##  [31] patchwork_1.3.2         RSQLite_2.4.6           org.Hs.eg.db_3.22.0    
##  [34] labeling_0.4.3          httr_1.4.8              polyclip_1.10-7        
##  [37] compiler_4.5.3          bit64_4.6.0-1           fontquiver_0.2.1       
##  [40] withr_3.0.2             S7_0.2.1                BiocParallel_1.44.0    
##  [43] DBI_1.3.0               ggforce_0.5.0           R.utils_2.13.0         
##  [46] MASS_7.3-65             rappdirs_0.3.4          tools_4.5.3            
##  [49] otel_0.2.0              ape_5.8-1               scatterpie_0.2.6       
##  [52] R.oo_1.27.1             glue_1.8.0              nlme_3.1-168           
##  [55] GOSemSim_2.36.0         cluster_2.1.8.2         reshape2_1.4.5         
##  [58] fgsea_1.36.2            snow_0.4-4              gtable_0.3.6           
##  [61] R.methodsS3_1.8.2       tidyr_1.3.2             data.table_1.18.2.1    
##  [64] XVector_0.50.0          ggrepel_0.9.8           pillar_1.11.1          
##  [67] stringr_1.6.0           yulab.utils_0.2.4       splines_4.5.3          
##  [70] tweenr_2.0.3            treeio_1.34.0           lattice_0.22-9         
##  [73] bit_4.6.0               tidyselect_1.2.1        fontLiberation_0.1.0   
##  [76] Biostrings_2.78.0       knitr_1.51              fontBitstreamVera_0.1.1
##  [79] Seqinfo_1.0.0           xfun_0.57               matrixStats_1.5.0      
##  [82] KEGGgraph_1.70.0        stringi_1.8.7           lazyeval_0.2.2         
##  [85] ggfun_0.2.0             yaml_2.3.12             evaluate_1.0.5         
##  [88] codetools_0.2-20        gdtools_0.5.0           tibble_3.3.1           
##  [91] qvalue_2.42.0           ggplotify_0.1.3         cli_3.6.5              
##  [94] systemfonts_1.3.2       jquerylib_0.1.4         Rcpp_1.1.1             
##  [97] png_0.1-9               XML_3.99-0.23           parallel_4.5.3         
## [100] blob_1.3.0              DOSE_4.4.0              bitops_1.0-9           
## [103] tidytree_0.4.7          ggiraph_0.9.6           scales_1.4.0           
## [106] purrr_1.2.1             crayon_1.5.3            rlang_1.1.7            
## [109] cowplot_1.2.0           fastmatch_1.1-8         KEGGREST_1.50.0

☰ Menu

RNA-Seq Analysis