Last Updated: July 20, 2022
The challenge in analyzing scATACseq data lies with the nature of the data. It is extremely sparse and almost binary. For a diploid genome, each genomic site can be accessible on one allele, both alleles and no alleles. Many accessible regions are not transposed. The sparsity makes calling peaks very difficult. Many approaches have been developed to overcome this challenge. One approach is to group multiple regions together based on sequence similarity, such as binding motif, or k-mer, or co-activation pattern in DNaseq-seq data in ENCODE. This approach suffers from the loss of resolution. The second approach is to pool multiple cells together, which is used by cellranger-atac analysis pipeline and many other packages. However, this approach does not always solve the issue of sparsity, especially for rare cell types, and could underrepresent the rare cell types. The third approach works with the sparse matrix directly by turning the matrix into a binary accessibility matrix (SnapATAC, 5kb window size).
ArchR uses directly the sparse matrix and 500bp sliding window size to profile genome wide accessibility. It uses an optimized iterative latent semantic indexing (LSI) method for dimensionality reduction. ArchR is very efficient dealing with large datasets.
ArchR comparison
if (!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
if (!requireNamespace("rtracklayer", quietly = TRUE)){
install.packages("rtracklayer")
}
if (!requireNamespace("devtools", quietly = TRUE)){
install.packages("devtools")
}
if (!requireNamespace("pheatmap", quietly = TRUE)){
install.packages("pheatmap")
}
devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
library(ArchR)
ArchR::installExtraPackages()
## For nice plot in ArchR, package Cairo is required, but it is not absolutely necessary. The installation of Cairo needs XQuartz (https://www.xquartz.org/) for MacOS.
if (!requireNamespace("Cairo", quietly = TRUE)){
install.packages("Cairo")
}
## Install peak calling package macs2
if (!requireNamespace("remotes", quietly = TRUE)){
install.packages("remotes")
}
## for motif analysis
if (!requireNamespace("chromVARmotifs", quietly = TRUE)){
devtools::install_github("GreenleafLab/chromVARmotifs")
}
#remotes::install_github("RockefellerUniversity/Herper")
#library(Herper)
#if (!requireNamespace("hexbin", quietly = TRUE)){
# install.packages("hexbin")
#}
#install_CondaTools(tools="macs2", env="PeakCalling_analysis", pathToMiniConda="./")
ArchR is not dependent on the package Cairo, which provides the way to rasterize plots. Without Cairo, the generated plots are vectorized and difficult to edit.
library(knitr)
library(pheatmap)
library(rtracklayer)
library(chromVARmotifs)
library(ArchR)
addArchRGenome("hg38")
There are pre-build genome data in ArchR: hg19, hg38, mm9 and mm10, but it supports the building of custom genomes.
inputs <- system("ls */outs/fragments.tsv.gz", intern=T)
names(inputs) <- sapply(inputs, function(x){strsplit(x, "/")[[1]][1]})
#ArrowFiles <- createArrowFiles(inputFiles = inputs, sampleNames = names(inputs),
# filterTSS = 4, # filter based on TSS enrichment score
# filterFrags = 1000, # filter based on the minimum fragments per cell
# addTileMat = T, # 500bp sliding window counts matrix
# addGeneScoreMat = T) # Gene-Score Matrix uses ATAC-Seq signal proximal to the TSS to estimate gene activity
ArrowFiles <- c("A001-C-007.arrow", "A001-C-104.arrow", "B001-A-301.arrow")
ArrowFiles
## [1] "A001-C-007.arrow" "A001-C-104.arrow" "B001-A-301.arrow"
The above command create three Arrow files, one for each of our samples. An Arrow file is a hdf5 formatted file that is stored on harddrive. During the creataion of the Arrow files, ArchR does cell filtering based on the number of unique fragments, the signal-to-background ratio which is the ratio between the average accessibility within a 50-bp window around TSS as signal and the average accessibility flanking (+/- 1900-2000 bp) region. It also produce an assessment of the fragment size distribution to check the expected nucleosome periodicity.
Let’s take a look at the TSS enrichment results and the fragment size distribution.
samples <- names(inputs)
print(paste0("for sample: ", samples[1]))
knitr::include_graphics(file.path("QualityControl", samples[1], paste0(samples[1], "-Fragment_Size_Distribution.pdf")), file.path("QualityControl", samples[1], paste0(samples[1], "-TSS_by_Unique_Frag.pdf")), dpi=300)
print(paste0("for sample: ", samples[2]))
knitr::include_graphics(file.path("QualityControl", samples[1], paste0(samples[1], "-Fragment_Size_Distribution.pdf")), file.path("QualityControl", samples[1], paste0(samples[1], "-TSS_by_Unique_Frag.pdf")), dpi=300)
print(paste0("for sample: ", samples[3]))
knitr::include_graphics(file.path("QualityControl", samples[1], paste0(samples[1], "-Fragment_Size_Distribution.pdf")), file.path("QualityControl", samples[1], paste0(samples[1], "-TSS_by_Unique_Frag.pdf")), dpi=300)
in silico synthetic doublets are created by randomly selecting any two cells thousands of times and projects these doublets to the UMAP of our data. Then nearest neighbor to these synthetic doublets are identified to be potential doublets.
double.score <- addDoubletScores(input = ArrowFiles,
k = 10, # the number of nn to each synthetic doublet
knnMethod="UMAP", # the embedding to use for knn search
LSIMethod = 1, # TF-IDF normalization: 1. tf-logidf 2. log(tf-idf) 3. logtf-logidf
verbose = T)
projatac <- ArchRProject(ArrowFiles = ArrowFiles, outputDirectory = "./scATAC_example", copyArrows = T)
projatac
## class: ArchRProject
## outputDirectory: /Users/jli/Jessie/Research/BioInfo/tmp/scATAC/scATAC_example
## samples(3): A001-C-007 A001-C-104 B001-A-301
## sampleColData names(1): ArrowFiles
## cellColData names(15): Sample TSSEnrichment ... DoubletEnrichment
## BlacklistRatio
## numberOfCells(1): 16508
## medianTSS(1): 7.71
## medianFrags(1): 9557
Let’s plot some summary plots
## plot fragment size distribution
plotFragmentSizes(ArchRProj = projatac)
## plot TSS enrichment
p1 <- plotGroups(
ArchRProj = projatac,
groupBy = "Sample",
colorBy = "cellColData",
name = "TSSEnrichment",
plotAs = "ridges"
)
p1
p2 <- plotGroups(
ArchRProj = projatac,
groupBy = "Sample",
colorBy = "cellColData",
name = "TSSEnrichment",
plotAs = "violin",
alpha = 0.4,
addBoxPlot = TRUE
)
p2
plotTSSEnrichment(ArchRProj = projatac)
projatac.filtered <- filterDoublets(projatac)
ArchR normalization and dimensionality reduction
?addIterativeLSI
projatac.filtered <- addIterativeLSI(ArchRProj = projatac.filtered,
useMatrix = "TileMatrix",
name = "IterativeLSI",
iterations = 2,
clusterParams = list(resolution = c(0.2),
sampleCells = 5000,
n.start = 10),
varFeatures = 25000,
dimsToUse = 1:30)
ArchR uses Seurat’s FindCluster() function to perform clustering.
projatac.filtered <- addClusters(input = projatac.filtered,
reduceDims = "IterativeLSI",
method = "Seurat",
name = "Clusters",
resolution = 0.8)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 15821
## Number of edges: 543190
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8951
## Number of communities: 19
## Elapsed time: 1 seconds
Summary plot on the distribution of clusters
cMatrix <- confusionMatrix(paste0(projatac.filtered$Clusters), paste0(projatac.filtered$Sample))
cMatrix <- cMatrix / Matrix::rowSums(cMatrix)
pheatmap::pheatmap(mat = cMatrix, color = paletteContinuous("blueYellow"), border_color = "black")
In order to visualize the cells in 2D, we have to project the high dimensional data to 2D.
projatac.filtered <- addUMAP(ArchRProj = projatac.filtered, reducedDims = "IterativeLSI",
name = "UMAP",
nNeighbors = 30, # number of nn to compute a UMAP
minDist = 0.5, # how tightly UMAP can pack points
metric = "cosine") # which distance to use, cosine for this type of data
## embedding slot is populated after the above step, please explore it
#str(projatac.filtered@embedding)
## plot the UMAP
plotEmbedding(ArchRProj = projatac.filtered, colorBy = "cellColData", name = "Sample", embedding = "UMAP")
## plot the UMAP, color by Clusters that were identified earlier
plotEmbedding(ArchRProj = projatac.filtered, colorBy = "cellColData", name = "Clusters", embedding = "UMAP")
ArchR provides tSNE plot feature as well, which is done using the function addTSNE(). Please explore the function and plot the data using tSNE.
ArchR uses a model to calculate gene activity score:
Gene Scores can be created at the step of creating the ArrowFiles. And it can be added by using the function addGeneScoreMatrix().
?addGeneScoreMatrix
Using these Gene Scores, one can identify marker genes. For a cell cluster, ArchR creates a group of background cells by selecting the nearest neighbor cells across the multi-dimensional space that do not belong to the cell cluster. This enables a robust determination of significance even when the group of cells is small.
GS.markers <- getMarkerFeatures(ArchRProj = projatac.filtered, useMatrix = "GeneScoreMatrix",
groupBy = "Clusters", bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon")
## To get the markers
markers <- getMarkers(GS.markers, cutOff = "FDR <= 0.01 & Log2FC >= 1.25")
## We can visualize all of the markers in a heatmap and label some of genes of interest
marker.genes <- sample(unique(unlist(lapply(1:length(markers), function(x){markers[[names(markers)[x]]]$name[1:5]}))), 15)
heatmap.geneMK <- markerHeatmap(seMarker = GS.markers, cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
labelMarkers = marker.genes, transpose = T)
## [1] "LINC02287" "MIR619" "NKX2-3" "COL6A1" "MIR8071-1"
## [6] "SCIMP" "ST6GAL1" "FER1L6" "LOC100130950" "GATA5"
## [11] "NOVA2" "HIF3A" "CEMIP" "RGS6" "SLC26A3"
ComplexHeatmap::draw(heatmap.geneMK, heatmap_legend_side = "bot",
annotation_legend_side = "bot")
We can overlay the marker genes on UMAP embedding.
p <- plotEmbedding(ArchRProj = projatac.filtered, colorBy = "GeneScoreMatrix",
name = marker.genes, embedding = "UMAP", quantCut = c(0.01, 0.95), imputeWeights = NULL)
p
## $LINC02287
##
## $MIR619
##
## $`NKX2-3`
##
## $COL6A1
##
## $`MIR8071-1`
##
## $SCIMP
##
## $ST6GAL1
##
## $FER1L6
##
## $LOC100130950
##
## $GATA5
##
## $NOVA2
##
## $HIF3A
##
## $CEMIP
##
## $RGS6
##
## $SLC26A3
## some fancy plotting of the same set of genes
p1 <- lapply(p, function(x){
x + guides(color = F, fill = F) + theme_ArchR(baseSize = 6.5) +
theme(plot.margin = unit(c(0,0,0,0), "cm")) +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
})
do.call(cowplot::plot_grid, c(list(ncol=3), p1))
p2 <- plotBrowserTrack(ArchRProj = projatac.filtered, groupBy = "Clusters", geneSymbol = marker.genes,
upstream = 50000, downstream = 50000)
## GRanges object with 15 ranges and 2 metadata columns:
## seqnames ranges strand | gene_id symbol
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr14 92905697-92907482 - | 101929002 LINC02287
## [2] chr12 108836908-108837006 - | 693204 MIR619
## [3] chr10 99532933-99536524 + | 159296 NKX2-3
## [4] chr21 45981737-46005050 + | 1291 COL6A1
## [5] chr14 105621116-105640232 + | 102465871 MIR8071-1
## ... ... ... ... . ... ...
## [11] chr19 45933734-45973546 - | 4858 NOVA2
## [12] chr19 46297046-46343433 + | 64344 HIF3A
## [13] chr15 80779343-80951776 + | 57214 CEMIP
## [14] chr14 71932439-72566529 + | 9628 RGS6
## [15] chr7 107765467-107803225 - | 1811 SLC26A3
## -------
## seqinfo: 24 sequences from hg38 genome
#grid::grid.newpage()
#grid.draw(p2$IKZF1)
plotPDF(plotList = p2, name = "marker-gene-tracks.pdf", ArchRProj = projatac.filtered,
addDOC = F, width = 5, height = 5)
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
## NULL
The gene scores calculated enables the integration between scATACSeq data and scRNASeq data. ArchR compares the gene scores matrix of the scATACSeq dataset to the gene expression matrix of the scRNASeq dataset and tries to align the cells.
## load scRNASeq data
load("celltyped_seurat_object.RData")
## Integrate
projatac.filtered <- addGeneIntegrationMatrix(ArchRProj = projatac.filtered,
useMatrix = "GeneScoreMatrix",
matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI",
seRNA = experiment.merged,
addToArrow = F,
groupRNA = "CellType",
nameCell = "predictedCell",
nameGroup = "predictedGroup",
nameScore = "predictedScore")
## After this first round of integration, we can check how the clusters of scATAC data matches to the CellType of scRNA data.
integration.cMatrix <- as.matrix(confusionMatrix(projatac.filtered$Clusters, projatac.filtered$predictedGroup))
preClust <- colnames(integration.cMatrix)[apply(integration.cMatrix, 1, which.max)]
cbind(preClust, rownames(integration.cMatrix))
## preClust
## [1,] "Neurons" "C15"
## [2,] "Podocytes" "C12"
## [3,] "Podocytes" "C18"
## [4,] "Goblet cells" "C11"
## [5,] "Podocytes" "C2"
## [6,] "Plasma cells" "C3"
## [7,] "Podocytes" "C16"
## [8,] "Podocytes" "C17"
## [9,] "Macrophages" "C6"
## [10,] "Goblet cells" "C9"
## [11,] "Epithelial cells" "C13"
## [12,] "T memory cells" "C4"
## [13,] "Goblet cells" "C14"
## [14,] "Epithelial cells" "C19"
## [15,] "Enterocytes" "C10"
## [16,] "B cells memory" "C7"
## [17,] "Epithelial cells" "C1"
## [18,] "T memory cells" "C5"
## [19,] "Podocytes" "C8"
## CellType from scRNA
cEpi <- "Epithelial cells"
matched.types <- unique(projatac.filtered$predictedGroup)
cNon.Epi <- paste0(matched.types[!(matched.types %in% cEpi)], collapse="|")
## Assigned scATAC
assign.Epi <- rownames(integration.cMatrix)[grep(cEpi, preClust)]
assign.Non.Epi <- rownames(integration.cMatrix)[grep(cNon.Epi, preClust)]
## Identify cells from scRNA to each categories
rna.Epi <- colnames(experiment.merged)[grep(cEpi, experiment.merged$CellType)]
rna.Non.Epi <- colnames(experiment.merged)[grep(cNon.Epi, experiment.merged$CellType)]
## Create a nested list to match the celltypes from scRNA to scATAC
groupList <- SimpleList(
Epithelial = SimpleList(
ATAC = projatac.filtered$cellNames[projatac.filtered$Clusters %in% assign.Epi],
RNA = rna.Epi
),
Non.Epithelial = SimpleList(
ATAC = projatac.filtered$cellNames[projatac.filtered$Clusters %in% assign.Non.Epi],
RNA = rna.Non.Epi
)
)
## We pass this list to the `groupList` parameter of the addGeneIntegrationMatrix` function
projatac.filtered <- addGeneIntegrationMatrix(
ArchRProj = projatac.filtered,
useMatrix = "GeneScoreMatrix",
matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI",
seRNA = experiment.merged,
addToArrow = FALSE,
groupList = groupList,
groupRNA = "CellType",
nameCell = "predictedCell_Constrain",
nameGroup = "predictedGroup_Constrain",
nameScore = "predictedScore_Constrain"
)
pal <- paletteDiscrete(values = experiment.merged$CellType)
pal
## Epithelial cells Goblet cells Neurons Podocytes
## "#D51F26" "#272E6A" "#208A42" "#89288F"
## Enterocytes T memory cells Macrophages Plasma cells
## "#F47D2B" "#FEE500" "#8A9FD1" "#C06CAB"
## Endothelial cells B cells memory Tuft cells
## "#D8A767" "#90D5E4" "#89C75F"
p3 <- plotEmbedding(
projatac.filtered,
colorBy = "cellColData",
name = "predictedGroup",
pal = pal
)
p4 <- plotEmbedding(
projatac.filtered,
colorBy = "cellColData",
name = "predictedGroup_Constrain",
pal = pal
)
plotPDF(p3, p4, name="plot-UMAP-RNA-Integration.pdf", ArchRProj=projatac.filtered, addDOC=F, width=5, height=5)
saveArchRProject(ArchRProj = projatac.filtered, outputDirectory="scATAC_example", load=F)
ArchR use an iterative overlap scheme to form peaks. It also uses fixed width peaks. Variable width peaks makes it difficult to merge peak calls from multiple samples and makes the analysis more complicated because of the need to normalize to peak width.
Based on the nature of scATACSeq data, pseudo bulk data is generated by sampling cells. ArchR uses a complex decision tree to perform the pseudo bulk replicates generation.
#loadArchRProject(path="./scATAC_peak")
#readRDS("scATAC_peak/Save-ArchR-Project.rds")
projatac.peak <- addGroupCoverages(ArchRProj = projatac.filtered, groupBy = "predictedGroup_Constrain")
#saveArchRProject(ArchRProj = projatac.peak, outputDirectory="scATAC_peak", load=T)
#saveArchRProject(ArchRProj = projatac.filtered, outputDirectory="scATAC_filtered", load=T)
#projatac.peak <- addReproduciblePeakSet(ArchRProj = projatac.peak,
# groupBy = "predictedGroup_Constrain",
# pathToMacs2 = path2macs2)
#projatac.peak <- addReproduciblePeakSet(ArchRProj = projatac.peak,
# groupBy = "predictedGroup_Constrain",
# peakMethod = "Tiles", method = "p")
peaks <- import("peaks.bed")
projatac.peak <- addPeakSet(ArchRProj = projatac.peak, peakSet = peaks, force=T)
projatac.peak <- addPeakMatrix(ArchRProj = projatac.peak)
## marker peaks
markersPeaks <- getMarkerFeatures(ArchRProj = projatac.peak,
useMatrix = "PeakMatrix",
groupBy = "predictedGroup_Constrain",
bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon")
peak.markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
heatmapPeaks <- markerHeatmap(
seMarker = markersPeaks,
cutOff = "FDR <= 0.1 & Log2FC >= 0.5",
transpose = TRUE
)
## [1] "chr1:9787-10683" "chr1:633519-634529"
## [3] "chr1:966225-966581" "chr1:966683-967261"
## [5] "chr1:1305331-1306107" "chr1:1330281-1330727"
## [7] "chr1:1330846-1331548" "chr1:1692400-1693364"
## [9] "chr1:1737593-1738494" "chr1:1778366-1779288"
## [11] "chr1:1890547-1891616" "chr1:2493975-2494844"
## [13] "chr1:2547968-2548436" "chr1:2585086-2585770"
## [15] "chr1:2821689-2822574" "chr1:629499-630392"
## [17] "chr1:932841-933733" "chr1:960308-961243"
## [19] "chr1:1004840-1005587" "chr1:1006196-1007123"
## [21] "chr1:1013020-1013911" "chr1:1024377-1025245"
## [23] "chr1:1032738-1033695" "chr1:1059205-1060102"
## [25] "chr1:1068623-1069635" "chr1:1073659-1074596"
## [27] "chr1:1243945-1244780" "chr1:1272219-1273060"
## [29] "chr1:1281526-1282426" "chr1:1317986-1318855"
## [31] "chr1:1337464-1338299" "chr1:1344535-1345428"
## [33] "chr1:1666322-1667009" "chr1:1723978-1724891"
## [35] "chr1:1725971-1726769" "chr1:1881533-1882404"
## [37] "chr1:1896962-1897811" "chr1:2048000-2048854"
## [39] "chr1:2049852-2050885" "chr1:1437766-1438661"
## [41] "chr1:2212418-2213308" "chr1:2341651-2342570"
## [43] "chr1:3221906-3222819" "chr1:5927601-5928449"
## [45] "chr1:6204839-6205572" "chr1:6359442-6360337"
## [47] "chr1:6472015-6472419" "chr1:6706381-6707255"
## [49] "chr1:6730567-6731441" "chr1:6750274-6751148"
## [51] "chr1:6899381-6900286" "chr1:6916802-6917692"
## [53] "chr1:6962936-6963476" "chr1:1125039-1125659"
## [55] "chr1:1126871-1127664" "chr1:1145021-1145782"
## [57] "chr1:1157128-1158692" "chr1:1162345-1163249"
## [59] "chr1:1167026-1167910" "chr1:1304460-1305198"
## [61] "chr1:1364083-1364899" "chr1:1721030-1721966"
## [63] "chr1:1728362-1729206" "chr1:1857013-1857825"
## [65] "chr1:1858510-1859269" "chr1:1915594-1916485"
## [67] "chr1:1143748-1144665" "chr1:1230418-1231154"
## [69] "chr1:2290376-2291307" "chr1:3890735-3891857"
## [71] "chr1:6056167-6056997" "chr1:6127476-6128367"
## [73] "chr1:8126905-8127743" "chr1:8211469-8212414"
## [75] "chr1:9179922-9180607" "chr1:9239515-9240344"
## [77] "chr1:9730322-9731172" "chr1:10414902-10415800"
## [79] "chr1:10530043-10530966" "chr1:10677100-10678010"
## [81] "chr1:1217676-1218574" "chr1:1658691-1659629"
## [83] "chr1:2250383-2251294" "chr1:2632758-2633605"
## [85] "chr1:3249845-3250758" "chr1:3292950-3293839"
## [87] "chr1:3320522-3321030" "chr1:3903038-3903940"
## [89] "chr1:3921932-3922844" "chr1:3942547-3943419"
## [91] "chr1:3948005-3948991" "chr1:3971686-3972679"
## [93] "chr1:4083591-4084465" "chr1:4267227-4268126"
## [95] "chr1:5367937-5368821" "chr1:7401581-7402460"
## [97] "chr1:12539720-12540586" "chr1:14258900-14259819"
## [99] "chr1:30823990-30824852" "chr1:41109372-41110253"
## [101] "chr1:62720732-62721626" "chr1:79192946-79194003"
## [103] "chr1:95742465-95743388" "chr1:172381770-172382634"
## [105] "chr1:223747799-223748651" "chr10:6467239-6468118"
## [107] "chr10:46910904-46911804" "chr10:58810635-58811631"
## [109] "chr10:75401677-75402543" "chr10:111076576-111077105"
## [111] "chr1:1207878-1208719" "chr1:1219068-1219971"
## [113] "chr1:1250615-1251391" "chr1:1615064-1615996"
## [115] "chr1:2231570-2232313" "chr1:2236919-2237786"
## [117] "chr1:2412214-2413128" "chr1:2415717-2416354"
## [119] "chr1:2840075-2840946" "chr1:3530914-3531848"
## [121] "chr1:5667040-5667833" "chr1:5955076-5956099"
## [123] "chr1:6273122-6274039"
#draw(heatmapPeaks, heatmap_legend_side = "bot", annotation_legend_side = "bot")
plotPDF(heatmapPeaks, name = "Peak-Marker-Heatmap", width=8, height=6, ArchRProj=projatac.peak, addDOC=F)
peak.ma <- markerPlot(seMarker = markersPeaks, name = "Macrophages", cutOff="FDR <= 0.01 & Log2FC >= 1")
plotPDF(peak.ma, name = "Peak-MA", width=8, height=6, ArchRProj=projatac.peak, addDOC=F)
#saveArchRProject(ArchRProj = projatac.peak, outputDirectory="scATAC_peak", load=T)
peak.vc <- markerPlot(seMarker=markersPeaks, name = "Macrophages", cutOff="FDR <= 0.01 & Log2FC >= 1", plotAs="Volcano")
plotPDF(peak.vc, name="Peak-Volcano", width=8, height=6, ArchRProj=projatac.peak, addDOC=F)
peak.tracks <- plotBrowserTrack(
ArchRProj = projatac.peak,
groupBy = "predictedGroup_Constrain",
geneSymbol = c("BMP4"),
features = getMarkers(markersPeaks, cutOff="FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)["Macrophages"],
upstream = 50000, downstream = 50000
)
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | gene_id symbol
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr14 53949736-53958761 - | 652 BMP4
## -------
## seqinfo: 24 sequences from hg38 genome
plotPDF(peak.tracks, name="BMP4.markerpeak.track", width=5, height=5, ArchRProj=projatac.peak, addDOC=F)
## NULL
T.vs.Macro <- getMarkerFeatures(
ArchRProj = projatac.peak,
useMatrix = "PeakMatrix",
groupBy = "predictedGroup_Constrain",
testMethod = "wilcoxon",
bias = c("TSSEnrichment", "log10(nFrags)"),
useGroups = "T memory cells",
bgdGroups = "Macrophages"
)
## motif enrichment using Catalog of Inferred Sequence Binding Preferences database
projatac.peak <- addMotifAnnotations(ArchRProj=projatac.peak, motifSet="cisbp", name="Motif")
# Up regulated TF motif
motifsUp <- peakAnnoEnrichment(
seMarker = T.vs.Macro,
ArchRProj = projatac.peak,
peakAnnotation = "Motif",
cutOff = "FDR <= 0.1 & Log2FC >= 0.5"
)
# gather data for plotting
df <- data.frame(TF = rownames(motifsUp), mlog10Padj = assay(motifsUp)[,1])
df <- df[order(df$mlog10Padj, decreasing=TRUE),]
df$rank <- seq_len(nrow(df))
ggUp <- ggplot(df, aes(rank, mlog10Padj, color = mlog10Padj)) +
geom_point(size = 1) +
ggrepel::geom_label_repel(
data = df[rev(seq_len(30)), ], aes(x = rank, y = mlog10Padj, label = TF),
size = 1.5,
nudge_x = 2,
color = "black"
) + theme_ArchR() +
ylab("-log10(P-adj) Motif Enrichment") +
xlab("Rank Sorted TFs Enriched") +
scale_color_gradientn(colors = paletteContinuous(set = "comet"))
plotPDF(ggUp, name="Up.Motif-enrichment", width=5, height=5, addDOC=F)
# Down regulated TF motif
motifsDn <- peakAnnoEnrichment(
seMarker = T.vs.Macro,
ArchRProj = projatac.peak,
peakAnnotation = "Motif",
cutOff = "FDR <= 0.1 & Log2FC <= -0.5"
)
df <- data.frame(TF = rownames(motifsDn), mlog10Padj = assay(motifsDn)[,1])
df <- df[order(df$mlog10Padj, decreasing = TRUE),]
df$rank <- seq_len(nrow(df))
ggDn <- ggplot(df, aes(rank, mlog10Padj, color = mlog10Padj)) +
geom_point(size = 1) +
ggrepel::geom_label_repel(
data = df[rev(seq_len(30)), ], aes(x = rank, y = mlog10Padj, label = TF),
size = 1.5,
nudge_x = 2,
color = "black"
) + theme_ArchR() +
ylab("-log10(FDR) Motif Enrichment") +
xlab("Rank Sorted TFs Enriched") +
scale_color_gradientn(colors = paletteContinuous(set = "comet"))
plotPDF(ggDn, name="Down.Motif-enrichment", width=5, height=5, addDOC=F)
## Motifs enriched in marker peaks
motif.marker.Up <- peakAnnoEnrichment(
seMarker = markersPeaks,
ArchRProj = projatac.peak,
peakAnnotation = "Motif",
cutOff = "FDR <= 0.1 & Log2FC >= 0.5"
)
heatmap.markerEN <- plotEnrichHeatmap(motif.marker.Up, n = 5, transpose = TRUE)
plotPDF(heatmap.markerEN, name="MarkerPeak.Motif.Up", width=5, height=5, addDOC=F)
## Other enrichment
?addArchRAnnotations
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] motifmatchr_1.16.0 ggrepel_0.9.1
## [3] hexbin_1.28.2 circlize_0.4.13
## [5] ComplexHeatmap_2.10.0 presto_1.0.0
## [7] uwot_0.1.11 ggridges_0.5.3
## [9] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0
## [11] Biostrings_2.62.0 XVector_0.34.0
## [13] nabor_0.5.0 SeuratObject_4.0.4
## [15] Seurat_4.1.0 rhdf5_2.38.1
## [17] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [19] MatrixGenerics_1.6.0 Rcpp_1.0.8.3
## [21] Matrix_1.3-3 matrixStats_0.61.0
## [23] data.table_1.14.2 stringr_1.4.0
## [25] plyr_1.8.6 magrittr_2.0.2
## [27] ggplot2_3.3.5 gtable_0.3.0
## [29] gtools_3.9.2 gridExtra_2.3
## [31] ArchR_1.0.2 chromVARmotifs_0.2.0
## [33] rtracklayer_1.54.0 GenomicRanges_1.46.1
## [35] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [37] S4Vectors_0.32.3 BiocGenerics_0.40.0
## [39] pheatmap_1.0.12 knitr_1.37
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.24
## [3] R.utils_2.11.0 tidyselect_1.1.2
## [5] poweRlaw_0.70.6 RSQLite_2.2.9
## [7] AnnotationDbi_1.56.2 htmlwidgets_1.5.4
## [9] BiocParallel_1.28.3 Rtsne_0.15
## [11] munsell_0.5.0 codetools_0.2-18
## [13] ica_1.0-2 future_1.23.0
## [15] miniUI_0.1.1.1 withr_2.4.3
## [17] colorspace_2.0-2 highr_0.9
## [19] rstudioapi_0.13 ROCR_1.0-11
## [21] tensor_1.5 listenv_0.8.0
## [23] labeling_0.4.2 GenomeInfoDbData_1.2.7
## [25] polyclip_1.10-0 farver_2.1.0
## [27] bit64_4.0.5 parallelly_1.30.0
## [29] vctrs_0.3.8 generics_0.1.2
## [31] xfun_0.29 doParallel_1.0.16
## [33] R6_2.5.1 clue_0.3-60
## [35] bitops_1.0-7 rhdf5filters_1.6.0
## [37] spatstat.utils_2.3-0 cachem_1.0.6
## [39] DelayedArray_0.20.0 assertthat_0.2.1
## [41] promises_1.2.0.1 BiocIO_1.4.0
## [43] scales_1.1.1 Cairo_1.6-0
## [45] globals_0.14.0 goftest_1.2-3
## [47] seqLogo_1.60.0 rlang_1.0.2
## [49] GlobalOptions_0.1.2 splines_4.1.0
## [51] lazyeval_0.2.2 spatstat.geom_2.3-1
## [53] yaml_2.3.5 reshape2_1.4.4
## [55] abind_1.4-5 httpuv_1.6.5
## [57] tools_4.1.0 ellipsis_0.3.2
## [59] spatstat.core_2.3-2 jquerylib_0.1.4
## [61] RColorBrewer_1.1-2 zlibbioc_1.40.0
## [63] purrr_0.3.4 RCurl_1.98-1.5
## [65] rpart_4.1-15 deldir_1.0-6
## [67] GetoptLong_1.0.5 pbapply_1.5-0
## [69] cowplot_1.1.1 zoo_1.8-9
## [71] cluster_2.1.2 RSpectra_0.16-0
## [73] scattermore_0.7 lmtest_0.9-39
## [75] RANN_2.6.1 fitdistrplus_1.1-6
## [77] hms_1.1.1 patchwork_1.1.1
## [79] mime_0.12 evaluate_0.14
## [81] xtable_1.8-4 XML_3.99-0.8
## [83] shape_1.4.6 compiler_4.1.0
## [85] tibble_3.1.6 KernSmooth_2.23-20
## [87] crayon_1.5.0 R.oo_1.24.0
## [89] htmltools_0.5.2 mgcv_1.8-35
## [91] later_1.3.0 tzdb_0.2.0
## [93] TFBSTools_1.32.0 tidyr_1.2.0
## [95] DBI_1.1.2 MASS_7.3-54
## [97] readr_2.1.1 cli_3.2.0
## [99] R.methodsS3_1.8.1 parallel_4.1.0
## [101] igraph_1.2.11 pkgconfig_2.0.3
## [103] GenomicAlignments_1.30.0 TFMPvalue_0.0.8
## [105] plotly_4.10.0 spatstat.sparse_2.1-0
## [107] foreach_1.5.1 annotate_1.72.0
## [109] bslib_0.3.1 DirichletMultinomial_1.36.0
## [111] digest_0.6.29 sctransform_0.3.3
## [113] RcppAnnoy_0.0.19 pracma_2.3.8
## [115] CNEr_1.30.0 spatstat.data_2.1-2
## [117] rmarkdown_2.11 leiden_0.3.9
## [119] restfulr_0.0.13 shiny_1.7.1
## [121] Rsamtools_2.10.0 rjson_0.2.21
## [123] nlme_3.1-152 lifecycle_1.0.1
## [125] jsonlite_1.8.0 Rhdf5lib_1.16.0
## [127] viridisLite_0.4.0 fansi_1.0.2
## [129] pillar_1.7.0 lattice_0.20-44
## [131] KEGGREST_1.34.0 fastmap_1.1.0
## [133] httr_1.4.2 survival_3.2-11
## [135] GO.db_3.14.0 glue_1.6.2
## [137] iterators_1.0.13 png_0.1-7
## [139] bit_4.0.4 stringi_1.7.6
## [141] sass_0.4.0 blob_1.2.2
## [143] caTools_1.18.2 memoise_2.0.1
## [145] dplyr_1.0.8 irlba_2.3.5
## [147] future.apply_1.8.1