Home
Introduction and Lectures
Intro to the Workshop and Core
Schedule
What is Bioinformatics/Genomics?
Generating Microbial Amplicons
Support
Zoom
Slack
Cheat Sheets
Software and Links
Scripts
Prerequisites
CLI
Introduction to R
High Performance Computing
Data Reduction
Filetype Description
MCA project setup
Preprocessing Reads
ASV production
Data Analysis
Prepare Analysis
MCA Part 1
MCA Part 2
MCA Part 3
MCA Part 4
MCA Part 5
MCA Part 6
ETC
Closing thoughts
Github page
Biocore website

Diversity andTransformations

Load our libraries

# Set up global options for nice reports and keeping figures:
knitr::opts_chunk$set(fig.width=14, fig.height=8, fig.align="center",
                      warning=FALSE, message=FALSE)

Lets start by loading libraries

library(phyloseq)
library(phangorn)
library(vegan)
library(ggplot2)
library(edgeR)
library(gridExtra)

nice_colors = c("#999999", "#E69F00", "#56B4E9","#e98756","#c08160","#5800e6", "#CDDC49", "#C475D3", 
                "#E94B30", "#233F57", "#FEE659", "#A1CFDD", "#F4755E", "#D6F6F7","#EB6D58", "#6898BF")

Load prior results

load(file=file.path("rdata_objects", "filtered_phyloseq.RData"))
ls()

## [1] "nice_colors" "ps"          "ps.1"

Diversity Plots

These plots are generated on untrimmed datasets as indicated:

You must use untrimmed datasets for meaningful
results, as these estimates (and even the ``observed'' richness)
are highly dependent on the number of singletons. You can always
trim the data later on if needed, just not before using this
function.

Rarefaction, Alpha Diversity, and Statistics
Estimating the Number of Species in Microbial Diversity Studies
And see this for a more detailed review of the large variety of diversity measures.
Deciphering Diversity Indices for a Better Understanding of Microbial Communities

Observed Richness: The number of ASVs detected per sample:

plot_richness(ps, x="Temp", measures=c("Observed"), color="Group") + theme_bw() +
  theme(axis.text.x = element_text(angle = 90, face = "bold", size=10)) + 
  ggtitle("Observed diversity")

Shannon Diversity: Measures richness and evenness

#c("Observed", "Chao1", "ACE", "Shannon",  "Simpson", "InvSimpson", "Fisher")
plot_richness(ps, x="Temp", measures=c("Shannon"), color="Group") + theme_bw() +
  theme(axis.text.x = element_text(angle = 90, face = "bold", size=10)) + 
  ggtitle("Shannon diversity index")

Chao1 index: Attempts to estimate true richness based on Poisson distribution

What are the other measure?
Explore the other measures

er <- estimate_richness(ps, measures=c("Chao1", "Shannon"))

res.aov <- aov(er$Shannon ~ Group + Temp, data = as(sample_data(ps),"data.frame"))
# Summary of the analysis
summary(res.aov)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         3  0.891  0.2972   7.586 8.11e-05 ***
## Temp          4  1.693  0.4233  10.806 6.56e-08 ***
## Residuals   189  7.404  0.0392                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These measures can then be modelled and tested for differences between your groups.

Investigate transformations.

We transform microbiome count data to account for differences in library size, variance, scale, etc.

RLE - is the scaling factor method proposed by Anders and Huber (2010). We call it “relative log expression”, as median library is calculated from the geometric mean of all columns and the median ratio of each sample to the median library is taken as the scale factor.

## for Firmictures
plot_abundance = function(physeq, meta, title = "", Facet = "Order", Color = "Order"){
  # Arbitrary subset, based on Phylum, for plotting
  p1f = subset_taxa(physeq, Phylum %in% c("Firmicutes"))
  mphyseq = psmelt(p1f)
  mphyseq <- subset(mphyseq, Abundance > 0)
  ggplot(data = mphyseq, mapping = aes_string(x = meta,y = "Abundance",
                                 color = Color, fill = Color)) +
    geom_violin(fill = NA) +
    geom_point(size = 1, alpha = 0.3,
                position = position_jitter(width = 0.3)) +
    facet_wrap(facets = Facet) + scale_y_log10() +
    theme(legend.position="none") + labs(title=title)
}

# transform counts into "relative abundances"
ps.1ra = transform_sample_counts(ps.1, function(x){x / sum(x)})

# transform counts into "hellinger standardized counts"
ps.1hell <- ps.1
otu_table(ps.1hell) <- otu_table(decostand(otu_table(ps.1hell), method = "hellinger"), taxa_are_rows=FALSE)

# RLE counts
ps.1RLE <- ps.1
RLE_normalization <- function(phyloseq){
  prior.count = 1
  count_scale = median(sample_sums(phyloseq))
  m = as(otu_table(phyloseq), "matrix")
  d = DGEList(counts=t(m), remove.zeros = FALSE)
  z = calcNormFactors(d, method="RLE")
  y <- as.matrix(z)
  lib.size <- z$samples$lib.size * z$samples$norm.factors
  ## rescale to median sample count
  out <- round(count_scale * sweep(y,MARGIN=2, STATS=lib.size,FUN = "/"))
  dimnames(out) <- dimnames(y)
  t(out)
}
otu_table(ps.1RLE) <- otu_table(RLE_normalization(ps.1), taxa_are_rows=FALSE)
ps.1logRLE = transform_sample_counts(ps.1RLE, function(x){ log2(x +1)})

plotOriginal = plot_abundance(ps.1, "Temp", title="original")
plotRelative = plot_abundance(ps.1ra, "Temp", title="relative")
plotHellinger = plot_abundance(ps.1hell, "Temp", title="Hellinger")
plotLogRLE = plot_abundance(ps.1logRLE, "Temp", title="LogRLE")

# Combine each plot into one graphic.
grid.arrange(nrow = 4, plotOriginal, plotRelative, plotHellinger, plotLogRLE)

Normalization and microbial differential abundance strategies depend upon data characteristics

Cleanup

Save object

dir.create("rdata_objects", showWarnings = FALSE)
save(list=c("ps","ps.1","ps.1ra", "ps.1hell", "ps.1logRLE"), file=file.path("rdata_objects", "transformed_objects.RData"))

Get next Rmd

download.file("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2021-May-Microbial-Community-Analysis/master/data_analysis/mca_part5.Rmd", "mca_part5.Rmd")

Record session information

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gridExtra_2.3   edgeR_3.32.1    limma_3.46.0    ggplot2_3.3.3  
##  [5] vegan_2.5-7     lattice_0.20-44 permute_0.9-5   phangorn_2.7.0 
##  [9] ape_5.5         phyloseq_1.34.0
## 
## loaded via a namespace (and not attached):
##  [1] Biobase_2.50.0      sass_0.4.0          jsonlite_1.7.2     
##  [4] splines_4.0.3       foreach_1.5.1       bslib_0.2.5.1      
##  [7] assertthat_0.2.1    highr_0.9           stats4_4.0.3       
## [10] yaml_2.2.1          progress_1.2.2      pillar_1.6.1       
## [13] glue_1.4.2          quadprog_1.5-8      digest_0.6.27      
## [16] XVector_0.30.0      colorspace_2.0-1    htmltools_0.5.1.1  
## [19] Matrix_1.3-3        plyr_1.8.6          pkgconfig_2.0.3    
## [22] zlibbioc_1.36.0     purrr_0.3.4         scales_1.1.1       
## [25] tibble_3.1.2        mgcv_1.8-35         farver_2.1.0       
## [28] generics_0.1.0      IRanges_2.24.1      ellipsis_0.3.2     
## [31] withr_2.4.2         BiocGenerics_0.36.1 survival_3.2-11    
## [34] magrittr_2.0.1      crayon_1.4.1        evaluate_0.14      
## [37] fansi_0.4.2         nlme_3.1-152        MASS_7.3-54        
## [40] tools_4.0.3         data.table_1.14.0   prettyunits_1.1.1  
## [43] hms_1.1.0           lifecycle_1.0.0     stringr_1.4.0      
## [46] Rhdf5lib_1.12.1     S4Vectors_0.28.1    munsell_0.5.0      
## [49] locfit_1.5-9.4      cluster_2.1.2       Biostrings_2.58.0  
## [52] ade4_1.7-16         compiler_4.0.3      jquerylib_0.1.4    
## [55] rlang_0.4.11        rhdf5_2.34.0        grid_4.0.3         
## [58] iterators_1.0.13    rhdf5filters_1.2.1  biomformat_1.18.0  
## [61] igraph_1.2.6        labeling_0.4.2      rmarkdown_2.8      
## [64] gtable_0.3.0        codetools_0.2-18    multtest_2.46.0    
## [67] DBI_1.1.1           reshape2_1.4.4      R6_2.5.0           
## [70] knitr_1.33          dplyr_1.0.6         utf8_1.2.1         
## [73] fastmatch_1.1-0     stringi_1.6.2       parallel_4.0.3     
## [76] Rcpp_1.0.6          vctrs_0.3.8         tidyselect_1.1.1   
## [79] xfun_0.23

☰ Menu

Microbial Community Analysis (amplicons)