Running the dbcAmplicons pipeline

This document assumes that the "dbcAmplicons installing software" and "dbcAmplicons commands" sections have been completed.

If for some reason the install didn't finish, is corrupted, or you missed the session, you can use my instance. In your ~/.bash_profile, edit the lines to use my folders:

export PATH=/share/workshop/msettles/mca_example/bin:$PATH  
module load java/jdk1.8
export RDP_PATH=/share/workshop/msettles/mca_example/src/RDPTools  
module load anaconda2
source /share/workshop/msettles/mca_example/src/dbcA_virtualenv/bin/activate  
export PYTHON_EGG_CACHE=/share/workshop/$USER/mca_example/src  

Let's log in and request an interactive session on the cluster:

cd /share/workshop/$USER/mca_example
srun -t 08:00:00 -c 4 -n 1 --mem 8000 --account workshop --reservation workshop --pty /bin/bash

After getting onto a cluster node, let's initialize the environment we created:

source /share/workshop/$USER/mca_example/src/dbcA_profile

First, let's validate our install and environment

Change directory into the workshop space:

cd
cd /share/workshop/$USER/mca_example
ls --color

You should see 4 directories: bin, Illumina_Reads, metadata, and src.

Let's verify the software is accessible:

dbcVersionReport.sh

You should see the version info for dbcAmplicons, flash2 and RDP.
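If the version report fails, a quick check of which commands are actually on your PATH can narrow down the problem. The following is a minimal sketch, not part of dbcAmplicons: `check_tools` is a hypothetical helper name, and the tool list is simply taken from the commands used in this document.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "MISSING: name" line per missing command and returns
# non-zero if any are missing. (check_tools is a hypothetical helper.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names taken from the commands used in this document.
check_tools dbcAmplicons dbcVersionReport.sh splitReadsBySample.py java \
    || echo "Check your PATH and the environment setup before continuing"
```

No output means every listed command was found.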

If all this is correct, we are ready to begin.

1. First let's validate our input files:

dbcAmplicons validate -h
dbcAmplicons validate -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt

Once we are sure our input files are OK, we can move on to preprocessing.

2. Let's preprocess the data

dbcAmplicons preprocess -h

First, let's test preprocessing:

dbcAmplicons preprocess -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt -1 Illumina_Reads/Slashpile_only_R1.fastq.gz --test > preprocess.log

View preprocess.log and the file Identified_Barcodes.txt, and make sure the results make sense:

cat preprocess.log
cat Identified_Barcodes.txt

Run all reads

dbcAmplicons preprocess -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt -O Slashpile.intermediate -1 Illumina_Reads/Slashpile_only_R1.fastq.gz > preprocess.log

cat preprocess.log
cat Identified_Barcodes.txt
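Because each step redirects its output to a log file, a failed run can go unnoticed until you read the log. One way to surface failures is a small wrapper that redirects stdout to the log and reports the exit status. This is a sketch only: `run_logged` is a hypothetical helper, not a dbcAmplicons command, and it assumes the wrapped program exits non-zero on failure.

```shell
# Run a command with stdout redirected to a log file and report the outcome.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# (run_logged is a hypothetical helper, not part of dbcAmplicons.)
run_logged() {
    log=$1
    shift
    if "$@" > "$log"; then
        echo "OK: $log"
    else
        status=$?
        echo "FAILED (exit $status): see $log" >&2
        return "$status"
    fi
}

# Example, using the full preprocess command from above:
# run_logged preprocess.log dbcAmplicons preprocess \
#     -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt \
#     -S metadata/workshopSamplesheet.txt -O Slashpile.intermediate \
#     -1 Illumina_Reads/Slashpile_only_R1.fastq.gz
```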

3. Let's merge/join the read pairs

dbcAmplicons join -h

dbcAmplicons join -t 4 -O Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3 -1 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R1.fastq.gz > join-16sV1V3.log
cat join-16sV1V3.log

dbcAmplicons join -t 4 -O Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5 -1 Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5_R1.fastq.gz > join-16sV4V5.log
cat join-16sV4V5.log

dbcAmplicons join -t 4 -O Slashpile.intermediate/ITS1/Slashpile-ITS1 -1 Slashpile.intermediate/ITS1/Slashpile-ITS1_R1.fastq.gz  > join-ITS1.log
cat join-ITS1.log

dbcAmplicons join -t 4 -O Slashpile.intermediate/ITS2/Slashpile-ITS2 -1 Slashpile.intermediate/ITS2/Slashpile-ITS2_R1.fastq.gz  > join-ITS2.log
cat join-ITS2.log

dbcAmplicons join -t 4 -O Slashpile.intermediate/LSU/Slashpile-LSU -1 Slashpile.intermediate/LSU/Slashpile-LSU_R1.fastq.gz > join-LSU.log
cat join-LSU.log
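The five join commands above differ only in the amplicon name, so they can be generated rather than typed. The sketch below prints the same commands; `join_commands` is a hypothetical helper, and the project prefix, directory, and amplicon names are the ones used in this document.

```shell
# Print one "dbcAmplicons join" command per amplicon, mirroring the
# commands above. $1 is the project prefix, $2 the intermediate directory,
# and the remaining arguments are amplicon names.
# (join_commands is a hypothetical helper.)
join_commands() {
    project=$1
    outdir=$2
    shift 2
    for amp in "$@"; do
        prefix="$outdir/$amp/$project-$amp"
        echo "dbcAmplicons join -t 4 -O $prefix -1 ${prefix}_R1.fastq.gz > join-$amp.log"
    done
}

join_commands Slashpile Slashpile.intermediate 16sV1V3 16sV4V5 ITS1 ITS2 LSU
```

Review the printed commands first; piping the output to sh would then run them one after another.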

4. Classify the merged reads using RDP

dbcAmplicons classify -h

dbcAmplicons classify -p 4 --gene 16srrna -U Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3.extendedFrags.fastq.gz -O Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3

dbcAmplicons classify -p 4 --gene 16srrna -U Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5.extendedFrags.fastq.gz -O Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5

dbcAmplicons classify -p 4 --gene fungalits_unite -U Slashpile.intermediate/ITS1/Slashpile-ITS1.extendedFrags.fastq.gz -O Slashpile.intermediate/ITS1/Slashpile-ITS1

dbcAmplicons classify -p 4 --gene fungalits_unite -U Slashpile.intermediate/ITS2/Slashpile-ITS2.extendedFrags.fastq.gz -O Slashpile.intermediate/ITS2/Slashpile-ITS2

dbcAmplicons classify -p 4 --gene fungallsu -1 Slashpile.intermediate/LSU/Slashpile-LSU.notCombined_1.fastq.gz -2 Slashpile.intermediate/LSU/Slashpile-LSU.notCombined_2.fastq.gz -O Slashpile.intermediate/LSU/Slashpile-LSU
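The classify commands above follow a pattern: the --gene value depends on the amplicon, and LSU takes the unmerged read pairs (-1/-2) instead of the merged reads (-U). The sketch below encodes that mapping and prints the same five commands. The helper names `gene_for` and `classify_command` are hypothetical; the gene values and file names are copied from the commands above.

```shell
# Map an amplicon name to the --gene value used in the commands above.
# (gene_for and classify_command are hypothetical helpers.)
gene_for() {
    case $1 in
        16s*) echo 16srrna ;;
        ITS*) echo fungalits_unite ;;
        LSU)  echo fungallsu ;;
        *)    echo "unknown amplicon: $1" >&2; return 1 ;;
    esac
}

# Print the classify command for one amplicon.
classify_command() {
    amp=$1
    prefix="Slashpile.intermediate/$amp/Slashpile-$amp"
    gene=$(gene_for "$amp") || return 1
    if [ "$amp" = LSU ]; then
        # LSU is classified from the unmerged pairs, as in the command above.
        echo "dbcAmplicons classify -p 4 --gene $gene -1 ${prefix}.notCombined_1.fastq.gz -2 ${prefix}.notCombined_2.fastq.gz -O $prefix"
    else
        echo "dbcAmplicons classify -p 4 --gene $gene -U ${prefix}.extendedFrags.fastq.gz -O $prefix"
    fi
}

for amp in 16sV1V3 16sV4V5 ITS1 ITS2 LSU; do
    classify_command "$amp"
done
```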

5. Produce Abundance tables

mkdir Slashpile.results

dbcAmplicons abundance -h

dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/16sV1V3 -F Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3.fixrank --biom > abundance.16sV1V3.log
cat abundance.16sV1V3.log

dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/16sV4V5 -F Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5.fixrank --biom  > abundance.16sV4V5.log
cat abundance.16sV4V5.log

dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/ITS1 -F Slashpile.intermediate/ITS1/Slashpile-ITS1.fixrank --biom  > abundance.ITS1.log
cat abundance.ITS1.log

dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/ITS2 -F Slashpile.intermediate/ITS2/Slashpile-ITS2.fixrank  --biom > abundance.ITS2.log
cat abundance.ITS2.log

dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/LSU -F Slashpile.intermediate/LSU/Slashpile-LSU.fixrank --biom > abundance.LSU.log
cat abundance.LSU.log
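Each abundance command reads the .fixrank file written by the classify step, so it can be worth confirming that those files exist and are non-empty before building tables. A minimal sketch follows: `missing_or_empty` is a hypothetical helper, and the file paths are the ones passed to -F above.

```shell
# Print every path that is missing or empty; return non-zero if any were found.
# (missing_or_empty is a hypothetical helper.)
missing_or_empty() {
    bad=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "$f"
            bad=1
        fi
    done
    return "$bad"
}

# Check the classify output expected by each abundance command.
for amp in 16sV1V3 16sV4V5 ITS1 ITS2 LSU; do
    missing_or_empty "Slashpile.intermediate/$amp/Slashpile-$amp.fixrank" \
        || echo "  -> rerun classify for $amp before abundance"
done
```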

6. Split reads by sample, for downstream processing in another application (post preprocessing/merging) or for submission to the SRA.

splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/16sV1V3 -1 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R1.fastq.gz -2 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R2.fastq.gz

splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/16sV4V5 -1 Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5_R1.fastq.gz -2 Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5_R2.fastq.gz

splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/ITS1 -1 Slashpile.intermediate/ITS1/Slashpile-ITS1_R1.fastq.gz -2 Slashpile.intermediate/ITS1/Slashpile-ITS1_R2.fastq.gz

splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/ITS2 -1 Slashpile.intermediate/ITS2/Slashpile-ITS2_R1.fastq.gz -2 Slashpile.intermediate/ITS2/Slashpile-ITS2_R2.fastq.gz

splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/LSU -1 Slashpile.intermediate/LSU/Slashpile-LSU_R1.fastq.gz -2 Slashpile.intermediate/LSU/Slashpile-LSU_R2.fastq.gz
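The split commands above can also be written as a loop with a dry-run switch, so the commands can be inspected before anything is written. This is a sketch under stated assumptions: `split_amplicon` and the DRY_RUN variable are hypothetical names introduced here, and the arguments are exactly those used above.

```shell
# Commands are only printed by default; set DRY_RUN=0 to execute them.
# (DRY_RUN and split_amplicon are hypothetical names, not part of dbcAmplicons.)
DRY_RUN=${DRY_RUN:-1}

split_amplicon() {
    amp=$1
    prefix="Slashpile.intermediate/$amp/Slashpile-$amp"
    # Build the command as positional parameters so it can be printed or run.
    set -- splitReadsBySample.py -O "SplitBySample/$amp" \
        -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

for amp in 16sV1V3 16sV4V5 ITS1 ITS2 LSU; do
    split_amplicon "$amp"
done
```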