Running the dbcAmplicons pipeline
ALL of this should only be done in an interactive session on the cluster
srun -t 1-0 -c 2 -n 1 --mem 10000 --reservation workshop --pty /bin/bash
1. First, let's validate our install and environment
Change directory into the workshop space
cd
cd mca_example
ls --color
You should see 4 directories: bin, Illumina_Reads, metadata, and src.
Let's verify the software is accessible
dbcVersionReport.sh
You should see the version info for dbcAmplicons, flash2, and RDP.
If all this is correct, we are ready to begin.
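The two checks above can also be scripted. This is only a minimal sketch, assuming it is run from inside mca_example; the helper names `check_dirs` and `have_cmd` are made up for this example and are not part of dbcAmplicons.

```shell
# Hypothetical helper: report any of the four expected directories missing under $1.
check_dirs() {
    missing=0
    for d in bin Illumina_Reads metadata src; do
        if [ ! -d "$1/$d" ]; then
            echo "missing directory: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: succeed if a command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

check_dirs . || echo "not in mca_example, or the setup is incomplete"
for tool in dbcAmplicons dbcVersionReport.sh; do
    have_cmd "$tool" || echo "not on PATH: $tool"
done
```

If nothing is printed, the directories and both commands were found.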
1. First, let's validate our input files:
dbcAmplicons validate
dbcAmplicons validate -h
dbcAmplicons validate -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt
Once we are sure our input files are OK, we can move on to preprocessing.
2. Let's preprocess the data
dbcAmplicons preprocess
dbcAmplicons preprocess -h
First, let's test preprocessing
dbcAmplicons preprocess -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt -1 Illumina_Reads/Slashpile_only_R1.fastq.gz --test > preprocess.log
View preprocess.log and the file Identified_Barcodes.txt, and make sure the results make sense
cat preprocess.log
cat Identified_Barcodes.txt
Now run all reads
dbcAmplicons preprocess -B metadata/dbcBarcodeLookupTable.txt -P metadata/PrimerTable.txt -S metadata/workshopSamplesheet.txt -O Slashpile.intermediate -1 Illumina_Reads/Slashpile_only_R1.fastq.gz > preprocess.log
cat preprocess.log
cat Identified_Barcodes.txt
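One way to check that the results make sense is to compare read counts before and after preprocessing. The following is a minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper for this example, not a dbcAmplicons command.

```shell
# Hypothetical helper: count the records in a gzipped FASTQ file,
# assuming the usual four lines per record.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Example: number of reads in the raw R1 file, if it is present.
input=Illumina_Reads/Slashpile_only_R1.fastq.gz
if [ -f "$input" ]; then
    echo "input reads: $(count_reads "$input")"
fi
```

The same function can be pointed at a preprocessed file such as Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R1.fastq.gz to see how many reads were assigned to that amplicon.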
3. Let's merge/join the read pairs
dbcAmplicons join
dbcAmplicons join -h
dbcAmplicons join -t 2 -O Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3 -1 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R1.fastq.gz > join-16sV1V3.log
cat join-16sV1V3.log
dbcAmplicons join -t 2 -O Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5 -1 Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5_R1.fastq.gz > join-16sV4V5.log
cat join-16sV4V5.log
dbcAmplicons join -t 2 -O Slashpile.intermediate/ITS1/Slashpile-ITS1 -1 Slashpile.intermediate/ITS1/Slashpile-ITS1_R1.fastq.gz > join-ITS1.log
cat join-ITS1.log
dbcAmplicons join -t 2 -O Slashpile.intermediate/ITS2/Slashpile-ITS2 -1 Slashpile.intermediate/ITS2/Slashpile-ITS2_R1.fastq.gz > join-ITS2.log
cat join-ITS2.log
dbcAmplicons join -t 2 -O Slashpile.intermediate/LSU/Slashpile-LSU -1 Slashpile.intermediate/LSU/Slashpile-LSU_R1.fastq.gz > join-LSU.log
cat join-LSU.log
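The five join commands above differ only in the amplicon name, so they can be generated in a loop. This is a sketch rather than part of the workshop: the `join_cmd` function is hypothetical, and it only prints each command (a dry run) so it can be reviewed first.

```shell
# Amplicon names taken from the commands above.
amplicons="16sV1V3 16sV4V5 ITS1 ITS2 LSU"

# Hypothetical helper: print the join command for one amplicon.
# Nothing is executed here; this is a dry run.
join_cmd() {
    prefix="Slashpile.intermediate/$1/Slashpile-$1"
    echo "dbcAmplicons join -t 2 -O $prefix -1 ${prefix}_R1.fastq.gz > join-$1.log"
}

for a in $amplicons; do
    join_cmd "$a"
done
```

Once the printed commands look right, they can be run by piping the loop's output to `sh`.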
4. Classify the merged reads using RDP
dbcAmplicons classify
dbcAmplicons classify -h
dbcAmplicons classify -p 2 --gene 16srrna -U Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3.extendedFrags.fastq.gz -O Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3
dbcAmplicons classify -p 2 --gene 16srrna -U Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5.extendedFrags.fastq.gz -O Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5
dbcAmplicons classify -p 2 --gene fungalits_unite -U Slashpile.intermediate/ITS1/Slashpile-ITS1.extendedFrags.fastq.gz -O Slashpile.intermediate/ITS1/Slashpile-ITS1
dbcAmplicons classify -p 2 --gene fungalits_unite -U Slashpile.intermediate/ITS2/Slashpile-ITS2.extendedFrags.fastq.gz -O Slashpile.intermediate/ITS2/Slashpile-ITS2
dbcAmplicons classify -p 2 --gene fungallsu -1 Slashpile.intermediate/LSU/Slashpile-LSU.notCombined_1.fastq.gz -2 Slashpile.intermediate/LSU/Slashpile-LSU.notCombined_2.fastq.gz -O Slashpile.intermediate/LSU/Slashpile-LSU
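The classify commands above follow a pattern: the `--gene` value depends on the amplicon, and LSU is classified from the unmerged pairs (`-1`/`-2`) rather than the merged reads (`-U`). The sketch below captures that mapping; `classify_cmd` is a hypothetical helper that only prints the command, and the name patterns in the `case` are an assumption based on the five amplicons in this workshop.

```shell
# Hypothetical helper: print the classify command for one amplicon,
# mirroring the five commands above. Nothing is executed (dry run).
classify_cmd() {
    prefix="Slashpile.intermediate/$1/Slashpile-$1"
    merged="-U ${prefix}.extendedFrags.fastq.gz"
    case "$1" in
        16s*) gene=16srrna;         reads=$merged ;;
        ITS*) gene=fungalits_unite; reads=$merged ;;
        LSU)  gene=fungallsu
              reads="-1 ${prefix}.notCombined_1.fastq.gz -2 ${prefix}.notCombined_2.fastq.gz" ;;
        *)    echo "unknown amplicon: $1" >&2; return 1 ;;
    esac
    echo "dbcAmplicons classify -p 2 --gene $gene $reads -O $prefix"
}

for a in 16sV1V3 16sV4V5 ITS1 ITS2 LSU; do
    classify_cmd "$a"
done
```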
5. Produce abundance tables
mkdir Slashpile.results
dbcAmplicons abundance -h
dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/16sV1V3 -F Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3.fixrank --biom > abundance.16sV1V3.log
cat abundance.16sV1V3.log
dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/16sV4V5 -F Slashpile.intermediate/16sV4V5/Slashpile-16sV4V5.fixrank --biom > abundance.16sV4V5.log
cat abundance.16sV4V5.log
dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/ITS1 -F Slashpile.intermediate/ITS1/Slashpile-ITS1.fixrank --biom > abundance.ITS1.log
cat abundance.ITS1.log
dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/ITS2 -F Slashpile.intermediate/ITS2/Slashpile-ITS2.fixrank --biom > abundance.ITS2.log
cat abundance.ITS2.log
dbcAmplicons abundance -S metadata/workshopSamplesheet.txt -O Slashpile.results/LSU -F Slashpile.intermediate/LSU/Slashpile-LSU.fixrank --biom > abundance.LSU.log
cat abundance.LSU.log
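Each abundance command above reads a .fixrank file written by the classify step, so it can help to confirm those files exist before running it. A minimal sketch, assuming the file layout used in the commands above; `missing_fixrank` is a hypothetical helper, not part of dbcAmplicons.

```shell
# Hypothetical helper: list amplicons under directory $1 that have no .fixrank file.
missing_fixrank() {
    for a in 16sV1V3 16sV4V5 ITS1 ITS2 LSU; do
        [ -f "$1/$a/Slashpile-$a.fixrank" ] || echo "$a"
    done
}

missing=$(missing_fixrank Slashpile.intermediate)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "no classifier output yet for:" $missing
fi
```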
6. Split reads by sample, either for downstream processing in another application (after preprocessing/merging) or for submission to the SRA.
splitReadsBySample.py -h
splitReadsBySample.py -O SplitBySample/16sV1V3 -1 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R1.fastq.gz -2 Slashpile.intermediate/16sV1V3/Slashpile-16sV1V3_R2.fastq.gz
FYI: preprocessing pairs with inline barcodes (Mills lab protocol)
preproc_Pair_with_inlineBC.py -h