ATACseq-cleaning-report
A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2020-12-01, 02:56, based on data in:
/share/workshop/epigenetics_workshop/msettles/atacseq_example/01-HTS_Preproc
HTStream
HTStream quality control and processing pipeline for High Throughput Sequencing data.
Processing Overview
General statistics from the HTStream pipeline.
Preprocessing Statistics
Fragment Reduction
Provides scaled statistics collected at each stage of the preprocessing pipeline, highlighting the statistics that vary across the experiment.
Basepair Reduction
Provides scaled statistics collected at each stage of the preprocessing pipeline, highlighting the statistics that vary across the experiment.
hts_Stats
Generates a JSON formatted file containing a set of statistical measures about the input read data.
Sample Name | % PE | % R1 Q30 | % R2 Q30 | GC Content | N Content | Notes |
---|---|---|---|---|---|---|
JLAC003A_htsStats | 100.00% | 81.20% | 79.87% | 43.21% | 0.0948% | initial Stats |
JLAC003B_htsStats | 100.00% | 81.25% | 78.59% | 43.49% | 0.1002% | initial Stats |
JLAC003C_htsStats | 100.00% | 82.26% | 78.21% | 43.69% | 0.1001% | initial Stats |
JLAC004D_htsStats | 100.00% | 89.27% | 87.50% | 43.81% | 0.1366% | initial Stats |
JLAC004E_htsStats | 100.00% | 89.53% | 88.03% | 44.05% | 0.1355% | initial Stats |
JLAC004F_htsStats | 100.00% | 88.64% | 87.50% | 43.99% | 0.1374% | initial Stats |
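The Q30, GC, and N percentages in the table above can be illustrated with a small sketch. This is not the hts_Stats implementation: the function name `read_stats`, the Phred+33 quality encoding, and the choice of "all bases" as the denominator for every percentage are assumptions made for illustration.

```python
def read_stats(reads):
    """Summarise (sequence, quality) pairs; quality is assumed to be Phred+33."""
    total_bases = q30_bases = gc_bases = n_bases = 0
    for seq, qual in reads:
        seq = seq.upper()
        total_bases += len(seq)
        # Phred+33: subtract 33 from the ASCII code to get the quality score.
        q30_bases += sum(1 for ch in qual if ord(ch) - 33 >= 30)
        gc_bases += sum(1 for base in seq if base in "GC")
        n_bases += seq.count("N")
    if total_bases == 0:
        return {"pct_q30": 0.0, "pct_gc": 0.0, "pct_n": 0.0}
    # Assumption: every percentage is taken over all bases, N's included.
    return {
        "pct_q30": 100.0 * q30_bases / total_bases,
        "pct_gc": 100.0 * gc_bases / total_bases,
        "pct_n": 100.0 * n_bases / total_bases,
    }
```

Running it separately on the R1 and R2 reads of a sample would give per-read figures comparable in spirit to the "% R1 Q30" and "% R2 Q30" columns.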
Read Lengths: Paired End
Distribution of read lengths for each sample.
Sample Name | R1 Read Lengths | R2 Read Lengths |
---|---|---|
JLAC003A_htsStats | 50 | 50 |
JLAC003B_htsStats | 50 | 50 |
JLAC003C_htsStats | 50 | 50 |
JLAC004D_htsStats | 50 | 50 |
JLAC004E_htsStats | 50 | 50 |
JLAC004F_htsStats | 50 | 50 |
Base by Cycle: Paired End
Provides a measure of the uniformity of a distribution. The higher the average is at a certain position, the more unequal the base pair composition. N's are excluded from this calculation.
Quality by Cycle: Paired End
Mean quality score for each position along the read. A sample is colored red if less than 60% of base pairs have a mean score of at least Q30, orange if between 60% and 80%, and green otherwise.
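The coloring rule described above can be written as a short function. This is a sketch only: the name `quality_color` is hypothetical, and treating exactly 80% as orange is an assumption, since the description does not say which side the boundary falls on.

```python
def quality_color(mean_quals, q_threshold=30.0):
    """Classify a sample from its per-position mean quality scores."""
    if not mean_quals:
        raise ValueError("at least one position is required")
    # Fraction of positions whose mean quality reaches the threshold (Q30).
    passing = sum(1 for q in mean_quals if q >= q_threshold)
    fraction = passing / len(mean_quals)
    if fraction < 0.60:
        return "red"
    if fraction <= 0.80:  # assumption: the 80% boundary counts as orange
        return "orange"
    return "green"
```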
hts_SeqScreener
A simple sequence screening tool which uses a kmer lookup approach to identify reads from an unwanted source.
Sample Name | PE hits | % PE Hits | Notes |
---|---|---|---|
JLAC003A_htsStats | 11 | 0.0000% | PhiX check |
JLAC003B_htsStats | 15 | 0.0000% | PhiX check |
JLAC003C_htsStats | 17 | 0.0000% | PhiX check |
JLAC004D_htsStats | 16 | 0.0000% | PhiX check |
JLAC004E_htsStats | 14 | 0.0000% | PhiX check |
JLAC004F_htsStats | 20 | 0.0000% | PhiX check |
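A kmer lookup screen of the kind described above might look roughly like the following. It is a minimal sketch, not the hts_SeqScreener code: the function names, the kmer size, and the `min_fraction` hit threshold are all assumptions chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_kmer_set(reference, k):
    """Collect every kmer from both strands of the screening reference."""
    kmers = set()
    for strand in (reference.upper(), reverse_complement(reference)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def is_hit(read, kmers, k, min_fraction=0.25):
    """Flag a read when enough of its kmers are found in the lookup set."""
    read = read.upper()
    n_kmers = len(read) - k + 1
    if n_kmers <= 0:
        return False  # read is shorter than one kmer
    matches = sum(1 for i in range(n_kmers) if read[i:i + k] in kmers)
    return matches / n_kmers >= min_fraction
```

For a PhiX check, `reference` would be the PhiX genome sequence, and a pair would typically be counted as a hit when either read is flagged; that pairing rule is likewise an assumption.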
hts_SuperDeduper
A reference-free tool for removing duplicate reads.
Sample Name | % Duplicates | % Ignored | Notes |
---|---|---|---|
JLAC003A_htsStats | 23.48% | 3.10% | Remove PCR duplicates |
JLAC003B_htsStats | 23.32% | 3.41% | Remove PCR duplicates |
JLAC003C_htsStats | 18.81% | 3.06% | Remove PCR duplicates |
JLAC004D_htsStats | 25.64% | 1.69% | Remove PCR duplicates |
JLAC004E_htsStats | 19.02% | 1.56% | Remove PCR duplicates |
JLAC004F_htsStats | 16.59% | 1.79% | Remove PCR duplicates |
SuperDeduper: Duplicate Saturation
Plots the number of duplicates against the number of unique reads per sample.
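Reference-free duplicate removal and the saturation curve can be sketched together. The approach below keys each pair on a short subsequence of both reads and keeps the first pair seen per key. It is an illustration under stated assumptions, not the hts_SuperDeduper algorithm: the key position and length, the rule that pairs with an N in the key or too short to key are "ignored", and the choice to leave ignored pairs out of the returned list are all hypothetical.

```python
def deduplicate_pairs(pairs, start=0, length=10):
    """Keep the first read pair seen for each (R1 key, R2 key) combination."""
    seen = set()
    unique = []
    duplicates = 0
    ignored = 0
    saturation = []  # (unique pairs so far, duplicates so far) after each keyed pair
    for r1, r2 in pairs:
        key1 = r1[start:start + length].upper()
        key2 = r2[start:start + length].upper()
        # Assumption: pairs that cannot be keyed reliably are only counted.
        if len(key1) < length or len(key2) < length or "N" in key1 + key2:
            ignored += 1
            continue
        key = (key1, key2)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
            unique.append((r1, r2))
        saturation.append((len(unique), duplicates))
    return unique, duplicates, ignored, saturation
```

Plotting the second element of each `saturation` tuple against the first gives a curve of duplicates versus unique reads.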
hts_AdapterTrimmer
Trims adapters which are sequenced when the fragment insert length is shorter than the read length.
Sample Name | % Bp Lost | % Adapters | Avg. Bps Trimmed | Notes |
---|---|---|---|---|
JLAC003A_htsStats | 3.38% | 39.39% | 8.57 | Overlap and remove adapters |
JLAC003B_htsStats | 3.43% | 38.66% | 8.87 | Overlap and remove adapters |
JLAC003C_htsStats | 4.90% | 52.01% | 9.43 | Overlap and remove adapters |
JLAC004D_htsStats | 3.55% | 40.64% | 8.73 | Overlap and remove adapters |
JLAC004E_htsStats | 4.41% | 47.85% | 9.21 | Overlap and remove adapters |
JLAC004F_htsStats | 5.69% | 60.63% | 9.39 | Overlap and remove adapters |
AdapterTrimmer: Trimmed Basepairs Composition
Composition of basepairs trimmed from the ends of paired end and single end reads.
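When the insert is shorter than the read, R1 and R2 both read through the whole fragment and into adapter sequence, so the first F bases of R1 equal the reverse complement of the first F bases of R2, where F is the fragment length. The sketch below uses that property to find F and trim both reads. It assumes exact matching and a hypothetical `min_overlap` parameter; a real trimmer such as hts_AdapterTrimmer would be expected to tolerate mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_adapters_by_overlap(r1, r2, min_overlap=8):
    """Trim both reads to the fragment length when they fully overlap."""
    longest = min(len(r1), len(r2))
    # Try the longest candidate fragment length first.
    for frag_len in range(longest, min_overlap - 1, -1):
        if r1[:frag_len].upper() == reverse_complement(r2[:frag_len]):
            # Everything past frag_len in either read is adapter read-through.
            return r1[:frag_len], r2[:frag_len]
    return r1, r2  # no full overlap found, leave the pair untouched
```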
hts_NTrimmer
Trims reads to the longest subsequence that contains no N's.
Sample Name | % Bp Lost | % R1 of Bp Lost | % R2 of Bp Lost | Avg. Bps Trimmed | % Discarded | Notes |
---|---|---|---|---|---|---|
JLAC003A_htsStats | 0.08% | 11.30% | 88.70% | 0.08 | 0.00% | Remove all Ns |
JLAC003B_htsStats | 0.09% | 10.96% | 89.04% | 0.08 | 0.00% | Remove all Ns |
JLAC003C_htsStats | 0.08% | 11.35% | 88.65% | 0.08 | 0.00% | Remove all Ns |
JLAC004D_htsStats | 0.07% | 5.33% | 94.67% | 0.07 | 0.00% | Remove all Ns |
JLAC004E_htsStats | 0.08% | 5.38% | 94.62% | 0.07 | 0.00% | Remove all Ns |
JLAC004F_htsStats | 0.07% | 5.29% | 94.71% | 0.07 | 0.00% | Remove all Ns |
NTrimmer: Trimmed Basepairs Composition
Plots the number of N bases trimmed from ends of paired end and single end reads.
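Trimming a read to its longest N-free subsequence can be sketched as a single pass over the sequence. The function names are hypothetical, and keeping the first of several equally long runs is an assumption.

```python
def longest_n_free_span(seq):
    """Return (start, end) of the longest run without N, end exclusive."""
    best_start = best_end = 0
    run_start = 0
    # A sentinel N closes the final run so it is compared like the others.
    for i, base in enumerate(seq.upper() + "N"):
        if base == "N":
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = i + 1
    return best_start, best_end


def trim_n(seq, qual):
    """Trim sequence and quality string to the longest N-free run."""
    start, end = longest_n_free_span(seq)
    return seq[start:end], qual[start:end]
```

A read made up entirely of N's comes back empty, which a later length filter would then discard.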
hts_QWindowTrim
Uses a sliding window approach to remove the low quality ends of reads.
Sample Name | % Bp Lost | % R1 of Bp Lost | % R2 of Bp Lost | Avg. Bps Trimmed | Notes |
---|---|---|---|---|---|
JLAC003A_htsStats | 4.81% | 36.06% | 63.94% | 4.65 | Quality trim |
JLAC003B_htsStats | 5.35% | 33.01% | 66.99% | 5.17 | Quality trim |
JLAC003C_htsStats | 4.79% | 30.78% | 69.22% | 4.55 | Quality trim |
JLAC004D_htsStats | 3.07% | 31.95% | 68.05% | 2.96 | Quality trim |
JLAC004E_htsStats | 2.65% | 32.62% | 67.38% | 2.53 | Quality trim |
JLAC004F_htsStats | 2.46% | 34.54% | 65.46% | 2.32 | Quality trim |
QWindowTrim: Trimmed Basepairs Composition
Plots the number of low quality basepairs trimmed from ends of paired end and single end reads.
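A sliding window quality trim can be sketched as follows. This is one plausible variant, not the hts_QWindowTrim implementation: the window size, the minimum average quality, the Phred+33 encoding, and the extra step that drops individual low-quality bases at the edge of the first passing window are all assumptions.

```python
def qwindow_trim(seq, qual, window=10, min_avg_q=20.0):
    """Trim low-quality ends using a sliding window over Phred+33 scores."""
    scores = [ord(ch) - 33 for ch in qual]
    n = len(scores)

    # 5' end: slide right until a window reaches the minimum average.
    start = 0
    while start < n:
        win = scores[start:start + window]
        if sum(win) / len(win) >= min_avg_q:
            break
        start += 1
    # Drop any low-quality bases left at the edge of the passing window.
    while start < n and scores[start] < min_avg_q:
        start += 1

    # 3' end: same idea, sliding left and never crossing the 5' cut.
    end = n
    while end > start:
        win = scores[max(start, end - window):end]
        if sum(win) / len(win) >= min_avg_q:
            break
        end -= 1
    while end > start and scores[end - 1] < min_avg_q:
        end -= 1

    return seq[start:end], qual[start:end]
```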
hts_LengthFilter
Discards reads below a minimum length threshold.
Sample Name | % PE Lost | Notes |
---|---|---|
JLAC003A_htsStats | 36.80% | Remove too short |
JLAC003B_htsStats | 37.51% | Remove too short |
JLAC003C_htsStats | 42.16% | Remove too short |
JLAC004D_htsStats | 31.60% | Remove too short |
JLAC004E_htsStats | 33.96% | Remove too short |
JLAC004F_htsStats | 39.39% | Remove too short |
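The length filter and its "% PE Lost" figure amount to a simple count. In the sketch below, the function name, the default `min_length` of 50, and the rule that a pair is dropped when either read is too short are assumptions; the report does not state the threshold used.

```python
def length_filter(pairs, min_length=50):
    """Drop read pairs in which either read is shorter than min_length."""
    kept = [
        (r1, r2)
        for r1, r2 in pairs
        if len(r1) >= min_length and len(r2) >= min_length
    ]
    # Percentage of input pairs removed, comparable to "% PE Lost".
    pct_lost = 100.0 * (len(pairs) - len(kept)) / len(pairs) if pairs else 0.0
    return kept, pct_lost
```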
hts_Stats 2
Generates a JSON formatted file containing a set of statistical measures about the input read data.
Sample Name | % PE | % R1 Q30 | % R2 Q30 | GC Content | N Content | Notes |
---|---|---|---|---|---|---|
JLAC003A_htsStats | 100.00% | 94.99% | 94.29% | 43.35% | 0.0000% | end Stats |
JLAC003B_htsStats | 100.00% | 95.21% | 93.93% | 43.65% | 0.0000% | end Stats |
JLAC003C_htsStats | 100.00% | 95.65% | 93.97% | 43.42% | 0.0000% | end Stats |
JLAC004D_htsStats | 100.00% | 97.20% | 96.42% | 44.13% | 0.0000% | end Stats |
JLAC004E_htsStats | 100.00% | 97.28% | 96.61% | 43.98% | 0.0000% | end Stats |
JLAC004F_htsStats | 100.00% | 97.27% | 96.74% | 43.47% | 0.0000% | end Stats |
Read Lengths: Paired End
Distribution of read lengths for each sample.
Sample Name | R1 Read Lengths | R2 Read Lengths |
---|---|---|
JLAC003A_htsStats | 50 | 50 |
JLAC003B_htsStats | 50 | 50 |
JLAC003C_htsStats | 50 | 50 |
JLAC004D_htsStats | 50 | 50 |
JLAC004E_htsStats | 50 | 50 |
JLAC004F_htsStats | 50 | 50 |
Base by Cycle: Paired End
Provides a measure of the uniformity of a distribution. The higher the average is at a certain position, the more unequal the base pair composition. N's are excluded from this calculation.
Quality by Cycle: Paired End
Mean quality score for each position along the read. A sample is colored red if less than 60% of base pairs have a mean score of at least Q30, orange if between 60% and 80%, and green otherwise.