mca-cleaning
A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2021-05-17, 06:35, based on data in:
/share/workshop/mca_workshop/msettles/01-HTS_Preproc
HTStream
HTStream quality control and processing pipeline for High Throughput Sequencing data.
Processing Overview
General statistics from the HTStream pipeline.
Preprocessing Statistics
Fragment Reduction
Provides scaled statistics collected throughout the preprocessing pipeline, highlighting statistics that vary across the experiment.
Basepair Reduction
Provides scaled statistics collected throughout the preprocessing pipeline, highlighting statistics that vary across the experiment.
hts_Stats
Generates a JSON formatted file containing a set of statistical measures about the input read data.
Sample Name | % PE | % R1 Q30 | % R2 Q30 | GC Content | N Content | Notes |
---|---|---|---|---|---|---|
Bs1_2C_A0 | 100.00% | 87.69% | 64.28% | 54.76% | 0.0000% | Initial Stats |
Bs1_2C_A5 | 100.00% | 87.96% | 64.30% | 54.87% | 0.0000% | Initial Stats |
Bs1_2C_B0 | 100.00% | 87.86% | 64.19% | 54.78% | 0.0000% | Initial Stats |
Bs1_2C_B5 | 100.00% | 87.73% | 60.73% | 54.77% | 0.0000% | Initial Stats |
Bs1_2C_C0 | 100.00% | 88.03% | 64.96% | 54.70% | 0.0000% | Initial Stats |
Bs1_2C_C5 | 100.00% | 87.94% | 64.88% | 54.62% | 0.0000% | Initial Stats |
Bs1_2C_D0 | 100.00% | 87.99% | 63.37% | 54.84% | 0.0000% | Initial Stats |
Bs1_2C_D5 | 100.00% | 87.97% | 65.61% | 54.65% | 0.0000% | Initial Stats |
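As a rough illustration of how columns like % Q30, GC Content, and N Content could be derived from raw reads, here is a minimal Python sketch. It is not HTStream's implementation: the function name `read_stats`, the Phred+33 quality encoding, and the choice to divide by the total base count (N's included) are all assumptions made for the example.

```python
def read_stats(records, phred_offset=33, q_threshold=30):
    """Summarize (sequence, quality) pairs as percentages.

    Assumptions (not taken from the report): qualities are Phred+33
    encoded, and every percentage is relative to the total number of
    bases, N's included.
    """
    total = gc = n_bases = q_ok = 0
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        upper = seq.upper()
        total += len(upper)
        gc += upper.count("G") + upper.count("C")
        n_bases += upper.count("N")
        # Count bases whose decoded quality is at least the threshold.
        q_ok += sum((ord(c) - phred_offset) >= q_threshold for c in qual)
    if total == 0:
        raise ValueError("no bases to summarize")
    return {
        "pct_q30": 100.0 * q_ok / total,
        "gc_content": 100.0 * gc / total,
        "n_content": 100.0 * n_bases / total,
    }


# Hypothetical toy input: 'I' decodes to Q40 and '#' to Q2 under Phred+33.
example = [("ACGT", "IIII"), ("GGNN", "II##")]
stats = read_stats(example)
```

Running the same function separately on the R1 and R2 reads would give per-read values comparable in spirit to the % R1 Q30 and % R2 Q30 columns.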
Read Lengths: Paired End
Distribution of read lengths for each sample.
Sample Name | R1 Read Lengths | R2 Read Lengths |
---|---|---|
Bs1_2C_A0 | 301 | 301 |
Bs1_2C_A5 | 301 | 301 |
Bs1_2C_B0 | 301 | 301 |
Bs1_2C_B5 | 301 | 301 |
Bs1_2C_C0 | 301 | 301 |
Bs1_2C_C5 | 301 | 301 |
Bs1_2C_D0 | 301 | 301 |
Bs1_2C_D5 | 301 | 301 |
Base by Cycle: Paired End
Provides a measure of how uniform the base composition is at each read position. The higher the average is at a certain position, the more unequal the base pair composition. N's are excluded from this calculation.
Quality by Cycle: Paired End
Mean quality score for each position along the read. A sample is colored red if fewer than 60% of bps have a mean score of at least Q30, orange if between 60% and 80%, and green otherwise.
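The coloring rule described above can be sketched as a small Python function. This is an illustration only: the name `q30_color` is hypothetical, and the handling of the exact 60% and 80% boundaries (both treated as orange here) is an assumption, since the description does not specify it.

```python
def q30_color(mean_quals, q_threshold=30, low=0.60, high=0.80):
    """Classify a sample from its per-position mean quality scores.

    Boundary handling is an assumption: exactly 60% or 80% counts as orange.
    """
    if not mean_quals:
        raise ValueError("no positions to classify")
    # Fraction of positions whose mean quality reaches the threshold.
    frac = sum(q >= q_threshold for q in mean_quals) / len(mean_quals)
    if frac < low:
        return "red"
    if frac <= high:
        return "orange"
    return "green"


# Hypothetical per-position means for three toy samples.
good = q30_color([35] * 10)               # all positions at Q35
middling = q30_color([35] * 7 + [20] * 3)  # 70% of positions at or above Q30
poor = q30_color([35] * 5 + [20] * 5)      # 50% of positions at or above Q30
```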
hts_Overlapper
Attempts to overlap paired end reads to produce the original fragment, trims adapters, and can correct sequencing errors.
Sample Name | % Overlapped | Notes |
---|---|---|
Bs1_2C_A0 | 98.28% | Overlap reads |
Bs1_2C_A5 | 98.20% | Overlap reads |
Bs1_2C_B0 | 98.39% | Overlap reads |
Bs1_2C_B5 | 97.76% | Overlap reads |
Bs1_2C_C0 | 98.56% | Overlap reads |
Bs1_2C_C5 | 98.34% | Overlap reads |
Bs1_2C_D0 | 98.16% | Overlap reads |
Bs1_2C_D5 | 98.52% | Overlap reads |
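To make the idea of overlapping paired end reads concrete, here is a deliberately naive Python sketch. It is not hts_Overlapper's algorithm: it only handles the case where the fragment is longer than each read, it does no adapter trimming or error correction, and the names `overlap_pair`, `min_overlap`, and `max_mismatches` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_pair(r1, r2, min_overlap=8, max_mismatches=0):
    """Merge R1 with the reverse complement of R2 if their ends overlap.

    Tries the longest candidate overlap first and returns the merged
    fragment for the first overlap with few enough mismatches, or None
    if no acceptable overlap exists. Overlapping bases are taken from R1.
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    for k in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-k:], r2_rc[:k]))
        if mismatches <= max_mismatches:
            return r1 + r2_rc[k:]
    return None


# Hypothetical 32 bp fragment read from both ends with 20 bp reads,
# leaving an 8 bp overlap in the middle.
fragment = "ACGTTGCAAGGCTTAACCGGTAGCTAGGATCC"
r1 = fragment[:20]
r2 = reverse_complement(fragment[12:])
merged = overlap_pair(r1, r2)
```

A pair for which the function returns a merged sequence would count toward a figure like % Overlapped; pairs returning None would stay paired.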
Overlapper: Overlap Composition
Plots the quantities of insert types for each sample.
Overlapper: Overlapped Lengths
Plots the lengths of paired end read overlaps.
hts_Primers
Identifies primer sequences located on the 5' ends of R1 and R2, or 5' and 3' end of SE reads.
Sample Name | % Bp Lost | Reads Flipped | Notes |
---|---|---|---|
Bs1_2C_A0 | 16.7242% | 49729 | Single set V3V4 primers |
Bs1_2C_A5 | 16.9455% | 47539 | Single set V3V4 primers |
Bs1_2C_B0 | 16.7234% | 44384 | Single set V3V4 primers |
Bs1_2C_B5 | 17.3585% | 34183 | Single set V3V4 primers |
Bs1_2C_C0 | 16.6489% | 31868 | Single set V3V4 primers |
Bs1_2C_C5 | 16.8561% | 27089 | Single set V3V4 primers |
Bs1_2C_D0 | 16.9417% | 40076 | Single set V3V4 primers |
Bs1_2C_D5 | 16.6015% | 29558 | Single set V3V4 primers |
Primers: Primer Counts
Heatmap indicating abundance of primer combinations.
hts_NTrimmer
Trims reads to the longest subsequence that contains no N's.
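The trimming rule above can be sketched in a few lines of Python. This is an illustrative version rather than HTStream's code: the function name `trim_longest_n_free` is hypothetical, and keeping the first run when two N-free runs tie in length is an assumption.

```python
def trim_longest_n_free(seq, qual=None):
    """Return the longest stretch of seq without N's, plus matching qualities.

    On ties the earliest run is kept (an assumption for this sketch).
    """
    best_start = best_len = 0
    run_start = 0
    # A trailing sentinel "N" closes the final run without special casing.
    for i, base in enumerate(seq + "N"):
        if base in "Nn":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    end = best_start + best_len
    trimmed_qual = qual[best_start:end] if qual is not None else None
    return seq[best_start:end], trimmed_qual


# Hypothetical read with a single N splitting it into runs of 4 and 6 bases.
trimmed, trimmed_q = trim_longest_n_free("ACGTNACGTAC", "ABCDEFGHIJK")
```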
hts_LengthFilter
Discards reads below a minimum length threshold; the Notes column below indicates that a 100 - 400 length range was applied in this run.
Sample Name | % SE Lost | Notes |
---|---|---|
Bs1_2C_A0 | 0.10% | Filter sequences 100 - 400 |
Bs1_2C_A5 | 0.04% | Filter sequences 100 - 400 |
Bs1_2C_B0 | 0.05% | Filter sequences 100 - 400 |
Bs1_2C_B5 | 0.05% | Filter sequences 100 - 400 |
Bs1_2C_C0 | 0.08% | Filter sequences 100 - 400 |
Bs1_2C_C5 | 0.03% | Filter sequences 100 - 400 |
Bs1_2C_D0 | 0.05% | Filter sequences 100 - 400 |
Bs1_2C_D5 | 0.05% | Filter sequences 100 - 400 |
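A length filter of this kind, and a figure like % SE Lost, can be sketched as follows. The sketch is an assumption-laden illustration, not HTStream's implementation: the name `length_filter` is hypothetical, and treating both ends of the 100 - 400 range as inclusive is a guess.

```python
def length_filter(reads, min_len=100, max_len=400):
    """Keep reads whose length lies in [min_len, max_len].

    Returns the kept reads and the percentage of reads discarded.
    Inclusive bounds are an assumption for this sketch.
    """
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    pct_lost = 100.0 * (len(reads) - len(kept)) / len(reads) if reads else 0.0
    return kept, pct_lost


# Hypothetical reads of length 50, 100, 250, 400 and 401.
toy_reads = ["A" * n for n in (50, 100, 250, 400, 401)]
kept_reads, pct_lost = length_filter(toy_reads)
```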
hts_Stats 2
Generates a JSON formatted file containing a set of statistical measures about the input read data.
Sample Name | % PE | % SE | % R1 Q30 | % R2 Q30 | % SE Q30 | GC Content | N Content | Notes |
---|---|---|---|---|---|---|---|---|
Bs1_2C_A0 | 0.88% | 99.12% | 36.71% | 42.82% | 97.92% | 54.60% | 0.0000% | |
Bs1_2C_A5 | 0.87% | 99.13% | 34.85% | 42.70% | 97.97% | 54.75% | 0.0000% | |
Bs1_2C_B0 | 0.80% | 99.20% | 35.27% | 42.14% | 97.93% | 54.64% | 0.0000% | |
Bs1_2C_B5 | 1.05% | 98.95% | 37.90% | 40.36% | 97.50% | 54.58% | 0.0000% | |
Bs1_2C_C0 | 0.76% | 99.24% | 38.88% | 41.89% | 98.04% | 54.54% | 0.0000% | |
Bs1_2C_C5 | 0.76% | 99.24% | 34.91% | 43.30% | 98.02% | 54.46% | 0.0000% | |
Bs1_2C_D0 | 0.92% | 99.08% | 35.28% | 42.57% | 97.83% | 54.69% | 0.0000% | |
Bs1_2C_D5 | 0.73% | 99.27% | 34.70% | 43.26% | 98.09% | 54.51% | 0.0000% | |
Read Lengths: Paired End
Distribution of read lengths for each sample.
Base by Cycle: Paired End
Provides a measure of how uniform the base composition is at each read position. The higher the average is at a certain position, the more unequal the base pair composition. N's are excluded from this calculation.
Quality by Cycle: Paired End
Mean quality score for each position along the read. A sample is colored red if fewer than 60% of bps have a mean score of at least Q30, orange if between 60% and 80%, and green otherwise.
Read Lengths: Single End
Distribution of read lengths for each sample.
Base by Cycle: Single End
Provides a measure of how uniform the base composition is at each read position. The higher the average is at a certain position, the more unequal the base pair composition. N's are excluded from this calculation.
Quality by Cycle: Single End
Mean quality score for each position along the read. A sample is colored red if fewer than 60% of bps have a mean score of at least Q30, orange if between 60% and 80%, and green otherwise.