Bacterial RNASeq data reduction workflow

The challenge/difference

The major difference between RNASeq in prokaryotes and in eukaryotes is the gene structure. In general, prokaryotic genes contain no introns and therefore undergo no splicing.

Figure: prokaryotic gene structure. Source: https://study.com/skill/practice/contrasting-the-regulation-of-gene-expression-in-prokaryotic-eukaryotic-organisms-questions.html

This intron-less structure in prokaryotic genes dictates how the RNASeq data should be processed, especially at the alignment stage.

One option is to use an aligner that maps each sequencing read contiguously (without splitting it across splice junctions) to the reference genome, such as Bowtie2. In its default mode (end-to-end mode), Bowtie2 searches for alignments that match the sequencing reads from end to end. In other words, no bases are trimmed or clipped from the reads during alignment. After the alignment, gene expression can be quantified with featureCounts by supplying the gene annotation.
The other option is to use a tool that aligns or pseudo-aligns the sequencing reads to the reference transcriptome, such as salmon.
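
The two options can be sketched as the commands below. This is a minimal dry-run sketch under stated assumptions, not the workshop's own script: the reference file names (References/genome.fa, References/transcripts.fa), the Salmon index directory, the output names, and the thread counts are hypothetical placeholders, and the `run` helper is introduced here only so that commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# All paths below are hypothetical placeholders; substitute your own files.
genome_fasta="References/genome.fa"           # prokaryotic genome (FASTA)
transcript_fasta="References/transcripts.fa"  # gene/CDS sequences (FASTA)
bt2_index="References/reference"              # Bowtie2 index prefix
salmon_index="References/salmon_index"        # Salmon index directory

# run: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Option 1: index the genome, then align reads end to end with Bowtie2.
run bowtie2-build --threads 4 "${genome_fasta}" "${bt2_index}"
run bowtie2 --end-to-end -p 4 -x "${bt2_index}" \
    -1 R1.fastq -2 R2.fastq -S sample.sam

# Option 2: index the transcriptome, then quantify directly with Salmon.
# "-l A" asks Salmon to infer the library type automatically.
run salmon index -t "${transcript_fasta}" -i "${salmon_index}"
run salmon quant -i "${salmon_index}" -l A \
    -1 R1.fastq -2 R2.fastq -p 4 -o sample_salmon
```

Running the block as-is only prints the four commands. Setting `DRY_RUN=0` executes them, which requires Bowtie2 and Salmon to be installed and the placeholder files to exist.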

Below is an example processing script for RNASeq data from prokaryotes. It assumes the sequencing data has already gone through quality control: adapter trimming, quality trimming, length filtering, and so on.

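#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=360
#SBATCH --mem=16000 # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH --partition=production
#SBATCH --array=1-54

start=$(date +%s)
echo $HOSTNAME
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# Pick the sample name on line SLURM_ARRAY_TASK_ID of samples.txt
sample=$(sed "${SLURM_ARRAY_TASK_ID}q;d" samples.txt)
# NOTE: this path names a mouse GENCODE annotation; point it at the GTF of your prokaryotic genome
annotation="References/gencode.vM26.annotation.gtf"
# ASSUMPTION: refP was not defined in the original script; set it to the directory holding the Bowtie2 index
refP="References"
outpath='02-Bowtie2'
mkdir -p ${outpath}

echo "SAMPLE: ${sample}"

module load bowtie2
# ASSUMPTION: samtools is provided as a module with this name on your cluster
module load samtools

# R1.fastq and R2.fastq are placeholders: replace them with this sample's preprocessed read files
# 6 alignment threads + 2 sorting threads match the 8 tasks requested above
# samtools sort (not view) is used because samtools index needs a coordinate-sorted BAM
bowtie2 --very-sensitive -p 6 -x ${refP}/reference \
        -1 R1.fastq -2 R2.fastq |\
        samtools sort -@ 2 -m 5G -o ${outpath}/${sample}.bam -

samtools index -b ${outpath}/${sample}.bam

module load subread

# -p marks the input as paired-end (Subread >= 2.0.2 also needs --countReadPairs to count fragments)
# -s 1 assumes a forward-stranded library; adjust to your library preparation
featureCounts -T 4 --verbose -s 1 -p \
              -a ${annotation} \
              -t gene \
              -o ${outpath}/${sample}_featurecounts.txt \
              ${outpath}/${sample}.bam \
              > ${outpath}/${sample}_featurecounts.stdout \
              2> ${outpath}/${sample}_featurecounts.stderr

end=$(date +%s)
runtime=$((end-start))
echo $runtime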
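
The script maps each Slurm array task to one sample with `sed "${SLURM_ARRAY_TASK_ID}q;d" samples.txt`. The sketch below illustrates that idiom on its own, outside Slurm. The sample names, the temporary directory, and the `pick_sample` helper are made up for this illustration; in the real job the task ID comes from `SLURM_ARRAY_TASK_ID`.

```shell
#!/bin/bash
# Create a small, made-up samples.txt (one sample name per line) in a temp dir.
workdir=$(mktemp -d)
printf '%s\n' sampleA sampleB sampleC > "${workdir}/samples.txt"

# pick_sample N FILE: print line N of FILE.
# "d" deletes every line before line N without printing it;
# on line N, "q" makes sed print that line and quit.
pick_sample() {
    sed "${1}q;d" "$2"
}

# Simulate three array tasks (IDs 1 to 3).
for task_id in 1 2 3; do
    sample=$(pick_sample "${task_id}" "${workdir}/samples.txt")
    echo "task ${task_id} -> ${sample}"
done
```

Because each task reads exactly one line, the range given to `--array` should match the number of lines in samples.txt (54 in the script above).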
