
      UC Davis Bioinformatics Workshop Base Template


Snakemake Introduction

Overview

Snakemake is a flexible, Python-based workflow system.

Snakemake differs from other workflow systems (such as CWL, the Common Workflow Language) in the following ways:

Try to draw the DAG (directed acyclic graph) of jobs for the following Snakemake example:



SAMPLES = ["A", "B"]

rule all:
    input:
        "plots/quals.svg"


rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    shell:
        "bwa mem {input} | samtools view -Sb - > {output}"


rule samtools_sort:
    input:
        "mapped_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam"
    shell:
        "samtools sort -T sorted_reads/{wildcards.sample} "
        "-O bam {input} > {output}"


rule samtools_index:
    input:
        "sorted_reads/{sample}.bam"
    output:
        "sorted_reads/{sample}.bam.bai"
    shell:
        "samtools index {input}"


rule bcftools_call:
    input:
        fa="data/genome.fa",
        bam=expand("sorted_reads/{sample}.bam", sample=SAMPLES),
        bai=expand("sorted_reads/{sample}.bam.bai", sample=SAMPLES)
    output:
        "calls/all.vcf"
    shell:
        "samtools mpileup -g -f {input.fa} {input.bam} | "
        "bcftools call -mv - > {output}"


rule plot_quals:
    input:
        "calls/all.vcf"
    output:
        "plots/quals.svg"
    script:
        "scripts/plot-quals.py"
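The `expand(...)` calls in `bcftools_call` turn one file pattern into a list of concrete file names, one per sample. The plain-Python sketch below shows roughly what that produces for this workflow. `expand_sketch` is a hypothetical stand-in written for illustration, not Snakemake's own implementation, and it assumes every wildcard value is passed as a list.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand(): fill the pattern with
    every combination of the supplied wildcard values (each given as a list)."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


SAMPLES = ["A", "B"]

# The two lists that rule bcftools_call receives as input.bam and input.bai.
bams = expand_sketch("sorted_reads/{sample}.bam", sample=SAMPLES)
bais = expand_sketch("sorted_reads/{sample}.bam.bai", sample=SAMPLES)

print(bams)
print(bais)
```

Because `bcftools_call` asks for the sorted BAM and index of every sample in `SAMPLES`, it is the rule that fixes which concrete sample names the upstream rules have to produce.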


Setup

Make sure you have a directory in the workshop folder (/share/workshop/$USER):

mkdir -p /share/workshop/$USER

Copy the materials for the intro and the tutorial:

cd /share/workshop/$USER
mkdir snakemake-tutorial
cp -r /share/biocore/keith/workshop/snakemake-tutorial/* snakemake-tutorial/
cd snakemake-tutorial

Now let's see what files we have here:

(snakemake) keithgmitchell@tadpole:/share/biocore/keith/workshop/snakemake-tutorial$ ls
data  mapped_reads  pe_rnaseq  se_rnaseq  slurm_out  snakefile  snakefile.py  summarize_stats.py  tagseq  templates

Brief Overview of Commands Using the Example Workflow

  1. Prepare the environment for running snakemake:
    • module load snakemake
    • source activate snakemake
  2. Run the snakemake file as a dry run (the example workflow shown above).
    • This will build a DAG of the jobs to be run without actually executing them.
    • snakemake --dry-run
  3. Execute rules of interest.
    • Compare: snakemake --dry-run all vs. snakemake --dry-run bcftools_call vs. snakemake --dry-run bwa_map
    • Where is the wildcard specified?
  4. Produce an image of the DAG of jobs to be run (this only plots the DAG; it does not execute the jobs).
    • snakemake --dag | dot -Tsvg > dag.svg
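To see how a dry run can work out the jobs and where wildcard values come from, here is a minimal plain-Python sketch of the idea: start from the requested file, find the rule whose output pattern matches it, read the wildcard values off the file name, and repeat for that rule's inputs. It is a simplified illustration under stated assumptions, not how Snakemake is actually implemented. The names `RULES`, `match_output`, `find_job`, and `build_dag` are hypothetical, the rule table is copied by hand from the example workflow, and any file that no rule produces is assumed to already exist.

```python
import re

SAMPLES = ["A", "B"]

# rule name -> (input patterns, output pattern), copied from the example workflow.
RULES = {
    "bwa_map": (["data/genome.fa", "data/samples/{sample}.fastq"],
                "mapped_reads/{sample}.bam"),
    "samtools_sort": (["mapped_reads/{sample}.bam"], "sorted_reads/{sample}.bam"),
    "samtools_index": (["sorted_reads/{sample}.bam"], "sorted_reads/{sample}.bam.bai"),
    "bcftools_call": (["data/genome.fa"]
                      + [f"sorted_reads/{s}.bam" for s in SAMPLES]
                      + [f"sorted_reads/{s}.bam.bai" for s in SAMPLES],
                      "calls/all.vcf"),
    "plot_quals": (["calls/all.vcf"], "plots/quals.svg"),
}


def match_output(pattern, filename):
    """Return the wildcard values if filename fits pattern, else None."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, ...
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None


def find_job(filename):
    """Find the rule (and wildcard values) that can produce filename."""
    for rule, (_, output) in RULES.items():
        wildcards = match_output(output, filename)
        if wildcards is not None:
            return rule, wildcards
    return None  # no rule makes it: assumed to be an existing source file


def build_dag(target, edges):
    """Recursively collect (upstream job, downstream job) edges for target."""
    found = find_job(target)
    if found is None:
        return None
    rule, wildcards = found
    label = ",".join(f"{k}={v}" for k, v in wildcards.items())
    job = f"{rule}[{label}]" if label else rule
    for pattern in RULES[rule][0]:
        upstream = build_dag(pattern.format(**wildcards), edges)
        if upstream is not None:
            edges.add((upstream, job))
    return job


edges = set()
build_dag("plots/quals.svg", edges)  # the file that rule all asks for
for upstream, downstream in sorted(edges):
    print(f"{upstream} -> {downstream}")
```

In this sketch the value of `{sample}` is never written inside `bwa_map` itself; it is inferred from the file name that a downstream job requests. That is also why asking for `bwa_map` directly as a target is different from asking for `all` or `bcftools_call`: on its own, the rule has no concrete file name from which to fill in the wildcard.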

A few extra notes: