      PacBio Iso-Seq Workshop Online

Using the Iso-Seq Application on SMRT Link and BioConda

Elizabeth Tseng, Principal Scientist, PacBio

Why use Iso-Seq analysis?

ISO-SEQ ANALYSIS MAIN FEATURES

HIFI READS FROM CCS

FULL-LENGTH READS HAVE 5’ AND 3’ PRIMERS

REMOVE CONCATEMERS AND POLY(A) TAILS

CLUSTER TO GET ISOFORMS

Note: The PDF is wrong. A low-quality isoform is one with a predicted accuracy below 99% or with support from fewer than 2 FLNC reads.
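The rule in the note above can be written as a small check. This is only a sketch of the stated thresholds, not how isoseq3 itself is implemented; `classify_isoform` is a hypothetical helper name.

```shell
# classify_isoform ACCURACY FLNC_SUPPORT
# Prints "hq" when accuracy is at least 0.99 and at least 2 FLNC reads
# support the isoform, otherwise "lq". Hypothetical helper, not part of
# isoseq3; it only restates the rule from the note.
classify_isoform() {
    awk -v acc="$1" -v n="$2" 'BEGIN {
        if (acc + 0 >= 0.99 && n + 0 >= 2) print "hq"; else print "lq"
    }'
}

classify_isoform 0.995 5   # high accuracy, well supported
classify_isoform 0.995 1   # only one supporting FLNC read
classify_isoform 0.98 10   # accuracy below 99%
```

An isoform has to pass both conditions to count as high quality; failing either one makes it low quality.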

MAP AND COLLAPSE ISOFORMS

BENEFITS OF ISO-SEQ ANALYSIS APPLICATION

Iso-Seq Analysis Using pbBioConda

INSTRUCTIONS TUTORIAL

Follow the instructions tutorial to install all the software you need.

DOWNLOAD THE DATA here (we will be using the data to do practice questions in the tutorial)

Example:

$ wget -nv https://downloads.pacbcloud.com/public/dataset/ISMB_workshop/isoseq3/alz.ccs.bam

SPECIFY ISO-SEQ PRIMERS

$ more primers.fasta
>5p
GCAATGAAGTCGCAGGGTTGGG
>3p
GTACTCTGCGTTGATACCACTGCTT
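If you need to create the primer file yourself, something like the following should work. It is a minimal sketch that writes the two sequences shown above into `primers.fasta` in the current directory (overwriting any existing file of that name) and runs a quick sanity check. The variable names are illustrative only.

```shell
# Write the two primer sequences shown above as a two-record FASTA file.
cat > primers.fasta <<'EOF'
>5p
GCAATGAAGTCGCAGGGTTGGG
>3p
GTACTCTGCGTTGATACCACTGCTT
EOF

# Sanity check: count the header lines, and count sequence lines that
# contain anything other than A, C, G, or T.
n_records=$(grep -c '^>' primers.fasta)
n_bad=$(grep -v '^>' primers.fasta | grep -c '[^ACGT]' || true)
echo "records: $n_records, invalid sequence lines: $n_bad"
```

The commands later on this page refer to the primer file as isoseq_primers.fasta, so use whichever name matches your command line.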

INPUT CCS BAM FILE

$ samtools view -h alz.ccs.bam
m141008_060349_42194_c100704972550000001823137703241586_s1_p0/63/ccs 4 * 0 255 * * 0 0 CCCGGGGATCCTCTAGAATGC ~~~~~~~~~~~~~~~~~~~~~ RG:Z:83ba013f np:i:35 rq:f:0.999682 sn:B:f,11.3175,6.64119,11.6261,14.5199 zm:i:63
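The tags at the end of each record carry per-read information such as rq (predicted read accuracy) and np (number of passes). As a rough sketch, they can be pulled out with awk. The `sam_tag` helper below is hypothetical, and the sample record is rebuilt from the example above with tab separators so the snippet runs without samtools.

```shell
# One unaligned CCS record with tab-separated SAM fields, copied from the
# example above.
record=$(printf '%s\t' \
    'm141008_060349_42194_c100704972550000001823137703241586_s1_p0/63/ccs' \
    4 '*' 0 255 '*' '*' 0 0 \
    'CCCGGGGATCCTCTAGAATGC' '~~~~~~~~~~~~~~~~~~~~~' \
    'RG:Z:83ba013f' 'np:i:35' 'rq:f:0.999682' \
    'sn:B:f,11.3175,6.64119,11.6261,14.5199' 'zm:i:63')

# sam_tag TAG: print the value of the optional TAG:TYPE:VALUE field
# (columns 12 and up) for each SAM record read from stdin.
# Hypothetical helper, not part of samtools.
sam_tag() {
    awk -F '\t' -v tag="$1" '{
        for (i = 12; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[1] == tag) print kv[3]
        }
    }'
}

printf '%s\n' "$record" | sam_tag rq   # predicted read accuracy
printf '%s\n' "$record" | sam_tag np   # number of passes
```

On real data the same helper could be fed from `samtools view alz.ccs.bam | sam_tag rq`, leaving out `-h` so that header lines are not parsed as records.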

REFERENCE GENOME

$ grep '>' hg38.fa # to list the headers per chromosome
>chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
>chr2 AC:CM000664.2 gi:568336022 LN:242193529 rl:Chromosome M5:f98db672eb0993dcfdabafe2a882905c AS:GRCh38
>chr3 AC:CM000665.2 gi:568336021 LN:198295559 rl:Chromosome M5:76635a41ea913a405ded820447d067b0 AS:GRCh38
>chr4 AC:CM000666.2 gi:568336020 LN:190214555 rl:Chromosome M5:3210fecf1eb92d5489da4346b3fddc6e AS:GRCh38
>chr5 AC:CM000667.2 gi:568336019 LN:181538259 rl:Chromosome M5:a811b3dc9fe66af729dc0dddf7fa4f13 AS:GRCh38 hm:47309185-49591369
…
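Two variations on the header listing are often handy: counting the sequences and printing just their names. The sketch below uses a tiny made-up file, `toy_ref.fa`, as a stand-in for hg38.fa so it can be tried anywhere. The file name and its contents are assumptions for illustration only.

```shell
# Toy stand-in for hg38.fa (made-up sequences, shortened headers).
cat > toy_ref.fa <<'EOF'
>chr1 AC:CM000663.2 LN:8
ACGTACGT
>chr2 AC:CM000664.2 LN:4
GGCC
EOF

# Count the sequences in the file.
grep -c '^>' toy_ref.fa

# Print only the sequence names (first word of each header, without ">").
grep '^>' toy_ref.fa | cut -d ' ' -f 1 | sed 's/^>//'
```

Anchoring the pattern with `^` restricts the match to lines that start with `>`.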

SOFTWARE INSTALLATION CHECK

Activate your conda environment

$ source activate <name of your environment>

Check your installation

$ isoseq3 --version
isoseq3 3.4.x
$ lima --version
lima 1.11.0
$ pbmm2 --version
pbmm2 1.3.0
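Before checking versions, it can help to confirm that the tools are on your PATH at all. The following is a minimal sketch. `check_tools` is a hypothetical helper, and the list of tools is taken from the commands used on this page.

```shell
# check_tools NAME...: report which of the given commands are on PATH.
# Returns non-zero if any of them is missing. Hypothetical helper.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools isoseq3 lima pbmm2 samtools \
    || echo "activate your conda environment and check again"
```

If a tool is reported missing, the usual cause is that the conda environment has not been activated in the current shell.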

ISO-SEQ WORKFLOW

Click here if you want to look at the entire Iso-Seq workflow on PacBio’s GitHub page

PRIMER REMOVAL & DEMULTIPLEXING

Command line:

lima --isoseq --dump-clips --peek-guess -j 24 \
alz.ccs.bam isoseq_primers.fasta alz.demult.bam

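Input files: alz.ccs.bam (CCS reads) and isoseq_primers.fasta (primer sequences)

Output files: alz.demult.5p--3p.bam (lima adds the primer-pair name to the output name given on the command line; this file is the input to the next step)

Options: --isoseq turns on Iso-Seq mode, --dump-clips writes the clipped primer sequences to a separate file, --peek-guess infers which primer pairs are present from the first reads, and -j 24 runs 24 threads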

After completion, you will see the following files:

$ ls -ltrh

TRIMMING POLY(A) TAILS AND CONCATEMER REMOVAL

Command line:

isoseq3 refine --require-polya \
alz.demult.5p--3p.bam isoseq_primers.fasta alz.flnc.bam

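Input files: alz.demult.5p--3p.bam (primer-trimmed reads from lima) and isoseq_primers.fasta (primer sequences)

Output files: alz.flnc.bam (full-length non-concatemer, FLNC, reads)

Options: --require-polya keeps only reads that have a poly(A) tail and trims it off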

After completion, you will see the following files:

$ ls -ltrh

ISOFORMS

Command line:

isoseq3 cluster alz.flnc.bam alz.polished.bam \
--verbose --use-qvs

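Input files: alz.flnc.bam (FLNC reads)

Output files: alz.polished.bam, plus alz.polished.hq.bam with the high-quality isoforms, which is the input to the mapping step

Options: --verbose prints progress information and --use-qvs uses the CCS quality values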

After completion, you will see the following files:

$ ls -ltrh

Note: Because the CCS input is already polished, the isoseq3 cluster output is already polished too!

MAP

Command line:

pbmm2 align hg38.fa alz.polished.hq.bam alz.aligned.bam \
-j 24 --preset ISOSEQ --sort --log-level INFO

Note: The PDF is wrong. The option should be --sort (two hyphens), not -sort.

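Input files: hg38.fa (reference genome) and alz.polished.hq.bam (high-quality isoforms)

Output files: alz.aligned.bam and its index alz.aligned.bam.bai

Options: -j 24 runs 24 threads, --preset ISOSEQ selects the alignment settings for Iso-Seq data, --sort writes a sorted BAM, and --log-level INFO sets the logging verbosity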

After completion, you will see the following files:

$ ls -ltrh
alz.aligned.bam alz.aligned.bam.bai

COLLAPSE

Command line:

isoseq3 collapse alz.aligned.bam alz.collapsed.gff

Input files: alz.aligned.bam (sorted alignments from pbmm2)

Output files: alz.collapsed.gff (collapsed isoforms)

After completion, you will see the following files:

$ ls -ltrh
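The five steps above can be chained into one script. The following is a sketch under stated assumptions, not an official PacBio pipeline: the variable names and the `run` helper are hypothetical, and the `5p--3p` part of the refine input assumes the primers are named 5p and 3p in the primer FASTA. By default it only prints the commands (DRY_RUN=1), so it is safe to try before the tools or data are in place.

```shell
#!/bin/sh
set -eu

# Hypothetical settings; adjust to your own files and machine.
PREFIX=${PREFIX:-alz}
PRIMERS=${PRIMERS:-isoseq_primers.fasta}
REF=${REF:-hg38.fa}
THREADS=${THREADS:-24}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run CMD...: echo the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# 1. Primer removal and demultiplexing
run lima --isoseq --dump-clips --peek-guess -j "$THREADS" \
    "$PREFIX.ccs.bam" "$PRIMERS" "$PREFIX.demult.bam"
# 2. Poly(A) trimming and concatemer removal
run isoseq3 refine --require-polya \
    "$PREFIX.demult.5p--3p.bam" "$PRIMERS" "$PREFIX.flnc.bam"
# 3. Clustering into isoforms
run isoseq3 cluster "$PREFIX.flnc.bam" "$PREFIX.polished.bam" \
    --verbose --use-qvs
# 4. Mapping the high-quality isoforms to the reference
run pbmm2 align "$REF" "$PREFIX.polished.hq.bam" "$PREFIX.aligned.bam" \
    -j "$THREADS" --preset ISOSEQ --sort --log-level INFO
# 5. Collapsing redundant isoforms
run isoseq3 collapse "$PREFIX.aligned.bam" "$PREFIX.collapsed.gff"
```

Saved under a name of your choice, it could be run for real with DRY_RUN=0 set in the environment. Because of `set -e`, the script stops at the first step that fails, so later steps never run on missing input.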

PUBLICLY AVAILABLE ISO-SEQ DATA SETS here if you’re interested

ISO-SEQ ANALYSIS TERMINOLOGY

The PDF of this documentation can be found here

Iso-Seq pipeline with Bioconda and visualization using UCSC and IGV & Cupcake (hands on)

The Iso-Seq Bioinformatics Tutorial we are using is here. See ‘1. isoseq3’ and answer the corresponding practice questions!

Note: We did not have time to do Cupcake during the workshop, so try it out yourself, and feel free to reach out if you have questions! See ‘2. Cupcake Fun’ and answer the corresponding practice questions!