Home
Introduction and Lectures
Intro to the Workshop and Core
Support
Schedule
Slack
Cheat Sheets
Software and Links
Scripts
GitHub repository
Biocore website
Prerequisites
CLI
R
Data Reduction
Files and Filetypes
Project setup
Generating Expression Matrix
scRNAseq Analysis
Prepare scRNAseq Analysis
Part 1- Create object
Part 2- Filtering
Part 3- Normalization and scaling
Part 4- Dimensionality reduction
Part 5- Clustering and cell type
Part 6- Enrichment and DE
Part 7- Doublet detection
Part 8- Integration

The dataset used in this workshop is a subset of a much larger dataset from a recent study that generated single nuclei transcriptome and chromatin accessibility profiles from colorectal tissue samples.¹ The authors isolated 1000 to 10000 nuclei per sample for 81 samples of three types: 48 polyp samples, 27 normal tissue samples, and 6 colorectal cancer (CRC) samples from patients with or without germline APC mutations. They observed a continuum of cell state and composition changes from normal tissue, to polyps, to cancer.

For the purposes of this workshop, we will use one sample from each condition (CRC: A001-C-007, polyp: A001-C-104, and normal: B001-A-301).

Data Setup

Let’s set up a project directory for the analysis, and talk a bit about project philosophy.

1. First, create a directory for your user and the example project in the workshop directory:

cd
mkdir -p /share/workshop/scRNA_workshop/$USER/scrnaseq_example

2a. Next, go into that directory, create a raw data directory (we are going to call this 00-RawData) and cd into that directory. Let’s then create symbolic links to the fastq files that contains the raw read data.

cd /share/workshop/scRNA_workshop/$USER/scrnaseq_example
mkdir 00-RawData
cd 00-RawData/
ln -s /share/workshop/scRNA_workshop/DATA/*.fastq.gz .

This directory now contains the reads for each sample.

2b. Let’s create a sample sheet for the project, and store sample names in a file called samples.txt

cd /share/workshop/scRNA_workshop/$USER/scrnaseq_example/00-RawData
ls *_R1_* |cut -d'_' -f1 > ../samples.txt
cat ../samples.txt

Data Exploration

3. Now, take a look at the raw data directory.

ls /share/workshop/scRNA_workshop/$USER/scrnaseq_example/00-RawData

4. View the contents of the files using the ‘less’ command, when gzipped used ‘zless’ (which is just the ‘less’ command for gzipped files, q to exit):

cd /share/workshop/scRNA_workshop/$USER/scrnaseq_example/00-RawData
zless A001-C-007_S4_R1_001.fastq.gz

and Read 2

zless A001-C-007_S4_R2_001.fastq.gz

A detailed explanation of FASTQ file can be found here. Please read on the description and make sure you can identify which lines correspond to a single read and which lines are the header, sequence, and quality values. Press ‘q’ to exit this screen.

Let’s figure out the number of reads in this file. A simple way to do that is to count the number of lines and divide by 4 (because the record of each read uses 4 lines). In order to do this use cat to output the uncompressed file and pipe that to “wc” to count the number of lines:

zcat A001-C-007_S4_R1_001.fastq.gz | wc -l

Divide this number by 4 and you have the number of reads in this file.

expr $(zcat A001-C-007_S4_R1_001.fastq.gz | wc -l) / 4

One more thing to try is to figure out the length of the reads without counting each nucleotide. First get the first 4 lines of the file (i.e. the first record):

zcat A001-C-007_S4_R1_001.fastq.gz  | head -4

Note the header lines (1st and 3rd line) and sequence and quality lines (2nd and 4th) in each 4-line fastq block. You can isolate the sequence line:

zcat A001-C-007_S4_R1_001.fastq.gz | head -2 | tail -1

Then, copy and paste the DNA sequence line into the following command (replace [sequence] with the line):

echo -n [sequence] | wc -c

This will give you the length of the read.

Also can do the bash one liner:

echo -n $(zcat A001-C-007_S4_R1_001.fastq.gz  | head -2 | tail -1) | wc -c

See if you can figure out how this command works.

Quiz

Becker, W. R.; Nevins, S. A.; Chen, D. C.; Chiu, R.; Horning, A. M.; Guha, T. K.; Laquindanum, R.; Mills, M.; Chaib, H.; Ladabaum, U.; Longacre, T.; Shen, J.; Esplin, E. D.; Kundaje, A.; Ford, J. M.; Curtis, C.; Snyder, M. P.; Greenleaf, W. J. Single-Cell Analyses Define a Continuum of Cell State and Composition Changes in the Malignant Transformation of Polyps to Colorectal Cancer. Nat. Genet. 2022. https://doi.org/10.1038/s41588-022-01088-x. ↩

☰ Menu

Single Cell RNA-Seq Analysis

Data Setup

Data Exploration

Quiz