Generating a Gene Expression Matrix

Most analyses have two stages: data reduction and data analysis.

Analyses of spatial transcriptomics data take as their starting point an expression matrix, where each row represents a gene and each column represents a cell or spot. Each entry in the matrix represents the number of unique molecules detected (a proxy for expression level) for a particular gene in a given spot. Because we are working with 10x Genomics Visium data, we will be using the 10x Space Ranger pipeline to handle the data reduction tasks.

Visium

[Figure: structure of a Visium gene expression slide]

The Visium slide captures RNA from tissue slices placed over specialized regions of oligo spots called capture areas. Each of these spots is identified by both a coordinate system describing its location within the capture area and a unique spatial barcode sequence, analogous to the cell barcode from a single cell or single nucleus experiment.

[Figure: reactions taking place on Visium slide]

Each spot is composed of a group of oligos, each of which includes the spatial barcode, which is the same for every oligo in the spot, and a unique molecular identifier (UMI), which is random and different for each oligo. Through a series of reactions that take place on the slide, RNA released by tissue permeabilization is barcoded with an identifier corresponding to both its spatial location on the slide and the individual captured molecule.
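
To make the role of the two identifiers concrete, here is a minimal sketch of how barcoded reads could be collapsed into per-spot molecule counts. The barcode, UMI, and gene values are made up for illustration, and this is a simplification rather than Space Ranger's actual algorithm, which also handles alignment and sequencing errors in barcodes and UMIs.

```shell
# Toy reads, one per line: spatial barcode, UMI, gene (all values are made up).
reads=$(mktemp)
printf '%s\t%s\t%s\n' \
  AAACAAGT TTGCA Gene1 \
  AAACAAGT TTGCA Gene1 \
  AAACAAGT GGATC Gene1 \
  CCGTTAGA ACGTA Gene1 \
  CCGTTAGA ACGTA Gene2 \
  > "$reads"

# Reads sharing barcode, UMI, and gene are treated as copies of one molecule,
# so collapse them first, then count unique molecules per spot and gene.
umi_counts=$(
  sort -u "$reads" |
    awk -F'\t' '{ n[$1 "\t" $3]++ } END { for (k in n) print k "\t" n[k] }' |
    sort
)
printf '%s\n' "$umi_counts"
rm -f "$reads"
```

In this toy data the first spot gets a count of 2 for Gene1, because two of its three reads share a UMI and are counted once.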

After cDNA amplification and processing, the final library has the following structure:

[Figure: final library structure]

All images in this section come from the 10x Visium spatial gene expression reagent kit user guide.

What is Space Ranger?

Space Ranger is a suite of tools for processing Visium data, including:

spaceranger mkfastq demultiplexes BCL files.
spaceranger count produces a gene expression matrix, metrics, and preliminary analyses.
spaceranger aggr combines results from multiple samples into a single matrix, down-sampling reads to achieve comparable sequencing depth across samples.
spaceranger targeted-compare assesses targeting performance of a targeted gene expression experiment.
spaceranger targeted-depth simulates a targeted gene expression experiment by computing the fraction of reads from a fresh-frozen experiment that map to targeted genes.

Image processing in Space Ranger

[Figure: high resolution tissue image]

Space Ranger automatically aligns the slide image using special fiducial spots located in the corners of the capture area, and determines which spots are located under the tissue. If automatic alignment fails, or image quality is poor, Loupe Browser can be used for manual image alignment, which produces a JSON file containing positional information that allows Space Ranger to determine which spots are under tissue.

For an in-depth explanation of Space Ranger methods, visit the 10x Genomics documentation site.

Set-up

Set up your directory:

mkdir -p /share/workshop/spatial_workshop/$USER/scripts/slurmout
cd /share/workshop/spatial_workshop/$USER/
ln -s ../data 00-RawData
ln -s ../refdata-gex-mm10-2020-A .
cp ../scripts/* scripts/

Input

Space Ranger requires a sample ID, the path to the fastq files, a brightfield image, the slide serial number, and the capture area for each sample. We’ll be using a tab-delimited file to store this information, which would be recorded before sequencing. Create a file called design.tsv in your project directory with the following contents (columns: sample ID, image file, slide serial number, capture area):

V1_Mouse_Brain_Sagittal_Anterior_Section_1	V1_Mouse_Brain_Sagittal_Anterior_image.tif	V19L29-035	B1
V1_Mouse_Brain_Sagittal_Posterior_Section_1	V1_Mouse_Brain_Sagittal_Posterior_image.tif	V19L29-035	A1
V1_Mouse_Brain_Sagittal_Anterior_Section_2	V1_Mouse_Brain_Sagittal_Anterior_Section_2_image.tif	V19L29-035	D1
V1_Mouse_Brain_Sagittal_Posterior_Section_2	V1_Mouse_Brain_Sagittal_Posterior_Section_2_image.tif	V19L29-035	C1

This file will be parsed by the 01-spaceranger.slurm script, and the fields provided as arguments to spaceranger count. Take a few minutes to examine the arguments to Space Ranger and the content of the input files (both fastq and image).

less scripts/01-spaceranger.slurm
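
As a rough sketch of what such parsing might look like, the loop below reads a two-row design file and prints one spaceranger count command per row without running it. The transcriptome and fastq paths follow the Set-up section, while the image location (00-RawData/) and the exact set of options are assumptions; the workshop's 01-spaceranger.slurm script may differ, for example in resource settings or in options required by newer Space Ranger versions.

```shell
# Hypothetical sketch: print (do not run) one spaceranger count command per row.
design=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  V1_Mouse_Brain_Sagittal_Anterior_Section_1 V1_Mouse_Brain_Sagittal_Anterior_image.tif V19L29-035 B1 \
  V1_Mouse_Brain_Sagittal_Posterior_Section_1 V1_Mouse_Brain_Sagittal_Posterior_image.tif V19L29-035 A1 \
  > "$design"

# Split each line on tabs into the four design fields.
tab=$(printf '\t')
commands=$(
  while IFS="$tab" read -r sample image slide area; do
    # Assumed layout: fastq files and images both live under 00-RawData.
    printf 'spaceranger count --id=%s --transcriptome=refdata-gex-mm10-2020-A --fastqs=00-RawData --sample=%s --image=00-RawData/%s --slide=%s --area=%s\n' \
      "$sample" "$sample" "$image" "$slide" "$area"
  done < "$design"
)
printf '%s\n' "$commands"
rm -f "$design"
```

Printing the commands first is a cheap way to check that every field lands in the intended argument before submitting a long-running job.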

Launch the Space Ranger job.

cd scripts
sbatch 01-spaceranger.slurm

Output

The feature barcode matrices generated by Space Ranger are essentially the same as those generated by a single cell or single nucleus experiment, except that the columns represent spots rather than cells / nuclei. In addition, Space Ranger creates a directory of spatial output files.

Alignment output

Two matrices are produced by each Space Ranger run: the raw and the filtered feature barcode matrix.

File	Description
raw_feature_bc_matrix.h5	gene-barcode matrix containing every whitelisted barcode with at least 1 read
filtered_feature_bc_matrix.h5	gene-barcode matrix containing only tissue-associated barcodes

These two matrices are each provided in two formats.

Matrix files

Three files are needed to completely describe each gene x spot matrix:

matrix.mtx.gz
features.tsv.gz
barcodes.tsv.gz
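
To illustrate how the three files fit together, here is a small sketch that builds a toy 3 gene x 2 spot matrix in this layout and prints each non-zero entry with its gene name and spot barcode. Gene names, barcodes, and counts are invented. The layout assumed is the Matrix Market coordinate format, in which each data line holds a 1-based row (feature) index, a column (barcode) index, and a count.

```shell
# Build a toy matrix directory (all gene names, barcodes, and counts are made up).
dir=$(mktemp -d)
printf '%s\n' '%%MatrixMarket matrix coordinate integer general' \
  '3 2 4' '1 1 5' '3 1 2' '2 2 7' '3 2 1' | gzip > "$dir/matrix.mtx.gz"
printf '%s\t%s\tGene Expression\n' ID_A GeneA ID_B GeneB ID_C GeneC | gzip > "$dir/features.tsv.gz"
printf '%s\n' AAACAAGT-1 CCGTTAGA-1 | gzip > "$dir/barcodes.tsv.gz"

gzip -dc "$dir/features.tsv.gz" > "$dir/features.tsv"
gzip -dc "$dir/barcodes.tsv.gz" > "$dir/barcodes.tsv"

entries=$(
  gzip -dc "$dir/matrix.mtx.gz" |
    awk -v feat="$dir/features.tsv" -v bc="$dir/barcodes.tsv" '
      BEGIN {
        # Row i of the matrix is line i of features.tsv (column 2 = gene name);
        # column j of the matrix is line j of barcodes.tsv.
        while ((getline line < feat) > 0) { split(line, f, "\t"); gene[++g] = f[2] }
        while ((getline line < bc) > 0) { spot[++s] = line }
      }
      /^%/ { next }                       # header and comment lines
      !dims_seen { dims_seen = 1; next }  # "rows columns entries" line
      { print gene[$1] "\t" spot[$2] "\t" $3 }
    '
)
printf '%s\n' "$entries"
rm -rf "$dir"
```

Only non-zero counts are stored, which is why this sparse format stays compact even though most gene and spot combinations have no detected molecules.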

HDF5 files

HDF5 is a file format designed to preserve a hierarchical, filesystem-like organization of large amounts of data. The result is a single file completely describing the gene x spot matrix, which can be read into R or Python for downstream processing.

Spatial output

tissue_hires_image.png and tissue_lowres_image.png

These images are reduced-resolution copies of the original slide image provided to Space Ranger, downsampled so that the largest dimension is 2,000 pixels (6.5 mm capture area) or 4,000 pixels (11 mm capture area) for the “hires” image, and 600 pixels for the “lowres” image.

[Figure: low resolution tissue image]

aligned_fiducials.jpg

Useful for assessing the accuracy of the slide alignment, aligned_fiducials.jpg highlights the fiducial spots in red.

[Figure: highlighted fiducial alignment]

detected_tissue_image.jpg

The area of the slide determined to be covered by the tissue slice is highlighted in red, while the fiducial spots are circled in blue.

[Figure: highlighted tissue-associated spots]

scalefactors_json.json

A file containing scaling factors to convert between resolutions, and diameters of spots in both the fiducial border and the central capture area.
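
As a sketch of how a scale factor might be used, the snippet below converts a spot's full-resolution pixel coordinates into coordinates on the downsampled “hires” image by multiplying by the hires scale factor. All numbers are hypothetical. In practice the factor would be read from scalefactors_json.json (in current Space Ranger output it is stored under a key named tissue_hires_scalef), and the coordinates from the spot positions file.

```shell
# Hypothetical values: the scale factor would come from scalefactors_json.json,
# and the pixel coordinates from one row of the spot positions file.
hires_scalef=0.1
pxl_row_in_fullres=12000
pxl_col_in_fullres=8500

# Full-resolution pixel coordinate x scale factor = coordinate on the hires
# image, rounded here to whole pixels.
hires_coords=$(
  awk -v s="$hires_scalef" -v r="$pxl_row_in_fullres" -v c="$pxl_col_in_fullres" \
    'BEGIN { printf "%.0f %.0f", r * s, c * s }'
)
printf 'hires row, col: %s\n' "$hires_coords"
```

The same multiplication with the lowres scale factor would place the spot on the “lowres” image.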

tissue_positions.csv

This file contains the information required to map each spot barcode sequence to a physical coordinate on the slide. It is a comma-separated values file with one row per spot and six columns:

barcode: the sequence of the spot barcode
in_tissue: 1 if the spot is under tissue, 0 otherwise
array_row: row coordinate of the spot in the array
array_col: column coordinate of the spot in the array
pxl_row_in_fullres: vertical (row) pixel coordinate of the spot center in the full-resolution image
pxl_col_in_fullres: horizontal (column) pixel coordinate of the spot center in the full-resolution image
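
For example, the spots under tissue can be pulled out of this file with a short awk filter on the in_tissue column. The sketch below runs on a made-up four-row file and assumes the file starts with a header line naming the six columns; if your file has no header, drop the NR > 1 condition.

```shell
# Toy positions file (barcodes and coordinates are made up).
positions=$(mktemp)
printf '%s\n' \
  'barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres' \
  'AAACAAGT-1,0,0,0,1500,2300' \
  'CCGTTAGA-1,1,1,1,1620,2370' \
  'GGTACCTT-1,1,0,2,1500,2440' \
  'TTGCAACG-1,0,1,3,1620,2510' \
  > "$positions"

# Keep barcodes whose in_tissue flag (column 2) is 1, skipping the header line.
in_tissue_barcodes=$(awk -F, 'NR > 1 && $2 == 1 { print $1 }' "$positions")
n_in_tissue=$(printf '%s\n' "$in_tissue_barcodes" | wc -l | tr -d ' ')
printf '%s spots under tissue:\n%s\n' "$n_in_tissue" "$in_tissue_barcodes"
rm -f "$positions"
```

The resulting barcode list corresponds to the columns kept in the filtered feature barcode matrix.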

barcode_fluorescence_intensity.csv

If a fluorescent image is provided with the --darkimage argument, a barcode_fluorescence_intensity.csv file is created containing the mean fluorescence per spot in each channel.

Prepare for R analysis

Before logging out of tadpole, create a directory for the more computationally intensive portions of the R analysis.

mkdir -p /share/workshop/spatial_workshop/$USER/02-Seurat

Download the R markdown document for the analysis portion of the course.

Advanced Topics in Single Cell RNA-Seq: Spatial Transcriptomics