Introduction to Data

Several data sets are used throughout the workshop because no single data set meets the requirements of every topic we are going to discuss. The first is a Xenium data set of mouse brain coronal sections (one hemisphere) from an Alzheimer’s disease model. It is provided by 10X Genomics on their website: https://www.10xgenomics.com/datasets/xenium-in-situ-analysis-of-alzheimers-disease-mouse-model-brain-coronal-sections-from-one-hemisphere-over-a-time-course-1-standard. The full data set includes 2 genotypes, wild type and Alzheimer’s disease model, with 3 time points per genotype. In this workshop we are going to use only the 5.7-month time point for both genotypes.

Explore the output from Xenium Onboard Analysis

First, let’s create a project folder to keep all the workshop materials in. Then download the data, move each zip file into its own directory, and uncompress it there.

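mkdir -p ~/Spatial_transcriptomics; cd ~/Spatial_transcriptomics
# Run the lines below after both zip files have been downloaded into this folder.
mkdir Xenium_V1_FFPE_TgCRND8_5_7_months_outs; mv Xenium_V1_FFPE_TgCRND8_5_7_months_outs.zip Xenium_V1_FFPE_TgCRND8_5_7_months_outs/
(cd Xenium_V1_FFPE_TgCRND8_5_7_months_outs && unzip Xenium_V1_FFPE_TgCRND8_5_7_months_outs.zip)
mkdir Xenium_V1_FFPE_wildtype_5_7_months_outs; mv Xenium_V1_FFPE_wildtype_5_7_months_outs.zip Xenium_V1_FFPE_wildtype_5_7_months_outs/
(cd Xenium_V1_FFPE_wildtype_5_7_months_outs && unzip Xenium_V1_FFPE_wildtype_5_7_months_outs.zip)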

Let’s take a look at the files inside the Xenium_V1_FFPE_wildtype_5_7_months_outs folder as an example.

cd Xenium_V1_FFPE_wildtype_5_7_months_outs
ls

analysis_summary.html - Contains summary metrics and automated secondary analysis results
gene_panel.json - transcript feature metadata
morphology.ome.tif - 3D Z-stack of the DAPI image, useful for resegmenting cells, assessing segmentation quality, and viewing the data. It can be opened with ImageJ
morphology_focus - available in Xenium experiments that include 3 additional stain images besides the nuclear DAPI stain image: boundary (ATP1A1/E-Cadherin/CD45) image, interior RNA (18S) image, and interior protein (alphaSMA/Vimentin) image
cells.csv.gz - cell summary file with columns for:
- cell id
- X location of the cell centroid in microns
- Y location of the cell centroid in microns
- count of gene features for transcripts with Q-Score >= 20
- count of negative control probes
- count of genomic control codewords (Xenium Prime)
- count of negative control codewords
- count of unassigned codewords
- total counts of the probes and codewords
- 2D area covered by the cell in micron^2
- 2D area covered by the nucleus in micron^2
- count of detected nuclei
- cell segmentation method
cells.parquet - cell summary in Parquet format that allows faster loading and reading of data
cells.zarr.zip - cell segmentation file in Zarr format that contains segmentation masks and boundaries for nuclei and cells. It’s used to assign transcripts to cells
nucleus_boundaries.csv.gz and cell_boundaries.csv.gz - nucleus and cell boundaries in csv format. Each row represents a vertex in the boundary polygon of the cell/nucleus, and the boundary points for each cell/nucleus are listed in clockwise order.
transcripts.parquet - data to evaluate transcript quality and localization, with columns for:
- transcript id
- cell id
- binary value to indicate if the transcript falls within the segmented nucleus of a cell
- gene or control name
- X location of the transcript in microns
- Y location of the transcript in microns
- Z location of the transcript in microns
- phred-scaled quality value (Q-Score) estimating the probability of incorrect call
- FOV name
- the distance in microns between the transcript and the nearest nucleus boundary, based on segmentation mask boundaries
- an integer index for each codeword used to decode transcripts
- codeword category
- value to indicate whether transcript feature is “Gene Expression” or not
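
The cells.csv.gz table described above is plain gzipped CSV, so it can be checked quickly from the command line before any R code is run. Below is a minimal sketch that reports the number of cells, the mean transcript count per cell, and the mean cell area. The column names transcript_counts and cell_area are assumptions about the header row (check yours with `gzip -dc cells.csv.gz | head -1`), and summarize_cells is a hypothetical helper name, not part of XOA.

```shell
# Summarize a Xenium cells.csv.gz file.
# Columns are located by name in the header row; the names
# "transcript_counts" and "cell_area" are assumed, not guaranteed.
summarize_cells() {
  gzip -dc "$1" | awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/"/, "", name)            # strip optional quotes around names
        col[name] = i
      }
      if (!("transcript_counts" in col) || !("cell_area" in col)) {
        print "expected columns not found in header" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      n++
      tx   += $(col["transcript_counts"])
      area += $(col["cell_area"])
    }
    END {
      if (n > 0)
        printf "cells=%d mean_transcripts=%.2f mean_cell_area=%.2f\n", n, tx / n, area / n
    }'
}

# Usage (run inside an output folder):
#   summarize_cells cells.csv.gz
```

Looking columns up by name rather than by position keeps the sketch usable if the column order differs between XOA versions.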

cell_feature_matrix - cell-feature matrix output from XOA in three formats: the text-based Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. All three include only transcripts that pass the default quality value threshold of Q20 and are assigned to cells.
analysis - folder containing outputs from the XOA pipeline: clustering, differential expression, principal component analysis, and UMAP.
aux_outputs
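
As a quick check on the MEX version of the cell-feature matrix mentioned above, the dimensions can be read from the Matrix Market header without loading the matrix. This is a minimal sketch: in the Matrix Market coordinate format, the first line that does not start with `%` holds the number of rows, columns, and non-zero entries, and 10x MEX files store features as rows and cells as columns. The path cell_feature_matrix/matrix.mtx.gz in the usage line is an assumption about the folder layout, and mex_dims is a hypothetical helper name.

```shell
# Print the dimensions stored in a gzipped Matrix Market (MEX) file.
mex_dims() {
  gzip -dc "$1" | awk '
    /^%/ { next }    # skip the banner and comment lines
    {
      # first non-comment line: rows (features), columns (cells), non-zero entries
      printf "features=%d cells=%d nonzero=%d\n", $1, $2, $3
      exit
    }'
}

# Usage (path is an assumption about the output folder layout):
#   mex_dims cell_feature_matrix/matrix.mtx.gz
```

The cell count reported here should agree with the number of rows in cells.csv.gz.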

The output from the CosMx AtoMx pipeline

The Bruker AtoMx interface provides export functions for downloading the files needed by community-developed tools. One may export the following flat files (text-based, csv):

count matrix
cell metadata
transcripts
polygons
FOV positions
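
After an export, it can help to confirm that all five flat files listed above arrived before starting an analysis. The sketch below checks a directory for one file of each kind. The file name suffixes it looks for (_exprMat_file.csv, _metadata_file.csv, _tx_file.csv, -polygons.csv, _fov_positions_file.csv) are assumptions based on commonly seen CosMx flat file exports and may differ in your export; check_cosmx_flat_files is a hypothetical helper name.

```shell
# Report which CosMx flat files are present in an export directory.
# The suffixes are assumed naming patterns; adjust them to match your export.
check_cosmx_flat_files() {
  local dir="$1" status=0 suffix f found
  for suffix in _exprMat_file.csv _metadata_file.csv _tx_file.csv \
                -polygons.csv _fov_positions_file.csv; do
    found=no
    # The trailing * also matches compressed copies such as .csv.gz
    for f in "$dir"/*"$suffix"*; do
      if [ -e "$f" ]; then found=yes; break; fi
    done
    if [ "$found" = yes ]; then
      echo "found   *$suffix"
    else
      echo "missing *$suffix"
      status=1
    fi
  done
  return "$status"
}

# Usage (the directory name is a placeholder):
#   check_cosmx_flat_files ~/Spatial_transcriptomics/cosmx_flat_files
```

The function returns a non-zero status when any file is missing, so it can be used as a guard at the top of a longer script.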

Tertiary analysis objects, such as a Seurat object or a TileDB array, can be exported as well. The user has the option to include transcript coordinates and polygon coordinates in the exported Seurat object, but these produce large data files. Morphology 2D data may also be exported, but this produces very large files.

The output from the Visium HD platform

10X Genomics Visium HD runs generate fastq files. The first step in the analysis is to run Space Ranger, which translates the raw sequencing data into a location-decoded gene expression matrix. The input files required for Space Ranger are:

fastq files
reference file
probe set file
slide layout file
CytAssist-captured image
optionally, a high-resolution microscope image, either brightfield or fluorescence
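
To show how the inputs listed above map onto a Space Ranger call, here is a minimal sketch that only prints a `spaceranger count` command instead of running it. The flag names are taken from the Space Ranger command line as I understand it and should be confirmed with `spaceranger count --help` for your installed version. Every value passed in is a placeholder, and print_spaceranger_cmd is a hypothetical helper name. The optional `--image` flag is meant for a brightfield microscope image; fluorescence images use different flags that are not shown here.

```shell
# Print (do not run) a spaceranger count command for one Visium HD sample.
# Arguments, in order: sample id, fastq directory, reference directory,
# probe set file, slide serial number, capture area, slide layout file,
# CytAssist image, and an optional brightfield microscope image.
print_spaceranger_cmd() {
  local image_arg=""
  if [ -n "${9:-}" ]; then
    image_arg=" --image=$9"
  fi
  printf 'spaceranger count --id=%s --fastqs=%s --transcriptome=%s --probe-set=%s --slide=%s --area=%s --slidefile=%s --cytaimage=%s --create-bam=false%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$image_arg"
}

# Usage (all values are placeholders):
#   print_spaceranger_cmd sample1 fastqs/ reference_dir probe_set.csv \
#     SLIDE_SERIAL AREA_ID layout_file cytassist_image.tif
```

Printing the command first makes it easy to review the paths before committing to a long run; once it looks right, the printed line can be pasted into the terminal or a job script.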
