Indexing a Reference sequence and annotation
First, let's make sure we are where we are supposed to be and that the References directory is available.
cd /quobyte/ikorfgrp/bis180l/$USER/rnaseq_example
mkdir -p References
To align our data we will need the genome (FASTA) and annotation (GTF) for mouse. There are many places to find them, but we are going to get them from GENCODE.
First we need the URLs for the genome FASTA and the annotation GTF. For RNAseq we want the primary assembly of the genome and the basic gene annotation. At the time of this workshop the current GENCODE mouse release is M37. Update the scripts to use whichever release is current when you run them.
We will need:
- Genome sequence, primary assembly (GRCm39)
- Basic gene annotation (CHR)


Save them into your “References” directory.
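As a rough sketch of this step, the two download URLs can be built from the release number and fetched with wget. The URL layout and file names below are assumptions based on the usual GENCODE naming on the EBI FTP mirror, and download_references is a hypothetical helper; check the links on the GENCODE mouse page before relying on them.

```shell
# Assumed GENCODE mouse release; update to the current one.
RELEASE="M37"

# Assumed URL layout of the GENCODE mirror at EBI; verify on the GENCODE site.
BASE_URL="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_${RELEASE}"
GENOME_URL="${BASE_URL}/GRCm39.primary_assembly.genome.fa.gz"
GTF_URL="${BASE_URL}/gencode.v${RELEASE}.basic.annotation.gtf.gz"

# Hypothetical helper: download both files into the References directory.
download_references() {
    mkdir -p References
    wget -P References "${GENOME_URL}" "${GTF_URL}"
}

# Print the URLs so they can be checked before downloading.
echo "Genome:     ${GENOME_URL}"
echo "Annotation: ${GTF_URL}"
```

Once the printed URLs look right, running download_references from the rnaseq_example directory would place both files in References.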
We are going to use an aligner called ‘STAR’ to align the data. Let's take a look at the help docs for STAR:
module load star/2.7.11a
STAR -h
The basic options to generate genome indices using STAR are as follows:
--runThreadN: number of threads
--runMode: genomeGenerate mode
--genomeDir: /path/to/store/genome_indices
--genomeFastaFiles: /path/to/FASTA_file
--sjdbGTFfile: /path/to/GTF_file
--sjdbOverhang: readlength - 1
NOTE: In case of reads of varying length, the ideal value for --sjdbOverhang is max(ReadLength)-1. In most cases, the default value of 100 will work similarly to the ideal value.
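If your reads vary in length, the max(ReadLength)-1 rule can be worked out directly from a FASTQ file. The sketch below uses a made-up three-read file (toy_reads.fastq is a hypothetical name) so that it runs anywhere; it assumes the standard four-line FASTQ record layout.

```shell
# Hypothetical toy FASTQ with reads of length 8, 12, and 10.
cat > toy_reads.fastq <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGTACGT
+
IIIIIIIIIIII
@read3
ACGTACGTAC
+
IIIIIIIIII
EOF

# Sequence lines are lines 2, 6, 10, ... of a four-line-per-record FASTQ.
# Track the longest one and report its length minus 1.
SJDB_OVERHANG=$(awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
                     END { print max - 1 }' toy_reads.fastq)

echo "sjdbOverhang: ${SJDB_OVERHANG}"   # prints: sjdbOverhang: 11
```

For real gzipped data, the same awk command can read from a pipe, for example by feeding it the output of zcat on a sample file.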
First we need to index the genome for STAR. Let's pull down a Slurm script to index the GENCODE version of the mouse genome.
wget https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2025-Spring-BIS180L/master/rnaseq/software_scripts/scripts/star_index.slurm
less star_index.slurm
When you are done, type “q” to exit.
- This script creates the STAR index directory (star.overlap100.gencode.M35).
- Changes directory into the new STAR index directory. We run the STAR indexing command from inside this directory because, for some reason, STAR fails if you try to run it from outside.
- Finally, runs STAR in mode genomeGenerate.
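The three steps above might look roughly like the following. This is a sketch only and not the contents of star_index.slurm: the variable names (REFS, FASTA, GTF, INDEX_DIR, THREADS), the unzipped file names, and the default thread count are all assumptions, and the real script will also carry its own #SBATCH settings.

```shell
#!/bin/bash
# Sketch only: NOT the contents of star_index.slurm.
# Assumed layout: run from the rnaseq_example directory, or set REFS yourself.
REFS="${REFS:-$PWD/References}"
FASTA="${REFS}/GRCm39.primary_assembly.genome.fa"   # assumed file name
GTF="${REFS}/gencode.vM35.basic.annotation.gtf"     # assumed file name
INDEX_DIR="${REFS}/star.overlap100.gencode.M35"
THREADS="${SLURM_CPUS_PER_TASK:-8}"                 # falls back to 8 outside Slurm

# 1. Create the STAR index directory.
mkdir -p "${INDEX_DIR}"

# 2. Change into it, since STAR is run from inside the index directory.
cd "${INDEX_DIR}" || exit 1

# 3. Run STAR in genomeGenerate mode, but only if STAR is on the PATH.
if command -v STAR >/dev/null 2>&1; then
    STAR --runThreadN "${THREADS}" \
         --runMode genomeGenerate \
         --genomeDir . \
         --genomeFastaFiles "${FASTA}" \
         --sjdbGTFfile "${GTF}" \
         --sjdbOverhang 100
else
    echo "STAR not found on PATH; run 'module load star/2.7.11a' first." >&2
fi
```

The PATH check is there so the sketch fails with a readable message instead of a "command not found" error when the module has not been loaded.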
Unzip the genome and annotation files and add your References directory to the script. Then run the STAR indexing job when ready.
cd /quobyte/ikorfgrp/bis180l/$USER/rnaseq_example
sbatch star_index.slurm
This step will take a couple of hours. You can look at the STAR documentation while you wait. All of the output files will be written to the star index directory star.overlap100.gencode.M35.
If you are short on time, or if the indexing job didn’t finish, the index is corrupted, or you missed the session, you can link to a completed copy. If the indexing job is still running, cancel it first.
cd /quobyte/ikorfgrp/bis180l/$USER/rnaseq_example/References
ln -s /quobyte/ikorfgrp/bis180l/najoshi/mRNAseq/References/star.overlap100.gencode.M35 .