
wANNOVAR Annotation of Variants

This section assumes that Plink Step by Step has been completed.
ANNOVAR Databases Used
ANNOVAR is a rapid, efficient tool to annotate functional consequences of genetic variation from high-throughput sequencing data. wANNOVAR provides easy and intuitive web-based access to the most popular functionalities of the ANNOVAR software.

Start Group Exercise (40 mins)

On your local machine (or using FileZilla/WinSCP)

Let's get the file ready for annotation with ANNOVAR by copying it down from the cluster.

# Replace with your own cluster username
user="keithgmitchell"
scp ${user}@tadpole.genomecenter.ucdavis.edu:/share/workshop/gwas_workshop/${user}/plink/master.avinput .

Go to this link: wANNOVAR

Enter the information below along with your email address. The sample identifier can be anything, but use something helpful like chr21workshop.

Before you click submit, select the following parameters.

Now click submit. The result you see should be something like the following:

Finally, when the results are done you should get an email, and you can click the link from the previous view to see the following:

Click the CSV file link for the genome summary results, then take some time with your group to explore the results using the genome summary results view button to its left.

Question:

What exonic variants exist in gene PSMG1? What are their positions, and why do you suppose some have more annotation than the others? What is the dbSNP ID for the more annotated variants? What is the MAF of the population at this site? Do you think it is a variant of interest?
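
If you would rather filter the downloaded genome summary CSV programmatically than scroll through it, the sketch below shows one way to pull out exonic variants for a single gene. The column names `Gene.refGene` and `Func.refGene` are assumptions about the header of the CSV and may differ in your download, so check the first line of your file and pass the right names. The function name and the toy rows are hypothetical and are not workshop data.

```python
import csv
import io


def exonic_variants(csv_text, gene, gene_col="Gene.refGene", func_col="Func.refGene"):
    """Return rows annotated as 'exonic' for one gene.

    gene_col and func_col are ASSUMED column names; adjust them to match
    the header of the CSV you actually downloaded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row[gene_col] == gene and row[func_col] == "exonic"
    ]


# Toy rows with placeholder genes and positions, for illustration only
sample = """Chr,Start,End,Ref,Alt,Func.refGene,Gene.refGene
21,100,100,A,G,exonic,GENE_A
21,200,200,C,T,intronic,GENE_A
21,300,300,G,A,exonic,GENE_B
"""

for row in exonic_variants(sample, "GENE_A"):
    print(row["Chr"], row["Start"], row["Ref"], row["Alt"])
```

For the exercise you would read your downloaded file into a string and pass `"PSMG1"` as the gene. The match on the gene column is exact, so a row that lists several genes in one cell would be missed by this sketch.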

Now on the cluster:

A simple Python script turns the awkward PLINK output into a clean CSV.

cd /share/workshop/gwas_workshop/${USER}/plink
# A very simple script that turns the values into a CSV file for easier analysis with R; it uses the adjusted TDT file
python fix_tdt.py
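
The contents of fix_tdt.py are not reproduced on this page. As a rough sketch of the kind of conversion involved, and assuming the PLINK output is a whitespace-aligned table with one header row, something like the following would do the job. The function name and the sample rows are hypothetical and only illustrate the idea; this is not the workshop script itself.

```python
import csv
import io


def plink_table_to_csv(text):
    """Convert a whitespace-aligned table (assumed PLINK-style) to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # collapses the runs of spaces used for alignment
        if fields:             # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


# Made-up rows, for illustration only (not workshop data)
sample = """ CHR         SNP      UNADJ         GC
  21   rs0000001     0.0010     0.0020
  21   rs0000002     0.0400     0.0500
"""
print(plink_table_to_csv(sample))
```

Splitting on arbitrary whitespace is what makes the padded columns manageable, but it only works when no field contains a space.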

Merge these CSV files so that we have a single file of each type to work with.

# Take the header from one file, then append every file's rows without its header
head -n 1 02-CleanedTDT/tdt_21.frq.csv > tdtfrq.csv
for i in 02-CleanedTDT/*.frq.csv; do tail -n +2 "${i}" >> tdtfrq.csv; done

head -n 1 02-CleanedTDT/tdt_21.tdt.adjusted.csv > tdtadj.csv
for i in 02-CleanedTDT/*.tdt.adjusted.csv; do tail -n +2 "${i}" >> tdtadj.csv; done
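
The same header-once merge can be sketched in Python, which may be handy if you later want to add checks on the columns. This is a minimal sketch, not part of the workshop scripts: the function name `merge_csvs` is hypothetical, and it assumes every input file has the same header row. Unlike the shell version, it takes the header from the first file in sorted order rather than from the chromosome 21 file specifically.

```python
import csv
import glob


def merge_csvs(pattern, out_path):
    """Concatenate CSV files matching `pattern` into `out_path`.

    The header is written once, taken from the first non-empty file.
    Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Run from the plink directory, a call such as `merge_csvs("02-CleanedTDT/*.frq.csv", "tdtfrq.csv")` would mirror the first loop above. Keep the output path outside the glob pattern so the merged file is never read back in as an input.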

Let's take a quick look at our final files by counting their lines.

wc -l < master.avinput
wc -l < tdtadj.csv
wc -l < tdtfrq.csv

Genome-Wide Association Studies