      Genome-Wide Association Studies

Question on bim files:

What can you say about the variant at position 10001930? What is the observed variant and what is the reference allele?

grep -w 10001930 "${outpath}/all_${sample}.bim"
21	.	0	10001930	T	C

Allele 1 (column 5) is T. PLINK usually puts the minor allele in this column, so T is the observed variant, whereas allele 2 (column 6) is C, the reference allele.
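The column layout can be checked directly with awk. This is only a sketch: the temporary file is a hypothetical stand-in holding the single row shown above, and the `bim_summary` variable name is made up for illustration. In the workshop you would point awk at `${outpath}/all_${sample}.bim` instead.

```shell
# A .bim row has six tab-separated columns:
# chromosome, variant ID, genetic distance, bp position, allele 1, allele 2.
bim=$(mktemp)   # hypothetical stand-in for the real .bim file
printf '21\t.\t0\t10001930\tT\tC\n' > "$bim"

# Compare the position column exactly instead of grepping for a substring.
bim_summary=$(awk -F'\t' '$4 == 10001930 { printf "chr%s:%s A1=%s A2=%s", $1, $4, $5, $6 }' "$bim")
echo "$bim_summary"

rm -f "$bim"
```

Matching on column 4 avoids accidental hits where the same digits appear inside a longer position or a variant ID.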

Question on fam files:

Amongst your group:

man grep
cat 8_fam.txt | grep 1280 -A 4 -B 4
14071 5136-SB-0268 5136-SB-1281 5136-SB-1280 1 2
14071 5136-SB-1280 0 0 2 1
14071 5136-SB-1281 0 0 1 1
14023 5136-SB-0214 5136-SB-0216 5136-SB-0215 1 2
14023 5136-SB-0215 0 0 2 1
14023 5136-SB-0216 0 0 1 1

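The explanation should be pretty straightforward: the columns are family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female), and phenotype (1 = unaffected, 2 = affected). Each family here is a trio of one affected son and two unaffected parents, and the parents are founders (their own parent IDs are 0).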
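To make the fam layout concrete, here is a small sketch that labels each person in family 14071 as a founder or an offspring. The temporary file and the `fam_roles` variable are made up for illustration; the three rows are copied from the grep output above.

```shell
fam=$(mktemp)   # hypothetical stand-in for 8_fam.txt
cat > "$fam" <<'EOF'
14071 5136-SB-0268 5136-SB-1281 5136-SB-1280 1 2
14071 5136-SB-1280 0 0 2 1
14071 5136-SB-1281 0 0 1 1
EOF

# Columns: family ID, individual ID, father ID, mother ID, sex, phenotype.
# Founders carry 0 for both parent IDs; everyone else is an offspring.
fam_roles=$(awk '{
    role = ($3 == "0" && $4 == "0") ? "founder" : "offspring"
    print $2, role
}' "$fam")
echo "$fam_roles"

rm -f "$fam"
```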

Question on genotyped 95%

diff ${outpath}/all_${sample}_fam.bim ${outpath}/clean-missing_${sample}.bim | head
head ${outpath}/all_${sample}_fam.bim
head ${outpath}/clean-missing_${sample}.bim
< 21	.	0	5030105	A	C
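In diff output, lines starting with `<` exist only in the first file, so counting them gives the number of variants the filter dropped. A minimal sketch, assuming two small stand-in files: the temp files and the `n_removed` variable are made up, the 5030105 row comes from the output above, and the 10001930 row is reused from the bim question purely as filler (whether it survives the filter in the real data is not shown here).

```shell
before=$(mktemp)   # hypothetical stand-in for all_${sample}_fam.bim
after=$(mktemp)    # hypothetical stand-in for clean-missing_${sample}.bim
printf '21\t.\t0\t5030105\tA\tC\n21\t.\t0\t10001930\tT\tC\n' > "$before"
printf '21\t.\t0\t10001930\tT\tC\n' > "$after"

# Count lines present only in the "before" file.
# diff exits non-zero when the files differ, hence the "|| true" guard.
n_removed=$(diff "$before" "$after" | grep -c '^<' || true)
echo "variants removed: $n_removed"

rm -f "$before" "$after"
```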

Question on monomorphic

diff ${outpath}/clean-missing_${sample}.bim ${outpath}/clean-missing-monos_${sample}.bim | tail

Question about hemizygous

cat ${outpath}/allHemizgyous_${sample}.txt | wc -l
12179

Question about common variant filtering

What percentage of variants are removed at this step? Why do you think this percentage is so extreme?

282467/305530

Why does the allele frequency distribution look the way we saw, with most variants being really rare and almost none in between? Something to do with Hardy-Weinberg?
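The fraction quoted above can be turned into a percentage on the command line. This sketch assumes 282467 is the number of variants removed and 305530 is the number present before filtering; the variable names are made up for illustration.

```shell
removed=282467   # assumed: variants dropped by the filter
total=305530     # assumed: variants before the filter

# The shell only does integer arithmetic, so let awk do the division.
pct_removed=$(awk -v r="$removed" -v t="$total" 'BEGIN { printf "%.2f", 100 * r / t }')
echo "${pct_removed}% of variants removed"
```

This comes out at 92.45%, which is why the step looks so drastic.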

Question on IBD

What could be happening with the relationship between 5136-SB-0663 & 5136-SB-0666? (Hint: look at PI_HAT)

   FID1          IID1   FID2          IID2 RT    EZ      Z0      Z1      Z2  PI_HAT PHE       DST     PPC   RATIO
   1130  5136-SB-0678   1130  5136-SB-0679 PO   0.5  0.0332  0.9324  0.0344  0.5006   0  0.903368  1.0000 36.0000
   1130  5136-SB-0678   1130  5136-SB-0680 PO   0.5  0.0325  0.9196  0.0479  0.5077   0  0.904732  1.0000 17.7500
   1130  5136-SB-0679   1130  5136-SB-0680 OT     0  0.8181  0.1664  0.0155  0.0987  -1  0.852066  0.9292  2.9474
   1225  5136-SB-0663   1225  5136-SB-0664 PO   0.5  0.0000  1.0000  0.0000  0.5000   0  0.898837  1.0000 36.5000
   1225  5136-SB-0663   1225  5136-SB-0665 PO   0.5  0.0254  0.9461  0.0285  0.5015   0  0.903279  1.0000 36.0000
   1225  5136-SB-0663   1225  5136-SB-0666 FS   0.5  0.4414  0.4721  0.0865  0.3225   1  0.882745  0.9997  5.8182
   1225  5136-SB-0664   1225  5136-SB-0665 OT     0  1.0000  0.0000  0.0000  0.0000  -1  0.846839  0.0432  1.3438
   1225  5136-SB-0664   1225  5136-SB-0666 PO   0.5  0.0255  0.9206  0.0539  0.5142   0  0.905761  1.0000 36.5000
   1225  5136-SB-0665   1225  5136-SB-0666 PO   0.5  0.0361  0.9198  0.0441  0.5040   0  0.904128  1.0000 11.5000
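PLINK reports PI_HAT as P(IBD=2) + 0.5 × P(IBD=1), that is, Z2 + 0.5 × Z1. The sketch below recomputes it for the 5136-SB-0663 / 5136-SB-0666 row and compares it with the textbook expectations for full siblings (Z0, Z1, Z2 = 0.25, 0.5, 0.25) and half siblings (0.5, 0.5, 0). The `pi_hat` helper and variable names are made up for illustration, and the result is rounded to three decimals because the Z values in the table are themselves rounded.

```shell
# pi_hat Z1 Z2 -> proportion of the genome shared IBD
pi_hat() {
    awk -v z1="$1" -v z2="$2" 'BEGIN { printf "%.3f", z2 + 0.5 * z1 }'
}

observed=$(pi_hat 0.4721 0.0865)   # Z1, Z2 from the 0663/0666 row above
full_sib=$(pi_hat 0.50 0.25)       # theoretical full siblings
half_sib=$(pi_hat 0.50 0.00)       # theoretical half siblings

echo "observed=$observed full_sib=$full_sib half_sib=$half_sib"
```

The observed value sits between the half-sibling and full-sibling expectations, which is what makes this pair worth a second look.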

I actually just noticed this yesterday, as I am still curating the FAM file myself. The records list these two as full siblings (FS, expected PI_HAT of 0.5), but here they look more like half siblings (HS) or something else weird, like some sort of incest, since it is unclear how else PI_HAT would come out at ~0.32. This is something I will talk with Anthony about. In addition, you can see here that we have some non-perfect trios in the study. These are helpful