KING Tutorial: GRS Risk Prediction
KING is a toolset to explore genotype data from a genome-wide association study (GWAS) or a sequencing project.
KING can also predict disease risk using genetic risk scores (GRS).
Given weights at a few disease-susceptibility SNPs (e.g., log odds ratios (logOR) from published GWAS scans),
the GRS of each individual is computed as the weighted sum of that individual's genotypes, coded as counts of the effect allele.
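The weighted sum can be illustrated with a minimal Python sketch. It assumes genotypes are coded as counts (0, 1, 2) of each SNP's effect allele and simply skips SNPs with no genotype, which may not match how KING handles missing data. The weights are three WT values from the example model below; the genotypes and the function name `grs` are hypothetical.

```python
# Hypothetical illustration: three WT values from the example model.
weights = {"rs9273363": 1.702, "rs689": 0.403, "rs2290400": 0.295}


def grs(genotypes: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele counts over the model SNPs.

    Assumption for this sketch: SNPs absent from `genotypes` are skipped.
    """
    return sum(w * genotypes[snp] for snp, w in weights.items() if snp in genotypes)


# Hypothetical individual: effect-allele counts at each model SNP.
person = {"rs9273363": 1, "rs689": 2, "rs2290400": 0}
print(round(grs(person, weights), 3))  # 2.508 (= 1.702*1 + 0.403*2 + 0.295*0)
```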
GRS MODEL
In addition to the genotype data described in the main tutorial,
GRS risk prediction requires a GRS model file, which may look like:
SNP EA AF WT CHR POS OA
rs9273363 A 0.131 1.702 6 32626272 C
rs9271594 G 0.095 1.801 6 32591213 A
rs2187668 T 0.076 1.367 6 32605884 C
rs34850435 T 0.345 0.839 6 32583299 C
rs34303755 C 0.216 1.079 6 32450613 A
rs689 T 0.265 0.403 11 2182224 A
rs2290400 C 0.459 0.295 17 38066240 T
The columns are:
SNP: SNP name
EA: effect allele
AF: allele frequency of the effect allele
WT: weight at the effect allele
CHR: chromosome of the SNP
POS: position of the SNP
OA: other allele
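Reading this format and matching the alleles in the genotype data to EA and OA could look like the Python sketch below. It assumes a header row with whitespace-delimited columns, genotypes given as the count (0, 1, 2) of the first allele listed in the data, and a simple strand rule that tries the complementary alleles when the direct match fails. `read_model` and `ea_count` are hypothetical names, and this is not KING's own code. Complementing cannot resolve an A/T or C/G SNP such as rs689 (T/A) in the example model, because both strands carry the same allele pair; when the genotype data are already on the model's strand, no flipping is needed, which is when the --noflip option below is useful.

```python
import io

# A few rows of the example model, in the same whitespace-delimited format.
MODEL_TEXT = """SNP EA AF WT CHR POS OA
rs9273363 A 0.131 1.702 6 32626272 C
rs689 T 0.265 0.403 11 2182224 A
"""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_model(handle):
    """Parse a header line plus one row per SNP into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        row = dict(zip(header, fields))
        row["AF"] = float(row["AF"])
        row["WT"] = float(row["WT"])
        row["POS"] = int(row["POS"])
        rows.append(row)
    return rows


def ea_count(row, a1, a2, n_a1, allow_flip=True):
    """Convert a count of data allele a1 (0, 1 or 2) into a count of the
    model's effect allele EA (hypothetical helper, assumed strand rule).

    Returns None if the data alleles match neither EA/OA nor, when
    allow_flip is True, their strand complements.
    """
    ea, oa = row["EA"], row["OA"]
    candidates = [(a1, a2)]
    if allow_flip:
        candidates.append((COMPLEMENT.get(a1, a1), COMPLEMENT.get(a2, a2)))
    for b1, b2 in candidates:
        if (b1, b2) == (ea, oa):
            return n_a1  # a1 is the effect allele
        if (b1, b2) == (oa, ea):
            return 2 - n_a1  # a1 is the other allele
    return None


model = read_model(io.StringIO(MODEL_TEXT))
print(ea_count(model[0], "T", "G", 2))  # T/G is the complement of A/C -> 2
```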
GRS RISK PREDICTION
--risk predicts disease risk for each individual according to the GRS model. An example of risk prediction is:
prompt> king -b ex.bed --risk --model model.txt --prevalence 0.005 --noflip
Both --prevalence and --noflip are optional.
Specify the disease prevalence with --prevalence when positive and negative predictive values (PPV and NPV) are needed; this option does not affect any other prediction results.
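Predictive values depend on prevalence through Bayes' rule, which is presumably why only PPV and NPV change with --prevalence. The Python sketch below uses the standard formulas; that KING derives its Positive PV and Negative PV rows this way is an assumption, not something taken from its source. The counts are from the 0.5 risk-cutoff column of the example output below, and the function names are hypothetical.

```python
def sens_spec(tp, fp, tn, fn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def ppv_npv(sens, spec, prevalence):
    """Positive and negative predictive values via Bayes' rule (assumed formula)."""
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv


# Counts at risk cutoff 0.5 in the example output.
sens, spec = sens_spec(tp=923, fp=1126, tn=1802, fn=98)
ppv, npv = ppv_npv(sens, spec, prevalence=0.005)
print(f"sens={sens:.4f} spec={spec:.4f} PPV={ppv:.4f} NPV={npv:.4f}")
```

With a prevalence of 0.005, these formulas give PPV of about 0.0117 and NPV of about 0.9992, the values listed in the 0.5 column of the example output; sensitivity and specificity do not involve the prevalence at all.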
The --noflip option can be useful when the strands of the genotype data are already consistent with the model. The risk prediction output may look like:
Risk Cutoff 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
TruePositives 1015 987 965 944 923 909 889 857 751
FalsePositives 2616 1959 1524 1306 1126 1030 868 716 444
TrueNegatives 312 969 1404 1622 1802 1898 2060 2212 2484
FalseNegatives 6 34 56 77 98 112 132 164 270
Sensitivity 0.9941 0.9667 0.9452 0.9246 0.9040 0.8903 0.8707 0.8394 0.7356
Specificity 0.1066 0.3309 0.4795 0.5540 0.6154 0.6482 0.7036 0.7555 0.8484
Positive PV 0.0056 0.0072 0.0090 0.0103 0.0117 0.0126 0.0145 0.0170 0.0238
Negative PV 0.9997 0.9995 0.9994 0.9993 0.9992 0.9992 0.9991 0.9989 0.9984
AUC (Area under the ROC curve) = 0.8708
AUC among 1234 males is 0.8686
AUC among 2715 females is 0.8693
Genetic risk scores are saved in file exgrs.txt
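The AUC can be interpreted as the probability that a randomly chosen case has a higher score than a randomly chosen control. A minimal Python sketch of that (Mann-Whitney) definition follows, with ties counted as one half; the scores are made up, the function name `auc` is hypothetical, and KING may compute the statistic differently. Applying the same function separately within males and within females would give sex-specific AUCs like those listed above.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control.

    Ties count 1/2. Pairwise O(n*m) version, written for clarity.
    """
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up risk scores for illustration.
cases = [2.5, 1.9, 3.1]
controls = [0.4, 1.9, 1.2, 0.8]
print(auc(cases, controls))
```

The double loop is quadratic in sample size; for thousands of samples, a rank-based computation of the same statistic scales better.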
The generated risk prediction file (e.g., exgrs.txt) includes the following columns:
FID: family ID
IID: individual ID
InfoSNP: call rate at the model SNPs
InfoVar: proportion of GRS variance at non-missing SNPs
GRS: genetic risk score, as the original form of weighted sum
Zscore: GRS divided by the total GRS variance (a function of the model that is independent of the test data)
Percent: estimated percentile of the individual's GRS in the general population
ScaledGRS: transformed GRS, in the range (0, 1)
Status: disease status taken from the 6th column of the .fam file, if available
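InfoVar and a model-based, test-data-independent GRS variance suggest that the AF and WT columns of the model describe the GRS distribution in the general population. As a hedged sketch, assuming Hardy-Weinberg equilibrium and independent SNPs, a SNP with effect-allele frequency p and weight w contributes 2pw to the mean and 2p(1-p)w² to the variance; InfoVar is then illustrated as the variance share of the non-missing SNPs. These formulas are assumptions, not KING's documented computation, and the function names are hypothetical. In particular, the five chromosome 6 SNPs in the example model lie within about 180 kb of each other and may well be in linkage disequilibrium, so independence is a simplification.

```python
# Effect-allele frequency (AF) and weight (WT) from the example model.
MODEL = {
    "rs9273363": (0.131, 1.702),
    "rs9271594": (0.095, 1.801),
    "rs2187668": (0.076, 1.367),
    "rs34850435": (0.345, 0.839),
    "rs34303755": (0.216, 1.079),
    "rs689": (0.265, 0.403),
    "rs2290400": (0.459, 0.295),
}


def snp_variance(af, wt):
    """Variance from one SNP, wt^2 * 2p(1-p), assuming HWE."""
    return wt * wt * 2 * af * (1 - af)


def grs_mean_var(model):
    """Expected GRS and its variance, assuming HWE and independent SNPs."""
    mean = sum(2 * af * wt for af, wt in model.values())
    var = sum(snp_variance(af, wt) for af, wt in model.values())
    return mean, var


def info_var(model, observed):
    """Hypothetical InfoVar: share of model variance at genotyped SNPs."""
    _, total = grs_mean_var(model)
    seen = sum(snp_variance(*model[s]) for s in observed if s in model)
    return seen / total


mean, var = grs_mean_var(MODEL)
observed = set(MODEL) - {"rs689"}  # pretend rs689 is missing
print(f"mean={mean:.3f} var={var:.3f} InfoVar={info_var(MODEL, observed):.3f}")
```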
OTHER PARAMETERS
The following parameters can also be specified:
--prefix sets the prefix of the output file names, such as the file that stores the genetic risk scores. The default is "king".
REFERENCE
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM
(2010) Robust relationship inference in genome-wide association studies.
Bioinformatics 26(22):2867-2873
Onengut-Gumuscu S, Chen WM, Robertson CC, Bonnie JK, Farber E, Zhu Z, Oksenberg JR, Brant SR, Bridges SL Jr, Edberg JC, Kimberly RP, Gregersen PK, Rewers MJ,
Steck AK, Black MH, Dabelea D, Pihoker C, Atkinson MA, Wagenknecht LE, Divers J, Bell RA, Erlich HA, Concannon P, Rich SS (2019)
Type 1 Diabetes Risk in African-Ancestry Participants and Utility of an Ancestry-Specific Genetic Risk Score.
Diabetes Care 42(3):406-415
======================================
Last updated: August 24, 2018 by Wei-Min Chen