KING Tutorial: Visualization of Families
Relationship inference in KING goes far beyond accurately estimating kinship coefficients between pairs of relatives.
Inferred family members can be clustered or reconstructed into families, and starting with KING version 2.2, these clustered or reconstructed families can be visualized with R plots.
Two types of visualization methods are used for inferred families: visualization as classic pedigrees, and visualization as igraph graphs.
All visualization in KING is invoked through the --rplot option, and each inference analysis comes with its own type of visualization.
Note that additional R packages may need to be installed; one way is to install them locally.
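For a quick pre-flight check, the R packages that this tutorial names for each --rplot analysis can be probed from Python before running KING. This is a minimal sketch, assuming Rscript is on the PATH; the table only restates the package requirements given in the sections of this tutorial, and the helper names (RPLOT_PACKAGES, r_package_available, missing_packages) are hypothetical.

```python
import shutil
import subprocess

# R packages per --rplot analysis, restated from this tutorial.
# The keys are labels for this hypothetical helper, not options it passes to KING.
RPLOT_PACKAGES = {
    "--build": {"kinship2"},  # also used when --rplot is given without an inference option
    "--ibdseg": {"igraph"},
    "--related": {"igraph", "kinship2"},  # kinship2 is needed for the pedigree-error plots
    "--cluster": {"igraph"},
    "--duplicate": {"igraph"},
    "--roh": {"ggplot2"},
}


def r_package_available(pkg: str) -> bool:
    """Return True if Rscript is found and can load the namespace of `pkg`."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False  # no R on PATH: KING would only write the .R script, not the PDF
    expr = (
        "quit(save = 'no', status = "
        f"if (requireNamespace('{pkg}', quietly = TRUE)) 0 else 1)"
    )
    result = subprocess.run([rscript, "-e", expr], capture_output=True)
    return result.returncode == 0


def missing_packages(analysis: str) -> list:
    """Sorted list of the R packages needed by `analysis` that cannot be loaded."""
    return sorted(p for p in RPLOT_PACKAGES[analysis] if not r_package_available(p))


if __name__ == "__main__":
    for analysis in RPLOT_PACKAGES:
        missing = missing_packages(analysis)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{analysis} --rplot: {status}")
```

When a package is reported missing, the corresponding --rplot run would still produce the R script, which can be rerun with Rscript after installing the package.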
Visualization of Families As Classic Pedigrees
Example commands for pedigree plots:
prompt> king -b ex.bed --prefix ex --rplot
prompt> king -b ex.bed --prefix ex --build --degree 2 --rplot
The first command (--rplot without any inference options) plots all pedigrees that are provided by the ex.fam file.
The second command (--build --rplot) plots all inferred pedigrees that KING is able to build.
Pedigree plots are the classic representation and are usually preferred by geneticists;
however, their limitation is that not all inferred relationships can be represented in the reconstructed pedigrees.
Note that the R package kinship2 must be installed for this --rplot to work properly; otherwise only the R script ex_buildplot.R is generated, without the actual PDF plots.
The pedigree plot for the second clustered family is shown below.
To demonstrate the visual effect of pedigree plots, we also show a large HapMap MKK pedigree that KING is able to build.
Visualization of All Unique Family Configurations
Example commands for visualizing all unique family configurations:
prompt> king -b ex.bed --prefix ex --ibdseg --degree 2 --rplot
prompt> king -b ex.bed --prefix ex --related --degree 2 --rplot
Both commands infer and plot all unique family configurations. The main differences are:
1) --related --rplot visualizes only the unique family configurations that are cryptic (i.e., between families) (see the first plot);
2) --related is expected to be orders of magnitude faster; and
3) --related --rplot also visualizes (within-family) pedigree errors (see the third plot).
The advantage of igraph plots over classic pedigree plots is that all inferred relationships can be visualized.
Note that the R package igraph must be installed for this --rplot to work properly;
otherwise only the R script ex_uniqfamplot.R is generated, without the actual PDF plots.
Visualizing pedigree errors with --related --rplot requires both the igraph and kinship2 packages.
The igraph plots for the first command are shown below. The first plot displays all unique families together with their total counts,
and the second plot shows frequent unique families in greater detail; e.g., the family structure (trios, etc.) is stated explicitly in the title.
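To make "unique family configurations" concrete, the sketch below counts families by structure: two families count as the same configuration when their members can be relabeled so that every pairwise relationship matches. This is not KING's implementation; it is a brute-force canonical form that is only practical for small families, and the input format (a dict from an unordered ID pair to a relationship label such as "PO" or "FS") and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical input format: unordered pair of sample IDs -> relationship label,
# e.g. "PO" (parent-offspring) or "FS" (full siblings). Unlisted pairs are unrelated.
Family = dict


def canonical_form(family: Family) -> tuple:
    """Return a key that is identical for families with the same structure.

    Tries every ordering of the members and keeps the smallest tuple of pairwise
    labels (the upper triangle of the labeled adjacency matrix). The n! cost is
    acceptable only for small families.
    """
    members = sorted({member for pair in family for member in pair})
    best = None
    for order in permutations(members):
        labels = tuple(
            family.get(frozenset((a, b)), "") for a, b in combinations(order, 2)
        )
        if best is None or labels < best:
            best = labels
    return (len(members), best)


def count_configurations(families: list) -> Counter:
    """Count how many families share each unique configuration."""
    return Counter(canonical_form(family) for family in families)


if __name__ == "__main__":
    trio_a = {frozenset(("F1", "K1")): "PO", frozenset(("M1", "K1")): "PO"}
    trio_b = {frozenset(("dad", "kid")): "PO", frozenset(("mom", "kid")): "PO"}
    sib_pair = {frozenset(("S1", "S2")): "FS"}
    for config, count in count_configurations([trio_a, trio_b, sib_pair]).items():
        print(count, config)
```

Here the two trios collapse into one configuration with a count of 2, which mirrors the idea of listing each unique family once together with its total count.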
Visualization of Each Clustered Family
Example command for visualizing each clustered family:
prompt> king -b ex.bed --prefix ex --cluster --degree 2 --rplot
Note that the R package igraph must be installed for --cluster --rplot to work properly;
otherwise only the R script ex_clusterplot.R is generated, without the actual PDF plots.
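Clustering can be pictured as grouping samples that are linked by chains of close relationships. A minimal sketch, assuming a hypothetical list of (ID1, ID2, degree) tuples and treating each family as a connected component of pairs within the --degree cutoff, uses union-find; KING's own --cluster procedure may apply additional rules, so treat this only as an illustration. The function name cluster_families is hypothetical.

```python
def cluster_families(pairs, max_degree=2):
    """Group samples into families = connected components of close relative pairs.

    pairs: hypothetical iterable of (id1, id2, degree) with degree as an integer
    (1 = first-degree, 2 = second-degree, ...). Pairs above max_degree are ignored.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for id1, id2, degree in pairs:
        if degree <= max_degree:
            root1, root2 = find(id1), find(id2)
            if root1 != root2:
                parent[root2] = root1  # merge the two families

    families = {}
    for sample in list(parent):
        families.setdefault(find(sample), []).append(sample)
    # Largest families first; members sorted for stable output.
    return sorted((sorted(f) for f in families.values()), key=lambda f: (-len(f), f))


if __name__ == "__main__":
    pairs = [("A", "B", 1), ("B", "C", 2), ("D", "E", 1), ("F", "G", 3)]
    print(cluster_families(pairs))  # prints [['A', 'B', 'C'], ['D', 'E']]
```

A and C end up in the same family through B even though the A-C pair itself is not listed, while the 3rd-degree pair F-G falls outside the cutoff and forms no family.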
The igraph plot for the second clustered family is shown below.
To demonstrate the visual effect of igraph plots, we also show the results for 171 HapMap MKK samples.
Visualization of Duplicates
Suppose we would like to examine whether our supposedly duplicate data match the original data (i.e., whether each pair of samples has identical DNA and identical IDs).
A directed igraph plot can be very useful for examining the patterns of sample mix-ups,
e.g., swaps between pairs of samples, or shifts among a large number of samples.
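In a directed graph of mismatched duplicates, each edge can be read as pointing from a sample's ID in the original data to the ID carried by the same DNA in the duplicate data. If the mix-up only reorders one set of IDs, these edges decompose into cycles: a 2-cycle is a swap between a pair of samples, and a longer cycle is a shift among many samples. The sketch below, with a hypothetical mapping and hypothetical helper names, follows the edges to recover those cycles.

```python
def mixup_cycles(mapping):
    """Decompose a sample mix-up into cycles of directed edges.

    mapping: hypothetical dict {original ID: ID carried by the same DNA in the
    duplicate data}; assumed to be a reordering (permutation) of one ID set.
    Correctly matched samples (an ID that maps to itself) are skipped.
    """
    seen = set()
    cycles = []
    for start in mapping:
        if start in seen or mapping[start] == start:
            continue
        cycle, node = [], start
        while node not in seen:  # follow edges until we return to the start
            seen.add(node)
            cycle.append(node)
            node = mapping[node]
        cycles.append(cycle)
    return cycles


def describe(cycle):
    """Label a cycle as a pairwise swap or a shift among several samples."""
    return "swap" if len(cycle) == 2 else f"shift among {len(cycle)} samples"


if __name__ == "__main__":
    mapping = {"S1": "S2", "S2": "S1", "S3": "S3", "S4": "S5", "S5": "S6", "S6": "S4"}
    for cycle in mixup_cycles(mapping):
        # prints "S1 -> S2 -> S1 : swap", then "S4 -> S5 -> S6 -> S4 : shift among 3 samples"
        print(" -> ".join(cycle + cycle[:1]), ":", describe(cycle))
```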
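We first create a toy dataset in which the last 100 IDs are scrambled: the first 232 lines of ex.fam are kept, and the remaining lines are written with the odd-numbered lines first, followed by the even-numbered lines: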
prompt> head -232 ex.fam > ex2.fam
prompt> awk 'NR>232 && NR%2==1' ex.fam >> ex2.fam
prompt> awk 'NR>232 && NR%2==0' ex.fam >> ex2.fam
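The head/awk commands above can be mimicked in Python to inspect their effect on the ID order. This sketch assumes ex.fam has 332 lines, as "the last 100 IDs" suggests, and uses made-up IDs in place of the real file; the function name scramble_tail is hypothetical. Under that assumption, lines 233 and 332 land back in their original positions, so 98 of the last 100 IDs are actually displaced.

```python
def scramble_tail(lines, keep=232):
    """Mimic `head -232` plus the two awk commands on a list of .fam lines.

    Keeps the first `keep` lines, then appends the remaining odd-numbered lines
    followed by the remaining even-numbered lines (1-based numbering, like awk's NR).
    """
    tail = list(enumerate(lines[keep:], start=keep + 1))
    odd = [line for nr, line in tail if nr % 2 == 1]
    even = [line for nr, line in tail if nr % 2 == 0]
    return lines[:keep] + odd + even


if __name__ == "__main__":
    # Hypothetical stand-in for ex.fam, assumed to have 332 lines.
    original = [f"ID{i}" for i in range(1, 333)]
    scrambled = scramble_tail(original)
    displaced = [i + 1 for i, (a, b) in enumerate(zip(original, scrambled)) if a != b]
    print(len(displaced), displaced[:3], displaced[-1])  # prints 98 [234, 235, 236] 331
```

Because ex.bed keeps its genotype rows in the original order, every displaced line in ex2.fam attaches an ID to another sample's DNA, which is what the duplicate analysis below is meant to detect.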
Then the following KING command allows intentional duplicates and infers possible mismatches (i.e., two samples with identical DNA but different IDs):
prompt> king -b ex.bed,ex.bed --fam ex.fam,ex2.fam --duplicate --rplot
Note that the R package igraph must be installed for --duplicate --rplot to work properly;
otherwise only the R script king_duplicateplot.R is generated, without the actual PDF plots.
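One way to picture finding "identical DNA but different IDs" is a pairwise genotype comparison between the two datasets. The sketch below uses a simple discordance rate between genotype vectors; this is a stand-in technique chosen for illustration, not KING's duplicate-inference method, and the input format, the function name, and the 1% discordance cutoff are all hypothetical. Each reported (original ID, duplicate ID) pair would correspond to a directed edge of the kind drawn in the mismatch plot.

```python
def find_mismatched_duplicates(geno1, geno2, max_discordance=0.01):
    """Report (original ID, duplicate ID) pairs with near-identical genotypes but different IDs.

    geno1, geno2: hypothetical dicts {sample ID: list of genotypes coded 0/1/2,
    None for missing}, with markers in the same order in both datasets.
    max_discordance: hypothetical cutoff on the fraction of discordant calls.
    """
    edges = []
    for id1, g1 in geno1.items():
        for id2, g2 in geno2.items():
            called = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
            if not called:
                continue  # no markers genotyped in both samples
            discordance = sum(a != b for a, b in called) / len(called)
            if discordance <= max_discordance and id1 != id2:
                edges.append((id1, id2))  # same DNA, different ID: a mismatch
    return edges


if __name__ == "__main__":
    original = {"A": [0, 1, 2, 0, 1], "B": [2, 2, 0, 1, 0], "C": [1, 1, 1, 0, 2]}
    duplicate = {"A": [2, 2, 0, 1, 0], "B": [0, 1, 2, 0, 1], "C": [1, 1, None, 0, 2]}
    print(find_mismatched_duplicates(original, duplicate))  # prints [('A', 'B'), ('B', 'A')]
```

The all-pairs loop is quadratic in the number of samples and is meant only to show the idea; sample C matches its own duplicate (ignoring the missing call) and is therefore not reported.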
The igraph plot for intentional but mismatched duplicates looks like this:
Runs of Homozygosity
Runs of homozygosity (ROH) can indicate inbreeding.
In KING 2.2.1 and later, it is easy to generate and plot ROH segments (see the command below) for all individuals whose proportion of genome in ROH exceeds 4.4%,
which corresponds to being the offspring of parents who are 2nd-degree relatives or closer.
prompt> king -b ex.bed --roh --rplot
Note that the R package ggplot2 must be installed for --roh --rplot to work properly.
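The 4.4% criterion compares the total length of an individual's ROH segments with the length of the genome. A minimal sketch of that arithmetic follows, assuming non-overlapping segments given as (chromosome, start, end) in base pairs and a hypothetical total genome length; how KING itself calls ROH segments and chooses the denominator is not described here, and the function names are hypothetical.

```python
def roh_proportion(segments, genome_length_bp):
    """Fraction of the genome covered by ROH; assumes segments do not overlap."""
    covered = sum(end - start for _chrom, start, end in segments)
    return covered / genome_length_bp


def flag_inbred(roh_by_individual, genome_length_bp, threshold=0.044):
    """Return {individual: ROH proportion} for individuals above `threshold`."""
    flagged = {}
    for iid, segments in roh_by_individual.items():
        proportion = roh_proportion(segments, genome_length_bp)
        if proportion > threshold:
            flagged[iid] = proportion
    return flagged


if __name__ == "__main__":
    GENOME_LENGTH_BP = 1_000_000_000  # hypothetical, not a real genome length
    roh = {
        "IND1": [("1", 0, 30_000_000), ("2", 5_000_000, 35_000_000)],  # 60 Mb -> 6%
        "IND2": [("3", 0, 10_000_000)],  # 10 Mb -> 1%
        "IND3": [],
    }
    print(flag_inbred(roh, GENOME_LENGTH_BP))  # prints {'IND1': 0.06}
```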
The ROH plots for inbred individuals look like this:
======================================
Last updated: May 21, 2019 by Wei-Min Chen