Unified Mixed-Model Method for Association Mapping
Association mapping with complex pedigrees, families, founding effects and population structure
For association mapping, a given sample may contain population structure (associated with local adaptation or diversifying selection), familial relatedness (from recent coancestry), or both. Because population structure and familial relatedness can produce spurious associations, they have constrained the use of association studies in human and plant genetics. We have recently developed a unified mixed-model method that simultaneously accounts for multiple levels of relatedness: gross-level population structure (Q) and finer-scale relative kinship (K). Because this method crosses the boundary between family-based and mixed association samples, it provides a powerful complement to current methods for association mapping. Its superiority over other methods in controlling both Type I and Type II error rates has been demonstrated with 1) human quantitative gene expression dissection and 2) quantitative trait dissection in 277 diverse inbred maize lines with complex familial relationships and population structure.
Q is an n × p population structure incidence matrix, where n is the number of individuals assayed and p is the number of populations defined. Q is inferred from the estimates produced by STRUCTURE (Pritchard et al., 2000) run with p populations (p is the parameter that STRUCTURE calls K, which is distinct from the kinship matrix K used here).
A coancestry, or kinship, coefficient is the probability that two homologous genes are identical by descent. Coancestry coefficients can be estimated at the population level (Θ or Fst) or between two individuals (Θij, for genes randomly sampled one from individual i and the other from individual j). Marker-based relative kinship estimators have been developed (Loiselle et al., 1995; Lynch and Ritland, 1999; Ritland, 1996; Rousset, 2002), and relative kinship can be defined as
Fij = (Qij - Qm)/(1 - Qm) ≅ Θij,
where Qij is the probability of identity by state for random genes from i and j, and Qm is the average probability of identity by state for genes coming from random individuals in the population from which i and j were drawn.
SPAGeDi software (Hardy and Vekemans, 2002) was used to estimate the Loiselle (Loiselle et al., 1995) kinship coefficient using our SNP data set.
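The definition of Fij above can be illustrated with a minimal Python sketch. It is not the Loiselle estimator that SPAGeDi computes; it applies the formula directly to a small, made-up matrix of pairwise identity-by-state probabilities. The function names (relative_kinship, truncate_negative), the example values, the diagonal value of 1.0, and the choice to estimate Qm as the mean of the off-diagonal entries are all assumptions made for illustration. The second function sets negative estimates to zero, as this method does.

```python
def relative_kinship(ibs):
    """Apply Fij = (Qij - Qm) / (1 - Qm) to a square matrix of
    pairwise identity-by-state probabilities (list of lists).

    Assumption: Qm is taken as the mean of all off-diagonal entries,
    i.e. the average IBS between distinct individuals in the sample.
    """
    n = len(ibs)
    off_diag = [ibs[i][j] for i in range(n) for j in range(n) if i != j]
    q_m = sum(off_diag) / len(off_diag)
    return [[(ibs[i][j] - q_m) / (1.0 - q_m) for j in range(n)]
            for i in range(n)]


def truncate_negative(f):
    """Set negative kinship estimates to zero (pairs less related
    than random individuals are treated as unrelated)."""
    return [[max(0.0, value) for value in row] for row in f]


# Hypothetical IBS probabilities for three individuals.
ibs = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

f = relative_kinship(ibs)   # raw estimates, may contain negatives
k = truncate_negative(f)    # matrix with negatives set to zero

for row in k:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy example the third individual shares fewer alleles with the other two than the sample average, so its raw estimates are negative and become zero after truncation.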
Negative Fij values are common. Negative values between individuals are set to zero, because they indicate that the individuals are less related than random individuals. Not setting these values to zero would increase the Type I and Type II error rates in the association test.
The steps involved in carrying out this approach are as follows:
Step 1. Create a Q matrix
Obtain the population structure matrix by running STRUCTURE. Format the STRUCTURE output as a text file readable by TASSEL.
Step 2. Create a K Matrix
Obtain the relative kinship matrix by running SPAGeDi. Set negative values to zero. Format the SPAGeDi output as a text file readable by TASSEL.
Step 3. Generate Candidate SNPs
Import the candidate gene sequence data into TASSEL directly, or format your candidate marker data as a text file readable by TASSEL.
Step 4. Trait
Format trait data to a text file readable by TASSEL.
Step 5. Mixed model
Run the mixed model in TASSEL. Details on using TASSEL in general, and for the mixed model in particular, can be found in the TASSEL documentation.
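For orientation, the model fitted in Step 5 can be written in the notation of Yu et al. (2006). The symbols below follow that paper rather than this page, and the block is a summary of the model form, not a description of how TASSEL or the SAS code computes it internally.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
             + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad
  \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
  \qquad
  \operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
\]
```

Here y is the vector of phenotypic observations, β holds fixed effects other than SNP or population effects, α holds the SNP effects, v holds the population effects, u holds the polygenic background effects, and e is the residual. Q is the population structure matrix from Step 1, K is the relative kinship matrix from Step 2, X, S and Z are incidence matrices, and R is a diagonal matrix. Q enters as a fixed-effect covariate, while K shapes the covariance of the random polygenic effects.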
We also have implemented this approach in SAS. Here is the SAS code.
Yu, J.*, G. Pressoir*, W.H. Briggs, I. Vroh Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics.
* Jianming Yu and Gael Pressoir contributed equally to this work
GAPIT R Package
Henderson, C.R. 1984. Application of linear models in animal breeding. Univ. of Guelph, Ontario.
Kennedy, B.W., M. Quinton, and J.A. van Arendonk. 1992. Estimation of effects of single genes on quantitative traits. J Anim Sci 70:2000-12.
Q (Population structure)
Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-59.
STRUCTURE software, http://pritch.bsd.uchicago.edu/structure.html
K (relative kinship)
Loiselle, B.A., V.L. Sork, J. Nason, and C. Graham. 1995. Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am. J. Bot. 82:1420-1425.
Ritland, K. 1996. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet. Res. 67:175-186.
SPAGeDi software, http://www.ulb.ac.be/sciences/ecoevol/spagedi.html
Human gene expression
Morley, M., C.M. Molony, T.M. Weber, J.L. Devlin, K.G. Ewens, R.S. Spielman, and V.G. Cheung. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430:743-7.
SNP Consortium Linkage Map Project database
Maize association mapping
Flint-Garcia, S.A., A. Thuillet, J. Yu, G. Pressoir, S.M. Romero, S.E. Mitchell, J.F. Doebley, S. Kresovich, M.M. Goodman, and E.S. Buckler. 2005. Maize association population: A high resolution platform for QTL dissection. Plant J. 44:1054-1064.
Thornsberry, J.M., M.M. Goodman, J. Doebley, S. Kresovich, D. Nielsen, and E.S. Buckler. 2001. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet 28:286-9.
Maize Molecular and Functional Diversity Project, http://www.panzea.org/