Genome Association and Prediction Integrated Tool
GAPIT (Genome Association and Prediction Integrated Tool) is an R package that performs genome-wide association studies (GWAS) and genomic prediction (or selection). It implements state-of-the-art methods from statistical genetics, such as the unified mixed model, EMMA, the compressed mixed linear model, and P3D/EMMAx.
The Mixed Linear Model (MLM) is one of the most effective methods for controlling false positives in GWAS. This model simultaneously accounts for both population structure and cryptic relationships among individuals (Yu et al., Nature Genetics, 2006, 38: 203-208). The Compressed Mixed Linear Model (CMLM) boosts statistical power and dramatically reduces computational time on large samples by clustering individuals into groups (Zhang et al., Nature Genetics, 2010, 42: 355-360).
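For orientation, the mixed model described above is often written in the following form. The notation is a common convention and is an assumption here, not taken from the GAPIT documentation: y is the vector of phenotypes, β holds the fixed effects (the SNP being tested and the population structure covariates), u holds the random polygenic effects, K is the kinship matrix, and Z maps individuals to the entries of u.

```latex
\[
y = X\beta + Zu + e, \qquad
u \sim N\!\left(0,\; 2K\sigma_a^2\right), \qquad
e \sim N\!\left(0,\; I\sigma_e^2\right)
\]
```

Under this notation, the fixed effects carry the population structure and K carries the cryptic relationships. In the compressed model, u would hold one effect per group rather than one per individual, with K describing kinship among groups, which is why the random-effect part becomes smaller.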
An increasing number of SNPs is becoming available because of new genotyping technologies such as Genotyping by Sequencing (GBS). Thus, computational time is becoming more of an issue when conducting GWAS. Population Parameters Previously Determined (P3D), or EMMAx, was developed to decrease computational time without compromising statistical power. This method performs GWAS in two steps. First, a mixed model without a SNP effect is fitted to estimate the population parameters, such as the genetic variance, the residual variance, or their ratio. Second, a separate mixed model is fitted for each SNP, with the population parameters held fixed at the values estimated in the first step. This method was independently developed as P3D (Zhang et al., Nature Genetics, 2010, 42: 355-360) and EMMAx (Kang et al., Nature Genetics, 2010, 42: 348-354).
Efficient Mixed Model Association (EMMA) was developed by Kang et al. (Genetics, 2008, 178: 1709-1723). The EMMA algorithm is implemented in an R package, which GAPIT uses as one of its required libraries. Consequently, the EMMA algorithm is automatically embedded in GAPIT. We added one line to the EMMA source code to handle situations where the estimated likelihood function is not available. The modified EMMA source code is available for download.
Genomic prediction is integrated through CMLM to improve prediction accuracy. In human genetics, the term genomic prediction refers to the prediction of disease risk; in plant and animal breeding, the same approach is known as genomic selection. GAPIT estimates genomic breeding values as well as their prediction accuracy.
GAPIT implements additional strategies to handle large genotypic data sets. By subdividing the genotypic data into multiple smaller files, the memory requirement of GAPIT remains constant. GAPIT can read genotypic data in either HapMap format or the numerical format required by the EMMA R package. GAPIT reports detailed results in a series of tables and graphs (e.g., Manhattan plots and QQ-plots).
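The idea of working through genotypes one fragment at a time, and of turning HapMap calls into numbers, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not GAPIT's own code: it assumes a tab-delimited HapMap file with 11 leading annotation columns, two-letter genotype calls such as "AG", and a numeric code that counts copies of the second allele listed in the alleles column. The function names are hypothetical.

```python
from itertools import islice

N_META = 11  # assumed number of leading HapMap annotation columns (rs#, alleles, chrom, pos, ...)

def encode_row(fields):
    """Convert one HapMap row into (snp_id, codes), with codes in {0, 1, 2, None}."""
    snp_id, alleles = fields[0], fields[1]
    first, second = alleles.split("/")[:2]
    codes = []
    for call in fields[N_META:]:
        if len(call) == 2 and set(call) <= {first, second}:
            codes.append(call.count(second))  # copies of the second allele
        else:
            codes.append(None)                # missing or unrecognized call
    return snp_id, codes

def iter_fragments(lines, fragment_size):
    """Yield lists of encoded SNP rows, at most fragment_size rows at a time."""
    lines = iter(lines)
    next(lines)  # skip the header line
    while True:
        block = list(islice(lines, fragment_size))
        if not block:
            return
        yield [encode_row(line.rstrip("\n").split("\t")) for line in block]
```

Because `iter_fragments` accepts any iterable of lines, an open file handle can be passed directly. Only `fragment_size` rows are held in memory at once, so the peak memory use depends on the fragment size rather than on the total number of SNPs.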
Running GAPIT could be as difficult as running any other software driven by a command-line interface (CLI). This kind of interface is efficient for repetitive tasks. However, it is not as intuitive as graphical user interface (GUI) software (e.g., TASSEL), because memorizing commands takes time, especially for infrequent users. As a result, we have taken a third path that reflects the way people work in the age of search engines: read-copy-paste (RCP). The “Read” part plays the role of the GUI. The “Copy/Paste” part serves as the CLI. To shorten the learning curve along the RCP path, we provide a demonstration dataset (a case-control study from the SNPstats package) and several tutorials (R Demo Code and Results). As demonstrated in the “Getting Started” section of the GAPIT User Manual, the GAPIT R source code can be accessed directly through our website, which is updated frequently. Comments, suggestions, and bug reports are appreciated.
Download the user manual for an overview of GAPIT's functionality. Instructions for downloading and installing GAPIT are on Page 1.
Source code and demonstration data:
How to cite GAPIT: