The Genomic Diversity and Phenotype Connection    

Terry Casstevens (1) and Edward Buckler (2)
(1) Institute for Genomic Diversity, Cornell University
(2) USDA-ARS, Institute for Genomic Diversity, Cornell University
What is GDPC?

The Genomic Diversity and Phenotype Connection (GDPC) simplifies access to genomic diversity and phenotype data, thereby encouraging reuse of this data. GDPC accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze the integrated data in a standard format. GDPC provides access to genomic diversity data, such as SNPs, SSRs, and sequences, and to phenotypic data collected in field, genetic, or physiological experiments.
Why is GDPC Important?

GDPC promotes the reuse and reanalysis of data by making it publicly available.
Numerous research projects on genomic diversity and phenotypes have generated valuable data collections. This data often remains on individual desktop PCs or in private databases, even after publication. Ideally, this data would be publicly available to other researchers for reanalysis. GDPC accelerates the public availability of data by providing the infrastructure to create and use connections to multiple data sources, regardless of their underlying format.

GDPC users can retrieve data from multiple data sources simultaneously.
Since all "GDPC enabled" data sources return data in a common format, users can integrate, analyze, and view all the data at the same time. This is a significant advantage over other tools that give access to only one data source.

Software analysis tools using the GDPC Java API will automatically have access to new "GDPC enabled" data sources.
GDPC creates a middle layer between software analysis tools and the data these tools rely on. Each GDPC connection to a data source knows the specific data format for its source, and masks it from the rest of the system. Thus, once an analysis tool is made "GDPC aware," it can automatically take advantage of any GDPC data source (present or future) without any additional software development. This also allows programmers to focus on the particular goals of their own analysis tools, rather than on the integration of various data formats.
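
The middle-layer idea described above can be illustrated with a minimal Java sketch. It is not the GDPC API: the names `DataSource`, `GenotypeRecord`, `CommaSource`, `PipeSource`, and `AnalysisTool` are hypothetical, and the sample rows are made up. The sketch only assumes that each connection knows its own native format and returns records in one common format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are taken from the real GDPC API,
// and the sample rows are made up for illustration.
class MiddleLayerDemo {
    public static void main(String[] args) {
        List<DataSource> sources =
                Arrays.<DataSource>asList(new CommaSource(), new PipeSource());
        for (GenotypeRecord r : AnalysisTool.loadAll(sources)) {
            System.out.println(r.taxon + "\t" + r.locus + "\t" + r.value);
        }
    }
}

/** The common format that every connection returns. */
class GenotypeRecord {
    final String taxon;
    final String locus;
    final String value;

    GenotypeRecord(String taxon, String locus, String value) {
        this.taxon = taxon;
        this.locus = locus;
        this.value = value;
    }
}

/** A connection to one data source; it hides that source's native format. */
interface DataSource {
    List<GenotypeRecord> fetchGenotypes();
}

/** Source whose native rows are "taxon,locus,value". */
class CommaSource implements DataSource {
    private final String[] rows = {"T1,L1,A", "T2,L1,G"};

    public List<GenotypeRecord> fetchGenotypes() {
        List<GenotypeRecord> out = new ArrayList<GenotypeRecord>();
        for (String row : rows) {
            String[] f = row.split(",");
            out.add(new GenotypeRecord(f[0], f[1], f[2]));
        }
        return out;
    }
}

/** Source whose native rows are "locus|value|taxon". */
class PipeSource implements DataSource {
    private final String[] rows = {"L1|A|T3"};

    public List<GenotypeRecord> fetchGenotypes() {
        List<GenotypeRecord> out = new ArrayList<GenotypeRecord>();
        for (String row : rows) {
            String[] f = row.split("\\|");
            // Reorder the native fields into the common taxon/locus/value layout.
            out.add(new GenotypeRecord(f[2], f[0], f[1]));
        }
        return out;
    }
}

/** The analysis tool depends only on DataSource, never on a native format. */
class AnalysisTool {
    static List<GenotypeRecord> loadAll(List<DataSource> sources) {
        List<GenotypeRecord> all = new ArrayList<GenotypeRecord>();
        for (DataSource s : sources) {
            all.addAll(s.fetchGenotypes());
        }
        return all;
    }
}
```

In this sketch, adding a third source means writing one more `DataSource` implementation; `AnalysisTool.loadAll` stays unchanged, which is the property the paragraph above attributes to "GDPC aware" tools.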

GDPC integrates genomic and phenotypic data to promote the development of software tools that analyze this data.
In order to bridge the gap between genomics and plant breeding, databases and software analysis tools that integrate genomic diversity and phenotypic data are necessary. GDPC provides access to both, retrieved from multiple data sources in a common format. With this standard interface available, it is our hope that more analysis tools for this data will follow.
What Data can be Accessed?

These are the core GDPC data elements and their relationships. This data can be accessed with the front-end application, GDPC Browser, or via the GDPC Java API. See the GDPC Data Model for full details.
  • Allele - one allele (e.g. SNP, SSR) and associated properties (e.g. quality score).
  • Locality - a geographical location.
  • Locus - a region on a chromosome.
  • Environment Experiment - an experiment used to acquire phenotypic data. One of its properties is a locality.
  • Taxon - a particular seed line. One of its properties is a locality.
  • Taxon Parent - defines a parent of a given taxon.
  • Genotype Experiment - an experiment used to acquire genotypic data. One of its properties is a locus.
  • Phenotype Ontology - a physical trait defined by a larger classification of traits.
  • Phenotype - a value for a physical trait for a given taxon collected by a particular environment experiment.
  • Genotype - a genotype value for a given taxon collected by a particular genotype experiment.
  • Study - groups together environment experiments and/or genotype experiments to represent a study.
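
The relationships in the list above can be sketched as plain Java classes. This is an illustrative assumption, not the actual GDPC data model: all class and field names are hypothetical, the values are placeholders, the trait is reduced to a string instead of a Phenotype Ontology entry, and Allele, Taxon Parent, and Study are omitted for brevity.

```java
// Hypothetical sketch of a subset of the elements listed above. The class and
// field names are illustrative assumptions, not the actual GDPC data model.
class DataModelDemo {
    public static void main(String[] args) {
        Locality site = new Locality("example site");
        Taxon taxon = new Taxon("example line", site);
        GenotypeExperiment assay =
                new GenotypeExperiment("example assay", new Locus("example locus"));
        EnvironmentExperiment trial = new EnvironmentExperiment("example trial", site);

        Genotype g = new Genotype(taxon, assay, "A");
        Phenotype p = new Phenotype(taxon, trial, "example trait", 1.5);
        System.out.println(Describe.genotype(g));
        System.out.println(Describe.phenotype(p));
    }
}

class Locality {                       // a geographical location
    final String name;
    Locality(String name) { this.name = name; }
}

class Locus {                          // a region on a chromosome
    final String name;
    Locus(String name) { this.name = name; }
}

class Taxon {                          // a seed line; has a locality
    final String name;
    final Locality locality;
    Taxon(String name, Locality locality) { this.name = name; this.locality = locality; }
}

class EnvironmentExperiment {          // yields phenotypes; has a locality
    final String name;
    final Locality locality;
    EnvironmentExperiment(String name, Locality locality) {
        this.name = name;
        this.locality = locality;
    }
}

class GenotypeExperiment {             // yields genotypes; has a locus
    final String name;
    final Locus locus;
    GenotypeExperiment(String name, Locus locus) { this.name = name; this.locus = locus; }
}

class Genotype {                       // value for a taxon from a genotype experiment
    final Taxon taxon;
    final GenotypeExperiment experiment;
    final String value;
    Genotype(Taxon taxon, GenotypeExperiment experiment, String value) {
        this.taxon = taxon;
        this.experiment = experiment;
        this.value = value;
    }
}

class Phenotype {                      // trait value for a taxon from an environment experiment
    final Taxon taxon;
    final EnvironmentExperiment experiment;
    final String trait;                // simplified stand-in for a Phenotype Ontology entry
    final double value;
    Phenotype(Taxon taxon, EnvironmentExperiment experiment, String trait, double value) {
        this.taxon = taxon;
        this.experiment = experiment;
        this.trait = trait;
        this.value = value;
    }
}

/** Small helpers that walk the relationships between elements. */
class Describe {
    static String genotype(Genotype g) {
        return g.taxon.name + " @ " + g.experiment.locus.name + " = " + g.value;
    }

    static String phenotype(Phenotype p) {
        return p.taxon.name + ": " + p.trait + " = " + p.value
                + " (" + p.experiment.locality.name + ")";
    }
}
```

The helper methods show the navigation the list implies: a genotype reaches its locus through its genotype experiment, and a phenotype reaches its locality through its environment experiment.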
How does GDPC fit into the Big Picture?

Diagram 1 shows how GDPC fits into the bigger picture of acquiring, integrating, and analyzing genotypic and phenotypic data. The data passed through the system is described in more detail in the section "What Data can be Accessed?".


Diagram 1: Overview of acquiring, integrating, and analyzing data.
 
How can we be contacted?

Please send any comments or feedback to: tmc46@cornell.edu