Terry Casstevens 1 and Edward Buckler 2
1 Institute for Genomic Diversity, Cornell University
2 USDA-ARS, Institute for Genomic Diversity, Cornell University
What is GDPC?
The Genomic Diversity and Phenotype Connection (GDPC) simplifies access to
genomic diversity and phenotype data, thereby encouraging
reuse of this data.
GDPC accomplishes this by retrieving data from
one or more data sources and by allowing researchers to
analyze integrated data in a standard format.
GDPC provides access to genomic diversity data (such as SNPs, SSRs, and
sequences) and to phenotypic data collected in field, genetic, or
physiological experiments.
Why is GDPC Important?
GDPC promotes the reuse and reanalysis of data by making it publicly available.
Numerous research projects on genomic diversity and phenotypes have
generated valuable data collections. This data often remains on
individual desktop PCs or in private databases, even after publication.
Ideally, this data would be publicly available to other
researchers for reanalysis. GDPC accelerates the public availability
of data by providing the infrastructure to create and use connections
to multiple data sources, regardless of their underlying format.
GDPC users can retrieve data from multiple data sources simultaneously.
Since all "GDPC enabled" data sources return data in a common format,
users can integrate, analyze, and view all the data at the same time.
This is a significant advantage over other tools that give access to
only one data source.
Software analysis tools using the GDPC Java API will automatically have
access to new "GDPC enabled" data sources.
GDPC creates a middle layer between software analysis tools and the data
these tools rely on. Each GDPC connection to a data source knows the
specific data format for its source, and masks it from the rest of the
system. Thus, once an analysis tool is made "GDPC aware," it can
automatically take advantage of any GDPC data source (present or future)
without any additional software development. This also allows programmers
to focus on the particular goals of their own analysis tools, rather
than on the integration of various data formats.
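The middle-layer idea described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the actual GDPC Java API: the names `DataSourceConnection`, `GenotypeCall`, `CsvConnection`, `KeyValueConnection`, and `countAllele` are hypothetical, and the two source formats and all data values are made up for the example. Each connection hides its native format, so the analysis code only ever sees the common format.

```java
import java.util.ArrayList;
import java.util.List;

public class GdpcMiddleLayerSketch {

    // Common format returned by every connection (hypothetical, not the GDPC API).
    record GenotypeCall(String taxon, String locus, String allele) {}

    // Hypothetical contract: each source hides its native format behind this interface.
    interface DataSourceConnection {
        List<GenotypeCall> fetchGenotypes();
    }

    // Connection for an invented source that stores rows as "taxon, locus, allele".
    static class CsvConnection implements DataSourceConnection {
        private final List<String> rows;

        CsvConnection(List<String> rows) { this.rows = rows; }

        @Override
        public List<GenotypeCall> fetchGenotypes() {
            List<GenotypeCall> calls = new ArrayList<>();
            for (String row : rows) {
                String[] fields = row.split(",");
                calls.add(new GenotypeCall(fields[0].trim(), fields[1].trim(), fields[2].trim()));
            }
            return calls;
        }
    }

    // Connection for an invented source that stores entries as "locus|taxon=allele".
    static class KeyValueConnection implements DataSourceConnection {
        private final List<String> entries;

        KeyValueConnection(List<String> entries) { this.entries = entries; }

        @Override
        public List<GenotypeCall> fetchGenotypes() {
            List<GenotypeCall> calls = new ArrayList<>();
            for (String entry : entries) {
                String[] keyAndValue = entry.split("=", 2);
                String[] locusAndTaxon = keyAndValue[0].split("\\|");
                calls.add(new GenotypeCall(locusAndTaxon[1], locusAndTaxon[0], keyAndValue[1]));
            }
            return calls;
        }
    }

    // An "aware" analysis tool: it works only with the common format,
    // so a new connection type needs no change here.
    static long countAllele(List<DataSourceConnection> sources, String locus, String allele) {
        long count = 0;
        for (DataSourceConnection source : sources) {
            for (GenotypeCall call : source.fetchGenotypes()) {
                if (call.locus().equals(locus) && call.allele().equals(allele)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<DataSourceConnection> sources = List.of(
                new CsvConnection(List.of("line1, locusA, A", "line2, locusA, G")),
                new KeyValueConnection(List.of("locusA|line3=A")));
        System.out.println(countAllele(sources, "locusA", "A")); // prints 2
    }
}
```

Adding a third source in this sketch means writing one more `DataSourceConnection` implementation; `countAllele` stays unchanged, which is the property the text attributes to "GDPC aware" tools.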
GDPC integrates genomic and phenotypic data to promote the development
of software tools that analyze this data.
In order to bridge the gap between genomics and plant breeding, databases
and software analysis tools that integrate genomic diversity and phenotypic
data are necessary. GDPC provides access to both, retrieved from multiple
data sources in a common format. With this standard interface
available, it is our hope that more analysis tools for this data will
follow.
What Data can be Accessed?
These are the core GDPC data elements and their relationships.
This data can be accessed with the front-end application, GDPC Browser,
or via the GDPC Java API. See the GDPC Data Model for full details.
- Allele - one allele (e.g., SNP, SSR) and associated properties
(e.g., quality score).
- Locality - a geographical location.
- Locus - a region on a chromosome.
- Environment Experiment - an experiment
used to acquire phenotypic data. One of its properties
is a locality.
- Taxon - a particular seed line.
One of its properties is a locality.
- Taxon Parent - defines a parent of a given taxon.
- Genotype Experiment - an experiment
used to acquire genotypic data. One of its properties
is a locus.
- Phenotype Ontology - a physical trait defined by
a larger classification of traits.
- Phenotype - a value for a physical trait for a given
taxon collected by a particular environment experiment.
- Genotype - a genotype value for a given taxon
collected by a particular genotype experiment.
- Study - groups together environment experiments and/or
genotype experiments to represent a study.
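The relationships in the list above can be sketched as plain Java records. This is only an illustration, not the GDPC Data Model itself: the record and field names, the idea that a Genotype carries an Allele and a Phenotype carries a Phenotype Ontology entry, and the helper `phenotypeValuesByAllele` are all assumptions made for the example, and the sample values are invented. Only the links stated in the list (for example, a Taxon having a Locality) come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GdpcDataModelSketch {

    // Field choices are hypothetical; only the relationships follow the list above.
    record Locality(String name) {}
    record Locus(String name, String chromosome) {}
    record Allele(String value, double qualityScore) {}
    record Taxon(String name, Locality locality) {}
    record TaxonParent(Taxon taxon, Taxon parent) {}
    record EnvironmentExperiment(String name, Locality locality) {}
    record GenotypeExperiment(String name, Locus locus) {}
    record PhenotypeOntology(String trait, String category) {}
    record Phenotype(Taxon taxon, EnvironmentExperiment experiment,
                     PhenotypeOntology trait, double value) {}
    record Genotype(Taxon taxon, GenotypeExperiment experiment, Allele allele) {}
    record Study(String name,
                 List<EnvironmentExperiment> environmentExperiments,
                 List<GenotypeExperiment> genotypeExperiments) {}

    // Joins genotypes and phenotypes on their shared taxon and groups the
    // phenotype values by allele, as a small example of integrated analysis.
    static Map<String, List<Double>> phenotypeValuesByAllele(
            List<Genotype> genotypes, List<Phenotype> phenotypes) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Genotype genotype : genotypes) {
            for (Phenotype phenotype : phenotypes) {
                if (genotype.taxon().equals(phenotype.taxon())) {
                    grouped.computeIfAbsent(genotype.allele().value(), key -> new ArrayList<>())
                           .add(phenotype.value());
                }
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // All names and values below are made up for illustration.
        Locality field = new Locality("field-1");
        Taxon line1 = new Taxon("line1", field);
        Taxon line2 = new Taxon("line2", field);
        GenotypeExperiment assay = new GenotypeExperiment("assay-1", new Locus("locusA", "1"));
        EnvironmentExperiment trial = new EnvironmentExperiment("trial-1", field);
        PhenotypeOntology height = new PhenotypeOntology("plant height", "morphology");

        List<Genotype> genotypes = List.of(
                new Genotype(line1, assay, new Allele("A", 0.99)),
                new Genotype(line2, assay, new Allele("G", 0.95)));
        List<Phenotype> phenotypes = List.of(
                new Phenotype(line1, trial, height, 180.0),
                new Phenotype(line2, trial, height, 165.5));

        System.out.println(phenotypeValuesByAllele(genotypes, phenotypes));
    }
}
```

Because Taxon appears in both Genotype and Phenotype, it serves as the join key between the genomic and phenotypic sides in this sketch.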
How does GDPC fit into the Big Picture?
Diagram 1 shows how GDPC fits into the bigger picture of
acquiring, integrating, and analyzing genotypic and phenotypic data.
The data passed through the system depicted in the diagram is described
in more detail in the section "What Data can be Accessed?".
Diagram 1: Overview of acquiring, integrating, and analyzing data.
How can we be contacted?
Please send any comments or feedback to:
tmc46@cornell.edu