Understanding Protein Adaptation to Temperature
The temperature of the environment affects molecular phenotypes within the cell. At high temperatures, biochemical reactions speed up, cell membranes become more fluid, and both protein structure and composition change, because more hydrogen bonds and electrostatic interactions are needed to maintain the proper protein conformation.
This project aims to develop models that incorporate protein thermal stability to select for protein efficiency at elevated temperatures. A significant knowledge gap prevents us from building these models; most notably, we don’t yet understand which residues in a protein matter most for temperature adaptation. Because there could be thousands, if not millions, of sites that matter, identifying them is a daunting task.
In this project we are using prokaryote genomes to 1) identify temperature-sensitive sites, 2) understand the biochemistry behind these sites and what changes make a protein thermotolerant, and 3) develop ways to rank or prioritize these sites so that they could be incorporated into genomic selection models.
Prokaryotes make a good system for studying molecular adaptation to temperature because they are simple organisms with a conserved Central Dogma and a long evolutionary history. Importantly, prokaryote species have adapted to a wide range of temperatures, from 0 °C to above 100 °C. They also tend to have large effective population sizes and short generation times, so selection can act very efficiently to optimize the function of each protein at the species’ optimal growth temperature.
Using prokaryotes for this type of research question also requires knowing the optimal growth temperature (OGT) of many prokaryote species. Working with another member of the Buckler lab, I developed a convolutional neural network (CNN) that predicts OGT from the tRNA sequences in a genome, with 87.5% accuracy across both Archaea and Bacteria. The paper describing this work is in preparation, and you can read more about that project here. This model is useful for the current investigation of protein adaptation because it lets us predict OGT for any new prokaryote species that has a genome assembly. Equally important, these OGT predictions are independent of protein composition and other protein features, so using them as the response in protein-level analyses avoids circular reasoning.
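The post doesn’t say how tRNA sequences are fed to the network. The sketch below shows one common way to prepare DNA for a CNN. It assumes each tRNA gene is one-hot encoded over A/C/G/T and zero-padded to a fixed length. The function name `one_hot_trna` and the length of 96 are hypothetical choices, not the model’s actual preprocessing.

```python
# Hypothetical preprocessing sketch: one-hot encode tRNA gene sequences into
# fixed-size matrices that a 1D convolutional network could consume.
NUCLEOTIDES = "ACGT"


def one_hot_trna(seq: str, max_len: int = 96) -> list[list[int]]:
    """Return a max_len x 4 matrix with one row per sequence position.

    U is treated as T so RNA or DNA input both work. Ambiguous bases (e.g. N)
    and padding positions past the end of the sequence are all-zero rows.
    Sequences longer than max_len are truncated.
    """
    matrix = [[0] * len(NUCLEOTIDES) for _ in range(max_len)]
    for i, base in enumerate(seq.upper().replace("U", "T")[:max_len]):
        idx = NUCLEOTIDES.find(base)
        if idx >= 0:
            matrix[i][idx] = 1
    return matrix


# First three positions of a short tRNA fragment: G, C, G
print(one_hot_trna("GCGGAUUUAG")[:3])
# [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
```

A genome contains many tRNA genes, so matrices like these would be stacked or pooled per genome before predicting a single OGT. How that aggregation is done is not described here.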
In the current project, I’m working from three main hypotheses using a set of 4,800 prokaryote genomes and proteomes, and over 9,000 protein domains.
Hypothesis 1: Protein composition is affected by temperature
This is a fairly straightforward hypothesis that is already well documented in the literature [1–5]. In the dataset I’m working with here, I see general trends that agree with the literature: some amino acids tend to be more prevalent at high temperatures, including glutamic acid, valine, and isoleucine.
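Here is a minimal sketch of how a composition trend like this could be checked. It computes each proteome’s amino acid frequencies and their Pearson correlation with OGT across species. The function names and the toy proteomes are hypothetical. The actual analysis may use a different statistic or control for phylogeny.

```python
# Sketch: per-proteome amino acid composition and its Pearson correlation
# with optimal growth temperature (OGT) across species.
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteome: list[str]) -> dict[str, float]:
    """Fraction of each standard amino acid across all proteins in a proteome."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for protein in proteome:
        for aa in protein.upper():
            if aa in counts:  # skip non-standard symbols such as X or *
                counts[aa] += 1
                total += 1
    return {aa: (n / total if total else 0.0) for aa, n in counts.items()}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def composition_vs_ogt(proteomes: list[list[str]], ogts: list[float]) -> dict[str, float]:
    """Correlation between each amino acid's frequency and species OGT."""
    comps = [composition(p) for p in proteomes]
    return {aa: pearson([c[aa] for c in comps], ogts) for aa in AMINO_ACIDS}


# Toy example: three made-up "proteomes" with increasing OGT
r = composition_vs_ogt([["MKQ", "SNQ"], ["MKE", "VIE"], ["MEE", "VIV"]], [20.0, 60.0, 90.0])
print(r["E"] > 0, r["Q"] < 0)  # True True
```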
Hypothesis 2: Thermal adaptation is highly polygenic
I’m using a genome-wide association study (GWAS) to determine which residues within a protein domain are associated with temperature. The model uses features of a protein domain [6] as predictors and species OGT as the response. The results suggest that many sites within the genome really are associated with temperature: nearly 90% of Archaea protein domains and 80% of Bacteria protein domains show large portions of the protein associating with temperature, and on average 25% of the residues within a protein domain matter (Figure 1). These results strongly support the hypothesis that thermal adaptation is polygenic.
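The post doesn’t specify the association model. The sketch below shows one simple per-site test, offered as a stand-in rather than the GWAS model used here. It assumes each species contributes one aligned sequence for the domain, with `-` for gaps, plus one OGT value. For each alignment column it measures how much OGT variance residue identity explains (eta squared), then assigns a permutation p-value. The function names, the statistic, and the permutation scheme are all assumptions.

```python
# Hypothetical per-site association test for one aligned protein domain.
# Each species contributes one aligned sequence (equal lengths; "-" = gap,
# treated here as its own residue category) and one OGT value.
import random
from collections import defaultdict


def eta_squared(residues: list[str], ogts: list[float]) -> float:
    """Between-residue-group sum of squares divided by total sum of squares."""
    grand_mean = sum(ogts) / len(ogts)
    ss_total = sum((t - grand_mean) ** 2 for t in ogts)
    if ss_total == 0:
        return 0.0
    groups: dict[str, list[float]] = defaultdict(list)
    for res, t in zip(residues, ogts):
        groups[res].append(t)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def site_pvalues(alignment: list[str], ogts: list[float],
                 n_perm: int = 999, seed: int = 0) -> list[float]:
    """Permutation p-value per alignment column (OGT labels are shuffled)."""
    rng = random.Random(seed)
    pvalues = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment]
        observed = eta_squared(residues, ogts)
        shuffled = list(ogts)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            # Small tolerance so ties differing only by rounding still count.
            if eta_squared(residues, shuffled) >= observed - 1e-12:
                hits += 1
        # Add-one correction so a p-value is never exactly zero.
        pvalues.append((hits + 1) / (n_perm + 1))
    return pvalues


# Toy example: column 0 separates cool and hot species, column 1 is constant.
alignment = ["AK"] * 5 + ["VK"] * 5
ogts = [20, 22, 25, 27, 30, 70, 75, 80, 85, 90]
print(site_pvalues(alignment, ogts))
```

A permutation test avoids distributional assumptions, but it ignores shared ancestry among species. A real analysis would likely need to account for phylogenetic structure.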
Figure 1. A large proportion of each protein domain is significantly associated with temperature. Almost 90% of protein domains in Archaea species and 80% in Bacteria species are enriched for associations with optimal growth temperature. On average, 25% of the residues in a protein domain are associated with temperature.
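Reporting the share of residues per domain that are associated with temperature requires correcting for the many sites tested. The sketch below is one way to do that. It applies the Benjamini-Hochberg procedure to a domain’s per-site p-values and returns the fraction that stay significant. The 5% false discovery rate and the function names are assumptions. The post does not state which correction or threshold was used, or how domain-level enrichment was defined.

```python
# Sketch: Benjamini-Hochberg FDR control across the sites of one domain,
# then the fraction of residues that remain significant.
def benjamini_hochberg(pvalues: list[float], fdr: float = 0.05) -> list[bool]:
    """Return a significance flag per p-value (step-up BH procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr;
    # every site with rank <= k is then declared significant.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            significant[i] = True
    return significant


def fraction_significant(pvalues: list[float], fdr: float = 0.05) -> float:
    """Share of a domain's sites that pass the BH threshold."""
    if not pvalues:
        return 0.0
    flags = benjamini_hochberg(pvalues, fdr)
    return sum(flags) / len(flags)


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fraction_significant(pvals))  # 0.2
```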
Hypothesis 3: Genomes have similar evolutionary responses to temperature
With this hypothesis, I’m trying to identify which amino acid residues show up as important across phylogenetic domains. Sites that are significantly associated with high temperature in both Archaea and Bacteria are good candidates for sites that may also be important in Eukaryotes. Taking the intersection of the sites that are important in both Archaea and Bacteria reduces the search space, but many sites remain (Figure 2). I’m still working to address this hypothesis and to rank the importance of these shared sites.
Figure 2. Pfam domains 4 and 118 as examples of overlapping significant sites. Blue circles indicate the number of sites within each Pfam domain that are significantly associated with temperature in Bacteria species, and red circles indicate the number of sites associated with temperature in Archaea species. Many sites overlap between the two groups; these are the sites that are important in both phylogenetic domains.
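Here is a minimal sketch of the intersection step plus one possible ranking rule. It assumes per-site p-values are available for Archaea and Bacteria, indexed by column of the same Pfam alignment. It intersects the significant sets, reports their Jaccard overlap, and orders shared sites by their larger (weaker) p-value, so a site ranks highly only when both groups support it. The function name, the 0.05 threshold, and the ranking rule are all hypothetical. The ranking method for this project is still in progress.

```python
# Sketch: shared temperature-associated sites between Archaea and Bacteria.
# Keys are column indices in the same Pfam domain alignment; values are
# (ideally multiple-testing-corrected) p-values.
def shared_sites(archaea_p: dict[int, float], bacteria_p: dict[int, float],
                 alpha: float = 0.05) -> tuple[list[int], float]:
    sig_a = {site for site, p in archaea_p.items() if p <= alpha}
    sig_b = {site for site, p in bacteria_p.items() if p <= alpha}
    shared = sig_a & sig_b
    union = sig_a | sig_b
    jaccard = len(shared) / len(union) if union else 0.0
    # Rank by the weaker of the two p-values (ties broken by site index).
    ranked = sorted(shared, key=lambda s: (max(archaea_p[s], bacteria_p[s]), s))
    return ranked, jaccard


archaea = {1: 0.001, 2: 0.04, 3: 0.2, 4: 0.01}
bacteria = {1: 0.03, 2: 0.002, 4: 0.001, 5: 0.01}
print(shared_sites(archaea, bacteria))  # ([4, 1, 2], 0.75)
```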
References
[1] Dijk, Erik van, Arlo Hoogeveen, and Sanne Abeln. 2015. “The Hydrophobic Temperature Dependence of Amino Acids Directly Calculated from Protein Structures.” PLoS Computational Biology 11 (5): e1004277.
[2] Saelensminde, Gisle, Øyvind Halskau Jr, Ronny Helland, Nils-Peder Willassen, and Inge Jonassen. 2007. “Structure-Dependent Relationships between Growth Temperature of Prokaryotes and the Amino Acid Frequency in Their Proteins.” Extremophiles: Life under Extreme Conditions 11 (4): 585–96.
[3] Wang, Guang-Zhong, and Martin J. Lercher. 2010. “Amino Acid Composition in Endothermic Vertebrates Is Biased in the Same Direction as in Thermophilic Prokaryotes.” BMC Evolutionary Biology 10 (August): 263.
[4] Zeldovich, Konstantin B., Igor N. Berezovsky, and Eugene I. Shakhnovich. 2007. “Protein and DNA Sequence Determinants of Thermophilic Adaptation.” PLoS Computational Biology 3 (1): e5.
[5] Zhang, Guangya, and Baishan Fang. 2006. “Application of Amino Acid Distribution along the Sequence for Discriminating Mesophilic and Thermophilic Proteins.” Process Biochemistry 41 (8): 1792–98.
[6] Pfam database. https://pfam.xfam.org/