Tools that infer an organism’s optimal growth temperature (OGT) from sequence data have potential biological and economic implications. Predicting growth temperature can improve our understanding of how individual proteins and whole organisms evolve and adapt to their environment, and can provide insight into how the proteome affects organism fitness. Ideally, protein structure could be used to understand how thermodynamic stability across the whole proteome affects an organism’s fitness. However, determining protein structure is computationally and experimentally difficult, so it is useful to predict stability from protein and sequence characteristics instead. As sequence data become more readily available and computational power increases, statistical and computational methods have been developed to identify protein-specific features that affect thermostability. These include linear regression and Bayesian approaches as well as machine learning models such as random forests and neural networks (Jensen et al., 2012; Sauer & Wang, 2018).
Many models developed to predict OGT use protein or whole-proteome features as predictors. Because our goal is to identify protein features related to OGT, predicting OGT from those same features would be circular, so we need an independent method to predict OGT for new organisms. For this reason, we are developing a model that predicts OGT using only tRNA features. tRNAs are protein-independent, temperature-sensitive genomic elements shared across all domains of life. Many mutations in a yeast tRNA increase temperature sensitivity and cause partial or complete loss of tRNA function, suggesting that tRNAs are also highly adapted to the optimal temperature of their organism (Payea et al., 2018). Our initial ‘tRNA thermometer’ used a random forest model with tRNA features such as GC content and minimum free energy of folding to predict OGT. A second iteration of the model focuses on prokaryotes and uses a convolutional neural network that extracts features automatically from tRNA sequences. These models will be used to predict OGT for further investigations into protein thermal stability.
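The random forest features named above are per-tRNA quantities that have to be summarized per genome before a model can use them, while the CNN consumes the raw sequence instead. The Python sketch below illustrates both kinds of input under stated assumptions: the summary statistics (`mean_gc`, `max_gc`, `mean_length`), the function names, and the fixed CNN input length of 100 nt are hypothetical choices, not the features or settings of the tRNA thermometer itself. Minimum free energy of folding is left out because computing it requires an RNA folding package such as ViennaRNA.

```python
from statistics import mean

RNA_BASES = "ACGU"


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def genome_trna_features(trnas: list[str]) -> dict[str, float]:
    """Summarize one genome's tRNA genes as a fixed-length feature vector.

    The three summary statistics are hypothetical examples; minimum free
    energy of folding is omitted because it needs an RNA folding package.
    """
    if not trnas:
        raise ValueError("genome has no tRNA sequences")
    gcs = [gc_content(t) for t in trnas]
    return {
        "mean_gc": mean(gcs),
        "max_gc": max(gcs),
        "mean_length": mean(len(t) for t in trnas),
    }


def one_hot(seq: str, length: int = 100) -> list[list[int]]:
    """One-hot encode a tRNA sequence as CNN input (hypothetical length of 100 nt).

    DNA 'T' is read as 'U'; sequences are truncated or zero-padded to `length`,
    and ambiguous bases such as 'N' become all-zero rows.
    """
    rna = seq.upper().replace("T", "U")[:length]
    rows = [[1 if base == b else 0 for b in RNA_BASES] for base in rna]
    # Pad with all-zero rows so every tRNA yields a length-by-4 matrix.
    rows.extend([0] * len(RNA_BASES) for _ in range(length - len(rows)))
    return rows


if __name__ == "__main__":
    genome = ["GGCCGGCC", "AAUUGGCC"]  # toy sequences, not real tRNAs
    print(genome_trna_features(genome))
    print(one_hot("ACGTN", length=6))
```

A list of such feature dictionaries, one per genome, could be converted to a matrix and passed to a regressor such as scikit-learn's `RandomForestRegressor`. Each `one_hot` output is a length-by-4 matrix; deep learning frameworks differ on whether the channel axis comes first, so a transpose may be needed before a 1D convolution.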
Figures from a poster presented at the Ecological and Evolutionary Genomics Gordon Research Conference, 2019.
tRNA features predict OGT with r = 0.87 and MAE = 2.335 °C, but predictions are most accurate between 20 and 50 °C.
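The caption's two summary statistics can be sketched as follows. This assumes that r is Pearson's correlation between observed and predicted OGT and that MAE is measured in °C; the source does not state either definition or how the data were split for evaluation. The hypothetical helper `mae_by_range` shows one way to check where predictions are most accurate, by comparing the error for organisms whose observed OGT lies within 20 to 50 °C against the error outside that range.

```python
import math


def pearson_r(y_true: list[float], y_pred: list[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    if sd_t == 0 or sd_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_t * sd_p)


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between observed and predicted OGT (input units)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty equal-length lists")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_by_range(y_true, y_pred, low=20.0, high=50.0):
    """MAE for organisms with observed OGT inside [low, high] versus outside it.

    Returns (inside_mae, outside_mae); a side with no organisms gives NaN.
    """
    inside = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    outside = [(t, p) for t, p in zip(y_true, y_pred) if not low <= t <= high]

    def mae(pairs):
        return sum(abs(t - p) for t, p in pairs) / len(pairs) if pairs else math.nan

    return mae(inside), mae(outside)
```

Splitting by observed rather than predicted OGT is a design choice: it asks how well organisms from each temperature range are predicted, which matches the caption's wording.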
Proteome amino acid composition differs between mesophilic and thermophilic organisms.
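One way to quantify the proteome amino acid content compared in this figure is to pool residue counts across all predicted proteins of a genome and convert them to frequencies. The sketch below assumes pooled composition over the 20 standard amino acids; the poster does not say whether composition was computed this way or per protein, and `aa_composition` and `composition_difference` are hypothetical names.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(proteins: list[str]) -> dict[str, float]:
    """Pooled frequency of each standard amino acid across a proteome.

    Non-standard symbols (stop '*', 'X', 'U', etc.) are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in STANDARD_AA}


def composition_difference(thermo: dict[str, float], meso: dict[str, float]) -> dict[str, float]:
    """Per-residue frequency difference (thermophile minus mesophile)."""
    return {aa: thermo[aa] - meso[aa] for aa in STANDARD_AA}
```

Positive values in the difference flag residues enriched in the thermophile; a real comparison would average over many genomes per group rather than contrast a single pair.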
Jensen, D. B., Vesth, T. C., Hallin, P. F., Pedersen, A. G., & Ussery, D. W. (2012). Bayesian prediction of bacterial growth temperature range based on genome sequences. BMC Genomics, 13(Suppl 7), S3. https://doi.org/10.1186/1471-2164-13-S7-S3
Payea, M. J., Sloma, M. F., Kon, Y., Young, D. L., Guy, M. P., Zhang, X., … Phizicky, E. M. (2018). Widespread temperature sensitivity and tRNA decay due to mutations in a yeast tRNA. RNA, 24(3), 410–422. https://doi.org/10.1261/rna.064642.117
Sauer, D., & Wang, D.-N. (2018). Prediction of optimal growth temperature using only genome derived features. bioRxiv, 1–27. https://doi.org/10.1101/273094