TY - JOUR
T1 - Assigning functional linkages to proteins using phylogenetic profiles and continuous phenotypes
AU - Gonzalez, Orland
AU - Zimmer, Ralf
PY - 2008/5
Y1 - 2008/5
AB - Motivation: A class of non-homology-based methods for protein function prediction relies on the assumption that genes linked to a phenotypic trait are preferentially conserved among organisms that share the trait. These methods typically compare pairs of binary strings, where one string encodes the phylogenetic distribution of a trait and the other that of a protein. In this work, we extend the approach to deal automatically with continuous phenotypes. Results: Rather than using a priori rules, which can be highly subjective, to construct binary profiles from continuous phenotypes, we propose systematically exploring thresholds that can meaningfully separate the phenotype values. We illustrate our method by analyzing optimal growth temperatures and demonstrate its usefulness by automatically retrieving genes that have been associated with thermophilic growth. We also apply the general approach, for the first time, to optimal growth pH and make novel predictions. Finally, we show that our method can also be applied to properties that may not classically be considered phenotypes; specifically, we study correlations between genome size and the distribution of genes.
UR - https://www.scopus.com/pages/publications/43349107220
U2 - 10.1093/bioinformatics/btn106
DO - 10.1093/bioinformatics/btn106
M3 - Article
C2 - 18381403
AN - SCOPUS:43349107220
SN - 1367-4803
VL - 24
SP - 1257
EP - 1263
JO - Bioinformatics
JF - Bioinformatics
IS - 10
ER -