WP2. Reliability scores for functional annotations
The objective of this WP is to produce a metric that indicates
the reliability of the transfer of functional annotations, similar
to the scores commonly used to indicate the confidence of the
similarities identified in sequence searches of large databases.
Specific goals are the revision of the work carried out in this area,
the update of the best available approaches, the application of the test
sets prepared in WP1, the inclusion of information derived from
multiple sequence alignments and protein structures, and finally the
delivery of the scores in the proper integrated technical framework.
During this first period we have revisited the literature and the
technical details underlying the previous estimations of annotation
errors (Valencia, Curr. Op. Struc. Biol., 2005, submitted).
As a consequence of that study we have carried out a completely new
implementation using the CE database as the guide for the extraction
of pairwise alignments, and we have incorporated a new procedure to
reduce the redundancy at the level of sequence similarity and
functional classes (in this case codes indicating biochemical
functions, EC numbers).
The current estimate of the discrepancy of the EC numbers between
pairs of proteins at different levels of sequence similarity can be
described as intermediate between our previous work (Devos and Valencia,
Proteins, 2001) and that of Todd et al. (J. Mol. Biol., 2001). This updated
calibration will be the basis of the rest of the work (del Pozo and
Valencia, 2005, in preparation).
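The kind of calibration described above can be illustrated with a minimal sketch, which is not the implementation used in this WP. It assumes that each protein pair is available as a tuple of percent sequence identity and two EC numbers; the function names, the 10% bin width and the example pairs are hypothetical.

```python
from collections import defaultdict

def ec_agreement(ec_a, ec_b, level=4):
    """Return True if two EC numbers share their first `level` fields.

    Incomplete codes (containing '-') within the compared fields
    are treated as disagreeing.
    """
    a = ec_a.split(".")[:level]
    b = ec_b.split(".")[:level]
    return len(a) == level and a == b and "-" not in a

def discrepancy_by_identity(pairs, bin_width=10, level=4):
    """Fraction of pairs with differing EC numbers per identity bin.

    pairs: iterable of (percent_identity, ec_a, ec_b).
    Returns {bin_start: fraction of discrepant pairs}.
    """
    totals = defaultdict(int)
    differ = defaultdict(int)
    for identity, ec_a, ec_b in pairs:
        # Put 100% identity into the top bin instead of a bin of its own.
        start = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        totals[start] += 1
        if not ec_agreement(ec_a, ec_b, level):
            differ[start] += 1
    return {s: differ[s] / totals[s] for s in sorted(totals)}

# Made-up pairs, for illustration only.
pairs = [
    (95.0, "1.1.1.1", "1.1.1.1"),
    (92.0, "1.1.1.1", "1.1.1.2"),
    (35.0, "2.7.1.1", "2.7.1.2"),
    (32.0, "2.7.1.1", "3.1.1.1"),
    (38.0, "2.7.1.1", "2.7.1.1"),
]
result = discrepancy_by_identity(pairs)
```

Setting `level=3` compares only the first three EC fields, which gives a less strict definition of functional agreement.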
For the next reporting period we propose to update the calculations
using the same basic dataset of pairs of proteins, and analyzing other
definitions of protein functions complementary to the definition of
protein enzymatic function.
In parallel we will develop the algorithms for extending this
evaluation to full alignments, taking into account not only the
pairwise relations but also the full family structure.
Finally, we will incorporate the method for estimating levels of
errors into a web server able to serve XML annotations for use by
the other partners, and into a DAS server that can be integrated into
other genome annotation pipelines.
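As an illustration of what such an XML annotation might carry, the sketch below serialises one transferred annotation together with its estimated error level. The element and attribute names are hypothetical; they are not the DAS format and not a schema agreed between the partners.

```python
import xml.etree.ElementTree as ET

def annotation_to_xml(protein_id, ec_number, source_id, identity, error_rate):
    """Serialise one transferred annotation as an XML string.

    All element and attribute names are illustrative assumptions.
    """
    root = ET.Element("annotation", {"protein": protein_id})
    function = ET.SubElement(root, "function", {"type": "EC"})
    function.text = ec_number
    # Record the protein the annotation was transferred from.
    ET.SubElement(
        root,
        "transferredFrom",
        {"protein": source_id, "identity": f"{identity:.1f}"},
    )
    # Estimated probability that the transferred function is wrong.
    ET.SubElement(root, "estimatedError").text = f"{error_rate:.2f}"
    return ET.tostring(root, encoding="unicode")

# Made-up identifiers and values, for illustration only.
xml_text = annotation_to_xml("query_1", "2.7.1.1", "template_1", 42.0, 0.25)
```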
Partner 04 is developing a phylogenetic tree for all the sequenced
genomes which includes branch lengths. The branch lengths between
species indicate the average rate of evolution; when analysing
orthologs between different species, normalisation to the species
tree will therefore be an important prerequisite for function transfer
between species.
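One simple way to picture this normalisation is to divide the observed divergence between two orthologs by the distance between their species in the tree. The sketch below assumes this ratio as the normalisation and a toy tree stored as a child-to-parent mapping; both are illustrative assumptions, not the procedure or data of Partner 04.

```python
def path_to_root(node, parent):
    """Return {ancestor: distance from node} for node and all its ancestors."""
    distance = 0.0
    out = {node: 0.0}
    while node in parent:
        node, length = parent[node]
        distance += length
        out[node] = distance
    return out

def species_distance(a, b, parent):
    """Sum of branch lengths on the path between two species."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    # The lowest common ancestor minimises the summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

def normalised_divergence(ortholog_distance, a, b, parent):
    """Ortholog divergence relative to the species-tree distance.

    Values near 1 suggest the pair evolves at the average rate;
    larger values suggest faster than average divergence.
    """
    return ortholog_distance / species_distance(a, b, parent)

# Hypothetical toy tree: child -> (parent, branch length).
parent = {
    "human": ("mammals", 0.1),
    "mouse": ("mammals", 0.3),
    "mammals": ("root", 0.2),
    "yeast": ("root", 1.0),
}
ratio = normalised_divergence(0.8, "human", "mouse", parent)
```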
Deliverables
- Reliability score for function transfer based on pairwise
similarities: Month 6
The work has been carried out, and the draft of the corresponding
publication (del Pozo and Valencia, 2005) will serve as the report.
Arne Elofsson
Last modified: Fri Mar 18 10:36:47 CET 2005