Workpackage 1

Annual reports

-------------------------------------------------

In the course of the GENEFUN project, a unified scoring system for the
prediction of functional associations has been developed and implemented.
To make the heterogeneous data comparable, we devised a single benchmark
and scored all the different data sets against it, regardless of whether
they were predicted or experimentally derived. We also devised a scoring
scheme for the transfer of interactions between species (do expression data
in mice apply to human, and if so, under which circumstances and to what
extent?). Several factors for interaction transfer were considered; for
example, the more distant two species are, the less confident we are in the
function transfer, and the more inparalogs a gene has, the less confident we
are in the transfer. The existing STRING tool was entirely redesigned to
cope with novel data formats and the expected growth in the number of
interaction databases.
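The benchmark-based scoring and the cross-species transfer can be illustrated with a minimal Python sketch. It assumes that the benchmark is a set of protein pairs labelled as true or false associations (for instance, pairs sharing a reference pathway), and that raw scores from any data set are binned and mapped to the fraction of benchmark-positive pairs in each bin, which makes scores from different sources comparable. The transfer penalty is purely hypothetical: it only reproduces the two trends named above (confidence falls with evolutionary distance and with the number of inparalogs). The function names, the exponential decay and the `decay` parameter are illustrative assumptions, not the formulas used in STRING.

```python
import math
from bisect import bisect_right


def calibrate(raw_scores, benchmark_labels, bin_edges):
    """Return, for each score bin, the fraction of benchmark-positive pairs.

    bin_edges must be sorted; bin i holds scores s with
    bin_edges[i-1] <= s < bin_edges[i]. Empty bins map to None.
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for score, is_positive in zip(raw_scores, benchmark_labels):
        b = bisect_right(bin_edges, score)
        totals[b] += 1
        hits[b] += int(is_positive)
    return [h / t if t else None for h, t in zip(hits, totals)]


def confidence(score, bin_edges, table):
    """Look up the calibrated confidence for a new raw score."""
    return table[bisect_right(bin_edges, score)]


def transfer_score(conf, distance, n_inparalogs, decay=1.0):
    """Hypothetical penalty for transferring an association to another species.

    Confidence shrinks exponentially with evolutionary distance and is
    divided evenly between the gene and its inparalogs.
    """
    return conf * math.exp(-decay * distance) / (1 + n_inparalogs)


# Toy benchmark: five scored pairs, one bin edge at 0.5.
edges = [0.5]
table = calibrate([0.1, 0.2, 0.6, 0.7, 0.9], [False, True, True, True, False], edges)
print(table)  # [0.5, 0.6666666666666666]
print(transfer_score(confidence(0.8, edges, table), distance=0.0, n_inparalogs=1))
```

Binning keeps the calibration free of any assumption about the shape of the raw-score distribution; the price is that sparsely populated bins give noisy confidences.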

Furthermore, we devised filters for several types of raw data (e.g. yeast
two-hybrid (Y2H) screens, complex purifications and expression arrays), some
of which led to independent publications. We have bundled the various data
types into a number of distinct channels, and visualisation tools are being
developed for each channel.
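Once each channel carries a benchmark-calibrated confidence, the channels still have to be merged into one score per protein pair. A common way to do this, sketched below under the assumption that the channels provide independent evidence, is to compute the probability that at least one channel is correct, 1 - Π(1 - s_i). The channel names in the example are hypothetical, and the sketch is not claimed to be the exact combination rule of any particular STRING version.

```python
def combine_channels(channel_scores: dict[str, float]) -> float:
    """Combine per-channel confidences, assuming the channels are independent.

    The result is the probability that at least one channel is right:
    1 - product(1 - s) over all channel scores s.
    """
    prob_all_wrong = 1.0
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for channel {name!r} must lie in [0, 1]")
        prob_all_wrong *= 1.0 - score
    return 1.0 - prob_all_wrong


# Hypothetical channel names and scores for one protein pair.
print(combine_channels({"coexpression": 0.5, "experiments": 0.5}))  # 0.75
```

With two channels at 0.5 each, the combined confidence is 0.75: higher than either channel alone, but never above 1.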

Another major development concentrated on improving the maintainability of
the tool and server. So far, we have information on 179 species, covering
more than 730,000 proteins and more than 23 million interactions with
various degrees of confidence. These currently come from 11 different
resources and prediction methods.

The development of the STRING resource has required considerable human
resources, far beyond the man-months allocated by GENEFUN. The result is a
framework that is heavily used by the scientific community.

For a metaserver to be a success, each of the underlying method
implementations has to be highly accurate. The partners in this work
package worked on improving different methods. For example, considerable
progress has been made in text mining (improved precision and recall in
recognising protein names, and coverage of a large number of organisms) as
well as in the extension of genomic context methods.

We have continued to apply the evolving tools to the prediction of
functional features and have successfully combined homology and context
analysis in a number of projects.




D5.1. Our existing web server, STRING, which predicts and integrates
protein interaction data, has been entirely redesigned to cope with the
challenges of this EU project. With version 5.0 (spring 2004), we started
to incorporate experimental data, enabled by a unifying benchmark and
scoring scheme. In February 2005, version 6 was released with a variety of
predicted and experimentally derived interactions. The respective
documents and data are deposited on the STRING web server.


Deliverables


Arne Elofsson