|Bioinformatics (2002) 18 Suppl 1(31):S78-86|
|Northeast Structural Genomics Consortium|
MOTIVATION: The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about the sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method for lexical analysis of SWISS-PROT keywords that assigns sub-cellular localization. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing functional information already automatically available.
RESULTS: The method reached a level of more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
|classification; metabolism; chemistry; methods|
|Pattern Recognition, Automated; Abstracting and Indexing as Topic; Humans; Databases, Protein; Tissue Distribution; Animals; Information Storage and Retrieval; Cellular Structures; Vocabulary, Controlled; Proteins; Algorithms; Natural Language Processing; Sequence Analysis, Protein|
|61 (Last update: 05/27/2017 12:26:07pm)|
|Bioinformatics. 2002;18 Suppl 1:S78-86.|