|Journal of Molecular Biology (2002) 323:811-22|
|Northeast Structural Genomics Consortium|
We have examined conserved protein motifs in the non-coding, intergenic regions ("pseudomotif patterns") and surveyed their occurrence in the fly, worm, yeast and human genomes (chromosomes 21 and 22 only).
To identify these patterns, we masked out annotated genes, pseudogenes and repeat regions from the raw genomic sequence and then compared the remaining sequence, in six-frame translation, against 1319 patterns from the PROSITE database. For each pseudomotif pattern, the absolute number of occurrences is not very informative unless compared against a statistical expectation; consequently, we calculated the expected occurrence of each pattern using a Poisson model and verified this with simulations. Using a p-value cut-off of 0.01, we found 67 pseudomotif patterns over-represented in fly intergenic regions, 34 in worm, 21 in human and six in yeast. These include the zinc finger, leucine zipper, nucleotide-binding motif and EGF domain. Many of the over-represented patterns were common to two or more organisms, but there were a few that were unique to specific ones. Furthermore, we found more over-represented patterns in the fly than in the worm, although the fly has fewer pseudogenes. This puzzling observation can be explained by a higher deletion rate in the fly genome. We also surveyed under-represented patterns, finding 23 in the fly, 12 in the worm, 18 in human and two in yeast. If intergenic sequences were truly random, we would expect an equal number of over- and under-represented patterns. The fact that for each organism the number of over-represented patterns is greater than the number of under-represented ones implies that a fraction of the intergenic regions consists of ancient protein fragments that, due to accumulated disablements, have become unrecognizable by conventional techniques for gene and pseudogene identification. Moreover, we find that in aggregate the over-represented pseudomotif patterns occupy a substantial fraction of the intergenic regions. Further information is available at http://pseudogene.org.
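The comparison of observed against expected occurrences can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`prosite_to_regex`, `count_hits`, `poisson_tail`, `classify`) are hypothetical, the PROSITE conversion handles only the basic pattern syntax (no `<` or `>` anchors), and the expected count is taken as an input rather than derived from a background model. Only the 0.01 cut-off and the use of a Poisson model come from the abstract.

```python
import math
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern, e.g. 'C-x(2,4)-C', to a regex.

    Simplification (assumption): '<' and '>' anchors are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                 # 'x' means any residue
            core = "."
        elif core.startswith("{"):      # '{P}' means any residue except P
            core = "[^" + core[1:-1] + "]"
        if lo is not None:              # repetition count, e.g. (3) or (2,4)
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def count_hits(pattern: str, protein: str) -> int:
    """Count start positions (overlaps allowed) where the pattern matches."""
    regex = prosite_to_regex(pattern)
    return len(re.findall("(?=" + regex + ")", protein))


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def classify(observed: int, expected: float, alpha: float = 0.01) -> str:
    """Label a pattern as over- or under-represented at cut-off alpha."""
    p_over = poisson_tail(observed, expected)               # P(X >= observed)
    p_under = 1.0 - poisson_tail(observed + 1, expected)    # P(X <= observed)
    if p_over < alpha:
        return "over"
    if p_under < alpha:
        return "under"
    return "neither"
```

In use, `count_hits` would be applied to each of the six translated frames of the masked intergenic sequence, and the summed count passed to `classify` together with an expected count obtained from a background model or from simulations.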
|Conserved Sequence Humans Caenorhabditis elegans Pseudogenes Saccharomyces cerevisiae DNA, Intergenic Animals Species Specificity Genome Amino Acid Sequence DNA, Fungal Proteins Drosophila Amino Acid Motifs DNA, Helminth Genes, Insect |
|15 (Last update: 03/25/2017 12:06:33pm)|
|J Mol Biol. 2002 Nov 8;323(5):811-22.|