|Journal of Structural and Functional Genomics (2010) 11:71-80|
|Midwest Center for Structural Genomics|
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining.
One important question is how protein properties derived from the primary sequence correlate with a protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for approximately 720 unique proteins that resulted in X-ray structures. The correlation of the protein's isoelectric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we have identified a set of attributes that correlate with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on them. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor .
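As an illustration of one of the sequence-derived properties named in the abstract, GRAVY is conventionally defined as the sum of per-residue hydropathy values divided by the sequence length. The following is a minimal sketch that assumes the Kyte-Doolittle hydropathy scale; the abstract does not state which scale or software was used, and the function name `gravy` and the example sequence are hypothetical, not taken from PDPredictor.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: this scale is used here for illustration; the abstract
# does not specify the scale behind the reported GRAVY values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy for a protein sequence.

    Hypothetical helper: sums the hydropathy value of every residue
    and divides by the number of residues.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        total = sum(KYTE_DOOLITTLE[residue] for residue in seq)
    except KeyError as exc:
        # Non-standard or ambiguous residue codes (e.g. X, B, Z) are rejected.
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return total / len(seq)


if __name__ == "__main__":
    # Made-up example sequence, for illustration only.
    print(round(gravy("MKTAYIAKQR"), 3))
```

A negative score indicates a sequence that is hydrophilic on average and a positive score a hydrophobic one; a value like this could serve as one input feature among several for a classifier such as the SVM described above.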
|chemistry genetics |
|Amino Acid Sequence Proteins Computational Biology Crystallization X-Rays Data Mining |
|26 (Last update: 02/16/2019 8:44:02pm)|
PDPredictor: Predicting crystallizability from protein sequence