Sagar Nikam edited this page Nov 11, 2013 · 21 revisions

Validation of Time Series Technique for Prediction of Conformational States of Amino Acids

Abstract

  • Aim: To apply the time series concept to protein structure analysis and to the prediction of conformational states of amino acid residues, as defined on the basis of the Ramachandran plot. To the best of my knowledge, there have been no previous attempts to apply this statistical technique to the analysis and prediction of protein structure.

  • Methods: The best time series model was built for each protein structure to forecast the different states of the amino acid residues in that protein. The predicted sequence was then compared with the original sequence to check the forecast results.

  • Results: Proteins with the best AR (Autoregressive) models fall mainly into the all-alpha and alpha+beta classes. Prediction accuracy was higher for conformational states than for amino acid (AA) residues. Further clustering requires ARMA (Autoregressive Moving Average), ARIMA (Autoregressive Integrated Moving Average), and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) modelling over selected data.
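
For reference, the model families named above can be written as follows. This is the standard textbook notation, not something specified on this page. As assumptions for illustration, the index $t$ runs over residue positions along the chain instead of over time, $x_t$ is the numerical value assigned to residue $t$, and $\varepsilon_t$ is a zero-mean noise term.

```latex
% AR(p): the value at position t depends on the p preceding values
x_t = c + \sum_{i=1}^{p} \phi_i \, x_{t-i} + \varepsilon_t

% ARMA(p, q): AR(p) plus a moving average over the q preceding noise terms
x_t = c + \sum_{i=1}^{p} \phi_i \, x_{t-i} + \varepsilon_t
        + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% ARIMA(p, d, q): an ARMA(p, q) model fitted to the d-th difference of x_t,
% where B is the backshift operator
y_t = (1 - B)^d \, x_t, \qquad B \, x_t = x_{t-1}
```

GARCH additionally models the variance of $\varepsilon_t$ as depending on past squared noise terms and past variances.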

Description

  • Protein entries listed in PDBSelect were downloaded from the PDB and curated.
  • The amino acid (AA) residue sequence was used as a time series by assigning each residue a Potential (the normalized probability of occurrence of a single amino acid residue in the allowed conformational regions of the Ramachandran plot).
  • Done:
    • Development of novel clustering methods by evaluating various time series models (AR, ARMA, ARIMA).
    • Forecasting of conformational states of AA and of AA residues.
    • Prediction of missing residues in proteins from the PDB.
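
The encoding and modelling steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The potential values in `HYPOTHETICAL_POTENTIAL` are made-up placeholders, the function names are invented for this example, and the AR fit uses the Yule-Walker equations solved by the Levinson-Durbin recursion, which is one common estimator and not necessarily the one used here.

```python
# Hypothetical potentials for a few residues: placeholder numbers for
# illustration only, NOT values derived from PDBSelect or the Ramachandran plot.
HYPOTHETICAL_POTENTIAL = {
    "A": 0.82, "E": 0.80, "G": 0.35, "K": 0.70,
    "L": 0.78, "P": 0.15, "S": 0.48, "V": 0.61,
}


def sequence_to_series(sequence, potential=HYPOTHETICAL_POTENTIAL):
    """Replace each one-letter residue code by its potential."""
    return [potential[aa] for aa in sequence]


def autocovariance(series, lag):
    """Biased sample autocovariance of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / n


def fit_ar(series, order):
    """Estimate AR(order) coefficients from the Yule-Walker equations,
    solved with the Levinson-Durbin recursion. Returns (mean, coefficients)."""
    r = [autocovariance(series, k) for k in range(order + 1)]
    if r[0] == 0:
        raise ValueError("a constant series has no AR structure to fit")
    phi, error = [], r[0]
    for k in range(1, order + 1):
        # Reflection coefficient: the new highest-order AR coefficient.
        reflection = (r[k] - sum(phi[j] * r[k - 1 - j]
                                 for j in range(k - 1))) / error
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - reflection * phi[k - 2 - j]
               for j in range(k - 1)] + [reflection]
        error *= 1 - reflection ** 2
    return sum(series) / len(series), phi


def forecast_next(series, mean, phi):
    """One-step-ahead forecast of the potential at the next position."""
    return mean + sum(c * (series[-1 - i] - mean) for i, c in enumerate(phi))


# Usage on an arbitrary example sequence (not a real protein).
series = sequence_to_series("AGLPSVEKALGSVKEL")
mean, phi = fit_ar(series, order=2)
print("AR(2) coefficients:", phi)
print("forecast for the next position:", forecast_next(series, mean, phi))
```

In practice the order would be chosen per protein, for example by comparing an information criterion across candidate orders, and a statistics package would normally replace the hand-written estimator.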

Conclusion

  • A new approach has been used for protein structure prediction.
  • The time series technique has been applied to predict conformational states from conformational state potentials instead of from secondary structures.
  • Prediction accuracy using time series is higher for the conformational states of AA than for AA residues.
  • To increase prediction accuracy, a multivariate time series approach may be more useful than a univariate one.
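
This page does not state how prediction accuracy was measured. One simple possibility, assumed here purely for illustration, is the fraction of positions at which the predicted sequence agrees with the original one. The function name and the state labels `a` and `b` are hypothetical.

```python
def positional_accuracy(predicted, original):
    """Fraction of positions at which the prediction matches the original."""
    if len(predicted) != len(original):
        raise ValueError("sequences must have the same length")
    if not original:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, original))
    return matches / len(original)


# Hypothetical example: "a" and "b" stand for two conformational states.
original_states = "aaabbbaa"
predicted_states = "aaabbaaa"
print(positional_accuracy(predicted_states, original_states))
```

The same function applies unchanged to one-letter residue sequences, so the two accuracies can be compared on an equal footing.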

Future Work

  • The autoregressive and moving average orders of time series models can be used as items of genetic information to predict evolutionary relationships between different proteins.
  • The time series concept can be used to predict the conformational states of missing residues in PDB data files.
  • Hierarchical clustering/classification of protein time series can give rise to a new concept of time-dependent clustering (pseudo-clustering) and pseudo-phylogeny.
  • Nucleation residues/sites can be predicted using TS graphs and wavelet analysis.
  • Development of synthetic proteins to combat seasonal diseases and to counter chemical warfare attacks.
  • Time series fluctuations for a specific class of proteins can be used as a “Pattern” for data analysis and pattern-dependent classification of proteins.
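
As a rough sketch of the hierarchical clustering idea above, proteins could be grouped by how similar their fitted model coefficients are. Everything specific here is an assumption made for illustration: the Euclidean distance on zero-padded AR coefficient vectors, the single-linkage merge rule, the function names, and the protein names and coefficient values, which are placeholders and not results.

```python
import math


def coefficient_distance(a, b):
    """Euclidean distance between two AR coefficient vectors.
    The shorter vector is zero-padded so that models of different
    order can be compared (an assumption of this sketch)."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(models, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest, until n_clusters remain."""
    clusters = [[name] for name in models]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(coefficient_distance(models[a], models[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Placeholder models: hypothetical names and coefficients, not fitted values.
models = {
    "protein_1": [0.60, -0.20],
    "protein_2": [0.58, -0.18],
    "protein_3": [-0.40],
    "protein_4": [-0.35, 0.05],
}
print(single_linkage(models, n_clusters=2))
```

Recording the order of the merges, and not only the final groups, would give the tree structure that a pseudo-phylogeny needs.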

Links

License

[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0.html "Apache Licence 2.0") & MIT
