T03 ma proteinfolding #5

Open
wants to merge 28 commits into
base: main
b86cca2
Initial commit
Introvertuoso Jun 5, 2023
a42f2c8
Add practical section
Introvertuoso Jun 5, 2023
cd292aa
Update data README.md
Introvertuoso Jun 5, 2023
88988a2
Add contents of practical (sections)
Introvertuoso Jun 5, 2023
23c5023
Change input sequence
Jun 6, 2023
4e4f871
Update intro
Introvertuoso Jun 6, 2023
5dbbfe7
Add output
Jun 6, 2023
3ea034c
Merge remote-tracking branch 'origin/T03-MA_proteinfolding' into T03-…
Introvertuoso Jun 6, 2023
6367e1f
Fix writing
Introvertuoso Jun 6, 2023
07cc9e1
Update notebook
Introvertuoso Jun 6, 2023
d4341d2
Update notebook
Introvertuoso Jun 7, 2023
6ad268c
Update talktorial
Introvertuoso Jun 14, 2023
b271ab9
Add images
Introvertuoso Jun 14, 2023
afa245d
Update README.md
Introvertuoso Jun 14, 2023
a611dcc
Update talktorial
Introvertuoso Jun 14, 2023
9cf66c6
Update OmegaFold
Introvertuoso Jun 15, 2023
4772bee
Update OmegaFold
Introvertuoso Jun 15, 2023
4618d75
Check grammar
Introvertuoso Jun 16, 2023
a57e2aa
Add new protein
Introvertuoso Jun 16, 2023
99e8188
Add new protein
Introvertuoso Jun 22, 2023
4576eea
Update notebook
Introvertuoso Jun 26, 2023
b003170
Little cleanup
Introvertuoso Jun 27, 2023
46138ca
Update the notebook (practical part)
Introvertuoso Jun 30, 2023
c3eb1b5
Update the notebook (practical part)
Introvertuoso Jun 30, 2023
ac81bc7
Update the notebook (practical part)
Introvertuoso Jun 30, 2023
ba65cfd
Update the notebook
Introvertuoso Jul 1, 2023
5d70e66
Add requirements.txt
Introvertuoso Jul 1, 2023
0500197
Update README.md and T03_proteinfolding.ipynb
Introvertuoso Jul 3, 2023
63 changes: 63 additions & 0 deletions notebooks/T03_proteinfolding/README.md
@@ -0,0 +1,63 @@
# T03 · Protein Folding

Authors:
- Mhd Jawad Al Rahwanji, CADD seminar 2023, Volkamer lab, Saarland University
- Paula Linh Kramer, 2023, Volkamer lab, Saarland University
- Andrea Volkamer, 2023, Volkamer lab, Saarland University


## Aim of this talktorial

In this notebook, we will learn about protein folding and how to predict protein structures using machine learning. Accurate structure prediction is crucial for understanding diseases and accelerating drug development.


### Contents in *Theory*

* Protein Folding
* Proteins
* The Folding Problem
* History
* CASP
* Breakthroughs
* OmegaFold
* Inner Workings and Training
* Performance Evaluation
* More on Orphan Proteins and Antibodies
* Investigating the Geoformer
* Computational Performance
* Alternative Methods
* Quantum Approach
* Diffusion-based Models


### Contents in *Practical*

**Goal: Predict the 3D structure of a protein from a given sequence of amino acids and assess the results**

* Overview
* Setup
* Processing the Sequences
* Analyzing the Predictions
* 6YJ1
* 7FVU
* Analyzing the Secondary Structures
* RMSD
* Prediction Confidences
* Ramachandran Plots
* Summary
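
As an illustration of the RMSD step listed above, here is a minimal sketch of the root-mean-square deviation between paired atom coordinates. It assumes the predicted and reference structures have already been superimposed and that atoms are matched one-to-one. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from the notebook.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], predicted: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumption: both sets are already superimposed and ordered so that
    atom i in `reference` corresponds to atom i in `predicted`.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    # Sum of squared distances over all matched atom pairs.
    squared = sum(
        (a - b) ** 2
        for ref_atom, pred_atom in zip(reference, predicted)
        for a, b in zip(ref_atom, pred_atom)
    )
    return math.sqrt(squared / len(reference))


# Toy example (hypothetical coordinates in Angstrom): every atom is shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
value = rmsd(reference, predicted)
print(f"RMSD: {value:.2f}")
```

In practice the two structures would first be aligned, for example with a structure superposition routine from a library such as Biopython or Biotite, before the deviation is computed.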


### References

* [CASP](https://predictioncenter.org/)
* AlphaFold2: [Jumper *et al.*, <i>Nature</i> (2021), <b>596</b>, 583–589](https://doi.org/10.1038/s41586-021-03819-2)
* RoseTTAFold: [Baek *et al.*, <i>Science</i> (2021), <b>373</b>, 871–876](https://doi.org/10.1126/science.abj8754)
* OmegaFold: [Wu *et al.*, <i>bioRxiv</i> (2022)](https://doi.org/10.1101/2022.07.21.500999)
* [Baker lab](https://www.bakerlab.org/)
* Quantum folding: [Robert *et al.*, <i>npj Quantum Inf.</i> (2021), <b>7</b>, 38](https://doi.org/10.1038/s41534-021-00368-4)
* Protein generation: [Watson *et al.*, <i>bioRxiv</i> (2022)](https://doi.org/10.1101/2022.12.09.519842)
* [OmegaFold on GitHub](https://github.com/HeliXonProtein/OmegaFold)
* [PDB](https://www.rcsb.org/)
* [NGLView documentation](http://nglviewer.org/nglview/release/v0.5.1/)
* [Biopython documentation](https://biopython.org/wiki/Documentation)
* [Biotite documentation](https://www.biotite-python.org/)
1,284 changes: 1,284 additions & 0 deletions notebooks/T03_proteinfolding/T03_proteinfolding.ipynb

Large diffs are not rendered by default.

3,305 changes: 3,305 additions & 0 deletions notebooks/T03_proteinfolding/data/6yj1.pdb

Large diffs are not rendered by default.

3,129 changes: 3,129 additions & 0 deletions notebooks/T03_proteinfolding/data/7fvu.pdb

Large diffs are not rendered by default.

58 changes: 58 additions & 0 deletions notebooks/T03_proteinfolding/data/README.md
@@ -0,0 +1,58 @@
# Information about the input files

For each of the following proteins, the `data` folder contains a `PDB` and a `FASTA` file.
Both were downloaded from the source page linked below each protein.
The `FASTA` file serves as input to the algorithm we will be using.
The `PDB` file holds the experimentally determined crystal structure, which we treat as the ground truth.
It is used to assess the quality of the predicted conformation.
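
To make the role of the `FASTA` input concrete, the following is a minimal sketch of reading such a file with the Python standard library only. The helper `parse_fasta` is hypothetical and is not part of the notebook. The record shown is the 7FVU entry stored in this folder, inlined as a string so that the snippet is self-contained.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; concatenate them.
            records[header] += line
    return records


example = """>7FVU_1|Chain A|Fatty acid-binding protein, adipocyte|Homo sapiens (9606)
GSHMDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKNTEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVMKGVTSTRVYERA
"""

records = parse_fasta(example.splitlines())
for header, sequence in records.items():
    pdb_id = header.split("_")[0]  # the text before the first underscore
    print(pdb_id, len(sequence), sequence[:10])
```

To read a file from disk instead, the same function can be given an open file handle, for example `parse_fasta(open("rcsb_pdb_7FVU.fasta"))`, assuming the working directory is the `data` folder.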

## 6YJ1

### The M23 peptidase domain of the Staphylococcal phage 2638A endolysin

- PDB DOI: https://doi.org/10.2210/pdb6YJ1/pdb

- Classification: VIRAL PROTEIN
- Organism(s): Staphylococcus phage 2638A
- Expression System: Escherichia coli
- Mutation(s): No

- Deposited: 2020-04-02
- Released: 2020-09-09
- Deposition Author(s): Dunne, M., Ernst, P., Sobieraj, A., Pluckthun, A., Loessner, M.J.

#### Experimental Data Snapshot

- Method: X-RAY DIFFRACTION
- Resolution: 2.30 Å
- R-Value Free: 0.310
- R-Value Work: 0.251
- R-Value Observed: 0.254

[Source](https://www.rcsb.org/structure/6yj1)

## 7FVU
### Crystal Structure of human FABP4 in complex with 2-\[2-(benzothiophen-3-yl)-2,3-dihydrobenzothiophene-3-carbonyl]benzoic acid

- PDB DOI: https://doi.org/10.2210/pdb7FVU/pdb

- Classification: LIPID BINDING PROTEIN
- Deposition Group: G_1002264
- Organism(s): Homo sapiens
- Expression System: Escherichia coli BL21(DE3)
- Mutation(s): No

- Deposited: 2023-04-27
- Released: 2023-06-14
- Deposition Author(s): Ehler, A., Benz, J., Obst, U., Rudolph, M.G.
- Funding Organization(s): F. Hoffmann-La Roche LTD

#### Experimental Data Snapshot

- Method: X-RAY DIFFRACTION
- Resolution: 1.24 Å
- R-Value Free: 0.159
- R-Value Work: 0.135
- R-Value Observed: 0.137

[Source](https://www.rcsb.org/structure/7FVU)
2 changes: 2 additions & 0 deletions notebooks/T03_proteinfolding/data/rcsb_pdb_6YJ1.fasta
@@ -0,0 +1,2 @@
>6YJ1_1|Chains A, B|ORF007|Staphylococcus phage 2638A (320836)
MHHHHHHVNSLEMLTAIDYLTKKGWKISSDPRTYDGYPKNYGYRNYHENGINYDEFCGGYHRAFDVYSNETNDVPAVTSGTVIEANDYGNFGGTFVIRDANDNDWIYGHLQRGSMRFVVGDKVNQGDIIGLQGNSNYYDNPMSVHLHLQLRPKDAKKDEKSQVCSGLAMEKYDITNLNAKQDKSKN
2 changes: 2 additions & 0 deletions notebooks/T03_proteinfolding/data/rcsb_pdb_7FVU.fasta
@@ -0,0 +1,2 @@
>7FVU_1|Chain A|Fatty acid-binding protein, adipocyte|Homo sapiens (9606)
GSHMDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKNTEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVMKGVTSTRVYERA
Binary file added notebooks/T03_proteinfolding/images/RMSD.png
Binary file added notebooks/T03_proteinfolding/images/figure1.png
Binary file added notebooks/T03_proteinfolding/images/figure1A.png
Binary file added notebooks/T03_proteinfolding/images/figure1B.png
Binary file added notebooks/T03_proteinfolding/images/figure1C.png
Binary file added notebooks/T03_proteinfolding/images/figure2.png
Binary file added notebooks/T03_proteinfolding/images/figure2A.png
Binary file added notebooks/T03_proteinfolding/images/figure2B.png
Binary file added notebooks/T03_proteinfolding/images/figure3.png
Binary file added notebooks/T03_proteinfolding/images/figure3A.png
Binary file added notebooks/T03_proteinfolding/images/figure3B.png
Binary file added notebooks/T03_proteinfolding/images/image1.png
Binary file added notebooks/T03_proteinfolding/images/image2.png
Binary file added notebooks/T03_proteinfolding/images/image3.gif
Binary file added notebooks/T03_proteinfolding/images/image4.jpeg
Binary file added notebooks/T03_proteinfolding/images/image5.jpg
Binary file added notebooks/T03_proteinfolding/images/image6.png
Binary file added notebooks/T03_proteinfolding/images/rama.gif
Binary file added notebooks/T03_proteinfolding/images/rama.png