Introducing randomness for template selection #27

Open
wants to merge 1 commit into
base: main

@smoe smoe commented Jun 15, 2015

I experimented a bit more with antibody.py in conjunction with the multigraft options and noticed that, over a series of consecutive invocations, the supposedly random template assignment always gave the same result, looking like

...RNING: No template avaliable for H3 after filtering! Using a random template of the same length as the query
H3 template: pdb3d85_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Using a random template of the same length as the query
H3 template: pdb3d85_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Using a random template of the same length as the query
H3 template: pdb3d85_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Using a random template of the same length as the query
H3 template: pdb3d85_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Using a random template of the same length as the que...

and I did not believe that I was seeing the same template every time by chance. After my patch, for the same scFv sequence pair it looks like

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb1igy_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb2eh7_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb1igy_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb4d9r_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb3d85_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb4d9r_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb1f3d_chothia.pdb

WARNING: No template avaliable for H3 after filtering! Seeking templates of the same length 6 as the query.
INFO: Making a random choice from 16 entries found featuring H3 with length 6.
H3 template: pdb2eh7_chothia.pdb
...

so I introduced a random selection from all those templates of the same length and also improved on the wording.

When selecting only by length, the previous version
consistently went for the very first template found. This patch
introduces randomness among all equal reference templates.
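
A minimal sketch of the idea behind the patch, not the actual code in antibody.py: collect every template whose loop has the query's length, then draw one uniformly at random. The function name `pick_template_by_length` and the dictionary layout of `templates` are assumptions made for illustration.

```python
import random

def pick_template_by_length(templates, cdr, query_length, rng=random):
    """Pick one template whose `cdr` loop has `query_length` residues.

    `templates` is a hypothetical mapping: file name -> {CDR name -> loop sequence}.
    Returns None when no template has a loop of the requested length.
    """
    # Sort the names so the candidate list does not depend on dict ordering.
    candidates = sorted(
        name for name, loops in templates.items()
        if len(loops.get(cdr, "")) == query_length
    )
    if not candidates:
        return None
    # Uniform draw among all equally (un)qualified templates.
    return rng.choice(candidates)
```

Passing a `random.Random` instance as `rng` keeps the draw reproducible in tests while the default uses the module-level generator.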
Author

smoe commented Jun 16, 2015

Hi Nick, I selected you as the assignee since your patches are closest in the code to what I changed, and I had the feeling that your lines may not be as independent from this template selection as one might think.

An alternative to the fix presented here would be to simply replace the word "random" with "first" or "arbitrary".


nmarze commented Jun 16, 2015

Steffen,

You're right about the wording; random is not the right term. Really, we were just picking the first of a list. I don't think, however, that a random selection is the desired behavior most of the time. Unless multi-template grafting is explicitly turned on for a given CDR, the template chosen should be the same for each model (this does lead to the unfortunate duplication of the warning, but that's a different issue entirely). For this to be committed, there needs to be an option to turn off random selection (or at least to select the same random template for each model), with random selection enabled automatically only when multi-template grafting is turned on.

In general, if there is no match for a CDR, the model is usually void anyway, so it's not really important how random the selection is. For H3, it's expected there won't be a match, so there may be some benefit to selecting a truly random template; though the H3 remodeling step involves large perturbations, it's probably still at least partially template-dependent.
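
One way the requested switch could look, as a sketch under stated assumptions: `choose_template`, `multigraft`, and `seed` are hypothetical names, not existing options of antibody.py. Without multi-template grafting the first candidate is returned, or the same seeded draw for every model; with it, each call draws afresh.

```python
import random

def choose_template(candidates, multigraft=False, seed=None):
    """Select a template from `candidates` (ordered as currently reported).

    multigraft=False, seed=None : deterministic, always the first entry.
    multigraft=False, seed=int  : one random pick, identical for every model
                                  that is built with the same seed.
    multigraft=True             : a fresh random pick on every call.
    """
    if not candidates:
        raise ValueError("no candidate templates to choose from")
    if multigraft:
        return random.choice(candidates)
    if seed is None:
        return candidates[0]
    # A private generator keeps the draw repeatable and leaves the
    # module-level random state untouched.
    return random.Random(seed).choice(candidates)
```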

Nick

Author

smoe commented Jun 17, 2015

Nick,

thank you tons for your kind reply. Hm. Right. Need to order my thoughts a bit now. Will try some random pick among presumed-equal ideas :-)

  • My particular sequence had several loops not matching. The H3 one was the most accessible for copy'n'paste with the limited screen buffer, so I used it for this report. I had not thought about the extra H3 grafting and the expected non-matching at that point.
  • That the model is expected to be void with a non-matching CDR shocks me to some degree. I was hoping one might say that a random pick may, with some chance, at least find something that does not disturb the remaining topology too much. I understand that you indeed said something like that when you referred to the multi-template grafting.
  • [slightly off-topic] I have a similar problem with the selection of templates when there is sequence similarity. When there are similar sequences both for, say, H1 and H2, today's implementation will just take the respective first one reported. Sometimes, though, the same reference antibody fits both H1 and H2, which may be preferable or should at least be tried once. So, here too, unless some context-sensitivity is introduced, a truly random selection may be preferable to just taking the first reported. The same may hold even across loops of different chains, e.g. when the templates suggested for L1 and H1 have some overlap.
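
The context-sensitivity mentioned in the last point could be sketched as follows: given the ranked match lists per CDR, report the reference antibodies that show up for more than one loop. This is only an illustration; `shared_templates` and the layout of `matches` are hypothetical and not part of antibody.py.

```python
from collections import Counter

def shared_templates(matches):
    """Return templates suggested for more than one CDR.

    `matches` is a hypothetical mapping: CDR name -> list of template ids,
    best match first. The result is ordered by how many CDRs a template
    covers (most first), with ties broken alphabetically.
    """
    # Count each template once per CDR, even if it is listed twice there.
    counts = Counter(tmpl for tmpls in matches.values() for tmpl in set(tmpls))
    shared = [tmpl for tmpl, n in counts.items() if n > 1]
    return sorted(shared, key=lambda tmpl: (-counts[tmpl], tmpl))
```

A caller could then try the top shared template for all the loops it covers before falling back to the per-loop first match.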

What is the consequence? Shall I introduce an option to decide whether to return the first template (to be deterministic) or a random one (to have a fair chance of including the best template out there)? And have that random flavour auto-selected when multi-template grafting is selected? Do you think there is someone out there (or a need for someone) we should consult?

Best,

Steffen


nmarze commented Jun 17, 2015

Steffen,

The reason a non-matching CDR usually leads to an invalid model is that there is no explicit remodeling step for non-H3 CDRs. The small minimizations used aren't large enough to change the canonical conformation of a CDR. A random pick would give the correct canonical conformation a small percentage of the time, but if there is no match for a non-H3 CDR, the CDR usually adopts a conformation not seen in the antibody database at all. We recently did a blind prediction of some antibodies and ran into some of these non-matching CDRs; our paper (DOI: 10.1002/prot.24534) has a more in-depth explanation of the issues.

As to the selection of identical matches, we do just take the first from the list, but the list is sorted by crystal resolution, so we always select the best-resolved of the best matches. There are definitely cases where taking a different match would give a better structural fit. I agree that a random selection among good templates would be preferable in some circumstances, particularly in conjunction with multi-template grafting, but I think this too should be an option. For a quick, single-template run, it's probably still preferable to select the highest-resolution of the best matches consistently.
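
The two behaviours described above can be put side by side in a small sketch. The tuple layout `(template, score, resolution)` and the name `pick_among_best` are assumptions for illustration, with a higher score meaning a better match and a lower resolution value (in Å) meaning a better-resolved structure.

```python
import random

def pick_among_best(matches, randomize=False, rng=random):
    """Choose among the top-scoring matches.

    `matches` is a hypothetical list of (template, score, resolution) tuples.
    By default the best-resolved of the best-scoring matches is returned;
    with randomize=True one of the best-scoring matches is drawn at random.
    """
    best_score = max(score for _, score, _ in matches)
    # Keep only the ties at the top score, best resolution first.
    best = sorted((m for m in matches if m[1] == best_score), key=lambda m: m[2])
    chosen = rng.choice(best) if randomize else best[0]
    return chosen[0]
```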

You may want to ask the opinions of Brian Weitzner & Daisuke Kuroda as well; they may have some insight I don't.

Best,
Nick

Member

jjgray commented Jun 18, 2015

Nick: I agree with Steffen that we'd like to provide as realistic a model
as possible, even if we know a particular CDR may be a poor structural
match. At least we'd like to not clash and to minimally disturb the rest
of the paratope. If there's no better model, people will still want a
model to play with. The message should not be that the model is invalid
but that it has serious uncertainty/weakness in particular regions.

I also find the word 'random' unclear: there must be criteria for the list
the random loop is picked from.

Jeffrey J. Gray
Professor of Chemical & Biomolecular Engineering
Director of Graduate Admissions
Johns Hopkins University
208 Maryland Hall
Baltimore, Maryland 21218
http://graylab.jhu.edu
(410) 516-5313
