Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgmt: delete candidate scratch buffer #14544

Merged
merged 1 commit into from
Nov 8, 2023

Conversation

idryzhov
Copy link
Contributor

@idryzhov idryzhov commented Oct 6, 2023

The code doesn't work at all. It tries to use libyang operation metadata in a regular (not diff) data tree, and regular data trees don't provide this data. Also, for destroy operations, it searches for nodes in the running config, which may not have the deleted nodes if we're not using implicit commits.

@pushpasis @choppsv1 I'm adding a do not merge label in case I am missing something. Please, take a look before I remove the label.

The code doesn't work at all. It tries to use libyang operation
metadata in a regular (not diff) data tree, and regular data trees
don't provide this data. Also, for destroy operations, it searches
for nodes in the running config, which may not have the deleted nodes
if we're not using implicit commits.

Signed-off-by: Igor Ryzhov <[email protected]>
Copy link
Contributor

@pushpasis pushpasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryzhov Please don't delete the scratch buffer. It has been added to fasten the diff computation. Let's say you already have 10k routes in running DB, and you just added one more static route. The single new static route will be added to the scratch buffer as well as the candidate datastore. But diff between the candidate and running datastore will take a huge time since it already has 10k items. So to avoid the big time the diff causes we use the scratch buffer, that is cleared after every commit-apply. Please don't remove the scratch buffer, it will increase the commit time when there's already huge config set present on the device.

@pushpasis
Copy link
Contributor

Perhaps the correct fix for the current problem is to search the config being deleted in the scratch buffer first (and remove it from there if found)

@pushpasis
Copy link
Contributor

Actually, can you explain which scenarios are not working? That will help me understand your point better.

@idryzhov
Copy link
Contributor Author

idryzhov commented Oct 7, 2023

I understand the idea behind the scratch buffer, the problem is that it does not work in ANY scenarios. Every call of nb_update_candidate_changes ends in either if (!root) check or case 'n' operation, hence, the scratch buffer is always empty. Deleting this code won't make things slower as it never works anyway.

To be honest, while I like the idea of eliminating the additional diff of the whole config tree, I don't like the implementation at all, even if it was working. The code is basically trying to mimic nb_config_diff behavior without actually doing the diff. It looks really fragile. If we want to fix the scratch-buffer instead of deleting it, we should reimplement it by using the same strategy nb_config_diff uses, but doing the diff over necessary subtrees instead of the whole config tree to make things faster.

@idryzhov idryzhov added the mgmt FRR Management Infra label Oct 9, 2023
@pushpasis
Copy link
Contributor

I understand the idea behind the scratch buffer, the problem is that it does not work in ANY scenarios. Every call of nb_update_candidate_changes ends in either if (!root) check or case 'n' operation, hence, the scratch buffer is always empty. Deleting this code won't make things slower as it never works anyway.

To be honest, while I like the idea of eliminating the additional diff of the whole config tree, I don't like the implementation at all, even if it was working. The code is basically trying to mimic nb_config_diff behavior without actually doing the diff. It looks really fragile. If we want to fix the scratch-buffer instead of deleting it, we should reimplement it by using the same strategy nb_config_diff uses, but doing the diff over necessary subtrees instead of the whole config tree to make things faster.

I still did not get why is it not working? You din't tell me yet the exact scenarios you think this is not working. Just deleting the scratch buffer because you think this is not done correctly is not okay. Give a better solution instead. Deleting the scratch buffer is NOT a better solution. I don't understand this thing. Tomorrow if I think I don't understand any other code elsewhere, shall I go ahead and try to remove it? I will expect myself to come up with a better solution than the current one not losing base functionality over it.

@idryzhov
Copy link
Contributor Author

Once again, Pushpasis, this code does not work completely. It's not just wrong in some cases, it ALWAYS DOES NOTHING. The scratch buffer is ALWAYS EMPTY. I'm sorry for using caps, but I think I explained what happens in my previous comment and you keep asking the same questions instead of just testing the code yourself.

I also told you already that it does nothing in ANY scenarios of changing the candidate configuration, but if you want to have specific scenarios, then here they are:

  1. mgmt set-config /frr-interface:lib/interface[name='test']/description test-descr – we're not adding anything to the scratch buffer because nodes in candidate data tree do not have operation metadata and switch (op) in nb_update_candidate_changes always ends up in case 'n': /* none */ and does nothing.
  2. mgmt delete-config /frr-interface:lib/interface[name='test'] – there are two different cases here:
    • If the running data tree has interface test configuration, we're not adding anything to the scratch buffer because of the same reason as in previous scenario.
    • If the running data tree doesn't have interface test configuration, we're not adding anything to the scratch buffer because yang_dnode_get(running_config->dnode, xpath); returns NULL and the function exits on if (!root) check.

Any other command in the CLI is just a combination of two commands mentioned above.

@pushpasis
Copy link
Contributor

Once again, Pushpasis, this code does not work completely. It's not just wrong in some cases, it ALWAYS DOES NOTHING. The scratch buffer is ALWAYS EMPTY. I'm sorry for using caps, but I think I explained what happens in my previous comment and you keep asking the same questions instead of just testing the code yourself.

I also told you already that it does nothing in ANY scenarios of changing the candidate configuration, but if you want to have specific scenarios, then here they are:

  1. mgmt set-config /frr-interface:lib/interface[name='test']/description test-descr – we're not adding anything to the scratch buffer because nodes in candidate data tree do not have operation metadata and switch (op) in nb_update_candidate_changes always ends up in case 'n': /* none */ and does nothing.

  2. mgmt delete-config /frr-interface:lib/interface[name='test'] – there are two different cases here:

    • If the running data tree has interface test configuration, we're not adding anything to the scratch buffer because of the same reason as in previous scenario.
    • If the running data tree doesn't have interface test configuration, we're not adding anything to the scratch buffer because yang_dnode_get(running_config->dnode, xpath); returns NULL and the function exits on if (!root) check.

Any other command in the CLI is just a combination of two commands mentioned above.

Hi @idryzhov, thanks a lot for the detailed explanation. I tried looking into git blame for the code. Looks like it stopped working when we tried to merge our private code into dev/mgmtd. And we never realised its not working after the rebase. II would prefer fixing the scratch buffer non-usage bug instead of removing the scratch buffer all together. If you prefer me fixing it I can raise a PR with the appropriate fix next week.

@idryzhov
Copy link
Contributor Author

Okay, let's wait for the fix next week. If it's not possible to implement it correctly soon, I'd prefer to merge this PR to get rid of the code the currently does nothing.

@pushpasis
Copy link
Contributor

Okay, let's wait for the fix next week. If it's not possible to implement it correctly soon, I'd prefer to merge this PR to get rid of the code the currently does nothing.

Agreed. If I am not able to fix it correctly you can merge this.

@pushpasis
Copy link
Contributor

@idryzhov I have raised a PR with proper fix. You can find it here #14594

@idryzhov
Copy link
Contributor Author

@pushpasis as the correct implementation seems to be non-trivial and going to take time, how about we merge this PR and you update your PR with from-scratch implementation? I think it will be even easier to review this way.

@github-actions github-actions bot added the rebase PR needs rebase label Nov 8, 2023
@donaldsharp donaldsharp merged commit 85a80ba into FRRouting:master Nov 8, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master mgmt FRR Management Infra rebase PR needs rebase size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants