Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: Accelerate config deletion in frr-reload.py by bulk execution #14927

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ishizakiyu
Copy link

This PR introduces an optimization for the frr-reload.py script,
which accelerates config deletion by processing them in bulk.

Motivation

Previously, the script executed each deletion command individually using vtysh -c "configure" -c ..., which was time-consuming and inefficient, especially when dealing with a large number of deletions. This approach could result in minutes of delay during a configuration reload.

Changes

  • Deletion commands are now written to a temporary file and processed in batch using vtysh -f.
    This aligns the deletion process with how addition commands are handled and significantly speeds up
    the configuration changes.
  • The script has been updated to capture the line numbers of any failed deletion commands from the standard error output of vtysh -f. Using these line numbers, the script retrieves the corresponding commands from the file.
  • The retrieved failed commands are subsequently reprocessed according to the established deletion workflow,
    which involves reattempting the failed deletion commands, truncating one word from the end of each command
    until it succeeds or cannot be truncated further.

Impact

The bulk execution of deletion commands greatly reduces the time required for applying configuration changes in frr reload. This enhancement is particularly beneficial for environments with frequent configuration updates or large sets of deletions.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also (to better understand the speedup) add some before/after metrics?

tools/frr-reload.py Outdated Show resolved Hide resolved
@ishizakiyu ishizakiyu force-pushed the master branch 2 times, most recently from 33384a9 to c55ff15 Compare December 4, 2023 11:09
@ishizakiyu
Copy link
Author

Could you also (to better understand the speedup) add some before/after metrics?

I have measured the time it takes to stop the advertisement of 1,000 networks.

With the previous method, it takes about 0.1 seconds per command.
As a result, it takes about 100 seconds to withdraw 1,000 networks.

2023-12-04 20:49:04,790  INFO: Loading Config object from file /etc/frr/frr.conf
2023-12-04 20:49:04,990  INFO: Loading Config object from vtysh show running
2023-12-04 20:49:05,228  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.0/32  exit exit"
2023-12-04 20:49:05,326  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.1/32  exit exit"
2023-12-04 20:49:05,424  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.2/32  exit exit"
2023-12-04 20:49:05,522  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.3/32  exit exit"
2023-12-04 20:49:05,620  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.4/32  exit exit"
2023-12-04 20:49:05,719  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.5/32  exit exit"
2023-12-04 20:49:05,818  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.6/32  exit exit"
2023-12-04 20:49:05,917  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.7/32  exit exit"
2023-12-04 20:49:06,017  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.8/32  exit exit"
2023-12-04 20:49:06,116  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.0.9/32  exit exit"
...
2023-12-04 20:50:43,465  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.223/32  exit exit"
2023-12-04 20:50:43,564  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.224/32  exit exit"
2023-12-04 20:50:43,662  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.225/32  exit exit"
2023-12-04 20:50:43,760  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.226/32  exit exit"
2023-12-04 20:50:43,858  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.227/32  exit exit"
2023-12-04 20:50:43,956  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.228/32  exit exit"
2023-12-04 20:50:44,054  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.229/32  exit exit"
2023-12-04 20:50:44,152  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.230/32  exit exit"
2023-12-04 20:50:44,250  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.231/32  exit exit"
2023-12-04 20:50:44,348  INFO: Executed "router bgp 4200000001  address-family ipv4 unicast   no network 10.0.3.232/32  exit exit"
2023-12-04 20:50:44,348  INFO: /var/run/frr/reload-5VKRVQ.txt content

The method in this PR, which writes the commands to a temporary file and executes them all at once,
takes only 0.5 seconds.

2023-12-04 20:59:09,844  INFO: Loading Config object from file /etc/frr/frr.conf
2023-12-04 20:59:10,057  INFO: Loading Config object from vtysh show running
2023-12-04 20:59:10,211  INFO: /var/run/frr/reload-NEA7DT.txt content
['router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.0.1/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.0.2/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.0.3/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.0.4/32\n'
 ' exit\n'
 'exit\n',
 ...
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.3.229/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.3.230/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.3.231/32\n'
 ' exit\n'
 'exit\n',
 'router bgp 4200000001\n'
 ' address-family ipv4 unicast\n'
 '  no network 10.0.3.232/32\n'
 ' exit\n'
 'exit\n']
2023-12-04 20:59:10,778  INFO: /var/run/frr/reload-A8JS5A.txt content

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, IMO very nice addition for frr-reload. I'm just not sure if we need an additional argument for frr-reload.py to control this batch behavior. What if we add --batch or so?

@ishizakiyu
Copy link
Author

ishizakiyu commented Dec 6, 2023

Considering that config addition is already processed in batch, an option like --batch might lead to confusion.
The reload itself abstracts the differential update using vtysh, so I don't think it's good to have the internal processing method change with an additional option. And introducing such an option would mean that we would always have to keep in mind two update methods when modifying this script or the FRR itself in the future, which I don't think is desirable.

Commands that do not succeed without truncation will be processed according to the established way, so this change will not affect the actual commands executed, nor will it alter the number of attempts. And many commands that do not require truncation will be rapidly executed through batch processing.

The only minor change is that since we execute deletion commands in batch from a file and then retry the failed ones, the order of success for deletion commands may change where there are commands that cannot succeed without truncation.
But I believe this will not affect the current reload script, as it doesn't specify the execution order of deletion commands.
(although the order between deletion and addition is considered).

So introducing --batch seems to offer little benefits to users or developers to me.

@ton31337 ton31337 requested a review from chiragshah6 December 6, 2023 07:06
@ishizakiyu
Copy link
Author

Hi @chiragshah6,

Could you please take a look when you have a moment?
I'd appreciate your feedback on whether there are any concerns that we should address before merging.

Thanks for your help!

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Previously, in config deletion, each command was executed individually
using vtysh -c "configure" -c ..., leading to significant delays,
especially with a large number of deletions. This process could take
minutes during a reload.

To resolve this, deletion commands are now written to a file and
processed in batch using vtysh -f, just as addition commands are.
This change significantly improves the speed of configuration changes.

Additionally, to handle cases where deletion commands fail, the script
now captures the line numbers of failed commands from the standard
error output of vtysh -f. Based on these numbers, it retrieves the
corresponding commands from the file. The retrieved failed commands are
subsequently reprocessed according to the established deletion
workflow, which involves reattempting the failed deletion commands,
truncating one word from the end of each command until it succeeds or
cannot be truncated further.

Signed-off-by: Yu Ishizaki <[email protected]>
@ishizakiyu
Copy link
Author

@ton31337 @chiragshah6

Are we good to get this change in? If there are any concerns, please let me know.
Your feedback would be greatly appreciated!

Copy link

This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this PR closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants