Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize when validation with layering node_when objects #2347

Open
n-toshikazu opened this issue Feb 10, 2025 · 7 comments
Open

optimize when validation with layering node_when objects #2347

n-toshikazu opened this issue Feb 10, 2025 · 7 comments
Labels
is:enhancement Request for adding new feature or enahncing functionality.

Comments

@n-toshikazu
Copy link

I optimize when validation in libyang that reduce about 50% execution time.
If patch(0001-validation-OPTIMIZE-construct-layering-node_when.patch) can be acceptable,
please merge to libyang source tree.

backgroud

Our sysrepo datastore is tremendously grown up, therefore,
sr_apply_change() on edit-config reply is simultaneously slower.

We calibrated each execution in sr_apply_change() and as results,
sr_modinfo_validate() hold about 95% execution time.
=> Our calibration data owns 456917 node_when and 90123 node_type, others are 0
as validation target nodes.

calibration of lyd_validate_module()

[Calibration result with libyang-3.7.8]

patches to use for calibration

o 0001-libyang-validation-performance-check.patch
calibrating time and accounting about when validation
Note: Need libnetconf2 cmake option -DENABLE_EXAMPLES:String=Off to avoid compile error
o 0001-sysrepo-validation-from-sr_apply_changes-calibration.patch
reporting time and accounting in libyang

reported log

sr_lyd_validate_module() total=361108 msec

  • when validation takes 69836 msec
    o total when=456917
    o topmost when=456917
    o when rm=456917
    o validation eval=456917
    == result(disable=424534/enable=32383/autodel=0)
    == autodel type=0
  • incompletely validated terminal 90123 values 291271 msec

Current when validations are executed for all when nodes without autodel
because they are always started from the lowest descendants to the highest ancestor.

I improve this inefficient behavior by splitting node_when entries into layering node_when
objects(ly_set). Validation always starts from topmost when nodes and If when is disable,
its descendants can be dropped without any search or update costs.

[Calibration result with libyang-3.7.8 + patch]

patches to use for calibration

o 0001-validation-OPTIMIZE-construct-layering-node_when.patch
optimize when validation with layering node_when
o 0002-libyang-validation-performance-check.patch
calibrating time and accounting about when validation
Note: Need libnetconf2 cmake option -DENABLE_EXAMPLES:String=Off to avoid compile error
o 0001-sysrepo-validation-from-sr_apply_changes-calibration.patch
reporting time and accounting in libyang

reported log

sr_lyd_validate_module() total=327047 msec

  • when validation takes 31934 msec
    o total when=456917
    o topmost when=204690
    o when rm=247996
    o inner set(new=218010/free=218010)
    == inner type(container=211350/list=6660)
    o validation eval=247996
    == result(disable=215613/enable=32383/autodel=208921)
    == autodel type=0
  • incompletely validated terminal 90123 values 295112 msec

More than 50% execution time(69836 msec -> 31934 msec) is reduced.
We calibrate each pattern more than 5 times and every score is roughly the same.

Regards.

@michalvasko
Copy link
Member

I have nothing against optimizations but you must not modify struct lyd_node. So please adjust your patch accordingly or provide a smaller use-case reproducing your use-case and I can try to optimize it myself.

@michalvasko michalvasko added the is:enhancement Request for adding new feature or enahncing functionality. label Feb 14, 2025
@n-toshikazu
Copy link
Author

I have nothing against optimizations but you must not modify struct lyd_node.
So please adjust your patch accordingly or provide a smaller use-case reproducing your use-case and I can try to optimize it myself.

I appreciate your review and suggestion.
Since our product owner won't permit to export configuration data,
I just rework patch according to your advise.
As results, optimization effects can be kept, virtual memory usage can be stabled,
details are below.

  • Take2-0001-validation-OPTIMIZE-construct-layering-node_when.patch
    Change from previous patch.
    1. Instead of changing struct lyd_node, declare new struct lyd_when_chain.
    2. Declare new validation flag LYD_VALIDATE_LAYERING_WHEN because
      lyd_validate_unres_when() cast lyd_node or lyd_when_chain struct now,
      (I do not modify whole of validation entry points, there still be validated with struct lyd_node)
    3. Rewrite new when validation setup parts with new structure and flag.
  • Take2-0002-libyang-validation-performance-check.patch
    Rewrite with new structure accounting.
    Calibrate setup time in addition to validation time because
    patch append allocation or lookups in validation node setup phase.

Report log

sr_lyd_validate_module() total=327167 msec
when validation cost total=321847 msec

  • when validation takes 31186 msec
    o total when=456917(13320)
    o topmost when=204690
    o when rm=247996
    o chain (alloc=456917/free=456917)
    o inner set(new=218010/free=218010)
    == inner type(container=211350/list=6660)
    o validation eval=247996
    == result(disable=215613/enable=32383/autodel=208921)
    == autodel type=0
  • incompletely validated terminal 90123 values 290660 msec

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 148 46.4 3.6 791464 599560 ? Ssl 17:07 17:18 /usr/sbin/netopeer2-server -t 3600 -c MSG

Validation time can be improved like as previous patch without large setup overheads.
libyang-3.7.8
total=367797msec (setup=5283msec/validation=70028msec/others=292485msec)
libyang-3.7.8 + current patch
total=332869msec (setup=5310msec/validation=31587msec/others=295972msec)
libyang-3.7.8 + previous patch
total=330919msec (setup=5280msec/validation=31399msec/others=294240)

I append VSZ/RSS of netopeer2 from last time and I can improve memory usage from previous patch.
libyang-3.7.8
VSZ=790780/RSS=581440
libyang-3.7.8 + current patch (nearly equal to libyang-3.7.8)
VSZ=791464/RSS=599560
ibyang-3.7.8 + previous patch (must not modify struct lyd_node)
VSZ=856316/RSS=616068

And I'm sorry but if there are any improper naming or style in my patch,
please revise, I'm not familiar with them...

Regards.

@michalvasko
Copy link
Member

Okay, thanks for the effort but it is always better to discuss the details of an optimization before implementing it. If I understood your changes correctly, all they effectively do is avoid this inefficient loop.

Your solution seems fine and is quite straightforward, create a "tree" of the nodes with when so you can quickly find all the descendants of an auto-deleted node with false when. However, for further maintenance these changes are not ideal.

When writing that loop, I did not even try to optimize it as I was not sure it was worth it but I suppose for large data trees with lots of when it may make a significant difference. When trying to come up with a simpler solution, I think one could use the fact that the unres when set follows DFS of the data tree. Meaning you should be able to iterate over all the following nodes checking their ancestor is the node you are deleting until you found a node that is not. And you remove from the set all the nodes that are.

What do you think about this? Should be quite a small change and I would think you should get similar results to yours with much less code changes.

@n-toshikazu
Copy link
Author

I appreciate your follow-up.

Unless my understanding about your explanation is wrong,
"DFS of the data tree" is already executed with LYD_TREE_DFS_BEGIN() and LYD_TREE_DFS_END() loops
at lyd_validate_subtree(), node_when objects are already sorted with DFS order.
And "iterate over all the following nodes checking their ancestor" is implemented at lyd_validate_autodel_node_del() too.

Thus, when we reverse lyd_node objects before starting validation loop at lyd_validate_unres_when()
and at lyd_validate_autodel_node_del(), change LYD_TREE_DFS_BEGIN() loop condition more effective,
we may achieve similar results.

[lyd_validate_autodel_node_del()]

    if (node_when && node_when->count) {
        /* remove nested from node_when set */
        LYD_TREE_DFS_BEGIN(del, iter) {
            if ((del != iter) && ly_set_contains(node_when, iter, &idx)) {
                ly_set_rm_index(node_when, idx, NULL);
            }
            LYD_TREE_DFS_END(del, iter);
        }
    }

I have to avoid inefficient searching from the large node_when above.
My current idea is appending "lysc_has_when(iter->schema)" before ly_set_contains(node_when, iter, &idx),
however, I can not understarnd your "until you found a node that is not." meaning...
(Less code changes with reversing and contition changing)

Please tell me what you think.

Regards.

@michalvasko
Copy link
Member

Could you please somehow provide the use-case I can test on? Either by sharing your files privately by sending them to me via email or, better yet, creating a small use-case with renamed nodes, which I can directly add to the tests.

Because some of my assumptions are wrong. If the nodes are sorted in DFS and are traversed from the end, nested nodes should always be resolved before their parents so deleting anything from the set (unresolved descendants) should never be needed.

@n-toshikazu
Copy link
Author

I'm trying to prepare dummy configuration with hearing howto, please wait for a while.

If the nodes are sorted in DFS and are traversed from the end, nested nodes should always be resolved before their parents
so deleting anything from the set (unresolved descendants) should never be needed.

I'd like to confirm about this comment implication.
Current libyang traverses from the end(set[set.count-1]) to start(set[0]) with DFS sorted set(unresolved descendants).
As stated this comment, there is no unresolved descendant to be deleted, this always takes O(n) costs.
However, against this theory, my proposal is that libyang traverses from start to the end (expressed by reversing) and
unless ancestor when is enabled, delete its unresolved descendant(when)s at the same time, this takes O(n) - 'autodel' costs.
So if you are negative to my such approach, I have no another idea any more.(Sorry if I misread)

Regards.

@michalvasko
Copy link
Member

Just please post any small example that actually executes this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:enhancement Request for adding new feature or enahncing functionality.
Projects
None yet
Development

No branches or pull requests

2 participants