-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scripting: backport memory handling changes to 4.6 #28680
Scripting: backport memory handling changes to 4.6 #28680
Conversation
make functions for lua heap allocation suitable for use in all non-ChibiOS HALs
this allows for new heaps to be added at runtime for lua scripting if you run out of memory while armed
we want the lua garbage collector to be used to re-use memory where possible. This implements a suggestion from Thomas to avoid heap expansion unless the last allocation failed
implement in AP_MultiHeap instead
this is a standalone (no-HAL based) implementation of MultiHeap
I'm not sure we need to rush to backport this. We have had various issues with scripting allocation in the past, I would feel more comfortable if it were to be in master for a while. I don't think I would even call this a vulnerability fix. This is really just a "get out of jail free" for not setting your heap size large enough. This is very unlikely that this would hit existing scripts that have been well tested (of course memory usage does increase a little with each binding we add, so it is theoretically possible that a script that was only just running on 4.5 does not run on 4.6 (but we have also saved a bunch of memory recently due to a binding rework)). I think this is a feature that is more a benefit to developers and those writing scripts than users. Even then, I don't recall the last time I actually had to increase from the default heap size (admittedly using H7s almost exclusively). |
Note that this does also fix an issue which would cause an allocation expansion to fail if the first sub-heap is full no matter how large your heap is. I don't know how large that first sub-heap is typically on H7, but it's surely getting smaller. Backporting just that change wouldn't really be possible, all the rework is necessary to make that not an issue, then adding the auto expansion on top is pretty trivial. I am a bit wary of back-porting something so big and critical as well but there is justification for those with important Lua scripts. |
it doesn't matter how large you set SCR_HEAP_SIZE. It can be set to 400k and fail at 100k due to the way the allocation works between heaps. When scripting calls change_size() on a pointer if the new size is not available within the current heap then the allocation will fail. That current heap can be quite small, and the chances of that is rising as other subsystems take more of the 512k AXI SRAM on the H7. |
@IamPete1 when would you want it to go in? It isn't clear from your comment |
Hi @tpwrules, how do you feel about putting an approved on this to add your vote to merging? |
I think it should be in master for a month or two, maybe aim for 4.6.1. Just seems two soon to go straight into a release with only the testing that was done on the original PR. |
Did we get any reports of this happening in the wild? |
we have no easy way of knowing for sure, as even with the scripting logging we can't really tell what the memory layout is. We have had some cases though where I suspect it is happening, for example on a recent partner onboarding call they reported they could not get the quicktune lua script to run on CubeOrangePlus, so they had (quite independently of the effort in the current PR) worked on porting quicktune to C++ themselves. That one really surprised me as CubeOrangePlus uses the primary H7 memory layout, which keeps the 512k AXI SRAM region away from being chewed up by other things like the main firmware data. |
that violates another one of our principles of trying to avoid major changes in the point releases. We do it sometimes, but we try to avoid it if we can as it violates user expectations of the release process. |
I think if we're eventually going to merge this to 4.6 then the sooner the better in order to get more beta testers using it. Probably the best way to reduce the risk is to have devs pour over it and try and spot issues so I look forward to @tpwrules finishing his review. The second line of defense is the alpha and then beta testing and so maximising the time it spends in beta testing should add safety for our general community. Those testing the beta are knowingly taking on extra risks so they should be ready for any issues and are more likely to report back |
We are still in Beta on 4.6 if I'm not wrong - now is the time to do it. IMO we should put this in for Beta 2, so it becomes part of the 4.6 release. |
@tpwrules, you didn't have a chance to look over this again did you? Sorry, I'm just not personally qualified to review it but as a release manager I would like to get it into 4.6 earlier rather than later. No real pressure though of course, we've all got things to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, tried it on Cube Orange in sim on HW without problems.
Let's get this in and baking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that @tpwrules has approved, I'm OK as well
this is a backport of this PR:
#26517
I think this is important for 4.6 as users are more commonly relying on scripting now, and we have a vulnerability where an allocation in one heap cannot be expanded if that heap is exhausted even if there is plenty of memory in the other heap. The user has no way of knowing how the scripting allocation is spread across heaps, so they have no way of knowing if their script could fail with an out of memory condition in flight