AP_Scripting: reduce Lua state memory usage #27426

Merged · 5 commits · Jul 23, 2024

Conversation

@tpwrules (Contributor) commented Jul 1, 2024

A few re-engineerings to make the state setup more efficient and on-demand. Please see the commits for details.

There are no visible changes to scripts, except for ones that play games by reading _ENV, which none in the repo do. One slight downside is an increased risk of fragmentation and out-of-memory failures at inopportune moments, but I think the savings are worth it.

Tested some scripts including my REPL in SITL and on Cube Orange.

Total savings on Cube Orange:

  • 104 bytes of statically allocated RAM
  • 14K of Lua heap memory (reduced by 2/3)
  • ~800 bytes per running script
  • 216 bytes of flash

tpwrules added 4 commits June 30, 2024 18:58
Saves ~100B of statically allocated RAM.
They are never accessed from globals. Only their metatables are
accessed, using luaL_getmetatable.

Saves ~2.9K of Lua heap.
The __call metamethod was previously set to the metatable itself. With
__call absent, calling the value still fails, which is the same
observable behavior as when __call pointed at the (non-callable)
metatable.
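To illustrate, here is a minimal Python sketch (a toy model of Lua's call dispatch, not ArduPilot code): whether `__call` is absent or points at the non-callable metatable itself, the attempted call ends in the same error.

```python
# Toy model (assumption) of how Lua dispatches a call on a table:
# it looks up __call in the metatable and invokes it if callable.
def lua_call(value, metatable, *args):
    handler = metatable.get("__call")
    if callable(handler):
        return handler(value, *args)
    # __call missing, or set to something non-callable (e.g. the
    # metatable itself): both end in the same "not callable" error.
    raise TypeError("attempt to call a table value")

meta = {}
meta["__call"] = meta  # old setup: __call = metatable (a table, not callable)
try:
    lua_call({}, meta)
except TypeError as e:
    old_error = str(e)

del meta["__call"]     # new setup: __call removed entirely
try:
    lua_call({}, meta)
except TypeError as e:
    new_error = str(e)

print(old_error == new_error)  # True: observable behavior is unchanged
```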

Saves ~2K Lua heap.
The global table is then used as the __index metamethod of each state's
environment table. Avoids the overhead of loading binding objects into
each state. The binding objects are immutable from Lua so sandboxing is
not violated.

Does have the slight downside that a script can no longer know all the
binding names by enumerating _ENV.
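As a rough Python analogy (my own sketch under stated assumptions, not the actual implementation): a per-script environment whose `__index` falls through to a shared global table behaves like a `ChainMap` — reads fall through to the shared bindings, while writes stay in the per-script map.

```python
from collections import ChainMap

# Hypothetical shared binding table (names and values are illustrative).
shared_bindings = {"ahrs": "<ahrs singleton>", "gcs": "<gcs singleton>"}

# Each script gets its own empty environment with the shared table as
# fallback, like setmetatable(env, {__index = global_table}) in Lua.
script_a = ChainMap({}, shared_bindings)
script_b = ChainMap({}, shared_bindings)

print(script_a["ahrs"])        # read falls through to the shared table
print(dict(script_a.maps[0]))  # the per-script map itself stays empty: {}
```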

Saves ~700B of memory per loaded script.
@tpwrules tpwrules requested a review from IamPete1 July 1, 2024 00:28
@IamPete1 (Member) commented Jul 1, 2024

Nice. Looks like this applies the same optimization we did for singletons to globals so we can keep them in flash rather than memory.

I think the only thing we need to check is how fast it is before vs after. I did do a little testing when we moved the singletons here:
#18249 (comment)

In that case it approximately doubled the time taken to call a method. My concern here is that we would pay that cost on every call. It looks like ahrs is at the end of the list, so it will be the worst case. It's not clear to me whether we have to do the search on every call or just the first. There is a DISABLE_INTERRUPTS_FOR_SCRIPT_RUN define that makes the timing more reliable.

What happens if we overload a singleton? e.g. ahrs = "foo"

@tpwrules (Contributor, Author) commented Jul 1, 2024

Like the commit explains, the lookup result is stored in the global table so there should only be an additional cost the first time a particular singleton is accessed.

I will look into timing it to see if I can notice a difference. There might be a small new cost each call since the name has to fail to be found in the script environment before Lua looks in the global table. If this turns out to be unacceptable we can store the result in the environment at the cost of a bit more RAM usage.

If a script overrides a singleton name, the override will succeed but will only affect that script, since each script has its own environment separate from the global table. If the script deletes the override, it can access the singleton properly again.
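The shadowing behavior can be sketched in Python (assumption: a `ChainMap` analogy for the per-script environment, not the real code):

```python
from collections import ChainMap

shared = {"ahrs": "<ahrs singleton>"}   # illustrative shared global table
env = ChainMap({}, shared)              # one script's environment

env["ahrs"] = "foo"                     # override lands in the script's map
assert shared["ahrs"] == "<ahrs singleton>"  # other scripts are unaffected
del env["ahrs"]                         # delete the override...
assert env["ahrs"] == "<ahrs singleton>"     # ...and lookups fall through again
```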

@tpwrules (Contributor, Author) commented Jul 2, 2024

Did some timing with the following script:

```lua
function update()
  local start_t = micros()
  local lt = 0
  for i = 0, 2000 do
    -- The Location userdata creation function is at the bottom of the table
    -- search, so this should be a worst case test for the lookup.
    -- lt = Location   -- Reference
    -- lt = Location() -- Instantiate
  end
  local end_t = micros()

  gcs:send_text(0, "took: " .. tostring(end_t - start_t) .. "us val:"
    .. tostring(lt))

  return update, 1000
end
return update()
```

And got some interesting results (times in microseconds):

| Commit | Reference | Instantiate |
| --- | --- | --- |
| master | 2064 | 33562 |
| Global table | 3112 | 27279 |
| Per-script table | 1472 | 26062 |

Turns out that the extra lookup costs more than I thought. The instantiate case is probably faster with both approaches because the garbage collector has to do less work, so this should be a decent speed improvement overall.

Based on these results I am changing this PR to put the looked up entries into the script environment table instead of the global table. There is going to be a slight downside in that scripts now won't share userdatas, so if two scripts use the same singleton it will be ~32 more bytes than before. But that concern is probably overthinking it.

@tpwrules tpwrules force-pushed the pr/lua-reduce-ram branch from 9847b85 to 4f13c81 on July 2, 2024 03:16
Only create the binding object (singleton metatable/userdata or C
function reference) once the user first references a particular
singleton or userdata creation function. Once created, the object is
stored into the script's environment so it doesn't get recreated on the
next reference and there isn't any further overhead. The userdatas are
no longer shared between scripts which imposes a slight memory penalty
for multiple scripts using the same singleton but this avoids an
additional lookup time cost.

Userdata and ap_objects aren't eligible for this optimization as the C++
code might want a particular metatable at any time.

Saves ~9.3K Lua heap.
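A Python sketch of the caching pattern (assumption: the names and factory are illustrative, not the real binding generator): a `__missing__` hook plays the role of the `__index` metamethod, creating each binding on first reference and storing it in the environment so later references pay nothing extra.

```python
created = []

def make_binding(name):
    # Stand-in for allocating a singleton metatable/userdata or
    # C function reference the first time a script asks for it.
    created.append(name)
    return f"<{name} binding>"

class ScriptEnv(dict):
    def __missing__(self, name):
        value = make_binding(name)  # build on first reference...
        self[name] = value          # ...then cache it in the environment
        return value

env = ScriptEnv()
env["ahrs"]; env["ahrs"]; env["gcs"]
print(created)  # each binding is created exactly once: ['ahrs', 'gcs']
```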
@IamPete1 (Member) left a comment


Everything works as expected in my testing. This saves both runtime and memory. I guess the more userdata items a script uses, the closer it will approach the current code; this simply loads them into memory on demand rather than all of them up front.

We could probably use this method of loading for the userdata-level methods too; we already save all the memory there, but we could trade some of it back for better runtime.

This does make the logged per-run memory usage a little more confusing, because if a script uses a userdata for the first time, the memory cost of loading it will be included; previously that was accounted for in the initial load memory.

This does also mean that a script that may previously have run out of memory immediately when it was loaded is now likely to load and then run out at runtime.

@tridge tridge merged commit b64ed6c into ArduPilot:master Jul 23, 2024
93 checks passed
@tpwrules tpwrules deleted the pr/lua-reduce-ram branch July 23, 2024 00:36