Convert from pybind11 to nanobind #230

Ahajha · 2025-03-07T00:23:04Z

See nanobind: https://github.com/wjakob/nanobind tl;dr Smaller, faster compiled extensions. Nanobind is written by the original author of pybind11, with the hindsight and lessons from pybind11.

Locally (x64 Linux), the compiled extension went from 14.10MB to 12.01MB.

I have some very crude benchmarks in the comments - this is performing about on par with the previous code.

I've run all the tests locally (x64 Linux), everything passes. Ideally #218 is merged so we can test those automatically on other platforms, but the only major differences I would expect are binary sizes.

I make some CMake changes so that the compiled extensions now just depend on the core project, rather than rebuilding all the sources again for the extension.

Ahajha · 2025-03-07T00:55:54Z

Something to note that I'm not sure about - I think pypy works with nanobind, so I might try re-adding it.

Ubospica · 2025-03-07T04:03:08Z

@Ahajha Thanks for the contribution! Nanobind looks great and fast.

Regarding the performance, could you check the output of

pytest tests/python/test_grammar_matcher_ebnf.py::test_fill_next_token_bitmask

before and after this PR? (it could require HF_TOKEN with access to llama 2 and 3.1) As long as there is no major performance degradation, we can migrate to nanobind.

Besides, could you run pre-commit install && pre-commit run to format the modified files? I noticed several files don’t seem to be formatted correctly, and it should be very easy to format them with pre-commit hooks.

Ahajha · 2025-03-07T19:51:08Z

Got some of the HF tests running, ran into a snag due to a few places allowing interchange between bytes and strings. Nanobind specifically doesn't like this (see wjakob/nanobind#137), so I might need to make some slightly deeper API changes to get a clean solution.

Ahajha · 2025-03-08T01:13:42Z

I came up with a solution that I think is reasonable - basically we defer casting until we get into the relevant functions (there was one other that needed fixing). So instead of directly taking in a vector of strings, we just take the raw python list (with some type annotations, these don't do any runtime checks) and then manually cast from either a string or bytes. This solution ended up being fairly simple without requiring any API changes.

Ahajha · 2025-03-09T07:57:41Z

After some investigating and fixes, I have some better numbers, running the following:

/usr/bin/time pytest tests/python/test_grammar_matcher_ebnf.py::test_fill_next_token_bitmask >/dev/null

(I'm only including the interesting line from each run)
Main:

5.44user 0.25system 0:03.61elapsed 157%CPU (0avgtext+0avgdata 722316maxresident)k
5.64user 0.29system 0:03.79elapsed 156%CPU (0avgtext+0avgdata 723012maxresident)k
5.34user 0.27system 0:03.47elapsed 161%CPU (0avgtext+0avgdata 722360maxresident)k
5.39user 0.28system 0:03.51elapsed 161%CPU (0avgtext+0avgdata 722292maxresident)k
5.15user 0.27system 0:03.40elapsed 159%CPU (0avgtext+0avgdata 723596maxresident)k
5.30user 0.24system 0:03.40elapsed 162%CPU (0avgtext+0avgdata 723056maxresident)k

Nanobind branch:

5.37user 0.32system 0:04.05elapsed 140%CPU (0avgtext+0avgdata 721436maxresident)k
5.41user 0.26system 0:03.58elapsed 158%CPU (0avgtext+0avgdata 723240maxresident)k
5.35user 0.28system 0:03.53elapsed 159%CPU (0avgtext+0avgdata 722456maxresident)k
5.40user 0.27system 0:03.53elapsed 160%CPU (0avgtext+0avgdata 722348maxresident)k
5.44user 0.28system 0:03.66elapsed 156%CPU (0avgtext+0avgdata 723440maxresident)k

This is definitely eyeball statistics, but this seems to be on par with what it is now. I expected a speedup, perhaps there is more to investigate, but as you said as long as it isn't an obvious regression it should be okay. I also have some future PR ideas that involve nanobind that should lead to further speedups (some of them could be done with pybind11, but will be easier with nanobind).

Ahajha · 2025-03-09T08:06:04Z

The issue was actually rather insightful, I'll share some insight for you (or anyone else reading this PR).

The first issue was that pybind11 enables LTO by default, nanobind doesn't - the former basically requires it to get reasonable performance, but nanobind is designed and laid out in a better way so that you can still get good performance without it, though they provide the option should we want it. I know this project has -flto=auto on by default, I'm not sure why this sped things up but it did. Perhaps it's a different level of LTO, it's whatever CMake's CMAKE_INTERPROCEDURAL_OPTIMIZATION does.

The second issue was more interesting to me. Nanobind has a section on their CMake API (https://nanobind.readthedocs.io/en/latest/api_cmake.html#command:nanobind_add_module), and one thing they mention there is binary size. Nanobind defaults to optimizing for size, because in the author's experience with bindings, optimizing for speed tends to not make a difference. However they do make a note about mixing optimization levels, specifically, compiling the core project with whatever default optimization level like -O2, and just binding code with -Osize. So I've slightly rearranged some code to make anything that is just interop (with no 'real' logic) into nanobind.cc, now no other files depend on nanobind.

Ahajha · 2025-03-09T08:21:20Z

@Ubospica This should be ready for another look now!

Ahajha added 2 commits March 6, 2025 19:19

Convert to nanobind

0054e7e

Rename pybind things to nanobind

a6262fa

This comment was marked as outdated.

Sign in to view

Ahajha added 2 commits March 7, 2025 15:20

Fix compile_json_schema

50d653b

Merge strings and bytes

7077aa2

This comment was marked as outdated.

Sign in to view

Ahajha and others added 4 commits March 8, 2025 16:32

Merge branch 'mlc-ai:main' into ahajha/nanobind

1c7f3c5

Enable LTO

2520665

Move all nanobind code to nanobind.cc

e09e3b6

CMake cleanup

8a9fb3a

Run precommit

59ac31e

Simplify _debug_accept_string

6ca80e7

Ahajha mentioned this pull request Mar 9, 2025

Avoid extra indirections in Python bindings #233

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert from pybind11 to nanobind #230

Convert from pybind11 to nanobind #230

Ahajha commented Mar 7, 2025 •

edited

Loading

Ahajha commented Mar 7, 2025

Ubospica commented Mar 7, 2025 •

edited

Loading

This comment was marked as outdated.

Ahajha commented Mar 7, 2025

This comment was marked as outdated.

Ahajha commented Mar 8, 2025

This comment was marked as outdated.

Ahajha commented Mar 9, 2025 •

edited

Loading

Ahajha commented Mar 9, 2025 •

edited

Loading

Ahajha commented Mar 9, 2025

Convert from pybind11 to nanobind #230

Are you sure you want to change the base?

Convert from pybind11 to nanobind #230

Conversation

Ahajha commented Mar 7, 2025 • edited Loading

Ahajha commented Mar 7, 2025

Ubospica commented Mar 7, 2025 • edited Loading

This comment was marked as outdated.

Ahajha commented Mar 7, 2025

This comment was marked as outdated.

Ahajha commented Mar 8, 2025

This comment was marked as outdated.

Ahajha commented Mar 9, 2025 • edited Loading

Ahajha commented Mar 9, 2025 • edited Loading

Ahajha commented Mar 9, 2025

Ahajha commented Mar 7, 2025 •

edited

Loading

Ubospica commented Mar 7, 2025 •

edited

Loading

Ahajha commented Mar 9, 2025 •

edited

Loading

Ahajha commented Mar 9, 2025 •

edited

Loading