feat: measure time for different steps in bootstrapping and build-sequence #445

shubhbapna · 2024-09-25T21:55:17Z

fixes #408

I couldn't find a nice way to measure memory usage for child processes; even using cgroups wouldn't be straightforward. Measuring time is probably a higher priority than memory anyway, so we can discuss how to measure memory usage later.

For measuring time, instead of individually wrapping each function call we want to measure, I decided it would be better to create a decorator that we can apply to whichever function we want to measure. This makes it easy to keep the metrics in a single store and print them at the end. Plus, if we want to measure more functions in the future, we simply have to add this decorator.

tiran

The _extract_req_and_version_from_args involves a lot of hackery and black magic. Let's make it easier and more sustainable for us:

Change every function with a timeit() decorator to accept only keyword arguments, e.g. def download_source(*, ctx: context.WorkContext, req: Requirement, version: Version, download_url: str). The lonely * indicates keyword-only for remaining arguments.
Change the internal wrapper to def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, version: Version, **kwargs: typing.Any) -> typing.Any.

Now you have convenient access to the variables without hackery. It makes the timeit function less generic, but that's okay for our purpose. You can also store the timing information on the context instead of a global variable.
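The suggestion above could look roughly like the sketch below. The `WorkContext` dataclass and its `time_store` attribute are stand-ins invented for this example, and plain strings replace `Requirement` and `Version` so the snippet runs without fromager or packaging installed.

```python
import dataclasses
import functools
import time
import typing


@dataclasses.dataclass
class WorkContext:
    """Stand-in for context.WorkContext; it only holds timing data here."""

    # Hypothetical attribute: "req==version" -> {description: seconds}
    time_store: dict[str, dict[str, float]] = dataclasses.field(default_factory=dict)


def timeit(description: str) -> typing.Callable:
    def decorator(func: typing.Callable) -> typing.Callable:
        @functools.wraps(func)
        def wrapper_timeit(
            *, ctx: WorkContext, req: str, version: str, **kwargs: typing.Any
        ) -> typing.Any:
            start = time.perf_counter()
            try:
                return func(ctx=ctx, req=req, version=version, **kwargs)
            finally:
                # Timing lives on the context, not in a global variable.
                key = f"{req}=={version}"
                ctx.time_store.setdefault(key, {})[description] = (
                    time.perf_counter() - start
                )

        return wrapper_timeit

    return decorator


@timeit("download")
def download_source(
    *, ctx: WorkContext, req: str, version: str, download_url: str
) -> str:
    # The lone * above makes every parameter keyword-only.
    return f"{req}-{version}.tar.gz"
```

Calling `download_source(ctx, "egg", "1.0", "https://example.com")` with positional arguments raises `TypeError`, which is the breakage the keyword-only change introduces for existing callers.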

tiran · 2024-09-26T08:21:53Z

src/fromager/metrics.py

+def timeit(description: str):
+    def timeit_decorator(func: typing.Callable):
+        _time_description_store[func.__name__] = description
+
+        @functools.wraps(func)
+        def wrapper_timeit(*args: typing.Any, **kwargs: typing.Any):


You are missing return type annotations.

shubhbapna · 2024-09-26T15:45:40Z

Change every function with a timeit() decorator to accept only keyword arguments, e.g. def download_source(*, ctx: context.WorkContext, req: Requirement, version: Version, download_url: str). The lonely * indicates keyword-only for remaining arguments.

That's a breaking change for a lot of our APIs, and every time we want to measure a new function we will have to change its API to make it compatible with our metrics decorator. Plus, we won't be able to measure functions like prepare_build_environment. Right now the timeit function can measure any function and prints a debug log immediately.

Import API functions no longer accept positional arguments and only support keyword arguments. This makes the code more reliable, readable, and prepares the functions for the `timeit` decorator.

Related: python-wheel-build#408
Related: python-wheel-build#445
Signed-off-by: Christian Heimes <[email protected]>

tiran · 2024-10-02T11:58:53Z

That's a breaking change for a lot of our APIs, and every time we want to measure a new function we will have to change its API to make it compatible with our metrics decorator. Plus, we won't be able to measure functions like prepare_build_environment. Right now the timeit function can measure any function and prints a debug log immediately.

Keyword-only arguments cause some breakage, but it's not as bad as you think. I would argue that they also change the code for the better. I have implemented keyword-only args for several functions in PR #459.

Do we need to measure any function? I don't think so. We want to measure how long the build steps of a requirement take. The API functions all take a ctx and a req; most take a version, too. To handle prepare_build_environment, you can make the version argument optional:

def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, **kwargs: typing.Any) -> typing.Any:
    version = kwargs.get("version", kwargs.get("dist_version"))
    ...

Another approach is to use inspect.signature to create a Signature object for a callable, then bind the signature to *args and **kwargs:

>>> import inspect
>>> from fromager import build_environment
>>> from packaging.requirements import Requirement
>>> import pathlib

>>> sig = inspect.signature(build_environment.prepare_build_environment)
>>> args = ("ctx", Requirement("egg"), pathlib.Path("source"))
>>> kwargs = {}
>>> bound = sig.bind(*args, **kwargs)
>>> bound.apply_defaults()
>>> bound
<BoundArguments (ctx='ctx', req=<Requirement('egg')>, sdist_root_dir=PosixPath('source'))>
>>> bound.arguments
{'ctx': 'ctx', 'req': <Requirement('egg')>, 'sdist_root_dir': PosixPath('source')}
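The same binding could be folded into the decorator itself, so decorated functions keep their positional signatures. This is only a sketch under assumptions: the module-level `timings` dict is hypothetical, plain strings stand in for `Requirement` and `Version`, and the `prepare_build_environment` below is a dummy with the same parameter names as in the session above.

```python
import functools
import inspect
import time
import typing

# Hypothetical store: (description, req, version) -> elapsed seconds.
timings: dict[tuple[str, typing.Optional[str], typing.Optional[str]], float] = {}


def timeit(description: str) -> typing.Callable:
    def decorator(func: typing.Callable) -> typing.Callable:
        sig = inspect.signature(func)  # computed once, at decoration time

        @functools.wraps(func)
        def wrapper(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:
            # bind() raises TypeError for a bad call, just as calling func would.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            req = bound.arguments.get("req")
            version = bound.arguments.get("version")  # None if func has no version
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                key = (description, None if req is None else str(req), version)
                timings[key] = time.perf_counter() - start

        return wrapper

    return decorator


@timeit("prepare build environment")
def prepare_build_environment(ctx: str, req: str, sdist_root_dir: str) -> str:
    # Dummy body; the real function lives in fromager.build_environment.
    return f"{sdist_root_dir}/build-env"
```

The trade-off is that arguments are looked up by parameter name at call time, so a function that names its requirement something other than `req` is silently recorded without one.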

shubhbapna · 2024-10-02T14:38:45Z

That makes sense, I like it. What about the resolver functions? They don't take a version but return one. Should I use the signature-binding approach and then check the function's return value to extract the version?

dhellmann · 2024-10-02T14:46:47Z

How much granularity do we need for timing?

shubhbapna · 2024-10-02T14:57:15Z

How much granularity do we need for timing?

Since we are trying to time some heavyweight functions (almost all of them involve some sort of IO), I think we can get away with not being as fine-grained in measuring them. We might not gain much by optimizing these functions at the microsecond level anyway. So just a blanket measurement for each step might be enough for now?

dhellmann · 2024-10-02T15:15:48Z

How much granularity do we need for timing?

Since we are trying to time some heavyweight functions (almost all of them involve some sort of IO), I think we can get away with not being as fine-grained in measuring them. We might not gain much by optimizing these functions at the microsecond level anyway. So just a blanket measurement for each step might be enough for now?

So we would time how long it takes to resolve, download, and build?

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

shubhbapna · 2024-10-02T15:32:09Z

So we would time how long it takes to resolve, download, and build?

And prepare as well

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

I think we can just think of the plugin as some IO fromager is performing that we can't control much

dhellmann · 2024-10-02T15:52:19Z

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

I think we can just think of the plugin as some IO fromager is performing that we can't control much

That makes sense.

dhellmann

I have one formatting suggestion that I think we should address now.

We may also find we want a machine-readable output file in the future. Let's do that later, though, when we have some experience using the log output so we can decide what format to use.

src/fromager/metrics.py

tiran · 2024-10-09T14:37:08Z

src/fromager/metrics.py

+            if not version:
+                version = _extract_version_from_return(ret)


Which function needs this hack?

Any of the version resolvers, like this one: https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L73
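For illustration, a fallback of this kind might inspect the return value as sketched below. This is a guess at the shape of such a helper, not the PR's `_extract_version_from_return`: it assumes a resolver returns either a version object or a tuple or list containing one, and it uses a small stand-in `Version` class instead of `packaging.version.Version`.

```python
import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class Version:
    """Stand-in for packaging.version.Version."""

    text: str


def extract_version_from_return(ret: typing.Any) -> typing.Optional[Version]:
    """Return the first Version found in ret, or None if there is none."""
    if isinstance(ret, Version):
        return ret
    if isinstance(ret, (tuple, list)):
        for item in ret:
            if isinstance(item, Version):
                return item
    return None
```

A wrapper would call this only when no version was found among the arguments, which matches the `if not version:` guard in the diff above.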

src/fromager/metrics.py

rd4398

LGTM!

shubhbapna · 2024-10-22T20:06:54Z

@tiran are any more changes required for this PR?

Signed-off-by: Shubh Bapna <[email protected]>

shubhbapna requested review from dhellmann and tiran September 25, 2024 21:55

tiran requested changes Sep 26, 2024

shubhbapna force-pushed the time-measurement branch from e1a1061 to fb28299 Compare September 26, 2024 16:14

tiran mentioned this pull request Oct 2, 2024

Use keyword-only args for API functions #459

Merged

shubhbapna force-pushed the time-measurement branch 3 times, most recently from 56f6b77 to ec5d781 Compare October 3, 2024 19:29

shubhbapna requested a review from tiran October 3, 2024 19:29

dhellmann reviewed Oct 4, 2024

shubhbapna force-pushed the time-measurement branch from ec5d781 to 61ff0a4 Compare October 4, 2024 19:05

shubhbapna requested a review from dhellmann October 4, 2024 19:06

dhellmann approved these changes Oct 4, 2024

tiran reviewed Oct 9, 2024

shubhbapna force-pushed the time-measurement branch from 61ff0a4 to 64903b3 Compare October 9, 2024 15:55

shubhbapna requested a review from tiran October 9, 2024 15:55

rd4398 reviewed Oct 15, 2024

rd4398 approved these changes Oct 16, 2024

shubhbapna force-pushed the time-measurement branch from 64903b3 to 24162c8 Compare October 22, 2024 20:06

measure time for different steps in bootstrapping and build-sequence

e6f4c3b

Signed-off-by: Shubh Bapna <[email protected]>

shubhbapna force-pushed the time-measurement branch from 24162c8 to e6f4c3b Compare November 11, 2024 18:45
