-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: measure time for different steps in bootstrapping and build-sequence #445
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _extract_req_and_version_from_args
involves a lot of hackery and black magic. Let's make it easier and more sustainable for us:
- Change every function with a
timeit()
decorator to accept only keyword arguments, e.g.def download_source(*, ctx: context.WorkContext, req: Requirement, version: Version, download_url: str)
. The lonely*
indicates keyword-only for remaining arguments. - Change the internal wrapper to
def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, version: Version, **kwargs: typing.Any) -> typing.Any
.
Now you have convenient access to the variables without hackery. It makes the timeit function less generic, but that's okay for our purpose. You can also store the timing information on the context instead of a global variable.
src/fromager/metrics.py
Outdated
def timeit(description: str): | ||
def timeit_decorator(func: typing.Callable): | ||
_time_description_store[func.__name__] = description | ||
|
||
@functools.wraps(func) | ||
def wrapper_timeit(*args: typing.Any, **kwargs: typing.Any): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing return type annotations.
Thats a breaking change for a lot of our APIs and everytime we want to measure a new function we will have to change its API to ensure that it is compatible with our metrics decorators. Plus we won't be able to measure functions like |
e1a1061
to
fb28299
Compare
Import API functions no longer accept positional arguments and only support keyword arguments. This makes the code more reliable, readable, and prepares the functions for the `timeit` decorator. Related: python-wheel-build#408 Related: python-wheel-build#445 Signed-off-by: Christian Heimes <[email protected]>
Import API functions no longer accept positional arguments and only support keyword arguments. This makes the code more reliable, readable, and prepares the functions for the `timeit` decorator. Related: python-wheel-build#408 Related: python-wheel-build#445 Signed-off-by: Christian Heimes <[email protected]>
Import API functions no longer accept positional arguments and only support keyword arguments. This makes the code more reliable, readable, and prepares the functions for the `timeit` decorator. Related: python-wheel-build#408 Related: python-wheel-build#445 Signed-off-by: Christian Heimes <[email protected]>
keyword-only argument cause some breakage, but it's not as bad as you think. I would argue that they also improve code for the better. I have implemented keyword-only args for several functions in PR #459. Do we need to measure any function? I don't think so. We want to measure how long build steps of a requirement are taking. The API functions all take a def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, **kwargs: typing.Any) -> typing.Any:
version = kwargs.get("version", kwargs.get("dist_version"))
... Another approach is to use >>> import inspect
>>> from fromager import build_environment
>>> from packaging.requirements import Requirement
>>> import pathlib
>>> sig = inspect.signature(build_environment.prepare_build_environment)
>>> args = ("ctx", Requirement("egg"), pathlib.Path("source"))
>>> kwargs = {}
>>> bound = sig.bind(*args, **kwargs)
>>> bound.apply_defaults()
>>> bound
<BoundArguments (ctx='ctx', req=<Requirement('egg')>, sdist_root_dir=PosixPath('source'))>
>>> bound.arguments
{'ctx': 'ctx', 'req': <Requirement('egg')>, 'sdist_root_dir': PosixPath('source')} |
That make sense I like it. What about the resolver functions? They don't take in |
How much granularity do we need for timing? |
Since we are trying to time some heavy weight functions (almost all of them requiring some sort of IO), I think we can get away with not being as fine grained in measuring them. We might not gain much by optimizing these functions at the micro-second level anyways. So just a blanket measurement for each step might be enough for now? |
So we would time how long it takes to resolve, download, and build? Should we separate the time spent in plugins vs. the time spent in the core of fromager? |
And prepare as well
I think we can just think of the plugin as some IO fromager is performing that we can't control much |
That makes sense. |
56f6b77
to
ec5d781
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one formatting suggestion that I think we should address now.
We may also find we want a machine-readable output file in the future. Let's do that later, though, when we have some experience using the log output so we can decide what format to use.
ec5d781
to
61ff0a4
Compare
if not version: | ||
version = _extract_version_from_return(ret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which function needs this hack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any of the version resolvers like this one: https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L73
61ff0a4
to
64903b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
64903b3
to
24162c8
Compare
@tiran are anymore changes required for this PR? |
Signed-off-by: Shubh Bapna <[email protected]>
24162c8
to
e6f4c3b
Compare
fixes #408
I couldn't find a nice way to measure memory usage for child processes. even using cgroups won't be as straightforward. Anyways measuring time is probably a higher priority than memory so we can discuss how to do memory usage later
For measuring time, instead of wrapping each function call we want to measure individually, I decided it might be better to create a decorator which we can call for whichever function we want to measure. This will make it easy to store metrics in a single store and print it at the end as well. Plus, in the future if we want to measure more functions we simply have to add this decorator