Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: measure time for different steps in bootstrapping and build-sequence #445

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shubhbapna
Copy link
Collaborator

fixes #408

I couldn't find a nice way to measure memory usage for child processes. even using cgroups won't be as straightforward. Anyways measuring time is probably a higher priority than memory so we can discuss how to do memory usage later

For measuring time, instead of wrapping each function call we want to measure individually, I decided it might be better to create a decorator which we can call for whichever function we want to measure. This will make it easy to store metrics in a single store and print it at the end as well. Plus, in the future if we want to measure more functions we simply have to add this decorator

Copy link
Collaborator

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _extract_req_and_version_from_args involves a lot of hackery and black magic. Let's make it easier and more sustainable for us:

  1. Change every function with a timeit() decorator to accept only keyword arguments, e.g. def download_source(*, ctx: context.WorkContext, req: Requirement, version: Version, download_url: str). The lonely * indicates keyword-only for remaining arguments.
  2. Change the internal wrapper to def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, version: Version, **kwargs: typing.Any) -> typing.Any.

Now you have convenient access to the variables without hackery. It makes the timeit function less generic, but that's okay for our purpose. You can also store the timing information on the context instead of a global variable.

Comment on lines 15 to 20
def timeit(description: str):
def timeit_decorator(func: typing.Callable):
_time_description_store[func.__name__] = description

@functools.wraps(func)
def wrapper_timeit(*args: typing.Any, **kwargs: typing.Any):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing return type annotations.

@shubhbapna
Copy link
Collaborator Author

shubhbapna commented Sep 26, 2024

  1. Change every function with a timeit() decorator to accept only keyword arguments, e.g. def download_source(*, ctx: context.WorkContext, req: Requirement, version: Version, download_url: str). The lonely * indicates keyword-only for remaining arguments.

Thats a breaking change for a lot of our APIs and everytime we want to measure a new function we will have to change its API to ensure that it is compatible with our metrics decorators. Plus we won't be able to measure functions like prepare_build_environment . Right now timeit functions allows to measure any function and prints a debug log immediately.

tiran added a commit to tiran/fromager that referenced this pull request Oct 2, 2024
Import API functions no longer accept positional arguments and only
support keyword arguments. This makes the code more reliable, readable,
and prepares the functions for the `timeit` decorator.

Related: python-wheel-build#408
Related: python-wheel-build#445
Signed-off-by: Christian Heimes <[email protected]>
tiran added a commit to tiran/fromager that referenced this pull request Oct 2, 2024
Import API functions no longer accept positional arguments and only
support keyword arguments. This makes the code more reliable, readable,
and prepares the functions for the `timeit` decorator.

Related: python-wheel-build#408
Related: python-wheel-build#445
Signed-off-by: Christian Heimes <[email protected]>
tiran added a commit to tiran/fromager that referenced this pull request Oct 2, 2024
Import API functions no longer accept positional arguments and only
support keyword arguments. This makes the code more reliable, readable,
and prepares the functions for the `timeit` decorator.

Related: python-wheel-build#408
Related: python-wheel-build#445
Signed-off-by: Christian Heimes <[email protected]>
@tiran
Copy link
Collaborator

tiran commented Oct 2, 2024

Thats a breaking change for a lot of our APIs and everytime we want to measure a new function we will have to change its API to ensure that it is compatible with our metrics decorators. Plus we won't be able to measure functions like prepare_build_environment . Right now timeit functions allows to measure any function and prints a debug log immediately.

keyword-only argument cause some breakage, but it's not as bad as you think. I would argue that they also improve code for the better. I have implemented keyword-only args for several functions in PR #459.

Do we need to measure any function? I don't think so. We want to measure how long build steps of a requirement are taking. The API functions all take a ctx and a req, most take a version, too. To handle prepare_build_environment, you can make the version argument optional:

def wrapper_timeit(*, ctx: context.WorkContext, req: Requirement, **kwargs: typing.Any) -> typing.Any:
    version = kwargs.get("version", kwargs.get("dist_version"))
    ...

Another approach is to use inspect.signature to create a Signature object for a callable, then bind the signature to *args and **kwargs:

>>> import inspect
>>> from fromager import build_environment
>>> from packaging.requirements import Requirement
>>> import pathlib

>>> sig = inspect.signature(build_environment.prepare_build_environment)
>>> args = ("ctx", Requirement("egg"), pathlib.Path("source"))
>>> kwargs = {}
>>> bound = sig.bind(*args, **kwargs)
>>> bound.apply_defaults()
>>> bound
<BoundArguments (ctx='ctx', req=<Requirement('egg')>, sdist_root_dir=PosixPath('source'))>
>>> bound.arguments
{'ctx': 'ctx', 'req': <Requirement('egg')>, 'sdist_root_dir': PosixPath('source')}

@shubhbapna
Copy link
Collaborator Author

That make sense I like it. What about the resolver functions? They don't take in version but return a version. Should I use the binding the signature approach and then check the return value of the function to extract the version?

@dhellmann
Copy link
Member

How much granularity do we need for timing?

@shubhbapna
Copy link
Collaborator Author

How much granularity do we need for timing?

Since we are trying to time some heavy weight functions (almost all of them requiring some sort of IO), I think we can get away with not being as fine grained in measuring them. We might not gain much by optimizing these functions at the micro-second level anyways. So just a blanket measurement for each step might be enough for now?

@dhellmann
Copy link
Member

How much granularity do we need for timing?

Since we are trying to time some heavy weight functions (almost all of them requiring some sort of IO), I think we can get away with not being as fine grained in measuring them. We might not gain much by optimizing these functions at the micro-second level anyways. So just a blanket measurement for each step might be enough for now?

So we would time how long it takes to resolve, download, and build?

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

@shubhbapna
Copy link
Collaborator Author

So we would time how long it takes to resolve, download, and build?

And prepare as well

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

I think we can just think of the plugin as some IO fromager is performing that we can't control much

@dhellmann
Copy link
Member

Should we separate the time spent in plugins vs. the time spent in the core of fromager?

I think we can just think of the plugin as some IO fromager is performing that we can't control much

That makes sense.

@shubhbapna shubhbapna force-pushed the time-measurement branch 3 times, most recently from 56f6b77 to ec5d781 Compare October 3, 2024 19:29
@shubhbapna shubhbapna requested a review from tiran October 3, 2024 19:29
Copy link
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one formatting suggestion that I think we should address now.

We may also find we want a machine-readable output file in the future. Let's do that later, though, when we have some experience using the log output so we can decide what format to use.

src/fromager/metrics.py Outdated Show resolved Hide resolved
src/fromager/metrics.py Outdated Show resolved Hide resolved
src/fromager/metrics.py Outdated Show resolved Hide resolved
Comment on lines +33 to +35
if not version:
version = _extract_version_from_return(ret)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which function needs this hack?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/fromager/metrics.py Outdated Show resolved Hide resolved
src/fromager/metrics.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rd4398 rd4398 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shubhbapna
Copy link
Collaborator Author

@tiran are anymore changes required for this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add time measurements for all steps in bootstrapping and build-sequence
4 participants