-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update feature/perf from master #5993
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…n-python3 Improve spelling and fix typos in comments of Python scripts
Signed-off-by: Ashwinh <[email protected]> Co-authored-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
- Removed templates debian and debug from Makefile Signed-off-by: Ashwinh <[email protected]>
…rt-tabs-to-spaces CP-49931 - Convert tabs to spaces and fix whitespace for scripts/xe-reset-networking:
…and-warningfixes CP-50100: `backup-sr-metadata,restore-sr-metadata`: `2to3`, fix `pytype`, `pyright`
In `XenAPIPlugin.py`'s `dispatch()` function, SystemExit does not need to be caught and raised because both other exceptions are subclasses of Exception: By design, SystemExit is a subclass of BaseException and because we are not catching BaseException and also not use a bare `except:` here, we can cleanup catching and re-raising `SystemExit()` here. Reference: https://docs.python.org/3/library/exceptions.html#SystemExit Signed-off-by: Bernhard Kaindl <[email protected]>
…perflous-raise-SystemExit
…c directory - Modified python3/Makefile to include this change. - Removed backup-sr-metadata.py from scripts/Makefile Signed-off-by: Ashwinh <[email protected]>
…ec directory - Modified python3 Makefile to include this change - Removed restore-sr-metadata.py from scripts/Makefile - Fixed bare-except exception pylint issue Signed-off-by: Ashwinh <[email protected]>
…sions Original (code supplied) by Bernhard Kaindl: - Declare missing methods to python3/stubs/XenAPI.py for pyright - Initialize variables to fix pyright:reportPossiblyUnboundVariable - Applied isort Signed-off-by: Ashwinh <[email protected]> Signed-off-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
CP-49919: mv scripts/extensions/pool_update.precheck to python3/extensions
mv restore-sr-metadata.py & backup-sr-metadata.py to python3/
- Modified python3/Makefile to include this change - Removed static-vdis from scripts/Makefile - Modified test_static_vdis.py to include new location of the static-vdis Signed-off-by: Ashwinh <[email protected]>
…ing-found-by-mypy
Signed-off-by: Bernhard Kaindl <[email protected]>
These example scripts were imported in 2009 and are obsoleted by samples like: https://github.com/xapi-project/xen-api-sdk/blob/master/python/samples/powercycle.py - The sample xen-api-sdk/python/samples/powercycle.py is much better. - They are not installed and not otherwise mentioned in the repository. Signed-off-by: Bernhard Kaindl <[email protected]>
Signed-off-by: Bernhard Kaindl <[email protected]>
Fix pyright: - Rework unclean exception handling using a contextmanager. - Declare address and master in case of an Exception as well. - Fix warning on unsupported escape sequence using a raw string. - Removed python2 script check from pyproject.toml Signed-off-by: Bernhard Kaindl <[email protected]> Signed-off-by: Ashwinh <[email protected]>
CP-49931: mv `scripts/xe-reset-networking` to `python3/bin`
…l/ocaml_backend/python CP-47869: Remove ocaml/idl/ocaml_backend/python: remove obsolete example scripts: - Could not find any use of installation of them
CP-49900: Removed templates folder from python3/
…rm-py3 Py3: Cleanup `test_mail-alarm.py` to use python3 test helpers and move it
…si-iqn `scripts/generate-iscsi-iqn`: Update the inline Python script for Python3
Instead of fetching a list of enabled tracers from the locked hashtable everytime, look it up directly. Change the 'observer' global boolean to an Atomic.t that stores the currently enabled TracerProvider (if any, a [no_op] tracer that is disabled otherwise). Now 'Tracer.get_tracer' can run without allocating memory, and without having to take any locks. When the tracer gets created/destroyed/enabled/disabled we walk the entire list and find the current tracer, if any, and set the current tracer appropriately. This is O(tracerproviders), but we only support tracerproviders=0, or tracerproviders=1, and this is a rare operation, that happens on startup or the first time the tracer gets enabled. tracing/overhead(on, create span) with workloads: ``` before: 14940.958498 ns/run (confidence: 16363.887949 to 13856.836570) after: 11975.740184 ns/run (confidence: 12710.262010 to 11463.052415) ``` Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
We only need to traverse this once, in a direct order. without workload: before: 25172.588238 (confidence: 38289.717741 to 14519.713302); after: 15774.959559 ns/run (confidence: 23866.336958 to 9640.244697); with workload: before: 14940.958498 ns/run (confidence: 16363.887949 to 13856.836570) after: 11028.540184 ns/run (confidence: 11687.173016 to 10454.879814); Signed-off-by: Edwin Török <[email protected]>
Instead of constructing intermediate seq elements, directly fold the new attribute list into the old set. Signed-off-by: Edwin Török <[email protected]>
Tracing shouldn't use locks, because we use it to find hot spots in the application. And introducing a single global lock might introduce lock contention that wouldn't otherwise be present. Use atomic operations on immutable data structures instead. Although a Map is O(log n), and not O(1) like a hashtable, it doesn't require holding any locks to traverse it, or update it. We only need to do an atomic compare-and-set operation once we've finished updating it, and if we raced with anyone (unlikely on OCaml4, unless you got interrupted by the tick thread), then try again with a backoff. We shouldn't hand roll atomic data structures like this, but instead use Saturn (Skip lists), or Kcas (which has a generic [update] function that implements the above method and also works on OCaml 5). before: 3827.092437 ns/run (confidence: 4275.705106 to 3550.511099); after: 2727.247326 ns/run (confidence: 3019.854167 to 2582.316754); Note: when benchmarking ensure that /sys/devices/system/clocksource/clocksource0/current_clocksource is set to TSC. If set to Xen, then reading timestamps is very slow. Signed-off-by: Edwin Török <[email protected]>
A small improvement, but could be more on deep span trees. before:2727.247326 ns/run (confidence: 3019.854167 to 2582.316754) after: 2590.000000 ns/run(confidence: 3362.115942 to 2068.393072); Signed-off-by: Edwin Török <[email protected]>
And print the failed ID. Signed-off-by: Edwin Török <[email protected]>
The following environment variables can now be used by tests: * QCHECK_LONG_FACTOR (default 10): multiplies the default iteration count of 100 * QCHECK_SEED: an integer seed (the tests print their current seed on startup otherwise, can be used to reproduce a failure) These environment variables are already supported by the QCheck framework, we just need to use them. Signed-off-by: Edwin Török <[email protected]>
Required for merge queues, so we have some tests that run in the merge queue: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions Signed-off-by: Edwin Török <[email protected]>
Required for merge queues, so we have some tests that run in the merge queue: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions
On behalf of Gang Ji <[email protected]>, I, Pau Ruiz Safont <[email protected]>, hereby add my Signed-off-by to this commit: e40fe60 Signed-off-by: Pau Ruiz Safont <[email protected]>
There's a commit on master that makes the DCO check to fail on all master commits. Since the issue seems to be an email mismatch, we can fix this easily.
The following environment variables can now be used by tests: * QCHECK_LONG_FACTOR (default 10): multiplies the default iteration count of 100 * QCHECK_SEED: an integer seed (the tests print their current seed on startup otherwise, can be used to reproduce a failure) These environment variables are already supported by the QCheck framework, we just need to use them.
Last year, an argument to determine the group id of the certificates created was added. This broke test clients, and there's no need no make it compulsory since there's a default value: -1. Modify the binary so it can accept only 3 arguments like before, and change the help output to mention the group id. This makes quite a few internal certificate tests to pass compare the previous suite run 204647 to the new 204840
This allows to maintain sub-second precision, and only convert to datetimes when converting to strings. Now timezone conversion is more explicit. Datetimes without timezone are assumed to be UTC. While rare in practice, this is not safe in general, so print a warning to stdout whenever this happens. Signed-off-by: Pau Ruiz Safont <[email protected]>
Datetimes received as call parameters can lack timezone. In that case the host assumed the datetime is in UTC. This is not always safe to do, but returning an error will break clients. Instead keep doing the assumption and enforce proper documentation of the parameters in the datamodel. Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Previously the dates without timezone were assumed to use UTC when being converted to unix time, but the datatype maintained the lack of timezone when being printed. This means that the problematic timezoneless timestamps were kept and could be produced by the hosts. Since the dates are assumed to be UTC, add the timezone when they are parsed for the first time. Now their timezone is displayed at all times. The output of Localtime is kept without a timezone because clients are not prepared to accept arbitrary timezones in dates. (Both the SDK and XO(Lite)) Signed-off-by: Pau Ruiz Safont <[email protected]>
OCamlformat will correctly format new features only when it's aware the OCaml version it's outputting provides them. This allows to improve formatting for let-punning, punned labelled arguments, and more: https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8 Signed-off-by: Andrii Sultanov <[email protected]>
…ion (#5950) The Date module has a very convoluted way to deal with timezones because of its historic clients. While we can't remove all the issues, we can remove most of them. - Dates (timestamps, really) without timezones do not contain a frame of reference, and hence placing them in the UTC timeline can't be done accurately in the general case. This means that comparing timestamps gives incorrect results. - XMLRPC enforces dates to be encoded in ISO8601format. This means timestamps can lack a timezone. - Xapi uses xmlrpc's dates, adding this unfortunate limitation. - While Date allows these timestamps, it assumes that these are in the UTC timezone. On top of that, it refuses to process any timestamps that are in any timezone other than UTC (`Z`) - The Date module tries really hard to hide the timezone of timestamps that lack it when printing them. This means that timezoneless timestamps can persist in the database, for no good reason, as they are treated as UTC timestamps. - Host.localtime is the only call that returns timezoneless timestamps, all the other calls correctly return timestamps in the UTC timezone. Because the call on purpose does not want to return the current time in UTC, changing this might break clients not ready to process any timezone, mainly SDK-built ones #5802 - Dates are stored as a tuple of date, hour, minutes, seconds. This has very limited precision, which might be unexpected. This PR does the following mitigation / fixes: - Document all calls with Datetimes as parameters that the timestamps will be interpreted as UTC ones if they miss the timezone. - Remove the limitation to process any valid timezone - Timezoneless timstamps are immediately processed as UTC timestamps, as refusing these timestamps can break clients. Issues that the PR does not fix - Host.localtime produces timestamps without timezones, this is needed as adding non-zero timestamps breaks the SDK clients. - The server does not reject timezoneless timestamp, this might break migrations, RPUs, or normal clients, so I've held back on this change. - Printed timestamp do _not_ retain sub-second precision Drafting as tests are ongoing
OCamlformat will correctly format new features only when it's aware the OCaml version it's outputting provides them. This allows to improve formatting for let-punning, punned labelled arguments, and more: https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8
max-spans=1000 is small for a VM.start, now that we have more instrumentation. With the other optimizations in this series of commits it should now be safe to increase the number of spans, we no longer perform O(n^2) operations on them. Signed-off-by: Edwin Török <[email protected]>
clock now uses xapi-log, so it needs to have the dependency declared in opam Signed-off-by: Pau Ruiz Safont <[email protected]>
clock now uses xapi-log, so it needs to have the dependency declared in opam
We notice that required CI checks cancel themselves even when attempting to merge a single PR at a time. That is probably because there are CI jobs run on both 'push' and 'merge_group'. Try to make the concurrency group more unique by adding the github event name to the group key. Signed-off-by: Edwin Török <[email protected]>
We notice that required CI checks cancel themselves even when attempting to merge a single PR at a time. That is probably because there are CI jobs run on both 'push' and 'merge_group'. Try to make the concurrency group more unique by adding the github event name to the group key.
We've noticed some delays when tracing is enabled, and suspected it is due to lock contention or GC activity created by the tracing module itself. I've added some benchmarks to measure the overhead of tracing, and made some improvements (more improvements are possible). The benchmarks test both a situation with no workload, and a situation with 2 other threads: one running the same workload as the function under test, and another that runs a very GC intensive workload. Improved: * when max spans limit is hit: don't spam syslog. Time goes down from 202 *milli*seconds to 14 *micro*seconds per span * reduce lock contention: eliminate one global lock, replace with atomic ops * reduce allocation We are now at ~9µs/nested span on this particular machine, but further improvements are possible by delaying work to the exporter thread and finishing really quickly otherwise (e.g. upstream ocaml-trace claims ~60ns/span). When tracing is disabled the overhead is very small, just like before (on the order of 20ns without a workload). Before: ``` Running benchmarks (no workloads) ╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ tracing/max span overflow │ 0.0000 mjw/run│ 1187.0000 mnw/run│ 96214066.4652 ns/run│ │ tracing/overhead(off) │ 0.0000 mjw/run│ 14.0000 mnw/run│ 19.0061 ns/run│ │ tracing/overhead(on, create span) │ 2.9250 mjw/run│ 496.3888 mnw/run│ 25431.5210 ns/run│ │ tracing/overhead(on, no span) │ 0.0000 mjw/run│ 14.0000 mnw/run│ 18.8063 ns/run│ ╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ tracing/overhead(on, no span) (ns): { monotonic-clock per run = 18.806282 (confidence: 20.597860 to 16.986898); r² = Some 0.63699 } tracing/max span overflow (ns): { monotonic-clock per run = 96214066.465201 (confidence: 125336685.666667 to 70136316.380165); r² = Some 0.518256 } tracing/overhead(off) (ns): { monotonic-clock per run = 19.006133 (confidence: 20.619355 to 17.390492); r² = Some 0.667542 } tracing/overhead(on, create span) (ns): { monotonic-clock per run = 25431.521030 (confidence: 37607.254799 to 15151.227932); r² = Some 0.0329005 } Running benchmarks (workloads) ╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ tracing/max span overflow │ 80934.7363 mjw/run│ 10580588.4725 mnw/run│ 202271848.5714 ns/run│ │ tracing/overhead(off) │ 0.0000 mjw/run│ 16.5479 mnw/run│ 212.0958 ns/run│ │ tracing/overhead(on, create span) │ 0.0000 mjw/run│ 503.4400 mnw/run│ 15633.2400 ns/run│ │ tracing/overhead(on, no span) │ 0.0000 mjw/run│ 18.3256 mnw/run│ 326.9568 ns/run│ ╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ tracing/overhead(on, no span) (ns): { monotonic-clock per run = 326.956811; r² = Some -9.40099 } tracing/max span overflow (ns): { monotonic-clock per run = 202271848.571429; r² = Some 0.065489 } tracing/overhead(off) (ns): { monotonic-clock per run = 212.095829; r² = Some -4.46794 } tracing/overhead(on, create span) (ns): { monotonic-clock per run = 15633.240000; r² = Some 0.805726 } ``` After: ``` Running benchmarks (no workloads) ╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ tracing/max span overflow │ 0.0964 mjw/run│ 379.0196 mnw/run│ 14479.0371 ns/run│ │ tracing/overhead(off) │ 0.0000 mjw/run│ 14.0000 mnw/run│ 18.2174 ns/run│ │ tracing/overhead(on, create span) │ 0.0000 mjw/run│ 397.3440 mnw/run│ 14712.1307 ns/run│ │ tracing/overhead(on, no span) │ 0.0000 mjw/run│ 14.0000 mnw/run│ 18.1771 ns/run│ ╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ tracing/overhead(on, no span) (ns): { monotonic-clock per run = 18.177093 (confidence: 19.990180 to 16.374104); r² = Some 0.600249 } tracing/max span overflow (ns): { monotonic-clock per run = 14479.037101 (confidence: 15625.053708 to 13293.667720); r² = Some 0.635766 } tracing/overhead(off) (ns): { monotonic-clock per run = 18.217390 (confidence: 19.845841 to 16.442527); r² = Some 0.642373 } tracing/overhead(on, create span) (ns): { monotonic-clock per run = 14712.130670 (confidence: 23416.444172 to 7426.891454); r² = Some 0.0476263 } Running benchmarks (workloads) ╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ tracing/max span overflow │ 0.0000 mjw/run│ 380.4429 mnw/run│ 7658.2207 ns/run│ │ tracing/overhead(off) │ 0.0000 mjw/run│ 15.3007 mnw/run│ 102.2479 ns/run│ │ tracing/overhead(on, create span) │ 0.0000 mjw/run│ 400.5094 mnw/run│ 8657.5585 ns/run│ │ tracing/overhead(on, no span) │ 0.0000 mjw/run│ 18.1333 mnw/run│ 373.9794 ns/run│ ╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ tracing/overhead(on, no span) (ns): { monotonic-clock per run = 373.979447 (confidence: 435.400802 to 336.056569); r² = Some -1.84338 } tracing/max span overflow (ns): { monotonic-clock per run = 7658.220695 (confidence: 7952.878804 to 7396.117711); r² = Some 0.950954 } tracing/overhead(off) (ns): { monotonic-clock per run = 102.247932 (confidence: 119.364768 to 89.830417); r² = Some -0.607146 } tracing/overhead(on, create span) (ns): { monotonic-clock per run = 8657.558458 (confidence: 8904.596470 to 8435.216348); r² = Some 0.956299 } ``` Draft PR, this was only unit tested so far.
Continued in #6000 to fix merge conflict, github doesn't allow me to edit the origin repo, just the branch. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Waiting on:
I'm creating a koji build to test all these together.