Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Daemon mode for orion #26

Merged
merged 12 commits into from
May 31, 2024

Conversation

shashank-boyapally
Copy link
Contributor

@shashank-boyapally shashank-boyapally commented Mar 18, 2024

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

Do not MERGE until fmatch 0.0.6 is released

Firstly, please forgive for changing many things in a single PR. Below are the updates following this PR.

  • Run orion in Daemon mode using orion daemon. This is the opinionated version of daemon mode, where the tests are pre-determined with small-scale-cluster-density being default.
>> curl -L -X POST 'http://127.0.0.1:8000/daemon?filter_changepoints=true&version=4.15'
{
    "aws-small-scale-cluster-density-v2": [
        {
            "uuid": "bd5e6cad-6bc2-472d-948a-b1fbb350e80a",
            "timestamp": 1703680619,
            "buildUrl": "https://tinyurl.com/2ce2rhwx",
            "metrics": {
                "apiserverCPU_avg": {
                    "value": 11.1966602622,
                    "percentage_change": 0
                },
                "etcdCPU_avg": {
                    "value": 8.6323461833,
                    "percentage_change": 0
                },
                "podReadyLatency_P99": {
                    "value": 13000,
                    "percentage_change": 0
                },
                "etcdDisk_avg": {
                    "value": 0.0154012333,
                    "percentage_change": 43.3409860968489
                },
                "ovnCPU_avg": {
                    "value": 2.776452632,
                    "percentage_change": 0
                }
            },
            "is_changepoint": true
        },
        {
            "uuid": "12a1750e-a7eb-48ed-bc00-35078d2bbad8",
            "timestamp": 1704890296,
            "buildUrl": "https://tinyurl.com/28pv78qv",
            "metrics": {
                "apiserverCPU_avg": {
                    "value": 10.9378649752,
                    "percentage_change": 0
                },
                "etcdCPU_avg": {
                    "value": 8.8632852428,
                    "percentage_change": 11.533377325057053
                },
                "podReadyLatency_P99": {
                    "value": 13000,
                    "percentage_change": 0
                },
                "etcdDisk_avg": {
                    "value": 0.0154638616,
                    "percentage_change": 0
                },
                "ovnCPU_avg": {
                    "value": 3.1066655456,
                    "percentage_change": 25.58620416479553
                }
            },
            "is_changepoint": true
        }
    ]
}

Get list of options of tests available using

>> curl -L 'http://127.0.0.1:8000/daemon/options'
{
    "options": [
        "small-scale-cluster-density",
        "small-scale-node-density-cni"
    ]
}
  • The old cli mode is now cmd-mode run it using orion cmd
  • Refactored code for better readability.
  • Documentation update pertaining to daemon-mode.
  • One bug noticed for multiple test run cases, where only the first test case was the only output, fixed it and multiple test cases are now printed in the output.
  • pass version as a param supported by templating.

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please describe the System Under Test.
  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

@shashank-boyapally
Copy link
Contributor Author

Hi @paigerube14 @jtaleric just rebased with latest changes.

@@ -10,3 +10,5 @@ PyYAML==6.0.1
six==1.16.0
tzdata==2023.4
urllib3==1.26.18
fastapi==0.110.0
python-multipart==0.0.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried installing this latest requirements and got error below when trying to run:

(venv3) prubenda@prubenda1-mac orion % orion --debug
Traceback (most recent call last):
  File "/Users/prubenda/PycharmProjects/pplRepos/shashank/orion/venv3/bin/orion", line 5, in <module>
    from orion import cli
  File "/Users/prubenda/PycharmProjects/pplRepos/shashank/orion/venv3/lib/python3.10/site-packages/orion.py", line 9, in <module>
    import uvicorn
ModuleNotFoundError: No module named 'uvicorn'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add uvicorn to requirements.txt, also planning on integrating pdm so we don't run into requirement issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use orion cmd-mode or orion daemon-mode, I added these new commands to differentiate between the two.

README.md Outdated
@@ -92,6 +95,45 @@ Activate Orion's regression detection tool for performance-scale CPT runs effort

Additionally, users can specify a custom path for the output CSV file using the ```--output``` flag, providing control over the location where the generated CSV will be stored.

### Daemon mode
The core purpose of Daemon mode is to operate Orion as a self-contained server, dedicated to handling incoming requests. By sending a POST request accompanied by a configuration file, users can trigger change point detection on the provided metadata and metrics. Following the processing, the response is formatted in JSON, providing a structured output for seamless integration and analysis. To trigger daemon mode just use the following commands
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it would be nice to also have an example post url with a config file before showing the output of it. Just reading this its not clear where to make the post

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!! will add a post URL for further clarity.

Returns:
_type_: _description_
"""
logger_instance = SingletonLogger(debug=logging.INFO).logger
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is the issue causing: cloud-bulldozer/py-commons#22 because it is creating a new logger each time a new post is ran in the daemon mode

I would think you would want to try to pass the logger from upper daemon mode setting to here instead of creating a new one every run and in sub functions. IE the daemon mode cli would create 1 single logger and every time orion is called it would use the same one. This would also help with this issue. I might be thinking about it wrong but any thoughts? @jtaleric @shashank-boyapally @vishnuchalla

Copy link
Contributor Author

@shashank-boyapally shashank-boyapally Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @paigerube14 , since we implemented SingletonLogger, it only creates the logger for once, later everywhere we call SingletonLogger(debug=logging.INFO).logger it returns back the already created logger instead of creating a new one.

Also, since the context(i.e. file and line info) the logger prints is different in fmatch and orion, we could see that only logger of fmatch is duplicating and not orion. I think passing the logger to each function was redundant which was the case in our previous implementation. In order to tackle this SingletonLogger class was implemented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, I see. I missed the ".logger" on each of the singletonlogger calls. Thanks for clarifying

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more point to consider here: #26 (comment)

README.md Outdated
*Parameters*

- uuid (optional): The uuid of the run you want to compare with similar runs.
- baseline (optional): The runs you want to compare.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- baseline (optional): The runs you want to compare.
- baseline (optional): The runs you want to compare with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes more sense, will be adding this.

README.md Outdated
@@ -1,5 +1,5 @@
# Orion - CLI tool to find regressions
Orion stands as a powerful command-line tool designed for identifying regressions within perf-scale CPT runs, leveraging metadata provided during the process. The detection mechanism relies on [hunter](https://github.com/datastax-labs/hunter).
Orion stands as a powerful command-line tool/Daemon designed for identifying regressions within perf-scale CPT runs, leveraging metadata provided during the process. The detection mechanism relies on [hunter](https://github.com/datastax-labs/hunter).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Daemon/daemon/*


Below is a sample output structure: the top level of the JSON contains the test name, while within each test, runs are organized into arrays. Each run includes succinct metadata alongside corresponding metrics for comprehensive analysis.
```
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output structure has to align with the remaining open PRs, let get them merged one by one. A reasonable order would be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this would be the cleaner approach.

import sys


class SingletonLogger:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this SingletonLogger to py-commons and re-use it for both fmatch and orion?. Because logger implementation here is thread safe but not here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

"""
level = logging.DEBUG if debug else logging.INFO
logger_instance = SingletonLogger(debug=level).logger
logger_instance.info("🏹 Starting Orion in Daemon mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if 🏹 is appropriate for orion, may be we want to look for an alternative here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead with bow and arrow as it relates with hunter and orion. Would love to have more suggestions.

orion.py Outdated
# pylint: disable=too-many-locals
@click.command()
@cli.command(name="cmd-mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just use cmd and daemon as the subcommands?

pkg/utils.py Outdated
@@ -0,0 +1,328 @@
# pylint: disable=cyclic-import
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this entire file needs a rebase.

pkg/runTest.py Outdated
result, test, output=kwargs["output_format"], matcher=match
)
result_output[testname] = result_data
del match
Copy link
Collaborator

@vishnuchalla vishnuchalla Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this del match automatically done as a part of GC?

pkg/utils.py Outdated
Returns:
_type_: index and uuids
"""
if metadata["benchmark.keyword"] == "k8s-netperf":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one might grow in future. Lets have dict object defined for this one.

Copy link
Collaborator

@vishnuchalla vishnuchalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good except for some concerns. Let take the open PRs in order.

Copy link
Member

@jtaleric jtaleric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  File "/home/jtaleric/code/orion/test/bin/orion", line 33, in <module>
    sys.exit(load_entry_point('orion==1.0', 'console_scripts', 'orion')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/bin/orion", line 25, in importlib_load_entry_point
    return next(matches).load()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/orion.py", line 11, in <module>
    from pkg.runTest import run
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/pkg/runTest.py", line 8, in <module>
    from pkg.utils import run_hunter_analyze, load_config, get_es_url, process_test
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/pkg/utils.py", line 19, in <module>
    import pyshorteners
ModuleNotFoundError: No module named 'pyshorteners'

Is this expected? This is newly introduced with this PR as I am able to install the main branch w/o issue.

@shashank-boyapally
Copy link
Contributor Author

I added pyshorteners to shorten the buildUrl in the display, the requirements.txt file is not updated since I wanted to update it along fmatch 0.0.6 when released. There would be some function changes in fmatch 0.0.6.

@jtaleric
Copy link
Member

@shashank-boyapally can you let us know when this is ready for folks to test?

@shashank-boyapally
Copy link
Contributor Author

Hi @jtaleric, the feature is ready to be tested. I updated the requirements with the latest version of fmatch and pyshorteners. I request kindly to please test and let me know any feedback.

@jtaleric
Copy link
Member

Hi @jtaleric, the feature is ready to be tested. I updated the requirements with the latest version of fmatch and pyshorteners. I request kindly to please test and let me know any feedback.

ack - I just saw conflicts...

README.md Outdated

Example
```
curl -X POST 'http://127.0.0.1:8000/daemon?uuid=4cb3efec-609a-4ac5-985d-4cbbcbb11625' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, I think preferably the endpoints the users hit are more straightforward.

This approach expects the caller to provide the uuid and config. I think this is one option for the daemon, but it leaves the opportunity for error.

I think a more opinionated approach should be how we initially deploy. For example

http://orion.perflab:8000/?payload=true&version=4.16
this endpoint will then use the payload data we have in our data-warehouse, since that will have the most job runs. We could even drop the payload and assume it is set to true by default.

I think the initial endpoint you are suggesting is something else we can consider (with some adjustments) for more custom change-detection -- however I don't think the http endpoint needs to be like the CLI. For example, we shouldn't have the user post the config via http, I think we have candid configs the user provides are daemon launch.

Copy link
Member

@jtaleric jtaleric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug into why this is happening, but with our main branch, I run with the cmd

$ orion --config config.yaml --hunter-analyze

And I get the expected output.

However with this branch, I run the exact same config/test and see

$ orion cmd --config config.yaml --hunter-analyze

  File "/home/jtaleric/code/orion/test/bin/orion", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/orion.py", line 55, in cmd_analysis
    output = run(**kwargs)
             ^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/pkg/runTest.py", line 31, in run
    result = process_test(
             ^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/pkg/utils.py", line 263, in process_test
    uuids = [run["uuid"] for run in runs]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jtaleric/code/orion/test/lib64/python3.11/site-packages/pkg/utils.py", line 263, in <listcomp>
    uuids = [run["uuid"] for run in runs]
             ~~~^^^^^^^^
TypeError: string indices must be integers, not 'str'

@jtaleric
Copy link
Member

It appears that we expect runs = match.get_uuid_by_metadata(metadata) to return a dict, but it returns a list?

2024-03-29 08:46:45,421 - Orion - DEBUG - file: utils.py - line: 263 - ['ff4e1c2c-6960-4081-bc01-5df2c1e72541', '0e190b62-38ac-4d4d-98f7-baf1032c21f8', '91a7a520-ca19-43b9-9d5f-dca8f3df5518', '29dcba68-af3f-4e71-b6a7-2f0ecfae3977', '8f2f5271-5dff-44d4-b126-1f856e6a8387', 'f4fcc9eb-c6ce-4ac3-9979-63c0c5466302', 'b6af9829-ed2d-4773-bef6-26d106e400a5', 'e88b185a-34eb-4447-be11-54829bec9d39', '7873914e-24c1-4657-8317-0f778df1ec14', 'c525cda2-8712-4ad4-93f1-ec4a5169ce0d', 'f50c2163-7f72-4521-8a79-dbdfc5b1182d', '5ab533f6-0712-460a-9a0c-f132c1909e4c', 'f8a237da-9a19-4421-8a68-b6d10dc85f3a', 'f5ae4cde-e1bc-4e1d-b2da-0f358375988d', '9c6b4e2c-950c-46f4-8631-5123f127a082', 'a6e4a657-73c0-4228-ab17-1e1c177f97b8', '17f93785-c04a-4238-8029-ae5d1e3766d5', 'ea77b881-2d0e-4145-ae9d-ae853c08a07c', '668be9c2-7e6c-4134-8519-086085f54e04', '20d0c41a-be34-49b4-b5e0-e127bc49aa0d', 'ade0d220-3cb1-4902-8c5a-456ccb3f09f8', '0fd65106-dd94-4dd7-9ee3-ac062db2909e', '4d63bb03-b137-44be-bbc5-6690b54d7228', 'aaa28324-67af-45c8-b507-73fbb3063091']

@jtaleric
Copy link
Member

I cleaned up my venv and now I get output... however.. New issue.

orion cmd --config config.yaml --hunter-analyze
2024-03-29 08:55:37,186 - Orion - INFO - file: orion.py - line: 54 - 🏹 Starting Orion in command-line mode
2024-03-29 08:55:37,188 - Orion - INFO - file: utils.py - line: 261 - The test aws-416-med-scale-cluster-density-v2 has started
2024-03-29 08:55:37,189 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:55:37,305 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:55:37,352 - Matcher - INFO - Executing query against index=ripsaw-kube-burner*
2024-03-29 08:55:37,398 - Orion - INFO - file: utils.py - line: 123 - Collecting podReadyLatency
2024-03-29 08:55:37,398 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:37,433 - Orion - INFO - file: utils.py - line: 123 - Collecting apiserverCPU
2024-03-29 08:55:37,434 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:38,170 - Orion - INFO - file: utils.py - line: 123 - Collecting ovnCPU
2024-03-29 08:55:38,170 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:39,563 - Orion - INFO - file: utils.py - line: 123 - Collecting etcdCPU
2024-03-29 08:55:39,563 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:55:40,270 - Orion - INFO - file: utils.py - line: 123 - Collecting etcdDisck
2024-03-29 08:55:40,270 - Matcher - INFO - Executing query against index=ripsaw-kube-burner

and with the main branch

orion --config config.yaml --hunter-analyze
2024-03-29 08:57:04,940 - Orion - INFO - The test aws-416-med-scale-cluster-density-v2 has started
2024-03-29 08:57:04,940 - Matcher - INFO - Executing query against index=perf_scale_ci
2024-03-29 08:57:05,080 - Matcher - INFO - Executing query against index=ripsaw-kube-burner*
2024-03-29 08:57:05,118 - Orion - INFO - Collecting podReadyLatency
2024-03-29 08:57:05,119 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:05,158 - Orion - INFO - Collecting apiserverCPU
2024-03-29 08:57:05,159 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:05,840 - Orion - INFO - Collecting ovnCPU
2024-03-29 08:57:05,841 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:07,214 - Orion - INFO - Collecting etcdCPU
2024-03-29 08:57:07,214 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
2024-03-29 08:57:07,953 - Orion - INFO - Collecting etcdDisck
2024-03-29 08:57:07,953 - Matcher - INFO - Executing query against index=ripsaw-kube-burner
time                       uuid                                    P99    apiserverCPU_cpu_avg    ovnCPU_cpu_avg    etcdCPU_cpu_avg    etcdDisck_duration_avg
-------------------------  ------------------------------------  -----  ----------------------  ----------------  -----------------  ------------------------
2024-01-10 14:18:49 +0000  91a7a520-ca19-43b9-9d5f-dca8f3df5518  13000                 28.8317           8.03138            15.8919                 0.0131896
                                                                        ······················                    ·················                            
                                                                                         +8.2%                               +10.5%                            
                                                                        ······················                    ·················                            
2024-02-06 11:05:29 +0000  ff4e1c2c-6960-4081-bc01-5df2c1e72541  13000                 31.0332           8.22205            17.8231                 0.013406
2024-02-06 12:45:08 +0000  20d0c41a-be34-49b4-b5e0-e127bc49aa0d  13000                 31.7655           7.09412            17.3852                 0.0147301
2024-02-06 23:39:44 +0000  e88b185a-34eb-4447-be11-54829bec9d39  13000                 30.832            7.94952            17.5129                 0.0142578
2024-02-07 20:23:43 +0000  7873914e-24c1-4657-8317-0f778df1ec14  13000                 31.6765           6.91436            17.641                  0.0139926
2024-02-08 12:19:27 +0000  c525cda2-8712-4ad4-93f1-ec4a5169ce0d  13000                 31.3236           7.01006            17.8414                 0.0136317
2024-02-09 14:42:53 +0000  f4fcc9eb-c6ce-4ac3-9979-63c0c5466302  13000                 30.7888           6.96967            17.1353                 0.0135467
2024-02-09 14:48:52 +0000  8f2f5271-5dff-44d4-b126-1f856e6a8387  13000                 30.7914           8.18303            17.3592                 0.0132814
2024-02-12 01:22:23 +0000  0fd65106-dd94-4dd7-9ee3-ac062db2909e  13000                 31.42             7.01348            17.8418                 0.0135024

@jtaleric jtaleric self-requested a review March 29, 2024 13:11
@jtaleric
Copy link
Member

I chatted with @shashank-boyapally on Slack. Here is the current thoughts -

  • Initial version of Orion w/ Daemon mode will be opinionated. The user will only provide the version which they want to determine if there is a regression. We will use the openshift-payload job to run Hunter against. The payload jobs have the most data and seem like a good starting point.
  • Follow on version os Orion w/ Daemon mode could consider implementing a way to accept multiple configs, and the user can choose which config they want to determine if there was change detected. One idea is to have a listTests api endpoint to determine which tests are loaded then the user could provide a test name to run Hunter/Algo against, like name=aws-cdv2-fips-120node or whatever the tests are defined as in the configuration file.

@vishnuchalla
Copy link
Collaborator

I chatted with @shashank-boyapally on Slack. Here is the current thoughts -

  • Initial version of Orion w/ Daemon mode will be opinionated. The user will only provide the version which they want to determine if there is a regression. We will use the openshift-payload job to run Hunter against. The payload jobs have the most data and seem like a good starting point.
  • Follow on version os Orion w/ Daemon mode could consider implementing a way to accept multiple configs, and the user can choose which config they want to determine if there was change detected. One idea is to have a listTests api endpoint to determine which tests are loaded then the user could provide a test name to run Hunter/Algo against, like name=aws-cdv2-fips-120node or whatever the tests are defined as in the configuration file.

+1 on these ideas. To add on top of it just so that we don't loose track, Hunter also has this feature of having tests divided into groups and then compare between them for regressions. That would also be a good use case to add later when we get to that point.

@paigerube14
Copy link
Collaborator

@jtaleric @shashank-boyapally these changes sound good to me, one question is is there a way to set a timeframe that if we have a job from say January that shows a regression, should we continue to report that issue with every run or should we set a time period (last 2 weeks) or number of runs back (last 10 runs) to limit re-reporting regressions?

@vishnuchalla
Copy link
Collaborator

vishnuchalla commented Mar 29, 2024

@jtaleric @shashank-boyapally these changes sound good to me, one question is is there a way to set a timeframe that if we have a job from say January that shows a regression, should we continue to report that issue with every run or should we set a time period (last 2 weeks) or number of runs back (last 10 runs) to limit re-reporting regressions?

Hunter also has a parameter where we can specify the timestamp field to look at data regressions since a starting point. Ahh, I forgot to mention earlier, it would be a good addition too.

@shashank-boyapally
Copy link
Contributor Author

Hi Paige, my opinion on this is we should have the previous regression showing in both cmd mode and daemon, the service consuming the api should be able to filter it out based on the timestamp if needed. My take on having the whole regressions based upon the timeline each version of openshift is tested ~ 6 months. One thing we can do is hunter has a timestamp filter which can ignore previous runs before that if we want to have that functionality.

@paigerube14
Copy link
Collaborator

I'm thinking in terms of orion running in a CI where we might only want to know if the latest/current run is showing a regression from previous runs and fail the job if a regression is detected. I think the timestamping from hunter would be helpful with that as both @vishnuchalla and @shashank-boyapally mentioned. Still a ways out from getting this into a CI though. Just a thought. I'll open an issue to track this option

@shashank-boyapally
Copy link
Contributor Author

I added verify certs as an argument so that it can support the new es instances, the PR can be merged once we have fmatch 0.0.7

@vishnuchalla
Copy link
Collaborator

As per our discussion offline, please break down this PR into atomic commits. Thanks

Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Signed-off-by: Shashank Reddy Boyapally <[email protected]>
…ode, rebased, fixed trivial bugs

Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Copy link
Collaborator

@vishnuchalla vishnuchalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks

Signed-off-by: Shashank Reddy Boyapally <[email protected]>
workerNodesCount: 6
infraNodesCount: 3
benchmark.keyword: node-density-cni
ocpVersion: {{ version }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we maybe put in the title template here so that a user doesn't take this file and just try to run as this version is not filled in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense, will add a comment mentioning it a template.

Signed-off-by: Shashank Reddy Boyapally <[email protected]>
@@ -0,0 +1,59 @@
# This is a template file
tests :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit a small issue with the tiny url timing out when trying to run the below, increasing the timeout I was able to successfully run the command

orion cmd --config examples/small-scale-cluster-density.yaml

`shortener = pyshorteners.Shortener(timeout=10)`

Signed-off-by: Shashank Reddy Boyapally <[email protected]>
Copy link
Collaborator

@paigerube14 paigerube14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@vishnuchalla vishnuchalla merged commit 87bd5e6 into cloud-bulldozer:main May 31, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants