
Support for Daemon mode for orion #26

Merged (12 commits) on May 31, 2024

Changes from 10 commits
122 changes: 111 additions & 11 deletions README.md
@@ -1,19 +1,21 @@
# Orion - CLI tool to find regressions
-Orion stands as a powerful command-line tool designed for identifying regressions within perf-scale CPT runs, leveraging metadata provided during the process. The detection mechanism relies on [hunter](https://github.com/datastax-labs/hunter).
+Orion stands as a powerful command-line tool/daemon designed for identifying regressions within perf-scale CPT runs, leveraging metadata provided during the process. The detection mechanism relies on [hunter](https://github.com/datastax-labs/hunter).

Below is an illustrative example of the config and metadata that Orion can handle:

```
tests :
  - name : aws-small-scale-cluster-density-v2
    index: ospst-perf-scale-ci-*
    benchmarkIndex: ospst-ripsaw-kube-burner*
    metadata:
      platform: AWS
      masterNodesType: m6a.xlarge
      masterNodesCount: 3
      workerNodesType: m6a.xlarge
      workerNodesCount: 24
      benchmark.keyword: cluster-density-v2
-     ocpVersion: 4.15
+     ocpVersion: {{ version }}
      networkType: OVNKubernetes
      # encrypted: true
      # fips: false

@@ -50,37 +52,40 @@ tests :
        agg:
          value: cpu
          agg_type: avg
-     - name: etcdDisck
+     - name: etcdDisk
        metricName : 99thEtcdDiskBackendCommitDurationSeconds
        metric_of_interest: value
        agg:
          value: duration
          agg_type: avg
```
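Note that `ocpVersion: {{ version }}` is a template placeholder rather than a literal value; it is filled in at run time (in daemon mode, from the `version` query parameter). A minimal sketch of how such a substitution could work; the regex-based `render_placeholders` helper here is illustrative, not Orion's actual templating code:

```python
import re

def render_placeholders(text: str, values: dict) -> str:
    """Substitute {{ name }} placeholders with entries from `values`."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), text)

# The ocpVersion field above uses such a placeholder:
print(render_placeholders("ocpVersion: {{ version }}", {"version": "4.15"}))  # -> ocpVersion: 4.15
```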

## Build Orion
Building Orion is a straightforward process. Follow these commands:

**Note: Orion Compatibility**

-Orion currently supports Python versions `3.8.x`, `3.9.x`, `3.10.x`, and `3.11.x`. Please be aware that using other Python versions might lead to dependency conflicts caused by hunter, creating a challenging situation known as "dependency hell." It's crucial to highlight that Python `3.12.x` may result in errors due to the removal of distutils, a dependency used by numpy. This information is essential to ensure a smooth experience with Orion and avoid potential compatibility issues.
+Orion currently supports Python version `3.11.x`. Please be aware that using other Python versions might lead to dependency conflicts caused by hunter, creating a challenging situation known as "dependency hell." It's crucial to highlight that Python `3.12.x` may result in errors due to the removal of distutils, a dependency used by numpy. This information is essential to ensure a smooth experience with Orion and avoid potential compatibility issues.

Clone the current repository using git clone.

```
>> git clone <repository_url>
>> python3 -m venv venv
>> source venv/bin/activate
>> pip install -r requirements.txt
>> export ES_SERVER=<es_server_url>
>> pip install .
```
## Run Orion
-Executing Orion is as simple as building it. After following the build steps, run the following:
+Executing Orion is as seamless as building it. With the latest enhancements, Orion introduces a versatile command-line mode and a Daemon mode, empowering users to select the mode that aligns with their requirements.

### Command-line mode
Running Orion in command-line Mode is straightforward. Simply follow these instructions:
```
->> orion
+>> orion cmd --hunter-analyze
```
At the moment,

@@ -92,7 +97,98 @@ Activate Orion's regression detection tool for performance-scale CPT runs effort

Additionally, users can specify a custom path for the output CSV file using the ```--output``` flag, providing control over the location where the generated CSV will be stored.
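Since the `--output` flag produces a plain CSV file, results can be post-processed with standard tooling. A short sketch; the column names below are illustrative assumptions, not Orion's actual output schema:

```python
import csv
import io

# Illustrative CSV content; the real column set depends on the configured metrics.
sample = "uuid,timestamp,podReadyLatency_P99\nabc123,1704889895,13000\n"

rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["podReadyLatency_P99"])  # -> 13000
```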

### Daemon mode
The core purpose of Daemon mode is to operate Orion as a self-contained server dedicated to handling incoming requests. By sending a POST request with the name of one of the predefined tests, users can trigger change-point detection on the provided metadata and metrics. The response is formatted as JSON, providing a structured output for seamless integration and analysis. To start daemon mode, run:

```
>> orion daemon
```
**Sending a Test Request to the Daemon Service**

To interact with the Daemon Service, you can send a POST request using `curl` with specific parameters.

*Request URL*

```
POST http://127.0.0.1:8000/daemon
```

*Parameters*

- uuid (optional): The UUID of the run you want to compare with similar runs.
- baseline (optional): The run(s) you want to compare against.
- version (optional): The ocpVersion to use for metadata; defaults to `4.15`.
- filter_changepoints (optional): Set to `true` if you only want change points to show up in the response.
- test_name (optional): Name of the test to perform; defaults to `small-scale-cluster-density`.


Example
```
curl -L -X POST 'http://127.0.0.1:8000/daemon?filter_changepoints=true&version=4.14&test_name=small-scale-node-density-cni'
```
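The query string in the `curl` example above can also be assembled programmatically. A sketch using only the parameters documented above:

```python
from urllib.parse import urlencode

# Parameter values mirror the curl example.
params = {
    "filter_changepoints": "true",
    "version": "4.14",
    "test_name": "small-scale-node-density-cni",
}
url = "http://127.0.0.1:8000/daemon?" + urlencode(params)
print(url)
```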


Below is a sample output structure: the top level of the JSON contains the test name, while within each test, runs are organized into arrays. Each run includes succinct metadata alongside corresponding metrics for comprehensive analysis.
> **Review comment (Collaborator):** This output structure has to align with the remaining open PRs, let's get them merged one by one. A reasonable order would be

> **Review comment (Member):** IMHO this would be the cleaner approach.

```
{
  "aws-small-scale-cluster-density-v2": [
    {
      "uuid": "4cb3efec-609a-4ac5-985d-4cbbcbb11625",
      "timestamp": 1704889895,
      "buildUrl": "https://tinyurl.com/2ya4ka9z",
      "metrics": {
        "ovnCPU_avg": {
          "value": 2.8503958847,
          "percentage_change": 0
        },
        "apiserverCPU_avg": {
          "value": 10.2344511574,
          "percentage_change": 0
        },
        "etcdCPU_avg": {
          "value": 8.7663162253,
          "percentage_change": 0
        },
        "P99": {
          "value": 13000,
          "percentage_change": 0
        }
      },
      "is_changepoint": false
    }
  ]
}
```
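A client consuming this response might, for example, pick out the runs flagged as change points. A minimal sketch against the structure shown above (the response is trimmed to one metric for brevity):

```python
import json

# A trimmed-down response in the structure shown above.
response = json.loads("""
{
  "aws-small-scale-cluster-density-v2": [
    {
      "uuid": "4cb3efec-609a-4ac5-985d-4cbbcbb11625",
      "timestamp": 1704889895,
      "metrics": {"P99": {"value": 13000, "percentage_change": 0}},
      "is_changepoint": false
    }
  ]
}
""")

# Collect, per test, the uuids of runs flagged as change points.
changepoints = {
    test: [run["uuid"] for run in runs if run["is_changepoint"]]
    for test, runs in response.items()
}
print(changepoints)  # -> {'aws-small-scale-cluster-density-v2': []}
```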


**Querying the List of Tests Available to the Daemon Service**

To list the tests available, you can send a GET request using `curl`.

*Request URL*

```
GET http://127.0.0.1:8000/daemon/options
```

*Request Body*

No request body is required; the endpoint simply lists the available test configurations.

Example
```
curl -L 'http://127.0.0.1:8000/daemon/options'
```

Below is a sample output structure: it contains the list of available test configurations.
```
{
  "options": [
    "small-scale-cluster-density",
    "small-scale-node-density-cni"
  ]
}
```
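The options list can be consumed directly to drive subsequent test requests; a short sketch:

```python
import json

# Response body in the structure shown above.
body = '{"options": ["small-scale-cluster-density", "small-scale-node-density-cni"]}'
options = json.loads(body)["options"]

# Each entry is a valid test_name for a subsequent /daemon request.
for name in options:
    print(f"http://127.0.0.1:8000/daemon?test_name={name}")
```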

Orion's seamless integration with metadata and hunter ensures a robust regression detection tool for perf-scale CPT runs.

@@ -101,7 +197,9 @@ Orion's seamless integration with metadata and hunter ensures a robust regressio

```
tests :
-  - name : current-uuid-etcd-duration
+  - name : aws-small-scale-cluster-density-v2
+    index: ospst-perf-scale-ci-*
+    benchmarkIndex: ospst-ripsaw-kube-burner*
    metrics :
      - name: etcdDisk
        metricName : 99thEtcdDiskBackendCommitDurationSeconds
@@ -111,5 +209,7 @@ tests :
        agg_type: avg
```

Orion provides flexibility: if you know the UUIDs of the runs you want to compare against, use the ```--baseline``` flag. This should only be used in conjunction with `--uuid`. As with the uuid section mentioned above, you'll have to set a metrics section to specify the data points you want to collect.

The `--uuid` and `--baseline` options are available in both cmd and daemon mode.

File renamed without changes.
55 changes: 55 additions & 0 deletions configs/small-scale-cluster-density.yml
@@ -0,0 +1,55 @@
tests :
  - name : aws-small-scale-cluster-density-v2
    index: ospst-perf-scale-ci-*
    benchmarkIndex: ospst-ripsaw-kube-burner*
    metadata:
      platform: AWS
      masterNodesType: m6a.xlarge
      masterNodesCount: 3
      workerNodesType: m6a.xlarge
      workerNodesCount: 24
      benchmark.keyword: cluster-density-v2
      ocpVersion: {{ version }}
      networkType: OVNKubernetes
      # encrypted: true
      # fips: false
      # ipsec: false

    metrics :
      - name: podReadyLatency
        metricName: podLatencyQuantilesMeasurement
        quantileName: Ready
        metric_of_interest: P99
        not:
          jobConfig.name: "garbage-collection"

      - name: apiserverCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-kube-apiserver
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: ovnCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-ovn-kubernetes
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: etcdCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-etcd
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: etcdDisk
        metricName : 99thEtcdDiskBackendCommitDurationSeconds
        metric_of_interest: value
        agg:
          value: duration
          agg_type: avg
58 changes: 58 additions & 0 deletions configs/small-scale-node-density-cni.yml
@@ -0,0 +1,58 @@
tests :

> **Review comment (Collaborator):** I hit a small issue with the tiny url timing out when trying to run the below; increasing the timeout I was able to successfully run the command `orion cmd --config examples/small-scale-cluster-density.yaml`:
>
> `shortener = pyshorteners.Shortener(timeout=10)`

  - name : aws-small-scale-node-density-cni
    index: ospst-perf-scale-ci-*
    benchmarkIndex: ospst-ripsaw-kube-burner*
    metadata:
      platform: AWS
      masterNodesType: m6a.xlarge
      masterNodesCount: 3
      workerNodesType: m6a.xlarge
      workerNodesCount: 6
      infraNodesCount: 3
      benchmark.keyword: node-density-cni
      ocpVersion: {{ version }}

> **Review comment (Collaborator):** could we maybe put in the title template here so that a user doesn't take this file and just try to run, as this version is not filled in
>
> **Review comment (Contributor, author):** Yes, that makes sense; will add a comment mentioning it is a template.

      networkType: OVNKubernetes
      infraNodesType: r5.2xlarge
      # encrypted: true
      # fips: false
      # ipsec: false

    metrics :
      - name: podReadyLatency
        metricName: podLatencyQuantilesMeasurement
        quantileName: Ready
        metric_of_interest: P99
        not:
          jobConfig.name: "garbage-collection"

      - name: apiserverCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-kube-apiserver
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: ovnCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-ovn-kubernetes
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: etcdCPU
        metricName : containerCPU
        labels.namespace.keyword: openshift-etcd
        metric_of_interest: value
        agg:
          value: cpu
          agg_type: avg

      - name: etcdDisk
        metricName : 99thEtcdDiskBackendCommitDurationSeconds
        metric_of_interest: value
        agg:
          value: duration
          agg_type: avg

2 changes: 1 addition & 1 deletion examples/small-scale-node-density-cni.yaml
@@ -10,7 +10,7 @@ tests :
      workerNodesCount: 6
      infraNodesCount: 3
      benchmark.keyword: node-density-cni
-     ocpVersion: 4.15
+     ocpVersion: 4.14
      networkType: OVNKubernetes
      infraNodesType: r5.2xlarge
      # encrypted: true