Main package to install redBorder AI outliers on Rocky Linux 9.

Supported platforms:
- Rocky Linux 9
This code runs the outlier detection on a mock dataset. It is recommended to use pipenv or a similar tool to avoid overwriting dependencies.
git clone [email protected]:redBorder/rb-aioutliers.git
cd rb-aioutliers
pip install -r resources/src/requirements.txt
bash resources/src/example/run_example.sh
yum install epel-release && rpm -ivh http://repo.redborder.com/redborder-repo-0.0.3-1.el7.rb.noarch.rpm
yum install rb-aioutliers
Initially, data is extracted from a designated Druid datasource in timeseries format, with configurable metrics and settings. After rescaling to the range zero to one and segmentation, an autoencoder reconstructs the data, enabling anomaly detection through k-sigma thresholding. The anomalies are output in JSON format together with the data reconstructed by the autoencoder.
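The rescaling and k-sigma thresholding steps above can be sketched as follows. This is a minimal illustration with NumPy, not the service's actual code; the function names and the toy data are invented for the example:

```python
import numpy as np

def rescale(series):
    """Rescale a series to the [0, 1] range, as the pipeline does before training."""
    return (series - series.min()) / (series.max() - series.min())

def k_sigma_anomalies(actual, reconstructed, k=3.0):
    """Flag indices whose reconstruction error exceeds mean + k * std."""
    errors = np.abs(actual - reconstructed)
    threshold = errors.mean() + k * errors.std()
    return np.where(errors > threshold)[0]

# Toy data: a smooth series the autoencoder reconstructs well, plus one spike.
rng = np.random.default_rng(0)
actual = rng.normal(1.0, 0.05, 50)
reconstructed = actual.copy()
actual[25] += 5.0  # inject an anomaly the autoencoder cannot reproduce
print(k_sigma_anomalies(actual, reconstructed))  # -> [25]
```

Points the autoencoder fails to reproduce accumulate large reconstruction error, so they stand out against the mean-plus-k-standard-deviations threshold.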
rb-aioutliers utilizes the Flask framework to create an HTTP server. Users can send Druid queries via POST requests to the /calculate endpoint. When rb-aioutliers receives the Druid query, it sends a request to the Druid broker, retrieves the necessary data, and then proceeds to execute the anomaly detection model.
After executing the model, the server can respond with one of two status messages:
HTTP 200 OK
: In this case, the response body will be structured as follows:
{
"status": "success",
"anomalies": [Array_of_anomalies],
"prediction": [Array_of_predictions]
}
HTTP 500 Internal Server Error
: If there is an issue during the process, the response will contain an error message:
{
"status": "error",
"msg": "Error_description_message"
}
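The request/response contract above can be sketched with a minimal Flask handler. This is an illustration of the shape of the endpoint, not the real server in resources/src; `run_model` is a hypothetical stand-in for the actual Druid fetch and anomaly-detection code:

```python
import base64
import json

from flask import Flask, request, jsonify

app = Flask(__name__)

def run_model(druid_query):
    # Placeholder: the real service queries the Druid broker and
    # runs the autoencoder here.
    return {"anomalies": [], "predictions": []}

@app.route("/calculate", methods=["POST"])
def calculate():
    try:
        # The query arrives as a base64-encoded JSON string in form data.
        druid_query = json.loads(base64.b64decode(request.form["query"]))
        result = run_model(druid_query)
        return jsonify(status="success",
                       anomalies=result["anomalies"],
                       prediction=result["predictions"]), 200
    except Exception as exc:
        return jsonify(status="error", msg=str(exc)), 500
```

Any failure while decoding the query or running the model falls through to the 500 branch with the error description in `msg`.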
The rb-aioutliers service generates a custom Druid query and sends it to the Redborder cluster Druid broker. After sending the query, it retrieves the data and attempts to train a model with custom parameters such as epochs or batch size. Once the model training is complete, it outputs a backup file and the generated Keras model.
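A training step of this shape might look like the following sketch. The layer sizes, epochs, batch size, and file name are illustrative assumptions, not the service's real parameters:

```python
import numpy as np
from tensorflow import keras

def build_autoencoder(window=32):
    """Small dense autoencoder over fixed-length traffic windows."""
    model = keras.Sequential([
        keras.layers.Input(shape=(window,)),
        keras.layers.Dense(16, activation="relu"),   # encoder
        keras.layers.Dense(8, activation="relu"),    # bottleneck
        keras.layers.Dense(16, activation="relu"),   # decoder
        keras.layers.Dense(window, activation="sigmoid"),  # data is rescaled to [0, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

windows = np.random.rand(100, 32)  # stand-in for rescaled, segmented traffic
model = build_autoencoder()
model.fit(windows, windows, epochs=5, batch_size=16, verbose=0)
model.save("traffic_autoencoder.keras")  # the kind of artifact saved after training
```

The model is trained to reproduce its own input, so at inference time a large gap between input and output marks a window as anomalous.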
The rb-aioutliers service can also operate in cluster mode. This is achieved by dividing the service into two components: the executor and the trainer.
The executor service is registered in HashiCorp Consul. When the Redborder WebUI sends a request to api/v1/outliers, it will be directed to the rb-aioutliers.service, which, in turn, will redirect the request to any of the nodes running the rb-aioutliers REST server. This server will download the relevant model from S3, based on the sensor specified in the Druid query, and execute it.
For the training service, the Chef client creates individual configuration files for each node, generated from the cookbook and templates. These configuration files specify which sensors each node should train. The rb-aioutliers-train service then downloads the appropriate model from S3 for each node (take a look at Trainer jobs below). Once training is completed, the rb-aioutliers-train service uploads the resulting trained model to S3. It's important to note that each model is unique and specific to its corresponding sensor.
The trainer nodes send job data to a Redis server (this is done by the Trainer service). Each node also requests the job queue. The RQ worker processes these jobs in the background. If any node goes down, there is no problem: the sensor information for training is stored in Redis, so another node can take over its tasks.
For more information about deploying with Chef Server, take a look at the Outliers Cookbook.
If you want to run the app inside a Docker container, run the following commands:
cd ./resources/src
docker-compose up --build -d
Now, if you list your Docker containers, you will see the following container running:
| Container ID | Image | Command | Created | Status | Exposed Ports |
|---|---|---|---|---|---|
| cb18a72ab60e | src_rb_aioutliers_rest | python main.py | 3 minutes ago | Up 3 minutes | 0.0.0.0:39091->39091/tcp, :::39091->39091/tcp |
- HTTP Method: POST
- Description: Initiates anomaly detection (model execution) with a Druid query.
- Request Body: JSON data containing the Druid query as a base64-encoded string.
Example Request:
POST /calculate (application/x-www-form-urlencoded)
query=base64_string
Example Druid Query:
{
"dataSource": "rb_flow",
"granularity": {
"type": "period",
"period": "pt1m",
"origin": "2023-09-21T09:00:00Z"
},
"intervals": [
"2023-09-21T09:00:00+00:00/2023-09-21T10:00:00+00:00"
],
"filter": {
"type": "selector",
"dimension": "sensor_name",
"value": "FlowSensor"
},
"queryType": "timeseries",
"context": {
"timeout": 90000,
"skipEmptyBuckets": "true"
},
"limitSpec": {
"type": "default",
"limit": 100,
"columns": []
},
"aggregations": [
{
"type": "longSum",
"name": "bytes",
"fieldName": "sum_bytes"
}
],
"postAggregations": []
}
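For illustration, a query like the one above can be base64-encoded and POSTed from Python with only the standard library. The host and port are assumptions based on the Docker example in this README, and the trimmed query here is just a placeholder:

```python
import base64
import json
from urllib import parse, request

def encode_query(query: dict) -> str:
    """Serialize a Druid query to JSON and base64-encode it for the form field."""
    return base64.b64encode(json.dumps(query).encode()).decode()

def post_calculate(query: dict, url="http://localhost:39091/calculate"):
    """Send query=<base64> as application/x-www-form-urlencoded and parse the reply."""
    data = parse.urlencode({"query": encode_query(query)}).encode()
    with request.urlopen(request.Request(url, data=data)) as resp:
        return json.loads(resp.read())

druid_query = {"dataSource": "rb_flow", "queryType": "timeseries"}  # trimmed example
```

Calling `post_calculate(druid_query)` against a running instance would return the success or error JSON shown above.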
Example Response:
{
"anomalies": [
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 36453984.6858499,
"timestamp": "2023-09-28T07:00:00.000Z"
},
{
"expected": 45825798.264862545,
"timestamp": "2023-09-28T07:01:00.000Z"
}
],
"status": "success"
}
- Fork the repository on GitHub
- Create a named feature branch (like `add_component_x`)
- Write your change
- Write tests for your change (if applicable)
- Run the tests, ensuring they all pass
- Submit a Pull Request using GitHub
- Miguel Álvarez Adsuara [email protected]
- Pablo Rodriguez Flores [email protected]
LICENSE: AFFERO GENERAL PUBLIC LICENSE, Version 3, 19 November 2007