Skip to content

fabric8-analytics/fabric8-analytics-recommender

Repository files navigation

Fabric8-Analytics Recommender

Note on naming: The Fabric8-Analytics project has evolved from 2 different projects called "cucos" and "bayesian". We’re currently in process of renaming the modules and updating documentation. Until that is completed, please consider "cucos" and "bayesian" to be synonyms of "Fabric8-Analytics".

Contributing

See our contributing guidelines for more info.

Instructions

Clone Fabric8-Analytics Analytics repository

git clone [email protected]:fabric8-analytics/fabric8-analytics-recommender.git
cd Analytics/models

Recommender can have multiple machine-learning models. Currently, it has only one model that is based on the idea of similarity with frequent patterns. A model should be trained with training data and that is an offline job. Once we have a model, we score it to generate recommendations for the given user request. That is online activity. Accordingly, the folder structure has been setup. Let us discuss these training and scoring activities in more details now.

Training: Train ML model with training data

The training job happens in a 'serverless' way i.e. the AWS EMR cluster is setup on the fly and training job is submitted to that. This cluster goes away as soon as the training job is over ! This helps minimize AWS consumption that is very expensive.

The current model finds frequent patterns i.e. set of software-packages that are used together most frequently in the given training data. These frequent patterns are then stored inside a Titan graph. So, you need to setup Titan graph database before running the training.

Once you have graph database ready, you can specify AWS credentials and HTTP URL for Titan graph database server in the docker-compose-analytics-training.yml file and submit the training job as follows:

$ docker-compose -f docker-compose-analytics-training.yml build
$ docker-compose -f docker-compose-analytics-training.yml up

This will return AWS job id that can be used to track the training job on AWS EMR cluster.

Scoring: Run the recommendation service

Online scoring is performed by way of invoking stack-analyses API call on server. A user can follow fabric8-analytics-server setup and then do

$ curl -XPOST "manifest[]=@<path_to_manifest_file>" "http://127.0.0.1:32000/api/v1/stack-analyses"

Check for all possible issues

The script named check-all.sh is to be used to check the sources for all detectable errors and issues. This script can be run w/o any arguments:



Expected script output:


Running all tests and checkers Check all BASH scripts OK Check documentation strings in all Python source file OK Detect common errors in all Python source file OK Detect dead code in all Python source file OK Run Python linter for Python source file OK Unit tests for this project OK Done

Overal result OK ---

An example of script output when one error is detected:


Running all tests and checkers Check all BASH scripts Error: please look into files check-bashscripts.log and check-bashscripts.err for possible causes Check documentation strings in all Python source file OK Detect common errors in all Python source file OK Detect dead code in all Python source file OK Run Python linter for Python source file OK Unit tests for this project OK Done

Overal result One error detected! ---

Coding standards

You can use scripts run-linter.sh and check-docstyle.sh to check if the code follows PEP 8 and PEP 257 coding standards. These scripts can be run w/o any arguments:

./run-linter.sh
./check-docstyle.sh

The first script checks the indentation, line lengths, variable names, white space around operators etc. The second script checks all documentation strings - its presence and format. Please fix any warnings and errors reported by these scripts.

Code complexity measurement

The scripts measure-cyclomatic-complexity.sh and measure-maintainability-index.sh are used to measure code complexity. These scripts can be run w/o any arguments:

./measure-cyclomatic-complexity.sh
./measure-maintainability-index.sh

The first script measures cyclomatic complexity of all Python sources found in the repository. Please see this table for further explanation how to comprehend the results.

The second script measures maintainability index of all Python sources found in the repository. Please see the following link with explanation of this measurement.

You can specify command line option --fail-on-error if you need to check and use the exit code in your workflow. In this case the script returns 0 when no failures has been found and non zero value instead.

Dead code detection

The script detect-dead-code.sh can be used to detect dead code in the repository. This script can be run w/o any arguments:

./detect-dead-code.sh

Please note that due to Python’s dynamic nature, static code analyzers are likely to miss some dead code. Also, code that is only called implicitly may be reported as unused.

Because of this potential problems, only code detected with more than 90% of confidence is reported.

List of directories containing source code, that needs to be checked, are stored in a file directories.txt

Common issues detection

The script detect-common-errors.sh can be used to detect common errors in the repository. This script can be run w/o any arguments:

./detect-common-errors.sh

Please note that only semantical problems are reported.

List of directories containing source code, that needs to be checked, are stored in a file directories.txt

Check for scripts written in BASH

The script named check-bashscripts.sh can be used to check all BASH scripts (in fact: all files with the .sh extension) for various possible issues, incompatibilities, and caveats. This script can be run w/o any arguments:

./check-bashscripts.sh

Please see the following link for further explanation, how the ShellCheck works and which issues can be detected.

Releases

No releases published

Packages

No packages published