Configure your project
PYTHONPATH=<project-root>:<project-root>/server/py
to direct Python to the server-side packages as well.
Some IDEs can manage this for you; for example, in PyCharm, right-click the server/py
folder ➜ Mark Directory as ➜ Sources Root.
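If you prefer to handle this in code rather than in the shell or IDE, the PYTHONPATH setting above can also be done programmatically (a minimal sketch; assumes you run it from the repository root, and server/py is the path from this guide):

```python
import os
import sys

# Programmatic equivalent of PYTHONPATH=<project-root>:<project-root>/server/py,
# e.g. for a scratch script. Run from the repository root.
project_root = os.path.abspath(".")
server_py_root = os.path.join(project_root, "server", "py")
for path in (project_root, server_py_root):
    if path not in sys.path:
        sys.path.insert(0, path)
```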
If you are working with an ARM64 machine, please see Developing with ARM64 machines.
We recommend using pyenv to manage your python versions. Once you have pyenv installed, you can create a new environment by running:
pyenv install 3.9
To activate the environment, run:
pyenv shell 3.9
Or, set as default by running:
pyenv global 3.9
Fork, clone and cd into the MLRun repository directory
git clone [email protected]:<your username>/mlrun.git
cd mlrun
Set up a virtualenv (we recommend using venv)
python -m venv venv
source venv/bin/activate
Using uv is also an option; you may need to override MLRun's default package installer by setting the
MLRUN_PYTHON_PACKAGE_INSTALLER environment variable to uv:
uv venv venv --seed
source venv/bin/activate
Install MLRun, dependencies and dev dependencies
make install-requirements
pip install -e '.[complete]'
Some MLRun dependencies are not yet available for ARM64 machines via PyPI, so we need to work with conda to get the packages compiled for the ARM64 platform.
Fork, clone and cd into the MLRun repository directory
git clone [email protected]:<your username>/mlrun.git
cd mlrun
Create a Conda environment and activate it
conda create -n mlrun python=3.9
conda activate mlrun
Then, install the dependencies
make install-conda-requirements
Or, alternatively, you may use native Python on the ARM64 machine, but you will need to compile some dependencies. Execute the script below to work around MLRun's current protobuf ~3.20 incompatibility.
NOTE: This script will compile and install protobuf 3.20.0 for macOS arm64. At the end of the installation, it will ask you to type your password to finish installing the compiled protobuf.
./automation/scripts/compile_protobuf320_for_mac_arm64.sh
make install-requirements
Run some unit tests to make sure everything works:
python -m pytest ./tests/projects
If you encounter any error with 'charset_normalizer' for example:
AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)
Run:
pip install --force-reinstall charset-normalizer
Finally, install mlrun
pip install -e '.[complete]'
We use ruff as our formatter and linter. Format your code prior to opening a PR by running:
make fmt
- Lint: make lint
- Unit tests: make test-dockerized
- Integration tests: make test-integration-dockerized
- System tests: see the dedicated section below
- Title
  - Begin the title of the PR with [<scope>], with the first letter of the component name in uppercase, e.g. [API] Add endpoint to list runs.
  - If the PR addresses a bug, include the keyword fix or bug in the title, so that it will be added to the Bugs & Fixes section in the release notes. Additionally, the PR title should describe what was fixed rather than how it was fixed. For example, if there was a race condition where an artifact got deleted and created at the same time, instead of writing "Fixed artifact locking mechanism", write "Fixed artifact creation/deletion race condition".
  - Use imperative verbs when describing the changes made in the PR. For example, instead of writing Adding endpoint to list runs, write Add endpoint to list runs.
  - Start with a verb after the [<scope>] prefix, e.g. [API] Add endpoint to list runs.
- Description: it's much easier to review when there is a detailed description of the changes, and especially the whys; please put effort into writing a good description.
- Tests: we care a lot about tests! If your PR includes good test coverage, there is a higher chance it will be merged quickly.
As we support additional enterprise features when running MLRun in an Iguazio system, some system tests can only run
on an Iguazio system. To support this, we have two types of system tests.
Using the @pytest.mark.enterprise marker, we distinguish between tests that can run on an MLRun Community Edition
instance and tests that require, and can only run on, a full Iguazio system.
Any system test which isn't marked with @pytest.mark.enterprise can run on MLRun Community Edition, which
incidentally can also be installed locally on a developer machine.
The tests/system/ directory contains test suites that run against a live system, in order to test full MLRun flows.
You can follow the Install MLRun on Kubernetes guide to install an instance of MLRun Community Edition on your local machine. Notice the mentioned prerequisites and make sure you have a Kubernetes cluster running locally. You can use minikube for this purpose (however, this will require an extra step, see below).
To run the system tests, you need to set up the tests/system/env.yml file. For running the open source system tests,
the only requirement is to set the MLRUN_DBPATH environment variable to the URL of the mlrun API service which is installed
as part of the MLRun Community Edition installation.
Once the installation is completed, it outputs a list of urls which you can use to access the various services. Similar to:
NOTES:
You're up and running !
1. Jupyter UI is available at:
http://127.0.0.1:30040
...
4. MLRun API is exposed externally at:
http://127.0.0.1:30070
...
Happy MLOPSing !!! :]
Notice the "MLRun API is exposed externally at: http://127.0.0.1:30070" line. This is the URL you need to set in the
env.yml file as the MLRUN_DBPATH value.
If running via minikube, you will first need to run
minikube -n mlrun service mlrun-api
This tunnels the MLRun API service to your local machine. You can then use the URL output by this command
to set the MLRUN_DBPATH environment variable.
To add new system tests, all that is required is to create a test suite class which inherits from the TestMLRunSystem
class in tests.system.base. In addition, a special skip annotation must be added to the suite, so it won't run
if env.yml isn't filled in. If the test can only run on a full Iguazio system and not on an MLRun CE
instance, add the enterprise marker under the skip annotation or on the test method itself.
If the enterprise marker is added to a specific test method, the skip annotation must be added above it in addition to the annotation
over the test suite.
This is because enterprise tests and open source tests require different env vars to be set in env.yml.
For example:
import pytest
from tests.system.base import TestMLRunSystem


@TestMLRunSystem.skip_test_if_env_not_configured
@pytest.mark.enterprise
class TestSomeFunctionality(TestMLRunSystem):
    def test_the_functionality(self):
        pass
Example of a suite with two tests, one of which is meant for enterprise only:
import pytest
from tests.system.base import TestMLRunSystem


@TestMLRunSystem.skip_test_if_env_not_configured
class TestSomeFunctionality(TestMLRunSystem):
    def test_open_source_features(self):
        pass

    @TestMLRunSystem.skip_test_if_env_not_configured
    @pytest.mark.enterprise
    def test_enterprise_features(self):
        pass
If some setup or teardown is required for the tests in the suite, add the following methods to the suite:
from tests.system.base import TestMLRunSystem


@TestMLRunSystem.skip_test_if_env_not_configured
class TestSomeFunctionality(TestMLRunSystem):
    def custom_setup(self):
        pass

    def custom_teardown(self):
        pass

    def test_the_functionality(self):
        pass
From here, just use the MLRun SDK within the setup/teardown functions and the tests themselves with regular pytest functionality. The MLRun SDK will work against the live system you configured, and you can write the tests as you would any other pytest test.
All that's left now is to run whichever open source system tests you want. You can run them all with:
make test-system-open-source
Currently, this can only be done by one of the maintainers. The process is:
- Push your changes to a branch in the upstream repo
- Go to the build action and trigger it for the branch (leave all options default)
- Go to the system test action and trigger
it for the branch, change "Take tested code from action REF" to
true
MLRun moved to Python 3.9 as of version 1.3.0.
If you are working on MLRun 1.2.x or earlier, you will need to switch between Python 3.9 and Python 3.7 interpreters.
To work with multiple Python interpreters, we recommend using pyenv (see Creating a development environment).
Once you have pyenv installed, create a venv for each Python version, so that when you switch between them you
have the correct dependencies installed. You can manage and switch venvs through the PyCharm project settings.
e.g.:
pyenv shell 3.9
pyenv virtualenv mlrun
pyenv shell 3.7
pyenv virtualenv mlrun37
- Use snake_case for parameters, functions and module names.
- Use CamelCase for class names.
- Style conventions (handled by linter) can be found here: https://pycodestyle.pycqa.org/en/latest/intro.html#error-codes
- Keep functions short and readable, like a story, and break them into smaller functions when needed.
- Use meaningful names and avoid abbreviations for variables, functions and modules to enhance readability.
- Avoid hardcoded values; use Enum values, environment variables, or configuration files instead.
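As an illustration of the hardcoded-values convention above (RunState and is_done are made-up names for the example, not MLRun APIs):

```python
import enum


class RunState(enum.Enum):
    """Illustrative Enum replacing hardcoded state strings."""

    RUNNING = "running"
    COMPLETED = "completed"


def is_done(state: RunState) -> bool:
    # Compare against the Enum member, not a hardcoded "completed" literal
    return state == RunState.COMPLETED
```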
- Use typing hints for complex data structures.
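A small sketch of typing hints on a complex data structure (the function and its shapes are invented for the example):

```python
from typing import Optional


def group_runs_by_project(
    runs: list[dict[str, str]],
    default_project: Optional[str] = None,
) -> dict[str, list[dict[str, str]]]:
    """Group run records by their "project" key; the hints document the nesting."""
    grouped: dict[str, list[dict[str, str]]] = {}
    for run in runs:
        project = run.get("project") or default_project or "default"
        grouped.setdefault(project, []).append(run)
    return grouped
```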
- Use tmp_path fixture for temporary files/directories in tests.
- When using a function in the same file, it should be declared below the calling function as it makes it easier to understand the functionality.
- Private methods should be declared below the public ones.
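A sketch of the two ordering conventions above, callers before helpers and public methods before private ones (all names are illustrative):

```python
class ArtifactStore:
    def save(self, item):
        # Public method first; it calls the private helper declared below it
        return self._serialize(item)

    def _serialize(self, item):
        # Private method declared below the public caller
        return str(item)


def process(values):
    # Calling function declared above the helper it uses
    return _normalize(values)


def _normalize(values):
    # Helper declared below its caller, so the file reads top-down
    return [v.strip() for v in values]
```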
- When dealing with numerical values representing sizes or time-related variables, add a comment to specify the unit (e.g., KB, MB, seconds, hours) for clarity, or alternatively use variable names like "size_in_KB" for explicit unit indication.
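For instance, the unit-annotation convention might look like (names and values are made up for the example):

```python
# Unit stated in a comment
CLEANUP_INTERVAL = 3600  # seconds

# Unit stated explicitly in the variable name
max_artifact_size_in_kb = 16 * 1024
```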
- After updating existing code, ensure that the related documentation is synced with the change and updated accordingly.
- When importing from local code, NEVER use from X import Y; instead use import X or import X as Y. For external packages, it's acceptable to use from X import Y since they won't try to import back into our code.
- Docstring format, for all public API-level functions:
  - Format: use triple quotes (""" """) for docstrings.
  - Description: provide a brief and informative description of the function's purpose.
  - Parameters: list all parameters with their data types and a brief description, using the :param tag for each parameter.
  - Return value: if the function returns a value, describe it using the :return tag.
  Example:
def function_name(parameter1, parameter2):
    """
    Brief description of the function's purpose.

    :param parameter1: Description of parameter1.
    :param parameter2: Description of parameter2.
    :return: Description of the return value (if applicable).
    """
    # Function implementation
- When calling functions with multiple parameters, prefer using keyword arguments to improve readability and clarity.
- Logging: use structured variables instead of f-strings, for example logger.debug("Message", var1=var1, ...), and try to avoid logging large objects which are hard to decipher.
- Use f-strings for string formatting instead of the old .format(...), except when dealing with template strings.
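A self-contained sketch of the logging and formatting conventions above (StubLogger stands in for MLRun's structured logger, which accepts keyword arguments; it is not the real logger):

```python
class StubLogger:
    """Stand-in for a structured logger that takes keyword arguments."""

    def debug(self, message, **kwargs):
        print(message, kwargs)


logger = StubLogger()
run_id = "abc123"

# Preferred: pass structured variables, not an f-string inside the message
logger.debug("Run started", run_id=run_id)

# Preferred for plain formatting: f-strings
label = f"run-{run_id}"

# Template strings filled in later are the exception where .format(...) is fine
template = "run-{run_id}"
assert template.format(run_id=run_id) == label
```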
- When converting an error object to a string representation, instead of using str(error), use mlrun.errors.err_to_str(error).
- Use mlrun.mlconf instead of mlrun.config.config.
- When deprecating a parameter/method/class, we keep backwards compatibility for 2 minor versions. For example, if we deprecated a parameter in 1.6.0, it will be removed in 1.8.0. Always specify what should be used instead; if there is nothing to be used instead, specify why.
- Deprecating a parameter: check if the parameter is given, output a FutureWarning, and add a TODO noting when it should be removed to help developers keep track. For example:
if uid:
    warnings.warn(
        "'uid' is deprecated in 1.6.0 and will be removed in 1.8.0, use 'tree' instead.",
        # TODO: Remove this in 1.8.0
        FutureWarning,
    )
- Deprecating a method: use the deprecated decorator:
# TODO: remove in 1.6.0
@deprecated(
    version="1.4.0",
    reason="'verify_base_image' will be removed in 1.6.0, use 'prepare_image_for_deploy' instead",
    category=FutureWarning,
)
def verify_base_image(self):
- Deprecating a class:
# TODO: Remove in 1.7.0
@deprecated(
    version="1.5.0",
    reason="v1alpha1 mpi will be removed in 1.7.0, use v1 instead",
    category=FutureWarning,
)
class MpiRuntimeV1Alpha1(AbstractMPIJobRuntime):
- Minimize imports and avoid unnecessary dependencies in client code.
- Scale performance: be cautious when executing large queries in order to prevent overloading the database.
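One common way to keep queries bounded is to page through results in fixed-size batches; a generic sketch of the idea (iter_rows_in_batches and fetch_page are illustrative names, not MLRun APIs):

```python
def iter_rows_in_batches(fetch_page, batch_size=500):
    """Yield rows page by page; fetch_page(limit, offset) returns a list of rows."""
    offset = 0
    while True:
        batch = fetch_page(limit=batch_size, offset=offset)
        if not batch:
            return
        yield from batch
        offset += len(batch)


# Usage with an in-memory stand-in for a database query
data = list(range(1234))
rows = list(iter_rows_in_batches(lambda limit, offset: data[offset:offset + limit]))
assert rows == data
```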
Table partitioning is a database optimization technique that divides a large table into smaller, more manageable segments called partitions, based on a specified column. This can improve query performance by allowing the database to access only relevant partitions instead of scanning the entire table.
Key considerations:
- Unique and primary indexes: every unique key (with the primary key also being a unique key) in the table must include every column used in the table's partitioning expression. For more information, refer to the documentation.
- Auto-increment columns: if you need a column to be auto-incremented, it must be part of the primary key and positioned as the first column in the primary key definition. This is essential because otherwise the auto-increment functionality will not work.
- Influence on MLRun unit tests: in MLRun, unit tests run on an SQLite database, not MySQL. SQLite does not support auto-increment on composite primary keys. To support both MySQL deployments and unit tests, avoid setting autoincrement=True in the column's model in models.py for partitioned tables with auto-increment columns. Instead, enable auto-increment in MySQL by adding the following to the migration file:
op.execute(
    """
    ALTER TABLE <table_name>
    MODIFY COLUMN <column> INT NOT NULL AUTO_INCREMENT
    """
)