This project uses Poetry to manage dependencies and virtual environments.

- Create the virtual environment:

  ```bash
  poetry install
  ```

- Activate the environment:

  ```bash
  poetry shell
  ```
This project also includes a `requirements.txt` generated from the `poetry.lock` file. Install dependencies directly from this file using:

```bash
pip install -r requirements.txt
```
Install the package (note that you need pip ≥ 21.3):

```bash
pip install -e .
```
The pipeline uses Prefect for workflows and Hydra for composable configuration. Workflows can be run via CLI or through the Prefect UI as deployments.
When running via CLI, configurations can be passed in three ways: inline, using a single config file, or using composable config files. Note that Hydra supports additional configuration overrides via the CLI for all three options.
Configurations can be passed directly using:

```bash
abmpipe demo :: parameters.name=demo_parameters context.name=demo_context series.name=demo_series
```
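Each dotted key in the inline form sets one leaf of the nested configuration. As a rough stdlib sketch of the idea (not Hydra's actual implementation), a `section.key=value` argument can be applied to a nested dictionary like this:

```python
def apply_override(config: dict, override: str) -> None:
    """Set one leaf of a nested dict from a dotted 'section.key=value' string."""
    key, _, value = override.partition("=")
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value

config: dict = {}
for arg in [
    "parameters.name=demo_parameters",
    "context.name=demo_context",
    "series.name=demo_series",
]:
    apply_override(config, arg)

print(config["parameters"]["name"])  # demo_parameters
```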
Create a config file `demo.yaml` with the following contents:

```yaml
context:
  name: demo_context
series:
  name: demo_series
parameters:
  name: demo_parameters
```

Then use:

```bash
abmpipe demo /path/to/demo.yaml
```
Create a `configs` directory with the following structure:

```
configs
├── context
│   └── demo.yaml
├── parameters
│   └── demo.yaml
└── series
    └── demo.yaml
```

Each `demo.yaml` should contain the field `name: <name>`. Then use:

```bash
abmpipe demo parameters=demo context=demo series=demo
```
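Conceptually, each `group=name` argument selects one file from the matching group directory, and the selections are merged into a single composed configuration. A toy sketch of that mechanism, with the group files replaced by in-memory dicts (this is not Hydra itself):

```python
# Stand-in for the configs/ directory: group -> config name -> contents
CONFIG_GROUPS = {
    "context": {"demo": {"name": "demo_context"}},
    "parameters": {"demo": {"name": "demo_parameters"}},
    "series": {"demo": {"name": "demo_series"}},
}

def compose(selections: dict) -> dict:
    """Pick one named config per group and merge into one composed config."""
    return {group: dict(CONFIG_GROUPS[group][name]) for group, name in selections.items()}

composed = compose({"parameters": "demo", "context": "demo", "series": "demo"})
print(composed["series"]["name"])  # demo_series
```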
Use the flag `--dryrun` to display the composed configuration without running the workflow. Use the flag `--deploy` to create a Prefect deployment.
Configs can use `Secret` fields. In configs, any field in the form `${secret:name-of-secret}` will be resolved using the Prefect Secret loader. These values must be configured as a Secret Block in Prefect via a script:

```python
from prefect.blocks.system import Secret

Secret(value="secret-value").save(name="name-of-secret")
```

or in the Prefect UI under Blocks.
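To illustrate what the placeholder syntax does, here is a minimal sketch of resolving `${secret:...}` placeholders against a plain dictionary; in practice the resolution is performed by the Prefect Secret loader, not this code:

```python
import re

# Toy secret store standing in for Prefect Secret blocks
SECRETS = {"name-of-secret": "secret-value"}

def resolve_secrets(value: str) -> str:
    """Replace each ${secret:name} placeholder with the stored secret value."""
    return re.sub(r"\$\{secret:([^}]+)\}", lambda m: SECRETS[m.group(1)], value)

print(resolve_secrets("token: ${secret:name-of-secret}"))  # token: secret-value
```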
New flows can be added to the `flows` module with the following structure:

```python
from dataclasses import dataclass

from prefect import flow


@dataclass
class ParametersConfig:
    # TODO: add parameter config
    ...


@dataclass
class ContextConfig:
    # TODO: add context config
    ...


@dataclass
class SeriesConfig:
    # TODO: add series config
    ...


@flow(name="name-of-flow")
def run_flow(context: ContextConfig, series: SeriesConfig, parameters: ParametersConfig) -> None:
    # TODO: add flow
    ...
```
The command:

```bash
abmpipe name-of-flow
```

will create a new flow template under the `flows` module with the name `name_of_flow`.
Notebooks can be helpful for prototyping flows.
Create dataclasses for all relevant configuration for the flow. Specify types and default values, if relevant. For flows in this repo, three types of configs are used:

- `ParametersConfig` specifies all parameters for the flow
- `ContextConfig` specifies the infrastructure context (e.g. local working path or S3 bucket names)
- `SeriesConfig` specifies the simulation series the flow is applied to (e.g. simulation name, conditions, seeds)
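As a concrete illustration of specifying types and default values, a filled-in set of config dataclasses might look like the following; every field name here is a hypothetical example, not a field the repo actually defines:

```python
from dataclasses import dataclass, field


@dataclass
class ParametersConfig:
    # hypothetical flow parameters
    name: str = "demo_parameters"
    iterations: int = 10


@dataclass
class ContextConfig:
    # hypothetical infrastructure context
    name: str = "demo_context"
    working_path: str = "."


@dataclass
class SeriesConfig:
    # hypothetical simulation series
    name: str = "demo_series"
    conditions: list = field(default_factory=list)
    seeds: list = field(default_factory=lambda: [0])


parameters = ParametersConfig()
print(parameters.iterations)  # 10
```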
Configurations can be loaded in multiple ways.
- Load the entire configuration directly from an existing configuration file using the `make_config_from_file` function. Works best for simple configurations without interpolation.

  ```python
  config = make_config_from_file(ConfigDataclass, "/path/to/config.yaml")
  ```
- Load a partial configuration directly from an existing configuration file using the `make_config_from_file` function. Missing fields in the config can be loaded from other configuration files using `OmegaConf.load` or set directly. Works best for configurations that use interpolation.

  ```python
  config = make_config_from_file(ConfigDataclass, "/path/to/config.yaml")
  config.field = OmegaConf.load("/path/to/another/config.yaml").field
  config.field = "value"
  ```
- Directly instantiate the config object. Fields in object initialization can also be loaded using `OmegaConf.load`. Works best for custom configurations or testing configurations.

  ```python
  config = ConfigDataclass(
      field="value",
      other_field=OmegaConf.load("/path/to/config.yaml").field,
      ...
  )
  ```
Import tasks from collections in their undecorated form:

```python
from collection.module.task import task
```

Tasks can also be imported in decorated form:

```python
from collection.module import task
```

but they will need to be called using `task.fn()` because we are not in a Prefect flow environment.
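The reason `task.fn()` works: Prefect's `@task` decorator wraps the original function but keeps the undecorated callable on the wrapper's `.fn` attribute. A minimal stdlib imitation of that pattern (not Prefect's actual implementation):

```python
import functools


def task(fn):
    """Toy stand-in for Prefect's @task: keep the raw callable at .fn."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # The real decorator runs the task through the Prefect engine;
        # outside a flow, this toy version simply refuses to run.
        raise RuntimeError("tasks must be called from inside a flow")

    wrapper.fn = fn
    return wrapper


@task
def add(a: int, b: int) -> int:
    return a + b


print(add.fn(1, 2))  # 3
```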
Make sure the main flow method has the `@flow` decorator, and switch imports to their `@task`-decorated form to take advantage of Prefect task and flow monitoring.