docs: add mkdocs documentation (#79)
julesbertrand authored Nov 6, 2023
1 parent 5cd8321 commit dd66594
Showing 14 changed files with 252 additions and 49 deletions.
45 changes: 45 additions & 0 deletions .github/workflows/deploy_docs.yaml
@@ -0,0 +1,45 @@
name: Deploy MkDocs to GitHub Pages

on:
  push:
    branches:
      - main

env:
  python-version: "3.10"

jobs:
  deploy-docs:
    name: Deploy docs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ env.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.python-version }}

      - name: Install poetry
        run: make download-poetry

      - name: Set up pip cache
        uses: actions/cache@v3
        with:
          path: ~/.cache/pypoetry/virtualenvs
          key: venv-${{ env.python-version }}-${{ hashFiles('pyproject.toml') }}-${{ hashFiles('poetry.lock') }}

      - name: Set Poetry Path
        run: |
          echo "$HOME/.poetry/bin" >> $GITHUB_PATH

      - name: Install requirements
        run: |
          poetry run pip install --upgrade pip
          poetry install --with docs

      - name: Deploying MkDocs documentation
        run: |
          poetry run mkdocs build
          poetry run mkdocs gh-deploy --force
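
The workflow above only builds and publishes the site; the site itself is configured by the `mkdocs.yml` added at the repository root. A minimal sketch of what such a configuration can look like (the nav entries and extension list are assumptions, not the exact file from this commit):

```yaml
# Hypothetical mkdocs.yml sketch; the actual file in this commit may differ.
site_name: Vertex Pipelines Deployer
repo_url: https://github.com/artefactory/vertex-pipelines-deployer

nav:
  - Home: index.md
  - CLI Reference: CLI_REFERENCE.md
  - Changelog: changelog.md

markdown_extensions:
  - admonition          # renders the `!!! note` / `!!! tip` blocks used in the docs
  - pymdownx.snippets   # resolves the `--8<--` include markers used in README.md
```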
15 changes: 10 additions & 5 deletions CONTRIBUTING.md
@@ -32,8 +32,8 @@ To develop, you will need dev requirements too. Run:
make install-dev-requirements
```

> **Note**
> `poetry.lock` is not committed deliberately, as recommended by Poetry's doc. You can read more about it [here](https://python-poetry.org/docs/basic-usage/#as-a-library-developer).
!!! note "About poetry.lock"
    `poetry.lock` is not committed deliberately, as recommended by Poetry's doc. You can read more about it [here](https://python-poetry.org/docs/basic-usage/#as-a-library-developer).

### Codestyle

@@ -69,6 +69,11 @@ The Release GitHub Action does the following:

The action is triggered by any push to main.

> [!NOTE]
> The release action will be triggered by any push to `main` only if the 'CI' job in the 'release.yaml' workflow succeeds.
> Python Semantic Release will take care of version number update, tag creation and release creation.
!!! tip
    The release action will be triggered by any push to `main` only if the 'CI' job in the 'release.yaml' workflow succeeds.
    Python Semantic Release will take care of version number update, tag creation and release creation.


When it's done, rebase develop to keep it up to date with main.

And you're done! 🎉
115 changes: 73 additions & 42 deletions README.md
@@ -3,7 +3,7 @@
<h1 align="center">Vertex Pipelines Deployer</h1>
<h3 align="center">Deploy Vertex Pipelines within minutes</h3>
<p align="center">
This tool is a wrapper aound <a href="https://www.kubeflow.org/docs/components/pipelines/v2/hello-world/">kfp</a> and <a href="https://cloud.google.com/python/docs/reference/aiplatform/latest">google-cloud-aiplatform</a> that allows you to check, compile, upload, run and schedule Vertex Pipelines to a Vertex AI Pipelines endpoint in a standardized manner.
This tool is a wrapper around <a href="https://www.kubeflow.org/docs/components/pipelines/v2/hello-world/">kfp</a> and <a href="https://cloud.google.com/python/docs/reference/aiplatform/latest">google-cloud-aiplatform</a> that allows you to check, compile, upload, run and schedule Vertex Pipelines in a standardized manner.
</p>
</div>
</br>
@@ -23,48 +23,55 @@
</div>


## Table of Contents
- [Why this tool?](#why-this-tool)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
## 📚 Table of Contents
- [Why this tool?](#❓-why-this-tool)
- [Prerequisites](#📋-prerequisites)
- [Installation](#📦-installation)
- [From git repo](#from-git-repo)
- [From GCS (not available in PyPI yet)](#from-gcs-not-available-in-pypi-yet)
- [Add to requirements](#add-to-requirements)
- [Usage](#usage)
- [Setup](#setup)
- [Folder Structure](#folder-structure)
- [CLI: Deploying a Pipeline with `deploy`](#cli-deploying-a-pipeline-with-deploy)
- [CLI: Checking Pipelines are valid with `check`](#cli-checking-pipelines-are-valid-with-check)
- [CLI: Other commands](#cli-other-commands)
- [Usage](#🚀-usage)
- [Setup](#🛠️-setup)
- [Folder Structure](#📁-folder-structure)
- [CLI: Deploying a Pipeline with `deploy`](#🚀-cli-deploying-a-pipeline-with-deploy)
- [CLI: Checking Pipelines are valid with `check`](#✅-cli-checking-pipelines-are-valid-with-check)
- [CLI: Other commands](#🛠️-cli-other-commands)
- [`create`](#create)
- [`list`](#list)
- [CLI: Options](#cli-options)
- [CLI: Options](#🍭-cli-options)

[Full CLI documentation](docs/CLI_REFERENCE.md)

## Why this tool?
<!-- --8<-- [start:why] -->
## ❓ Why this tool?


Three use cases:

1. **CI:** check pipeline validity.
2. **Dev mode:** quickly iterate over your pipelines by compiling and running them in multiple environments (test, dev, staging, etc.) without duplicating code or looking for the right kfp / aiplatform snippet.
3. **CD:** deploy your pipelines to Vertex Pipelines in a standardized manner in your CD with Cloud Build or GitHub Actions.


Four commands:

- `check`: check your pipelines (imports, compile, check configs validity against pipeline definition).
- `deploy`: compile, upload to Artifact Registry, run and schedule your pipelines.
- `create`: create a new pipeline and config files.
- `list`: list all pipelines in the `vertex/pipelines` folder.
<!-- --8<-- [end:why] -->
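
For illustration, a session chaining these four commands could look like the sketch below (`dummy_pipeline` is a placeholder name; exact flags may vary):

```bash
# Hypothetical session; pipeline name and flags are illustrative.
vertex-deployer create dummy_pipeline        # scaffold pipeline and config files
vertex-deployer check dummy_pipeline         # validate imports, compilation and configs
vertex-deployer deploy dummy_pipeline --run  # compile and run on Vertex Pipelines
vertex-deployer list --with-configs          # list pipelines and their config files
```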

## Prerequisites
## 📋 Prerequisites
<!-- --8<-- [start:prerequisites] -->

- Unix-like environment (Linux, macOS, WSL, etc.)
- Python 3.8 to 3.10
- Google Cloud SDK
- A GCP project with Vertex Pipelines enabled
<!-- --8<-- [end:prerequisites] -->

## Installation

## 📦 Installation
<!-- --8<-- [start:installation] -->

### From git repo

@@ -107,32 +114,33 @@ Then add the following line to your `requirements.in` file:
```bash
file:my/path/to/vertex_deployer-$VERSION.tar.gz
```
<!-- --8<-- [end:installation] -->

## Usage

### Setup
## 🚀 Usage
<!-- --8<-- [start:setup] -->
### 🛠️ Setup

1. Set up your GCP environment:

```bash
export PROJECT_ID=<gcp_project_id>
gcloud config set project $PROJECT_ID
gcloud auth login
gcloud auth application-default login
```
2. You need the following APIs to be enabled:
   - Cloud Build API
   - Artifact Registry API
   - Cloud Storage API
   - Vertex AI API
```bash
gcloud services enable \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com \
storage.googleapis.com \
aiplatform.googleapis.com
```

3. Create an artifact registry repository for your base images (Docker format):
```bash
export GAR_DOCKER_REPO_ID=<your_gar_repo_id_for_images>
@@ -141,20 +149,24 @@ gcloud artifacts repositories create ${GAR_DOCKER_REPO_ID} \
--location=${GAR_LOCATION} \
--repository-format=docker
```

4. Build and upload your base images to the repository. To do so, please follow Google Cloud Build documentation.
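   For example, a hypothetical Cloud Build invocation for a single base image (the image name is a placeholder):
```bash
gcloud builds submit \
  --tag ${GAR_LOCATION}-docker.pkg.dev/${PROJECT_ID}/${GAR_DOCKER_REPO_ID}/base-image:latest .
```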

5. Create an artifact registry repository for your pipelines (KFP format):
```bash
export GAR_PIPELINES_REPO_ID=<your_gar_repo_id_for_pipelines>
gcloud artifacts repositories create ${GAR_PIPELINES_REPO_ID} \
--location=${GAR_LOCATION} \
--repository-format=kfp
```

6. Create a GCS bucket for Vertex Pipelines staging:
```bash
export GCP_REGION=<your_gcp_region>
export VERTEX_STAGING_BUCKET_NAME=<your_bucket_name>
gcloud storage buckets create gs://${VERTEX_STAGING_BUCKET_NAME} --location=${GCP_REGION}
```

7. Create a service account for Vertex Pipelines:
```bash
export VERTEX_SERVICE_ACCOUNT_NAME=foobar
@@ -178,7 +190,7 @@ gcloud artifacts repositories add-iam-policy-binding ${GAR_PIPELINES_REPO_ID} \

You can use the deployer CLI (see example below) or import [`VertexPipelineDeployer`](deployer/pipeline_deployer.py) in your code (try it yourself).

### Folder Structure
### 📁 Folder Structure

You must respect the following folder structure. If you already follow the
[Vertex Pipelines Starter Kit folder structure](https://github.com/artefactory/vertex-pipeline-starter-kit), it should be pretty smooth to use this tool:
@@ -192,9 +204,11 @@ vertex
└─ {pipeline_name}.py
```

> [!NOTE]
> You must have at least these files. If you need to share some config elements between pipelines,
> you can have a `shared` folder in `configs` and import them in your pipeline configs.
!!! tip "About folder structure"
    You must have at least these files. If you need to share some config elements between pipelines,
    you can have a `shared` folder in `configs` and import them in your pipeline configs.

You can use the [`create`](../usage#create) command to create a new pipeline and config files.

#### Pipelines

@@ -226,10 +240,12 @@ They must be located in the `config/{pipeline_name}` folder.
`.py` files are useful to define complex configs (e.g. a list of dicts) while `.json` / `.toml` files are useful to define simple configs (e.g. a string).

**How to format them?**

- `.json` and `.toml` files must be valid JSON or TOML files containing a single dict of key: value pairs representing parameter values.
- `.py` files must be valid python files with two important elements:
- `parameter_values` to pass arguments to your pipeline
- `input_artifacts` if you want to retrieve and create input artifacts to your pipeline.

* `parameter_values` to pass arguments to your pipeline
* `input_artifacts` if you want to retrieve and create input artifacts to your pipeline.
See [Vertex Documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob) for more information.
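
As a sketch, a `.py` config exposing both elements could look like this; paths and values, including the artifact resource name, are purely illustrative:

```python
# configs/{pipeline_name}/config_dev.py: hypothetical example, values are placeholders.

# Arguments passed to the pipeline at runtime.
parameter_values = {
    "model_name": "my-model",
    "learning_rate": 0.01,
}

# Optional: map an input name to an existing Vertex ML Metadata artifact resource name.
input_artifacts = {
    "dataset": "projects/my-project/locations/europe-west1/metadataStores/default/artifacts/123",
}
```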

**How to name them?**
@@ -252,13 +268,15 @@ VERTEX_STAGING_BUCKET_NAME=YOUR_VERTEX_STAGING_BUCKET_NAME # GCS Bucket for Ver
VERTEX_SERVICE_ACCOUNT=YOUR_VERTEX_SERVICE_ACCOUNT # Vertex Pipelines Service Account
```

> **Note**
> We're using env files and dotenv to load the environment variables.
> No default value for `--env-file` argument is provided to ensure that you don't accidentally deploy to the wrong project.
> An [`example.env`](./example/example.env) file is provided in this repo.
> This also allows you to work with multiple environments thanks to env files (`test.env`, `dev.env`, `prod.env`, etc)
!!! note "About env files"
    We're using env files and dotenv to load the environment variables.
    No default value for `--env-file` argument is provided to ensure that you don't accidentally deploy to the wrong project.
    An [`example.env`](./example/example.env) file is provided in this repo.
    This also allows you to work with multiple environments thanks to env files (`test.env`, `dev.env`, `prod.env`, etc.)
<!-- --8<-- [end:setup] -->
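
Switching environments is then just a matter of pointing to another env file. A hypothetical example, assuming `--env-file` is passed to the `deploy` command itself:

```bash
# Hypothetical: run the same pipeline against two environments.
vertex-deployer deploy dummy_pipeline --env-file test.env --run
vertex-deployer deploy dummy_pipeline --env-file prod.env --run
```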

### CLI: Deploying a Pipeline with `deploy`
<!-- --8<-- [start:usage] -->
### 🚀 CLI: Deploying a Pipeline with `deploy`

Let's say you defined a pipeline in `dummy_pipeline.py` and a config file named `config_test.json`. You can deploy your pipeline using the following command:
```bash
@@ -274,7 +292,7 @@ vertex-deployer deploy dummy_pipeline \
--enable-caching
```

### CLI: Checking Pipelines are valid with `check`
### ✅ CLI: Checking Pipelines are valid with `check`

To check that your pipelines are valid, you can use the `check` command. It uses a pydantic model to:
- check that your pipeline imports and definition are valid
@@ -292,7 +310,7 @@ vertex-deployer check --all
```


### CLI: Other commands
### 🛠️ CLI: Other commands

#### `create`
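
A typical invocation, using the `--config-type` option documented in the CLI reference (the pipeline name is a placeholder):

```bash
vertex-deployer create my_new_pipeline --config-type py
```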

@@ -310,17 +328,24 @@ You can list all pipelines in the `vertex/pipelines` folder using the `list` com
vertex-deployer list --with-configs
```

### CLI: Options
### 🍭 CLI: Options

```bash
vertex-deployer --help
```

To see package version:
```bash
vertex-deployer --version
```

To adapt log level, use the `--log-level` option. Default is `INFO`.
```bash
vertex-deployer --log-level DEBUG deploy ...
```

<!-- --8<-- [end:usage] -->

## Repository Structure

```
@@ -332,7 +357,7 @@ vertex-deployer --log-level DEBUG deploy ...
│ │ └─ release.yaml
│ ├─ CODEOWNERS
│ └─ PULL_REQUEST_TEMPLATE.md
├─ deployer
├─ deployer # Source code
│ ├─ __init__.py
│ ├─ cli.py
│ ├─ constants.py
@@ -344,8 +369,10 @@ vertex-deployer --log-level DEBUG deploy ...
│ ├─ logging.py
│ ├─ models.py
│ └─ utils.py
├─ docs/ # Documentation folder (mkdocs)
├─ templates/ # Semantic Release templates
├─ tests/
├─ example
├─ example # Example folder with dummy pipeline and config
│ ├─ example.env
│ └─ vertex
│ ├─ components
@@ -364,8 +391,12 @@ vertex-deployer --log-level DEBUG deploy ...
│ └─ dummy_pipeline.py
├─ .gitignore
├─ .pre-commit-config.yaml
├─ catalog-info.yaml # Roadie integration configuration
├─ CHANGELOG.md
├─ CONTRIBUTING.md
├─ LICENSE
├─ Makefile
├─ mkdocs.yml # Mkdocs configuration
├─ pyproject.toml
└─ README.md
```
19 changes: 19 additions & 0 deletions catalog-info.yaml
@@ -0,0 +1,19 @@
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: vertex-pipelines-deployer
  title: Vertex Pipelines Deployer
  description: >
    Check, compile, upload, run and schedule Vertex Pipelines in a standardized manner.
  annotations:
    github.com/project-slug: artefactory/vertex-pipelines-deployer
    backstage.io/techdocs-ref: dir:.
  tags:
    - vertex
    - mlops
    - python

spec:
  type: service
  owner: julesbertrand
  lifecycle: experimental
4 changes: 2 additions & 2 deletions docs/CLI_REFERENCE.md
@@ -69,7 +69,7 @@ $ vertex-deployer create [OPTIONS] PIPELINE_NAME

**Options**:

* `--config-type, -ct [json|py]`: The type of the config to create. [default: ConfigType.json]
* `--config-type, -ct [json|py|toml]`: The type of the config to create. [default: ConfigType.json]
* `--help`: Show this message and exit.

## `vertex-deployer deploy`
@@ -93,7 +93,7 @@ $ vertex-deployer deploy [OPTIONS] PIPELINE_NAME:{}
* `--upload, -u / --no-upload, -nu`: Whether to upload the pipeline to Google Artifact Registry. [default: no-upload]
* `--run, -r / --no-run, -nr`: Whether to run the pipeline. [default: no-run]
* `--schedule, -s / --no-schedule, -ns`: Whether to create a schedule for the pipeline. [default: no-schedule]
* `--cron TEXT`: Cron expression for scheduling the pipeline. To pass it to the CLI, use hyphens e.g. '0-10-*-*-*'.
* `--cron TEXT`: Cron expression for scheduling the pipeline. To pass it to the CLI, use hyphens e.g. `0-10-*-*-*`.
* `--delete-last-schedule, -dls / --no-delete-last-schedule`: Whether to delete the previous schedule before creating a new one. [default: no-delete-last-schedule]
* `--tags TEXT`: The tags to use when uploading the pipeline. [default: latest]
* `--config-filepath, -cfp PATH`: Path to the json/py file with parameter values and input artifacts to use when running the pipeline.
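
Combining the scheduling options above, a hypothetical scheduled deployment could look like this (cron value, tag and config path are illustrative):

```bash
vertex-deployer deploy dummy_pipeline \
  --schedule \
  --cron 0-10-*-*-* \
  --delete-last-schedule \
  --tags my-tag \
  --config-filepath vertex/configs/dummy_pipeline/config_test.json
```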
1 change: 1 addition & 0 deletions docs/changelog.md
@@ -0,0 +1 @@
--8<-- "CHANGELOG.md"