Update Dev Containers Docs #39

Merged
merged 38 commits into from
Dec 27, 2023
38 commits
c97caed
Merge pull request #33 from FR-DC/frml-82
Eve-ning Dec 19, 2023
f7ad610
Move common functions to utils
Eve-ning Dec 20, 2023
01b9849
Suppress gcs environ warning
Eve-ning Dec 20, 2023
643a98d
Fix missing label-studio-sdk
Eve-ning Dec 20, 2023
dd8a18d
Set default LABEL_STUDIO_API_KEY
Eve-ning Dec 20, 2023
e16480b
Fix bad typing version
Eve-ning Dec 20, 2023
69e99a9
Merge pull request #34 from FR-DC/FRML-83
Eve-ning Dec 20, 2023
6a85fa7
Unignore shell files, add cml entrypoint
Eve-ning Dec 20, 2023
8e71ad1
Update model-tests.yml
Eve-ning Dec 20, 2023
68effdb
Update model-tests.yml
Eve-ning Dec 20, 2023
046e140
Attempt to fix tests not found
Eve-ning Dec 20, 2023
79c3a99
Attempt to connect to label studio in cml
Eve-ning Dec 20, 2023
4bb29e3
Remove network line
Eve-ning Dec 20, 2023
6084fbb
Update docker-compose.yml
Eve-ning Dec 20, 2023
9a7f408
Force owner
Eve-ning Dec 20, 2023
2ebc298
don't mount rsc
Eve-ning Dec 20, 2023
4b13ce7
Attempt to get label-studio via docker host
Eve-ning Dec 20, 2023
ede2a5c
Update conf and compose
Eve-ning Dec 20, 2023
2d1efdd
Fix incorrect git config
Eve-ning Dec 20, 2023
b55afdd
Add a check for Label Studio server up
Eve-ning Dec 20, 2023
2de41eb
Improve formatting for report
Eve-ning Dec 20, 2023
a4f00c2
Fix issue with env substitution
Eve-ning Dec 20, 2023
bf92c2c
Remove unused evaluate script
Eve-ning Dec 26, 2023
77ba78a
Make GCS error clearer
Eve-ning Dec 26, 2023
5c4a36c
Fix missing default on exception
Eve-ning Dec 26, 2023
d46f4e3
Add dev container spec
Eve-ning Dec 26, 2023
c2ba141
Delete rsc.dvc
Eve-ning Dec 26, 2023
60e5c2a
Merge branch '0.0.8' into FRML-93
Eve-ning Dec 26, 2023
276fa17
Get api key from host
Eve-ning Dec 26, 2023
70b275e
Add missing lightning dep
Eve-ning Dec 26, 2023
a1d79c1
Add uncommentable local W&B setup
Eve-ning Dec 26, 2023
5d457ab
Update getting started docs for dev container
Eve-ning Dec 26, 2023
3ad231b
Update README.md
Eve-ning Dec 26, 2023
3eb0b40
Update devcontainer.json
Eve-ning Dec 26, 2023
d021af7
Attempt to fix codespace problem
Eve-ning Dec 26, 2023
2636cf1
Update Dockerfile
Eve-ning Dec 26, 2023
bac614a
Force Dockerfile to LF
Eve-ning Dec 27, 2023
399fc54
Force Dockerfile to LF
Eve-ning Dec 27, 2023
16 changes: 16 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,16 @@
{
  "name": "frdc",
  "build": {
    "dockerfile": "../Dockerfile",
  },
  "containerEnv": {
    "LABEL_STUDIO_HOST": "host.docker.internal",
    "LABEL_STUDIO_API_KEY": "${localEnv:LABEL_STUDIO_API_KEY}",
  },
  "runArgs": [
    "--gpus=all",
  ],
  "hostRequirements": {
    "gpu": true,
  }
}
3 changes: 0 additions & 3 deletions .dvcignore

This file was deleted.

1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
Dockerfile text=auto eol=lf
44 changes: 33 additions & 11 deletions .github/workflows/model-tests.yml
@@ -2,39 +2,55 @@ name: Model Training

on:
pull_request:
branches: ['main']
branches: [ 'main' ]
workflow_dispatch:
inputs:
debug_enabled:
type: boolean
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
required: false
default: false


jobs:
build:

runs-on: self-hosted
container:
image: docker://ghcr.io/iterative/cml:0-dvc2-base1-gpu
volumes:
# This mounts and persists the venv between runs
- /home/runner/work/frdc-ml/_github_home:/root
# This mounts the resources folder
# - /home/runner/work/frdc-ml/_github_home/rsc:/__w/FRDC-ML/FRDC-ML/rsc
env:
# This is where setup-python will install and cache the venv
AGENT_TOOLSDIRECTORY: "/root/venv"
options: --gpus all

# This uses the host's exposed services
LABEL_STUDIO_HOST: "host.docker.internal"
LABEL_STUDIO_API_KEY: "${{ secrets.LABEL_STUDIO_API_KEY }}"

options: --gpus all
steps:
- uses: actions/checkout@v3

- name: Force change owner
run: |
chown -R root: ~

- name: Check if Label Studio Server is up
run: |
curl --fail --silent --head http://host.docker.internal:8080 || exit 1

- name: Set up Python 3.11
uses: actions/setup-python@v4
with:
python-version: "3.11"

- name: Install via exported requirements.txt
run: |
python -m pip install --upgrade pip
python -m pip install flake8 pytest poetry
python3 -m pip install --upgrade pip
python3 -m pip install flake8 pytest poetry
poetry export --with dev --without-hashes -o requirements.txt
pip3 install -r requirements.txt
pip3 install torch torchvision torchaudio
@@ -55,15 +71,21 @@ jobs:
run: |
echo "WANDB_API_KEY=${{ secrets.WANDB_API_KEY }}" >> $GITHUB_ENV

# Our project has src as a source path, explicitly add that in.
- name: Add src as PYTHONPATH
run: |
echo "PYTHONPATH=src" >> $GITHUB_ENV
- name: Add directories to PYTHONPATH
run: >
echo "PYTHONPATH=${{ github.workspace }}/src:\
${{ github.workspace }}/tests" >> $GITHUB_ENV

# Enable tmate debugging of manually-triggered workflows if the input option was provided
- name: Setup tmate session
uses: mxschmitt/action-tmate@v3
if: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled }}

# Do not do cd as it'll break PYTHONPATH.
- name: Run Model Training
working-directory: ${{ github.workspace }}/tests
run: |
python3 -m tests.model_tests.chestnut_dec_may.train
git config --global --add safe.directory /__w/FRDC-ML/FRDC-ML
python3 -m model_tests.chestnut_dec_may.train

- name: Comment results via CML
run: |
1 change: 0 additions & 1 deletion .gitignore
@@ -166,7 +166,6 @@ rsc/**/*.tif

**/*/lightning_logs
*.zip
*.sh
*.ckpt
/rsc
**/wandb/
20 changes: 20 additions & 0 deletions Dockerfile
@@ -0,0 +1,20 @@
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime as torch
WORKDIR /devcontainer

COPY ./pyproject.toml /devcontainer/pyproject.toml

RUN apt update -y && apt upgrade -y
RUN apt install git -y

RUN pip3 install --upgrade pip && \
pip3 install poetry && \
pip3 install lightning

RUN conda init bash \
&& . ~/.bashrc \
&& conda activate base \
&& poetry config virtualenvs.create false \
&& poetry install --with dev --no-interaction --no-ansi

RUN apt install curl -y && curl -sSL https://sdk.cloud.google.com | bash
ENV PATH $PATH:/root/google-cloud-sdk/bin
10 changes: 2 additions & 8 deletions README.md
@@ -54,14 +54,6 @@ To illustrate this, take a look at how
`tests/model_tests/chestnut_dec_may/train.py` is written. It pulls in relevant
modules from each stage and constructs a pipeline.


> Initially, we evaluated a few ML E2E solutions, despite them offering great
> functionality, their flexibility was
> limited. From a dev perspective, **Active Learning** was a gray area, and we
> foresee heavy shoehorning.
> Ultimately, we decided that the risk was too great, thus we resort to
> creating our own solution.

## Contributing

### Pre-commit Hooks
@@ -80,3 +72,5 @@ If you're using `pip` instead of `poetry`, run the following commands:
pip install pre-commit
pre-commit install
```

Alternatively, you can configure Black in your own IDE.
125 changes: 99 additions & 26 deletions Writerside/topics/Getting-Started.md
@@ -10,7 +10,7 @@
</step>
<step>Start by cloning our repository.
<code-block lang="shell">
git clone https://github.com/Forest-Recovery-Digital-Companion/FRDC-ML.git
git clone https://github.com/FR-DC/FRDC-ML.git
</code-block>
</step>
<step>Then, create a Python Virtual Env <code>pyvenv</code>
@@ -60,6 +60,26 @@
</step>
</procedure>

<procedure title="Use a Dev. Container" id="install-dev-con">
<tip>
Only use Dev. Containers if you're familiar with your IDE; the setup depends
heavily on navigating the IDE's menus.
</tip>
<warning>Do not set up a new environment; one is already included in the container.</warning>
<step>
Ensure that you have installed the prerequisites for your IDE.
<a href="https://code.visualstudio.com/docs/remote/containers#_system-requirements"> VSCode </a>
<a href="https://www.jetbrains.com/help/idea/prerequisites-for-dev-containers.html"> IntelliJ </a>
</step>
<step>Start by cloning our repository.
<code-block lang="shell">
git clone https://github.com/FR-DC/FRDC-ML.git
</code-block>
</step>
<step>Follow the steps for your IDE to set up the Dev. Container.</step>
<step>Activate the virtual environment. The venv is located in <code>/opt/venv</code></step>
</procedure>

<procedure title="Setting Up Google Cloud" id="gcloud">
<step>
We use Google Cloud to store our datasets. To set up Google Cloud,
@@ -86,6 +106,49 @@
</step>
</procedure>

<procedure title="Setting Up Label Studio" id="ls">
<tip>This is only necessary if a task requires Label Studio annotations.</tip>
<step>
We use Label Studio to annotate our datasets.
We won't cover how to install Label Studio here. For contributors, it
should already be running on <code>localhost:8080</code>.
</step>
<step>
Then, retrieve your own API key from Label Studio.
<a href="http://localhost:8080/user/account"> Go to your account page </a>
and copy the API key. <br/></step>
<step> Set your API key as an environment variable.
<tabs>
<tab title="Windows">
In Windows, go to "Edit environment variables for
your account" and add this as a new environment variable with name
<code>LABEL_STUDIO_API_KEY</code>.
</tab>
<tab title="Linux">
Export it as an environment variable.
<code-block lang="shell">export LABEL_STUDIO_API_KEY=...</code-block>
</tab>
</tabs>
</step>
</procedure>


<procedure title="Setting Up Weights and Biases" id="wandb">
<step>
We use W&B to track our experiments. To set up W&B,
<a href="https://docs.wandb.ai/quickstart">
install the W&B CLI
</a>
</step>
<step>
Then,
<a href="https://docs.wandb.ai/quickstart">
authenticate your account
</a>.
<code-block lang="shell">wandb login</code-block>
</step>
</procedure>

<procedure title="Pre-commit Hooks" collapsible="true">
<note>This is optional but recommended.
Pre-commit hooks are a way to ensure that your code is formatted correctly.
@@ -98,30 +161,45 @@
</step>
</procedure>

<procedure title="Running the Tests" collapsible="true" id="tests">
<procedure title="Running the Tests" id="tests">
<step>
Run the tests to make sure everything is working
<code-block lang="shell">
pytest
</code-block>
</step>
<step>
In case of errors:
<deflist>
<def title="google.auth.exceptions.DefaultCredentialsError">
If you get this error, it means that you haven't authenticated your
Google Cloud account.
See <a anchor="gcloud">Setting Up Google Cloud</a>
</def>
<def title="ModuleNotFoundError" collapsible="true">
If you get this error, it means that you haven't installed the
dependencies.
See <a anchor="install">Installing the Dev. Environment</a>
</def>
</deflist>
</step>
</procedure>

## Troubleshooting

### ModuleNotFoundError

It's likely that your `src` and `tests` directories are not in `PYTHONPATH`.
To fix this, run the following command:

```shell
export PYTHONPATH=$PYTHONPATH:./src:./tests
```

Or set it in your IDE. For example, IntelliJ allows marking directories as
**Source Roots**.
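
To see which directories are still missing before changing anything, a small check like the one below may help. It is only a sketch: the helper name `missing_from_pythonpath` is hypothetical and not part of this repository, and it assumes you run it from the repository root.

```python
import os
from pathlib import Path


def missing_from_pythonpath(required, pythonpath, cwd):
    """Return the required directories that PYTHONPATH does not cover.

    Hypothetical helper for illustration; it is not part of the frdc package.
    """
    cwd = Path(cwd)
    # PYTHONPATH entries are separated by os.pathsep (":" on Linux, ";" on Windows).
    entries = {
        (cwd / entry).resolve()
        for entry in pythonpath.split(os.pathsep)
        if entry
    }
    # Relative entries are interpreted against cwd, absolute ones are kept as-is.
    return [d for d in required if (cwd / d).resolve() not in entries]


# Usage: report what the export command above would still need to add.
missing = missing_from_pythonpath(
    required=["src", "tests"],
    pythonpath=os.environ.get("PYTHONPATH", ""),
    cwd=Path.cwd(),
)
print("Missing from PYTHONPATH:", missing or "nothing")
```

If the list is empty and the error persists, the dependencies themselves are more likely to be missing than the paths.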

### google.auth.exceptions.DefaultCredentialsError

It's likely that you haven't authenticated your Google Cloud account.
See [Setting Up Google Cloud](#gcloud).
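
If you're unsure whether credentials exist at all, the sketch below lists the places where Application Default Credentials are conventionally stored. The helper name `adc_candidate_paths` is hypothetical, and the lookup is simplified: it ignores overrides such as `CLOUDSDK_CONFIG` and treats the presence of `APPDATA` as a stand-in for "running on Windows".

```python
import os
from pathlib import Path

ADC_FILE = "application_default_credentials.json"


def adc_candidate_paths(env, home):
    """List conventional Application Default Credentials locations.

    Hypothetical helper for illustration; a simplified view of the lookup.
    """
    candidates = []
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicitly configured key file takes precedence.
        candidates.append(Path(explicit))
    if "APPDATA" in env:
        # Assumption: APPDATA being set means we are on Windows.
        candidates.append(Path(env["APPDATA"]) / "gcloud" / ADC_FILE)
    else:
        candidates.append(Path(home) / ".config" / "gcloud" / ADC_FILE)
    return candidates


# Usage: show which candidate files are present on this machine.
for path in adc_candidate_paths(os.environ, Path.home()):
    print(path, "found" if path.is_file() else "missing")
```

If every candidate is reported as missing, authenticating again as described in the Google Cloud setup is the likely fix.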

### Couldn't connect to Label Studio

Label Studio must be running locally, exposed on `localhost:8080`. Furthermore,
you need to specify the `LABEL_STUDIO_API_KEY` environment variable. See
[Setting Up Label Studio](#ls).
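
Both conditions can be checked in one go. The sketch below roughly mirrors the `curl --fail --silent --head` check used in the CI workflow. The function names are hypothetical, and the fallback to `localhost` when `LABEL_STUDIO_HOST` is unset is an assumption made for this example.

```python
import http.client
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if a HEAD request to url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx responses) are OSError subclasses.
        return False


def label_studio_problems(env, probe=http_probe):
    """Collect reasons why connecting to Label Studio might fail.

    Hypothetical helper for illustration; it is not part of the frdc package.
    """
    problems = []
    # Assumption: fall back to localhost when LABEL_STUDIO_HOST is unset.
    host = env.get("LABEL_STUDIO_HOST", "localhost")
    url = f"http://{host}:8080"
    if not probe(url):
        problems.append(f"no response from {url}")
    if not env.get("LABEL_STUDIO_API_KEY"):
        problems.append("LABEL_STUDIO_API_KEY is not set")
    return problems


# Usage (performs a real request):
#   import os
#   print(label_studio_problems(os.environ) or "Label Studio looks reachable")
```

The `probe` parameter exists so the check can be exercised without a running server, for instance by passing `lambda url: True`.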

### Cannot log in to W&B

You need to authenticate your W&B account. See [Setting Up Weights and Biases](#wandb).
If you're facing difficulties, set the `WANDB_MODE` environment variable to `offline`
to stop W&B from syncing to the server.
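
If you would rather set this from Python than from your shell, a minimal sketch follows. The helper name `offline_wandb_env` is hypothetical, and the sketch assumes the variable is set before anything in the process initialises `wandb`.

```python
import os


def offline_wandb_env(env):
    """Return a copy of env with WANDB_MODE=offline unless a mode is already set.

    Hypothetical helper for illustration; it is not part of the frdc package.
    """
    updated = dict(env)
    # setdefault keeps an explicit choice such as WANDB_MODE=online intact.
    updated.setdefault("WANDB_MODE", "offline")
    return updated


# Usage: apply to the current process before wandb is initialised.
os.environ.update(offline_wandb_env(os.environ))
```

Returning a copy rather than mutating the input makes it easy to pass the result to a subprocess as well.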

## Our Repository Structure

@@ -132,15 +210,13 @@ help you understand where to put your code.
graph LR
FRDC -- " Core Dependencies " --> src/frdc/
FRDC -- " Resources " --> rsc/
FRDC -- " Pipeline " --> pipeline/
FRDC -- " Tests " --> tests/
FRDC -- " Repo Dependencies " --> pyproject.toml,poetry.lock
src/frdc/ -- " Dataset Loaders " --> ./load/
src/frdc/ -- " Preprocessing Fn. " --> ./preprocess/
src/frdc/ -- " Train Deps " --> ./train/
src/frdc/ -- " Model Architectures " --> ./models/
rsc/ -- " Datasets ... " --> ./dataset_name/
pipeline/ -- " Model Training Pipeline " --> ./model_tests/
```

src/frdc/
@@ -149,19 +225,16 @@ src/frdc/
rsc/
: Resources. These are usually cached datasets

pipeline/
: Pipeline code. These are the full ML tests of our pipeline.

tests/
: PyTest tests. These are unit tests & integration tests.
: PyTest tests. These are unit, integration, and model tests.

### Unit, Integration, and Pipeline Tests

We have 3 types of tests:

- Unit Tests are usually small, single function tests.
- Integration Tests are larger tests that test a mock pipeline.
- Pipeline Tests are the true production pipeline tests that will generate a
- Model Tests are the true production pipeline tests that will generate a
model.

### Where Should I contribute?
@@ -176,9 +249,9 @@ at the <code>src/frdc/</code> directory.
By adding a new component, you'll need to add a new test. Take a look at the
<code>tests/</code> directory.
</def>
<def title="Changing the pipeline">
<def title="Changing the model pipeline">
If you're a ML Researcher, you'll probably be changing the pipeline. Take a
look at the <code>pipeline/</code> directory.
look at the <code>tests/model_tests/</code> directory.
</def>
<def title="Adding a dependency">
If you're adding a new dependency, use <code>poetry add PACKAGE</code> and
2 changes: 1 addition & 1 deletion Writerside/writerside.cfg
@@ -4,5 +4,5 @@
<ihp version="2.0">
<topics dir="topics" web-path="topics"/>
<images dir="images" web-path="images"/>
<instance src="d.tree" web-path="/d/" version="0.0.7"/>
<instance src="d.tree" web-path="/d/" version="0.0.8"/>
</ihp>
9 changes: 9 additions & 0 deletions cml.sh
@@ -0,0 +1,9 @@
set -a
source .env
set +a

cml runner launch \
--token=${GH_CML_TOKEN} \
--labels="cml-gpu" \
--idle-timeout="1h" --driver=github

2 changes: 1 addition & 1 deletion docs/HelpTOC.json
@@ -1 +1 @@
{"entities":{"pages":{"Overview":{"id":"Overview","title":"Overview","url":"overview.html","level":0,"tabIndex":0},"Getting-Started":{"id":"Getting-Started","title":"Getting Started","url":"getting-started.html","level":0,"tabIndex":1},"ae6f1f90_3454":{"id":"ae6f1f90_3454","title":"Tutorials","level":0,"pages":["Retrieve-our-Datasets"],"tabIndex":2},"Retrieve-our-Datasets":{"id":"Retrieve-our-Datasets","title":"Retrieve our Datasets","url":"retrieve-our-datasets.html","level":1,"parentId":"ae6f1f90_3454","tabIndex":0},"mix-match":{"id":"mix-match","title":"MixMatch","url":"mix-match.html","level":0,"pages":["mix-match-module","custom-k-aug-dataloaders"],"tabIndex":3},"mix-match-module":{"id":"mix-match-module","title":"MixMatch Module","url":"mix-match-module.html","level":1,"parentId":"mix-match","tabIndex":0},"custom-k-aug-dataloaders":{"id":"custom-k-aug-dataloaders","title":"Custom K-Aug Dataloaders","url":"custom-k-aug-dataloaders.html","level":1,"parentId":"mix-match","tabIndex":1},"ae6f1f90_3459":{"id":"ae6f1f90_3459","title":"Model Tests","level":0,"pages":["Model-Test-Chestnut-May-Dec"],"tabIndex":4},"Model-Test-Chestnut-May-Dec":{"id":"Model-Test-Chestnut-May-Dec","title":"Model Test Chestnut 
May-Dec","url":"model-test-chestnut-may-dec.html","level":1,"parentId":"ae6f1f90_3459","tabIndex":0},"ae6f1f90_3461":{"id":"ae6f1f90_3461","title":"API","level":0,"pages":["load.dataset","load.gcs","preprocessing.scale","preprocessing.extract_segments","preprocessing.morphology","preprocessing.glcm_padded","train.frdc_lightning"],"tabIndex":5},"load.dataset":{"id":"load.dataset","title":"load.dataset","url":"load-dataset.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":0},"load.gcs":{"id":"load.gcs","title":"load.gcs","url":"load-gcs.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":1},"preprocessing.scale":{"id":"preprocessing.scale","title":"preprocessing.scale","url":"preprocessing-scale.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":2},"preprocessing.extract_segments":{"id":"preprocessing.extract_segments","title":"preprocessing.extract_segments","url":"preprocessing-extract-segments.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":3},"preprocessing.morphology":{"id":"preprocessing.morphology","title":"preprocessing.morphology","url":"preprocessing-morphology.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":4},"preprocessing.glcm_padded":{"id":"preprocessing.glcm_padded","title":"preprocessing.glcm_padded","url":"preprocessing-glcm-padded.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":5},"train.frdc_lightning":{"id":"train.frdc_lightning","title":"train.frdc_datamodule \u0026 frdc_module","url":"train-frdc-lightning.html","level":1,"parentId":"ae6f1f90_3461","tabIndex":6}}},"topLevelIds":["Overview","Getting-Started","ae6f1f90_3454","mix-match","ae6f1f90_3459","ae6f1f90_3461"]}
{"entities":{"pages":{"Overview":{"id":"Overview","title":"Overview","url":"overview.html","level":0,"tabIndex":0},"Getting-Started":{"id":"Getting-Started","title":"Getting Started","url":"getting-started.html","level":0,"tabIndex":1},"e8e19623_38829":{"id":"e8e19623_38829","title":"Tutorials","level":0,"pages":["Retrieve-our-Datasets"],"tabIndex":2},"Retrieve-our-Datasets":{"id":"Retrieve-our-Datasets","title":"Retrieve our Datasets","url":"retrieve-our-datasets.html","level":1,"parentId":"e8e19623_38829","tabIndex":0},"mix-match":{"id":"mix-match","title":"MixMatch","url":"mix-match.html","level":0,"pages":["mix-match-module","custom-k-aug-dataloaders"],"tabIndex":3},"mix-match-module":{"id":"mix-match-module","title":"MixMatch Module","url":"mix-match-module.html","level":1,"parentId":"mix-match","tabIndex":0},"custom-k-aug-dataloaders":{"id":"custom-k-aug-dataloaders","title":"Custom K-Aug Dataloaders","url":"custom-k-aug-dataloaders.html","level":1,"parentId":"mix-match","tabIndex":1},"e8e19623_38834":{"id":"e8e19623_38834","title":"Model Tests","level":0,"pages":["Model-Test-Chestnut-May-Dec"],"tabIndex":4},"Model-Test-Chestnut-May-Dec":{"id":"Model-Test-Chestnut-May-Dec","title":"Model Test Chestnut 
May-Dec","url":"model-test-chestnut-may-dec.html","level":1,"parentId":"e8e19623_38834","tabIndex":0},"e8e19623_38836":{"id":"e8e19623_38836","title":"API","level":0,"pages":["load.dataset","load.gcs","preprocessing.scale","preprocessing.extract_segments","preprocessing.morphology","preprocessing.glcm_padded","train.frdc_lightning"],"tabIndex":5},"load.dataset":{"id":"load.dataset","title":"load.dataset","url":"load-dataset.html","level":1,"parentId":"e8e19623_38836","tabIndex":0},"load.gcs":{"id":"load.gcs","title":"load.gcs","url":"load-gcs.html","level":1,"parentId":"e8e19623_38836","tabIndex":1},"preprocessing.scale":{"id":"preprocessing.scale","title":"preprocessing.scale","url":"preprocessing-scale.html","level":1,"parentId":"e8e19623_38836","tabIndex":2},"preprocessing.extract_segments":{"id":"preprocessing.extract_segments","title":"preprocessing.extract_segments","url":"preprocessing-extract-segments.html","level":1,"parentId":"e8e19623_38836","tabIndex":3},"preprocessing.morphology":{"id":"preprocessing.morphology","title":"preprocessing.morphology","url":"preprocessing-morphology.html","level":1,"parentId":"e8e19623_38836","tabIndex":4},"preprocessing.glcm_padded":{"id":"preprocessing.glcm_padded","title":"preprocessing.glcm_padded","url":"preprocessing-glcm-padded.html","level":1,"parentId":"e8e19623_38836","tabIndex":5},"train.frdc_lightning":{"id":"train.frdc_lightning","title":"train.frdc_datamodule \u0026 frdc_module","url":"train-frdc-lightning.html","level":1,"parentId":"e8e19623_38836","tabIndex":6}}},"topLevelIds":["Overview","Getting-Started","e8e19623_38829","mix-match","e8e19623_38834","e8e19623_38836"]}
6 changes: 3 additions & 3 deletions docs/custom-k-aug-dataloaders.html

Large diffs are not rendered by default.
