SRI DARPA AIE - CriticalMAAS TA3

SRI's implementation of CMA code for DARPA AIE-CriticalMAAS TA3, built with PyTorch Lightning and configured with Hydra.

Background

Key Tools

PyTorch - an open-source deep learning framework primarily developed by Facebook's AI Research lab (FAIR). Its flexible, dynamic computational-graph model makes it popular among researchers and developers for building and training deep neural networks.

PyTorch Lightning - a lightweight PyTorch wrapper that simplifies building, training, and deploying complex deep learning models. It provides a high-level interface that abstracts away boilerplate code, letting researchers and practitioners focus on experimenting with and improving their models rather than on low-level implementation details.

Hydra - a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
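A quick sketch of that composition model using Hydra's compose API (the config path, config name, and override below are illustrative assumptions, not guaranteed to match this repo exactly):

# a minimal sketch of Hydra config composition, assuming a config
# directory like sri_maper/configs with a train.yaml entry point
from hydra import initialize, compose

with initialize(version_base=None, config_path="sri_maper/configs"):
    # composes the hierarchical config, overriding one value from code
    cfg = compose(config_name="train", overrides=["trainer.max_epochs=20"])
print(cfg.trainer.max_epochs)  # 20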

Project Structure

The directory structure of the project looks like this:

├── data                   <- Project data
│
├── docker                 <- Docker scripts to build images / run containers
│
├── logs                   <- Logs generated by hydra and lightning loggers
├── sri_maper              <- Primary source code folder for MAPER
│   ├── ckpts                 <- Optional folder to hold pretrained models (if not in logs)
│   │
│   ├── configs                 <- Hydra configs
│   │   ├── callbacks               <- Callbacks configs
│   │   ├── data                    <- Data configs
│   │   ├── debug                   <- Debugging configs
│   │   ├── experiment              <- Experiment configs
│   │   ├── extras                  <- Extra utilities configs
│   │   ├── hparams_search          <- Hyperparameter search configs
│   │   ├── hydra                   <- Hydra configs
│   │   ├── logger                  <- Logger configs
│   │   ├── model                   <- Model configs
│   │   ├── paths                   <- Project paths configs
│   │   ├── preprocess              <- Preprocessing configs
│   │   ├── trainer                 <- Trainer configs
│   │   │
│   │   ├── __init__.py        <- python module __init__
│   │   ├── test.yaml          <- Main config for testing
│   │   └── train.yaml         <- Main config for training
│   │
│   ├── notebooks              <- Jupyter notebooks
│   │
│   ├── src                    <- Source code
│   │   ├── data                    <- Data code
│   │   ├── models                  <- Model code
│   │   ├── utils                   <- Utility code
│   │   │
│   │   ├── __init__.py         <- python module __init__
│   │   ├── map.py              <- Run mapping via CLI
│   │   ├── pretrain.py         <- Run pretraining via CLI
│   │   ├── test.py             <- Run testing via CLI
│   │   └── train.py            <- Run training via CLI
│   │
│   ├── __init__.py        <- python module __init__
│
├── .gitignore                <- List of files ignored by git
├── LICENSE.txt               <- License for code repo
├── project_vars.sh           <- Project variables for infrastructure
├── setup.py                  <- File for installing project as a package
└── README.md

Installation

This repo can be run locally, in a docker container locally, or in a docker container on a Kubernetes cluster. Please follow the corresponding instructions carefully so the install goes smoothly. Once you are familiar with the structure, you can make changes.

Local install and run

This setup is the easiest installation but is more brittle than using docker containers. Please create a virtual environment of your choosing, activate it, clone the repo, and install the code using setup.py. Below are example commands to do so.

# creates and activates virtual environment
conda create -n [VIRTUAL_ENV_NAME] python=3.10
conda activate [VIRTUAL_ENV_NAME]
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# installs from source code
python3 -m pip install -e .

If installation succeeded without errors, you should be able to run the code locally. While we strongly encourage using the command-line interface (CLI) to MAPER, we provide example notebooks that demonstrate how one might use the CLI from a notebook or Python file. You can view the example notebooks to get a sense of what the CLI is capable of. See the Command-Line-Interface Tutorial below to get started with the CLI.
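For instance, here is a minimal sketch of driving the CLI from a notebook cell or Python script (the override values are illustrative):

# runs the train CLI exactly as you would from a shell
import subprocess

subprocess.run(
    ["python", "sri_maper/src/train.py", "trainer=cpu", "trainer.max_epochs=1"],
    check=True,  # raises if the CLI exits with an error
)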

Install with a docker container run locally

This setup is slightly more involved but, because it uses docker, is more robust across physical devices. We've written convenience bash scripts to make building and running the docker container much easier. First, clone the repo and edit the JOB_TAG, REPO_HOST, DUSER, and WANDB_API_KEY variables in project_vars.sh to match your use case. After editing project_vars.sh, build the docker image. Below are example commands to do so using the convenience scripts.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh

Optionally, if you would like to replace the empty logs and data folders within this repo with existing ones (e.g., on the datalake) that already contain logs and data, simply mount (or overwrite) the corresponding datalake folders onto the empty logs and data folders within this repo. Below are example commands to do so.

sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/logs ./logs
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/data ./data

If installation succeeded without errors, you should be able to run the code in the container. While we strongly encourage using the command-line interface (CLI) to MAPER, we provide example notebooks that demonstrate how one might use the CLI from a notebook or Python file. You can view the example notebooks to get a sense of what the CLI is capable of. Below are example commands to start the container and the notebook server using the convenience scripts. See the Command-Line-Interface Tutorial below to get started with the CLI.

# starts the docker container
bash docker/run_docker_local.sh
##### EXECUTED WITHIN THE DOCKER CONTAINER #####
# begins jupyter notebook
jupyter lab --ip 0.0.0.0 --allow-root --NotebookApp.token='' --no-browser
# now you can access the notebook files by browsing to http://localhost:8888/lab

Install with a docker container run on the SRI International Kubernetes cluster

This setup is slightly more involved but scales to more compute by using docker and Kubernetes. First, we'll need to prepare some folders on the datalake to contain your data, code, and logs. Under the criticalmaas-ta3 folder (namespace) within the vt-open datalake, create the following directory structure for YOUR use, using your employee ID number (i.e., eXXXXX). NOTE: you only need to create the folders whose comments say CREATE; the others should already exist. Be careful not to corrupt the folders of other users or namespaces.

vt-open
├── ... # other folders for other namespaces - avoid
├── criticalmaas-ta3 # top-level of criticalmaas-ta3 namespace
│   ├── data # contains all criticalmaas-ta3 data - (k8s READ ONLY)
│   └── k8s # contains criticalmaas-ta3 code & logs for ALL users - (k8s READ & WRITE)
│       ├── eXXXXX # folder you should CREATE to contain your code & logs
│       │   ├── code # folder you should CREATE to contain your code
│       │   └── logs # folder you should CREATE to contain your logs
│       └── ... # other folders for other users - avoid
└── ... # other folders for other namespaces - avoid

Next you will need to mount the code folder above locally. By mounting the code folder on the datalake locally, your local edits to source code will be reflected in the datalake, and therefore, on the Kubernetes cluster.

# makes a local code folder
mkdir k8s-code
# mount the datalake folder that hosts the code (Kubernetes will have access)
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/vt-open/criticalmaas-ta3/k8s/${USER}/code ./k8s-code

Last, we'll install the repo. We've written convenience bash scripts to make building and running the docker container much easier. As before, clone the repo and edit the JOB_TAG, REPO_HOST, DUSER, and WANDB_API_KEY variables in project_vars.sh to match your use case. After editing project_vars.sh, build the docker image. Below are example commands to do so using the convenience scripts.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh

If installation succeeded without errors, you should be able to run the code on the cluster. While we strongly encourage using the command-line interface (CLI) to MAPER, we provide example notebooks that demonstrate how one might use the CLI from a notebook or Python file. You can view the example notebooks to get a sense of what the CLI is capable of. Below are example commands to start the container using the convenience scripts. See the Command-Line-Interface Tutorial below to get started with the CLI.

# starts the docker container
bash docker/run_docker_k8s.sh
# now you can access the notebook files by browsing to http://localhost:8888/lab
# note, you'll want to forward the Kubernetes container port 8888
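# e.g., from a machine with cluster access (the pod name is illustrative):
# kubectl port-forward <YOUR_POD_NAME> 8888:8888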

Command-Line-Interface (CLI) Tutorial

Usage

Using the CLI is the suggested method of integrating with the MAPER code. As additional documentation, we provide example notebooks that use the CLI internally; however, all actions performed in the notebooks can be performed with the CLI directly (the notebooks just call the CLI functions internally). We suggest viewing the notebook files as-is (i.e., without running them) to understand the CLI, then experimenting with the CLI directly. Below we give examples of the train, test, map, and pretrain capabilities of the CLI.

Train model with default configuration

# train on CPU
python sri_maper/src/train.py trainer=cpu

# train on GPU
python sri_maper/src/train.py trainer=gpu

# train on multi-GPU
python sri_maper/src/train.py trainer=ddp

Train model with a chosen experiment configuration from sri_maper/configs/experiment/

python sri_maper/src/train.py experiment=[example]

You can override any parameter from the command line like this:

python sri_maper/src/train.py trainer.max_epochs=20 data.batch_size=64

You can pretrain a model like this:

python sri_maper/src/pretrain.py ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

You can test an existing checkpoint like this:

python sri_maper/src/test.py ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

You can output a map of prospectivity likelihood and uncertainty using an existing checkpoint like this (see the exp_maniac_resnet_l22_uscont.yaml experiment):

python sri_maper/src/map.py +experiment=exp_maniac_resnet_l22_uscont data.batch_size=128 ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

How It Works - Background about CLI

All PyTorch Lightning modules are dynamically instantiated from module paths specified in config using Hydra. Example model config:

_target_: src.models.mnist_model.MNISTLitModule
lr: 0.001
net:
  _target_: src.models.components.simple_dense_net.SimpleDenseNet
  input_size: 784
  lin1_size: 256
  lin2_size: 256
  lin3_size: 256
  output_size: 10

Using this config we can instantiate the object with the following line:

model = hydra.utils.instantiate(config.model)

This allows you to easily iterate over new models! Every time you create a new one, just specify its module path and parameters in the appropriate config file.

Switch between models and datamodules with command line arguments:

python train.py model=mnist

Example pipeline managing the instantiation logic: sri_maper/src/train.py.
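For reference, a minimal sketch of what such a pipeline looks like (the config fields and Trainer usage here are illustrative assumptions, not the repo's exact code):

import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # nested _target_ entries (e.g., model.net above) are instantiated recursively
    datamodule = hydra.utils.instantiate(cfg.data)
    model = hydra.utils.instantiate(cfg.model)
    trainer = hydra.utils.instantiate(cfg.trainer)
    # delegates the training loop to PyTorch Lightning
    trainer.fit(model=model, datamodule=datamodule)

if __name__ == "__main__":
    main()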

To do: add docstrings across the entire repo.
