diff --git a/.gitignore b/.gitignore
index 8e26edb0..2c949384 100644
--- a/.gitignore
+++ b/.gitignore
@@ -164,9 +164,5 @@ cython_debug/
# Ignores the raw .tif files
rsc/**/*.tif
-# Ignore any secrets files
-.secrets/
-# REMOVE ONLY IF THE SECRET FILES ARE IN .secrets
-*.json
-
-**/*/lightning_logs
\ No newline at end of file
+**/*/lightning_logs
+*.zip
\ No newline at end of file
diff --git a/Writerside/c.list b/Writerside/c.list
new file mode 100644
index 00000000..c4c77a29
--- /dev/null
+++ b/Writerside/c.list
@@ -0,0 +1,6 @@
+
+
+
+
+
\ No newline at end of file
diff --git a/Writerside/d.tree b/Writerside/d.tree
new file mode 100644
index 00000000..faf0dd16
--- /dev/null
+++ b/Writerside/d.tree
@@ -0,0 +1,26 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/Writerside/images/cm-chestnut-maydec.png b/Writerside/images/cm-chestnut-maydec.png
new file mode 100644
index 00000000..c41f8697
Binary files /dev/null and b/Writerside/images/cm-chestnut-maydec.png differ
diff --git a/Writerside/images/graph-chestnut-maydec.png b/Writerside/images/graph-chestnut-maydec.png
new file mode 100644
index 00000000..b717ae36
Binary files /dev/null and b/Writerside/images/graph-chestnut-maydec.png differ
diff --git a/Writerside/topics/Getting-Started.md b/Writerside/topics/Getting-Started.md
new file mode 100644
index 00000000..7615174d
--- /dev/null
+++ b/Writerside/topics/Getting-Started.md
@@ -0,0 +1,191 @@
+# Getting Started
+
+
+ Ensure that you have the right version of Python.
+ The required Python version can be seen in pyproject.toml
+
+ [tool.poetry.dependencies]
+ python = "..."
+
+
+ Start by cloning our repository.
+
+ git clone https://github.com/Forest-Recovery-Digital-Companion/FRDC-ML.git
+
+
+ Then, create a Python virtual environment (venv)
+
+
+ python -m venv venv/
+
+
+ python3 -m venv venv/
+
+
+
+
+ Install Poetry
+ Then check if it's installed with
+ poetry --version
+
+ If poetry is not found, it's likely not in the user PATH.
+
+
+ Activate the virtual environment
+
+
+
+ cd venv/Scripts
+ activate
+ cd ../..
+
+
+
+
+ source venv/bin/activate
+
+
+
+
+ Install the dependencies. You should be in the same directory as
+ pyproject.toml
+
+ poetry install --with dev
+
+
+ Install Pre-Commit Hooks
+
+ pre-commit install
+
+
+
+
+
+
+ We use Google Cloud to store our datasets. To set up Google Cloud,
+
+ install the Google Cloud CLI
+
+
+
+ Then,
+
+ authenticate your account
+ .
+ gcloud auth login
+
+
+ Finally,
+
+ set up Application Default Credentials (ADC)
+ .
+ gcloud auth application-default login
+
+
+ To make sure everything is working, run the tests.
+
+
+
+
+ This is optional but recommended.
+ Pre-commit hooks are a way to ensure that your code is formatted correctly.
+ This is done by running a series of checks before you commit your code.
+
+
+
+ pre-commit install
+
+
+
+
+
+
+ Run the tests to make sure everything is working
+
+ pytest
+
+
+
+ In case of errors:
+
+
+ If you get this error, it means that you haven't authenticated your
+ Google Cloud account.
+ See Setting Up Google Cloud
+
+
+ If you get this error, it means that you haven't installed the
+ dependencies.
+ See Installing the Dev. Environment
+
+
+
+
+
+
+## Our Repository Structure
+
+Before starting development, take a look at our repository structure. This will
+help you understand where to put your code.
+
+```mermaid
+graph LR
+ FRDC -- " Core Dependencies " --> src/frdc/
+ FRDC -- " Resources " --> rsc/
+ FRDC -- " Pipeline " --> pipeline/
+ FRDC -- " Tests " --> tests/
+ FRDC -- " Repo Dependencies " --> pyproject.toml,poetry.lock
+ src/frdc/ -- " Dataset Loaders " --> ./load/
+ src/frdc/ -- " Preprocessing Fn. " --> ./preprocess/
+ src/frdc/ -- " Train Deps " --> ./train/
+ src/frdc/ -- " Model Architectures " --> ./models/
+ rsc/ -- " Datasets ... " --> ./dataset_name/
+ pipeline/ -- " Model Training Pipeline " --> ./model_tests/
+```
+
+src/frdc/
+: Source Code for our package. These are the unit components of our pipeline.
+
+rsc/
+: Resources. These are usually cached datasets
+
+pipeline/
+: Pipeline code. These are the full ML tests of our pipeline.
+
+tests/
+: PyTest tests. These are unit tests & integration tests.
+
+### Unit, Integration, and Pipeline Tests
+
+We have 3 types of tests:
+
+- Unit Tests are usually small, single function tests.
+- Integration Tests are larger tests that test a mock pipeline.
+- Pipeline Tests are the true production pipeline tests that will generate a
+ model.
+
+### Where Should I contribute?
+
+
+
+If you're changing a small component, such as an argument for preprocessing,
+a new model architecture, or a new configuration for a dataset, take a look
+at the src/frdc/ directory.
+
+
+By adding a new component, you'll need to add a new test. Take a look at the
+tests/ directory.
+
+
+If you're an ML Researcher, you'll probably be changing the pipeline. Take a
+look at the pipeline/ directory.
+
+
+If you're adding a new dependency, use poetry add PACKAGE and
+commit the changes to pyproject.toml and poetry.lock.
+
+ E.g. Adding numpy is the same as
+ poetry add numpy
+
+
+
\ No newline at end of file
diff --git a/Writerside/topics/Model-Test-Chestnut-May-Dec.md b/Writerside/topics/Model-Test-Chestnut-May-Dec.md
new file mode 100644
index 00000000..f1ce790d
--- /dev/null
+++ b/Writerside/topics/Model-Test-Chestnut-May-Dec.md
@@ -0,0 +1,113 @@
+# Model Test Chestnut May-Dec
+
+This test is used to evaluate the model performance on the Chestnut Nature Park
+May & December dataset.
+
+See this script in pipeline/model_tests/chestnut_dec_may/main.py.
+
+## Motivation
+
+The usage of this model will be to classify trees in unseen datasets under
+different conditions. In this test, we'll evaluate it under a different season.
+
+A caveat is that it'll be evaluated on the same set of trees, so it's not
+representative of a field test. However, given the difficulty of collecting
+datasets, this still gives us a good preliminary idea of how the model will
+perform in different conditions.
+
+## Methodology
+
+We simply train on the December dataset, and test on the May dataset.
+
+```mermaid
+graph LR
+ Model -- Train --> DecDataset
+ Model -- Test --> MayDataset
+```
+
+> The inverse of this test is also plausible.
+
+> Ideally, we should have a Validation set to tune the hyperparameters, but
+> given the limitations of the dataset, we'll skip this step.
+> {style='warning'}
+
+## Model
+
+The current model used is a simple InceptionV3 transfer learning model, with
+the last layer replaced by one or more fully connected (FC) layers.
+
+```mermaid
+graph LR
+ Input --> InceptionV3
+ InceptionV3[InceptionV3 Frozen] --> FC["FC Layer(s)"]
+ FC --> Softmax
+ Softmax --> Output
+ Input -- Cross Entropy Loss --> Output
+```
+
+> We didn't find significant evidence of improvement when using a more complex
+> FC layer, so either a single or multiple FC layers are feasible.
+
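+As a rough illustration, a minimal PyTorch sketch of this setup could look like
+the following. The hidden size and class count are placeholders, not the exact
+values used in the pipeline.
+
+```python
+import torch.nn as nn
+from torchvision.models import inception_v3, Inception_V3_Weights
+
+# Minimal sketch: frozen InceptionV3 backbone with a replaced FC head.
+backbone = inception_v3(weights=Inception_V3_Weights.DEFAULT)
+for param in backbone.parameters():
+    param.requires_grad = False  # freeze the pretrained weights
+
+n_classes = 10  # placeholder, not the pipeline's value
+backbone.fc = nn.Sequential(  # replace the last layer with FC layer(s)
+    nn.Linear(backbone.fc.in_features, 256),
+    nn.ReLU(),
+    nn.Linear(256, n_classes),
+)
+# CrossEntropyLoss applies the softmax internally during training.
+criterion = nn.CrossEntropyLoss()
+```
+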
+## Preprocessing
+
+We perform the following steps:
+
+```mermaid
+graph TD
+ Segment --> Scale[Scale Values to 0-1]
+ Scale --> GLCM[GLCM Step 7, Rad 3, Bin 128, Mean Feature]
+ GLCM --> ScaleNorm[Scale Values to 0 Mean 1 Var]
+ ScaleNorm --> Resize[Resize to 299x299]
+```
+
+> We need to scale to 0-1 before GLCM, so that GLCM can bin the values
+> correctly.
+
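+A rough sketch of this chain, using the `frdc` preprocessing functions documented
+in the API pages (the exact wrapper used by the pipeline and the resize backend
+may differ; argument names follow the `glcm_padded` API):
+
+```python
+import numpy as np
+from glcm_cupy import Features
+from skimage.transform import resize
+
+from frdc.preprocess.scale import scale_0_1_per_band, scale_normal_per_band
+from frdc.preprocess.glcm_padded import append_glcm_padded_cached
+
+
+def preprocess(segment: np.ndarray) -> np.ndarray:
+    # Scale to [0, 1] first so that GLCM can bin the values correctly.
+    segment = scale_0_1_per_band(segment)
+    # GLCM with Step 7, Radius 3, Bin 128, Mean feature, appended onto the bands.
+    segment = append_glcm_padded_cached(
+        segment, step_size=7, radius=3, bin_from=1, bin_to=128,
+        features=[Features.MEAN],
+    )
+    # Scale to zero mean, unit variance per band, then resize to 299x299.
+    segment = scale_normal_per_band(segment)
+    return resize(segment, (299, 299, segment.shape[-1]))
+```
+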
+### Augmentation
+
+The following augmentations are used:
+
+```mermaid
+graph LR
+ Segment --> HFLip[Horizontal Flip 50%]
+ HFLip --> VFLip[Vertical Flip 50%]
+```
+
+> This only operates on training data.
+
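+As a rough equivalent in `torchvision` transforms (assuming the segments are
+already tensors at this point):
+
+```python
+from torchvision import transforms
+
+# Train-time augmentation only: 50% horizontal flip, then 50% vertical flip.
+train_augmentation = transforms.Compose([
+    transforms.RandomHorizontalFlip(p=0.5),
+    transforms.RandomVerticalFlip(p=0.5),
+])
+```
+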
+## Hyperparameters
+
+The following hyperparameters are used:
+
+- Optimizer: Adam
+- Learning Rate: 1e-3
+- Batch Size: 5
+- Epochs: 100
+- Early Stopping: 4
+
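+As a rough sketch, these map onto PyTorch and PyTorch Lightning as follows (the
+monitored metric name is illustrative, not taken from the pipeline):
+
+```python
+import pytorch_lightning as pl
+import torch.nn as nn
+from pytorch_lightning.callbacks import EarlyStopping
+from torch.optim import Adam
+
+model = nn.Linear(10, 2)  # placeholder model
+optimizer = Adam(model.parameters(), lr=1e-3)  # Adam, LR 1e-3
+early_stopping = EarlyStopping(monitor="val_loss", patience=4)  # Early Stopping: 4
+trainer = pl.Trainer(max_epochs=100, callbacks=[early_stopping])  # Epochs: 100
+# The batch size of 5 is set on the DataLoader / DataModule, not the Trainer.
+```
+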
+## Results
+
+We yield around 40% accuracy on the test set, compared to around 65% on the
+training set. Raising the training accuracy with a more complex model may
+improve the test accuracy; however, due to the instability of our test
+results, we can't be sure of this.
+
+### Result Images {collapsible="true"}
+
+![Confusion Matrix](cm-chestnut-maydec.png)
+
+![Training Graph](graph-chestnut-maydec.png)
+
+### Caveats
+
+- The test set is very small, so the results are not very representative.
+- The test set is the same set of trees, so it's not a true test of the model
+ performance in different conditions.
+- There are many classes with 1 sample, so the model may not be able to learn
+ the features of these classes well.
+
\ No newline at end of file
diff --git a/Writerside/topics/Overview.md b/Writerside/topics/Overview.md
new file mode 100644
index 00000000..a90cd34c
--- /dev/null
+++ b/Writerside/topics/Overview.md
@@ -0,0 +1,17 @@
+# Overview
+
+Forest Recovery Digital Companion (FRDC) is an ML-assisted companion for
+ecologists to automatically classify surveyed trees via an Unmanned Aerial
+Vehicle (UAV).
+
+This package, FRDC-ML, is the Machine Learning backbone of this project,
+a centralized repository of tools and model architectures to be used in the
+FRDC pipeline.
+
+[**Get started here**](Getting-Started.md)
+
+## Other Projects
+
+FRDC-UI
+: [The User Interface Repository](https://github.com/Forest-Recovery-Digital-Companion/FRDC-UI/)
+for FRDC, a WebApp GUI for ecologists to adjust annotations.
diff --git a/Writerside/topics/Retrieve-our-Datasets.md b/Writerside/topics/Retrieve-our-Datasets.md
new file mode 100644
index 00000000..41ef9272
--- /dev/null
+++ b/Writerside/topics/Retrieve-our-Datasets.md
@@ -0,0 +1,137 @@
+# Retrieve our Datasets
+
+A tutorial to retrieve our datasets
+
+In this tutorial, we'll learn how to:
+
+- Retrieve FRDC's Hyperspectral Image Data as `np.ndarray`
+- Retrieve FRDC's Ground Truth bounds and labels
+- Slice/segment the image data by the bounds
+
+## Prerequisites
+
+- New here? [Get Started](Getting-Started.md).
+- Set up Google Cloud authorization to download the data.
+
+## Retrieve the Data
+
+To retrieve the data, use [FRDCDataset](load.dataset.md#frdcdataset)
+
+Here, we'll download and load our
+
+- `ar`: Hyperspectral Image Data
+- `order`: The order of the bands
+- `bounds`: The bounds of the trees (segments)
+- `labels`: The labels of the trees (segments)
+
+```python
+from frdc.load.dataset import FRDCDataset
+
+ds = FRDCDataset(site="chestnut_nature_park", date="20201218", version=None)
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+```
+
+### What Datasets are there? {collapsible="true"}
+
+> To know what datasets are available, you can run
+> [FRDCDownloader](load.dataset.md#frdcdownloader)'s `list_gcs_datasets()`
+> method
+
+> Note that some datasets do not have `bounds` and `labels` available as they
+> have not been annotated yet.
+> {style='warning'}
+
+```console
+>>> from frdc.load.dataset import FRDCDownloader
+>>> df = FRDCDownloader().list_gcs_datasets()
+>>> print(df)
+# 0 DEBUG/0
+# 1 casuarina/20220418/183deg
+# 2 casuarina/20220418/93deg
+# 3 chestnut_nature_park/20201218
+# ...
+```
+
+- The first part of the path is the `site`, and the second part is the `date`.
+- The `version` is the rest of the path, if there isn't any, use `None`.
+
+For example:
+
+- Path `ds/date/ver`: `site="ds"`, `date="date"`, `version="ver"`
+- Path `ds/date/ver/01/data`: `site="ds"`, `date="date"`, `version="ver/01/data"`
+- Path `ds/date`: `site="ds"`, `date="date"`, `version=None`
+
+## Segment the Data
+
+To segment the data, use [Extract Segments](preprocessing.extract_segments.md).
+
+Here, we'll segment the data by the bounds.
+
+```python
+from frdc.load.dataset import FRDCDataset
+from frdc.preprocess.extract_segments import extract_segments_from_bounds
+
+ds = FRDCDataset(site="chestnut_nature_park", date="20201218", version=None)
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+segments = extract_segments_from_bounds(ar, bounds)
+```
+
+`segments` is a list of `np.ndarray` of shape H, W, C, representing a tree.
+The order of `segments` is the same as `labels`, so you can use `labels` to
+identify the tree.
+
+> While we have not used `order` in our example, it's useful to determine the
+> order of the bands in `ar` in other applications.
+
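+For example, to pair each segment with its label:
+
+```python
+# Continuing from the snippet above: iterate segments with their labels.
+for segment, label in zip(segments, labels):
+    print(label, segment.shape)  # each segment is an (H, W, C) np.ndarray
+```
+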
+## Plot the Data (Optional) {collapsible="true"}
+
+We can then use these data to plot out the first tree segment.
+
+```python
+import matplotlib.pyplot as plt
+
+from frdc.load.dataset import FRDCDataset
+from frdc.preprocess.extract_segments import extract_segments_from_bounds
+from frdc.preprocess.scale import scale_0_1_per_band
+
+ds = FRDCDataset(site="chestnut_nature_park", date="20201218", version=None)
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+segments = extract_segments_from_bounds(ar, bounds)
+segment_0_bgr = segments[0]
+segment_0_rgb = segment_0_bgr[..., [2, 1, 0]]
+segment_0_rgb_scaled = scale_0_1_per_band(segment_0_rgb)
+
+plt.imshow(segment_0_rgb_scaled)
+plt.title(f"Tree {labels[0]}")
+plt.show()
+```
+See also: [preprocessing.scale.scale_0_1_per_band](preprocessing.scale.md)
+
+Matplotlib cannot show the data correctly as-is, so we need to:
+- Convert the data from BGR to RGB
+- Scale the data to 0-1 per band
+
+> Remember that the library returns the band `order`? This is useful in
+> debugging the data. If we had shown it in BGR, it would look off!
+{style='note'}
diff --git a/Writerside/topics/load.dataset.md b/Writerside/topics/load.dataset.md
new file mode 100644
index 00000000..4462b3ca
--- /dev/null
+++ b/Writerside/topics/load.dataset.md
@@ -0,0 +1,157 @@
+# load.dataset
+
+> You need to set up [Google Cloud](Getting-Started.md#gcloud) with the
+> appropriate permissions to use this library.
+> {style='warning'}
+
+
+Load datasets from our GCS bucket.
+
+
+## Classes
+
+
+
+This facilitates authentication and downloading from GCS.
+
+
+This uses the Downloader to download and load the dataset.
+It also implements useful helper functions to load FRDC-specific datasets,
+such as loading our images and labels.
+
+
+
+## Usage
+
+An example loading our Chestnut Nature Park dataset. We retrieve the
+
+- hyperspectral bands
+- order of the bands
+- bounding boxes
+- labels
+
+```python
+from frdc.load import FRDCDataset
+
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None, )
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+```
+
+### Custom Authentication & Downloads {collapsible="true"}
+
+If you need granular control over
+
+- where the files are downloaded
+- the credentials used
+- the project used
+- the bucket used
+
+then pass in a `FRDCDownloader` object to `FRDCDataset`.
+
+```python
+from frdc.load import FRDCDownloader, FRDCDataset
+
+dl = FRDCDownloader(credentials=...,
+ local_dataset_root_dir=...,
+ project_id=...,
+ bucket_name=...)
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None,
+ dl=dl)
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+```
+
+If you have a file not easily downloadable by `FRDCDataset`, you can use
+`FRDCDownloader` to download it.
+
+```python
+from frdc.load import FRDCDownloader
+
+dl = FRDCDownloader(credentials=...,
+ local_dataset_root_dir=...,
+ project_id=...,
+ bucket_name=...)
+
+dl.download_file(path_glob="path/to/gcs/file")
+```
+
+This will automatically save the file to the local dataset root dir.
+
+## API
+
+### FRDCDataset
+
+
+
+Initializes the dataset downloader.
+This doesn't immediately download the dataset, but only when you call the
+get_* functions.
+The site, date, version must match the dataset path on GCS. For example
+if the dataset is at
+gs://frdc-scan/my-site/20201218/90deg/map,
+
+
+- site='my-site'
+- date='20201218'
+- version='90deg/map'
+
+If the dataset doesn't have a "version", for example:
+gs://frdc-scan/my-site/20201218,
+then you can pass in version=None.
+
+If you don't want to search up GCS, you can use FRDCDownloader to list all
+datasets, and their versions with
+FRDCDownloader().list_gcs_datasets()
+
+
+If dl is None, it will create a new FRDCDownloader. Usually,
+you don't need to pass this in unless you have a custom credential, or project.
+
+
+
+Gets the NDArray bands (H x W x C) and channel order as
+tuple[np.ndarray, list[str]].
+This downloads (if missing) and retrieves the stacked NDArray bands.
+This wraps around get_ar_bands_as_dict(), thus if you want more
+control over how the bands are loaded, use that instead.
+
+
+Gets the NDArray bands (H x W) as a dict[str, np.ndarray].
+This downloads (if missing) and retrieves the individual NDArray bands as a
+dictionary. The keys are the band names, and the values are the NDArray bands.
+
+
+Gets the bounding boxes and labels as
+tuple[list[Rect], list[str]].
+This downloads (if missing) and retrieves the bounding boxes and labels as a
+tuple. The first element is a list of bounding boxes, and the second element
+is a list of labels.
+The bounding boxes are in the format of Rect, a
+namedtuple of x0, y0, x1, y1.
+
+
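+For illustration, a returned bound can be unpacked and used to slice the array
+directly (a minimal sketch, reusing the Usage example above):
+
+```python
+from frdc.load import FRDCDataset
+
+ds = FRDCDataset(site='chestnut_nature_park',
+                 date='20201218',
+                 version=None, )
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+
+x0, y0, x1, y1 = bounds[0]  # Rect is a namedtuple, so it unpacks directly
+tree_0 = ar[y0:y1, x0:x1]   # slice rows by y, columns by x (H x W x C layout)
+```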
+
+### FRDCDownloader
+
+
+
+Lists all GCS datasets in the bucket as a DataFrame.
+This works by checking which folders have a specific file, which we call the
+anchor.
+
+
+Downloads a file from GCS.
+This takes in a path glob, a string containing wildcards, and downloads exactly
+1 file. If it matches 0 or more than 1 file, it will raise an error.
+If local_exists_ok is True, it will not download the file if it
+already exists locally. However, if it's False, it will download the file
+only if the hashes don't match.
+
+Use this if you have a file on GCS that can't be downloaded via
+FRDCDataset.
+
+
\ No newline at end of file
diff --git a/Writerside/topics/preprocessing.extract_segments.md b/Writerside/topics/preprocessing.extract_segments.md
new file mode 100644
index 00000000..ed257e8d
--- /dev/null
+++ b/Writerside/topics/preprocessing.extract_segments.md
@@ -0,0 +1,203 @@
+# preprocessing.extract_segments
+
+
+Extracts segments from a label classification or bounds and labels.
+
+
+## Functions
+
+
+
+Extracts segments from a label classification.
+
+
+Extracts segments from Rect bounds.
+
+
+Removes small segments from a label classification.
+
+
+
+### Extract with Boundaries
+
+A boundary is a `Rect` object that represents the minimum bounding box of a
+segment, with x0, y0, x1, y1 coordinates.
+
+It simply slices the original image to the bounding box. The origin is
+the top left corner of the image.
+
+
+
+
++-----------------+ +-----------+
+| Original | | Segmented |
+| Image | | Image |
++-----+-----+-----+ +-----+-----+
+| 1 | 2 | 3 | | 2 | 3 |
++-----+-----+-----+ +-----+-----+
+| 4 | 5 | 6 | -----------> | 5 | 6 |
++-----+-----+-----+ 1, 2, 0, 2 +-----+-----+
+| 7 | 8 | 9 | x0 y0 x1 y1 | 8 | 9 |
++-----+-----+-----+ +-----+-----+
+
+
+
+
++-----------------+ +-----------------+
+| Original | | Segmented |
+| Image | | Image |
++-----+-----+-----+ +-----+-----+-----+
+| 1 | 2 | 3 | | 0 | 2 | 3 |
++-----+-----+-----+ +-----+-----+-----+
+| 4 | 5 | 6 | -----------> | 0 | 5 | 6 |
++-----+-----+-----+ 1, 2, 0, 2 +-----+-----+-----+
+| 7 | 8 | 9 | x0 y0 x1 y1 | 0 | 8 | 9 |
++-----+-----+-----+ +-----+-----+-----+
+
+
+
+
+
+The shape of an NDArray is usually H x W x C. Thus, if you're manually slicing
+with the bounds, make sure that you're slicing the correct axis.
+
+The correct syntax should be ar[y0:y1,x0:x1].
+
+
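+A minimal sketch of that manual slice on a toy array:
+
+```python
+import numpy as np
+
+ar = np.arange(3 * 3 * 5).reshape(3, 3, 5)  # toy H x W x C array
+x0, y0, x1, y1 = 1, 0, 3, 2                 # an example Rect-style bound
+segment = ar[y0:y1, x0:x1]                  # slice rows by y first, then columns by x
+print(segment.shape)                        # (2, 2, 5)
+```
+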
+### Extract with Labels {collapsible="true"}
+
+A label classification is a `np.ndarray` where each pixel is mapped to a
+segment. The segments are mapped to a unique integer.
+In our project, the 0th label is the background.
+
+For example, a label classification of 3 segments will look like this:
+
+```
++-----------------+ +-----------------+
+| Label | | Original |
+| Classification | | Image |
++-----+-----+-----+ +-----+-----+-----+
+| 1 | 2 | 0 | | 1 | 2 | 3 |
++-----+-----+-----+ +-----+-----+-----+
+| 1 | 2 | 2 | | 4 | 5 | 6 |
++-----+-----+-----+ +-----+-----+-----+
+| 1 | 1 | 0 | | 7 | 8 | 9 |
++-----+-----+-----+ +-----+-----+-----+
+```
+
+The extraction will take the **minimum bounding box** of each segment and
+return a list of segments.
+
+For example, the label 1 and 2 extracted images will be
+
+
+
+
++-----------+ +-----------+
+| Extracted | | Extracted |
+| Segment 1 | | Segment 2 |
++-----+-----+ +-----+-----+
+| 1 | 0 | | 2 | 0 |
++-----+-----+ +-----+-----+
+| 4 | 0 | | 5 | 6 |
++-----+-----+ +-----+-----+
+| 7 | 8 |
++-----+-----+
+
+
+
+
++-----------------+ +-----------------+
+| Extracted | | Extracted |
+| Segment 1 | | Segment 2 |
++-----+-----+-----+ +-----+-----+-----+
+| 1 | 0 | 0 | | 0 | 2 | 0 |
++-----+-----+-----+ +-----+-----+-----+
+| 4 | 0 | 0 | | 0 | 5 | 6 |
++-----+-----+-----+ +-----+-----+-----+
+| 7 | 8 | 0 | | 0 | 0 | 0 |
++-----+-----+-----+ +-----+-----+-----+
+
+
+
+
+- If **cropped is False**, the segments are padded with 0s to the
+ original image size. While this can ensure shape consistency, it can consume
+ more memory for large images.
+- If **cropped is True**, the segments are cropped to the minimum bounding box.
+ This can save memory, but the shape of the segments will be inconsistent.
+
+## Usage
+
+### Extract from Bounds and Labels
+
+Extract segments from bounds and labels.
+
+```python
+import numpy as np
+from frdc.load import FRDCDataset
+from frdc.preprocess.extract_segments import extract_segments_from_bounds
+
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None, )
+ar, order = ds.get_ar_bands()
+bounds, labels = ds.get_bounds_and_labels()
+
+segments: list[np.ndarray] = extract_segments_from_bounds(ar, bounds)
+```
+
+### Extract from Auto-Segmentation {collapsible="true"}
+
+Extract segments from a label classification.
+
+```python
+from skimage.morphology import remove_small_objects, remove_small_holes
+import numpy as np
+
+from frdc.load import FRDCDataset
+from frdc.preprocess.morphology import (
+ threshold_binary_mask, binary_watershed
+)
+from frdc.preprocess.scale import scale_0_1_per_band
+from frdc.preprocess.extract_segments import (
+ extract_segments_from_labels, remove_small_segments_from_labels
+)
+
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None, )
+ar, order = ds.get_ar_bands()
+ar = scale_0_1_per_band(ar)
+ar_mask = threshold_binary_mask(ar, -1, 90 / 256)
+ar_mask = remove_small_objects(ar_mask, min_size=100, connectivity=2)
+ar_mask = remove_small_holes(ar_mask, area_threshold=100, connectivity=2)
+ar_labels = binary_watershed(ar_mask)
+ar_labels = remove_small_segments_from_labels(ar_labels,
+ min_height=10, min_width=10)
+
+segments: list[np.ndarray] = extract_segments_from_labels(ar, ar_labels)
+```
+
+> The `remove_small_objects` and `remove_small_holes` are used to remove
+> small noise from the binary mask. This is recommended and used in the
+> original paper.
+> {style='note'}
+
+## API
+
+
+
+Extracts segments from a label classification.
+ar_labels is a label classification as a np.ndarray
+
+
+Extracts segments from Rect bounds.
+bounds is a list of Rect bounds.
+
+
+Removes small segments from a label classification.
+
+
+
+
\ No newline at end of file
diff --git a/Writerside/topics/preprocessing.glcm_padded.md b/Writerside/topics/preprocessing.glcm_padded.md
new file mode 100644
index 00000000..c9b9d175
--- /dev/null
+++ b/Writerside/topics/preprocessing.glcm_padded.md
@@ -0,0 +1,129 @@
+# preprocessing.glcm_padded
+
+
+Computes the GLCM of the NDArray bands with padding.
+
+
+> This is largely a handy wrapper around
+> my [glcm-cupy](https://github.com/Eve-ning/glcm-cupy) package.
+> This auto-computes the necessary padding so that the GLCM is the same size
+> as the original image.
+
+> The GLCM computation is rather slow, so it is recommended to use it
+> only if necessary.
+> {style='warning'}
+
+## Functions
+
+
+Assumes shape H x W x C, where C is the number of bands.
+
+
+
+
+Computes the GLCM of the NDArray bands with padding.
+
+
+Computes the GLCM of the NDArray bands with padding, and caches it.
+
+
+Computes the GLCM of the NDArray bands with padding, and caches it and
+also appends it onto the original array.
+
+
+
+## Usage
+
+We show a few examples of how to use the GLCM functions.
+
+```python
+import numpy as np
+from glcm_cupy import Features
+
+from frdc.preprocess.glcm_padded import (
+ append_glcm_padded_cached, glcm_padded, glcm_padded_cached
+)
+
+ar = np.random.rand(50, 25, 4)
+
+# Returns a shape of H x W x C x GLCM Features
+ar_glcm = glcm_padded(ar, bin_from=1, bin_to=4, radius=3, )
+
+# Returns a shape of H x W x C x 2
+ar_glcm_2_features = glcm_padded(ar, bin_from=1, bin_to=4, radius=3,
+ features=[Features.CONTRAST,
+ Features.CORRELATION])
+
+# Returns a shape of H x W x C x GLCM Features
+ar_glcm_cached = glcm_padded_cached(ar, bin_from=1, bin_to=4, radius=3)
+
+# Returns a shape of H x W x (C x GLCM Features + C)
+ar_glcm_cached_appended = append_glcm_padded_cached(ar, bin_from=1, bin_to=4,
+ radius=3)
+
+```
+
+- `ar_glcm` is the GLCM of the original array, with the last dimension being
+ the GLCM features. The number of features is determined by the `features`
+ parameter, which defaults to all features.
+- `ar_glcm_2_features` selects only 2 features, with the last dimension being
+ the 2 GLCM features specified.
+- `ar_glcm_cached` caches the GLCM so that if you call it again,
+ it will return the cached version. It stores its data at the project root
+ dir, under `.cache/`.
+- `ar_glcm_cached_appended` is a wrapper around `ar_glcm_cached`, it
+ appends the GLCM features onto the original array. It's equivalent to calling
+ `ar_glcm_cached` and then `np.concatenate` on the final axes.
+
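+For clarity, the append variant is roughly equivalent to this manual flatten and
+concatenate (continuing from the usage snippet above):
+
+```python
+# Flatten the GLCM feature axes, then append them onto the original bands.
+ar_glcm_flat = ar_glcm_cached.reshape(*ar_glcm_cached.shape[:2], -1)  # H x W x (C x Features)
+ar_appended = np.concatenate([ar, ar_glcm_flat], axis=-1)  # H x W x (C x Features + C)
+```
+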
+### Caching
+
+GLCM is an expensive operation, thus we recommend caching it if the input
+parameters will be the same. This is especially useful if you're
+experimenting with the same dataset with constant parameters.
+
+> This cache is automatically invalidated if the parameters change. Thus, if
+> you perform augmentation, the cache will not be used and will be recomputed.
+> This can be wasteful, so it is recommended to perform augmentation after
+> the GLCM computation if possible.
+> {style='warning'}
+
+> The cache is stored at the project root dir, under `.cache/`. It is safe to
+> delete this folder if you want to clear the cache.
+> {style='note'}
+
+## API
+
+
+
+Computes the GLCM of the NDArray bands with padding.
+
+
+- `ar` is the input array
+- `bin_from` is the upper bound of the input
+- `bin_to` is the upper bound of the GLCM input, i.e. the resolution that GLCM
+  operates on
+- `radius` is the radius of the GLCM
+- `step_size` is the step size of the GLCM
+- `features` is the list of GLCM features to compute
+
+The return shape is
+
+H \times W \times C \times \text{GLCM Features}
+
+See glcm_cupy for the GLCM Features.
+
+
+Computes the GLCM of the NDArray bands with padding, and caches it.
+See glcm_padded for the parameters and output shape
+
+
+Computes the GLCM of the NDArray bands with padding, and caches it and
+also appends it onto the original array.
+See glcm_padded for the parameters
+The return shape is:
+
+H \times W \times (C \times \text{GLCM Features} + C)
+
+The function automatically flattens the last 2 dimensions of the GLCM
+features, and appends it onto the original array.
+
+
diff --git a/Writerside/topics/preprocessing.morphology.md b/Writerside/topics/preprocessing.morphology.md
new file mode 100644
index 00000000..50b5d7b2
--- /dev/null
+++ b/Writerside/topics/preprocessing.morphology.md
@@ -0,0 +1,66 @@
+# preprocessing.morphology
+
+
+Performs morphological operations on the NDArray bands.
+
+
+> This is currently only used for auto-segmentation. If you want to use
+> predefined segmentation see
+> [preprocessing.extract_segments](preprocessing.extract_segments.md).
+
+## Functions
+
+
+Assumes shape H x W x C, where C is the number of bands.
+
+
+
+
+Thresholds a selected NDArray band to yield a binary mask.
+
+
+Performs watershed on a binary mask to yield a mapped label
+classification
+
+
+
+## Usage
+
+Perform auto-segmentation on a dataset to yield a label classification.
+
+```python
+from frdc.load import FRDCDataset
+from frdc.preprocess.morphology import (
+ threshold_binary_mask, binary_watershed
+)
+
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None, )
+ar, order = ds.get_ar_bands()
+mask = threshold_binary_mask(ar, order.index('NIR'), 90 / 256)
+ar_label = binary_watershed(mask)
+```
+
+## API
+
+
+
+Thresholds a selected NDArray band to yield a binary mask as
+np.ndarray
+This is equivalent to
+
+ar[..., band_idx] > threshold_value
+
+
+
+Performs watershed on a binary mask to yield a mapped label
+classification as a np.ndarray
+
+
+- `peaks_footprint` is the footprint of `skimage.feature.peak_local_max`
+- `watershed_compactness` is the compactness of `skimage.morphology.watershed`
+
+
+
\ No newline at end of file
diff --git a/Writerside/topics/preprocessing.scale.md b/Writerside/topics/preprocessing.scale.md
new file mode 100644
index 00000000..e210493c
--- /dev/null
+++ b/Writerside/topics/preprocessing.scale.md
@@ -0,0 +1,70 @@
+# preprocessing.scale
+
+
+Scales the NDArray bands.
+
+
+## Functions
+
+
+Assumes shape H x W x C, where C is the number of bands.
+
+
+
+
+Scales the NDArray bands to [0, 1] per band.
+
+
+Scales the NDArray bands to zero mean unit variance per band.
+
+
+Scales the NDArray bands by a predefined configuration.
+
+
+
+## Usage
+
+```python
+from frdc.load import FRDCDataset
+from frdc.preprocess.scale import (
+ scale_0_1_per_band, scale_normal_per_band, scale_static_per_band
+)
+from frdc.conf import BAND_MAX_CONFIG
+
+ds = FRDCDataset(site='chestnut_nature_park',
+ date='20201218',
+ version=None, )
+ar, order = ds.get_ar_bands()
+ar_01 = scale_0_1_per_band(ar)
+ar_norm = scale_normal_per_band(ar)
+ar_static = scale_static_per_band(ar, order, BAND_MAX_CONFIG)
+```
+
+> The static scaling has a default config, which was inferred by our capturing
+> device.
+
+## API
+
+
+
+Scales the NDArray bands to [0, 1] per band.
+
+(x - \min(x)) / (\max(x) - \min(x))
+
+
+
+Scales the NDArray bands to zero mean unit variance per band.
+
+(x - \mu) / \sigma
+
+
+
+Scales the NDArray bands by a predefined configuration.
+The config is of dict[str, tuple[int, int]] where
+the key is the band name, and the value is a tuple of (min, max).
+Take a look at frdc.conf.BAND_MAX_CONFIG for an example.
+
+(x - c_0) / (c_1 - c_0)
+
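+For intuition, a plain NumPy sketch of what the three scalers compute per band
+(illustrative only; the `band_max_config` values here are made up, use
+`frdc.conf.BAND_MAX_CONFIG` in practice):
+
+```python
+import numpy as np
+
+ar = np.random.rand(5, 5, 3) * 100  # toy H x W x C array
+order = ["B", "G", "R"]
+band_max_config = {"B": (0, 256), "G": (0, 256), "R": (0, 256)}  # made-up config
+
+# scale_0_1_per_band: (x - min(x)) / (max(x) - min(x)), per band
+ar_01 = (ar - ar.min(axis=(0, 1))) / (ar.max(axis=(0, 1)) - ar.min(axis=(0, 1)))
+
+# scale_normal_per_band: (x - mean) / std, per band
+ar_norm = (ar - ar.mean(axis=(0, 1))) / ar.std(axis=(0, 1))
+
+# scale_static_per_band: (x - c0) / (c1 - c0), per band, from the config
+c0 = np.array([band_max_config[b][0] for b in order])
+c1 = np.array([band_max_config[b][1] for b in order])
+ar_static = (ar - c0) / (c1 - c0)
+```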
+
+
\ No newline at end of file
diff --git a/Writerside/topics/train.frdc_lightning.md b/Writerside/topics/train.frdc_lightning.md
new file mode 100644
index 00000000..e5a358d2
--- /dev/null
+++ b/Writerside/topics/train.frdc_lightning.md
@@ -0,0 +1,70 @@
+# train.frdc_datamodule & frdc_module
+
+
+The FRDC PyTorch LightningDataModule and LightningModule.
+
+
+These are FRDC-specific LightningDataModule and LightningModule classes,
+core components of the PyTorch Lightning ecosystem that provide a simple
+interface to train and evaluate models.
+
+## Classes
+
+> It's optional to use these classes; you can use your own training loop
+> if you want. We'll use these for our training pipeline.
+> {style='note'}
+
+
+
+The FRDC PyTorch Lightning DataModule.
+
+
+The FRDC PyTorch Lightning Module.
+
+
+
+## Usage
+
+> See our training pipeline for a full example
+
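+Below is a very rough sketch of how these pieces fit together with a Lightning
+`Trainer`. The class names and import path are assumptions for illustration,
+and the omitted arguments (`...`) depend on your data:
+
+```python
+import pytorch_lightning as pl
+from torch.optim import Adam
+from torchvision.models import inception_v3
+
+# NOTE: assumed names, check the frdc.train source for the actual classes.
+from frdc.train import FRDCDataModule, FRDCModule
+
+dm = FRDCDataModule(preprocess=...,            # list[np.ndarray] -> stacked Tensor
+                    augmentation=...,          # Tensor -> Tensor, train only
+                    train_val_test_split=...,  # TensorDataset -> [train, val, test]
+                    batch_size=5)
+m = FRDCModule(model_cls=inception_v3,         # the Class, not an instance
+               model_kwargs=dict(num_classes=10),
+               optim_cls=Adam,
+               optim_kwargs=dict(lr=1e-3))
+pl.Trainer(max_epochs=100).fit(m, datamodule=dm)
+```
+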
+## API
+
+
+
+Initializes the FRDC PyTorch Lightning DataModule.
+
+
+- `preprocess` is a function that takes in a segment and returns a preprocessed
+  segment. In particular, it should accept a list of NumPy NDArrays and return
+  a single stacked PyTorch Tensor.
+- `augmentation` is a function that takes in a segment and returns an augmented
+  segment. In particular, it takes in a PyTorch Tensor and returns another.
+- `train_val_test_split` is a function that takes a TensorDataset and returns
+  a list of 3 TensorDatasets, for train, val, and test respectively.
+- `batch_size` is the batch size.
+
+For now, the augmentation is only applied to training
+data
+
+
+Initializes the FRDC PyTorch Lightning Module.
+
+
+- `model_cls` is the Class of the model.
+- `model_kwargs` is the kwargs to pass to the model.
+- `optim_cls` is the Class of the optimizer.
+- `optim_kwargs` is the kwargs to pass to the optimizer.
+
+Internally, the module will initialize the model and optimizer as follows:
+
+model = model_cls(**model_kwargs)
+optim = optim_cls(model.parameters(), **optim_kwargs)
+
+We do not accept the instances of the Model and Optimizer so
+that we can pickle them.
+
+
diff --git a/Writerside/v.list b/Writerside/v.list
new file mode 100644
index 00000000..2d12cb39
--- /dev/null
+++ b/Writerside/v.list
@@ -0,0 +1,5 @@
+
+
+
+
+
diff --git a/Writerside/writerside.cfg b/Writerside/writerside.cfg
new file mode 100644
index 00000000..e83c370c
--- /dev/null
+++ b/Writerside/writerside.cfg
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/docs/HelpTOC.json b/docs/HelpTOC.json
new file mode 100644
index 00000000..322920fe
--- /dev/null
+++ b/docs/HelpTOC.json
@@ -0,0 +1 @@
+{"entities":{"pages":{"Overview":{"id":"Overview","title":"Overview","url":"overview.html","level":0,"tabIndex":0},"Getting-Started":{"id":"Getting-Started","title":"Getting Started","url":"getting-started.html","level":0,"tabIndex":1},"87c6272d_78682":{"id":"87c6272d_78682","title":"Tutorials","level":0,"pages":["Retrieve-our-Datasets"],"tabIndex":2},"Retrieve-our-Datasets":{"id":"Retrieve-our-Datasets","title":"Retrieve our Datasets","url":"retrieve-our-datasets.html","level":1,"parentId":"87c6272d_78682","tabIndex":0},"87c6272d_78684":{"id":"87c6272d_78684","title":"Model Tests","level":0,"pages":["Model-Test-Chestnut-May-Dec"],"tabIndex":3},"Model-Test-Chestnut-May-Dec":{"id":"Model-Test-Chestnut-May-Dec","title":"Model Test Chestnut May-Dec","url":"model-test-chestnut-may-dec.html","level":1,"parentId":"87c6272d_78684","tabIndex":0},"87c6272d_78686":{"id":"87c6272d_78686","title":"API","level":0,"pages":["load.dataset","preprocessing.scale","preprocessing.extract_segments","preprocessing.morphology","preprocessing.glcm_padded","train.frdc_lightning"],"tabIndex":4},"load.dataset":{"id":"load.dataset","title":"load.dataset","url":"load-dataset.html","level":1,"parentId":"87c6272d_78686","tabIndex":0},"preprocessing.scale":{"id":"preprocessing.scale","title":"preprocessing.scale","url":"preprocessing-scale.html","level":1,"parentId":"87c6272d_78686","tabIndex":1},"preprocessing.extract_segments":{"id":"preprocessing.extract_segments","title":"preprocessing.extract_segments","url":"preprocessing-extract-segments.html","level":1,"parentId":"87c6272d_78686","tabIndex":2},"preprocessing.morphology":{"id":"preprocessing.morphology","title":"preprocessing.morphology","url":"preprocessing-morphology.html","level":1,"parentId":"87c6272d_78686","tabIndex":3},"preprocessing.glcm_padded":{"id":"preprocessing.glcm_padded","title":"preprocessing.glcm_padded","url":"preprocessing-glcm-padded.html","level":1,"parentId":"87c6272d_78686","tabIndex":4},"train.frdc_lightning":{"id":"train.frdc_lightning","title":"train.frdc_datamodule \u0026 frdc_module","url":"train-frdc-lightning.html","level":1,"parentId":"87c6272d_78686","tabIndex":5}}},"topLevelIds":["Overview","Getting-Started","87c6272d_78682","87c6272d_78684","87c6272d_78686"]}
\ No newline at end of file
diff --git a/docs/Map.jhm b/docs/Map.jhm
new file mode 100644
index 00000000..2de2b668
--- /dev/null
+++ b/docs/Map.jhm
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/config.json b/docs/config.json
new file mode 100644
index 00000000..88bdd5dc
--- /dev/null
+++ b/docs/config.json
@@ -0,0 +1 @@
+{"productVersion":"0.0.4","productId":"d","stage":"release","downloadTitle":"Get Documentation","keymaps":{},"searchMaxHits":75,"productName":"Documentation"}
\ No newline at end of file
diff --git a/docs/current.help.version b/docs/current.help.version
new file mode 100644
index 00000000..05b19b1f
--- /dev/null
+++ b/docs/current.help.version
@@ -0,0 +1 @@
+0.0.4
\ No newline at end of file
diff --git a/docs/getting-started.html b/docs/getting-started.html
new file mode 100644
index 00000000..ddf5b5e8
--- /dev/null
+++ b/docs/getting-started.html
@@ -0,0 +1,20 @@
+
Getting Started | Documentation
Documentation 0.0.4 Help
Getting Started
Installing the Dev. Environment
Ensure that you have the right version of Python. The required Python version can be seen in pyproject.toml
Before starting development, take a look at our repository structure. This will help you understand where to put your code.
src/frdc/
Source Code for our package. These are the unit components of our pipeline.
rsc/
Resources. These are usually cached datasets
pipeline/
Pipeline code. These are the full ML tests of our pipeline.
tests/
PyTest tests. These are unit tests & integration tests.
Unit, Integration, and Pipeline Tests
We have 3 types of tests:
Unit Tests are usually small, single function tests.
Integration Tests are larger tests that test a mock pipeline.
Pipeline Tests are the true production pipeline tests that will generate a model.
Where Should I contribute?
Changing a small component
If you're changing a small component, such as an argument for preprocessing, a new model architecture, or a new configuration for a dataset, take a look at the src/frdc/ directory.
Adding a test
By adding a new component, you'll need to add a new test. Take a look at the tests/ directory.
Changing the pipeline
If you're an ML Researcher, you'll probably be changing the pipeline. Take a look at the pipeline/ directory.
Adding a dependency
If you're adding a new dependency, use poetry add PACKAGE and commit the changes to pyproject.toml and poetry.lock.
\ No newline at end of file
diff --git a/docs/images/cm-chestnut-maydec.png b/docs/images/cm-chestnut-maydec.png
new file mode 100644
index 00000000..c41f8697
Binary files /dev/null and b/docs/images/cm-chestnut-maydec.png differ
diff --git a/docs/images/graph-chestnut-maydec.png b/docs/images/graph-chestnut-maydec.png
new file mode 100644
index 00000000..b717ae36
Binary files /dev/null and b/docs/images/graph-chestnut-maydec.png differ
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 00000000..369fbdf1
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,9 @@
+
+
+
+You will be redirected shortly
+
+
Redirecting…
+Click here if you are not redirected.
+
+
diff --git a/docs/load-dataset.html b/docs/load-dataset.html
new file mode 100644
index 00000000..e82495dc
--- /dev/null
+++ b/docs/load-dataset.html
@@ -0,0 +1,31 @@
+ load.dataset | Documentation
Documentation 0.0.4 Help
load.dataset
Classes
FRDCDownloader
This facilitates authentication and downloading from GCS.
FRDCDataset
This uses the Downloader to download and load the dataset. It also implements useful helper functions to load FRDC-specific datasets, such as loading our images and labels.
Usage
An example loading our Chestnut Nature Park dataset. We retrieve the
This doesn't immediately download the dataset, but only when you call the get_* functions.
The site, date, version must match the dataset path on GCS. For example if the dataset is at gs://frdc-scan/my-site/20201218/90deg/map,
site='my-site'
date='20201218'
version='90deg/map'
If the dataset doesn't have a "version", for example: gs://frdc-scan/my-site/20201218, then you can pass in version=None.
get_ar_bands()
Gets the NDArray bands (H x W x C) and channel order as tuple[np.ndarray, list[str]].
This downloads (if missing) and retrieves the stacked NDArray bands. This wraps around get_ar_bands_as_dict(), thus if you want more control over how the bands are loaded, use that instead.
get_ar_bands_as_dict()
Gets the NDArray bands (H x W) as a dict[str, np.ndarray].
This downloads (if missing) and retrieves the individual NDArray bands as a dictionary. The keys are the band names, and the values are the NDArray bands.
get_bounds_and_labels()
Gets the bounding boxes and labels as tuple[list[Rect], list[str]].
This downloads (if missing) and retrieves the bounding boxes and labels as a tuple. The first element is a list of bounding boxes, and the second element is a list of labels.
FRDCDownloader
list_gcs_datasets(anchor)
Lists all GCS datasets in the bucket as DataFrame
This works by checking which folders have a specific file, which we call the anchor.
download_file(path_glob, local_exists_ok)
Downloads a file from GCS.
This takes in a path glob, a string containing wildcards, and downloads exactly 1 file. If it matches 0 or more than 1 file, it will raise an error.
If local_exists_ok is True, it will not download the file if it already exists locally. However, if it's False, it will download the file only if the hashes don't match.
This test is used to evaluate the model performance on the Chestnut Nature Park May & December dataset.
See this script in pipeline/model_tests/chestnut_dec_may/main.py.
Motivation
The usage of this model will be to classify trees in unseen datasets under different conditions. In this test, we'll evaluate it under a different season.
A caveat is that it'll be evaluated on the same set of trees, so it's not representative of a field test. However, given the difficulty of collecting datasets, this still gives us a good preliminary idea of how the model will perform in different conditions.
Methodology
We simply train on the December dataset, and test on the May dataset.
Model
The current Model used is a simple InceptionV3 Transfer Learning model, with the last layer replaced with a fully connected layer(s).
Preprocessing
We perform the following steps:
Augmentation
The following augmentations are used:
Hyperparameters
The following hyperparameters are used:
Optimizer: Adam
Learning Rate: 1e-3
Batch Size: 5
Epochs: 100
Early Stopping: 4
Results
We yield around 40% accuracy on the test set, compared to around 65% on the training set. Raising the training accuracy with a more complex model may improve the test accuracy; however, due to the instability of our test results, we can't be sure of this.
Result Images
Caveats
The test set is very small, so the results are not very representative.
The test set is the same set of trees, so it's not a true test of the model performance in different conditions.
There are many classes with 1 sample, so the model may not be able to learn the features of these classes well.
\ No newline at end of file
diff --git a/docs/overview.html b/docs/overview.html
new file mode 100644
index 00000000..72478e87
--- /dev/null
+++ b/docs/overview.html
@@ -0,0 +1 @@
+ Overview | Documentation
Documentation 0.0.4 Help
Overview
Forest Recovery Digital Companion (FRDC) is a ML-assisted companion for ecologists to automatically classify surveyed trees via an Unmanned Aerial Vehicle (UAV).
This package, FRDC-ML is the Machine Learning backbone of this project, a centralized repository of tools and model architectures to be used in the FRDC pipeline.
\ No newline at end of file
diff --git a/docs/preprocessing-extract-segments.html b/docs/preprocessing-extract-segments.html
new file mode 100644
index 00000000..39000f35
--- /dev/null
+++ b/docs/preprocessing-extract-segments.html
@@ -0,0 +1,94 @@
+ preprocessing.extract_segments | Documentation
Documentation 0.0.4 Help
preprocessing.extract_segments
Functions
extract_segments_from_labels
Extracts segments from a label classification.
extract_segments_from_bounds
Extracts segments from Rect bounds.
remove_small_segments_from_labels
Removes small segments from a label classification.
Extract with Boundaries
A boundary is a Rect object that represents the minimum bounding box of a segment, with x0, y0, x1, y1 coordinates.
It simply slices the original image to the bounding box. The origin is the top left corner of the image.
A label classification is a np.ndarray where each pixel is mapped to a segment. The segments are mapped to a unique integer. In our project, the 0th label is the background.
For example, a label classification of 3 segments will look like this:
If cropped is False, the segments are padded with 0s to the original image size. While this can ensure shape consistency, it can consume more memory for large images.
If cropped is True, the segments are cropped to the minimum bounding box. This can save memory, but the shape of the segments will be inconsistent.
\ No newline at end of file
diff --git a/docs/preprocessing-glcm-padded.html b/docs/preprocessing-glcm-padded.html
new file mode 100644
index 00000000..11159608
--- /dev/null
+++ b/docs/preprocessing-glcm-padded.html
@@ -0,0 +1,26 @@
+ preprocessing.glcm_padded | Documentation
Documentation 0.0.4 Help
preprocessing.glcm_padded
Functions
glcm_padded
Computes the GLCM of the NDArray bands with padding.
glcm_padded_cached
Computes the GLCM of the NDArray bands with padding, and caches it.
append_glcm_padded_cached
Computes the GLCM of the NDArray bands with padding, and caches it and also appends it onto the original array.
Usage
We show a few examples of how to use the GLCM functions.
+import numpy as np
+from glcm_cupy import Features
+
+from frdc.preprocess.glcm_padded import (
+ append_glcm_padded_cached, glcm_padded, glcm_padded_cached
+)
+
+ar = np.random.rand(50, 25, 4)
+
+# Returns a shape of H x W x C x GLCM Features
+ar_glcm = glcm_padded(ar, bin_from=1, bin_to=4, radius=3, )
+
+# Returns a shape of H x W x C x 2
+ar_glcm_2_features = glcm_padded(ar, bin_from=1, bin_to=4, radius=3,
+ features=[Features.CONTRAST,
+ Features.CORRELATION])
+
+# Returns a shape of H x W x C x GLCM Features
+ar_glcm_cached = glcm_padded_cached(ar, bin_from=1, bin_to=4, radius=3)
+
+# Returns a shape of H x W x (C x GLCM Features + C)
+ar_glcm_cached_appended = append_glcm_padded_cached(ar, bin_from=1, bin_to=4,
+ radius=3)
+
+
ar_glcm is the GLCM of the original array, with the last dimension being the GLCM features. The number of features is determined by the features parameter, which defaults to all features.
ar_glcm_2_features selects only 2 features, with the last dimension being the 2 GLCM features specified.
ar_glcm_cached caches the GLCM so that if you call it again, it will return the cached version. It stores its data at the project root dir, under .cache/.
ar_glcm_cached_appended is a wrapper around ar_glcm_cached, it appends the GLCM features onto the original array. It's equivalent to calling ar_glcm_cached and then np.concatenate on the final axes.
Caching
GLCM is an expensive operation, thus we recommend to cache it if the input parameters will be the same. This is especially useful if you're experimenting with the same dataset with constant parameters.
\ No newline at end of file
diff --git a/docs/preprocessing-morphology.html b/docs/preprocessing-morphology.html
new file mode 100644
index 00000000..db402a61
--- /dev/null
+++ b/docs/preprocessing-morphology.html
@@ -0,0 +1,15 @@
+ preprocessing.morphology | Documentation
Documentation 0.0.4 Help
preprocessing.morphology
Functions
threshold_binary_mask
Thresholds a selected NDArray bands to yield a binary mask.
binary_watershed
Performs watershed on a binary mask to yield a mapped label classification
Usage
Perform auto-segmentation on a dataset to yield a label classification.
\ No newline at end of file
diff --git a/docs/preprocessing-scale.html b/docs/preprocessing-scale.html
new file mode 100644
index 00000000..28868ce7
--- /dev/null
+++ b/docs/preprocessing-scale.html
@@ -0,0 +1,15 @@
+ preprocessing.scale | Documentation
Documentation 0.0.4 Help
preprocessing.scale
Functions
scale_0_1_per_band
Scales the NDArray bands to [0, 1] per band.
scale_normal_per_band
Scales the NDArray bands to zero mean unit variance per band.
scale_static_per_band
Scales the NDArray bands by a predefined configuration.
Scales the NDArray bands to zero mean unit variance per band.
scale_static_per_band(ar, order, config)
Scales the NDArray bands by a predefined configuration.
The config is of dict[str, tuple[int, int]] where the key is the band name, and the value is a tuple of (min, max). Take a look at frdc.conf.BAND_MAX_CONFIG for an example.
segments is a list of np.ndarray of shape H, W, C, representing a tree. The order of segments is the same as labels, so you can use labels to identify the tree.
Plot the Data (Optional)
We can then use these data to plot out the first tree segment.
\ No newline at end of file
diff --git a/docs/train-frdc-lightning.html b/docs/train-frdc-lightning.html
new file mode 100644
index 00000000..122fb48a
--- /dev/null
+++ b/docs/train-frdc-lightning.html
@@ -0,0 +1,4 @@
+ train.frdc_datamodule & frdc_module | Documentation
Documentation 0.0.4 Help
train.frdc_datamodule & frdc_module
These are FRDC specific LightningDataModule and LightningModule, a core component in the PyTorch Lightning ecosystem to provide a simple interface to train and evaluate models.
preprocess is a function that takes in a segment and returns a preprocessed segment. In particular, it should accept a list of NumPy NDArrays and return a single stacked PyTorch Tensor.
augmentation is a function that takes in a segment and returns an augmented segment. In particular, it takes in a PyTorch Tensor and returns another.
train_val_test_split is a function that takes a TensorDataset and returns a list of 3 TensorDatasets, for train, val and test respectively.