Merge pull request #58 from FR-DC/0.0.11
0.0.11
Eve-ning authored May 21, 2024
2 parents 94d6190 + fc76f16 commit 96abc8b
Showing 21 changed files with 513 additions and 30 deletions.
4 changes: 3 additions & 1 deletion .gitattributes
@@ -1 +1,3 @@
Dockerfile text=auto eol=lf
Dockerfile text=auto eol=lf
* text=auto
*.ipynb -linguist-detectable
2 changes: 2 additions & 0 deletions .github/workflows/basic-tests.yml
@@ -4,6 +4,8 @@ on:
push:
branches: [ "main" ]
pull_request:
paths:
- "src/frdc/**"

jobs:
build:
47 changes: 47 additions & 0 deletions .github/workflows/build-ls-image.yml
@@ -0,0 +1,47 @@
name: Publish Custom Label Studio Image
on:
push:
branches: [ 'terraform' ]
paths:
- 'src/terraform/Dockerfile'
workflow_dispatch:

env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}

jobs:
build-and-push-image:
runs-on: ubuntu-latest
# Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
permissions:
contents: read
packages: write

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Log in to the Container registry
uses: docker/[email protected]
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/[email protected]
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

# This step uses the `docker/build-push-action` action to build the image, based on your repository's `Dockerfile`. If the build succeeds, it pushes the image to GitHub Packages.
# It uses the `context` parameter to define the build's context as the set of files located in the specified path. For more information, see "[Usage](https://github.com/docker/build-push-action#usage)" in the README of the `docker/build-push-action` repository.
# It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
- name: Build and push Docker image
uses: docker/[email protected]
with:
context: ./src/terraform
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
37 changes: 37 additions & 0 deletions .gitignore
@@ -161,6 +161,43 @@ cython_debug/
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
.idea/

# Terraform Ignores
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc


# Ignores the raw .tif files
rsc/**/*.tif

19 changes: 19 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,19 @@
# Jupyter Notebooks

## FRRD-60: Consistency as a feature discriminator for Novelty Detection

Hypothesis: Consistency can be used as a measure to discriminate between normal
and novel data. In the case of FRDC, it's a metric to separate seen and unseen
tree species. Seen data will have high consistency, while unseen data will have
low consistency.

Conclusion: It's not possible. Consistency is a measure of the similarity
of output distributions under different perturbations of the input. A simple
counter-example is the following: a black square on a white background.
The consistency of that image is always perfect given weak augmentations
(flips). We also show, using CIFAR-10 and noise datasets, that the
Jensen-Shannon divergence goes against our hypothesis, sometimes yielding
higher consistency despite the data being Out-of-Distribution (OOD).

Author Discussion: We believe that the formulated hypothesis, while plausible
on the surface, requires more mathematical rigor to be proven.
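
For illustration (not part of the committed notebooks), a minimal sketch of the
consistency measure described above, assuming a PyTorch image classifier that
returns logits; the function names and the flip-only weak augmentation are
illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def js_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Jensen-Shannon divergence between two batches of probability vectors."""
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps) / (m + eps)).log()).sum(dim=-1)
    kl_qm = (q * ((q + eps) / (m + eps)).log()).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)


def consistency(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Lower JS divergence between predictions on an image and its weakly
    augmented (horizontally flipped) copy means higher consistency."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)                          # original batch
        q = F.softmax(model(torch.flip(x, dims=[-1])), dim=-1)   # flipped batch
    return js_divergence(p, q)
```

The black-square counter-example above shows why this fails as a novelty
score: a flip leaves that image unchanged, so the two distributions coincide
and the divergence is zero regardless of whether the image is in-distribution.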
2 changes: 2 additions & 0 deletions notebooks/frrd-60/.gitignore
@@ -0,0 +1,2 @@
data/
*.ckpt
File renamed without changes.
File renamed without changes
65 changes: 36 additions & 29 deletions src/frdc/load/label_studio.py
@@ -16,35 +16,42 @@ class Task(dict):
def get_bounds_and_labels(self) -> tuple[list[tuple[int, int]], list[str]]:
bounds = []
labels = []
for ann_ix, ann in enumerate(self["annotations"]):
results = ann["result"]
for r_ix, r in enumerate(results):
r: dict

# We flatten the value dict into the result dict
v = r.pop("value")
r = {**r, **v}

# Points are in percentage, we need to convert them to pixels
r["points"] = [
(
int(x * r["original_width"] / 100),
int(y * r["original_height"] / 100),
)
for x, y in r["points"]
]

# Only take the first label as this is not a multi-label task
r["label"] = r.pop("polygonlabels")[0]
if not r["closed"]:
logger.warning(
f"Label for {r['label']} @ {r['points']} not closed. "
f"Skipping"
)
continue

bounds.append(r["points"])
labels.append(r["label"])

# for ann_ix, ann in enumerate(self["annotations"]):

ann = self["annotations"][0]
results = ann["result"]
for r_ix, r in enumerate(results):
r: dict

# See Issue FRML-78: Somehow some labels are actually just metadata
if r["from_name"] != "label":
continue

# We flatten the value dict into the result dict
v = r.pop("value")
r = {**r, **v}

# Points are in percentage, we need to convert them to pixels
r["points"] = [
(
int(x * r["original_width"] / 100),
int(y * r["original_height"] / 100),
)
for x, y in r["points"]
]

# Only take the first label as this is not a multi-label task
r["label"] = r.pop("polygonlabels")[0]
if not r["closed"]:
logger.warning(
f"Label for {r['label']} @ {r['points']} not closed. "
f"Skipping"
)
continue

bounds.append(r["points"])
labels.append(r["label"])

return bounds, labels
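
For illustration, a usage sketch of the updated parser, assuming the module is
importable as frdc.load.label_studio and that Task can be built directly from a
plain dict; the payload is a hypothetical, trimmed Label Studio task with
made-up values:

```python
from frdc.load.label_studio import Task

# Hypothetical task payload; only the fields the parser reads are included.
task = Task(
    {
        "annotations": [
            {
                "result": [
                    {
                        "from_name": "label",  # non-"label" entries are skipped (FRML-78)
                        "original_width": 1024,
                        "original_height": 1024,
                        "value": {
                            # Points are percentages of the original image size
                            "points": [[10.0, 10.0], [50.0, 10.0], [50.0, 50.0]],
                            "polygonlabels": ["Species-A"],
                            "closed": True,
                        },
                    },
                ]
            }
        ]
    }
)

bounds, labels = task.get_bounds_and_labels()
# bounds == [[(102, 102), (512, 102), (512, 512)]]
# labels == ["Species-A"]
```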

45 changes: 45 additions & 0 deletions src/terraform/.terraform.lock.hcl

Some generated files are not rendered by default.

17 changes: 17 additions & 0 deletions src/terraform/Dockerfile
@@ -0,0 +1,17 @@
FROM ubuntu:22.04
WORKDIR /app

RUN apt-get update && apt-get install -y python3-pip

RUN pip3 install label-studio

# NOTE
# This doesn't automatically port forward the port
# to the host machine. You need to do that manually
EXPOSE 8080

ENV DJANGO_DB=default
ENV POSTGRE_NAME=postgres
ENV POSTGRE_PORT=5432

ENTRYPOINT ["label-studio"]
101 changes: 101 additions & 0 deletions src/terraform/google_compute_engine.tf
@@ -0,0 +1,101 @@
# This code is compatible with Terraform 4.25.0 and versions that are backward compatible to 4.25.0.
# For information about validating this Terraform code, see https://developer.hashicorp.com/terraform/tutorials/gcp-get-started/google-cloud-platform-build#format-and-validate-the-configuration

resource "google_compute_instance" "label-studio" {
boot_disk {
auto_delete = true
device_name = var.google_compute_name

initialize_params {
image = "projects/cos-cloud/global/images/cos-stable-109-17800-147-38"
size = 10
type = "pd-balanced"
}

mode = "READ_WRITE"
}

can_ip_forward = false
deletion_protection = false
enable_display = false

labels = {
container-vm = "cos-stable-109-17800-147-38"
goog-ec-src = "vm_add-tf"
}

machine_type = var.google_compute_machine_type

metadata = {
enable-oslogin = "true"
gce-container-declaration = <<-EOF
spec:
containers:
- name: label-studio
image: ghcr.io/fr-dc/frdc-ml:terraform
env:
- name: POSTGRE_PASSWORD
value: ${var.db_password}
- name: POSTGRE_USER
value: postgres.${supabase_project.production.id}
- name: POSTGRE_HOST
value: aws-0-${supabase_project.production.region}.pooler.supabase.com
- name: LABEL_STUDIO_DISABLE_SIGNUP_WITHOUT_LINK
value: true
- name: LABEL_STUDIO_USERNAME
value: ${var.ls_username}
- name: LABEL_STUDIO_PASSWORD
value: ${var.ls_password}
stdin: false
tty: false
restartPolicy: Always
EOF
}

name = var.google_compute_name
zone = var.google_zone

network_interface {
access_config {
nat_ip = google_compute_address.label-studio-ip.address
network_tier = "STANDARD"
}

queue_count = 0
stack_type = "IPV4_ONLY"
subnetwork = google_compute_network.label-studio-vpc.name
}

scheduling {
automatic_restart = false
on_host_maintenance = "TERMINATE"
preemptible = true
provisioning_model = "SPOT"
}

service_account {
email = "[email protected]"
scopes = [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring.write",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append"
]
}

shielded_instance_config {
enable_integrity_monitoring = true
enable_secure_boot = false
enable_vtpm = true
}

tags = ["label-studio"]
depends_on = [
google_compute_network.label-studio-vpc,
google_compute_address.label-studio-ip,
supabase_project.production
]
}

24 changes: 24 additions & 0 deletions src/terraform/google_network.tf
@@ -0,0 +1,24 @@
resource "google_compute_firewall" "label-studio-port" {
name = var.google_firewall_name
network = google_compute_network.label-studio-vpc.name

allow {
protocol = "tcp"
ports = ["22", "80", "443", "8080"]
}

source_ranges = ["0.0.0.0/0"]
target_tags = ["label-studio"]
depends_on = [
google_compute_network.label-studio-vpc
]
}

resource "google_compute_network" "label-studio-vpc" {
name = var.google_vpc_name
auto_create_subnetworks = true
}

resource "google_compute_address" "label-studio-ip" {
name = var.google_address_name
}
