Outlier detection #120

Merged
merged 41 commits into from
Jan 25, 2024
41 commits
90dd432
Update GenerateAnnotationPatch with test cases from biigle/maia
mzur Dec 6, 2023
cc4cc99
Merge branch 'master' into sim-sort
mzur Dec 6, 2023
9e406a1
Implement trait to compute an annotation bounding box
mzur Dec 8, 2023
33ebbee
Add ExtractFeatures script from biigle/maia
mzur Dec 8, 2023
4c1addc
Implement feature vector models
mzur Dec 15, 2023
fb73480
Fix migration timestamp
mzur Dec 15, 2023
0896365
Implement GenerateFeatureVectors job
mzur Dec 20, 2023
27ce50c
Implement generating feature vectors in GenerateAnnotationPatch jobs
mzur Dec 20, 2023
c1df26f
Generate new feature vectors for all annotation labels instead of first
mzur Jan 4, 2024
bc57c1e
Implement annotation label observers to copy feature vectors
mzur Jan 4, 2024
e8ff0b5
Update/fix feature vector migration
mzur Jan 4, 2024
271d0dd
Copy feature vectors when a Largo job is applied
mzur Jan 4, 2024
f458f4b
Handle feature vector generation of whole frame annotations
mzur Jan 5, 2024
5098614
Add update schema action
mzur Jan 9, 2024
16b3328
WIP Start implementing sorting tab with sort by ID and outlier
mzur Jan 10, 2024
d55c42d
Implement outlier sorting UI for volume largo
mzur Jan 11, 2024
6960687
Fix default patch sorting direction
mzur Jan 11, 2024
a5b75b4
Implement project largo sort by outlier controller
mzur Jan 11, 2024
1715d01
Implement outlier sorting in project Largo UI
mzur Jan 12, 2024
3a44e20
Implement job to initialize feature vectors from thumbnails
mzur Jan 12, 2024
c216ba7
Simplify InitializeFeatureVectorChunk job
mzur Jan 15, 2024
09f5bc7
Implement command to initialize feature vectors from thumbnails
mzur Jan 15, 2024
9021480
WIP Implement updated feature vector initialization based on thumbnails
mzur Jan 17, 2024
567ff7e
WIP Transform GenerateAnnotationPatch to ProcessAnnotatedFile
mzur Jan 18, 2024
6e681de
Fix data returned by volume sort by outliers controller
mzur Jan 18, 2024
1f5f7d0
Implement missing features of ProcessAnnotationFile jobs
mzur Jan 18, 2024
e609f4a
Make code more reusable in biigle/maia
mzur Jan 19, 2024
f804ff8
Update Python requirements
mzur Jan 23, 2024
3fe453a
Merge branch 'sim-sort-thumbs' into sim-sort
mzur Jan 23, 2024
78c4957
Finish and test refined InitializeFeatureVectorChunk job
mzur Jan 23, 2024
40e2d7f
Update initialize feature vector command to process only one volume
mzur Jan 23, 2024
e0c9ade
Fix sorting in video volume
mzur Jan 23, 2024
a960cdb
Merge branch 'master' into sim-sort
mzur Jan 23, 2024
54fe650
Merge branch 'master' into sim-sort
mzur Jan 23, 2024
90a30a7
WIP Update GenerateMissing command
mzur Jan 23, 2024
5f54efc
Finish update of generate missing command with tests
mzur Jan 24, 2024
7bf82b8
Refactor generate missing command
mzur Jan 24, 2024
9012f5d
Implement checking for feature vectors in generate missing command
mzur Jan 24, 2024
6eade58
Update manual article with sorting instructions
mzur Jan 24, 2024
13ca366
Replace annotation label observers with event handler
mzur Jan 24, 2024
2343427
Update Pillow version
mzur Jan 25, 2024
22 changes: 22 additions & 0 deletions .github/workflows/update-schema.yaml
@@ -0,0 +1,22 @@
name: Update Schema

on:
  push:
    branches:
      - master
    paths:
      - 'src/Database/migrations/**'

jobs:
  update-schema:

    runs-on: ubuntu-latest

    steps:
      - name: Trigger schema update
        run: |
          curl -X POST --fail \
            -H "Authorization: token ${{ secrets.BIIGLE_SCHEMA_API_TOKEN }}" \
            -H "Content-Type: application/json" \
            --data '{"event_type": "build_application"}' \
            https://api.github.com/repos/biigle/schema/dispatches
1 change: 1 addition & 0 deletions README.md
@@ -9,6 +9,7 @@ This is the BIIGLE module to review image annotations in a regular grid.
This module is already included in [`biigle/biigle`](https://github.com/biigle/biigle).

1. Run `composer require biigle/largo`.
2. Install the Python dependencies with `pip install -r requirements.txt`.
2. Add `Biigle\Modules\Largo\LargoServiceProvider::class` to the `providers` array in `config/app.php`.
3. Run `php artisan vendor:publish --tag=public` to publish the public assets of this module.
4. Configure a storage disk for the Largo annotation patches and set the `LARGO_PATCH_STORAGE_DISK` variable to the name of this storage disk in the `.env` file. The content of the storage disk should be publicly accessible. Example for a local disk:
4 changes: 4 additions & 0 deletions requirements.txt
@@ -0,0 +1,4 @@
Pillow==10.2.0
torch==2.1.*
torchvision==0.16.*
xformers==0.0.18
219 changes: 132 additions & 87 deletions src/Console/Commands/GenerateMissing.php
@@ -2,13 +2,18 @@

namespace Biigle\Modules\Largo\Console\Commands;

use Biigle\Annotation;
use Biigle\Image;
use Biigle\ImageAnnotation;
use Biigle\Modules\Largo\Jobs\GenerateImageAnnotationPatch;
use Biigle\Modules\Largo\Jobs\GenerateVideoAnnotationPatch;
use Biigle\Modules\Largo\Jobs\ProcessAnnotatedFile;
use Biigle\Modules\Largo\Jobs\ProcessAnnotatedImage;
use Biigle\Modules\Largo\Jobs\ProcessAnnotatedVideo;
use Biigle\VideoAnnotation;
use Biigle\VolumeFile;
use Carbon\Carbon;
use File;
use Illuminate\Console\Command;
use Illuminate\Contracts\Filesystem\Filesystem;
use Illuminate\Database\Eloquent\Builder;
use Storage;

class GenerateMissing extends Command
@@ -18,39 +23,38 @@ class GenerateMissing extends Command
*
* @var string
*/
protected $signature = 'largo:generate-missing {--dry-run} {--volume=} {--no-image-annotations} {--no-video-annotations} {--queue=} {--newer-than=}
{--older-than=}';
protected $signature = 'largo:generate-missing
{--dry-run : Do not submit processing jobs to the queue}
{--volume= : Check only this volume}
{--skip-images : Do not check image annotations}
{--skip-videos : Do not check video annotations}
{--skip-vectors : Do not check feature vectors}
{--skip-patches : Do not check annotation patches}
{--queue= : Submit processing jobs to this queue}
{--newer-than= : Only check annotations newer than this date}
{--older-than= : Only check annotations older than this date}';

/**
* The console command description.
*
* @var string
*/
protected $description = 'Generate missing patches for annotations.';
protected $description = 'Generate missing data for annotations.';

/**
* Largo patch storage file format.
*
* @var string
* Queue to push process jobs to.
*/
protected $format;
protected string $queue;

/**
* Number of annotations missing patches.
*
* @var int
* Whether to skip checking for missing patches.
*/
protected $count;
protected bool $skipPatches;

/**
* Create a new command instance.
* Whether to skip checking for missing feature vectors.
*/
public function __construct()
{
parent::__construct();
$this->format = config('largo.patch_format');
$this->count = 0.0;
}
protected bool $skipVectors;

/**
* Execute the command.
@@ -59,31 +63,31 @@ public function __construct()
*/
public function handle()
{
$pushToQueue = !$this->option('dry-run');
$storage = Storage::disk(config('largo.patch_storage_disk'));
$queue = $this->option('queue') ?: config('largo.generate_annotation_patch_queue');
$this->queue = $this->option('queue') ?: config('largo.generate_annotation_patch_queue');
$this->skipPatches = $this->option('skip-patches');
$this->skipVectors = $this->option('skip-vectors');

if (!$this->option('no-image-annotations')) {
$this->handleImageAnnotations($storage, $pushToQueue, $queue);
if (!$this->option('skip-images')) {
$this->handleImageAnnotations();
}

$this->count = 0;

if (!$this->option('no-video-annotations')) {
$this->handleVideoAnnotations($storage, $pushToQueue, $queue);
if (!$this->option('skip-videos')) {
$this->handleVideoAnnotations();
}
}

/**
* Check image annotation patches
*
* @param \Illuminate\Filesystem\FilesystemAdapter $storage
* @param bool $pushToQueue
* @param string $queue
*/
protected function handleImageAnnotations($storage, $pushToQueue, $queue)
protected function handleImageAnnotations(): void
{
$annotations = ImageAnnotation::join('images', 'images.id', '=', 'image_annotations.image_id')
// Order by image ID first because we want to submit the annotations in
// batches for each image.
->orderBy('image_annotations.image_id')
// Order by annotation ID second to ensure a deterministic order for lazy().
->orderBy('image_annotations.id')
->select('image_annotations.id', 'image_annotations.image_id')
->when($this->option('volume'), function ($query) {
$query->where('images.volume_id', $this->option('volume'));
})
@@ -93,75 +97,97 @@ protected function handleImageAnnotations($storage, $pushToQueue, $queue)
->when($this->option('older-than'), function ($query) {
$query->where('image_annotations.created_at', '<', new Carbon($this->option('older-than')));
})
->select('image_annotations.id', 'images.uuid as uuid');

$total = $annotations->count();
$progress = $this->output->createProgressBar($total);
$this->info("Checking {$total} image annotations...");

$handleAnnotation = function ($annotation) use ($progress, $pushToQueue, $storage, $queue) {
$prefix = fragment_uuid_path($annotation->uuid);
if (!$storage->exists("{$prefix}/{$annotation->id}.{$this->format}")) {
$this->count++;
if ($pushToQueue) {
GenerateImageAnnotationPatch::dispatch($annotation)
->onQueue($queue);
}
}
$progress->advance();
};
->when(!$this->skipVectors, function ($query) {
$query->leftJoin('image_annotation_label_feature_vectors', 'image_annotation_label_feature_vectors.annotation_id', '=', 'image_annotations.id')
->addSelect('image_annotation_label_feature_vectors.id as vector_id');
});

$annotations->eachById($handleAnnotation, 10000, 'image_annotations.id', 'id');

$progress->finish();

if($total === 0) {
$this->info("\n");
return;
}

$percent = round($this->count / $total * 100, 2);
$this->info("\nFound {$this->count} image annotations with missing patches ({$percent} %).");
if ($pushToQueue) {
$this->info("Pushed {$this->count} jobs to queue {$queue}.");
}
$this->line("Image annotations");
$this->handleAnnotations($annotations);
}

/**
* Check video annotation patches
*
* @param \Illuminate\Filesystem\FilesystemAdapter $storage
* @param bool $pushToQueue
* @param string $queue
*/
protected function handleVideoAnnotations($storage, $pushToQueue, $queue)
protected function handleVideoAnnotations(): void
{
$annotations = VideoAnnotation::join('videos', 'videos.id', '=', 'video_annotations.video_id')
// Order by video ID first because we want to submit the annotations in
// batches for each video.
->orderBy('video_annotations.video_id')
// Order by annotation ID second to ensure a deterministic order for lazy().
->orderBy('video_annotations.id')
->select('video_annotations.id', 'video_annotations.video_id')
->when($this->option('volume'), function ($query) {
$query->where('videos.volume_id', $this->option('volume'));
})
->when($this->option('newer-than'), function ($query) {
$query->where('video_annotations.created_at', '>', new Carbon($this->option('newer-than')));
})
->select('video_annotations.id', 'videos.uuid as uuid');
->when($this->option('older-than'), function ($query) {
$query->where('video_annotations.created_at', '<', new Carbon($this->option('older-than')));
})
->when(!$this->skipVectors, function ($query) {
$query->leftJoin('video_annotation_label_feature_vectors', 'video_annotation_label_feature_vectors.annotation_id', '=', 'video_annotations.id')
->addSelect('video_annotation_label_feature_vectors.id as vector_id');
});

$this->line("Video annotations");
$this->handleAnnotations($annotations);
}

protected function handleAnnotations(Builder $annotations): void
{
$pushToQueue = !$this->option('dry-run');
$storage = Storage::disk(config('largo.patch_storage_disk'));

$count = 0;
$jobCount = 0;
$total = $annotations->count();
$progress = $this->output->createProgressBar($total);
$this->info("Checking {$total} video annotations...");

$handleAnnotation = function ($annotation) use ($progress, $pushToQueue, $storage, $queue) {
$prefix = fragment_uuid_path($annotation->uuid);
if (!$storage->exists("{$prefix}/v-{$annotation->id}.{$this->format}")) {
$this->count++;
if ($pushToQueue) {
GenerateVideoAnnotationPatch::dispatch($annotation)
->onQueue($queue);
$this->info("Checking {$total} annotations...");

$currentFile = null;
$currentAnnotationBatch = [];

// lazy() is crucial as we can't load all annotations at once!
foreach ($annotations->with('file')->lazy() as $annotation) {
$progress->advance();

if ($this->skipPatches) {
$needsPatch = false;
} else {
$needsPatch = !$storage->exists(
ProcessAnnotatedFile::getTargetPath($annotation)
);
}

$needsVector = !$this->skipVectors && is_null($annotation->vector_id);

if (!$needsPatch && !$needsVector) {
continue;
}

$count++;

if (!$currentFile || $currentFile->id !== $annotation->file->id) {
if (!empty($currentAnnotationBatch) && $pushToQueue) {
$jobCount++;
$this->dispatcheProcessJob($currentFile, $currentAnnotationBatch);
}

$currentFile = $annotation->file;
$currentAnnotationBatch = [];
}
$progress->advance();
};

$annotations->eachById($handleAnnotation, 10000, 'video_annotations.id', 'id');
$currentAnnotationBatch[] = $annotation->id;
}

// Push final job.
if (!empty($currentAnnotationBatch) && $pushToQueue) {
$jobCount++;
$this->dispatcheProcessJob($currentFile, $currentAnnotationBatch);
}

$progress->finish();

@@ -170,10 +196,29 @@ protected function handleVideoAnnotations($storage, $pushToQueue, $queue)
return;
}

$percent = round($this->count / $total * 100, 2);
$this->info("\nFound {$this->count} video annotations with missing patches ({$percent} %).");
$percent = round($count / $total * 100, 2);
$this->info("\nFound {$count} annotations with missing patches or feature vectors ({$percent} %).");
if ($pushToQueue) {
$this->info("Pushed {$this->count} jobs to queue {$queue}.");
$this->info("Pushed {$jobCount} jobs to queue {$this->queue}.");
}
}

protected function dispatcheProcessJob(VolumeFile $file, array $ids)
{
if ($file instanceof Image) {
ProcessAnnotatedImage::dispatch($file,
only: $ids,
skipPatches: $this->skipPatches,
skipFeatureVectors: $this->skipVectors
)
->onQueue($this->queue);
} else {
ProcessAnnotatedVideo::dispatch($file,
only: $ids,
skipPatches: $this->skipPatches,
skipFeatureVectors: $this->skipVectors
)
->onQueue($this->queue);
}
}
}
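The batching in `handleAnnotations()` is hard to follow in the flattened diff. Below is a minimal, framework-free sketch of the same idea, not the module's actual code. It assumes the rows are already ordered by file ID (as the query orders by `image_id`/`video_id`), so each file's annotations arrive contiguously. `AnnotationRow` and `batchMissing()` are hypothetical names; the `hasPatch` flag stands in for the storage `exists()` check on `ProcessAnnotatedFile::getTargetPath()`, and a `null` `vectorId` mirrors the `leftJoin` finding no feature vector.

```php
<?php

// Hypothetical, simplified stand-in for one row of the annotation query.
final class AnnotationRow
{
    public function __construct(
        public readonly int $fileId,
        public readonly int $id,
        public readonly bool $hasPatch,   // assumption: replaces the storage exists() check
        public readonly ?int $vectorId,   // null if the left join found no feature vector
    ) {
    }
}

/**
 * Group annotations that still need processing into one batch per file.
 * Assumes $rows are ordered by file ID.
 *
 * @param iterable<AnnotationRow> $rows
 * @return list<array{fileId: int, ids: list<int>}>
 */
function batchMissing(iterable $rows, bool $skipPatches, bool $skipVectors): array
{
    $batches = [];
    $currentFileId = null;
    $currentIds = [];

    foreach ($rows as $row) {
        $needsPatch = !$skipPatches && !$row->hasPatch;
        $needsVector = !$skipVectors && $row->vectorId === null;

        // Complete annotations are not part of any batch.
        if (!$needsPatch && !$needsVector) {
            continue;
        }

        // A new file starts: close the previous file's batch (one job per file).
        if ($currentFileId !== $row->fileId) {
            if ($currentIds !== []) {
                $batches[] = ['fileId' => $currentFileId, 'ids' => $currentIds];
            }
            $currentFileId = $row->fileId;
            $currentIds = [];
        }

        $currentIds[] = $row->id;
    }

    // Flush the batch of the last file.
    if ($currentIds !== []) {
        $batches[] = ['fileId' => $currentFileId, 'ids' => $currentIds];
    }

    return $batches;
}

$rows = [
    new AnnotationRow(fileId: 1, id: 10, hasPatch: true, vectorId: 5),     // nothing missing
    new AnnotationRow(fileId: 1, id: 11, hasPatch: false, vectorId: 6),    // patch missing
    new AnnotationRow(fileId: 2, id: 12, hasPatch: true, vectorId: null),  // vector missing
    new AnnotationRow(fileId: 2, id: 13, hasPatch: false, vectorId: null), // both missing
];

// [['fileId' => 1, 'ids' => [11]], ['fileId' => 2, 'ids' => [12, 13]]]
$batches = batchMissing($rows, skipPatches: false, skipVectors: false);
```

Relying on the query's ordering rather than grouping in memory means only the current file's batch is held at a time, which is why the command pairs this loop with `lazy()` instead of loading all annotations at once.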