Merge pull request #72 from lightly-ai/develop
Pre-release 1.0.6 - Develop to Master
Showing 52 changed files with 1,977 additions and 430 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
source | ||
_data | ||
logos | ||
logos | ||
**lightning_logs |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
Datapool | ||
================= | ||
|
||
The Lightly Datapool is a tool which allows users to incrementally build up a | ||
dataset for their project. It keeps track of the representations of previously | ||
selected samples and uses this information to pick new samples in order to | ||
maximize the quality of the final dataset. It also allows for combining two | ||
different datasets into one. | ||
|
||
- | If you're interested in how the datapool works, go to | ||
| --> `How It Works`_ | ||
- | To see how you can use the datapool, check out | ||
| --> `Usage`_ | ||
|
||
How It Works | ||
--------------- | ||
|
||
The Lightly Datapool keeps track of the selected samples in a csv file called | ||
`datapool_latest.csv`. It contains the filenames of the selected images, their | ||
embeddings, and their weak labels. Additionally, after training a self-supervised | ||
model, the datapool contains the checkpoint `checkpoint_latest.ckpt` which was | ||
used to generate the embeddings. | ||
|
||
The datapool is located in the `shared` directory. In general, it is a directory | ||
with the following structure: | ||
|
||
|
||
.. code-block:: bash | ||
# example of a datapool | ||
datapool/ | ||
+--- datapool_latest.csv | ||
+--- checkpoint_latest.ckpt | ||
+--- history/ | ||
The files `datapool_latest.csv` and `checkpoint_latest.ckpt` are updated after every | ||
run of the Lightly Docker. The history folder contains the previous versions of | ||
the datapool. This feature is meant to prevent accidental overrides and can be | ||
deactivated from the command-line (see `Usage`_ for more information). | ||
|
||
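As a rough illustration of the layout described above, the following Python sketch reports which of the expected entries exist in a datapool directory. The helper name `describe_datapool` is hypothetical and not part of Lightly; only the file and folder names are taken from the description above. Since the checkpoint is only written after training a self-supervised model, a missing checkpoint is not necessarily an error.

```python
from pathlib import Path

# File and folder names taken from the datapool layout described above.
EXPECTED_FILES = ("datapool_latest.csv", "checkpoint_latest.ckpt")
HISTORY_DIR = "history"


def describe_datapool(datapool_dir):
    """Report which expected datapool entries exist (hypothetical helper)."""
    root = Path(datapool_dir)
    # Each expected file maps to True if it is present as a regular file.
    report = {name: (root / name).is_file() for name in EXPECTED_FILES}
    # The history folder holds previous versions of the datapool.
    report[HISTORY_DIR] = (root / HISTORY_DIR).is_dir()
    return report
```

Calling `describe_datapool("SHARED_DIR/my_datapool")` returns a dictionary mapping each entry name to whether it was found; the path shown here is only a placeholder.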
Usage | ||
--------------- | ||
|
||
To **initialize** a datapool, simply pass the name of the datapool as an argument | ||
to your docker run command and sample from a dataset as usual. The Lightly Docker | ||
will automatically create a datapool directory and populate it with the required | ||
files. | ||
|
||
.. note:: To use the datapool feature, the Lightly Docker requires write access | ||
to a shared directory. This directory can be passed with the `-v` flag. | ||
|
||
.. code-block:: console | ||
docker run --gpus all --rm -it \ | ||
-v INPUT_DIR:/home/input_dir:ro \ | ||
-v SHARED_DIR:/home/shared_dir \ | ||
-v OUTPUT_DIR:/home/output_dir \ | ||
lightly/sampling:latest \ | ||
token=MYAWESOMETOKEN \ | ||
append_weak_labels=False \ | ||
stopping_condition.min_distance=0.1 \ | ||
datapool.name=my_datapool | ||
To **append** to your datapool, pass the name of an existing datapool as an argument. | ||
The Lightly Docker will read the embeddings and filenames from the existing pool and | ||
consider them during sampling. Then, it will update the datapool and checkpoint files. | ||
|
||
.. note:: You can't change the dimension of the embeddings once the datapool has | ||
been initialized, so choose carefully! | ||
|
||
.. code-block:: console | ||
docker run --gpus all --rm -it \ | ||
-v OTHER_INPUT_DIR:/home/input_dir:ro \ | ||
-v SHARED_DIR:/home/shared_dir \ | ||
-v OUTPUT_DIR:/home/output_dir \ | ||
lightly/sampling:latest \ | ||
token=MYAWESOMETOKEN \ | ||
append_weak_labels=False \ | ||
stopping_condition.min_distance=0.1 \ | ||
datapool.name=my_datapool | ||
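Because the embedding dimension is fixed once the datapool has been initialized, it can be useful to compare dimensions before appending. The Python sketch below is one way to do that, under the assumption that embedding columns in both csv files are named `embedding_0`, `embedding_1`, and so on; that naming and the helper functions `embedding_dimension` and `can_append` are assumptions for illustration, not part of the Lightly Docker.

```python
import csv


def embedding_dimension(csv_path):
    """Count header columns named embedding_<i> (assumed naming scheme)."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))  # first row holds the column names
    return sum(1 for name in header if name.startswith("embedding_"))


def can_append(datapool_csv, new_embeddings_csv):
    """Return True if both files have the same number of embedding columns."""
    return embedding_dimension(datapool_csv) == embedding_dimension(new_embeddings_csv)
```

If `can_append` returns False, the new embeddings were likely generated with a different embedding dimension than the one used when the datapool was created.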
To **deactivate automatic archiving** of past datapool versions, set the flag | ||
`datapool.keep_history` to False. | ||
|
||
.. code-block:: console | ||
docker run --gpus all --rm -it \ | ||
-v INPUT_DIR:/home/input_dir:ro \ | ||
-v SHARED_DIR:/home/shared_dir \ | ||
-v OUTPUT_DIR:/home/output_dir \ | ||
lightly/sampling:latest \ | ||
token=MYAWESOMETOKEN \ | ||
append_weak_labels=False \ | ||
stopping_condition.min_distance=0.1 \ | ||
datapool.name=my_datapool \ | ||
datapool.keep_history=False |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
Meta Information | ||
====================== | ||
|
||
Depending on your current setup one of the following topics might interest you: | ||
|
||
- | You have a dataset but want Lightly to "ignore" certain samples. | ||
| --> `Mask Samples`_ | ||
- | You have an existing dataset and want to add only relevant new data. | ||
| --> `Use Pre-Selected Samples`_ | ||
- | You have your own (weak) labels. Can Lightly use this information to improve | ||
the selection? | ||
| --> `Custom Labels`_ | ||
|
||
Mask Samples | ||
----------------------------------- | ||
|
||
You can add masking information to the embeddings .csv file to prevent certain | ||
samples from being used. | ||
|
||
The following example shows a dataset in which the column "masked" is used | ||
to prevent Lightly Docker from using a specific sample. In this example, | ||
img-1.jpg is ignored and not considered for sampling, i.e., the sample | ||
is neither selected nor does it affect the selection of any other sample. | ||
|
||
.. list-table:: masked_embeddings.csv | ||
:widths: 50 50 50 50 50 | ||
:header-rows: 1 | ||
|
||
* - filenames | ||
- embedding_0 | ||
- embedding_1 | ||
- masked | ||
- labels | ||
* - img-1.jpg | ||
- 0.1 | ||
- 0.5 | ||
- 1 | ||
- 0 | ||
* - img-2.jpg | ||
- 0.2 | ||
- 0.2 | ||
- 0 | ||
- 0 | ||
* - img-3.jpg | ||
- 0.1 | ||
- 0.9 | ||
- 0 | ||
- 0 | ||
|
||
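To make the effect of the "masked" column concrete, here is a minimal Python sketch that reads the example table above as csv text and lists the samples that remain available for sampling. It only mimics the described behavior; the function `unmasked_filenames` is a hypothetical helper and not how Lightly Docker is implemented.

```python
import csv
import io

# Same content as the masked_embeddings.csv example above.
MASKED_CSV = """filenames,embedding_0,embedding_1,masked,labels
img-1.jpg,0.1,0.5,1,0
img-2.jpg,0.2,0.2,0,0
img-3.jpg,0.1,0.9,0,0
"""


def unmasked_filenames(csv_text):
    """Return the filenames whose 'masked' flag is not 1."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["filenames"] for row in rows if row["masked"].strip() != "1"]


print(unmasked_filenames(MASKED_CSV))  # ['img-2.jpg', 'img-3.jpg']
```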
|
||
Use Pre-Selected Samples | ||
----------------------------------- | ||
Similar to masking samples, you can also pre-select specific samples. This | ||
can be useful for semi-automated data selection processes. A human annotator | ||
can pre-select some of the relevant samples and let Lightly Docker add only | ||
those additional samples that enrich the existing selection. In the example | ||
below, the column "selected" marks img-3.jpg as pre-selected. | ||
|
||
|
||
.. list-table:: selected_embeddings.csv | ||
:widths: 50 50 50 50 50 | ||
:header-rows: 1 | ||
|
||
* - filenames | ||
- embedding_0 | ||
- embedding_1 | ||
- selected | ||
- labels | ||
* - img-1.jpg | ||
- 0.1 | ||
- 0.5 | ||
- 0 | ||
- 0 | ||
* - img-2.jpg | ||
- 0.2 | ||
- 0.2 | ||
- 0 | ||
- 0 | ||
* - img-3.jpg | ||
- 0.1 | ||
- 0.9 | ||
- 1 | ||
- 0 | ||
|
||
.. note:: Pre-selected samples also count toward the target number of samples. | ||
For example, suppose you have a dataset with 100 samples. If you pre-select | ||
60 and want to sample 50, sampling has no effect since more than 50 | ||
samples are already selected. | ||
|
||
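The arithmetic in the note above can be sketched in a few lines of Python. The function `additional_samples_needed` is a hypothetical helper that only mirrors the counting rule stated in the note; it is not taken from Lightly Docker.

```python
def additional_samples_needed(n_target, n_preselected):
    """Samples still to be picked once pre-selected ones are counted."""
    # Pre-selected samples count toward the target, so never go below zero.
    return max(0, n_target - n_preselected)


print(additional_samples_needed(50, 60))  # 0
print(additional_samples_needed(80, 60))  # 20
```

With 60 pre-selected samples and a target of 50, nothing more is sampled; with a target of 80, 20 additional samples would be picked.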
Custom Labels | ||
----------------------------------- | ||
|
||
You can always add custom labels to the dataset by following the guide | ||
here: :ref:`lightly-custom-labels` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Advanced | ||
=================================== | ||
Here you can learn about more advanced usage patterns of Lightly Docker. | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
|
||
meta_information.rst | ||
datapool.rst |