Merge pull request #72 from lightly-ai/develop

Pre-release 1.0.6 - Develop to Master
lightly-ai · Dec 10, 2020 · 1a5bc35 · 1a5bc35
2 parents bbc4b7d + d1b4711
commit 1a5bc35
Showing 52 changed files with 1,977 additions and 430 deletions.
diff --git a/.gitignore b/.gitignore
@@ -3,7 +3,11 @@
 **settings.json
 lightning_logs/
 **lightning_logs/
-docs/source/tutorials
+docs/source/tutorials/package/*
+docs/source/tutorials/platform/*
 docs/source/tutorials_source/platform/data
 docs/source/tutorials_source/platform/pizzas
-**.zip
+!docs/source/tutorials/package.rst
+!docs/source/tutorials/platform.rst
+!docs/source/tutorials/package/structure_your_input.rst
+**.zip
diff --git a/Makefile b/Makefile
@@ -28,6 +28,7 @@ clean-out:
 	rm -fr lightly_outputs/
 	rm -fr lightning_logs/
 	rm -fr lightly_epoch_*.ckpt
+	rm -fr last.ckpt
 
 ## remove tox cache
 clean-tox:

diff --git a/README.md b/README.md
@@ -19,7 +19,16 @@ Lightly is a computer vision framework for self-supervised learning.
 
 ## Quick Start
 
-Lightly requires Python 3.6+. We recommend installing Lightly in a **Linux** or **OSX** environment.
+Lightly requires **Python 3.6+**. We recommend installing Lightly in a **Linux** or **OSX** environment.
+
+### Requirements
+
+- hydra-core>=1.0.0
+- numpy>=1.18.1
+- pytorch_lightning>=0.10.0   
+- requests>=2.23.0
+- torchvision
+- tqdm
 
 ### Installation
 You can install Lightly and its dependencies from PyPI with:
@@ -29,6 +38,25 @@ pip install lightly
 
 We strongly recommend that you install Lightly in a dedicated virtualenv, to avoid conflicting with your system packages.
 
+### Command-Line Interface
+
+Lightly is also accessible through a command-line interface (CLI).
+To train a SimCLR model on a folder of images you can simply run
+the following command:
+
+```
+lightly-train input_dir=/mydataset
+```
+
+To create an embedding of a dataset you can use:
+
+```
+lightly-embed input_dir=/mydataset checkpoint=/mycheckpoint
+```
+
+The embeddings and their corresponding filenames are stored in a human-readable .csv file.
+
+
 ### Next Steps
 Head to the [documentation](https://docs.lightly.ai) and see the things you can achieve with Lightly!
 

diff --git a/docs/.gcloudignore b/docs/.gcloudignore
@@ -1,3 +1,4 @@
 source
 _data
-logos
+logos
+**lightning_logs
diff --git a/docs/source/basic_concepts/command_line_tool.rst b/docs/source/basic_concepts/command_line_tool.rst
diff --git a/docs/source/docker/advanced/datapool.rst b/docs/source/docker/advanced/datapool.rst
@@ -0,0 +1,101 @@
+Datapool
+=================
+
+The Lightly Datapool is a tool which allows users to incrementally build up a 
+dataset for their project. It keeps track of the representations of previously
+selected samples and uses this information to pick new samples in order to
+maximize the quality of the final dataset. It also allows for combining two 
+different datasets into one.
+
+- | If you're interested in how the datapool works, go to
+  | --> `How It Works`_
+
+- | To see how you can use the datapool, check out
+  | --> `Usage`_
+
+
+How It Works
+---------------
+
+The Lightly Datapool keeps track of the selected samples in a csv file called
+`datapool_latest.csv`. It contains the filenames of the selected images, their
+embeddings, and their weak labels. Additionally, after training a self-supervised
+model, the datapool contains the checkpoint `checkpoint_latest.ckpt` which was 
+used to generate the embeddings.
+
+The datapool is located in the `shared` directory. In general, it is a directory
+with the following structure:
+
+
+.. code-block:: bash
+
+    # example of a datapool
+    datapool/
+    +--- datapool_latest.csv
+    +--- checkpoint_latest.ckpt
+    +--- history/
+  
+The files `datapool_latest.csv` and `checkpoint_latest.ckpt` are updated after every
+run of the Lightly Docker. The history folder contains the previous versions of
+the datapool. This feature is meant to prevent accidental overwrites and can be
+deactivated from the command-line (see `Usage`_ for more information).
+
+Usage
+---------------
+
+To **initialize** a datapool, simply pass the name of the datapool as an argument
+to your docker run command and sample from a dataset as usual. The Lightly Docker
+will automatically create a datapool directory and populate it with the required
+files.
+
+.. note:: To use the datapool feature, the Lightly Docker requires write access
+          to a shared directory. This directory can be passed with the `-v` flag.
+
+.. code-block:: console
+
+   docker run --gpus all --rm -it \
+      -v INPUT_DIR:/home/input_dir:ro \
+      -v SHARED_DIR:/home/shared_dir \
+      -v OUTPUT_DIR:/home/output_dir \
+      lightly/sampling:latest \
+      token=MYAWESOMETOKEN \
+      append_weak_labels=False \
+      stopping_condition.min_distance=0.1 \
+      datapool.name=my_datapool
+
+
+To **append** to your datapool, pass the name of an existing datapool as an argument.
+The Lightly Docker will read the embeddings and filenames from the existing pool and
+consider them during sampling. Then, it will update the datapool and checkpoint files.
+
+.. note:: You can't change the dimension of the embeddings once the datapool has
+          been initialized so choose carefully!
+
+.. code-block:: console
+
+   docker run --gpus all --rm -it \
+      -v OTHER_INPUT_DIR:/home/input_dir:ro \
+      -v SHARED_DIR:/home/shared_dir \
+      -v OUTPUT_DIR:/home/output_dir \
+      lightly/sampling:latest \
+      token=MYAWESOMETOKEN \
+      append_weak_labels=False \
+      stopping_condition.min_distance=0.1 \
+      datapool.name=my_datapool
+
+
+To **deactivate automatic archiving** of the past datapool versions, set the
+flag `datapool.keep_history` to False.
+
+.. code-block:: console
+
+   docker run --gpus all --rm -it \
+      -v INPUT_DIR:/home/input_dir:ro \
+      -v SHARED_DIR:/home/shared_dir \
+      -v OUTPUT_DIR:/home/output_dir \
+      lightly/sampling:latest \
+      token=MYAWESOMETOKEN \
+      append_weak_labels=False \
+      stopping_condition.min_distance=0.1 \
+      datapool.name=my_datapool \
+      datapool.keep_history=False
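
The datapool layout described in datapool.rst above can be sanity-checked before a run. The following is a minimal sketch and not part of Lightly: the helper name `missing_datapool_entries` is hypothetical, and only the entry names `datapool_latest.csv`, `checkpoint_latest.ckpt`, and `history/` are taken from the documentation.

```python
from pathlib import Path

# Entries the documentation lists for a datapool directory.
EXPECTED_FILES = ("datapool_latest.csv", "checkpoint_latest.ckpt")
EXPECTED_DIRS = ("history",)


def missing_datapool_entries(datapool_dir):
    """Return the names of expected datapool entries that do not exist.

    Hypothetical helper for illustration; Lightly Docker does not ship it.
    """
    root = Path(datapool_dir)
    # Regular files first, then sub-directories (marked with a trailing slash).
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means all three entries are present. According to the documentation, the checkpoint only appears after a self-supervised model has been trained, so a missing `checkpoint_latest.ckpt` is not necessarily an error.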
diff --git a/docs/source/docker/advanced/meta_information.rst b/docs/source/docker/advanced/meta_information.rst
@@ -0,0 +1,96 @@
+Meta Information
+======================
+
+Depending on your current setup one of the following topics might interest you:
+
+- | You have a dataset but want Lightly to "ignore" certain samples.
+  | --> `Mask Samples`_
+
+- | You have an existing dataset and want to add only relevant new data.
+  | --> `Use Pre-Selected Samples`_
+
+- | You have your own (weak) labels. Can Lightly use this information to improve
+    the selection? 
+  | --> `Custom Labels`_
+
+
+Mask Samples
+-----------------------------------
+
+You can add masking information to the .csv file to prevent certain samples
+from being used.
+
+The following example shows a dataset in which the column "masked" is used
+to prevent Lightly Docker from using a specific sample. In this example,
+img-1.jpg is simply ignored and not considered for sampling. That is, the sample
+is neither selected nor does it affect the selection of any other sample.
+
+.. list-table:: masked_embeddings.csv
+   :widths: 50 50 50 50 50
+   :header-rows: 1
+
+   * - filenames
+     - embedding_0
+     - embedding_1
+     - masked
+     - labels
+   * - img-1.jpg
+     - 0.1
+     - 0.5
+     - 1
+     - 0
+   * - img-2.jpg
+     - 0.2
+     - 0.2
+     - 0
+     - 0
+   * - img-3.jpg
+     - 0.1
+     - 0.9
+     - 0
+     - 0
+
+
+Use Pre-Selected Samples
+-----------------------------------
+Similar to masking samples, you can also pre-select specific samples. This
+can be useful for semi-automated data selection processes. A human annotator
+can pre-select some of the relevant samples and let Lightly Docker add only
+those additional samples that enrich the existing selection.
+
+
+.. list-table:: selected_embeddings.csv
+   :widths: 50 50 50 50 50
+   :header-rows: 1
+
+   * - filenames
+     - embedding_0
+     - embedding_1
+     - selected
+     - labels
+   * - img-1.jpg
+     - 0.1
+     - 0.5
+     - 0
+     - 0
+   * - img-2.jpg
+     - 0.2
+     - 0.2
+     - 0
+     - 0
+   * - img-3.jpg
+     - 0.1
+     - 0.9
+     - 1
+     - 0
+
+.. note:: Pre-selected samples also count toward the target number of samples.
+          For example, suppose you have a dataset with 100 samples. If you
+          pre-select 60 and want to sample 50, sampling has no effect since
+          more than 50 samples are already selected.
+
+Custom Labels
+-----------------------------------
+
+You can always add custom labels to the dataset by following the guide
+here: :ref:`lightly-custom-labels`
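
The `masked` and `selected` columns shown in the tables above can be added to an embeddings file with a few lines of Python. This is a sketch under assumptions: the helper `add_flag_column` is hypothetical and not part of the Lightly API, the input is assumed to be comma-separated text with a `filenames` and a `labels` column, and placing the new column directly before `labels` simply mirrors the example tables.

```python
import csv
import io


def add_flag_column(csv_text, column, flagged_filenames):
    """Insert a 0/1 column (e.g. "masked" or "selected") before "labels".

    Hypothetical helper for illustration; not part of the Lightly API.
    """
    flagged = set(flagged_filenames)
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    name_idx = header.index("filenames")
    insert_at = header.index("labels")  # new column goes right before labels

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header[:insert_at] + [column] + header[insert_at:])
    for row in body:
        # Flag the row with 1 if its filename was listed, otherwise 0.
        flag = "1" if row[name_idx] in flagged else "0"
        writer.writerow(row[:insert_at] + [flag] + row[insert_at:])
    return out.getvalue()


# Embeddings without a flag column, using the values from the example tables.
embeddings = (
    "filenames,embedding_0,embedding_1,labels\n"
    "img-1.jpg,0.1,0.5,0\n"
    "img-2.jpg,0.2,0.2,0\n"
    "img-3.jpg,0.1,0.9,0\n"
)
masked_csv = add_flag_column(embeddings, "masked", ["img-1.jpg"])
selected_csv = add_flag_column(embeddings, "selected", ["img-3.jpg"])
```

Here `masked_csv` reproduces the rows of the masked_embeddings.csv table and `selected_csv` those of selected_embeddings.csv.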
diff --git a/docs/source/docker/advanced/overview.rst b/docs/source/docker/advanced/overview.rst
@@ -0,0 +1,10 @@
+Advanced
+===================================
+Here you can learn about more advanced usage patterns of Lightly Docker.
+
+
+.. toctree::
+   :maxdepth: 2
+
+   meta_information.rst
+   datapool.rst