Merge pull request #248 from lightly-ai/develop
Develop to master - Pre-release 1.1.2
IgorSusmelj authored Mar 18, 2021
2 parents e3dcedd + 212a5b4 commit 356bd3f
Showing 68 changed files with 1,406 additions and 257 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
lightly/openapi_generated/*
3 changes: 2 additions & 1 deletion .github/workflows/test.yml
@@ -1,6 +1,6 @@
name: Unit Tests

on: [push]
on: [push, workflow_dispatch]

jobs:
test:
@@ -27,6 +27,7 @@ jobs:
run: pip install -e '.[all]'
- name: Run Pytest
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
pip install pytest-cov
python -m pytest -s -v --runslow --cov=./lightly --cov-report=xml --ignore=./lightly/openapi_generated/
- name: Upload coverage to Codecov
4 changes: 3 additions & 1 deletion .github/workflows/test_setup.yml
@@ -1,5 +1,5 @@
name: check setup.py
on: [push]
on: [push, workflow_dispatch]

jobs:
test:
@@ -29,13 +29,15 @@ jobs:
pip install "git+https://github.com/lightly-ai/lightly.git@$BRANCH_NAME"
- name: basic tests of CLI
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
lightly-train --help
lightly-embed --help
lightly-upload --help
lightly-magic --help
lightly-download --help
- name: test of CLI on a real dataset
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
git clone https://github.com/alexeygrigorev/clothing-dataset-small clothing_dataset_small
INPUT_DIR_1="clothing_dataset_small/test/dress"
lightly-train input_dir=$INPUT_DIR_1 trainer.max_epochs=1 loader.num_workers=6
1 change: 1 addition & 0 deletions .gitignore
@@ -32,3 +32,4 @@ lightly_outputs
#ignore eggs
.eggs
tests/UNMOCKED_end2end_tests/call_test_api.py
tests/UNMOCKED_end2end_tests/get_versions_all_apis.py
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -118,7 +118,7 @@ Follow these steps to start contributing:

```bash
$ git fetch upstream
$ git rebase upstream/master
$ git rebase upstream/develop
```

Push the changes to your account using:
2 changes: 1 addition & 1 deletion Makefile
@@ -47,7 +47,7 @@ lint-tests:

## run tests
test:
pytest tests -n 4 --runslow
pytest tests --runslow

## build source and wheel package
dist: clean
17 changes: 15 additions & 2 deletions README.md
@@ -16,6 +16,13 @@ Lightly is a computer vision framework for self-supervised learning.
- [Github](https://github.com/lightly-ai/lightly)
- [Discord](https://discord.gg/xvNJW94)

### Supported Models

- [MoCo, 2019](https://arxiv.org/abs/1911.05722)
- [SimCLR, 2020](https://arxiv.org/abs/2002.05709)
- [SimSiam, 2021](https://arxiv.org/abs/2011.10566)
- [Barlow Twins, 2021](https://arxiv.org/abs/2103.03230)
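
A minimal sketch of putting one of these models together with lightly (hedged: the exact constructor signatures can vary between versions, and `num_ftrs=512` assumes a ResNet-18 backbone):

```python
import torch
import torchvision

import lightly.loss as loss
import lightly.models as models

# Use a torchvision ResNet-18 without its classification head as the backbone.
resnet = torchvision.models.resnet18()
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

# Wrap the backbone with a SimCLR projection head and pair it with the NT-Xent loss.
model = models.SimCLR(backbone, num_ftrs=512)
criterion = loss.NTXentLoss()
```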


### Tutorials

@@ -24,11 +31,13 @@ Want to jump to the tutorials and see lightly in action?
- [Train MoCo on CIFAR-10](https://docs.lightly.ai/tutorials/package/tutorial_moco_memory_bank.html)
- [Train SimCLR on clothing data](https://docs.lightly.ai/tutorials/package/tutorial_simclr_clothing.html)
- [Train SimSiam on satellite images](https://docs.lightly.ai/tutorials/package/tutorial_simsiam_esa.html)
- [Use lightly with custom augmentations](https://docs.lightly.ai/tutorials/package/tutorial_custom_augmentations.html)


### Benchmarks

Currently implemented models and their accuracy on CIFAR-10. All models have been evaluated using kNN; we report the maximum test accuracy over the epochs as well as the peak GPU memory consumption (a minimal sketch of the kNN evaluation follows the table). All models in this benchmark use the same augmentations and the same ResNet-18 backbone. Training precision is set to FP32 and SGD with a cosine learning-rate schedule is used as the optimizer.
One epoch on CIFAR-10 takes ~35 seconds on a V100 GPU. [Learn more about the CIFAR-10 benchmark here](https://docs.lightly.ai/getting_started/benchmarks.html)

| Model | Epochs | Batch Size | Test Accuracy | Peak GPU usage |
|---------|--------|------------|---------------|----------------|
@@ -38,6 +47,9 @@
| MoCo | 200 | 512 | 0.85 | 7.4 GBytes |
| SimCLR | 200 | 512 | 0.83 | 7.8 GBytes |
| SimSiam | 200 | 512 | 0.81 | 7.0 GBytes |
| MoCo | 800 | 128 | 0.89 | 2.1 GBytes |
| SimCLR | 800 | 128 | 0.87 | 1.9 GBytes |
| SimSiam | 800 | 128 | 0.80 | 2.0 GBytes |
| MoCo | 800 | 512 | 0.90 | 7.2 GBytes |
| SimCLR | 800 | 512 | 0.89 | 7.7 GBytes |
| SimSiam | 800 | 512 | 0.91 | 6.9 GBytes |
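
As referenced above, a minimal sketch of such a kNN evaluation (hedged: `k=200` and cosine similarity are common choices for this protocol; the exact benchmark settings are in the linked docs):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=200):
    # Normalize embeddings to unit length so neighbors are ranked by cosine similarity.
    train_emb = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_emb = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_emb, train_labels)
    return knn.score(test_emb, test_labels)
```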
@@ -59,7 +71,7 @@

- hydra-core>=1.0.0
- numpy>=1.18.1
- pytorch_lightning>=0.10.0
- pytorch_lightning>=1.0.4
- requests>=2.23.0
- torchvision
- tqdm
@@ -88,7 +100,8 @@ To create an embedding of a dataset you can use:
lightly-embed input_dir=/mydataset checkpoint=/mycheckpoint
```

The embeddings with the corresponding filename are stored in a human-readable .csv file.
The embeddings with the corresponding filename are stored in a
[human-readable .csv file](https://docs.lightly.ai/getting_started/command_line_tool.html#create-embeddings-using-the-cli).
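
A minimal sketch for reading that file (hedged: the column layout — a `filenames` column, one column per embedding dimension, and a `labels` column — is assumed from the linked docs page):

```python
import pandas as pd

df = pd.read_csv("embeddings.csv")  # path written by lightly-embed
filenames = df["filenames"].tolist()
embeddings = df.filter(regex=r"^embedding_").to_numpy()
print(embeddings.shape)  # (number of images, embedding dimension)
```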

### Next Steps
Head to the [documentation](https://docs.lightly.ai) and see the things you can achieve with Lightly!
5 changes: 5 additions & 0 deletions docs/README.md
@@ -13,6 +13,11 @@ pip install sphinx_rtd_theme
make html
```

As a shortcut, build the docs (with the env variables required by the active-learning tutorial) and serve them with:
```
LIGHTLY_SERVER_LOCATION='https://api.lightly.ai' TOKEN='YOUR_TOKEN' AL_TUTORIAL_DATASET_ID='YOUR_DATASET_ID' make html && python -m http.server 1234 -d build/html
```

After building, you can serve the docs with `python -m http.server 1234 -d build/html` from the docs folder.
Open a browser and go to `http://localhost:1234` to see the documentation.

2 changes: 1 addition & 1 deletion docs/source/docker/advanced/pretagging.rst
@@ -45,7 +45,7 @@ before filtering.
For every docker run with pretagging enabled we also dump all model predictions
into a json file with the following format:

.. code-block:: json
.. code-block:: javascript
// boxes have format x1, y1, x2, y2
[
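
Note that the ``//`` comment makes the dump JavaScript-flavored rather than strictly valid JSON, which is why the code-block language changed above. A minimal sketch for loading such a file, assuming the comment lines are the only deviation from JSON and ``predictions.json`` is a placeholder path:

.. code-block:: python

    import json

    # Drop the JavaScript-style comment lines, then parse the rest as JSON.
    with open("predictions.json") as f:
        text = "\n".join(line for line in f if not line.lstrip().startswith("//"))
    predictions = json.loads(text)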
11 changes: 10 additions & 1 deletion docs/source/docker/getting_started/first_steps.rst
@@ -57,7 +57,7 @@ There are **three** types of volume mappings:
* **Input Directory:**
The input directory contains the dataset we want to process. The format of the input data should be either a single
folder containing all the images or a folder containing a subfolder which holds the images.
See the tutorial "Structure Your Input" for more information.
See the tutorial :ref:`input-structure-label` for more information.
The container has only **read access** to this directory (note the *:ro* at
the end of the volume mapping).
* **Shared Directory:**
@@ -124,6 +124,15 @@ The command above does the following:
will be 30% of the initial dataset size. You can also specify the exact
number of remaining images by setting **n_samples** to an integer value.

- **stopping_condition.min_distance=0.2** would remove all samples which are
  closer to each other than 0.2.

  This allows you to specify the minimum allowed distance between two image
  embeddings in the output dataset. After normalizing the input embeddings
  to unit length, this value should be between 0 and 2 (a quick numerical
  check of this range follows below). This is often a more convenient method
  when working with different data sources and trying to combine them in a
  balanced way.
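
A quick numerical check of that range (a sketch with random vectors, not Lightly output):

.. code-block:: python

    import numpy as np

    # Random embeddings, normalized to unit length.
    emb = np.random.randn(100, 32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # All pairwise Euclidean distances between unit vectors lie in [0, 2].
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    print(dists.min(), dists.max())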


Train a Self-Supervised Model
-----------------------------------
2 changes: 1 addition & 1 deletion docs/source/docker/getting_started/setup.rst
@@ -23,7 +23,7 @@ container has a working internet connection and has access to
https://api.lightly.ai.


Download image
Download the Docker Image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ask your account manager from Lightly for the credentials
19 changes: 19 additions & 0 deletions docs/source/docker/known_issues_faq.rst
@@ -3,6 +3,25 @@
Known Issues and FAQ
===================================

Docker is slow when working with long videos
---------------------------------------------------

We are working on this issue internally. For now, we suggest splitting long
videos into chunks. You can do this with ffmpeg without losing quality: the
following command segments the video so that no re-encoding is needed.

.. code-block:: console

    ffmpeg -i input.mp4 -c copy -map 0 -segment_time 01:00:00 -f segment -reset_timestamps 1 output%03d.mp4

What exactly happens here?

- `input.mp4`, this is your input video
- `-c copy -map 0`, this makes sure we just copy and don't re-encode the video
- `-segment_time 01:00:00 -f segment`, defines that we want chunks of 1h each
- `-reset_timestamps 1`, makes sure we reset the timestamps (each video starts from 0)
- `output%03d.mp4`, name of the output videos (output001.mp4, output002.mp4, ...)


Shared Memory Error when running Lightly Docker
-----------------------------------------------
8 changes: 8 additions & 0 deletions docs/source/docker/overview.rst
@@ -12,8 +12,16 @@ and an easy way to work with lightly. But there is more!
With the introduction of our on-premise solution, you can process larger datasets completely on your end without data leaving your infrastructure.
We worked hard to make this happen and are very proud to present you with the following specs:

* **NEW** Lightly Docker has built-in pretagging models (see :ref:`ref-docker-pretagging` )

* Use this feature to pre-label your dataset or to only select images which contain certain objects

* Supported object categories are: bicycle, bus, car, motorcycle, person, train, truck

* Sample from more than 1 million images within a few hours!

* Runs directly with videos without prior extraction of the frames!

* Wrapped in a docker container (no setup required if your system supports docker)

* Configurable
8 changes: 6 additions & 2 deletions docs/source/getting_started/active_learning.rst
@@ -1,11 +1,15 @@
.. _lightly-active-learning:

Active Learning
Active learning
===================
Lightly enables active learning with only a few lines of additional code. Learn
here how to get the most out of your data by maximizing the available information
in your annotated dataset.

.. figure:: images/al_accuracy_plot.png

Plot showing the different samples and how they perform on the clothing dataset.

Preparations
-----------------
Before you read on, make sure you have read the section on the :ref:`lightly-platform`.
@@ -20,7 +24,7 @@ Lightly makes use of the following concepts for active learning:

* **ApiWorkflowClient:** :py:class:`lightly.api.api_workflow_client.ApiWorkflowClient`
The `ApiWorkflowClient` is used to connect to our API. The API handles the
selection of the images based on embeddings and active-learning scores. To initialize
selection of the images based on embeddings and active learning scores. To initialize
the `ApiWorkflowClient`, you will need the `datasetId` and the `token` from the
:ref:`lightly-platform`.
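
A minimal sketch of initializing the client (``MY_TOKEN`` and ``MY_DATASET_ID`` are placeholders for the values from the platform):

.. code-block:: python

    from lightly.api.api_workflow_client import ApiWorkflowClient

    # Token and dataset id come from the Lightly platform.
    client = ApiWorkflowClient(token="MY_TOKEN", dataset_id="MY_DATASET_ID")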
