Merge pull request #248 from lightly-ai/develop
Develop to master - Pre-release 1.1.2
IgorSusmelj authored Mar 18, 2021
2 parents e3dcedd + 212a5b4 commit 356bd3f
Showing 68 changed files with 1,406 additions and 257 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
lightly/openapi_generated/*
3 changes: 2 additions & 1 deletion .github/workflows/test.yml
@@ -1,6 +1,6 @@
name: Unit Tests

on: [push]
on: [push, workflow_dispatch]

jobs:
test:
@@ -27,6 +27,7 @@ jobs:
run: pip install -e '.[all]'
- name: Run Pytest
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
pip install pytest-cov
python -m pytest -s -v --runslow --cov=./lightly --cov-report=xml --ignore=./lightly/openapi_generated/
- name: Upload coverage to Codecov
4 changes: 3 additions & 1 deletion .github/workflows/test_setup.yml
@@ -1,5 +1,5 @@
name: check setup.py
on: [push]
on: [push, workflow_dispatch]

jobs:
test:
@@ -29,13 +29,15 @@ jobs:
pip install "git+https://github.com/lightly-ai/lightly.git@$BRANCH_NAME"
- name: basic tests of CLI
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
lightly-train --help
lightly-embed --help
lightly-upload --help
lightly-magic --help
lightly-download --help
- name: test of CLI on a real dataset
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
git clone https://github.com/alexeygrigorev/clothing-dataset-small clothing_dataset_small
INPUT_DIR_1="clothing_dataset_small/test/dress"
lightly-train input_dir=$INPUT_DIR_1 trainer.max_epochs=1 loader.num_workers=6
1 change: 1 addition & 0 deletions .gitignore
@@ -32,3 +32,4 @@ lightly_outputs
#ignore eggs
.eggs
tests/UNMOCKED_end2end_tests/call_test_api.py
tests/UNMOCKED_end2end_tests/get_versions_all_apis.py
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -118,7 +118,7 @@ Follow these steps to start contributing:

```bash
$ git fetch upstream
$ git rebase upstream/master
$ git rebase upstream/develop
```

Push the changes to your account using:
2 changes: 1 addition & 1 deletion Makefile
@@ -47,7 +47,7 @@ lint-tests:

## run tests
test:
pytest tests -n 4 --runslow
pytest tests --runslow

## build source and wheel package
dist: clean
17 changes: 15 additions & 2 deletions README.md
@@ -16,6 +16,13 @@ Lightly is a computer vision framework for self-supervised learning.
- [Github](https://github.com/lightly-ai/lightly)
- [Discord](https://discord.gg/xvNJW94)

### Supported Models

- [MoCo, 2019](https://arxiv.org/abs/1911.05722)
- [SimCLR, 2020](https://arxiv.org/abs/2002.05709)
- [SimSiam, 2021](https://arxiv.org/abs/2011.10566)
- [Barlow Twins, 2021](https://arxiv.org/abs/2103.03230)
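
A minimal sketch of putting one of these models together with lightly (hedged: the exact constructor signatures can vary between versions, and `num_ftrs=512` assumes a ResNet-18 backbone):

```python
import torch
import torchvision

import lightly.loss as loss
import lightly.models as models

# Use a torchvision ResNet-18 without its classification head as the backbone.
resnet = torchvision.models.resnet18()
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

# Wrap the backbone with a SimCLR projection head and pair it with the NT-Xent loss.
model = models.SimCLR(backbone, num_ftrs=512)
criterion = loss.NTXentLoss()
```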


### Tutorials

@@ -24,11 +31,13 @@ Want to jump to the tutorials and see lightly in action?
- [Train MoCo on CIFAR-10](https://docs.lightly.ai/tutorials/package/tutorial_moco_memory_bank.html)
- [Train SimCLR on clothing data](https://docs.lightly.ai/tutorials/package/tutorial_simclr_clothing.html)
- [Train SimSiam on satellite images](https://docs.lightly.ai/tutorials/package/tutorial_simsiam_esa.html)
- [Use lightly with custom augmentations](https://docs.lightly.ai/tutorials/package/tutorial_custom_augmentations.html)


### Benchmarks

Currently implemented models and their accuracy on CIFAR-10. All models have been evaluated using kNN; we report the maximum test accuracy over the epochs as well as the peak GPU memory consumption (a minimal sketch of the kNN evaluation follows the table). All models in this benchmark use the same augmentations and the same ResNet-18 backbone. Training precision is set to FP32 and SGD with a cosine learning-rate schedule is used as the optimizer.
One epoch on CIFAR-10 takes ~35 seconds on a V100 GPU. [Learn more about the CIFAR-10 benchmark here](https://docs.lightly.ai/getting_started/benchmarks.html)

| Model | Epochs | Batch Size | Test Accuracy | Peak GPU usage |
|---------|--------|------------|---------------|----------------|
@@ -38,6 +47,9 @@
| MoCo | 200 | 512 | 0.85 | 7.4 GBytes |
| SimCLR | 200 | 512 | 0.83 | 7.8 GBytes |
| SimSiam | 200 | 512 | 0.81 | 7.0 GBytes |
| MoCo | 800 | 128 | 0.89 | 2.1 GBytes |
| SimCLR | 800 | 128 | 0.87 | 1.9 GBytes |
| SimSiam | 800 | 128 | 0.80 | 2.0 GBytes |
| MoCo | 800 | 512 | 0.90 | 7.2 GBytes |
| SimCLR | 800 | 512 | 0.89 | 7.7 GBytes |
| SimSiam | 800 | 512 | 0.91 | 6.9 GBytes |
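
As referenced above, a minimal sketch of such a kNN evaluation (hedged: `k=200` and cosine similarity are common choices for this protocol; the exact benchmark settings are in the linked docs):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=200):
    # Normalize embeddings to unit length so neighbors are ranked by cosine similarity.
    train_emb = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_emb = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_emb, train_labels)
    return knn.score(test_emb, test_labels)
```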
@@ -59,7 +71,7 @@

- hydra-core>=1.0.0
- numpy>=1.18.1
- pytorch_lightning>=0.10.0
- pytorch_lightning>=1.0.4
- requests>=2.23.0
- torchvision
- tqdm
@@ -88,7 +100,8 @@ To create an embedding of a dataset you can use:
lightly-embed input_dir=/mydataset checkpoint=/mycheckpoint
```

The embeddings with the corresponding filename are stored in a human-readable .csv file.
The embeddings with the corresponding filename are stored in a
[human-readable .csv file](https://docs.lightly.ai/getting_started/command_line_tool.html#create-embeddings-using-the-cli).
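
A minimal sketch for reading that file (hedged: the column layout — a `filenames` column, one column per embedding dimension, and a `labels` column — is assumed from the linked docs page):

```python
import pandas as pd

df = pd.read_csv("embeddings.csv")  # path written by lightly-embed
filenames = df["filenames"].tolist()
embeddings = df.filter(regex=r"^embedding_").to_numpy()
print(embeddings.shape)  # (number of images, embedding dimension)
```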

### Next Steps
Head to the [documentation](https://docs.lightly.ai) and see the things you can achieve with Lightly!
5 changes: 5 additions & 0 deletions docs/README.md
@@ -13,6 +13,11 @@ pip install sphinx_rtd_theme
make html
```

As a shortcut, build the docs (with the env variables required by the active-learning tutorial) and serve them with:
```
LIGHTLY_SERVER_LOCATION='https://api.lightly.ai' TOKEN='YOUR_TOKEN' AL_TUTORIAL_DATASET_ID='YOUR_DATASET_ID' make html && python -m http.server 1234 -d build/html
```

After building, you can serve the docs with `python -m http.server 1234 -d build/html` from the docs folder.
Open a browser and go to `http://localhost:1234` to see the documentation.

2 changes: 1 addition & 1 deletion docs/source/docker/advanced/pretagging.rst
@@ -45,7 +45,7 @@ before filtering.
For every docker run with pretagging enabled we also dump all model predictions
into a json file with the following format:

.. code-block:: json
.. code-block:: javascript
// boxes have format x1, y1, x2, y2
[
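
Note that the ``//`` comment makes the dump JavaScript-flavored rather than strictly valid JSON, which is why the code-block language changed above. A minimal sketch for loading such a file, assuming the comment lines are the only deviation from JSON and ``predictions.json`` is a placeholder path:

.. code-block:: python

    import json

    # Drop the JavaScript-style comment lines, then parse the rest as JSON.
    with open("predictions.json") as f:
        text = "\n".join(line for line in f if not line.lstrip().startswith("//"))
    predictions = json.loads(text)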
11 changes: 10 additions & 1 deletion docs/source/docker/getting_started/first_steps.rst
@@ -57,7 +57,7 @@ There are **three** types of volume mappings:
* **Input Directory:**
The input directory contains the dataset we want to process. The format of the input data should be either a single
folder containing all the images or a folder containing a subfolder which holds the images.
See the tutorial "Structure Your Input" for more information.
See the tutorial :ref:`input-structure-label` for more information.
The container has only **read access** to this directory (note the *:ro* at
the end of the volume mapping).
* **Shared Directory:**
@@ -124,6 +124,15 @@ The command above does the following:
will be 30% of the initial dataset size. You can also specify the exact
number of remaining images by setting **n_samples** to an integer value.

- **stopping_condition.min_distance=0.2** would remove all samples which are
  closer to each other than 0.2.

  This allows you to specify the minimum allowed distance between two image
  embeddings in the output dataset. After normalizing the input embeddings
  to unit length, this value should be between 0 and 2 (a quick numerical
  check of this range follows below). This is often a more convenient method
  when working with different data sources and trying to combine them in a
  balanced way.
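
A quick numerical check of that range (a sketch with random vectors, not Lightly output):

.. code-block:: python

    import numpy as np

    # Random embeddings, normalized to unit length.
    emb = np.random.randn(100, 32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # All pairwise Euclidean distances between unit vectors lie in [0, 2].
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    print(dists.min(), dists.max())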


Train a Self-Supervised Model
-----------------------------------
2 changes: 1 addition & 1 deletion docs/source/docker/getting_started/setup.rst
@@ -23,7 +23,7 @@ container has a working internet connection and has access to
https://api.lightly.ai.


Download image
Download the Docker Image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ask your account manager from Lightly for the credentials
19 changes: 19 additions & 0 deletions docs/source/docker/known_issues_faq.rst
@@ -3,6 +3,25 @@
Known Issues and FAQ
===================================

Docker is slow when working with long videos
---------------------------------------------------

We are working on this issue internally. For now, we suggest splitting long
videos into chunks. You can do this with ffmpeg without losing quality: the
following command segments the video so that no re-encoding is needed.

.. code-block:: console

    ffmpeg -i input.mp4 -c copy -map 0 -segment_time 01:00:00 -f segment -reset_timestamps 1 output%03d.mp4

What exactly happens here?

- `input.mp4`, this is your input video
- `-c copy -map 0`, this makes sure we just copy and don't re-encode the video
- `-segment_time 01:00:00 -f segment`, defines that we want chunks of 1h each
- `-reset_timestamps 1`, makes sure we reset the timestamps (each video starts from 0)
- `output%03d.mp4`, name of the output videos (output001.mp4, output002.mp4, ...)


Shared Memory Error when running Lightly Docker
-----------------------------------------------
8 changes: 8 additions & 0 deletions docs/source/docker/overview.rst
@@ -12,8 +12,16 @@ and an easy way to work with lightly. But there is more!
With the introduction of our on-premise solution, you can process larger datasets completely on your end without data leaving your infrastructure.
We worked hard to make this happen and are very proud to present you with the following specs:

* **NEW** Lightly Docker has built-in pretagging models (see :ref:`ref-docker-pretagging` )

* Use this feature to pre-label your dataset or to only select images which contain certain objects

* Supported object categories are: bicycle, bus, car, motorcycle, person, train, truck

* Sample from more than 1 million images within a few hours!

* Runs directly with videos without prior extraction of the frames!

* Wrapped in a docker container (no setup required if your system supports docker)

* Configurable
8 changes: 6 additions & 2 deletions docs/source/getting_started/active_learning.rst
@@ -1,11 +1,15 @@
.. _lightly-active-learning:

Active Learning
Active learning
===================
Lightly enables active learning with only a few lines of additional code. Learn
here how to get the most out of your data by maximizing the available information
in your annotated dataset.

.. figure:: images/al_accuracy_plot.png

Plot showing the different samples and how they perform on the clothing dataset.

Preparations
-----------------
Before you read on, make sure you have read the section on the :ref:`lightly-platform`.
@@ -20,7 +24,7 @@ Lightly makes use of the following concepts for active learning:

* **ApiWorkflowClient:** :py:class:`lightly.api.api_workflow_client.ApiWorkflowClient`
The `ApiWorkflowClient` is used to connect to our API. The API handles the
selection of the images based on embeddings and active-learning scores. To initialize
selection of the images based on embeddings and active learning scores. To initialize
the `ApiWorkflowClient`, you will need the `datasetId` and the `token` from the
:ref:`lightly-platform`.
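
A minimal sketch of initializing the client (``MY_TOKEN`` and ``MY_DATASET_ID`` are placeholders for the values from the platform):

.. code-block:: python

    from lightly.api.api_workflow_client import ApiWorkflowClient

    # Token and dataset id come from the Lightly platform.
    client = ApiWorkflowClient(token="MY_TOKEN", dataset_id="MY_DATASET_ID")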
