Merge pull request #72 from lightly-ai/develop
Pre-release 1.0.6 - Develop to Master
IgorSusmelj authored Dec 10, 2020
2 parents bbc4b7d + d1b4711 commit 1a5bc35
Showing 52 changed files with 1,977 additions and 430 deletions.
8 changes: 6 additions & 2 deletions .gitignore
@@ -3,7 +3,11 @@
**settings.json
lightning_logs/
**lightning_logs/
docs/source/tutorials
docs/source/tutorials/package/*
docs/source/tutorials/platform/*
docs/source/tutorials_source/platform/data
docs/source/tutorials_source/platform/pizzas
**.zip
!docs/source/tutorials/package.rst
!docs/source/tutorials/platform.rst
!docs/source/tutorials/package/structure_your_input.rst
**.zip
1 change: 1 addition & 0 deletions Makefile
@@ -28,6 +28,7 @@ clean-out:
rm -fr lightly_outputs/
rm -fr lightning_logs/
rm -fr lightly_epoch_*.ckpt
rm -fr last.ckpt

## remove tox cache
clean-tox:
30 changes: 29 additions & 1 deletion README.md
@@ -19,7 +19,16 @@ Lightly is a computer vision framework for self-supervised learning.

## Quick Start

Lightly requires Python 3.6+. We recommend installing Lightly in a **Linux** or **OSX** environment.
Lightly requires **Python 3.6+**. We recommend installing Lightly in a **Linux** or **OSX** environment.

### Requirements

- hydra-core>=1.0.0
- numpy>=1.18.1
- pytorch_lightning>=0.10.0
- requests>=2.23.0
- torchvision
- tqdm

### Installation
You can install Lightly and its dependencies from PyPI with:
@@ -29,6 +38,25 @@ pip install lightly

We strongly recommend that you install Lightly in a dedicated virtualenv, to avoid conflicting with your system packages.

### Command-Line Interface

Lightly is also accessible through a command-line interface (CLI).
To train a SimCLR model on a folder of images you can simply run
the following command:

```
lightly-train input_dir=/mydataset
```

To create an embedding of a dataset you can use:

```
lightly-embed input_dir=/mydataset checkpoint=/mycheckpoint
```

The embeddings, together with the corresponding filenames, are stored in a human-readable .csv file.


### Next Steps
Head to the [documentation](https://docs.lightly.ai) and see the things you can achieve with Lightly!

3 changes: 2 additions & 1 deletion docs/.gcloudignore
@@ -1,3 +1,4 @@
source
_data
logos
logos
**lightning_logs
69 changes: 0 additions & 69 deletions docs/source/basic_concepts/command_line_tool.rst

This file was deleted.

101 changes: 101 additions & 0 deletions docs/source/docker/advanced/datapool.rst
@@ -0,0 +1,101 @@
Datapool
=================

The Lightly Datapool is a tool which allows users to incrementally build up a
dataset for their project. It keeps track of the representations of previously
selected samples and uses this information to pick new samples in order to
maximize the quality of the final dataset. It also allows for combining two
different datasets into one.

- | If you're interested in how the datapool works, go to
  | --> `How It Works`_
- | To see how you can use the datapool, check out
  | --> `Usage`_

How It Works
---------------

The Lightly Datapool keeps track of the selected samples in a csv file called
`datapool_latest.csv`. It contains the filenames of the selected images, their
embeddings, and their weak labels. Additionally, after training a self-supervised
model, the datapool contains the checkpoint `checkpoint_latest.ckpt` which was
used to generate the embeddings.

The datapool is located in the `shared` directory. In general, it is a directory
with the following structure:


.. code-block:: bash

    # example of a datapool
    datapool/
    +--- datapool_latest.csv
    +--- checkpoint_latest.ckpt
    +--- history/

The files `datapool_latest.csv` and `checkpoint_latest.ckpt` are updated after every
run of the Lightly Docker. The history folder contains the previous versions of
the datapool. This feature is meant to prevent accidental overwrites and can be
deactivated from the command line (see `Usage`_ for more information).
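
The layout described above can be checked with a few lines of Python. This is a minimal sketch, not part of Lightly: the file and folder names come from the description above, while the function name `describe_datapool` and the shape of its return value are hypothetical.

```python
from pathlib import Path
import tempfile

# File and folder names follow the datapool layout described above.
# The function name and the returned dictionary are hypothetical.
EXPECTED_FILES = ("datapool_latest.csv", "checkpoint_latest.ckpt")
HISTORY_DIR = "history"


def describe_datapool(datapool_dir):
    """Report which of the expected datapool entries exist."""
    root = Path(datapool_dir)
    status = {name: (root / name).is_file() for name in EXPECTED_FILES}
    status[HISTORY_DIR] = (root / HISTORY_DIR).is_dir()
    return status


# Build a throwaway datapool that has no checkpoint yet.
with tempfile.TemporaryDirectory() as tmp:
    pool = Path(tmp) / "my_datapool"
    (pool / HISTORY_DIR).mkdir(parents=True)
    (pool / "datapool_latest.csv").write_text("filenames,embedding_0\n")
    demo_status = describe_datapool(pool)

print(demo_status)
```

A check like this can be useful before appending to a datapool, for example to confirm that the shared directory was mounted correctly.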

Usage
---------------

To **initialize** a datapool, simply pass the name of the datapool as an argument
to your docker run command and sample from a dataset as always. The Lightly Docker
will automatically create a datapool directory and populate it with the required
files.

.. note:: To use the datapool feature, the Lightly Docker requires write access
          to a shared directory. This directory can be passed with the `-v` flag.

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        append_weak_labels=False \
        stopping_condition.min_distance=0.1 \
        datapool.name=my_datapool

To **append** to your datapool, pass the name of an existing datapool as an argument.
The Lightly Docker will read the embeddings and filenames from the existing pool and
consider them during sampling. Then, it will update the datapool and checkpoint files.

.. note:: You can't change the dimension of the embeddings once the datapool has
          been initialized, so choose carefully!

.. code-block:: console

    docker run --gpus all --rm -it \
        -v OTHER_INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        append_weak_labels=False \
        stopping_condition.min_distance=0.1 \
        datapool.name=my_datapool

To **deactivate automatic archiving** of past datapool versions, set the flag
`datapool.keep_history` to False.

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        append_weak_labels=False \
        stopping_condition.min_distance=0.1 \
        datapool.name=my_datapool \
        datapool.keep_history=False
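
When the three commands above are run from a script, it can help to assemble the argument list in one place. The following is a minimal sketch: the image name, mount points, and arguments are copied from the commands above, while the helper `build_docker_command` and its parameters are hypothetical and not part of Lightly.

```python
import shlex


def build_docker_command(input_dir, shared_dir, output_dir, token,
                         datapool_name, keep_history=True):
    """Assemble the docker run arguments shown above (hypothetical helper)."""
    cmd = [
        "docker", "run", "--gpus", "all", "--rm", "-it",
        "-v", f"{input_dir}:/home/input_dir:ro",   # read-only input images
        "-v", f"{shared_dir}:/home/shared_dir",    # needs write access
        "-v", f"{output_dir}:/home/output_dir",
        "lightly/sampling:latest",
        f"token={token}",
        "append_weak_labels=False",
        "stopping_condition.min_distance=0.1",
        f"datapool.name={datapool_name}",
    ]
    # Only pass the flag when archiving should be switched off.
    if not keep_history:
        cmd.append("datapool.keep_history=False")
    return cmd


cmd = build_docker_command(
    "INPUT_DIR", "SHARED_DIR", "OUTPUT_DIR",
    token="MYAWESOMETOKEN", datapool_name="my_datapool", keep_history=False,
)
# Quote each argument so the line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting altogether.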
96 changes: 96 additions & 0 deletions docs/source/docker/advanced/meta_information.rst
@@ -0,0 +1,96 @@
Meta Information
======================

Depending on your current setup one of the following topics might interest you:

- | You have a dataset but want Lightly to ignore certain samples.
  | --> `Mask Samples`_
- | You have an existing dataset and want to add only relevant new data.
  | --> `Use Pre-Selected Samples`_
- | You have your own (weak) labels. Can Lightly use this information to improve
    the selection?
  | --> `Custom Labels`_

Mask Samples
-----------------------------------

You can add masking information to the .csv file to prevent certain samples
from being used.

The following example shows a dataset in which the column "masked" is used
to prevent Lightly Docker from using a specific sample. In this example,
img-1.jpg is ignored and not considered for sampling. That is, the sample
is neither selected nor does it affect the selection of any other sample.

.. list-table:: masked_embeddings.csv
   :widths: 50 50 50 50 50
   :header-rows: 1

   * - filenames
     - embedding_0
     - embedding_1
     - masked
     - labels
   * - img-1.jpg
     - 0.1
     - 0.5
     - 1
     - 0
   * - img-2.jpg
     - 0.2
     - 0.2
     - 0
     - 0
   * - img-3.jpg
     - 0.1
     - 0.9
     - 0
     - 0
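
A file like the one in the table can be produced from an existing embeddings file with Python's standard `csv` module. This is a minimal sketch under a few assumptions: the column names are taken from the table above, the helper `add_mask_column` is hypothetical, and placing "masked" before "labels" only mirrors the table; whether Lightly Docker depends on the column order is not stated here.

```python
import csv
import io


def add_mask_column(csv_text, masked_filenames):
    """Return csv_text with a 'masked' column (1 = ignore this sample)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    if "masked" not in fieldnames:
        # Mirror the table above: put 'masked' right before 'labels'.
        position = (fieldnames.index("labels")
                    if "labels" in fieldnames else len(fieldnames))
        fieldnames.insert(position, "masked")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["masked"] = "1" if row["filenames"] in masked_filenames else "0"
        writer.writerow(row)
    return out.getvalue()


# Same values as in the table above, before masking.
embeddings_csv = (
    "filenames,embedding_0,embedding_1,labels\n"
    "img-1.jpg,0.1,0.5,0\n"
    "img-2.jpg,0.2,0.2,0\n"
    "img-3.jpg,0.1,0.9,0\n"
)
masked_csv = add_mask_column(embeddings_csv, {"img-1.jpg"})
print(masked_csv)
```

The same approach works for the "selected" column described in the next section by changing the column name.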


Use Pre-Selected Samples
-----------------------------------
Similar to masking samples, we can also pre-select specific samples. This
can be useful for semi-automated data selection processes. A human annotator
can pre-select some of the relevant samples and let Lightly Docker add only
those additional samples that enrich the existing selection.


.. list-table:: selected_embeddings.csv
   :widths: 50 50 50 50 50
   :header-rows: 1

   * - filenames
     - embedding_0
     - embedding_1
     - selected
     - labels
   * - img-1.jpg
     - 0.1
     - 0.5
     - 0
     - 0
   * - img-2.jpg
     - 0.2
     - 0.2
     - 0
     - 0
   * - img-3.jpg
     - 0.1
     - 0.9
     - 1
     - 0

.. note:: Pre-selected samples also count for the target number of samples.
          For example, you have a dataset with 100 samples. If you pre-select
          60 and want to sample 50, sampling would have no effect since there
          are already more than 50 samples selected.
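
The arithmetic in the note can be written out as a short sketch. It only restates the note as a calculation; the function name is hypothetical and this is not how Lightly Docker is implemented internally.

```python
def additional_samples_needed(n_preselected, n_target):
    """Samples still to be picked once pre-selected ones are counted."""
    # Pre-selected samples count toward the target, so the remainder
    # can never be negative.
    return max(0, n_target - n_preselected)


# The example from the note: 60 pre-selected, target of 50.
remaining_when_over_target = additional_samples_needed(60, 50)

# A case where sampling still has an effect: 20 pre-selected, target of 50.
remaining_when_under_target = additional_samples_needed(20, 50)

print(remaining_when_over_target, remaining_when_under_target)
```

In the first case nothing is left to sample, which matches the note; in the second, 30 more samples would be added to the 20 that were pre-selected.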

Custom Labels
-----------------------------------

You can always add custom embeddings to the dataset by following the guide
here: :ref:`lightly-custom-labels`
10 changes: 10 additions & 0 deletions docs/source/docker/advanced/overview.rst
@@ -0,0 +1,10 @@
Advanced
===================================
Here you will learn about more advanced usage patterns of Lightly Docker.


.. toctree::
    :maxdepth: 2

    meta_information.rst
    datapool.rst