Merge pull request #258 from lightly-ai/develop
Develop to Master - Pre-release 1.1.3
philippmwirth authored Mar 23, 2021
2 parents 356bd3f + 35de5cc commit 75d2623
Showing 20 changed files with 837 additions and 36 deletions.
149 changes: 117 additions & 32 deletions README.md
@@ -16,7 +16,16 @@ Lightly is a computer vision framework for self-supervised learning.
- [Github](https://github.com/lightly-ai/lightly)
- [Discord](https://discord.gg/xvNJW94)

### Supported Models
### Features

Lightly offers:

- a modular framework
- multi-GPU training via PyTorch Lightning
- an easy-to-use API written in a PyTorch-like style
- custom backbone models for self-supervised pre-training

#### Supported Models

- [MoCo, 2019](https://arxiv.org/abs/1911.05722)
- [SimCLR, 2020](https://arxiv.org/abs/2002.05709)
@@ -33,41 +42,11 @@ Want to jump to the tutorials and see lightly in action?
- [Train SimSiam on satellite images](https://docs.lightly.ai/tutorials/package/tutorial_simsiam_esa.html)
- [Use lightly with custom augmentations](https://docs.lightly.ai/tutorials/package/tutorial_custom_augmentations.html)


### Benchmarks

Currently implemented models and their accuracy on cifar10. All models have been evaluated using kNN. We report the max test accuracy over the epochs as well as the maximum GPU memory consumption. All models in this benchmark use the same augmentations as well as the same ResNet-18 backbone. Training precision is set to FP32 and SGD is used as an optimizer with cosineLR.
One epoch on cifar10 takes ~35 secondson a V100 GPU. [Learn more about the cifar10 benchmark here](https://docs.lightly.ai/getting_started/benchmarks.html)

| Model | Epochs | Batch Size | Test Accuracy | Peak GPU usage |
|---------|--------|------------|---------------|----------------|
| MoCo | 200 | 128 | 0.83 | 2.1 GBytes |
| SimCLR | 200 | 128 | 0.78 | 2.0 GBytes |
| SimSiam | 200 | 128 | 0.73 | 3.0 GBytes |
| MoCo | 200 | 512 | 0.85 | 7.4 GBytes |
| SimCLR | 200 | 512 | 0.83 | 7.8 GBytes |
| SimSiam | 200 | 512 | 0.81 | 7.0 GBytes |
| MoCo | 800 | 128 | 0.89 | 2.1 GBytes |
| SimCLR | 800 | 128 | 0.87 | 1.9 GBytes |
| SimSiam | 800 | 128 | 0.80 | 2.0 GBytes |
| MoCo | 800 | 512 | 0.90 | 7.2 GBytes |
| SimCLR | 800 | 512 | 0.89 | 7.7 GBytes |
| SimSiam | 800 | 512 | 0.91 | 6.9 GBytes |


## Terminology

Below you can see a schematic overview of the different concepts present in the lightly Python package. The terms in bold are explained in more detail in our [documentation](https://docs.lightly.ai).

<img src="docs/source/images/lightly_overview.png" alt="Overview of the lightly pip package"/></a>



## Quick Start

Lightly requires **Python 3.6+**. We recommend installing Lightly in a **Linux** or **OSX** environment.

### Requirements
### Dependencies

- hydra-core>=1.0.0
- numpy>=1.18.1
@@ -84,6 +63,83 @@ pip3 install lightly

We strongly recommend that you install Lightly in a dedicated virtualenv, to avoid conflicting with your system packages.


### Lightly in Action

With lightly you can use the latest self-supervised learning methods in a
modular way, with the full power of PyTorch. Experiment with different
backbones, models, and loss functions. The framework has been designed from
the ground up to be easy to use.

```python
import torch
import torchvision
import lightly.models as models
import lightly.loss as loss
import lightly.data as data

# the collate function applies random transforms to the input images
collate_fn = data.ImageCollateFunction(input_size=32, cj_prob=0.5)

# create a dataset from your image folder
dataset = data.LightlyDataset(input_dir='./my/cute/cats/dataset/')

# build a PyTorch dataloader
dataloader = torch.utils.data.DataLoader(
    dataset,                # pass the dataset to the dataloader
    batch_size=128,         # a large batch size helps with the learning
    shuffle=True,           # shuffling is important!
    collate_fn=collate_fn)  # apply transformations to the input images

# use a resnet backbone without its final fully connected layer
resnet = torchvision.models.resnet.resnet18()
resnet = torch.nn.Sequential(*list(resnet.children())[:-1])

# build the simclr model
model = models.SimCLR(resnet, num_ftrs=512)

# use a criterion for self-supervised learning
criterion = loss.NTXentLoss(temperature=0.5)

# get a PyTorch optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-0, weight_decay=1e-5)
```

You can easily use another model like SimSiam by swapping the model and the
loss function.
```python
# build the simsiam model
model = models.SimSiam(resnet, num_ftrs=512)

# use the SimSiam loss function
criterion = loss.SymNegCosineSimilarityLoss()
```
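
To give a sense of what a SimSiam-style criterion computes, here is a minimal pure-Python sketch of a symmetrized negative cosine similarity for a single pair of views. It is an illustration only, not Lightly's implementation: the function names are hypothetical, plain lists stand in for tensors, and the stop-gradient on the projections that the SimSiam method relies on is omitted because there is no autograd here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sym_neg_cosine_similarity(p0, z0, p1, z1):
    """Symmetrized negative cosine similarity for one pair of views.

    p0, p1 are predictor outputs and z0, z1 are projections of the two
    augmented views (hypothetical names). Each prediction is compared
    with the projection of the *other* view.
    """
    return -0.5 * (cosine_similarity(p0, z1) + cosine_similarity(p1, z0))


# identical directions give the minimum of the loss
loss_aligned = sym_neg_cosine_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])

# orthogonal predictions and projections give a loss of zero
loss_orthogonal = sym_neg_cosine_similarity([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

The loss is bounded between -1 (views agree perfectly) and 1 (views point in opposite directions), so training pushes it towards -1.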

Use PyTorch Lightning to train the model:

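```python
import pytorch_lightning as pl

max_epochs = 10  # illustrative value; choose what suits your dataset

trainer = pl.Trainer(max_epochs=max_epochs, gpus=1)
trainer.fit(
    model,
    dataloader
)
```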

Or train the model on 4 GPUs:
```python
trainer = pl.Trainer(
    max_epochs=max_epochs,
    gpus=4,
    distributed_backend='ddp'
)
trainer.fit(
    model,
    dataloader
)
```



### Command-Line Interface

Lightly is also accessible through a command-line interface (CLI).
@@ -103,6 +159,35 @@ lightly-embed input_dir=/mydataset checkpoint=/mycheckpoint
The embeddings with the corresponding filename are stored in a
[human-readable .csv file](https://docs.lightly.ai/getting_started/command_line_tool.html#create-embeddings-using-the-cli).


### Benchmarks

Currently implemented models and their accuracy on cifar10. All models have been evaluated using kNN. We report the max test accuracy over the epochs as well as the maximum GPU memory consumption. All models in this benchmark use the same augmentations as well as the same ResNet-18 backbone. Training precision is set to FP32 and SGD is used as an optimizer with cosineLR.
One epoch on cifar10 takes ~35 seconds on a V100 GPU. [Learn more about the cifar10 benchmark here](https://docs.lightly.ai/getting_started/benchmarks.html)

| Model | Epochs | Batch Size | Test Accuracy |
|---------|--------|------------|---------------|
| MoCo | 200 | 128 | 0.83 |
| SimCLR | 200 | 128 | 0.78 |
| SimSiam | 200 | 128 | 0.73 |
| MoCo | 200 | 512 | 0.85 |
| SimCLR | 200 | 512 | 0.83 |
| SimSiam | 200 | 512 | 0.81 |
| MoCo | 800 | 128 | 0.89 |
| SimCLR | 800 | 128 | 0.87 |
| SimSiam | 800 | 128 | 0.80 |
| MoCo | 800 | 512 | 0.90 |
| SimCLR | 800 | 512 | 0.89 |
| SimSiam | 800 | 512 | 0.91 |
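
The benchmark setup mentions SGD with a cosine learning-rate schedule. As a rough sketch of what such a schedule does, the snippet below evaluates the standard cosine-annealing formula per epoch. The function name, the base learning rate of 1.0, and the minimum of 0.0 are assumptions for illustration; the benchmark's actual hyperparameters are not stated here.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# learning rate at every epoch boundary of a 200-epoch run (assumed base_lr)
schedule = [cosine_lr(epoch, 200, base_lr=1.0) for epoch in range(201)]
```

The rate starts at `base_lr`, passes through half of it at the midpoint, and decays smoothly to `min_lr` at the end of training.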


## Terminology

Below you can see a schematic overview of the different concepts present in the lightly Python package. The terms in bold are explained in more detail in our [documentation](https://docs.lightly.ai).

<img src="docs/source/images/lightly_overview.png" alt="Overview of the lightly pip package"/></a>


### Next Steps
Head to the [documentation](https://docs.lightly.ai) and see the things you can achieve with Lightly!

8 changes: 8 additions & 0 deletions docs/source/docker/getting_started/first_steps.rst
@@ -79,8 +79,12 @@ There are **three** types of volume mappings:
Typically, your docker command would start like this:

- Map *INPUT_DIR* (from your system) to */home/input_dir* in the container

*e.g. /path/to/my/cat/dataset:/home/input_dir:ro*

- Map *OUTPUT_DIR* (from your system) to */home/output_dir* in the container

*e.g. /path/where/I/want/the/docker/output:/home/output_dir*

- Specify the token to authenticate your user

@@ -96,6 +100,10 @@ Now, let's see how this will look in action!

.. note:: Learn how to obtain your :ref:`ref-authentication-token`.

.. warning:: Don't forget to replace **INPUT_DIR** and **OUTPUT_DIR** with the paths
to your local input and output directories. Do not change the
path after the **:**, since it refers to the file system
inside the container!
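
The two directory mappings can also be assembled programmatically. The sketch below uses a hypothetical helper, `volume_args`, which is not part of Lightly. It builds the ``-v`` flags for ``docker run`` from local paths and keeps the container-side paths fixed, as the warning requires. It deliberately stops short of a full command, since the image name and token are not shown in this excerpt.

```python
def volume_args(input_dir, output_dir):
    """Build `docker run` volume flags for the input and output mappings.

    Only the host-side paths vary; the container-side paths
    (/home/input_dir and /home/output_dir) must stay unchanged.
    """
    return [
        "-v", f"{input_dir}:/home/input_dir:ro",  # input is mounted read-only
        "-v", f"{output_dir}:/home/output_dir",   # output must be writable
    ]


args = volume_args("/path/to/my/cat/dataset", "/path/where/I/want/the/docker/output")
```

The resulting list can be spliced into a `subprocess.run` call next to the remaining arguments of your docker command.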

Embedding and Sampling a Dataset
-----------------------------------
34 changes: 34 additions & 0 deletions docs/source/getting_started/lightly_at_a_glance.rst
@@ -56,6 +56,13 @@ Let's now load an image dataset and create a PyTorch dataloader with the collate
shuffle=True, # shuffling is important!
collate_fn=collate_fn) # apply transformations to the input images
.. note:: You can also use a custom PyTorch `Dataset` instead of the
`LightlyDataset`. Just make sure your `Dataset` implementation returns
a tuple of (sample, target, fname) to support the basic functions
for training models. See :py:class:`lightly.data.dataset`
for more information.
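
As a rough illustration of the tuple contract described in the note, here is a minimal map-style dataset that returns (sample, target, fname). The class name `InMemoryDataset` and its constructor arguments are hypothetical. In a real project you would typically subclass `torch.utils.data.Dataset` and return images rather than strings; the sketch stays free of third-party imports so it runs on its own.

```python
class InMemoryDataset:
    """Map-style dataset yielding (sample, target, fname) tuples."""

    def __init__(self, samples, targets, fnames):
        if not (len(samples) == len(targets) == len(fnames)):
            raise ValueError("samples, targets and fnames must have equal length")
        self.samples = samples
        self.targets = targets
        self.fnames = fnames

    def __len__(self):
        # number of items a dataloader can draw from
        return len(self.samples)

    def __getitem__(self, index):
        # the order (sample, target, fname) follows the note above
        return self.samples[index], self.targets[index], self.fnames[index]


custom_dataset = InMemoryDataset(
    samples=["image-0", "image-1"],  # placeholders standing in for images
    targets=[0, 1],
    fnames=["cat_0.png", "cat_1.png"],
)
sample, target, fname = custom_dataset[1]
```

A map-style dataset only needs `__len__` and `__getitem__`, which is why this shape is enough for a PyTorch `DataLoader` to index it.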


Head to the next section to see how you can train a ResNet on the data you just prepared.

Training
@@ -103,6 +110,33 @@ Put everything together in an embedding model and train it for 10 epochs on a si
Congrats, you just trained your first model using self-supervised learning!

You can also train the model using PyTorch Lightning directly.

.. code-block:: python

    trainer = pl.Trainer(max_epochs=max_epochs, gpus=1)
    trainer.fit(
        model,
        dataloader
    )

To train on a machine with multiple GPUs we recommend using the
`distributed data parallel` backend.

.. code-block:: python

    # if we have a machine with 4 GPUs we set gpus=4
    trainer = pl.Trainer(
        max_epochs=max_epochs,
        gpus=4,
        distributed_backend='ddp'
    )
    trainer.fit(
        model,
        dataloader
    )

Embeddings
^^^^^^^^^^
You can use the trained model to embed your images or even access the embedding
5 changes: 5 additions & 0 deletions docs/source/lightly.transforms.rst
@@ -12,3 +12,8 @@ lightly.transforms
---------------
.. automodule:: lightly.transforms.rotation
:members:

.solarize
---------------
.. automodule:: lightly.transforms.solarize
:members:
@@ -5,6 +5,11 @@
Tutorial 3: Active learning with kNN
==============================================
This tutorial is also available as a ready-to-use
`Google Colab <https://colab.research.google.com/drive/1E3rz7fY7UqXNI_VYNxSu6KvQINzotwrz?usp=sharing>`_
notebook.

In this tutorial, we will run an active learning loop using both the lightly package and the platform.
An active learning loop is a sequence of multiple samplings each choosing only a subset
of all samples in the dataset.
@@ -33,7 +38,6 @@
We use the Euclidean distance between the samples' embeddings as the distance metric.
Compared to a CNN, such a classifier is very fast and easy to implement.
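
To make the idea concrete, here is a minimal sketch of a kNN classifier over embeddings with the Euclidean distance. It is not the tutorial's code: the function names, the toy two-dimensional embeddings, and the labels are made up for illustration, and a real setup would use a library implementation on the embeddings produced by lightly.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, embeddings, labels, k=3):
    """Predict the label of `query` by majority vote among its k nearest embeddings."""
    # rank all labeled embeddings by their distance to the query
    ranked = sorted(
        zip(embeddings, labels),
        key=lambda pair: euclidean_distance(query, pair[0]),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# toy labeled embeddings (hypothetical values)
train_embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["cat", "cat", "dog", "dog"]

near_cats = knn_predict([0.2, 0.1], train_embeddings, train_labels, k=3)
near_dogs = knn_predict([5.5, 5.0], train_embeddings, train_labels, k=3)
```

There is no training step: classifying a new sample only requires distances to the already labeled embeddings, which is what makes the approach cheap inside an active learning loop.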
What you will learn
-------------------
* You learn how an active learning loop is set up and which components are needed for it.
6 changes: 5 additions & 1 deletion lightly/__init__.py
@@ -50,6 +50,8 @@
- SimSiam
- Barlow Twins
- **transforms**:
The lightly.transforms module implements custom data transforms. Currently implements:
@@ -58,6 +60,8 @@
- Random Rotation
- Random Solarization
- **utils**:
The lightly.utils package provides global utility methods.
@@ -70,7 +74,7 @@
# All Rights Reserved

__name__ = 'lightly'
__version__ = '1.1.2'
__version__ = '1.1.3'


try:
Empty file.
2 changes: 1 addition & 1 deletion lightly/active_learning/scorers/classification.py
@@ -54,7 +54,7 @@ class ScorerClassification(Scorer):
    """
    def __init__(self, model_output: np.ndarray):
        self.model_output = model_output
        super(ScorerClassification, self).__init__(model_output)

    def _calculate_scores(self) -> Dict[str, np.ndarray]:
        scores = dict()