Releases: lightly-ai/lightly
Faster metadata upload, bugfixes and improvements
Faster metadata upload
The upload of custom metadata is now done asynchronously with multiple workers in parallel, speeding up the upload process by up to 30 times.
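The parallel upload pattern described above can be sketched with Python's standard thread pool. This is an illustrative sketch, not lightly's actual implementation; `upload_metadata` is a hypothetical stand-in for the real API call:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_metadata(entry):
    # Hypothetical stand-in for a single metadata upload request.
    # In practice this would be an HTTP call to the Lightly API.
    return {"filename": entry["filename"], "status": "ok"}

def upload_all(entries, max_workers=8):
    # Submit all uploads to a thread pool so requests overlap
    # instead of running strictly one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_metadata, entries))

results = upload_all(
    [{"filename": f"img_{i}.png", "metadata": {"weather": "sunny"}} for i in range(4)]
)
```

Because metadata uploads are I/O-bound, overlapping the requests is where most of the speedup comes from.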
Bugfixes
- When uploading a file to a signed URL fails, the status code is now printed correctly.
- Creating a `LightlyDataset` with an `input_dir` containing videos now raises all errors encountered while scanning the input directory instead of ignoring them. For example, if a subfolder without read permissions is encountered, a `PermissionError` is raised instead of the subfolder being silently skipped.
- When embedding, the order of the embeddings in the output now matches the order of the samples in the dataset, even if multiple workers are used in the dataloader. Thus the embeddings in the embedding file are also in sorted order. This is not strictly a bugfix but may prevent problems later on.
Improvements
- The usage of resnet backbones in the example models is now consistent. Thanks for bringing this up @JeanKaddour!
- The SimCLR example now does not use Gaussian blur anymore, just like in the paper. Thanks for pointing this out @littleolex!
- The BarlowTwins example now also uses an input size of 32 to make it consistent with the other examples. Thanks for bringing this up @heytitle!
- The documentation for setting up Azure as cloud storage for the Lightly Platform has been improved.
Models
- Bootstrap your own latent: A new approach to self-supervised Learning, 2020
- Barlow Twins: Self-Supervised Learning via Redundancy Reduction, 2021
- SimSiam: Exploring Simple Siamese Representation Learning, 2020
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, 2019
- SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, 2020
- NNCLR: Nearest-Neighbor Contrastive Learning of Visual Representations, 2021
- SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, M. Caron, 2020
Speeding up IO and improving documentation
Documentation Updates
We have added support for additional cloud storage providers. You can now work directly with data stored in AWS S3, Azure Blob Storage, and Google Cloud Storage. Furthermore, you can stream data directly from your local filesystem in the Lightly Platform without uploading any images/videos to any cloud. Check out the instructions here!
Performance
- We improved the dataset indexing used whenever you create a lightly dataset. Indexing of large datasets (>1 million samples) is now much faster.
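One common way to speed up indexing of large directory trees, shown here as an illustrative sketch rather than lightly's actual code, is to walk with `os.scandir`, which reuses directory-entry metadata and avoids a per-file `stat()` call:

```python
import os

def index_files(root):
    # Iteratively walk the tree with os.scandir. Each DirEntry caches
    # file-type information, so entry.is_dir() usually needs no extra
    # system call, unlike an os.listdir + os.path.isdir loop.
    filenames = []
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    filenames.append(os.path.relpath(entry.path, root))
    return sorted(filenames)
```

Sorting at the end keeps the index deterministic regardless of filesystem enumeration order.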
Bug fixes and documentation updates
Bug fixes
We fixed a bug where `lightly-download` with the option `exclude_parent_tag` did not work, and a bug in `api_workflow_upload_metadata` introduced by the change that made the APIs private members.
Documentation Updates
The docs have received a fresh new look! Additionally, we have added tutorials on how to use Lightly with data hosted on S3 and how to export data directly to Label Studio (no download needed!).
CLI
The CLI now stores important results such as the `dataset_id`, the path to the embeddings, and the path to the checkpoint in the environment variables `LIGHTLY_LAST_DATASET_ID`, `LIGHTLY_LAST_EMBEDDING_PATH`, and `LIGHTLY_LAST_CHECKPOINT_PATH`.
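A downstream script can then pick these values up from the environment, for example:

```python
import os

# Read the results the CLI exported as environment variables
# (variable names taken from the release notes above), so a
# follow-up script needs no parsing of CLI output.
checkpoint = os.environ.get("LIGHTLY_LAST_CHECKPOINT_PATH")
embeddings = os.environ.get("LIGHTLY_LAST_EMBEDDING_PATH")
dataset_id = os.environ.get("LIGHTLY_LAST_DATASET_ID")

if dataset_id is not None:
    print(f"Continuing with dataset {dataset_id}")
```

Note that environment variables only propagate to child processes, so the follow-up script must run in the same shell session (or a subprocess of it).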
Refactoring Models
Low-Level Building Blocks
To improve the flexibility of the lightly framework we refactored our models into smaller building blocks. These blocks can now easily be assembled into novel model architectures and allow lightly to better integrate with other deep learning libraries such as PyTorch Lightning.
Examples
We provide example implementations using low-level building blocks for all the models we support in the new Examples section in our documentation.
We also updated the other parts of the documentation to use the new building blocks. We hope that this makes it easier to integrate lightly into your own project!
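To illustrate the building-block idea, here is a minimal sketch in plain PyTorch: a backbone composed with a small projection head. The ready-made blocks live in `lightly.models.modules`; the classes below are simplified stand-ins, not lightly's actual implementations:

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    # Simplified projection head in the spirit of lightly's low-level blocks.
    def __init__(self, in_dim=512, hidden_dim=512, out_dim=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.layers(x)

class SimCLRLike(nn.Module):
    # Assemble a model from two blocks: any backbone plus a head.
    def __init__(self, backbone, head):
        super().__init__()
        self.backbone = backbone
        self.head = head

    def forward(self, x):
        features = self.backbone(x).flatten(start_dim=1)
        return self.head(features)

# A tiny stand-in backbone; in practice this would be e.g. a resnet torso.
backbone = nn.Sequential(nn.Conv2d(3, 512, 3), nn.AdaptiveAvgPool2d(1))
model = SimCLRLike(backbone, ProjectionHead())
z = model(torch.randn(4, 3, 32, 32))  # (batch, 128) projections
```

Swapping the head (or adding a prediction head or momentum encoder) is how the different papers' architectures are obtained from the same blocks.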
Deprecation
As part of this refactoring, we have added a deprecation warning to all old models under `lightly/models`. We intend to remove those models in version 1.3.0.
Detectron2 Pretraining Tutorial
We created a new tutorial which shows how to use lightly to pre-train an object detection model with the Detectron2 framework.
Bugfixes, Better VideoDatasets
Bugfix: Upload Embeddings
Uploading embeddings to a new dataset through `lightly-magic` or `lightly-upload` raised an error. This is now fixed. Thanks to @natejenkins for the help!
Bugfix: lightly-download with integer tag names
`lightly-download` now supports downloading datasets with integer tag names.
Readme Overview Image
The overview image in the readme file now points to the right address again. Thanks @vnshanmukh for the contribution!
VideoDatasets are more efficient
VideoDatasets now precompute the dataset length instead of recalculating it every time a frame is accessed.
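The idea can be sketched as follows (an illustrative sketch, not lightly's actual implementation): count the frames and compute cumulative offsets once up front, so `__len__` becomes a constant-time lookup instead of a rescan of every video:

```python
class VideoDatasetSketch:
    # Illustrative sketch of precomputing the dataset length.
    # frame_counts would come from probing each video file once.
    def __init__(self, frame_counts):
        self.frame_counts = list(frame_counts)
        # Precompute cumulative frame offsets a single time.
        self.offsets = []
        total = 0
        for count in self.frame_counts:
            self.offsets.append(total)
            total += count
        self.total_frames = total

    def __len__(self):
        # Constant time: no per-call rescan of the videos.
        return self.total_frames
```

The offsets also make it cheap to map a global frame index back to (video, frame-within-video) when a frame is accessed.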
Other
- Added scikit-learn and pandas to dev dependencies.
- Added more API tests.
Dataset Upsizing, Bugfixes
Dataset Upsizing
You can now add new samples and embeddings to an existing dataset. Just run the usual `lightly-upload` or `lightly-magic` command with the `dataset_id` of an existing dataset and it will upload all new images to it. The embeddings are also updated.
Bugfix: ResnetGenerator now uses the argument `num_classes` correctly.
Before the fix, it was hardcoded to 10 classes. Thanks to @smartdanny for finding and fixing this bug!
Bugfix: NNCLR
NNCLR had a bug where the projection and prediction heads were not connected correctly. Thanks to @HBU-Lin-Li for finding this bug!
Bugfix: Version check timeout
When lightly starts, it checks whether a newer version is available. Due to circular imports, this check could run multiple times, and it could take a long time if you don't have an internet connection. We fixed this so only one version check is performed, and restricted its duration to 1s. Thanks to @luzuku for finding this bug!
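The run-once pattern can be sketched like this; `fetch_latest` is a hypothetical stand-in for the real network request, injected so the 1-second timeout stays configurable:

```python
import threading

_version_checked = False
_lock = threading.Lock()

def check_new_version(fetch_latest, timeout=1.0):
    # Run the check at most once per process, no matter how many modules
    # call this function (guards against repeated checks triggered by
    # circular imports).
    global _version_checked
    with _lock:
        if _version_checked:
            return None
        _version_checked = True
    try:
        # The short timeout keeps startup fast without a connection.
        return fetch_latest(timeout=timeout)
    except Exception:
        # Never block or crash startup because the check failed.
        return None
```

A failed or skipped check simply returns `None`, so startup never depends on network availability.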
Video Datasets with Subfolders, Specify Relevant Files
Video Datasets with Subfolders
Just like image datasets, video datasets with the videos in subfolders are now supported. E.g. you can have the following input directory:
/path/to/data/
L subfolder_1/
L my-video-1-1.mp4
L my-video-1-2.mp4
L subfolder_2/
L my-video-2-1.mp4
Specify relevant files
When creating a `LightlyDataset` you can now also specify the argument `filenames`. It must be a list of filenames relative to the input directory; the dataset then only uses the specified files and ignores all others. E.g. using
`LightlyDataset(input_dir='/path/to/data', filenames=['subfolder_1/my-video-1-1.mp4', 'subfolder_2/my-video-2-1.mp4'])`
will only create a dataset out of the two specified files and ignore the third file.
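To build such a `filenames` list programmatically, one could scan the input directory with `pathlib`. This is an illustrative helper, not part of lightly's API:

```python
from pathlib import Path

def relative_video_filenames(input_dir, extensions=(".mp4", ".avi")):
    # Collect video filenames relative to input_dir, which is the
    # format the filenames argument expects.
    root = Path(input_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.suffix.lower() in extensions
    )
```

You could then filter this list (e.g. keep every other video) before passing it as `filenames`.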
Other
We added the SwAV model to the README; it was already in the documentation.
Refactor Models, SwAV Model, S3-Bucket Integration
Refactor Models
This release makes it much easier to implement new models or adapt existing ones by using basic building blocks. E.g. you can compose your own model out of blocks such as a backbone, projection head, momentum encoder, nearest-neighbour memory bank, and more.
We want you to easily see how the models in current papers are built, and that different papers often differ in only one or two of these blocks.
Compatible examples of all models are shown in the benchmarking scripts for imagenette and cifar10.
As part of this refactoring to improve the flexibility of the framework, we have added a deprecation warning to all old models under `lightly/models`, e.g.:
The high-level building block NNCLR will be deprecated in version 1.2.0.
Use low-level building blocks instead.
See https://docs.lightly.ai/lightly.models.html for more information
These models will be removed with the upcoming version 1.2. The necessity of the refactoring stems from a lack of flexibility which makes it difficult to keep up with the latest publications.
SwAV Model
Lightly now supports the Swapping Assignments between Views (SwAV) paper. Thanks to the new system of building blocks, we could implement it more easily.
S3 Bucket Integration
- We added documentation on how to use an S3 bucket as input directory for lightly. It allows you to train your model and create embeddings without downloading all your data.
Other
- When uploading embeddings to the Lightly Platform, the file `embeddings_sorted.csv` is no longer created, as it was only used internally. We also made the upload of large embedding files faster.
Refactored Prediction Heads and Jigsaw
Refactored Prediction Heads
We are excited to bring the newly refactored prediction and projection heads to you! The new abstractions are easy to understand and can be extended to arbitrary projection head implementations, making the framework more flexible. Additionally, the implementation of each projection head is now based on a direct citation from the respective paper. Check it out here.
Breaking Changes:
- The argument `num_mlp_layers` was removed from SimSiam and NNCLR and defaults to 3 (as in the respective papers).
- The projection heads and prediction heads of the models are now separate modules, which might break old checkpoints. However, the function `load_from_state_dict` helps with loading old checkpoints.
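If you prefer to migrate an old checkpoint by hand, remapping the keys of the raw state dict is often enough. This is an illustrative helper, not lightly's `load_from_state_dict`, and the prefixes below are hypothetical examples:

```python
def remap_old_state_dict(state_dict, renames):
    # Rename checkpoint keys so tensors saved under old module paths
    # line up with the new, separated projection/prediction head modules.
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in renames.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped
```

The remapped dict can then be passed to `model.load_state_dict` as usual.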
Jigsaw (@shikharmn)
Lightly now features the jigsaw augmentation! Thanks a lot @shikharmn for your contribution.
Documentation Updates
Parts of the documentation have been refactored to give a clearer overview of the features lightly provides. Additionally, external tutorials have been linked so that everything is in one place.
Bug Fixes
- The `lightly-crop` feature now has a smaller memory footprint
- Filenames containing commas are now ignored
- Checks for the latest pip version occur less often
Custom Metadata, 3 New Tutorials
Custom Metadata
Lightly now supports uploading custom metadata, which can be used in the Lightly Web App.
Tutorial on custom metadata
We added a new tutorial on how to create and use custom metadata to understand your dataset even better.
Tutorial on using lightly to find false negatives in object detection
Do you have problems with your object detector not finding all objects? Lightly can help you find these false negatives. We created a tutorial describing how to do it.
Tutorial to embed the Lightly docker into a Dagster pipeline
Do you want to use the Lightly Docker as part of a bigger data pipeline, e.g. with Dagster? We added a tutorial on how to do it.