Merge pull request #320 from lightly-ai/develop
- `lightly-download` has the flag `exclude_parent_tag`
- `lightly-upload` and `lightly-magic` support `new_dataset_name` to directly create a new dataset
- Bugfixes
- Better documentation
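
A minimal sketch of how the two new options might be used together; the flag names come from the notes above, while tokens, tags, and paths are placeholders:

```bash
# download only the samples newly added by a sampling, excluding the parent tag
lightly-download tag_name='my_tag' dataset_id='your_dataset_id' token='your_token' \
  output_dir='downloaded/' exclude_parent_tag=True

# train, embed, and upload to a freshly created dataset in one step
lightly-magic input_dir='images/' token='your_token' new_dataset_name='your_dataset_name'
```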
MalteEbner authored Apr 22, 2021
2 parents bb64b5c + 7314e93 commit 6f7e711
Showing 45 changed files with 1,293 additions and 114 deletions.
21 changes: 21 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_checklist.md
@@ -0,0 +1,21 @@
closes #issue_number

## Description
- [ ] My change is breaking
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Tests
- [ ] My change is covered by existing tests.
- [ ] My change needs new tests.
- [ ] I have added/adapted the tests accordingly.
- [ ] I have manually tested the change. if_yes_describe_how

## Documentation
- [ ] I have added docstrings to all public functions/methods.
- [ ] My change requires a change to the documentation ( `.rst` files).
- [ ] I have updated the documentation accordingly.
- [ ] The autodocs update the documentation accordingly.

## Implications / comments / further issues
- #e_g_link_to_issue_to_cover_breaking_changes

31 changes: 31 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_checklist_full.md
@@ -0,0 +1,31 @@
closes #issue_number

## Description
- [ ] My change is breaking
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Tests
- [ ] My change is covered by existing tests
- [ ] My change needs new tests
- [ ] I have added/adapted tests accordingly.
- [ ] I have manually tested the change.

If applicable, describe the manual test procedure, e.g.:
```bash
pip uninstall lightly
export BRANCH_NAME="branch_name"
pip install "git+https://github.com/lightly-ai/lightly.git@$BRANCH_NAME"
lightly-cli_do_something_command
```

## Documentation
- [ ] I have added docstrings to all changed/added public functions/methods.
- [ ] My change requires a change to the documentation ( `.rst` files).
- [ ] I have updated the documentation accordingly.
- [ ] The autodocs update the documentation accordingly.

## Improvements put into another issue:
- #issue_number

## Issues covering the breaking change:
- #link_to_issue_in_other_repo to adapt the other side of the breaking change
12 changes: 12 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_minimal.md
@@ -0,0 +1,12 @@
closes #issue_number

## Description
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Documentation
- [ ] I have updated the documentation.
- [ ] I need help with it.

## Tests
- [ ] I have updated the tests.
- [ ] I need help with it.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -27,7 +27,7 @@ jobs:
run: pip install -e '.[all]'
- name: Run Pytest
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
export LIGHTLY_SERVER_LOCATION="localhost:-1"
pip install pytest-cov
python -m pytest -s -v --runslow --cov=./lightly --cov-report=xml --ignore=./lightly/openapi_generated/
- name: Upload coverage to Codecov
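
The added `export` is the actual fix here: a plain `VAR=value` line only sets a shell variable, which child processes such as `pytest` never see. A quick sanity check:

```bash
# without export: invisible to child processes
LIGHTLY_SERVER_LOCATION="localhost:-1"
python -c 'import os; print(os.environ.get("LIGHTLY_SERVER_LOCATION"))'  # prints None

# with export: inherited by child processes such as pytest
export LIGHTLY_SERVER_LOCATION="localhost:-1"
python -c 'import os; print(os.environ.get("LIGHTLY_SERVER_LOCATION"))'  # prints localhost:-1
```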
7 changes: 4 additions & 3 deletions docs/source/getting_started/active_learning.rst
@@ -45,9 +45,10 @@ Lightly makes use of the following concepts for active learning:

* **Scorer:** :py:class:`lightly.active_learning.scorers.scorer.Scorer`
The `Scorer` takes as input the predictions of a pre-trained model on the set
of unlabeled images. It evaluates different scores based on how certain the model
is about the images and passes them to the API so the sampler can use them with
Coral.
of unlabeled images. It offers a `calculate_scores()` method, which evaluates
different scores based on how certain the model is about the images. When
performing a sampling, the scores are passed to the API so the sampler can use
them with Coral.


Continue reading to see how these components interact and how active learning is
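
A minimal sketch of the renamed scorer interface described above, using `ScorerClassification` with hypothetical softmax predictions:

```python
import numpy as np

from lightly.active_learning.scorers.classification import ScorerClassification

# hypothetical softmax outputs of a classifier for four unlabeled images
predictions = np.array([
    [0.1, 0.9],   # confident
    [0.5, 0.5],   # maximally uncertain
    [0.3, 0.7],
    [0.8, 0.2],
])

scorer = ScorerClassification(model_output=predictions)
scores = scorer.calculate_scores()
# e.g. scores['prediction-margin'], scores['prediction-entropy']
```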
10 changes: 8 additions & 2 deletions docs/source/getting_started/command_line_tool.rst
@@ -99,7 +99,9 @@ Upload data using the CLI
In this example we will upload a dataset to the Lightly Platform.
First, make sure you have an account on `Lightly <https://www.lightly.ai>`_.
A free account is sufficient. Log in to the app and create a new dataset.
You will get a *token* and *dataset_id* which can be used to upload your dataset
You will get a *token* and *dataset_id* which can be used to upload your dataset.
Alternatively, you can create a new dataset directly with the *token*
by providing the *new_dataset_name* instead of the *dataset_id*.

.. code-block:: bash
@@ -110,6 +112,9 @@ You will get a *token* and *dataset_id* which can be used to upload your dataset
lightly-upload input_dir=cat embeddings=your_embedding.csv \
token=your_token dataset_id=your_dataset_id
# create a new dataset and upload to it
lightly-upload input_dir=cat token=your_token new_dataset_name=your_dataset_name
.. note:: To obtain your *token* and *dataset_id* check:
:ref:`ref-authentication-token` and :ref:`ref-webapp-dataset-id`.

@@ -120,6 +125,7 @@ Upload embeddings using the CLI
----------------------------------

You can upload embeddings directly to the Lightly Platform using the CLI.
Again, you can use the *dataset_id* and *new_dataset_name* interchangeably.

.. code-block:: bash
@@ -129,7 +135,7 @@ You can upload embeddings directly to the Lightly Platform using the CLI.
# you can upload the dataset together with the embeddings
lightly-upload input_dir=cat embeddings=your_embedding.csv \
token=your_token dataset_id=your_dataset_id
token=your_token new_dataset_name=your_dataset_name
Download data using the CLI
33 changes: 30 additions & 3 deletions docs/source/getting_started/platform.rst
@@ -159,12 +159,39 @@ drag-and-drop or using the Python Package according to:
You can upload up to 1'000 images using the frontend.


Images can also be uploaded from a Python script:

.. code-block:: python
from lightly.api.api_workflow_client import ApiWorkflowClient
client = ApiWorkflowClient(token='123', dataset_id='xyz')
# change mode to 'thumbnails' or 'meta' if you're working with sensitive data
client.upload_dataset('path/to/your/images/', mode='full')
Upload Embeddings
-------------------------

Embeddings can be uploaded using the Python Package.
You cannot upload embeddings through the web interface. Instead:
:ref:`ref-upload-embedding-lightly`
Embeddings can be uploaded using the Python Package or the front-end. The simplest
way to upload the embeddings is from the command line: :ref:`ref-upload-embedding-lightly`.

If you have a numpy array of image embeddings, the filenames of the images, and categorical pseudo-labels,
you can use the `save_embeddings` function to store them in a lightly-compatible CSV format and upload
them from your Python code or using the CLI. The following snippet shows how to upload the embeddings from Python.

.. code-block:: python
from lightly.utils import save_embeddings
from lightly.api.api_workflow_client import ApiWorkflowClient
# store the embeddings in a lightly compatible CSV format before uploading
# them to the platform
save_embeddings('embeddings.csv', embeddings, labels, filenames)
# upload the embeddings.csv file to the platform
client = ApiWorkflowClient(token='123', dataset_id='xyz')
client.upload_embeddings('embeddings.csv', name='my-embeddings')
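
For illustration, the inputs to `save_embeddings` above could be built like this (array shape, labels, and filenames are made up):

```python
import numpy as np

# one 32-dimensional embedding per image
embeddings = np.random.randn(3, 32).astype(np.float32)
labels = [0, 1, 1]                                    # categorical pseudo-labels
filenames = ['img-1.jpg', 'img-2.jpg', 'img-3.jpg']   # must match the uploaded images
```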
Sampling
6 changes: 6 additions & 0 deletions docs/source/lightly.active_learning.rst
@@ -22,3 +22,9 @@ lightly.active_learning
.. automodule:: lightly.active_learning.scorers.detection
:members:

.utils
--------
.. automodule:: lightly.active_learning.utils.bounding_box
:members:
.. automodule:: lightly.active_learning.utils.object_detection_output
:members:
14 changes: 14 additions & 0 deletions docs/source/lightly.api.rst
@@ -8,6 +8,20 @@ lightly.api
.. automodule:: lightly.api.api_workflow_client
:members:

.. automodule:: lightly.api.api_workflow_datasets
:members:

.. automodule:: lightly.api.api_workflow_download_dataset
:members:

.. automodule:: lightly.api.api_workflow_sampling
:members:

.. automodule:: lightly.api.api_workflow_upload_dataset
:members:

.. automodule:: lightly.api.api_workflow_upload_embeddings
:members:

.utils
---------------
2 changes: 1 addition & 1 deletion docs/source/tutorials/structure_your_input.rst
@@ -156,7 +156,7 @@ To upload the three videos from above to the platform, you can use

.. code-block:: bash
lightly-upload token='123' dataset_id='XYZ' input_dir='data/'
lightly-upload token='123' new_dataset_name='my_video_dataset' input_dir='data/'
All other operations (like training a self-supervised model and embedding the frames individually)
also work on video data. Give it a try!
2 changes: 1 addition & 1 deletion lightly/__init__.py
@@ -74,7 +74,7 @@
# All Rights Reserved

__name__ = 'lightly'
__version__ = '1.1.5'
__version__ = '1.1.6'


try:
66 changes: 37 additions & 29 deletions lightly/active_learning/agents/agent.py
@@ -49,7 +49,8 @@ class ActiveLearningAgent:
"""

def __init__(self, api_workflow_client: ApiWorkflowClient, query_tag_name: str = None, preselected_tag_name: str = None):
def __init__(self, api_workflow_client: ApiWorkflowClient, query_tag_name: str = None,
preselected_tag_name: str = None):

self.api_workflow_client = api_workflow_client
if query_tag_name is not None or preselected_tag_name is not None:
@@ -76,28 +77,31 @@ def _set_labeled_and_unlabeled_set(self, preselected_tag_data: TagData = None):
optional param, then it must not be loaded from the API
"""
if self.preselected_tag_id is None:
self.labeled_set = []
else:
if preselected_tag_data is None:

if not hasattr(self, "bitmask_labeled_set"):
self.bitmask_labeled_set = BitMask.from_hex("0x0") # empty labeled set
self.bitmask_added_set = BitMask.from_hex("0x0") # empty added set
if self.preselected_tag_id is not None: # else the default values (empty labeled and added set) are kept
if preselected_tag_data is None: # if it is not passed as argument, it must be loaded from the API
preselected_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.preselected_tag_id)
chosen_samples_ids = BitMask.from_hex(preselected_tag_data.bit_mask_data).to_indices()
self.labeled_set = [self.api_workflow_client.filenames_on_server[i] for i in chosen_samples_ids]

if not hasattr(self, "unlabeled_set"):
if self.query_tag_id is None:
self.unlabeled_set = self.api_workflow_client.filenames_on_server
else:
query_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.query_tag_id)
chosen_samples_ids = BitMask.from_hex(query_tag_data.bit_mask_data).to_indices()
self.unlabeled_set = [self.api_workflow_client.filenames_on_server[i] for i in chosen_samples_ids]

filenames_labeled = set(self.labeled_set)
self.unlabeled_set = [f for f in self.unlabeled_set if f not in filenames_labeled]

def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> List[str]:
new_bitmask_labeled_set = BitMask.from_hex(preselected_tag_data.bit_mask_data)
self.bitmask_added_set = new_bitmask_labeled_set - self.bitmask_labeled_set
self.bitmask_labeled_set = new_bitmask_labeled_set

if self.query_tag_id is None:
bitmask_query_tag = BitMask.from_length(len(self.api_workflow_client.filenames_on_server))
else:
query_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.query_tag_id)
bitmask_query_tag = BitMask.from_hex(query_tag_data.bit_mask_data)
self.bitmask_unlabeled_set = bitmask_query_tag - self.bitmask_labeled_set

self.labeled_set = self.bitmask_labeled_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)
self.added_set = self.bitmask_added_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)
self.unlabeled_set = self.bitmask_unlabeled_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)

def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> Tuple[List[str], List[str]]:
"""Performs an active learning query.
As part of it, the self.labeled_set and self.unlabeled_set are updated
@@ -110,26 +114,30 @@ def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> List
An instance of a class inheriting from Scorer, e.g. a ClassificationScorer.
Returns:
The filenames of the samples in the new labeled_set.
The filenames of the samples in the new labeled_set
and the filenames of the samples chosen by the sampler.
This added_set was added to the old labeled_set
to form the new labeled_set.
"""
# check input
if sampler_config.n_samples < len(self.labeled_set):
warnings.warn("ActiveLearningAgent.query: The number of samples which should be sampled "
"including the current labeled set "
"(sampler_config.n_samples) "
"is smaller than the number of samples in the current labeled set.")
return self.labeled_set
"including the current labeled set "
"(sampler_config.n_samples) "
"is smaller than the number of samples in the current labeled set."
"Skipping the sampling and returning the previous labeled set.")
return self.labeled_set, []

# calculate scores
if al_scorer is not None:
no_unlabeled_samples = len(self.unlabeled_set)
no_samples_with_predictions = len(al_scorer.model_output)
if no_unlabeled_samples != no_samples_with_predictions:
raise ValueError(f"The scorer must have exactly as much samples as in the unlabeled set,"
raise ValueError(f"The scorer must have exactly as many samples as in the unlabeled set,"
f"but there are {no_samples_with_predictions} predictions in the scorer,"
f"but {no_unlabeled_samples} in the unlabeled set.")
scores_dict = al_scorer._calculate_scores()
scores_dict = al_scorer.calculate_scores()
else:
scores_dict = None

@@ -144,4 +152,4 @@
self.preselected_tag_id = new_tag_data.id
self._set_labeled_and_unlabeled_set(new_tag_data)

return self.labeled_set
return self.labeled_set, self.added_set
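
With this change, `query()` returns a tuple instead of a single list. A usage sketch (token, dataset id, and tag name are placeholders; import paths as of lightly 1.1.x):

```python
from lightly.api.api_workflow_client import ApiWorkflowClient
from lightly.active_learning.agents.agent import ActiveLearningAgent
from lightly.active_learning.config.sampler_config import SamplerConfig

client = ApiWorkflowClient(token='123', dataset_id='xyz')
agent = ActiveLearningAgent(client, preselected_tag_name='initial-selection')

# new return value: the full labeled set plus only the newly added samples
labeled_set, added_set = agent.query(SamplerConfig(n_samples=100))
```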
8 changes: 7 additions & 1 deletion lightly/active_learning/scorers/classification.py
@@ -67,7 +67,13 @@ class ScorerClassification(Scorer):
def __init__(self, model_output: np.ndarray):
super(ScorerClassification, self).__init__(model_output)

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns the active learning scores.
Returns:
A dictionary mapping from the score name (as string)
to the scores (as a single-dimensional numpy array).
"""
scores = dict()
scores["prediction-margin"] = self._get_prediction_margin_score()
scores["prediction-entropy"] = self._get_prediction_entropy_score()
8 changes: 7 additions & 1 deletion lightly/active_learning/scorers/detection.py
@@ -177,7 +177,13 @@ def _check_config(self):
else:
self.config = default_conf

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns the active learning scores.
Returns:
A dictionary mapping from the score name (as string)
to the scores (as a single-dimensional numpy array).
"""
scores = dict()
scores['object-frequency'] = self._get_object_frequency()
scores['prediction-margin'] = self._get_prediction_margin()
4 changes: 3 additions & 1 deletion lightly/active_learning/scorers/scorer.py
@@ -8,5 +8,7 @@ class Scorer():
def __init__(self, model_output):
self.model_output = model_output

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns active learning scores in a dictionary.
"""
raise NotImplementedError
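
Since `calculate_scores()` is now the public hook, a custom scorer only needs to subclass `Scorer` and implement it. A hypothetical example:

```python
from typing import Dict

import numpy as np

from lightly.active_learning.scorers.scorer import Scorer


class ScorerMaxConfidence(Scorer):
    """Hypothetical scorer: images with low top-class confidence score high."""

    def calculate_scores(self) -> Dict[str, np.ndarray]:
        return {'uncertainty': 1.0 - np.max(self.model_output, axis=1)}
```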
7 changes: 7 additions & 0 deletions lightly/active_learning/utils/__init__.py
@@ -0,0 +1,7 @@
""" Collection of Utils for Active Learning """

# Copyright (c) 2020. Lightly AG and its affiliates.
# All Rights Reserved

from lightly.active_learning.utils.bounding_box import BoundingBox
from lightly.active_learning.utils.object_detection_output import ObjectDetectionOutput
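
A sketch of the two newly exported helpers feeding a detection scorer; the exact signatures (corner coordinates normalized to [0, 1], the `from_scores` constructor) are assumptions based on this release:

```python
from lightly.active_learning.utils import BoundingBox, ObjectDetectionOutput

# two hypothetical detections on one image, corner coordinates in [0, 1]
boxes = [BoundingBox(0.1, 0.2, 0.5, 0.6), BoundingBox(0.6, 0.1, 0.9, 0.4)]
scores = [0.9, 0.4]   # confidence per box
labels = [1, 0]       # predicted class index per box

output = ObjectDetectionOutput.from_scores(boxes, scores, labels)
```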
1 change: 1 addition & 0 deletions lightly/api/__init__.py
@@ -3,4 +3,5 @@
# Copyright (c) 2020. Lightly AG and its affiliates.
# All Rights Reserved

from lightly.api.api_workflow_client import ApiWorkflowClient
from lightly.api import routes