Merge pull request #320 from lightly-ai/develop
- `lightly-download` has the flag `exclude_parent_tag`
- `lightly-upload` and `lightly-magic` support `new_dataset_name` to directly create a new dataset
- Bugfixes
- Better documentation
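
A minimal sketch of how the two new options might be used together; the flag names come from the notes above, while tokens, tags, and paths are placeholders:

```bash
# download only the samples newly added by a sampling, excluding the parent tag
lightly-download tag_name='my_tag' dataset_id='your_dataset_id' token='your_token' \
  output_dir='downloaded/' exclude_parent_tag=True

# train, embed, and upload to a freshly created dataset in one step
lightly-magic input_dir='images/' token='your_token' new_dataset_name='your_dataset_name'
```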
MalteEbner authored Apr 22, 2021
2 parents bb64b5c + 7314e93 commit 6f7e711
Showing 45 changed files with 1,293 additions and 114 deletions.
21 changes: 21 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_checklist.md
@@ -0,0 +1,21 @@
closes #issue_number

## Description
- [ ] My change is breaking
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Tests
- [ ] My change is covered by existing tests.
- [ ] My change needs new tests.
- [ ] I have added/adapted the tests accordingly.
- [ ] I have manually tested the change. if_yes_describe_how

## Documentation
- [ ] I have added docstrings to all public functions/methods.
- [ ] My change requires a change to the documentation ( `.rst` files).
- [ ] I have updated the documentation accordingly.
- [ ] The autodocs update the documentation accordingly.

## Implications / comments / further issues
- #e_g_link_to_issue_to_cover_breaking_changes

31 changes: 31 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_checklist_full.md
@@ -0,0 +1,31 @@
closes #issue_number

## Description
- [ ] My change is breaking
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Tests
- [ ] My change is covered by existing tests
- [ ] My change needs new tests
- [ ] I have added/adapted tests accordingly.
- [ ] I have manually tested the change.

If applicable, describe the manual test procedure, e.g.:
```bash
pip uninstall lightly
export BRANCH_NAME="branch_name"
pip install "git+https://github.com/lightly-ai/lightly.git@$BRANCH_NAME"
lightly-cli_do_something_command
```

## Documentation
- [ ] I have added docstrings to all changed/added public functions/methods.
- [ ] My change requires a change to the documentation ( `.rst` files).
- [ ] I have updated the documentation accordingly.
- [ ] The autodocs update the documentation accordingly.

## Improvements put into another issue:
- #issue_number

## Issues covering the breaking change:
- #link_to_issue_in_other_repo to adapt the other side of the breaking change
12 changes: 12 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/PR_template_minimal.md
@@ -0,0 +1,12 @@
closes #issue_number

## Description
Please_describe_what_you_changed_and_why___You_do_not_need_to_repeat_stuff_from_the_issue

## Documentation
- [ ] I have updated the documentation.
- [ ] I need help with it.

## Tests
- [ ] I have updated the tests.
- [ ] I need help with it.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -27,7 +27,7 @@ jobs:
run: pip install -e '.[all]'
- name: Run Pytest
run: |
LIGHTLY_SERVER_LOCATION="localhost:-1"
export LIGHTLY_SERVER_LOCATION="localhost:-1"
pip install pytest-cov
python -m pytest -s -v --runslow --cov=./lightly --cov-report=xml --ignore=./lightly/openapi_generated/
- name: Upload coverage to Codecov
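
The added `export` is the actual fix here: a plain `VAR=value` line only sets a shell variable, which child processes such as `pytest` never see. A quick sanity check:

```bash
# without export: invisible to child processes
LIGHTLY_SERVER_LOCATION="localhost:-1"
python -c 'import os; print(os.environ.get("LIGHTLY_SERVER_LOCATION"))'  # prints None

# with export: inherited by child processes such as pytest
export LIGHTLY_SERVER_LOCATION="localhost:-1"
python -c 'import os; print(os.environ.get("LIGHTLY_SERVER_LOCATION"))'  # prints localhost:-1
```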
7 changes: 4 additions & 3 deletions docs/source/getting_started/active_learning.rst
@@ -45,9 +45,10 @@ Lightly makes use of the following concepts for active learning:

* **Scorer:** :py:class:`lightly.active_learning.scorers.scorer.Scorer`
The `Scorer` takes as input the predictions of a pre-trained model on the set
of unlabeled images. It evaluates different scores based on how certain the model
is about the images and passes them to the API so the sampler can use them with
Coral.
of unlabeled images. It offers a `calculate_scores()` method, which evaluates
different scores based on how certain the model is about the images. When
performing a sampling, the scores are passed to the API so the sampler can use
them with Coral.


Continue reading to see how these components interact and how active learning is
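
A minimal sketch of the renamed scorer interface described above, using `ScorerClassification` with hypothetical softmax predictions:

```python
import numpy as np

from lightly.active_learning.scorers.classification import ScorerClassification

# hypothetical softmax outputs of a classifier for four unlabeled images
predictions = np.array([
    [0.1, 0.9],   # confident
    [0.5, 0.5],   # maximally uncertain
    [0.3, 0.7],
    [0.8, 0.2],
])

scorer = ScorerClassification(model_output=predictions)
scores = scorer.calculate_scores()
# e.g. scores['prediction-margin'], scores['prediction-entropy']
```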
10 changes: 8 additions & 2 deletions docs/source/getting_started/command_line_tool.rst
@@ -99,7 +99,9 @@ Upload data using the CLI
In this example we will upload a dataset to the Lightly Platform.
First, make sure you have an account on `Lightly <https://www.lightly.ai>`_.
A free account is sufficient. Log in to the app and create a new dataset.
You will get a *token* and *dataset_id* which can be used to upload your dataset
You will get a *token* and *dataset_id* which can be used to upload your dataset.
Alternatively, you can create a new dataset directly with the *token*
by providing the *new_dataset_name* instead of the *dataset_id*.

.. code-block:: bash
@@ -110,6 +112,9 @@ You will get a *token* and *dataset_id* which can be used to upload your dataset
lightly-upload input_dir=cat embeddings=your_embedding.csv \
token=your_token dataset_id=your_dataset_id
# create a new dataset and upload to it
lightly-upload input_dir=cat token=your_token new_dataset_name=your_dataset_name
.. note:: To obtain your *token* and *dataset_id* check:
:ref:`ref-authentication-token` and :ref:`ref-webapp-dataset-id`.

@@ -120,6 +125,7 @@ Upload embeddings using the CLI
----------------------------------

You can upload embeddings directly to the Lightly Platform using the CLI.
Again, you can use the *dataset_id* and *new_dataset_name* interchangeably.

.. code-block:: bash
@@ -129,7 +135,7 @@ You can upload embeddings directly to the Lightly Platform using the CLI.
# you can upload the dataset together with the embeddings
lightly-upload input_dir=cat embeddings=your_embedding.csv \
token=your_token dataset_id=your_dataset_id
token=your_token new_dataset_name=your_dataset_name
Download data using the CLI
33 changes: 30 additions & 3 deletions docs/source/getting_started/platform.rst
@@ -159,12 +159,39 @@ drag-and-drop or using the Python Package according to:
You can upload up to 1'000 images using the frontend.


Images can also be uploaded from a Python script:

.. code-block:: python
from lightly.api.api_workflow_client import ApiWorkflowClient
client = ApiWorkflowClient(token='123', dataset_id='xyz')
# change mode to 'thumbnails' or 'meta' if you're working with sensitive data
client.upload_dataset('path/to/your/images/', mode='full')
Upload Embeddings
-------------------------

Embeddings can be uploaded using the Python Package.
You cannot upload embeddings through the web interface. Instead:
:ref:`ref-upload-embedding-lightly`
Embeddings can be uploaded using the Python Package or the front-end. The simplest
way to upload the embeddings is from the command line: :ref:`ref-upload-embedding-lightly`.

If you have a numpy array of image embeddings, the filenames of the images, and categorical pseudo-labels,
you can use the `save_embeddings` function to store them in a lightly-compatible CSV format and upload
them from your Python code or using the CLI. The following snippet shows how to upload the embeddings from Python.

.. code-block:: python
from lightly.utils import save_embeddings
from lightly.api.api_workflow_client import ApiWorkflowClient
# store the embeddings in a lightly compatible CSV format before uploading
# them to the platform
save_embeddings('embeddings.csv', embeddings, labels, filenames)
# upload the embeddings.csv file to the platform
client = ApiWorkflowClient(token='123', dataset_id='xyz')
client.upload_embeddings('embeddings.csv', name='my-embeddings')
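
For illustration, the inputs to `save_embeddings` above could be built like this (array shape, labels, and filenames are made up):

```python
import numpy as np

# one 32-dimensional embedding per image
embeddings = np.random.randn(3, 32).astype(np.float32)
labels = [0, 1, 1]                                    # categorical pseudo-labels
filenames = ['img-1.jpg', 'img-2.jpg', 'img-3.jpg']   # must match the uploaded images
```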
Sampling
6 changes: 6 additions & 0 deletions docs/source/lightly.active_learning.rst
@@ -22,3 +22,9 @@ lightly.active_learning
.. automodule:: lightly.active_learning.scorers.detection
:members:

.utils
--------
.. automodule:: lightly.active_learning.utils.bounding_box
:members:
.. automodule:: lightly.active_learning.utils.object_detection_output
:members:
14 changes: 14 additions & 0 deletions docs/source/lightly.api.rst
@@ -8,6 +8,20 @@ lightly.api
.. automodule:: lightly.api.api_workflow_client
:members:

.. automodule:: lightly.api.api_workflow_datasets
:members:

.. automodule:: lightly.api.api_workflow_download_dataset
:members:

.. automodule:: lightly.api.api_workflow_sampling
:members:

.. automodule:: lightly.api.api_workflow_upload_dataset
:members:

.. automodule:: lightly.api.api_workflow_upload_embeddings
:members:

.utils
---------------
2 changes: 1 addition & 1 deletion docs/source/tutorials/structure_your_input.rst
@@ -156,7 +156,7 @@ To upload the three videos from above to the platform, you can use

.. code-block:: bash
lightly-upload token='123' dataset_id='XYZ' input_dir='data/'
lightly-upload token='123' new_dataset_name='my_video_dataset' input_dir='data/'
All other operations (like training a self-supervised model and embedding the frames individually)
also work on video data. Give it a try!
2 changes: 1 addition & 1 deletion lightly/__init__.py
@@ -74,7 +74,7 @@
# All Rights Reserved

__name__ = 'lightly'
__version__ = '1.1.5'
__version__ = '1.1.6'


try:
66 changes: 37 additions & 29 deletions lightly/active_learning/agents/agent.py
@@ -49,7 +49,8 @@ class ActiveLearningAgent:
"""

def __init__(self, api_workflow_client: ApiWorkflowClient, query_tag_name: str = None, preselected_tag_name: str = None):
def __init__(self, api_workflow_client: ApiWorkflowClient, query_tag_name: str = None,
preselected_tag_name: str = None):

self.api_workflow_client = api_workflow_client
if query_tag_name is not None or preselected_tag_name is not None:
@@ -76,28 +77,31 @@ def _set_labeled_and_unlabeled_set(self, preselected_tag_data: TagData = None):
optional param, then it must not be loaded from the API
"""
if self.preselected_tag_id is None:
self.labeled_set = []
else:
if preselected_tag_data is None:

if not hasattr(self, "bitmask_labeled_set"):
self.bitmask_labeled_set = BitMask.from_hex("0x0") # empty labeled set
self.bitmask_added_set = BitMask.from_hex("0x0") # empty added set
if self.preselected_tag_id is not None: # else the default values (empty labeled and added set) are kept
if preselected_tag_data is None: # if it is not passed as argument, it must be loaded from the API
preselected_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.preselected_tag_id)
chosen_samples_ids = BitMask.from_hex(preselected_tag_data.bit_mask_data).to_indices()
self.labeled_set = [self.api_workflow_client.filenames_on_server[i] for i in chosen_samples_ids]

if not hasattr(self, "unlabeled_set"):
if self.query_tag_id is None:
self.unlabeled_set = self.api_workflow_client.filenames_on_server
else:
query_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.query_tag_id)
chosen_samples_ids = BitMask.from_hex(query_tag_data.bit_mask_data).to_indices()
self.unlabeled_set = [self.api_workflow_client.filenames_on_server[i] for i in chosen_samples_ids]

filenames_labeled = set(self.labeled_set)
self.unlabeled_set = [f for f in self.unlabeled_set if f not in filenames_labeled]

def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> List[str]:
new_bitmask_labeled_set = BitMask.from_hex(preselected_tag_data.bit_mask_data)
self.bitmask_added_set = new_bitmask_labeled_set - self.bitmask_labeled_set
self.bitmask_labeled_set = new_bitmask_labeled_set

if self.query_tag_id is None:
bitmask_query_tag = BitMask.from_length(len(self.api_workflow_client.filenames_on_server))
else:
query_tag_data = self.api_workflow_client.tags_api.get_tag_by_tag_id(
self.api_workflow_client.dataset_id, tag_id=self.query_tag_id)
bitmask_query_tag = BitMask.from_hex(query_tag_data.bit_mask_data)
self.bitmask_unlabeled_set = bitmask_query_tag - self.bitmask_labeled_set

self.labeled_set = self.bitmask_labeled_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)
self.added_set = self.bitmask_added_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)
self.unlabeled_set = self.bitmask_unlabeled_set.masked_select_from_list(self.api_workflow_client.filenames_on_server)

def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> Tuple[List[str], List[str]]:
"""Performs an active learning query.
As part of it, the self.labeled_set and self.unlabeled_set are updated
@@ -110,26 +114,30 @@ def query(self, sampler_config: SamplerConfig, al_scorer: Scorer = None) -> List
An instance of a class inheriting from Scorer, e.g. a ClassificationScorer.
Returns:
The filenames of the samples in the new labeled_set.
The filenames of the samples in the new labeled_set
and the filenames of the samples chosen by the sampler.
This added_set was added to the old labeled_set
to form the new labeled_set.
"""
# check input
if sampler_config.n_samples < len(self.labeled_set):
warnings.warn("ActiveLearningAgent.query: The number of samples which should be sampled "
"including the current labeled set "
"(sampler_config.n_samples) "
"is smaller than the number of samples in the current labeled set.")
return self.labeled_set
"including the current labeled set "
"(sampler_config.n_samples) "
"is smaller than the number of samples in the current labeled set."
"Skipping the sampling and returning the previous labeled set.")
return self.labeled_set, []

# calculate scores
if al_scorer is not None:
no_unlabeled_samples = len(self.unlabeled_set)
no_samples_with_predictions = len(al_scorer.model_output)
if no_unlabeled_samples != no_samples_with_predictions:
raise ValueError(f"The scorer must have exactly as much samples as in the unlabeled set,"
raise ValueError(f"The scorer must have exactly as many samples as in the unlabeled set,"
f"but there are {no_samples_with_predictions} predictions in the scorer,"
f"but {no_unlabeled_samples} in the unlabeled set.")
scores_dict = al_scorer._calculate_scores()
scores_dict = al_scorer.calculate_scores()
else:
scores_dict = None

@@ -144,4 +152,4 @@
self.preselected_tag_id = new_tag_data.id
self._set_labeled_and_unlabeled_set(new_tag_data)

return self.labeled_set
return self.labeled_set, self.added_set
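
With this change, `query()` returns a tuple instead of a single list. A usage sketch (token, dataset id, and tag name are placeholders; import paths as of lightly 1.1.x):

```python
from lightly.api.api_workflow_client import ApiWorkflowClient
from lightly.active_learning.agents.agent import ActiveLearningAgent
from lightly.active_learning.config.sampler_config import SamplerConfig

client = ApiWorkflowClient(token='123', dataset_id='xyz')
agent = ActiveLearningAgent(client, preselected_tag_name='initial-selection')

# new return value: the full labeled set plus only the newly added samples
labeled_set, added_set = agent.query(SamplerConfig(n_samples=100))
```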
8 changes: 7 additions & 1 deletion lightly/active_learning/scorers/classification.py
@@ -67,7 +67,13 @@ class ScorerClassification(Scorer):
def __init__(self, model_output: np.ndarray):
super(ScorerClassification, self).__init__(model_output)

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns the active learning scores.
Returns:
A dictionary mapping from the score name (as string)
to the scores (as a single-dimensional numpy array).
"""
scores = dict()
scores["prediction-margin"] = self._get_prediction_margin_score()
scores["prediction-entropy"] = self._get_prediction_entropy_score()
8 changes: 7 additions & 1 deletion lightly/active_learning/scorers/detection.py
@@ -177,7 +177,13 @@ def _check_config(self):
else:
self.config = default_conf

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns the active learning scores.
Returns:
A dictionary mapping from the score name (as string)
to the scores (as a single-dimensional numpy array).
"""
scores = dict()
scores['object-frequency'] = self._get_object_frequency()
scores['prediction-margin'] = self._get_prediction_margin()
4 changes: 3 additions & 1 deletion lightly/active_learning/scorers/scorer.py
@@ -8,5 +8,7 @@ class Scorer():
def __init__(self, model_output):
self.model_output = model_output

def _calculate_scores(self) -> Dict[str, np.ndarray]:
def calculate_scores(self) -> Dict[str, np.ndarray]:
"""Calculates and returns active learning scores in a dictionary.
"""
raise NotImplementedError
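
Since `calculate_scores()` is now the public hook, a custom scorer only needs to subclass `Scorer` and implement it. A hypothetical example:

```python
from typing import Dict

import numpy as np

from lightly.active_learning.scorers.scorer import Scorer


class ScorerMaxConfidence(Scorer):
    """Hypothetical scorer: images with low top-class confidence score high."""

    def calculate_scores(self) -> Dict[str, np.ndarray]:
        return {'uncertainty': 1.0 - np.max(self.model_output, axis=1)}
```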
7 changes: 7 additions & 0 deletions lightly/active_learning/utils/__init__.py
@@ -0,0 +1,7 @@
""" Collection of Utils for Active Learning """

# Copyright (c) 2020. Lightly AG and its affiliates.
# All Rights Reserved

from lightly.active_learning.utils.bounding_box import BoundingBox
from lightly.active_learning.utils.object_detection_output import ObjectDetectionOutput
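
A sketch of the two newly exported helpers feeding a detection scorer; the exact signatures (corner coordinates normalized to [0, 1], the `from_scores` constructor) are assumptions based on this release:

```python
from lightly.active_learning.utils import BoundingBox, ObjectDetectionOutput

# two hypothetical detections on one image, corner coordinates in [0, 1]
boxes = [BoundingBox(0.1, 0.2, 0.5, 0.6), BoundingBox(0.6, 0.1, 0.9, 0.4)]
scores = [0.9, 0.4]   # confidence per box
labels = [1, 0]       # predicted class index per box

output = ObjectDetectionOutput.from_scores(boxes, scores, labels)
```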
1 change: 1 addition & 0 deletions lightly/api/__init__.py
@@ -3,4 +3,5 @@
# Copyright (c) 2020. Lightly AG and its affiliates.
# All Rights Reserved

from lightly.api.api_workflow_client import ApiWorkflowClient
from lightly.api import routes