Merge pull request #343 from lightly-ai/develop
Develop to Master - Pre-release 1.1.7
philippmwirth authored Apr 29, 2021
2 parents 6f7e711 + 86b4641 commit 89a7b08
Showing 17 changed files with 712 additions and 210 deletions.
124 changes: 87 additions & 37 deletions docs/source/getting_started/active_learning.rst
@@ -71,7 +71,7 @@ dataset_id and token.
# use trainer.max_epochs=0 to skip training
lightly-magic input_dir='path/to/your/raw/dataset' dataset_id='xyz' token='123' trainer.max_epochs=0
Then, in your Python script, you will need to initialize the `ApiWorkflowClient` and the `ActiveLearningAgent`

.. code-block:: Python
@@ -89,7 +89,7 @@ Next, you will need to initialize the `ApiWorkflowClient` and the `ActiveLearnin
it could be that a large portion of the images is blurry. In that case, it's
possible to create a tag in the web-app which only contains the sharp images
and tell the `ActiveLearningAgent` to only sample from this tag. To do so, set
the `query_tag_name` argument in the constructor of the agent.

Let's configure the sampling request and request an initial selection next:

@@ -98,23 +98,26 @@ Let's configure the sampling request and request an initial selection next:
from lightly.active_learning.config import SamplerConfig
from lightly.openapi_generated.swagger_client import SamplingMethod
# we want an initial pool of 150 images
config = SamplerConfig(n_samples=150, method=SamplingMethod.CORESET, name='initial-selection')
al_agent.query(config)
initial_selection = al_agent.labeled_set
# initial_selection now contains 150 filenames
assert len(initial_selection) == 150
The result of the query is a tag in the web-app under the name "initial-selection". The tag contains
the images which were selected by the sampling algorithm. Head there to scroll through the samples and
download the selected images before annotating them. Alternatively, you can access the filenames
of the selected images via the attribute `labeled_set` as shown above.
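The CORESET method used above is a diversity-based selection. As a rough illustration only — not Lightly's server-side implementation — a k-center-greedy style selection on feature vectors can be sketched as follows (all names and data here are our own):

```python
import numpy as np

def k_center_greedy(features: np.ndarray, n_samples: int) -> list:
    # Start from the point closest to the mean, then repeatedly add the
    # point farthest from the current selection (k-center-greedy).
    start = int(np.argmin(np.linalg.norm(features - features.mean(axis=0), axis=1)))
    selected = [start]
    dists = np.linalg.norm(features - features[start], axis=1)
    while len(selected) < n_samples:
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return selected

rng = np.random.default_rng(0)
features = rng.random((1000, 32))   # stand-in for image embeddings
selection = k_center_greedy(features, 150)
assert len(set(selection)) == 150   # 150 distinct, mutually diverse samples
```

The intuition is that each newly picked sample covers a region of the embedding space far from everything already selected, which is why the initial selection tends to be diverse rather than redundant.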


Active Learning Step
----------------------

After you have annotated your initial selection of images, you can train a model
on them. The trained model can then be used to figure out which images pose problems.
This section will show you how these images can be added to the labeled dataset.

To continue with active learning with Lightly, you will need the `ApiWorkflowClient` and `ActiveLearningAgent` from before.
If you perform the next selection step in a new file you have to initialize the client and agent again.
@@ -130,23 +133,22 @@ have to re-initialize them, the tracking of the tags is taken care of for you.
al_agent = ActiveLearningAgent(api_client, preselected_tag_name='initial-selection')
The next part is what differentiates active learning from simple subsampling; the
trained model is used to get predictions on the data and the sampler then
decides based on these predictions. To get a list of all filenames for which
predictions are required, you can use the `query_set`:

.. code-block:: Python
# get all filenames in the query set
query_set = al_agent.query_set
Use this list to get predictions on the images in the query set.
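The two requirements that follow — predictions in the same order as the agent's filename list, and rows normalized to sum to one — can be sketched with plain numpy (the filenames and raw scores here are hypothetical placeholders for your model's output):

```python
import numpy as np

# Hypothetical raw model outputs keyed by filename; in practice these come
# from your model's forward pass over the query set images.
query_set = ['img_0.jpg', 'img_1.jpg', 'img_2.jpg']
raw_scores = {
    'img_0.jpg': [2.0, 0.5],
    'img_1.jpg': [0.1, 0.1],
    'img_2.jpg': [0.0, 3.0],
}

# 1) arrange predictions in exactly the same order as the filename list
logits = np.array([raw_scores[f] for f in query_set])

# 2) normalize each row to sum to one (softmax)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
predictions = exp / exp.sum(axis=1, keepdims=True)

assert predictions.shape == (3, 2)
assert np.allclose(predictions.sum(axis=1), 1.0)
```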

**Important:** The predictions need to be in the same order as the filenames in the
list returned by the `ActiveLearningAgent`.

For classification, the predictions need to be in a numpy array and normalized,
such that the rows sum to one. Then, create a scorer object like so:

.. code-block:: Python
@@ -159,16 +161,21 @@ here is that the argument `n_samples` always refers to the total size of the lab

.. code-block:: Python
# we want a total of 200 images after the first iteration (50 new samples)
# this time, we use the CORAL sampler and provide a scorer to the query
config = SamplerConfig(n_samples=200, method=SamplingMethod.CORAL, name='al-iteration-1')
al_agent.query(config, scorer)
labeled_set_iteration_1 = al_agent.labeled_set
added_set_iteration_1 = al_agent.added_set
assert len(labeled_set_iteration_1) == 200
assert len(added_set_iteration_1) == 50
As before, there will be a new tag named `al-iteration-1` visible in the web-app. Additionally,
you can access the filenames of all the images in the labeled set and the filenames which were
added by this query via the attributes `labeled_set` and `added_set` respectively.
You can repeat the active learning step until the model achieves the required accuracy.
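A quick sanity check on the bookkeeping: since `n_samples` refers to the total size of the labeled set after the query, the number of newly added images is the difference between the new total and the previous labeled set size (filenames below are hypothetical):

```python
# Labeled set before the query and the filenames added by the query.
labeled_before = {'a.jpg', 'b.jpg'}
added = {'c.jpg', 'd.jpg'}

# After the query, the labeled set is the union of both.
labeled_after = labeled_before | added

# n_samples is the size of labeled_after, not the number of new samples.
assert len(labeled_after) == len(labeled_before) + len(added)
```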

Scorers
-----------------
@@ -180,13 +187,26 @@ Image Classification
^^^^^^^^^^^^^^^^^^^^^
Use this scorer when working on a classification problem (binary or multiclass).

Currently, we offer three uncertainty scorers, which are taken from
http://burrsettles.com/pub/settles.activelearning.pdf (Section 3.1, pages 12f.)
and also explained in https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b.
They all have in common that the score is highest if all classes have the
same confidence and zero if the model assigns 100% probability to a single class.
They differ in the number of class confidences they take into account.

- **uncertainty_least_confidence**:
This score is 1 - the highest confidence prediction. It is high
when the confidence about the most probable class is low.

- **uncertainty_margin**:
This score is 1 - the margin between the highest confidence
and second highest confidence prediction. It is high when the model
cannot decide between the two most probable classes.

- **uncertainty_entropy**:
This scorer computes the entropy of the prediction. The confidences
for all classes are considered to compute the entropy of a sample.
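As a sketch, the three scores can be computed for a single normalized prediction with numpy; the formulas below follow the bullet descriptions above, and the library's implementation may scale or aggregate them differently:

```python
import numpy as np

# One normalized class-probability prediction (confidences sum to one).
probs = np.array([0.5, 0.3, 0.2])
sorted_probs = np.sort(probs)[::-1]  # confidences in descending order

# 1 - highest confidence: high when the top class is not trusted.
least_confidence = 1.0 - sorted_probs[0]

# 1 - margin between the two highest confidences: high when the model
# cannot decide between the two most probable classes.
margin = 1.0 - (sorted_probs[0] - sorted_probs[1])

# Shannon entropy over all class confidences.
entropy = -np.sum(probs * np.log(probs))
```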

For more information about how to use the classification scorer have a look here:
:py:class:`lightly.active_learning.scorers.classification.ScorerClassification`
@@ -195,17 +215,47 @@ For more information about how to use the classification scorer have a look here
Object Detection
^^^^^^^^^^^^^^^^^^^^^
Use this scorer when working on an object detection problem using bounding
boxes. The object detection scorers require the input to be in
the `ObjectDetectionOutput` format.

We expect the model predictions to contain

- bounding boxes of shape (x0, y0, x1, y1)
- objectness_probability for each bounding box
- classification_probabilities for each bounding box

You can find more about the format here:
:py:class:`lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput`

We also provide a helper method to work with the model output format consisting
of only a probability per bounding box and the associated label:
:py:class:`lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput.from_scores`
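The expected per-image prediction content can be illustrated with plain arrays (illustrative data only; the actual container is the `ObjectDetectionOutput` class documented in the references above):

```python
import numpy as np

# One image's detections: each row of `boxes` is (x0, y0, x1, y1).
boxes = np.array([[10.0, 20.0, 50.0, 60.0],
                  [30.0, 30.0, 80.0, 90.0]])

# One objectness probability per bounding box.
objectness = np.array([0.9, 0.6])

# One class-probability distribution per bounding box.
class_probs = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1]])

# All three parts must describe the same set of detections.
assert len(boxes) == len(objectness) == len(class_probs)
```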


Currently, the following scorers are available:

- **object_frequency**:
This score measures the number of objects in the image. Use this scorer if
you want scenes with lots of objects in them. This is suited for computer vision
tasks such as perception in autonomous driving.

- **objectness_least_confidence**:
This score is 1 - the mean of the highest confidence prediction. Use this scorer
to select images where the model is unsure both about whether it found an object
at all and about the class of the object.

- **classification_scores**:
These scores are computed for each object detection per image out of
the class probability prediction for this detection. Then, they are reduced
to one score per image by taking the maximum. In particular, we support:

  - **uncertainty_least_confidence**
  - **uncertainty_margin**
  - **uncertainty_entropy**

The scores are computed using the scorer for classification.


For more information about how to use the object detection scorer have a look here:
:py:class:`lightly.active_learning.scorers.detection.ScorerDetection`
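The objectness_least_confidence idea above can be sketched in numpy; this is our reading of the bullet's description, and Lightly's implementation may combine objectness and class probabilities differently:

```python
import numpy as np

# Per-detection confidences for one image (hypothetical values): each row is
# a detection, each entry a class confidence for that detection.
confidences = np.array([[0.9, 0.05, 0.05],
                        [0.4, 0.35, 0.25]])

# 1 minus the mean of each detection's highest confidence: the score is high
# when the model is unsure about its detections, low when it is confident.
score = 1.0 - confidences.max(axis=1).mean()
```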
Image Segmentation
@@ -188,9 +188,9 @@ def get_labels(self, filenames: List[str]) -> np.ndarray:
classifier.fit(X=labeled_set_features, y=labeled_set_labels)

# %%
# 3. Use the classifier to predict on the query set.
query_set_features = dataset.get_features(agent.query_set)
predictions = classifier.predict_proba(X=query_set_features)

# %%
# 4. Calculate active learning scores from the prediction.
2 changes: 1 addition & 1 deletion lightly/__init__.py
@@ -74,7 +74,7 @@
# All Rights Reserved

__name__ = 'lightly'
__version__ = '1.1.7'


try: