Merge pull request #343 from lightly-ai/develop
Develop to Master - Pre-release 1.1.7
philippmwirth authored Apr 29, 2021
2 parents 6f7e711 + 86b4641 commit 89a7b08
Showing 17 changed files with 712 additions and 210 deletions.
124 changes: 87 additions & 37 deletions docs/source/getting_started/active_learning.rst
@@ -71,7 +71,7 @@ dataset_id and token.
# use trainer.max_epochs=0 to skip training
lightly-magic input_dir='path/to/your/raw/dataset' dataset_id='xyz' token='123' trainer.max_epochs=0
Then, in your Python script, you will need to initialize the `ApiWorkflowClient` and the `ActiveLearningAgent`

.. code-block:: Python
@@ -89,7 +89,7 @@ Next, you will need to initialize the `ApiWorkflowClient` and the `ActiveLearnin
it could be that a large portion of the images is blurry. In that case, it's
possible to create a tag in the web-app which only contains the sharp images
and tell the `ActiveLearningAgent` to only sample from this tag. To do so, set
the `query_tag_name` argument in the constructor of the agent.

Let's configure the sampling request and request an initial selection next:

@@ -98,23 +98,26 @@ Let's configure the sampling request and request an initial selection next:
from lightly.active_learning.config import SamplerConfig
from lightly.openapi_generated.swagger_client import SamplingMethod
# we want an initial pool of 150 images
config = SamplerConfig(n_samples=150, method=SamplingMethod.CORESET, name='initial-selection')
al_agent.query(config)
initial_selection = al_agent.labeled_set
# initial_selection now contains 150 filenames
assert len(initial_selection) == 150
The result of the query is a tag in the web-app under the name "initial-selection". The tag contains
the images which were selected by the sampling algorithm. Head there to scroll through the samples and
download the selected images before annotating them. Alternatively, you can access the filenames
of the selected images via the attribute `labeled_set` as shown above.
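The CORESET method used above is a diversity-based selection. As a rough illustration only — not Lightly's server-side implementation — a k-center-greedy style selection on feature vectors can be sketched as follows (all names and data here are our own):

```python
import numpy as np

def k_center_greedy(features: np.ndarray, n_samples: int) -> list:
    # Start from the point closest to the mean, then repeatedly add the
    # point farthest from the current selection (k-center-greedy).
    start = int(np.argmin(np.linalg.norm(features - features.mean(axis=0), axis=1)))
    selected = [start]
    dists = np.linalg.norm(features - features[start], axis=1)
    while len(selected) < n_samples:
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return selected

rng = np.random.default_rng(0)
features = rng.random((1000, 32))   # stand-in for image embeddings
selection = k_center_greedy(features, 150)
assert len(set(selection)) == 150   # 150 distinct, mutually diverse samples
```

The intuition is that each newly picked sample covers a region of the embedding space far from everything already selected, which is why the initial selection tends to be diverse rather than redundant.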


Active Learning Step
----------------------

After you have annotated your initial selection of images, you can train a model
on them. The trained model can then be used to figure out which images pose problems.
This section will show you how these images can be added to the labeled dataset.

To continue with active learning with Lightly, you will need the `ApiWorkflowClient` and `ActiveLearningAgent` from before.
If you perform the next selection step in a new file you have to initialize the client and agent again.
@@ -130,23 +133,22 @@ have to re-initialize them, the tracking of the tags is taken care of for you.
al_agent = ActiveLearningAgent(api_client, preselected_tag_name='initial-selection')
The next part is what differentiates active learning from simple subsampling; the
trained model is used to get predictions on the data and the sampler then
decides based on these predictions. To get a list of all filenames for which
predictions are required, you can use the `query_set`:

.. code-block:: Python
# get all filenames in the query set
query_set = al_agent.query_set
Use this list to get predictions on the images in the query set.
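The two requirements that follow — predictions in the same order as the agent's filename list, and rows normalized to sum to one — can be sketched with plain numpy (the filenames and raw scores here are hypothetical placeholders for your model's output):

```python
import numpy as np

# Hypothetical raw model outputs keyed by filename; in practice these come
# from your model's forward pass over the query set images.
query_set = ['img_0.jpg', 'img_1.jpg', 'img_2.jpg']
raw_scores = {
    'img_0.jpg': [2.0, 0.5],
    'img_1.jpg': [0.1, 0.1],
    'img_2.jpg': [0.0, 3.0],
}

# 1) arrange predictions in exactly the same order as the filename list
logits = np.array([raw_scores[f] for f in query_set])

# 2) normalize each row to sum to one (softmax)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
predictions = exp / exp.sum(axis=1, keepdims=True)

assert predictions.shape == (3, 2)
assert np.allclose(predictions.sum(axis=1), 1.0)
```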

**Important:** The predictions need to be in the same order as the filenames in the
list returned by the `ActiveLearningAgent`.

For classification, the predictions need to be in a numpy array and normalized,
such that the rows sum to one. Then, create a scorer object like so:

.. code-block:: Python
@@ -159,16 +161,21 @@ here is that the argument `n_samples` always refers to the total size of the lab

.. code-block:: Python
# we want a total of 200 images after the first iteration (50 new samples)
# this time, we use the CORAL sampler and provide a scorer to the query
config = SamplerConfig(n_samples=200, method=SamplingMethod.CORAL, name='al-iteration-1')
al_agent.query(config, scorer)
labeled_set_iteration_1 = al_agent.labeled_set
added_set_iteration_1 = al_agent.added_set
assert len(labeled_set_iteration_1) == 200
assert len(added_set_iteration_1) == 50
As before, there will be a new tag named `al-iteration-1` visible in the web-app. Additionally,
you can access the filenames of all the images in the labeled set and the filenames which were
added by this query via the attributes `labeled_set` and `added_set` respectively.
You can repeat the active learning step until the model achieves the required accuracy.
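A quick sanity check on the bookkeeping: since `n_samples` refers to the total size of the labeled set after the query, the number of newly added images is the difference between the new total and the previous labeled set size (filenames below are hypothetical):

```python
# Labeled set before the query and the filenames added by the query.
labeled_before = {'a.jpg', 'b.jpg'}
added = {'c.jpg', 'd.jpg'}

# After the query, the labeled set is the union of both.
labeled_after = labeled_before | added

# n_samples is the size of labeled_after, not the number of new samples.
assert len(labeled_after) == len(labeled_before) + len(added)
```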

Scorers
-----------------
@@ -180,13 +187,26 @@ Image Classification
^^^^^^^^^^^^^^^^^^^^^
Use this scorer when working on a classification problem (binary or multiclass).

Currently, we offer three uncertainty scorers, which are taken from
http://burrsettles.com/pub/settles.activelearning.pdf (Section 3.1, pages 12f.)
and also explained in https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b.
They all have in common that the score is highest if all classes have the
same confidence and zero if the model assigns 100% probability to a single class.
They differ in the number of class confidences they take into account.

- **uncertainty_least_confidence**:
This score is 1 - the highest confidence prediction. It is high
when the confidence about the most probable class is low.

- **uncertainty_margin**:
This score is 1 - the margin between the highest confidence
and second highest confidence prediction. It is high when the model
cannot decide between the two most probable classes.

- **uncertainty_entropy**:
This scorer computes the entropy of the prediction. The confidences
for all classes are considered to compute the entropy of a sample.
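As a sketch, the three scores can be computed for a single normalized prediction with numpy; the formulas below follow the bullet descriptions above, and the library's implementation may scale or aggregate them differently:

```python
import numpy as np

# One normalized class-probability prediction (confidences sum to one).
probs = np.array([0.5, 0.3, 0.2])
sorted_probs = np.sort(probs)[::-1]  # confidences in descending order

# 1 - highest confidence: high when the top class is not trusted.
least_confidence = 1.0 - sorted_probs[0]

# 1 - margin between the two highest confidences: high when the model
# cannot decide between the two most probable classes.
margin = 1.0 - (sorted_probs[0] - sorted_probs[1])

# Shannon entropy over all class confidences.
entropy = -np.sum(probs * np.log(probs))
```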

For more information about how to use the classification scorer have a look here:
:py:class:`lightly.active_learning.scorers.classification.ScorerClassification`
@@ -195,17 +215,47 @@ For more information about how to use the classification scorer have a look here
Object Detection
^^^^^^^^^^^^^^^^^^^^^
Use this scorer when working on an object detection problem using bounding
boxes. The object detection scorers require the input to be in
the `ObjectDetectionOutput` format.

We expect the model predictions to contain

- bounding boxes of shape (x0, y0, x1, y1)
- objectness_probability for each bounding box
- classification_probabilities for each bounding box

You can find more about the format here:
:py:class:`lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput`

We also provide a helper method to work with the model output format consisting
of only a probability per bounding box and the associated label:
:py:class:`lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput.from_scores`
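The expected per-image prediction content can be illustrated with plain arrays (illustrative data only; the actual container is the `ObjectDetectionOutput` class documented in the references above):

```python
import numpy as np

# One image's detections: each row of `boxes` is (x0, y0, x1, y1).
boxes = np.array([[10.0, 20.0, 50.0, 60.0],
                  [30.0, 30.0, 80.0, 90.0]])

# One objectness probability per bounding box.
objectness = np.array([0.9, 0.6])

# One class-probability distribution per bounding box.
class_probs = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1]])

# All three parts must describe the same set of detections.
assert len(boxes) == len(objectness) == len(class_probs)
```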


Currently, the following scorers are available:

- **object_frequency**:
This score measures the number of objects in the image. Use this scorer if
you want scenes with lots of objects in them. This is suited for computer vision
tasks such as perception in autonomous driving.

- **objectness_least_confidence**:
This score is 1 - the mean of the highest confidence prediction. Use this scorer
to select images where the model is unsure both about whether it found an object
at all and about the class of the object.

- **classification_scores**:
These scores are computed for each object detection per image out of
the class probability prediction for this detection. Then, they are reduced
to one score per image by taking the maximum. In particular, we support:

  - **uncertainty_least_confidence**
  - **uncertainty_margin**
  - **uncertainty_entropy**

The scores are computed using the scorer for classification.


For more information about how to use the object detection scorer have a look here:
:py:class:`lightly.active_learning.scorers.detection.ScorerDetection`
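The objectness_least_confidence idea above can be sketched in numpy; this is our reading of the bullet's description, and Lightly's implementation may combine objectness and class probabilities differently:

```python
import numpy as np

# Per-detection confidences for one image (hypothetical values): each row is
# a detection, each entry a class confidence for that detection.
confidences = np.array([[0.9, 0.05, 0.05],
                        [0.4, 0.35, 0.25]])

# 1 minus the mean of each detection's highest confidence: the score is high
# when the model is unsure about its detections, low when it is confident.
score = 1.0 - confidences.max(axis=1).mean()
```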
Image Segmentation
@@ -188,9 +188,9 @@ def get_labels(self, filenames: List[str]) -> np.ndarray:
classifier.fit(X=labeled_set_features, y=labeled_set_labels)

# %%
# 3. Use the classifier to predict on the query set.
query_set_features = dataset.get_features(agent.query_set)
predictions = classifier.predict_proba(X=query_set_features)

# %%
# 4. Calculate active learning scores from the prediction.
2 changes: 1 addition & 1 deletion lightly/__init__.py
@@ -74,7 +74,7 @@
# All Rights Reserved

__name__ = 'lightly'
__version__ = '1.1.7'


try: