evaluation threshold of wake-up word detection #129

Ruiqin-Huang · 2025-02-02T03:11:25Z

I have a question about the evaluation threshold for wake-up word detection. When the threshold is set to 0.0, according to the code in howl/howl/model/inference.py:

    if max_prob < self.threshold:
        max_label = self.negative_label

all samples whose prediction probability (probability of being predicted as a positive sample of the wake-up word?) is < 0.0 should be classified as negative. Since a probability is always >= 0, this condition can never be true, so in theory every sample should be classified as positive, i.e., fn=tn=0. However, in the hey_fire_fox experiment with the threshold set to 0.0, tn=2428 on Dev negative and fn=2 on Dev positive. The model still separates negative samples from positive ones. What causes this? Could it be related to OOV (Out-of-Vocabulary) classification, or to rounding errors?
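To make the expectation concrete, here is a minimal sketch of the thresholding step in isolation. The function `apply_threshold` and the label indices are hypothetical and only mirror the two quoted lines; this is not Howl's actual implementation. It also assumes, without confirmation from the source, that `max_label` comes from an argmax over class probabilities that include the negative class. If that holds, it would be one possible way for negatives to survive a threshold of 0.0.

```python
NEGATIVE_LABEL = 1  # hypothetical index of the negative class


def apply_threshold(probs, threshold, negative_label=NEGATIVE_LABEL):
    """Pick the most likely label, then fall back to negative below the threshold."""
    # Assumption: the argmax spans all classes, including the negative one.
    max_label = max(range(len(probs)), key=lambda i: probs[i])
    max_prob = probs[max_label]
    # With threshold == 0.0 this branch never runs, because max_prob >= 0.
    if max_prob < threshold:
        max_label = negative_label
    return max_label


# Index 0 = wake-up word (hypothetical), index 1 = negative.
print(apply_threshold([0.9, 0.1], threshold=0.0))  # wake-up word wins the argmax
print(apply_threshold([0.2, 0.8], threshold=0.0))  # negative wins the argmax
```

In this sketch the threshold can only turn a positive prediction into a negative one. It never turns a negative prediction into a positive one, so tn and fn need not be zero at threshold 0.0.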

line | eval_dataset | threshold | tp | tn | fp | fn
-- | -- | -- | -- | -- | -- | --
1 | Dev positive | 0.0 | 74 | 0 | 0 | 2
2 | Dev negative | 0.0 | 0 | 2428 | 103 | 0
3 | Dev noisy positive | 0.0 | 69 | 0 | 0 | 7
4 | Dev noisy negative | 0.0 | 0 | 2468 | 63 | 0
5 | Test positive | 0.0 | 47 | 0 | 0 | 7
6 | Test negative | 0.0 | 0 | 2399 | 105 | 0
7 | Test noisy positive | 0.0 | 45 | 0 | 0 | 9
8 | Test noisy negative | 0.0 | 0 | 2442 | 62 | 0
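For reference, the counts in the table can be turned into false-rejection and false-alarm rates with a few lines of Python. This only re-derives ratios from the numbers reported above; the helper names are illustrative and not part of Howl.

```python
# (tp, tn, fp, fn) per evaluation set, copied from the table at threshold 0.0.
results = {
    "Dev positive": (74, 0, 0, 2),
    "Dev negative": (0, 2428, 103, 0),
    "Dev noisy positive": (69, 0, 0, 7),
    "Dev noisy negative": (0, 2468, 63, 0),
    "Test positive": (47, 0, 0, 7),
    "Test negative": (0, 2399, 105, 0),
    "Test noisy positive": (45, 0, 0, 9),
    "Test noisy negative": (0, 2442, 62, 0),
}


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def rates(tp, tn, fp, fn):
    """False-rejection rate over positives and false-alarm rate over negatives."""
    return {
        "false_reject_rate": safe_ratio(fn, tp + fn),
        "false_alarm_rate": safe_ratio(fp, tn + fp),
    }


summary = {name: rates(*counts) for name, counts in results.items()}
for name, r in summary.items():
    print(name, r)
```

A rate of `None` means the set contains no samples of that class, for example no negatives in "Dev positive".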
