Segmentation metrics overview

Metric	Needs ref	Boundary types	Near misses	Different sequences	Implementation
F1	✔️(1)	❌	❌	❌
Pk	✔️(1)	❌	✔️	❌	SegEval
WindowDiff	✔️(1)	❌	✔️	❌	SegEval
Segmentation Similarity	✔️(1)	✔️	✔️	❌	SegEval
Boundary Similarity	✔️(1)	✔️	✔️	❌	SegEval
BLEU(-br)	✔️(n)	✔️	✔️	✔️	SacreBLEU?
TER-br	✔️(n)	✔️	✔️	✔️	TER
S-BLEU	✔️(n)	✔️	✔️	✔️	SacreBLEU

Pk

Beeferman99statistical

Measures the probability that two units k steps apart are classified inconsistently: the hypothesis places them in the same segment while the reference places them in different segments, or vice versa. It is calculated by setting k to half of the average reference segment size and then computing penalties via a moving window of length k. At each location, the metric determines whether the two ends of the probe are in the same or different segments in the reference segmentation and increases a counter if the hypothesis segmentation disagrees. The resulting count is scaled between 0 and 1 by dividing by the number of measurements taken, so lower values are better.

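$P_k(hyp,ref)=\frac{1}{N-k}\sum_{i=1}^{N-k}\delta(f(hyp,i,i+k) \ne f(ref,i,i+k))$

Here N is the number of units, f(s,i,j) indicates whether units i and j lie in the same segment of segmentation s, and δ is 1 when its argument is true and 0 otherwise.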
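The formula can be sketched in a few lines of Python. This is a minimal illustration, not SegEval's API: it assumes each segmentation is given as a list of segment sizes, and the names `segment_ids` and `pk` are hypothetical.

```python
def segment_ids(masses):
    """Expand segment sizes into one segment index per unit, e.g. [2, 3] -> [0, 0, 1, 1, 1]."""
    return [seg for seg, size in enumerate(masses) for _ in range(size)]


def pk(hyp, ref, k=None):
    """Pk for two segmentations given as lists of segment sizes (an assumed representation)."""
    hyp_ids, ref_ids = segment_ids(hyp), segment_ids(ref)
    n = len(ref_ids)
    if len(hyp_ids) != n:
        raise ValueError("hyp and ref must cover the same number of units")
    if k is None:
        # Half of the average reference segment size, at least 1.
        k = max(1, round(n / (2 * len(ref))))
    if n <= k:
        raise ValueError("sequence too short for window size k")
    errors = 0
    for i in range(n - k):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        errors += same_in_ref != same_in_hyp  # delta(f(hyp) != f(ref))
    return errors / (n - k)


print(pk([8], [4, 4]))     # hypothesis misses the only boundary
print(pk([4, 4], [4, 4]))  # identical segmentations
```

With the reference `[4, 4]`, k is 2 and there are 6 probe positions. A hypothesis with no boundary disagrees with the reference at the 2 positions that straddle the missed boundary, giving 2/6.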

WindowDiff

Pevzner02critique

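For each position of a sliding window of length k, compares the number of reference segmentation boundaries that fall in the interval, b(ref,i,i+k), with the number of boundaries assigned by the hypothesis, b(hyp,i,i+k). The hypothesis is penalized whenever b(ref,i,i+k) ≠ b(hyp,i,i+k). Because boundary counts are compared rather than same/different decisions, an extra boundary inside a window is penalized even when the window already contains a reference boundary.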

$WD_k(hyp,ref)=\frac{1}{N-k}\sum_{i=1}^{N-k}\delta(b(hyp,i,i+k) \ne b(ref,i,i+k))$
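A minimal Python sketch of this formula follows. It assumes each segmentation is given as a list of segment sizes. The function names are hypothetical and this is not SegEval's API. Because segment indices increase by one at every boundary, the number of boundaries between units i and i+k is the difference of their segment indices.

```python
def segment_ids(masses):
    """Expand segment sizes into one segment index per unit, e.g. [2, 3] -> [0, 0, 1, 1, 1]."""
    return [seg for seg, size in enumerate(masses) for _ in range(size)]


def window_diff(hyp, ref, k=None):
    """WindowDiff for two segmentations given as lists of segment sizes (an assumed representation)."""
    hyp_ids, ref_ids = segment_ids(hyp), segment_ids(ref)
    n = len(ref_ids)
    if len(hyp_ids) != n:
        raise ValueError("hyp and ref must cover the same number of units")
    if k is None:
        # Half of the average reference segment size, at least 1.
        k = max(1, round(n / (2 * len(ref))))
    if n <= k:
        raise ValueError("sequence too short for window size k")
    errors = 0
    for i in range(n - k):
        b_ref = ref_ids[i + k] - ref_ids[i]  # boundaries between units i and i+k
        b_hyp = hyp_ids[i + k] - hyp_ids[i]
        errors += b_ref != b_hyp
    return errors / (n - k)


print(window_diff([3, 1, 4], [4, 4]))  # one spurious boundary next to the true one
```

In this example k is 2. The window starting at the third unit contains one reference boundary but two hypothesis boundaries, so it is counted as an error even though both segmentations place its two ends in different segments.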

Segmentation Similarity

Fournier12segmentation

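Proportion of boundaries that are not transformed when one segmentation is compared with the other using edit distance. The edit operations are additions/deletions, substitutions, and transpositions of a boundary over up to n positions. In the formula, d(s_a,s_b,n) is the edit distance between the two segmentations and N is the number of units, so N-1 is the number of potential boundary positions.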

$S(s_a,s_b,n)=1-\frac{d(s_a,s_b,n)}{N-1}$

Boundary Similarity

Fournier13evaluating

New weights and new normalization for boundary edit distance. Assuming that boundary edit distance produces sets of edit operations where A is the set of additions/deletions, T the set of n-wise transpositions, S the set of substitutions, and M the set of matching boundary pairs, boundary similarity can be defined as:

$B(s_a,s_b,n)=1-\frac{|A|+w_t^{span}(T,n)+w_s^{ord}(S,n)}{|A|+|T|+|S|+|M|}$

BLEU(-br)

karakanta2042

BLEU computed on text in which the subtitle breaks are kept as special symbols. Each break symbol counts as an extra token that contributes to the score.
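To illustrate how break symbols enter the score, the sketch below computes clipped n-gram precision, the core ingredient of BLEU, on text that keeps its breaks. It is not a full BLEU implementation (no brevity penalty, no averaging over n-gram orders). The break symbols `<eol>` and `<eob>` and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against a single reference."""
    hyp_counts = ngrams(hyp.split(), n)
    ref_counts = ngrams(ref.split(), n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


# Same words, but the line break is placed one word too early in the hypothesis.
ref = "I never thought <eol> it would work <eob>"
hyp = "I never <eol> thought it would work <eob>"

print(clipped_precision(hyp, ref, 1))  # unigrams: all tokens match
print(clipped_precision(hyp, ref, 2))  # bigrams: those around the moved break do not match
```

All unigrams match, but only 4 of the 7 hypothesis bigrams do, so the misplaced break lowers the higher-order precisions even though the words are identical.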

TER-br

karakanta2042

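TER calculated after all word tokens of the sentence are masked, so that only the break symbols remain distinguishable and the score reflects the placement of breaks alone.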
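The masking step can be sketched as follows. The code is a simplification under stated assumptions: the break symbols `<eol>` and `<eob>` and the mask token `X` are chosen for illustration, and the score uses plain word-level edit distance divided by the reference length, without the shift operation that real TER includes.

```python
BREAKS = {"<eol>", "<eob>"}  # assumed break symbols


def mask_words(sentence, mask="X"):
    """Replace every token that is not a break symbol with the mask token."""
    return [tok if tok in BREAKS else mask for tok in sentence.split()]


def edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def ter_br_no_shifts(hyp, ref):
    """Edit rate on masked text; a shift-free approximation of TER-br."""
    masked_hyp, masked_ref = mask_words(hyp), mask_words(ref)
    if not masked_ref:
        raise ValueError("reference must not be empty")
    return edit_distance(masked_hyp, masked_ref) / len(masked_ref)


ref = "I never thought <eol> it would work <eob>"
hyp = "I never <eol> thought it would work <eob>"
print(mask_words(hyp))
print(ter_br_no_shifts(hyp, ref))
```

Since every word becomes the same mask token, word choice no longer affects the score. In this example the moved `<eol>` costs two edits out of eight reference tokens; a TER implementation with shifts may count it as a single edit.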

S mode BLEU (S-BLEU)

Matusov19customizing

Subtitle BLEU. Calculates BLEU on subtitles instead of sentences, so that any target words that appear in the wrong subtitle count as errors. Assumes that the subtitles in the target correspond one-to-one to the subtitles in the reference.

Conformity to the subtitle constraint of length (CPL_conf)

Subtitles should not exceed a specific length. Conformity is measured against a maximum of n characters per line (at most 2 lines of up to n characters each per subtitle block), where n is 42 according to the TED subtitling guidelines. CPL_conf is the percentage of subtitles in the corpus that conform to the length constraint.
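A minimal Python sketch of this check, assuming each subtitle block is represented as a list of its lines. The function name `cpl_conformity` is hypothetical, and treating a block with more than 2 lines as non-conforming is an assumption read off the description above.

```python
def cpl_conformity(subtitles, max_chars=42, max_lines=2):
    """Percentage of subtitle blocks whose lines all fit the length constraint.

    subtitles: list of blocks, each block a list of line strings (assumed layout).
    """
    if not subtitles:
        raise ValueError("need at least one subtitle block")
    conforming = sum(
        len(lines) <= max_lines and all(len(line) <= max_chars for line in lines)
        for lines in subtitles
    )
    return 100.0 * conforming / len(subtitles)


corpus = [
    ["A short first line,", "and a short second line."],  # conforms
    ["x" * 43],                                            # one line too long
    ["one", "two", "three"],                               # too many lines
    ["x" * 42],                                            # exactly at the limit
]
print(cpl_conformity(corpus))
```

Two of the four blocks satisfy both limits, so the example corpus has a conformity of 50%.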

Adapting standard metrics via alignment

MWER (Matusov et al., 2006)
