
[Not an Issue] How to keep the trained model as close as possible to groundtruth #646

Open
johnlockejrr opened this issue Sep 27, 2024 · 3 comments


@johnlockejrr

I am trying to train a segmentation model for a modern printed Judaeo-Arabic dataset. The problem I face is that the trained model mainly loses the vowel signs below the line. What can be done? I have tried both training from scratch and fine-tuning.

```
ketos segtrain --line-width 10 -mr Main:textzone --precision 16 -d cuda:0 -f page -t output.txt --resize both -tl -i /home/incognito/kraken-train/teyman_print/biblialong02_se3_2_tl.mlmodel -q early --min-epochs 80 -o /home/incognito/kraken-train/teyman_print/teyman_print_scr_cl/teyman_print_tl_v3
```

Manual segmentation as ground truth:
[image: manual]

Segmentation with the newly trained model (the dataset is small; this is preliminary):
[image: trained]

@johnlockejrr (Author)

Should I try to train it with the baseline at the center line rather than at the top line, as is normal for Hebrew?

@dstoekl commented Sep 28, 2024

I don't think it will help. Use the API to improve the polygons by calculating the average line distance and extrapolating from there.
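The suggestion above could be sketched roughly as follows. This is a minimal, hypothetical illustration (not kraken's actual API): it assumes each line polygon is a list of `(x, y)` points with an associated baseline height, computes the average vertical distance between consecutive baselines, and pushes the bottom edge of each polygon down by a fraction of that distance so that diacritics below the line fall inside the polygon. The function names and the `factor` parameter are invented for the example.

```python
# Hypothetical sketch: widen line polygons downward by a fraction of the
# average inter-baseline distance, so vowel signs below the line are kept.

def average_line_distance(baseline_ys):
    """Mean vertical gap between consecutive baselines (sorted top to bottom)."""
    ys = sorted(baseline_ys)
    if len(ys) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ys, ys[1:])]
    return sum(gaps) / len(gaps)

def extend_polygon_down(polygon, baseline_y, avg_dist, factor=0.3):
    """Push every polygon point below the baseline further down by
    factor * avg_dist; points on or above the baseline stay unchanged."""
    pad = factor * avg_dist
    return [(x, y + pad) if y > baseline_y else (x, y) for x, y in polygon]

# Three lines roughly 60 px apart; one line's polygon hugs the baseline.
baseline_ys = [100, 160, 220]
avg = average_line_distance(baseline_ys)
poly = [(10, 90), (200, 90), (200, 110), (10, 110)]
wider = extend_polygon_down(poly, baseline_y=100, avg_dist=avg)
print(avg, wider)
```

In practice you would run this as a post-processing step over the segmenter's output before feeding the lines to recognition; the padding factor would need tuning so polygons from adjacent lines don't overlap.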

@johnlockejrr (Author) commented Sep 28, 2024

> I don't think it will help. Use the API to improve the polygons by calculating the average line distance and extrapolating from there.

It's not a problem with the dataset but with the model output. Use the API to do what? The model should perform better.

Are you perhaps aware of a Hebrew segmentation model that can properly handle nikkud and cantillation marks?
