Merge pull request #49 from spraakbanken/44-handle-annotations-when-2-tokens-are-merged

fix: handle annotations when 2 tokens are merged
kod-kristoff authored Nov 13, 2024
2 parents b73f93d + 64fc457 commit 83efc38
Showing 15 changed files with 883 additions and 1,262 deletions.
21 changes: 11 additions & 10 deletions examples/christoph-borg/config.yaml
@@ -1,16 +1,17 @@
 metadata:
-  id: christoph-borg
-  language: swe
+    id: christoph-borg
+    language: swe
 
 import:
-  importer: text_import:parse
+    importer: text_import:parse
 
 export:
-  annotations:
-    - <sentence>
-    # - <token:word>
-    - <token>:stanza.pos
-    - <token>:sbx_ocr_correction_viklofg_sweocr.ocr-correction--viklofg-sweocr
+    annotations:
+        - <sentence>
+        # - <token:word>
+        - <token>:stanza.pos
+        - sbx_ocr_correction_viklofg_sweocr.sbx-ocr-correction
+        - sbx_ocr_correction_viklofg_sweocr.sbx-ocr-correction:sbx_ocr_correction_viklofg_sweocr.ocr-correction--viklofg-sweocr
 
 sparv:
-  compression: none
+    compression: none
19 changes: 10 additions & 9 deletions examples/ocr-correction-viklofg-sweocr/config.yaml
@@ -1,16 +1,17 @@
 metadata:
-  id: hello-ocr
-  language: swe
+    id: hello-ocr
+    language: swe
 
 import:
-  importer: text_import:parse
+    importer: text_import:parse
 
 export:
-  annotations:
-    - <sentence>
-    # - <token:word>
-    - <token>:stanza.pos
-    - <token>:sbx_ocr_correction_viklofg_sweocr.ocr-correction--viklofg-sweocr
+    annotations:
+        - <sentence>
+        # - <token:word>
+        - <token>:stanza.pos
+        - sbx_ocr_correction_viklofg_sweocr.sbx-ocr-correction
+        - sbx_ocr_correction_viklofg_sweocr.sbx-ocr-correction:sbx_ocr_correction_viklofg_sweocr.ocr-correction--viklofg-sweocr
 
 sparv:
-  compression: none
+    compression: none
3 changes: 2 additions & 1 deletion examples/texts/dokument.txt
@@ -1 +1,2 @@
-Den i HandelstidniDgens g&rdagsnnmmer omtalade hvalfisken, sorn fångats i Frölnndaviken
+Den i HandelstidniDgens g&rdagsnnmmer omtalade hvalfisken, sorn fångats i Frölnndaviken.
+Jonath an saknades.
4 changes: 2 additions & 2 deletions ocr-correction-viklofg-sweocr/Makefile
@@ -146,7 +146,7 @@ prepare-release: tests/requirements-testing.lock
 
 # we use lock extension so that dependabot doesn't pick up changes in this file
 tests/requirements-testing.lock: pyproject.toml
-	pdm export --dev --format requirements --output $@
+	pdm export --dev --format requirements --no-hashes --output $@
 
 .PHONY: kb-bert-prepare-release
 kb-bert-prepare-release: ocr-correction-viklofg-sweocr/CHANGELOG.md
@@ -158,4 +158,4 @@ CHANGELOG.md:
 
 .PHONY: ocr-correction-viklofg-sweocr/CHANGELOG.md
 ocr-correction-viklofg-sweocr/CHANGELOG.md:
-	git cliff --unreleased --include-path "ocr-correction-viklofg-sweocr/**/*" --include-path "examples/ocr-correction-viklofg-sweocr/**/*" --prepend $@
+	git cliff --unreleased --include-path "ocr-correction-viklofg-sweocr/**/*" --include-path "examples/ocr-correction-viklofg-sweocr/**/*" --prepend $@