anne17 committed Nov 4, 2024
1 parent c5dcff0 commit 4959865
Showing 5 changed files with 317 additions and 0 deletions.
26 changes: 26 additions & 0 deletions sparv/modules/conll_export/metadata.yaml
@@ -0,0 +1,26 @@
id: export-conllu
name:
swe: CoNLL-U-export
eng: CoNLL-U export
short_description:
swe: Export av korpusdata i Språkbanken Texts CoNLL-U-format
eng: Export of corpus data in Språkbanken Text's CoNLL-U format
task: export
keywords:
- conll-u
- export
sparv_handler: conll_export:conllu
example_output: |-
```
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# document_name = rävar
# text_date = 2017-01-10
# text_title = Rödräv
# sent_id = 157
1 Rödräven rödräv NN NN.UTR.SIN.DEF.NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 2 ss _ _
2 är vara VB VB.PRS.AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
3 ett en DT DT.NEU.SIN.IND Definite=Ind|Gender=Neut|Number=Sing 4 dt _ _
4 hunddjur hunddjur NN NN.NEU.SIN.IND.NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 2 sp _ _
```
created: 2020-11-20
updated: 2022-03-25
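
A minimal sketch of how one token line from the example output above could be read back in Python. It assumes the fields are tab-separated, as in standard CoNLL-U (the rendering above shows them separated by spaces), and it takes the column order from the `# global.columns` comment line. The helper names `parse_token_line` and `parse_feats` are hypothetical and not part of Sparv.

```python
# Column order taken from the "# global.columns" comment line in the example output.
COLUMNS = "ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC".split()


def parse_token_line(line, columns=COLUMNS):
    """Split one tab-separated token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# First token of the example sentence, with tabs assumed between fields.
line = (
    "1\tRödräven\trödräv\tNN\tNN.UTR.SIN.DEF.NOM\t"
    "Case=Nom|Definite=Def|Gender=Com|Number=Sing\t2\tss\t_\t_"
)
token = parse_token_line(line)
feats = parse_feats(token["FEATS"])
print(token["FORM"], token["HEAD"], token["DEPREL"], feats["Number"])
```

Keying the fields by column name rather than by position keeps the reader independent of the exact column set, as long as the `# global.columns` line is used to build `COLUMNS`.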
50 changes: 50 additions & 0 deletions sparv/modules/malt/metadata.yaml
@@ -0,0 +1,50 @@
id: swe-dependency-malt-treebank
name:
swe: Dependensparsning med MaltParser
eng: Dependency parsing with MaltParser
short_description:
swe: Svensk dependensparsning tränad på Svensk trädbank baserad på MaltParser
eng: Swedish dependency parsing with MaltParser, trained on the Swedish Treebank
task: dependency parsing
language_codes:
- swe
keywords:
- dependency parsing
annotations:
- <token>:malt.ref
- <token>:malt.dephead_ref
- <token>:malt.deprel
example_output: |-
```xml
<token dephead_ref="4" deprel="SS" ref="1">Alfred</token>
<token dephead_ref="1" deprel="HD" ref="2">Bernhard</token>
<token dephead_ref="1" deprel="HD" ref="3">Nobel</token>
<token deprel="ROOT" ref="4">var</token>
<token dephead_ref="8" deprel="DT" ref="5">en</token>
<token dephead_ref="8" deprel="AT" ref="6">svensk</token>
<token dephead_ref="8" deprel="CJ" ref="7">kemist</token>
<token dephead_ref="9" deprel="DT" ref="8">och</token>
<token dephead_ref="4" deprel="SP" ref="9">stiftare</token>
<token dephead_ref="9" deprel="ET" ref="10">av</token>
<token dephead_ref="10" deprel="PA" ref="11">Nobelpriset</token>
<token dephead_ref="4" deprel="IP" ref="12">.</token>
```
standard_reference: |-
Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing.
In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy.
European Language Resources Association (ELRA).
other_references:
- "Maltparser: https://www.maltparser.org/download.html"
- 'https://aclanthology.org/2021.nodalida-main.20/'
tool: "MaltParser"
model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)"
trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)"
tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)"
evaluation_results: Labelled Attachment Score 0.78 (using the TalbankenSBX train-dev-test split)
description:
swe: |-
Denna MaltParser-modell har konfigurerats för svenska och tränats på TalbankenSTB-korpusen.
eng: |-
This MaltParser model has been configured for Swedish and trained on the TalbankenSTB corpus.
created: 2010-12-15
updated: 2021-06-01
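
A short sketch of how the `ref`, `dephead_ref` and `deprel` attributes from the example output above could be read into a simple dependency table. It uses a subset of the example tokens, wrapped in a hypothetical `<sentence>` element so that the fragment parses as a single XML document. The function names are illustrative only, and the sketch assumes, as in the example, that the root token carries no `dephead_ref` attribute.

```python
import xml.etree.ElementTree as ET

# Subset of the example tokens; the <sentence> wrapper is an assumption made
# here so that the fragment is a well-formed XML document.
SENTENCE = """<sentence>
<token dephead_ref="4" deprel="SS" ref="1">Alfred</token>
<token dephead_ref="1" deprel="HD" ref="2">Bernhard</token>
<token dephead_ref="1" deprel="HD" ref="3">Nobel</token>
<token deprel="ROOT" ref="4">var</token>
<token dephead_ref="4" deprel="IP" ref="12">.</token>
</sentence>"""


def read_dependencies(xml_text):
    """Return (ref, word, head_ref or None, deprel) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (tok.get("ref"), tok.text, tok.get("dephead_ref"), tok.get("deprel"))
        for tok in root.iter("token")
    ]


def children_of(rows, head_ref):
    """Words whose dephead_ref points at the given ref."""
    return [word for _, word, head, _ in rows if head == head_ref]


rows = read_dependencies(SENTENCE)
# The root is the token without a dephead_ref attribute.
root_words = [word for _, word, head, _ in rows if head is None]
print(root_words, children_of(rows, "1"))
```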
97 changes: 97 additions & 0 deletions sparv/modules/phrase_structure/metadata.yaml
@@ -0,0 +1,97 @@
id: swe-phrasestructure-sparv
name:
swe: Svensk frasstrukturparsning
eng: Swedish phrase structure parsing
short_description:
swe: Svenska frasstrukturer konverterade från Mamba-Dep dependensanalys
eng: Swedish phrase structures converted from Mamba-Dep dependency analysis
task: phrase structure parsing
language_codes:
- swe
keywords:
- phrase structure parsing
annotations:
- phrase_structure.phrase
- phrase_structure.phrase:phrase_structure.name
- phrase_structure.phrase:phrase_structure.func
example_output: |-
```xml
<phrase func="ROOT" name="S">
<phrase func="SS" name="NP">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</phrase>
<token>var</token>
<phrase func="SP" name="NP">
<phrase func="DT" name="NP">
<token>en</token>
<phrase func="AT" name="ADJP">
<token>svensk</token>
<phrase func="CJ" name="NP">
<token>kemist</token>
</phrase>
</phrase>
<token>och</token>
</phrase>
<token>stiftare</token>
<phrase func="ET" name="PrP">
<token>av</token>
<phrase func="PA" name="NP">
<token>Nobelpriset</token>
<token>.</token>
</phrase>
</phrase>
</phrase>
</phrase>
```
standard_reference: ''
other_references: []
tool: ''
model: "Method has no model"
trained_on: "[TalbankenSBX](https://spraakbanken.gu.se/resurser/talbanken)"
tagset: "See description below"
evaluation_results: ''
description:
swe: |-
Konverterar Mamba-Dep dependensanalys till svenska frasstrukturer med hjälp av en regelbaserad heuristik. Nedan följer
den fullständiga listan över möjliga frasstrukturetiketter:
NP: noun phrase
NP-wh: noun phrase with a relativizer, e.g. "whose mother"
PrP: prepositional phrase
PrP-wh: prepositional phrase with a relativizer, e.g. "in which"
SBAR: subordinate clause introduced by a subordinator
S: clause
S-wh: clause introduced by a relativizer
S-imp: clause in the imperative
VP-sup: verb phrase using the supine
VP-att: verb phrase with the infinitive, including the infinitive marker "att"
VP-inf: verb phrase with the infinitive, without an infinitive marker
VP-fin: finite verb phrase
ADJP: adjective phrase
ADVP: adverb phrase
ADVP-wh: adverb phrase with a relativizer
QP: numeral phrase
eng: |-
Converts Mamba-Dep dependency analysis into Swedish phrase structures using a rule-based heuristic. Below is the
complete list of possible phrase structure labels:
NP: noun phrase
NP-wh: noun phrase with a relativizer, e.g. "whose mother"
PrP: prepositional phrase
PrP-wh: prepositional phrase with a relativizer, e.g. "in which"
SBAR: subordinate clause introduced by a subordinator
S: clause
S-wh: clause introduced by a relativizer
S-imp: clause in the imperative
VP-sup: verb phrase using the supine
VP-att: verb phrase with the infinitive, including the infinitive marker "att"
VP-inf: verb phrase with the infinitive, without an infinitive marker
VP-fin: finite verb phrase
ADJP: adjective phrase
ADVP: adverb phrase
ADVP-wh: adverb phrase with a relativizer
QP: numeral phrase
created: 2018-03-28
updated: 2018-03-28
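
The nested `<phrase>` elements in the example output above form a tree, which can be easier to read as a bracketed string. The following is a minimal sketch using only the first part of the example. The bracket notation `[name:func ...]` and the function name `to_brackets` are choices made here for illustration, not a format defined by the module.

```python
import xml.etree.ElementTree as ET

# First part of the example output, closed off so that it is well-formed XML.
PHRASES = """<phrase func="ROOT" name="S">
<phrase func="SS" name="NP">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</phrase>
<token>var</token>
</phrase>"""


def to_brackets(elem):
    """Render a phrase element recursively as '[name:func child child ...]'."""
    if elem.tag == "token":
        return elem.text
    inner = " ".join(to_brackets(child) for child in elem)
    return f"[{elem.get('name')}:{elem.get('func')} {inner}]"


bracketed = to_brackets(ET.fromstring(PHRASES))
print(bracketed)  # [S:ROOT [NP:SS Alfred Bernhard Nobel] var]
```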
83 changes: 83 additions & 0 deletions sparv/modules/swener/metadata.yaml
@@ -0,0 +1,83 @@
id: swe-namedentity-swener
name:
swe: Namnigenkänning med HFST-SweNER
eng: Named entity recognition with HFST-SweNER
short_description:
swe: Namnigenkänning känner igen och förser namn och namnliknande uttryck (s.k. entiteter) i löpande text med fördefinierade etiketter, som organisation, person eller plats.
eng: Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text.
task: named entity recognition
language_codes:
- swe
keywords:
- ner
annotations:
- swener.ne
- swener.ne:swener.name
- swener.ne:swener.ex
- swener.ne:swener.type
- swener.ne:swener.subtype
example_output: |-
```xml
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
<token>21</token>
<token>oktober</token>
<token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
<token>Stockholm</token>
</ne>
<token>,</token>
<ne ex="ENAMEX" name="Italien" subtype="PPL" type="LOC">
<token>Italien</token>
</ne>
<token>,</token>
<token>var</token>
<token>en</token>
<token>svensk</token>
<token>kemist</token>
<token>och</token>
<token>stiftare</token>
<token>av</token>
<ne ex="ENAMEX" name="Nobelpriset" subtype="PRZ" type="OBJ">
<token>Nobelpriset</token>
</ne>
```
standard_reference: |-
[Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister Lindén, and Lars Borin. 2014. HFST-SweNER — A New NER
Resource for Swedish. In Proceedings of the Ninth International Conference on Language Resources and Evaluation
(LREC'14), pages 2537-2543, Reykjavik, Iceland. European Language Resources Association
(ELRA).](http://www.lrec-conf.org/proceedings/lrec2014/pdf/391_Paper.pdf)
other_references:
- "[Dimitrios Kokkinakis. 2004. Reducing the effect of name explosion](https://demo.spraakbanken.gu.se/svedk/pbl/kokkinakisBNER.pdf)"
- "Download HFST-SweNER: https://www.kielipankki.fi/download/HFST-SweNER/"
tool: "HFST-SweNER"
model: "Included in the tool"
trained_on: ''
tagset: "[Named entity tags from hfst-SweNER](https://svn.spraakdata.gu.se/sb-arkiv/pub/swener-tags.html)"
evaluation_results: "F-score between 27.48% and 91.33%, depending on the named entity category"
description:
swe: |-
Namnigenkänning är en språkteknologisk teknik som automatiskt känner igen och förser namn och namnliknande uttryck
(s.k. entiteter) i löpande text med fördefinierade etiketter, som t.ex. person eller organisation, men, beroende
på tillämpningsområdet, även numeriska uttryck och tidsuttryck. HFST-SweNER bygger på konvertering, modellering och
anpassning av ett tidigare svenskt NER-system till Helsinki Finite-State Transducer Technology (HFST)-plattformen.
HFST-SweNER är en fullfjädrad implementering med öppen källkod som stöder en mängd olika generiska namngivna
entitetstyper och består av flera lexikala resurslager såsom olika n-gram-baserade namngivna namnlistor (s.k.
gazetteers).
eng: |-
Named entity recognition (NER) recognises textual mentions of named entities that belong to a predefined set of
categories, such as persons, locations and time expressions. HFST-SweNER is based on the conversion, modelling and
adaptation of a Swedish NER system from a hybrid environment to the Helsinki Finite-State Transducer Technology
(HFST) platform. HFST-SweNER is a full-fledged open source implementation that supports a variety of generic named
entity types and consists of multiple, reusable resource layers such as various n-gram-based named entity lists
(gazetteers).
created: 2014-07-04
updated: 2020-05-13
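
As an illustration of how the `<ne>` annotations in the example output above might be consumed, the sketch below collects each entity's `name`, `ex`, `type` and `subtype` attributes. It uses the first part of the example, wrapped in a hypothetical `<text>` element so that it parses as one XML document. The function name `entities` is made up for this example.

```python
import xml.etree.ElementTree as ET

# First part of the example output; the <text> wrapper is an assumption made
# here so that the fragment is a well-formed XML document.
TEXT = """<text>
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
<token>21</token>
<token>oktober</token>
<token>1833</token>
</ne>
</text>"""


def entities(xml_text):
    """Return (name, ex, type, subtype) for every <ne> element, in order."""
    root = ET.fromstring(xml_text)
    return [
        (ne.get("name"), ne.get("ex"), ne.get("type"), ne.get("subtype"))
        for ne in root.iter("ne")
    ]


found = entities(TEXT)
for name, ex, ne_type, subtype in found:
    print(f"{name}: {ex}/{ne_type}/{subtype}")
```

In this example the `name` attribute equals the entity's tokens joined by spaces, so it can be used directly without re-assembling the token text.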
61 changes: 61 additions & 0 deletions sparv/modules/wsd/metadata.yaml
@@ -0,0 +1,61 @@
id: swe-sense-wsd
name:
swe: Betydelsedisambiguering med hjälp av SALDO ID:n
eng: Sense disambiguation of SALDO identifiers
short_description:
swe: Ordbetydelsedisambiguering baserad på annotering i SALDO
eng: Word sense disambiguation based on SALDO annotation
task: sense disambiguation
language_codes:
- swe
keywords:
- sense disambiguation
- saldo
annotations:
- <token>:wsd.sense
example_output: |-
```xml
<token sense="|den..2:-1.000|">Det</token>
<token sense="|finna..1:0.497|finnas..1:0.472|finna..2:0.031|">finns</token>
<token sense="|den..1:-1.000|en..2:-1.000|">en</token>
<token sense="|fil..4:0.661|fil..5:0.194|fil..1:0.104|fil..2:0.040|fil..3:0.001|">fil</token>
<token sense="|i..2:-1.000|">i</token>
<token sense="|katalog..1:-1.000|">katalogen</token>
<token sense="|på..1:-1.000|">på</token>
<token sense="|den..1:-1.000|den..2:-1.000|en..2:-1.000|">den</token>
<token sense="|extern..1:-1.000|">externa</token>
<token sense="|hårddisk..1:-1.000|">hårddisken</token>
<token sense="|">.</token>
<token sense="|man..1:-1.000|">Man</token>
<token sense="|kunna..1:0.666|kunna..4:0.147|kunna..3:0.110|kunna..2:0.077|">kan</token>
<token sense="|använda..1:-1.000|">använda</token>
<token sense="|den..1:-1.000|en..2:-1.000|">en</token>
<token sense="|fil..2:0.573|fil..4:0.213|fil..1:0.130|fil..5:0.084|fil..3:0.001|">fil</token>
<token sense="|för..1:-1.000|för..5:-1.000|för..6:-1.000|för..7:-1.000|för..9:-1.000|">för</token>
<token sense="|att..1:-1.000|">att</token>
<token sense="|slipa..2:0.832|slipa..1:0.168|">slipa</token>
<token sense="|kant..1:-1.000|">kanterna</token>
<token sense="|på..1:-1.000|">på</token>
<token sense="|bräda..1:0.787|bräde..1:0.213|">brädan</token>
<token sense="|">.</token>
```
standard_reference: 'https://aclanthology.org/N15-1164.pdf'
other_references:
- https://github.com/spraakbanken/sparv-wsd/blob/master/README.pdf
- "Sparv wsd: https://github.com/spraakbanken/sparv-wsd"
tool: "Sparv wsd"
model: |-
- [ALL_512_128_w10_A2_140403_ctx1.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/ALL_512_128_w10_A2_140403_ctx1.bin)
- [lem_cbow0_s512_w10_NEW2_ctx.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/lem_cbow0_s512_w10_NEW2_ctx.bin)
trained_on: 'SALDO from May 2014 (SCOUSE model)'
tagset: ''
evaluation_results: |-
Using lemma embeddings:
precision: 0.569 recall: 0.292 f-measure: 0.386
Using sense embeddings:
precision: 0.667 recall: 0.332 f-measure: 0.443
More information: https://aclanthology.org/N15-1164.pdf
created: 2018-05-28
updated: 2022-05-13
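
The `sense` attribute in the example output above packs several `sense_id:score` pairs into one pipe-delimited string. Below is a minimal sketch of how such a value could be split and the highest-scoring sense picked. The function names are hypothetical. The example shows a score of `-1.000` for many tokens; what that value signifies is not stated here, so the sketch keeps it as an ordinary number.

```python
def parse_senses(value):
    """Parse '|fil..4:0.661|fil..5:0.194|' into [(sense_id, score), ...]."""
    pairs = []
    for item in value.strip("|").split("|"):
        if not item:  # an empty value such as "|" carries no senses
            continue
        # Split on the last colon so the sense identifier is kept whole.
        sense_id, _, score = item.rpartition(":")
        pairs.append((sense_id, float(score)))
    return pairs


def best_sense(value):
    """Return the sense with the highest score, or None if there is none.

    On ties (for example several -1.000 scores) the first listed sense wins.
    """
    pairs = parse_senses(value)
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])[0]


fil = "|fil..4:0.661|fil..5:0.194|fil..1:0.104|fil..2:0.040|fil..3:0.001|"
print(best_sense(fil))   # fil..4
print(parse_senses("|"))  # []
```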
