Commit
Showing 5 changed files with 317 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
id: export-conllu | ||
name: | ||
swe: CoNLL-U-export | ||
eng: CoNLL-U export | ||
short_description: | ||
swe: Export av korpusdata i Språkbanken Texts CoNLL-U-format | ||
eng: Export of corpus data in Språkbanken Text's CoNLL-U format | ||
task: export | ||
keywords: | ||
- conll-u | ||
- export | ||
sparv_handler: conll_export:conllu | ||
example_output: |- | ||
``` | ||
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC | ||
# document_name = rävar | ||
# text_date = 2017-01-10 | ||
# text_title = Rödräv | ||
# sent_id = 157 | ||
1 Rödräven rödräv NN NN.UTR.SIN.DEF.NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 2 ss _ _ | ||
2 är vara VB VB.PRS.AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _ | ||
3 ett en DT DT.NEU.SIN.IND Definite=Ind|Gender=Neut|Number=Sing 4 dt _ _ | ||
4 hunddjur hunddjur NN NN.NEU.SIN.IND.NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 2 sp _ _ | ||
``` | ||
created: 2020-11-20 | ||
updated: 2022-03-25 |
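The `example_output` above can be read back with a few lines of Python. The following is a minimal sketch, not part of the commit: it assumes the ten columns listed in the `# global.columns` comment, and the names `parse_conllu` and `parse_feats` are hypothetical helpers introduced here for illustration. Real CoNLL-U files are tab-separated; the sketch falls back to whitespace splitting so it also works on the example as rendered here.

```python
# Column order taken from the "# global.columns" comment in the example output.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

SAMPLE = """\
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# document_name = rävar
# text_date = 2017-01-10
# text_title = Rödräv
# sent_id = 157
1 Rödräven rödräv NN NN.UTR.SIN.DEF.NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 2 ss _ _
2 är vara VB VB.PRS.AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
3 ett en DT DT.NEU.SIN.IND Definite=Ind|Gender=Neut|Number=Sing 4 dt _ _
4 hunddjur hunddjur NN NN.NEU.SIN.IND.NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 2 sp _ _
"""


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu(text):
    """Return (metadata, tokens) for a single sentence block."""
    metadata, tokens = {}, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines carry 'key = value' metadata.
            key, _, value = line[1:].partition("=")
            metadata[key.strip()] = value.strip()
        else:
            # Prefer tabs (the CoNLL-U separator); fall back to whitespace.
            fields = line.split("\t") if "\t" in line else line.split()
            token = dict(zip(COLUMNS, fields))
            token["FEATS"] = parse_feats(token["FEATS"])
            tokens.append(token)
    return metadata, tokens


metadata, tokens = parse_conllu(SAMPLE)
root_forms = [t["FORM"] for t in tokens if t["HEAD"] == "0"]
print(metadata["text_title"], root_forms)
```

The whitespace fallback would break on word forms that contain spaces, so tab-separated input is the safer choice when it is available.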
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
id: swe-dependency-malt-treebank | ||
name: | ||
swe: Dependensparsning med MaltParser | ||
eng: Dependency parsing with MaltParser | ||
short_description: | ||
swe: Svensk dependensparsning tränad på Svensk trädbank baserad på MaltParser | ||
eng: Swedish dependency parsing with MaltParser, trained on the Swedish treebank | ||
task: dependency parsing | ||
language_codes: | ||
- swe | ||
keywords: | ||
- dependency parsing | ||
annotations: | ||
- <token>:malt.ref | ||
- <token>:malt.dephead_ref | ||
- <token>:malt.deprel | ||
example_output: |- | ||
```xml | ||
<token dephead_ref="4" deprel="SS" ref="1">Alfred</token> | ||
<token dephead_ref="1" deprel="HD" ref="2">Bernhard</token> | ||
<token dephead_ref="1" deprel="HD" ref="3">Nobel</token> | ||
<token deprel="ROOT" ref="4">var</token> | ||
<token dephead_ref="8" deprel="DT" ref="5">en</token> | ||
<token dephead_ref="8" deprel="AT" ref="6">svensk</token> | ||
<token dephead_ref="8" deprel="CJ" ref="7">kemist</token> | ||
<token dephead_ref="9" deprel="DT" ref="8">och</token> | ||
<token dephead_ref="4" deprel="SP" ref="9">stiftare</token> | ||
<token dephead_ref="9" deprel="ET" ref="10">av</token> | ||
<token dephead_ref="10" deprel="PA" ref="11">Nobelpriset</token> | ||
<token dephead_ref="4" deprel="IP" ref="12">.</token> | ||
``` | ||
standard_reference: |- | ||
Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. | ||
In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. | ||
European Language Resources Association (ELRA). | ||
other_references: | ||
- "MaltParser: https://www.maltparser.org/download.html" | ||
- 'https://aclanthology.org/2021.nodalida-main.20/' | ||
tool: "MaltParser" | ||
model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)" | ||
trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)" | ||
tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)" | ||
evaluation_results: Labelled Attachment Score 0.78 (using the TalbankenSBX train-dev-test split) | ||
description: | ||
swe: |- | ||
Denna MaltParser-modell har konfigurerats för svenska och tränats på TalbankenSTB-korpusen. | ||
eng: |- | ||
This MaltParser model has been configured for Swedish and trained on the TalbankenSTB corpus. | ||
created: 2010-12-15 | ||
updated: 2021-06-01 |
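The `dephead_ref`, `deprel` and `ref` attributes in the example output encode a dependency tree: each token points at the `ref` of its head, and the root token has no `dephead_ref`. A minimal sketch of reading that structure with the Python standard library follows. The wrapping `<sentence>` element and the function name `read_dependencies` are assumptions added here so that the fragment parses as one XML document; they are not taken from the commit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tokens copied from the example output; the <sentence> wrapper is an
# assumption added so the fragment has a single root element.
SAMPLE = """<sentence>
<token dephead_ref="4" deprel="SS" ref="1">Alfred</token>
<token dephead_ref="1" deprel="HD" ref="2">Bernhard</token>
<token dephead_ref="1" deprel="HD" ref="3">Nobel</token>
<token deprel="ROOT" ref="4">var</token>
<token dephead_ref="8" deprel="DT" ref="5">en</token>
<token dephead_ref="8" deprel="AT" ref="6">svensk</token>
<token dephead_ref="8" deprel="CJ" ref="7">kemist</token>
<token dephead_ref="9" deprel="DT" ref="8">och</token>
<token dephead_ref="4" deprel="SP" ref="9">stiftare</token>
<token dephead_ref="9" deprel="ET" ref="10">av</token>
<token dephead_ref="10" deprel="PA" ref="11">Nobelpriset</token>
<token dephead_ref="4" deprel="IP" ref="12">.</token>
</sentence>"""


def read_dependencies(xml_text):
    """Return (root word, {head ref: [dependent words]})."""
    sentence = ET.fromstring(xml_text)
    root_word = None
    dependents = defaultdict(list)
    for tok in sentence.iter("token"):
        head = tok.get("dephead_ref")
        if head is None:
            # The root token carries no dephead_ref attribute.
            root_word = tok.text
        else:
            dependents[head].append(tok.text)
    return root_word, dict(dependents)


root_word, dependents = read_dependencies(SAMPLE)
print(root_word, dependents["4"])
```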
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
id: swe-phrasestructure-sparv | ||
name: | ||
swe: Svensk frasstrukturparsning | ||
eng: Swedish phrase structure parsing | ||
short_description: | ||
swe: Svensk frasstrukturparser konverterad från Mamba-Dep dependensanalys | ||
eng: Swedish phrase structure parser converted from Mamba-Dep dependency analysis | ||
task: phrase structure parsing | ||
language_codes: | ||
- swe | ||
keywords: | ||
- phrase structure parsing | ||
annotations: | ||
- phrase_structure.phrase | ||
- phrase_structure.phrase:phrase_structure.name | ||
- phrase_structure.phrase:phrase_structure.func | ||
example_output: |- | ||
```xml | ||
<phrase func="ROOT" name="S"> | ||
<phrase func="SS" name="NP"> | ||
<token>Alfred</token> | ||
<token>Bernhard</token> | ||
<token>Nobel</token> | ||
</phrase> | ||
<token>var</token> | ||
<phrase func="SP" name="NP"> | ||
<phrase func="DT" name="NP"> | ||
<token>en</token> | ||
<phrase func="AT" name="ADJP"> | ||
<token>svensk</token> | ||
<phrase func="CJ" name="NP"> | ||
<token>kemist</token> | ||
</phrase> | ||
</phrase> | ||
<token>och</token> | ||
</phrase> | ||
<token>stiftare</token> | ||
<phrase func="ET" name="PrP"> | ||
<token>av</token> | ||
<phrase func="PA" name="NP"> | ||
<token>Nobelpriset</token> | ||
<token>.</token> | ||
</phrase> | ||
</phrase> | ||
</phrase> | ||
</phrase> | ||
``` | ||
standard_reference: '' | ||
other_references: [] | ||
tool: '' | ||
model: "Method has no model" | ||
trained_on: "[TalbankenSBX](https://spraakbanken.gu.se/resurser/talbanken)" | ||
tagset: "See description below" | ||
evaluation_results: '' | ||
description: | ||
swe: |- | ||
Konverterar svenska frasstrukturer från Mamba-Dep dependensanalys med hjälp av en regelbaserad heuristik. Nedan följer | ||
den fullständiga listan över möjliga frasstrukturetiketter: | ||
NP: noun phrase | ||
NP-wh: noun phrase with a relativizer, e.g. "whose mother" | ||
PrP: prepositional phrase | ||
PrP-wh: prepositional phrase with a relativizer, e.g. "in which" | ||
SBAR: subordinate clause introduced by a subordinator | ||
S: clause | ||
S-wh: clause introduced by a relativizer | ||
S-imp: clause in the imperative | ||
VP-sup: verb phrase using the supine | ||
VP-att: verb phrase with the infinitive, including the infinitive marker "att" | ||
VP-inf: verb phrase with the infinitive, without an infinitive marker | ||
VP-fin: finite verb phrase | ||
ADJP: adjective phrase | ||
ADVP: adverb phrase | ||
ADVP-wh: adverb phrase with a relativizer | ||
QP: numeral phrase | ||
eng: |- | ||
Derives Swedish phrase structures from the Mamba-Dep dependency analysis using a rule-based heuristic. Below is the | ||
complete list of possible phrase structure labels: | ||
NP: noun phrase | ||
NP-wh: noun phrase with a relativizer, e.g. "whose mother" | ||
PrP: prepositional phrase | ||
PrP-wh: prepositional phrase with a relativizer, e.g. "in which" | ||
SBAR: subordinate clause introduced by a subordinator | ||
S: clause | ||
S-wh: clause introduced by a relativizer | ||
S-imp: clause in the imperative | ||
VP-sup: verb phrase using the supine | ||
VP-att: verb phrase with the infinitive, including the infinitive marker "att" | ||
VP-inf: verb phrase with the infinitive, without an infinitive marker | ||
VP-fin: finite verb phrase | ||
ADJP: adjective phrase | ||
ADVP: adverb phrase | ||
ADVP-wh: adverb phrase with a relativizer | ||
QP: numeral phrase | ||
created: 2018-03-28 | ||
updated: 2018-03-28 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
id: swe-namedentity-swener | ||
name: | ||
swe: Namnigenkänning med HFST-SweNER | ||
eng: Named entity recognition with HFST-SweNER | ||
short_description: | ||
swe: Namnigenkänning känner igen och förser namn och namnliknande uttryck (s.k. entiteter) i löpande text med fördefinierade etiketter, som organisation, person eller plats. | ||
eng: Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text. | ||
task: named entity recognition | ||
language_codes: | ||
- swe | ||
keywords: | ||
- ner | ||
annotations: | ||
- swener.ne | ||
- swener.ne:swener.name | ||
- swener.ne:swener.ex | ||
- swener.ne:swener.type | ||
- swener.ne:swener.subtype | ||
example_output: |- | ||
```xml | ||
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS"> | ||
<token>Alfred</token> | ||
<token>Bernhard</token> | ||
<token>Nobel</token> | ||
</ne> | ||
<token>,</token> | ||
<token>född</token> | ||
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME"> | ||
<token>21</token> | ||
<token>oktober</token> | ||
<token>1833</token> | ||
</ne> | ||
<token>i</token> | ||
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC"> | ||
<token>Stockholm</token> | ||
</ne> | ||
<token>,</token> | ||
<ne ex="ENAMEX" name="Italien" subtype="PPL" type="LOC"> | ||
<token>Italien</token> | ||
</ne> | ||
<token>,</token> | ||
<token>var</token> | ||
<token>en</token> | ||
<token>svensk</token> | ||
<token>kemist</token> | ||
<token>och</token> | ||
<token>stiftare</token> | ||
<token>av</token> | ||
<ne ex="ENAMEX" name="Nobelpriset" subtype="PRZ" type="OBJ"> | ||
<token>Nobelpriset</token> | ||
</ne> | ||
``` | ||
standard_reference: |- | ||
[Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister Lindén, and Lars Borin. 2014. HFST-SweNER — A New NER | ||
Resource for Swedish. In Proceedings of the Ninth International Conference on Language Resources and Evaluation | ||
(LREC'14), pages 2537-2543, Reykjavik, Iceland. European Language Resources Association | ||
(ELRA).](http://www.lrec-conf.org/proceedings/lrec2014/pdf/391_Paper.pdf) | ||
other_references: | ||
- "[Dimitrios Kokkinakis. 2004. Reducing the effect of name explosion](https://demo.spraakbanken.gu.se/svedk/pbl/kokkinakisBNER.pdf)" | ||
- "Download HFST-SweNER: https://www.kielipankki.fi/download/HFST-SweNER/" | ||
tool: "HFST-SweNER" | ||
model: "Included in the tool" | ||
trained_on: '' | ||
tagset: "[Named entity tags from hfst-SweNER](https://svn.spraakdata.gu.se/sb-arkiv/pub/swener-tags.html)" | ||
evaluation_results: "F-score between 27.48% and 91.33%, depending on the named entity category" | ||
description: | ||
swe: |- | ||
Namnigenkänning är en språkteknologisk teknik som automatiskt känner igen och förser namn och namnliknande uttryck | ||
(s.k. entiteter) i löpande text med fördefinierade etiketter, som t.ex. personer eller organisationer, men, beroende | ||
på tillämpningsområdet, även numeriska uttryck och tidsuttryck. HFST-SweNER bygger på konvertering, modellering och | ||
anpassning av ett tidigare svenskt NER-system till Helsinki Finite-State Transducer Technology (HFST)-plattformen. | ||
HFST-SweNER är en fullfjädrad implementering med öppen källkod som stöder en mängd olika generiska namngivna | ||
entitetstyper och består av flera lexikala resurslager såsom olika n-gram-baserade namngivna namnlistor (s.k. | ||
gazetteers). | ||
eng: |- | ||
Named entity recognition (NER) recognises textual mentions of named entities that belong to a predefined set of | ||
categories, such as persons, locations and time expressions. HFST-SweNER is based on the conversion, modelling and | ||
adaptation of a Swedish NER system from a hybrid environment to the Helsinki Finite-State Transducer Technology | ||
(HFST) platform. HFST-SweNER is a full-fledged open source implementation that supports a variety of generic named | ||
entity types and consists of multiple, reusable resource layers such as various n-gram-based named entity lists | ||
(gazetteers). | ||
created: 2014-07-04 | ||
updated: 2020-05-13 |
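Each `<ne>` element in the example output wraps the tokens of one entity and carries its `ex`, `type`, `subtype` and `name` attributes. The sketch below collects them into tuples. It uses the first part of the example output only; the wrapping `<sentence>` element and the function name `extract_entities` are assumptions made for illustration, not part of the commit.

```python
import xml.etree.ElementTree as ET

# First part of the example output; the <sentence> wrapper is an assumption
# added so the fragment parses as a single XML document.
SAMPLE = """<sentence>
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
  <token>Alfred</token>
  <token>Bernhard</token>
  <token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
  <token>21</token>
  <token>oktober</token>
  <token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
  <token>Stockholm</token>
</ne>
</sentence>"""


def extract_entities(xml_text):
    """Return a list of (name, ex, type, subtype) tuples, in document order."""
    sentence = ET.fromstring(xml_text)
    entities = []
    for ne in sentence.iter("ne"):
        entities.append(
            (ne.get("name"), ne.get("ex"), ne.get("type"), ne.get("subtype"))
        )
    return entities


def entity_text(ne):
    """Rebuild the surface string of an entity from its tokens."""
    return " ".join(tok.text for tok in ne.iter("token"))


entities = extract_entities(SAMPLE)
for name, ex, ne_type, subtype in entities:
    print(f"{name}: {ex}/{ne_type}/{subtype}")
```

In this sample the `name` attribute matches the space-joined token text, which `entity_text` can be used to confirm.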
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
id: swe-sense-wsd | ||
name: | ||
swe: Betydelsedisambiguering med hjälp av SALDO ID:n | ||
eng: Sense disambiguation of SALDO identifiers | ||
short_description: | ||
swe: Ordbetydelsedisambiguering baserad på annotering i SALDO | ||
eng: Word sense disambiguation based on SALDO annotation | ||
task: sense disambiguation | ||
language_codes: | ||
- swe | ||
keywords: | ||
- sense disambiguation | ||
- saldo | ||
annotations: | ||
- <token>:wsd.sense | ||
example_output: |- | ||
```xml | ||
<token sense="|den..2:-1.000|">Det</token> | ||
<token sense="|finna..1:0.497|finnas..1:0.472|finna..2:0.031|">finns</token> | ||
<token sense="|den..1:-1.000|en..2:-1.000|">en</token> | ||
<token sense="|fil..4:0.661|fil..5:0.194|fil..1:0.104|fil..2:0.040|fil..3:0.001|">fil</token> | ||
<token sense="|i..2:-1.000|">i</token> | ||
<token sense="|katalog..1:-1.000|">katalogen</token> | ||
<token sense="|på..1:-1.000|">på</token> | ||
<token sense="|den..1:-1.000|den..2:-1.000|en..2:-1.000|">den</token> | ||
<token sense="|extern..1:-1.000|">externa</token> | ||
<token sense="|hårddisk..1:-1.000|">hårddisken</token> | ||
<token sense="|">.</token> | ||
<token sense="|man..1:-1.000|">Man</token> | ||
<token sense="|kunna..1:0.666|kunna..4:0.147|kunna..3:0.110|kunna..2:0.077|">kan</token> | ||
<token sense="|använda..1:-1.000|">använda</token> | ||
<token sense="|den..1:-1.000|en..2:-1.000|">en</token> | ||
<token sense="|fil..2:0.573|fil..4:0.213|fil..1:0.130|fil..5:0.084|fil..3:0.001|">fil</token> | ||
<token sense="|för..1:-1.000|för..5:-1.000|för..6:-1.000|för..7:-1.000|för..9:-1.000|">för</token> | ||
<token sense="|att..1:-1.000|">att</token> | ||
<token sense="|slipa..2:0.832|slipa..1:0.168|">slipa</token> | ||
<token sense="|kant..1:-1.000|">kanterna</token> | ||
<token sense="|på..1:-1.000|">på</token> | ||
<token sense="|bräda..1:0.787|bräde..1:0.213|">brädan</token> | ||
<token sense="|">.</token> | ||
``` | ||
standard_reference: 'https://aclanthology.org/N15-1164.pdf' | ||
other_references: | ||
- https://github.com/spraakbanken/sparv-wsd/blob/master/README.pdf | ||
- "Sparv wsd: https://github.com/spraakbanken/sparv-wsd" | ||
tool: "Sparv wsd" | ||
model: |- | ||
- [ALL_512_128_w10_A2_140403_ctx1.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/ALL_512_128_w10_A2_140403_ctx1.bin) | ||
- [lem_cbow0_s512_w10_NEW2_ctx.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/lem_cbow0_s512_w10_NEW2_ctx.bin) | ||
trained_on: 'SALDO from May 2014 (SCOUSE model)' | ||
tagset: '' | ||
evaluation_results: |- | ||
Using lemma embeddings: | ||
precision: 0.569 recall: 0.292 f-measure: 0.386 | ||
Using sense embeddings: | ||
precision: 0.667 recall: 0.332 f-measure: 0.443 | ||
More information: https://aclanthology.org/N15-1164.pdf | ||
created: 2018-05-28 | ||
updated: 2022-05-13 |
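The `sense` attribute in the example output packs several `identifier:score` pairs between `|` separators, and the `evaluation_results` field reports precision, recall and F-measure. The sketch below unpacks the attribute and recomputes the F-measure with the standard harmonic-mean formula. The helper names `parse_senses` and `f_measure` are hypothetical. The score `-1.000` appears in the example on tokens whose senses carry no ranking; the source does not state its meaning, so the sketch keeps it as a plain number.

```python
def parse_senses(value):
    """Split '|fil..4:0.661|fil..5:0.194|' into [(sense id, score), ...]."""
    senses = []
    for item in value.strip("|").split("|"):
        if not item:
            continue  # a bare '|' means the token has no senses
        sense_id, _, score = item.rpartition(":")
        senses.append((sense_id, float(score)))
    return senses


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Attribute values copied from the example output.
fil_first = parse_senses(
    "|fil..4:0.661|fil..5:0.194|fil..1:0.104|fil..2:0.040|fil..3:0.001|"
)
fil_second = parse_senses(
    "|fil..2:0.573|fil..4:0.213|fil..1:0.130|fil..5:0.084|fil..3:0.001|"
)
punctuation = parse_senses("|")

# The two occurrences of "fil" get different top-ranked senses.
print(max(fil_first, key=lambda s: s[1]))
print(max(fil_second, key=lambda s: s[1]))

# Recompute the reported F-measures from the reported precision and recall.
f_lemma = f_measure(0.569, 0.292)
f_sense = f_measure(0.667, 0.332)
print(round(f_lemma, 3), round(f_sense, 3))
```

The recomputed values round to 0.386 and 0.443, which agrees with the figures in `evaluation_results`.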