diff --git a/sparv/modules/conll_export/metadata.yaml b/sparv/modules/conll_export/metadata.yaml new file mode 100644 index 00000000..767c9eaa --- /dev/null +++ b/sparv/modules/conll_export/metadata.yaml @@ -0,0 +1,26 @@ +id: export-conllu +name: + swe: CoNLL-U-export + eng: CoNLL-U export +short_description: + swe: Export av korpusdata i Språkbanken Texts CoNLL-U-format + eng: Export of corpus data in Språkbanken Text's CoNLL-U format +task: export +keywords: + - conll-u + - export +sparv_handler: conll_export:conllu +example_output: |- + ``` + # global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC + # document_name = rävar + # text_date = 2017-01-10 + # text_title = Rödräv + # sent_id = 157 + 1 Rödräven rödräv NN NN.UTR.SIN.DEF.NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 2 ss _ _ + 2 är vara VB VB.PRS.AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _ + 3 ett en DT DT.NEU.SIN.IND Definite=Ind|Gender=Neut|Number=Sing 4 dt _ _ + 4 hunddjur hunddjur NN NN.NEU.SIN.IND.NOM Case=Nom|Definite=Ind|Gender=Neut|Number=Sing 2 sp _ _ + ``` +created: 2020-11-20 +updated: 2022-03-25 diff --git a/sparv/modules/malt/metadata.yaml b/sparv/modules/malt/metadata.yaml new file mode 100644 index 00000000..bf5994d6 --- /dev/null +++ b/sparv/modules/malt/metadata.yaml @@ -0,0 +1,50 @@ +id: swe-dependency-malt-treebank +name: + swe: Dependensparsning med MaltParser + eng: Dependency parsing with MaltParser +short_description: + swe: Svensk dependensparsning tränad på Svensk trädbank baserad på MaltParser + eng: Swedish dependency parsing with MaltParser, trained on the Swedish Treebank +task: dependency parsing +language_codes: + - swe +keywords: + - dependency parsing +annotations: + - :malt.ref + - :malt.dephead_ref + - :malt.deprel +example_output: |- + ```xml + Alfred + Bernhard + Nobel + var + en + svensk + kemist + och + stiftare + av + Nobelpriset + . + ``` +standard_reference: |- + Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. 
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. + In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. + European Language Resources Association (ELRA). +other_references: + - "MaltParser: https://www.maltparser.org/download.html" + - 'https://aclanthology.org/2021.nodalida-main.20/' +tool: "MaltParser" +model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)" +trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)" +tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)" +evaluation_results: Labelled Attachment Score 0.78 (using the TalbankenSBX train-dev-test split) +description: + swe: |- + Denna MaltParser-modell har konfigurerats för svenska och tränats på TalbankenSTB-korpusen. + eng: |- + This MaltParser model, configured for Swedish, has been trained on the TalbankenSTB corpus. +created: 2010-12-15 +updated: 2021-06-01 diff --git a/sparv/modules/phrase_structure/metadata.yaml b/sparv/modules/phrase_structure/metadata.yaml new file mode 100644 index 00000000..2454b420 --- /dev/null +++ b/sparv/modules/phrase_structure/metadata.yaml @@ -0,0 +1,97 @@ +id: swe-phrasestructure-sparv +name: + swe: Svensk frasstrukturparsning + eng: Swedish phrase structure parsing +short_description: + swe: Svensk frasstrukturparser konverterad från Mamba-Dep dependensanalys + eng: Swedish phrase structure parser converted from Mamba-Dep dependency analysis +task: phrase structure parsing +language_codes: + - swe +keywords: + - phrase structure parsing +annotations: + - phrase_structure.phrase + - phrase_structure.phrase:phrase_structure.name + - phrase_structure.phrase:phrase_structure.func +example_output: |- + ```xml + + + Alfred + Bernhard + Nobel + + var + + + en + + svensk + + kemist + + + och + + stiftare + + av + + Nobelpriset + . 
+ + + + + ``` +standard_reference: '' +other_references: [] +tool: '' +model: "Method has no model" +trained_on: "[TalbankenSBX](https://spraakbanken.gu.se/resurser/talbanken)" +tagset: "See description below" +evaluation_results: '' +description: + swe: |- + Konverterar svenska frasstrukturer från Mamba-Dep dependensanalys med hjälp av en regelbaserad heuristik. Nedan är + fulla listan av möjliga frasstrukturer: + + NP: noun phrase + NP-wh: noun phrase with a relativizer, e.g. "whose mother" + PrP: prepositional phrase + PrP-wh: prepositional phrase with a relativizer, e.g. "in which" + SBAR: subordinate clause introduced by a subordinator + S: clause + S-wh: clause introduced by a relativizer + S-imp: clause in the imperative + VP-sup: verb phrase using the supine + VP-att: verb phrase with the infinitive, including the infinitive marker "att" + VP-inf: verb phrase with the infinitive, without an infinitive marker + VP-fin: finite verb phrase + ADJP: adjective phrase + ADVP: adverb phrase + ADVP-wh: adverb phrase with a relativizer + QP: numeral phrase + eng: |- + Derives Swedish phrase structures from Mamba-Dep dependency analysis using a rule-based heuristic. Below is the + complete list of possible phrase structure labels: + + NP: noun phrase + NP-wh: noun phrase with a relativizer, e.g. "whose mother" + PrP: prepositional phrase + PrP-wh: prepositional phrase with a relativizer, e.g. 
"in which" + SBAR: subordinate clause introduced by a subordinator + S: clause + S-wh: clause introduced by a relativizer + S-imp: clause in the imperative + VP-sup: verb phrase using the supine + VP-att: verb phrase with the infinitive, including the infinitive marker "att" + VP-inf: verb phrase with the infinitive, without an infinitive marker + VP-fin: finite verb phrase + ADJP: adjective phrase + ADVP: adverb phrase + ADVP-wh: adverb phrase with a relativizer + QP: numeral phrase +created: 2018-03-28 +updated: 2018-03-28 diff --git a/sparv/modules/swener/metadata.yaml b/sparv/modules/swener/metadata.yaml new file mode 100644 index 00000000..15fde2d9 --- /dev/null +++ b/sparv/modules/swener/metadata.yaml @@ -0,0 +1,83 @@ +id: swe-namedentity-swener +name: + swe: Namnigenkänning med HFST-SweNER + eng: Named entity recognition with HFST-SweNER +short_description: + swe: Namnigenkänning känner igen och förser namn och namnliknande uttryck (s.k. entiteter) i löpande text med fördefinierade etiketter, som organisation, person eller plats. + eng: Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text. +task: named entity recognition +language_codes: + - swe +keywords: + - ner +annotations: + - swener.ne + - swener.ne:swener.name + - swener.ne:swener.ex + - swener.ne:swener.type + - swener.ne:swener.subtype +example_output: |- + ```xml + + Alfred + Bernhard + Nobel + + , + född + + 21 + oktober + 1833 + + i + + Stockholm + + , + + Italien + + , + var + en + svensk + kemist + och + stiftare + av + + Nobelpriset + + ``` +standard_reference: |- + [Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister Lindén, and Lars Borin. 2014. HFST-SweNER — A New NER + Resource for Swedish. In Proceedings of the Ninth International Conference on Language Resources and Evaluation + (LREC'14), pages 2537-2543, Reykjavik, Iceland. 
European Language Resources Association + (ELRA).](http://www.lrec-conf.org/proceedings/lrec2014/pdf/391_Paper.pdf) +other_references: + - "[Dimitrios Kokkinakis. 2004. Reducing the effect of name explosion](https://demo.spraakbanken.gu.se/svedk/pbl/kokkinakisBNER.pdf)" + - "Download HFST-SweNER: https://www.kielipankki.fi/download/HFST-SweNER/" +tool: "HFST-SweNER" +model: "Included in the tool" +trained_on: '' +tagset: "[Named entity tags from HFST-SweNER](https://svn.spraakdata.gu.se/sb-arkiv/pub/swener-tags.html)" +evaluation_results: "F-score between 27.48% and 91.33%, depending on the named entity category" +description: + swe: |- + Namnigenkänning är en språkteknologisk teknik som automatiskt känner igen och förser namn och namnliknande uttryck + (s.k. entiteter) i löpande text med fördefinierade etiketter, som t. ex. person eller organisationer, men, beroende + på tillämpningsområdet, även numeriska uttryck och tidsuttryck. HFST-SweNER bygger på konvertering, modellering och + anpassning av ett tidigare svenskt NER-system till Helsinki Finite-State Transducer Technology (HFST)-plattformen. + HFST-SweNER är en fullfjädrad implementering med öppen källkod som stöder en mängd olika generiska namngivna + entitetstyper och består av flera lexikala resurslager såsom olika n-gram-baserade namngivna namnlistor (s.k. + gazetteers). + eng: |- + Named entity recognition (NER) recognises textual mentions of named entities that belong to a predefined set of + categories, such as locations and time expressions. HFST-SweNER is based on the conversion, modelling and + adaptation of a Swedish NER system from a hybrid environment to the Helsinki Finite-State Transducer Technology + (HFST) platform. HFST-SweNER is a full-fledged open-source implementation that supports a variety of generic named + entity types and consists of multiple, reusable resource layers such as various n-gram-based named entity lists + (gazetteers). 
+created: 2014-07-04 +updated: 2020-05-13 diff --git a/sparv/modules/wsd/metadata.yaml b/sparv/modules/wsd/metadata.yaml new file mode 100644 index 00000000..d866af59 --- /dev/null +++ b/sparv/modules/wsd/metadata.yaml @@ -0,0 +1,61 @@ +id: swe-sense-wsd +name: + swe: Betydelsedisambiguering med hjälp av SALDO ID:n + eng: Sense disambiguation of SALDO identifiers +short_description: + swe: Ordbetydelsedisambiguering baserad på annotering i SALDO + eng: Word sense disambiguation based on SALDO annotation +task: sense disambiguation +language_codes: + - swe +keywords: + - sense disambiguation + - saldo +annotations: + - :wsd.sense +example_output: |- + ```xml + Det + finns + en + fil + i + katalogen + + den + externa + hårddisken + . + Man + kan + använda + en + fil + för + att + slipa + kanterna + + brädan + . + ``` +standard_reference: 'https://aclanthology.org/N15-1164.pdf' +other_references: + - https://github.com/spraakbanken/sparv-wsd/blob/master/README.pdf + - "Sparv wsd: https://github.com/spraakbanken/sparv-wsd" +tool: "Sparv wsd" +model: |- + - [ALL_512_128_w10_A2_140403_ctx1.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/ALL_512_128_w10_A2_140403_ctx1.bin) + - [lem_cbow0_s512_w10_NEW2_ctx.bin](https://github.com/spraakbanken/sparv-wsd/blob/master/models/scouse/lem_cbow0_s512_w10_NEW2_ctx.bin) +trained_on: 'SALDO from May 2014 (SCOUSE model)' +tagset: '' +evaluation_results: |- + Using lemma embeddings: + precision: 0.569 recall: 0.292 f-measure: 0.386 + + Using sense embeddings: + precision: 0.667 recall: 0.332 f-measure: 0.443 + + More information: https://aclanthology.org/N15-1164.pdf +created: 2018-05-28 +updated: 2022-05-13
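Review note on the `evaluation_results` values in `sparv/modules/wsd/metadata.yaml`: the sketch below checks that each reported f-measure is consistent with its precision and recall. It assumes the metadata reports the balanced F-measure (F1, the harmonic mean of precision and recall); the file itself does not state which variant is used, and the helper name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the metadata reports F1; the file does not say so explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f-measure) copied from the metadata file.
reported = {
    "lemma embeddings": (0.569, 0.292, 0.386),
    "sense embeddings": (0.667, 0.332, 0.443),
}

for name, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{name}: computed F = {f:.3f}, reported F = {f_reported:.3f}")
```

Under that assumption, both computed values round to the reported figures, so the three numbers in each row are mutually consistent.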