Add Measurement detection in Swedish (#211)
The language model detects measurements
with units (km, liter, kronor 'crowns', procent 'percent', mg, %, ...),
e.g. "5 procent", "20mg", "218-263 GHz", ...
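
The unit detection described above can be approximated outside the language model with a regular expression. This is a minimal Python sketch, not the model's actual rule machinery: the `UNITS` list and the `find_measurements` helper are hypothetical, and the real unit lexicon is not part of this diff. It accepts a number, an optional range, an optional space and a unit.

```python
import re

# Hypothetical, non-exhaustive unit list; the model's real unit lexicon
# is not shown in this commit.
UNITS = ["km", "liter", "kronor", "procent", "mg", "GHz", "%"]

# Longest units first so a shorter unit never shadows a longer one.
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number (decimal comma or point allowed), an optional range "a-b",
# optional whitespace (so "20mg" also matches), then a unit that is not
# immediately followed by another word character.
MEASUREMENT = re.compile(
    rf"(?P<value>\d+(?:[.,]\d+)?(?:\s*-\s*\d+(?:[.,]\d+)?)?)\s*(?P<unit>{_unit_alt})(?!\w)"
)


def find_measurements(text):
    """Return (value, unit) pairs for number-plus-unit mentions in text."""
    return [(m.group("value"), m.group("unit")) for m in MEASUREMENT.finditer(text)]
```

For example, `find_measurements("Ökning med 5 procent, 20mg dagligen, 218-263 GHz")` yields `("5", "procent")`, `("20", "mg")` and `("218-263", "GHz")`.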

It also detects certain measurements with an
indicator but without a unit, e.g. "BMI 25".
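
Indicator-based detection can be sketched the same way: a known indicator word followed by a bare number. The `INDICATORS` list and `find_indicator_values` are hypothetical; in the diff, such words would presumably carry the `Measindicator` attribute, but the actual word list is not shown.

```python
import re

# Hypothetical indicator words; the real list behind Measindicator
# is not part of this commit.
INDICATORS = ["BMI", "pH"]

_ind_alt = "|".join(re.escape(i) for i in INDICATORS)

# Indicator word, whitespace, then a number with an optional decimal part.
INDICATOR_VALUE = re.compile(
    rf"\b(?P<indicator>{_ind_alt})\s+(?P<value>\d+(?:[.,]\d+)?)\b"
)


def find_indicator_values(text):
    """Return (indicator, value) pairs for unit-less measurements."""
    return [(m.group("indicator"), m.group("value")) for m in INDICATOR_VALUE.finditer(text)]
```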

Basic detection of units combined with a frequency
(time) as one unit, e.g. "mm per år" (mm per year),
"dollar om dagen" (dollars a day), "l per dygn" (liters per 24 hours).
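
Treating a unit plus a frequency phrase as one compound unit can be illustrated with the sketch below. Only the three example phrases come from the commit message; the unit and time-word lists and `find_rate_units` are hypothetical.

```python
import re

# Hypothetical subsets; the model's real unit and time-word lexicons are not shown.
UNITS = ["mm", "l", "liter", "dollar", "kronor"]
TIME_WORDS = ["år", "året", "dygn", "dag", "dagen", "vecka", "veckan"]


def _alt(words):
    # Longest alternatives first so e.g. "dagen" is tried before "dag".
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# Number, unit, then "per" or "om" plus a time word; the whole
# "unit per/om time" phrase is returned as a single compound unit.
RATE = re.compile(
    rf"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>{_alt(UNITS)})\s+"
    rf"(?P<conn>per|om)\s+(?P<time>{_alt(TIME_WORDS)})(?!\w)"
)


def find_rate_units(text):
    """Return (value, compound_unit) pairs such as ("10", "mm per år")."""
    return [
        (m.group("value"), f"{m.group('unit')} {m.group('conn')} {m.group('time')}")
        for m in RATE.finditer(text)
    ]
```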

Only detection of markers, no spans.
ISC-SDE committed Aug 22, 2022
1 parent 061cecf commit f239134
Showing 35 changed files with 45,092 additions and 39,649 deletions.
48 changes: 32 additions & 16 deletions language_models/sv/labels.csv
@@ -14,7 +14,7 @@
;1,15,40,$;SVCapitalMixed;typeAttribute;copy of mandatory label;0;;

/* Default label
-;1,15,25,30,40,45,50,60,65,70,75,76,80,$;SVCon;typeConcept;default label;1;;
+;1,15,25,30,35,40,45,50,60,65,70,75,76,80,$;SVCon;typeConcept;default label;1;;

/* Structural labels
;40,$;-;typeOther;symbol for composite labels;0;;
@@ -43,7 +43,7 @@
;1,15,25,30,40,50,65,70,75,76,$;SVAdvTime;typeBeginEndConcept;adverb of time;0;;
;1,15,30;SVAdvTimeFreq;typeBeginEndConcept;adverb expressing concrete frequency;0;;

-;1,30;SVAgeprep;typeRelation;preposition used before age;0;;
+;1,30,35;SVAgeprep;typeRelation;preposition used before age;0;;
;1,15,40;SVAnnan;typeAmbiguous;annan, annat, helt annat;0;;
;1,60,65,70,75;SVAndOr;typeRelation;eller, och, eller/och, och/eller;0;;
;1,15,30,40,60,65,70,75,76,80,85,$;SVAndOrBut;typeRelation;coordinate conjunction;0;;
@@ -55,12 +55,15 @@
;1,15,30,40,45,65,70,75,76,80,$;SVbCon;typeBeginConcept;first word of a Concept;0;;
;1,65,70,75,76,80,85,$;SVbRel;typeBeginRelation;first word of a Relation;0;;
;1,30;SVClocktime;typeConcept;hh.mm;0;;
-;1,30;SVClocktime2;typeConcept;hh:mm -> can be time or price;0;;
-;30,40,60,65,70,75,76,$;SVComma;typeOther;comma;0;;
+;1,30,35;SVClocktime2;typeConcept;hh:mm -> can be time or price;0;;
+;1,35;SVColonnumber;typeConcept;d(d+):d+, not in clock time range;0;;
+;30,35,40,60,65,70,75,76,$;SVComma;typeOther;comma;0;;
+;1,35;SVCompareprep;typeRelation;preposition used in comparisons;0;;
;1,15,40,75,76,$;SVConIfCap;typeConcept;capitalized noun or name;0;;
;1,15,40,75,76,$;SVConIfAllCap;typeConcept;acronym in upper case;0;;
;1,15,40,45;SVConpart1;typeConcept;first part of a concept;0;;
;1,15,25,40,60,65,70,75,76,80,85,$;SVConj;typeRelation;conjunction;0;;
+;1,35;SVCurrency;typeConcept;currency name;0;;
;1,45,60,65,70;SVCPron;typeConcept;extra label for Concept-pronouns;0;;
;1,15,30,45,50,75,76,$;SVDay;typeConcept;name of day;0;;
;1,30,40,75,76;SVDecinum;typeConcept;extra label for decimal numbers;0;;
@@ -72,7 +75,7 @@
;1,15,40;SVFoerstaas;typeRelation;förstås: modal adverb, inf or verb;0;;
;1,15,30,40,45,65,75,76,$;SVGen;typeConcept;genitive form of noun;0;;
;1,15,40;SVIgnoreCap;typeOther;label to avoid that relation words get concepts when written with a capital;0;;
-;1,15,25,30,40,60,65,70,75,76,$;SVIndefart;typeOther;indefinite article;0;;
+;1,15,25,30,35,40,60,65,70,75,76,$;SVIndefart;typeOther;indefinite article;0;;
;1,15,40;SVImpCon;typeAmbiguous;imperative or noun;0;;

;1,30;SVHour;typeConcept;number that can indicate time;0;;
@@ -94,26 +97,31 @@
;1,70;SVNegadj;typeConcept;extra label for ingen, inga, inget;0;;
;1,10,15,40,60,65,70,75,76,80,85,$;SVNegrel;typeRelation;relation with negation;0;;
;1,69,75,76;SVNonRelevant;typeOther;copy of NonRelevant;0;;
-;1,25,30,40,45,65,70,75,76,80,$;SVNum;typeConcept;number written in digits;0;;
-;1,15,25,30,40,45,65,75,76,$;SVNumber;typeConcept;number;0;;
-;1,30;SVNumpart2;typeConcept;plural numbers like 'miljoner';0;;
+;1,30,35;SV12dNum;typeConcept;1 or 2 digits, can be first part of larger number;0;;
+;1,30;SV3dNum;typeConcept;3 digits, can be part of larger number;0;;
+;1,25,30,35,40,45,65,70,75,76,80,$;SVNum;typeConcept;number written in digits;0;;
+;1,15,25,30,35,40,45,65,75,76,$;SVNumber;typeConcept;number;0;;
+;1,35,$;SVNumberPlusUnit;typeConcept;general label for all numbers plus units without space;0;;Entity(Measurement,Value,Unit)
+;1,30,35;SVNumpart2;typeConcept;plural numbers like 'miljoner';0;;
;1,30;SVNumX;typeConcept;digit with x in one, e.g. 2x;0;;
;1,15,40,45,60,65,70,75,76,80,$;SVObjpron;typePathRelevant;object form of personal pronoun;0;;
;1,15,30,40,45,75,76,$;SVOrdnumber;typeConcept;ordinal number;0;;
+;1,30,35,40;SVQuantity;typeConcept;några, flera,...;0;;

;1,15,40;SVPart;typeAmbiguous;past participle;0;;
;1,15,40;SVPartCon;typeAmbiguous;past participle or noun;0;;
;1,15,40;SVPartSup;typeAmbiguous;past participle or supinum;0;;
;1,15,40;SVPartSupCon;typeAmbiguous;past participle, supinum or noun;0;;

-;1,15,25,30,40,45,65,80;SVpAux;typeRelation;passive auxiliary;0;;
+;1,15,25,30,35,40,45,65,80;SVpAux;typeRelation;passive auxiliary;0;;
;1,15,40,45;SVPluralnoun;typeConcept;plural noun;0;;
;1,15,40,75,76,$;SVPlussign;typeConcept;plus sign (cannot be used as literal);0;;
;1,15,25,40,60,65,70,75,76,$;SVPosspron;typePathRelevant;possessive pronoun;0;;
;1,15,40;SVPossCon;typeAmbiguous;possessive pronoun or noun;0;;
;1,15,40;SVPossibleGennoun;typeConcept;unknown word ending in -s;0;;
;1,15,40;SVPreferRelation;typeAmbiguous;extra label for ambiguous words that are more often relations than concepts;0;;

-;1,15,25,30,40,45,60,65,70,75,76,80,85,$;SVPrep;typeRelation;preposition;0;;
+;1,15,25,30,35,40,45,60,65,70,75,76,80,85,$;SVPrep;typeRelation;preposition;0;;
;1,15,40;SVPrepAdv;typeAmbiguous;preposition or adverb;0;;
;1,15,40;SVPrepCon;typeAmbiguous;preposition or noun;0;;
;1,30,75,76;SVTimeprep;typeRelation;preposition that occurs in time expressions, extra label next to SVPrep;0;;
@@ -137,9 +145,9 @@
;1,15,40;SVSupCon;typeAmbiguous;supinum or noun;0;;

;1,30;SVTimeadj;typeConcept;potential part of time indication, next to other label;0;;
-;1,15,25,30,45,75,76;SVTimeconcept;typeBeginEndConcept;time indication;0;;
+;1,15,25,30,35,45,75,76;SVTimeconcept;typeBeginEndConcept;time indication;0;;
;1,30;SVTimespan;typeConcept;words indicating a part: halvan, resten,...;0;;
-;1,30;SVUnit;typeConcept;unit for measurements;0;;
+;1,15,30,35;SVUnit;typeConcept;unit for measurements;0;;
;1,65,70,75,76,80,85,$;SVUtan;typeRelation;utan;0;;
;1,15,40;SVVara;typeAmbiguous;vara: passive aux. or main verb as verb or infinitive, noun;0;;

@@ -168,7 +176,7 @@
;40,70,80;SVDummy;typeAttribute;meaningless attribute, usable in rules when ^LabelA in the left part has to remain untouched;0;;
;40,$;SVEttform;typeAttribute;to mark pronouns that can be used attributively with an AdvAdj;0;;
;40,$;SVSForm;typeAttribute;to mark s-forms;0;;
-;15,40,$;SVRegex;typeAttribute;to mark lexreps that are labeled via regular expressions;0;;
+;15,35,40,$;SVRegex;typeAttribute;to mark lexreps that are labeled via regular expressions;0;;
;60,65,70,80;SVList;typeAttribute;mark enumerations;0;;
;60;SVListEnd;typeAttribute;och så vidare, etc.;0;;

@@ -189,14 +197,22 @@
;30,45,$;SVDate;typeAttribute;used within the rules file to mark detected dates;0;;
;30,75;SVTimemodifier;typeAttribute;ungefär, exakt, etc.;0;;
;30;SVPosttime;typeAttribute;senare, e.Kr., postoperativt etc.;0;
-;30,50,75,76,$;SVTime;typeAttribute;mark time indications;0;;Entity(DateTime)
+;30,35,50,75,76,$;SVTime;typeAttribute;mark time indications;0;;Entity(DateTime)
;30,75,76,$;SVTimeBegin;typeAttribute;begin of Time expansion;0;;Path(Begin,DateTime)
;30,75,76,$;SVTimeStop;typeAttribute;marker to end Time expansion;0;;Path(End,DateTime)

-;30;SVNummodifier;typeAttribute;ungefär, drygt, exakt etc.;0;;
+;30,35;SVNummodifier;typeAttribute;ungefär, drygt, exakt etc.;0;;
;30;SVModBeforeNum;typeAttribute;assigned by rule to Nummodifier before number;0;;

-;1,30,40;SVAge;typeAttribute;words that indicate the presence of an age mention;0;;
+;1,30,35,40;SVAge;typeAttribute;words that indicate the presence of an age mention;0;;

+;1,35,$;Measurement;typeAttribute;general attribute for all measurements;0;;Entity(Measurement)
+;1,35,$;ValueProperty;typeAttribute;attribute for the Value property;0;;Entity(Measurement,Value)
+;1,35,$;UnitProperty;typeAttribute;attribute for the Unit property;0;;Entity(Measurement,Unit)
+;1,$;MeasurementBegin;typeAttribute;added for enabling path expansion;0;;Path(Begin,Measurement)
+;1,$;MeasurementStop;typeAttribute;added for enabling path expansion;0;;Path(End,Measurement)
+;1,35;SVPostmeas;typeAttribute;specifies a unit, e.g. högre, färre;0;
+;1,35;Measindicator;typeAttribute;indicates that the following lexrep is probably a measurement;0;

;40,65,70,75,80;SVPBegin;typeAttribute;start of clause;0;;
;40,65,70;SVPVerb;typeAttribute;verb;0;;
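
Rows in `labels.csv` are semicolon-separated. The hypothetical parser below assumes the field layout visible in this diff: an empty leading field, a comma-separated list of numbers (plus `$`), the label name, the label type, a description, a numeric flag, an empty field and an optional `Entity(...)`/`Path(...)` mapping. The meaning of the numbers is not documented here; reading them as phases, with 35 as the one this commit adds for measurements, is an inference from the diff. `Label`, `parse_label_row` and `labels_in_phase` are illustrative names, not part of the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    phases: List[str]  # first field, e.g. ["1", "35", "$"] (meaning assumed)
    name: str          # e.g. "SVNumberPlusUnit"
    kind: str          # e.g. "typeConcept"
    description: str
    flag: str          # "0" or "1" in this diff
    mapping: str       # e.g. "Entity(Measurement,Value,Unit)", or "" if absent


def parse_label_row(row: str) -> Optional[Label]:
    """Parse one labels.csv row; return None for blank lines and /* comments."""
    row = row.strip()
    if not row or row.startswith("/*"):
        return None
    fields = row.split(";")
    # Pad short rows such as ";30;SVPosttime;...;0;" to eight fields.
    fields += [""] * (8 - len(fields))
    return Label(
        phases=fields[1].split(","),
        name=fields[2],
        kind=fields[3],
        description=fields[4],
        flag=fields[5],
        mapping=fields[7],
    )


def labels_in_phase(rows: List[str], phase: str) -> List[str]:
    """Names of labels whose first field lists the given number."""
    return [
        label.name
        for row in rows
        if (label := parse_label_row(row)) is not None and phase in label.phases
    ]
```

Applied to the rows added in this commit, `labels_in_phase(rows, "35")` would list labels such as `SVNumberPlusUnit` and `Measindicator`, while rows like `SV3dNum` (`1,30`) are left out.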