More spanish transforms (FooSoft#1791)
Co-authored-by: Darius Jahandarie <[email protected]>
jamesmaa and djahandarie authored Feb 11, 2025
1 parent 5854dbc commit 3d9756b
Showing 4 changed files with 89 additions and 5 deletions.
12 changes: 8 additions & 4 deletions docs/development/language-features.md
@@ -129,9 +129,9 @@ This kind of text processing is to a degree interdependent with the dictionaries

<img align="right" src="../../img/deinflection-example-simple.png">

-Deinflection is the process of converting a word to its base or dictionary form. For example, "running" would be deinflected to "run". This is useful for finding the word in the dictionary, as well as helping the user understand the grammar (morphology) of the language.
+Deinflection is the process of converting a word to its base or dictionary form. For example, "running" should be deinflected to "run". This is useful for finding the word in the dictionary, as well as helping the user understand the grammar (morphology) of the language.

-These grammatical rules are located in files such as `english-transforms.js`.
+These grammatical rules are located in files such as `english-transforms.js`. We recommend reading through this file as an example.

> Not all the grammatical rules of a language can or need to be implemented in the transforms file. Even a little bit goes a long way, and you can always add more rules later. For every couple rules you add, write some tests in the respective file ([see the writing tests section below](#writing-deinflection-tests)). This will help you verify that your rules are correct, and make sure nothing is accidentally broken along the way.
@@ -156,7 +156,7 @@ export type TransformMapObject<TCondition> = {
```

- `language` is the ISO code of the language
-- `conditions` are an object containing parts of speech and grammatical forms that are used to check which deinflections make sense. They are referenced by the deinflection rules.
+- `conditions` are an object containing parts of speech and grammatical forms that are used to check which deinflections should execute. They are referenced by the deinflection rules.
- `transforms` are the actual deinflection rules
- `TCondition` is an optional generic parameter that can be passed to `LanguageTransformDescriptor`. You can learn more about it at the end of this section.

@@ -188,7 +188,7 @@ For the input string "cats", the following strings will be looked up:

If the dictionary contains an entry for `cat`, it will successfully match the 2nd looked up string, (as shown in the image). Note the 🧩 symbol and the `plural` rule.

-However, this rule will also match the word "reads", and show the verb "read" from the dictionary, marked as being `plural`. This makes no sense, and we can use conditions to prevent it. Let's add a condition and use it in the rule.
+However, this rule will also match the word "reads", and show the verb "read" from the dictionary, marked as being `plural`. This makes no sense (e.g. "I have many reads" is not a sensible sentence), and we can use conditions to prevent it. Let's add a condition and use it in the rule.

```js
conditions: {
@@ -328,6 +328,10 @@ Here, by setting `valid` to `false`, we are telling the test function to fail th

You can also optionally pass a `preprocess` helper function to `testLanguageTransformer`. Refer to the language transforms test files for its specific use case.

#### Testing manually

If you want to test manually, make sure to reload the extension between changes to reflect your code changes. See the [CONTRIBUTING.md](../../CONTRIBUTING.md#loading-an-unpacked-build-into-chromium-browsers) doc for more info.

#### Opting in autocompletion

If you want additional type-checking and autocompletion when writing your deinflection rules, you can add them with just a few extra lines of code. Due to the limitations of TypeScript and JSDoc annotations, we will have to perform some type magic in our transformations file, but you don't need to understand what they mean in detail.
64 changes: 64 additions & 0 deletions ext/js/language/es/spanish-transforms.js
@@ -17,6 +17,9 @@

import {suffixInflection, wholeWordInflection} from '../language-transforms.js';

/** @typedef {keyof typeof conditions} Condition */
const REFLEXIVE_PATTERN = /\b(me|te|se|nos|os)\s+(\w+)(ar|er|ir)\b/g;

const ACCENTS = new Map([
['a', 'á'],
['e', 'é'],
@@ -618,5 +621,66 @@ export const spanishTransforms = {
wholeWordInflection('fueran', 'ir', ['v'], ['v']),
],
},
'participle': {
name: 'participle',
description: 'Participle form of a verb',
rules: [
// -ar verbs
suffixInflection('ado', 'ar', ['adj'], ['v_ar']),
// -er verbs
suffixInflection('ido', 'er', ['adj'], ['v_er']),
// -ir verbs
suffixInflection('ido', 'ir', ['adj'], ['v_ir']),
// irregular verbs
wholeWordInflection('dicho', 'decir', ['adj'], ['v']),
wholeWordInflection('escrito', 'escribir', ['adj'], ['v']),
wholeWordInflection('hecho', 'hacer', ['adj'], ['v']),
wholeWordInflection('muerto', 'morir', ['adj'], ['v']),
wholeWordInflection('puesto', 'poner', ['adj'], ['v']),
wholeWordInflection('roto', 'romper', ['adj'], ['v']),
wholeWordInflection('visto', 'ver', ['adj'], ['v']),
wholeWordInflection('vuelto', 'volver', ['adj'], ['v']),
],
},
'reflexive': {
name: 'reflexive',
description: 'Reflexive form of a verb',
rules: [
suffixInflection('arse', 'ar', ['v_ar'], ['v_ar']),
suffixInflection('erse', 'er', ['v_er'], ['v_er']),
suffixInflection('irse', 'ir', ['v_ir'], ['v_ir']),
],
},
'pronoun substitution': {
name: 'pronoun substitution',
description: 'Substituted pronoun of a reflexive verb',
rules: [
suffixInflection('arme', 'arse', ['v_ar'], ['v_ar']),
suffixInflection('arte', 'arse', ['v_ar'], ['v_ar']),
suffixInflection('arnos', 'arse', ['v_ar'], ['v_ar']),
suffixInflection('erme', 'erse', ['v_er'], ['v_er']),
suffixInflection('erte', 'erse', ['v_er'], ['v_er']),
suffixInflection('ernos', 'erse', ['v_er'], ['v_er']),
suffixInflection('irme', 'irse', ['v_ir'], ['v_ir']),
suffixInflection('irte', 'irse', ['v_ir'], ['v_ir']),
suffixInflection('irnos', 'irse', ['v_ir'], ['v_ir']),
],
},
'pronominal': {
// me despertar -> despertarse
name: 'pronominal',
description: 'Pronominal form of a verb',
rules: [
{
type: 'other',
isInflected: new RegExp(REFLEXIVE_PATTERN),
deinflect: (term) => {
return term.replace(REFLEXIVE_PATTERN, (_match, _pronoun, verb, ending) => `${verb}${ending}se`);
},
conditionsIn: ['v'],
conditionsOut: ['v'],
},
],
},
},
};
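
A minimal sketch of how the `pronominal` rule above behaves on the input `me lavar`. It copies `REFLEXIVE_PATTERN` and the `deinflect` callback from this diff; the names `deinflectPronominal`, `statefulCheck`, and `statelessCheck` are illustrative and not part of the codebase. It also shows a general JavaScript behaviour worth keeping in mind: a `RegExp` carrying the `g` flag remembers `lastIndex` between `test` calls, and `new RegExp(pattern)` preserves that flag. Whether this matters in practice depends on how the transformer invokes `isInflected`, which this diff does not show.

```js
// Same pattern as REFLEXIVE_PATTERN in the diff, including the `g` flag.
const REFLEXIVE_PATTERN = /\b(me|te|se|nos|os)\s+(\w+)(ar|er|ir)\b/g;

// Mirrors the `deinflect` callback of the 'pronominal' rule (illustrative name).
function deinflectPronominal(term) {
    return term.replace(REFLEXIVE_PATTERN, (_match, _pronoun, verb, ending) => `${verb}${ending}se`);
}

const deinflected = deinflectPronominal('me lavar');
console.log(deinflected); // lavarse

// A copy made with `new RegExp(pattern)` keeps the `g` flag, so `test`
// advances `lastIndex` after a match and the next call starts from there.
const statefulCheck = new RegExp(REFLEXIVE_PATTERN);
const firstTest = statefulCheck.test('me lavar');
const secondTest = statefulCheck.test('me lavar');
console.log(firstTest, secondTest); // true false

// Rebuilding from `.source` alone drops the flags, so `test` is stateless.
const statelessCheck = new RegExp(REFLEXIVE_PATTERN.source);
const statelessResults = [statelessCheck.test('me lavar'), statelessCheck.test('me lavar')];
console.log(statelessResults); // [ true, true ]
```

If repeated `test` calls on the same rule turn out to be a problem, building the check from `REFLEXIVE_PATTERN.source` without flags is one possible way to keep it stateless.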
16 changes: 16 additions & 0 deletions test/language/spanish-transforms.test.js
@@ -235,6 +235,22 @@ const tests = [
{term: 'vivir', source: 'vivan', rule: 'v', reasons: ['present subjunctive']},
],
},
{
category: 'participle',
valid: true,
tests: [
{term: 'escuchar', source: 'escuchado', rule: 'v', reasons: ['participle']},
],
},
{
category: 'reflexive',
valid: true,
tests: [
{term: 'lavar', source: 'lavarse', rule: 'v', reasons: ['reflexive']},
{term: 'lavarse', source: 'lavarte', rule: 'v', reasons: ['pronoun substitution']},
{term: 'lavarse', source: 'me lavar', rule: 'v', reasons: ['pronominal']},
],
},

];

2 changes: 1 addition & 1 deletion types/ext/language-transformer.d.ts
@@ -62,7 +62,7 @@ export type DeinflectFunction = (inflectedWord: string) => string;

export type Rule<TCondition = string> = {
type: 'suffix' | 'prefix' | 'wholeWord' | 'other';
-isInflected: RegExp;
+isInflected: RegExp; // If evaluates true, will try to deinflect
deinflect: DeinflectFunction;
conditionsIn: TCondition[];
conditionsOut: TCondition[];
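
A rough sketch of how a single `Rule` could be evaluated, based only on the fields shown in this hunk and the new comment on `isInflected`. The rule object `participleArRule` and the helper `applyRule` are hypothetical and written out by hand here; the real transformer also consults `conditionsIn` and `conditionsOut`, which this sketch ignores.

```js
// Hypothetical rule shaped like the `Rule` type: 'ado' -> 'ar' for participles.
const participleArRule = {
    type: 'suffix',
    isInflected: /ado$/,
    deinflect: (inflectedWord) => inflectedWord.slice(0, -'ado'.length) + 'ar',
    conditionsIn: ['adj'],
    conditionsOut: ['v_ar'],
};

// Hypothetical helper: deinflect only when `isInflected` matches, else return null.
function applyRule(rule, word) {
    if (!rule.isInflected.test(word)) { return null; }
    return rule.deinflect(word);
}

const hit = applyRule(participleArRule, 'escuchado');
const miss = applyRule(participleArRule, 'escuchar');
console.log(hit); // escuchar
console.log(miss); // null
```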