Locate pattern misses some matches with the automaton-intersection search algorithm #46

Open
eric-laporte opened this issue Apr 9, 2018 · 0 comments

eric-laporte commented Apr 9, 2018

What steps will reproduce the problem?

  1. Launch Unitex in French and preprocess the 80jours corpus with the dela-fr-public dictionary
  2. Construct the FST-Text
  3. Launch Locate pattern on the V_31H_DA_0009 graph with the default search algorithm (Paumier 2003)
  4. Launch Locate pattern on the same graph with the automaton-intersection search algorithm

What is the expected output?

Both searches should find the same matches.

What do you see instead?

Step 3 finds 40 matches, whereas step 4 (automaton-intersection algorithm) finds only 3. The automaton-intersection algorithm finds only the matches whose path contains no lexical mask with several inflectional constraints. An example of such a mask is <V:P:C:I:J:F>, which is equivalent to <V:P>+<V:C>+<V:I>+<V:J>+<V:F> (manual, Section 4.3.4).
This symptom reminds me of issue #44.
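
To make the equivalence quoted above concrete, here is a minimal Python sketch that rewrites a lexical mask with several inflectional constraints as the disjunction of single-constraint masks. It only illustrates the notation from the manual and is not Unitex code: the function names `expand_lexical_mask` and `as_disjunction` are hypothetical, and the sketch assumes that everything before the first colon is the non-inflectional part of the mask.

```python
from typing import List


def expand_lexical_mask(mask: str) -> List[str]:
    """Split a mask such as <V:P:C> into one mask per inflectional constraint.

    Hypothetical helper for illustration only; it is not part of Unitex.
    Assumption: the text before the first ':' is the non-inflectional part
    (grammatical code, optional lemma), and each following ':'-separated
    item is one alternative inflectional constraint.
    """
    if not (mask.startswith("<") and mask.endswith(">")):
        raise ValueError(f"not a lexical mask: {mask!r}")
    head, *constraints = mask[1:-1].split(":")
    if not constraints:
        # No inflectional constraint at all: nothing to expand.
        return [mask]
    return [f"<{head}:{c}>" for c in constraints]


def as_disjunction(mask: str) -> str:
    """Join the expanded masks with '+', as in the equivalence quoted above."""
    return "+".join(expand_lexical_mask(mask))


if __name__ == "__main__":
    print(as_disjunction("<V:P:C:I:J:F>"))
```

If the two search algorithms were consistent, running Locate pattern on a graph that uses the original mask and on a graph that uses the expanded disjunction should return the same matches with either algorithm.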

More info

  • Unitex/GramLab IDE version: 3.2.59 alpha
  • UnitexToolLogger version: 3.2.59 alpha
  • Did this work before?: same symptom on version 3.1
    bug-aut-int.zip
@martinec martinec added this to the v3.2-beta milestone Apr 9, 2018
@martinec martinec removed this from the v3.2-beta milestone Jul 29, 2020