Error occurs when parsing a sentence containing multiple sentences in the quotation in split mode == A #248

Open
tadashikumano opened this issue Jul 26, 2022 · 0 comments

An error occurs when parsing a sentence whose quotation contains multiple sentences while the split mode is A, as in the following example.
No error occurs when the split mode is B or C.

  • Version: 5.1.0
  • Model: both ja_ginza & ja_ginza_electra
% ginza -d -s A
埼玉県の男性は「青森らしい祭りが見られてよかったです。みんな待ちわびていました」と話していました。
Traceback (most recent call last):
  File ".../ginza/bin/ginza", line 8, in <module>
    sys.exit(main_ginza())
  File ".../ginza/lib/python3.9/site-packages/ginza/command_line.py", line 357, in main_ginza
    plac.call(run_ginza)
  File ".../ginza/lib/python3.9/site-packages/plac_core.py", line 436, in call
    cmd, result = parser.consume(arglist)
  File ".../ginza/lib/python3.9/site-packages/plac_core.py", line 287, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File ".../ginza/lib/python3.9/site-packages/ginza/command_line.py", line 341, in run_ginza
    run(
  File ".../ginza/lib/python3.9/site-packages/ginza/command_line.py", line 130, in run
    _analyze_tty(analyzer, output)
  File ".../ginza/lib/python3.9/site-packages/ginza/command_line.py", line 147, in _analyze_tty
    output.write(analyzer.analyze_line(line))
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 129, in analyze_line
    return format_doc(doc, self.output_format, self.use_normalized_form, self.use_orth_if_reading_is_none)
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 136, in format_doc
    return "".join(format_conllu(sent, use_normalized_form, use_orth_if_reading_is_none) for sent in doc.sents)
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 136, in <genexpr>
    return "".join(format_conllu(sent, use_normalized_form, use_orth_if_reading_is_none) for sent in doc.sents)
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 199, in format_conllu
    token_lines = "".join(conllu_token_line(sent, token, np_label, use_bunsetu, use_normalized_form, use_orth_if_reading_is_none) for token, np_label in zip(sent, np_labels))
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 199, in <genexpr>
    token_lines = "".join(conllu_token_line(sent, token, np_label, use_bunsetu, use_normalized_form, use_orth_if_reading_is_none) for token, np_label in zip(sent, np_labels))
  File ".../ginza/lib/python3.9/site-packages/ginza/analyzer.py", line 233, in conllu_token_line
    token.norm_ if use_normalized_form else token.lemma_,
  File "spacy/tokens/token.pyx", line 860, in spacy.tokens.token.Token.lemma_.__get__
  File "spacy/strings.pyx", line 132, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '13686814359803441017'. This usually refers to an issue with the `Vocab` or `StringStore`."
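For anyone who wants to narrow this down outside the command line, the following is a minimal sketch of a possible reproduction through the Python API. It is not taken from the report: `collect_lemmas` and `SENTENCE` are hypothetical names, and it assumes `ginza.set_split_mode` has the same effect as `-s A`. The `-d` flag from the command above is not mirrored here, so this may or may not trigger the same `KeyError`.

```python
# Hypothetical reproduction sketch; names and structure are assumptions,
# not code from the GiNZA project or from this report.
SENTENCE = "埼玉県の男性は「青森らしい祭りが見られてよかったです。みんな待ちわびていました」と話していました。"


def collect_lemmas(text, split_mode="A", model="ja_ginza"):
    """Return (lemmas, error); error is the KeyError if a lemma lookup fails."""
    # Imported lazily so this file can be loaded without GiNZA installed.
    import spacy
    import ginza

    nlp = spacy.load(model)
    # Assumed to be the API counterpart of the `-s` command-line option.
    ginza.set_split_mode(nlp, split_mode)
    doc = nlp(text)

    lemmas = []
    try:
        for sent in doc.sents:
            for token in sent:
                # The traceback above fails inside Token.lemma_.
                lemmas.append(token.lemma_)
    except KeyError as err:
        return lemmas, err
    return lemmas, None
```

Calling `collect_lemmas(SENTENCE, mode)` for each of `"A"`, `"B"`, and `"C"` and comparing the returned error value would show whether the failure is specific to split mode A in this setting as well.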

Regards,
