[Bug] Invalid cz code when calling num2words #236

Closed
SkaceKamen opened this issue Dec 27, 2024 · 0 comments · Fixed by #237
Labels
bug Something isn't working


Describe the bug

Due to a change in the num2words package, cz is no longer a valid language code for Czech; cs must be used instead.

See savoirfairelinux/num2words#587 for the change

To Reproduce

  1. Use TTS with the Czech language (cs) and the latest num2words release installed
  2. TTS crashes with NotImplementedError, because it passes the unsupported code cz to num2words
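
The failing call can be isolated from the rest of TTS. The sketch below copies the language mapping expression from the last TTS frame of the traceback in this report (tokenizer.py, line 542) and, if num2words happens to be installed, passes the mapped code to it. The helper names to_num2words_lang and try_expand are made up for illustration and are not part of TTS or num2words.

```python
# Minimal reproduction sketch; helper names are hypothetical.
try:
    from num2words import num2words
except ImportError:  # keep the sketch importable without the dependency
    num2words = None


def to_num2words_lang(lang):
    # Same expression as in the traceback: the ISO code "cs" is rewritten
    # to the legacy num2words code "cz"; every other code passes through.
    return lang if lang != "cs" else "cz"


def try_expand(number, lang):
    """Return the spelled-out number, or a short note describing the failure."""
    if num2words is None:
        return "num2words is not installed"
    try:
        return num2words(number, lang=to_num2words_lang(lang))
    except NotImplementedError:
        # Raised by num2words when it does not know the language code.
        return "NotImplementedError for code " + to_num2words_lang(lang)


if __name__ == "__main__":
    print(try_expand(42, "cs"))
```

With a num2words release that no longer accepts cz, the call should end in the NotImplementedError branch; with an older release it should return the Czech words for the number.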

Expected behavior

No response

Logs

  File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 366, in tts_to_file
    wav = self.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 312, in tts
    wav = self.synthesizer.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 406, in tts
    outputs = self.tts_model.synthesize(
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 410, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 479, in full_inference
    return self.inference(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 525, in inference
    text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 666, in encode
    txt = self.preprocess_text(txt, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 652, in preprocess_text
    txt = multilingual_cleaners(txt, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 573, in multilingual_cleaners
    text = expand_numbers_multilingual(text, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 562, in expand_numbers_multilingual
    text = re.sub(_number_re, lambda m: _expand_number(m, lang), text)
  File "/usr/local/lib/python3.10/re.py", line 209, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 562, in <lambda>
    text = re.sub(_number_re, lambda m: _expand_number(m, lang), text)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 542, in _expand_number
    return num2words(int(m.group(0)), lang=lang if lang != "cs" else "cz")
  File "/usr/local/lib/python3.10/site-packages/num2words/__init__.py", line 98, in num2words
    raise NotImplementedError()

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla P40"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.1+cu124",
        "TTS": "0.25.1",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.10.16",
        "version": "#140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024"
    }
}

Additional context

A simple fix would be to remove the workaround that was probably added in the past to accommodate the non-standard num2words code:
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L504
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L509
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L538
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L542

The num2words release that contains the fix:
https://github.com/savoirfairelinux/num2words/releases/tag/v0.5.14
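
As an alternative to simply dropping the workaround, a version-tolerant variant could try the standard code first and fall back to the legacy one. This is only a sketch under assumptions: expand_number is a hypothetical helper, not the code in tokenizer.py and not necessarily what the linked fix does, and the converter is passed in as a parameter so the logic can be tried without num2words installed.

```python
def expand_number(number, lang, converter):
    """Spell out `number` with `converter`, tolerating both Czech codes.

    `converter` is any callable with the shape converter(number, lang=...),
    such as num2words.num2words. Hypothetical helper for illustration only.
    """
    candidates = [lang]
    if lang == "cs":
        # Assumption: older num2words releases only know the legacy "cz".
        candidates.append("cz")
    for code in candidates:
        try:
            return converter(number, lang=code)
        except NotImplementedError:
            # The converter does not know this code; try the next candidate.
            continue
    raise NotImplementedError(f"no converter for language {lang!r}")
```

With the real library this would be called as expand_number(5, "cs", num2words) after "from num2words import num2words". The trade-off is one extra failed lookup on older releases, in exchange for not having to pin a minimum num2words version.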
