
Proper switches and voice for TTS of technical documents? #173

Open
pvonmoradi opened this issue Oct 13, 2022 · 0 comments
pvonmoradi commented Oct 13, 2022

What's the best voice and switch combination for synthesis of technical texts (like manuals, books, papers)?
@jnordberg sorry for pinging you; I think you have a good idea about this...

For example, consider the following text from Wikipedia:

Connection to artificial intelligence

Since inception, Lisp was closely connected with the artificial intelligence research community, especially on PDP-10 systems. Lisp was used as the implementation of the language Micro Planner, which was used in the famous AI system SHRDLU. In the 1970s, as AI research spawned commercial offshoots, the performance of existing Lisp systems became a growing issue, as programmers needed to be familiar with the performance ramifications of the various techniques and choices involved in the implementation of Lisp.
Genealogy and variants

Over its sixty-year history, Lisp has spawned many variations on the core theme of an S-expression language. Moreover, each given dialect may have several implementations—for instance, there are more than a dozen implementations of Common Lisp.

Differences between dialects may be quite visible—for instance, Common Lisp uses the keyword defun to name a function, but Scheme uses define.[20] Within a dialect that is standardized, however, conforming implementations support the same core language, but with different extensions and libraries.

Suppose we remove the [x]-style citation markers and other HTML artifacts.

How long should a segment be? Should a segment be limited to a single paragraph, so the system can infer the tone and cadence of the words from the preceding sentences?
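To make the question concrete, here is a rough sketch of the kind of segmenting I have in mind (plain Python, nothing tortoise-specific; the 400-character cap is an arbitrary guess on my part, not something taken from the tortoise docs):

    # Group sentences (one per line, as produced by the pipeline below) into
    # segments that never cross a paragraph boundary and stay under a length cap.
    MAX_CHARS = 400  # arbitrary guess at a comfortable segment size

    def segment(paragraphs):
        # paragraphs: list of lists of sentences
        for sentences in paragraphs:
            current, length = [], 0
            for s in sentences:
                if current and length + len(s) > MAX_CHARS:
                    yield " ".join(current)
                    current, length = [], 0
                current.append(s)
                length += len(s) + 1
            if current:
                yield " ".join(current)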

Is it possible to properly synthesize abbreviations like PDP-11 or AI?
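One workaround I can think of, sketched below, is to spell abbreviations out before synthesis; the replacement table is entirely made up and would need tuning per document:

    import re

    # Hypothetical normalization table: spell out abbreviations the model tends
    # to mispronounce. These entries are examples, not an exhaustive list.
    REPLACEMENTS = {
        r"\bPDP-11\b": "P D P eleven",
        r"\bPDP-10\b": "P D P ten",
        r"\bAI\b": "A I",
    }

    def normalize(text):
        for pattern, spoken in REPLACEMENTS.items():
            text = re.sub(pattern, spoken, text)
        return text

    print(normalize("Lisp was closely connected with AI research on PDP-10 systems."))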

I'm currently using this pipeline in a Unix environment:

# cat source.html | select-paragraphs-by-css | convert-to-plain-text | remove-symbols | NLP-to-sentences
cat source.html | pup 'p' | pandoc --from html --to plain | tr -d '⁰¹²³⁴⁵⁶⁷⁸⁹' | sentences

sentences is an NLP tool that reads text from STDIN and writes one sentence per line. I then feed the resulting text file to the tortoise-tts.py script, setting only the voice (lj), --disable-redaction, and the ultra_fast and fast presets.
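For reference, here is a rough Python equivalent of that preprocessing (assuming beautifulsoup4 is available; the regex splitter is only a crude stand-in for the sentences tool):

    import re
    from bs4 import BeautifulSoup

    SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

    def html_to_sentences(path):
        with open(path, encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        for p in soup.find_all("p"):                      # select-paragraphs-by-css
            text = p.get_text(" ", strip=True)            # convert-to-plain-text
            text = text.translate({ord(c): None for c in SUPERSCRIPTS})  # remove-symbols
            text = re.sub(r"\[\d+\]", "", text)           # drop [x]-style citation markers
            # crude sentence splitter; the real pipeline uses the sentences NLP tool
            for sentence in re.split(r"(?<=[.!?])\s+", text):
                if sentence:
                    yield sentence.strip()

    for line in html_to_sentences("source.html"):
        print(line)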
