How to use Coqui STT for a text-to-speech server (in NodeJs) #1870

solyarisoftware · 2021-05-17T15:12:38Z

solyarisoftware
May 17, 2021

H all!

I just published a very simple opensource project: https://github.com/solyarisoftware/CoquiSTTJs enabling NodeJs developers to use Coqui STT with a simplified API.

Now I want to set up a speech recognition SERVER architecture, using Coqui STT engine to manage multiple concurrent user requests.

The problem:
Following some quick tests (using CPU, without a GPU), STT (as DeepSpeech) decoder seems to me a single-thread "long-processing" application that put a single CPU core at 100% for a while (speech to text of a 3 words english sentence has a latency of more than 1 sec on my laptop. For details see this test).

To build a server, I instead need a multi-process / multi-thread architecture. My preferred approach, in NodeJs, would be to use NodeJs "worker threads", passing the loaded Model object from a main dispatcher thread and the workers (that could make the STT in a separate thread), nevertheless, believe it doesn't run because data passing with worker threads is "by value" and I suppose the Model is a huge in memory object.

Questions:

It seems to me that the STT Model is loaded once in memory (using Linux) as a shared library. That's correct? How can see how much RAM memory a Model uses?
If true, probably the solution is to build a pool of worker processes each one accessing the Model separately. Does it make sense?

See also this thread:
https://discourse.mozilla.org/t/how-to-use-deepspeech-for-a-text-to-speech-server-in-nodejs/79636/2

Thoughts? Suggestions?

Thanks!
Giorgio

solyarisoftware · 2021-06-03T08:56:46Z

solyarisoftware
Jun 3, 2021
Author

Some advances:

With the new version of coquittsjs, I partially "solved" the multithreading need:
https://github.com/solyarisoftware/CoquiSTTJs/tree/master/examples#multithreading-speech-to-text

I'm still renew my question 1: is Coqui STT Model loaded once in memory (as a shared library)?

thanks

0 replies

reuben · 2021-06-03T09:31:14Z

reuben
Jun 3, 2021
Maintainer

The model is not loaded as a shared library. When using the .pbmm model, it's mmap-ed. No idea how that works when mapping multiple times the same file.

0 replies

solyarisoftware · 2021-06-03T10:32:17Z

solyarisoftware
Jun 3, 2021
Author

Hi Reuben,

mmap keyword clarify the point! As far as I understand it's like a not-persistent shared memory and works well for me: on my linux machine I experience that the first time a program load the (huge) model in RAM, loading latency is ~20 seconds.
No problem for successive runs of programs accessing the loaded model. Now latency is ~10 milliseconds.
Is it still not clear when eventually the mmaped memory will be freed... but I believe mamp are a good design choice, allowing to build multiThread/multiprocess servers (in different programming languages).

Thanks

1 reply

reuben Jun 3, 2021
Maintainer

To be clear the native client is not /at all/ designed for multithread or multiprocess server. It's designed for single stream, on-device inference. 20 second long latency is also completely unseen from my tests or reports, precisely because of the mmap. Make sure nothing is wrong with your IO setup.

solyarisoftware · 2021-06-03T10:46:46Z

solyarisoftware
Jun 3, 2021
Author

To be clear the native client is not /at all/ designed for multithread or multiprocess server. It's designed for single stream, on-device inference.

Yes, that's the reason I'm working on setting-up a multithread architecture on top pf native APIs:
https://github.com/solyarisoftware/CoquiSTTJs/tree/master/examples#multithreading-speech-to-text

The doubt I have now is if multiples threads can concurrently access the model, and I guess they can just because the model is a mmap, right?

20 second long latency is also completely unseen from my tests or reports, precisely because of the mmap. Make sure nothing is wrong with your IO setup.

That's happens on my linux desktop laptop, with small amount of free RAM, and it's probably due to a pages swap, but it's not an issue because it happens just the "first time". At regime, model loading take few msecs!

1 reply

reuben Jun 3, 2021
Maintainer

Yes, that's the reason I'm working on setting-up a multithread architecture on top pf native APIs:
https://github.com/solyarisoftware/CoquiSTTJs/tree/master/examples#multithreading-speech-to-text

Sure, what I'm saying is that you should be suspicious of any performance issues and question our design choices.

solyarisoftware · 2021-06-04T16:30:56Z

solyarisoftware
Jun 4, 2021
Author

Hi @reuben
continuining my experiment (to create a Coqui STT server in nodejs) I implemented a simple HTTP server that, for each incoming request (an HTTP POST with attached a binary WAV to decode), "spawns" a workerThread to serve each incoming request. What the worker thread does is just loading the model, calling model.STT and freeing the model. See here.

Now, If I "stress test" the above demo server, with a simple bash script that send curl requests, every few hundreds of milliseconds, the server crashes with segmentation fault. See logs.

Sometime, if I relax the delay between requests, up to 500 msecs, the program crashes after a while and in this case I also read in stderr the message:

#
# Fatal error in , line 0
# Check failed: 0 == value.
#
#
#
#FailureMessage Object: 0x7f16687596f0
 1: 0xb831a1  [node]
 2: 0x1c1e754 V8_Fatal(char const*, ...) [node]
 3: 0xec2191 v8::internal::Factory::InitializeMap(v8::internal::Map, v8::internal::InstanceType, int, v8::internal::ElementsKind, int) [node]
 4: 0xec2212 v8::internal::Factory::NewMap(v8::internal::InstanceType, int, v8::internal::ElementsKind, int) [node]
 5: 0x10de15b v8::internal::Map::RawCopy(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, int, int) [node]
 6: 0x10de2fa v8::internal::Map::CopyNormalized(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::PropertyNormalizationMode) [node]
 7: 0x10e0aaa v8::internal::Map::Normalize(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::ElementsKind, v8::internal::PropertyNormalizationMode, char const*) [node]
 8: 0x10b52de v8::internal::JSObject::NormalizeProperties(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSObject>, v8::internal::PropertyNormalizationMode, int, char const*) [node]
 9: 0x10b6681 v8::internal::JSObject::OptimizeAsPrototype(v8::internal::Handle<v8::internal::JSObject>, bool) [node]
10: 0xcfb66d  [node]
11: 0xcf945a  [node]
12: 0xcfac00  [node]
13: 0xcfbbca v8::internal::ApiNatives::InstantiateObject(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ObjectTemplateInfo>, v8::internal::Handle<v8::internal::JSReceiver>) [node]
14: 0xd23fb2 v8::ObjectTemplate::NewInstance(v8::Local<v8::Context>) [node]
15: 0x7f171c2cc26b  [/home/giorgio/CoquiSTTjs/node_modules/stt/lib/binding/v0.10.0-alpha.6/linux-x64/node-v93/stt.node]
16: 0x7f171c2d16fe  [/home/giorgio/CoquiSTTjs/node_modules/stt/lib/binding/v0.10.0-alpha.6/linux-x64/node-v93/stt.node]
17: 0xd655cb  [node]
18: 0xd66a6c  [node]
19: 0xd670e6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
20: 0x16148d9  [node]
Illegal instruction (core dumped)

Now I'm confused, because the I don't understand what happens. it seems that the server crashes if incoming requests are "close" in time. (under 300 msecs).

Premising also that:

I'm aware of the fact serving each incoming request with a dedicated workerThread, create on the fly, is a quick & dirty solution, and probably i need a worker threads pool,
I'm aware this problem is maybe not directly related to the Coqui STT Nodejs binding implementation and Coqui STT is indeed designed for single stream, on-device inference.

My question/help request is:
Do you think the problem is related to accessing mmaped model from nodejs workerThreads?
If so, I'd probably solve using child processes instead of workerThreads?
Or is the problem unrelated from serving requests using threads or processes (because related to the N-API binding)?

Let me know if you need more info.
Thanks for your patience.
Giorgio

2 replies

reuben Jun 4, 2021
Maintainer

The stack seems to indicate you're running out of memory, it looks like an allocation failure.

solyarisoftware Jun 7, 2021
Author

Well, I wasn't able to find the reason why, running transcription IN a nodejs workerThread, create a segmentation violation (running out of memory as you suggested), so the running workaround have been to fork processes:

https://github.com/solyarisoftware/CoquiSTTJs/tree/71e27dc943b6aefb5e352adeae759d9f5df042f9/examples#multiprocessing-vs-multithreading-speech-to-text-server-architecture

Now the server seems to run smoothly with concurrent requests:
https://github.com/solyarisoftware/CoquiSTTJs/tree/master/examples#simple-speech-to-text-http-server

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to use Coqui STT for a text-to-speech server (in NodeJs) #1870

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 5 comments 4 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

How to use Coqui STT for a text-to-speech server (in NodeJs) #1870

solyarisoftware May 17, 2021

Replies: 5 comments · 4 replies

solyarisoftware Jun 3, 2021 Author

reuben Jun 3, 2021 Maintainer

solyarisoftware Jun 3, 2021 Author

reuben Jun 3, 2021 Maintainer

solyarisoftware Jun 3, 2021 Author

reuben Jun 3, 2021 Maintainer

solyarisoftware Jun 4, 2021 Author

reuben Jun 4, 2021 Maintainer

solyarisoftware Jun 7, 2021 Author

solyarisoftware
May 17, 2021

Replies: 5 comments 4 replies

solyarisoftware
Jun 3, 2021
Author

reuben
Jun 3, 2021
Maintainer

solyarisoftware
Jun 3, 2021
Author

reuben Jun 3, 2021
Maintainer

solyarisoftware
Jun 3, 2021
Author

reuben Jun 3, 2021
Maintainer

solyarisoftware
Jun 4, 2021
Author

reuben Jun 4, 2021
Maintainer

solyarisoftware Jun 7, 2021
Author