Problem in building clustergen indic voice #13

Closed
skmalviya opened this issue Aug 18, 2018 · 123 comments

@skmalviya

Following http://festvox.org/bsv/x3528.html, I tried to build a Hindi TTS voice from scratch on a sample of 100 'hindi' wav files ['hindi_0001.wav' - 'hindi_0102.wav'] taken from 'cmu_indic_hin_ab.tar.bz2'.

Every step works fine up to the following command:
./bin/do_clustergen parallel cluster etc/txt.done.data.train

It produces lots of 'file not found' and 'Segmentation fault (core dumped)' errors.

Example (final lines of output from the above command):

Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r=_1.feats
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Attempt to access channel 1 of 0 channel track
Dataset of 0 vectors of 64 parameters from: festival/feats/9r_1.feats
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Collect trees
SIOD ERROR: wrong type of argument to setcar
BACKTRACE:
0: (set-car! (car tree) vector_num)
1: (clustergen::dump_tree_vectors tree rawtrackfd)
2: (set! tree (clustergen::dump_tree_vectors tree rawtrackfd))
3: (f (car l2))
4: (cons (f (car l2)) r)
5: (set! r (cons (f (car l2)) r))
6: (while l2 (set! r (cons (f (car l2)) r)) (set! l2 (cdr l2)))
7: (mapcar
(lambda
(unit)
(...)
...)
unittypes)
8: (if
(consp cg:multimodel)
(mapcar
(...)
cg:multimodel)
...)
9: (begin
(set! cg:parallel_tree_build t)
(build_clustergen "etc/txt.done.data.train"))
closing a file left open: festival/trees/cmu_indic_ss_mcep.rawparams
closing a file left open: festival/trees/cmu_indic_ss_mcep.tree

Please tell me where I am going wrong.

Note: I have built all the required tools as listed in the 'fest_build' script.

@saikrishnarallabandi
Collaborator

Hi,

This looks like an issue with labeling or utts creation.

Do you have a log of the previous steps so that I can better figure this out?

Also the steps you pointed out are a bit old. I can point you at the latest set of steps.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Aug 18, 2018

Here is the series of steps to build a decent voice assuming Festival, Festvox, speechtools and SPTK are installed (let me know if this is an issue):

Setup the directory structure

$FESTVOXDIR/src/clustergen/setup_cg cmu indic hin ab #(assuming name of the voice is ab)

(Edit, 16 July 2020: sorry, I entered this wrong. The command should be as follows:
$FESTVOXDIR/src/clustergen/setup_cg_indic cmu indic hin ab #(assuming name of the voice is ab)

Use setup_cg_indic, NOT setup_cg. Sorry for the inconvenience.)

Copy the wavefiles and prompts

./bin/get_wavs ${LOCATION}/*.wav
cp ${LOCATION}/txt.done.data etc/txt.done.data

Some Text Processing

./bin/do_build build_prompts etc/txt.done.data
./bin/do_build label etc/txt.done.data
./bin/do_clustergen parallel build_utts etc/txt.done.data
./bin/do_clustergen generate_statenames etc/txt.done.data
./bin/do_clustergen generate_filters etc/txt.done.data

Feature Extraction

./bin/do_clustergen parallel f0_v_sptk etc/txt.done.data
./bin/do_clustergen parallel mcep_sptk etc/txt.done.data
./bin/do_clustergen parallel str_sptk etc/txt.done.data # Strengths of excitation

Combining the features for Machine Learning

mv festvox/clustergen.scm festvox/clustergen.scm.xxx
cat festvox/clustergen.scm.xxx |
sed 's/mixed_excitation nil/mixed_excitation t/' |
cat >festvox/clustergen.scm
./bin/do_clustergen parallel combine_coeffs_me etc/txt.done.data

Separate train and test splits

./bin/traintest etc/txt.done.data

Training

./bin/do_clustergen parallel cluster etc/txt.done.data.train
./bin/do_clustergen dur etc/txt.done.data.train

Testing

./bin/do_clustergen cg_test resynth cgp etc/txt.done.data.test
./bin/do_clustergen cg_test tts tts etc/txt.done.data.test
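
Later replies in this thread trace several late failures back to an earlier step that went wrong, so it can help to run the steps one at a time, keep one log per step, and stop at the first non-zero exit. The following is only a minimal sketch under that assumption: `run_step` and the `logs/` directory are hypothetical names, not part of Festvox, and some tools may print errors while still exiting with status 0, so the logs still need to be read.

```shell
# Hypothetical helper, not part of Festvox: run one build step, keep its
# combined stdout/stderr in logs/<name>.log, and report a non-zero exit.
run_step() {
    name=$1
    shift
    mkdir -p logs
    echo "== $name: $*"
    if "$@" > "logs/$name.log" 2>&1; then
        return 0
    fi
    echo "step '$name' failed; see logs/$name.log" >&2
    return 1
}

# Example use from the voice directory, with commands from the list above:
#   run_step prompts ./bin/do_build build_prompts etc/txt.done.data || exit 1
#   run_step label   ./bin/do_build label etc/txt.done.data         || exit 1
```

Chaining each call with `|| exit 1` means a failed step stops the build before later steps consume incomplete output.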

@skmalviya
Author

Thanks for such detailed help.

I followed the latest set of steps you suggested, and got stuck on the following issues with the same kind of sample of 100 wav files from 'cmu_indic_hin_ab.tar.bz2':

######################## Issue 1############################
./bin/do_build build_prompts etc/txt.done.data
SIOD ERROR: could not open file ./festvox/language_variant.scm
closing a file left open: ./festvox/indic_lexicon.scm
closing a file left open: ./festvox/cmu_indic_hin_lexicon.scm
closing a file left open: festvox/cmu_indic_hin_clunits.scm
closing a file left open: festvox/build_clunits.scm
(Note: this error also appeared with the earlier scripts. I had worked around it by putting a file named "language_variant.scm" with content 'hin' in the 'cmu_indic_hin_ab/festvox/' directory.)

######################## Issue 2############################
./bin/do_clustergen parallel str_stpk etc/txt.done.data # Strengths of excitation
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31140.4
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31139.3
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31137.1
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31141.5
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31138.2
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31136.0
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31142.6
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31155.7
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31173.10
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31171.9
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31177.11
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31186.13
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31168.8
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31183.12
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31187.14
do_clustergen: unknown options str_stpk tmpdir/dobuild_parallelworker.31204.15

######################## Issue 3############################
./bin/traintest etc/txt.done.data

hindi_0003 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0002 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0004 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0016 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0001 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0005 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0006 COMBINE_COEFFS (f0,mcep_deltas,str,v)
hindi_0007 COMBINE_COEFFS (f0,mcep_deltas,str,v)
cat: str/hindi_0001.str: No such file or directory
cat: str/hindi_0002.str: No such file or directory
cat: str/hindi_0004.str: No such file or directory
cat: str/hindi_0006.str: No such file or directory
cat: str/hindi_0005.str: No such file or directory
cat: str/hindi_0007.str: No such file or directory
cat: str/hindi_0016.str: No such file or directory
cat: str/hindi_0003.str: No such file or directory
..............
issue3.txt

######################## Issue 4############################
./bin/do_clustergen cg_test resynth cgp etc/txt.done.data.test
Error reading ESPS file /home/shrikant/festival_hindi_tts/indic/cmu_indic_hin_ab//festival/trees/cmu_indic_hin_mcep.params
Cannot load track: /home/shrikant/festival_hindi_tts/indic/cmu_indic_hin_ab//festival/trees/cmu_indic_hin_mcep.params
SIOD ERROR: could not open file /home/shrikant/festival_hindi_tts/indic/cmu_indic_hin_ab//festival/trees/cmu_indic_hin_mcep.tree
awk: cmd. line:2: fatal: division by zero attempted
awk: cmd. line:2: fatal: division by zero attempted
awk: cmd. line:2: fatal: division by zero attempted
awk: cmd. line:2: fatal: division by zero attempted

I did not go any further after this many errors.
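
The `cat: str/hindi_0001.str: No such file or directory` lines in Issue 3 indicate that an earlier feature extraction step produced no output. One way to check this before combining coefficients is to list the utterance IDs that have no matching feature file. This is a rough sketch, assuming each line of the prompt file has the form `( hindi_0001 "text" )` so that the ID is the second field; `missing_feats` is a hypothetical helper name.

```shell
# Hypothetical helper: print the utterance IDs from a prompt file that have
# no (or an empty) matching feature file. Assumes each prompt line looks like
#   ( hindi_0001 "text ..." )
# so the ID is the second whitespace-separated field.
missing_feats() {
    prompts=$1   # e.g. etc/txt.done.data
    dir=$2       # e.g. str
    ext=$3       # e.g. str
    awk '{print $2}' "$prompts" | while read -r id; do
        [ -n "$id" ] || continue
        [ -s "$dir/$id.$ext" ] || echo "$id"
    done
}
```

For example, `missing_feats etc/txt.done.data str str` run from the voice directory prints nothing when every utterance has a non-empty `str/<id>.str` file.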

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Aug 19, 2018 via email

@skmalviya
Author

skmalviya commented Aug 19, 2018

The same as listed in the fest_build script. I just ran the script and then sourced "export_various_PATHS.sh" to export the paths.
export_various_PATHS.txt
fest_build.txt

@saikrishnarallabandi
Collaborator

This seems to be the issue:
SIOD ERROR: could not open file ./festvox/language_variant.scm

I see that you have the latest versions. Just to be sure, can you create a new directory and run only the prompt building command (./bin/do_build build_prompts) that gave this error?

Let me know if this happens again

@skmalviya
Author

Shall I set the content of the file ./festvox/language_variant.scm to hin?
And what about issues 2, 3 and 4?

@saikrishnarallabandi
Collaborator

Issues 2 through 4 are caused by 1.

The content of /festvox/language_variant.scm should be 'hin' by default.
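
Since the later issues are attributed to the missing file, a quick pre-flight check on the voice directory may save a full build attempt. The sketch below only tests that `festvox/language_variant.scm` exists and is non-empty, then prints what it contains; `check_language_variant` is a hypothetical name, and the expected content ('hin') is taken from this thread rather than verified against the Festvox sources.

```shell
# Hypothetical pre-flight check: confirm the voice directory has the
# festvox/language_variant.scm file that build_prompts tries to load.
check_language_variant() {
    voice_dir=${1:-.}
    f="$voice_dir/festvox/language_variant.scm"
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f (was the directory created with setup_cg_indic?)" >&2
        return 1
    fi
    echo "language_variant.scm contains: $(cat "$f")"
}
```

Running `check_language_variant .` inside the voice directory right after setup shows whether the template was generated with the Indic setup script.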

@saikrishnarallabandi
Collaborator

Note that the sample build script in fest_build.txt is for English.

When building an Indic voice, the command to set up the directory structure is:
$FESTVOXDIR/src/clustergen/setup_cg_indic cmu indic hindi ab # 4 arguments instead of 3

@skmalviya
Author

Hello saikrishna!

I followed the instructions exactly.
The commands I executed are:
mkdir cmu_indic_hin_ab
cd cmu_indic_hin_ab
$FESTVOXDIR/src/clustergen/setup_cg_indic cmu indic hin ab
Then I put the wavs in the wav folder and txt.done.data inside etc; both are of size 100.
After this I directly ran this script, which has all the new steps as described earlier:
sh script
script.txt

Please see the complete directory, attached as a zip:
cmu_indic_hin_ab

@skmalviya
Author

One more point! During the execution of step 13, a continuous stream of segmentation fault errors appears:
./bin/do_clustergen parallel cluster etc/txt.done.data.train
Errors at step 13.txt

For the other steps (1 --> 12), the out files are in the zipped folder attached above.

I am not able to figure out where the problem actually is, because I don't see any difference between the earlier errors and the current ones.

Thanks for the support and help, saikrishna, btw.

@saikrishnarallabandi
Collaborator

For some reason I am unable to download the directory. Can you attach out1 and out2 here?

@skmalviya
Author

Yes please have a look

out1.txt
out2.txt

@saikrishnarallabandi
Collaborator

Nothing wrong with these. Next, I suspect some issue in feature extraction. Can you attach out6, out7 and out8?

@skmalviya
Author

OK
out6.txt
out7.txt
out8.txt

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Aug 22, 2018

The problem is with out8.
That step extracts the "Strengths of excitation" used for Mixed Excitation.
The script fails, saying it doesn't recognize the command 'str_sptk'.

There are two things we can do for this:

(1) Ignore this and continue voice building.

In this case, modify the next step to the following
./bin/do_clustergen parallel combine_coeffs_v etc/txt.done.data

from
./bin/do_clustergen parallel combine_coeffs_me etc/txt.done.data

combine_coeffs_me uses strengths of excitation
combine_coeffs_v ignores them.

In this case, we also need to modify the clustergen.scm file to indicate that we are not using mixed excitation. An easy way to do this is the following:

cp festvox/clustergen.scm.xxx festvox/clustergen.scm # (We previously made it explicit in this file, through steps 9 and 10, that we would be using mixed excitation. So we are just reverting.)

Now you can run the clustering step:
./bin/do_clustergen parallel cluster etc/txt.done.data.train

(2) The other (and real) solution is to dig deeper into why str_sptk is failing. Can you paste the file ./bin/do_clustergen here so that I can inspect it? It should support the argument 'str_sptk'.

@skmalviya
Author

For (2), please bear with me; the do_clustergen file is here:

do_clustergen.txt

For (1), let me incorporate it.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Aug 22, 2018

Wait. I just noticed that the spelling is incorrect in step 8 of the script you shared.

It should be str_sptk, not str_stpk.

@saikrishnarallabandi
Collaborator

Once things run smoothly up to 'cluster', I'd suggest running the duration model without 'parallel', i.e. the following:
./bin/do_clustergen dur etc/txt.done.data.train

instead of

./bin/do_clustergen parallel dur etc/txt.done.data.train

@saikrishnarallabandi
Collaborator

I realize that I made that spelling error when I shared the steps. Sorry for that :)

@skmalviya
Author

Still the same situation. Please have another look at what I did this time.

  1. I ran : $FESTVOXDIR/src/clustergen/setup_cg_indic cmu indic hin ab # again after emptying the directory except the script
  2. Copied again wav and txt.done.data to respective directory.
  3. Ran the command : sh script
    script.txt

Got the output files:
out1.txt
out2.txt
out3.txt

out6.txt
out7.txt
out8.txt
out9.txt
out11.txt

Again_Errors at step 13.txt

Complete directory, zipped

@saikrishnarallabandi
Collaborator

Step 11 has an error on the last phone, z_3.

Can you run that step again?

Once that runs successfully, things should be fine.

I also notice that there are seg faults in the step 13 log. The segmentation faults might also be occurring because too little heap space is allocated. There is a parameter called SIODHEAPSIZE in do_clustergen. Increasing that should alleviate this fault.

@skmalviya
Author

By step 11, do you mean this command?
./bin/do_clustergen parallel cluster etc/txt.done.data.train > out11

I checked: SIODHEAPSIZE=20000000 in the ./bin/do_clustergen file.
I increased it by one more zero: SIODHEAPSIZE=200000000
Now it's giving the error: "WALLOC: failed to malloc -424509440 bytes"

@saikrishnarallabandi
Collaborator

Re step 11: yes.

Just double the heap size and see (rather than multiplying by 10). This is usually not necessary, tbh.
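
To make "double" concrete: starting from the value quoted above, 20000000, doubling gives 40000000. The sketch below reads the current setting from a script and prints the doubled value without editing anything. It assumes the setting appears as `SIODHEAPSIZE=<digits>` somewhere on a line, which matches how it is quoted in this thread but has not been checked against every version of do_clustergen; `doubled_heap` is a hypothetical name.

```shell
# Hypothetical helper: print the SIODHEAPSIZE value found in a script and
# the doubled value. It only reads the file; editing is left to the user.
# Assumes the setting appears as SIODHEAPSIZE=<digits> on some line.
doubled_heap() {
    cur=$(sed -n 's/.*SIODHEAPSIZE=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1)
    [ -n "$cur" ] || { echo "no SIODHEAPSIZE=<number> line in $1" >&2; return 1; }
    echo "current=$cur doubled=$((cur * 2))"
}
```

Called as `doubled_heap ./bin/do_clustergen`, it reports both numbers so the new value can be set by hand.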

@skmalviya
Author

With the updated SIODHEAPSIZE=25000000,
I again ran the command ./bin/do_clustergen parallel cluster etc/txt.done.data.train > out11
The situation is still the same...
out11.txt
ErrorStep13.txt # These errors appear on the terminal while the command executes

@saikrishnarallabandi
Collaborator

Hi,

I was able to download the zip. When I ran the step, it did finish without any issues.

Here are the last lines from log:

RMSE 0.1516 Correlation is 0.9867 Mean (abs) Error 0.0963 (0.1171)
Dataset of 74 vectors of 67 parameters from: festival/feats/9r=_3.feats
Dataset of 74 vectors of 67 parameters from: festival/feats/9r=_3.feats
RMSE 0.1330 Correlation is 0.9899 Mean (abs) Error 0.0837 (0.1035)
Dataset of 73 vectors of 67 parameters from: festival/feats/9r=_2.feats
Dataset of 73 vectors of 67 parameters from: festival/feats/9r=_2.feats
RMSE 0.1317 Correlation is 0.9898 Mean (abs) Error 0.0789 (0.1054)
Dataset of 24 vectors of 67 parameters from: festival/feats/9r=_1.feats
Dataset of 24 vectors of 67 parameters from: festival/feats/9r=_1.feats
RMSE 0.1605 Correlation is 0.9862 Mean (abs) Error 0.0977 (0.1273)
RMSE 0.1527 Correlation is 0.9856 Mean (abs) Error 0.0958 (0.1189)
RMSE 0.1371 Correlation is 0.9911 Mean (abs) Error 0.0832 (0.1089)
RMSE 0.1264 Correlation is 0.9922 Mean (abs) Error 0.0766 (0.1005)
Collect trees
184 unittypes as 1829 subunittypes dumped
Tree models and vector params dumped

I was able to finish the duration model ( next step) and generate test samples too.

This is weird since I am essentially continuing from your folder structure

@saikrishnarallabandi
Collaborator

@awbcmu Can you look into this

@festvox
Owner

festvox commented Aug 22, 2018

Given the failure on the missing language_variant.scm file, I suspect initialization with the wrong version might be the culprit. Also note that the option should be str_sptk, not str_stpk.

Another suspect is running with the parallel option. If you run out of memory and something dies, that might be hard to detect in the next step.

I would regenerate the templates, and then copy in the waveforms and txt.done.data
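
Because a worker that dies under `parallel` can go unnoticed until a later step, it may be worth scanning each step's log for the failure messages that appear in this thread before moving on. The following is a small sketch, assuming the logs were saved to files such as out8 or out11; `scan_logs` is a hypothetical name and the pattern list is limited to strings quoted above, so it is not an exhaustive error detector.

```shell
# Hypothetical helper: scan step logs for the failure messages quoted in
# this thread. Prints file:line:text for each hit and returns non-zero when
# nothing matches. /dev/null is added so grep always prints the file name.
scan_logs() {
    grep -n -E 'Segmentation fault|SIOD ERROR|EST Error|WALLOC|unknown options|No such file or directory' "$@" /dev/null
}
```

For instance, `scan_logs out*.txt` lists every suspicious line with its file and line number, which makes it easier to find the first step that actually failed.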

@saikrishnarallabandi
Collaborator

@shrikant6153 can you run it without 'parallel' once

@skmalviya
Author

With or without parallel (./bin/do_clustergen cluster etc/txt.done.data.train > out11),
I am stuck with the same error. A sample of it is given below:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
paste: 1081.f0: No such file or directory
sort: cannot read: festival/feats/i_1.feats.unsorted: No such file or directory
sort: cannot read: festival/disttabs/i_1.mcep.unsorted: No such file or directory
rm: cannot remove 'festival/disttabs/i_1.mcep.unsorted': No such file or directory
-=-=-=-=-=- EST Error -=-=-=-=-=-
Tried to extract channel number 0 from track with only 0 channels

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
paste: 1081.f0: No such file or directory
sort: cannot read: 'festival/feats/i:_1.feats.unsorted': No such file or directory
sort: cannot read: 'festival/disttabs/i:_1.mcep.unsorted': No such file or directory
rm: cannot remove 'festival/disttabs/i:_1.mcep.unsorted': No such file or directory
-=-=-=-=-=- EST Error -=-=-=-=-=-
Tried to extract channel number 0 from track with only 0 channels

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
paste: 1081.f0: No such file or directory
sort: cannot read: festival/feats/i_1.feats.unsorted: No such file or directory
sort: cannot read: festival/disttabs/i_1.mcep.unsorted: No such file or directory
rm: cannot remove 'festival/disttabs/i_1.mcep.unsorted': No such file or directory
-=-=-=-=-=- EST Error -=-=-=-=-=-
Tried to extract channel number 0 from track with only 0 channels

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
paste: 1081.f0: No such file or directory
sort: cannot read: 'festival/feats/i:_1.feats.unsorted': No such file or directory
sort: cannot read: 'festival/disttabs/i:_1.mcep.unsorted': No such file or directory
rm: cannot remove 'festival/disttabs/i:_1.mcep.unsorted': No such file or directory
-=-=-=-=-=- EST Error -=-=-=-=-=-
Tried to extract channel number 0 from track with only 0 channels

@saikrishnarallabandi
Collaborator

cool. keep me posted :)

also try building a voice ignoring the corpus building process. the only thing is perhaps try to avoid super long sentences.
42 hours is a lot, and reduction to 1.5 seems way too much.

Try building a voice with around 2 to 3 hours and check if ownas has a decent count.

closing for now. reopen here / new issue and tag me if something goes wrong

@prajwaljpj

prajwaljpj commented Jul 21, 2020

@saikrishnarallabandi I just trained on around 12 hours of data, with strengths of excitation as in your comment.
Here is the log file:
whole_process_with_mixed_feature.txt

I should mention that during corpus creation I allowed 40000 words when building the lexicon in this step; the default was 5000 words.
$FESTVOXDIR/src/promptselect/make_nice_prompts make_freq_lex

While training I got many of these warnings:
festival/dur/data/dur.data.train.train: bad float -inf in field lisp_zscore_dur vector number

Is this because of my lexicon?

I will update how well it does without pruning silences.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Jul 21, 2020 via email

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Jul 21, 2020 via email

@prajwaljpj

prajwaljpj commented Jul 21, 2020

I just got the results of cg_test, and the mean still seems to be high:
all mean 12.012876 std 1236.585618 var 1529143.991218 n 21965050
F0 mean 48.533261 std 34.990552 var 1224.338737 n 878602
noF0 mean 0.252691 std 0.273995 var 0.075073 n 21086448
MCD mean 6.989517 std 3.008346 var 9.050144 n 878602

Will update if I'm able to synthesize any audio.

@prajwaljpj

prajwaljpj commented Jul 21, 2020

Unfortunately, the synthesized data is not understandable. Attaching sample
data_03.zip

@saikrishnarallabandi
Collaborator

Yeah, I was afraid of that. The MCD is too high.

If you have access to GPU based compute, here is the run script for building a Tacotron voice. The example is from Hindi speaker(male).

The script creates exp/taco_one_phseq. Checkpoints and intermediate files(attention plots, etc) will be in exp/taco_one_phseq/checkpoints. The logfile will be in exp/taco_one_phseq/tracking/logfile. TB eventfiles will be in exp/taco_one_phseq/tracking.

If you choose to use this, pay attention to line number 35. I am selecting the shortest 600 utterances in the example script. It might be worth playing with this number. With this configuration, you should be able to see clear attention around 10K timesteps. The model trains for ~200K time steps if you leave it running.
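
The run script itself is not reproduced in this thread, so how it measures "shortest" is unknown. As a rough illustration of the idea only, the sketch below keeps the N shortest prompts from a txt.done.data-style file, measuring length as the character count of the whole line; `shortest_prompts` is a hypothetical name and this is not the selection code from the script mentioned above.

```shell
# Hypothetical helper: keep the N shortest prompts from a txt.done.data-style
# file. Length is the awk length of the whole line, which counts characters
# or bytes depending on the awk implementation and locale.
shortest_prompts() {
    prompts=$1
    n=$2
    awk '{print length($0) "\t" $0}' "$prompts" | sort -n | head -n "$n" | cut -f2-
}
```

For example, `shortest_prompts etc/txt.done.data 600 > etc/txt.done.data.short` writes a 600-line subset, and changing the number is an easy way to experiment with the cut-off.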

@saikrishnarallabandi
Collaborator

Listening to the samples, the duration model clearly seems wrong. What about the wave files in the directory test/cgp?

These use the original durations.

@prajwaljpj

This is my original sample data.
data_orig.zip

The data in test/cgp sounds similar to test/tts, probably a little better. Here is an example.
data_001.zip

Yes, I have access to GPUs. I am open to trying Falcon; my only requirements are CPU inference and fast inference. Will get back after trying this script.
I'm also curious to know where exactly I'm going wrong.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Jul 22, 2020 via email

@prajwaljpj

prajwaljpj commented Jul 22, 2020

I had already pruned the silences for this dataset, both at the ends and in the middle, using the scripts provided.
There could actually be too few examples of some phones. What is the average word count required? If I pick the top 5000 words, my highest word count is 7513 and my lowest is 5.
data.wc.txt

I'm not sure about points 4 and 5. Will have to check. I think the data was recorded through an app on a phone.

Could there be something wrong with the alignment? How can I verify this?
Also, the dataset does not have long sentences, only short sentences of 5 to 15 words.
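
A word-frequency table like the one attached can also be approximated directly from the prompt file. The sketch below is one possible way to do it, assuming each line has the form `( id "text" )` with no escaped quotes inside the text; `word_counts` is a hypothetical name, words are split on spaces and tabs only, and punctuation stays attached to words, so the numbers may differ from those produced by the Festvox prompt selection tools.

```shell
# Hypothetical helper: count how often each word occurs in the quoted text
# of a txt.done.data-style file. Output is "count<TAB>word", most frequent
# first. Assumes lines look like ( id "text" ) with no escaped quotes.
word_counts() {
    sed -e 's/^[^"]*"//' -e 's/"[^"]*$//' "$1" |
        tr -s ' \t' '\n\n' |
        awk 'NF { c[$0]++ } END { for (w in c) print c[w] "\t" w }' |
        sort -k1,1nr -k2,2
}
```

Piping the result through `tail` shows the rarest words, which is where sparse coverage would be expected to show up first.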

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Jul 22, 2020 via email

@prajwaljpj

prajwaljpj commented Jul 26, 2020

In my case, following this to build the data did not work.
But when I generated the txt.done.data on my own and copied the wavs using ./bin/get_wavs, it worked!
However, both txt.done.data files were the same!
Thank you for your detailed help @saikrishnarallabandi

Finally got these results for 42h of data:

all  mean 11.247145 std 999.729682 var 999459.436663 n 55196550
F0   mean 45.817497 std 32.325401 var 1044.931537 n 2207862
noF0 mean 0.160005 std 0.108984 var 0.011878 n 52988688
MCD  mean 4.549429 std 1.591600 var 2.533191 n 2207862
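
When comparing runs, such as the 12-hour build (MCD mean 6.989517) and this 42-hour build (MCD mean 4.549429), it can be handy to pull the MCD mean out of the cg_test summary automatically. The sketch below assumes the summary lines have exactly the layout pasted in this thread and reads them from standard input; `mcd_mean` is a hypothetical name.

```shell
# Hypothetical helper: read a cg_test summary on stdin and print the value
# that follows "MCD mean". awk's default field splitting copes with the
# single or double spaces seen in the pasted output.
mcd_mean() {
    awk '$1 == "MCD" && $2 == "mean" { print $3 }'
}
```

Saving the summary to a file and running `mcd_mean < summary.txt` gives a single number per build that can be logged next to the amount of training data used.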

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented Jul 26, 2020 via email

@plehal

plehal commented May 13, 2021

The voice compilation script/process is acting weird. I was able to compile a PUNJABI voice and also the associated flite voice, but it failed to compile the same voice again and also failed to compile a Hindi voice. Most of the console errors are similar to those mentioned in the issue from 2018.

I tried to follow both processes, i.e. the one mentioned in that issue's comments along with the do_indic command, but neither succeeds. All my festival/speech tools etc. are clean builds from the fest_build.sh command. The intriguing part is that it compiled the voice twice without any problem but now it fails.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented May 13, 2021 via email

@plehal

plehal commented May 14, 2021

What is the location of the logfile? Most of these errors are on the console where the do_indic command is run. Please let me know if the log is captured in some file. Otherwise, I'll redirect stdout to a file and rerun the command. I ran one session this morning which is still running after more than 10 hours and seems to be going better than before, just looking at the correlation numbers...

Dataset of 16 vectors of 67 parameters from: wagon_rf_572210/data
RMSE 0.0767 Correlation is 0.9465 Mean (abs) Error 0.0516 (0.0568)
Iteration 16 festival/trees/nX_1_mcep.tree
Iteration 17 festival/trees/nX_2_mcep.tree
Dataset of 17 vectors of 67 parameters from: wagon_rf_572233/data
Attempt to access channel 53 of 52 channel track
Attempt to access channel 54 of 52 channel track
Attempt to access channel 55 of 52 channel track
Dataset of 17 vectors of 67 parameters from: wagon_rf_572233/data
RMSE 0.6720 Correlation is 0.8022 Mean (abs) Error 0.1739 (0.6497)
Dataset of 16 vectors of 67 parameters from: wagon_rf_572210/data
Attempt to access channel 52 of 52 channel track
Attempt to access channel 53 of 52 channel track
Dataset of 16 vectors of 67 parameters from: wagon_rf_572210/data
RMSE 0.5969 Correlation is 0.9498 Mean (abs) Error 0.1331 (0.5824)

Dataset of 4122 vectors of 67 parameters from: wagon_rf_580592/data
Attempt to access channel 52 of 52 channel track
Attempt to access channel 53 of 52 channel track
Attempt to access channel 55 of 52 channel track
Dataset of 4122 vectors of 67 parameters from: wagon_rf_580592/data
RMSE 0.5827 Correlation is 0.9535 Mean (abs) Error 0.1320 (0.5676)
RMSE 0.1190 Correlation is 0.9910 Mean (abs) Error 0.0739 (0.0933)
Iteration 11 festival/trees/i:_3_mcep.tree
Iteration 13 festival/trees/hv_3_mcep.tree
Dataset of 8487 vectors of 67 parameters from: wagon_rf_580258/data
Attempt to access channel 52 of 52 channel track

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented May 14, 2021 via email

@plehal

plehal commented May 14, 2021

Thanks. Sure, I'll rerun it and send you the log file. Really appreciate the responsiveness.

@plehal

plehal commented May 17, 2021

Here is the log file (gzipped). It took more than 24 hours to complete the run (indic pan amp), which ended in failure.
p.log.gz

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented May 17, 2021 via email

@plehal

plehal commented May 17, 2021

No, this is the resulting log file for "do_indic pan amp" script. OK, let me run your base build script and catch the log.

@plehal

plehal commented May 19, 2021

The script does create a festvox/festival voice. Now, how do I create flite voice from it as this script does create the basic directory structure for flite. Thanks for the help.

@saikrishnarallabandi
Collaborator

saikrishnarallabandi commented May 19, 2021 via email

@plehal

plehal commented May 19, 2021

Thanks. The flite file is built. However, it doesn't work, whereas the corresponding festvox voice does work. I'll open a new issue here to keep this thread sane.
