Unexpectedly long outputs #167

Open
joshhansen opened this issue Jan 6, 2019 · 3 comments

@joshhansen

I'm repeatedly finding that g2p-seq2seq generates strangely long pronunciations with the included model. Testing letter sequences up to three characters long, strange outputs like the following occur:

ysl
output: IY EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH

ybr
output: IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B

xsn
output: EH K S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH

wsq
output: D AH B AH L Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S

wsk
output: D AH B AH L Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K Y UW EH S K EY

wjr
output: W JH UW JH AH B AH L Y UW Y UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW JH UW

vsl
output: V IY EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S

ssc
output: EH S EH S S S S S S S S S S S S S S S S S S S S S S IY

qsn
output: K Y UW EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH

qrk
output: K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R K Y T IH K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R K Y UW AA R K Y UW AA R K Y UW AA R K Y UW EH R K Y UW AA R K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R K Y UW EH R

nqn
output: EH N D IY EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N D IY EH N Y UW EH N D IY EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N Y UW EH N D IY EH N D IY EH N Y UW EH N D IY EH N Y UW EH N Y UW EH N D IY EH N Y UW EH N D IY

lqr
output: EH L K Y UW EH L Y UW EH L K Y UW EH L K Y UW EH L K Y UW EH L K Y UW EH L K Y UW EH L K Y UW EH L K Y UW EH L K AA R

But those are all fairly arbitrary letter sequences. Actual words get such results, too:

uncleanness
output: AH N K L IY N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N K L IY S

micrometeorological
output: M AY K R OW M IY T AO R AA L AO JH IH K AH L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA JH IH K AH L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AA L AH L AA L AA JH IH K AH L AH JH IH K AH L AA L AA L AA L AA L AA L

quadrituberculate
output: K W AA D R AH T UW B ER K Y UW B ER K Y UW B ER K Y UW B ER K Y UW B ER K Y UW B ER K Y UW B ER K Y UW AE T

unexceptionableness
output: AH N IH K S EH P SH AH N AH B AH L IY N AH L N AH L N AH L N AH L AH L AH L N AH L N AH B AH L AH L AH L AH L AH S

The recurring theme seems to be that, for whatever reason, the decoder gets stuck in a loop on these words for a long time.

These cases are pretty rare, but they are so egregiously bad that I wonder whether there is a bug somewhere. If not, guidance on how to train a model that avoids these issues would be appreciated.
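For what it's worth, all the degenerate outputs above repeat a short phoneme cycle many times, so they are easy to flag after decoding. A minimal sketch of such a check (the `max_period` and `min_repeats` thresholds here are arbitrary choices of mine, not anything built into g2p-seq2seq):

```python
def looks_degenerate(phones, max_period=4, min_repeats=5):
    """Heuristically flag a pronunciation that repeats a short
    phoneme cycle (length <= max_period) at least min_repeats times."""
    for period in range(1, max_period + 1):
        for start in range(max(0, len(phones) - period * min_repeats + 1)):
            cycle = phones[start:start + period]
            repeats, i = 1, start + period
            while phones[i:i + period] == cycle:
                repeats += 1
                i += period
            if repeats >= min_repeats:
                return True
    return False

print(looks_degenerate("IY EH S EH S EH S EH S EH S EH L".split()))  # True
print(looks_degenerate("W AY B ER".split()))                         # False
```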

@nshmyrev
Contributor

nshmyrev commented Jan 6, 2019

It depends on the tensor2tensor version; they break it every month.

@vijay120

vijay120 commented Mar 28, 2020

I am facing a similar issue:

> kittipeumpoonwong
S IH T IY P IY AH M P UW N W AO N W AO N W AO N W AO N W AO N W AO N W AO N W AO N W AO N W AO N W AO NG

Is this a model issue or a bug in the decoder code?

I tried the suggestion that it might be due to the tensor2tensor lib, but I get the same results with tensor2tensor==1.6.6 and tensor2tensor==1.7.0.

@vijay120

@joshhansen I solved this issue by increasing the decoding beam size from 1 to 5:

```
g2p-seq2seq --decode wordlist.txt --model_dir g2p-seq2seq-model-6.2-cmudict-nostress --return_beams --beam_size 5
```

```
ysl IY EH S EH S EH L
ysl IH S AH L
ysl IY EH S EH S EH S EH S EH S EH L
ysl IY EH S EH S EH S EH S EH L
ysl IY EH S EH S EH S EH L
ybr W AY B ER
ybr IH B ER
ybr IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY
ybr IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY
ybr IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY B IY
xsn EH K S EH S EH N
xsn EH K S EH S EH S EH S EH N
xsn EH K S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH N
xsn EH K S EH S EH S EH N
xsn EH K S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH S EH N
wsq D AH B AH L Y UW EH S K Y UW EH S K Y UW
wsq D AH B AH L Y UW EH S K Y UW
wsq D AH B AH L Y UW EH S K Y UW EH S K EY
wsq D AH B AH L Y UW EH S IY
wsq D AH B AH L Y UW EH S K Y UW EH S K
wsk D AH B AH L Y UW EH S K Y UW EH S K EY
wsk D AH B AH Y UW EH S K Y UW EH S K EY
wsk W EH S K
wsk D AH B AH L Y UW EH S K Y UW EH S K Y UW EH S K EY
wsk D AH B AH L Y UW EH S K Y UW EH S K Y UW EH S K Y
wjr W ER
wjr W AA R
wjr W AY R
wjr W JH UW N Y ER
wjr D AH B AH L Y UW JH UW JH IY AA R
lqr EH L K Y UW EH S AA R
lqr EH L K Y UW EH R
lqr EH L K Y UW EH L AA R
lqr EH L K Y UW EH L Y ER
lqr EH L K Y UW EH L Y UW AA R
```
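Since `--return_beams` prints several hypotheses per word, a simple post-processing step can discard the looping beams. A minimal sketch, assuming the `word PH ON EM ES` line format shown above (keeping the shortest hypothesis per word is my own heuristic, not anything built into g2p-seq2seq):

```python
from collections import defaultdict

def pick_pronunciations(lines):
    # Group the beam hypotheses by word, then keep the shortest one,
    # since the degenerate looping outputs are always the longest.
    beams = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            beams[parts[0]].append(parts[1:])
    return {word: min(hyps, key=len) for word, hyps in beams.items()}

raw = """\
ysl IY EH S EH S EH L
ysl IH S AH L
ybr W AY B ER
ybr IY B IY B IY B IY B IY B"""
print(pick_pronunciations(raw.splitlines()))
# {'ysl': ['IH', 'S', 'AH', 'L'], 'ybr': ['W', 'AY', 'B', 'ER']}
```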
