I notice that both training and testing utterances are 4 seconds long, and that at inference time "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."
If I want to shrink the input of the network, is there any chance I can use it on shorter audio, say 200 ms long?
Can I use 4-second segments for training and 200 ms for inference?
If not, can I use 200 ms for both training and inference?
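For reference, the 4-second / 2-second-overlap evaluation scheme quoted above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the averaging of overlapped outputs is an assumption, and `model` is a hypothetical stand-in for the trained network (any callable mapping a segment to an enhanced segment of the same length).

```python
import numpy as np

def chunked_inference(wav, model, sr=16000, seg_sec=4.0, hop_sec=2.0):
    """Run `model` on 4 s segments taken every 2 s (i.e. 2 s overlap)
    and average the overlapping outputs. `model` is a placeholder for
    the trained network; here it just needs to preserve segment length."""
    seg, hop = int(seg_sec * sr), int(hop_sec * sr)
    out = np.zeros(len(wav), dtype=float)
    count = np.zeros(len(wav), dtype=float)
    starts = list(range(0, max(len(wav) - seg, 0) + 1, hop))
    # If the last hop-aligned segment does not reach the end,
    # add one final segment flush with the end of the utterance.
    if starts[-1] + seg < len(wav):
        starts.append(len(wav) - seg)
    for s in starts:
        out[s:s + seg] += model(wav[s:s + seg])
        count[s:s + seg] += 1
    count[count == 0] = 1  # guard against division by zero
    return out / count
```

With an identity "model", the reconstruction returns the input unchanged, which is a quick sanity check that the segmentation and averaging are consistent.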
You can try, but I don't think 200 ms is a good choice for training or inference: that is not enough context for the neural network to learn from or predict with.
> I notice that both training and testing utterances are 4 seconds long, and that at inference time "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."
Note: this configuration is for SpatialNet, not for online SpatialNet