squeezenet for speech #64

akankshaaa13 · 2021-10-07T15:09:43Z

Can SqueezeNet be used for speech emotion recognition if we feed it 3D log-mel spectrogram values?

dragon18456 · 2021-10-08T03:57:03Z

There is an established paradigm for speech emotion recognition (SER) in which you use a backbone like SqueezeNet. Given some audio, you want to classify the speech into a discrete set of classes such as happy, sad, angry, etc. You can run the log-mel spectrogram features (or MFCCs if you prefer) through SqueezeNet, discarding the classification layer. This gives you an activation tensor whose size depends on the length of the input, so you need to reduce it to a fixed size. Some works use RNNs or LSTMs for this, with mixed results. If you are just starting on SER, I think something simple like global average pooling is a good place to start. From there, a simple fully connected (FC) classification layer gives you your logits.

akankshanarahari · 2021-10-09T06:58:38Z

What are the classification layers in squeezenet?

forresti · 2021-10-10T19:16:40Z

The final layer of SqueezeNet outputs a 1-dimensional vector with length equal to the number of categories. For example, if you are classifying images and you have 1000 categories, each image will produce a 1000-d vector. The model's predicted class is the index of the element with the highest numerical value.

If you're classifying emotions from audio data and you have 10 different emotions (e.g. happy, sad, confused, distracted, ...), then you would want to configure the model to have a 10-dimensional output vector.

One other note - this code repository is over 5 years old and uses a neural network framework called Caffe. Caffe is pretty old at this point, and I have since switched to using PyTorch. (It can be debated whether PyTorch or TensorFlow is better; I personally prefer PyTorch.) If you install PyTorch and Torchvision, there is an easy-to-use implementation of SqueezeNet there: https://pytorch.org/hub/pytorch_vision_squeezenet/
