
squeezenet for speech #64

Open
akankshaaa13 opened this issue Oct 7, 2021 · 3 comments

Comments

@akankshaaa13

Can SqueezeNet be used for speech emotion recognition if we feed it 3D log mel spectrum values?

@dragon18456

There is a paradigm for speech emotion recognition (SER) in which a backbone like SqueezeNet does the feature extraction. Given some audio, you want to classify the speech into a discrete set of classes such as happy, sad, angry, etc. You can run the log mel spectrogram features (or MFCCs, if you prefer) through SqueezeNet, discarding its classification layer. The result is an activation tensor whose length depends on the length of the input, so you need to reduce it to a fixed size. Some works use RNNs or LSTMs for this, with mixed results. If you are just starting out with SER, something simple like global average pooling is a good place to start. From there, a simple fully connected (FC) classification layer gives you the logits.

@akankshanarahari

What are the classification layers in squeezenet?

@forresti
Owner

The final layer of SqueezeNet outputs a 1-dimensional vector with length equal to the number of categories. For example, if you are classifying images and you have 1000 categories, each image gets a 1000-d vector. The model's predicted class is the element of the vector with the highest numerical value.

If you're classifying emotions from audio data and you have 10 different emotions (e.g. happy, sad, confused, distracted, ...), then you would want to configure the model to have a 10-dimensional output vector.

One other note - this code repository is over 5 years old and uses a neural network framework called Caffe. Caffe is pretty old at this point, and I have since switched to using PyTorch. (It can be debated whether PyTorch or TensorFlow is better; I personally prefer PyTorch.) If you install PyTorch and Torchvision, there is an easy-to-use implementation of SqueezeNet there: https://pytorch.org/hub/pytorch_vision_squeezenet/
