Created SER tutorial #201

Draft
wants to merge 14 commits into
base: main

Conversation

wilke0818
Collaborator

Description

Creates a tutorial using Senselab for SER.

Related Issue(s)

#197

Motivation and Context

We were lacking a tutorial for the audio classification functionality, which currently has one specific implementation: speech emotion recognition. With this tutorial, novice users can better understand how Senselab might be useful for this task.

How Has This Been Tested?

Through Colab

@wilke0818 wilke0818 requested a review from fabiocat93 November 18, 2024 21:58
@fabiocat93
Collaborator

Thanks @wilke0818 for the tutorial! It’s helpful and does a good job of addressing real challenges that users might face. I like it, and I’m curious to see how actual users respond.

A few things are still missing to make this more complete:

  1. Senselab installation: We should install senselab only when running in Colab. If the user is running elsewhere, we can assume they’ve already set it up. You can use a function like this:
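def is_colab():
    # google.colab is only importable inside a Colab runtime
    try:
        import google.colab
        return True
    except ImportError:
        return False

# %pip is an IPython magic, so this cell must run in a notebook
if is_colab():
    %pip install senselab
else:
    print("Not running on Colab. Skipping installation.")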
  2. API for classification and SER: Add an API for audio classification and speech emotion recognition (SER). For now, this API should call the Hugging Face-based function only if the model uses Hugging Face. If other types of models are used, raise a NotImplementedError. This will make this task more in line with the others in senselab and more easily maintainable.

  3. Documentation for classification and SER: Add documentation for these tasks. It should explain what the tasks are and link to the tutorial. You can use resources like this to get started: https://huggingface.co/tasks/audio-classification.

  4. This branch is out-of-date with the base branch: Please, update the branch before requesting a new review.
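
The dispatch described in the API point could look roughly like the sketch below. Every name in it (`HFModel`, `TorchModel`, `classify_audios`, `_classify_with_hf`) is a hypothetical stand-in, not senselab's actual class or function, and the Hugging Face backend is replaced by a placeholder so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

# Hypothetical model descriptors; senselab's real model classes may differ.
@dataclass
class HFModel:
    path: str

@dataclass
class TorchModel:
    path: str

# One list of {"label": ..., "score": ...} entries per input clip.
Prediction = Dict[str, Union[str, float]]

def _classify_with_hf(audios: Sequence[Sequence[float]], model: HFModel) -> List[List[Prediction]]:
    """Placeholder for the Hugging Face-based function (assumption: not the real backend)."""
    return [[{"label": "placeholder", "score": 1.0}] for _ in audios]

def classify_audios(audios: Sequence[Sequence[float]], model: object) -> List[List[Prediction]]:
    """Route to the Hugging Face backend; reject every other model type."""
    if isinstance(model, HFModel):
        return _classify_with_hf(audios, model)
    raise NotImplementedError(
        f"Audio classification is not implemented for {type(model).__name__}."
    )
```

Keeping the type check in one entry point means a new backend only needs one extra branch, while callers keep using the same function.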

Let me know if you need help with any of these points!

Collaborator

@fabiocat93 fabiocat93 left a comment

Thanks @wilke0818. I have left comments on some required changes.

@fabiocat93 fabiocat93 marked this pull request as draft November 20, 2024 02:39
@fabiocat93
Collaborator

Hi @wilke0818! Have you had any time to work on this?

@wilke0818
Collaborator Author

Nope. Added the tutorial change that you gave (might not have updated). Need to refactor for the API per the other issue on this topic. It is unclear to me what you would want for documentation. The functionalities themselves are documented already and this tutorial provides the information about the task a user might need (it is pretty akin to the link you sent).

@fabiocat93
Collaborator

> It is unclear to me what you would want for documentation. The functionalities themselves are documented already and this tutorial provides the information about the task a user might need (it is pretty akin to the link you sent).

Every task has a documentation page that explains what the task is, how it's commonly evaluated, and which datasets and models are popular. You can see an example here: https://sensein.group/senselab/senselab/audio/tasks/text_to_speech.html

@wilke0818
Collaborator Author

I mean, that makes sense, but do we want this to be an SER task or a generic audio classification task (which is what the HuggingFace pipeline is), one that doesn't have a specific task/dataset but where SER is just an example usage?

@fabiocat93
Collaborator

> I mean, that makes sense, but do we want this to be an SER task or a generic audio classification task (which is what the HuggingFace pipeline is) that doesn't have a specific task/dataset but where SER is just an example usage of the task?

Following #197, both. I would implement both a classification task (as HuggingFace has) and an SER task, and would make it so that SER uses the classification interfaces and adds some checks before and after (e.g., outputs should be emotion-related).
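
A minimal sketch of that layering, assuming a classifier callable that returns one list of `{"label", "score"}` dictionaries per clip (the shape the Hugging Face audio-classification pipeline produces). The names `recognize_emotions` and `EMOTION_LABELS`, and the label set itself, are hypothetical and only illustrate the before/after checks.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Union

Prediction = Dict[str, Union[str, float]]
Classifier = Callable[[Sequence[Sequence[float]]], List[List[Prediction]]]

# Assumed label vocabulary; a real SER task would need a curated list.
EMOTION_LABELS: FrozenSet[str] = frozenset(
    {"angry", "happy", "sad", "neutral", "fear", "disgust", "surprise"}
)

def recognize_emotions(
    audios: Sequence[Sequence[float]],
    classify: Classifier,
    allowed_labels: FrozenSet[str] = EMOTION_LABELS,
) -> List[List[Prediction]]:
    """Run a generic audio classifier and verify its labels are emotion-related."""
    # Check before: refuse to run on an empty batch.
    if not audios:
        raise ValueError("No audio clips were provided.")

    results = classify(audios)

    # Check after: every predicted label must belong to the emotion vocabulary.
    for clip_predictions in results:
        unknown = {str(p["label"]).lower() for p in clip_predictions} - allowed_labels
        if unknown:
            raise ValueError(f"Non-emotion labels returned: {sorted(unknown)}")
    return results
```

With this shape, the SER function adds no classification logic of its own, so any backend supported by the generic task is available to SER as well.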

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task: create an abstract interface for senselab Speech Emotion Recognition and Audio Classification
2 participants