Sound-guided Image Manipulation

EmoStyle: Emotion-aware Semantic Image Manipulation with Audio Guidance

Qiwei Shen, Junjie Xu, Jiahao Mei, Jialing Zou, Xingjiao Wu, Daoguo Dong

Abstract: With the flourishing development of generative models, image manipulation is receiving increasing attention. Beyond the text modality, several elegant designs have explored using audio to manipulate images. However, existing methods mainly focus on image generation conditioned on semantic alignment, ignoring the vivid affective information conveyed by the audio. We propose the Emotion-aware StyleGAN Manipulator (EmoStyle), a framework in which affective information is explicitly extracted from the audio and then used during image manipulation. Specifically, we first leverage the multimodal model ImageBind for initial cross-modal retrieval between images and music, and select the music-related image for further manipulation. Simultaneously, by extracting the sentiment polarity of the audio's lyrics, we generate an emotionally rich auxiliary music branch to accentuate the affective information. We then use pre-trained encoders to encode the audio and the audio-related image into the same embedding space. With the aligned embeddings, we manipulate the image via direct latent optimization. We conduct objective and subjective evaluations on the generated images, and the results show that our framework can generate images that convey the specific human emotions expressed in the audio.
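
The initial cross-modal retrieval step can be pictured as a nearest-neighbour search in a shared embedding space. The sketch below is a minimal illustration, not the repository's code: it assumes the audio and candidate-image embeddings have already been computed (for example with ImageBind) and simply ranks the images by cosine similarity to the audio. The function name `select_music_related_image` and the toy 3-D vectors are hypothetical.

```python
from typing import Tuple

import numpy as np


def select_music_related_image(audio_emb: np.ndarray, image_embs: np.ndarray) -> Tuple[int, np.ndarray]:
    """Return the index of the image most similar to the audio, plus all scores.

    audio_emb:  shape (d,), one audio embedding (assumed precomputed, e.g. by ImageBind)
    image_embs: shape (n, d), one embedding per candidate image
    """
    # L2-normalise both sides so that dot products equal cosine similarities
    a = audio_emb / np.linalg.norm(audio_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ a  # shape (n,)
    return int(np.argmax(scores)), scores


# Toy 3-D embeddings standing in for real ImageBind outputs
audio = np.array([1.0, 0.2, 0.0])
images = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.1],
    [-1.0, 0.0, 0.0],
])
best, scores = select_music_related_image(audio, images)
print(best, np.round(scores, 3))
```

Normalising before the dot product makes the ranking depend only on direction, which is the usual choice for contrastively trained joint embedding spaces.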

Installation

All of the methods described in the paper require:

Anaconda
CLIP

Method

Extract Lyrics of the Audio

Visit https://github.com/YuanGongND/whisper-at, download the pretrained weights to the directory "./whisper_at/pretrained_models/", and deploy the pretrained Whisper-AT model to extract lyrics from the given audio.
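
A minimal sketch of this step, assuming the `whisper_at` package is importable and follows the usage shown in the Whisper-AT README (`load_model`, then `transcribe(..., at_time_res=...)` returning a dict whose `"text"` field holds the ASR transcript). The helper names `extract_lyrics` and `clean_lyrics`, the default model name `"large-v1"`, and the whitespace clean-up are illustrative assumptions, not part of this repository. The heavy import is deferred into `extract_lyrics` so the post-processing can be checked without downloading any weights.

```python
import re
from typing import Any, Dict


def clean_lyrics(result: Dict[str, Any]) -> str:
    """Collapse runs of whitespace in a Whisper-style result's "text" field."""
    return re.sub(r"\s+", " ", result.get("text", "")).strip()


def extract_lyrics(audio_path: str, model_name: str = "large-v1") -> str:
    """Transcribe lyrics with Whisper-AT (assumed installed as `whisper_at`)."""
    import whisper_at as whisper  # deferred so clean_lyrics works without the package

    model = whisper.load_model(model_name)
    # at_time_res is Whisper-AT's audio-tagging time resolution in seconds;
    # only the ASR text is used here
    result = model.transcribe(audio_path, at_time_res=10)
    return clean_lyrics(result)


# Offline check of the post-processing step with a hand-made result dict
fake = {"text": "  I walk alone\n  under   the rain "}
print(clean_lyrics(fake))
```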

ChatGLM Deploy

Visit https://huggingface.co/THUDM/chatglm3-6b/tree/main, download the ChatGLM3-6B weights, and deploy the model to classify the sentiment polarity of the lyrics.
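
One way to phrase the polarity query is sketched below. It assumes the standard ChatGLM3 loading pattern from Hugging Face Transformers (`AutoTokenizer`/`AutoModel` with `trust_remote_code=True`, then the model's `chat(tokenizer, query, history=[])` helper, which returns a reply and the updated history) and a CUDA GPU. The prompt wording, the three-label set and the helpers `build_prompt`, `parse_polarity` and `classify_polarity` are hypothetical; the repository may prompt the model differently.

```python
import re
from typing import Optional


def build_prompt(lyrics: str) -> str:
    """Hypothetical prompt asking for a one-word polarity label."""
    return (
        "Classify the overall sentiment polarity of the following song lyrics. "
        "Answer with exactly one word: positive, negative, or neutral.\n\n"
        f"Lyrics:\n{lyrics}"
    )


def parse_polarity(response: str) -> Optional[str]:
    """Return the first polarity label mentioned in the reply, or None."""
    match = re.search(r"\b(positive|negative|neutral)\b", response.lower())
    return match.group(1) if match else None


def classify_polarity(lyrics: str, model_path: str = "THUDM/chatglm3-6b") -> Optional[str]:
    """Query ChatGLM3-6B (needs transformers, the weights and a GPU)."""
    from transformers import AutoModel, AutoTokenizer  # deferred heavy import

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda().eval()
    # chat() is provided by the model's remote code and returns (reply, history)
    response, _history = model.chat(tokenizer, build_prompt(lyrics), history=[])
    return parse_polarity(response)


# The parser can be exercised without the model
print(parse_polarity("Negative. The lyrics describe loss."))
```

Asking for a single word and then searching the reply with a word-boundary regex keeps the pipeline tolerant of chatty answers such as "I would say positive overall".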

Generate-Emotional-Music

Visit https://github.com/BaoChunhui/Generate-Emotional-Music and deploy the GRU-EBS branch to generate emotional music based on the sentiment polarity.

Download Pretrained StyleGAN2 and text-aligned audio encoder

Manipulate Image Generation

cd optimization

bash run.sh
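
The abstract describes manipulating the image "via direct latent optimization" once the audio and image share an embedding space. The toy sketch below illustrates that general technique (in the style of StyleCLIP's latent optimization), not the contents of run.sh: a random linear map `M` stands in for "StyleGAN2 generator followed by the image encoder", `target` stands in for the audio embedding, and gradient descent minimises 1 - cos(M w, target) + lam * ||w - w0||^2, where the second term keeps the edited latent close to the original one. All names and hyperparameters here are hypothetical.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def optimize_latent(w0: np.ndarray, target: np.ndarray, M: np.ndarray,
                    lam: float = 0.1, lr: float = 0.05, steps: int = 400) -> np.ndarray:
    """Gradient descent on L(w) = 1 - cos(M w, target) + lam * ||w - w0||^2.

    M is a toy linear stand-in for "generate image, then encode it".
    """
    w = w0.copy()
    t_norm = np.linalg.norm(target)
    for _ in range(steps):
        u = M @ w
        u_norm = np.linalg.norm(u)
        # analytic gradient of cos(u, target) with respect to u
        dcos_du = target / (u_norm * t_norm) - (u @ target) * u / (u_norm**3 * t_norm)
        # chain rule through u = M w, plus the L2 term that anchors w to w0
        grad = -M.T @ dcos_du + 2.0 * lam * (w - w0)
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))     # toy "generator + encoder"
w0 = rng.normal(size=4)         # toy latent of the retrieved image
target = rng.normal(size=8)     # toy audio embedding in the shared space

w = optimize_latent(w0, target, M)
print(f"cos before: {cosine(M @ w0, target):.3f}  after: {cosine(M @ w, target):.3f}")
```

Raising `lam` trades alignment with the audio for fidelity to the original image; a real implementation would backpropagate through the actual generator and encoder with an autodiff framework instead of a hand-written gradient.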

Sound-guided Image Manipulation

This repository contains the code and data for the project Sound-guided Image Manipulation.

Citation

If you find our work useful, please cite our paper:

Shen Q, Xu J, Mei J, et al. EmoStyle: Emotion-Aware Semantic Image Manipulation with Audio Guidance. Applied Sciences, 2024, 14(8): 3193.

@article{shen2024emostyle,
  title={EmoStyle: Emotion-Aware Semantic Image Manipulation with Audio Guidance},
  author={Shen, Qiwei and Xu, Junjie and Mei, Jiahao and Wu, Xingjiao and Dong, Daoguo},
  journal={Applied Sciences},
  volume={14},
  number={8},
  pages={3193},
  year={2024},
  publisher={MDPI}
}
