xls-r-analysis-sqa

1. Overview

This repository hosts the models for the paper "Analysis of XLS-R for Speech Quality Assessment".

1.1. Performance On Unseen Datasets

Model performance on each unseen corpus individually (NISQA, IUB) and on both combined (Unseen). The metric is RMSE (root-mean-square error); lower is better.
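As a reminder of how the metric is computed, here is a minimal sketch of RMSE between predicted and ground-truth MOS values. The numbers are made-up placeholders, not data from the paper. Pooling all utterances for the combined score is an assumption; the source does not say how the Unseen column is derived.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between predicted and ground-truth MOS values."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up placeholder values (NOT from the paper), purely for illustration.
corpus_a_pred, corpus_a_mos = [3.0, 4.0], [3.5, 4.5]
corpus_b_pred, corpus_b_mos = [2.0, 2.0, 5.0], [2.0, 3.0, 5.0]

rmse_a = rmse(corpus_a_pred, corpus_a_mos)
rmse_b = rmse(corpus_b_pred, corpus_b_mos)

# Assumption: the combined score pools all utterances before averaging.
rmse_pooled = rmse(corpus_a_pred + corpus_b_pred, corpus_a_mos + corpus_b_mos)

print(f"A: {rmse_a:.4f}  B: {rmse_b:.4f}  pooled: {rmse_pooled:.4f}")
```

Under this pooling assumption the combined score is weighted by corpus size, so it generally differs from the plain average of the two per-corpus values.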

V1 Results

Model	NISQA	IUB	Unseen
XLS-R 300M Layer24 Bi-LSTM [1]	0.5907	0.5067	0.5323
DNSMOS [2]	0.8718	0.5452	0.6565
MFCC Transformer	0.8280	0.7775	0.7924
XLS-R 300M Layer5 Transformer	0.6256	0.5049	0.5425
XLS-R 300M Layer21 Transformer	0.5694	0.5025	0.5227
XLS-R 300M Layer5+21 Transformer	0.5683	0.4886	0.5129
XLS-R 1B Layer10 Transformer	0.5456	0.5815	0.5713
XLS-R 1B Layer41 Transformer	0.5657	0.4656	0.4966
XLS-R 1B Layer10+41 Transformer	0.5748	0.5288	0.5425
XLS-R 2B Layer10 Transformer	0.6277	0.4899	0.5334
XLS-R 2B Layer41 Transformer	0.5724	0.4897	0.5150
XLS-R 2B Layer10+41 Transformer	0.6036	0.4743	0.5150
Human	0.6738	0.6573	0.6629

V2 Results

UPDATE: The code now uses version 2 of the models. Version 1 mistakenly used the final training checkpoint; version 2 uses the checkpoint with the minimum validation loss.
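The difference between the two selection rules can be sketched as follows. This is an illustration only, not the repository's training code: the `val_losses` dictionary and its values are hypothetical.

```python
# Hypothetical per-epoch validation losses (made-up values for illustration).
val_losses = {1: 0.412, 2: 0.335, 3: 0.298, 4: 0.307, 5: 0.321}

# V1 behaviour: take whatever checkpoint was saved last.
final_epoch = max(val_losses)

# V2 behaviour: take the checkpoint whose validation loss is lowest.
best_epoch = min(val_losses, key=val_losses.get)

print(f"final checkpoint: epoch {final_epoch}, loss {val_losses[final_epoch]}")
print(f"best checkpoint:  epoch {best_epoch}, loss {val_losses[best_epoch]}")
```

When the validation loss starts rising again late in training, as in this toy example, the two rules pick different checkpoints.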

Model	NISQA	IUB	Unseen
XLS-R 300M Layer24 Bi-LSTM [1]	0.5907	0.5067	0.5323
DNSMOS [2]	0.8718	0.5452	0.6565
MFCC Transformer	0.9291	0.7415	0.8003
XLS-R 300M Layer5 Transformer	0.6494	0.5117	0.5550
XLS-R 300M Layer21 Transformer	0.5852	0.4838	0.5152
XLS-R 300M Layer5+21 Transformer	0.5861	0.4768	0.5108
XLS-R 1B Layer10 Transformer	0.6217	0.4763	0.5225
XLS-R 1B Layer41 Transformer	0.5615	0.4646	0.4946
XLS-R 1B Layer10+41 Transformer	0.6024	0.4624	0.5068
XLS-R 2B Layer10 Transformer	0.5227	0.4447	0.4686
XLS-R 2B Layer41 Transformer	0.5295	0.4926	0.5035
XLS-R 2B Layer10+41 Transformer	0.5191	0.4573	0.4760
Human	0.6738	0.6573	0.6629

[1] B. Tamm, H. Balabin, R. Vandenberghe and H. Van hamme, "Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications," Proc. Interspeech 2022, 2022, pp. 4083-4087, doi: 10.21437/Interspeech.2022-10147.

[2] C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 6493-6497, doi: 10.1109/ICASSP39728.2021.9414878.

1.2. Visualization of MOS Predictions

MOS predictions on two unseen datasets: NISQA (top) and IU Bloomington (bottom). Our proposed model, based on embeddings extracted from the 10th layer of the pre-trained XLS-R 2B, outperforms DNSMOS and the MFCC baseline. The human absolute category ratings (ACRs) are also visualized for the IUB corpus.

1.3. Example Audio Segments

Excellent (MOS = 4.808)

Audio Sample	Model	Prediction	Error
iub-excellent.mp4	DNSMOS	3.699	-1.109
	MFCC Transformer	3.497	-1.311
	XLS-R 2B Layer10 Transformer	3.935	-0.873

Good (MOS = 4.104)

Audio Sample	Model	Prediction	Error
iub-good.mp4	DNSMOS	3.269	-0.835
	MFCC Transformer	2.498	-1.606
	XLS-R 2B Layer10 Transformer	3.793	-0.311

Fair (MOS = 3.168)

Audio Sample	Model	Prediction	Error
iub-fair.mp4	DNSMOS	3.309	+0.141
	MFCC Transformer	3.931	+0.763
	XLS-R 2B Layer10 Transformer	3.080	-0.088

Poor (MOS = 2.240)

Audio Sample	Model	Prediction	Error
iub-poor.mp4	DNSMOS	2.704	+0.464
	MFCC Transformer	1.927	-0.313
	XLS-R 2B Layer10 Transformer	2.284	+0.044

Bad (MOS = 1.416)

Audio Sample	Model	Prediction	Error
iub-bad.mp4	DNSMOS	2.553	+1.137
	MFCC Transformer	1.806	+0.390
	XLS-R 2B Layer10 Transformer	2.312	+0.896
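The Error column in the tables above appears to be the signed difference between the prediction and the MOS. The following minimal sketch recomputes it from the listed values and reports which model lands closest to the MOS for each sample. The variable and function names are illustrative only.

```python
# Ground-truth MOS and model predictions copied from the tables above.
SAMPLES = {
    "iub-excellent.mp4": (4.808, {"DNSMOS": 3.699, "MFCC Transformer": 3.497,
                                  "XLS-R 2B Layer10 Transformer": 3.935}),
    "iub-good.mp4": (4.104, {"DNSMOS": 3.269, "MFCC Transformer": 2.498,
                             "XLS-R 2B Layer10 Transformer": 3.793}),
    "iub-fair.mp4": (3.168, {"DNSMOS": 3.309, "MFCC Transformer": 3.931,
                             "XLS-R 2B Layer10 Transformer": 3.080}),
    "iub-poor.mp4": (2.240, {"DNSMOS": 2.704, "MFCC Transformer": 1.927,
                             "XLS-R 2B Layer10 Transformer": 2.284}),
    "iub-bad.mp4": (1.416, {"DNSMOS": 2.553, "MFCC Transformer": 1.806,
                            "XLS-R 2B Layer10 Transformer": 2.312}),
}


def signed_error(prediction, mos):
    """Prediction minus MOS, rounded to three decimals like the tables."""
    return round(prediction - mos, 3)


# Signed error per sample and model.
errors = {
    name: {model: signed_error(pred, mos) for model, pred in preds.items()}
    for name, (mos, preds) in SAMPLES.items()
}

# Model with the smallest absolute error for each sample.
closest = {
    name: min(errs, key=lambda model: abs(errs[model]))
    for name, errs in errors.items()
}

for name in SAMPLES:
    print(name, errors[name], "closest:", closest[name])
```

A negative error means the model underestimates the perceived quality; a positive error means it overestimates it.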

2. Installation

Option A: Install via `pip` (Recommended)

pip install xls-r-sqa

Option B: Install From Source

First, clone the repository.

git clone https://github.com/lcn-kul/xls-r-analysis-sqa.git

Next, install the requirements to a virtual environment of your choice.

cd xls-r-analysis-sqa/
pip3 install -r requirements.txt

3. Truncated XLS-R Models

This code uses truncated XLS-R models. By default, it attempts to auto-download the required truncated XLS-R model from Hugging Face whenever you create an E2EModel that uses XLS-R. For example:

from xls_r_sqa.config import XLSR_2B_TRANSFORMER_32DEEP_CONFIG
from xls_r_sqa.e2e_model import E2EModel

model = E2EModel(
    config=XLSR_2B_TRANSFORMER_32DEEP_CONFIG,
    xlsr_layers=10,
    auto_download=True  # <-- default is True
)

If you do not wish to auto-download, or if you would like to choose your own save location, there are two manual approaches:

Download Truncated Models: Clone the truncated XLS-R repositories from Hugging Face (using Git LFS), following the instructions in xls_r_sqa/models/xls-r-trunc/README.md.
Truncate Full XLS-R Yourself: Download the full pre-trained XLS-R models (see the instructions in xls_r_sqa/models/xls-r/README.md), then run truncate_w2v2.py to create the truncated versions locally.

Warning: The combined size of all truncated XLS-R repos is approximately 15 GB (plus .git overhead, effectively doubling the storage needed). Make sure you have sufficient disk space before downloading or truncating them yourself.
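Before cloning, it may help to check the free space on the target drive. The sketch below uses only the Python standard library and is not part of this repository. The 30 GB threshold is an assumption derived from the warning above (about 15 GB of models, doubled for .git overhead), and `has_enough_space` is a hypothetical helper name.

```python
import shutil

GB = 10 ** 9
# Assumption: ~15 GB of truncated models, doubled to allow for .git overhead.
REQUIRED_BYTES = 2 * 15 * GB


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if has_enough_space("."):
    print("Enough free space for all truncated XLS-R repositories.")
else:
    print("Less than ~30 GB free; consider downloading only the models you need.")
```

Pass the directory where you plan to store the models as `path`, since it may live on a different drive than the current working directory.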

4. Usage

A working example is provided in test_e2e_sqa.py.

5. Citation

@INPROCEEDINGS{10248049,
  author={Tamm, Bastiaan and Vandenberghe, Rik and Van Hamme, Hugo},
  booktitle={2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Analysis of XLS-R for Speech Quality Assessment}, 
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/WASPAA58266.2023.10248049}
}
