Repository for the thesis titled "The Effects of Fine-Tuning on the ASR Performance of Dialectal Arabic".

O-T-O-Z/finetune-ar-dialects

The Effects of Fine-Tuning on the ASR Performance of Dialectal Arabic

This repository contains the code for training the models evaluated in the thesis, along with the results and the notebooks used to plot them.

Setting up environment

To get started, install the requirements in a Python 3.10 environment. An example using conda:

conda create -n venv python=3.10
conda activate venv
pip install -r requirements.txt
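
Since the requirements target Python 3.10, a quick interpreter check can catch a mismatched environment before running anything. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is a Python 3.10.x release."""
    # Compare only major and minor; any 3.10 patch release is accepted.
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: expected Python 3.10, found {found}")
```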

Instructions for usage

Experiments

The training/experiment_*.py scripts expect the datasets to already be available, so please read through each script before running it. The usage of each script is shown below.

usage: experiment_dialect.py [-h] -d DIALECT

options:
  -h, --help            show this help message and exit
  -d DIALECT, --dialect DIALECT
                        all, egyptian, gulf, iraqi, levantine, maghrebi

usage: experiment_finetune.py [-h] -d DIALECT

options:
  -h, --help            show this help message and exit
  -d DIALECT, --dialect DIALECT
                        all, egyptian, gulf, iraqi, levantine, maghrebi

usage: experiment_msa.py [-h] -t TRAIN_SIZE

options:
  -h, --help            show this help message and exit
  -t TRAIN_SIZE, --train_size TRAIN_SIZE
                        Train size between 0 and 1
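
The help messages above look like standard argparse output, so the command-line interfaces can be approximated as in the sketch below. This is an illustration and not the repository's actual code: the function names are hypothetical, and both the validation of the dialect against the listed values and the exact bounds on the train size (taken here as 0 < size <= 1) are assumptions.

```python
import argparse

# Dialect values as listed in the help text above.
DIALECTS = ["all", "egyptian", "gulf", "iraqi", "levantine", "maghrebi"]


def train_size(value):
    """Parse a train-size fraction; the bounds 0 < size <= 1 are an assumption."""
    size = float(value)
    if not 0.0 < size <= 1.0:
        raise argparse.ArgumentTypeError("train size must be between 0 and 1")
    return size


def build_dialect_parser(prog="experiment_dialect.py"):
    """Approximate the -d/--dialect interface shared by two of the scripts."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "-d", "--dialect",
        required=True,
        choices=DIALECTS,   # assumption: the real scripts may not validate this
        metavar="DIALECT",  # keeps the usage line as "-d DIALECT"
        help=", ".join(DIALECTS),
    )
    return parser


def build_msa_parser(prog="experiment_msa.py"):
    """Approximate the -t/--train_size interface of experiment_msa.py."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "-t", "--train_size",
        required=True,
        type=train_size,
        metavar="TRAIN_SIZE",
        help="Train size between 0 and 1",
    )
    return parser


if __name__ == "__main__":
    args = build_dialect_parser().parse_args(["-d", "egyptian"])
    print(args.dialect)
    args = build_msa_parser().parse_args(["-t", "0.5"])
    print(args.train_size)
```

Validating at parse time means a mistyped dialect or an out-of-range train size fails immediately, before any dataset or model is loaded.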

Evaluation

Evaluation can be run with either training/evaluate_all.py or the training/evaluate_whisper*.py files. The latter take the model checkpoint as manual input and evaluate only on MSA. training/evaluate_all.py evaluates on all test sets:

usage: evaluate_all.py [-h] -c CHECKPOINT

options:
  -h, --help            show this help message and exit
  -c CHECKPOINT, --checkpoint CHECKPOINT

Results

The results are in results/, together with the Jupyter notebooks needed to recreate the plots in the thesis. results/training_plots.ipynb plots the training runs, while results/results.ipynb plots the final results. The plots themselves are in results/plots/.
