Fix occurrences of David Adelani & co-authors
mbollmann committed Feb 9, 2025
1 parent 6d3e96c commit 7d99dec
Showing 15 changed files with 120 additions and 94 deletions.
2 changes: 1 addition & 1 deletion data/xml/2020.emnlp.xml
@@ -3118,7 +3118,7 @@
<paper id="204">
<title>Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on <fixed-case>A</fixed-case>frican Languages</title>
<author><first>Michael A.</first><last>Hedderich</last></author>
<author><first>David</first><last>Adelani</last></author>
<author><first>David I.</first><last>Adelani</last></author>
<author><first>Dawei</first><last>Zhu</last></author>
<author><first>Jesujoba</first><last>Alabi</last></author>
<author><first>Udia</first><last>Markus</last></author>
4 changes: 2 additions & 2 deletions data/xml/2020.lrec.xml
@@ -4123,9 +4123,9 @@
</paper>
<paper id="335">
<title>Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of <fixed-case>Y</fixed-case>orùbá and <fixed-case>T</fixed-case>wi</title>
<author><first>Jesujoba</first><last>Alabi</last></author>
<author><first>Jesujoba O.</first><last>Alabi</last></author>
<author><first>Kwabena</first><last>Amponsah-Kaakyire</last></author>
<author><first>David</first><last>Adelani</last></author>
<author><first>David I.</first><last>Adelani</last></author>
<author><first>Cristina</first><last>España-Bonet</last></author>
<pages>2754–2762</pages>
<abstract>The success of several architectures to learn semantic representations from unannotated text and the availability of these kind of texts in online multilingual resources such as Wikipedia has facilitated the massive and automatic creation of resources for multiple languages. The evaluation of such resources is usually done for the high-resourced languages, where one has a smorgasbord of tasks and test sets to evaluate on. For low-resourced languages, the evaluation is more difficult and normally ignored, with the hope that the impressive capability of deep learning architectures to learn (multilingual) representations in the high-resourced setting holds in the low-resourced setting too. In this paper we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained in this way, with word embeddings obtained from curated corpora and a language-dependent processing. We analyse the noise in the publicly available corpora, collect high quality and noisy data for the two languages and quantify the improvements that depend not only on the amount of data but on the quality too. We also use different architectures that learn word representations both from surface forms and characters to further exploit all the available information which showed to be important for these languages. For the evaluation, we manually translate the wordsim-353 word pairs dataset from English into Yorùbá and Twi. We extend the analysis to contextual word embeddings and evaluate multilingual BERT on a named entity recognition task. For this, we annotate with named entities the Global Voices corpus for Yorùbá. As output of the work, we provide corpora, embeddings and the test suits for both languages.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2021.emnlp.xml
@@ -10761,7 +10761,7 @@
</paper>
<paper id="684">
<title>Preventing Author Profiling through Zero-Shot Multilingual Back-Translation</title>
<author><first>David</first><last>Adelani</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Miaoran</first><last>Zhang</last></author>
<author><first>Xiaoyu</first><last>Shen</last></author>
<author><first>Ali</first><last>Davody</last></author>
4 changes: 2 additions & 2 deletions data/xml/2021.mtsummit.xml
@@ -76,9 +76,9 @@
</paper>
<paper id="6">
<title>The Effect of Domain and Diacritics in <fixed-case>Y</fixed-case>oruba–<fixed-case>E</fixed-case>nglish Neural Machine Translation</title>
<author><first>David</first><last>Adelani</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Dana</first><last>Ruiter</last></author>
<author><first>Jesujoba</first><last>Alabi</last></author>
<author><first>Jesujoba O.</first><last>Alabi</last></author>
<author><first>Damilola</first><last>Adebonojo</last></author>
<author><first>Adesina</first><last>Ayeni</last></author>
<author><first>Mofe</first><last>Adeyemi</last></author>
18 changes: 14 additions & 4 deletions data/xml/2022.emnlp.xml
@@ -3912,15 +3912,15 @@
</paper>
<paper id="298">
<title><fixed-case>M</fixed-case>asakha<fixed-case>NER</fixed-case> 2.0: <fixed-case>A</fixed-case>frica-centric Transfer Learning for Named Entity Recognition</title>
<author><first>David</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>David Ifeoluwa</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>Graham</first><last>Neubig</last><affiliation>Carnegie Mellon University</affiliation></author>
<author><first>Sebastian</first><last>Ruder</last><affiliation>Google</affiliation></author>
<author><first>Shruti</first><last>Rijhwani</last><affiliation>Carnegie Mellon University</affiliation></author>
<author><first>Michael</first><last>Beukman</last><affiliation>University of the Witwatersrand</affiliation></author>
<author><first>Chester</first><last>Palen-Michel</last><affiliation>Brandeis University</affiliation></author>
<author><first>Constantine</first><last>Lignos</last><affiliation>Brandeis University</affiliation></author>
<author><first>Jesujoba</first><last>Alabi</last><affiliation>Saarland University</affiliation></author>
<author><first>Shamsuddeen</first><last>Muhammad</last><affiliation>Bayero University, Kano</affiliation></author>
<author><first>Jesujoba O.</first><last>Alabi</last><affiliation>Saarland University</affiliation></author>
<author><first>Shamsuddeen H.</first><last>Muhammad</last><affiliation>Bayero University, Kano</affiliation></author>
<author><first>Peter</first><last>Nabende</last><affiliation>Makerere University</affiliation></author>
<author><first>Cheikh M. Bamba</first><last>Dione</last><affiliation>University of Bergen</affiliation></author>
<author><first>Andiswa</first><last>Bukula</last><affiliation>SADiLaR</affiliation></author>
@@ -3942,11 +3942,21 @@
<author><first>Allahsera Auguste</first><last>Tapo</last><affiliation>Rochester Institute of Technology</affiliation></author>
<author><first>Tebogo</first><last>Macucwa</last><affiliation>University of Pretoria, Masakhane</affiliation></author>
<author><first>Vukosi</first><last>Marivate</last><affiliation>Department of Computer Science, University of Pretoria</affiliation></author>
<author><first>Mboning Tchiaze</first><last>Elvis</last><affiliation>NTeALan</affiliation></author>
<author><first>Elvis</first><last>Mboning</last><affiliation>NTeALan</affiliation></author>
<author><first>Tajuddeen</first><last>Gwadabe</last><affiliation>University of Chinese Academy of Science</affiliation></author>
<author><first>Tosin</first><last>Adewumi</last><affiliation>Luleå University of Technology</affiliation></author>
<author><first>Orevaoghene</first><last>Ahia</last><affiliation>University of Washington</affiliation></author>
<author><first>Joyce</first><last>Nakatumba-Nabende</last><affiliation>Makerere University</affiliation></author>
<author><first>Neo L.</first><last>Mokono</last></author>
<author><first>Ignatius</first><last>Ezeani</last></author>
<author><first>Chiamaka</first><last>Chukwuneke</last></author>
<author><first>Mofetoluwa</first><last>Adeyemi</last></author>
<author><first>Gilles Q.</first><last>Hacheme</last></author>
<author><first>Idris</first><last>Abdulmumim</last></author>
<author><first>Odunayo</first><last>Ogundepo</last></author>
<author><first>Oreen</first><last>Yousuf</last></author>
<author><first>Tatiana</first><last>Moteu Ngoli</last></author>
<author><first>Dietrich</first><last>Klakow</last></author>
<pages>4488-4508</pages>
<abstract>African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.</abstract>
<url hash="51dbe768">2022.emnlp-main.298</url>
6 changes: 3 additions & 3 deletions data/xml/2022.findings.xml
@@ -106,13 +106,13 @@
</paper>
<paper id="6">
<title>Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?</title>
<author><first>En-Shiun</first><last>Lee</last></author>
<author><first>En-Shiun Annie</first><last>Lee</last></author>
<author><first>Sarubi</first><last>Thillainathan</last></author>
<author><first>Shravan</first><last>Nayak</last></author>
<author><first>Surangika</first><last>Ranathunga</last></author>
<author><first>David</first><last>Adelani</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Ruisi</first><last>Su</last></author>
<author><first>Arya</first><last>McCarthy</last></author>
<author><first>Arya D.</first><last>McCarthy</last></author>
<pages>58-67</pages>
<abstract>What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages? We conduct a thorough empirical experiment in 10 languages to ascertain this, considering five factors: (1) the amount of fine-tuning data, (2) the noise in the fine-tuning data, (3) the amount of pre-training data in the model, (4) the impact of domain mismatch, and (5) language typology. In addition to yielding several heuristics, the experiments form a framework for evaluating the data sensitivities of machine translation systems. While mBART is robust to domain differences, its translations for unseen and typologically distant languages remain below 3.0 BLEU. In answer to our title’s question, mBART is not a low-resource panacea; we therefore encourage shifting the emphasis from new models to new data.</abstract>
<url hash="57ee5031">2022.findings-acl.6</url>
2 changes: 1 addition & 1 deletion data/xml/2022.insights.xml
@@ -114,7 +114,7 @@
<author><first>Dawei</first><last>Zhu</last></author>
<author><first>Michael A.</first><last>Hedderich</last></author>
<author><first>Fangzhou</first><last>Zhai</last></author>
<author><first>David</first><last>Adelani</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Dietrich</first><last>Klakow</last></author>
<pages>62-67</pages>
<abstract>Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques - by modeling, cleaning or filtering the noisy instances - are required to prevent models from fitting this label noise. However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noise-handling methods do not always improve its performance, and may even deteriorate it, suggesting the need for further investigation. We also back our observations with a comprehensive analysis.</abstract>
24 changes: 12 additions & 12 deletions data/xml/2022.naacl.xml
@@ -3574,8 +3574,8 @@
</paper>
<paper id="223">
<title>A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for <fixed-case>A</fixed-case>frican News Translation</title>
<author><first>David</first><last>Adelani</last></author>
<author><first>Jesujoba</first><last>Alabi</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Jesujoba Oluwadara</first><last>Alabi</last></author>
<author><first>Angela</first><last>Fan</last></author>
<author><first>Julia</first><last>Kreutzer</last></author>
<author><first>Xiaoyu</first><last>Shen</last></author>
@@ -3590,27 +3590,27 @@
<author><first>Chris</first><last>Emezue</last></author>
<author><first>Colin</first><last>Leong</last></author>
<author><first>Michael</first><last>Beukman</last></author>
<author><first>Shamsuddeen</first><last>Muhammad</last></author>
<author><first>Guyo</first><last>Jarso</last></author>
<author><first>Shamsuddeen H.</first><last>Muhammad</last></author>
<author><first>Guyo D.</first><last>Jarso</last></author>
<author><first>Oreen</first><last>Yousuf</last></author>
<author><first>Andre</first><last>Niyongabo Rubungo</last></author>
<author><first>Andre N.</first><last>Niyongabo Rubungo</last></author>
<author><first>Gilles</first><last>Hacheme</last></author>
<author><first>Eric Peter</first><last>Wairagala</last></author>
<author><first>Muhammad Umair</first><last>Nasir</last></author>
<author><first>Benjamin</first><last>Ajibade</last></author>
<author><first>Tunde</first><last>Ajayi</last></author>
<author><first>Yvonne</first><last>Gitau</last></author>
<author><first>Benjamin A.</first><last>Ajibade</last></author>
<author><first>Tunde Oluwaseyi</first><last>Ajayi</last></author>
<author><first>Yvonne Wambui</first><last>Gitau</last></author>
<author><first>Jade</first><last>Abbott</last></author>
<author><first>Mohamed</first><last>Ahmed</last></author>
<author><first>Millicent</first><last>Ochieng</last></author>
<author><first>Anuoluwapo</first><last>Aremu</last></author>
<author><first>Perez</first><last>Ogayo</last></author>
<author><first>Jonathan</first><last>Mukiibi</last></author>
<author><first>Fatoumata</first><last>Ouoba Kabore</last></author>
<author><first>Godson</first><last>Kalipe</last></author>
<author><first>Godson Koffi</first><last>Kalipe</last></author>
<author><first>Derguene</first><last>Mbaye</last></author>
<author><first>Allahsera Auguste</first><last>Tapo</last></author>
<author><first>Victoire</first><last>Memdjokam Koagne</last></author>
<author><first>Victoire M.</first><last>Memdjokam Koagne</last></author>
<author><first>Edwin</first><last>Munkoh-Buabeng</last></author>
<author><first>Valencia</first><last>Wagner</last></author>
<author><first>Idris</first><last>Abdulmumin</last></author>
@@ -7076,8 +7076,8 @@
<title><fixed-case>MCSE</fixed-case>: <fixed-case>M</fixed-case>ultimodal Contrastive Learning of Sentence Embeddings</title>
<author><first>Miaoran</first><last>Zhang</last></author>
<author><first>Marius</first><last>Mosbach</last></author>
<author><first>David</first><last>Adelani</last></author>
<author><first>Michael</first><last>Hedderich</last></author>
<author><first>David Ifeoluwa</first><last>Adelani</last></author>
<author><first>Michael A.</first><last>Hedderich</last></author>
<author><first>Dietrich</first><last>Klakow</last></author>
<pages>5959-5969</pages>
<abstract>Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman’s correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2022.wmt.xml
@@ -982,7 +982,7 @@
</paper>
<paper id="72">
<title>Findings of the <fixed-case>WMT</fixed-case>’22 Shared Task on Large-Scale Machine Translation Evaluation for <fixed-case>A</fixed-case>frican Languages</title>
<author><first>David</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>David Ifeoluwa</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>Md Mahfuz Ibn</first><last>Alam</last><affiliation>George Mason University</affiliation></author>
<author><first>Antonios</first><last>Anastasopoulos</last><affiliation>George Mason University</affiliation></author>
<author><first>Akshita</first><last>Bhagia</last><affiliation>Ai2</affiliation></author>
6 changes: 3 additions & 3 deletions data/xml/2023.c3nlp.xml
@@ -5,7 +5,7 @@
<booktitle>Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)</booktitle>
<editor><first>Sunipa</first><last>Dev</last></editor>
<editor><first>Vinodkumar</first><last>Prabhakaran</last></editor>
<editor><first>David</first><last>Adelani</last></editor>
<editor><first>David Ifeoluwa</first><last>Adelani</last></editor>
<editor><first>Dirk</first><last>Hovy</last></editor>
<editor><first>Luciana</first><last>Benotti</last></editor>
<publisher>Association for Computational Linguistics</publisher>
@@ -22,8 +22,8 @@
<paper id="1">
<title>Varepsilon kú mask: Integrating <fixed-case>Y</fixed-case>orùbá cultural greetings into machine translation</title>
<author><first>Idris</first><last>Akinade</last><affiliation>University of Ibadan</affiliation></author>
<author><first>Jesujoba</first><last>Alabi</last><affiliation>Saarland University</affiliation></author>
<author><first>David</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>Jesujoba O.</first><last>Alabi</last><affiliation>Saarland University</affiliation></author>
<author><first>David Ifeoluwa</first><last>Adelani</last><affiliation>University College London</affiliation></author>
<author><first>Clement</first><last>Odoje</last><affiliation>University of Ibadan</affiliation></author>
<author><first>Dietrich</first><last>Klakow</last><affiliation>Saarland University</affiliation></author>
<pages>1-7</pages>
(diff for the remaining 5 changed files not shown)
