Data for the paper - neDIOM: Dataset and Analysis of Nepali Idioms

PortNLP/nediom

neDIOM

This repository contains the data released with the paper "neDIOM: Dataset and Analysis of Nepali Idioms", accepted at CHiPSAL 2025 (co-located with COLING 2025).

Cite Our Work

@inproceedings{pokharel-agrawal-2025-nediom,
    title = "ne{DIOM}: Dataset and Analysis of {N}epali Idioms",
    author = "Pokharel, Rhitabrat  and
      Agrawal, Ameeta",
    editor = "Sarveswaran, Kengatharaiyer  and
      Vaidya, Ashwini  and
      Krishna Bal, Bal  and
      Shams, Sana  and
      Thapa, Surendrabikram",
    booktitle = "Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2025.chipsal-1.16/",
    pages = "160--171",
    abstract = "Idioms, integral to any language, convey nuanced meanings and cultural references. However, beyond English, few resources exist to support any meaningful exploration of this unique linguistic phenomenon. To facilitate such an inquiry in a low resource language, we introduce a novel dataset of Nepali idioms and the sentences in which these naturally appear. We describe the methodology of creating this resource as well as discuss some of the challenges we encountered. The results of our empirical analysis under various settings using four distinct multilingual models consistently highlight the difficulties these models face in processing Nepali figurative language. Even fine-tuning the models yields limited benefits. Interestingly, the larger models from the BLOOM family of models failed to consistently outperform the smaller models. Overall, we hope that this new resource will facilitate further development of models that can support processing of idiomatic expressions in low resource languages such as Nepali."
}
