This repository contains pointers to datasets annotated with adposition and case supersenses (SNACS, the Semantic Network of Adposition and Case Supersenses) in multiple languages.
The name of the language links to annotation guidelines if available. The guidelines version reflected in the corpus appears {in curly braces}.
- STREUSLE: Online reviews section of the English Web Treebank, obtained from the Universal Dependencies project. STREUSLE is fully annotated with SNACS (as well as other lexical semantic annotations) and served as the primary reference corpus in developing the English SNACS guidelines. {EN v2.6} Citation: Schneider et al. ACL 2018
- The Little Prince: {EN v2.5} All chapters except Ch. 1, 4, and 5, which are TBD.
  - Ch. 1, 4, and 5 were annotated in a pilot phase with older guidelines, as described in Schneider et al. ACL 2018, and need to be updated.
- PASTRIE (Reddit international English). {EN v2.5} Citation: Kranzlein et al. LAW 2020
- 小王子(Xiǎo Wáng Zǐ) [The Little Prince]. {based on EN v2.3} Citation: Peng et al. LREC 2020
- 어린왕자(Erin Wangca) [The Little Prince]. {KO v0.9 based on EN v2.5} Citation: Hwang et al. DMR 2020
- 星の王子さま(Hoshinoōjisama) [The Little Prince]. {based on EN v2.6} Ch. 1–10. Citation (paper includes guidelines): Aoyama et al. LREC-COLING 2024
- Der kleine Prinz [The Little Prince]. {based on EN v2.0 (superset)/2.5 (revised subset)} Citation (paper includes guidelines): Prange and Schneider Künstliche Intelligenz 2021
- नन्हा राजकुमार(Nanhā Rājkumār) [The Little Prince]. {HI v1.0 based on EN v2.5} Citation: Arora et al. LREC 2022
- નાનકડો રાજકુમાર(Nānakado Rājakumār) [The Little Prince]. {based on HI v1.0/EN v2.5} Citation: Mehta and Srikumar ACL Findings 2023
https://github.com/dchensta/adpositions_case contains two datasets:
- Finnish: Pikku Prinssi [The Little Prince], chapters 4 and 5. {based on EN v2.5}
- Latin: Regulus [The Little Prince], chapters 4 and 5. {based on EN v2.5}
Citation: Chen and Hulden LREC 2022
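
The curly-brace convention used in the entries above can be read mechanically. The following is a minimal sketch, not part of this repository: the helper name `guideline_versions` and both regular expressions are assumptions made for illustration, and they only recognize tags of the form `XX vN.N` (a two-letter language code followed by a version number).

```python
import re

# Hypothetical helper for illustration; not provided by this repository.
BRACES = re.compile(r"\{([^{}]*)\}")                    # text inside {...}
VERSION = re.compile(r"\b([A-Z]{2}) v(\d+(?:\.\d+)*)")  # e.g. "EN v2.5"


def guideline_versions(entry):
    """Return (language code, version) pairs found inside {...} in one entry line."""
    pairs = []
    for tag in BRACES.findall(entry):
        pairs.extend(VERSION.findall(tag))
    return pairs


entries = [
    "PASTRIE (Reddit international English). {EN v2.5} Citation: Kranzlein et al. LAW 2020",
    "어린왕자(Erin Wangca) [The Little Prince]. {KO v0.9 based on EN v2.5} Citation: Hwang et al. DMR 2020",
]
for entry in entries:
    print(guideline_versions(entry))
```

Abbreviated tags such as `{based on EN v2.0 (superset)/2.5 (revised subset)}` are only partly covered by this pattern: the bare `2.5` has no language code or `v` prefix, so only `EN v2.0` would be picked up.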
- Jena D. Hwang, Hanwool Choe, Na-Rae Han, and Nathan Schneider. "K-SNACS: Annotating Korean adposition semantics." In Proceedings of the Second International Workshop on Designing Meaning Representations, pp. 53-66. 2020.
- Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, and Nathan Schneider. "PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English." In Proceedings of the 14th Linguistic Annotation Workshop, pp. 105-116. 2020.
- Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, and Nathan Schneider. "A Corpus of Adpositional Supersenses for Mandarin Chinese." In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 5986-5994. 2020.
- Jakob Prange and Nathan Schneider. "Draw mir a sheep: a supersense-based analysis of German case and adposition semantics." KI-Künstliche Intelligenz 35, pp. 291–306. 2021.
- Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah Moeller, Aviram Stern, Adi Shalev, and Omri Abend. "Comprehensive Supersense Disambiguation of English Prepositions and Possessives." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 185-196. 2018.
- Aryaman Arora, Nitin Venkateswaran, and Nathan Schneider. "MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi." In Proceedings of The Thirteenth Language Resources and Evaluation Conference, pp. 5696–5704. 2022.
- Maitrey Mehta and Vivek Srikumar. "Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS." In Findings of the Association for Computational Linguistics: ACL 2023, pp. 10941-10958. 2023.
- Daniel Chen and Mans Hulden. "My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin." In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 2610-2616. 2022.
- Tatsuya Aoyama, Chihiro Taguchi, and Nathan Schneider. "J-SNACS: Adposition and Case Supersenses for Japanese Joshi." In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 9604–9614. 2024.