Japanese and Korean data for the relative clause (RC) attachment experiment in the 2024 PACLIC paper "Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models" by Lee et al.
The original dataset by Hemworth () is available here. It is a set of sentences varied along two conditions: relative clause length (long vs. short) and whether the relative clause attaches in object or subject position. Our work (link coming soon!) extends Hemworth's dataset with translations into Japanese and Korean. Because both languages are head-final rather than head-initial, these translations let us test how well LLMs handle typologically different languages.
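The two-condition structure described above can be sketched as a small factorial design. This is a minimal illustration only, assuming the two conditions are fully crossed into four cells per language; the field names, condition labels, and language codes (`ja`, `ko`) are hypothetical and not taken from the released files.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels for the two conditions; the released data
# may use different names or encodings.
RC_LENGTHS = ("short", "long")
ATTACHMENT_POSITIONS = ("subject", "object")
LANGUAGES = ("ja", "ko")  # hypothetical codes for the Japanese and Korean translations


@dataclass(frozen=True)
class Condition:
    """One cell of the assumed 2 x 2 design for a given language."""
    language: str
    rc_length: str
    position: str


def design_cells(language: str) -> list[Condition]:
    """Cross RC length with attachment position for one language."""
    return [
        Condition(language, length, position)
        for length, position in product(RC_LENGTHS, ATTACHMENT_POSITIONS)
    ]


if __name__ == "__main__":
    # Print every assumed cell for both translated languages.
    for lang in LANGUAGES:
        for cell in design_cells(lang):
            print(cell)
```

Under this assumption, each translated sentence item would be tagged with exactly one of the four cells, so results can be compared per condition across Japanese and Korean.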
Data and paper links coming soon!