Mapping to ESCO 1.2 Taxonomy #1

Open
kuba-hf opened this issue Jan 23, 2025 · 1 comment

Comments

kuba-hf commented Jan 23, 2025

Hi, first of all thank you for providing the datasets, they are very useful!

I am trying to link the labels of the dataset (particularly the version for Germany) to concepts from the ESCO taxonomy. However, for every annotation, the only identifiers available are ones like [C000756_en_000, C000756_en_001, ... , C000756_en_022], which all correspond to different alternative labels of a single ESCO concept -- "aircraft engine specialist", with concept URI http://data.europa.eu/esco/occupation/0ac8fe65-32e6-4c25-8345-2b87bc7b2698.

Is there any mapping from the labels in annotations.tsv to the conceptUri field of the original ESCO taxonomy?
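
To illustrate the identifier structure described above, here is a minimal sketch that splits IDs such as C000756_en_000 into a concept key, a language code, and a label index, and groups them by concept key. The layout `<concept key>_<language>_<index>` is inferred only from the examples quoted in this comment and is not a documented format; the names `split_corpus_id` and `group_by_concept` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the quoted examples only:
#   <concept key>_<two-letter language code>_<numeric label index>
ID_PATTERN = re.compile(r"^(?P<concept>C\d+)_(?P<lang>[a-z]{2})_(?P<index>\d+)$")


def split_corpus_id(corpus_id):
    """Split an ID like 'C000756_en_022' into (concept key, language, index)."""
    match = ID_PATTERN.match(corpus_id)
    if match is None:
        raise ValueError(f"ID does not follow the assumed layout: {corpus_id!r}")
    return match["concept"], match["lang"], int(match["index"])


def group_by_concept(corpus_ids):
    """Group IDs that share a concept key (hypothetical helper)."""
    groups = defaultdict(list)
    for corpus_id in corpus_ids:
        concept, _lang, _index = split_corpus_id(corpus_id)
        groups[concept].append(corpus_id)
    return dict(groups)


groups = group_by_concept(["C000756_en_000", "C000756_en_001", "C000756_en_022"])
```

Grouping this way only recovers which IDs belong together; it does not recover the ESCO concept URI itself, which is why an explicit mapping is needed.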

federetyk added a commit that referenced this issue Jan 25, 2025
Create a mapping file between the MELO corpus key and the ESCO concept URI for each element in the corpus, during the creation of a MELO dataset, as suggested by @kuba-bialczyk in #1 (comment)
@federetyk
Collaborator

Hi @kuba-bialczyk,

Thank you for your message and for your interest in the MELO Benchmark! We're thrilled to hear that you find the datasets useful.

We agree that having a mapping between the "corpus element ID" in MELO and the concept URI in ESCO would be helpful. Based on your suggestion, we've just added this functionality to the repository. A new file, concept_mapping.tsv, is now included in the directory for each dataset and provides this mapping. For example, you can find the mapping for the German cross-lingual dataset here.
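
As a rough sketch of how such a mapping file might be consumed, the snippet below reads a tab-separated file into a dictionary from corpus element ID to ESCO concept URI. The column names of concept_mapping.tsv are not stated in this thread, so the function takes them as parameters; the header names `corpus_id` and `concept_uri` and the in-memory rows are hypothetical stand-ins, not the actual file contents.

```python
import csv
import io


def load_concept_mapping(tsv_file, key_column, uri_column):
    """Return {corpus element ID: ESCO concept URI} from a tab-separated file object."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    mapping = {}
    for row in reader:
        mapping[row[key_column]] = row[uri_column]
    return mapping


# In-memory stand-in for concept_mapping.tsv. The header names are assumptions;
# check the real file's header before relying on them.
sample = io.StringIO(
    "corpus_id\tconcept_uri\n"
    "C000756_en_000\thttp://data.europa.eu/esco/occupation/0ac8fe65-32e6-4c25-8345-2b87bc7b2698\n"
    "C000756_en_001\thttp://data.europa.eu/esco/occupation/0ac8fe65-32e6-4c25-8345-2b87bc7b2698\n"
)
mapping = load_concept_mapping(sample, "corpus_id", "concept_uri")
```

With a real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, passing whatever column names the header actually uses.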

Please note that the German datasets use data from ESCO version 1.0.3, not version 1.2 as mentioned in the issue title. (This is the ESCO version used in the original crosswalk created by the German Bundesagentur für Arbeit.) While there are some differences between these ESCO versions, most concepts should align well.
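
One way to check how well the mapped concepts align with another ESCO version is to compare URI sets. The sketch below assumes you have an ESCO CSV export that contains a `conceptUri` column (the field name used earlier in this thread); the helper names and the example URIs are hypothetical, and the in-memory CSV only stands in for a real export.

```python
import csv
import io


def read_concept_uris(csv_file, uri_field="conceptUri"):
    """Collect the non-empty values of the URI column from a CSV file object."""
    reader = csv.DictReader(csv_file)
    return {row[uri_field] for row in reader if row.get(uri_field)}


def uris_missing_from(mapped_uris, target_uris):
    """Return the mapped URIs that do not appear in the target ESCO version."""
    return sorted(set(mapped_uris) - set(target_uris))


# Stand-in for an ESCO 1.2 export; real exports have more columns and rows.
esco_export = io.StringIO(
    "conceptUri,preferredLabel\n"
    "http://example.org/concept/a,label a\n"
)
target = read_concept_uris(esco_export)
missing = uris_missing_from(
    ["http://example.org/concept/a", "http://example.org/concept/b"], target
)
```

Any URI that ends up in `missing` would need manual review, since this check only detects URIs absent from the target version and says nothing about concepts whose meaning changed between versions.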

Thanks again for your suggestion! If there’s anything else we can help you with, or if you have further ideas to enhance the project, please don’t hesitate to reach out.

Best regards,
Federico Retyk
(Avature Machine Learning Team)
