A collaborative project for training foundational Danish language models. The project seeks to:
- Develop and maintain state-of-the-art language models for Danish
- Validate these models across a wide range of tasks
- Provide good documentation that lets users critically assess the models for their use case
- Open-source both the models and the source code
Note: This repository is intended for the text model of DFM.
For more information, please check out the following links:
📑 About | An overview of the DFM project |
Research Paper | A paper introducing DFM and its rationale |
🚀 Models | An overview of the models currently available through the DFM project |
💽 Datasets | Datasheets for the datasets, covering preprocessing, the rationale for their construction, and more |
DFM is a collaborative project for training and maintaining Danish language models. If you wish to contribute, don't hesitate to reach out through one of the following channels:
🗣 DDSC Slack | Join the discussion in the "danish-foundation-models" channel |
💬 GitHub Discussion | Ask questions or start a discussion |
🚨 GitHub Issues | Noticed a bug in the code? Please create an issue |
You can contribute in several ways:
- Developer time, the lifeblood of any open-source project
- Pre-training datasets you wish to include in the model training
- Validation tasks, which can even be private benchmarks where you only wish to share the performance metrics
- And probably in many other ways
🗣 Adding a dataset | A guide on how to add a new dataset |