|
| 1 | +# Confluence document source |
| 2 | + |
| 3 | +Importing information from Confluence is crucial for fine-tuning models on internal documentation. |
| 4 | +Many companies use Confluence to store their internal documents. |
| 5 | +Fine-tuned models can be employed within these companies and shared externally without compromising the internal documentation itself. |
| 6 | +Therefore, importing information from Confluence benefits both companies and the broader community. |
| 7 | + |
| 8 | +## Interfaces |
| 9 | + |
| 10 | +qna.yaml file, `document` section: |
| 11 | + |
| 12 | +- Confluence Host: The base URL of the Confluence instance. |
| 13 | +- Space: The Confluence space key where the documents reside. |
| 14 | +- Page titles: The titles of the Confluence pages to fetch. |
| 15 | +- Version: The version of the Confluence page to fetch (optional). |
| 16 | + |
| 17 | +The qna.yaml file can define a single host and multiple spaces and pages, |
| 18 | +each with an optional version. |
| 19 | + |
| 20 | +Confluence credentials in config.yaml: |
| 21 | +- Username |
| 22 | +- [Token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/) |
| 23 | + |
| 24 | +## Changes across modules |
| 25 | + |
| 26 | +- [Configuration module](https://github.com/instructlab/instructlab/blob/main/src/instructlab/config.py) |
| 27 | + defines the structure and validation rules for |
| 28 | + the config.yaml file. |
| 29 | +- [Schema module](https://github.com/instructlab/schema) defines the structure and validation rules for |
| 30 | + the qna.yaml file. |
| 31 | +- [SDG utilities module](https://github.com/instructlab/sdg/blob/main/src/instructlab/sdg/utils/taxonomy.py) |
| 32 | +  fetches documents. |
| 33 | +- [Unit tests](https://github.com/instructlab/instructlab/tree/main/tests) |
| 34 | + |
| 35 | +## Additional External Packages |
| 36 | + |
| 37 | +The implementation relies on the following external packages: |
| 38 | + |
| 39 | +- [atlassian-python-api](https://atlassian-python-api.readthedocs.io/) – |
| 40 | + A Python library to interact with Atlassian products, including Confluence. |
| 41 | +- [markdownify](https://pypi.org/project/markdownify/) – |
| 42 | + A library to convert HTML content to Markdown for processing Confluence page content. |
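
The way the two packages listed above could fit together is sketched below. This is a minimal illustration under stated assumptions, not the implementation proposed in this document: the function names `build_client` and `fetch_page_markdown` are hypothetical, the choice of the `body.storage` representation is an assumption, and version handling is left out. The Confluence client and the HTML-to-Markdown converter are passed in as arguments so the fetch logic can be exercised without network access.

```python
from typing import Any, Callable


def build_client(host: str, username: str, token: str) -> Any:
    """Create a Confluence client (hypothetical helper).

    The import is deferred so the rest of this sketch can be used
    without atlassian-python-api installed.
    """
    from atlassian import Confluence

    # For Atlassian Cloud, the API token is passed as the password.
    return Confluence(url=host, username=username, password=token)


def fetch_page_markdown(
    client: Any,
    space: str,
    title: str,
    to_markdown: Callable[[str], str],
) -> str:
    """Fetch one page by space key and title and convert its body.

    `client` is expected to provide `get_page_by_title`, as the
    atlassian-python-api Confluence client does. `to_markdown` is any
    HTML-to-Markdown callable, for example `markdownify.markdownify`.
    """
    # Assumption: the page body is requested in "storage" format.
    page = client.get_page_by_title(space, title, expand="body.storage")
    if not page:
        raise LookupError(f"Page {title!r} not found in space {space!r}")

    html = page["body"]["storage"]["value"]
    return to_markdown(html)
```

With both packages installed, a call might look like `fetch_page_markdown(build_client(host, username, token), "DOCS", "Getting started", markdownify.markdownify)`, where the host, username, and token come from the config.yaml credentials and the space key and page title come from the qna.yaml `document` section. The space key and page title shown here are placeholders.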