Using LLM to improve documentation #563

keceli · 2024-12-08T06:39:47Z

I recently wrote a Python script that leverages GPT4o via Argonne's OpenAI service, Argo, to improve our documentation. The script focuses on fixing grammar and formatting issues while also improving the overall clarity of the content. Using this script, I have submitted PRs (#504, #562, #564) with updated documentation for review. I am creating this issue so we can have some of the more general discussion here rather than in the PRs.

1. Are there any concerns about the use of LLMs?
2. Are there any specific improvements or changes to the script and the prompts you’d suggest?
3. Is it worth automating this process further (e.g., integrating it into a GitHub action for automated checks)?
4. Should I continue submitting PRs in this way for the rest of the documentation?
5. Are the PR sizes appropriate?

For 3, I already wrote a script, but it requires an OpenAI API key. Would ALCF provide one for this service, or should we try free models that can run on GitHub CI servers? An alternative is to mirror the repo on GitLab and run GitLab CI jobs on ALCF machines with Argo access. Is this a viable approach?

Prompt

 Your task is to:

1. Identify and correct any grammatical errors.
2. Check for and fix any broken links.
3. Address any formatting issues.
7. Do not modify anchors within headers.
8. Provide a brief explanation of the changes made.
9. If no changes are necessary, respond with "The page reads great, no changes required."
10. If any change is required your response should include
    1. The revised content of the markdown file.
    2. An explanation of your changes, after adding this separator {separator}.
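The last two prompt items imply a simple reply format: either the fixed "no changes" sentence, or the revised markdown followed by the separator and an explanation. A minimal sketch of how a script could split such a reply is shown below. The separator value, the `Review` type, and the `parse_reply` function are assumptions made for illustration and are not taken from the actual script.

```python
from typing import NamedTuple, Optional

# Assumed value; the prompt only shows the {separator} placeholder.
SEPARATOR = "-----EXPLANATION-----"
# Sentinel sentence quoted from the prompt.
NO_CHANGE_REPLY = "The page reads great, no changes required."


class Review(NamedTuple):
    revised: Optional[str]  # None when the model reports no changes
    explanation: str


def parse_reply(reply: str, separator: str = SEPARATOR) -> Review:
    """Split a model reply into revised markdown and an explanation."""
    text = reply.strip()
    if text == NO_CHANGE_REPLY:
        return Review(None, "")
    revised, found, explanation = text.partition(separator)
    if not found:
        # The model ignored the requested format; fail loudly instead of guessing.
        raise ValueError("separator missing from model reply")
    return Review(revised.strip() + "\n", explanation.strip())
```

Raising an error when the separator is missing keeps a malformed reply from being written back to a documentation page.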


keceli · 2024-12-13T19:55:08Z

The updated code is available here: https://github.com/argonne-lcf/drdoc. I'd appreciate any feedback.

felker · 2025-01-13T23:00:27Z

Starting to curate a list of examples of undesirable and/or unexplainable behaviors I noticed in the first batch of LLM-generated pull requests:

Behaviors to explore, discuss, and decide if we care or not:

Makes mysterious stylistic changes, such as adding a horizontal divider line at the bottom of a random page or two for no apparent reason, as in this PR: https://github.com/argonne-lcf/user-guides/pull/568/files#r1913895914
As you also identified in Improve Sophia documentation with Argo/GPT4o #564 and Improve documentation for account and project management with Argo/GPT4o #567, it should maybe ignore files that are in not_in_nav/ subdirectories?
- Or rather, we need to separate not_in_nav/ (but files are still relevant and linked to by other pages that may be in the navigation sidebar, e.g. account-project-management/allocation-management/not_in_nav/sbank-*.md) and a new category unused/ (e.g. the old ThetaGPU docs under sophia/not_in_nav/)
The pull requests contain a lot of changes that some would consider "noise", which burdens code review
- Changing whitespace and combining lines, removing hard line wraps (see Decide on line wraps: hard wraps, soft wraps, semantic wraps #330 and Improve documentation with Argo/GPT4o #565 (review))
- Maybe add to the prompt "do not suggest changes to whitespace that would not result in a different rendered output"?
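Besides changing the prompt, the script itself could reject whitespace-only edits before a PR is opened. The sketch below is an approximation, not a real markdown renderer: it treats blank lines as paragraph breaks and collapses all other runs of whitespace, so it ignores cases where whitespace is significant (code blocks, list indentation, trailing-space line breaks). The function names are hypothetical.

```python
import re
from typing import List


def normalize(markdown: str) -> List[str]:
    """Split text into blank-line-separated blocks and collapse inner whitespace."""
    blocks = re.split(r"\n\s*\n", markdown.strip())
    return [" ".join(block.split()) for block in blocks]


def is_whitespace_only_change(original: str, revised: str) -> bool:
    """True when the texts differ only in line wraps, spacing, or extra blank lines."""
    return original != revised and normalize(original) == normalize(revised)
```

With this check, re-wrapping a paragraph counts as noise, while merging two paragraphs or fixing a typo does not.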

Bad:

Removes all HTML comments, some of which are left by authors to give context to future editors of the page and/or contain future TODO additions to the docs.
Silently deletes some snippets extension declarations: https://github.com/argonne-lcf/user-guides/pull/564/files#r1913855259
Doesn't like InlineHilite: https://github.com/argonne-lcf/user-guides/pull/564/files#r1913856736
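The dropped HTML comments, at least, could be caught mechanically after the model replies. A minimal sketch follows, assuming the original and revised page texts are both available as strings; `dropped_comments` is a hypothetical helper, and the sample comment in the usage note is made up.

```python
import re
from typing import List

# Non-greedy match so adjacent comments are found separately;
# DOTALL lets a comment span multiple lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def dropped_comments(original: str, revised: str) -> List[str]:
    """Return HTML comments present in the original but missing from the revision."""
    return [c for c in HTML_COMMENT.findall(original) if c not in revised]
```

For example, `dropped_comments("Intro\n<!-- TODO: add example -->\nText", "Intro\nText")` returns the one deleted comment, which the script could restore or report in the PR description.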

Good:

Finds tons of typos, makes grammatical improvements, and smooths language
Capitalizes machines, software package names, etc. appropriately when authors get lazy with those conventions
Deletes extra blank lines and trailing whitespace
Finds subtle bugs in code examples: https://github.com/argonne-lcf/user-guides/pull/564/files#r1913857140
Adds richness to the markup when authors were too lazy to do so (e.g. backticks around qsub for inline code)

- Modified prompt to ignore whitespace-only changes to reduce noise in PRs
- Add --exclude option to exclude files and directories

See the discussion: argonne-lcf/user-guides#563
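As an illustration of the exclude idea, here is a minimal sketch of how such an option could skip `not_in_nav/` directories. It assumes a repeatable `--exclude PATTERN` flag matched against the full path and against each path component; the actual drdoc command-line interface and matching rules may differ, and the function names and file paths in the usage note are hypothetical.

```python
import argparse
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Select markdown files to review")
    parser.add_argument("files", nargs="*", help="candidate markdown files")
    parser.add_argument(
        "--exclude",
        action="append",  # allow the flag to be given more than once
        default=[],
        metavar="PATTERN",
        help="glob pattern for files or directories to skip",
    )
    return parser


def is_excluded(path: str, patterns: Iterable[str]) -> bool:
    """True if the whole path or any single path component matches a pattern."""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )


def select_files(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the paths that no exclude pattern matches."""
    return [p for p in paths if not is_excluded(p, patterns)]
```

Parsing `--exclude not_in_nav docs/a.md docs/x/not_in_nav/old.md` with this parser and passing the result to `select_files` keeps only `docs/a.md`.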

keceli · 2025-01-22T01:31:22Z

Thanks, Kyle (@felker), for the suggestions. The new PR #671 uses the updated prompt, which ignores whitespace-only changes.

keceli mentioned this issue Jan 22, 2025

Drdoc updates for polaris #671

Merged
