Using LLM to improve documentation #563

Open
keceli opened this issue Dec 8, 2024 · 3 comments
Comments


keceli commented Dec 8, 2024

I recently wrote a Python script that uses GPT-4o via Argo, Argonne's OpenAI service, to improve our documentation. The script fixes grammar and formatting issues and improves the overall clarity of the content. Using it, I have submitted PRs (#504, #562, #564) with updated documentation for review. I am opening this issue so that the more general discussion can happen here rather than in the individual PRs.

  1. Are there any concerns about the use of LLMs?
  2. Are there any specific improvements or changes to the script and the prompts you’d suggest?
  3. Is it worth automating this process further (e.g., integrating it into a GitHub action for automated checks)?
  4. Should I continue submitting PRs in this way for the rest of the documentation?
  5. Are the PR sizes appropriate?

For 3, I already wrote a script, but it requires an OpenAI API key. Would ALCF provide one for this service, or should we try free models that can run on GitHub CI servers? An alternative is to mirror the repo on GitLab and run a GitLab CI job on ALCF machines with Argo access. Is this a viable approach?

Prompt

 Your task is to:

1. Identify and correct any grammatical errors.
2. Check for and fix any broken links.
3. Address any formatting issues.
7. Do not modify anchors within headers.
8. Provide a brief explanation of the changes made.
9. If no changes are necessary, respond with "The page reads great, no changes required."
10. If any change is required your response should include
    1. The revised content of the markdown file.
    2. An explanation of your changes, after adding this separator {separator}.
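
The response format requested by the prompt above (revised markdown, then a separator, then an explanation) can be handled with a small helper. The following is only a minimal sketch, not the script's actual code: the separator value, the function names `build_prompt` and `parse_reply`, and the choice to ignore replies that lack the separator are all assumptions made for illustration. Only the last prompt item is reproduced in the template.

```python
from typing import Optional, Tuple

# Hypothetical separator value; the issue only shows a {separator} placeholder.
SEPARATOR = "=====EXPLANATION====="
# Exact "no change" reply requested by the prompt.
NO_CHANGE_REPLY = "The page reads great, no changes required."

# Only the final prompt item is reproduced here to keep the sketch short.
INSTRUCTIONS = (
    "If any change is required your response should include\n"
    "1. The revised content of the markdown file.\n"
    "2. An explanation of your changes, after adding this separator {separator}.\n"
)


def build_prompt(markdown_text: str, separator: str = SEPARATOR) -> str:
    """Fill the separator into the instructions and append the page content."""
    # format() is applied to the instructions only, so braces in the page are safe.
    return INSTRUCTIONS.format(separator=separator) + "\n" + markdown_text


def parse_reply(reply: str, separator: str = SEPARATOR) -> Tuple[Optional[str], str]:
    """Split a model reply into (revised_markdown, explanation).

    Returns (None, message) when the model reports that no change is needed,
    or when the separator is missing, so the caller leaves the file untouched.
    """
    text = reply.strip()
    if text == NO_CHANGE_REPLY:
        return None, text
    if separator not in text:
        return None, "separator missing; reply ignored"
    revised, explanation = text.split(separator, 1)
    return revised.strip() + "\n", explanation.strip()
```

Treating a missing separator as "no change" is a conservative design choice: a malformed reply never overwrites a documentation page.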

keceli commented Dec 13, 2024

The updated code is available here: https://github.com/argonne-lcf/drdoc

I'd appreciate any feedback.


felker commented Jan 13, 2025

I am starting to curate a list of examples of undesirable and/or unexplainable behaviors that I noticed in the first batch of LLM-generated pull requests:

Behaviors to explore, discuss, and decide if we care or not:

Bad:

Good:

  • Finds tons of typos, makes grammatical improvements, and smooths language
  • Capitalizes machine names, software package names, etc. appropriately when authors get lazy with those conventions
  • Deletes extra blank lines and trailing whitespace
  • Finds subtle bugs in code examples: https://github.com/argonne-lcf/user-guides/pull/564/files#r1913857140
  • Adds richness to the markup when authors were too lazy to do so (e.g., backticks around qsub for inline code)

keceli added a commit to argonne-lcf/drdoc that referenced this issue Jan 22, 2025
- Modified prompt to ignore whitespace-only changes to reduce noise in PRs
- Add --exclude option to exclude files and directories

See the discussion: argonne-lcf/user-guides#563
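
The two changes in this commit can be illustrated with a short sketch. Note that the commit handles whitespace by changing the prompt; the check below is a different, complementary technique that compares the model's output with the original after the fact. The function names and the glob-style pattern syntax are assumptions for illustration and are not taken from drdoc.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def is_excluded(path: str, patterns: Iterable[str]) -> bool:
    """Return True if the path matches any glob-style exclude pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def select_files(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the paths that are not excluded (hypothetical helper)."""
    patterns = list(patterns)  # allow the patterns to be reused for every path
    return [p for p in paths if not is_excluded(p, patterns)]


def whitespace_only_change(original: str, revised: str) -> bool:
    """Return True if the texts match once runs of whitespace are collapsed."""
    # str.split() with no arguments splits on any whitespace run, so blank
    # lines, trailing spaces, and re-wrapped lines all compare as equal.
    return original.split() == revised.split()
```

A caller could skip writing a revised page when `whitespace_only_change` returns True, which keeps such edits out of the PR diff. One caveat: this comparison also ignores indentation changes inside code blocks, which may matter for some pages.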

keceli commented Jan 22, 2025

Thanks, Kyle @felker, for the suggestions. The new PR #671 uses the updated prompt, which ignores whitespace-only changes.
