ci[minor]: Add GitHub action to check broken links #403

Draft
wants to merge 21 commits into
base: main
85 changes: 85 additions & 0 deletions .github/workflows/link_check.yml
@@ -0,0 +1,85 @@
name: Check Docs & Links

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
  schedule:
    - cron: "0 5 * * *"
  workflow_dispatch:
Member

From the code below, it looks like you only want to check .ipynb files. In that case, do this:

Suggested change
- on:
-   pull_request:
-     branches:
-       - main
-   push:
-     branches:
-       - main
-   schedule:
-     - cron: "0 5 * * *"
-   workflow_dispatch:
+ on:
+   push:
+     branches: ["main"]
+   pull_request:
+     paths:
+       - '**.ipynb'
+   workflow_dispatch:

There is also no need for scheduling; just have it run on PRs and on commits to main.

Member

Ah, unless this is because we want a recurring check. That probably makes sense. In that case, yes, keep the scheduling.


env:
  POETRY_VERSION: "1.7.1"

jobs:
  markdown-link-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check links in Markdown files
        uses: gaurav-nelson/github-action-markdown-link-check@v1
        with:
          folder-path: "examples/,docs/"
          check-modified-files-only: ${{ github.event_name != 'schedule' }}
          file-path: "./README.md"
          config-file: "./.markdown-link-check.config.json"

  notebook-link-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16.x'
          cache: 'yarn'

      - name: Install dependencies
        run: |
          yarn install --frozen-lockfile

      - name: Check links in notebooks
        env:
          LANGCHAIN_API_KEY: test
        run: |
          if [ "${{ github.event_name }}" == "schedule" ] || [ "${{ github.event_name }}" == "workflow_dispatch" ] || ([ "${{ github.event_name }}" == "push" ] && [ "${{ github.ref }}" == "refs/heads/main" ]); then
Member

We probably don't need this, since the condition at the top already limits runs to a PR push, a push to main, a workflow dispatch, or the schedule.

            echo "Running link check on all notebooks in examples directory..."
            yarn run pytest -v --check-links-ignore "https://(api|web)\.smith\.langchain\.com/.*" --check-links-ignore "https://x.com/.*" --check-links examples
Member

Not sure where the pytest yarn script is coming from. Maybe you forgot to commit a new dependency you added?

Member

Also, is this --check-links-ignore "https://(api|web)\.smith\.langchain\.com/.*" saying not to check LangSmith links? If so, why?

Contributor Author

The LangSmith link ignore patterns were copied over. I'm not sure why, but I assume there was a reason. I tried to fix the pytest issue in the next push.
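To make the question about the ignore patterns concrete, here is a minimal shell sketch of which URLs the two `--check-links-ignore` patterns from this step would skip. It assumes the patterns are regular expressions matched from the start of each URL. `is_ignored` is a hypothetical helper, not part of the link checker, and `grep -E` stands in for the checker's own regex engine.

```shell
# Hypothetical helper: succeed if the URL matches any of the given patterns.
# Assumption: patterns are regexes anchored at the start of the URL.
is_ignored() {
  url="$1"
  shift
  for pattern in "$@"; do
    if printf '%s\n' "$url" | grep -Eq "^${pattern}"; then
      return 0
    fi
  done
  return 1
}

# The two patterns passed to --check-links-ignore in the workflow.
LANGSMITH_PATTERN='https://(api|web)\.smith\.langchain\.com/.*'
X_PATTERN='https://x.com/.*'
```

Under these assumptions, `is_ignored "https://api.smith.langchain.com/runs" "$LANGSMITH_PATTERN" "$X_PATTERN"` succeeds, while a URL on another subdomain such as `docs.smith.langchain.com` does not match and would still be checked.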

          else
            echo "Fetching changes from origin/main..."
            git fetch origin main
            echo "Checking for changed notebook files..."
            CHANGED_FILES=$(git diff --name-only --diff-filter=d origin/main | grep '\.ipynb$' || true)
            echo "Changed files: ${CHANGED_FILES}"
            if [ -n "${CHANGED_FILES}" ]; then
              echo "Running link check on changed notebook files..."
              yarn run pytest -v --check-links-ignore "https://(api|web)\.smith\.langchain\.com/.*" --check-links-ignore "https://x.com/.*" --check-links ${CHANGED_FILES} || ([ $? = 5 ] && exit 0 || exit $?)
            else
              echo "No notebook files changed."
            fi
          fi
  check-readmes-synced:
    # This checks that the repo README.md is identical to the libs/langgraph/README.md
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Check README.md is in sync
        run: |
          if ! diff -q README.md libs/langgraph/README.md >/dev/null; then
            echo "README.md is out of sync with libs/langgraph/README.md"
            diff -C 3 README.md libs/langgraph/README.md
            exit 1
          fi
isahers1 marked this conversation as resolved.
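
The notebook step's `|| ([ $? = 5 ] && exit 0 || exit $?)` guard treats pytest's exit status 5 (no tests collected) as success. One subtlety is that the trailing `exit $?` runs after the failed `[ ... ]` test, so it reports the test's status rather than pytest's original one. A minimal sketch of the same idea that preserves the original status follows. `run_allowing_status` is a hypothetical helper name, not something defined in this PR.

```shell
# Hypothetical helper: run a command and treat one specific exit status as
# success, while passing every other status through unchanged.
run_allowing_status() {
  allowed="$1"
  shift
  status=0
  # Capture the status with `||` so the function also works under `set -e`.
  "$@" || status=$?
  if [ "$status" -eq "$allowed" ]; then
    return 0
  fi
  return "$status"
}
```

In the workflow this could wrap the changed-files invocation, for example `run_allowing_status 5 yarn run pytest -v --check-links ${CHANGED_FILES}`, with the ignore flags added as before.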
4 changes: 4 additions & 0 deletions .markdown-link-check.config.json
@@ -0,0 +1,4 @@
{
"aliveStatusCodes": [200, 206, 402],
Member

Is it possible to do ranges? We probably want all 2xx codes, all 3xx codes, etc.

Contributor Author

Based on my brief ChatGPT research, it's not easily doable programmatically. I just copied this from the main langgraph config, so I assume someone smarter than me had a good reason for selecting these codes, but I don't know.

"ignorePatterns": ["*dcbadge.vercel.app*"]
}
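
On the question of ranges: assuming, as the reply above suggests, that `aliveStatusCodes` only accepts an explicit list of codes, one workaround is to generate that list once and paste it into the config. The sketch below is a hypothetical helper (`codes_for_ranges` is not part of any tool used in this PR) that expands inclusive ranges such as `200-299` into a JSON array.

```shell
# Hypothetical helper: expand inclusive "start-end" ranges into a JSON array
# of integers suitable for pasting into "aliveStatusCodes".
codes_for_ranges() {
  out=""
  for range in "$@"; do
    code="${range%-*}"   # text before the dash: first code in the range
    end="${range#*-}"    # text after the dash: last code in the range
    while [ "$code" -le "$end" ]; do
      out="${out:+$out,}$code"   # append, adding a comma after the first item
      code=$((code + 1))
    done
  done
  printf '[%s]\n' "$out"
}
```

For example, `codes_for_ranges 200-299 300-399` prints one array covering every 2xx and 3xx code. Whether treating all of those as alive is desirable is a separate decision for the maintainers.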