Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: explanation about sitemaps in headless WordPress #116

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

theodesp
Copy link
Member

@theodesp theodesp commented Mar 26, 2025

closes #50

@theodesp theodesp requested a review from a team as a code owner March 26, 2025 14:47
colinmurphy
colinmurphy previously approved these changes Mar 26, 2025
Copy link
Member

@colinmurphy colinmurphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @theodesp 🚀 🚀 🚀

@colinmurphy colinmurphy requested a review from moonmeister March 27, 2025 10:10
Copy link
Member

@moonmeister moonmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overview and the introduction of the problem. I like how each solution has clear pros and cons. The three solutions need a little work. 1 is new to me, I haven't heard of it before, have you seen others do this method? I'm open to it but seems a little strange. The three methods documented in the RFC #17 that we need to comer here are:

  1. Proxying the sitemap from the front end to the back end. - https://github.com/wpdecoupled/site/tree/main/src/routes/%5Bfile%3Dsitemap%5D
  2. Generating a sitemap from GraphQL content (you cover this with 2/3...but static VS SSR is not a "separate" method...just a different rendering strategy. - https://github.com/wpengine/faustjs.org/blob/main/src/pages/wp-sitemap.xml/index.jsx
  3. Fetching existing sitemaps and and parsing them into a new sitemap with any added framework routes (the current faust method)

@ahuseyn
Copy link
Member

ahuseyn commented Mar 31, 2025

@theodesp @moonmeister, the problem with the 2. approach (WPGraphQL) is that WPGraphQL returns maximum 100 node per page. If you set first: 1000 it will only return the first 100. Details: https://www.wpgraphql.com/docs/known-limitations#pagination-limits

You can increase that limit but it may cause problems for the WP instances low on resources: https://www.wpgraphql.com/filters/graphql_connection_max_query_amount

@colinmurphy colinmurphy requested a review from ahuseyn March 31, 2025 10:25
colinmurphy
colinmurphy previously approved these changes Mar 31, 2025
Copy link
Member

@colinmurphy colinmurphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀 🚀 🚀

@whoami-pwd whoami-pwd self-requested a review March 31, 2025 15:07
whoami-pwd
whoami-pwd previously approved these changes Mar 31, 2025
Copy link

@whoami-pwd whoami-pwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

ahuseyn
ahuseyn previously approved these changes Mar 31, 2025
Copy link
Member

@moonmeister moonmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really close. just had a couple clarifying questions needed on some of the code examples

Comment on lines 45 to 54
const transformedContent = sitemapContent.replace(
new RegExp(process.env.WORDPRESS_URL, 'g'),
process.env.FRONTEND_URL
);

return new Response(transformedContent, {
headers: {
'Content-Type': 'application/xml',
},
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the front-end url is correctly set in WP this isn't necessary and causes performance issues. I'd recommend we make a note that the correct URL needs to be set in WP for this method and do a direct proxy as shown in https://github.com/wpdecoupled/site/blob/main/src/routes/%5Bfile%3Dsitemap%5D/%2Bserver.ts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moonmeister I'm resposting this question regarding the nature of the proxy:

Regarding proxying the sitemap.xml from WordPress to the Headless site:

  • If the sitemap.xml is served from WordPress, the links inside it will use the WordPress site URL, which can cause CORS issues when accessed from the headless frontend.
  • If we ensure the sitemap links use the correct headless frontend hostname, the next issue is how Next.js maps <sitemapindex> links to file system routes.
    For example, these sitemap links:
http://localhost:3000/wp-sitemap-posts-post-1.xml

http://localhost:3000/wp-sitemap-posts-page-1.xml
http://localhost:3000/wp-sitemap-taxonomies-category-1.xml
http://localhost:3000/wp-sitemap-users-1.xml

Each of these would need a corresponding file or API route in Next.js.
I saw that Faust.js solves this by transforming sitemap links into a query format:
sitemap.xml?sitemap=<url.pathname>
I tried to do some Next.js Regex Path Matching to match in the config but its very unreliable they don't seem to work
I’m interested in hearing your thoughts on what we mean by proxying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a proxy fetches content from another endpoint based on a request. So a VPN is a proxy of sorts. https://en.wikipedia.org/wiki/Proxy_server

Comment on lines 34 to 39
// pages/[...sitemap].js
export async function GET(request, { params }) {
const { sitemap } = params;
const sitemapFile = Array.isArray(sitemap) ? sitemap.join('/') : sitemap;

const wpSitemapUrl = `${process.env.WORDPRESS_URL}/wp-sitemap${sitemapFile ? `-${sitemapFile}` : ''}.xml`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this code. A catch-all route wouldn't work with most other routing situations. Aditionally, any unknown URL that should 404 would just render a sitemap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above regarding some questions about the logic.

@theodesp theodesp dismissed stale reviews from ahuseyn, whoami-pwd, and colinmurphy via 19073cd April 7, 2025 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docs[explanation]: Overview of sitemaps in HWP
5 participants