
Create recursive sitemap with robots.txt url extraction #10

Open
ghost opened this issue Oct 13, 2023 · 4 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments


ghost commented Oct 13, 2023

Hey mate,

I think it would be great to extend pysitemaps so it can extract URLs matching specific locations.

For example, the user could use a robots.txt file (or a config file) to find valid URLs on the server.

What do you think?
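
A minimal sketch of what this could look like, using only Python's standard-library urllib.robotparser (the example.com URLs are placeholders, and none of this is pysitemaps' actual API):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt once.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

# Candidate URLs discovered by a crawl (placeholder data).
candidate_urls = [
    "https://example.com/blog/post-1",
    "https://example.com/admin/login",
]

# Keep only the URLs that robots.txt allows for any user agent,
# so disallowed locations never end up in the sitemap.
allowed = [url for url in candidate_urls if robots.can_fetch("*", url)]
print(allowed)
```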

seowings added the enhancement (New feature or request) and question (Further information is requested) labels on Oct 15, 2023
seowings (Collaborator) commented

Thanks for your comments. This certainly looks like a good and useful feature for the following use cases:

  • Create a sitemap only for a "path" specified by the user
  • Sitemaps must respect robots.txt

I am currently looking into this and will come up with an extension.
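
A rough sketch of how these two use cases might compose, assuming the crawler already yields a flat list of URLs (the function name and defaults below are hypothetical, not part of pysitemaps):

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def filter_for_sitemap(urls, robots_url="https://example.com/robots.txt",
                       path_prefix="/blog/"):
    """Keep only URLs under path_prefix that robots.txt allows."""
    robots = RobotFileParser(robots_url)
    robots.read()
    return [
        url for url in urls
        if urlparse(url).path.startswith(path_prefix)
        and robots.can_fetch("*", url)
    ]
```

Putting the path-prefix check before the robots.txt check means the cheap string test short-circuits first, so robots lookups only run for URLs that could appear in the sitemap anyway.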


ghost commented Oct 17, 2023

Great! I will be watching your repo; feel free to contact me in case you have any questions :)

seowings (Collaborator) commented

@9967819 Can you help me with some example data so that I can finalize a new build?

For example,

  • Input data for the test, e.g. a list of URLs, paths to exclude, or a sample robots.txt file.
  • Desired output, e.g. a possible sitemap.


ghost commented Oct 25, 2023


Sorry for the late reply. Here is my sample robots.txt file.

Also, there should be a way to exclude static files (CSS and JS) when generating the sitemap file. Note that it makes sense for the sitemap to include only URLs permitted by the robots.txt input. For testing, I suggest you create a test case using your local development server.
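
A minimal sketch of such a static-file filter combined with sitemap output, standard library only (the extension list and function name are assumptions, not pysitemaps API):

```python
from xml.etree.ElementTree import Element, SubElement, ElementTree

# Extensions treated as static assets (assumed default list).
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".ico")

def write_sitemap(urls, out_path="sitemap.xml"):
    """Write a minimal sitemap.xml, skipping static-asset URLs."""
    urlset = Element("urlset",
                     xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        if url.lower().endswith(STATIC_EXTENSIONS):
            continue  # drop CSS/JS/image assets
        entry = SubElement(urlset, "url")
        SubElement(entry, "loc").text = url
    ElementTree(urlset).write(out_path, encoding="utf-8",
                              xml_declaration=True)
```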
