bad url: sitemap url value trails with bad characters #249

Open
adplincinst opened this issue Dec 21, 2023 · 1 comment

Comments

@adplincinst

Verified during a gleaner run that not all sitemap URLs were transferred to S3 in the summoned/ subdirectory. Judging from the error below, internal/summoner/sitemaps/sitemap_ng.go:DomainSitemap() appears to append bad characters (i.e. \u003cnil\u003e, which is the JSON-escaped form of <nil>) to the URL, so gleaner is unable to fetch it. Example error:

logs/gleaner-2023-12-20-20-44-34.log:{"file":"/home/runner/work/gleaner/gleaner/internal/summoner/acquire/acquire.go:299","func":"github.com/gleanerio/gleaner/internal/summoner/acquire.getDomain.func2","level":"error","msg":"#112 bad url https://geoconnex.us/ca-gage-assessment/gages/LMN\u003cnil\u003e","time":"2023-12-20T20:44:48Z"}
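
To make the escaped suffix easier to read, here is a minimal sketch that decodes the "msg" field of a log line shaped like the one above. The helper name decodeMsg is hypothetical and not part of gleaner, and the sample line is shortened to the two fields that matter here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeMsg is a hypothetical helper (not part of gleaner) that extracts
// the "msg" field from one JSON-formatted log line.
func decodeMsg(line string) (string, error) {
	var entry struct {
		Msg string `json:"msg"`
	}
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return "", err
	}
	return entry.Msg, nil
}

func main() {
	// Shortened copy of the log line; \u003c and \u003e are the JSON
	// escapes for '<' and '>'.
	line := `{"level":"error","msg":"#112 bad url https://geoconnex.us/ca-gage-assessment/gages/LMN\u003cnil\u003e"}`

	msg, err := decodeMsg(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // the URL in the message ends in the literal text <nil>
}
```

Once decoded, the URL ends in the five literal characters <nil>, so the trailing text is printable and not whitespace.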

The code attempts to strip extraneous trailing whitespace with strings.TrimSpace, but that evidently does not remove these characters, since <nil> is not whitespace.
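
A minimal sketch of why strings.TrimSpace leaves the suffix in place, plus one possible workaround. The helper cleanURL is hypothetical and not taken from gleaner. It assumes the unwanted suffix is always the literal text <nil>, and it does not address why DomainSitemap() produces that suffix in the first place.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanURL is a hypothetical helper, not part of gleaner. It trims
// surrounding whitespace and then drops a literal trailing "<nil>".
func cleanURL(raw string) string {
	trimmed := strings.TrimSpace(raw)
	return strings.TrimSuffix(trimmed, "<nil>")
}

func main() {
	raw := "https://geoconnex.us/ca-gage-assessment/gages/LMN<nil>"

	// TrimSpace only strips leading and trailing Unicode whitespace,
	// so the "<nil>" suffix survives unchanged.
	fmt.Println(strings.TrimSpace(raw) == raw) // true

	// TrimSuffix removes the literal suffix when it is present.
	fmt.Println(cleanURL(raw))
}
```

This only masks the symptom. The better fix is probably to find where a nil value gets formatted into the URL string.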

@adplincinst
Author

Note: I used the nsfearthcube/gleaner:dev_ec Docker image.
