Add more entries to `robots.txt` #2044

YoshiRulz · 2024-11-23T17:25:44Z

It should be apparent why each of these isn't useful to crawl.

Might want to keep /Wiki/ViewSource for archival purposes?
Might want to block /Forum/Posts/User/ for humans who aren't logged in?
Regarding the false-positive issue: see the spec, or Google's nicer docs.

Masterjun3 · 2024-11-23T17:46:44Z

1. Why disallow /Account/? It makes sense to me that people would google "tasvideos register" to find out how to register an account.
2. /Forum/Topics/Create/ requires authorization, so we don't need to block it here. Otherwise I wonder why you don't also block wiki editing, post creation, userfile creation, and all the other creation pages. Seems like too much.
3. Why block /Forum/Posts/User/ ?
4. I don't understand the TODO. We know each entry has an implicit trailing wildcard, which is why we can block /Movies-, for example.

YoshiRulz · 2024-11-23T18:37:19Z

1. Oh right, I'll revert that.
2. It was in a list of highest-trafficked pages you shared recently. I think those others should be blocked, but if you don't want them listed here, I did have a solution using rel="nofollow" in another branch.
3. This was also in said list. It's not really a useful thing to index: crawling /Forum/Topics/{id} will include the posters' names with their posts, so in theory they should be searchable that way if someone wanted. But I think searching by poster is prone to abuse, hence my suggestion in OP.
4. The "correct" way is to put a $ after the path fragment (and then if there are subpages, include a separate .../* entry). I haven't bothered.
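
To illustrate the matching behavior discussed in point 4, here is a minimal sketch of Google-style robots.txt path matching. It assumes three rules: each entry is a prefix match (the implicit trailing wildcard), `*` matches any run of characters, and a trailing `$` anchors the end of the path. The helper `rule_matches` and the example paths are hypothetical and are not taken from the TASVideos codebase or from any crawler's implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt rule pattern matches a URL path.

    Illustrative only: real crawlers also handle percent-encoding and
    pick the most specific of several matching rules.
    """
    # A trailing `$` means the rule must match the whole path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then let each `*` match any characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without `$`, the rule only has to match a prefix of the path.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


# Implicit trailing wildcard: a bare prefix also matches longer paths.
print(rule_matches("/Movies-", "/Movies-Example"))
# With `$`, only the exact path matches, so subpages need their own entry.
print(rule_matches("/Forum/Posts/User/$", "/Forum/Posts/User/"))
print(rule_matches("/Forum/Posts/User/$", "/Forum/Posts/User/123"))
print(rule_matches("/Forum/Posts/User/*", "/Forum/Posts/User/123"))
```

Under these assumptions, an unanchored entry can produce false positives on any path that merely starts with the same characters, which is the case the `$` form avoids.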

adelikat · 2024-11-23T19:56:09Z

2. I would recommend not listing this or any other page that requires auth: it is not necessary, and there are too many of them. Robots will not be logged in. Also, I don't think usage statistics need to play a big role here. If it makes sense for something to be crawled, it should be crawled. If it needs to be more performant as a result, we can address that.

4. Then at least put that information in the comment, so that someone who isn't an expert in robot crawling can act on the comment. But I would recommend not bothering with it.

Masterjun3

I have committed some removals as discussed, so that this PR can be merged.

Add more entries to robots.txt

76f5b0a

remove some entries as discussed

e9f8896

Masterjun3 approved these changes Nov 26, 2024

adelikat approved these changes Nov 26, 2024

adelikat merged commit ee2a5a8 into TASVideos:main Nov 26, 2024
1 check passed

YoshiRulz deleted the more-less-robots branch November 26, 2024 22:02
