Googlebot ignores robots.txt? #20
@yarikoptic I believe it's supposed to be
As for .git directories, that should be easy with a wildcard. Untested, but something akin to:
Whenever I looked before, I could not find clarity, e.g. from https://en.wikipedia.org/wiki/Robots_exclusion_standard#Universal_%22*%22_match
but even there it is not clear how the wildcard is recognized. E.g. we have
to cover some THANKS ;)
Yeah, there isn't clarity; it's more of a living standard. Bots don't /have/ to follow your rules. They just should. And if they don't, you can ban them. Wildcard support looks to be common, and globs match across directories, so you shouldn't need a glob per level. Perhaps a glob per level would help some less sophisticated bots, though.
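To make the cross-directory matching concrete, here is a minimal Python sketch of how a wildcard-aware crawler might interpret a `Disallow` pattern. It assumes the `*` and `$` semantics described in Google's robots.txt documentation and RFC 9309. The helper names `rule_to_regex` and `is_blocked`, the rule `/*.git/`, and the sample paths are hypothetical and not taken from this thread. Real bots may behave differently.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters (including '/'),
    and a trailing '$' anchors the pattern at the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path: str, disallow_pattern: str) -> bool:
    """Return True if `path` falls under the given Disallow pattern."""
    # Rules match from the start of the path, so use match(), not search().
    return rule_to_regex(disallow_pattern).match(path) is not None


if __name__ == "__main__":
    rule = "/*.git/"  # hypothetical rule, one glob for every directory depth
    for path in ["/.git/config", "/data/ds1/.git/HEAD", "/data/ds1/README"]:
        print(path, is_blocked(path, rule))
```

Under these assumptions a single `*` spans any number of directory levels, so one rule covers `.git/` at every depth. A bot that only does plain prefix matching would not honor such a rule at all.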
Last line in the Apache log file:
and robots.txt is accessed by Google bots:
@aqw - do you have a clue what is going on?
The overall goal is to forbid bots from crawling .git/ directories, but I found no way to do that.