Blacklist by Keyword #1243
Replies: 3 comments 1 reply
-
In this example, bbot is crawling 7k+ links in this format: https://www.bluehost.com/cdn-cgi/challenge-platform/h/b/jsd/r/879e95894fc60a61. It would help a lot if there were a feature to stop crawling these during the scan, same as ...
-
Agreed this would be a good feature to have. Converting to issue.
-
Raw idea: a module in bbot that was responsible for blacklisting, and that could also prevent HTTPX from crawling similar links, would help a lot with crawl duration. For example, something like urless could check each URL before HTTPX crawls it, decide (based on some configuration options) whether HTTPX has already crawled a similar link, and then allow or skip the crawl. It could have features like these (see the sketch below):

- We define keywords to blacklist during crawling, for example /cdn-cgi/challenge-platform/.
- It only allows crawling one language and skips the others.
- It skips similar links for posts, articles, and products.
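To make the idea concrete, here is a minimal sketch of such a pre-crawl filter. This is plain Python, not bbot's actual API: `should_crawl`, `url_template`, and `BLACKLIST_KEYWORDS` are all hypothetical names, and the urless-style deduplication is reduced to two regexes.

```python
import re
from urllib.parse import urlparse

# Hypothetical, user-configured keyword blacklist (illustrative only).
BLACKLIST_KEYWORDS = ["/cdn-cgi/challenge-platform/"]

# Templates of URLs that have already been crawled once.
seen_templates = set()

def url_template(url: str) -> str:
    """Collapse IDs and hashes so /product/123 and /product/456 look alike."""
    parsed = urlparse(url)
    path = re.sub(r"/\d+(?=/|$)", "/<id>", parsed.path)      # numeric IDs
    path = re.sub(r"/[0-9a-f]{8,}(?=/|$)", "/<hash>", path)  # hex tokens
    return parsed.netloc + path

def should_crawl(url: str) -> bool:
    """Decide whether the crawler should visit this URL or skip it."""
    if any(keyword in url for keyword in BLACKLIST_KEYWORDS):
        return False                   # keyword-blacklisted
    template = url_template(url)
    if template in seen_templates:
        return False                   # a similar link was already crawled
    seen_templates.add(template)       # first link of this shape: crawl it
    return True
```

With a filter like this, the 7k+ /cdn-cgi/challenge-platform/ links from the first comment are all skipped by the keyword check, and only one URL per structural template (one product, one post) ever reaches the crawler.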
-
In some programs, we need to blacklist a specific path such as https://www.example.com/blog/.
However, it seems this is not possible with bbot, so I wanted to suggest adding a blacklist based on keywords.
Then, if I add blog, it won't scan or crawl any links that contain blog.
Thanks 🙏
Update: I was also thinking about a way to limit the crawling of similar links. For example, a site can have 100k products, and I want to crawl only one of them, because the others are similar to it. Or a site can have 50k posts, but I want to crawl only one of them. It would be great if this could be implemented.
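Purely as an illustration of the keyword part of this request (not an existing bbot option), the desired behavior is just a substring filter over candidate URLs. The `filter_urls` helper below is made up for this sketch:

```python
# Illustrative only: a keyword blacklist as requested above.
# Any URL containing a blacklisted substring is never scanned or crawled.
def filter_urls(urls, blacklist):
    return [url for url in urls if not any(kw in url for kw in blacklist)]

urls = [
    "https://www.example.com/blog/some-post",
    "https://www.example.com/contact",
]
print(filter_urls(urls, blacklist=["blog"]))
# -> ['https://www.example.com/contact']
```

The similar-links part of the update would additionally need URL normalization on top of this, along the lines of the deduplication sketch in the earlier comment.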