diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 4c7584280..f8caa0fd7 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -83,7 +83,7 @@ jobs:
publish_docs:
needs: update_docs
runs-on: ubuntu-latest
- if: github.event_name == 'push' && (github.ref == 'refs/heads/dev')
+ if: github.event_name == 'push' && (github.ref == 'refs/heads/stable')
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
diff --git a/README.md b/README.md
index b190acc39..7e7c31174 100644
--- a/README.md
+++ b/README.md
@@ -1,123 +1,287 @@
[![bbot_banner](https://user-images.githubusercontent.com/20261699/158000235-6c1ace81-a267-4f8e-90a1-f4c16884ebac.png)](https://github.com/blacklanternsecurity/bbot)
-# BEE·bot
+[![Python Version](https://img.shields.io/badge/python-3.9+-FF8400)](https://www.python.org) [![License](https://img.shields.io/badge/license-GPLv3-FF8400.svg)](https://github.com/blacklanternsecurity/bbot/blob/dev/LICENSE) [![DEF CON Demo Labs 2023](https://img.shields.io/badge/DEF%20CON%20Demo%20Labs-2023-FF8400.svg)](https://forum.defcon.org/node/246338) [![PyPi Downloads](https://static.pepy.tech/personalized-badge/bbot?right_color=orange&left_color=grey)](https://pepy.tech/project/bbot) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![Tests](https://github.com/blacklanternsecurity/bbot/actions/workflows/tests.yml/badge.svg?branch=stable)](https://github.com/blacklanternsecurity/bbot/actions?query=workflow%3A"tests") [![Codecov](https://codecov.io/gh/blacklanternsecurity/bbot/branch/dev/graph/badge.svg?token=IR5AZBDM5K)](https://codecov.io/gh/blacklanternsecurity/bbot) [![Discord](https://img.shields.io/discord/859164869970362439)](https://discord.com/invite/PZqkgxu5SA)
-### A Recursive Internet Scanner for Hackers.
+### **BEE·bot** is a multipurpose scanner inspired by [Spiderfoot](https://github.com/smicallef/spiderfoot), built to automate your **Recon**, **Bug Bounties**, and **ASM**!
-[![Python Version](https://img.shields.io/badge/python-3.9+-FF8400)](https://www.python.org) [![License](https://img.shields.io/badge/license-GPLv3-FF8400.svg)](https://github.com/blacklanternsecurity/bbot/blob/dev/LICENSE) [![DEF CON Demo Labs 2023](https://img.shields.io/badge/DEF%20CON%20Demo%20Labs-2023-FF8400.svg)](https://forum.defcon.org/node/246338) [![PyPi Downloads](https://static.pepy.tech/personalized-badge/bbot?right_color=orange&left_color=grey)](https://pepy.tech/project/bbot) [![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![Tests](https://github.com/blacklanternsecurity/bbot/actions/workflows/tests.yml/badge.svg?branch=stable)](https://github.com/blacklanternsecurity/bbot/actions?query=workflow%3A"tests") [![Codecov](https://codecov.io/gh/blacklanternsecurity/bbot/branch/dev/graph/badge.svg?token=IR5AZBDM5K)](https://codecov.io/gh/blacklanternsecurity/bbot) [![Discord](https://img.shields.io/discord/859164869970362439)](https://discord.com/invite/PZqkgxu5SA)
+https://github.com/blacklanternsecurity/bbot/assets/20261699/e539e89b-92ea-46fa-b893-9cde94eebf81
-BBOT (Bighuge BLS OSINT Tool) is a recursive internet scanner inspired by [Spiderfoot](https://github.com/smicallef/spiderfoot), but designed to be faster, more reliable, and friendlier to pentesters, bug bounty hunters, and developers.
+_A BBOT scan in real-time - visualization with [VivaGraphJS](https://github.com/blacklanternsecurity/bbot-vivagraphjs)_
-Special features include:
+## Installation
-- Support for Multiple Targets
-- Web Screenshots
-- Suite of Offensive Web Modules
-- AI-powered Subdomain Mutations
-- Native Output to Neo4j (and more)
-- Python API + Developer [Documentation](https://www.blacklanternsecurity.com/bbot/)
+```bash
+# stable version
+pipx install bbot
-https://github.com/blacklanternsecurity/bbot/assets/20261699/742df3fe-5d1f-4aea-83f6-f990657bf695
+# bleeding edge (dev branch)
+pipx install --pip-args '\--pre' bbot
+```
-_A BBOT scan in real-time - visualization with [VivaGraphJS](https://github.com/blacklanternsecurity/bbot-vivagraphjs)_
+_For more installation methods, including [Docker](https://hub.docker.com/r/blacklanternsecurity/bbot), see [Getting Started](https://www.blacklanternsecurity.com/bbot/)_
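+
+If the install succeeded, printing the help menu is a quick sanity check:
+
+```bash
+bbot --help
+```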
-## Quick Start Guide
+## Example Commands
+
+### 1) Subdomain Finder
+
+Passive API sources plus a recursive DNS brute-force with target-specific subdomain mutations.
+
+```bash
+# find subdomains of evilcorp.com
+bbot -t evilcorp.com -p subdomain-enum
+```
-Below are some short help sections to get you up and running.
+
+<details>
-Installation ( Pip )
+<summary><b>subdomain-enum.yml</b></summary>
+
+```yaml
+description: Enumerate subdomains via APIs, brute-force
+
+flags:
+ # enable every module with the subdomain-enum flag
+ - subdomain-enum
+
+output_modules:
+ # output unique subdomains to TXT file
+ - subdomains
+
+config:
+ dns:
+ threads: 25
+ brute_threads: 1000
+ # put your API keys here
+ modules:
+ github:
+ api_key: ""
+ chaos:
+ api_key: ""
+ securitytrails:
+ api_key: ""
+
+```
+
+</details>
+
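+Presets are plain YAML; saving the file above and passing its path to `-p` should work the same way as the named preset (a sketch; see the preset docs under Documentation below for the exact loading rules):
+
+```bash
+# load the preset from a local file instead of by name
+bbot -t evilcorp.com -p ./subdomain-enum.yml
+```
+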
+BBOT consistently finds 20-50% more subdomains than other tools. The bigger the domain, the bigger the difference. To learn how this is possible, see [How It Works](https://www.blacklanternsecurity.com/bbot/how_it_works/).
-Note: BBOT's [PyPi package](https://pypi.org/project/bbot/) requires Linux and Python 3.9+.
+![subdomain-stats-ebay](https://github.com/blacklanternsecurity/bbot/assets/20261699/de3e7f21-6f52-4ac4-8eab-367296cd385f)
+
+### 2) Web Spider
```bash
-# stable version
-pipx install bbot
+# crawl evilcorp.com, extracting emails and other goodies
+bbot -t evilcorp.com -p spider
+```
-# bleeding edge (dev branch)
-pipx install --pip-args '\--pre' bbot
+
+<details>
+<summary><b>spider.yml</b></summary>
+
+```yaml
+description: Recursive web spider
+
+modules:
+ - httpx
+
+config:
+ web:
+ # how many links to follow in a row
+ spider_distance: 2
+ # don't follow links whose directory depth is higher than 4
+ spider_depth: 4
+ # maximum number of links to follow per page
+ spider_links_per_page: 25
-bbot --help
```
-
-Installation ( Docker )
+
+</details>
+
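+The `web.*` options above control how far the spider goes; like any config value, they can also be overridden per-scan with `-c` (a sketch reusing the option names from spider.yml):
+
+```bash
+# shallower crawl: follow at most one link in a row, max directory depth 2
+bbot -t evilcorp.com -p spider -c web.spider_distance=1 web.spider_depth=2
+```
+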
-[Docker images](https://hub.docker.com/r/blacklanternsecurity/bbot) are provided, along with helper script `bbot-docker.sh` to persist your scan data.
+### 3) Email Gatherer
```bash
-# bleeding edge (dev)
-docker run -it blacklanternsecurity/bbot --help
+# quick email enum with free APIs + scraping
+bbot -t evilcorp.com -p email-enum
-# stable
-docker run -it blacklanternsecurity/bbot:stable --help
+# pair with subdomain enum + web spider for maximum yield
+bbot -t evilcorp.com -p email-enum subdomain-enum spider
+```
+
+<details>
+<summary><b>email-enum.yml</b></summary>
+
+```yaml
+description: Enumerate email addresses from APIs, web crawling, etc.
+
+flags:
+ - email-enum
+
+output_modules:
+ - emails
-# helper script
-git clone https://github.com/blacklanternsecurity/bbot && cd bbot
-./bbot-docker.sh --help
```
+
+</details>
+
+### 4) Web Scanner
+
+```bash
+# run a light web scan against www.evilcorp.com
+bbot -t www.evilcorp.com -p web-basic
+
+# run a heavy web scan against www.evilcorp.com
+bbot -t www.evilcorp.com -p web-thorough
+```
+
+<details>
-Example Usage
+<summary><b>web-basic.yml</b></summary>
+
-## Example Commands
+```yaml
+description: Quick web scan
-Scan output, logs, etc. are saved to `~/.bbot`. For more detailed examples and explanations, see [Scanning](https://www.blacklanternsecurity.com/bbot/scanning).
+include:
+ - iis-shortnames
-
-**Subdomains:**
+flags:
+ - web-basic
-```bash
-# Perform a full subdomain enumeration on evilcorp.com
-bbot -t evilcorp.com -f subdomain-enum
```
-**Subdomains (passive only):**
+
+</details>
+
+<details>
+<summary><b>web-thorough.yml</b></summary>
+
+```yaml
+description: Aggressive web scan
+
+include:
+ # include the web-basic preset
+ - web-basic
+
+flags:
+ - web-thorough
-```bash
-# Perform a passive-only subdomain enumeration on evilcorp.com
-bbot -t evilcorp.com -f subdomain-enum -rf passive
```
-**Subdomains + port scan + web screenshots:**
+
+</details>
+
+### 5) Everything Everywhere All at Once
```bash
-# Port-scan every subdomain, screenshot every webpage, output to current directory
-bbot -t evilcorp.com -f subdomain-enum -m nmap gowitness -n my_scan -o .
+# everything everywhere all at once
+bbot -t evilcorp.com -p kitchen-sink
+
+# roughly equivalent to:
+bbot -t evilcorp.com -p subdomain-enum cloud-enum code-enum email-enum spider web-basic paramminer dirbust-light web-screenshots
```
-**Subdomains + basic web scan:**
+
+<details>
+<summary><b>kitchen-sink.yml</b></summary>
+
+```yaml
+description: Everything everywhere all at once
+
+include:
+ - subdomain-enum
+ - cloud-enum
+ - code-enum
+ - email-enum
+ - spider
+ - web-basic
+ - paramminer
+ - dirbust-light
+ - web-screenshots
+
+config:
+ modules:
+ baddns:
+ enable_references: True
+
+
-```bash
-# A basic web scan includes wappalyzer, robots.txt, and other non-intrusive web modules
-bbot -t evilcorp.com -f subdomain-enum web-basic
```
-**Web spider:**
+
+</details>
-```bash
-# Crawl www.evilcorp.com up to a max depth of 2, automatically extracting emails, secrets, etc.
-bbot -t www.evilcorp.com -m httpx robots badsecrets secretsdb -c web_spider_distance=2 web_spider_depth=2
+
+
+## How it Works
+
+Click the graph below to explore the [inner workings](https://www.blacklanternsecurity.com/bbot/how_it_works/) of BBOT.
+
+[![image](https://github.com/blacklanternsecurity/bbot/assets/20261699/e55ba6bd-6d97-48a6-96f0-e122acc23513)](https://www.blacklanternsecurity.com/bbot/how_it_works/)
+
+## BBOT as a Python Library
+
+#### Synchronous
+```python
+from bbot.scanner import Scanner
+
+scan = Scanner("evilcorp.com", presets=["subdomain-enum"])
+for event in scan.start():
+ print(event)
```
-**Everything everywhere all at once:**
+#### Asynchronous
+```python
+from bbot.scanner import Scanner
-```bash
-# Subdomains, emails, cloud buckets, port scan, basic web, web screenshots, nuclei
-bbot -t evilcorp.com -f subdomain-enum email-enum cloud-enum web-basic -m nmap gowitness nuclei --allow-deadly
+async def main():
+ scan = Scanner("evilcorp.com", presets=["subdomain-enum"])
+ async for event in scan.async_start():
+ print(event.json())
+
+import asyncio
+asyncio.run(main())
```
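+
+Any number of targets can be passed positionally, just like on the command line (a minimal sketch combining the two patterns above):
+
+```python
+from bbot.scanner import Scanner
+
+# multiple targets, same preset
+scan = Scanner("evilcorp.com", "evilcorp.org", presets=["subdomain-enum"])
+for event in scan.start():
+    print(event)
+```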
-
+
+<details>
+<summary><b>SEE: This Nefarious Discord Bot</b></summary>
+
+A [BBOT Discord Bot](https://www.blacklanternsecurity.com/bbot/dev/discord_bot/) that responds to the `/scan` command. Scan the internet from the comfort of your Discord server!
+
+![bbot-discord](https://github.com/blacklanternsecurity/bbot/assets/20261699/22b268a2-0dfd-4c2a-b7c5-548c0f2cc6f9)
+
+</details>
+
+## Feature Overview
+
+- Support for Multiple Targets
+- Web Screenshots
+- Suite of Offensive Web Modules
+- NLP-powered Subdomain Mutations
+- Native Output to Neo4j (and more)
+- Automatic dependency install with Ansible
+- Search entire attack surface with custom YARA rules
+- Python API + Developer Documentation
## Targets
BBOT accepts an unlimited number of targets via `-t`. You can specify targets either directly on the command line or in files (or both!):
```bash
-bbot -t evilcorp.com evilcorp.org 1.2.3.0/24 -f subdomain-enum
+bbot -t evilcorp.com evilcorp.org 1.2.3.0/24 -p subdomain-enum
```
Targets can be any of the following:
@@ -134,7 +298,7 @@ For more information, see [Targets](https://www.blacklanternsecurity.com/bbot/sc
Similar to Amass or Subfinder, BBOT supports API keys for various third-party services, such as SecurityTrails.
-The standard way to do this is to enter your API keys in **`~/.config/bbot/secrets.yml`**:
+The standard way to do this is to enter your API keys in **`~/.config/bbot/bbot.yml`**:
```yaml
modules:
shodan_dns:
@@ -152,43 +316,17 @@ If you like, you can also specify them on the command line:
bbot -c modules.virustotal.api_key=dd5f0eee2e4a99b71a939bded450b246
```
-For details, see [Configuration](https://www.blacklanternsecurity.com/bbot/scanning/configuration/)
-
-## BBOT as a Python Library
-
-BBOT exposes a Python API that allows it to be used for all kinds of fun and nefarious purposes, like a [Discord Bot](https://www.blacklanternsecurity.com/bbot/dev/#bbot-python-library-advanced-usage#discord-bot-example) that responds to the `/scan` command.
-
-![bbot-discord](https://github.com/blacklanternsecurity/bbot/assets/20261699/22b268a2-0dfd-4c2a-b7c5-548c0f2cc6f9)
-
-**Synchronous**
-
-```python
-from bbot.scanner import Scanner
-
-# any number of targets can be specified
-scan = Scanner("example.com", "scanme.nmap.org", modules=["nmap", "sslcert"])
-for event in scan.start():
- print(event.json())
-```
-
-**Asynchronous**
-
-```python
-from bbot.scanner import Scanner
-
-async def main():
- scan = Scanner("example.com", "scanme.nmap.org", modules=["nmap", "sslcert"])
- async for event in scan.async_start():
- print(event.json())
+For details, see [Configuration](https://www.blacklanternsecurity.com/bbot/scanning/configuration/).
-import asyncio
-asyncio.run(main())
-```
+## Complete Lists of Modules, Flags, etc.
-
+- Complete list of [Modules](https://www.blacklanternsecurity.com/bbot/modules/list_of_modules/).
+- Complete list of [Flags](https://www.blacklanternsecurity.com/bbot/scanning/#list-of-flags).
+- Complete list of [Presets](https://www.blacklanternsecurity.com/bbot/scanning/presets_list/).
+ - Complete list of [Global Config Options](https://www.blacklanternsecurity.com/bbot/scanning/configuration/#global-config-options).
+ - Complete list of [Module Config Options](https://www.blacklanternsecurity.com/bbot/scanning/configuration/#module-config-options).
-
-Documentation - Table of Contents
+## Documentation
- **User Manual**
@@ -198,6 +336,9 @@ asyncio.run(main())
- [Comparison to Other Tools](https://www.blacklanternsecurity.com/bbot/comparison)
- **Scanning**
- [Scanning Overview](https://www.blacklanternsecurity.com/bbot/scanning/)
+ - **Presets**
+ - [Overview](https://www.blacklanternsecurity.com/bbot/scanning/presets)
+ - [List of Presets](https://www.blacklanternsecurity.com/bbot/scanning/presets_list)
- [Events](https://www.blacklanternsecurity.com/bbot/scanning/events)
- [Output](https://www.blacklanternsecurity.com/bbot/scanning/output)
- [Tips and Tricks](https://www.blacklanternsecurity.com/bbot/scanning/tips_and_tricks)
@@ -207,31 +348,36 @@ asyncio.run(main())
- [List of Modules](https://www.blacklanternsecurity.com/bbot/modules/list_of_modules)
- [Nuclei](https://www.blacklanternsecurity.com/bbot/modules/nuclei)
- **Misc**
+ - [Contribution](https://www.blacklanternsecurity.com/bbot/contribution)
- [Release History](https://www.blacklanternsecurity.com/bbot/release_history)
- [Troubleshooting](https://www.blacklanternsecurity.com/bbot/troubleshooting)
- **Developer Manual**
- - [How to Write a Module](https://www.blacklanternsecurity.com/bbot/contribution)
- [Development Overview](https://www.blacklanternsecurity.com/bbot/dev/)
- - [Scanner](https://www.blacklanternsecurity.com/bbot/dev/scanner)
- - [Event](https://www.blacklanternsecurity.com/bbot/dev/event)
- - [Target](https://www.blacklanternsecurity.com/bbot/dev/target)
- - [BaseModule](https://www.blacklanternsecurity.com/bbot/dev/basemodule)
- - **Helpers**
- - [Overview](https://www.blacklanternsecurity.com/bbot/dev/helpers/)
- - [Command](https://www.blacklanternsecurity.com/bbot/dev/helpers/command)
- - [DNS](https://www.blacklanternsecurity.com/bbot/dev/helpers/dns)
- - [Interactsh](https://www.blacklanternsecurity.com/bbot/dev/helpers/interactsh)
- - [Miscellaneous](https://www.blacklanternsecurity.com/bbot/dev/helpers/misc)
- - [Web](https://www.blacklanternsecurity.com/bbot/dev/helpers/web)
- - [Word Cloud](https://www.blacklanternsecurity.com/bbot/dev/helpers/wordcloud)
+ - [BBOT Internal Architecture](https://www.blacklanternsecurity.com/bbot/dev/architecture)
+ - [How to Write a BBOT Module](https://www.blacklanternsecurity.com/bbot/dev/module_howto)
+ - [Unit Tests](https://www.blacklanternsecurity.com/bbot/dev/tests)
+ - [Discord Bot Example](https://www.blacklanternsecurity.com/bbot/dev/discord_bot)
+ - **Code Reference**
+ - [Scanner](https://www.blacklanternsecurity.com/bbot/dev/scanner)
+ - [Presets](https://www.blacklanternsecurity.com/bbot/dev/presets)
+ - [Event](https://www.blacklanternsecurity.com/bbot/dev/event)
+ - [Target](https://www.blacklanternsecurity.com/bbot/dev/target)
+ - [BaseModule](https://www.blacklanternsecurity.com/bbot/dev/basemodule)
+ - [BBOTCore](https://www.blacklanternsecurity.com/bbot/dev/core)
+ - [Engine](https://www.blacklanternsecurity.com/bbot/dev/engine)
+ - **Helpers**
+ - [Overview](https://www.blacklanternsecurity.com/bbot/dev/helpers/)
+ - [Command](https://www.blacklanternsecurity.com/bbot/dev/helpers/command)
+ - [DNS](https://www.blacklanternsecurity.com/bbot/dev/helpers/dns)
+ - [Interactsh](https://www.blacklanternsecurity.com/bbot/dev/helpers/interactsh)
+ - [Miscellaneous](https://www.blacklanternsecurity.com/bbot/dev/helpers/misc)
+ - [Web](https://www.blacklanternsecurity.com/bbot/dev/helpers/web)
+ - [Word Cloud](https://www.blacklanternsecurity.com/bbot/dev/helpers/wordcloud)
-
-
-
-Contribution
+## Contribution
-BBOT is constantly being improved by the community. Every day it grows more powerful!
+Some of the best BBOT modules were written by the community. BBOT is constantly being improved; every day it grows more powerful!
We welcome contributions. Not just code, but ideas too! If you have an idea for a new feature, please let us know in [Discussions](https://github.com/blacklanternsecurity/bbot/discussions). If you want to get your hands dirty, see [Contribution](https://www.blacklanternsecurity.com/bbot/contribution/). There you can find setup instructions and a simple tutorial on how to write a BBOT module. We also have extensive [Developer Documentation](https://www.blacklanternsecurity.com/bbot/dev/).
@@ -243,72 +389,12 @@ Thanks to these amazing people for contributing to BBOT! :heart:
-Special thanks to the following people who made BBOT possible:
+Special thanks to:
- @TheTechromancer for creating [BBOT](https://github.com/blacklanternsecurity/bbot)
-- @liquidsec for his extensive work on BBOT's web hacking features, including [badsecrets](https://github.com/blacklanternsecurity/badsecrets)
+- @liquidsec for his extensive work on BBOT's web hacking features, including [badsecrets](https://github.com/blacklanternsecurity/badsecrets) and [baddns](https://github.com/blacklanternsecurity/baddns)
- Steve Micallef (@smicallef) for creating Spiderfoot
- @kerrymilan for his Neo4j and Ansible expertise
+- @domwhewell-sage for his family of badass code-looting modules
- @aconite33 and @amiremami for their ruthless testing
-- Aleksei Kornev (@alekseiko) for allowing us ownership of the bbot Pypi repository <3
-
-
-
-## Comparison to Other Tools
-
-BBOT consistently finds 20-50% more subdomains than other tools. The bigger the domain, the bigger the difference. To learn how this is possible, see [How It Works](https://www.blacklanternsecurity.com/bbot/how_it_works/).
-
-![subdomain-stats-ebay](https://github.com/blacklanternsecurity/bbot/assets/20261699/53e07e9f-50b6-4b70-9e83-297dbfbcb436)
-
-## BBOT Modules By Flag
-For a full list of modules, including the data types consumed and emitted by each one, see [List of Modules](https://www.blacklanternsecurity.com/bbot/modules/list_of_modules/).
-
-
-| Flag | # Modules | Description | Modules |
-|------------------|-------------|----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| safe | 84 | Non-intrusive, safe to run | affiliates, aggregate, ajaxpro, anubisdb, asn, azure_realm, azure_tenant, baddns, baddns_zone, badsecrets, bevigil, binaryedge, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, builtwith, c99, censys, certspotter, chaos, code_repository, columbus, credshed, crobat, crt, dehashed, digitorus, dnscaa, dnscommonsrv, dnsdumpster, docker_pull, dockerhub, emailformat, filedownload, fingerprintx, fullhunt, git, git_clone, github_codesearch, github_org, github_workflows, gitlab, gowitness, hackertarget, httpx, hunt, hunterio, iis_shortnames, internetdb, ip2location, ipstack, leakix, myssl, newsletters, ntlm, oauth, otx, passivetotal, pgp, postman, rapiddns, riddler, robots, secretsdb, securitytrails, shodan_dns, sitedossier, skymem, social, sslcert, subdomaincenter, sublist3r, threatminer, trufflehog, unstructured, urlscan, viewdns, virustotal, wappalyzer, wayback, zoomeye |
-| passive | 64 | Never connects to target systems | affiliates, aggregate, anubisdb, asn, azure_realm, azure_tenant, bevigil, binaryedge, bucket_file_enum, builtwith, c99, censys, certspotter, chaos, code_repository, columbus, credshed, crobat, crt, dehashed, digitorus, dnscaa, dnscommonsrv, dnsdumpster, docker_pull, dockerhub, emailformat, excavate, fullhunt, git_clone, github_codesearch, github_org, github_workflows, hackertarget, hunterio, internetdb, ip2location, ipneighbor, ipstack, leakix, massdns, myssl, otx, passivetotal, pgp, postman, rapiddns, riddler, securitytrails, shodan_dns, sitedossier, skymem, social, speculate, subdomaincenter, sublist3r, threatminer, trufflehog, unstructured, urlscan, viewdns, virustotal, wayback, zoomeye |
-| subdomain-enum | 46 | Enumerates subdomains | anubisdb, asn, azure_realm, azure_tenant, baddns_zone, bevigil, binaryedge, builtwith, c99, censys, certspotter, chaos, columbus, crt, digitorus, dnscaa, dnscommonsrv, dnsdumpster, fullhunt, github_codesearch, github_org, hackertarget, httpx, hunterio, internetdb, ipneighbor, leakix, massdns, myssl, oauth, otx, passivetotal, postman, rapiddns, riddler, securitytrails, shodan_dns, sitedossier, sslcert, subdomaincenter, subdomains, threatminer, urlscan, virustotal, wayback, zoomeye |
-| active | 43 | Makes active connections to target systems | ajaxpro, baddns, baddns_zone, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, dotnetnuke, ffuf, ffuf_shortnames, filedownload, fingerprintx, generic_ssrf, git, gitlab, gowitness, host_header, httpx, hunt, iis_shortnames, masscan, newsletters, nmap, ntlm, nuclei, oauth, paramminer_cookies, paramminer_getparams, paramminer_headers, robots, secretsdb, smuggler, sslcert, telerik, url_manipulation, vhost, wafw00f, wappalyzer, wpscan |
-| web-thorough | 29 | More advanced web scanning functionality | ajaxpro, azure_realm, badsecrets, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_firebase, bucket_google, bypass403, dastardly, dotnetnuke, ffuf_shortnames, filedownload, generic_ssrf, git, host_header, httpx, hunt, iis_shortnames, nmap, ntlm, oauth, robots, secretsdb, smuggler, sslcert, telerik, url_manipulation, wappalyzer |
-| aggressive | 21 | Generates a large amount of network traffic | bypass403, dastardly, dotnetnuke, ffuf, ffuf_shortnames, generic_ssrf, host_header, ipneighbor, masscan, massdns, nmap, nuclei, paramminer_cookies, paramminer_getparams, paramminer_headers, smuggler, telerik, url_manipulation, vhost, wafw00f, wpscan |
-| web-basic | 17 | Basic, non-intrusive web scan functionality | azure_realm, baddns, badsecrets, bucket_amazon, bucket_azure, bucket_firebase, bucket_google, filedownload, git, httpx, iis_shortnames, ntlm, oauth, robots, secretsdb, sslcert, wappalyzer |
-| cloud-enum | 12 | Enumerates cloud resources | azure_realm, azure_tenant, baddns, baddns_zone, bucket_amazon, bucket_azure, bucket_digitalocean, bucket_file_enum, bucket_firebase, bucket_google, httpx, oauth |
-| slow | 10 | May take a long time to complete | bucket_digitalocean, dastardly, docker_pull, fingerprintx, git_clone, paramminer_cookies, paramminer_getparams, paramminer_headers, smuggler, vhost |
-| affiliates | 8 | Discovers affiliated hostnames/domains | affiliates, azure_realm, azure_tenant, builtwith, oauth, sslcert, viewdns, zoomeye |
-| email-enum | 8 | Enumerates email addresses | dehashed, dnscaa, emailformat, emails, hunterio, pgp, skymem, sslcert |
-| deadly | 4 | Highly aggressive | dastardly, ffuf, nuclei, vhost |
-| portscan | 3 | Discovers open ports | internetdb, masscan, nmap |
-| web-paramminer | 3 | Discovers HTTP parameters through brute-force | paramminer_cookies, paramminer_getparams, paramminer_headers |
-| baddns | 2 | Runs all modules from the DNS auditing tool BadDNS | baddns, baddns_zone |
-| iis-shortnames | 2 | Scans for IIS Shortname vulnerability | ffuf_shortnames, iis_shortnames |
-| report | 2 | Generates a report at the end of the scan | affiliates, asn |
-| social-enum | 2 | Enumerates social media | httpx, social |
-| repo-enum | 1 | Enumerates code repositories | code_repository |
-| service-enum | 1 | Identifies protocols running on open ports | fingerprintx |
-| subdomain-hijack | 1 | Detects hijackable subdomains | baddns |
-| web-screenshots | 1 | Takes screenshots of web pages | gowitness |
-
-
-## BBOT Output Modules
-BBOT can save its data to TXT, CSV, JSON, and tons of other destinations including [Neo4j](https://www.blacklanternsecurity.com/bbot/scanning/output/#neo4j), [Splunk](https://www.blacklanternsecurity.com/bbot/scanning/output/#splunk), and [Discord](https://www.blacklanternsecurity.com/bbot/scanning/output/#discord-slack-teams). For instructions on how to use these, see [Output Modules](https://www.blacklanternsecurity.com/bbot/scanning/output).
-
-
-| Module | Type | Needs API Key | Description | Flags | Consumed Events | Produced Events |
-|-----------------|--------|-----------------|-----------------------------------------------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------|---------------------------|
-| asset_inventory | output | No | Merge hosts, open ports, technologies, findings, etc. into a single asset inventory CSV | | DNS_NAME, FINDING, HTTP_RESPONSE, IP_ADDRESS, OPEN_TCP_PORT, TECHNOLOGY, URL, VULNERABILITY, WAF | IP_ADDRESS, OPEN_TCP_PORT |
-| csv | output | No | Output to CSV | | * | |
-| discord | output | No | Message a Discord channel when certain events are encountered | | * | |
-| emails | output | No | Output any email addresses found belonging to the target domain | email-enum | EMAIL_ADDRESS | |
-| http | output | No | Send every event to a custom URL via a web request | | * | |
-| human | output | No | Output to text | | * | |
-| json | output | No | Output to Newline-Delimited JSON (NDJSON) | | * | |
-| neo4j | output | No | Output to Neo4j | | * | |
-| python | output | No | Output via Python API | | * | |
-| slack | output | No | Message a Slack channel when certain events are encountered | | * | |
-| splunk | output | No | Send every event to a splunk instance through HTTP Event Collector | | * | |
-| subdomains | output | No | Output only resolved, in-scope subdomains | subdomain-enum | DNS_NAME, DNS_NAME_UNRESOLVED | |
-| teams | output | No | Message a Teams channel when certain events are encountered | | * | |
-| web_report | output | No | Create a markdown report with web assets | | FINDING, TECHNOLOGY, URL, VHOST, VULNERABILITY | |
-| websocket | output | No | Output to websockets | | * | |
-
+- Aleksei Kornev (@alekseiko) for granting us ownership of the bbot PyPI repository <3
diff --git a/bbot/__init__.py b/bbot/__init__.py
index 1d95273e3..8746d8131 100644
--- a/bbot/__init__.py
+++ b/bbot/__init__.py
@@ -1,10 +1,4 @@
# version placeholder (replaced by poetry-dynamic-versioning)
-__version__ = "0.0.0"
+__version__ = "v0.0.0"
-# global app config
-from .core import configurator
-
-config = configurator.config
-
-# helpers
-from .core import helpers
+from .scanner import Scanner, Preset
diff --git a/bbot/agent/__init__.py b/bbot/agent/__init__.py
deleted file mode 100644
index d2361b7a3..000000000
--- a/bbot/agent/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-from .agent import Agent
diff --git a/bbot/agent/agent.py b/bbot/agent/agent.py
deleted file mode 100644
index 1c8debc1e..000000000
--- a/bbot/agent/agent.py
+++ /dev/null
@@ -1,204 +0,0 @@
-import json
-import asyncio
-import logging
-import traceback
-import websockets
-from omegaconf import OmegaConf
-
-from . import messages
-import bbot.core.errors
-from bbot.scanner import Scanner
-from bbot.scanner.dispatcher import Dispatcher
-from bbot.core.helpers.misc import urlparse, split_host_port
-from bbot.core.configurator.environ import prepare_environment
-
-log = logging.getLogger("bbot.core.agent")
-
-
-class Agent:
- def __init__(self, config):
- self.config = config
- prepare_environment(self.config)
- self.url = self.config.get("agent_url", "")
- self.parsed_url = urlparse(self.url)
- self.host, self.port = split_host_port(self.parsed_url.netloc)
- self.token = self.config.get("agent_token", "")
- self.scan = None
- self.task = None
- self._ws = None
- self._scan_lock = asyncio.Lock()
-
- self.dispatcher = Dispatcher()
- self.dispatcher.on_status = self.on_scan_status
- self.dispatcher.on_finish = self.on_scan_finish
-
- def setup(self):
- if not self.url:
- log.error(f"Must specify agent_url")
- return False
- if not self.token:
- log.error(f"Must specify agent_token")
- return False
- return True
-
- async def ws(self, rebuild=False):
- if self._ws is None or rebuild:
- kwargs = {"close_timeout": 0.5}
- if self.token:
- kwargs.update({"extra_headers": {"Authorization": f"Bearer {self.token}"}})
- verbs = ("Building", "Built")
- if rebuild:
- verbs = ("Rebuilding", "Rebuilt")
- url = f"{self.url}/control/"
- log.debug(f"{verbs[0]} websocket connection to {url}")
- while 1:
- try:
- self._ws = await websockets.connect(url, **kwargs)
- break
- except Exception as e:
- log.error(f'Failed to establish websockets connection to URL "{url}": {e}')
- log.trace(traceback.format_exc())
- await asyncio.sleep(1)
- log.debug(f"{verbs[1]} websocket connection to {url}")
- return self._ws
-
- async def start(self):
- rebuild = False
- while 1:
- ws = await self.ws(rebuild=rebuild)
- rebuild = False
- try:
- message = await ws.recv()
- log.debug(f"Got message: {message}")
- try:
- message = json.loads(message)
- message = messages.Message(**message)
-
- if message.command == "ping":
- if self.scan is None:
- await self.send({"conversation": str(message.conversation), "message_type": "pong"})
- continue
-
- command_type = getattr(messages, message.command, None)
- if command_type is None:
- log.warning(f'Invalid command: "{message.command}"')
- continue
-
- command_args = command_type(**message.arguments)
- command_fn = getattr(self, message.command)
- response = await self.err_handle(command_fn, **command_args.dict())
- log.info(str(response))
- await self.send({"conversation": str(message.conversation), "message": response})
-
- except json.decoder.JSONDecodeError as e:
- log.warning(f'Failed to decode message "{message}": {e}')
- log.trace(traceback.format_exc())
- continue
- except Exception as e:
- log.debug(f"Error receiving message: {e}")
- log.debug(traceback.format_exc())
- await asyncio.sleep(1)
- rebuild = True
-
- async def send(self, message):
- rebuild = False
- while 1:
- try:
- ws = await self.ws(rebuild=rebuild)
- j = json.dumps(message)
- log.debug(f"Sending message of length {len(message)}")
- await ws.send(j)
- rebuild = False
- break
- except Exception as e:
- log.warning(f"Error sending message: {e}, retrying")
- log.trace(traceback.format_exc())
- await asyncio.sleep(1)
- # rebuild = True
-
- async def start_scan(self, scan_id, name=None, targets=[], modules=[], output_modules=[], config={}):
- async with self._scan_lock:
- if self.scan is None:
- log.success(
- f"Starting scan with targets={targets}, modules={modules}, output_modules={output_modules}"
- )
- output_module_config = OmegaConf.create(
- {"output_modules": {"websocket": {"url": f"{self.url}/scan/{scan_id}/", "token": self.token}}}
- )
- config = OmegaConf.create(config)
- config = OmegaConf.merge(self.config, config, output_module_config)
- output_modules = list(set(output_modules + ["websocket"]))
- scan = Scanner(
- *targets,
- scan_id=scan_id,
- name=name,
- modules=modules,
- output_modules=output_modules,
- config=config,
- dispatcher=self.dispatcher,
- )
- self.task = asyncio.create_task(self._start_scan_task(scan))
-
- return {"success": f"Started scan", "scan_id": scan.id}
- else:
- msg = f"Scan {self.scan.id} already in progress"
- log.warning(msg)
- return {"error": msg, "scan_id": self.scan.id}
-
- async def _start_scan_task(self, scan):
- self.scan = scan
- try:
- await scan.async_start_without_generator()
- except bbot.core.errors.ScanError as e:
- log.error(f"Scan error: {e}")
- log.trace(traceback.format_exc())
- except Exception:
- log.critical(f"Encountered error: {traceback.format_exc()}")
- self.on_scan_status("FAILED", scan.id)
- finally:
- self.task = None
-
- async def stop_scan(self):
- log.warning("Stopping scan")
- try:
- async with self._scan_lock:
- if self.scan is None:
- msg = "Scan not in progress"
- log.warning(msg)
- return {"error": msg}
- scan_id = str(self.scan.id)
- self.scan.stop()
- msg = f"Stopped scan {scan_id}"
- log.warning(msg)
- self.scan = None
- return {"success": msg, "scan_id": scan_id}
- except Exception as e:
- log.warning(f"Error while stopping scan: {e}")
- log.trace(traceback.format_exc())
- finally:
- self.scan = None
- self.task = None
-
- async def scan_status(self):
- async with self._scan_lock:
- if self.scan is None:
- msg = "Scan not in progress"
- log.warning(msg)
- return {"error": msg}
- return {"success": "Polled scan", "scan_status": self.scan.status}
-
- async def on_scan_status(self, status, scan_id):
- await self.send({"message_type": "scan_status_change", "status": str(status), "scan_id": scan_id})
-
- async def on_scan_finish(self, scan):
- self.scan = None
- self.task = None
-
- async def err_handle(self, callback, *args, **kwargs):
- try:
- return await callback(*args, **kwargs)
- except Exception as e:
- msg = f"Error in {callback.__qualname__}(): {e}"
- log.error(msg)
- log.trace(traceback.format_exc())
- return {"error": msg}
diff --git a/bbot/agent/messages.py b/bbot/agent/messages.py
deleted file mode 100644
index 34fd2c15c..000000000
--- a/bbot/agent/messages.py
+++ /dev/null
@@ -1,29 +0,0 @@
-from uuid import UUID
-from typing import Optional
-from pydantic import BaseModel
-
-
-class Message(BaseModel):
- conversation: UUID
- command: str
- arguments: Optional[dict] = {}
-
-
-### COMMANDS ###
-
-
-class start_scan(BaseModel):
- scan_id: str
- targets: list
- modules: list
- output_modules: list = []
- config: dict = {}
- name: Optional[str] = None
-
-
-class stop_scan(BaseModel):
- pass
-
-
-class scan_status(BaseModel):
- pass
diff --git a/bbot/cli.py b/bbot/cli.py
index 9427c063f..6c9718fca 100755
--- a/bbot/cli.py
+++ b/bbot/cli.py
@@ -1,420 +1,291 @@
#!/usr/bin/env python3
-import os
-import re
import sys
-import asyncio
import logging
-import traceback
-from omegaconf import OmegaConf
-from contextlib import suppress
-
-# fix tee buffering
-sys.stdout.reconfigure(line_buffering=True)
-
-# logging
-from bbot.core.logger import get_log_level, toggle_log_level
-
-import bbot.core.errors
+import multiprocessing
+from bbot.errors import *
from bbot import __version__
-from bbot.modules import module_loader
+from bbot.logger import log_to_stderr
from bbot.core.helpers.misc import chain_lists
-from bbot.core.configurator.args import parser
-from bbot.core.helpers.logger import log_to_stderr
-from bbot.core.configurator import ensure_config_files, check_cli_args, environ
-log = logging.getLogger("bbot.cli")
+if multiprocessing.current_process().name == "MainProcess":
+ silent = "-s" in sys.argv or "--silent" in sys.argv
+
+ if not silent:
+ ascii_art = rf""" [1;38;5;208m ______ [0m _____ ____ _______
+ [1;38;5;208m| ___ \[0m| __ \ / __ \__ __|
+ [1;38;5;208m| |___) [0m| |__) | | | | | |
+ [1;38;5;208m| ___ <[0m| __ <| | | | | |
+ [1;38;5;208m| |___) [0m| |__) | |__| | | |
+ [1;38;5;208m|______/[0m|_____/ \____/ |_|
+ [1;38;5;208mBIGHUGE[0m BLS OSINT TOOL {__version__}
+
+www.blacklanternsecurity.com/bbot
+"""
+ print(ascii_art, file=sys.stderr)
+ log_to_stderr(
+ "This is a pre-release of BBOT 2.0. If you upgraded from version 1, we recommend cleaning your old configs etc. before running this version!",
+ level="WARNING",
+ )
+ log_to_stderr(
+ "For details, see https://github.com/blacklanternsecurity/bbot/discussions/1540", level="WARNING"
+ )
+
+scan_name = ""
-log_level = get_log_level()
+async def _main():
-from . import config
+ import asyncio
+ import traceback
+ from contextlib import suppress
+ # fix tee buffering
+ sys.stdout.reconfigure(line_buffering=True)
-err = False
-scan_name = ""
+ log = logging.getLogger("bbot.cli")
+ from bbot.scanner import Scanner
+ from bbot.scanner.preset import Preset
-async def _main():
- global err
global scan_name
- environ.cli_execution = True
-
- # async def monitor_tasks():
- # in_row = 0
- # while 1:
- # try:
- # print('looooping')
- # tasks = asyncio.all_tasks()
- # current_task = asyncio.current_task()
- # if len(tasks) == 1 and list(tasks)[0] == current_task:
- # print('no tasks')
- # in_row += 1
- # else:
- # in_row = 0
- # for t in tasks:
- # print(t)
- # if in_row > 2:
- # break
- # await asyncio.sleep(1)
- # except BaseException as e:
- # print(traceback.format_exc())
- # with suppress(BaseException):
- # await asyncio.sleep(.1)
-
- # monitor_tasks_task = asyncio.create_task(monitor_tasks())
-
- ensure_config_files()
try:
+
+ # start by creating a default scan preset
+ preset = Preset(_log=True, name="bbot_cli_main")
+ # parse command line arguments and merge into preset
+ try:
+ preset.parse_args()
+ except BBOTArgumentError as e:
+ log_to_stderr(str(e), level="WARNING")
+ log.trace(traceback.format_exc())
+ return
+ # ensure arguments (-c config options etc.) are valid
+ options = preset.args.parsed
+
+ # print help if no arguments
if len(sys.argv) == 1:
- parser.print_help()
+ print(preset.args.parser.format_help())
sys.exit(1)
-
- options = parser.parse_args()
- check_cli_args()
+ return
# --version
if options.version:
- log.stdout(__version__)
+ print(__version__)
sys.exit(0)
return
- # --current-config
- if options.current_config:
- log.stdout(f"{OmegaConf.to_yaml(config)}")
+ # --list-presets
+ if options.list_presets:
+ print("")
+ print("### PRESETS ###")
+ print("")
+ for row in preset.presets_table().splitlines():
+ print(row)
+ return
+
+ # if we're listing modules or their options
+ if options.list_modules or options.list_module_options:
+
+ # if no modules or flags are specified, enable everything
+ if not (options.modules or options.output_modules or options.flags):
+ for module, preloaded in preset.module_loader.preloaded().items():
+ module_type = preloaded.get("type", "scan")
+ preset.add_module(module, module_type=module_type)
+
+ if options.modules or options.output_modules or options.flags:
+ preset._default_output_modules = options.output_modules
+ preset._default_internal_modules = []
+
+ preset.bake()
+
+ # --list-modules
+ if options.list_modules:
+ print("")
+ print("### MODULES ###")
+ print("")
+ for row in preset.module_loader.modules_table(preset.modules).splitlines():
+ print(row)
+ return
+
+ # --list-module-options
+ if options.list_module_options:
+ print("")
+ print("### MODULE OPTIONS ###")
+ print("")
+ for row in preset.module_loader.modules_options_table(preset.modules).splitlines():
+ print(row)
+ return
+
+ # --list-flags
+ if options.list_flags:
+ flags = preset.flags if preset.flags else None
+ print("")
+ print("### FLAGS ###")
+ print("")
+ for row in preset.module_loader.flags_table(flags=flags).splitlines():
+ print(row)
+ return
+
+ try:
+ scan = Scanner(preset=preset)
+ except (PresetAbortError, ValidationError) as e:
+ log.warning(str(e))
+ return
+
+ deadly_modules = [
+ m for m in scan.preset.scan_modules if "deadly" in preset.preloaded_module(m).get("flags", [])
+ ]
+ if deadly_modules and not options.allow_deadly:
+ log.hugewarning(f"You enabled the following deadly modules: {','.join(deadly_modules)}")
+ log.hugewarning(f"Deadly modules are highly intrusive")
+ log.hugewarning(f"Please specify --allow-deadly to continue")
+ return False
+
+ # --current-preset
+ if options.current_preset:
+ print(scan.preset.to_yaml())
sys.exit(0)
return
- if options.agent_mode:
- from bbot.agent import Agent
-
- agent = Agent(config)
- success = agent.setup()
- if success:
- await agent.start()
-
- else:
- from bbot.scanner import Scanner
-
- try:
- output_modules = set(options.output_modules)
- module_filtering = False
- if (options.list_modules or options.help_all) and not any([options.flags, options.modules]):
- module_filtering = True
- modules = set(module_loader.preloaded(type="scan"))
- else:
- modules = set(options.modules)
- # enable modules by flags
- for m, c in module_loader.preloaded().items():
- module_type = c.get("type", "scan")
- if m not in modules:
- flags = c.get("flags", [])
- if "deadly" in flags:
- continue
- for f in options.flags:
- if f in flags:
- log.verbose(f'Enabling {m} because it has flag "{f}"')
- if module_type == "output":
- output_modules.add(m)
- else:
- modules.add(m)
-
- default_output_modules = ["human", "json", "csv"]
-
- # Make a list of the modules which can be output to the console
- consoleable_output_modules = [
- k for k, v in module_loader.preloaded(type="output").items() if "console" in v["config"]
- ]
-
- # if none of the output modules provided on the command line are consoleable, don't turn off the defaults. Instead, just add the one specified to the defaults.
- if not any(o in consoleable_output_modules for o in output_modules):
- output_modules.update(default_output_modules)
-
- scanner = Scanner(
- *options.targets,
- modules=list(modules),
- output_modules=list(output_modules),
- output_dir=options.output_dir,
- config=config,
- name=options.name,
- whitelist=options.whitelist,
- blacklist=options.blacklist,
- strict_scope=options.strict_scope,
- force_start=options.force,
- )
-
- if options.install_all_deps:
- all_modules = list(module_loader.preloaded())
- scanner.helpers.depsinstaller.force_deps = True
- succeeded, failed = await scanner.helpers.depsinstaller.install(*all_modules)
- log.info("Finished installing module dependencies")
- return False if failed else True
-
- scan_name = str(scanner.name)
-
- # enable modules by dependency
- # this is only a basic surface-level check
- # todo: recursive dependency graph with networkx or topological sort?
- all_modules = list(set(scanner._scan_modules + scanner._internal_modules + scanner._output_modules))
- while 1:
- changed = False
- dep_choices = module_loader.recommend_dependencies(all_modules)
- if not dep_choices:
- break
- for event_type, deps in dep_choices.items():
- if event_type in ("*", "all"):
- continue
- # skip resolving dependency if a target provides the missing type
- if any(e.type == event_type for e in scanner.target.events):
- continue
- required_by = deps.get("required_by", [])
- recommended = deps.get("recommended", [])
- if not recommended:
- log.hugewarning(
- f"{len(required_by):,} modules ({','.join(required_by)}) rely on {event_type} but no modules produce it"
- )
- elif len(recommended) == 1:
- log.verbose(
- f"Enabling {next(iter(recommended))} because {len(required_by):,} modules ({','.join(required_by)}) rely on it for {event_type}"
- )
- all_modules = list(set(all_modules + list(recommended)))
- scanner._scan_modules = list(set(scanner._scan_modules + list(recommended)))
- changed = True
- else:
- log.hugewarning(
- f"{len(required_by):,} modules ({','.join(required_by)}) rely on {event_type} but no enabled module produces it"
- )
- log.hugewarning(
- f"Recommend enabling one or more of the following modules which produce {event_type}:"
+ # --current-preset-full
+ if options.current_preset_full:
+ print(scan.preset.to_yaml(full_config=True))
+ sys.exit(0)
+ return
+
+ # --install-all-deps
+ if options.install_all_deps:
+ all_modules = list(preset.module_loader.preloaded())
+ scan.helpers.depsinstaller.force_deps = True
+ succeeded, failed = await scan.helpers.depsinstaller.install(*all_modules)
+ log.info("Finished installing module dependencies")
+ return False if failed else True
+
+ scan_name = str(scan.name)
+
+ log.verbose("")
+ log.verbose("### MODULES ENABLED ###")
+ log.verbose("")
+ for row in scan.preset.module_loader.modules_table(scan.preset.modules).splitlines():
+ log.verbose(row)
+
+ scan.helpers.word_cloud.load()
+ await scan._prep()
+
+ if not options.dry_run:
+ log.trace(f"Command: {' '.join(sys.argv)}")
+
+ if sys.stdin.isatty():
+
+ # warn if any targets belong directly to a cloud provider
+ for event in scan.target.events:
+ if event.type == "DNS_NAME":
+ cloudcheck_result = scan.helpers.cloudcheck(event.host)
+ if cloudcheck_result:
+ scan.hugewarning(
+ f'YOUR TARGET CONTAINS A CLOUD DOMAIN: "{event.host}". You\'re in for a wild ride!'
)
- for m in recommended:
- log.warning(f" - {m}")
- if not changed:
- break
-
- # required flags
- modules = set(scanner._scan_modules)
- for m in scanner._scan_modules:
- flags = module_loader._preloaded.get(m, {}).get("flags", [])
- if not all(f in flags for f in options.require_flags):
- log.verbose(
- f"Removing {m} because it does not have the required flags: {'+'.join(options.require_flags)}"
- )
- with suppress(KeyError):
- modules.remove(m)
-
- # excluded flags
- for m in scanner._scan_modules:
- flags = module_loader._preloaded.get(m, {}).get("flags", [])
- if any(f in flags for f in options.exclude_flags):
- log.verbose(f"Removing {m} because of excluded flag: {','.join(options.exclude_flags)}")
- with suppress(KeyError):
- modules.remove(m)
-
- # excluded modules
- for m in options.exclude_modules:
- if m in modules:
- log.verbose(f"Removing {m} because it is excluded")
- with suppress(KeyError):
- modules.remove(m)
- scanner._scan_modules = list(modules)
-
- log_fn = log.info
- if options.list_modules or options.help_all:
- log_fn = log.stdout
-
- help_modules = list(modules)
- if module_filtering:
- help_modules = None
-
- if options.help_all:
- log_fn(parser.format_help())
-
- if options.list_flags:
- log.stdout("")
- log.stdout("### FLAGS ###")
- log.stdout("")
- for row in module_loader.flags_table(flags=options.flags).splitlines():
- log.stdout(row)
- return
-
- log_fn("")
- log_fn("### MODULES ###")
- log_fn("")
- for row in module_loader.modules_table(modules=help_modules).splitlines():
- log_fn(row)
-
- if options.help_all:
- log_fn("")
- log_fn("### MODULE OPTIONS ###")
- log_fn("")
- for row in module_loader.modules_options_table(modules=help_modules).splitlines():
- log_fn(row)
-
- if options.list_modules or options.list_flags or options.help_all:
- return
-
- module_list = module_loader.filter_modules(modules=modules)
- deadly_modules = []
- active_modules = []
- active_aggressive_modules = []
- slow_modules = []
- for m in module_list:
- if m[0] in scanner._scan_modules:
- if "deadly" in m[-1]["flags"]:
- deadly_modules.append(m[0])
- if "active" in m[-1]["flags"]:
- active_modules.append(m[0])
- if "aggressive" in m[-1]["flags"]:
- active_aggressive_modules.append(m[0])
- if "slow" in m[-1]["flags"]:
- slow_modules.append(m[0])
- if scanner._scan_modules:
- if deadly_modules and not options.allow_deadly:
- log.hugewarning(f"You enabled the following deadly modules: {','.join(deadly_modules)}")
- log.hugewarning(f"Deadly modules are highly intrusive")
- log.hugewarning(f"Please specify --allow-deadly to continue")
- return False
- if active_modules:
- if active_modules:
- if active_aggressive_modules:
- log.hugewarning(
- "This is an (aggressive) active scan! Intrusive connections will be made to target"
- )
- else:
- log.hugewarning(
- "This is a (safe) active scan. Non-intrusive connections will be made to target"
- )
+
+ if not options.yes:
+ log.hugesuccess(f"Scan ready. Press enter to execute {scan.name}")
+ input()
+
+ import os
+ import re
+ import fcntl
+ from bbot.core.helpers.misc import smart_decode
+
+ def handle_keyboard_input(keyboard_input):
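+                # "kill module1,module2" kills the named modules mid-scan;
+                # any other input toggles the log level and prints module status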
+                kill_regex = re.compile(r"kill (?P<modules>[a-z0-9_ ,]+)")
+ if keyboard_input:
+ log.verbose(f'Got keyboard input: "{keyboard_input}"')
+ kill_match = kill_regex.match(keyboard_input)
+ if kill_match:
+ modules = kill_match.group("modules")
+ if modules:
+ modules = chain_lists(modules)
+ for module in modules:
+ if module in scan.modules:
+ log.hugewarning(f'Killing module: "{module}"')
+ scan.kill_module(module, message="killed by user")
+ else:
+ log.warning(f'Invalid module: "{module}"')
else:
- log.hugeinfo("This is a passive scan. No connections will be made to target")
- if slow_modules:
- log.warning(
- f"You have enabled the following slow modules: {','.join(slow_modules)}. Scan may take a while"
- )
-
- scanner.helpers.word_cloud.load()
-
- await scanner._prep()
-
- if not options.dry_run:
- log.trace(f"Command: {' '.join(sys.argv)}")
-
- # if we're on the terminal, enable keyboard interaction
- if sys.stdin.isatty():
-
- # warn if any targets belong directly to a cloud provider
- for event in scanner.target.events:
- if event.type == "DNS_NAME":
- provider, _, _ = scanner.helpers.cloudcheck(event.host)
- if provider:
- scanner.hugewarning(
- f'YOUR TARGET CONTAINS A CLOUD DOMAIN: "{event.host}". You\'re in for a wild ride!'
- )
-
- import fcntl
- from bbot.core.helpers.misc import smart_decode
-
- if not options.agent_mode and not options.yes:
- log.hugesuccess(f"Scan ready. Press enter to execute {scanner.name}")
- input()
-
- def handle_keyboard_input(keyboard_input):
-                kill_regex = re.compile(r"kill (?P<modules>[a-z0-9_ ,]+)")
- if keyboard_input:
- log.verbose(f'Got keyboard input: "{keyboard_input}"')
- kill_match = kill_regex.match(keyboard_input)
- if kill_match:
- modules = kill_match.group("modules")
- if modules:
- modules = chain_lists(modules)
- for module in modules:
- if module in scanner.modules:
- log.hugewarning(f'Killing module: "{module}"')
- scanner.manager.kill_module(module, message="killed by user")
- else:
- log.warning(f'Invalid module: "{module}"')
- else:
- toggle_log_level(logger=log)
- scanner.manager.modules_status(_log=True)
-
- reader = asyncio.StreamReader()
- protocol = asyncio.StreamReaderProtocol(reader)
- await asyncio.get_event_loop().connect_read_pipe(lambda: protocol, sys.stdin)
-
- # set stdout and stderr to blocking mode
- # this is needed to prevent BlockingIOErrors in logging etc.
- fds = [sys.stdout.fileno(), sys.stderr.fileno()]
- for fd in fds:
- flags = fcntl.fcntl(fd, fcntl.F_GETFL)
- fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
-
- async def akeyboard_listen():
+ scan.preset.core.logger.toggle_log_level(logger=log)
+ scan.modules_status(_log=True)
+
+ reader = asyncio.StreamReader()
+ protocol = asyncio.StreamReaderProtocol(reader)
+ await asyncio.get_event_loop().connect_read_pipe(lambda: protocol, sys.stdin)
+
+ # set stdout and stderr to blocking mode
+ # this is needed to prevent BlockingIOErrors in logging etc.
+ fds = [sys.stdout.fileno(), sys.stderr.fileno()]
+ for fd in fds:
+ flags = fcntl.fcntl(fd, fcntl.F_GETFL)
+ fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
+
+ async def akeyboard_listen():
+ try:
+ allowed_errors = 10
+ while 1:
+ keyboard_input = None
try:
+ keyboard_input = smart_decode((await reader.readline()).strip())
allowed_errors = 10
- while 1:
- keyboard_input = None
- try:
- keyboard_input = smart_decode((await reader.readline()).strip())
- allowed_errors = 10
- except Exception as e:
- log_to_stderr(f"Error in keyboard listen loop: {e}", level="TRACE")
- log_to_stderr(traceback.format_exc(), level="TRACE")
- allowed_errors -= 1
- if keyboard_input is not None:
- handle_keyboard_input(keyboard_input)
- if allowed_errors <= 0:
- break
except Exception as e:
- log_to_stderr(f"Error in keyboard listen task: {e}", level="ERROR")
+ log_to_stderr(f"Error in keyboard listen loop: {e}", level="TRACE")
log_to_stderr(traceback.format_exc(), level="TRACE")
+ allowed_errors -= 1
+ if keyboard_input is not None:
+ handle_keyboard_input(keyboard_input)
+ if allowed_errors <= 0:
+ break
+ except Exception as e:
+ log_to_stderr(f"Error in keyboard listen task: {e}", level="ERROR")
+ log_to_stderr(traceback.format_exc(), level="TRACE")
- asyncio.create_task(akeyboard_listen())
+ asyncio.create_task(akeyboard_listen())
- await scanner.async_start_without_generator()
+ await scan.async_start_without_generator()
- except bbot.core.errors.ScanError as e:
- log_to_stderr(str(e), level="ERROR")
- except Exception:
- raise
+ return True
- except bbot.core.errors.BBOTError as e:
- log_to_stderr(f"{e} (--debug for details)", level="ERROR")
- if log_level <= logging.DEBUG:
- log_to_stderr(traceback.format_exc(), level="DEBUG")
- err = True
-
- except Exception:
- log_to_stderr(f"Encountered unknown error: {traceback.format_exc()}", level="ERROR")
- err = True
+ except BBOTError as e:
+ log.error(str(e))
+ log.trace(traceback.format_exc())
finally:
# save word cloud
with suppress(BaseException):
- save_success, filename = scanner.helpers.word_cloud.save()
+ save_success, filename = scan.helpers.word_cloud.save()
if save_success:
- log_to_stderr(f"Saved word cloud ({len(scanner.helpers.word_cloud):,} words) to {filename}")
+ log_to_stderr(f"Saved word cloud ({len(scan.helpers.word_cloud):,} words) to {filename}")
# remove output directory if empty
with suppress(BaseException):
- scanner.home.rmdir()
- if err:
- os._exit(1)
+ scan.home.rmdir()
def main():
+ import asyncio
+ import traceback
+ from bbot.core import CORE
+
global scan_name
try:
asyncio.run(_main())
except asyncio.CancelledError:
- if get_log_level() <= logging.DEBUG:
+ if CORE.logger.log_level <= logging.DEBUG:
log_to_stderr(traceback.format_exc(), level="DEBUG")
except KeyboardInterrupt:
msg = "Interrupted"
if scan_name:
msg = f"You killed {scan_name}"
log_to_stderr(msg, level="WARNING")
- if get_log_level() <= logging.DEBUG:
+ if CORE.logger.log_level <= logging.DEBUG:
log_to_stderr(traceback.format_exc(), level="DEBUG")
exit(1)
diff --git a/bbot/core/__init__.py b/bbot/core/__init__.py
index 52cf06cc5..6cfaecf0f 100644
--- a/bbot/core/__init__.py
+++ b/bbot/core/__init__.py
@@ -1,4 +1,3 @@
-# logging
-from .logger import init_logging
+from .core import BBOTCore
-init_logging()
+CORE = BBOTCore()
diff --git a/bbot/core/config/__init__.py b/bbot/core/config/__init__.py
new file mode 100644
index 000000000..c36d91f48
--- /dev/null
+++ b/bbot/core/config/__init__.py
@@ -0,0 +1,12 @@
+import sys
+import multiprocessing as mp
+
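+# set_start_method() raises if a start method was already chosen, hence the try/except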
+try:
+ mp.set_start_method("spawn")
+except Exception:
+ start_method = mp.get_start_method()
+ if start_method != "spawn":
+ print(
+ f"[WARN] Multiprocessing spawn method is set to {start_method}. This may negatively affect performance.",
+ file=sys.stderr,
+ )
diff --git a/bbot/core/config/files.py b/bbot/core/config/files.py
new file mode 100644
index 000000000..c66e92116
--- /dev/null
+++ b/bbot/core/config/files.py
@@ -0,0 +1,42 @@
+import sys
+from pathlib import Path
+from omegaconf import OmegaConf
+
+from ...logger import log_to_stderr
+from ...errors import ConfigLoadError
+
+
+bbot_code_dir = Path(__file__).parent.parent.parent
+
+
+class BBOTConfigFiles:
+
+ config_dir = (Path.home() / ".config" / "bbot").resolve()
+ defaults_filename = (bbot_code_dir / "defaults.yml").resolve()
+ config_filename = (config_dir / "bbot.yml").resolve()
+ secrets_filename = (config_dir / "secrets.yml").resolve()
+
+ def __init__(self, core):
+ self.core = core
+
+ def _get_config(self, filename, name="config"):
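+        # a missing file yields an empty config; a file that exists but fails to parse raises ConfigLoadError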
+ filename = Path(filename).resolve()
+ try:
+ conf = OmegaConf.load(str(filename))
+ cli_silent = any(x in sys.argv for x in ("-s", "--silent"))
+ if __name__ == "__main__" and not cli_silent:
+ log_to_stderr(f"Loaded {name} from {filename}")
+ return conf
+ except Exception as e:
+ if filename.exists():
+ raise ConfigLoadError(f"Error parsing config at {filename}:\n\n{e}")
+ return OmegaConf.create()
+
+ def get_custom_config(self):
+ return OmegaConf.merge(
+ self._get_config(self.config_filename, name="config"),
+ self._get_config(self.secrets_filename, name="secrets"),
+ )
+
+ def get_default_config(self):
+ return self._get_config(self.defaults_filename, name="defaults")
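
`get_custom_config()` relies on OmegaConf's merge semantics, where later arguments override earlier ones, so values from `secrets.yml` win over `bbot.yml`. A minimal sketch of that precedence (keys are illustrative):

from omegaconf import OmegaConf

# stand-ins for bbot.yml and secrets.yml
base = OmegaConf.create({"home": "~/.bbot", "modules": {"shodan": {"api_key": ""}}})
secrets = OmegaConf.create({"modules": {"shodan": {"api_key": "deadbeef"}}})

# later arguments take precedence on conflicting keys
merged = OmegaConf.merge(base, secrets)
print(merged.modules.shodan.api_key)  # deadbeef
print(merged.home)                    # ~/.bbot
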
diff --git a/bbot/core/config/logger.py b/bbot/core/config/logger.py
new file mode 100644
index 000000000..6a213d42d
--- /dev/null
+++ b/bbot/core/config/logger.py
@@ -0,0 +1,246 @@
+import sys
+import atexit
+import logging
+from copy import copy
+import multiprocessing
+import logging.handlers
+from pathlib import Path
+
+from ..helpers.misc import mkdir, error_and_exit
+from ...logger import colorize, loglevel_mapping
+
+
+debug_format = logging.Formatter("%(asctime)s [%(levelname)s] %(name)s %(filename)s:%(lineno)s %(message)s")
+
+
+class ColoredFormatter(logging.Formatter):
+ """
+ Pretty colors for terminal
+ """
+
+ formatter = logging.Formatter("%(levelname)s %(message)s")
+ module_formatter = logging.Formatter("%(levelname)s %(name)s: %(message)s")
+
+ def format(self, record):
+ colored_record = copy(record)
+ levelname = colored_record.levelname
+ levelshort = loglevel_mapping.get(levelname, "INFO")
+ colored_record.levelname = colorize(f"[{levelshort}]", level=levelname)
+ if levelname == "CRITICAL" or levelname.startswith("HUGE"):
+ colored_record.msg = colorize(colored_record.msg, level=levelname)
+ # remove name
+ if colored_record.name.startswith("bbot.modules."):
+ colored_record.name = colored_record.name.split("bbot.modules.")[-1]
+ return self.module_formatter.format(colored_record)
+ return self.formatter.format(colored_record)
+
+
+class BBOTLogger:
+ """
+ The main BBOT logger.
+
+ The job of this class is to manage the different log handlers in BBOT,
+    allow adding new log handlers, and make it easy to switch log levels on the fly.
+ """
+
+ def __init__(self, core):
+ # custom logging levels
+ if getattr(logging, "HUGEWARNING", None) is None:
+ self.addLoggingLevel("TRACE", 49)
+ self.addLoggingLevel("HUGEWARNING", 31)
+ self.addLoggingLevel("HUGESUCCESS", 26)
+ self.addLoggingLevel("SUCCESS", 25)
+ self.addLoggingLevel("HUGEINFO", 21)
+ self.addLoggingLevel("HUGEVERBOSE", 16)
+ self.addLoggingLevel("VERBOSE", 15)
+ self.verbosity_levels_toggle = [logging.INFO, logging.VERBOSE, logging.DEBUG]
+
+ self._loggers = None
+ self._log_handlers = None
+ self._log_level = None
+ self.root_logger = logging.getLogger()
+ self.core_logger = logging.getLogger("bbot")
+ self.core = core
+
+ self.listener = None
+
+ self.process_name = multiprocessing.current_process().name
+ if self.process_name == "MainProcess":
+ self.queue = multiprocessing.Queue()
+ self.setup_queue_handler()
+ # Start the QueueListener
+ self.listener = logging.handlers.QueueListener(self.queue, *self.log_handlers.values())
+ self.listener.start()
+ atexit.register(self.listener.stop)
+
+ self.log_level = logging.INFO
+
+ def setup_queue_handler(self, logging_queue=None, log_level=logging.DEBUG):
+ if logging_queue is None:
+ logging_queue = self.queue
+ else:
+ self.queue = logging_queue
+ self.queue_handler = logging.handlers.QueueHandler(logging_queue)
+
+ self.root_logger.addHandler(self.queue_handler)
+
+ self.core_logger.setLevel(log_level)
+ # disable asyncio logging for child processes
+ if self.process_name != "MainProcess":
+ logging.getLogger("asyncio").setLevel(logging.ERROR)
+
+ def addLoggingLevel(self, levelName, levelNum, methodName=None):
+ """
+ Comprehensively adds a new logging level to the `logging` module and the
+ currently configured logging class.
+
+ `levelName` becomes an attribute of the `logging` module with the value
+ `levelNum`. `methodName` becomes a convenience method for both `logging`
+ itself and the class returned by `logging.getLoggerClass()` (usually just
+ `logging.Logger`). If `methodName` is not specified, `levelName.lower()` is
+ used.
+
+        To avoid accidentally clobbering existing attributes, this method will
+        raise an `AttributeError` if the level name is already an attribute of the
+        `logging` module or if the method name is already present.
+
+ Example
+ -------
+ >>> addLoggingLevel('TRACE', logging.DEBUG - 5)
+ >>> logging.getLogger(__name__).setLevel('TRACE')
+ >>> logging.getLogger(__name__).trace('that worked')
+ >>> logging.trace('so did this')
+ >>> logging.TRACE
+ 5
+
+ """
+ if not methodName:
+ methodName = levelName.lower()
+
+ if hasattr(logging, levelName):
+ raise AttributeError(f"{levelName} already defined in logging module")
+ if hasattr(logging, methodName):
+ raise AttributeError(f"{methodName} already defined in logging module")
+ if hasattr(logging.getLoggerClass(), methodName):
+ raise AttributeError(f"{methodName} already defined in logger class")
+
+ # This method was inspired by the answers to Stack Overflow post
+ # http://stackoverflow.com/q/2183233/2988730, especially
+ # http://stackoverflow.com/a/13638084/2988730
+ def logForLevel(self, message, *args, **kwargs):
+ if self.isEnabledFor(levelNum):
+ self._log(levelNum, message, args, **kwargs)
+
+ def logToRoot(message, *args, **kwargs):
+ logging.log(levelNum, message, *args, **kwargs)
+
+ logging.addLevelName(levelNum, levelName)
+ setattr(logging, levelName, levelNum)
+ setattr(logging.getLoggerClass(), methodName, logForLevel)
+ setattr(logging, methodName, logToRoot)
+
+ @property
+ def loggers(self):
+ if self._loggers is None:
+ self._loggers = [
+ logging.getLogger("bbot"),
+ logging.getLogger("asyncio"),
+ ]
+ return self._loggers
+
+ def add_log_handler(self, handler, formatter=None):
+ if self.listener is None:
+ return
+ if handler.formatter is None:
+ handler.setFormatter(debug_format)
+ if handler not in self.listener.handlers:
+ self.listener.handlers = self.listener.handlers + (handler,)
+
+ def remove_log_handler(self, handler):
+ if self.listener is None:
+ return
+ if handler in self.listener.handlers:
+ new_handlers = list(self.listener.handlers)
+ new_handlers.remove(handler)
+ self.listener.handlers = tuple(new_handlers)
+
+ def include_logger(self, logger):
+ if logger not in self.loggers:
+ self.loggers.append(logger)
+ if self.log_level is not None:
+ logger.setLevel(self.log_level)
+ for handler in self.log_handlers.values():
+ self.add_log_handler(handler)
+
+ def stderr_filter(self, record):
+ if record.levelno == logging.TRACE and self.log_level > logging.DEBUG:
+ return False
+ if record.levelno < self.log_level:
+ return False
+ return True
+
+ @property
+ def log_handlers(self):
+ if self._log_handlers is None:
+ log_dir = Path(self.core.home) / "logs"
+ if not mkdir(log_dir, raise_error=False):
+                error_and_exit(f"Failed to create or write to BBOT logs directory ({log_dir})")
+
+ # Main log file
+ main_handler = logging.handlers.TimedRotatingFileHandler(
+ f"{log_dir}/bbot.log", when="d", interval=1, backupCount=14
+ )
+
+ # Separate log file for debugging
+ debug_handler = logging.handlers.TimedRotatingFileHandler(
+ f"{log_dir}/bbot.debug.log", when="d", interval=1, backupCount=14
+ )
+
+ # Log to stderr
+ stderr_handler = logging.StreamHandler(sys.stderr)
+ stderr_handler.addFilter(self.stderr_filter)
+ # log to files
+ debug_handler.addFilter(lambda x: x.levelno == logging.TRACE or (x.levelno < logging.VERBOSE))
+ main_handler.addFilter(lambda x: x.levelno != logging.TRACE and x.levelno >= logging.VERBOSE)
+
+ # Set log format
+ debug_handler.setFormatter(debug_format)
+ main_handler.setFormatter(debug_format)
+ stderr_handler.setFormatter(ColoredFormatter("%(levelname)s %(name)s: %(message)s"))
+
+ self._log_handlers = {
+ "stderr": stderr_handler,
+ "file_debug": debug_handler,
+ "file_main": main_handler,
+ }
+ return self._log_handlers
+
+ @property
+ def log_level(self):
+ if self._log_level is None:
+ return logging.INFO
+ return self._log_level
+
+ @log_level.setter
+ def log_level(self, level):
+ self.set_log_level(level)
+
+ def set_log_level(self, level, logger=None):
+ if isinstance(level, str):
+ level = logging.getLevelName(level)
+ if logger is not None:
+ logger.hugeinfo(f"Setting log level to {logging.getLevelName(level)}")
+ self._log_level = level
+ for logger in self.loggers:
+ logger.setLevel(level)
+
+ def toggle_log_level(self, logger=None):
+ if self.log_level in self.verbosity_levels_toggle:
+ for i, level in enumerate(self.verbosity_levels_toggle):
+ if self.log_level == level:
+ self.set_log_level(
+ self.verbosity_levels_toggle[(i + 1) % len(self.verbosity_levels_toggle)], logger=logger
+ )
+ break
+ else:
+ self.set_log_level(self.verbosity_levels_toggle[0], logger=logger)
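
`BBOTLogger` is built on the stdlib `QueueHandler`/`QueueListener` pattern: every logger writes to a multiprocessing queue, and a single listener in the main process fans records out to the stderr and file handlers. A self-contained sketch of that pattern, independent of BBOT:

import logging
import logging.handlers
import multiprocessing

if __name__ == "__main__":
    queue = multiprocessing.Queue()

    # the "real" handler lives behind the listener
    stderr_handler = logging.StreamHandler()
    stderr_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    listener = logging.handlers.QueueListener(queue, stderr_handler)
    listener.start()

    # all loggers write only to the queue
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(logging.handlers.QueueHandler(queue))

    logging.getLogger("bbot.demo").info("hello from the queue")
    listener.stop()
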
diff --git a/bbot/core/configurator/__init__.py b/bbot/core/configurator/__init__.py
deleted file mode 100644
index 15962ce59..000000000
--- a/bbot/core/configurator/__init__.py
+++ /dev/null
@@ -1,103 +0,0 @@
-import re
-from omegaconf import OmegaConf
-
-from . import files, args, environ
-from ..errors import ConfigLoadError
-from ...modules import module_loader
-from ..helpers.logger import log_to_stderr
-from ..helpers.misc import error_and_exit, filter_dict, clean_dict, match_and_exit, is_file
-
-# cached sudo password
-bbot_sudo_pass = None
-
-modules_config = OmegaConf.create(
- {
- "modules": module_loader.configs(type="scan"),
- "output_modules": module_loader.configs(type="output"),
- "internal_modules": module_loader.configs(type="internal"),
- }
-)
-
-try:
- config = OmegaConf.merge(
- # first, pull module defaults
- modules_config,
- # then look in .yaml files
- files.get_config(),
- # finally, pull from CLI arguments
- args.get_config(),
- )
-except ConfigLoadError as e:
- error_and_exit(e)
-
-
-config = environ.prepare_environment(config)
-default_config = OmegaConf.merge(files.default_config, modules_config)
-
-
-sentinel = object()
-
-
-exclude_from_validation = re.compile(r".*modules\.[a-z0-9_]+\.(?:batch_size|max_event_handlers)$")
-
-
-def check_cli_args():
- conf = [a for a in args.cli_config if not is_file(a)]
- all_options = None
- for c in conf:
- c = c.split("=")[0].strip()
- v = OmegaConf.select(default_config, c, default=sentinel)
- # if option isn't in the default config
- if v is sentinel:
- if exclude_from_validation.match(c):
- continue
- if all_options is None:
- from ...modules import module_loader
-
- modules_options = set()
- for module_options in module_loader.modules_options().values():
- modules_options.update(set(o[0] for o in module_options))
- global_options = set(default_config.keys()) - {"modules", "output_modules"}
- all_options = global_options.union(modules_options)
- match_and_exit(c, all_options, msg="module option")
-
-
-def ensure_config_files():
- secrets_strings = ["api_key", "username", "password", "token", "secret", "_id"]
- exclude_keys = ["modules", "output_modules", "internal_modules"]
-
- comment_notice = (
- "# NOTICE: THESE ENTRIES ARE COMMENTED BY DEFAULT\n"
- + "# Please be sure to uncomment when inserting API keys, etc.\n"
- )
-
- # ensure bbot.yml
- if not files.config_filename.exists():
- log_to_stderr(f"Creating BBOT config at {files.config_filename}")
- no_secrets_config = OmegaConf.to_object(default_config)
- no_secrets_config = clean_dict(
- no_secrets_config,
- *secrets_strings,
- fuzzy=True,
- exclude_keys=exclude_keys,
- )
- yaml = OmegaConf.to_yaml(no_secrets_config)
- yaml = comment_notice + "\n".join(f"# {line}" for line in yaml.splitlines())
- with open(str(files.config_filename), "w") as f:
- f.write(yaml)
-
- # ensure secrets.yml
- if not files.secrets_filename.exists():
- log_to_stderr(f"Creating BBOT secrets at {files.secrets_filename}")
- secrets_only_config = OmegaConf.to_object(default_config)
- secrets_only_config = filter_dict(
- secrets_only_config,
- *secrets_strings,
- fuzzy=True,
- exclude_keys=exclude_keys,
- )
- yaml = OmegaConf.to_yaml(secrets_only_config)
- yaml = comment_notice + "\n".join(f"# {line}" for line in yaml.splitlines())
- with open(str(files.secrets_filename), "w") as f:
- f.write(yaml)
- files.secrets_filename.chmod(0o600)
diff --git a/bbot/core/configurator/args.py b/bbot/core/configurator/args.py
deleted file mode 100644
index 173583827..000000000
--- a/bbot/core/configurator/args.py
+++ /dev/null
@@ -1,255 +0,0 @@
-import sys
-import argparse
-from pathlib import Path
-from omegaconf import OmegaConf
-from contextlib import suppress
-
-from ...modules import module_loader
-from ..helpers.logger import log_to_stderr
-from ..helpers.misc import chain_lists, match_and_exit, is_file
-
-module_choices = sorted(set(module_loader.configs(type="scan")))
-output_module_choices = sorted(set(module_loader.configs(type="output")))
-
-flag_choices = set()
-for m, c in module_loader.preloaded().items():
- flag_choices.update(set(c.get("flags", [])))
-
-
-class BBOTArgumentParser(argparse.ArgumentParser):
- _dummy = False
-
- def parse_args(self, *args, **kwargs):
- """
- Allow space or comma-separated entries for modules and targets
- For targets, also allow input files containing additional targets
- """
- ret = super().parse_args(*args, **kwargs)
- # silent implies -y
- if ret.silent:
- ret.yes = True
- ret.modules = chain_lists(ret.modules)
- ret.exclude_modules = chain_lists(ret.exclude_modules)
- ret.output_modules = chain_lists(ret.output_modules)
- ret.targets = chain_lists(ret.targets, try_files=True, msg="Reading targets from file: {filename}")
- ret.whitelist = chain_lists(ret.whitelist, try_files=True, msg="Reading whitelist from file: {filename}")
- ret.blacklist = chain_lists(ret.blacklist, try_files=True, msg="Reading blacklist from file: {filename}")
- ret.flags = chain_lists(ret.flags)
- ret.exclude_flags = chain_lists(ret.exclude_flags)
- ret.require_flags = chain_lists(ret.require_flags)
- for m in ret.modules:
- if m not in module_choices and not self._dummy:
- match_and_exit(m, module_choices, msg="module")
- for m in ret.exclude_modules:
- if m not in module_choices and not self._dummy:
- match_and_exit(m, module_choices, msg="module")
- for m in ret.output_modules:
- if m not in output_module_choices and not self._dummy:
- match_and_exit(m, output_module_choices, msg="output module")
- for f in set(ret.flags + ret.require_flags):
-            if f not in flag_choices and not self._dummy:
-                match_and_exit(f, flag_choices, msg="flag")
- return ret
-
-
-class DummyArgumentParser(BBOTArgumentParser):
- _dummy = True
-
- def error(self, message):
- pass
-
-
-scan_examples = [
- (
- "Subdomains",
- "Perform a full subdomain enumeration on evilcorp.com",
- "bbot -t evilcorp.com -f subdomain-enum",
- ),
- (
- "Subdomains (passive only)",
- "Perform a passive-only subdomain enumeration on evilcorp.com",
- "bbot -t evilcorp.com -f subdomain-enum -rf passive",
- ),
- (
- "Subdomains + port scan + web screenshots",
- "Port-scan every subdomain, screenshot every webpage, output to current directory",
- "bbot -t evilcorp.com -f subdomain-enum -m nmap gowitness -n my_scan -o .",
- ),
- (
- "Subdomains + basic web scan",
- "A basic web scan includes wappalyzer, robots.txt, and other non-intrusive web modules",
- "bbot -t evilcorp.com -f subdomain-enum web-basic",
- ),
- (
- "Web spider",
- "Crawl www.evilcorp.com up to a max depth of 2, automatically extracting emails, secrets, etc.",
- "bbot -t www.evilcorp.com -m httpx robots badsecrets secretsdb -c web_spider_distance=2 web_spider_depth=2",
- ),
- (
- "Everything everywhere all at once",
- "Subdomains, emails, cloud buckets, port scan, basic web, web screenshots, nuclei",
- "bbot -t evilcorp.com -f subdomain-enum email-enum cloud-enum web-basic -m nmap gowitness nuclei --allow-deadly",
- ),
-]
-
-usage_examples = [
- (
- "List modules",
- "",
- "bbot -l",
- ),
- (
- "List flags",
- "",
- "bbot -lf",
- ),
-]
-
-
-epilog = "EXAMPLES\n"
-for example in (scan_examples, usage_examples):
- for title, description, command in example:
- epilog += f"\n {title}:\n {command}\n"
-
-
-parser = BBOTArgumentParser(
- description="Bighuge BLS OSINT Tool", formatter_class=argparse.RawTextHelpFormatter, epilog=epilog
-)
-dummy_parser = DummyArgumentParser(
- description="Bighuge BLS OSINT Tool", formatter_class=argparse.RawTextHelpFormatter, epilog=epilog
-)
-for p in (parser, dummy_parser):
- p.add_argument("--help-all", action="store_true", help="Display full help including module config options")
- target = p.add_argument_group(title="Target")
- target.add_argument("-t", "--targets", nargs="+", default=[], help="Targets to seed the scan", metavar="TARGET")
- target.add_argument(
- "-w",
- "--whitelist",
- nargs="+",
- default=[],
- help="What's considered in-scope (by default it's the same as --targets)",
- )
- target.add_argument("-b", "--blacklist", nargs="+", default=[], help="Don't touch these things")
- target.add_argument(
- "--strict-scope",
- action="store_true",
- help="Don't consider subdomains of target/whitelist to be in-scope",
- )
- modules = p.add_argument_group(title="Modules")
- modules.add_argument(
- "-m",
- "--modules",
- nargs="+",
- default=[],
- help=f'Modules to enable. Choices: {",".join(module_choices)}',
- metavar="MODULE",
- )
- modules.add_argument("-l", "--list-modules", action="store_true", help=f"List available modules.")
- modules.add_argument(
- "-em", "--exclude-modules", nargs="+", default=[], help=f"Exclude these modules.", metavar="MODULE"
- )
- modules.add_argument(
- "-f",
- "--flags",
- nargs="+",
- default=[],
- help=f'Enable modules by flag. Choices: {",".join(sorted(flag_choices))}',
- metavar="FLAG",
- )
- modules.add_argument("-lf", "--list-flags", action="store_true", help=f"List available flags.")
- modules.add_argument(
- "-rf",
- "--require-flags",
- nargs="+",
- default=[],
- help=f"Only enable modules with these flags (e.g. -rf passive)",
- metavar="FLAG",
- )
- modules.add_argument(
- "-ef",
- "--exclude-flags",
- nargs="+",
- default=[],
- help=f"Disable modules with these flags. (e.g. -ef aggressive)",
- metavar="FLAG",
- )
- modules.add_argument(
- "-om",
- "--output-modules",
- nargs="+",
- default=["human", "json", "csv"],
- help=f'Output module(s). Choices: {",".join(output_module_choices)}',
- metavar="MODULE",
- )
- modules.add_argument("--allow-deadly", action="store_true", help="Enable the use of highly aggressive modules")
- scan = p.add_argument_group(title="Scan")
- scan.add_argument("-n", "--name", help="Name of scan (default: random)", metavar="SCAN_NAME")
- scan.add_argument(
- "-o",
- "--output-dir",
- metavar="DIR",
- )
- scan.add_argument(
- "-c",
- "--config",
- nargs="*",
- help="custom config file, or configuration options in key=value format: 'modules.shodan.api_key=1234'",
- metavar="CONFIG",
- )
- scan.add_argument("-v", "--verbose", action="store_true", help="Be more verbose")
- scan.add_argument("-d", "--debug", action="store_true", help="Enable debugging")
- scan.add_argument("-s", "--silent", action="store_true", help="Be quiet")
- scan.add_argument("--force", action="store_true", help="Run scan even if module setups fail")
- scan.add_argument("-y", "--yes", action="store_true", help="Skip scan confirmation prompt")
- scan.add_argument("--dry-run", action="store_true", help=f"Abort before executing scan")
- scan.add_argument(
- "--current-config",
- action="store_true",
- help="Show current config in YAML format",
- )
- deps = p.add_argument_group(
- title="Module dependencies", description="Control how modules install their dependencies"
- )
- g2 = deps.add_mutually_exclusive_group()
- g2.add_argument("--no-deps", action="store_true", help="Don't install module dependencies")
- g2.add_argument("--force-deps", action="store_true", help="Force install all module dependencies")
- g2.add_argument("--retry-deps", action="store_true", help="Try again to install failed module dependencies")
- g2.add_argument(
- "--ignore-failed-deps", action="store_true", help="Run modules even if they have failed dependencies"
- )
- g2.add_argument("--install-all-deps", action="store_true", help="Install dependencies for all modules")
- agent = p.add_argument_group(title="Agent", description="Report back to a central server")
- agent.add_argument("-a", "--agent-mode", action="store_true", help="Start in agent mode")
- misc = p.add_argument_group(title="Misc")
- misc.add_argument("--version", action="store_true", help="show BBOT version and exit")
-
-
-cli_options = None
-with suppress(Exception):
- cli_options = dummy_parser.parse_args()
-
-
-cli_config = []
-
-
-def get_config():
- global cli_config
- with suppress(Exception):
- if cli_options.config:
- cli_config = cli_options.config
- if cli_config:
- filename = Path(cli_config[0]).resolve()
- if len(cli_config) == 1 and is_file(filename):
- try:
- conf = OmegaConf.load(str(filename))
- log_to_stderr(f"Loaded custom config from {filename}")
- return conf
- except Exception as e:
- log_to_stderr(f"Error parsing custom config at {filename}: {e}", level="ERROR")
- sys.exit(2)
- try:
- return OmegaConf.from_cli(cli_config)
- except Exception as e:
- log_to_stderr(f"Error parsing command-line config: {e}", level="ERROR")
- sys.exit(2)
diff --git a/bbot/core/configurator/environ.py b/bbot/core/configurator/environ.py
deleted file mode 100644
index 4358bb78d..000000000
--- a/bbot/core/configurator/environ.py
+++ /dev/null
@@ -1,153 +0,0 @@
-import os
-import sys
-import omegaconf
-from pathlib import Path
-
-from . import args
-from ...modules import module_loader
-from ..helpers.misc import cpu_architecture, os_platform, os_platform_friendly
-
-
-# keep track of whether BBOT is being executed via the CLI
-cli_execution = False
-
-
-def increase_limit(new_limit):
- try:
- import resource
-
- # Get current limit
- soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
-
- new_limit = min(new_limit, hard_limit)
-
- # Attempt to set new limit
- resource.setrlimit(resource.RLIMIT_NOFILE, (new_limit, hard_limit))
- except Exception as e:
- sys.stderr.write(f"Failed to set new ulimit: {e}\n")
-
-
-increase_limit(65535)
-
-
-def flatten_config(config, base="bbot"):
- """
- Flatten a JSON-like config into a list of environment variables:
- {"modules": [{"httpx": {"timeout": 5}}]} --> "BBOT_MODULES_HTTPX_TIMEOUT=5"
- """
- if type(config) == omegaconf.dictconfig.DictConfig:
- for k, v in config.items():
- new_base = f"{base}_{k}"
- if type(v) == omegaconf.dictconfig.DictConfig:
- yield from flatten_config(v, base=new_base)
- elif type(v) != omegaconf.listconfig.ListConfig:
- yield (new_base.upper(), str(v))
-
-
-def add_to_path(v, k="PATH"):
- var_list = os.environ.get(k, "").split(":")
- deduped_var_list = []
- for _ in var_list:
- if not _ in deduped_var_list:
- deduped_var_list.append(_)
- if not v in deduped_var_list:
- deduped_var_list = [v] + deduped_var_list
- new_var_str = ":".join(deduped_var_list)
- os.environ[k] = new_var_str
-
-
-def prepare_environment(bbot_config):
- """
- Sync config to OS environment variables
- """
- # ensure bbot_home
- if not "home" in bbot_config:
- bbot_config["home"] = "~/.bbot"
- home = Path(bbot_config["home"]).expanduser().resolve()
- bbot_config["home"] = str(home)
-
- # if we're running in a virtual environment, make sure to include its /bin in PATH
- if sys.prefix != sys.base_prefix:
- bin_dir = str(Path(sys.prefix) / "bin")
- add_to_path(bin_dir)
-
- # add ~/.local/bin to PATH
- local_bin_dir = str(Path.home() / ".local" / "bin")
- add_to_path(local_bin_dir)
-
- # ensure bbot_tools
- bbot_tools = home / "tools"
- os.environ["BBOT_TOOLS"] = str(bbot_tools)
- if not str(bbot_tools) in os.environ.get("PATH", "").split(":"):
- os.environ["PATH"] = f'{bbot_tools}:{os.environ.get("PATH", "").strip(":")}'
- # ensure bbot_cache
- bbot_cache = home / "cache"
- os.environ["BBOT_CACHE"] = str(bbot_cache)
- # ensure bbot_temp
- bbot_temp = home / "temp"
- os.environ["BBOT_TEMP"] = str(bbot_temp)
- # ensure bbot_lib
- bbot_lib = home / "lib"
- os.environ["BBOT_LIB"] = str(bbot_lib)
- # export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:~/.bbot/lib/
- add_to_path(str(bbot_lib), k="LD_LIBRARY_PATH")
-
- # platform variables
- os.environ["BBOT_OS_PLATFORM"] = os_platform()
- os.environ["BBOT_OS"] = os_platform_friendly()
- os.environ["BBOT_CPU_ARCH"] = cpu_architecture()
-
- # exchange certain options between CLI args and config
- if cli_execution and args.cli_options is not None:
- # deps
- bbot_config["retry_deps"] = args.cli_options.retry_deps
- bbot_config["force_deps"] = args.cli_options.force_deps
- bbot_config["no_deps"] = args.cli_options.no_deps
- bbot_config["ignore_failed_deps"] = args.cli_options.ignore_failed_deps
- # debug
- bbot_config["debug"] = args.cli_options.debug
- bbot_config["silent"] = args.cli_options.silent
-
- import logging
-
- log = logging.getLogger()
- if bbot_config.get("debug", False):
- bbot_config["silent"] = False
- log = logging.getLogger("bbot")
- log.setLevel(logging.DEBUG)
- logging.getLogger("asyncio").setLevel(logging.DEBUG)
- elif bbot_config.get("silent", False):
- log = logging.getLogger("bbot")
- log.setLevel(logging.CRITICAL)
-
- # copy config to environment
- bbot_environ = flatten_config(bbot_config)
- os.environ.update(bbot_environ)
-
- # handle HTTP proxy
- http_proxy = bbot_config.get("http_proxy", "")
- if http_proxy:
- os.environ["HTTP_PROXY"] = http_proxy
- os.environ["HTTPS_PROXY"] = http_proxy
- else:
- os.environ.pop("HTTP_PROXY", None)
- os.environ.pop("HTTPS_PROXY", None)
-
- # replace environment variables in preloaded modules
- module_loader.find_and_replace(**os.environ)
-
- # ssl verification
- import urllib3
-
- urllib3.disable_warnings()
- ssl_verify = bbot_config.get("ssl_verify", False)
- if not ssl_verify:
- import requests
- import functools
-
- requests.adapters.BaseAdapter.send = functools.partialmethod(requests.adapters.BaseAdapter.send, verify=False)
- requests.adapters.HTTPAdapter.send = functools.partialmethod(requests.adapters.HTTPAdapter.send, verify=False)
- requests.Session.request = functools.partialmethod(requests.Session.request, verify=False)
- requests.request = functools.partial(requests.request, verify=False)
-
- return bbot_config
diff --git a/bbot/core/configurator/files.py b/bbot/core/configurator/files.py
deleted file mode 100644
index e56950597..000000000
--- a/bbot/core/configurator/files.py
+++ /dev/null
@@ -1,40 +0,0 @@
-import sys
-from pathlib import Path
-from omegaconf import OmegaConf
-
-from ..helpers.misc import mkdir
-from ..errors import ConfigLoadError
-from ..helpers.logger import log_to_stderr
-
-config_dir = (Path.home() / ".config" / "bbot").resolve()
-defaults_filename = (Path(__file__).parent.parent.parent / "defaults.yml").resolve()
-mkdir(config_dir)
-config_filename = (config_dir / "bbot.yml").resolve()
-secrets_filename = (config_dir / "secrets.yml").resolve()
-default_config = None
-
-
-def _get_config(filename, name="config"):
- notify = False
- if sys.argv and sys.argv[0].endswith("bbot") and not any(x in sys.argv for x in ("-s", "--silent")):
- notify = True
- filename = Path(filename).resolve()
- try:
- conf = OmegaConf.load(str(filename))
- if notify and __name__ == "__main__":
- log_to_stderr(f"Loaded {name} from {filename}")
- return conf
- except Exception as e:
- if filename.exists():
- raise ConfigLoadError(f"Error parsing config at {filename}:\n\n{e}")
- return OmegaConf.create()
-
-
-def get_config():
- global default_config
- default_config = _get_config(defaults_filename, name="defaults")
- return OmegaConf.merge(
- default_config,
- _get_config(config_filename, name="config"),
- _get_config(secrets_filename, name="secrets"),
- )
diff --git a/bbot/core/core.py b/bbot/core/core.py
new file mode 100644
index 000000000..e7eacf18d
--- /dev/null
+++ b/bbot/core/core.py
@@ -0,0 +1,213 @@
+import os
+import logging
+from copy import copy
+from pathlib import Path
+from contextlib import suppress
+from omegaconf import OmegaConf
+
+from bbot.errors import BBOTError
+
+
+DEFAULT_CONFIG = None
+
+
+class BBOTCore:
+ """
+ This is the first thing that loads when you import BBOT.
+
+ Unlike a Preset, BBOTCore holds only the config, not scan-specific stuff like targets, flags, modules, etc.
+
+ Its main jobs are:
+
+ - set up logging
+ - keep separation between the `default` and `custom` config (this allows presets to only display the config options that have changed)
+ - allow for easy merging of configs
+ - load quickly
+ """
+
+ # used for filtering out sensitive config values
+ secrets_strings = ["api_key", "username", "password", "token", "secret", "_id"]
+ # don't filter/remove entries under this key
+ secrets_exclude_keys = ["modules"]
+
+ def __init__(self):
+ self._logger = None
+ self._files_config = None
+
+ self.bbot_sudo_pass = None
+
+ self._config = None
+ self._custom_config = None
+
+ # bare minimum == logging
+ self.logger
+ self.log = logging.getLogger("bbot.core")
+
+ import multiprocessing
+
+ self.process_name = multiprocessing.current_process().name
+
+ @property
+ def home(self):
+ return Path(self.config["home"]).expanduser().resolve()
+
+ @property
+ def cache_dir(self):
+ return self.home / "cache"
+
+ @property
+ def tools_dir(self):
+ return self.home / "tools"
+
+ @property
+ def temp_dir(self):
+ return self.home / "temp"
+
+ @property
+ def lib_dir(self):
+ return self.home / "lib"
+
+ @property
+ def scans_dir(self):
+ return self.home / "scans"
+
+ @property
+ def config(self):
+ """
+        .config is just .default_config + .custom_config merged together.
+
+        Any new values should be added to custom_config.
+ """
+ if self._config is None:
+ self._config = OmegaConf.merge(self.default_config, self.custom_config)
+ # set read-only flag (change .custom_config instead)
+ OmegaConf.set_readonly(self._config, True)
+ return self._config
+
+ @property
+ def default_config(self):
+ """
+ The default BBOT config (from `defaults.yml`). Read-only.
+ """
+ global DEFAULT_CONFIG
+ if DEFAULT_CONFIG is None:
+ self.default_config = self.files_config.get_default_config()
+ # ensure bbot home dir
+ if not "home" in self.default_config:
+ self.default_config["home"] = "~/.bbot"
+ return DEFAULT_CONFIG
+
+ @default_config.setter
+ def default_config(self, value):
+ # we temporarily clear out the config so it can be refreshed if/when default_config changes
+ global DEFAULT_CONFIG
+ self._config = None
+ DEFAULT_CONFIG = value
+ # set read-only flag (change .custom_config instead)
+ OmegaConf.set_readonly(DEFAULT_CONFIG, True)
+
+ @property
+ def custom_config(self):
+ """
+ Custom BBOT config (from `~/.config/bbot/bbot.yml`)
+ """
+ # we temporarily clear out the config so it can be refreshed if/when custom_config changes
+ self._config = None
+ if self._custom_config is None:
+ self.custom_config = self.files_config.get_custom_config()
+ return self._custom_config
+
+ @custom_config.setter
+ def custom_config(self, value):
+ # we temporarily clear out the config so it can be refreshed if/when custom_config changes
+ self._config = None
+ # ensure the modules key is always a dictionary
+ modules_entry = value.get("modules", None)
+ if modules_entry is not None and not OmegaConf.is_dict(modules_entry):
+ value["modules"] = {}
+ self._custom_config = value
+
+ def no_secrets_config(self, config):
+ from .helpers.misc import clean_dict
+
+ with suppress(ValueError):
+ config = OmegaConf.to_object(config)
+
+ return clean_dict(
+ config,
+ *self.secrets_strings,
+ fuzzy=True,
+ exclude_keys=self.secrets_exclude_keys,
+ )
+
+ def secrets_only_config(self, config):
+ from .helpers.misc import filter_dict
+
+ with suppress(ValueError):
+ config = OmegaConf.to_object(config)
+
+ return filter_dict(
+ config,
+ *self.secrets_strings,
+ fuzzy=True,
+ exclude_keys=self.secrets_exclude_keys,
+ )
+
+ def merge_custom(self, config):
+ """
+ Merge a config into the custom config.
+ """
+ self.custom_config = OmegaConf.merge(self.custom_config, OmegaConf.create(config))
+
+ def merge_default(self, config):
+ """
+ Merge a config into the default config.
+ """
+ self.default_config = OmegaConf.merge(self.default_config, OmegaConf.create(config))
+
+ def copy(self):
+ """
+ Return a semi-shallow copy of self. (`custom_config` is copied, but `default_config` stays the same)
+ """
+ core_copy = copy(self)
+ core_copy._custom_config = self._custom_config.copy()
+ return core_copy
+
+ @property
+ def files_config(self):
+ """
+ Get the configs from `bbot.yml` and `defaults.yml`
+ """
+ if self._files_config is None:
+ from .config import files
+
+ self.files = files
+ self._files_config = files.BBOTConfigFiles(self)
+ return self._files_config
+
+ def create_process(self, *args, **kwargs):
+ if os.environ.get("BBOT_TESTING", "") == "True":
+ process = self.create_thread(*args, **kwargs)
+ else:
+ if self.process_name == "MainProcess":
+ from .helpers.process import BBOTProcess
+
+ process = BBOTProcess(*args, **kwargs)
+ else:
+ raise BBOTError(f"Tried to start server from process {self.process_name}")
+ process.daemon = True
+ return process
+
+ def create_thread(self, *args, **kwargs):
+ from .helpers.process import BBOTThread
+
+ return BBOTThread(*args, **kwargs)
+
+ @property
+ def logger(self):
+ self.config
+ if self._logger is None:
+ from .config.logger import BBOTLogger
+
+ self._logger = BBOTLogger(self)
+ return self._logger
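
Assuming the API above, the intended interaction with `BBOTCore` looks something like this sketch: custom values go in via `merge_custom()`, reads go through the merged, read-only `.config`, and `no_secrets_config()` strips anything matching the secrets patterns. The shodan key is illustrative:

from bbot.core import CORE

print(CORE.config["home"])  # default home dir (~/.bbot unless overridden)

# writes always target the custom layer; .config is rebuilt on next access
CORE.merge_custom({"modules": {"shodan": {"api_key": "deadbeef"}}})
print(CORE.config.modules.shodan.api_key)  # deadbeef

# secrets are filtered out before display/serialization
print(CORE.no_secrets_config(CORE.custom_config))
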
diff --git a/bbot/core/engine.py b/bbot/core/engine.py
new file mode 100644
index 000000000..7f0b131d1
--- /dev/null
+++ b/bbot/core/engine.py
@@ -0,0 +1,434 @@
+import zmq
+import pickle
+import asyncio
+import inspect
+import logging
+import tempfile
+import traceback
+import zmq.asyncio
+from pathlib import Path
+from contextlib import asynccontextmanager, suppress
+
+from bbot.core import CORE
+from bbot.errors import BBOTEngineError
+from bbot.core.helpers.misc import rand_string
+
+
+error_sentinel = object()
+
+
+class EngineBase:
+ """
+ Base Engine class for Server and Client.
+
+ An Engine is a simple and lightweight RPC implementation that allows offloading async tasks
+ to a separate process. It leverages ZeroMQ in a ROUTER-DEALER configuration.
+
+ BBOT makes use of this by spawning a dedicated engine for DNS and HTTP tasks.
+ This offloads I/O and helps free up the main event loop for other tasks.
+
+    To use an Engine, you must subclass both EngineClient and EngineServer.
+
+ See the respective EngineClient and EngineServer classes for usage examples.
+ """
+
+ ERROR_CLASS = BBOTEngineError
+
+ def __init__(self):
+ self.log = logging.getLogger(f"bbot.core.{self.__class__.__name__.lower()}")
+
+ def pickle(self, obj):
+ try:
+ return pickle.dumps(obj)
+ except Exception as e:
+ self.log.error(f"Error serializing object: {obj}: {e}")
+ self.log.trace(traceback.format_exc())
+ return error_sentinel
+
+ def unpickle(self, binary):
+ try:
+ return pickle.loads(binary)
+ except Exception as e:
+ self.log.error(f"Error deserializing binary: {e}")
+ self.log.trace(f"Offending binary: {binary}")
+ self.log.trace(traceback.format_exc())
+ return error_sentinel
+
+
+class EngineClient(EngineBase):
+ """
+ The client portion of BBOT's RPC Engine.
+
+    To create an engine, subclass this class and define a method for each
+    of your desired functions.
+
+ Note that this only supports async functions. If you need to offload a synchronous function to another CPU, use BBOT's multiprocessing pool instead.
+
+    Any CPU- or I/O-intensive logic should be implemented in the EngineServer.
+
+ These functions are typically stubs whose only job is to forward the arguments to the server.
+
+ Functions with the same names should be defined on the EngineServer.
+
+ The EngineClient must specify its associated server class via the `SERVER_CLASS` variable.
+
+    Depending on whether your function is a generator, you will use either `run_and_return()` or `run_and_yield()`.
+
+ Examples:
+ >>> from bbot.core.engine import EngineClient
+ >>>
+        >>> class MyClient(EngineClient):
+        >>>     SERVER_CLASS = MyServer
+        >>>
+        >>>     async def my_function(self, **kwargs):
+        >>>         return await self.run_and_return("my_function", **kwargs)
+        >>>
+        >>>     async def my_generator(self, **kwargs):
+        >>>         async for _ in self.run_and_yield("my_generator", **kwargs):
+        >>>             yield _
+ """
+
+ SERVER_CLASS = None
+
+ def __init__(self, **kwargs):
+ self._shutdown = False
+ super().__init__()
+ self.name = f"EngineClient {self.__class__.__name__}"
+ self.process = None
+ if self.SERVER_CLASS is None:
+ raise ValueError(f"Must set EngineClient SERVER_CLASS, {self.SERVER_CLASS}")
+ self.CMDS = dict(self.SERVER_CLASS.CMDS)
+ for k, v in list(self.CMDS.items()):
+ self.CMDS[v] = k
+ self.socket_address = f"zmq_{rand_string(8)}.sock"
+ self.socket_path = Path(tempfile.gettempdir()) / self.socket_address
+ self.server_kwargs = kwargs.pop("server_kwargs", {})
+ self._server_process = None
+ self.context = zmq.asyncio.Context()
+ self.context.setsockopt(zmq.LINGER, 0)
+ self.sockets = set()
+
+ def check_error(self, message):
+ if isinstance(message, dict) and len(message) == 1 and "_e" in message:
+ error, trace = message["_e"]
+ error = self.ERROR_CLASS(error)
+ error.engine_traceback = trace
+ raise error
+ return False
+
+ async def run_and_return(self, command, *args, **kwargs):
+ if self._shutdown:
+ self.log.verbose("Engine has been shut down and is not accepting new tasks")
+ return
+ async with self.new_socket() as socket:
+ try:
+ message = self.make_message(command, args=args, kwargs=kwargs)
+ if message is error_sentinel:
+ return
+ await socket.send(message)
+ binary = await socket.recv()
+ except BaseException:
+ # -1 == special "cancel" signal
+ cancel_message = pickle.dumps({"c": -1})
+ with suppress(Exception):
+ await socket.send(cancel_message)
+ raise
+ # self.log.debug(f"{self.name}.{command}({kwargs}) got binary: {binary}")
+ message = self.unpickle(binary)
+ self.log.debug(f"{self.name}.{command}({kwargs}) got message: {message}")
+ # error handling
+ if self.check_error(message):
+ return
+ return message
+
+ async def run_and_yield(self, command, *args, **kwargs):
+ if self._shutdown:
+ self.log.verbose("Engine has been shut down and is not accepting new tasks")
+ return
+ message = self.make_message(command, args=args, kwargs=kwargs)
+ if message is error_sentinel:
+ return
+ async with self.new_socket() as socket:
+ await socket.send(message)
+ while 1:
+ try:
+ binary = await socket.recv()
+ # self.log.debug(f"{self.name}.{command}({kwargs}) got binary: {binary}")
+ message = self.unpickle(binary)
+ self.log.debug(f"{self.name}.{command}({kwargs}) got message: {message}")
+ # error handling
+ if self.check_error(message) or self.check_stop(message):
+ break
+ yield message
+ except GeneratorExit:
+ # -1 == special "cancel" signal
+ cancel_message = pickle.dumps({"c": -1})
+ with suppress(Exception):
+ await socket.send(cancel_message)
+ raise
+
+ def check_stop(self, message):
+ if isinstance(message, dict) and len(message) == 1 and "_s" in message:
+ return True
+ return False
+
+ def make_message(self, command, args=None, kwargs=None):
+ try:
+ cmd_id = self.CMDS[command]
+ except KeyError:
+ raise KeyError(f'Command "{command}" not found. Available commands: {",".join(self.available_commands)}')
+ message = {"c": cmd_id}
+ if args:
+ message["a"] = args
+ if kwargs:
+ message["k"] = kwargs
+ return pickle.dumps(message)
+
+ @property
+ def available_commands(self):
+ return [s for s in self.CMDS if isinstance(s, str)]
+
+ def start_server(self):
+ import multiprocessing
+
+ process_name = multiprocessing.current_process().name
+ if process_name == "MainProcess":
+ self.process = CORE.create_process(
+ target=self.server_process,
+ args=(
+ self.SERVER_CLASS,
+ self.socket_path,
+ ),
+ kwargs=self.server_kwargs,
+ custom_name=f"BBOT {self.__class__.__name__}",
+ )
+ self.process.start()
+ return self.process
+ else:
+ raise BBOTEngineError(
+ f"Tried to start server from process {process_name}. Did you forget \"if __name__ == '__main__'?\""
+ )
+
+ @staticmethod
+ def server_process(server_class, socket_path, **kwargs):
+ try:
+ engine_server = server_class(socket_path, **kwargs)
+ asyncio.run(engine_server.worker())
+ except (asyncio.CancelledError, KeyboardInterrupt):
+ pass
+ except Exception:
+ import traceback
+
+ log = logging.getLogger("bbot.core.engine.server")
+ log.critical(f"Unhandled error in {server_class.__name__} server process: {traceback.format_exc()}")
+
+ @asynccontextmanager
+ async def new_socket(self):
+ if self._server_process is None:
+ self._server_process = self.start_server()
+ while not self.socket_path.exists():
+ await asyncio.sleep(0.1)
+ socket = self.context.socket(zmq.DEALER)
+ socket.setsockopt(zmq.LINGER, 0)
+ socket.connect(f"ipc://{self.socket_path}")
+ self.sockets.add(socket)
+ try:
+ yield socket
+ finally:
+ self.sockets.remove(socket)
+ with suppress(Exception):
+ socket.close()
+
+ async def shutdown(self):
+ self._shutdown = True
+ async with self.new_socket() as socket:
+ # -99 == special shutdown signal
+ shutdown_message = pickle.dumps({"c": -99})
+ await socket.send(shutdown_message)
+ for socket in self.sockets:
+ socket.close()
+ self.context.term()
+ # delete socket file on exit
+ self.socket_path.unlink(missing_ok=True)
+
+
+class EngineServer(EngineBase):
+ """
+ The server portion of BBOT's RPC Engine.
+
+ Methods defined here must match the methods in your EngineClient.
+
+ To use the functions, you must create mappings for them in the CMDS attribute, as shown below.
+
+ Examples:
+ >>> from bbot.core.engine import EngineServer
+ >>>
+        >>> class MyServer(EngineServer):
+        >>>     CMDS = {
+        >>>         0: "my_function",
+        >>>         1: "my_generator",
+        >>>     }
+        >>>
+        >>>     async def my_function(self, arg1=None):
+        >>>         await asyncio.sleep(1)
+        >>>         return str(arg1)
+        >>>
+        >>>     async def my_generator(self):
+        >>>         for i in range(10):
+        >>>             await asyncio.sleep(1)
+        >>>             yield i
+ """
+
+ CMDS = {}
+
+ def __init__(self, socket_path):
+ super().__init__()
+ self.name = f"EngineServer {self.__class__.__name__}"
+ self.socket_path = socket_path
+ if self.socket_path is not None:
+ # create ZeroMQ context
+ self.context = zmq.asyncio.Context()
+ self.context.setsockopt(zmq.LINGER, 0)
+ # ROUTER socket can handle multiple concurrent requests
+ self.socket = self.context.socket(zmq.ROUTER)
+ self.socket.setsockopt(zmq.LINGER, 0)
+ # create socket file
+ self.socket.bind(f"ipc://{self.socket_path}")
+ # task <--> client id mapping
+ self.tasks = dict()
+
+ async def run_and_return(self, client_id, command_fn, *args, **kwargs):
+ try:
+ self.log.debug(f"{self.name} run-and-return {command_fn.__name__}({args}, {kwargs})")
+ try:
+ result = await command_fn(*args, **kwargs)
+ except (asyncio.CancelledError, KeyboardInterrupt):
+ return
+ except BaseException as e:
+ error = f"Error in {self.name}.{command_fn.__name__}({args}, {kwargs}): {e}"
+ trace = traceback.format_exc()
+ self.log.debug(error)
+ self.log.debug(trace)
+ result = {"_e": (error, trace)}
+ finally:
+ self.tasks.pop(client_id, None)
+ await self.send_socket_multipart(client_id, result)
+ except BaseException as e:
+ self.log.critical(
+ f"Unhandled exception in {self.name}.run_and_return({client_id}, {command_fn}, {args}, {kwargs}): {e}"
+ )
+ self.log.critical(traceback.format_exc())
+
+ async def run_and_yield(self, client_id, command_fn, *args, **kwargs):
+ try:
+ self.log.debug(f"{self.name} run-and-yield {command_fn.__name__}({args}, {kwargs})")
+ try:
+ async for _ in command_fn(*args, **kwargs):
+ await self.send_socket_multipart(client_id, _)
+ await self.send_socket_multipart(client_id, {"_s": None})
+ except (asyncio.CancelledError, KeyboardInterrupt):
+ return
+ except BaseException as e:
+ error = f"Error in {self.name}.{command_fn.__name__}({args}, {kwargs}): {e}"
+ trace = traceback.format_exc()
+ self.log.debug(error)
+ self.log.debug(trace)
+ result = {"_e": (error, trace)}
+ await self.send_socket_multipart(client_id, result)
+ finally:
+ self.tasks.pop(client_id, None)
+ except BaseException as e:
+ self.log.critical(
+ f"Unhandled exception in {self.name}.run_and_yield({client_id}, {command_fn}, {args}, {kwargs}): {e}"
+ )
+ self.log.critical(traceback.format_exc())
+
+ async def send_socket_multipart(self, client_id, message):
+ try:
+ message = pickle.dumps(message)
+ await self.socket.send_multipart([client_id, message])
+ except Exception as e:
+ self.log.warning(f"Error sending ZMQ message: {e}")
+ self.log.trace(traceback.format_exc())
+
+ def check_error(self, message):
+ if message is error_sentinel:
+ return True
+
+ async def worker(self):
+ try:
+ while 1:
+ client_id, binary = await self.socket.recv_multipart()
+ message = self.unpickle(binary)
+ self.log.debug(f"{self.name} got message: {message}")
+ if self.check_error(message):
+ continue
+
+ cmd = message.get("c", None)
+ if not isinstance(cmd, int):
+ self.log.warning(f"No command sent in message: {message}")
+ continue
+
+ # -1 == cancel task
+ if cmd == -1:
+ await self.cancel_task(client_id)
+ continue
+
+ # -99 == shut down engine
+ if cmd == -99:
+ self.log.verbose("Got shutdown signal, shutting down...")
+ await self.cancel_all_tasks()
+ return
+
+ args = message.get("a", ())
+ if not isinstance(args, tuple):
+ self.log.warning(f"{self.name}: received invalid args of type {type(args)}, should be tuple")
+ continue
+ kwargs = message.get("k", {})
+ if not isinstance(kwargs, dict):
+ self.log.warning(f"{self.name}: received invalid kwargs of type {type(kwargs)}, should be dict")
+ continue
+
+ command_name = self.CMDS[cmd]
+ command_fn = getattr(self, command_name, None)
+
+ if command_fn is None:
+                    self.log.warning(f'{self.name} has no function named "{command_name}"')
+ continue
+
+ if inspect.isasyncgenfunction(command_fn):
+ coroutine = self.run_and_yield(client_id, command_fn, *args, **kwargs)
+ else:
+ coroutine = self.run_and_return(client_id, command_fn, *args, **kwargs)
+
+ task = asyncio.create_task(coroutine)
+ self.tasks[client_id] = task, command_fn, args, kwargs
+ except Exception as e:
+ self.log.error(f"Error in EngineServer worker: {e}")
+ self.log.trace(traceback.format_exc())
+ finally:
+ self.socket.close()
+ self.context.term()
+ # delete socket file on exit
+ self.socket_path.unlink(missing_ok=True)
+
+ async def cancel_task(self, client_id):
+ task = self.tasks.get(client_id, None)
+ if task is None:
+ return
+ task, _cmd, _args, _kwargs = task
+ self.log.debug(f"Cancelling client id {client_id} (task: {task})")
+ task.cancel()
+ try:
+ await task
+ except (KeyboardInterrupt, asyncio.CancelledError):
+ pass
+ except BaseException as e:
+ self.log.error(f"Unhandled error in {_cmd}({_args}, {_kwargs}): {e}")
+ self.log.trace(traceback.format_exc())
+ finally:
+ self.tasks.pop(client_id, None)
+
+ async def cancel_all_tasks(self):
+ for client_id in self.tasks:
+ await self.cancel_task(client_id)
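
Putting the two halves together, a hypothetical client/server pair might look like the sketch below (`MyServer`/`MyClient` and the `add` command are illustrative names, not real BBOT engines). The `__main__` guard matters because the client transparently spawns the server in a child process:

import asyncio
from bbot.core.engine import EngineClient, EngineServer

class MyServer(EngineServer):
    CMDS = {0: "add"}

    async def add(self, a, b):
        # CPU/IO-heavy work would live here, off the main event loop
        return a + b

class MyClient(EngineClient):
    SERVER_CLASS = MyServer

    async def add(self, a, b):
        # stub that forwards the call over ZMQ to the server process
        return await self.run_and_return("add", a, b)

async def main():
    client = MyClient()
    print(await client.add(2, 3))  # 5
    await client.shutdown()

if __name__ == "__main__":
    asyncio.run(main())
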
diff --git a/bbot/core/event/__init__.py b/bbot/core/event/__init__.py
index e4410fcea..b5d1c8608 100644
--- a/bbot/core/event/__init__.py
+++ b/bbot/core/event/__init__.py
@@ -1,2 +1 @@
-from .helpers import make_event_id
from .base import make_event, is_event, event_from_json
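
With `make_event_id` gone, the package's public surface is just these three factory/inspection helpers. A hedged sketch of dummy-event creation (assuming `make_event` still accepts `dummy=True` to skip the scan/parent requirements enforced in `base.py` below):

from bbot.core.event import make_event, is_event

# dummy=True bypasses the scan/parent validation for one-off events
event = make_event("www.evilcorp.com", "DNS_NAME", dummy=True)
print(event.type, event.host)  # DNS_NAME www.evilcorp.com
print(is_event(event))         # True
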
diff --git a/bbot/core/event/base.py b/bbot/core/event/base.py
index c539936f5..c23769aad 100644
--- a/bbot/core/event/base.py
+++ b/bbot/core/event/base.py
@@ -2,39 +2,41 @@
import re
import json
import base64
-import asyncio
import logging
import tarfile
+import datetime
import ipaddress
import traceback
from copy import copy
+from pathlib import Path
from typing import Optional
-from datetime import datetime
from contextlib import suppress
-from urllib.parse import urljoin
+from radixtarget import RadixTarget
+from urllib.parse import urljoin, parse_qs
from pydantic import BaseModel, field_validator
-from pathlib import Path
+
from .helpers import *
-from bbot.core.errors import *
+from bbot.errors import *
from bbot.core.helpers import (
extract_words,
- get_file_extension,
- host_in_host,
is_domain,
is_subdomain,
is_ip,
is_ptr,
is_uri,
+ url_depth,
domain_stem,
make_netloc,
make_ip_type,
recursive_decode,
+ sha1,
smart_decode,
split_host_port,
tagify,
validators,
+ get_file_extension,
)
@@ -71,8 +73,8 @@ class BaseEvent:
scan (Scanner): The scan object that generated the event.
timestamp (datetime.datetime): The time at which the data was discovered.
resolved_hosts (list of str): List of hosts to which the event data resolves, applicable for URLs and DNS names.
- source (BaseEvent): The source event that led to the discovery of this event.
- source_id (str): The `id` attribute of the source event.
+ parent (BaseEvent): The parent event that led to the discovery of this event.
+ parent_id (str): The `id` attribute of the parent event.
tags (set of str): Descriptive tags for the event, e.g., `mx-record`, `in-scope`.
module (BaseModule): The module that discovered the event.
module_sequence (str): The sequence of modules that participated in the discovery.
@@ -88,7 +90,7 @@ class BaseEvent:
"scan": "SCAN:4d786912dbc97be199da13074699c318e2067a7f",
"timestamp": 1688526222.723366,
"resolved_hosts": ["185.199.108.153"],
- "source": "OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7",
+ "parent": "OPEN_TCP_PORT:cf7e6a937b161217eaed99f0c566eae045d094c7",
"tags": ["in-scope", "distance-0", "dir", "ip-185-199-108-153", "status-301", "http-title-301-moved-permanently"],
"module": "httpx",
"module_sequence": "httpx"
@@ -99,14 +101,12 @@ class BaseEvent:
# Always emit this event type even if it's not in scope
_always_emit = False
# Always emit events with these tags even if they're not in scope
- _always_emit_tags = ["affiliate"]
+ _always_emit_tags = ["affiliate", "target"]
# Bypass scope checking and dns resolution, distribute immediately to modules
# This is useful for "end-of-line" events like FINDING and VULNERABILITY
_quick_emit = False
# Whether this event has been retroactively marked as part of an important discovery chain
_graph_important = False
- # Exclude from output modules
- _omit = False
# Disables certain data validations
_dummy = False
# Data validation, if data is a dictionary
@@ -118,7 +118,8 @@ def __init__(
self,
data,
event_type,
- source=None,
+ parent=None,
+ context=None,
module=None,
scan=None,
scans=None,
@@ -137,7 +138,7 @@ def __init__(
Attributes:
data (str, dict): The primary data for the event.
event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'.
- source (BaseEvent, optional): Source event that led to this event's discovery. Defaults to None.
+ parent (BaseEvent, optional): Parent event that led to this event's discovery. Defaults to None.
module (str, optional): Module that discovered the event. Defaults to None.
scan (Scan, optional): BBOT Scan object. Required unless _dummy is True. Defaults to None.
scans (list of Scan, optional): BBOT Scan objects, used primarily when unserializing an Event from the database. Defaults to None.
@@ -148,36 +149,45 @@ def __init__(
_internal (Any, optional): If specified, makes the event internal. Defaults to None.
Raises:
- ValidationError: If either `scan` or `source` are not specified and `_dummy` is False.
+ ValidationError: If either `scan` or `parent` are not specified and `_dummy` is False.
"""
self._id = None
self._hash = None
+ self._data = None
self.__host = None
+ self._tags = set()
self._port = None
+ self._omit = False
self.__words = None
+ self._parent = None
self._priority = None
+ self._parent_id = None
+ self._host_original = None
self._module_priority = None
self._resolved_hosts = set()
+ self.dns_children = dict()
+ self._discovery_context = ""
+
+ # for creating one-off events without enforcing parent requirement
+ self._dummy = _dummy
+ self.module = module
+ self._type = event_type
# keep track of whether this event has been recorded by the scan
self._stats_recorded = False
- self.timestamp = datetime.utcnow()
-
- self._tags = set()
- if tags is not None:
- self._tags = set(tagify(s) for s in tags)
+ if timestamp is not None:
+ self.timestamp = timestamp
+ else:
+ try:
+ self.timestamp = datetime.datetime.now(datetime.UTC)
+ except AttributeError:
+ self.timestamp = datetime.datetime.utcnow()
- self._data = None
- self._type = event_type
self.confidence = int(confidence)
-
- # for creating one-off events without enforcing source requirement
- self._dummy = _dummy
self._internal = False
- self.module = module
# self.scan holds the instantiated scan object (for helpers, etc.)
self.scan = scan
if (not self.scan) and (not self._dummy):
@@ -189,9 +199,6 @@ def __init__(
if self.scan:
self.scans = list(set([self.scan.id] + self.scans))
- # check type blacklist
- self._check_omit()
-
self._scope_distance = -1
try:
@@ -203,23 +210,27 @@ def __init__(
if not self.data:
raise ValidationError(f'Invalid event data "{data}" for type "{self.type}"')
- self._source = None
- self._source_id = None
- self.source = source
- if (not self.source) and (not self._dummy):
- raise ValidationError(f"Must specify event source")
+ self.parent = parent
+ if (not self.parent) and (not self._dummy):
+ raise ValidationError(f"Must specify event parent")
+
+ # inherit web spider distance from parent
+ self.web_spider_distance = getattr(self.parent, "web_spider_distance", 0)
+
+ if tags is not None:
+ for tag in tags:
+ self.add_tag(tag)
# internal events are not ingested by output modules
if not self._dummy:
# removed this second part because it was making certain sslcert events internal
- if _internal: # or source._internal:
+ if _internal: # or parent._internal:
self.internal = True
- # an event indicating whether the event has undergone DNS resolution
- self._resolved = asyncio.Event()
-
- # inherit web spider distance from parent
- self.web_spider_distance = getattr(self.source, "web_spider_distance", 0)
+ if not context:
+ context = getattr(self.module, "default_discovery_context", "")
+ if context:
+ self.discovery_context = context
@property
def data(self):
@@ -236,6 +247,7 @@ def resolved_hosts(self):
@data.setter
def data(self, data):
self._hash = None
+ self._data_hash = None
self._id = None
self.__host = None
self._port = None
@@ -283,18 +295,33 @@ def host(self):
E.g. for IP_ADDRESS, it could be an ipaddress.IPv4Address() or IPv6Address() object
"""
if self.__host is None:
- self.__host = self._host()
+ self.host = self._host()
return self.__host
+ @host.setter
+ def host(self, host):
+ if self._host_original is None:
+ self._host_original = host
+ self.__host = host
+
+ @property
+ def host_original(self):
+ """
+ Original host data, in case it was changed due to a wildcard DNS, etc.
+ """
+ if self._host_original is None:
+ return self.host
+ return self._host_original
+
@property
def port(self):
self.host
- if getattr(self, "parsed", None):
- if self.parsed.port is not None:
- return self.parsed.port
- elif self.parsed.scheme == "https":
+ if getattr(self, "parsed_url", None):
+ if self.parsed_url.port is not None:
+ return self.parsed_url.port
+ elif self.parsed_url.scheme == "https":
return 443
- elif self.parsed.scheme == "http":
+ elif self.parsed_url.scheme == "http":
return 80
return self._port
@@ -309,6 +336,26 @@ def host_stem(self):
else:
return f"{self.host}"
+ @property
+ def discovery_context(self):
+ return self._discovery_context
+
+ @discovery_context.setter
+ def discovery_context(self, context):
+ try:
+ self._discovery_context = context.format(module=self.module, event=self)
+ except Exception as e:
+ log.warning(f"Error formatting discovery context for {self}: {e} (context: '{context}')")
+ self._discovery_context = context
+
+ @property
+ def discovery_path(self):
+ """
+ This event's full discovery context, including those of all its parents
+ """
+ full_event_chain = list(reversed(self.get_parents())) + [self]
+ return [e.discovery_context for e in full_event_chain if e.type != "SCAN"]
+
@property
def words(self):
if self.__words is None:
@@ -324,9 +371,11 @@ def tags(self):
@tags.setter
def tags(self, tags):
+ self._tags = set()
if isinstance(tags, str):
tags = (tags,)
- self._tags = set(tagify(s) for s in tags)
+ for tag in tags:
+ self.add_tag(tag)
def add_tag(self, tag):
self._tags.add(tagify(tag))
@@ -351,10 +400,22 @@ def quick_emit(self):
@property
def id(self):
+ """
+ A uniquely identifiable hash of the event from the event type + a SHA1 of its data
+ """
if self._id is None:
- self._id = make_event_id(self.data_id, self.type)
+ self._id = f"{self.type}:{self.data_hash.hex()}"
return self._id
+ @property
+ def data_hash(self):
+ """
+ A raw byte hash of the event's data
+ """
+ if self._data_hash is None:
+ self._data_hash = sha1(self.data_id).digest()
+ return self._data_hash
+
@property
def scope_distance(self):
return self._scope_distance
@@ -394,76 +455,105 @@ def scope_distance(self, scope_distance):
self.add_tag(f"distance-{new_scope_distance}")
self._scope_distance = new_scope_distance
# apply recursively to parent events
- source_scope_distance = getattr(self.source, "scope_distance", -1)
- if source_scope_distance >= 0 and self != self.source:
- self.source.scope_distance = scope_distance + 1
+ parent_scope_distance = getattr(self.parent, "scope_distance", -1)
+ if parent_scope_distance >= 0 and self != self.parent:
+ self.parent.scope_distance = scope_distance + 1
+
+ @property
+ def scope_description(self):
+ """
+ Returns a single word describing the scope of the event.
+
+ "in-scope" if the event is in scope, "affiliate" if it's an affiliate, otherwise "distance-{scope_distance}"
+ """
+ if self.scope_distance == 0:
+ return "in-scope"
+ elif "affiliate" in self.tags:
+ return "affiliate"
+ return f"distance-{self.scope_distance}"
@property
- def source(self):
- return self._source
+ def parent(self):
+ return self._parent
- @source.setter
- def source(self, source):
+ @parent.setter
+ def parent(self, parent):
"""
- Setter for the source attribute, ensuring it's a valid event and updating scope distance.
+ Setter for the parent attribute, ensuring it's a valid event and updating scope distance.
- Sets the source of the event and automatically adjusts the scope distance based on the source event's
- scope distance. The scope distance is incremented by 1 if the host of the source event is different
+ Sets the parent of the event and automatically adjusts the scope distance based on the parent event's
+ scope distance. The scope distance is incremented by 1 if the host of the parent event is different
from the current event's host.
Parameters:
- source (BaseEvent): The new source event to set. Must be a valid event object.
+ parent (BaseEvent): The new parent event to set. Must be a valid event object.
Note:
- If an invalid source is provided and the event is not a dummy, a warning will be logged.
+ If an invalid parent is provided and the event is not a dummy, a warning will be logged.
"""
- if is_event(source):
- self._source = source
- hosts_are_same = self.host and (self.host == source.host)
- if source.scope_distance >= 0:
- new_scope_distance = int(source.scope_distance)
+ if is_event(parent):
+ self._parent = parent
+ hosts_are_same = self.host and (self.host == parent.host)
+ if parent.scope_distance >= 0:
+ new_scope_distance = int(parent.scope_distance)
# only increment the scope distance if the host changes
if self._scope_distance_increment_same_host or not hosts_are_same:
new_scope_distance += 1
self.scope_distance = new_scope_distance
# inherit certain tags
if hosts_are_same:
- for t in source.tags:
+ for t in parent.tags:
if t == "affiliate":
self.add_tag("affiliate")
elif t.startswith("mutation-"):
self.add_tag(t)
elif not self._dummy:
- log.warning(f"Tried to set invalid source on {self}: (got: {source})")
+ log.warning(f"Tried to set invalid parent on {self}: (got: {parent})")
+
+ @property
+ def parent_id(self):
+ parent_id = getattr(self.get_parent(), "id", None)
+ if parent_id is not None:
+ return parent_id
+ return self._parent_id
@property
- def source_id(self):
- source_id = getattr(self.get_source(), "id", None)
- if source_id is not None:
- return source_id
- return self._source_id
+ def validators(self):
+ """
+ Depending on whether the scan attribute is accessible, return either a config-aware or non-config-aware validator
+
+ This exists to prevent a chicken-and-egg scenario during the creation of certain events such as URLs,
+ whose sanitization behavior is different depending on the config.
+
+ However, thanks to this property, validation can still work in the absence of a config.
+ """
+ if self.scan is not None:
+ return self.scan.helpers.config_aware_validators
+ return validators
- def get_source(self):
+ def get_parent(self):
"""
Takes into account events with the _omit flag
"""
- if getattr(self.source, "_omit", False):
- return self.source.get_source()
- return self.source
+ if getattr(self.parent, "_omit", False):
+ return self.parent.get_parent()
+ return self.parent
- def get_sources(self, omit=False):
- sources = []
+ def get_parents(self, omit=False):
+ parents = []
e = self
while 1:
if omit:
- source = e.get_source()
+ parent = e.get_parent()
else:
- source = e.source
- if e == source:
+ parent = e.parent
+ if parent is None:
break
- sources.append(source)
- e = source
- return sources
+ if e == parent:
+ break
+ parents.append(parent)
+ e = parent
+ return parents
def _host(self):
return ""
@@ -572,7 +662,9 @@ def __contains__(self, other):
if self.host == other.host:
return True
# hostnames and IPs
- return host_in_host(other.host, self.host)
+ radixtarget = RadixTarget()
+ radixtarget.insert(self.host)
+ return bool(radixtarget.search(other.host))
return False
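Host containment now goes through a radix tree rather than the old `host_in_host()` helper. A small sketch using the same `RadixTarget` calls as the hunk above (assuming the standalone `radixtarget` package is installed; the CIDR lines additionally assume it accepts network strings, which the event `host` attribute can be):

```python
from radixtarget import RadixTarget

rt = RadixTarget()
rt.insert("evilcorp.com")
print(bool(rt.search("www.evilcorp.com")))  # True: subdomain falls inside the target
print(bool(rt.search("evilcorp.net")))      # False: unrelated domain

rt.insert("10.0.0.0/8")                     # hosts can also be IP networks
print(bool(rt.search("10.1.2.3")))          # True: address inside the subnet
```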
def json(self, mode="json", siem_friendly=False):
@@ -589,11 +681,13 @@ def json(self, mode="json", siem_friendly=False):
Returns:
dict: JSON-serializable dictionary representation of the event object.
"""
+ # type, ID, scope description
j = dict()
- for i in ("type", "id"):
+ for i in ("type", "id", "scope_description"):
v = getattr(self, i, "")
if v:
j.update({i: v})
+ # event data
data_attr = getattr(self, f"data_{mode}", None)
if data_attr is not None:
data = data_attr
@@ -603,30 +697,44 @@ def json(self, mode="json", siem_friendly=False):
j["data"] = {self.type: data}
else:
j["data"] = data
+ # host, dns children
+ if self.host:
+ j["host"] = str(self.host)
+ j["resolved_hosts"] = sorted(str(h) for h in self.resolved_hosts)
+ j["dns_children"] = {k: list(v) for k, v in self.dns_children.items()}
+ # web spider distance
web_spider_distance = getattr(self, "web_spider_distance", None)
if web_spider_distance is not None:
j["web_spider_distance"] = web_spider_distance
+ # scope distance
j["scope_distance"] = self.scope_distance
+ # scan
if self.scan:
j["scan"] = self.scan.id
+ # timestamp
j["timestamp"] = self.timestamp.timestamp()
- if self.host:
- j["resolved_hosts"] = [str(h) for h in self.resolved_hosts]
- source_id = self.source_id
- if source_id:
- j["source"] = source_id
+ # parent event
+ parent_id = self.parent_id
+ if parent_id:
+ j["parent"] = parent_id
+ # tags
if self.tags:
j.update({"tags": list(self.tags)})
+ # parent module
if self.module:
j.update({"module": str(self.module)})
+ # sequence of modules that led to discovery
if self.module_sequence:
j.update({"module_sequence": str(self.module_sequence)})
+ # discovery context
+ j["discovery_context"] = self.discovery_context
+ j["discovery_path"] = self.discovery_path
# normalize non-primitive python objects
for k, v in list(j.items()):
if k == "data":
continue
- if type(v) not in (str, int, float, bool, list, type(None)):
+ if type(v) not in (str, int, float, bool, list, dict, type(None)):
try:
j[k] = json.dumps(v, sort_keys=True)
except Exception:
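Note that `dict` joins the primitive whitelist above, so `dns_children` survives serialization untouched. A toy sketch of the normalization loop (in this sketch, anything `json.dumps` rejects falls back to `str`):

```python
import json
import ipaddress

j = {
    "host": ipaddress.ip_address("1.2.3.4"),  # not JSON-serializable as-is
    "dns_children": {"A": ["1.2.3.4"]},       # dict: now left alone
    "tags": ["ipv4", "in-scope"],             # list: left alone
}
for k, v in list(j.items()):
    if type(v) not in (str, int, float, bool, list, dict, type(None)):
        try:
            j[k] = json.dumps(v, sort_keys=True)
        except Exception:
            j[k] = str(v)  # IPv4Address lands here
print(j["host"])  # 1.2.3.4
```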
@@ -653,14 +761,14 @@ def module_sequence(self):
"""
Get a human-friendly string that represents the sequence of modules responsible for generating this event.
- Includes the names of omitted source events to provide a complete view of the module sequence leading to this event.
+ Includes the names of omitted parent events to provide a complete view of the module sequence leading to this event.
Returns:
str: The module sequence in human-friendly format.
"""
module_name = getattr(self.module, "name", "")
- if getattr(self.source, "_omit", False):
- module_name = f"{self.source.module_sequence}->{module_name}"
+ if getattr(self.parent, "_omit", False):
+ module_name = f"{self.parent.module_sequence}->{module_name}"
return module_name
@property
@@ -678,10 +786,10 @@ def module_priority(self, priority):
def priority(self):
if self._priority is None:
timestamp = self.timestamp.timestamp()
- if self.source.timestamp == self.timestamp:
+ if self.parent.timestamp == self.timestamp:
self._priority = (timestamp,)
else:
- self._priority = getattr(self.source, "priority", ()) + (timestamp,)
+ self._priority = getattr(self.parent, "priority", ()) + (timestamp,)
return self._priority
@@ -694,13 +802,24 @@ def type(self, val):
self._type = val
self._hash = None
self._id = None
- self._check_omit()
- def _check_omit(self):
- if self.scan is not None:
- omit_event_types = self.scan.config.get("omit_event_types", [])
- if omit_event_types and self.type in omit_event_types:
- self._omit = True
+ @property
+ def _host_size(self):
+ """
+ Used for sorting events by their host size, so that parent events (e.g. IP subnets) come first
+ """
+ if self.host:
+ if isinstance(self.host, str):
+ # smaller domains should come first
+ return len(self.host)
+ else:
+ try:
+ # bigger IP subnets should come first
+ return -self.host.num_addresses
+ except AttributeError:
+ # IP addresses default to 1
+ return 1
+ return 0
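A short sketch of the sort order `_host_size` produces, using bare values in place of events (the helper below reproduces the property's logic):

```python
import ipaddress

def host_size(host):
    # reproduces the _host_size logic above for bare host values
    if not host:
        return 0
    if isinstance(host, str):
        return len(host)  # shorter (parent) domains sort first
    try:
        return -host.num_addresses  # bigger subnets sort first
    except AttributeError:
        return 1  # bare IP addresses

hosts = [
    "www.evilcorp.com",
    "evilcorp.com",
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_address("10.1.2.3"),
]
for h in sorted(hosts, key=host_size):
    print(h)
# 10.0.0.0/8, 10.1.0.0/16, 10.1.2.3, evilcorp.com, www.evilcorp.com
```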
def __iter__(self):
"""
@@ -741,6 +860,11 @@ def __repr__(self):
return str(self)
+class SCAN(BaseEvent):
+ def _data_human(self):
+ return f"{self.data['name']} ({self.data['id']})"
+
+
class FINISHED(BaseEvent):
"""
Special signal event to indicate end of scan
@@ -748,7 +872,7 @@ class FINISHED(BaseEvent):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
- self._priority = (999999999999999999999,)
+ self._priority = (999999999999999,)
class DefaultEvent(BaseEvent):
@@ -760,7 +884,7 @@ class DictEvent(BaseEvent):
def sanitize_data(self, data):
url = data.get("url", "")
if url:
- self.parsed = validators.validate_url_parsed(url)
+ self.parsed_url = self.validators.validate_url_parsed(url)
return data
def _data_load(self, data):
@@ -774,7 +898,7 @@ def _host(self):
if isinstance(self.data, dict) and "host" in self.data:
return make_ip_type(self.data["host"])
else:
- parsed = getattr(self, "parsed", None)
+ parsed = getattr(self, "parsed_url", None)
if parsed is not None:
return make_ip_type(parsed.hostname)
@@ -838,8 +962,8 @@ def __init__(self, *args, **kwargs):
ip = ipaddress.ip_address(self.data)
self.add_tag(f"ipv{ip.version}")
if ip.is_private:
- self.add_tag("private")
- self.dns_resolve_distance = getattr(self.source, "dns_resolve_distance", 0)
+ self.add_tag("private-ip")
+ self.dns_resolve_distance = getattr(self.parent, "dns_resolve_distance", 0)
def sanitize_data(self, data):
return validators.validate_host(data)
@@ -853,14 +977,14 @@ def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# prevent runaway DNS entries
self.dns_resolve_distance = 0
- source = getattr(self, "source", None)
+ parent = getattr(self, "parent", None)
module = getattr(self, "module", None)
module_type = getattr(module, "_type", "")
- source_module = getattr(source, "module", None)
- source_module_type = getattr(source_module, "_type", "")
+ parent_module = getattr(parent, "module", None)
+ parent_module_type = getattr(parent_module, "_type", "")
if module_type == "DNS":
- self.dns_resolve_distance = getattr(source, "dns_resolve_distance", 0)
- if source_module_type == "DNS":
+ self.dns_resolve_distance = getattr(parent, "dns_resolve_distance", 0)
+ if parent_module_type == "DNS":
self.dns_resolve_distance += 1
# self.add_tag(f"resolve-distance-{self.dns_resolve_distance}")
@@ -924,57 +1048,84 @@ class URL_UNVERIFIED(BaseEvent):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
- # increment the web spider distance
- if self.type == "URL_UNVERIFIED" and getattr(self.module, "name", "") != "TARGET":
- self.web_spider_distance += 1
- self.num_redirects = getattr(self.source, "num_redirects", 0)
+ self.num_redirects = getattr(self.parent, "num_redirects", 0)
+
+ def _data_id(self):
+
+ data = super()._data_id()
+
+ # remove the querystring for URL/URL_UNVERIFIED events, because we will conditionally add it back in (based on settings)
+ if self.__class__.__name__.startswith("URL") and self.scan is not None:
+ prefix = data.split("?")[0]
+
+ # consider spider-danger tag when deduping
+ if "spider-danger" in self.tags:
+ prefix += "spider-danger"
+
+ if not self.scan.config.get("url_querystring_remove", True) and self.parsed_url.query:
+ query_dict = parse_qs(self.parsed_url.query)
+ if self.scan.config.get("url_querystring_collapse", True):
+ # Only consider parameter names in dedup (collapse values)
+ cleaned_query = "|".join(sorted(query_dict.keys()))
+ else:
+ # Consider parameter names and values in dedup
+ cleaned_query = "&".join(
+ f"{key}={','.join(sorted(values))}" for key, values in sorted(query_dict.items())
+ )
+ data = f"{prefix}:{self.parsed_url.scheme}:{self.parsed_url.netloc}:{self.parsed_url.path}:{cleaned_query}"
+ return data
def sanitize_data(self, data):
- self.parsed = validators.validate_url_parsed(data)
+ self.parsed_url = self.validators.validate_url_parsed(data)
+
+ # special handling of URL extensions
+ if self.parsed_url is not None:
+ url_path = self.parsed_url.path
+ if url_path:
+ parsed_path_lower = str(url_path).lower()
+ extension = get_file_extension(parsed_path_lower)
+ if extension:
+ self.url_extension = extension
+ self.add_tag(f"extension-{extension}")
# tag as dir or endpoint
- if str(self.parsed.path).endswith("/"):
+ if str(self.parsed_url.path).endswith("/"):
self.add_tag("dir")
else:
self.add_tag("endpoint")
- parsed_path_lower = str(self.parsed.path).lower()
-
- scan = getattr(self, "scan", None)
- url_extension_blacklist = getattr(scan, "url_extension_blacklist", [])
- url_extension_httpx_only = getattr(scan, "url_extension_httpx_only", [])
+ data = self.parsed_url.geturl()
+ return data
- extension = get_file_extension(parsed_path_lower)
- if extension:
- self.add_tag(f"extension-{extension}")
- if extension in url_extension_blacklist:
- self.add_tag("blacklisted")
- if extension in url_extension_httpx_only:
- self.add_tag("httpx-only")
- self._omit = True
+ def add_tag(self, tag):
+ if tag == "spider-danger":
+ # increment the web spider distance
+ if self.type == "URL_UNVERIFIED":
+ self.web_spider_distance += 1
+ if self.is_spider_max:
+ self.add_tag("spider-max")
+ super().add_tag(tag)
- data = self.parsed.geturl()
- return data
+ @property
+ def is_spider_max(self):
+ if self.scan:
+ depth = url_depth(self.parsed_url)
+ if (self.web_spider_distance > self.scan.web_spider_distance) or (depth > self.scan.web_spider_depth):
+ return True
+ return False
def with_port(self):
netloc_with_port = make_netloc(self.host, self.port)
- return self.parsed._replace(netloc=netloc_with_port)
+ return self.parsed_url._replace(netloc=netloc_with_port)
def _words(self):
- first_elem = self.parsed.path.lstrip("/").split("/")[0]
+ first_elem = self.parsed_url.path.lstrip("/").split("/")[0]
if not "." in first_elem:
return extract_words(first_elem)
return set()
def _host(self):
- return make_ip_type(self.parsed.hostname)
-
- def _data_id(self):
- # consider spider-danger tag when deduping
- data = super()._data_id()
- if "spider-danger" in self.tags:
- data = "spider-danger" + data
- return data
+ return make_ip_type(self.parsed_url.hostname)
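The querystring branch of `_data_id` above decides how much of the query participates in URL deduplication. A standalone sketch of the two collapse modes, using `urllib.parse.parse_qs` exactly as the hunk does:

```python
from urllib.parse import parse_qs, urlparse

parsed = urlparse("https://evilcorp.com/search?q=one&q=two&page=2")
query_dict = parse_qs(parsed.query)  # {'q': ['one', 'two'], 'page': ['2']}

# url_querystring_collapse=True: dedup on parameter names only
print("|".join(sorted(query_dict.keys())))  # page|q

# url_querystring_collapse=False: dedup on names plus sorted values
print("&".join(f"{key}={','.join(sorted(values))}" for key, values in sorted(query_dict.items())))
# page=2&q=one,two
```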
@property
def http_status(self):
@@ -986,16 +1137,19 @@ def http_status(self):
class URL(URL_UNVERIFIED):
- def sanitize_data(self, data):
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
if not self._dummy and not any(t.startswith("status-") for t in self.tags):
raise ValidationError(
'Must specify HTTP status tag for URL event, e.g. "status-200". Use URL_UNVERIFIED if the URL is unvisited.'
)
- return super().sanitize_data(data)
@property
def resolved_hosts(self):
- return [".".join(i.split("-")[1:]) for i in self.tags if i.startswith("ip-")]
+ # TODO: remove this when we rip out httpx
+ return set(".".join(i.split("-")[1:]) for i in self.tags if i.startswith("ip-"))
@property
def pretty_string(self):
@@ -1023,6 +1177,24 @@ class URL_HINT(URL_UNVERIFIED):
pass
+class WEB_PARAMETER(DictHostEvent):
+
+ def _data_id(self):
+ # dedupe by url:name:param_type
+ url = self.data.get("url", "")
+ name = self.data.get("name", "")
+ param_type = self.data.get("type", "")
+ return f"{url}:{name}:{param_type}"
+
+ def _url(self):
+ return self.data["url"]
+
+ def __str__(self):
+ max_event_len = 200
+ d = str(self.data)
+ return f'{self.type}("{d[:max_event_len]}{("..." if len(d) > max_event_len else "")}", module={self.module}, tags={self.tags})'
+
+
class EMAIL_ADDRESS(BaseEvent):
def sanitize_data(self, data):
return validators.validate_email(data)
@@ -1040,14 +1212,17 @@ class HTTP_RESPONSE(URL_UNVERIFIED, DictEvent):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# count number of consecutive redirects
- self.num_redirects = getattr(self.source, "num_redirects", 0)
+ self.num_redirects = getattr(self.parent, "num_redirects", 0)
if str(self.http_status).startswith("3"):
self.num_redirects += 1
+ def _data_id(self):
+ return self.data["method"] + "|" + self.data["url"]
+
def sanitize_data(self, data):
url = data.get("url", "")
- self.parsed = validators.validate_url_parsed(url)
- data["url"] = self.parsed.geturl()
+ self.parsed_url = self.validators.validate_url_parsed(url)
+ data["url"] = self.parsed_url.geturl()
header_dict = {}
for i in data.get("raw_header", "").splitlines():
@@ -1055,7 +1230,11 @@ def sanitize_data(self, data):
k, v = i.split(":", 1)
k = k.strip().lower()
v = v.lstrip()
- header_dict[k] = v
+ if k in header_dict:
+ header_dict[k].append(v)
+ else:
+ header_dict[k] = [v]
+
data["header-dict"] = header_dict
# move URL to the front of the dictionary for visibility
data = dict(data)
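The header parsing above now folds repeated headers (e.g. multiple `Set-Cookie` lines) into lists instead of letting later values clobber earlier ones. A tiny sketch of the loop on a raw header blob (the colon guard stands in for context not shown in the hunk):

```python
raw_header = "Content-Type: text/html\nSet-Cookie: a=1\nSet-Cookie: b=2\n"

header_dict = {}
for line in raw_header.splitlines():
    if ":" not in line:
        continue  # skip the status line / malformed lines
    k, v = line.split(":", 1)
    k = k.strip().lower()
    v = v.lstrip()
    if k in header_dict:
        header_dict[k].append(v)  # repeated headers accumulate
    else:
        header_dict[k] = [v]

print(header_dict)
# {'content-type': ['text/html'], 'set-cookie': ['a=1', 'b=2']}
```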
@@ -1095,7 +1274,7 @@ def redirect_location(self):
# if there's no scheme (i.e. it's a relative redirect)
if not scheme:
# then join the location with the current url
- location = urljoin(self.parsed.geturl(), location)
+ location = urljoin(self.parsed_url.geturl(), location)
return location
@@ -1249,10 +1428,15 @@ class FILESYSTEM(DictPathEvent):
pass
+class RAW_DNS_RECORD(DictHostEvent):
+ pass
+
+
def make_event(
data,
event_type=None,
- source=None,
+ parent=None,
+ context=None,
module=None,
scan=None,
scans=None,
@@ -1271,7 +1455,8 @@ def make_event(
Parameters:
data (Union[str, dict, BaseEvent]): The primary data for the event or an existing event object.
event_type (str, optional): Type of the event, e.g., 'IP_ADDRESS'. Auto-detected if not provided.
- source (BaseEvent, optional): Source event leading to this event's discovery.
+ parent (BaseEvent, optional): Parent event leading to this event's discovery.
+ context (str, optional): Description of circumstances leading to event's discovery.
module (str, optional): Module that discovered the event.
scan (Scan, optional): BBOT Scan object associated with the event.
scans (List[Scan], optional): Multiple BBOT Scan objects, primarily used for unserialization.
@@ -1288,11 +1473,11 @@ def make_event(
Examples:
If inside a module, e.g. from within its `handle_event()`:
- >>> self.make_event("1.2.3.4", source=event)
- IP_ADDRESS("1.2.3.4", module=nmap, tags={'ipv4', 'distance-1'})
+ >>> self.make_event("1.2.3.4", parent=event)
+ IP_ADDRESS("1.2.3.4", module=portscan, tags={'ipv4', 'distance-1'})
If you're outside a module but you have a scan object:
- >>> scan.make_event("1.2.3.4", source=scan.root_event)
+ >>> scan.make_event("1.2.3.4", parent=scan.root_event)
IP_ADDRESS("1.2.3.4", module=None, tags={'ipv4', 'distance-1'})
If you're outside a scan and just messing around:
@@ -1320,8 +1505,10 @@ def make_event(
data.scans = scans
if module is not None:
data.module = module
- if source is not None:
- data.source = source
+ if parent is not None:
+ data.parent = parent
+ if context is not None:
+ data.discovery_context = context
if internal == True:
data.internal = True
if tags:
@@ -1363,7 +1550,8 @@ def make_event(
return event_class(
data,
event_type=event_type,
- source=source,
+ parent=parent,
+ context=context,
module=module,
scan=scan,
scans=scans,
@@ -1403,6 +1591,7 @@ def event_from_json(j, siem_friendly=False):
"scans": j.get("scans", []),
"tags": j.get("tags", []),
"confidence": j.get("confidence", 5),
+ "context": j.get("discovery_context", None),
"dummy": True,
}
if siem_friendly:
@@ -1415,11 +1604,11 @@ def event_from_json(j, siem_friendly=False):
resolved_hosts = j.get("resolved_hosts", [])
event._resolved_hosts = set(resolved_hosts)
- event.timestamp = datetime.fromtimestamp(j["timestamp"])
+ event.timestamp = datetime.datetime.fromtimestamp(j["timestamp"])
event.scope_distance = j["scope_distance"]
- source_id = j.get("source", None)
- if source_id is not None:
- event._source_id = source_id
+ parent_id = j.get("parent", None)
+ if parent_id is not None:
+ event._parent_id = parent_id
return event
except KeyError as e:
raise ValidationError(f"Event missing required field: {e}")
diff --git a/bbot/core/event/helpers.py b/bbot/core/event/helpers.py
index d3ad3ee78..0e3bd5fcd 100644
--- a/bbot/core/event/helpers.py
+++ b/bbot/core/event/helpers.py
@@ -2,9 +2,9 @@
import ipaddress
from contextlib import suppress
-from bbot.core.errors import ValidationError
+from bbot.errors import ValidationError
from bbot.core.helpers.regexes import event_type_regexes
-from bbot.core.helpers import sha1, smart_decode, smart_encode_punycode
+from bbot.core.helpers import smart_decode, smart_encode_punycode
log = logging.getLogger("bbot.core.event.helpers")
@@ -50,7 +50,3 @@ def get_event_type(data):
return t, data
raise ValidationError(f'Unable to autodetect event type from "{data}"')
-
-
-def make_event_id(data, event_type):
- return f"{event_type}:{sha1(data).hexdigest()}"
diff --git a/bbot/core/flags.py b/bbot/core/flags.py
index c6b675798..f65dbad28 100644
--- a/bbot/core/flags.py
+++ b/bbot/core/flags.py
@@ -4,6 +4,7 @@
"aggressive": "Generates a large amount of network traffic",
"baddns": "Runs all modules from the DNS auditing tool BadDNS",
"cloud-enum": "Enumerates cloud resources",
+ "code-enum": "Find public code repositories and search them for secrets etc.",
"deadly": "Highly aggressive",
"email-enum": "Enumerates email addresses",
"iis-shortnames": "Scans for IIS Shortname vulnerability",
@@ -14,7 +15,6 @@
"service-enum": "Identifies protocols running on open ports",
"slow": "May take a long time to complete",
"social-enum": "Enumerates social media",
- "repo-enum": "Enumerates code repositories",
"subdomain-enum": "Enumerates subdomains",
"subdomain-hijack": "Detects hijackable subdomains",
"web-basic": "Basic, non-intrusive web scan functionality",
diff --git a/bbot/core/helpers/async_helpers.py b/bbot/core/helpers/async_helpers.py
index 8434ccb0f..dcc510ee4 100644
--- a/bbot/core/helpers/async_helpers.py
+++ b/bbot/core/helpers/async_helpers.py
@@ -2,7 +2,6 @@
import random
import asyncio
import logging
-import threading
from datetime import datetime
from queue import Queue, Empty
from cachetools import LRUCache
@@ -118,8 +117,10 @@ def generator():
if is_done:
break
+ from .process import BBOTThread
+
# Start the event loop in a separate thread
- thread = threading.Thread(target=lambda: asyncio.run(runner()))
+ thread = BBOTThread(target=lambda: asyncio.run(runner()), daemon=True, custom_name="bbot async_to_sync_gen()")
thread.start()
# Return the generator
diff --git a/bbot/core/helpers/bloom.py b/bbot/core/helpers/bloom.py
new file mode 100644
index 000000000..357c715c0
--- /dev/null
+++ b/bbot/core/helpers/bloom.py
@@ -0,0 +1,71 @@
+import os
+import mmh3
+import mmap
+
+
+class BloomFilter:
+ """
+ Simple bloom filter implementation capable of roughly 400K lookups/s.
+
+ BBOT uses bloom filters in scenarios like DNS brute-forcing, where it's useful to keep track
+ of which mutations have been tried so far.
+
+ A 100-megabyte bloom filter (800M bits) can store 10M entries with a .01% false-positive rate.
+ A Python hash is 36 bytes. So if you wanted to store these in a set, this would take up
+ 36 * 10M * 2 (key+value) == 720 megabytes. So we save roughly 7 times the space.
+ """
+
+ def __init__(self, size=8000000):
+ self.size = size # total bits
+ self.byte_size = (size + 7) // 8 # calculate byte size needed for the given number of bits
+
+ # Create an anonymous mmap region, compatible with both Windows and Unix
+ if os.name == "nt": # Windows
+ # -1 indicates an anonymous memory map in Windows
+ self.mmap_file = mmap.mmap(-1, self.byte_size)
+ else: # Unix/Linux
+ # Use MAP_ANONYMOUS along with MAP_SHARED
+ self.mmap_file = mmap.mmap(-1, self.byte_size, prot=mmap.PROT_WRITE, flags=mmap.MAP_ANON | mmap.MAP_SHARED)
+
+ self.clear_all_bits()
+
+ def add(self, item):
+ for hash_value in self._hashes(item):
+ index = hash_value // 8
+ position = hash_value % 8
+ current_byte = self.mmap_file[index]
+ self.mmap_file[index] = current_byte | (1 << position)
+
+ def check(self, item):
+ for hash_value in self._hashes(item):
+ index = hash_value // 8
+ position = hash_value % 8
+ current_byte = self.mmap_file[index]
+ if not (current_byte & (1 << position)):
+ return False
+ return True
+
+ def clear_all_bits(self):
+ self.mmap_file.seek(0)
+ # Write zeros across the entire mmap length
+ self.mmap_file.write(b"\x00" * self.byte_size)
+
+ def _hashes(self, item):
+ if not isinstance(item, bytes):
+ if not isinstance(item, str):
+ item = str(item)
+ item = item.encode("utf-8")
+ return [abs(hash(item)) % self.size, abs(mmh3.hash(item)) % self.size, abs(self._fnv1a_hash(item)) % self.size]
+
+ def _fnv1a_hash(self, data):
+ hash = 0x811C9DC5 # 2166136261
+ for byte in data:
+ hash ^= byte
+ hash = (hash * 0x01000193) % 2**32 # 16777619
+ return hash
+
+ def __del__(self):
+ self.mmap_file.close()
+
+ def __contains__(self, item):
+ return self.check(item)
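A usage sketch for the `BloomFilter` above. Membership is probabilistic: `check()` may rarely return a false positive, but never a false negative (requires the `mmh3` package, as imported at the top of the new file):

```python
from bbot.core.helpers.bloom import BloomFilter

bloom = BloomFilter(size=8_000_000)  # 8M bits ≈ 1 MB of anonymous mmap

for mutation in ("www.evilcorp.com", "mail.evilcorp.com"):
    bloom.add(mutation)

print("www.evilcorp.com" in bloom)  # True (__contains__ delegates to check())
print("dev.evilcorp.com" in bloom)  # False, with high probability
```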
diff --git a/bbot/core/helpers/cloud.py b/bbot/core/helpers/cloud.py
deleted file mode 100644
index 811ca070c..000000000
--- a/bbot/core/helpers/cloud.py
+++ /dev/null
@@ -1,104 +0,0 @@
-import asyncio
-import logging
-
-from cloudcheck import cloud_providers
-
-log = logging.getLogger("bbot.helpers.cloud")
-
-
-class CloudHelper:
- def __init__(self, parent_helper):
- self.parent_helper = parent_helper
- self.providers = cloud_providers
- self.dummy_modules = {}
- for provider_name in self.providers.providers:
- self.dummy_modules[provider_name] = self.parent_helper._make_dummy_module(
- f"{provider_name}_cloud", _type="scan"
- )
- self._updated = False
- self._update_lock = asyncio.Lock()
-
- def excavate(self, event, s):
- """
- Extract buckets, etc. from strings such as an HTTP responses
- """
- for provider in self:
- provider_name = provider.name.lower()
- base_kwargs = {"source": event, "tags": [f"cloud-{provider_name}"], "_provider": provider_name}
- for event_type, sigs in provider.signatures.items():
- found = set()
- for sig in sigs:
- for match in sig.findall(s):
- kwargs = dict(base_kwargs)
- kwargs["event_type"] = event_type
- if not match in found:
- found.add(match)
- if event_type == "STORAGE_BUCKET":
- self.emit_bucket(match, **kwargs)
- else:
- self.emit_event(**kwargs)
-
- def speculate(self, event):
- """
- Look for DNS_NAMEs that are buckets or other cloud resources
- """
- for provider in self:
- provider_name = provider.name.lower()
- base_kwargs = dict(
- source=event, tags=[f"{provider.provider_type}-{provider_name}"], _provider=provider_name
- )
- if event.type.startswith("DNS_NAME"):
- for event_type, sigs in provider.signatures.items():
- found = set()
- for sig in sigs:
- match = sig.match(event.data)
- if match:
- kwargs = dict(base_kwargs)
- kwargs["event_type"] = event_type
- if not event.data in found:
- found.add(event.data)
- if event_type == "STORAGE_BUCKET":
- self.emit_bucket(match.groups(), **kwargs)
- else:
- self.emit_event(**kwargs)
-
- def emit_bucket(self, match, **kwargs):
- bucket_name, bucket_domain = match
- kwargs["data"] = {"name": bucket_name, "url": f"https://{bucket_name}.{bucket_domain}"}
- self.emit_event(**kwargs)
-
- def emit_event(self, *args, **kwargs):
- provider_name = kwargs.pop("_provider")
- dummy_module = self.dummy_modules[provider_name]
- event = dummy_module.make_event(*args, **kwargs)
- if event:
- self.parent_helper.scan.manager.queue_event(event)
-
- async def tag_event(self, event):
- """
- Tags an event according to cloud provider
- """
- async with self._update_lock:
- if not self._updated:
- await self.providers.update()
- self._updated = True
-
- if event.host:
- for host in [event.host] + list(event.resolved_hosts):
- provider_name, provider_type, source = self.providers.check(host)
- if provider_name is not None:
- provider = self.providers.providers[provider_name.lower()]
- event.add_tag(f"{provider_type}-{provider_name.lower()}")
- # if its host directly matches this cloud provider's domains
- if not self.parent_helper.is_ip(host):
- # tag as buckets, etc.
- for event_type, sigs in provider.signatures.items():
- for sig in sigs:
- if sig.match(host):
- event.add_tag(f"{provider_type}-{event_type}")
-
- def __getitem__(self, item):
- return self.providers.providers[item.lower()]
-
- def __iter__(self):
- yield from self.providers
diff --git a/bbot/core/helpers/command.py b/bbot/core/helpers/command.py
index 59751cbee..7283291fc 100644
--- a/bbot/core/helpers/command.py
+++ b/bbot/core/helpers/command.py
@@ -38,6 +38,7 @@ async def run(self, *command, check=False, text=True, idle_timeout=None, **kwarg
# proc_tracker optionally keeps track of which processes are running under which modules
# this allows for graceful SIGINTing of a module's processes in the case when it's killed
proc_tracker = kwargs.pop("_proc_tracker", set())
+ log_stderr = kwargs.pop("_log_stderr", True)
proc, _input, command = await self._spawn_proc(*command, **kwargs)
if proc is not None:
proc_tracker.add(proc)
@@ -66,7 +67,7 @@ async def run(self, *command, check=False, text=True, idle_timeout=None, **kwarg
if proc.returncode:
if check:
raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)
- if stderr:
+ if stderr and log_stderr:
command_str = " ".join(command)
log.warning(f"Stderr for run({command_str}):\n\t{stderr}")
@@ -103,6 +104,7 @@ async def run_live(self, *command, check=False, text=True, idle_timeout=None, **
# proc_tracker optionally keeps track of which processes are running under which modules
# this allows for graceful SIGINTing of a module's processes in the case when it's killed
proc_tracker = kwargs.pop("_proc_tracker", set())
+ log_stderr = kwargs.pop("_log_stderr", True)
proc, _input, command = await self._spawn_proc(*command, **kwargs)
if proc is not None:
proc_tracker.add(proc)
@@ -151,7 +153,7 @@ async def run_live(self, *command, check=False, text=True, idle_timeout=None, **
if check:
raise CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)
# surface stderr
- if stderr:
+ if stderr and log_stderr:
command_str = " ".join(command)
log.warning(f"Stderr for run_live({command_str}):\n\t{stderr}")
finally:
@@ -201,11 +203,13 @@ async def _write_proc_line(proc, chunk):
try:
proc.stdin.write(smart_encode(chunk) + b"\n")
await proc.stdin.drain()
+ return True
except Exception as e:
proc_args = [str(s) for s in getattr(proc, "args", [])]
command = " ".join(proc_args)
log.warning(f"Error writing line to stdin for command: {command}: {e}")
log.trace(traceback.format_exc())
+ return False
async def _write_stdin(proc, _input):
@@ -225,10 +229,14 @@ async def _write_stdin(proc, _input):
_input = [_input]
if isinstance(_input, (list, tuple)):
for chunk in _input:
- await _write_proc_line(proc, chunk)
+ write_result = await _write_proc_line(proc, chunk)
+ if not write_result:
+ break
else:
async for chunk in _input:
- await _write_proc_line(proc, chunk)
+ write_result = await _write_proc_line(proc, chunk)
+ if not write_result:
+ break
proc.stdin.close()
diff --git a/bbot/core/helpers/depsinstaller/installer.py b/bbot/core/helpers/depsinstaller/installer.py
index 049baef86..f17c96499 100644
--- a/bbot/core/helpers/depsinstaller/installer.py
+++ b/bbot/core/helpers/depsinstaller/installer.py
@@ -13,8 +13,6 @@
from ansible_runner.interface import run
from subprocess import CalledProcessError
-from bbot.core import configurator
-from bbot.modules import module_loader
from ..misc import can_sudo_without_password, os_platform
log = logging.getLogger("bbot.core.helpers.depsinstaller")
@@ -23,17 +21,20 @@
class DepsInstaller:
def __init__(self, parent_helper):
self.parent_helper = parent_helper
+ self.preset = self.parent_helper.preset
+ self.core = self.preset.core
# respect BBOT's http timeout
- http_timeout = self.parent_helper.config.get("http_timeout", 30)
+ self.web_config = self.parent_helper.config.get("web", {})
+ http_timeout = self.web_config.get("http_timeout", 30)
os.environ["ANSIBLE_TIMEOUT"] = str(http_timeout)
self.askpass_filename = "sudo_askpass.py"
self._installed_sudo_askpass = False
self._sudo_password = os.environ.get("BBOT_SUDO_PASS", None)
if self._sudo_password is None:
- if configurator.bbot_sudo_pass is not None:
- self._sudo_password = configurator.bbot_sudo_pass
+ if self.core.bbot_sudo_pass is not None:
+ self._sudo_password = self.core.bbot_sudo_pass
elif can_sudo_without_password():
self._sudo_password = ""
self.data_dir = self.parent_helper.cache_dir / "depsinstaller"
@@ -43,17 +44,12 @@ def __init__(self, parent_helper):
self.parent_helper.mkdir(self.command_status)
self.setup_status = self.read_setup_status()
- self.no_deps = self.parent_helper.config.get("no_deps", False)
- self.ansible_debug = True
- self.force_deps = self.parent_helper.config.get("force_deps", False)
- self.retry_deps = self.parent_helper.config.get("retry_deps", False)
- self.ignore_failed_deps = self.parent_helper.config.get("ignore_failed_deps", False)
+ self.deps_behavior = self.parent_helper.config.get("deps_behavior", "abort_on_failure").lower()
+ self.ansible_debug = self.core.logger.log_level <= logging.DEBUG
self.venv = ""
if sys.prefix != sys.base_prefix:
self.venv = sys.prefix
- self.all_modules_preloaded = module_loader.preloaded()
-
self.ensure_root_lock = Lock()
async def install(self, *modules):
@@ -64,7 +60,7 @@ async def install(self, *modules):
notified = False
for m in modules:
# assume success if we're ignoring dependencies
- if self.no_deps:
+ if self.deps_behavior == "disable":
succeeded.append(m)
continue
# abort if module name is unknown
@@ -73,6 +69,7 @@ async def install(self, *modules):
failed.append(m)
continue
preloaded = self.all_modules_preloaded[m]
+ log.debug(f"Installing {m} - Preloaded Deps {preloaded['deps']}")
# make a hash of the dependencies and check if it's already been handled
# take into consideration whether the venv or bbot home directory changes
module_hash = self.parent_helper.sha1(
@@ -84,11 +81,15 @@ async def install(self, *modules):
success = self.setup_status.get(module_hash, None)
dependencies = list(chain(*preloaded["deps"].values()))
if len(dependencies) <= 0:
- log.debug(f'No setup to do for module "{m}"')
+ log.debug(f'No dependency work to do for module "{m}"')
succeeded.append(m)
continue
else:
- if success is None or (success is False and self.retry_deps) or self.force_deps:
+ if (
+ success is None
+ or (success is False and self.deps_behavior == "retry_failed")
+ or self.deps_behavior == "force_install"
+ ):
if not notified:
log.hugeinfo(f"Installing module dependencies. Please be patient, this may take a while.")
notified = True
@@ -98,14 +99,14 @@ async def install(self, *modules):
self.ensure_root(f'Module "{m}" needs root privileges to install its dependencies.')
success = await self.install_module(m)
self.setup_status[module_hash] = success
- if success or self.ignore_failed_deps:
+ if success or self.deps_behavior == "ignore_failed":
log.debug(f'Setup succeeded for module "{m}"')
succeeded.append(m)
else:
log.warning(f'Setup failed for module "{m}"')
failed.append(m)
else:
- if success or self.ignore_failed_deps:
+ if success or self.deps_behavior == "ignore_failed":
log.debug(
f'Skipping dependency install for module "{m}" because it\'s already done (--force-deps to re-run)'
)
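The four separate flags (`no_deps`, `force_deps`, `retry_deps`, `ignore_failed_deps`) collapse into a single `deps_behavior` string; the hunks above use the values `disable`, `retry_failed`, `force_install`, `ignore_failed`, and the default `abort_on_failure`. A distilled sketch of the resulting install decision (the helper name is mine, not BBOT's):

```python
def should_attempt_install(previous_success, deps_behavior):
    # previous_success: True/False from an earlier run, or None if never attempted
    if deps_behavior == "disable":
        return False  # dependencies are skipped and assumed successful
    return (
        previous_success is None
        or (previous_success is False and deps_behavior == "retry_failed")
        or deps_behavior == "force_install"
    )

assert should_attempt_install(None, "abort_on_failure")
assert should_attempt_install(False, "retry_failed")
assert not should_attempt_install(False, "abort_on_failure")
assert should_attempt_install(True, "force_install")
```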
@@ -148,6 +149,20 @@ async def install_module(self, module):
if deps_pip:
success &= await self.pip_install(deps_pip, constraints=deps_pip_constraints)
+ # shared/common
+ deps_common = preloaded["deps"]["common"]
+ if deps_common:
+ for dep_common in deps_common:
+ if self.setup_status.get(dep_common, False):
+ log.debug(
+ f'Skipping installation of dependency "{dep_common}" for module "{module}" since it is already installed'
+ )
+ continue
+ ansible_tasks = self.preset.module_loader._shared_deps[dep_common]
+ result = self.tasks(module, ansible_tasks)
+ self.setup_status[dep_common] = result
+ success &= result
+
return success
async def pip_install(self, packages, constraints=None):
@@ -310,7 +325,7 @@ def ensure_root(self, message=""):
if self.parent_helper.verify_sudo_password(password):
log.success("Authentication successful")
self._sudo_password = password
- configurator.bbot_sudo_pass = password
+ self.core.bbot_sudo_pass = password
else:
log.warning("Incorrect password")
@@ -336,3 +351,7 @@ def _install_sudo_askpass(self):
askpass_dst = self.parent_helper.tools_dir / self.askpass_filename
shutil.copy(askpass_src, askpass_dst)
askpass_dst.chmod(askpass_dst.stat().st_mode | stat.S_IEXEC)
+
+ @property
+ def all_modules_preloaded(self):
+ return self.preset.module_loader.preloaded()
diff --git a/bbot/core/helpers/diff.py b/bbot/core/helpers/diff.py
index 5df86fc0f..59ee96567 100644
--- a/bbot/core/helpers/diff.py
+++ b/bbot/core/helpers/diff.py
@@ -3,19 +3,43 @@
from deepdiff import DeepDiff
from contextlib import suppress
from xml.parsers.expat import ExpatError
-from bbot.core.errors import HttpCompareError
+from bbot.errors import HttpCompareError
log = logging.getLogger("bbot.core.helpers.diff")
class HttpCompare:
- def __init__(self, baseline_url, parent_helper, method="GET", allow_redirects=False, include_cache_buster=True):
+ def __init__(
+ self,
+ baseline_url,
+ parent_helper,
+ method="GET",
+ data=None,
+ allow_redirects=False,
+ include_cache_buster=True,
+ headers=None,
+ cookies=None,
+ timeout=15,
+ ):
self.parent_helper = parent_helper
self.baseline_url = baseline_url
self.include_cache_buster = include_cache_buster
self.method = method
+ self.data = data
self.allow_redirects = allow_redirects
self._baselined = False
+ self.headers = headers
+ self.cookies = cookies
+ self.timeout = timeout
+
+ @staticmethod
+ def merge_dictionaries(headers1, headers2):
+ if headers2 is None:
+ return headers1
+ else:
+ merged_headers = headers1.copy()
+ merged_headers.update(headers2)
+ return merged_headers
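`merge_dictionaries()` exists so that caller-supplied headers and cookies are layered on top of the random cache-busting ones generated in `_baseline()`. The precedence in one short sketch (values illustrative):

```python
cache_buster = {"x7f3kqa": "z9m2pqa"}  # random header, as in _baseline()
user_headers = {"User-Agent": "bbot", "x7f3kqa": "pinned"}

merged = cache_buster.copy()
merged.update(user_headers)  # caller-supplied values win on collision
print(merged)  # {'x7f3kqa': 'pinned', 'User-Agent': 'bbot'}
```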
async def _baseline(self):
if not self._baselined:
@@ -25,7 +49,14 @@ async def _baseline(self):
else:
url_1 = self.baseline_url
baseline_1 = await self.parent_helper.request(
- url_1, follow_redirects=self.allow_redirects, method=self.method
+ url_1,
+ follow_redirects=self.allow_redirects,
+ method=self.method,
+ data=self.data,
+ headers=self.headers,
+ cookies=self.cookies,
+ retries=2,
+ timeout=self.timeout,
)
await self.parent_helper.sleep(1)
# put random parameters in URL, headers, and cookies
@@ -36,10 +67,17 @@ async def _baseline(self):
url_2 = self.parent_helper.add_get_params(self.baseline_url, get_params).geturl()
baseline_2 = await self.parent_helper.request(
url_2,
- headers={self.parent_helper.rand_string(6): self.parent_helper.rand_string(6)},
- cookies={self.parent_helper.rand_string(6): self.parent_helper.rand_string(6)},
+ headers=self.merge_dictionaries(
+ {self.parent_helper.rand_string(6): self.parent_helper.rand_string(6)}, self.headers
+ ),
+ cookies=self.merge_dictionaries(
+ {self.parent_helper.rand_string(6): self.parent_helper.rand_string(6)}, self.cookies
+ ),
follow_redirects=self.allow_redirects,
method=self.method,
+ data=self.data,
+ retries=2,
+ timeout=self.timeout,
)
self.baseline = baseline_1
@@ -79,6 +117,7 @@ async def _baseline(self):
"ETag",
"X-Pad",
"X-Backside-Transport",
+ "keep-alive",
]
]
dynamic_headers = self.compare_headers(baseline_1.headers, baseline_2.headers)
@@ -123,7 +162,15 @@ def compare_body(self, content_1, content_2):
return False
async def compare(
- self, subject, headers=None, cookies=None, check_reflection=False, method="GET", allow_redirects=False
+ self,
+ subject,
+ headers=None,
+ cookies=None,
+ check_reflection=False,
+ method="GET",
+ data=None,
+ allow_redirects=False,
+ timeout=None,
):
"""
Compares a URL with the baseline, with optional headers or cookies added
@@ -133,8 +180,12 @@ async def compare(
"reason" is the location of the change ("code", "body", "header", or None), and
"reflection" is whether the value was reflected in the HTTP response
"""
+
await self._baseline()
+ if timeout is None:
+ timeout = self.timeout
+
reflection = False
if self.include_cache_buster:
cache_key, cache_value = list(self.gen_cache_buster().items())[0]
@@ -142,7 +193,13 @@ async def compare(
else:
url = subject
subject_response = await self.parent_helper.request(
- url, headers=headers, cookies=cookies, follow_redirects=allow_redirects, method=method
+ url,
+ headers=headers,
+ cookies=cookies,
+ follow_redirects=allow_redirects,
+ method=method,
+ data=data,
+ timeout=timeout,
)
if subject_response is None:
@@ -190,7 +247,7 @@ async def compare(
diff_reasons.append("body")
if not diff_reasons:
- return (True, [], reflection, None)
+ return (True, [], reflection, subject_response)
else:
return (False, diff_reasons, reflection, subject_response)
diff --git a/bbot/core/helpers/dns.py b/bbot/core/helpers/dns.py
deleted file mode 100644
index 63177756f..000000000
--- a/bbot/core/helpers/dns.py
+++ /dev/null
@@ -1,1022 +0,0 @@
-import dns
-import time
-import asyncio
-import logging
-import ipaddress
-import traceback
-import contextlib
-import dns.exception
-import dns.asyncresolver
-from cachetools import LRUCache
-from contextlib import suppress
-
-from .regexes import dns_name_regex
-from bbot.core.helpers.ratelimiter import RateLimiter
-from bbot.core.helpers.async_helpers import NamedLock
-from bbot.core.errors import ValidationError, DNSError, DNSWildcardBreak
-from .misc import is_ip, is_domain, is_dns_name, domain_parents, parent_domain, rand_string, cloudcheck
-
-log = logging.getLogger("bbot.core.helpers.dns")
-
-
-class BBOTAsyncResolver(dns.asyncresolver.Resolver):
- """Custom asynchronous resolver for BBOT with rate limiting.
-
- This class extends dnspython's async resolver and provides additional support for rate-limiting DNS queries.
- The maximum number of queries allowed per second can be customized via BBOT's config.
-
- Attributes:
- _parent_helper: A reference to the instantiated `ConfigAwareHelper` (typically `scan.helpers`).
- _dns_rate_limiter (RateLimiter): An instance of the RateLimiter class for DNS query rate-limiting.
-
- Args:
- *args: Positional arguments passed to the base resolver.
- **kwargs: Keyword arguments. '_parent_helper' is expected among these to provide configuration data for
- rate-limiting. All other keyword arguments are passed to the base resolver.
- """
-
- def __init__(self, *args, **kwargs):
- self._parent_helper = kwargs.pop("_parent_helper")
- dns_queries_per_second = self._parent_helper.config.get("dns_queries_per_second", 100)
- self._dns_rate_limiter = RateLimiter(dns_queries_per_second, "DNS")
- super().__init__(*args, **kwargs)
- self.rotate = True
-
- async def resolve(self, *args, **kwargs):
- async with self._dns_rate_limiter:
- return await super().resolve(*args, **kwargs)
-
-
-class DNSHelper:
- """Helper class for DNS-related operations within BBOT.
-
- This class provides mechanisms for host resolution, wildcard domain detection, event tagging, and more.
- It centralizes all DNS-related activities in BBOT, offering both synchronous and asynchronous methods
- for DNS resolution, as well as various utilities for batch resolution and DNS query filtering.
-
- Attributes:
- parent_helper: A reference to the instantiated `ConfigAwareHelper` (typically `scan.helpers`).
- resolver (BBOTAsyncResolver): An asynchronous DNS resolver tailored for BBOT with rate-limiting capabilities.
- timeout (int): The timeout value for DNS queries. Defaults to 5 seconds.
- retries (int): The number of retries for failed DNS queries. Defaults to 1.
- abort_threshold (int): The threshold for aborting after consecutive failed queries. Defaults to 50.
- max_dns_resolve_distance (int): Maximum allowed distance for DNS resolution. Defaults to 4.
- all_rdtypes (list): A list of DNS record types to be considered during operations.
- wildcard_ignore (tuple): Domains to be ignored during wildcard detection.
- wildcard_tests (int): Number of tests to be run for wildcard detection. Defaults to 5.
- _wildcard_cache (dict): Cache for wildcard detection results.
- _dns_cache (LRUCache): Cache for DNS resolution results, limited in size.
- _event_cache (LRUCache): Cache for event resolution results, tags. Limited in size.
- resolver_file (Path): File containing system's current resolver nameservers.
- filter_bad_ptrs (bool): Whether to filter out DNS names that appear to be auto-generated PTR records. Defaults to True.
-
- Args:
- parent_helper: The parent helper object with configuration details and utilities.
-
- Raises:
- DNSError: If an issue arises when creating the BBOTAsyncResolver instance.
-
- Examples:
- >>> dns_helper = DNSHelper(parent_config)
- >>> resolved_host = dns_helper.resolver.resolve("example.com")
- """
-
- all_rdtypes = ["A", "AAAA", "SRV", "MX", "NS", "SOA", "CNAME", "TXT"]
-
- def __init__(self, parent_helper):
- self.parent_helper = parent_helper
- try:
- self.resolver = BBOTAsyncResolver(_parent_helper=self.parent_helper)
- except Exception as e:
- raise DNSError(f"Failed to create BBOT DNS resolver: {e}")
- self.timeout = self.parent_helper.config.get("dns_timeout", 5)
- self.retries = self.parent_helper.config.get("dns_retries", 1)
- self.abort_threshold = self.parent_helper.config.get("dns_abort_threshold", 50)
- self.max_dns_resolve_distance = self.parent_helper.config.get("max_dns_resolve_distance", 5)
- self.resolver.timeout = self.timeout
- self.resolver.lifetime = self.timeout
-
- # skip certain queries
- dns_omit_queries = self.parent_helper.config.get("dns_omit_queries", None)
- if not dns_omit_queries:
- dns_omit_queries = []
- self.dns_omit_queries = dict()
- for d in dns_omit_queries:
- d = d.split(":")
- if len(d) == 2:
- rdtype, query = d
- rdtype = rdtype.upper()
- query = query.lower()
- try:
- self.dns_omit_queries[rdtype].add(query)
- except KeyError:
- self.dns_omit_queries[rdtype] = {query}
-
- self.wildcard_ignore = self.parent_helper.config.get("dns_wildcard_ignore", None)
- if not self.wildcard_ignore:
- self.wildcard_ignore = []
- self.wildcard_ignore = tuple([str(d).strip().lower() for d in self.wildcard_ignore])
- self.wildcard_tests = self.parent_helper.config.get("dns_wildcard_tests", 5)
- self._wildcard_cache = dict()
- # since wildcard detection takes some time, This is to prevent multiple
- # modules from kicking off wildcard detection for the same domain at the same time
- self._wildcard_lock = NamedLock()
- self._dns_connectivity_lock = asyncio.Lock()
- self._last_dns_success = None
- self._last_connectivity_warning = time.time()
- # keeps track of warnings issued for wildcard detection to prevent duplicate warnings
- self._dns_warnings = set()
- self._errors = dict()
- self.fallback_nameservers_file = self.parent_helper.wordlist_dir / "nameservers.txt"
- self._debug = self.parent_helper.config.get("dns_debug", False)
- self._dummy_modules = dict()
- self._dns_cache = LRUCache(maxsize=10000)
- self._event_cache = LRUCache(maxsize=10000)
- self._event_cache_locks = NamedLock()
-
- # copy the system's current resolvers to a text file for tool use
- self.system_resolvers = dns.resolver.Resolver().nameservers
- if len(self.system_resolvers) == 1:
- log.warning("BBOT performs better with multiple DNS servers. Your system currently only has one.")
- self.resolver_file = self.parent_helper.tempfile(self.system_resolvers, pipe=False)
-
- self.filter_bad_ptrs = self.parent_helper.config.get("dns_filter_ptrs", True)
-
- async def resolve(self, query, **kwargs):
- """Resolve DNS names and IP addresses to their corresponding results.
-
- This is a high-level function that can translate a given domain name to its associated IP addresses
- or an IP address to its corresponding domain names. It's structured for ease of use within modules
- and will abstract away most of the complexity of DNS resolution, returning a simple set of results.
-
- Args:
- query (str): The domain name or IP address to resolve.
- **kwargs: Additional arguments to be passed to the resolution process.
-
- Returns:
- set: A set containing resolved domain names or IP addresses.
-
- Examples:
- >>> results = await resolve("1.2.3.4")
- {"evilcorp.com"}
-
- >>> results = await resolve("evilcorp.com")
- {"1.2.3.4", "dead::beef"}
- """
- results = set()
- try:
- r = await self.resolve_raw(query, **kwargs)
- if r:
- raw_results, errors = r
- for rdtype, answers in raw_results:
- for answer in answers:
- for _, t in self.extract_targets(answer):
- results.add(t)
- except BaseException:
- log.trace(f"Caught exception in resolve({query}, {kwargs}):")
- log.trace(traceback.format_exc())
- raise
-
- self.debug(f"Results for {query} with kwargs={kwargs}: {results}")
- return results
-
- async def resolve_raw(self, query, **kwargs):
- """Resolves the given query to its associated DNS records.
-
- This function is a foundational method for DNS resolution in this class. It understands both IP addresses and
- hostnames and returns their associated records in a raw format provided by the dnspython library.
-
- Args:
- query (str): The IP address or hostname to resolve.
- type (str or list[str], optional): Specifies the DNS record type(s) to fetch. Can be a single type like 'A'
- or a list like ['A', 'AAAA']. If set to 'any', 'all', or '*', it fetches all supported types. If not
- specified, the function defaults to fetching 'A' and 'AAAA' records.
- **kwargs: Additional arguments that might be passed to the resolver.
-
- Returns:
- tuple: A tuple containing two lists:
- - list: A list of tuples where each tuple consists of a record type string (like 'A') and the associated
- raw dnspython answer.
- - list: A list of tuples where each tuple consists of a record type string and the associated error if
- there was an issue fetching the record.
-
- Examples:
- >>> await resolve_raw("8.8.8.8")
- ([('PTR', )], [])
-
- >>> await resolve_raw("dns.google")
- ([('A', ), ('AAAA', )], [])
- """
- # DNS over TCP is more reliable
- # But setting this breaks DNS resolution on Ubuntu because systemd-resolve doesn't support TCP
- # kwargs["tcp"] = True
- results = []
- errors = []
- try:
- query = str(query).strip()
- if is_ip(query):
- kwargs.pop("type", None)
- kwargs.pop("rdtype", None)
- results, errors = await self._resolve_ip(query, **kwargs)
- return [("PTR", results)], [("PTR", e) for e in errors]
- else:
- types = ["A", "AAAA"]
- kwargs.pop("rdtype", None)
- if "type" in kwargs:
- t = kwargs.pop("type")
- types = self._parse_rdtype(t, default=types)
- for t in types:
- r, e = await self._resolve_hostname(query, rdtype=t, **kwargs)
- if r:
- results.append((t, r))
- for error in e:
- errors.append((t, error))
- except BaseException:
- log.trace(f"Caught exception in resolve_raw({query}, {kwargs}):")
- log.trace(traceback.format_exc())
- raise
-
- return (results, errors)
-
- async def _resolve_hostname(self, query, **kwargs):
- """Translate a hostname into its corresponding IP addresses.
-
- This is the foundational function for converting a domain name into its associated IP addresses. It's designed
- for internal use within the class and handles retries, caching, and a variety of error/timeout scenarios.
- It also respects certain configurations that might ask to skip certain types of queries. Results are returned
- in the default dnspython answer object format.
-
- Args:
- query (str): The hostname to resolve.
- rdtype (str, optional): The type of DNS record to query (e.g., 'A', 'AAAA'). Defaults to 'A'.
- retries (int, optional): The number of times to retry on failure. Defaults to class-wide `retries`.
- use_cache (bool, optional): Whether to check the cache before trying a fresh resolution. Defaults to True.
- **kwargs: Additional arguments that might be passed to the resolver.
-
- Returns:
- tuple: A tuple containing:
- - list: A list of resolved IP addresses.
- - list: A list of errors encountered during the resolution process.
-
- Examples:
- >>> results, errors = await _resolve_hostname("google.com")
- (, [])
- """
- self.debug(f"Resolving {query} with kwargs={kwargs}")
- results = []
- errors = []
- rdtype = kwargs.get("rdtype", "A")
-
- # skip certain queries if requested
- if rdtype in self.dns_omit_queries:
- if any(h == query or query.endswith(f".{h}") for h in self.dns_omit_queries[rdtype]):
- self.debug(f"Skipping {rdtype}:{query} because it's omitted in the config")
- return results, errors
-
- parent = self.parent_helper.parent_domain(query)
- retries = kwargs.pop("retries", self.retries)
- use_cache = kwargs.pop("use_cache", True)
- tries_left = int(retries) + 1
- parent_hash = hash(f"{parent}:{rdtype}")
- dns_cache_hash = hash(f"{query}:{rdtype}")
- while tries_left > 0:
- try:
- if use_cache:
- results = self._dns_cache.get(dns_cache_hash, [])
- if not results:
- error_count = self._errors.get(parent_hash, 0)
- if error_count >= self.abort_threshold:
- connectivity = await self._connectivity_check()
- if connectivity:
- log.verbose(
- f'Aborting query "{query}" because failed {rdtype} queries for "{parent}" ({error_count:,}) exceeded abort threshold ({self.abort_threshold:,})'
- )
- if parent_hash not in self._dns_warnings:
- log.verbose(
- f'Aborting future {rdtype} queries to "{parent}" because error count ({error_count:,}) exceeded abort threshold ({self.abort_threshold:,})'
- )
- self._dns_warnings.add(parent_hash)
- return results, errors
- results = await self._catch(self.resolver.resolve, query, **kwargs)
- if use_cache:
- self._dns_cache[dns_cache_hash] = results
- if parent_hash in self._errors:
- self._errors[parent_hash] = 0
- break
- except (
- dns.resolver.NoNameservers,
- dns.exception.Timeout,
- dns.resolver.LifetimeTimeout,
- TimeoutError,
- ) as e:
- try:
- self._errors[parent_hash] += 1
- except KeyError:
- self._errors[parent_hash] = 1
- errors.append(e)
- # don't retry if we get a SERVFAIL
- if isinstance(e, dns.resolver.NoNameservers):
- break
- tries_left -= 1
- err_msg = (
- f'DNS error or timeout for {rdtype} query "{query}" ({self._errors[parent_hash]:,} so far): {e}'
- )
- if tries_left > 0:
- retry_num = (retries + 1) - tries_left
- self.debug(err_msg)
- self.debug(f"Retry (#{retry_num}) resolving {query} with kwargs={kwargs}")
- else:
- log.verbose(err_msg)
-
- if results:
- self._last_dns_success = time.time()
- self.debug(f"Answers for {query} with kwargs={kwargs}: {list(results)}")
-
- if errors:
- self.debug(f"Errors for {query} with kwargs={kwargs}: {errors}")
-
- return results, errors
-
- async def _resolve_ip(self, query, **kwargs):
- """Translate an IP address into a corresponding DNS name.
-
- This is the most basic function that will convert an IP address into its associated domain name. It handles
- retries, caching, and multiple types of timeout/error scenarios internally. The function is intended for
- internal use and should not be directly called by modules without understanding its intricacies.
-
- Args:
- query (str): The IP address to be reverse-resolved.
- retries (int, optional): The number of times to retry on failure. Defaults to 0.
- use_cache (bool, optional): Whether to check the cache for the result before attempting resolution. Defaults to True.
- **kwargs: Additional arguments to be passed to the resolution process.
-
- Returns:
- tuple: A tuple containing:
- - list: A list of resolved domain names (in default dnspython answer format).
- - list: A list of errors encountered during resolution.
-
- Examples:
- >>> results, errors = await _resolve_ip("8.8.8.8")
- (, [])
- """
- self.debug(f"Reverse-resolving {query} with kwargs={kwargs}")
- retries = kwargs.pop("retries", 0)
- use_cache = kwargs.pop("use_cache", True)
- tries_left = int(retries) + 1
- results = []
- errors = []
- dns_cache_hash = hash(f"{query}:PTR")
- while tries_left > 0:
- try:
- if use_cache:
- results = self._dns_cache.get(dns_cache_hash, [])
- if not results:
- results = await self._catch(self.resolver.resolve_address, query, **kwargs)
- if use_cache:
- self._dns_cache[dns_cache_hash] = results
- break
- except (
- dns.exception.Timeout,
- dns.resolver.LifetimeTimeout,
- dns.resolver.NoNameservers,
- TimeoutError,
- ) as e:
- errors.append(e)
- # don't retry if we get a SERVFAIL
- if isinstance(e, dns.resolver.NoNameservers):
- self.debug(f"{e} (query={query}, kwargs={kwargs})")
- break
- else:
- tries_left -= 1
- if tries_left > 0:
- retry_num = (retries + 2) - tries_left
- self.debug(f"Retrying (#{retry_num}) {query} with kwargs={kwargs}")
-
- if results:
- self._last_dns_success = time.time()
-
- return results, errors
-
- async def handle_wildcard_event(self, event, children):
- """
- Used within BBOT's scan manager to detect and tag DNS wildcard events.
-
- Wildcards are detected for every major record type. If a wildcard is detected, its data
- is overwritten, for example: `_wildcard.evilcorp.com`.
-
- Args:
- event (object): The event to check for wildcards.
- children (list): A list of the event's resulting DNS children after resolution.
-
- Returns:
- None: This method modifies the `event` in place and does not return a value.
-
- Examples:
- >>> handle_wildcard_event(event, children)
- # The `event` might now have tags like ["wildcard", "a-wildcard", "aaaa-wildcard"] and
- # its `data` attribute might be modified to "_wildcard.evilcorp.com" if it was detected
- # as a wildcard.
- """
- log.debug(f"Entering handle_wildcard_event({event}, children={children})")
- try:
- event_host = str(event.host)
- # wildcard checks
- if not is_ip(event.host):
- # check if the dns name itself is a wildcard entry
- wildcard_rdtypes = await self.is_wildcard(event_host)
- for rdtype, (is_wildcard, wildcard_host) in wildcard_rdtypes.items():
- wildcard_tag = "error"
- if is_wildcard == True:
- event.add_tag("wildcard")
- wildcard_tag = "wildcard"
- event.add_tag(f"{rdtype.lower()}-{wildcard_tag}")
-
- # wildcard event modification (www.evilcorp.com --> _wildcard.evilcorp.com)
- if not is_ip(event.host) and children:
- if wildcard_rdtypes:
- # these are the rdtypes that successfully resolve
- resolved_rdtypes = set([c.upper() for c in children])
- # these are the rdtypes that have wildcards
- wildcard_rdtypes_set = set(wildcard_rdtypes)
- # consider the event a full wildcard if all its records are wildcards
- event_is_wildcard = False
- if resolved_rdtypes:
- event_is_wildcard = all(r in wildcard_rdtypes_set for r in resolved_rdtypes)
-
- if event_is_wildcard:
- if event.type in ("DNS_NAME",) and not "_wildcard" in event.data.split("."):
- wildcard_parent = self.parent_helper.parent_domain(event_host)
- for rdtype, (_is_wildcard, _parent_domain) in wildcard_rdtypes.items():
- if _is_wildcard:
- wildcard_parent = _parent_domain
- break
- wildcard_data = f"_wildcard.{wildcard_parent}"
- if wildcard_data != event.data:
- log.debug(
- f'Wildcard detected, changing event.data "{event.data}" --> "{wildcard_data}"'
- )
- event.data = wildcard_data
- # tag wildcard domains for convenience
- elif is_domain(event_host) or hash(event_host) in self._wildcard_cache:
- event_target = "target" in event.tags
- wildcard_domain_results = await self.is_wildcard_domain(event_host, log_info=event_target)
- for hostname, wildcard_domain_rdtypes in wildcard_domain_results.items():
- if wildcard_domain_rdtypes:
- event.add_tag("wildcard-domain")
- for rdtype, ips in wildcard_domain_rdtypes.items():
- event.add_tag(f"{rdtype.lower()}-wildcard-domain")
- finally:
- log.debug(f"Finished handle_wildcard_event({event}, children={children})")
-
- async def resolve_event(self, event, minimal=False):
- """
- Tag the given event with the appropriate DNS record types and optionally create child
- events based on DNS resolutions.
-
- Args:
- event (object): The event to be resolved and tagged.
- minimal (bool, optional): If set to True, the function will perform minimal DNS
- resolution. Defaults to False.
-
- Returns:
- tuple: A 4-tuple containing the following items:
- - event_tags (set): Set of tags for the event.
- - event_whitelisted (bool): Whether the event is whitelisted.
- - event_blacklisted (bool): Whether the event is blacklisted.
- - dns_children (dict): Dictionary containing child events from DNS resolutions.
-
- Examples:
- >>> event = make_event("evilcorp.com")
- >>> resolve_event(event)
- ({'resolved', 'ns-record', 'a-record',}, False, False, {'A': {IPv4Address('1.2.3.4'), IPv4Address('1.2.3.5')}, 'NS': {'ns1.evilcorp.com'}})
-
- Note:
- This method does not modify the passed in `event`. Instead, it returns data
- that can be used to modify or act upon the `event`.
- """
- log.debug(f"Resolving {event}")
- event_host = str(event.host)
- event_tags = set()
- dns_children = dict()
- event_whitelisted = False
- event_blacklisted = False
-
- try:
- if (not event.host) or (event.type in ("IP_RANGE",)):
- return event_tags, event_whitelisted, event_blacklisted, dns_children
-
- # lock to ensure resolution of the same host doesn't start while we're working here
- async with self._event_cache_locks.lock(event_host):
- # try to get data from cache
- _event_tags, _event_whitelisted, _event_blacklisted, _dns_children = self.event_cache_get(event_host)
- event_tags.update(_event_tags)
- # if we found it, return it
- if _event_whitelisted is not None:
- return event_tags, _event_whitelisted, _event_blacklisted, _dns_children
-
- # then resolve
- types = ()
- if self.parent_helper.is_ip(event.host):
- if not minimal:
- types = ("PTR",)
- else:
- if event.type == "DNS_NAME" and not minimal:
- types = self.all_rdtypes
- else:
- types = ("A", "AAAA")
-
- if types:
- for t in types:
- resolved_raw, errors = await self.resolve_raw(event_host, type=t, use_cache=True)
- for rdtype, e in errors:
- if rdtype not in resolved_raw:
- event_tags.add(f"{rdtype.lower()}-error")
- for rdtype, records in resolved_raw:
- rdtype = str(rdtype).upper()
- if records:
- event_tags.add("resolved")
- event_tags.add(f"{rdtype.lower()}-record")
-
- # whitelisting and blacklisting of IPs
- for r in records:
- for _, t in self.extract_targets(r):
- if t:
- ip = self.parent_helper.make_ip_type(t)
-
- if rdtype in ("A", "AAAA", "CNAME"):
- with contextlib.suppress(ValidationError):
- if self.parent_helper.is_ip(ip):
- if self.parent_helper.scan.whitelisted(ip):
- event_whitelisted = True
- with contextlib.suppress(ValidationError):
- if self.parent_helper.scan.blacklisted(ip):
- event_blacklisted = True
-
- if self.filter_bad_ptrs and rdtype in ("PTR") and self.parent_helper.is_ptr(t):
- self.debug(f"Filtering out bad PTR: {t}")
- continue
-
- try:
- dns_children[rdtype].add(ip)
- except KeyError:
- dns_children[rdtype] = {ip}
-
- # tag with cloud providers
- if not self.parent_helper.in_tests:
- to_check = set()
- if event.type == "IP_ADDRESS":
- to_check.add(event.data)
- for rdtype, ips in dns_children.items():
- if rdtype in ("A", "AAAA"):
- for ip in ips:
- to_check.add(ip)
- for ip in to_check:
- provider, provider_type, subnet = cloudcheck(ip)
- if provider:
- event_tags.add(f"{provider_type}-{provider}")
-
- # if needed, mark as unresolved
- if not is_ip(event_host) and "resolved" not in event_tags:
- event_tags.add("unresolved")
- # check for private IPs
- for rdtype, ips in dns_children.items():
- for ip in ips:
- try:
- ip = ipaddress.ip_address(ip)
- if ip.is_private:
- event_tags.add("private-ip")
- except ValueError:
- continue
-
- self._event_cache[event_host] = (event_tags, event_whitelisted, event_blacklisted, dns_children)
-
- return event_tags, event_whitelisted, event_blacklisted, dns_children
-
- finally:
- log.debug(f"Finished resolving {event}")
-
- def event_cache_get(self, host):
- """
- Retrieves cached event data based on the given host.
-
- Args:
- host (str): The host for which the event data is to be retrieved.
-
- Returns:
- tuple: A 4-tuple containing the following items:
- - event_tags (set): Set of tags for the event.
- - event_whitelisted (bool or None): Whether the event is whitelisted. Returns None if not found.
- - event_blacklisted (bool or None): Whether the event is blacklisted. Returns None if not found.
- - dns_children (set): Set containing child events from DNS resolutions.
-
- Examples:
- Assuming an event with host "www.evilcorp.com" has been cached:
-
- >>> event_cache_get("www.evilcorp.com")
- ({"resolved", "a-record"}, False, False, {'1.2.3.4'})
-
- Assuming no event with host "www.notincache.com" has been cached:
-
- >>> event_cache_get("www.notincache.com")
- (set(), None, None, set())
- """
- try:
- event_tags, event_whitelisted, event_blacklisted, dns_children = self._event_cache[host]
- return (event_tags, event_whitelisted, event_blacklisted, dns_children)
- except KeyError:
- return set(), None, None, set()
-
- async def resolve_batch(self, queries, **kwargs):
- """
- A helper to execute a bunch of DNS requests.
-
- Args:
- queries (list): List of queries to resolve.
- **kwargs: Additional keyword arguments to pass to `resolve()`.
-
- Yields:
- tuple: A tuple containing the original query and its resolved value.
-
- Examples:
- >>> import asyncio
- >>> async def example_usage():
- ... async for result in resolve_batch(['www.evilcorp.com', 'evilcorp.com']):
- ... print(result)
- ('www.evilcorp.com', {'1.1.1.1'})
- ('evilcorp.com', {'2.2.2.2'})
-
- """
- for q in queries:
- yield (q, await self.resolve(q, **kwargs))
-
- def extract_targets(self, record):
- """
- Extracts hostnames or IP addresses from a given DNS record.
-
- This method reads the DNS record's type and based on that, extracts the target
- hostnames or IP addresses it points to. The type of DNS record
- (e.g., "A", "MX", "CNAME", etc.) determines which fields are used for extraction.
-
- Args:
- record (dns.rdata.Rdata): The DNS record to extract information from.
-
- Returns:
- set: A set of tuples, each containing the DNS record type and the extracted value.
-
- Examples:
- >>> from dns.rrset import from_text
- >>> record = from_text('www.example.com', 3600, 'IN', 'A', '192.0.2.1')
- >>> extract_targets(record[0])
- {('A', '192.0.2.1')}
-
- >>> record = from_text('example.com', 3600, 'IN', 'MX', '10 mail.example.com.')
- >>> extract_targets(record[0])
- {('MX', 'mail.example.com')}
-
- """
- results = set()
- rdtype = str(record.rdtype.name).upper()
- if rdtype in ("A", "AAAA", "NS", "CNAME", "PTR"):
- results.add((rdtype, self._clean_dns_record(record)))
- elif rdtype == "SOA":
- results.add((rdtype, self._clean_dns_record(record.mname)))
- elif rdtype == "MX":
- results.add((rdtype, self._clean_dns_record(record.exchange)))
- elif rdtype == "SRV":
- results.add((rdtype, self._clean_dns_record(record.target)))
- elif rdtype == "TXT":
- for s in record.strings:
- s = self.parent_helper.smart_decode(s)
- for match in dns_name_regex.finditer(s):
- start, end = match.span()
- host = s[start:end]
- results.add((rdtype, host))
- elif rdtype == "NSEC":
- results.add((rdtype, self._clean_dns_record(record.next)))
- else:
- log.warning(f'Unknown DNS record type "{rdtype}"')
- return results
-
- @staticmethod
- def _clean_dns_record(record):
- """
- Cleans and formats a given DNS record for further processing.
-
- This static method converts the DNS record to text format if it's not already a string.
- It also removes any trailing dots and converts the record to lowercase.
-
- Args:
- record (str or dns.rdata.Rdata): The DNS record to clean.
-
- Returns:
- str: The cleaned and formatted DNS record.
-
- Examples:
- >>> _clean_dns_record('www.evilcorp.com.')
- 'www.evilcorp.com'
-
- >>> from dns.rrset import from_text
- >>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]
- >>> _clean_dns_record(record)
- '1.2.3.4'
- """
- if not isinstance(record, str):
- record = str(record.to_text())
- return str(record).rstrip(".").lower()
-
- async def _catch(self, callback, *args, **kwargs):
- """
- Asynchronously catches exceptions thrown during DNS resolution and logs them.
-
- This method wraps around a given asynchronous callback function to handle different
- types of DNS exceptions and general exceptions. It logs the exceptions for debugging
- and, in some cases, re-raises them.
-
- Args:
- callback (callable): The asynchronous function to be executed.
- *args: Positional arguments to pass to the callback.
- **kwargs: Keyword arguments to pass to the callback.
-
- Returns:
- Any: The return value of the callback function, or an empty list if an exception is caught.
-
- Raises:
- dns.resolver.NoNameservers: When no nameservers could be reached.
- """
- try:
- return await callback(*args, **kwargs)
- except dns.resolver.NoNameservers:
- raise
- except (dns.exception.Timeout, dns.resolver.LifetimeTimeout, TimeoutError):
- log.debug(f"DNS query with args={args}, kwargs={kwargs} timed out after {self.timeout} seconds")
- raise
- except dns.exception.DNSException as e:
- self.debug(f"{e} (args={args}, kwargs={kwargs})")
- except Exception as e:
- log.warning(f"Error in {callback.__qualname__}() with args={args}, kwargs={kwargs}: {e}")
- log.trace(traceback.format_exc())
- return []
-
- async def is_wildcard(self, query, ips=None, rdtype=None):
- """
- Use this method to check whether a *host* is a wildcard entry
-
- This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.
-
- If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.
-
- Args:
- query (str): The hostname to check for a wildcard entry.
- ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.
- rdtype (str, optional): The DNS record type (e.g., "A", "AAAA") to consider during the check.
-
- Returns:
- dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.
- Keys are DNS record types like "A", "AAAA", etc.
- Values are tuples where the first element is a boolean indicating if the query is a wildcard,
- and the second element is the wildcard parent if it's a wildcard.
-
- Raises:
- ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.
-
- Examples:
- >>> is_wildcard("www.github.io")
- {"A": (True, "github.io"), "AAAA": (True, "github.io")}
-
- >>> is_wildcard("www.evilcorp.com", ips=["93.184.216.34"], rdtype="A")
- {"A": (False, "evilcorp.com")}
-
- Note:
- `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)
- """
- result = {}
-
- if [ips, rdtype].count(None) == 1:
- raise ValueError("Both ips and rdtype must be specified")
-
- if not is_dns_name(query):
- return {}
-
- # skip check if the query's parent domain is excluded in the config
- for d in self.wildcard_ignore:
- if self.parent_helper.host_in_host(query, d):
- log.debug(f"Skipping wildcard detection on {query} because it is excluded in the config")
- return {}
-
- query = self._clean_dns_record(query)
- # skip check if it's an IP
- if is_ip(query) or not "." in query:
- return {}
- # skip check if the query is a domain
- if is_domain(query):
- return {}
-
- parent = parent_domain(query)
- parents = list(domain_parents(query))
-
- rdtypes_to_check = [rdtype] if rdtype is not None else self.all_rdtypes
-
- base_query_ips = dict()
- # if the caller hasn't already done the work of resolving the IPs
- if ips is None:
- # then resolve the query for all rdtypes
- for t in rdtypes_to_check:
- raw_results, errors = await self.resolve_raw(query, type=t, use_cache=True)
- if errors and not raw_results:
- self.debug(f"Failed to resolve {query} ({t}) during wildcard detection")
- result[t] = (None, parent)
- continue
- for __rdtype, answers in raw_results:
- base_query_results = set()
- for answer in answers:
- for _, t in self.extract_targets(answer):
- base_query_results.add(t)
- if base_query_results:
- base_query_ips[__rdtype] = base_query_results
- else:
- # otherwise, we can skip all that
- cleaned_ips = set([self._clean_dns_record(ip) for ip in ips])
- if not cleaned_ips:
- raise ValueError("Valid IPs must be specified")
- base_query_ips[rdtype] = cleaned_ips
- if not base_query_ips:
- return result
-
- # once we've resolved the base query and have IP addresses to work with
- # we can compare the IPs to the ones we have on file for wildcards
-
- # for every parent domain, starting with the shortest
- try:
- for host in parents[::-1]:
- # make sure we've checked that domain for wildcards
- await self.is_wildcard_domain(host)
-
- # for every rdtype
- for _rdtype in list(base_query_ips):
- # get the IPs from above
- query_ips = base_query_ips.get(_rdtype, set())
- host_hash = hash(host)
-
- if host_hash in self._wildcard_cache:
- # then get its IPs from our wildcard cache
- wildcard_rdtypes = self._wildcard_cache[host_hash]
-
- # then check to see if our IPs match the wildcard ones
- if _rdtype in wildcard_rdtypes:
- wildcard_ips = wildcard_rdtypes[_rdtype]
- # if our IPs match the wildcard ones, then ladies and gentlemen we have a wildcard
- is_wildcard = any(r in wildcard_ips for r in query_ips)
-
- if is_wildcard and not result.get(_rdtype, (None, None))[0] is True:
- result[_rdtype] = (True, host)
-
- # if we've reached a point where the dns name is a complete wildcard, class can be dismissed early
- base_query_rdtypes = set(base_query_ips)
- wildcard_rdtypes_set = set([k for k, v in result.items() if v[0] is True])
- if base_query_rdtypes and wildcard_rdtypes_set and base_query_rdtypes == wildcard_rdtypes_set:
- log.debug(
- f"Breaking from wildcard detection for {query} at {host} because base query rdtypes ({base_query_rdtypes}) == wildcard rdtypes ({wildcard_rdtypes_set})"
- )
- raise DNSWildcardBreak()
- except DNSWildcardBreak:
- pass
-
- return result
-
- async def is_wildcard_domain(self, domain, log_info=False):
- """
- Check whether a given host or its children make use of wildcard DNS entries. Wildcard DNS can have
- various implications, particularly in subdomain enumeration and subdomain takeovers.
-
- Args:
- domain (str): The domain to check for wildcard DNS entries.
- log_info (bool, optional): Whether to log the result of the check. Defaults to False.
-
- Returns:
- dict: A dictionary where the keys are the parent domains that have wildcard DNS entries,
- and the values are another dictionary of DNS record types ("A", "AAAA", etc.) mapped to
- sets of their resolved IP addresses.
-
- Examples:
- >>> is_wildcard_domain("github.io")
- {"github.io": {"A": {"1.2.3.4"}, "AAAA": {"dead::beef"}}}
-
- >>> is_wildcard_domain("example.com")
- {}
- """
- wildcard_domain_results = {}
- domain = self._clean_dns_record(domain)
-
- if not is_dns_name(domain):
- return {}
-
- # skip check if the query's parent domain is excluded in the config
- for d in self.wildcard_ignore:
- if self.parent_helper.host_in_host(domain, d):
- log.debug(f"Skipping wildcard detection on {domain} because it is excluded in the config")
- return {}
-
- rdtypes_to_check = set(self.all_rdtypes)
-
- # make a list of its parents
- parents = list(domain_parents(domain, include_self=True))
- # and check each of them, beginning with the highest parent (i.e. the root domain)
- for i, host in enumerate(parents[::-1]):
- # have we checked this host before?
- host_hash = hash(host)
- async with self._wildcard_lock.lock(host_hash):
- # if we've seen this host before
- if host_hash in self._wildcard_cache:
- wildcard_domain_results[host] = self._wildcard_cache[host_hash]
- continue
-
- log.verbose(f"Checking if {host} is a wildcard")
-
- # determine if this is a wildcard domain
-
- # resolve a bunch of random subdomains of the same parent
- is_wildcard = False
- wildcard_results = dict()
- for rdtype in list(rdtypes_to_check):
- # continue if a wildcard was already found for this rdtype
- # if rdtype in self._wildcard_cache[host_hash]:
- # continue
- for _ in range(self.wildcard_tests):
- rand_query = f"{rand_string(digits=False, length=10)}.{host}"
- results = await self.resolve(rand_query, type=rdtype, use_cache=False)
- if results:
- is_wildcard = True
- if not rdtype in wildcard_results:
- wildcard_results[rdtype] = set()
- wildcard_results[rdtype].update(results)
- # we know this rdtype is a wildcard
- # so we don't need to check it anymore
- with suppress(KeyError):
- rdtypes_to_check.remove(rdtype)
-
- self._wildcard_cache.update({host_hash: wildcard_results})
- wildcard_domain_results.update({host: wildcard_results})
- if is_wildcard:
- wildcard_rdtypes_str = ",".join(sorted([t.upper() for t, r in wildcard_results.items() if r]))
- log_fn = log.verbose
- if log_info:
- log_fn = log.info
- log_fn(f"Encountered domain with wildcard DNS ({wildcard_rdtypes_str}): {host}")
- else:
- log.verbose(f"Finished checking {host}, it is not a wildcard")
-
- return wildcard_domain_results
-
- async def _connectivity_check(self, interval=5):
- """
- Periodically checks for an active internet connection by attempting DNS resolution.
-
- Args:
- interval (int, optional): The time interval, in seconds, at which to perform the check.
- Defaults to 5 seconds.
-
- Returns:
- bool: True if there is an active internet connection, False otherwise.
-
- Examples:
- >>> await _connectivity_check()
- True
- """
- if self._last_dns_success is not None:
- if time.time() - self._last_dns_success < interval:
- return True
- dns_server_working = []
- async with self._dns_connectivity_lock:
- with suppress(Exception):
- dns_server_working = await self._catch(self.resolver.resolve, "www.google.com", rdtype="A")
- if dns_server_working:
- self._last_dns_success = time.time()
- return True
- if time.time() - self._last_connectivity_warning > interval:
- log.warning(f"DNS queries are failing, please check your internet connection")
- self._last_connectivity_warning = time.time()
- self._errors.clear()
- return False
-
- def _parse_rdtype(self, t, default=None):
- if isinstance(t, str):
- if t.strip().lower() in ("any", "all", "*"):
- return self.all_rdtypes
- else:
- return [t.strip().upper()]
- elif any([isinstance(t, x) for x in (list, tuple)]):
- return [str(_).strip().upper() for _ in t]
- return default
-
- def debug(self, *args, **kwargs):
- if self._debug:
- log.trace(*args, **kwargs)
-
- def _get_dummy_module(self, name):
- try:
- dummy_module = self._dummy_modules[name]
- except KeyError:
- dummy_module = self.parent_helper._make_dummy_module(name=name, _type="DNS")
- dummy_module.suppress_dupes = False
- self._dummy_modules[name] = dummy_module
- return dummy_module
diff --git a/bbot/core/helpers/dns/__init__.py b/bbot/core/helpers/dns/__init__.py
new file mode 100644
index 000000000..75426cd26
--- /dev/null
+++ b/bbot/core/helpers/dns/__init__.py
@@ -0,0 +1 @@
+from .dns import DNSHelper
diff --git a/bbot/core/helpers/dns/brute.py b/bbot/core/helpers/dns/brute.py
new file mode 100644
index 000000000..c34e96610
--- /dev/null
+++ b/bbot/core/helpers/dns/brute.py
@@ -0,0 +1,180 @@
+import json
+import random
+import asyncio
+import logging
+import subprocess
+
+
+class DNSBrute:
+ """
+ Helper for DNS brute-forcing.
+
+ Examples:
+ >>> domain = "evilcorp.com"
+ >>> subdomains = ["www", "mail"]
+ >>> results = await self.helpers.dns.brute(self, domain, subdomains)
+ """
+
+ nameservers_url = (
+ "https://raw.githubusercontent.com/blacklanternsecurity/public-dns-servers/master/nameservers.txt"
+ )
+
+ def __init__(self, parent_helper):
+ self.parent_helper = parent_helper
+ self.log = logging.getLogger("bbot.helper.dns.brute")
+ self.num_canaries = 100
+ self.max_resolvers = self.parent_helper.config.get("dns", {}).get("brute_threads", 1000)
+ self.devops_mutations = list(self.parent_helper.word_cloud.devops_mutations)
+ self.digit_regex = self.parent_helper.re.compile(r"\d+")
+ self._resolver_file = None
+ self._dnsbrute_lock = asyncio.Lock()
+
+ async def __call__(self, *args, **kwargs):
+ return await self.dnsbrute(*args, **kwargs)
+
+ async def dnsbrute(self, module, domain, subdomains, type=None):
+ subdomains = list(subdomains)
+
+ if type is None:
+ type = "A"
+ type = str(type).strip().upper()
+
+ domain_wildcard_rdtypes = set()
+ for _domain, rdtypes in (await self.parent_helper.dns.is_wildcard_domain(domain)).items():
+ for rdtype, results in rdtypes.items():
+ if results:
+ domain_wildcard_rdtypes.add(rdtype)
+ if any([r in domain_wildcard_rdtypes for r in (type, "CNAME")]):
+ self.log.info(
+ f"Aborting massdns on {domain} because it's a wildcard domain ({','.join(domain_wildcard_rdtypes)})"
+ )
+ return []
+ else:
+ self.log.trace(f"{domain}: A is not in domain_wildcard_rdtypes:{domain_wildcard_rdtypes}")
+
+ # materialize the canaries into a set; a bare generator would be exhausted by
+ # list() below and the later membership check would silently never match
+ canaries = set(self.gen_random_subdomains(self.num_canaries))
+ canaries_list = list(canaries)
+ canaries_pre = canaries_list[: int(self.num_canaries / 2)]
+ canaries_post = canaries_list[int(self.num_canaries / 2) :]
+ # sandwich subdomains between canaries
+ subdomains = canaries_pre + subdomains + canaries_post
+
+ results = []
+ canaries_triggered = []
+ async for hostname, ip, rdtype in self._massdns(module, domain, subdomains, rdtype=type):
+ sub = hostname.split(domain)[0].strip(".")
+ if sub in canaries:
+ canaries_triggered.append(sub)
+ else:
+ results.append(hostname)
+
+ if len(canaries_triggered) > 5:
+ self.log.info(
+ f"Aborting massdns on {domain} due to false positive: ({len(canaries_triggered):,} canaries triggered - {','.join(canaries_triggered)})"
+ )
+ return []
+
+ # everything checks out
+ return results
+
+ async def _massdns(self, module, domain, subdomains, rdtype):
+ """
+ {
+ "name": "www.blacklanternsecurity.com.",
+ "type": "A",
+ "class": "IN",
+ "status": "NOERROR",
+ "data": {
+ "answers": [
+ {
+ "ttl": 3600,
+ "type": "CNAME",
+ "class": "IN",
+ "name": "www.blacklanternsecurity.com.",
+ "data": "blacklanternsecurity.github.io."
+ },
+ {
+ "ttl": 3600,
+ "type": "A",
+ "class": "IN",
+ "name": "blacklanternsecurity.github.io.",
+ "data": "185.199.108.153"
+ }
+ ]
+ },
+ "resolver": "168.215.165.186:53"
+ }
+ """
+ resolver_file = await self.resolver_file()
+ command = (
+ "massdns",
+ "-r",
+ resolver_file,
+ "-s",
+ self.max_resolvers,
+ "-t",
+ rdtype,
+ "-o",
+ "J",
+ "-q",
+ )
+ subdomains = self.gen_subdomains(subdomains, domain)
+ hosts_yielded = set()
+ async with self._dnsbrute_lock:
+ async for line in module.run_process_live(*command, stderr=subprocess.DEVNULL, input=subdomains):
+ try:
+ j = json.loads(line)
+ except json.decoder.JSONDecodeError:
+ self.log.debug(f"Failed to decode line: {line}")
+ continue
+ answers = j.get("data", {}).get("answers", [])
+ if isinstance(answers, list) and answers:
+ answer = answers[0]
+ hostname = answer.get("name", "").strip(".").lower()
+ if hostname.endswith(f".{domain}"):
+ data = answer.get("data", "")
+ rdtype = answer.get("type", "").upper()
+ if data and rdtype:
+ hostname_hash = hash(hostname)
+ if hostname_hash not in hosts_yielded:
+ hosts_yielded.add(hostname_hash)
+ yield hostname, data, rdtype
+
+ async def gen_subdomains(self, prefixes, domain):
+ for p in prefixes:
+ if domain:
+ p = f"{p}.{domain}"
+ yield p
+
+ async def resolver_file(self):
+ if self._resolver_file is None:
+ self._resolver_file = await self.parent_helper.wordlist(
+ self.nameservers_url,
+ cache_hrs=24 * 7,
+ )
+ return self._resolver_file
+
+ def gen_random_subdomains(self, n=50):
+ delimiters = (".", "-")
+ lengths = list(range(3, 8))
+ for i in range(0, max(0, n - 5)):
+ d = delimiters[i % len(delimiters)]
+ l = lengths[i % len(lengths)]
+ segments = list(random.choice(self.devops_mutations) for _ in range(l))
+ segments.append(self.parent_helper.rand_string(length=8, digits=False))
+ subdomain = d.join(segments)
+ yield subdomain
+ for _ in range(5):
+ yield self.parent_helper.rand_string(length=8, digits=False)
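+ # illustrative output (values are random and will differ): prefixes like
+ # "dev-www-stage-abcdefgh" (devops words from the word cloud joined by "." or "-",
+ # capped with a random letters-only suffix), plus five purely random 8-letter labels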
+
+ def has_excessive_digits(self, d):
+ """
+ Identifies dns names with excessive numbers, e.g.:
+ - w1-2-3.evilcorp.com
+ - ptr1234.evilcorp.com
+ """
+ is_ptr = self.parent_helper.is_ptr(d)
+ digits = self.digit_regex.findall(d)
+ excessive_digits = len(digits) > 2
+ long_digits = any(len(d) > 3 for d in digits)
+ return is_ptr or excessive_digits or long_digits
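+ # illustrative behavior (not a doctest in the original):
+ # has_excessive_digits("w1-2-3.evilcorp.com") -> True (more than two digit groups)
+ # has_excessive_digits("ptr1234.evilcorp.com") -> True (a digit run longer than 3)
+ # has_excessive_digits("www.evilcorp.com") -> False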
diff --git a/bbot/core/helpers/dns/dns.py b/bbot/core/helpers/dns/dns.py
new file mode 100644
index 000000000..d171036ba
--- /dev/null
+++ b/bbot/core/helpers/dns/dns.py
@@ -0,0 +1,181 @@
+import dns
+import logging
+import dns.exception
+import dns.asyncresolver
+from radixtarget import RadixTarget
+
+from bbot.errors import DNSError
+from bbot.core.engine import EngineClient
+from ..misc import clean_dns_record, is_ip, is_domain, is_dns_name
+
+from .engine import DNSEngine
+
+log = logging.getLogger("bbot.core.helpers.dns")
+
+
+class DNSHelper(EngineClient):
+
+ SERVER_CLASS = DNSEngine
+ ERROR_CLASS = DNSError
+
+ """Helper class for DNS-related operations within BBOT.
+
+ This class provides mechanisms for host resolution, wildcard domain detection, event tagging, and more.
+ It centralizes all DNS-related activities in BBOT, offering both synchronous and asynchronous methods
+ for DNS resolution, as well as various utilities for batch resolution and DNS query filtering.
+
+ Attributes:
+ parent_helper: A reference to the instantiated `ConfigAwareHelper` (typically `scan.helpers`).
+ resolver (BBOTAsyncResolver): An asynchronous DNS resolver tailored for BBOT with rate-limiting capabilities.
+ timeout (int): The timeout value for DNS queries. Defaults to 5 seconds.
+ retries (int): The number of retries for failed DNS queries. Defaults to 1.
+ abort_threshold (int): The threshold for aborting after consecutive failed queries. Defaults to 50.
+ runaway_limit (int): Maximum allowed distance for consecutive DNS resolutions. Defaults to 5.
+ all_rdtypes (list): A list of DNS record types to be considered during operations.
+ wildcard_ignore (tuple): Domains to be ignored during wildcard detection.
+ wildcard_tests (int): Number of tests to be run for wildcard detection. Defaults to 5.
+ _wildcard_cache (dict): Cache for wildcard detection results.
+ _dns_cache (LRUCache): Cache for DNS resolution results, limited in size.
+ resolver_file (Path): File containing system's current resolver nameservers.
+ filter_bad_ptrs (bool): Whether to filter out DNS names that appear to be auto-generated PTR records. Defaults to True.
+
+ Args:
+ parent_helper: The parent helper object with configuration details and utilities.
+
+ Raises:
+ DNSError: If an issue arises when creating the BBOTAsyncResolver instance.
+
+ Examples:
+ >>> dns_helper = DNSHelper(parent_config)
+ >>> resolved_host = dns_helper.resolver.resolve("example.com")
+ """
+
+ def __init__(self, parent_helper):
+ self.parent_helper = parent_helper
+ self.config = self.parent_helper.config
+ self.dns_config = self.config.get("dns", {})
+ super().__init__(server_kwargs={"config": self.config})
+
+ # resolver
+ self.timeout = self.dns_config.get("timeout", 5)
+ self.resolver = dns.asyncresolver.Resolver()
+ self.resolver.rotate = True
+ self.resolver.timeout = self.timeout
+ self.resolver.lifetime = self.timeout
+
+ self.runaway_limit = self.config.get("runaway_limit", 5)
+
+ # wildcard handling
+ self.wildcard_disable = self.dns_config.get("wildcard_disable", False)
+ self.wildcard_ignore = RadixTarget()
+ for d in self.dns_config.get("wildcard_ignore", []):
+ self.wildcard_ignore.insert(d)
+
+ # copy the system's current resolvers to a text file for tool use
+ self.system_resolvers = dns.resolver.Resolver().nameservers
+ # TODO: DNS server speed test (start in background task)
+ self.resolver_file = self.parent_helper.tempfile(self.system_resolvers, pipe=False)
+
+ # brute force helper
+ self._brute = None
+
+ async def resolve(self, query, **kwargs):
+ return await self.run_and_return("resolve", query=query, **kwargs)
+
+ async def resolve_raw(self, query, **kwargs):
+ return await self.run_and_return("resolve_raw", query=query, **kwargs)
+
+ async def resolve_batch(self, queries, **kwargs):
+ async for _ in self.run_and_yield("resolve_batch", queries=queries, **kwargs):
+ yield _
+
+ async def resolve_raw_batch(self, queries):
+ async for _ in self.run_and_yield("resolve_raw_batch", queries=queries):
+ yield _
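+ # the four methods above are thin client-side wrappers: run_and_return() and
+ # run_and_yield() forward each call to the DNSEngine server process; illustrative
+ # usage from a module (assumes an instantiated helper at `self.helpers.dns`):
+ #
+ # ips = await self.helpers.dns.resolve("evilcorp.com")
+ # async for query, ips in self.helpers.dns.resolve_batch(["www.evilcorp.com", "mail.evilcorp.com"]):
+ #     print(query, ips)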
+
+ @property
+ def brute(self):
+ if self._brute is None:
+ from .brute import DNSBrute
+
+ self._brute = DNSBrute(self.parent_helper)
+ return self._brute
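+ # DNSBrute (and its massdns machinery) is imported lazily, so the cost is only
+ # paid when a module actually calls `await self.helpers.dns.brute(...)`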
+
+ async def is_wildcard(self, query, ips=None, rdtype=None):
+ """
+ Use this method to check whether a *host* is a wildcard entry
+
+ This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.
+
+ If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.
+
+ Args:
+ query (str): The hostname to check for a wildcard entry.
+ ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.
+ rdtype (str, optional): The DNS record type (e.g., "A", "AAAA") to consider during the check.
+
+ Returns:
+ dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.
+ Keys are DNS record types like "A", "AAAA", etc.
+ Values are tuples where the first element is a boolean indicating if the query is a wildcard,
+ and the second element is the wildcard parent if it's a wildcard.
+
+ Raises:
+ ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.
+
+ Examples:
+ >>> is_wildcard("www.github.io")
+ {"A": (True, "github.io"), "AAAA": (True, "github.io")}
+
+ >>> is_wildcard("www.evilcorp.com", ips=["93.184.216.34"], rdtype="A")
+ {"A": (False, "evilcorp.com")}
+
+ Note:
+ `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)
+ """
+ if [ips, rdtype].count(None) == 1:
+ raise ValueError("Both ips and rdtype must be specified")
+
+ query = self._wildcard_prevalidation(query)
+ if not query:
+ return {}
+
+ # skip check if the query is a domain
+ if is_domain(query):
+ return {}
+
+ return await self.run_and_return("is_wildcard", query=query, ips=ips, rdtype=rdtype)
+
+ async def is_wildcard_domain(self, domain, log_info=False):
+ domain = self._wildcard_prevalidation(domain)
+ if not domain:
+ return {}
+
+ return await self.run_and_return("is_wildcard_domain", domain=domain, log_info=False)
+
+ def _wildcard_prevalidation(self, host):
+ if self.wildcard_disable:
+ return False
+
+ host = clean_dns_record(host)
+ # skip check if it's an IP or a plain hostname
+ if is_ip(host) or not "." in host:
+ return False
+
+ # skip if query isn't a dns name
+ if not is_dns_name(host):
+ return False
+
+ # skip check if the query's parent domain is excluded in the config
+ wildcard_ignore = self.wildcard_ignore.search(host)
+ if wildcard_ignore:
+ log.debug(f"Skipping wildcard detection on {host} because {wildcard_ignore} is excluded in the config")
+ return False
+
+ return host
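+ # illustrative config (assumed YAML layout) for the exclusions checked above:
+ #
+ # dns:
+ #   wildcard_disable: false
+ #   wildcard_ignore:
+ #     - evilcorp.com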
+
+ async def _mock_dns(self, mock_data):
+ from .mock import MockResolver
+
+ self.resolver = MockResolver(mock_data)
+ await self.run_and_return("_mock_dns", mock_data=mock_data)
diff --git a/bbot/core/helpers/dns/engine.py b/bbot/core/helpers/dns/engine.py
new file mode 100644
index 000000000..981a0948c
--- /dev/null
+++ b/bbot/core/helpers/dns/engine.py
@@ -0,0 +1,661 @@
+import os
+import dns
+import time
+import asyncio
+import logging
+import traceback
+from cachetools import LRUCache
+from contextlib import suppress
+
+from bbot.errors import DNSWildcardBreak
+from bbot.core.engine import EngineServer
+from bbot.core.helpers.async_helpers import NamedLock
+from bbot.core.helpers.dns.helpers import extract_targets
+from bbot.core.helpers.misc import (
+ is_ip,
+ rand_string,
+ parent_domain,
+ domain_parents,
+ clean_dns_record,
+)
+
+
+log = logging.getLogger("bbot.core.helpers.dns.engine.server")
+
+all_rdtypes = ["A", "AAAA", "SRV", "MX", "NS", "SOA", "CNAME", "TXT"]
+
+
+class DNSEngine(EngineServer):
+
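+ # integer command IDs; the paired EngineClient presumably sends one of these IDs
+ # (plus kwargs) over the engine socket to invoke the corresponding method here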
+ CMDS = {
+ 0: "resolve",
+ 1: "resolve_raw",
+ 2: "resolve_batch",
+ 3: "resolve_raw_batch",
+ 4: "is_wildcard",
+ 5: "is_wildcard_domain",
+ 99: "_mock_dns",
+ }
+
+ def __init__(self, socket_path, config={}):
+ super().__init__(socket_path)
+
+ self.config = config
+ self.dns_config = self.config.get("dns", {})
+ # config values
+ self.timeout = self.dns_config.get("timeout", 5)
+ self.retries = self.dns_config.get("retries", 1)
+ self.abort_threshold = self.dns_config.get("abort_threshold", 50)
+
+ # resolver
+ self.resolver = dns.asyncresolver.Resolver()
+ self.resolver.rotate = True
+ self.resolver.timeout = self.timeout
+ self.resolver.lifetime = self.timeout
+
+ # skip certain queries
+ dns_omit_queries = self.dns_config.get("omit_queries", None)
+ if not dns_omit_queries:
+ dns_omit_queries = []
+ self.dns_omit_queries = dict()
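+ # each entry takes the form "RDTYPE:domain", e.g. "SRV:evilcorp.com";
+ # illustrative config (assumed YAML layout):
+ #
+ # dns:
+ #   omit_queries:
+ #     - SRV:evilcorp.com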
+ for d in dns_omit_queries:
+ d = d.split(":")
+ if len(d) == 2:
+ rdtype, query = d
+ rdtype = rdtype.upper()
+ query = query.lower()
+ try:
+ self.dns_omit_queries[rdtype].add(query)
+ except KeyError:
+ self.dns_omit_queries[rdtype] = {query}
+
+ # wildcard handling
+ self.wildcard_ignore = self.dns_config.get("wildcard_ignore", None)
+ if not self.wildcard_ignore:
+ self.wildcard_ignore = []
+ self.wildcard_ignore = tuple([str(d).strip().lower() for d in self.wildcard_ignore])
+ self.wildcard_tests = self.dns_config.get("wildcard_tests", 5)
+ self._wildcard_cache = dict()
+ # since wildcard detection takes some time, this lock prevents multiple
+ # modules from kicking off wildcard detection for the same domain at the same time
+ self._wildcard_lock = NamedLock()
+
+ self._dns_connectivity_lock = None
+ self._last_dns_success = None
+ self._last_connectivity_warning = time.time()
+ # keeps track of warnings issued for wildcard detection to prevent duplicate warnings
+ self._dns_warnings = set()
+ self._errors = dict()
+ self._debug = self.dns_config.get("debug", False)
+ self._dns_cache = LRUCache(maxsize=10000)
+
+ self.filter_bad_ptrs = self.dns_config.get("filter_ptrs", True)
+
+ async def resolve(self, query, **kwargs):
+ """Resolve DNS names and IP addresses to their corresponding results.
+
+ This is a high-level function that can translate a given domain name to its associated IP addresses
+ or an IP address to its corresponding domain names. It's structured for ease of use within modules
+ and will abstract away most of the complexity of DNS resolution, returning a simple set of results.
+
+ Args:
+ query (str): The domain name or IP address to resolve.
+ **kwargs: Additional arguments to be passed to the resolution process.
+
+ Returns:
+ set: A set containing resolved domain names or IP addresses.
+
+ Examples:
+ >>> results = await resolve("1.2.3.4")
+ {"evilcorp.com"}
+
+ >>> results = await resolve("evilcorp.com")
+ {"1.2.3.4", "dead::beef"}
+ """
+ results = set()
+ try:
+ answers, errors = await self.resolve_raw(query, **kwargs)
+ for answer in answers:
+ for _, host in extract_targets(answer):
+ results.add(host)
+ except BaseException:
+ log.trace(f"Caught exception in resolve({query}, {kwargs}):")
+ log.trace(traceback.format_exc())
+ raise
+
+ self.debug(f"Results for {query} with kwargs={kwargs}: {results}")
+ return results
+
+ async def resolve_raw(self, query, **kwargs):
+ """Resolves the given query to its associated DNS records.
+
+ This function is a foundational method for DNS resolution in this class. It understands both IP addresses and
+ hostnames and returns their associated records in a raw format provided by the dnspython library.
+
+ Args:
+ query (str): The IP address or hostname to resolve.
+ type (str, optional): The DNS record type to fetch (e.g. 'A', 'AAAA'). Defaults to 'A'.
+ **kwargs: Additional arguments that might be passed to the resolver.
+
+ Returns:
+ tuple: A tuple containing:
+ - the raw dnspython answer object for the query
+ - a list of errors encountered while fetching the records
+
+ Examples:
+ >>> await resolve_raw("8.8.8.8")
+ (<dns.resolver.Answer>, [])
+
+ >>> await resolve_raw("dns.google")
+ (<dns.resolver.Answer>, [])
+ """
+ # DNS over TCP is more reliable
+ # But setting this breaks DNS resolution on Ubuntu because systemd-resolve doesn't support TCP
+ # kwargs["tcp"] = True
+ try:
+ query = str(query).strip()
+ kwargs.pop("rdtype", None)
+ rdtype = kwargs.pop("type", "A")
+ if is_ip(query):
+ return await self._resolve_ip(query, **kwargs)
+ else:
+ return await self._resolve_hostname(query, rdtype=rdtype, **kwargs)
+ except BaseException:
+ log.trace(f"Caught exception in resolve_raw({query}, {kwargs}):")
+ log.trace(traceback.format_exc())
+ raise
+
+ async def _resolve_hostname(self, query, **kwargs):
+ """Translate a hostname into its corresponding IP addresses.
+
+ This is the foundational function for converting a domain name into its associated IP addresses. It's designed
+ for internal use within the class and handles retries, caching, and a variety of error/timeout scenarios.
+ It also respects certain configurations that might ask to skip certain types of queries. Results are returned
+ in the default dnspython answer object format.
+
+ Args:
+ query (str): The hostname to resolve.
+ rdtype (str, optional): The type of DNS record to query (e.g., 'A', 'AAAA'). Defaults to 'A'.
+ retries (int, optional): The number of times to retry on failure. Defaults to class-wide `retries`.
+ use_cache (bool, optional): Whether to check the cache before trying a fresh resolution. Defaults to True.
+ **kwargs: Additional arguments that might be passed to the resolver.
+
+ Returns:
+ tuple: A tuple containing:
+ - list: A list of resolved IP addresses.
+ - list: A list of errors encountered during the resolution process.
+
+ Examples:
+ >>> results, errors = await _resolve_hostname("google.com")
+ (<dns.resolver.Answer>, [])
+ """
+ self.debug(f"Resolving {query} with kwargs={kwargs}")
+ results = []
+ errors = []
+ rdtype = kwargs.get("rdtype", "A")
+
+ # skip certain queries if requested
+ if rdtype in self.dns_omit_queries:
+ if any(h == query or query.endswith(f".{h}") for h in self.dns_omit_queries[rdtype]):
+ self.debug(f"Skipping {rdtype}:{query} because it's omitted in the config")
+ return results, errors
+
+ parent = parent_domain(query)
+ retries = kwargs.pop("retries", self.retries)
+ use_cache = kwargs.pop("use_cache", True)
+ tries_left = int(retries) + 1
+ parent_hash = hash(f"{parent}:{rdtype}")
+ dns_cache_hash = hash(f"{query}:{rdtype}")
+ while tries_left > 0:
+ try:
+ if use_cache:
+ results = self._dns_cache.get(dns_cache_hash, [])
+ if not results:
+ error_count = self._errors.get(parent_hash, 0)
+ if error_count >= self.abort_threshold:
+ connectivity = await self._connectivity_check()
+ if connectivity:
+ log.verbose(
+ f'Aborting query "{query}" because failed {rdtype} queries for "{parent}" ({error_count:,}) exceeded abort threshold ({self.abort_threshold:,})'
+ )
+ if parent_hash not in self._dns_warnings:
+ log.verbose(
+ f'Aborting future {rdtype} queries to "{parent}" because error count ({error_count:,}) exceeded abort threshold ({self.abort_threshold:,})'
+ )
+ self._dns_warnings.add(parent_hash)
+ return results, errors
+ results = await self._catch(self.resolver.resolve, query, **kwargs)
+ if use_cache:
+ self._dns_cache[dns_cache_hash] = results
+ if parent_hash in self._errors:
+ self._errors[parent_hash] = 0
+ break
+ except (
+ dns.resolver.NoNameservers,
+ dns.exception.Timeout,
+ dns.resolver.LifetimeTimeout,
+ TimeoutError,
+ ) as e:
+ try:
+ self._errors[parent_hash] += 1
+ except KeyError:
+ self._errors[parent_hash] = 1
+ errors.append(e)
+ # don't retry if we get a SERVFAIL
+ if isinstance(e, dns.resolver.NoNameservers):
+ break
+ tries_left -= 1
+ err_msg = (
+ f'DNS error or timeout for {rdtype} query "{query}" ({self._errors[parent_hash]:,} so far): {e}'
+ )
+ if tries_left > 0:
+ retry_num = (retries + 1) - tries_left
+ self.debug(err_msg)
+ self.debug(f"Retry (#{retry_num}) resolving {query} with kwargs={kwargs}")
+ else:
+ log.verbose(err_msg)
+
+ if results:
+ self._last_dns_success = time.time()
+ self.debug(f"Answers for {query} with kwargs={kwargs}: {list(results)}")
+
+ if errors:
+ self.debug(f"Errors for {query} with kwargs={kwargs}: {errors}")
+
+ return results, errors
+
+ async def _resolve_ip(self, query, **kwargs):
+ """Translate an IP address into a corresponding DNS name.
+
+ This is the most basic function that will convert an IP address into its associated domain name. It handles
+ retries, caching, and multiple types of timeout/error scenarios internally. The function is intended for
+ internal use and should not be directly called by modules without understanding its intricacies.
+
+ Args:
+ query (str): The IP address to be reverse-resolved.
+ retries (int, optional): The number of times to retry on failure. Defaults to 0.
+ use_cache (bool, optional): Whether to check the cache for the result before attempting resolution. Defaults to True.
+ **kwargs: Additional arguments to be passed to the resolution process.
+
+ Returns:
+ tuple: A tuple containing:
+ - list: A list of resolved domain names (in default dnspython answer format).
+ - list: A list of errors encountered during resolution.
+
+ Examples:
+ >>> results, errors = await _resolve_ip("8.8.8.8")
+ (<dns.resolver.Answer>, [])
+ """
+ self.debug(f"Reverse-resolving {query} with kwargs={kwargs}")
+ retries = kwargs.pop("retries", 0)
+ use_cache = kwargs.pop("use_cache", True)
+ tries_left = int(retries) + 1
+ results = []
+ errors = []
+ dns_cache_hash = hash(f"{query}:PTR")
+ while tries_left > 0:
+ try:
+ if use_cache:
+ results = self._dns_cache.get(dns_cache_hash, [])
+ if not results:
+ results = await self._catch(self.resolver.resolve_address, query, **kwargs)
+ if use_cache:
+ self._dns_cache[dns_cache_hash] = results
+ break
+ except (
+ dns.exception.Timeout,
+ dns.resolver.LifetimeTimeout,
+ dns.resolver.NoNameservers,
+ TimeoutError,
+ ) as e:
+ errors.append(e)
+ # don't retry if we get a SERVFAIL
+ if isinstance(e, dns.resolver.NoNameservers):
+ self.debug(f"{e} (query={query}, kwargs={kwargs})")
+ break
+ else:
+ tries_left -= 1
+ if tries_left > 0:
+ retry_num = (retries + 2) - tries_left
+ self.debug(f"Retrying (#{retry_num}) {query} with kwargs={kwargs}")
+
+ if results:
+ self._last_dns_success = time.time()
+
+ return results, errors
+
+ async def resolve_batch(self, queries, threads=10, **kwargs):
+ """
+ A helper to execute a bunch of DNS requests.
+
+ Args:
+ queries (list): List of queries to resolve.
+ threads (int, optional): Maximum number of resolutions to keep in flight at once. Defaults to 10.
+ **kwargs: Additional keyword arguments to pass to `resolve()`.
+
+ Yields:
+ tuple: A tuple containing the original query and its resolved value.
+
+ Examples:
+ >>> import asyncio
+ >>> async def example_usage():
+ ... async for result in resolve_batch(['www.evilcorp.com', 'evilcorp.com']):
+ ... print(result)
+ ('www.evilcorp.com', {'1.1.1.1'})
+ ('evilcorp.com', {'2.2.2.2'})
+ """
+ tasks = {}
+
+ def new_task(query):
+ task = asyncio.create_task(self.resolve(query, **kwargs))
+ tasks[task] = query
+
+ queries = list(queries)
+ for _ in range(threads): # Start initial batch of tasks
+ if queries: # Ensure there are queries to process
+ new_task(queries.pop(0))
+
+ while tasks: # While there are tasks pending
+ # Wait for the first task to complete
+ done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
+
+ for task in done:
+ results = task.result()
+ query = tasks.pop(task)
+
+ if results:
+ yield (query, results)
+
+ if queries: # Start a new task for each one completed, if queries remain
+ new_task(queries.pop(0))
+
+ async def resolve_raw_batch(self, queries, threads=10):
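+ """
+ Like `resolve_batch()`, but for (query, rdtype) pairs; yields
+ ((query, rdtype), (answer, errors)) tuples as each resolution completes.
+ """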
+ tasks = {}
+
+ def new_task(query, rdtype):
+ task = asyncio.create_task(self.resolve_raw(query, type=rdtype))
+ tasks[task] = (query, rdtype)
+
+ queries = list(queries)
+ for _ in range(threads): # Start initial batch of tasks
+ if queries: # Ensure there are queries to process
+ new_task(*queries.pop(0))
+
+ while tasks: # While there are tasks pending
+ # Wait for the first task to complete
+ done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
+
+ for task in done:
+ answers, errors = task.result()
+ query, rdtype = tasks.pop(task)
+ for answer in answers:
+ yield ((query, rdtype), (answer, errors))
+
+ if queries: # Start a new task for each one completed, if queries remain
+ new_task(*queries.pop(0))
+
+ async def _catch(self, callback, *args, **kwargs):
+ """
+ Asynchronously catches exceptions thrown during DNS resolution and logs them.
+
+ This method wraps around a given asynchronous callback function to handle different
+ types of DNS exceptions and general exceptions. It logs the exceptions for debugging
+ and, in some cases, re-raises them.
+
+ Args:
+ callback (callable): The asynchronous function to be executed.
+ *args: Positional arguments to pass to the callback.
+ **kwargs: Keyword arguments to pass to the callback.
+
+ Returns:
+ Any: The return value of the callback function, or an empty list if an exception is caught.
+
+ Raises:
+ dns.resolver.NoNameservers: When no nameservers could be reached.
+ """
+ try:
+ return await callback(*args, **kwargs)
+ except dns.resolver.NoNameservers:
+ raise
+ except (dns.exception.Timeout, dns.resolver.LifetimeTimeout, TimeoutError):
+ log.debug(f"DNS query with args={args}, kwargs={kwargs} timed out after {self.timeout} seconds")
+ raise
+ except dns.exception.DNSException as e:
+ self.debug(f"{e} (args={args}, kwargs={kwargs})")
+ except Exception as e:
+ log.warning(f"Error in {callback.__qualname__}() with args={args}, kwargs={kwargs}: {e}")
+ log.trace(traceback.format_exc())
+ return []
+
+ async def is_wildcard(self, query, ips=None, rdtype=None):
+ """
+ Use this method to check whether a *host* is a wildcard entry
+
+ This can reliably tell the difference between a valid DNS record and a wildcard within a wildcard domain.
+
+ If you want to know whether a domain is using wildcard DNS, use `is_wildcard_domain()` instead.
+
+ Args:
+ query (str): The hostname to check for a wildcard entry.
+ ips (list, optional): List of IPs to compare against, typically obtained from a previous DNS resolution of the query.
+ rdtype (str, optional): The DNS record type (e.g., "A", "AAAA") to consider during the check.
+
+ Returns:
+ dict: A dictionary indicating if the query is a wildcard for each checked DNS record type.
+ Keys are DNS record types like "A", "AAAA", etc.
+ Values are tuples where the first element is a boolean indicating if the query is a wildcard,
+ and the second element is the wildcard parent if it's a wildcard.
+
+ Raises:
+ ValueError: If only one of `ips` or `rdtype` is specified or if no valid IPs are specified.
+
+ Examples:
+ >>> is_wildcard("www.github.io")
+ {"A": (True, "github.io"), "AAAA": (True, "github.io")}
+
+ >>> is_wildcard("www.evilcorp.com", ips=["93.184.216.34"], rdtype="A")
+ {"A": (False, "evilcorp.com")}
+
+ Note:
+ `is_wildcard` can be True, False, or None (indicating that wildcard detection was inconclusive)
+ """
+ result = {}
+
+ parent = parent_domain(query)
+ parents = list(domain_parents(query))
+
+ rdtypes_to_check = [rdtype] if rdtype is not None else all_rdtypes
+
+ query_baseline = dict()
+ # if the caller hasn't already done the work of resolving the IPs
+ if ips is None:
+ # then resolve the query for all rdtypes
+ queries = [(query, t) for t in rdtypes_to_check]
+ async for (query, _rdtype), (answers, errors) in self.resolve_raw_batch(queries):
+ answers = extract_targets(answers)
+ if answers:
+ query_baseline[_rdtype] = set([a[1] for a in answers])
+ else:
+ if errors:
+ self.debug(f"Failed to resolve {query} ({_rdtype}) during wildcard detection")
+ result[_rdtype] = (None, parent)
+ continue
+ else:
+ # otherwise, we can skip all that
+ cleaned_ips = set([clean_dns_record(ip) for ip in ips])
+ if not cleaned_ips:
+ raise ValueError("Valid IPs must be specified")
+ query_baseline[rdtype] = cleaned_ips
+ if not query_baseline:
+ return result
+
+ # once we've resolved the base query and have IP addresses to work with
+ # we can compare the IPs to the ones we have on file for wildcards
+
+ # for every parent domain, starting with the shortest
+ try:
+ for host in parents[::-1]:
+ # make sure we've checked that domain for wildcards
+ await self.is_wildcard_domain(host)
+
+ # for every rdtype
+ for _rdtype in list(query_baseline):
+ # get the IPs from above
+ query_ips = query_baseline.get(_rdtype, set())
+ host_hash = hash(host)
+
+ if host_hash in self._wildcard_cache:
+ # then get its IPs from our wildcard cache
+ wildcard_rdtypes = self._wildcard_cache[host_hash]
+
+ # then check to see if our IPs match the wildcard ones
+ if _rdtype in wildcard_rdtypes:
+ wildcard_ips = wildcard_rdtypes[_rdtype]
+ # if our IPs match the wildcard ones, then ladies and gentlemen we have a wildcard
+ is_wildcard = any(r in wildcard_ips for r in query_ips)
+
+ if is_wildcard and result.get(_rdtype, (None, None))[0] is not True:
+ result[_rdtype] = (True, host)
+
+ # if we've reached a point where the dns name is a complete wildcard, class can be dismissed early
+ base_query_rdtypes = set(query_baseline)
+ wildcard_rdtypes_set = set([k for k, v in result.items() if v[0] is True])
+ if base_query_rdtypes and wildcard_rdtypes_set and base_query_rdtypes == wildcard_rdtypes_set:
+ log.debug(
+ f"Breaking from wildcard detection for {query} at {host} because base query rdtypes ({base_query_rdtypes}) == wildcard rdtypes ({wildcard_rdtypes_set})"
+ )
+ raise DNSWildcardBreak()
+
+ except DNSWildcardBreak:
+ pass
+
+ return result
+
+ async def is_wildcard_domain(self, domain, log_info=False):
+ """
+ Check whether a given host or its children make use of wildcard DNS entries. Wildcard DNS can have
+ various implications, particularly in subdomain enumeration and subdomain takeovers.
+
+ Args:
+ domain (str): The domain to check for wildcard DNS entries.
+ log_info (bool, optional): Whether to log the result of the check. Defaults to False.
+
+ Returns:
+ dict: A dictionary where the keys are the parent domains that have wildcard DNS entries,
+ and the values are another dictionary of DNS record types ("A", "AAAA", etc.) mapped to
+ sets of their resolved IP addresses.
+
+ Examples:
+ >>> is_wildcard_domain("github.io")
+ {"github.io": {"A": {"1.2.3.4"}, "AAAA": {"dead::beef"}}}
+
+ >>> is_wildcard_domain("example.com")
+ {}
+ """
+ wildcard_domain_results = {}
+
+ rdtypes_to_check = set(all_rdtypes)
+
+ # make a list of its parents
+ parents = list(domain_parents(domain, include_self=True))
+ # and check each of them, beginning with the highest parent (i.e. the root domain)
+ for i, host in enumerate(parents[::-1]):
+ # have we checked this host before?
+ host_hash = hash(host)
+ async with self._wildcard_lock.lock(host_hash):
+ # if we've seen this host before
+ if host_hash in self._wildcard_cache:
+ wildcard_domain_results[host] = self._wildcard_cache[host_hash]
+ continue
+
+ log.verbose(f"Checking if {host} is a wildcard")
+
+ # determine if this is a wildcard domain
+
+ # resolve a bunch of random subdomains of the same parent
+ is_wildcard = False
+ wildcard_results = dict()
+
+ queries = []
+ for rdtype in rdtypes_to_check:
+ for _ in range(self.wildcard_tests):
+ rand_query = f"{rand_string(digits=False, length=10)}.{host}"
+ queries.append((rand_query, rdtype))
+
+ async for (query, rdtype), (answers, errors) in self.resolve_raw_batch(queries):
+ answers = extract_targets(answers)
+ if answers:
+ is_wildcard = True
+ if rdtype not in wildcard_results:
+ wildcard_results[rdtype] = set()
+ wildcard_results[rdtype].update(set(a[1] for a in answers))
+ # we know this rdtype is a wildcard
+ # so we don't need to check it anymore
+ with suppress(KeyError):
+ rdtypes_to_check.remove(rdtype)
+
+ self._wildcard_cache.update({host_hash: wildcard_results})
+ wildcard_domain_results.update({host: wildcard_results})
+ if is_wildcard:
+ wildcard_rdtypes_str = ",".join(sorted([t.upper() for t, r in wildcard_results.items() if r]))
+ log_fn = log.verbose
+ if log_info:
+ log_fn = log.info
+ log_fn(f"Encountered domain with wildcard DNS ({wildcard_rdtypes_str}): {host}")
+ else:
+ log.verbose(f"Finished checking {host}, it is not a wildcard")
+
+ return wildcard_domain_results
+
+ @property
+ def dns_connectivity_lock(self):
+ if self._dns_connectivity_lock is None:
+ self._dns_connectivity_lock = asyncio.Lock()
+ return self._dns_connectivity_lock
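+ # created on first use (rather than in __init__) so the lock binds to the event
+ # loop that's actually running by the time connectivity checks start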
+
+ async def _connectivity_check(self, interval=5):
+ """
+ Periodically checks for an active internet connection by attempting DNS resolution.
+
+ Args:
+ interval (int, optional): The time interval, in seconds, at which to perform the check.
+ Defaults to 5 seconds.
+
+ Returns:
+ bool: True if there is an active internet connection, False otherwise.
+
+ Examples:
+ >>> await _connectivity_check()
+ True
+ """
+ if self._last_dns_success is not None:
+ if time.time() - self._last_dns_success < interval:
+ return True
+ dns_server_working = []
+ async with self.dns_connectivity_lock:
+ with suppress(Exception):
+ dns_server_working = await self._catch(self.resolver.resolve, "www.google.com", rdtype="A")
+ if dns_server_working:
+ self._last_dns_success = time.time()
+ return True
+ if time.time() - self._last_connectivity_warning > interval:
+ log.warning(f"DNS queries are failing, please check your internet connection")
+ self._last_connectivity_warning = time.time()
+ self._errors.clear()
+ return False
+
+ def debug(self, *args, **kwargs):
+ if self._debug:
+ log.trace(*args, **kwargs)
+
+ @property
+ def in_tests(self):
+ return os.getenv("BBOT_TESTING", "") == "True"
+
+ async def _mock_dns(self, mock_data):
+ from .mock import MockResolver
+
+ self.resolver = MockResolver(mock_data)
diff --git a/bbot/core/helpers/dns/helpers.py b/bbot/core/helpers/dns/helpers.py
new file mode 100644
index 000000000..061ed829c
--- /dev/null
+++ b/bbot/core/helpers/dns/helpers.py
@@ -0,0 +1,61 @@
+import logging
+
+from bbot.core.helpers.regexes import dns_name_regex
+from bbot.core.helpers.misc import clean_dns_record, smart_decode
+
+log = logging.getLogger("bbot.core.helpers.dns")
+
+
+def extract_targets(record):
+ """
+ Extracts hostnames or IP addresses from a given DNS record.
+
+ This function inspects the DNS record's type and, based on it, extracts the target
+ hostnames or IP addresses the record points to. The record type
+ (e.g. "A", "MX", "CNAME") determines which fields are used for extraction.
+
+ Args:
+ record (dns.rdata.Rdata): The DNS record to extract information from.
+
+ Returns:
+ set: A set of tuples, each containing the DNS record type and the extracted value.
+
+ Examples:
+ >>> from dns.rrset import from_text
+ >>> record = from_text('www.example.com', 3600, 'IN', 'A', '192.0.2.1')
+ >>> extract_targets(record[0])
+ {('A', '192.0.2.1')}
+
+ >>> record = from_text('example.com', 3600, 'IN', 'MX', '10 mail.example.com.')
+ >>> extract_targets(record[0])
+ {('MX', 'mail.example.com')}
+
+ """
+ results = set()
+
+ def add_result(rdtype, _record):
+ cleaned = clean_dns_record(_record)
+ if cleaned:
+ results.add((rdtype, cleaned))
+
+ rdtype = str(record.rdtype.name).upper()
+ if rdtype in ("A", "AAAA", "NS", "CNAME", "PTR"):
+ add_result(rdtype, record)
+ elif rdtype == "SOA":
+ add_result(rdtype, record.mname)
+ elif rdtype == "MX":
+ add_result(rdtype, record.exchange)
+ elif rdtype == "SRV":
+ add_result(rdtype, record.target)
+ elif rdtype == "TXT":
+ for s in record.strings:
+ s = smart_decode(s)
+ for match in dns_name_regex.finditer(s):
+ start, end = match.span()
+ host = s[start:end]
+ add_result(rdtype, host)
+ elif rdtype == "NSEC":
+ add_result(rdtype, record.next)
+ else:
+ log.warning(f'Unknown DNS record type "{rdtype}"')
+ return results
diff --git a/bbot/core/helpers/dns/mock.py b/bbot/core/helpers/dns/mock.py
new file mode 100644
index 000000000..70d978aff
--- /dev/null
+++ b/bbot/core/helpers/dns/mock.py
@@ -0,0 +1,56 @@
+import dns
+
+
+class MockResolver:
+
+ def __init__(self, mock_data=None):
+ self.mock_data = mock_data if mock_data else {}
+ self.nameservers = ["127.0.0.1"]
+
+ async def resolve_address(self, ipaddr, *args, **kwargs):
+ modified_kwargs = {}
+ modified_kwargs.update(kwargs)
+ modified_kwargs["rdtype"] = "PTR"
+ return await self.resolve(str(dns.reversename.from_address(ipaddr)), *args, **modified_kwargs)
+
+ def create_dns_response(self, query_name, rdtype):
+ query_name = query_name.strip(".")
+ answers = self.mock_data.get(query_name, {}).get(rdtype, [])
+ if not answers:
+ raise dns.resolver.NXDOMAIN(f"No answer found for {query_name} {rdtype}")
+
+ message_text = f"""id 1234
+opcode QUERY
+rcode NOERROR
+flags QR AA RD
+;QUESTION
+{query_name}. IN {rdtype}
+;ANSWER"""
+ for answer in answers:
+ message_text += f"\n{query_name}. 1 IN {rdtype} {answer}"
+
+ message_text += "\n;AUTHORITY\n;ADDITIONAL\n"
+ message = dns.message.from_text(message_text)
+ return message
+
+ async def resolve(self, query_name, rdtype=None):
+ if rdtype is None:
+ rdtype = "A"
+ elif isinstance(rdtype, str):
+ rdtype = rdtype.upper()
+ else:
+ rdtype = str(rdtype.name).upper()
+
+ domain_name = dns.name.from_text(query_name)
+ rdtype_obj = dns.rdatatype.from_text(rdtype)
+
+ if "_NXDOMAIN" in self.mock_data and query_name in self.mock_data["_NXDOMAIN"]:
+ # Simulate the NXDOMAIN exception
+ raise dns.resolver.NXDOMAIN
+
+ try:
+ response = self.create_dns_response(query_name, rdtype)
+ answer = dns.resolver.Answer(domain_name, rdtype_obj, dns.rdataclass.IN, response)
+ return answer
+ except dns.resolver.NXDOMAIN:
+ return []
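+
+
+# A minimal usage sketch (illustrative only): mocked answers come straight
+# from the mock_data dict, keyed by name and then by record type.
+#
+#   resolver = MockResolver({"evilcorp.com": {"A": ["127.0.0.1"]}})
+#   answer = await resolver.resolve("evilcorp.com", rdtype="A")
+#   # names absent from mock_data come back empty; names listed under
+#   # "_NXDOMAIN" raise dns.resolver.NXDOMAIN like a real resolver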
diff --git a/bbot/core/helpers/files.py b/bbot/core/helpers/files.py
index 438f74112..fb92d1c8b 100644
--- a/bbot/core/helpers/files.py
+++ b/bbot/core/helpers/files.py
@@ -1,6 +1,5 @@
import os
import logging
-import threading
import traceback
from contextlib import suppress
@@ -104,7 +103,13 @@ def feed_pipe(self, pipe, content, text=True):
text (bool, optional): If True, the content is decoded using smart_decode function.
If False, smart_encode function is used. Defaults to True.
"""
- t = threading.Thread(target=self._feed_pipe, args=(pipe, content), kwargs={"text": text}, daemon=True)
+ t = self.preset.core.create_thread(
+ target=self._feed_pipe,
+ args=(pipe, content),
+ kwargs={"text": text},
+ daemon=True,
+ custom_name="bbot feed_pipe()",
+ )
t.start()
@@ -127,7 +132,9 @@ def tempfile_tail(self, callback):
rm_at_exit(filename)
try:
os.mkfifo(filename)
- t = threading.Thread(target=tail, args=(filename, callback), daemon=True)
+ t = self.preset.core.create_thread(
+ target=tail, args=(filename, callback), daemon=True, custom_name="bbot tempfile_tail()"
+ )
t.start()
except Exception as e:
log.error(f"Error setting up tail for file {filename}: {e}")
diff --git a/bbot/core/helpers/helper.py b/bbot/core/helpers/helper.py
index 899f3ab0b..77aa22566 100644
--- a/bbot/core/helpers/helper.py
+++ b/bbot/core/helpers/helper.py
@@ -1,19 +1,21 @@
import os
+import asyncio
import logging
from pathlib import Path
+import multiprocessing as mp
+from functools import partial
+from concurrent.futures import ProcessPoolExecutor
from . import misc
from .dns import DNSHelper
from .web import WebHelper
from .diff import HttpCompare
-from .cloud import CloudHelper
+from .regex import RegexHelper
from .wordcloud import WordCloud
from .interactsh import Interactsh
from ...scanner.target import Target
-from ...modules.base import BaseModule
from .depsinstaller import DepsInstaller
-
log = logging.getLogger("bbot.core.helpers")
@@ -51,10 +53,9 @@ class ConfigAwareHelper:
from .cache import cache_get, cache_put, cache_filename, is_cached
from .command import run, run_live, _spawn_proc, _prepare_command_kwargs
- def __init__(self, config, scan=None):
- self.config = config
- self._scan = scan
- self.bbot_home = Path(self.config.get("home", "~/.bbot")).expanduser().resolve()
+ def __init__(self, preset):
+ self.preset = preset
+ self.bbot_home = self.preset.bbot_home
self.cache_dir = self.bbot_home / "cache"
self.temp_dir = self.bbot_home / "temp"
self.tools_dir = self.bbot_home / "tools"
@@ -68,20 +69,72 @@ def __init__(self, config, scan=None):
self.mkdir(self.tools_dir)
self.mkdir(self.lib_dir)
+ self._loop = None
+
+ # multiprocessing thread pool
+ start_method = mp.get_start_method()
+ if start_method != "spawn":
+ self.warning(f"Multiprocessing spawn method is set to {start_method}.")
+
+ # we spawn one fewer process than there are CPU cores
+ # this helps avoid locking up the system or competing with the main python process for CPU time
+ num_processes = max(1, mp.cpu_count() - 1)
+ self.process_pool = ProcessPoolExecutor(max_workers=num_processes)
+
+ self._cloud = None
+
+ self.re = RegexHelper(self)
self.dns = DNSHelper(self)
- self.web = WebHelper(self)
+ self._web = None
+ self.config_aware_validators = self.validators.Validators(self)
self.depsinstaller = DepsInstaller(self)
self.word_cloud = WordCloud(self)
self.dummy_modules = {}
- # cloud helpers
- self.cloud = CloudHelper(self)
+ @property
+ def web(self):
+ if self._web is None:
+ self._web = WebHelper(self)
+ return self._web
+
+ @property
+ def cloud(self):
+ if self._cloud is None:
+ from cloudcheck import cloud_providers
+
+ self._cloud = cloud_providers
+ return self._cloud
+
+ def bloom_filter(self, size):
+ from .bloom import BloomFilter
+
+ return BloomFilter(size)
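+
+ # Usage sketch (assumes the BloomFilter object supports add() and "in" membership checks):
+ # bloom = self.helpers.bloom_filter(size=10000)
+ # bloom.add("www.evilcorp.com")
+ # "www.evilcorp.com" in bloom  # -> True, with a small false-positive rate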
def interactsh(self, *args, **kwargs):
return Interactsh(self, *args, **kwargs)
- def http_compare(self, url, allow_redirects=False, include_cache_buster=True):
- return HttpCompare(url, self, allow_redirects=allow_redirects, include_cache_buster=include_cache_buster)
+ def http_compare(
+ self,
+ url,
+ allow_redirects=False,
+ include_cache_buster=True,
+ headers=None,
+ cookies=None,
+ method="GET",
+ data=None,
+ timeout=15,
+ ):
+ return HttpCompare(
+ url,
+ self,
+ allow_redirects=allow_redirects,
+ include_cache_buster=include_cache_buster,
+ headers=headers,
+ cookies=cookies,
+ timeout=timeout,
+ method=method,
+ data=data,
+ )
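+
+ # Usage sketch (illustrative; assumes HttpCompare exposes an async compare() method):
+ # compare_helper = self.helpers.http_compare("http://www.evilcorp.com")
+ # match, reasons, reflection, subject_response = await compare_helper.compare("http://www.evilcorp.com?foo=bar")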
def temp_filename(self, extension=None):
"""
@@ -96,31 +149,56 @@ def clean_old_scans(self):
_filter = lambda x: x.is_dir() and self.regexes.scan_name_regex.match(x.name)
self.clean_old(self.scans_dir, keep=self.keep_old_scans, filter=_filter)
- def make_target(self, *events):
- return Target(self.scan, *events)
+ def make_target(self, *events, **kwargs):
+ return Target(*events, **kwargs)
@property
- def scan(self):
- if self._scan is None:
- from bbot.scanner import Scanner
+ def config(self):
+ return self.preset.config
- self._scan = Scanner()
- return self._scan
+ @property
+ def web_config(self):
+ return self.preset.web_config
@property
- def in_tests(self):
- return os.environ.get("BBOT_TESTING", "") == "True"
+ def scan(self):
+ return self.preset.scan
- def _make_dummy_module(self, name, _type="scan"):
+ @property
+ def loop(self):
"""
- Construct a dummy module, for attachment to events
+ Get the current event loop
"""
- try:
- return self.dummy_modules[name]
- except KeyError:
- dummy = DummyModule(scan=self.scan, name=name, _type=_type)
- self.dummy_modules[name] = dummy
- return dummy
+ if self._loop is None:
+ self._loop = asyncio.get_running_loop()
+ return self._loop
+
+ def run_in_executor(self, callback, *args, **kwargs):
+ """
+ Run a synchronous task in the event loop's default thread pool executor
+
+ Examples:
+ Execute callback:
+ >>> result = await self.helpers.run_in_executor(callback_fn, arg1, arg2)
+ """
+ callback = partial(callback, **kwargs)
+ return self.loop.run_in_executor(None, callback, *args)
+
+ def run_in_executor_mp(self, callback, *args, **kwargs):
+ """
+ Same as run_in_executor() except with a process pool executor
+ Use only in cases where callback is CPU-bound
+
+ Examples:
+ Execute callback:
+ >>> result = await self.helpers.run_in_executor_mp(callback_fn, arg1, arg2)
+ """
+ callback = partial(callback, **kwargs)
+ return self.loop.run_in_executor(self.process_pool, callback, *args)
+
+ @property
+ def in_tests(self):
+ return os.environ.get("BBOT_TESTING", "") == "True"
def __getattribute__(self, attr):
"""
@@ -163,12 +241,3 @@ def __getattribute__(self, attr):
except AttributeError:
# then die
raise AttributeError(f'Helper has no attribute "{attr}"')
-
-
-class DummyModule(BaseModule):
- _priority = 4
-
- def __init__(self, *args, **kwargs):
- self._name = kwargs.pop("name")
- self._type = kwargs.pop("_type")
- super().__init__(*args, **kwargs)
diff --git a/bbot/core/helpers/interactsh.py b/bbot/core/helpers/interactsh.py
index aad4a169f..f707fac93 100644
--- a/bbot/core/helpers/interactsh.py
+++ b/bbot/core/helpers/interactsh.py
@@ -11,7 +11,7 @@
from Crypto.PublicKey import RSA
from Crypto.Cipher import AES, PKCS1_OAEP
-from bbot.core.errors import InteractshError
+from bbot.errors import InteractshError
log = logging.getLogger("bbot.core.helpers.interactsh")
diff --git a/bbot/core/helpers/misc.py b/bbot/core/helpers/misc.py
index 57e7e189e..cce1c1ff8 100644
--- a/bbot/core/helpers/misc.py
+++ b/bbot/core/helpers/misc.py
@@ -1,42 +1,22 @@
import os
-import re
import sys
import copy
-import idna
import json
-import atexit
-import codecs
-import psutil
import random
-import shutil
-import signal
import string
import asyncio
-import difflib
-import inspect
import logging
-import platform
import ipaddress
-import traceback
+import regex as re
import subprocess as sp
from pathlib import Path
-from itertools import islice
-from datetime import datetime
-from tabulate import tabulate
-import wordninja as _wordninja
from contextlib import suppress
from unidecode import unidecode # noqa F401
-import cloudcheck as _cloudcheck
-import tldextract as _tldextract
-import xml.etree.ElementTree as ET
-from collections.abc import Mapping
-from hashlib import sha1 as hashlib_sha1
from asyncio import create_task, gather, sleep, wait_for # noqa
from urllib.parse import urlparse, quote, unquote, urlunparse, urljoin # noqa F401
from .url import * # noqa F401
-from .. import errors
-from .logger import log_to_stderr
+from ... import errors
from . import regexes as bbot_regexes
from .names_generator import random_name, names, adjectives # noqa F401
@@ -257,6 +237,7 @@ def split_host_port(d):
port = match.group(3)
if port is None and scheme is not None:
+ scheme = scheme.lower()
if scheme in ("https", "wss"):
port = 443
elif scheme in ("http", "ws"):
@@ -479,6 +460,8 @@ def tldextract(data):
- Utilizes `smart_decode` to preprocess the data.
- Makes use of the `tldextract` library for extraction.
"""
+ import tldextract as _tldextract
+
return _tldextract.extract(smart_decode(data))
@@ -656,7 +639,7 @@ def is_ip_type(i):
>>> is_ip_type("192.168.1.0/24")
False
"""
- return isinstance(i, ipaddress._BaseV4) or isinstance(i, ipaddress._BaseV6)
+ return ipaddress._IPAddressBase in i.__class__.__mro__
def make_ip_type(s):
@@ -682,78 +665,17 @@ def make_ip_type(s):
>>> make_ip_type("evilcorp.com")
'evilcorp.com'
"""
+ if not s:
+ raise ValueError(f'Invalid hostname: "{s}"')
# IP address
with suppress(Exception):
- return ipaddress.ip_address(str(s).strip())
+ return ipaddress.ip_address(s)
# IP network
with suppress(Exception):
- return ipaddress.ip_network(str(s).strip(), strict=False)
+ return ipaddress.ip_network(s, strict=False)
return s
-def host_in_host(host1, host2):
- """
- Checks if host1 is included within host2, either as a subdomain, IP, or IP network.
- Used for scope calculations/decisions within BBOT.
-
- Args:
- host1 (str or ipaddress.IPv4Address or ipaddress.IPv6Address or ipaddress.IPv4Network or ipaddress.IPv6Network):
- The host to check for inclusion within host2.
- host2 (str or ipaddress.IPv4Address or ipaddress.IPv6Address or ipaddress.IPv4Network or ipaddress.IPv6Network):
- The host within which to check for the inclusion of host1.
-
- Returns:
- bool: True if host1 is included in host2, otherwise False.
-
- Examples:
- >>> host_in_host("www.evilcorp.com", "evilcorp.com")
- True
- >>> host_in_host("evilcorp.com", "www.evilcorp.com")
- False
- >>> host_in_host(ipaddress.IPv6Address('dead::beef'), ipaddress.IPv6Network('dead::/64'))
- True
- >>> host_in_host(ipaddress.IPv4Address('192.168.1.1'), ipaddress.IPv4Network('10.0.0.0/8'))
- False
-
- Notes:
- - If checking an IP address/network, you MUST FIRST convert your IP into an ipaddress object (e.g. via `make_ip_type()`) before passing it to this function.
- """
-
- """
- Is host1 included in host2?
- "www.evilcorp.com" in "evilcorp.com"? --> True
- "evilcorp.com" in "www.evilcorp.com"? --> False
- IPv6Address('dead::beef') in IPv6Network('dead::/64')? --> True
- IPv4Address('192.168.1.1') in IPv4Network('10.0.0.0/8')? --> False
-
- Very important! Used throughout BBOT for scope calculations/decisions.
-
- Works with hostnames, IPs, and IP networks.
- """
-
- if not host1 or not host2:
- return False
-
- # check if hosts are IP types
- host1_ip_type = is_ip_type(host1)
- host2_ip_type = is_ip_type(host2)
- # if both hosts are IP types
- if host1_ip_type and host2_ip_type:
- if not host1.version == host2.version:
- return False
- host1_net = ipaddress.ip_network(host1)
- host2_net = ipaddress.ip_network(host2)
- return host1_net.subnet_of(host2_net)
-
- # else hostnames
- elif not (host1_ip_type or host2_ip_type):
- host2_len = len(host2.split("."))
- host1_truncated = ".".join(host1.split(".")[-host2_len:])
- return host1_truncated == host2
-
- return False
-
-
def sha1(data):
"""
Computes the SHA-1 hash of the given data.
@@ -768,6 +690,8 @@ def sha1(data):
>>> sha1("asdf").hexdigest()
'3da541559918a808c2402bba5012f6c60b27661c'
"""
+ from hashlib import sha1 as hashlib_sha1
+
if isinstance(data, dict):
data = json.dumps(data, sort_keys=True)
return hashlib_sha1(smart_encode(data))
@@ -841,6 +765,8 @@ def recursive_decode(data, max_depth=5):
>>> recursive_dcode("%5Cu0020%5Cu041f%5Cu0440%5Cu0438%5Cu0432%5Cu0435%5Cu0442%5Cu0021")
" Привет!"
"""
+ import codecs
+
# Decode newline and tab escapes
data = backslash_regex.sub(
lambda match: {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "v": "\v"}.get(match.group("char")), data
@@ -897,133 +823,105 @@ def truncate_string(s, n):
return s
-def extract_params_json(json_data):
+def extract_params_json(json_data, compare_mode="getparam"):
"""
- Extracts keys from a JSON object and returns them as a set. Used by the `paramminer_headers` module.
+ Extracts key-value pairs from a JSON object and returns them as a set of tuples. Used by the `paramminer_headers` module.
Args:
json_data (str): JSON-formatted string containing key-value pairs.
Returns:
- set: A set containing the keys present in the JSON object.
+ set: A set of tuples containing the keys and their corresponding values present in the JSON object.
Raises:
- Logs a message if JSONDecodeError occurs.
+ Returns an empty set if JSONDecodeError occurs.
Examples:
>>> extract_params_json('{"a": 1, "b": {"c": 2}}')
- {'a', 'b', 'c'}
+ {('a', 1), ('b.c', 2)}
"""
try:
data = json.loads(json_data)
except json.JSONDecodeError:
- log.debug("Invalid JSON supplied. Returning empty list.")
return set()
- keys = set()
- stack = [data]
+ key_value_pairs = set()
+ stack = [(data, "")]
while stack:
- current_data = stack.pop()
+ current_data, path = stack.pop()
if isinstance(current_data, dict):
for key, value in current_data.items():
- keys.add(key)
- if isinstance(value, (dict, list)):
- stack.append(value)
+ full_key = f"{path}.{key}" if path else key
+ if isinstance(value, (dict, list)):
+ stack.append((value, full_key))
+ elif validate_parameter(full_key, compare_mode):
+ key_value_pairs.add((full_key, value))
elif isinstance(current_data, list):
for item in current_data:
if isinstance(item, (dict, list)):
- stack.append(item)
-
- return keys
+ stack.append((item, path))
+ return key_value_pairs
-def extract_params_xml(xml_data):
+def extract_params_xml(xml_data, compare_mode="getparam"):
"""
- Extracts tags from an XML object and returns them as a set.
+ Extracts tags and their text values from an XML object and returns them as a set of tuples.
Args:
xml_data (str): XML-formatted string containing elements.
Returns:
- set: A set containing the tags present in the XML object.
+ set: A set of tuples containing the tags and their corresponding text values present in the XML object.
Raises:
- Logs a message if ParseError occurs.
+ Returns an empty set if ParseError occurs.
Examples:
- >>> extract_params_xml('<root><child1/><child2/></root>')
- {'child1', 'child2', 'root'}
+ >>> extract_params_xml('<root><child1><child2>value</child2></child1></root>')
+ {('root', None), ('child1', None), ('child2', 'value')}
"""
+ import xml.etree.ElementTree as ET
+
try:
root = ET.fromstring(xml_data)
except ET.ParseError:
- log.debug("Invalid XML supplied. Returning empty list.")
return set()
- tags = set()
+ tag_value_pairs = set()
stack = [root]
while stack:
current_element = stack.pop()
- tags.add(current_element.tag)
+ if validate_parameter(current_element.tag, compare_mode):
+ tag_value_pairs.add((current_element.tag, current_element.text))
for child in current_element:
stack.append(child)
- return tags
+ return tag_value_pairs
-def extract_params_html(html_data):
- """
- Extracts parameters from an HTML object, yielding them one at a time.
+# Define valid characters for each mode based on RFCs
+valid_chars_dict = {
+ "header": set(
+ chr(c) for c in range(33, 127) if chr(c) in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"
+ ),
+ "getparam": set(chr(c) for c in range(33, 127) if chr(c) not in ":/?#[]@!$&'()*+,;="),
+ "postparam": set(chr(c) for c in range(33, 127) if chr(c) not in ":/?#[]@!$&'()*+,;="),
+ "cookie": set(chr(c) for c in range(33, 127) if chr(c) not in '()<>@,;:"/[]?={} \t'),
+}
- Args:
- html_data (str): HTML-formatted string.
- Yields:
- str: A string containing the parameter found in HTML object.
-
- Examples:
- >>> html_data = '''
- ... <form action="/submit">
- ...     <input type="text" name="user">
- ... </form>
- ... <a href="/page?param3=value">Click Me</a>
- ... <script>
- ...     $.ajax({url: "/api?param2=value"})
- ... </script>
- ... '''
- >>> list(extract_params_html(html_data))
- ['user', 'param2', 'param3']
- """
- input_tag = bbot_regexes.input_tag_regex.findall(html_data)
-
- for i in input_tag:
- log.debug(f"FOUND PARAM ({i}) IN INPUT TAGS")
- yield i
-
- # check for jquery get parameters
- jquery_get = bbot_regexes.jquery_get_regex.findall(html_data)
-
- for i in jquery_get:
- log.debug(f"FOUND PARAM ({i}) IN JQUERY GET PARAMS")
- yield i
-
- # check for jquery post parameters
- jquery_post = bbot_regexes.jquery_post_regex.findall(html_data)
- if jquery_post:
- for i in jquery_post:
- for x in i.split(","):
- s = x.split(":")[0].rstrip()
- log.debug(f"FOUND PARAM ({s}) IN A JQUERY POST PARAMS")
- yield s
-
- a_tag = bbot_regexes.a_tag_regex.findall(html_data)
- for s in a_tag:
- log.debug(f"FOUND PARAM ({s}) IN A TAG GET PARAMS")
- yield s
+def validate_parameter(param, compare_mode):
+ compare_mode = compare_mode.lower()
+ if len(param) > 100:
+ return False
+ if compare_mode not in valid_chars_dict:
+ raise ValueError(f"Invalid compare_mode: {compare_mode}")
+ allowed_chars = valid_chars_dict[compare_mode]
+ return set(param).issubset(allowed_chars)
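+
+
+# Quick sanity check (illustrative, follows directly from the charsets above):
+# validate_parameter("session_id", "getparam") -> True
+# validate_parameter("bad param", "getparam") -> False (space is not an allowed character)
+# validate_parameter("x" * 101, "getparam") -> False (names are capped at 100 characters)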
def extract_words(data, acronyms=True, wordninja=True, model=None, max_length=100, word_regexes=None):
@@ -1047,6 +945,7 @@ def extract_words(data, acronyms=True, wordninja=True, model=None, max_length=10
>>> extract_words('blacklanternsecurity')
{'black', 'lantern', 'security', 'bls', 'blacklanternsecurity'}
"""
+ import wordninja as _wordninja
if word_regexes is None:
word_regexes = bbot_regexes.word_regexes
@@ -1103,6 +1002,8 @@ def closest_match(s, choices, n=1, cutoff=0.0):
>>> closest_match("asdf", ["asd", "fds", "asdff"], n=3)
['asdff', 'asd', 'fds']
"""
+ import difflib
+
matches = difflib.get_close_matches(s, choices, n=n, cutoff=cutoff)
if not choices or not matches:
return
@@ -1111,8 +1012,8 @@ def closest_match(s, choices, n=1, cutoff=0.0):
return matches
-def match_and_exit(s, choices, msg=None, loglevel="HUGEWARNING", exitcode=2):
- """Finds the closest match from a list of choices for a given string, logs a warning, and exits the program.
+def get_closest_match(s, choices, msg=None):
+ """Finds the closest match from a list of choices for a given string.
This function is particularly useful for CLI applications where you want to validate flags or modules.
@@ -1124,23 +1025,27 @@ def match_and_exit(s, choices, msg=None, loglevel="HUGEWARNING", exitcode=2):
exitcode (int, optional): The exit code to use when exiting the program. Defaults to 2.
Examples:
- >>> match_and_exit("some_module", ["some_mod", "some_other_mod"], msg="module")
+ >>> get_closest_match("some_module", ["some_mod", "some_other_mod"], msg="module")
# Output: Could not find module "some_module". Did you mean "some_mod"?
- # Exits with code 2
"""
if msg is None:
msg = ""
else:
msg += " "
closest = closest_match(s, choices)
- log_to_stderr(f'Could not find {msg}"{s}". Did you mean "{closest}"?', level="HUGEWARNING")
- sys.exit(2)
+ return f'Could not find {msg}"{s}". Did you mean "{closest}"?'
-def kill_children(parent_pid=None, sig=signal.SIGTERM):
+def kill_children(parent_pid=None, sig=None):
"""
Forgive me father for I have sinned
"""
+ import psutil
+ import signal
+
+ if sig is None:
+ sig = signal.SIGTERM
+
try:
parent = psutil.Process(parent_pid)
except psutil.NoSuchProcess:
@@ -1283,6 +1188,8 @@ def rm_at_exit(path):
Examples:
>>> rm_at_exit("/tmp/test/file1.txt")
"""
+ import atexit
+
atexit.register(delete_file, path)
@@ -1396,6 +1303,8 @@ def which(*executables):
>>> which("python", "python3")
"/usr/bin/python"
"""
+ import shutil
+
for e in executables:
location = shutil.which(e)
if location:
@@ -1494,74 +1403,6 @@ def search_dict_values(d, *regexes):
yield from search_dict_values(v, *regexes)
-def filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):
- """
- Recursively filter a dictionary based on key names.
-
- Args:
- d (dict): The input dictionary.
- *key_names: Names of keys to filter for.
- fuzzy (bool): Whether to perform fuzzy matching on keys.
- exclude_keys (list, None): List of keys to be excluded from the final dict.
- _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.
-
- Returns:
- dict: A dictionary containing only the keys specified in key_names.
-
- Examples:
- >>> filter_dict({"key1": "test", "key2": "asdf"}, "key2")
- {"key2": "asdf"}
- >>> filter_dict({"key1": "test", "key2": {"key3": "asdf"}}, "key1", "key3", exclude_keys="key2")
- {'key1': 'test'}
- """
- if exclude_keys is None:
- exclude_keys = []
- if isinstance(exclude_keys, str):
- exclude_keys = [exclude_keys]
- ret = {}
- if isinstance(d, dict):
- for key in d:
- if key in key_names or (fuzzy and any(k in key for k in key_names)):
- if not any(k in exclude_keys for k in [key, _prev_key]):
- ret[key] = copy.deepcopy(d[key])
- elif isinstance(d[key], list) or isinstance(d[key], dict):
- child = filter_dict(d[key], *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)
- if child:
- ret[key] = child
- return ret
-
-
-def clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):
- """
- Recursively clean unwanted keys from a dictionary.
- Useful for removing secrets from a config.
-
- Args:
- d (dict): The input dictionary.
- *key_names: Names of keys to remove.
- fuzzy (bool): Whether to perform fuzzy matching on keys.
- exclude_keys (list, None): List of keys to be excluded from removal.
- _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.
-
- Returns:
- dict: A dictionary cleaned of the keys specified in key_names.
-
- """
- if exclude_keys is None:
- exclude_keys = []
- if isinstance(exclude_keys, str):
- exclude_keys = [exclude_keys]
- d = copy.deepcopy(d)
- if isinstance(d, dict):
- for key, val in list(d.items()):
- if key in key_names or (fuzzy and any(k in key for k in key_names)):
- if _prev_key not in exclude_keys:
- d.pop(key)
- else:
- d[key] = clean_dict(val, *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)
- return d
-
-
def grouper(iterable, n):
"""
Grouper groups an iterable into chunks of a given size.
@@ -1577,6 +1418,7 @@ def grouper(iterable, n):
>>> list(grouper('ABCDEFG', 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
"""
+ from itertools import islice
iterable = iter(iterable)
return iter(lambda: list(islice(iterable, n)), [])
@@ -1655,6 +1497,8 @@ def make_date(d=None, microseconds=False):
>>> make_date(microseconds=True)
"20220707_1330_35167617"
"""
+ from datetime import datetime
+
f = "%Y%m%d_%H%M_%S"
if microseconds:
f += "%f"
@@ -1788,6 +1632,8 @@ def rm_rf(f):
Examples:
>>> rm_rf("/tmp/httpx98323849")
"""
+ import shutil
+
shutil.rmtree(f)
@@ -1907,6 +1753,8 @@ def smart_encode_punycode(text: str) -> str:
"""
ドメイン.テスト --> xn--eckwd4c7c.xn--zckzah
"""
+ import idna
+
host, before, after = extract_host(text)
if host is None:
return text
@@ -1923,6 +1771,8 @@ def smart_decode_punycode(text: str) -> str:
"""
xn--eckwd4c7c.xn--zckzah --> ドメイン.テスト
"""
+ import idna
+
host, before, after = extract_host(text)
if host is None:
return text
@@ -2014,6 +1864,8 @@ def make_table(rows, header, **kwargs):
| row2 | row2 |
+-----------+-----------+
"""
+ from tabulate import tabulate
+
# fix IndexError: list index out of range
if not rows:
rows = [[]]
@@ -2150,6 +2002,50 @@ def human_to_bytes(filesize):
raise ValueError(f'Unable to convert filesize "{filesize}" to bytes')
+def integer_to_ordinal(n):
+ """
+ Convert an integer to its ordinal representation.
+
+ Args:
+ n (int): The integer to convert.
+
+ Returns:
+ str: The ordinal representation of the integer.
+
+ Examples:
+ >>> integer_to_ordinal(1)
+ '1st'
+ >>> integer_to_ordinal(2)
+ '2nd'
+ >>> integer_to_ordinal(3)
+ '3rd'
+ >>> integer_to_ordinal(11)
+ '11th'
+ >>> integer_to_ordinal(21)
+ '21st'
+ >>> integer_to_ordinal(101)
+ '101st'
+ """
+ # Check the last digit
+ last_digit = n % 10
+ # Check the last two digits for special cases (11th, 12th, 13th)
+ last_two_digits = n % 100
+
+ if 10 <= last_two_digits <= 20:
+ suffix = "th"
+ else:
+ if last_digit == 1:
+ suffix = "st"
+ elif last_digit == 2:
+ suffix = "nd"
+ elif last_digit == 3:
+ suffix = "rd"
+ else:
+ suffix = "th"
+
+ return f"{n}{suffix}"
+
+
def cpu_architecture():
"""Return the CPU architecture of the current system.
@@ -2163,6 +2059,8 @@ def cpu_architecture():
>>> cpu_architecture()
'amd64'
"""
+ import platform
+
uname = platform.uname()
arch = uname.machine.lower()
if arch.startswith("aarch"):
@@ -2185,6 +2083,8 @@ def os_platform():
>>> os_platform()
'linux'
"""
+ import platform
+
return platform.system().lower()
@@ -2210,7 +2110,7 @@ def os_platform_friendly():
tag_filter_regex = re.compile(r"[^a-z0-9]+")
-def tagify(s, maxlen=None):
+def tagify(s, delimiter=None, maxlen=None):
"""Sanitize a string into a tag-friendly format.
Converts a given string to lowercase and replaces all characters not matching
@@ -2229,8 +2129,10 @@ def tagify(s, maxlen=None):
>>> tagify("HTTP Web Title", maxlen=8)
'http-web'
"""
+ if delimiter is None:
+ delimiter = "-"
ret = str(s).lower()
- return tag_filter_regex.sub("-", ret)[:maxlen].strip("-")
+ return tag_filter_regex.sub(delimiter, ret)[:maxlen].strip(delimiter)
def memory_status():
@@ -2253,6 +2155,8 @@ def memory_status():
>>> mem.percent
79.0
"""
+ import psutil
+
return psutil.virtual_memory()
@@ -2275,6 +2179,8 @@ def swap_status():
>>> swap.used
2097152
"""
+ import psutil
+
return psutil.swap_memory()
@@ -2297,6 +2203,8 @@ def get_size(obj, max_depth=5, seen=None):
>>> get_size(my_dict, max_depth=3)
8400
"""
+ from collections.abc import Mapping
+
# If seen is not provided, initialize an empty set
if seen is None:
seen = set()
@@ -2372,6 +2280,8 @@ def cloudcheck(ip):
>>> cloudcheck("168.62.20.37")
('Azure', 'cloud', IPv4Network('168.62.0.0/19'))
"""
+ import cloudcheck as _cloudcheck
+
return _cloudcheck.check(ip)
@@ -2391,6 +2301,8 @@ def is_async_function(f):
>>> is_async_function(foo)
True
"""
+ import inspect
+
return inspect.iscoroutinefunction(f)
@@ -2451,6 +2363,27 @@ def get_exception_chain(e):
return exception_chain
+def in_exception_chain(e, exc_types):
+ """
+ Given an Exception and a list of Exception types, returns whether any of the specified types are contained anywhere in the Exception chain.
+
+ Args:
+ e (BaseException): The exception to check
+ exc_types (tuple): Exception types to search for within the exception chain
+
+ Returns:
+ bool: Whether any of the specified exception types appear anywhere in the exception chain
+
+ Examples:
+ >>> try:
+ ... raise ValueError("This is a value error")
+ ... except Exception as e:
+ ... if not in_exception_chain(e, (KeyboardInterrupt, asyncio.CancelledError)):
+ ... raise
+ """
+ return any(isinstance(_, exc_types) for _ in get_exception_chain(e))
+
+
def get_traceback_details(e):
"""
Retrieves detailed information from the traceback of an exception.
@@ -2469,6 +2402,8 @@ def get_traceback_details(e):
... print(f"File: {filename}, Line: {lineno}, Function: {funcname}")
File: , Line: 2, Function:
"""
+ import traceback
+
tb = traceback.extract_tb(e.__traceback__)
last_frame = tb[-1] # Get the last frame in the traceback (the one where the exception was raised)
filename = last_frame.filename
@@ -2499,7 +2434,7 @@ async def cancel_tasks(tasks, ignore_errors=True):
current_task = asyncio.current_task()
tasks = [t for t in tasks if t != current_task]
for task in tasks:
- log.debug(f"Cancelling task: {task}")
+ # log.debug(f"Cancelling task: {task}")
task.cancel()
if ignore_errors:
for task in tasks:
@@ -2507,6 +2442,8 @@ async def cancel_tasks(tasks, ignore_errors=True):
await task
except BaseException as e:
if not isinstance(e, asyncio.CancelledError):
+ import traceback
+
log.trace(traceback.format_exc())
@@ -2529,7 +2466,7 @@ def cancel_tasks_sync(tasks):
current_task = asyncio.current_task()
for task in tasks:
if task != current_task:
- log.debug(f"Cancelling task: {task}")
+ # log.debug(f"Cancelling task: {task}")
task.cancel()
@@ -2653,6 +2590,33 @@ async def as_completed(coros):
yield task
+def clean_dns_record(record):
+ """
+ Cleans and formats a given DNS record for further processing.
+
+ This function converts the DNS record to text format if it's not already a string,
+ removes any trailing dots, and converts the record to lowercase.
+
+ Args:
+ record (str or dns.rdata.Rdata): The DNS record to clean.
+
+ Returns:
+ str: The cleaned and formatted DNS record.
+
+ Examples:
+ >>> clean_dns_record('www.evilcorp.com.')
+ 'www.evilcorp.com'
+
+ >>> from dns.rrset import from_text
+ >>> record = from_text('www.evilcorp.com', 3600, 'IN', 'A', '1.2.3.4')[0]
+ >>> clean_dns_record(record)
+ '1.2.3.4'
+ """
+ if not isinstance(record, str):
+ record = str(record.to_text())
+ return str(record).rstrip(".").lower()
+
+
def truncate_filename(file_path, max_length=255):
"""
Truncate the filename while preserving the file extension to ensure the total path length does not exceed the maximum length.
@@ -2686,3 +2650,138 @@ def truncate_filename(file_path, max_length=255):
new_path = directory / (truncated_stem + suffix)
return new_path
+
+
+def get_keys_in_dot_syntax(config):
+ """Retrieve all keys in an OmegaConf configuration in dot notation.
+
+ This function converts an OmegaConf configuration into a list of keys
+ represented in dot notation.
+
+ Args:
+ config (DictConfig): The OmegaConf configuration object.
+
+ Returns:
+ List[str]: A list of keys in dot notation.
+
+ Examples:
+ >>> config = OmegaConf.create({
+ ... "web": {
+ ... "test": True
+ ... },
+ ... "db": {
+ ... "host": "localhost",
+ ... "port": 5432
+ ... }
+ ... })
+ >>> get_keys_in_dot_syntax(config)
+ ['web.test', 'db.host', 'db.port']
+ """
+ from omegaconf import OmegaConf
+
+ container = OmegaConf.to_container(config, resolve=True)
+ keys = []
+
+ def recursive_keys(d, parent_key=""):
+ for k, v in d.items():
+ full_key = f"{parent_key}.{k}" if parent_key else k
+ if isinstance(v, dict):
+ recursive_keys(v, full_key)
+ else:
+ keys.append(full_key)
+
+ recursive_keys(container)
+ return keys
+
+
+def filter_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):
+ """
+ Recursively filter a dictionary based on key names.
+
+ Args:
+ d (dict): The input dictionary.
+ *key_names: Names of keys to filter for.
+ fuzzy (bool): Whether to perform fuzzy matching on keys.
+ exclude_keys (list, None): List of keys to be excluded from the final dict.
+ _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.
+
+ Returns:
+ dict: A dictionary containing only the keys specified in key_names.
+
+ Examples:
+ >>> filter_dict({"key1": "test", "key2": "asdf"}, "key2")
+ {"key2": "asdf"}
+ >>> filter_dict({"key1": "test", "key2": {"key3": "asdf"}}, "key1", "key3", exclude_keys="key2")
+ {'key1': 'test'}
+ """
+ if exclude_keys is None:
+ exclude_keys = []
+ if isinstance(exclude_keys, str):
+ exclude_keys = [exclude_keys]
+ ret = {}
+ if isinstance(d, dict):
+ for key in d:
+ if key in key_names or (fuzzy and any(k in key for k in key_names)):
+ if not any(k in exclude_keys for k in [key, _prev_key]):
+ ret[key] = copy.deepcopy(d[key])
+ elif isinstance(d[key], list) or isinstance(d[key], dict):
+ child = filter_dict(d[key], *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)
+ if child:
+ ret[key] = child
+ return ret
+
+
+def clean_dict(d, *key_names, fuzzy=False, exclude_keys=None, _prev_key=None):
+ """
+ Recursively clean unwanted keys from a dictionary.
+ Useful for removing secrets from a config.
+
+ Args:
+ d (dict): The input dictionary.
+ *key_names: Names of keys to remove.
+ fuzzy (bool): Whether to perform fuzzy matching on keys.
+ exclude_keys (list, None): List of keys to be excluded from removal.
+ _prev_key (str, None): For internal recursive use; the previous key in the hierarchy.
+
+ Returns:
+ dict: A dictionary cleaned of the keys specified in key_names.
+
+ """
+ if exclude_keys is None:
+ exclude_keys = []
+ if isinstance(exclude_keys, str):
+ exclude_keys = [exclude_keys]
+ d = copy.deepcopy(d)
+ if isinstance(d, dict):
+ for key, val in list(d.items()):
+ if key in key_names or (fuzzy and any(k in key for k in key_names)):
+ if _prev_key not in exclude_keys:
+ d.pop(key)
+ continue
+ d[key] = clean_dict(val, *key_names, fuzzy=fuzzy, _prev_key=key, exclude_keys=exclude_keys)
+ return d
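+
+
+# Example (sketch): scrub secrets from a config with fuzzy key matching:
+# clean_dict({"modules": {"shodan": {"api_key": "deadbeef"}}}, "api_key", fuzzy=True)
+# -> {"modules": {"shodan": {}}}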
+
+
+top_ports_cache = None
+
+
+def top_tcp_ports(n, as_string=False):
+ """
+ Returns the top *n* TCP ports as evaluated by nmap
+ """
+ top_ports_file = Path(__file__).parent.parent.parent / "wordlists" / "top_open_ports_nmap.txt"
+
+ global top_ports_cache
+ if top_ports_cache is None:
+ # Read the open ports from the file
+ with open(top_ports_file, "r") as f:
+ top_ports_cache = [int(line.strip()) for line in f]
+
+ # append any remaining ports (1-65535) not already in the list, so values of n beyond the wordlist still work
+ unique_ports = set(top_ports_cache)
+ top_ports_cache.extend([port for port in range(1, 65536) if port not in unique_ports])
+
+ top_ports = top_ports_cache[:n]
+ if as_string:
+ return ",".join([str(s) for s in top_ports])
+ return top_ports
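+
+
+# Usage sketch (port numbers are illustrative; real values come from the wordlist):
+# top_tcp_ports(3)                  # e.g. [80, 23, 443]
+# top_tcp_ports(3, as_string=True)  # e.g. "80,23,443"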
diff --git a/bbot/core/helpers/names_generator.py b/bbot/core/helpers/names_generator.py
index 3e16b446a..c0a9ef4c3 100644
--- a/bbot/core/helpers/names_generator.py
+++ b/bbot/core/helpers/names_generator.py
@@ -2,6 +2,7 @@
adjectives = [
"abnormal",
+ "accidental",
"acoustic",
"acrophobic",
"adorable",
@@ -9,6 +10,7 @@
"affectionate",
"aggravated",
"aggrieved",
+ "almighty",
"anal",
"atrocious",
"awkward",
@@ -140,6 +142,7 @@
"medicated",
"mediocre",
"melodramatic",
+ "mighty",
"moist",
"molten",
"monstrous",
@@ -188,6 +191,7 @@
"rapid_unscheduled",
"raving",
"reckless",
+ "reductive",
"ripped",
"sadistic",
"satanic",
@@ -233,7 +237,6 @@
"ticklish",
"tiny",
"tricky",
- "tufty",
"twitchy",
"ugly",
"unabated",
@@ -578,6 +581,7 @@
"rachel",
"radagast",
"ralph",
+ "rambunctious",
"randy",
"raymond",
"rebecca",
diff --git a/bbot/core/helpers/ntlm.py b/bbot/core/helpers/ntlm.py
index 8605ef34a..9d66b3ea7 100644
--- a/bbot/core/helpers/ntlm.py
+++ b/bbot/core/helpers/ntlm.py
@@ -5,7 +5,7 @@
import logging
import collections
-from bbot.core.errors import NTLMError
+from bbot.errors import NTLMError
log = logging.getLogger("bbot.core.helpers.ntlm")
diff --git a/bbot/core/helpers/process.py b/bbot/core/helpers/process.py
new file mode 100644
index 000000000..7f3a23849
--- /dev/null
+++ b/bbot/core/helpers/process.py
@@ -0,0 +1,71 @@
+import logging
+import traceback
+import threading
+import multiprocessing
+from multiprocessing.context import SpawnProcess
+
+from .misc import in_exception_chain
+
+
+current_process = multiprocessing.current_process()
+
+
+class BBOTThread(threading.Thread):
+
+ default_name = "default bbot thread"
+
+ def __init__(self, *args, **kwargs):
+ self.custom_name = kwargs.pop("custom_name", self.default_name)
+ super().__init__(*args, **kwargs)
+
+ def run(self):
+ from setproctitle import setthreadtitle
+
+ setthreadtitle(str(self.custom_name))
+ super().run()
+
+
+class BBOTProcess(SpawnProcess):
+
+ default_name = "bbot process pool"
+
+ def __init__(self, *args, **kwargs):
+ self.log_queue = kwargs.pop("log_queue", None)
+ self.log_level = kwargs.pop("log_level", None)
+ self.custom_name = kwargs.pop("custom_name", self.default_name)
+ super().__init__(*args, **kwargs)
+ self.daemon = True
+
+ def run(self):
+ """
+ A version of Process.run() with BBOT logging and better error handling
+ """
+ log = logging.getLogger("bbot.core.process")
+ try:
+ if self.log_level is not None and self.log_queue is not None:
+ from bbot.core import CORE
+
+ CORE.logger.setup_queue_handler(self.log_queue, self.log_level)
+ if self.custom_name:
+ from setproctitle import setproctitle
+
+ setproctitle(str(self.custom_name))
+ super().run()
+ except BaseException as e:
+ if not in_exception_chain(e, (KeyboardInterrupt,)):
+ log.warning(f"Error in {self.name}: {e}")
+ log.trace(traceback.format_exc())
+
+
+if current_process.name == "MainProcess":
+ # if this is the main bbot process, set the logger and queue for the first time
+ from bbot.core import CORE
+ from functools import partialmethod
+
+ BBOTProcess.__init__ = partialmethod(
+ BBOTProcess.__init__, log_level=CORE.logger.log_level, log_queue=CORE.logger.queue
+ )
+
+# this makes our process class the default for process pools, etc.
+mp_context = multiprocessing.get_context("spawn")
+mp_context.Process = BBOTProcess
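+
+
+# Usage sketch (illustrative; some_callable is hypothetical): processes created via
+# this context get BBOT logging and a custom process title.
+#
+# proc = mp_context.Process(target=some_callable, custom_name="bbot worker")
+# proc.start()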
diff --git a/bbot/core/helpers/regex.py b/bbot/core/helpers/regex.py
new file mode 100644
index 000000000..f0bee1fc0
--- /dev/null
+++ b/bbot/core/helpers/regex.py
@@ -0,0 +1,105 @@
+import asyncio
+import regex as re
+from . import misc
+
+
+class RegexHelper:
+ """
+ Class for misc CPU-intensive regex operations
+
+ Offloads regex processing to other CPU cores via GIL release + thread pool
+
+ For quick, one-off regexes, you don't need to use this helper.
+ Only use this helper if you're searching large bodies of text
+ or if your regex is CPU-intensive
+ """
+
+ def __init__(self, parent_helper):
+ self.parent_helper = parent_helper
+
+ def ensure_compiled_regex(self, r):
+ """
+ Make sure a regex has been compiled
+ """
+ if not isinstance(r, re.Pattern):
+ raise ValueError("Regex must be compiled first!")
+
+ def compile(self, *args, **kwargs):
+ return re.compile(*args, **kwargs)
+
+ async def search(self, compiled_regex, *args, **kwargs):
+ self.ensure_compiled_regex(compiled_regex)
+ return await self.parent_helper.run_in_executor(compiled_regex.search, *args, **kwargs)
+
+ async def findall(self, compiled_regex, *args, **kwargs):
+ self.ensure_compiled_regex(compiled_regex)
+ return await self.parent_helper.run_in_executor(compiled_regex.findall, *args, **kwargs)
+
+ async def findall_multi(self, compiled_regexes, *args, threads=10, **kwargs):
+ """
+ Same as findall() but with multiple regexes
+ """
+ if not isinstance(compiled_regexes, dict):
+ raise ValueError('compiled_regexes must be a dictionary like this: {"regex_name": <compiled_regex>}')
+ for k, v in compiled_regexes.items():
+ self.ensure_compiled_regex(v)
+
+ tasks = {}
+
+ def new_task(regex_name, r):
+ task = self.parent_helper.run_in_executor(r.findall, *args, **kwargs)
+ tasks[task] = regex_name
+
+ compiled_regexes = dict(compiled_regexes)
+ for _ in range(threads): # Start initial batch of tasks
+ if compiled_regexes: # Ensure there are regexes to process
+ new_task(*compiled_regexes.popitem())
+
+ while tasks: # While there are tasks pending
+ # Wait for the first task to complete
+ done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
+
+ for task in done:
+ result = task.result()
+ regex_name = tasks.pop(task)
+ yield (regex_name, result)
+
+ if compiled_regexes: # Start a new task for each completed one, if regexes remain
+ new_task(*compiled_regexes.popitem())
+
+ async def finditer(self, compiled_regex, *args, **kwargs):
+ self.ensure_compiled_regex(compiled_regex)
+ return await self.parent_helper.run_in_executor(self._finditer, compiled_regex, *args, **kwargs)
+
+ async def finditer_multi(self, compiled_regexes, *args, **kwargs):
+ """
+ Same as finditer() but with multiple regexes
+ """
+ for r in compiled_regexes:
+ self.ensure_compiled_regex(r)
+ return await self.parent_helper.run_in_executor(self._finditer_multi, compiled_regexes, *args, **kwargs)
+
+ def _finditer_multi(self, compiled_regexes, *args, **kwargs):
+ matches = []
+ for r in compiled_regexes:
+ for m in r.finditer(*args, **kwargs):
+ matches.append(m)
+ return matches
+
+ def _finditer(self, compiled_regex, *args, **kwargs):
+ return list(compiled_regex.finditer(*args, **kwargs))
+
+ async def extract_params_html(self, *args, **kwargs):
+ return await self.parent_helper.run_in_executor(misc.extract_params_html, *args, **kwargs)
+
+ async def extract_emails(self, *args, **kwargs):
+ return await self.parent_helper.run_in_executor(misc.extract_emails, *args, **kwargs)
+
+ async def search_dict_values(self, *args, **kwargs):
+ def _search_dict_values(*_args, **_kwargs):
+ return list(misc.search_dict_values(*_args, **_kwargs))
+
+ return await self.parent_helper.run_in_executor(_search_dict_values, *args, **kwargs)
+
+ async def recursive_decode(self, *args, **kwargs):
+ return await self.parent_helper.run_in_executor(misc.recursive_decode, *args, **kwargs)
diff --git a/bbot/core/helpers/regexes.py b/bbot/core/helpers/regexes.py
index 890cedf40..0c01ff022 100644
--- a/bbot/core/helpers/regexes.py
+++ b/bbot/core/helpers/regexes.py
@@ -1,4 +1,4 @@
-import re
+import regex as re
from collections import OrderedDict
# for extracting words from strings
@@ -114,11 +114,29 @@
scan_name_regex = re.compile(r"[a-z]{3,20}_[a-z]{3,20}")
-# For use with extract_params_html helper
-input_tag_regex = re.compile(r"<input[^>]+?name=[\"\'](\w+)[\"\']")
+# For use with the excavate parameters extractor
+input_tag_regex = re.compile(
+ r"]+?name=[\"\']?([\.$\w]+)[\"\']?(?:[^>]*?value=[\"\']([=+\/\w]*)[\"\'])?[^>]*>"
+)
jquery_get_regex = re.compile(r"url:\s?[\"\'].+?\?(\w+)=")
jquery_post_regex = re.compile(r"\$.post\([\'\"].+[\'\"].+\{(.+)\}")
-a_tag_regex = re.compile(r"<a[^>]*href=[\"\'][^\"\'?>]*\?([^&\"\'=]+)")
+a_tag_regex = re.compile(r"<a[^>]*href=[\"\']([^\"\'?>]*)\?([^&\"\'=]+)=([^&\"\'=]+)")
+img_tag_regex = re.compile(r"<img[^>]*src=[\"\']([^\"\'?>]*)\?([^&\"\'=]+)=([^&\"\'=]+)")
+get_form_regex = re.compile(
+ r"
+
+
+