
Passive Subdomain Enum takes forever on large domains #1045

Closed
lappsec opened this issue Feb 1, 2024 · 11 comments
Labels: bug (Something isn't working), high-priority

lappsec commented Feb 1, 2024

Describe the bug
I'll start by saying I know that large domain spaces will generally take longer to scan, and that this issue is part question and part feature request.

I've found that when running a subdomain enum on large domain spaces, the scan can take forever. This is even after disabling the more aggressive modules like massdns. For example, I ran a subdomain enum scan on comcast.com as a test with the following command:

bbot -t comcast.com -f subdomain-enum --output-dir /root/subdomain_enum/logs --name comcast.com -y -rf passive --config /root/subdomain_enum/bbot_secrets.yml -em ipneighbor,asn,massdns,postman

So far it has been running for 13 hours and still has 37,000 events in queue as seen here:

[INFO] comcast.com: Modules running (incoming:processing:outgoing) anubisdb(20,974:0:19,015), subdomaincenter(21,524:0:12,385), columbus(21,691:0:3,971), internetdb(11,456:0:101), dnscommonsrv(8,507:5:0), speculate(0:0:320)
[INFO] comcast.com: Events produced so far: DNS_NAME_UNRESOLVED: 30741, DNS_NAME: 21830, IP_ADDRESS: 9825, TECHNOLOGY: 420, STORAGE_BUCKET: 1
[INFO] comcast.com: 37,315 events in queue (DNS_NAME: 36,269, IP_ADDRESS: 1,019, OPEN_TCP_PORT: 20, TECHNOLOGY: 5, FINDING: 2)

Expected behavior
I would expect a passive subdomain enumeration scan to take far less time, at least when the massdns module is not enabled. It seems like there's a bottleneck somewhere since the number of events in the queue only goes down in small decrements.

That also leads me to a question: Are there any optimization options that can be used to limit scan times? It would be great to have an option to limit the amount of time a scan and/or module can run. This is especially helpful when running a scan in a non-interactive environment (part of another workflow, cron job, etc.) where you're not actively monitoring its execution. Amass has a similar option for limiting the runtime of a scan. Unfortunately, it never actually worked for me, which is why I wanted to switch to bbot, but now I'm running into similar issues.

BBOT Command
bbot -t comcast.com -f subdomain-enum --output-dir /root/subdomain_enum/logs --name comcast.com -y -rf passive --config /root/subdomain_enum/bbot_secrets.yml -em ipneighbor,asn,massdns,postman

OS, BBOT Installation Method + Version
OS: Ubuntu 20.04, Installation method: Docker, BBOT version: v1.1.5

BBOT Config
The config is the default that comes in the docker image.

Logs
Attached
debug.log

lappsec added the bug (Something isn't working) label on Feb 1, 2024
TheTechromancer (Collaborator) commented Feb 1, 2024

@lappsec thanks for opening an issue. This kind of feedback helps a lot in speeding up BBOT and making it as efficient as possible.

At first glance, I would guess the bottleneck in this situation is DNS. Whenever you see a lot of events in the queue like that, they are waiting to be resolved. DNS resolution is considered by BBOT to be passive, so even during a passive scan, it will perform DNS lookups on each subdomain for each record type - A, AAAA, MX, NS, etc. If you consider the extra checks that need to happen for wildcards, this turns out to be quite a few DNS queries, probably ten or twenty per subdomain on average.

For this reason, one of the most important requirements is that you have a good internet connection and fast DNS servers configured in your OS (i.e. in your /etc/resolv.conf), preferably as many of them as possible. BBOT will automatically rotate through them, balancing the load.
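
To give a rough idea of what this looks like, here's a minimal sketch (not BBOT's actual code) using dnspython's async resolver: one lookup per record type for the subdomain, plus a wildcard probe against a random sibling label. The record-type list and all names here are simplified for illustration.

```python
# Minimal sketch (not BBOT's actual code) of why one subdomain can cost
# 10-20 queries: one lookup per record type, plus a wildcard probe against
# a random sibling label. Assumes dnspython is installed.
import asyncio
import random
import string

import dns.asyncresolver

RECORD_TYPES = ["A", "AAAA", "MX", "NS", "TXT", "CNAME", "SOA"]


async def query(resolver, name, rdtype):
    try:
        return [str(r) for r in await resolver.resolve(name, rdtype)]
    except Exception:
        return []


async def resolve_subdomain(resolver, subdomain, parent):
    # one query per record type for the subdomain itself
    records = {t: await query(resolver, subdomain, t) for t in RECORD_TYPES}
    # wildcard probe: a random label under the parent should NOT resolve
    rand_label = "".join(random.choices(string.ascii_lowercase, k=12))
    is_wildcard = bool(await query(resolver, f"{rand_label}.{parent}", "A"))
    return records, is_wildcard


async def main():
    resolver = dns.asyncresolver.Resolver()
    # whatever resolvers are configured get rotated / load-balanced
    resolver.nameservers = ["1.1.1.1", "1.0.0.1"]
    print(await resolve_subdomain(resolver, "www.example.com", "example.com"))


asyncio.run(main())
```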

Here are some questions that, if you can answer them, will help us narrow down the problem:

  • What speed of internet do you have (upload and download)?
  • How many resolvers do you have in your /etc/resolv.conf?
  • What does your CPU usage look like during the scan?

TheTechromancer self-assigned this Feb 1, 2024
lappsec (Author) commented Feb 1, 2024

@TheTechromancer Thanks for the quick response, and also for all the work you've put into bbot!

That certainly makes sense with the DNS bottleneck, especially when dealing with a large domain space that has a lot of wildcards.
To answer your questions:

  • Internet speed:
    • I was running this from an EC2 instance (t2.medium). The throughput on a speed test was around 970 Mbps. Speed shouldn't be a problem, though I don't know if that gets throttled eventually by AWS.
  • Number of resolvers:
    • I have 4 resolvers in /etc/resolv.conf. Three of those are typical large DNS services (8.8.8.8, 8.8.4.4, 1.1.1.1) and one is the local resolver 127.0.0.53, which I think is just using the AWS resolvers.
  • CPU usage:
    • CPU was at 100% at the time I posted the issue this morning (I canceled the scan shortly afterwards). I don't remember what the memory usage was, but it wasn't excessive.

I can always try adding more resolvers to the system and see if that helps. If you have any more ideas or need more info, let me know.

TheTechromancer (Collaborator) commented Feb 1, 2024

Thanks, that helps a lot. Based on that information, I think it's safe to rule out your internet/dns servers as the cause. CPU usage being at 100% during that phase of the scan is definitely abnormal, and probably indicates a bug.

I'll be digging deeper into this. BBOT should have no problem scanning a huge domain like comcast.com, so it is high priority to get this fixed.

EDIT: In the meantime, if you're able, would you mind running the same scan again with --debug and the environment variable PYTHONASYNCIODEBUG=1? This may give us some hints as to the cause of the high CPU.

EDIT2: On second thought, don't bother with the debug stuff. I was able to reproduce it on my end.

lappsec (Author) commented Feb 1, 2024

Sounds good. I was waiting until another task finished before trying it again, but I'll hold off. Let me know if you need more input from me.

TheTechromancer (Collaborator) commented:

Okay, already one interesting finding from this. I did some testing, and Google's DNS servers are rate-limiting us:

[screenshot]

Cloudflare's servers seem unaffected. So removing 8.8.8.8 and 8.8.4.4 from your /etc/resolv.conf (and any mystery resolvers like 127.0.0.53, which may still forward to 8.8.8.8) should actually speed up the scan significantly.

There may still be some DNS optimizations we can do within BBOT, which I will look into. But in the meantime, for big domains like comcast.com, it seems to perform best if you decrease the number of threads (-c max_threads=5). This prevents it from doing too many queries at once, and seems to set it on a steady pace.
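
For reference, a quick-and-dirty way to check a resolver for rate limiting looks something like this (a sketch of the methodology, not the exact test above): fire a burst of concurrent queries at one server and count how many fail or time out.

```python
# Quick-and-dirty rate-limit check (a sketch, not the exact test above):
# fire a burst of concurrent queries at one resolver and count how many
# fail or time out. Assumes dnspython.
import asyncio

import dns.asyncresolver
import dns.resolver


async def burst_test(nameserver, count=200):
    resolver = dns.asyncresolver.Resolver()
    resolver.nameservers = [nameserver]
    resolver.lifetime = 3  # seconds before a query counts as failed

    async def one(i):
        try:
            await resolver.resolve(f"www{i}.comcast.com", "A")
        except dns.resolver.NXDOMAIN:
            pass  # a definitive NXDOMAIN still means the server answered
        except Exception:
            return False  # timeout, SERVFAIL, etc.
        return True

    results = await asyncio.gather(*(one(i) for i in range(count)))
    ok = sum(results)
    print(f"{nameserver}: {ok}/{count} answered, {count - ok} failed/timed out")


async def main():
    for ns in ("8.8.8.8", "1.1.1.1"):
        await burst_test(ns)


asyncio.run(main())
```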

TheTechromancer (Collaborator) commented Feb 2, 2024

Based on my tests, it seems like this scan was suffering from a few different small problems that combined to cause a lot of trouble:

  • Google's DNS rate limiting (fixed by changing DNS servers)
  • Too much asyncio parallelization (fixed by tweaking DNS multitasking; see the sketch after this list)
  • A lot of garbage data (~20K unresolved subdomains) from anubisdb (fixed by limiting its results to a max of 1000)
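
As a simplified illustration of the parallelization fix (not the actual BBOT change), the general idea is to cap the number of in-flight DNS queries with a semaphore rather than launching every lookup at once:

```python
# Simplified illustration (not the actual BBOT change) of the parallelization
# fix: cap in-flight DNS queries with a semaphore instead of launching every
# lookup at once. Assumes dnspython.
import asyncio

import dns.asyncresolver

MAX_INFLIGHT = 25  # illustrative cap


async def resolve_all(names, nameservers=("1.1.1.1", "1.0.0.1")):
    resolver = dns.asyncresolver.Resolver()
    resolver.nameservers = list(nameservers)
    sem = asyncio.Semaphore(MAX_INFLIGHT)

    async def bounded(name):
        async with sem:  # at most MAX_INFLIGHT queries run at any moment
            try:
                return name, [str(r) for r in await resolver.resolve(name, "A")]
            except Exception:
                return name, []

    return await asyncio.gather(*(bounded(n) for n in names))


# asyncio.run(resolve_all(["www.example.com", "mail.example.com"]))
```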

These fixes have been pushed to the feature branch speed-optimizations. With these changes in place, I was able to run your scan in a little over 3 hours. The scan found 16,407 unique subdomains.

[screenshot]

There is one more issue which is a bit more subtle, but I think that if we can fix it, it will speed this up even more. For my own future reference, this is the cProfile output for the above scan:

         7043000661 function calls (7028841795 primitive calls) in 11823.096 seconds                                                                                                                               
                                                                                                                                                                                                                   
   Ordered by: cumulative time                                                                                                                                                                                     
                                                                                                                                                                                                                   
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)                                                                                                                                            
   1213/1    0.031    0.000 11827.446 11827.446 {built-in method builtins.exec}                                                                                                                                    
        1    0.000    0.000 11827.446 11827.446 bbot:1(<module>)                                                                                                                                                   
        1    0.000    0.000 11824.569 11824.569 cli.py:375(main)                                                                                                                                                   
        1    0.000    0.000 11824.569 11824.569 runners.py:160(run)                                                                                                                                                
        3    0.000    0.000 11824.567 3941.522 base_events.py:617(run_until_complete)                                                                                                                              
        3    0.509    0.170 11824.567 3941.522 base_events.py:593(run_forever)                                                                                                                                     
        1    0.000    0.000 11824.566 11824.566 runners.py:86(run)                                                                                                                                                 
   374185   23.966    0.000 11824.058    0.032 base_events.py:1845(_run_once)                                                                                                                                      
  9701615   12.825    0.000 11764.843    0.001 events.py:78(_run)                                                                                                                                                  
  9701615   15.153    0.000 11752.017    0.001 {method 'run' of '_contextvars.Context' objects}                                                                                                                    
  1990923    4.720    0.000 10432.677    0.005 manager.py:80(emit_event)                                                                                                                                           
  1978462    9.919    0.000 10404.217    0.005 manager.py:136(_emit_event)                                                                                                                                         
  2745001   12.176    0.000 8378.017    0.003 dns.py:187(resolve_raw)                                                                                                                                              
   874983   10.416    0.000 7999.252    0.009 dns.py:471(resolve_event)                                                                                                                                            
  2683246   18.356    0.000 7706.722    0.003 dns.py:245(_resolve_hostname)                                                                                                                                        
   843260    7.590    0.000 6827.694    0.008 cache.py:83(put)                                                                                                                                                     
  2146531 6816.544    0.003 6822.026    0.003 cache.py:92(_truncate)                                                                                                                                               
   756645    1.044    0.000 6800.870    0.009 cache.py:129(__setitem__)                                                                                                                                            
  2648275   16.409    0.000 1503.915    0.001 dns.py:727(_catch)                                                                                                                                                   
  2648275    6.687    0.000 1451.388    0.001 dns.py:44(resolve)                                                                                                                                                   
  2648275   10.739    0.000 1435.928    0.001 asyncresolver.py:45(resolve)                                                                                                                                         
  1164436    3.322    0.000 1210.057    0.001 dns.py:400(handle_wildcard_event)                                                                                                                                    
  1164236    6.322    0.000 1187.735    0.001 dns.py:760(is_wildcard)                                                                                                                                              
  2729273    5.073    0.000 1043.550    0.000 nameserver.py:121(async_query)                                                                                                                                       
  2326749   14.976    0.000  930.475    0.000 asyncquery.py:154(udp)                                                                                                                                               
   984981   26.750    0.000  848.209    0.001 base.py:536(_worker)                                                                                                                                                 
 30421968   57.377    0.000  692.447    0.000 ipaddress.py:57(ip_network)                                                                                                                                          
    61755    0.450    0.000  631.390    0.010 dns.py:340(_resolve_ip)
   325942    0.775    0.000  606.315    0.002 __init__.py:58(check)
   228937   38.646    0.000  600.017    0.003 __init__.py:63(check_ip)
 10691833   36.393    0.000  590.714    0.000 misc.py:193(split_host_port)
  1234577    2.162    0.000  585.359    0.000 base.py:660(_event_postcheck)
  1551251    5.984    0.000  577.358    0.000 asyncquery.py:113(receive_udp)
   855664    4.219    0.000  571.631    0.001 message.py:1227(from_wire)
   855664   10.795    0.000  559.504    0.001 message.py:1192(read)
  1234577   13.165    0.000  558.848    0.000 base.py:673(__event_postcheck)
 16072519   10.210    0.000  525.532    0.000 dictconfig.py:430(get)
 16073170   23.186    0.000  515.343    0.000 dictconfig.py:438(_get_impl)
  3184506    9.939    0.000  502.291    0.000 misc.py:260(parent_domain)
    81238    8.113    0.000  480.465    0.006 manager.py:376(distribute_event)
  3366714    3.914    0.000  476.682    0.000 misc.py:295(domain_parents)
   422287    0.678    0.000  474.033    0.001 target.py:235(__contains__)
   422287    0.544    0.000  473.355    0.001 target.py:224(_contains)
   422287   28.895    0.000  472.811    0.001 target.py:186(get)
  2566992   12.132    0.000  446.795    0.000 message.py:1096(_get_section)

This line in particular:

  2146531 6816.544    0.003 6822.026    0.003 cache.py:92(_truncate)

Indicates a possible performance issue with BBOT's DNS cache.
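
For context, that profile shape suggests the cache is doing O(n) work on every insert. A constant-time alternative (a generic sketch, not the current cache.py) would evict only the single oldest entry per insert:

```python
# Generic sketch (not BBOT's cache.py) of a cache with O(1) inserts:
# keep entries in an OrderedDict and evict only the oldest item when full,
# instead of truncating/rescanning the whole structure on every put.
from collections import OrderedDict


class LRUCache:
    def __init__(self, max_size=10000):
        self.max_size = max_size
        self._data = OrderedDict()

    def __setitem__(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)  # refresh recency, O(1)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict oldest entry, O(1)

    def __getitem__(self, key):
        self._data.move_to_end(key)
        return self._data[key]

    def __contains__(self, key):
        return key in self._data
```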

lappsec (Author) commented Feb 3, 2024

This is great - thanks a lot for your work on this. Damn you, Google!

I will try again based on your advice and see how it goes. Seems like it was just the perfect storm of minor problems.
I'll let you know if I still have problems.

Sh4d0wHunt3rX (Contributor) commented Feb 14, 2024

> I have 4 resolvers in /etc/resolv.conf. Three of those are typical large DNS services (8.8.8.8, 8.8.4.4, 1.1.1.1) and one is the local resolver 127.0.0.53, which I think is just using the AWS resolvers.

Hey, I was checking my server and noticed I have 127.0.0.53 in my /etc/resolv.conf too:

[screenshots]

Then I found this:

[screenshot]

With this command: resolvectl status | grep -i "DNS Serve"

I got this:

[screenshot]

There is an explanation here that seems related and helpful:
https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53

I updated my DNS resolvers like this:

sudo systemctl edit systemd-resolved.service

[Service]
DNS=YOUR_FIRST_DNS_SERVER_IP YOUR_SECOND_DNS_SERVER_IP

sudo systemctl daemon-reload
sudo systemctl restart systemd-resolved.service

And copy-pasted these IPs from:
https://github.com/blacklanternsecurity/public-dns-servers/blob/master/nameservers.txt

I didn't get bbot's initial warning that says "I'm only using one DNS server and it's better to add more"; however, the output of resolvectl status | grep -i "DNS Serve" didn't change. I'm not quite sure whether I did it correctly or not.

I still got some warnings:

[screenshot]

Now I've tried adding the DNS resolvers to /etc/systemd/resolved.conf. Let's see if this is the correct way and whether I get any warnings.

TheTechromancer (Collaborator) commented Feb 14, 2024

And copy-pasted these IPs from:
https://github.com/blacklanternsecurity/public-dns-servers/blob/master/nameservers.txt

I wouldn't recommend using those DNS servers like that. They are good for DNS brute forcing but there isn't much advantage to using them in your OS. Your scan will be much faster and more reliable if you pick one or two extremely fast servers like 1.1.1.1 and 1.0.0.1.

EDIT: We are still learning the optimal setup. Soon the warning about having only one DNS server will probably be replaced by a DNS speed test that will check for rate limiting etc. and warn you if your DNS servers are too slow.
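
Roughly, such a speed test could look something like this (just a sketch of the idea, not the planned implementation): time a handful of queries per resolver and flag anything slower than a threshold.

```python
# Sketch of the idea (not the planned implementation): time a handful of
# queries per resolver and flag anything slower than a threshold. Assumes
# dnspython; the threshold is arbitrary.
import asyncio
import time

import dns.asyncresolver


async def time_resolver(nameserver, samples=10, threshold=0.25):
    resolver = dns.asyncresolver.Resolver()
    resolver.nameservers = [nameserver]
    latencies = []
    for i in range(samples):
        start = time.monotonic()
        try:
            await resolver.resolve(f"www{i}.example.com", "A")
        except Exception:
            pass  # NXDOMAIN/timeouts still tell us how long the server took
        latencies.append(time.monotonic() - start)
    avg = sum(latencies) / len(latencies)
    print(f"{nameserver}: avg {avg * 1000:.0f}ms ({'OK' if avg < threshold else 'SLOW'})")


async def main():
    for ns in ("1.1.1.1", "1.0.0.1", "8.8.8.8"):
        await time_resolver(ns)


asyncio.run(main())
```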

Sh4d0wHunt3rX (Contributor) commented:
> I wouldn't recommend using those DNS servers like that. [...] Your scan will be much faster and more reliable if you pick one or two extremely fast servers like 1.1.1.1 and 1.0.0.1.

Thanks a lot :) My previous scan went from its usual 50 minutes down to 34 minutes, and I got one DNS failure error. For the next scans I will do as you said :)

TheTechromancer (Collaborator) commented:
This issue has been fixed in #1051, which will be merged soon into dev.

Thanks to the combined fixes in this branch, the scan now completes in under 1 hour:
[screenshot]
