
Offload HTTP work into dedicated process #1223

Closed
2 tasks done
Tracked by #1235
TheTechromancer opened this issue Mar 28, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@TheTechromancer
Collaborator

TheTechromancer commented Mar 28, 2024

Eventually I would like to give HTTP and DNS each their own process with their own event loop, etc.

This would enable the scans to go much faster by both decreasing the CPU usage in the main process and freeing up the async event loop to do other things. It would make managing DNS/HTTP rate-limits easier, and allow us to finally replace projectdiscovery's httpx.

Check out this example that uses ZeroMQ and unix sockets to farm out concurrent web requests to another process:

import json
import asyncio
import zmq.asyncio
import httpx
from multiprocessing import Process

async def fetch_and_reply(client_id, message, socket, client):
    url = message['url']
    # In a real implementation the request would be made here:
    # response = await client.get(url)
    # body = response.text
    await asyncio.sleep(1)  # simulate network latency

    # Prepend the client_id to the reply so the ROUTER socket knows where to route it
    encoded = json.dumps({'response': url}).encode('utf-8')
    await socket.send_multipart([client_id, b'', encoded])

async def web_request_server():
    context = zmq.asyncio.Context()
    socket = context.socket(zmq.ROUTER)  # Use ROUTER socket for handling multiple requests
    socket.bind("ipc:///tmp/zmqtest.sock")
    
    async with httpx.AsyncClient() as client:
        while True:
            # Receive client identity and message
            client_id, _, message = await socket.recv_multipart()
            message = json.loads(message.decode('utf-8'))

            # Process each request in a separate task for true concurrency
            asyncio.create_task(fetch_and_reply(client_id, message, socket, client))


# Wrapper to start the server
def start_server():
    asyncio.run(web_request_server())

# Client function to send requests and receive responses
async def make_web_request(url):
    context = zmq.asyncio.Context()
    socket = context.socket(zmq.REQ)  # REQ socket for requests
    socket.connect("ipc:///tmp/zmqtest.sock")  # Connect to the server

    # Send a web request
    await socket.send_json({'url': url})

    # Wait for the response
    message = await socket.recv_json()
    response = message['response']
    print(f"Received response: {response}")

    socket.close()
    context.term()

# Main function to run the client
async def main():
    # Placeholder URLs; the actual fetch is stubbed out in fetch_and_reply
    urls = [f"http://example-{i}.test" for i in range(100)]
    await asyncio.gather(*(make_web_request(url) for url in urls))

if __name__ == "__main__":
    # Start the server in a separate process
    p = Process(target=start_server)
    p.start()

    # Run the client in the main process
    asyncio.run(main())

    # The server runs an infinite loop, so terminate it before joining
    p.terminate()
    p.join()
@TheTechromancer TheTechromancer added the enhancement New feature or request label Mar 28, 2024
@TheTechromancer TheTechromancer self-assigned this Mar 28, 2024
@TheTechromancer TheTechromancer mentioned this issue Mar 28, 2024
74 tasks
@TheTechromancer
Collaborator Author

TheTechromancer commented Mar 30, 2024

zmq vs multiprocessing.Queue benchmarks

1M tiny binary messages:

pyzmq (IPC) time: 0.7660360336303711 seconds
multiprocessing.Queue time: 4.027175188064575 seconds

1M big JSON messages:

pyzmq (IPC) time: 17.17799139022827 seconds
multiprocessing.Queue time: 9.628838539123535 seconds

The biggest discovery here is that the cost of IPC is almost negligible. We can push roughly a million messages in ten seconds, which is pretty amazing.

Pyzmq may be slightly slower for larger messages, but it has two very important features that multiprocessing.Queue doesn't: ROUTER/DEALER topology and async support.

1M big JSON messages (async):

pyzmq (IPC) async time: 45.30812379199779 seconds
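For reference, the Queue half of such a benchmark could be structured roughly like this sketch. The message count and payload size here are illustrative (scaled down to 10k tiny messages, not the 1M above); the pyzmq side would mirror it with a PUSH/PULL socket pair bound to an ipc:// endpoint.

```python
import time
from multiprocessing import Process, Queue

N = 10_000           # scaled down from the 1M used in the real benchmark
PAYLOAD = b"x" * 16  # tiny binary message

def producer(q):
    # Pump N messages through the queue, then a sentinel to signal completion
    for _ in range(N):
        q.put(PAYLOAD)
    q.put(None)

def consume(q):
    # Drain the queue until the sentinel arrives, counting messages
    received = 0
    while q.get() is not None:
        received += 1
    return received

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    start = time.perf_counter()
    p.start()
    received = consume(q)
    p.join()
    elapsed = time.perf_counter() - start
    print(f"multiprocessing.Queue: {received} messages in {elapsed:.3f}s")
```

Timing only the transfer (start-to-drain) keeps process startup cost out of the measurement.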

@Ousret

Ousret commented Apr 6, 2024

Sorry to barge in; I would like to recommend Niquests as a solution embedding advanced multiplexing, DNS-over-QUIC/HTTPS/TLS, async, happy eyeballs, etc. Given the project's goals, I think it's a good match.

If interested, I can assist.

Regards,

@TheTechromancer
Collaborator Author

TheTechromancer commented Apr 6, 2024

@Ousret I hadn't heard of niquests, thanks for the recommendation!

I am finished overhauling DNS and almost ready to start on HTTP. Right now we use httpx for our web library, but my plan was to benchmark its speed vs aiohttp before the overhaul. Having seen niquests, I think we should try it too.

From its readme it seems like it wins in features. Do you want to write a benchmark comparing the speed of all three -- httpx, aiohttp, and niquests?

EDIT: I see you've already done some benchmarks. I didn't realize you're the author of the tool. Congrats and nice work on those features!

I'll handle writing the benchmark. For BBOT, speed and stability matter a lot, since we can easily issue tens of thousands of requests during a single scan. I'm especially interested in async performance with a big pool size (e.g. 50 concurrent requests).
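A harness for that comparison might look something like the following sketch. The `dummy_fetch` coroutine is a stand-in; a real run would supply one fetcher per library (httpx, aiohttp, niquests), and the semaphore caps concurrency at the pool size mentioned above.

```python
import asyncio
import time

async def run_benchmark(fetch, urls, concurrency=50):
    """Time fetch(url) across all URLs with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    start = time.perf_counter()
    results = await asyncio.gather(*(bounded(u) for u in urls))
    elapsed = time.perf_counter() - start
    return elapsed, results

# Stand-in fetcher; swap in e.g. an httpx.AsyncClient.get call for a real run
async def dummy_fetch(url):
    await asyncio.sleep(0.05)  # simulate network latency
    return url

if __name__ == "__main__":
    urls = [f"http://example-{i}.test" for i in range(200)]
    elapsed, results = asyncio.run(run_benchmark(dummy_fetch, urls))
    print(f"{len(results)} requests in {elapsed:.2f}s")
```

With 200 simulated requests at 50 concurrent, total time should be roughly four times the per-request latency rather than 200 times, which is the behavior the real benchmark needs to verify per library.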

@Ousret

Ousret commented Apr 6, 2024

Here's what my experience tells me about those benchmarks.

aiohttp is a fairly low-to-mid-level HTTP client with a C extension; it's nearly unbeatable right now in terms of raw performance. But one cannot compare httpx or niquests against it -- the features served are on another level.

If you want to beat aiohttp, you'll have no choice but to implement urllib3-future itself, but usually such speed won't be productive, as I have seen many remote peers (WAFs) simply block you due to the excessive throughput.
Look at how the scripts are written at https://github.com/Ousret/niquests-stats to get a sense of how to properly leverage multiplexing. Multiplexing is the key.

My advice is to leverage Niquests (asyncio + multiplexing + happy eyeballs + multiple DNS-over-HTTPS providers) with a pool of 50 to 100 connections. You'll keep nice and clean code with a lot of flexibility. It will be interesting to see it work within your software.

Lastly, all the others are currently blocked by advanced WAFs (TLS fingerprinting), and ours is really closer to a real browser, especially with HTTP/3.

Let me know if you need anything.

For the WAF proof

import niquests
import requests
import httpx

URL = "https://kick.com/video/f60e4115-7b7b-4680-a4b9-a48e7b74d45c"

if __name__ == "__main__":

    r = requests.get(URL)
    print("Requests", r)

    r = httpx.get(URL)
    print("HTTPX", r)

    r = niquests.get(URL)
    print("Niquests", r)

    # Second call: niquests can upgrade to HTTP/3 after the first response (see output)
    r = niquests.get(URL)
    print("Niquests", r)

gives you

Requests <Response [403]>
HTTPX <Response [403 Forbidden]>
Niquests <Response HTTP/2 [200]>
Niquests <Response HTTP/3 [200]>

@TheTechromancer
Collaborator Author

Thanks, that's really insightful. Especially about the WAF. Excited to try it out.

@liquidsec

@TheTechromancer TheTechromancer changed the title Offload HTTP and DNS work into dedicated process Offload HTTP work into dedicated process Apr 30, 2024
@TheTechromancer
Collaborator Author

HTTP engine added in #1340.
