Performance comparison #205
These past few days I have been working with this library, and first of all I want to thank the entire team for all the work done. I have been testing the library by simulating 10,000 producers that send messages every second, every half second, and every 100 ms. I ran into some problems with my full program, so I wrote some simple benchmarks to see whether I am misusing the library or whether these are its actual limits.

First, I tried paho-mqtt with the following code (found online). I get about 20,000 messages sent per second:

```python
import random
from time import sleep, time

import paho.mqtt.client as mqtt  # paho-mqtt 1.x callback API

BROKER = "localhost"
NB_MESSAGES = 100000
PAYLOAD_LEN = 128
TOPIC = 'a/b/c/d'

rcpt_counter = 0


def on_disconnect(client, userdata, rc):
    # Publish rate, measured from the first publish until disconnect.
    elapsed = time() - T0
    print('sending', NB_MESSAGES / elapsed, 'messages per sec')


def on_message(client, userdata, msg):
    # Report the receive rate every 1000 messages.
    global T1, rcpt_counter
    rcpt_counter += 1
    if rcpt_counter % 1000 == 0:
        T2 = time()
        print(' receiving', 1000 / (T2 - T1), 'messages per sec')
        T1 = T2


p = mqtt.Client()
p.on_disconnect = on_disconnect
p.connect(BROKER)
p.loop_start()

c = mqtt.Client()
c.connect(BROKER)
c.on_message = on_message
c.subscribe(TOPIC)
c.loop_start()

# Prepare some random data: PAYLOAD_LEN raw bytes per message
data = [bytes(random.getrandbits(8) for _ in range(PAYLOAD_LEN))
        for _ in range(NB_MESSAGES)]

T0 = T1 = time()
for i in range(NB_MESSAGES):
    p.publish(TOPIC, data[i])
p.disconnect()
sleep(20)
c.disconnect()
```

On the other hand, with the following benchmark I wrote using asyncio-mqtt, I get about 10k messages per second:

```python
import asyncio
import logging
import time

from asyncio_mqtt import Client

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

messages = 0


async def monitor_task():
    # Log the publish rate once per second.
    global messages
    start_time = time.time()
    while True:
        await asyncio.sleep(1)
        elapsed = time.time() - start_time
        logger.debug(f"Messages: {messages / elapsed} per second")
        messages = 0
        start_time = time.time()


async def publisher(client):
    global messages
    await client.publish("test", "Hello World!")
    messages += 1
    # Per-message debug logging; this adds overhead to the benchmark.
    logger.debug("Messages: %d", messages)


async def main():
    async with Client("localhost") as client:
        while True:
            # Publish in batches of 1000 concurrent coroutines.
            tasks = [publisher(client) for _ in range(1000)]
            await asyncio.gather(*tasks, return_exceptions=True)


async def bootstrap():
    await asyncio.gather(monitor_task(), main(), return_exceptions=True)


if __name__ == "__main__":
    asyncio.run(bootstrap())
```

In the production code, I have also observed that asyncio, in debug mode, warns that the `on_socket_register_write` callback takes too long (0.4 seconds, i.e. 400 ms), specifically in `client.loop_write()`. Could this function be blocking the asyncio event loop?

I hope everything is clear.
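To make that debug-mode warning concrete, here is a minimal sketch, assuming a Unix selector event loop (`loop.add_writer` is not available on Windows' Proactor loop). It registers a synchronous write callback with `loop.add_writer`, which is, as far as we understand it, the kind of hook `on_socket_register_write` sets up. `on_writable` and its `time.sleep(0.2)` are hypothetical stand-ins for a slow `loop_write()`. With `debug=True`, asyncio logs "Executing … took … seconds" for any callback that runs longer than `slow_callback_duration`, and nothing else on the loop can progress while such a callback runs.

```python
import asyncio
import logging
import socket
import time


class ListHandler(logging.Handler):
    """Collects log messages so asyncio's debug warnings can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # threshold for "took too long" warnings
    a, b = socket.socketpair()
    a.setblocking(False)
    done = asyncio.Event()

    def on_writable():
        # Hypothetical stand-in for a sync write callback such as loop_write():
        # blocking here freezes every other task on the loop.
        time.sleep(0.2)
        a.send(b"x")
        loop.remove_writer(a)
        done.set()

    loop.add_writer(a, on_writable)  # run on_writable once `a` is writable
    await done.wait()
    a.close()
    b.close()


handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)
asyncio.run(main(), debug=True)

slow = [m for m in handler.messages if m.startswith("Executing")]
print(slow)
```

If a warning like this points at `loop_write()`, the time is being spent synchronously inside that callback, so the event loop is blocked for that duration.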
Hi Jose. Thanks for opening this discussion. It's always nice to have a code example that reproduces the issue. 👍 It's an interesting benchmark! First off: Python (in 2023) is about correctness and maintainability, not performance. If you want performance, start with another programming language. I ran the code on my machine with a local mosquitto MQTT server, Python 3.10, and Ubuntu 20.04. Roughly, I got:
Quite the difference. I changed the publisher's "batch" size from 1000 to 100 (…)
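The batch-size effect can be explored without a broker. In this sketch, `fake_publish` is a hypothetical coroutine that only yields once to the event loop, so the measured rate reflects `asyncio.gather` and task-scheduling overhead rather than MQTT I/O. It mirrors the benchmark's shape (gather one batch, then start the next) for a few batch sizes; the absolute numbers depend entirely on the machine.

```python
import asyncio
import time


async def fake_publish(counter):
    # Hypothetical stand-in for client.publish(): yields once to the loop,
    # so the measurement is dominated by task scheduling overhead.
    await asyncio.sleep(0)
    counter[0] += 1


async def run(batch_size, total=20_000):
    counter = [0]
    start = time.perf_counter()
    for _ in range(total // batch_size):
        # Same shape as the benchmark: gather one batch, then start the next.
        await asyncio.gather(*(fake_publish(counter) for _ in range(batch_size)))
    elapsed = time.perf_counter() - start
    return counter[0], counter[0] / elapsed


async def main():
    results = {}
    for batch_size in (10, 100, 1000):
        count, rate = await run(batch_size)
        results[batch_size] = (count, rate)
        print(f"batch={batch_size:5d}: {rate:,.0f} calls/s")
    return results


results = asyncio.run(main())
```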
It's always a challenge to find the optimal batch size. It depends on the overhead of async scheduling and on the task itself. To put things in perspective, try commenting out … Next, I switch from the built-in asyncio event loop to uvloop. That gives:
Not much of a difference, but still better. Now I cheat: I go into the asyncio-mqtt source code and comment out the "wait until the publish succeeds" logic inside …
That seems to be the main culprit! We are now very close to raw paho-mqtt. It also showcases one of the key differences between asyncio-mqtt and paho-mqtt: asyncio-mqtt waits for acknowledgement from the server before it proceeds, whereas paho-mqtt uses a fire-and-forget approach.

The big question now is: why is asyncio-mqtt only about on par with paho-mqtt in terms of performance? Why are we still far from the performance limit of 200K msgs/s? Isn't async/await supposed to be fast? The answer is usually yes, but only if you use async I/O all the way down. We (asyncio-mqtt) do not. We simply call paho-mqtt and let paho deal with the I/O. This means that underneath, everything goes through Python's sync `socket` interface.

So why use asyncio-mqtt and not just paho-mqtt? Well, use whatever you want. :) Don't expect asyncio-mqtt to give you extra performance due to concurrent I/O (it's all paho behind the scenes). asyncio-mqtt gives you safety on top of paho. It proceeds carefully and always makes sure that your requests (publish, subscribe, etc.) get an acknowledgement. This cautious approach costs performance, and that is what you see in your benchmarks.

I hope that helps explain it.
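The difference between waiting for completion and fire-and-forget can be sketched with a toy client. `ToyClient` and its methods are hypothetical, not asyncio-mqtt's or paho-mqtt's API: `_send` stands in for the network write and schedules a fake completion callback (similar in spirit to paho's `on_publish`), `publish_nowait` returns immediately, and `publish` creates a future per message id and awaits it. The per-message future, the dictionary entry, and the extra trip through the event loop are the bookkeeping a cautious publish pays for.

```python
import asyncio
import itertools
import time


class ToyClient:
    """Hypothetical client contrasting two publish styles (not a real MQTT API)."""

    def __init__(self):
        self._loop = asyncio.get_running_loop()
        self._mids = itertools.count(1)
        self._pending = {}  # message id -> future awaiting completion
        self.sent = 0

    def _send(self, payload):
        # Stand-in for the network write. Completion is reported later via a
        # callback, loosely like paho's on_publish.
        mid = next(self._mids)
        self.sent += 1
        self._loop.call_soon(self._on_publish, mid)
        return mid

    def _on_publish(self, mid):
        fut = self._pending.pop(mid, None)
        if fut is not None and not fut.done():
            fut.set_result(None)

    def publish_nowait(self, payload):
        # Fire-and-forget: hand the message off and return immediately.
        self._send(payload)

    async def publish(self, payload):
        # Cautious: park this coroutine until the completion callback fires.
        fut = self._loop.create_future()
        mid = self._send(payload)
        self._pending[mid] = fut  # registered before the callback can run
        await fut


async def main(n=10_000):
    client = ToyClient()

    t0 = time.perf_counter()
    for _ in range(n):
        client.publish_nowait(b"Hello World!")
    await asyncio.sleep(0)  # let the queued completion callbacks run
    t1 = time.perf_counter()
    await asyncio.gather(*(client.publish(b"Hello World!") for _ in range(n)))
    t2 = time.perf_counter()

    print(f"fire-and-forget:     {n / (t1 - t0):,.0f} msgs/s")
    print(f"wait-for-completion: {n / (t2 - t1):,.0f} msgs/s")
    return client.sent, len(client._pending)


sent, pending = asyncio.run(main())
```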
Great to hear that it helped. :)
It's more about OS-level concurrency than hardware-based parallelism (here is a great thread on that topic). asyncio-mqtt uses paho-mqtt, and paho-mqtt uses `socket.socket`, which is a sync interface (no concurrency). If we want OS-level concurrency, we need to use an actual async interface, e.g. `asyncio.open_connection`/`loop.create_connection` (or anyio's `anyio.connect_tcp`) instead of `socket`. That's what, e.g., gmqtt does. In a perfect world, we would have:
I don't think such a library exists ye…
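For a rough idea of what async I/O all the way down looks like at the lowest level, here is a sketch that builds an MQTT 3.1.1 QoS 0 PUBLISH packet by hand and writes it with asyncio's streams API (`asyncio.open_connection`). It is not a working MQTT client: there is no CONNECT/CONNACK handshake, so it will not talk to mosquitto as-is. A local `asyncio.start_server` handler, `fake_broker` (hypothetical), stands in for the broker and just collects the bytes. `writer.write()` only buffers and `await writer.drain()` yields to the loop under back-pressure, so no sync socket call blocks the event loop.

```python
import asyncio


def encode_remaining_length(n):
    # MQTT variable-length integer: 7 bits per byte, high bit = "more bytes".
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def publish_packet(topic, payload):
    # MQTT 3.1.1 PUBLISH, QoS 0: no packet identifier, so no ack is expected.
    t = topic.encode("utf-8")
    body = len(t).to_bytes(2, "big") + t + payload
    return bytes([0x30]) + encode_remaining_length(len(body)) + body


async def demo(count=3):
    packet = publish_packet("test", b"Hello World!")
    expected = packet * count
    received = bytearray()
    got_all = asyncio.Event()

    async def fake_broker(reader, writer):
        # Stand-in for a broker: just collects the raw bytes it is sent.
        while len(received) < len(expected):
            chunk = await reader.read(4096)
            if not chunk:
                break
            received.extend(chunk)
        got_all.set()
        writer.close()

    server = await asyncio.start_server(fake_broker, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for _ in range(count):
        writer.write(packet)  # buffered by the transport, never blocks
    await writer.drain()      # yields to the loop if the buffer is full
    await got_all.wait()

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return bytes(received), expected


received, expected = asyncio.run(demo())
```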