ALL_BROKERS_DOWN on Producer #1813

dusanu87 · 2024-09-12T12:14:33Z

Description

We observed the following behavior after upgrading to version 2.5.0.
After regular AWS MSK maintenance, during which all brokers are restarted one by one, the following errors appear in the logs:
cimpl.KafkaException: KafkaError{code=_ALL_BROKERS_DOWN,val=-187,str="3/3 brokers are down"}
and
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="ssl://b-3.XXX.kafka.XXX.amazonaws.com:9094/3: Disconnected (after 1091448ms in state UP)"}
These errors occur on the producer side and still appear occasionally 2-3 days after the broker restart. The same behavior occurs during a Kubernetes deployment restart, when flush is called on the producer.
The issue appears only after the broker restarts that happen during regular MSK maintenance.
The issue is also present in version 2.4.0; we did not encounter it in earlier versions.
Once the application is restarted on K8s, the issue is gone.

How to reproduce

Restart one broker on the AWS MSK cluster (3 brokers).
Restart the K8s deployment (the application containing the producer).
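
Since the errors show up when flush is called during the deployment restart, a bounded flush at shutdown can help show whether messages are actually stuck. The following is only a sketch: `drain` and `timeout_s` are hypothetical names, not part of the application in this report. It relies on `Producer.flush(timeout)` returning the number of messages still in the queue.

```python
import logging

log = logging.getLogger("kafka.shutdown")


def drain(producer, timeout_s=10.0):
    """Flush the producer with a bounded wait and report what is left.

    `producer` is assumed to be a confluent_kafka.Producer (or any object
    whose flush(timeout) returns the number of messages still queued).
    """
    remaining = producer.flush(timeout_s)
    if remaining > 0:
        # Messages were not delivered within the timeout.
        log.error("flush timed out, %d message(s) still queued", remaining)
    else:
        log.info("producer queue drained")
    return remaining
```

Calling `drain(producer)` from the application's shutdown hook and logging the return value would show whether the restart leaves undelivered messages behind.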

Checklist

Please provide the following information:

confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
(2.5.0) (2.5.0)
Apache Kafka broker version: (3.5.1)
Client configuration:
{
    "queue.buffering.max.messages": settings.KAFKA_PRODUCER_QUEUE_COUNT,
    "queue.buffering.max.kbytes": settings.KAFKA_PRODUCER_QUEUE_BUFF_KBYTES,
    "linger.ms": settings.KAFKA_PRODUCER_LINGER,
    "bootstrap.servers": settings.KAFKA_BROKERS,
    "enable.idempotence": True,
    "acks": "all",
    "delivery.timeout.ms": settings.KAFKA_PRODUCER_DELIVERY_TIMEOUT_MS,
    "security.protocol": "SSL",
    "error_cb": error_cb,
}
Operating system:
Provide client logs (with 'debug': '..' as necessary)
Provide broker log excerpts
Critical issue
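
The body of `error_cb` is not shown in the configuration above. For illustration only, here is a minimal sketch of a callback that separates the two connectivity errors from this report from everything else. The numeric codes are copied from the log lines above; `classify_error` and the logger name are hypothetical, and this is not the reporter's actual callback.

```python
import logging

log = logging.getLogger("kafka.producer")

# Numeric codes copied from the log lines in this report.
ALL_BROKERS_DOWN = -187  # code=_ALL_BROKERS_DOWN
TRANSPORT = -195         # code=_TRANSPORT


def classify_error(err):
    """Return 'fatal', 'connectivity' or 'other' for a KafkaError-like object."""
    fatal = getattr(err, "fatal", None)
    if callable(fatal) and fatal():
        return "fatal"
    if err.code() in (ALL_BROKERS_DOWN, TRANSPORT):
        return "connectivity"
    return "other"


def error_cb(err):
    """Log producer errors by category; intended for the "error_cb" config key."""
    kind = classify_error(err)
    if kind == "connectivity":
        # Broker connection problems: log at warning level with the code.
        log.warning("broker connectivity error %s: %s", err.code(), err.str())
    else:
        log.error("%s producer error %s: %s", kind, err.code(), err.str())
```

This only makes the log output easier to filter; it does not explain why the errors persist after the brokers come back.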
