Description
We have spotted the following behavior after moving to version 2.5.0.
After regular AWS MSK maintenance, during which all brokers are restarted one by one, we see the following errors in the logs:
cimpl.KafkaException: KafkaError{code=_ALL_BROKERS_DOWN,val=-187,str="3/3 brokers are down"}
and
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="ssl://b-3.XXX.kafka.XXX.amazonaws.com:9094/3: Disconnected (after 1091448ms in state UP)"}
These errors occur on the producer side, occasionally even 2-3 days after the broker restart. The same behavior also appears during a Kubernetes deployment restart, when flush is called on the producer.
This issue is only present after the broker restart which happens during regular MSK maintenance.
The issue is also present in version 2.4.0. Before this version, we didn't encounter this behavior.
Once the application is restarted on K8s, the issue is gone.
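Since the errors also surface when flush is called during a deployment restart, here is a minimal sketch of a bounded flush at shutdown. It only assumes that the producer's flush(timeout) returns the number of messages still queued, as confluent_kafka.Producer.flush does. The helper name flush_with_retries and its defaults are hypothetical and not part of our application, and this does not address the disconnects themselves.

```python
import logging

logger = logging.getLogger("producer.shutdown")


def flush_with_retries(producer, timeout_s=10.0, attempts=3):
    """Flush `producer` a bounded number of times.

    Returns the number of messages still queued after the last attempt
    (0 means everything was delivered or failed its delivery report).
    `producer` is any object whose flush(timeout) returns the remaining
    queue length, which is how confluent_kafka.Producer.flush behaves.
    """
    remaining = 0
    for attempt in range(1, attempts + 1):
        # flush() blocks for at most `timeout_s` seconds.
        remaining = producer.flush(timeout_s)
        if remaining == 0:
            return 0
        logger.warning(
            "flush attempt %d/%d: %d message(s) still queued",
            attempt, attempts, remaining,
        )
    return remaining
```

A shutdown hook could call flush_with_retries(producer) and log the returned count, so that a pod being terminated does not block indefinitely while brokers are unreachable.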
How to reproduce
Restart one broker on the AWS MSK cluster (3 brokers).
Restart the K8s deployment (the application containing the producer).
Checklist
Please provide the following information:
confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): (2.5.0) (2.5.0)
Apache Kafka broker version: 3.5.1
Client configuration:
{
    "queue.buffering.max.messages": settings.KAFKA_PRODUCER_QUEUE_COUNT,
    "queue.buffering.max.kbytes": settings.KAFKA_PRODUCER_QUEUE_BUFF_KBYTES,
    "linger.ms": settings.KAFKA_PRODUCER_LINGER,
    "bootstrap.servers": settings.KAFKA_BROKERS,
    "enable.idempotence": True,
    "acks": "all",
    "delivery.timeout.ms": settings.KAFKA_PRODUCER_DELIVERY_TIMEOUT_MS,
    "security.protocol": "SSL",
    "error_cb": error_cb,
}
Provide client logs (with 'debug': '..' as necessary)
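The configuration above registers an error_cb, but its body is not shown. For reference, here is a minimal sketch of how such a callback might separate the two errors from the logs from fatal ones. It relies only on the name(), fatal() and str() methods of confluent_kafka.KafkaError. The names classify_error and TRANSIENT_ERROR_NAMES are hypothetical, and treating _ALL_BROKERS_DOWN and _TRANSPORT as transient is an assumption based on librdkafka generally reconnecting on its own after them. This is not our actual callback and does not explain the behavior reported here.

```python
import logging

logger = logging.getLogger("producer.errors")

# Error names copied from the log lines in this report; assumed to be
# transient connection-level conditions rather than fatal client errors.
TRANSIENT_ERROR_NAMES = {"_ALL_BROKERS_DOWN", "_TRANSPORT"}


def classify_error(err):
    """Return "fatal", "transient" or "other" for a KafkaError-like object."""
    if err.fatal():
        return "fatal"
    if err.name() in TRANSIENT_ERROR_NAMES:
        return "transient"
    return "other"


def error_cb(err):
    """Log the error at a level matching its classification and return it."""
    kind = classify_error(err)
    if kind == "fatal":
        # A fatal error means the producer instance can no longer be used.
        logger.error("fatal Kafka error %s: %s", err.name(), err.str())
    else:
        logger.warning("%s Kafka error %s: %s", kind, err.name(), err.str())
    return kind
```

With a callback like this, the two errors would appear as warnings in the logs instead of being raised as cimpl.KafkaException, which would make it easier to see whether the producer actually stops delivering after the broker restart.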