Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for received and send messages to starter #20

Open
cmdjulian opened this issue Oct 28, 2024 · 5 comments
Open

Add metrics for received and send messages to starter #20

cmdjulian opened this issue Oct 28, 2024 · 5 comments

Comments

@cmdjulian
Copy link
Contributor

Hey,

I would like to retrofit the starter with metrics. What it essentially does, is registering certain counters for the received and send messages. I would also like to tag the metrics with the topic it is published under.
I mad a small internal prototype and you could think of it like something like the following:

grafik

Using a counter for that seems well suited. I was now wondering what you would consider the best points to start it. We would need a counter within the logic for sending and for receiving to get access to the concrete topic.
As I'm unsure what are good points to start at in the code base I thought I would ask here to get a rough idea what you would suggest.
I was thinking about using some form of bean post processor, so if a user supplies it's own beans we can still gather the metrics from it.
Another idea would be to broaden the interface to force every impl to also expose some form of metrics and create a metrics collector bean conditional on such a bean and enabled metrics.

What do you think would be the best way forward?

@rubengees
Copy link
Member

rubengees commented Oct 29, 2024

Hi, interesting idea!

How I would envision this is to have a service that allows recording metrics like message received, message successfully processed and message dropped:

@Service
class MqttMetricService(private val registry: MeterRegistry) {

    fun recordMessageReceived(topic: String, bytes: Int) {
        registry.summary("mqtt_bytes_received", "topic", topic).record(bytes.toDouble())
        registry.counter("mqtt_messages_received", "topic", topic).increment()
    }

    fun recordMessageProcessed(topic: String) {
        // Same here
    }

    fun recordMessageDropped(topic: String) {
        // Same here
    }

    // More for e.g. publishing? 
}

Then we would call these for every message that goes through our logic:

// MqttHandler

fun handle(message: MqttPublishContainer) {
    val (topic, payload) = message

    logger.trace("Received mqtt message on topic [{}] with payload {}", topic, payload)
    mqttMetricService.recordMessageReceived(topic.toString(), payload.size)
    
    //...
}
// MqttMessageErrorHandler

open fun handle(error: MqttMessageException) {
    logger.error("Error while delivering mqtt message on topic [${error.topic}]", error)
    metricService.recordMessageDropped(error.topic.toString())
}

For messages that do not go through our logic (when subscribing or publishing manually), the user would have to call these manually, but that is fine since then it's custom logic anyways. I wouldn't want to make the api any more complicated for this.

One concern I have is that this should maybe be it's own module or at least behind configuration so that not everyone has this enabled automatically.

Is this an approach with which we could start?

@cmdjulian
Copy link
Contributor Author

cmdjulian commented Oct 29, 2024

I like the idea of having a dedicated services, regardless it would require to add the service into the constructor of the existing classes and call it, right?

One concern I have is that this should maybe be it's own module or at least behind configuration so that not everyone has this enabled automatically.

I don't think this is necessary honestly. We can make two impls of the service, one No-Op one and one which utilizes the metrics facade from Spring Actuator. We can then use conditional beans to only use the metrics impl if actuator is present.
What you would typically do for such metrics, is having a property within the configuration to disable it if required.
So something in the likes of mqtt.metrics.enabled=true as a default based on actuator.
I would also add a flag called mqtt.metrics.include-topic-as-tag=false as a default. The idea here is, if you have a lot of topics, these can accumulate pretty fast and the user should willingly opt-in to add those tags, if he / she is sure, there are not too many topics polluting the metrics.

Having a dedicated module would only make sense when also moving the health contributor into it and maybe any form of tracing as well.

I would also like to refine the MqttMetricService into an interface and broaden the scope a bit, to also include the Exception in the case of an error and the actual publish, to give the user the possibility to replace the metrics collector for a custom one and acting up on that information.

What do you think about that?

@rubengees
Copy link
Member

I like the idea of having a dedicated services, regardless it would require to add the service into the constructor of the existing classes and call it, right?

Yes, right. That would be a breaking change then. I don't think anyone would instantiate e.g. the MqttHandler directly so it should be fine, but will increment the version number accordingly then.

What do you think about that?

Sounds very good, configuration via the actuator convention is what I was looking for. Same for a more abstract MqttMetricService with a default implementation 👍.

@cmdjulian
Copy link
Contributor Author

One small thing left though. I saw you chose a summary for the received bytes, but not for the total count of messages.
I'm still getting familiar with the different metrics facade in micrometer. When would one use a summary and when a counter though?

@rubengees
Copy link
Member

The micrometer docs have explanations: https://docs.micrometer.io/micrometer/reference/concepts/distribution-summaries.html

A distribution summary tracks the distribution of events. It is structurally similar to a timer but records values that do not represent a unit of time. For example, you could use a distribution summary to measure the payload sizes of requests hitting a server.

A counter is used to represent a rate ("we received an average of 20 messages per second between 08:00 and 10:00") whereas a distribution summary is a distribution of concrete values ("we received this large message at 09:10"). With counters you would track things like requests, errors, or successful outcomes and with distribution summaries e.g. payload size, response times.

I can also recommend this guide by Baeldung: https://www.baeldung.com/micrometer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants