Monitoring is a broad topic. For devon4j
we focus on the core topics that matter most when developing production-ready applications.
At a high level, we strongly suggest separating the application to be monitored from the monitoring system
itself.
The monitoring system covers aspects like:
- Collecting monitoring information
- Aggregating, processing and visualizing data, e.g. in dashboards
- Providing alarms
- …
In distributed systems it is crucial that a monitoring system provides a central overview of all applications in your application landscape. It is therefore not feasible to provide a monitoring system as part of your application. Your application is responsible for providing information to the monitoring system.
JMX is the official Java monitoring solution. It is part of the JDK. Your application may provide monitoring information or receive monitoring-related commands via MBeans. There is a huge amount of information about JMX available. A good starting point might be the JMX article on Wikipedia.
Traditionally JMX uses RMI for communication. In many environments HTTP(S) is preferred, so consider carefully whether JMX is the right solution.
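As a minimal sketch of the MBean approach, assuming a hypothetical request counter, a standard MBean could be registered at the platform MBean server as shown here. The class names, the attribute and the object name `com.example.monitoring:type=RequestStats` are illustrative and not part of devon4j.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch: exposes a request counter as a standard MBean (all names are hypothetical). */
class JmxMonitoringSketch {

  /** Management interface. JMX requires it to be public and named <ImplementationClass>MBean. */
  public interface RequestStatsMBean {
    long getRequestCount(); // exposed as read-only attribute "RequestCount"

    void reset(); // exposed as operation "reset"
  }

  /** Implementation that the application updates while it handles requests. */
  public static class RequestStats implements RequestStatsMBean {
    private final AtomicLong requestCount = new AtomicLong();

    public void recordRequest() {
      this.requestCount.incrementAndGet();
    }

    @Override
    public long getRequestCount() {
      return this.requestCount.get();
    }

    @Override
    public void reset() {
      this.requestCount.set(0);
    }
  }

  /** Registers the MBean at the platform MBean server of the running JVM. */
  static ObjectName register(RequestStats stats) {
    try {
      ObjectName name = new ObjectName("com.example.monitoring:type=RequestStats");
      ManagementFactory.getPlatformMBeanServer().registerMBean(stats, name);
      return name;
    } catch (JMException e) {
      throw new IllegalStateException("Could not register MBean", e);
    }
  }

  /** Reads the attribute through the MBean server, similar to what a JMX client does. */
  static long readRequestCount(ObjectName name) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      return (Long) server.getAttribute(name, "RequestCount");
    } catch (JMException e) {
      throw new IllegalStateException("Could not read attribute", e);
    }
  }
}
```

The application would call `recordRequest()` for each handled request, while a JMX client reads `RequestCount` or invokes `reset` remotely.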
Alternatively, your application could provide a monitoring API via REST.
Since REST APIs are very popular it is quite natural to provide monitoring information via REST. If your application already uses REST and there is no JMX/RMI based monitoring solution in place, it is often a better approach to provide monitoring APIs via REST than introducing a new protocol with JMX.
Some aspects of monitoring are covered by "logging". A typical case might be the requirement to "monitor" the application's REST APIs. For that it is perfectly fine to log the performance of all (or some) requests and ship this information to a logging system like Graylog. So please read the logging guide carefully. To allow efficient processing of those logs, you should use JSON-based logs.
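To illustrate what a JSON-based log entry for request performance might look like, here is a small sketch. The field names (`timestamp`, `method`, `path`, `status`, `durationMillis`) are assumptions made for this example. In a real application the JSON encoder of your logging framework would normally produce such lines, so the manual string building only serves to show the shape of the data.

```java
import java.time.Instant;

/** Sketch: renders one JSON log line per finished request (field names are hypothetical). */
class RequestLogSketch {

  /** Escapes quotes, backslashes and control characters so the value is a valid JSON string. */
  static String escape(String value) {
    StringBuilder sb = new StringBuilder();
    for (char c : value.toCharArray()) {
      if (c == '"' || c == '\\') {
        sb.append('\\').append(c);
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Builds a single-line JSON document that a log shipper can parse without extra patterns. */
  static String toJsonLine(Instant timestamp, String method, String path, int status,
      long durationMillis) {
    return "{\"timestamp\":\"" + timestamp + "\""
        + ",\"method\":\"" + escape(method) + "\""
        + ",\"path\":\"" + escape(path) + "\""
        + ",\"status\":" + status
        + ",\"durationMillis\":" + durationMillis + "}";
  }
}
```

A request filter could measure the duration with `System.nanoTime()` and write the result of `toJsonLine(...)` to the log once per request.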
It is not possible to provide a definitive list of the monitoring information your application requires. But we can give general advice on what type of information your application should provide at a minimum.
It is often useful if your application provides basic information about itself. This should cover:
- The name of the application
- The version (or build number, commit ID, …) of the application
This is especially useful in complex environments, e.g. to verify that deployments went correctly.
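One possible way to provide this information is to write name and version into a properties file at build time and render them on request. The following sketch assumes a hypothetical classpath resource `/application-info.properties` with the keys `name` and `version`; neither the file nor the keys are defined by devon4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: builds the payload of an info endpoint from build-time properties. */
class ApplicationInfoSketch {

  /** Loads the hypothetical properties file; returns empty properties if it does not exist. */
  static Properties loadBuildInfo() {
    Properties properties = new Properties();
    try (InputStream in =
        ApplicationInfoSketch.class.getResourceAsStream("/application-info.properties")) {
      if (in != null) {
        properties.load(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return properties;
  }

  /** Renders name and version as JSON (assumes plain values that need no escaping). */
  static String infoJson(Properties buildInfo) {
    String name = buildInfo.getProperty("name", "unknown");
    String version = buildInfo.getProperty("version", "unknown");
    return "{\"name\":\"" + name + "\",\"version\":\"" + version + "\"}";
  }
}
```

After a deployment, comparing the returned version with the expected one is a quick check that the right artifact is running.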
Metrics are key figures about your application, such as:
- Performance key figures
  - request duration
  - queue lengths
  - duration of database queries
  - …
- Business-related information
  - number of successful / unsuccessful requests
  - size of result sets
  - worth of shopping baskets
  - …
- Technical key figures
  - JVM heap usage
  - cache sizes
  - (database) pool usage
  - …
- …
Remember that this data should mainly be processed in the monitoring system. You might have noticed that there are different types of metrics: some represent current values (like JVM heap usage, queue lengths, …), others are based on (timed) events (like the duration of requests). These types may need to be handled differently. For handling events you may:
- Write a log statement for each event (or for a sample of events). These logs must then be shipped to your monitoring system.
- Send data for the event via an API of your monitoring system.
- Provide a REST API (or JMX MBeans) with pre-aggregated key figures, which is periodically polled by your monitoring system. This solution is a bit inferior, since the aggregation is part of your application and might not fit the desired visualization in your monitoring system.
For current values you may:
- Write them periodically to your log. These logs must then be shipped to your monitoring system.
- Send them periodically via an API of your monitoring system.
- Provide a REST API (or JMX MBeans) which is periodically polled by your monitoring system.
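The difference between the two metric types can be sketched as follows, assuming the polling option from the lists above. The class and field names are hypothetical. Event-based figures (request durations) have to be aggregated inside the application before they can be polled, whereas a current value (JVM heap usage) is simply read at the time of the poll.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: pre-aggregated event metrics plus a current value, prepared for periodic polling. */
class MetricsSketch {
  private final LongAdder requestCount = new LongAdder();
  private final LongAdder totalDurationMillis = new LongAdder();
  private final AtomicLong maxDurationMillis = new AtomicLong();

  /** Event-based metric: called once per finished request. */
  void recordRequest(long durationMillis) {
    this.requestCount.increment();
    this.totalDurationMillis.add(durationMillis);
    this.maxDurationMillis.accumulateAndGet(durationMillis, Math::max);
  }

  /** Current-value metric: read on demand, no aggregation required. */
  static long heapUsedBytes() {
    return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
  }

  /** Snapshot that a REST endpoint or MBean could return when polled. */
  String snapshotJson() {
    long count = this.requestCount.sum();
    long average = count == 0 ? 0 : this.totalDurationMillis.sum() / count;
    return "{\"requestCount\":" + count
        + ",\"avgDurationMillis\":" + average
        + ",\"maxDurationMillis\":" + this.maxDurationMillis.get()
        + ",\"heapUsedBytes\":" + heapUsedBytes() + "}";
  }
}
```

Note that the average and maximum are fixed by the application here, which is exactly the limitation of pre-aggregation mentioned above: the monitoring system cannot derive percentiles or other views from this snapshot.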
For monitoring a complex application landscape it is crucial to have an exact overview of whether all applications are up and running. Your application should therefore offer an API that allows the monitoring system to easily check whether the application is alive. Often this alive information is polled by the monitoring system with a kind of watchdog. The health check should also verify that the application is working "correctly". For that we suggest checking whether all required neighbour systems and infrastructure components are usable:
- Check if your database can be queried (with a dummy query)
- Check if you can reach your messaging system
- Check if you can reach all your neighbour systems, e.g. by querying their info endpoint
You should be very careful not to cascade those requests: your system should only test its direct neighbours, and this test should not trigger additional tests in those systems.
The health check should return a simple OK/NOK result for use in dashboards, but additionally include detailed results for each check.
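The combination of an overall OK/NOK result with per-check details can be sketched as follows. This is a framework-independent illustration with hypothetical names; the individual checks (database, messaging, neighbour systems) are passed in as simple suppliers and would be implemented by your application.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Sketch: runs named checks and reports an overall status plus the result of each check. */
class HealthCheckSketch {
  private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

  /** Registers a check, e.g. a dummy database query or a call to a neighbour's info endpoint. */
  HealthCheckSketch register(String name, BooleanSupplier check) {
    this.checks.put(name, check);
    return this;
  }

  /** Runs all checks in registration order; a check that throws counts as failed. */
  Map<String, Boolean> runChecks() {
    Map<String, Boolean> results = new LinkedHashMap<>();
    this.checks.forEach((name, check) -> {
      boolean ok;
      try {
        ok = check.getAsBoolean();
      } catch (RuntimeException e) {
        ok = false;
      }
      results.put(name, ok);
    });
    return results;
  }

  /** The simple result for dashboards: OK only if every single check passed. */
  static String overallStatus(Map<String, Boolean> results) {
    return results.values().stream().allMatch(Boolean::booleanValue) ? "OK" : "NOK";
  }

  /** Renders the overall status together with the detailed results as JSON. */
  static String toJson(Map<String, Boolean> results) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"status\":\"").append(overallStatus(results)).append("\",\"checks\":{");
    String separator = "";
    for (Map.Entry<String, Boolean> entry : results.entrySet()) {
      sb.append(separator).append('"').append(entry.getKey()).append("\":\"")
          .append(entry.getValue() ? "OK" : "NOK").append('"');
      separator = ",";
    }
    return sb.append("}}").toString();
  }
}
```

Because each check only talks to a direct neighbour and does not call the neighbour's own health check, the requests do not cascade through the landscape.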
To implement a monitoring API for your systems we suggest using Spring Boot Actuator. Actuator offers APIs which provide monitoring information including metrics via HTTP and JMX. It also contains a framework to implement health checks.
Please consult the original documentation for information about how to use it.
Basically, to use it, add the following dependency to the pom.xml of your application core:
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>
</dependencies>
There will be several endpoints with monitoring information available out of the box.
We strongly advise checking carefully which information is required in your context; normally this is info, health and metrics. Be careful not to expose any security-related information via this mechanism (e.g. by exposing those endpoints externally).
To make the info endpoint useful you need to provide information to Actuator. A good way to achieve this is by using the provided Maven module.
For first steps it might be useful to deactivate security for the actuator endpoints (this is just for testing, never release it!). This can be accomplished by implementing the following class:
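import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.security.config.annotation.web.builders.WebSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
// SpringProfileConstants and BaseWebSecurityConfig come from devon4j and your application;
// import them from the packages used in your project.

@Configuration
@EnableWebSecurity
@Profile(SpringProfileConstants.NOT_JUNIT)
public class WebSecurityConfig extends BaseWebSecurityConfig {

  @Override
  public void configure(WebSecurity web) throws Exception {
    super.configure(web);
    // For testing only: excludes all actuator endpoints from Spring Security.
    web.ignoring().requestMatchers(EndpointRequest.toAnyEndpoint());
  }
}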
To load-balance HTTP requests, the load balancer needs to know which instances of the desired application are available and functioning. Load balancers often support reacting to the HTTP status code of a request to the service: the load balancer periodically polls the service to find out whether it is available. To configure this, you may use the health check of the service to determine whether the instance is functioning correctly.
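The interaction can be sketched with the HTTP server that ships with the JDK. This is an illustration under assumptions, not the way Actuator implements it: the `/health` path, the class name and the choice of status 503 for an unhealthy instance are made up for this example, and `probe` merely imitates what a load balancer does when it polls.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.function.BooleanSupplier;

/** Sketch: a health endpoint whose HTTP status code tells a load balancer what to do. */
class LoadBalancerProbeSketch {

  /** Starts an endpoint that answers 200 while the instance is healthy and 503 otherwise. */
  static HttpServer startHealthEndpoint(int port, BooleanSupplier healthy) {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
      server.createContext("/health", exchange -> {
        int status = healthy.getAsBoolean() ? 200 : 503;
        exchange.sendResponseHeaders(status, -1); // -1 means: no response body
        exchange.close();
      });
      server.start();
      return server;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Imitates the periodic poll of a load balancer: only the status code is evaluated. */
  static int probe(int port) {
    try {
      HttpURLConnection connection = (HttpURLConnection) URI
          .create("http://127.0.0.1:" + port + "/health").toURL().openConnection();
      connection.setConnectTimeout(2000);
      connection.setReadTimeout(2000);
      try {
        return connection.getResponseCode();
      } finally {
        connection.disconnect();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A load balancer configured this way would keep routing traffic to an instance as long as the poll returns 200 and take it out of rotation when it returns 503.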
Docker supports a health check. You may use a simple local curl to your application here to find out whether the service is healthy. But be careful: unhealthy containers are often restarted automatically. If you use the health information of your application, this may lead to undesired effects. Since the health check relies on querying all neighbour systems and infrastructure components, applications often become unhealthy because a third system has problems. Restarting the application itself will not fix the problem and is counterproductive. So generally it is better to query the info endpoint of your application to just check whether the service itself is up and running.