[APM] Service map can cause OOM in elasticsearch #187707
Labels
apm:service-maps
Service Map feature in APM
bug
Fixes for quality problems that affect the customer experience
stale
Used to mark issues that were closed for being stale
Team:obs-ux-infra_services
Observability Infrastructure & Services User Experience Team
Following up from #186417
When testing the service map under maximum conditions (1k trace IDs, each trace having ~500 spans), the scripted metric aggregation can cause an OOM in Elasticsearch, depending on the memory available. Judging from the Elasticsearch heap dump, I suspect this is due to the number of hash maps and other data structures being created simultaneously, so that duplicated data exists at the same time during the reduce phase. The issue did not occur when the parallel async requests made when calling fetch_service_paths_from_trace_ids were disabled and run sequentially (sync) instead. Further investigation is needed.
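As a possible middle ground between fully parallel and fully sequential requests, the number of requests in flight could be capped. The following is only a minimal sketch of that idea, not the actual Kibana implementation: `mapWithConcurrency` is a hypothetical helper, and the real signature of fetch_service_paths_from_trace_ids is not shown in this issue.

```typescript
// Hypothetical helper (not part of Kibana): run `fn` over `inputs` with at
// most `limit` calls in flight at once. Results keep the order of `inputs`.
async function mapWithConcurrency<I, O>(
  inputs: I[],
  limit: number,
  fn: (input: I) => Promise<O>
): Promise<O[]> {
  if (limit < 1) {
    throw new Error('limit must be >= 1');
  }
  const results: O[] = new Array(inputs.length);
  let next = 0; // index of the next input to be picked up by a worker

  // Each worker pulls the next unclaimed input until none are left.
  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await fn(inputs[index]);
    }
  }

  // Start at most `limit` workers; each handles one request at a time.
  const workerCount = Math.min(limit, inputs.length);
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With `limit` set to 1 this behaves like the sequential run described above. Whether a small limit above 1 also stays within the available heap would still need to be measured.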