Skip to content

Commit

Permalink
Changes to the JITServerUseAOTCache option
Browse files Browse the repository at this point in the history
#1350

`-XX:+JITServerUseAOTCache` is the default setting at the JITServer server

Closes #1350
Signed-off-by: Sreekala Gopakumar <[email protected]>
  • Loading branch information
Sreekala-Gopakumar committed Jun 14, 2024
1 parent 8f8e08d commit ca2401e
Show file tree
Hide file tree
Showing 4 changed files with 31 additions and 24 deletions.
32 changes: 16 additions & 16 deletions docs/jitserver_tuning.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@

### Client-session caches

Multiple client JVMs can be connected at the same time to a single JIT server. For each client, the server maintains a client-session cache with information about the environment the client is running in (Java classes, class hierarchy, profiling information, JVM options, etc.). Typically, the information in these caches is kept separately, per client. However, if you specify the `-XX:+JITServerShareROMClasses` option, the read-only part of the Java classes (ROMClasses in Eclipse OpenJ9&trade; parlance) is shared between the different clients. This option can generate memory savings at the server when the connected clients run identical or similar Java applications.
Multiple client JVMs can be connected at the same time to a single JIT server. For each client, the server maintains a client-session cache with information about the environment the client is running in (Java classes, class hierarchy, profiling information, JVM options, and so on). Typically, the information in these caches is kept separately per client. However, if you specify the `-XX:+JITServerShareROMClasses` option, the read-only part of the Java classes (ROMClasses in Eclipse OpenJ9&trade; parlance) is shared between the different clients. This option can generate memory savings at the server when the connected clients run identical or similar Java applications.

The client-session caches are deleted when the clients terminate, but this can happen only if the clients are shutdown gracefully, giving them the opportunity to send a termination message to the server. To address the scenario of clients ending abruptly, the server also deletes the cache for a client that hasn’t issued a compilation request for 1000 minutes, or 5 minutes under memory pressure. If needed, you can change these values with the following options:

Expand All @@ -35,13 +35,13 @@ The client-session caches are deleted when the clients terminate, but this can h
### JITServer AOT cache

The JITServer technology can cache AOT compiled methods at the server.
The JITServer can, therefore, avoid carrying out an AOT compilation when a compatible AOT method body already exists in the cache, thereby saving CPU resource and improving remote compilation latency. This mechanism works in conjunction with the [dynamic AOT technology](https://www.eclipse.org/openj9/docs/aot/) at the client and therefore the client needs to have the [shared classes cache](https://www.eclipse.org/openj9/docs/shrc/) (SCC) enabled (the SCC is the repository for the AOT code).
The JITServer can, therefore, avoid carrying out an AOT compilation when a compatible AOT method body already exists in the cache, thereby saving CPU resource and improving remote compilation latency. This mechanism works along with the [dynamic AOT technology](https://www.eclipse.org/openj9/docs/aot/) at the client and therefore the client needs to have the [shared classes cache](https://www.eclipse.org/openj9/docs/shrc/) (SCC) enabled (the SCC is the repository for the AOT code).

When the JITServer receives an AOT compilation request, it checks its AOT cache for a compatible compiled method body. If one is not found, the server performs the AOT compilation, sends the response to the client JVM, then serializes the compiled method and stores it in its local AOT cache, for future use. If a compatible compiled method is found, the server sends the client the serialized compiled method from its cache, thus avoiding a compilation. The client deserializes the response, stores the result in its local SCC, and loads the compiled method as a regular dynamic AOT code.

To enable this feature, specify the [`-XX:+JITServerUseAOTCache`](xxjitserveruseaotcache.md) command line option, both at the server and at the client JVM.
This JITServer AOT caching feature is by default enabled at the server but disabled for the JITServer clients. To enable this feature for the JITServer clients, specify the [`-XX:+JITServerUseAOTCache`](xxjitserveruseaotcache.md) command-line option for each client.

A JITServer instance can have several AOT caches, each with its own name. This addresses the situation when client JVMs with significantly different profiles of execution use the same JITServer instance. A client JVM can indicate a specific AOT cache it wants to use by providing its name with the following command-line option [`-XX:JITServerAOTCacheName=<cache_name>`](xxjitserveraotcachename.md). If the client doesn't specify a name for the AOT cache, the server uses a cache named `default`.
A JITServer instance can have several AOT caches, each with its own name. These multiple AOT caches with names address the situation when client JVMs with significantly different profiles of execution use the same JITServer instance. A client JVM can indicate a specific AOT cache that it wants to use by providing its name with the following command-line option [`-XX:JITServerAOTCacheName=<cache_name>`](xxjitserveraotcachename.md). If the client doesn't specify a name for the AOT cache, the server uses a cache named `default`.

The maximum amount of memory that all the AOT cache instances combined can use at the server is 300 MB, by default. You can change this value by using the [`-XX:JITServerAOTmx=<size>`](xxjitserveraotmx.md) option. When the cache size reaches the specified limit, new clients cannot create new AOT cache instances or add new compiled methods to the existing AOT cache instances.

Expand All @@ -58,33 +58,33 @@ The client-session caches are deleted when the clients terminate, but this can h

## Number of concurrent clients

The amount of CPU and memory resources consumed by the server is expected to increase with the number of connected clients. Finding the appropriate number of clients to connect to a server is a tricky proposition that depends on many factors: number of methods that need to be compiled by the clients, optimization levels for these compilations, how clients are started (staggered or not), how clients are shutdown (gracefully or not), etc.
The amount of CPU and memory resources consumed by the server is expected to increase with the number of connected clients. Finding the appropriate number of clients to connect to a server is a tricky proposition that depends on many factors, such as the number of methods that need to be compiled by the clients, optimization levels for these compilations, how clients are started (staggered or not), and how clients are shutdown (gracefully or not).

As a rule of thumb, you should have 10-20 JVMs simultaneously connected to a server with 1-2 GB of memory. With respect to CPU resources, in Kubernetes you might want to set a low "request" value at the server (1-2 vCPUs) and a larger "limit" value (4-8 vCPUs) in order to soak all those idle cycles. It is possible to connect even more clients to one server instance if memory and CPU resources are increased, but in general, two medium-sized server instances placed on different nodes are better than a single, larger server.
Generally, you should have 10-20 JVMs simultaneously connected to a server with 1-2 GB of memory. With respect to CPU resources, in Kubernetes you might want to set a low "request" value at the server (1-2 vCPUs) and a larger "limit" value (4-8 vCPUs) in order to soak all those idle cycles. It is possible to connect even more clients to one server instance if memory and CPU resources are increased, but in general, two medium-sized server instances that are placed on different nodes are better than a single, larger server.

## Alleviating CPU congestion at the server

When too many clients connect to the server, the server can become flooded with compilation requests, leading to increased compilation times and slower start-up/ramp-up for applications. It should be noted that a client JVM issues most of its compilation requests during the start-up phase and ramp-up phase of an application, when load is first applied to it. Thus, from the CPU consumption point of view what matters is the number of clients that start-up or ramp-up concurrently. To alleviate the CPU strain on the server, you can start the client JVMs in a staggered fashion, rather than all at the same time. Sometimes the staggering happens naturally; for instance, when using Kubernetes horizontal pod auto-scaling, additional application instances are launched gradually as the load increases.
When too many clients connect to the server, the server can become flooded with compilation requests, leading to increased compilation times and slower start-up/ramp-up for applications. A client JVM issues most of its compilation requests during the start-up phase and ramp-up phase of an application, when load is first applied to it. Thus, from the CPU consumption point of view what matters is the number of clients that start-up or ramp-up concurrently. To alleviate the CPU strain on the server, you can start the client JVMs in a staggered fashion, rather than all at the same time. Sometimes the staggering happens naturally; for instance, when using Kubernetes horizontal pod auto-scaling, additional application instances are started gradually as the load increases.

Another idea is to use the `-Xjit:enableJITServerHeuristics` command line option at the clients. When this option is present, the client JVMs share some of the compilation burden by performing the cheap compilations locally and send only expensive compilations to the server. What constitutes a cheap compilation is determined by JIT heuristics that look at the method size, optimization level and the amount of CPU and memory available to the JVM.
Another idea is to use the `-Xjit:enableJITServerHeuristics` command-line option at the clients. When this option is present, the client JVMs share some of the compilation burden by performing the cheap compilations locally and send only expensive compilations to the server. What constitutes a cheap compilation is determined by JIT heuristics that look at the method size, optimization level and the amount of CPU and memory available to the JVM.

## Avoiding memory shortages at the server

Roughly speaking, the server uses two types of memory:
1. "Scratch" memory. This is allocated during a compilation (for JIT internal data structures) and released to the operating system at the end of the compilation.
2. "Persistent" memory. This is used for client-session caches and gets deleted only when a client terminates gracefully (or when the JITServer purging mechanism is triggered).
1. "Scratch" memory. This memory is allocated during a compilation (for JIT internal data structures) and released to the operating system at the end of the compilation.
2. "Persistent" memory. This memory is used for client-session caches and gets deleted only when a client terminates gracefully (or when the JITServer purging mechanism is triggered).

The total amount of scratch memory at any given moment depends on how many compilations are in progress and how expensive those compilations are. To reduce this amount, you can start the clients in a staggered fashion as suggested previously, or reduce the number of compilation threads per client. Note that the latter already happens automatically: when the server senses that it is about to run out of memory, it provides feedback to the connected clients to reduce their number of active compilation threads.
The total amount of scratch memory at any particular moment depends on how many compilations are in progress and how expensive those compilations are. To reduce this amount, you can start the clients in a staggered fashion as suggested previously, or reduce the number of compilation threads per client. Note that the latter already happens automatically: when the server senses that it is about to run out of memory, it provides feedback to the connected clients to reduce their number of active compilation threads.

To reduce the amount of persistent memory, you can use the techniques described in section [Server caches](#server-caches).
To reduce the amount of persistent memory, you can use the techniques that are described in section [Server caches](#server-caches).

## Traffic encryption

Enabling network encryption can increase the CPU overhead, both at the client and at the server. For this reason, you should turn on encryption only if needed. Note that some technologies like Istio, Weave, Linkerd, Calico, Cilium already encrypt all network traffic, so using JITServer encryption might be redundant.

## Minimizing application stalls

Usually, the compilation threads in OpenJ9 JVM execute in parallel with Java application threads. However, for correctness reasons a small number of compilations are performed synchronously, meaning that Java application threads have to wait for the compilation result before being allowed to execute the method being compiled. Since remote compilations typically take longer to complete due to network latency, application stalls caused by synchronous compilations can be more severe in a JITServer setting. If this becomes a problem, you should add the following command line option at the client:
Usually, the compilation threads in OpenJ9 JVM execute in parallel with Java application threads. However, for correctness reasons a small number of compilations are performed synchronously, meaning that Java application threads have to wait for the compilation result before being allowed to execute the method being compiled. Since remote compilations typically take longer to complete due to network latency, application stalls that are caused by synchronous compilations can be more severe in a JITServer setting. If z becomes a problem, you should add the following command line option at the client:

-XX:+JITServerLocalSyncCompiles

Expand Down Expand Up @@ -121,7 +121,7 @@ If the client JVM does not find a compatible server to connect to, compilations

### Performance metrics

You can enable the provision of performance metrics by specifying the `-XX:+JITServerMetrics` command line option. After enabling this option, you can use a monitoring tool that follows the OpenMetrics standard, such as Prometheus, to collect the data by issuing an HTTP `GET` request to the following url: `http://<jitserveraddress>:<port>/metrics`.
You can enable the provision of performance metrics by specifying the `-XX:+JITServerMetrics` command-line option. After enabling this option, you can use a monitoring tool that follows the OpenMetrics standard, such as Prometheus, to collect the data by issuing an HTTP `GET` request to the following url: `http://<jitserveraddress>:<port>/metrics`.

:fontawesome-solid-pencil:{: .note aria-hidden="true"} **Note:** There is a limit of four concurrent `GET` requests at any given time.

Expand All @@ -132,7 +132,7 @@ For more information, including the types of metrics that are provided, see the
### Verbose logging

You can inspect the behavior of a JITServer instance by using the [OpenJ9 verbose logging facility](https://blog.openj9.org/2018/06/07/reading-verbose-jit-logs/). Note that if the name of the verbose log is not specified, the relevant information is printed to stderr.
When you use the `-XX:+JITServerLogConnections` command line option, the server prints a message to the verbose log every time a new client JVM connects to it or disconnects from it. This is an easy way to determine that the clients are able to reach the server. Example of output:
When you use the `-XX:+JITServerLogConnections` command-line option, the server prints a message to the verbose log every time a new client JVM connects to it or disconnects from it. This is an easy way to determine that the clients are able to reach the server. Example of output:
```
#JITServer: t= 74232 A new client (clientUID=14692403771747196083) connected. Server allocated a new client session.
#JITServer: t= 74282 A new client (clientUID=2599593246759846167) connected. Server allocated a new client session.
Expand All @@ -154,7 +154,7 @@ Example of output:
#JITServer: CpuLoad 206% (AvgUsage 25%) JvmCpu 113%
...
```
A value greater than 0 for the `Compilation Queue Size` is a sign that the server is overloaded. Compilation requests that wait in the compilation queue face greater delays and run the risk of exceeding network timeouts. To avoid this scenario, you can reduce the number of connected clients, use the techniques described in section [Alleviating CPU congestion at the server](#alleviating-CPU-congestion-at-the-server), or increase the number of compilation threads at the server by using the [`-XcompilationThreads`](xcompilationthreads.md) option.
A value greater than 0 for the `Compilation Queue Size` is a sign that the server is overloaded. Compilation requests that wait in the compilation queue face greater delays and run the risk of exceeding network timeouts. To avoid this scenario, you can reduce the number of connected clients, use the techniques that are described in section [Alleviating CPU congestion at the server](#alleviating-CPU-congestion-at-the-server), or increase the number of compilation threads at the server by using the [`-XcompilationThreads`](xcompilationthreads.md) option.

Increasing the maximum number of client threads can improve performance in high network latency settings because there can be more in-progress concurrent compilation requests. Increasing the number of threads at the server can improve performance if the server has many CPU cores available and serves a large number of clients concurrently.

Expand Down
5 changes: 5 additions & 0 deletions docs/version0.46.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ The following new features and notable changes since version 0.45.0 are included
- [New binaries and changes to supported environments](#binaries-and-supported-environments)
- [MD5 message digest algorithm support for OpenSSL](#md5-message-digest-algorithm-support-for-openssl)
- [Support added for the `com.sun.management.ThreadMXBean.getTotalThreadAllocatedBytes()` API](#support-added-for-the-comsunmanagementthreadmxbeangettotalthreadallocatedbytes-api)
- [The JITServer AOT caching feature enabled by default at the JITServer server](#the-jitserver-aot-caching-feature-enabled-by-default-at-the-jitserver-server)

## Features and changes

Expand All @@ -47,6 +48,10 @@ With this release, the OpenJ9 VM implementation supports measurement of the tota

The `getTotalThreadAllocatedBytes()` method now returns the total thread allocated bytes instead of `-1`.

### The JITServer AOT caching feature enabled by default at the JITServer server

`-XX:+JITServerUseAOTCache` is the default setting at the JITServer server now. That means that you don't have to specify the `-XX:+JITServerUseAOTCache` option at the server to enable the JITServer AOT caching feature. For more information, see [ `-XX:[+|-]JITServerUseAOTCache`](xxjitserveruseaotcache.md).

## Known problems and full release information

To see known problems and a complete list of changes between Eclipse OpenJ9 v0.45.0 and v0.46.0 releases, see the [Release notes](https://github.com/eclipse-openj9/openj9/blob/master/doc/release-notes/0.46/0.46.md).
Expand Down
4 changes: 2 additions & 2 deletions docs/xxjitservershareromclasses.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@

# -XX:\[+|-\]JITServerShareROMClasses

This option enables the JITServer server to share cached ROM classes between JITServer clients.
This option enables or disables the JITServer server to share cached ROM classes between JITServer clients.

## Syntax

Expand All @@ -34,7 +34,7 @@ This option enables the JITServer server to share cached ROM classes between JIT
|`-XX:+JITServerShareROMClasses` | Enable | |
|`-XX:-JITServerShareROMClasses` | Disable| :fontawesome-solid-check:{: .yes aria-hidden="true"}<span class="sr-only">yes</span> |

:fontawesome-solid-pencil:{: .note aria-hidden="true"} **Note:** The `-XX:+JITServerShareROMClasses` option is enabled by default at the server on specifying the [`-XX:+JITServerUseAOTCache`](xxjitserveruseaotcache.md) option.
:fontawesome-solid-pencil:{: .note aria-hidden="true"} **Note:** The `-XX:+JITServerShareROMClasses` option is enabled by default at the server because of the default setting of the [`-XX:+JITServerUseAOTCache`](xxjitserveruseaotcache.md) option.

## Explanation

Expand Down
Loading

0 comments on commit ca2401e

Please sign in to comment.