
Add maxmemory-reserved-scale parameter for evicting key earlier to avoid OOM #831

Open
wants to merge 12 commits into base: unstable
Conversation

@hwware hwware commented Jul 26, 2024

Reference: #742 and https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management (Azure)

Generally, when clients set maxmemory-policy to allkeys-lru (or another memory eviction policy) and also set maxmemory, and the server runs write-heavy workloads, the data stored in memory can reach the maxmemory limit very quickly, and an OOM error is then reported.

If we have a maxmemory-reserved-scale parameter and the client enables the lazyfree-lazy feature, we can, for example, set maxmemory-reserved-scale to 20; this means key eviction begins when used memory reaches 8GB if maxmemory is 10GB. The server can thus continue processing client data, and OOM does not happen at this point.

Thus, the benefit is that we can delay the OOM condition, but we pay for it with less available memory.

One example for this parameter. Assume:

maxmemory 4GB
maxmemory-reserved-scale 20

Then we can check the details with the info memory command:

maxmemory:4294967296
maxmemory_human:4.00G
maxmemory_policy:allkeys-lru
maxmemory_reserved_scale:20
maxmemory_available:3435973836
maxmemory_available_human:3.20G

We can also update and read the maxmemory-reserved-scale value at runtime as follows:

config set maxmemory-reserved-scale value
config get maxmemory-reserved-scale

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from e31dd8b to b8a98df on July 26, 2024 15:39
@hwware hwware requested review from PingXie, enjoy-binbin and madolson and removed request for PingXie and enjoy-binbin July 26, 2024 15:52

codecov bot commented Jul 26, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 20 lines in your changes missing coverage. Please review.

Project coverage is 70.51%. Comparing base (4986310) to head (9c3068b).
Report is 3 commits behind head on unstable.

Files with missing lines Patch % Lines
src/config.c 10.00% 9 Missing ⚠️
src/evict.c 60.00% 8 Missing ⚠️
src/module.c 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #831      +/-   ##
============================================
- Coverage     70.68%   70.51%   -0.18%     
============================================
  Files           115      115              
  Lines         63177    63209      +32     
============================================
- Hits          44657    44569      -88     
- Misses        18520    18640     +120     
Files with missing lines Coverage Δ
src/server.c 87.65% <100.00%> (-0.02%) ⬇️
src/server.h 100.00% <100.00%> (ø)
src/module.c 9.64% <0.00%> (ø)
src/evict.c 94.56% <60.00%> (-3.19%) ⬇️
src/config.c 78.39% <10.00%> (-0.45%) ⬇️

... and 15 files with indirect coverage changes


@hwware hwware force-pushed the maxmemory-reserved-parameter branch from b8a98df to ba13a0c on August 29, 2024 14:50
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from ba13a0c to e27e9de on September 9, 2024 16:16
@enjoy-binbin (Member):
This seems like an interesting feature. I did not review it yet; just dropping a comment to approve the concept.

@madolson (Member):
I also like the idea. I just haven't really spent enough time thinking about it. Memory management is a big area in Valkey we need to improve.

@PingXie PingXie (Member) left a comment:
Took a quick look.

src/evict.c (outdated, resolved)
src/server.c (outdated, resolved)
src/evict.c (outdated, resolved)
@PingXie PingXie (Member) commented Sep 25, 2024

+1 on introducing "back pressure" earlier. I feel that this could be used with `maxmemory-eviction-tenacity` to give an even smoother eviction experience.

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 2 times, most recently from 35437eb to db966fe on September 26, 2024 02:46
@PingXie PingXie (Member) left a comment:
Thanks @hwware! LGTM overall. Can you add some tests too?

src/evict.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
@hwware (Member, Author) commented Oct 2, 2024

> Thanks @hwware! LGTM overall. Can you add some tests too?

Sure, I will work on it. Thanks!

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 4 times, most recently from f7cc8c8 to 7a5584b on October 7, 2024 09:31
src/config.c (resolved)
valkey.conf (outdated, resolved)
src/evict.c (outdated):
@@ -398,11 +398,12 @@ int getMaxmemoryState(size_t *total, size_t *logical, size_t *tofree, float *lev
if (total) *total = mem_reported;

/* We may return ASAP if there is no need to compute the level. */
-if (!server.maxmemory) {
+if (!server.maxmemory_available) {
Member commented:

Thinking about this more, I feel that it is more preferable to model this new setting as a "soft" maxmemory, which can trigger key eviction earlier but won't cause OOM to be returned before hitting the actual maxmemory. Otherwise, we effectively create an alias of maxmemory. More specifically, I think performEviction should only return EVICT_FAIL when the memory usage goes beyond the real maxmemory.

Additionally, since getMaxmemoryState is also used outside of the OOM prevention path such as in VM_GetUsedMemoryRatio, we should consider parameterizing maxmemory and having it passed in by the caller instead, so that we can maintain the same semantics in these externally facing scenarios.

Thoughts?

@hwware hwware (Member, Author) commented Oct 18, 2024:
As you suggested, if we treat "maxmemory_available" as a soft maxmemory, then we should not return OOM when it is exceeded. In that case, how can we return a message to the client? Any ideas?

Member commented:
Right, how about just processing the command normally?

  1. if used_memory is below the soft max, no change to the existing logic
  2. if used_memory is above the soft max but below the hard max, trigger key eviction and continue with normal processing
  3. if used_memory is above the hard max, no change to the existing logic, i.e., trigger key eviction and fail the command if used_memory is still above the hard max after the eviction

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 3 times, most recently from 200b203 to 4da32a2 on October 18, 2024 05:54
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 4da32a2 to 510265f on October 28, 2024 13:42
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 510265f to c6db98b on October 28, 2024 13:43
@hwware (Member, Author) commented Oct 29, 2024

@PingXie Following your last comment's suggestion, I made the following code changes:

  1. In getMaxmemoryState, I updated the level calculation to *level = (float)mem_used / (float)server.maxmemory; instead of passing maxmemory in as a function parameter

  2. performEvictions returns EVICT_FAIL only when memory usage goes beyond the real maxmemory (server.maxmemory)

  3. if used_memory is above the soft max but below the hard max, key eviction is triggered and normal processing continues

Please take a look. Thanks!

@PingXie PingXie (Member) left a comment:
The main logic LGTM. Thanks @hwware!

src/evict.c (outdated):
result = (isEvictionProcRunning) ? EVICT_RUNNING : EVICT_OK;


Member commented:

nit - one empty line?
src/evict.c (outdated), comment on lines 732 to 745:
if (server.maxmemory_reserved_scale && server.maxmemory_available) {
size_t mem_used = zmalloc_used_memory();
size_t overhead = freeMemoryGetNotCountedMemory();
mem_used = (mem_used > overhead) ? mem_used - overhead : 0;
if (mem_used > server.maxmemory) {
result = EVICT_FAIL;
goto cant_free;
}
}
Member commented:
I don't think we should check server.maxmemory_reserved_scale and server.maxmemory_available. Also we should not goto cant_free on L725/728; instead, we should just break out of the while loop there and then always compute the result based on the current memory state. Otherwise, failing to evict keys due to used memory being greater than the new "soft" limit will always result in EVICT_FAIL/OOM rejection.

Suggested change
-if (server.maxmemory_reserved_scale && server.maxmemory_available) {
-    size_t mem_used = zmalloc_used_memory();
-    size_t overhead = freeMemoryGetNotCountedMemory();
-    mem_used = (mem_used > overhead) ? mem_used - overhead : 0;
-    if (mem_used > server.maxmemory) {
-        result = EVICT_FAIL;
-        goto cant_free;
-    }
-}
+size_t mem_used = zmalloc_used_memory();
+size_t overhead = freeMemoryGetNotCountedMemory();
+mem_used = (mem_used > overhead) ? mem_used - overhead : 0;
+if (mem_used > server.maxmemory) {
+    result = EVICT_FAIL;
+    goto cant_free;
+}

@hwware hwware (Member, Author) commented Oct 30, 2024:
Hi Ping, I think there is a misunderstanding about the logic of the condition server.maxmemory_reserved_scale && server.maxmemory_soft. Let me explain:
There are two cases:
Case 1. server.maxmemory_soft == server.maxmemory.
In this case, if getMaxmemoryState() returns C_ERR, the server performs key eviction. In the following for loop, some keys are evicted and some memory is released, so we can basically guarantee used_memory is below the soft max. We do not need to calculate used_memory again and can just keep the current logic.

Case 2. server.maxmemory_soft < server.maxmemory.
In this case, even after some key evictions we are not sure whether used_memory is smaller than the soft limit, so we should recalculate it here.

Thus, only when server.maxmemory_soft_scale is set do we need to recalculate used_memory (I think the single condition if (server.maxmemory_soft_scale) is enough, because server.maxmemory_soft is always true).

valkey.conf (outdated), comment on lines 1251 to 1257:
#
# This parameter allows Valkey to evict keys earlier. The value represents
# a percentage of the maxmemory limit, indicating how much memory the Valkey instance reserves
# without storing data.
# The default is 0, and the value can be set between 10 and 60.
#
# maxmemory-reserved-scale 0
Member commented:
With the recent update, we are essentially introducing a new "soft" version of "maxmemory". What do you think about the following?

Suggested change
#
# This parameter allows Valkey to evict keys earlier. The value represents
# a percentage of the maxmemory limit, indicating how much memory the Valkey instance reserves
# without storing data.
# The default is 0, and the value can be set between 10 and 60.
#
# maxmemory-reserved-scale 0
#
# `maxmemory-soft-scale` defines a "soft" memory threshold as a scale of the `maxmemory` limit.
# When memory usage exceeds this scale, Valkey begins proactive key eviction. However, exceeding this
# threshold does not immediately reject new write commands; only the hard `maxmemory` limit will do so.
# This scale is specified as a percentage of `maxmemory`, with a valid range between 10 and 60.
#
# maxmemory-soft-scale 0

Member (Author) commented:
Addressed. The parameter name, variable name, and function name have been changed as requested.

src/server.h (outdated):
@@ -3031,6 +3033,9 @@ void trimStringObjectIfNeeded(robj *o, int trim_small_values);
static inline int canUseSharedObject(void) {
return server.maxmemory == 0 || !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS);
}
static inline void updateMaxAvailableMemory(void) {
server.maxmemory_available = (unsigned long long)server.maxmemory / 100.0 * (100 - server.maxmemory_reserved_scale);
Member commented:
If we are aligned on the config name maxmemory_soft_scale, I would suggest going with maxmemory_soft for the field name too

Suggested change
-server.maxmemory_available = (unsigned long long)server.maxmemory / 100.0 * (100 - server.maxmemory_reserved_scale);
+server.maxmemory_soft = (unsigned long long)server.maxmemory / 100.0 * (100 - server.maxmemory_soft_scale);

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 01fd9d2 to 62a95e2 on November 3, 2024 09:25
@hwware hwware force-pushed the maxmemory-reserved-parameter branch 3 times, most recently from 5519dab to b40a26f on November 21, 2024 15:49
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from b40a26f to 9c3068b on November 21, 2024 16:29
Labels: None yet
Projects: Status: Idea
5 participants