Dragonfly crashes in minimal memory configuration #2739
Comments
Removing |
you can easily verify it @applike-ss by raising the limit and checking |
Is there anything we can help with here, @applike-ss? @chakaz, this is probably a good use case to focus on when you switch to working on RSS instrumentation. Not urgent, though. |
I'm doing that right now. I assume you want me to
Sorry, was on vacation and couldn't check back here. |
Observed memory usages: Limited to 320Mi (256Mi maxmemory, crashing):
Limited to 640Mi (512Mi maxmemory, not crashing):
My assumption is that, because Dragonfly is running in cache mode, it should never go OOM, so no crash should happen. |
@applike-ss we plan to take into account the additional memory needed for serializing to a file (and also for a new replica connecting during the full-sync stage), and perhaps fail or slow down these actions when memory is tight. |
With the current version (docker.dragonflydb.io/dragonflydb/dragonfly:v1.20.1), we have to use 430Mi of memory when running with a maxmemory of 320M, taking snapshots every 5 minutes, and continuously pushing 1 MB of data to Redis in cache mode. We are still waiting for Dragonfly to exit gracefully when this kind of issue happens. From the logs, I don't see it shut down when it crashes due to an OOM situation, and there is no error log for a failed memory allocation or similar. At the very least, a message that lets us see more easily from the logs why it crashed would be nice. |
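To make the numbers reported in this thread easier to compare, here is a small sketch that computes the headroom between the container memory limit and maxmemory for each setup. It assumes that the "M" in `--maxmemory` can be treated as Mi for comparison purposes, which the thread does not confirm; the `headroom` helper is purely illustrative.

```shell
# headroom LIMIT_MI MAXMEMORY_MI
# Prints the absolute headroom and the headroom relative to maxmemory.
# Assumption: both values are expressed in Mi (see the note above).
headroom() {
  limit_mi=$1
  maxmemory_mi=$2
  extra_mi=$((limit_mi - maxmemory_mi))
  # Integer division, so the percentage is rounded down.
  extra_pct=$((extra_mi * 100 / maxmemory_mi))
  echo "${extra_mi}Mi (${extra_pct}%)"
}

headroom 320 256   # reported as crashing      -> 64Mi (25%)
headroom 640 512   # reported as not crashing  -> 128Mi (25%)
headroom 430 320   # v1.20.1 setup, snapshots every 5 minutes -> 110Mi (34%)
```

The first two setups have the same relative headroom but different absolute headroom, which may be worth keeping in mind when choosing a limit. This is only arithmetic on the reported values and does not by itself explain the crash.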
Unfortunately, based on the information we have here, we cannot create any constructive action items. We need a way to reproduce it before we try fixing it 🤷🏼 |
I already provided a demo in the first message of this issue; however, here's another one for k8s. Here are the steps for reproduction:
Create K3D Cluster with port forwarding:
k3d cluster create -p 127.0.0.1:6379:6379
Install the dragonfly operator:
kubectl apply -f https://raw.githubusercontent.com/dragonflydb/dragonfly-operator/main/manifests/dragonfly-operator.yaml
CR:
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  labels:
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/instance: dragonfly-sample
    app.kubernetes.io/part-of: dragonfly-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: dragonfly-operator
  name: dragonfly
spec:
  replicas: 3
  args:
    - --maxmemory=256M
    - --alsologtostderr
    - --primary_port_http_enabled=false
    - --admin_port=9999
    - --admin_nopass
    - --cache_mode
    - --primary_port_http_enabled=true
    - --cluster_mode=emulated
  snapshot:
    cron: "*/5 * * * *"
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
        - ReadWriteOnce
  resources:
    requests:
      cpu: 500m
      memory: 320Mi
    limits:
      cpu: 500m
      memory: 320Mi
  serviceSpec:
    type: LoadBalancer
|
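Since the Dragonfly logs reportedly show nothing at the time of the crash, one way to check whether the kernel OOM killer terminated the container is to ask Kubernetes for the last termination reason. The sketch below uses standard `kubectl` JSONPath output; the pod name `dragonfly-0` in the usage comment is an assumption based on the CR name above, and the `last_termination_reason` helper is hypothetical.

```shell
# last_termination_reason POD [NAMESPACE]
# Prints the reason recorded for the previous termination of the first
# container in the pod (for example "OOMKilled"), or nothing if the
# container has not been restarted.
last_termination_reason() {
  pod=$1
  namespace=${2:-default}
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Usage (pod name is an assumption, adjust to your cluster):
#   last_termination_reason dragonfly-0
```

If this prints `OOMKilled`, the process was killed from outside for exceeding the container limit, which would be consistent with the absence of any shutdown or allocation-failure message in the logs.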
I just re-checked it with v1.22.1 and the problem still exists. |
Describe the bug
I have discovered crashes of Dragonfly while testing cache mode to ensure it is stable before using it for prod workloads.
It seems to crash at a random point when putting lots of data.
To Reproduce
Steps to reproduce the behavior:
dd if=/dev/urandom bs=1024 count=100 | base64 > data.txt
redis-pump.sh (put in the same folder as data.txt):
docker run -it --rm --cpus 0.1 -m 320M -p 6379:6379 -p 9999:9999 --mount type=tmpfs,destination=/dragonfly/snapshots,tmpfs-size=524288000 ghcr.io/dragonflydb/dragonfly-weekly:e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu --alsologtostderr --primary_port_http_enabled=false --admin_port=9999 --admin_nopass --cache_mode --primary_port_http_enabled=true --cluster_mode=emulated --dir=/dragonfly/snapshots --snapshot_cron="*/5 * * * *"
./redis-pump.sh 127.0.0.1 1000000
and wait ~7-10 minutes
Expected behavior
There should not be any crashes. If there are crashes, there should be a reasonable error/crit message explaining why they happened.
Logs
dragonfly:
redis-pump.sh:
Environment (please complete the following information):
Reproducible Code Snippet
Is there any additional information I can provide, or an option to run Dragonfly with debug logging to gather more useful information?
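The contents of redis-pump.sh are not preserved in this thread. For anyone trying to reproduce the issue, the following is a hypothetical stand-in, not the original script. It assumes the two arguments are a host and a number of writes, that the payload is data.txt in the current folder, and that `redis-cli` is installed; the `pump:` key prefix is made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for redis-pump.sh (NOT the original script).
# pump HOST COUNT [PAYLOAD_FILE]
# Writes the payload file COUNT times under distinct keys.
pump() {
  pump_host=$1
  pump_count=$2
  pump_file=${3:-data.txt}
  pump_i=0
  while [ "$pump_i" -lt "$pump_count" ]; do
    # -x makes redis-cli read the last argument (the value) from stdin.
    redis-cli -h "$pump_host" -p 6379 -x SET "pump:$pump_i" \
      < "$pump_file" > /dev/null || return 1
    pump_i=$((pump_i + 1))
  done
}

# Run only when invoked with arguments, e.g.:
#   ./redis-pump.sh 127.0.0.1 1000000
if [ "$#" -ge 2 ]; then
  pump "$1" "$2"
fi
```

Using distinct keys means the dataset keeps growing past maxmemory, so cache mode has to evict entries while the snapshot cron runs, which is the situation described above. Whether this matches the load pattern of the original script is unknown.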