DB consumes lots of disk #159
Comments
Hey, apologies for the delayed response. First off, I assume you are using the default Badger DB? That size is definitely not expected. I'm curious about your usage patterns: ballpark, how many certificates have you created? You have 8 services; how often are they regenerating certificates? Are you using the More importantly, we'd definitely like to understand what's happening here. Unfortunately, we store the data as NoSQL key-value pairs, which makes it difficult to analyze without writing specific code to do that. Would you be open to sending us the database so that we can try to analyze it on our end?
Hi
For example: gateways are exposed to the world and use Let's Encrypt for their listeners, while internal communication uses mutually authenticated TLS with certificates from step-ca (connect). Some services are internal; in that case both ACME clients have the same configuration and point to the same ACME server (step-ca). All services run in k8s. We use an ACME cert manager that handles renewal: https://github.com/go-ocf/kit/blob/5cad919232f614458aaae356353192d6a0e89706/security/acme/certManager.go#L123
I compressed the DB with 7z and it is now 7.6GB.
Hey @jkralik, that's awesome. Sorry for the delayed response; I've been chugging away at a late deadline all day. I was thinking of easy ways for you to upload that. If you send me an SSH public key, I can give you access to a test box and you can scp it over there. Would that work for you? Also, a follow-up question: would you mind sending a snippet of logs from the CA? The rate at which the DB is growing makes me think that something is pummeling the CA with requests.
Sure. I restarted the step-ca pod and all logs were lost. But I think this issue could be related to #149. We are now using a patched version with #162, and after cleaning the DB it is only 2.7MB after 12 hours.
Whoa! That's super interesting. Hmmm. OK, I'm going to try to PM you the host address. Also, this is the first time I've seen an ssh ed25519 key out in the wild. Cool.
Actually, I don't know how to do that with GitHub 😬. My email is [email protected]. Want to email me, and then I'll email you the host address? Sorry, I hate to make this complicated.
I found that #162 is not related. The DB again grew to 3GB after three days of running. I will provide the smaller one for you.
I found where the issue was. I expected the lego client to fill the resource with the PrivateKey when Renew is called with a CSR, but it only sets the certificate and the CSR, without the PrivateKey. Sorry, my fault.
@jkralik I don't know the Lego client well enough, but did this cause some sort of loop that continuously hit the DB? I guess I'm not understanding why this caused the DB to expand so rapidly.
I have a loop in my cert manager that renews the certificate, and when any call fails it tries again after 15 seconds. That means renew was called every 15 seconds. In my case the problem was in https://github.com/go-ocf/kit/blob/cbf12801499b2699b37d72c79f66d8c261d7767e/security/certManager/acme/certManager.go#L238: this function fails because PrivateKey was empty. I then fixed it with commit plgd-dev/kit@cbf1280#diff-b1659f964b8384a232f0aec94303c811
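The failure described above can also be guarded against in the renewal code itself. The Go sketch below is illustrative only: `certResource`, `renewFunc`, and `renewKeepingKey` are hypothetical names, not lego's API, and the struct simply models the three fields mentioned in this thread (certificate, private key, CSR). It assumes a CSR-based renewal returns the new certificate without the key, and carries the previous private key over so later steps do not fail on an empty key.

```go
package main

import (
	"errors"
	"fmt"
)

// certResource is a hypothetical stand-in for an ACME client's certificate
// resource; it only models the fields discussed in this thread.
type certResource struct {
	Certificate []byte
	PrivateKey  []byte
	CSR         []byte
}

// renewFunc is a hypothetical renewal call. As described above, a CSR-based
// renewal may return the new certificate and CSR but no private key.
type renewFunc func(old certResource) (certResource, error)

// renewKeepingKey renews the certificate and, if the returned resource has no
// private key, reuses the key from the previous resource (the key behind the CSR).
func renewKeepingKey(old certResource, renew renewFunc) (certResource, error) {
	res, err := renew(old)
	if err != nil {
		return certResource{}, fmt.Errorf("renew: %w", err)
	}
	if len(res.PrivateKey) == 0 {
		if len(old.PrivateKey) == 0 {
			return certResource{}, errors.New("renew: no private key available")
		}
		res.PrivateKey = old.PrivateKey
	}
	return res, nil
}

func main() {
	old := certResource{Certificate: []byte("old-cert"), PrivateKey: []byte("key"), CSR: []byte("csr")}
	// Simulated CSR-based renewal that, like the case in this thread, omits the key.
	fakeRenew := func(o certResource) (certResource, error) {
		return certResource{Certificate: []byte("new-cert"), CSR: o.CSR}, nil
	}
	res, err := renewKeepingKey(old, fakeRenew)
	fmt.Println(string(res.Certificate), string(res.PrivateKey), err)
}
```

Returning an explicit error when no key is available at all makes the root cause visible in the logs instead of only showing up as repeated renewals.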
Interesting. Even if you were renewing every 15 seconds, it's hard for me to understand how you could possibly be generating that much data. If you still have access to the 3GB database, I'd love to take a look at it. |
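For scale, the figures reported in this thread (a 15-second retry interval, 8 ACME clients, roughly 3GB after three days) can be turned into a rough per-request estimate. This Go sketch assumes decimal gigabytes and that every failed attempt reached the CA; it prints the implied bytes per request for one looping client and for all 8. These are back-of-the-envelope numbers, not measurements.

```go
package main

import "fmt"

const (
	retryIntervalSec = 15            // retry interval reported in this thread
	observedSeconds  = 3 * 24 * 3600 // "3GB after three days"
)

// observedBytes assumes "3GB" means 3 decimal gigabytes.
const observedBytes int64 = 3_000_000_000

// attempts returns how many renewal calls one client makes over periodSec
// when it retries every intervalSec seconds.
func attempts(periodSec, intervalSec int64) int64 {
	return periodSec / intervalSec
}

func main() {
	perClient := attempts(observedSeconds, retryIntervalSec) // 17280 per client
	for _, clients := range []int64{1, 8} {
		total := perClient * clients
		fmt.Printf("%d looping client(s): %d requests, ~%d bytes per request\n",
			clients, total, observedBytes/total)
	}
}
```

Under these assumptions that works out to about 174KB per request if one client was looping, or about 22KB per request if all 8 were.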
Sure. I uploaded a new archive at
Hi @dopey, |
Hey @ki-pete, want to hop in our Discord? It might be easier to debug in real time. https://discord.gg/fX5VJZAc Here is a script you can use to count the rows in each table in your DB: https://gist.github.com/dopey/8e9206073e2cb052b6f633c0b7d4d8df. We'll want that info to help with debugging.
Subject of the issue
The server has been running since 5 December, and the DB has lots of "*.vlog" files (314) that consume 309GB.
Your environment
We are using the CA with its ACME server and 8 services as ACME clients.
Expected behavior
I expected it to consume at most 1GB of disk.
Actual behavior
It consumes 309GB and is still growing.