# help
m
Heads up, we were seeing a memory leak issue on v0.20. We’ve upgraded to v0.25, so we’re hoping something in there might solve the problem. The leak is pretty slow, so we figure restarting the box every ~2 weeks is a good workaround for now.
[attachment: image.png, memory usage graph]
Running behind a load balancer and using GitHub as our policy store
a
Thanks for reporting. How are you deploying Cerbos?
m
We build an image in ECR and deploy it via ECS. We’re managing it all via CDK.
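Roughly like this in our CDK stack (a minimal sketch, all names and values are illustrative rather than our exact setup):
```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecr from 'aws-cdk-lib/aws-ecr';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

// Illustrative only: Cerbos image pulled from ECR, run as a Fargate service
// behind an Application Load Balancer.
export class CerbosStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });
    const repo = ecr.Repository.fromRepositoryName(this, 'CerbosRepo', 'cerbos');

    new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'CerbosService', {
      cluster,
      cpu: 256,
      memoryLimitMiB: 512,   // hard memory limit for the task
      desiredCount: 2,
      taskImageOptions: {
        image: ecs.ContainerImage.fromEcrRepository(repo, 'latest'),
        containerPort: 3592, // Cerbos HTTP port (gRPC is on 3593)
      },
    });
  }
}
```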
a
Thanks - there have been a number of changes between 0.20 and 0.25, so let us know if you start seeing the same trend and we can try and dig into it
m
Will do, was reading the changelog and it looks like there’s been a bunch 🤞
a
Specifically, there were some updates to how the cache evicts entries over time, which could be related to this
m
I’ll set a reminder to check back in a few days
c
Hi. Has it ever reached 100% and crashed? Sometimes things look like a memory leak when really the memory just isn’t released back to the OS immediately because there’s no memory pressure.
60% seems to be the max here, so I think that’s the likelier explanation.
m
Not 100% and crashed, but we noticed it today because we started seeing errors from the Cerbos service
The dip there at the end is after we forced a restart
c
Yeah. It had been running continuously for almost a month before that, hadn't it?
What were the errors?
m
We got gRPC errors in our primary application when calling the Cerbos service
It’s been running continuously for a month 🙂
It’s perhaps slowly on the rise; we’ll restart it every two weeks or so for now
c
What's the memory limit you have set on the container? If you have a staging/dev environment, I'd suggest setting a lower limit and seeing whether usage keeps climbing until it hits that limit. So far I haven't been able to reproduce the issue.
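For example, something along these lines in your staging stack (a rough CDK sketch with made-up names, adapt to your setup):
```typescript
import { Construct } from 'constructs';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecr from 'aws-cdk-lib/aws-ecr';

// Illustrative: a staging task definition for Cerbos with a deliberately low
// memory limit, so a genuine leak hits the ceiling quickly instead of plateauing.
export function cerbosStagingTask(scope: Construct): ecs.FargateTaskDefinition {
  const repo = ecr.Repository.fromRepositoryName(scope, 'CerbosRepo', 'cerbos');

  const taskDef = new ecs.FargateTaskDefinition(scope, 'CerbosStagingTask', {
    cpu: 256,
    memoryLimitMiB: 256, // much lower than the production limit
  });

  taskDef.addContainer('cerbos', {
    image: ecs.ContainerImage.fromEcrRepository(repo, 'latest'),
    portMappings: [{ containerPort: 3592 }, { containerPort: 3593 }],
    logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'cerbos-staging' }),
  });

  return taskDef;
}
```
If it's a real leak it should climb to the limit and get OOM-killed fairly quickly; if it just plateaus under the limit, it's more likely memory that hasn't been returned to the OS yet.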