
Matthew Ebeweber

02/21/2023, 5:49 PM
Heads up, we were seeing a memory leak issue on v0.20. We've upgraded to v0.25, so we're hoping something in there might solve the problem. It's pretty minimal, so we figure kicking the box every ~2 weeks is a good solution for now.
[Attached image: image.png, memory usage graph]
We're running behind a load balancer and using GitHub as our policy store.

Alex Olivier (Cerbos)

02/21/2023, 5:54 PM
Thanks for reporting. How are you deploying Cerbos?

Matthew Ebeweber

02/21/2023, 5:58 PM
We build an image in ECR and deploy it via ECS. We’re managing it all via CDK.

Alex Olivier (Cerbos)

02/21/2023, 6:01 PM
Thanks. There have been a number of changes between 0.20 and 0.25, so let us know if you start seeing the same trend and we can try and dig into it.

Matthew Ebeweber

02/21/2023, 6:02 PM
Will do. I was reading the changelog and it looks like there have been a bunch 🤞

Alex Olivier (Cerbos)

02/21/2023, 6:02 PM
Specifically, there were some updates to how the cache evicts entries over time, which could be related to this.

Matthew Ebeweber

02/21/2023, 6:04 PM
I’ll set a reminder to check back in a few days

Charith (Cerbos)

02/21/2023, 6:11 PM
Hi. Has it ever reached 100% and crashed? Sometimes things look like a memory leak, but that's simply because the memory is not immediately released back to the OS when there's no memory pressure.
60% seems to be the max here, so I think that's the likelier explanation.
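For illustration, the Go runtime's own statistics show the effect Charith describes (Cerbos is written in Go). This is a minimal sketch and not Cerbos code. The helpers `churn`, `readStats` and `report`, the `sink` variable and the 64 MiB allocation size are all hypothetical. Only `runtime.ReadMemStats`, `runtime.GC` and `debug.FreeOSMemory` are real standard-library calls.

After the garbage collector frees the objects, `HeapInuse` drops. The freed pages can still sit in `HeapIdle` without being counted in `HeapReleased`, which means the runtime is holding them rather than returning them to the OS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the allocations reachable until we deliberately drop them.
var sink [][]byte

// readStats returns a fresh snapshot of the runtime's memory statistics.
func readStats() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

// churn (hypothetical helper) allocates roughly 64 MiB, drops it, and returns
// snapshots taken while it was live, after a GC, and after forcing a release.
func churn() (peak, afterGC, afterFree runtime.MemStats) {
	for i := 0; i < 1024; i++ {
		sink = append(sink, make([]byte, 64<<10)) // 1024 x 64 KiB
	}
	peak = readStats()

	sink = nil   // drop the only reference to the allocations
	runtime.GC() // objects are freed, but free pages may stay with the runtime
	afterGC = readStats()

	// Without this call the background scavenger returns free pages to the
	// OS gradually; FreeOSMemory forces a GC and returns as much as it can now.
	debug.FreeOSMemory()
	afterFree = readStats()
	return
}

// report prints the heap figures in MiB. HeapIdle - HeapReleased is free
// memory the runtime still holds, which may still show up as process memory.
func report(label string, m runtime.MemStats) {
	const mib = 1 << 20
	fmt.Printf("%-10s inuse=%4d MiB idle=%4d MiB released=%4d MiB retained=%4d MiB\n",
		label, m.HeapInuse/mib, m.HeapIdle/mib, m.HeapReleased/mib,
		(m.HeapIdle-m.HeapReleased)/mib)
}

func main() {
	peak, afterGC, afterFree := churn()
	report("peak", peak)
	report("after GC", afterGC)
	report("after free", afterFree)
}
```

The exact numbers vary by Go version and platform. A container-level graph can therefore look high after load has passed even though nothing is leaking.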

Matthew Ebeweber

02/21/2023, 6:18 PM
It hasn't reached 100% or crashed, but we noticed it today because we started seeing errors from the Cerbos service.
The dip there at the end is after we forced a restart

Charith (Cerbos)

02/21/2023, 6:20 PM
Yeah. It's been running continuously for almost a month before that, hasn't it?
What were the errors?

Matthew Ebeweber

02/21/2023, 9:45 PM
We got gRPC errors in our primary application when calling the Cerbos service.
It's been running continuously for a month 🙂
It's perhaps slowly on the rise; we'll kick it every two weeks or so for now.

Charith (Cerbos)

02/27/2023, 8:55 AM
What's the memory limit you have set on the container? If you have a staging/dev environment, I'd suggest setting a lower limit and seeing if the usage keeps rising over 100%. So far I haven't been able to reproduce the issue.
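One way to read the results of that experiment is to look at the floor of the usage graph rather than its peaks. Under memory pressure the runtime gives memory back. If the usage rises and then flattens, as with the ~60% ceiling above, that points to retained but reusable memory. If the low points keep climbing, that points to a genuine leak.

The sketch below is a hypothetical heuristic, not a Cerbos or ECS tool. `risingFloor`, the window size and the sample values are illustrative assumptions, and the samples are synthetic rather than data from this thread.

```go
package main

import "fmt"

// risingFloor (hypothetical heuristic) splits evenly spaced memory samples,
// e.g. percent of the container limit, into consecutive windows and reports
// whether each window's minimum is strictly higher than the previous one's.
// A trailing partial window is ignored.
func risingFloor(samples []float64, window int) bool {
	if window <= 0 || len(samples) < 2*window {
		return false // need at least two full windows to compare
	}
	prevMin := 0.0
	for w := 0; w+window <= len(samples); w += window {
		m := samples[w]
		for _, s := range samples[w : w+window] {
			if s < m {
				m = s
			}
		}
		if w > 0 && m <= prevMin {
			return false // floor flattened or fell: consistent with retained memory
		}
		prevMin = m
	}
	return true
}

func main() {
	// Synthetic examples: a plateau near 60% vs. a steadily climbing floor.
	plateau := []float64{40, 55, 60, 45, 58, 60, 44, 59, 60}
	climbing := []float64{40, 55, 60, 48, 62, 70, 56, 71, 80}
	fmt.Println(risingFloor(plateau, 3))  // false
	fmt.Println(risingFloor(climbing, 3)) // true
}
```

Comparing window minimums rather than peaks keeps short bursts of load from triggering a false alarm.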