# help
Yehiel Mizrahi:
Hi, we are experiencing a problem in one of our deployments with Cerbos. The pod in EKS that runs Cerbos and our service suddenly becomes unhealthy: Cerbos crashes due to an OOM error. When I try to analyze the Cerbos logs in CloudWatch I see nothing, and using kubectl on the pod to check the memory usage (or to see if there is memory pressure on the node) also shows nothing. How can I understand the reasons for this sudden problem? The `policy` table size in the DB is only 5 MB (we are using an AWS-managed MySQL 8.0 database), but the `policy_revision` table size is 66 GB (another issue that I want to address: can we configure Cerbos to keep only a limited policy change history, or do we need to do a scheduled deletion ourselves?). The policy caching settings are the default ones. Thank you!
oguzhan:
Hi @Yehiel Mizrahi, there are triggers for the "delete", "insert", and "update" operations which insert rows into the `policy_revision` table. They are named `policy_on_delete`, `policy_on_insert`, and `policy_on_update` in the MySQL schema. You can drop all of those triggers, or only some of them (e.g. leave only `policy_on_delete` so that you keep a backup of deleted policies), without any problem. Clearing all rows in the `policy_revision` table to save some space is fine, too. For the OOM problem, how are the memory requests/limits configured for Cerbos?
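For reference, a rough sketch of what that housekeeping could look like in MySQL 8.0 (untested; the trigger and table names are the ones mentioned above, and the `update_timestamp` column in the scheduled-pruning example is only a placeholder, so check everything against your actual schema first):

```sql
-- Drop the history triggers you no longer want (e.g. keep only policy_on_delete
-- so that deleted policies are still backed up in policy_revision).
DROP TRIGGER IF EXISTS policy_on_insert;
DROP TRIGGER IF EXISTS policy_on_update;
-- DROP TRIGGER IF EXISTS policy_on_delete;  -- only if you want no history at all

-- Reclaim the space taken by the existing history.
TRUNCATE TABLE policy_revision;

-- Or keep a limited history by pruning old rows on a schedule, e.g. with a MySQL
-- event (requires the event scheduler to be enabled; the timestamp column name
-- below is a placeholder, not verified against the Cerbos schema).
CREATE EVENT IF NOT EXISTS prune_policy_revision
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM policy_revision
  WHERE update_timestamp < NOW() - INTERVAL 30 DAY;
```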
Yehiel Mizrahi:
@oguzhan Cerbos config is pretty basic:
```yaml
".cerbos.yaml": |-
    server:
      # Listen for gRPC requests on the loopback interface only.
      grpcListenAddr: "127.0.0.1:3593"
      # Note that adminAPI will be enabled only for PermissionsService
      adminAPI:
        enabled: false
    storage:
      driver: mysql
      mysql:
        dsn: ${MYSQL_USERNAME}:${MYSQL_PASSWORD}@tcp(${MYSQL_HOST}:3306)/${MYSQL_DATABASE}
    compile:
      cacheDuration: 60s
```
and these are the resource settings for the Cerbos container:
```yaml
containers:
      - name: cerbos
        image: "ghcr.io/cerbos/cerbos:0.40.0"
        resources:
          requests:
            memory: "512Mi"
            cpu: "1000m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
```
oguzhan:
When did you first start to see this error, and which version of Cerbos are you using? Did the traffic increase, or did the payload size of the requests increase lately? Have you added new policies to the mix as of late?
Yehiel Mizrahi:
There is only one policy, but it is pretty large (~5 MB). We are using image version 0.40.0. Traffic & payloads didn't increase.
Yehiel Mizrahi:
@Serhii Pavlovskyi