# help
Yehiel Mizrahi:
Hi, we are experiencing a problem in one of our deployments with Cerbos. The pod in EKS that runs Cerbos and our service suddenly becomes unhealthy: Cerbos crashes due to an OOM error. When I try to analyze the Cerbos logs in CloudWatch I see nothing, and using kubectl on the pod to check the memory usage (or to see if there is memory pressure on the node) also shows nothing. How can I understand the reasons for this sudden problem? The `policy` table size in the DB is only 5 MB (we are using an AWS-managed MySQL 8.0 database), but the `policy_revision` table size is 66 GB (another issue that I want to address: can we configure Cerbos to keep only a limited policy change history, or do we need to do a scheduled deletion ourselves?). The policy caching settings are the default ones. Thank you!
oguzhan:
Hi @Yehiel Mizrahi, there are triggers for the "delete", "insert", and "update" operations which insert rows into the `policy_revision` table. They are named `policy_on_delete`, `policy_on_insert`, and `policy_on_update` in the MySQL schema. You can drop all of those triggers, or only some of them (e.g. leave only `policy_on_delete` so that you keep a backup of deleted policies), without any problem. Clearing all rows in the `policy_revision` table to save some space is fine, too. For the OOM problem, how are the memory requests/limits configured for Cerbos?
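For reference, a rough sketch of what that housekeeping could look like in MySQL 8.0 (untested; the trigger and table names are the ones mentioned above, and the `update_timestamp` column in the scheduled-pruning example is only a placeholder, so check everything against your actual schema first):

```sql
-- Drop the history triggers you no longer want (e.g. keep only policy_on_delete
-- so that deleted policies are still backed up in policy_revision).
DROP TRIGGER IF EXISTS policy_on_insert;
DROP TRIGGER IF EXISTS policy_on_update;
-- DROP TRIGGER IF EXISTS policy_on_delete;  -- only if you want no history at all

-- Reclaim the space taken by the existing history.
TRUNCATE TABLE policy_revision;

-- Or keep a limited history by pruning old rows on a schedule, e.g. with a MySQL
-- event (requires the event scheduler to be enabled; the timestamp column name
-- below is a placeholder, not verified against the Cerbos schema).
CREATE EVENT IF NOT EXISTS prune_policy_revision
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM policy_revision
  WHERE update_timestamp < NOW() - INTERVAL 30 DAY;
```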
Yehiel Mizrahi:
@oguzhan Cerbos config is pretty basic:
```yaml
".cerbos.yaml": |-
    server:
      # Listen for gRPC requests on the loopback interface only.
      grpcListenAddr: "127.0.0.1:3593"
      # Note that adminAPI will be enabled only for PermissionsService
      adminAPI:
        enabled: false
    storage:
      driver: mysql
      mysql:
        dsn: ${MYSQL_USERNAME}:${MYSQL_PASSWORD}@tcp(${MYSQL_HOST}:3306)/${MYSQL_DATABASE}
    compile:
      cacheDuration: 60s
```
and these are the resource settings for the Cerbos container:
```yaml
containers:
      - name: cerbos
        image: "ghcr.io/cerbos/cerbos:0.40.0"
        resources:
          requests:
            memory: "512Mi"
            cpu: "1000m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
```
oguzhan:
When did you first start to see this error, and which version of Cerbos are you using? Did the traffic increase, or did the payload size of the requests increase lately? Have you added new policies to the mix as of late?
Yehiel Mizrahi:
There is only one policy, but it is pretty large (~5 MB). We are using image version 0.40.0. Traffic & payloads didn't increase.
Yehiel Mizrahi:
@Serhii Pavlovskyi