# help
y
Hi, after upgrading the Cerbos image to 0.43.0 on one of our services, all of our tests related to Cerbos authorization checks started failing. While debugging, I noticed that a policy that was returning ALLOW for a certain check request is now returning DENY. Was there some change to the policy structure? I would also like to understand how to extract a single policy using cerbosctl. When running:
cerbosctl get resource_policy --name=RECORD --server=localhost:3594 --username=<my_user> --password=<my_pass> --plaintext
I get:
cerbosctl: error: failed to list: error while listing policies: could not get policy: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (12398930 vs. 4194304)
But the policies are not that big: the entire policy table is 5 MB (see the screenshots). Is it trying to get all of the policies? Why, if I am trying to get only one? Thank you!
s
Hello. Regarding the first point, can you provide an example demonstrating a policy and request that was previously returning an ALLOW that now returns a DENY?
y
Hi, the second issue is preventing me from doing that now.
s
I don't need the entire policy set. Perhaps just the relevant bits of what you were able to retrieve with this?
Upon debugging I have noticed that policy that was returning ALLOW on certain check request is now returning DENY.
(we will certainly address the latter point, I've just been focussing on the former recently so context is fresh)
y
There is another thing that I have noticed:
1. Service A, which runs Cerbos as a sidecar, uses Cerbos to check actions. Cerbos there was upgraded to 0.43.0.
2. Another service (let's call it service B), which is used for policy creation and update, is still on 0.40.0.
Was there some change in policy creation and checking between 0.40.0 and 0.43.0?
s
Yes, the underlying engine was largely refactored between those releases. It's worth noting that there was a batch of fixes in the followup releases, too (we're now on v0.45.1).
y
OK. What is your advice on this matter?
It seems like either policy creation or checking is not backwards compatible.
s
If you were able to provide the context I mentioned above, it would be a lot easier to debug the issue. Is that possible?
y
I am working on it. First, I will change the default grpc message size in Cerbos config and enable audit logs. Then, I'll get the policies & audit logs for 0.40.0 and 0.43.0 using cerbosctl.
Then we will have the proper context that will enable us to move forward.
👍 1
What format do you want the policies in? Is JSON enough?
s
Yes! JSON or YAML, either works. Thanks. Really, I'm just interested in the request that you're sending and the rule that's returning the differing result; I wouldn't need the entire policy (just anything that's relevant). Just to check, is it a scoped policy, or does it use the default scope ("")?
y
It's a resource-based policy.
👍 1
s
The scope is just a field defined in the policy, something like this:
---
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  resource: "album:object"
  version: "default"
  scope: "acme.corp"  # <-- the scope field
  # ...
If it's omitted, then it's just the default scope (which is useful context for debugging).
y
We don't have scopes
👍 1
@Sam Lock (Cerbos) each time I run:
cerbosctl get resource_policy --name=RECORD --server=localhost:3594 --username=cerbos --password=cerbosAdmin --plaintext
I get:
cerbosctl: error: failed to list: error while listing policies: could not get policy: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (12550547 vs. 4194304)
even though I set
maxRecvMsgSizeBytes: 33554432      # 32MB in bytes
in my Cerbos config:
server:
  # Address the gRPC server listens on.
  grpcListenAddr: "127.0.0.1:3593"
  # Note that adminAPI will be enabled only for PermissionsService
  adminAPI:
    enabled: true
    adminCredentials:
      username: cerbos
      passwordHash: whatever=
  advanced:
    grpc:
      maxRecvMsgSizeBytes: 33554432      # 32 MiB in bytes
audit:
  enabled: true
  backend: file
  file:
    path: stdout
storage:
  driver: mysql
  mysql:
    dsn: ${MYSQL_USERNAME}:${MYSQL_PASSWORD}@tcp(${MYSQL_HOST}:3306)/${MYSQL_DATABASE}
compile:
  cacheDuration: 60s
The maxRecvMsgSizeBytes: 33554432 setting doesn't seem to kick in.
s
Hmm. It's tricky for me to reproduce this locally (the PDP is responding to my config changes as expected). Can you raise an issue with as much info as you can provide, and one of us can take a look?
y
Is there any way to see if the changes kick in through logs?
The issue is when running cerbosctl to get the policies after I port-forward to the pod where Cerbos runs:
cerbosctl: error: failed to list: error while listing policies: could not get policy: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (12398930 vs. 4194304)
s
That's quite a vague error message. It's difficult to know what's going on without more information (policy set, individual and total sizes, formatted config, etc.). It would be better if you raised an issue on our GitHub with as much of this information as possible, and one of us will take a proper look.
y
The total size of the policy table is 5 MB, as you can see in the screenshot.
b
I haven't come across this issue but as a general Cerbos user I'm curious about it. Yehiel, if you make a Github issue, would be interested to track it 😇
c
https://cerboscommunity.slack.com/archives/C02A364JYMQ/p1751887410141939?thread_ts=1751815509.482709&cid=C02A364JYMQ
The maxRecvMsgSizeBytes setting is for the Cerbos PDP and defines how big the client's request can be. The error you're getting from cerbosctl is the opposite. The server is sending a 12 MiB message when the client is only able to accept 4 MiB. That's the default value for gRPC clients.
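To make the two numbers in that error concrete, here is a minimal Python sketch that pulls them out of the message and converts them to MiB. The helper name `parse_size_error` and the regular expression are illustrative assumptions based only on the error text quoted in this thread; they are not part of cerbosctl or gRPC.

```python
import re
from typing import Tuple

# Size pair as it appears in the quoted ResourceExhausted error, e.g.
# "received message larger than max (12398930 vs. 4194304)".
SIZE_PATTERN = re.compile(r"received message larger than max \((\d+) vs\. (\d+)\)")

MIB = 1024 * 1024


def parse_size_error(message: str) -> Tuple[int, int]:
    """Return (received_bytes, receiver_limit_bytes) from the error text."""
    match = SIZE_PATTERN.search(message)
    if match is None:
        raise ValueError("not a gRPC message-size error")
    return int(match.group(1)), int(match.group(2))


error = (
    "cerbosctl: error: failed to list: error while listing policies: "
    "could not get policy: rpc error: code = ResourceExhausted desc = "
    "grpc: received message larger than max (12398930 vs. 4194304)"
)

# The first number is what the server sent; the second is the limit on the
# receiving side, which here is the cerbosctl client.
received, limit = parse_size_error(error)
print(f"server response:      {received / MIB:.2f} MiB")
print(f"client receive limit: {limit / MIB:.2f} MiB")
print(f"response exceeds client limit: {received > limit}")
```

The second number equals 4 × 1024 × 1024, which matches the default client-side receive limit mentioned above, so raising the limit in the PDP config would not be expected to change it.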
👏 1
y
@Charith (Cerbos) Afarin! I have figured that out by now and am trying to resolve the issue with the API. Should I open a GitHub issue for it?
c
Feel free to create an issue. I am baffled by how that single policy is 12MiB though. Are you able to run
SELECT LENGTH(definition) FROM  policy WHERE name = 'RECORD'
and post the output?
y
1117519
c
Hmm... that's only 1 MiB. Do you mind posting the output of
SELECT SUM(LENGTH(definition)) FROM policy
as well?
y
So that policy is not 12 MB.
12539353. It seems like it is selecting all of them.
c
Yeah, that seems to be the case. The filter is not getting applied for some reason. Thanks for the info. I'll create issues to track these.
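The arithmetic behind that conclusion can be sketched in a few lines of Python using only the figures quoted in this thread. This is a rough comparison: the stored column length and the size of the encoded gRPC response are not the same measure, so only the order of magnitude is meaningful. The helper `relative_gap` is a hypothetical name introduced for illustration.

```python
# Figures quoted in the thread, in bytes.
single_policy_bytes = 1_117_519   # LENGTH(definition) for name = 'RECORD'
all_policies_bytes = 12_539_353   # SUM(LENGTH(definition)) over the policy table
response_bytes = 12_550_547       # size reported in the later cerbosctl error


def relative_gap(observed: int, expected: int) -> float:
    """Relative difference between the observed and expected sizes."""
    return abs(observed - expected) / expected


gap_single = relative_gap(response_bytes, single_policy_bytes)
gap_all = relative_gap(response_bytes, all_policies_bytes)

# Whichever hypothesis leaves the smaller gap is the better explanation.
closest = "all policies" if gap_all < gap_single else "single policy"
print(f"gap vs. single policy: {gap_single:.1%}")
print(f"gap vs. all policies:  {gap_all:.1%}")
print(f"response size is closest to: {closest}")
```

The response is within a fraction of a percent of the whole table and roughly eleven times the size of the RECORD policy alone, which is consistent with the name filter not being applied.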
👁️ 1
y
Here is one of the policy examples:
resource_PRODUCT_vdefault.json
This policy was created by Cerbos 0.40.0 and stored in the MySQL DB. When Cerbos 0.43.0 (which is connected to the same DB!) checks authorization for some action on that resource, it returns DENY, while the same request to 0.40.0 returns ALLOW.
c
Are you generating these policies programmatically?
y
Yes
Just a sec
What do you mean? I used the Cerbos SDK to extract them from the MySQL server that Cerbos uses to store those policies.
The policies themselves are generated programmatically by a service that runs Cerbos 0.40.0 as a sidecar with the Admin API enabled.
c
I meant the contents of the policy. Are you generating that with code? Just a side note but it seems a bit inefficient to do it this way. The policy size would keep growing indefinitely as well and things would get slower and slower.
y
Yes, the policies are generated programmatically. As soon as we add a new role, with actions attached to it, for a certain group ID, we update the policy.
And since we chose a resource-based approach, for each role and the group ID it was created for, we need to update the policy for that resource.
c
I'd personally store the role mapping in a database table and make it do the lookup instead of repeatedly generating expressions like this:
"(\"cnc:3ddb7420-b82a-40a9-84d6-dc9381788b21#role:35568e6e-f807-406b-b3ce-e074ea8bd28e\" in request.principal.attr.cncs[request.resource.attr.cnc].roles)"
You can then attach the result of that lookup as a principal attribute in the call to Cerbos, and the policy would be much simpler and wouldn't grow indefinitely.
y
You mean I should use my own table for the lookup and if the result is positive then I query a generic Cerbos policy, right?
While the generic Cerbos policy has no info about the groupIDs and roleIDs whatsoever
c
Yes. You can look up all that information from the database much more efficiently because it's optimized for that kind of work.
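A minimal sketch of that lookup-then-attach approach, using an in-memory SQLite table so it runs on its own. Everything named here is a hypothetical stand-in: the `role_assignment` table, the `group_roles` attribute, the sample IDs, and the `user` role are not taken from the thread, and the resulting dictionary only approximates the principal section of a check request rather than calling any Cerbos SDK.

```python
import sqlite3

# Hypothetical schema: one row per (principal, group, role) assignment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE role_assignment ("
    " principal_id TEXT NOT NULL,"
    " group_id TEXT NOT NULL,"
    " role_id TEXT NOT NULL,"
    " PRIMARY KEY (principal_id, group_id, role_id))"
)
conn.executemany(
    "INSERT INTO role_assignment VALUES (?, ?, ?)",
    [
        ("alice", "group-1", "editor"),
        ("alice", "group-2", "viewer"),
        ("bob", "group-1", "viewer"),
    ],
)


def roles_in_group(principal_id: str, group_id: str) -> list:
    """Look up the principal's roles for the group that owns the resource."""
    rows = conn.execute(
        "SELECT role_id FROM role_assignment"
        " WHERE principal_id = ? AND group_id = ? ORDER BY role_id",
        (principal_id, group_id),
    ).fetchall()
    return [role_id for (role_id,) in rows]


def build_principal(principal_id: str, resource_group_id: str) -> dict:
    """Attach the lookup result as a principal attribute for the check request."""
    return {
        "id": principal_id,
        "roles": ["user"],  # assumed static role; the real set is up to the caller
        "attr": {"group_roles": roles_in_group(principal_id, resource_group_id)},
    }


principal = build_principal("alice", "group-1")
print(principal)
```

With the roles resolved before the call, a single generic rule could test something like `"editor" in request.principal.attr.group_roles` (attribute name assumed), so the policy no longer needs one generated expression per group and role.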
y
But then it defeats the purpose of using attributes and conditions on attributes as a way to save us the effort. I mean, the mechanism is already there, so why not use it?
And it can save me some race conditions on updates, since otherwise I need to lock the whole process; I mean that the whole attribute check should be done as a single transaction.
c
Cerbos is not a database. It has to evaluate all of these expressions for every request and it can't optimize it away like a database can. It just won't scale as the number of policy rules increases. Eventually the system will grind to a halt.
y
I do understand that this is a kind of abuse, but what you see is actually a testing environment. In production we have significantly fewer expressions.
The issue, though, is the incompatibility between versions 0.40.0 and 0.43.0
Should I upgrade the Cerbos that is used for policy generation and management to 0.43.0 to resolve this issue?
Will this upgrade resolve this issue?
c
No, I don't think an upgrade will fix it. We'll look into why the behaviour change happened. However, I just want to make sure you understand that Cerbos wasn't meant to be used this way and it will eventually reach a breaking point. Unless your system is only ever expected to have a small, finite set of users, I wouldn't advise taking this route.
y
Thank you for the advice and the time you spent to look into this issue. We greatly appreciate your input and will discuss additional approaches to find better ways for attribute checks.
I'd like to understand more about the version incompatibility issue, though.
What do you think can be done on our side to better understand the issue and provide you with more information?
c
Keep using Cerbos 0.40 for now. We'll try to figure out why your policy no longer works in newer versions. The policy is inscrutable because it's so large. It'd be really helpful if you can send us a sample request to replicate the issue.