# help
j
Is there some kind of caching in Cerbos? I changed my resource policy versions from "abc" to "def", and I find that when I call Cerbos check_resources, I can still get EFFECT.ALLOW for policy version "abc".
If I restart the Cerbos pods, the problem goes away.
c
Yes, policies are cached, but "abc" should have been evicted when you changed it. Which store are you using?
j
I am using the git driver; the backend is GitLab.
Cerbos version = 0.24.0.
I have 3 pods running. If I kill just 1 pod and let it restart, I find that some of my calls to Cerbos return EFFECT.ALLOW and some return EFFECT.DENY (no changes in code, just repeated runs of the same code).
e
How long is your updatePollInterval set to?
When you update the policy and restart only one pod, that pod will automatically start with the latest policy. However, the pods that stay alive will only update when the polling interval expires.
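For reference, the poll interval is set on the git storage driver in the Cerbos configuration. A minimal sketch, with the repository URL, branch, and checkout directory as placeholders:

```yaml
# Sketch of a Cerbos config using the git storage driver (values are illustrative).
storage:
  driver: git
  git:
    protocol: https
    url: https://gitlab.com/example/cerbos-policies.git  # placeholder repository
    branch: main
    checkoutDir: /policies
    updatePollInterval: 60s  # how often each pod polls the repo for policy changes
```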
j
1 minute.
e
Do the pods start responding consistently 1 minute after you make the changes to the policy?
j
Let me test that. I suspect the worst-case scenario is 2 minutes minus a few seconds.
e
We have users running in production with a 10-second update interval with no issues.
j
OK, I will test that.
I've just rerun the test with no changes. I'm still getting inconsistent results, and this is more than 10 minutes later.
Let me check the logs for the pods.
Interesting. The pod logs show a consistent result of EFFECT.DENY, but I'm getting different results in the Cloud Shell where I run the code. My only conclusion is that Google's Cloud Shell has some kind of caching mechanism in place.
c
Do you have a proxy or load balancer in front of the Cerbos service? Maybe a service mesh?
j
Yes. Istio.
I need to check there as well.
c
Do you have a traffic split that's potentially sending some of your requests to a different service?
j
No. It's a single GKE cluster with only Cerbos in it. Plus, I checked the output from the Python SDK and it's properly formatted as a Cerbos response. No errors from the SDK.
I can't find any cache config in the Istio virtual services and gateways.
I'll try the same Python code later outside of Google Cloud Shell, from a local laptop.
c
Hey, if you edit an existing policy file and change its version in place, the old version will still remain in the compile cache until either the store is reloaded using the Admin API or Cerbos itself is restarted. We hadn't anticipated that people would change the policy identifiers while the system is live. We'll put out a fix for that soon. Is your script doing something like that? If so, that could explain why you're getting inconsistent results. The instances where the old policy is cached will return results as if the policy still exists, while instances where it is not cached will simply return a DENY.
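To make the scenario concrete, here is a sketch of the kind of in-place edit being described; the resource kind, rule, and roles are illustrative:

```yaml
# Editing the version of an existing policy file in place, e.g. from "abc" to "def".
# Until the store is reloaded or Cerbos restarts, a pod that compiled the old file
# may still answer requests for version "abc" from its compile cache.
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  resource: cspm        # illustrative resource kind
  version: "def"        # previously "abc", changed in place
  rules:
    - actions: ["view"]
      effect: EFFECT_ALLOW
      roles: ["user"]
```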
j
It's been 12 hours and I'm still getting the same problem. I just submitted 3 API calls to Cerbos. Two of them returned EFFECT.ALLOW for resource ID "cspm123"; one returned EFFECT.DENY.
Based on the pod logs, I can see that Cerbos IS evaluating the policies wrongly. The return values I get (EFFECT.ALLOW and EFFECT.DENY) in the logs are consistent with what my Python code is getting.
The strangest part of this is that in my resource policies, the resource kind is actually "cspm". There is no "cspm123" (there was; I defined it previously but have since changed it).
I suspect what I am experiencing is the behaviour you mentioned, Charith. I do remember modifying the policy versions in place (switching from the "development" version to the "production" version for testing).
c
Yeah, I think that's probably what's happening. If you have the Admin API enabled, you can force each pod to reload itself using cerbosctl store reload (https://docs.cerbos.dev/cerbos/latest/cli/cerbosctl.html#reload), or just roll the pods manually, and I think the issue will go away. We have a fix in progress. Will get it out soon.
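For completeness, a sketch of the server configuration that needs to be in place before cerbosctl store reload can be used; the credentials shown are placeholders, not real values:

```yaml
# Sketch: enabling the Cerbos Admin API so cerbosctl can trigger a store reload.
server:
  adminAPI:
    enabled: true
    adminCredentials:
      username: cerbos
      passwordHash: JDJ5JDEw...  # placeholder: base64-encoded bcrypt hash of the admin password
```

With that enabled, cerbosctl store reload can be pointed at each pod's address (or the pods can simply be rolled) so every instance drops its compile cache and picks up the current policies.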