# help
d
hi cerbos running into some odd helm chart deployment issues - details in thread
so while I can run the helm command ok
helm upgrade --install cerbos cerbos/cerbos --namespace cerbos-dev --version=0.29.0 --values=./cerbos_config.yaml --kubeconfig /tmp/kube_config.yaml
  shell: sh -e {0}
"cerbos" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "cerbos" chart repository
Update Complete. ⎈Happy Helming!⎈
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /tmp/kube_config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /tmp/kube_config.yaml
NAME	NAMESPACE	REVISION	UPDATED	STATUS	CHART	APP VERSION
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /tmp/kube_config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /tmp/kube_config.yaml
Release "cerbos" does not exist. Installing it now.
NAME: cerbos
LAST DEPLOYED: Wed Jul 26 17:23:13 2023
NAMESPACE: cerbos-dev
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You have successfully deployed Cerbos.
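(As an aside, the group-/world-readable warnings above are only about the kubeconfig file permissions and can be silenced by tightening them, e.g.:)
chmod 600 /tmp/kube_config.yaml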
however... the app is stuck in the 'pending-install' state
or more to the point
dmeyerson@C02G73VMMD6P cerbos-ABAC % helm status cerbos -n cerbos-dev 
NAME: cerbos
LAST DEPLOYED: Wed Jul 26 17:23:13 2023
NAMESPACE: cerbos-dev
STATUS: pending-install
REVISION: 1
TEST SUITE: None
NOTES:
You have successfully deployed Cerbos.
I do notice this at deploy time - might this be what keeps the helm chart stuck in 'pending-install'?
50s         Warning   Unhealthy           pod/cerbos-996bd55cb-dtbws    Readiness probe failed: Get "http://10.32.5.123:3592/_cerbos/health": dial tcp 10.32.5.123:3592: connect: connection refused
c
That warning is normal while the pods are starting. Are they healthy now?
kubectl get deploy cerbos -n cerbos-dev
I think getting stuck on pending-install is a known issue with Helm. You can try doing a helm rollback to the previous release to "reset" the state, or just uninstall and reinstall the chart.
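A rough sketch of those two options, reusing the release name, namespace, and kubeconfig from the command above:
# Option 1: roll back to the previous revision to reset the release state
helm rollback cerbos -n cerbos-dev --kubeconfig /tmp/kube_config.yaml
# Option 2: uninstall and reinstall from scratch
helm uninstall cerbos -n cerbos-dev --kubeconfig /tmp/kube_config.yaml
helm upgrade --install cerbos cerbos/cerbos --namespace cerbos-dev --version=0.29.0 --values=./cerbos_config.yaml --kubeconfig /tmp/kube_config.yaml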
d
I uninstalled and reinstalled a few times with the same end result
I will try to hit the _cerbos/health endpoint manually
Is there a way to set the readiness probe timeout value? I don't see anything to control readiness probe config in the helm chart schema - https://artifacthub.io/packages/helm/cerbos/cerbos?modal=values
6m8s        Warning   Unhealthy           pod/cerbos-5859d99ff8-8r9t8    Readiness probe failed: Get "http://10.32.5.129:3592/_cerbos/health": dial tcp 10.32.5.129:3592: connect: connection refused
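(One way to at least see what probe settings the chart rendered - assuming the deployment is named cerbos and runs a single container - is:)
kubectl get deploy cerbos -n cerbos-dev -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'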
c
If the pod is not ready for so long, then that suggests something wrong with the config that prevents Cerbos from starting. Please check the logs of the pod.
d
What's odd is that I can kubectl port-forward and curl the pod directly... it seems to be up and 'SERVING'
dmeyerson@my_laptop cerbos-ABAC % curl http://localhost:3592/_cerbos/health
{"status":"SERVING"}
here are the pod logs 😕 - they seem ok
%  kubectl logs cerbos-5859d99ff8-8r9t8  -n cerbos-dev
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.242Z","log.logger":"cerbos.server","message":"maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.243Z","log.logger":"cerbos.server","message":"Loading configuration from /config/config.yaml"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.244Z","log.logger":"cerbos.git.store","message":"Cloning git repo from <https://git.viasat.com/OPS-ML-Engineering/cerbos-ABAC.git>","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.581Z","log.logger":"cerbos.git.store","message":"Opening git repo","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.592Z","log.logger":"cerbos.index","message":"Found 2 executable policies"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.593Z","log.logger":"cerbos.telemetry","message":"Telemetry disabled"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.593Z","log.logger":"cerbos.git.store","message":"Polling for updates every 1m0s","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.595Z","log.logger":"cerbos.grpc","message":"Starting gRPC server at :3593"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.598Z","log.logger":"cerbos.http","message":"Starting HTTP server at :3592"}
c
So what makes you think it's the health check? What's the output of:
kubectl get deploy cerbos -n cerbos-dev
d
Well, I don't think it is the health check, but it does seem like the readiness check fails once (as seen in events) and the helm chart stays stuck in 'pending-install' even though the workload is healthy. It's more like ~ 'why does the helm chart think the readiness probe is failing, or why does it fail for the helm chart?'
c
The Helm chart doesn't do a health check. It waits for all the deployed resources to become available. Try running Helm with verbose logging to see if that gives you a clue as to why it gets stuck in the pending-install state.
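For example, reusing the flags from the original command (the --wait/--timeout values here are just a suggestion):
helm upgrade --install cerbos cerbos/cerbos --namespace cerbos-dev --version=0.29.0 --values=./cerbos_config.yaml --kubeconfig /tmp/kube_config.yaml --debug --wait --timeout 5m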
d
I just ran it with --wait - turns out the service account I used to run 'helm ...' needed more verbs+resources, but it's still stuck in 'pending-install' - will dump updates here
ok got it fixed - here are some observations:
• helm issue: helm as a CLI has a race condition - on the client end one may see 'STATUS: deployed' while in reality, on the server side, it may still get stuck
• permissions and service accounts: one needs to run helm ... with an account that has sufficient permissions. Because I was using some automation with service account creds rather than running helm myself, helm got stuck (silently) in pending-install due to an insufficient set of verbs+resources associated with the account running the helm command, rather than raising an error complaining that ~ "service account X doesn't get to perform Y on resource Z" (see the check below)
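A quick way to verify what the automation's service account is actually allowed to do (the service account name helm-deployer here is a placeholder) is kubectl auth can-i with impersonation; Helm 3 stores release state in a Secret by default, so creating secrets is one of the verbs it needs alongside the chart's own resources:
# Check a few of the verbs Helm needs, impersonating the automation's service account
kubectl auth can-i create secrets -n cerbos-dev --as=system:serviceaccount:cerbos-dev:helm-deployer
kubectl auth can-i create deployments -n cerbos-dev --as=system:serviceaccount:cerbos-dev:helm-deployer
kubectl auth can-i list pods -n cerbos-dev --as=system:serviceaccount:cerbos-dev:helm-deployer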
but all good now
anyway, short version - a helm + service account issue, unrelated to Cerbos itself