# help
hi cerbos running into some odd helm chart deployment issues - details in thread
so while I can run the helm command ok
helm upgrade --install cerbos cerbos/cerbos --namespace cerbos-dev --version=0.29.0 --values=./cerbos_config.yaml --kubeconfig /tmp/kube_config.yaml
  shell: sh -e {0}
"cerbos" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "cerbos" chart repository
Update Complete. ⎈Happy Helming!⎈
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /tmp/kube_config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /tmp/kube_config.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /tmp/kube_config.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /tmp/kube_config.yaml
Release "cerbos" does not exist. Installing it now.
NAME: cerbos
LAST DEPLOYED: Wed Jul 26 17:23:13 2023
NAMESPACE: cerbos-dev
STATUS: deployed
You have successfully deployed Cerbos.
however. . . the app is stuck in ‘pending-install’ state
or more to the point
dmeyerson@C02G73VMMD6P cerbos-ABAC % helm status cerbos -n cerbos-dev 
NAME: cerbos
LAST DEPLOYED: Wed Jul 26 17:23:13 2023
NAMESPACE: cerbos-dev
STATUS: pending-install
You have successfully deployed Cerbos.
i do notice this at deploy time - might this keep the helm chart stuck in ‘pending-install’
50s         Warning   Unhealthy           pod/cerbos-996bd55cb-dtbws    Readiness probe failed: Get "<>": dial tcp connect: connection refused
That warning is normal while the pods are starting. Are they healthy now?
kubectl get deploy cerbos -n cerbos-dev
I think getting stuck on 'pending-install' is a known issue with Helm. You can try doing a helm rollback and rolling back to the previous release to "reset" the state, or just simply uninstall and reinstall the chart.
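A minimal sketch of the "reset" suggested above, assuming the release and namespace names from this thread. The helper name `reset_stuck_release` is hypothetical. On a first install there is likely no earlier revision to roll back to, so this sketch prints the history and uninstalls rather than calling `helm rollback`.

```shell
# Hypothetical helper: uninstall a release only if Helm reports a pending-* state.
reset_stuck_release() {
  release="$1"
  namespace="$2"
  # Read the STATUS line that `helm status` prints.
  status="$(helm status "$release" --namespace "$namespace" | awk '/^STATUS:/ {print $2}')"
  case "$status" in
    pending-*)
      # Show the revisions Helm has recorded, then remove the release.
      helm history "$release" --namespace "$namespace"
      helm uninstall "$release" --namespace "$namespace"
      ;;
    *)
      echo "release $release is in state: ${status:-unknown}, leaving it alone"
      ;;
  esac
}

# Usage: reset_stuck_release cerbos cerbos-dev
```

After this, the original `helm upgrade --install` command can be run again.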
I uninstalled and reinstalled a few times w/ same end result
I will try to hit the _cerbos/health endpoint manually
is there a way to set a readiness probe timeout value? I don’t see anything to control the readiness probe config in the helm chart schema - https://artifacthub.io/packages/helm/cerbos/cerbos?modal=values
6m8s        Warning   Unhealthy           pod/cerbos-5859d99ff8-8r9t8    Readiness probe failed: Get "<>": dial tcp connect: connection refused
If the pod is not ready for so long, then that suggests something wrong with the config that prevents Cerbos from starting. Please check the logs of the pod.
what’s odd is that I can kubectl port-forward and then curl the pod directly . . seems to be up and ‘SERVING’
dmeyerson@my_laptop cerbos-ABAC % curl http://localhost:3592/_cerbos/health
here are the pod logs 😕 - seems ok
%  kubectl logs cerbos-5859d99ff8-8r9t8  -n cerbos-dev
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.242Z","log.logger":"cerbos.server","message":"maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.243Z","log.logger":"cerbos.server","message":"Loading configuration from /config/config.yaml"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.244Z","log.logger":"cerbos.git.store","message":"Cloning git repo from https://git.viasat.com/OPS-ML-Engineering/cerbos-ABAC.git","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.581Z","log.logger":"cerbos.git.store","message":"Opening git repo","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.592Z","log.logger":"cerbos.index","message":"Found 2 executable policies"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.593Z","log.logger":"cerbos.telemetry","message":"Telemetry disabled"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.593Z","log.logger":"cerbos.git.store","message":"Polling for updates every 1m0s","dir":"/work"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.595Z","log.logger":"cerbos.grpc","message":"Starting gRPC server at :3593"}
{"log.level":"info","@timestamp":"2023-07-27T16:09:20.598Z","log.logger":"cerbos.http","message":"Starting HTTP server at :3592"}
So what makes you think it's the health check? What's the output of
kubectl get deploy cerbos -n cerbos-dev
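One way to turn that check into a yes/no answer is sketched below. It compares the ready replica count with the desired count using a `kubectl` JSONPath query. The helper name `deployment_ready` is hypothetical, and the sketch treats a missing ready count as zero.

```shell
# Hypothetical helper: succeed only when every desired replica is ready.
deployment_ready() {
  # Prints "<ready>/<desired>"; <ready> is empty when no replica is ready yet.
  counts="$(kubectl get deploy "$1" --namespace "$2" \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')"
  ready="${counts%%/*}"
  wanted="${counts##*/}"
  [ -n "$wanted" ] && [ "${ready:-0}" = "$wanted" ]
}

# Usage: deployment_ready cerbos cerbos-dev && echo "deployment is ready"
```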
well I don’t think it is the health check, but it does seem like the readiness check fails once (as seen in events) and that the helm chart is stuck in ‘pending-install’ even though the workload is healthy - it’s more like ~ ‘why does the helm chart think the readiness probe is failing, or why does it fail for the helm chart?’
The Helm chart doesn't do a health check. It waits for all the deployed resources to become available. Try running Helm with verbose logging to see if that gives you a clue as to why it gets stuck in pending install state
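A sketch of what a more verbose run could look like, reusing the install command from earlier in this thread. `--wait` makes Helm block until the resources are ready or the timeout expires, and `--debug` prints verbose output. The wrapper name `install_cerbos_verbose` is hypothetical, and the 5m timeout is an assumed value (it matches Helm's default).

```shell
# Hypothetical wrapper around the install command from this thread,
# with --wait, --timeout and --debug added.
install_cerbos_verbose() {
  helm upgrade --install cerbos cerbos/cerbos \
    --namespace cerbos-dev \
    --version=0.29.0 \
    --values=./cerbos_config.yaml \
    --kubeconfig /tmp/kube_config.yaml \
    --wait \
    --timeout 5m \
    --debug
}

# Usage: install_cerbos_verbose
```

With `--wait`, a release that never becomes ready should end with an error on the client instead of reporting success right away.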
i just ran it w/ --wait - turns out the service account I used to run ‘helm . . .’ needed more verbs+resources, but it’s still stuck in ‘pending-install’ - will dump updates here
ok got it fixed - here are some observations
• helm issue: helm as a cli has a race condition - so while on the client end one may see 'STATUS: deployed', in reality on the server side it may yet get stuck
• permissions and service accounts: one needs to run helm ... with an account having sufficient permissions. Because I was using some automation w/ service account creds rather than just running it myself, helm got stuck (silently) in 'pending-install' because of an insufficient set of verbs+resources associated w/ the account running the helm command, rather than raising an error complaining that ~ “service account X doesn’t get to perform Y on resource Z”
but all good now
anyway short version - helm + service account issue unrelated to cerbos itself
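For anyone hitting the same thing, here is a rough sketch of checking a service account's permissions up front with `kubectl auth can-i`. The helper name `check_helm_permissions` and the account name `ci-deployer` are hypothetical. The resource and verb lists are assumptions about what a chart install touches, not taken from the Cerbos chart. Secrets are included because Helm 3 stores release records as Secrets by default. The sketch also assumes the service account lives in the release namespace.

```shell
# Hypothetical helper: list verb/resource pairs the service account lacks.
check_helm_permissions() {
  namespace="$1"
  account="$2"
  # Assumption: the service account lives in the same namespace as the release.
  subject="system:serviceaccount:${namespace}:${account}"
  missing=0
  # Assumed resource list; adjust it to what the chart actually renders.
  for resource in secrets configmaps services serviceaccounts deployments; do
    for verb in get list create update delete; do
      # `kubectl auth can-i` prints yes or no and exits non-zero on no.
      answer="$(kubectl auth can-i "$verb" "$resource" \
        --namespace "$namespace" --as "$subject" 2>/dev/null || true)"
      if [ "$answer" != "yes" ]; then
        echo "missing: $verb $resource"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Usage: check_helm_permissions cerbos-dev ci-deployer
```

The caller needs impersonation rights for `--as` to work. Running this before the automated install should surface the "service account X doesn’t get to perform Y on resource Z" gaps that Helm did not report.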