The NKP Certificate Catch-22
What happens when the only tool that can fix your credentials needs those same credentials to run.
Nutanix Kubernetes Platform handles control plane certificate lifecycle through automated renewal mechanisms built into the Cluster API workflow. When properly configured, certificates renew themselves well before expiration and administrators never need to think about them. But when renewal is not configured and the kubeconfig credentials expire before anyone notices, the result is a quiet lockout that can escalate quickly.
This article walks through a real recovery scenario where the management cluster was fully operational but completely inaccessible from the CLI. No SSH, no valid kubeconfig, no bootstrap context. Two recovery paths exist depending on the environment, and both go through the Kommander UI.
How the Lockout Happens
When you create an NKP cluster, the CLI generates a kubeconfig file containing a client certificate signed by the cluster's certificate authority. This certificate has its own expiration, independent of the control plane certificates that secure internal Kubernetes communication. By default, NKP sets the control plane certificate expiry to 180 days, but the client certificate embedded in the kubeconfig follows standard Kubernetes behavior and typically expires after one year.
The distinction matters. The control plane can be perfectly healthy. The API server responds, pods run, the Kommander dashboard loads without issues. But if the client certificate in your kubeconfig has expired, every kubectl and nkp command fails with a credentials error. The cluster is not broken. You simply cannot prove who you are.
In this case, the kubeconfig was generated during the initial cluster bootstrap and never refreshed. Twelve months later, the certificate expired and the bastion host lost all CLI access to the management cluster.
Recognising the Problem
The first indication is usually a failed nkp command. Attempting to update the control plane or retrieve a new kubeconfig returns an authentication error:
the server has asked for the client to provide credentials
To confirm, decode the client certificate from the kubeconfig and check its validity period:
kubectl config view --kubeconfig=nkp-mgmt.conf --raw \
-o jsonpath='{.users[0].user.client-certificate-data}' \
| base64 -d | openssl x509 -noout -dates
The output shows notBefore and notAfter timestamps:
notBefore=Mar 6 14:49:30 2025 GMT
notAfter=Mar 6 14:54:31 2026 GMT
If notAfter is in the past, the certificate is expired and the kubeconfig is useless until you obtain fresh credentials.
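For scripts and health checks, openssl can answer the expired-or-not question directly with its -checkend flag, avoiding date parsing. A minimal sketch; the throwaway self-signed certificate here stands in for the real client certificate, which in practice you would pipe in decoded as above:

```shell
# Generate a throwaway self-signed cert (illustration only; in practice,
# feed the decoded client-certificate-data from the kubeconfig instead).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 1 -subj "/CN=demo" 2>/dev/null

# -checkend N exits 0 if the certificate is still valid N seconds from now
if openssl x509 -checkend 0 -noout -in /tmp/demo.crt >/dev/null; then
  echo "certificate still valid"
else
  echo "certificate expired"
fi
```

An exit-code check like this slots cleanly into monitoring, unlike eyeballing notAfter timestamps.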
Why the Obvious Recovery Paths Fail
The natural instinct is to regenerate the kubeconfig using nkp get kubeconfig, but that command itself requires a valid kubeconfig to authenticate against the management cluster. This creates a circular dependency.
SSHing into a control plane node would bypass the problem entirely. The file at /etc/kubernetes/admin.conf is maintained by kubeadm and always contains valid credentials. But if the VM image was provisioned with key-based authentication and you no longer have the private key, or if password authentication was disabled during cluster creation, console access through Prism Central may also fail.
The GRUB single-user mode trick works in theory, but in production environments with encrypted disks or restricted boot configurations, it is not always viable.
The Quick Path: Kommander Generate Token
If the Kommander UI is accessible through a browser, the fastest recovery is built right into the interface. Click on the user menu in the top right corner of the Kommander dashboard and select Generate Token. Kommander produces a complete kubeconfig with a Dex OIDC token, the cluster CA certificate, and all the kubectl commands needed to configure access. Copy the commands, paste them on the bastion host, and CLI access is restored immediately.
The generated token is valid for approximately 24 hours. That is more than enough time to regenerate a proper kubeconfig with a fresh client certificate:
nkp get kubeconfig --cluster-name nkp-mgmt > nkp-mgmt-refreshed.conf
Verify the new credentials:
kubectl config view --kubeconfig=nkp-mgmt-refreshed.conf --raw \
-o jsonpath='{.users[0].user.client-certificate-data}' \
| base64 -d | openssl x509 -noout -dates
notBefore=Mar 6 14:49:30 2025 GMT
notAfter=Mar 6 15:01:11 2027 GMT
The new certificate is valid until 2027. Problem solved. This is the short path, and it works as long as you have access to the Kommander UI and a valid user account.
The Longer Path: Recovery Through the Kubernetes Dashboard
If the Generate Token option is not available, if the Dex identity provider is misconfigured, or if the token does not grant sufficient permissions to regenerate the kubeconfig, the Kubernetes Dashboard provides a more robust alternative. The dashboard authenticates through its own service account and does not rely on the expired kubeconfig. It has full access to the Kubernetes API and can create resources in any namespace.
Step 1: Access the Dashboard
From the Kommander UI, navigate to the management cluster view and open the Kubernetes Dashboard through the Actions menu. The dashboard provides direct access to cluster resources including Service Accounts, Secrets, and Cluster Role Bindings.
Step 2: Create an Emergency Service Account
Switch the namespace selector to kube-system and use the Create button (the + icon in the top right) to apply the following manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: emergency-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: emergency-admin-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: emergency-admin
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: emergency-admin-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: emergency-admin
type: kubernetes.io/service-account-token
This creates a service account with cluster-admin privileges and a long-lived token. The Secret of type kubernetes.io/service-account-token is populated by Kubernetes with a signed JWT whose validity does not depend on any client certificate. Since Kubernetes 1.24, these token Secrets are no longer created automatically for service accounts, which is why the manifest declares one explicitly.
Step 3: Retrieve the Token
Navigate to Secrets in the kube-system namespace, open emergency-admin-token, and copy the value of the token field. The Dashboard displays Secret values already base64-decoded, so what you copy is the JWT itself, ready for use.
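As a sanity check that a copied token really identifies the emergency account, the JWT's payload segment can be decoded by hand. A hedged sketch using a fabricated sample token rather than a real one (GNU coreutils assumed):

```shell
# Build a fake JWT whose payload mimics a service-account token (sample only;
# a real token comes from the emergency-admin-token Secret).
PAYLOAD='{"iss":"kubernetes/serviceaccount","sub":"system:serviceaccount:kube-system:emergency-admin"}'
SEG=$(printf '%s' "$PAYLOAD" | base64 -w0 | tr '+/' '-_' | tr -d '=')
TOKEN="eyJhbGciOiJSUzI1NiJ9.$SEG.signature"

# Decode the payload (middle) segment, restoring the padding that JWTs strip
SEG=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#SEG} % 4 )) -ne 0 ]; do SEG="$SEG="; done
printf '%s' "$SEG" | base64 -d; echo
```

The sub claim should read system:serviceaccount:kube-system:emergency-admin; if it names a different account, the wrong Secret was copied.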
Step 4: Build a New Kubeconfig
On the bastion host, construct a kubeconfig using the service account token. TLS verification is skipped with --insecure-skip-tls-verify because the cluster CA is not conveniently at hand; this kubeconfig is a temporary credential and is deleted in Step 6:
TOKEN="<paste-token-here>"
kubectl config set-cluster nkp-mgmt \
--server=https://<control-plane-vip>:6443 \
--insecure-skip-tls-verify=true \
--kubeconfig=nkp-mgmt-new.conf
kubectl config set-credentials emergency-admin \
--token=$TOKEN \
--kubeconfig=nkp-mgmt-new.conf
kubectl config set-context emergency-admin@nkp-mgmt \
--cluster=nkp-mgmt \
--user=emergency-admin \
--kubeconfig=nkp-mgmt-new.conf
kubectl config use-context emergency-admin@nkp-mgmt \
--kubeconfig=nkp-mgmt-new.conf
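The four commands assemble a file equivalent to the following; the server address and token are the same placeholders used above:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: nkp-mgmt
  cluster:
    server: https://<control-plane-vip>:6443
    insecure-skip-tls-verify: true
users:
- name: emergency-admin
  user:
    token: <paste-token-here>
contexts:
- name: emergency-admin@nkp-mgmt
  context:
    cluster: nkp-mgmt
    user: emergency-admin
current-context: emergency-admin@nkp-mgmt
```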
Verify access:
kubectl get nodes --kubeconfig=nkp-mgmt-new.conf
If the nodes are listed, CLI access is restored.
Step 5: Regenerate a Proper Kubeconfig
The service account token works but is not the ideal long-term credential. Use the restored access to generate a fresh kubeconfig with a new client certificate:
nkp get kubeconfig --cluster-name nkp-mgmt \
--kubeconfig=nkp-mgmt-new.conf > nkp-mgmt-refreshed.conf
This produces a kubeconfig with a freshly signed client certificate. Verifying the new credentials confirms the renewal:
notBefore=Mar 6 14:49:30 2025 GMT
notAfter=Mar 6 15:01:11 2027 GMT
The new certificate is valid until 2027. CLI access is now fully restored with proper credentials.
Step 6: Clean Up
Once the refreshed kubeconfig is verified, remove the emergency resources:
kubectl delete secret emergency-admin-token -n kube-system \
--kubeconfig=nkp-mgmt-refreshed.conf
kubectl delete clusterrolebinding emergency-admin-binding \
--kubeconfig=nkp-mgmt-refreshed.conf
kubectl delete sa emergency-admin -n kube-system \
--kubeconfig=nkp-mgmt-refreshed.conf
Delete the temporary kubeconfig from the bastion:
rm nkp-mgmt-new.conf
While You Are Back In, Check the Control Plane Certificates Too
The problem described in this article is a kubeconfig credential expiry: the client certificate that proves your identity to the API server. This is separate from the control plane certificates that secure internal Kubernetes communication between etcd, the API server, the controller manager, and the scheduler. In this scenario, the control plane was perfectly healthy. But now that CLI access is restored, it is worth verifying that the internal certificates are also covered.
Kubernetes renews control plane certificates automatically during upgrades. For clusters that run for extended periods without upgrades, NKP provides two automated renewal mechanisms that handle the rotation independently.
CAPI Method (Recommended)
The Cluster API method is the preferred approach and works with all infrastructure providers except EKS and AKS. It monitors certificate expiration and triggers renewal through the CAPI reconciliation loop.
To enable on an existing cluster:
nkp update controlplane nutanix \
--cluster-name=<cluster-name> \
--control-plane-renew-certificates-before=50 \
--kubeconfig=<management-kubeconfig>
The value represents the number of days before expiration when automatic renewal begins. A value of 50 means the certificate renews 50 days before it would expire. For new clusters, the same flag can be passed at creation time with nkp create cluster.
Systemd Method
The systemd method uses a timer and service on each control plane node to periodically renew certificates. It applies to all providers except Nutanix, EKS, and AKS, and requires Python 3.5 or later on the control plane hosts.
nkp create cluster <provider> \
--certificate-renew-interval=60 \
--cluster-name=<cluster-name>
The interval is expressed in days. A value of 60 means the certificates renew every 60 days regardless of their expiration date.
Operational Habits That Prevent the Lockout Itself
Automated renewal covers the control plane, but the kubeconfig client certificate that caused this lockout is a different layer. A few practices help avoid the same situation:
Keep a record of the kubeconfig generation date and set a calendar reminder before the one-year expiry. Periodically regenerate kubeconfigs with nkp get kubeconfig as part of routine cluster maintenance. Consider maintaining a break-glass service account with a long-lived token stored securely for emergency access. The token does not expire with certificate rotation, making it a reliable fallback when everything else fails.
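The calendar-reminder habit can also be scripted. A cron-able sketch; the short-lived demo certificate below stands in for the kubeconfig's client certificate, which in practice you would extract with the openssl pipeline shown earlier:

```shell
# Warn when a certificate has fewer than WARN_DAYS days of validity left.
# The demo cert (10 days) stands in for the real kubeconfig client cert.
WARN_DAYS=30

openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/kc.key \
  -out /tmp/kc.crt -days 10 -subj "/CN=kubernetes-admin" 2>/dev/null

if openssl x509 -checkend $(( WARN_DAYS * 86400 )) -noout -in /tmp/kc.crt >/dev/null; then
  echo "kubeconfig certificate OK: more than ${WARN_DAYS} days remaining"
else
  echo "WARNING: kubeconfig certificate expires within ${WARN_DAYS} days; run 'nkp get kubeconfig'"
fi
```

Wired into cron or a monitoring check, this turns the silent one-year expiry into a loud warning a month in advance.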
Final Thoughts
Certificate expiry on an NKP management cluster is not a catastrophic failure. The cluster continues to run, workloads are unaffected, and the Kommander dashboard remains accessible. But losing CLI access to the management cluster blocks all lifecycle operations on every workload cluster it manages. The quick path through Kommander's Generate Token resolves the situation in minutes. The Kubernetes Dashboard fallback covers the edge cases where that is not enough. Either way, the better path is making sure you never need them.