The Hidden Gotcha in NKP 2.16: No NFS on Rocky Linux Workers

While testing Nutanix AI 2.5 on NKP 2.16, I found that NFS fails on Rocky Linux CIS workers but works fine on Ubuntu CIS. Here’s my technical analysis.

When Nutanix released NKP 2.16 with Kubernetes 1.33.2, I adopted it right away, but I didn’t have the chance to test Nutanix Enterprise AI 2.4.
So when I finally needed to deploy it, version 2.5 was already out.
That’s why I only discovered this behavior now and not earlier.


Everything looked solid from the start.
The CIS images deployed cleanly, the control planes and worker nodes came up without issues, and all components appeared healthy.
It was only when I ran a quick validation test with Nutanix Files, creating a simple PVC, that I noticed something unusual.

The volume never got bound.
After a few checks, it became clear that Rocky Linux CIS no longer supports NFS mounts by default, which means any workload relying on Nutanix Files will fail to attach RWX volumes unless you switch to Ubuntu CIS or build your own image.


Background: What Changed in NKP 2.16

Starting with NKP 2.16, Nutanix moved to a simplified image catalog.
Instead of four different OS images, only two are now provided, both based on CIS-hardened builds of Ubuntu and Rocky Linux.
This change makes it easier to maintain compliance and consistency across clusters, but it also removes the non-CIS variants that previously included broader protocol support.

The CIS images are locked down by design.
They disable unused services, restrict kernel modules, and minimize the attack surface.
That’s great for security posture and compliance audits, but sometimes it hides small behavioral differences that only surface under specific workloads, like AI inference pipelines that depend on shared file systems.


Verifying NFS Connectivity on NKP 2.16

The issue first appeared while running the same basic test suggested in the Nutanix Enterprise AI 2.5 documentation:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rwx-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nai-nfs-storage
EOF

On a healthy cluster, kubectl get pvc test-rwx-pvc shows the claim in Bound state:

NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
test-rwx-pvc   Bound    pvc-0df5145a-590c-4f28-9f9d-cefefc55266c   1Gi        RWX            nai-nfs-storage   5s

Instead, my test remained stuck in Pending, even after several retries:

NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
test-rwx-pvc   Pending                                      nai-nfs-storage   24s
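When a PVC hangs in Pending like this, the events attached to the claim usually point at the failing step. A quick way to check (the exact event text varies by CSI driver, so treat the output as illustrative):

```shell
# Show the stuck claim, including its Events section, where a
# provisioning or mount error for an NFS-backed RWX volume typically appears
kubectl describe pvc test-rwx-pvc -n default

# Cluster-wide view of recent warnings, in case the failure is
# reported on the pod or provisioner side instead
kubectl get events -n default --field-selector type=Warning --sort-by=.lastTimestamp
```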

Everything else in the cluster looked normal.
All nodes were Ready, running Rocky Linux 9.6 (Blue Onyx) with kernel 5.14.0-570.18.1.el9_6.x86_64 and container runtime containerd 1.7.27-d2iq.1.

To rule out a storage or configuration issue, I redeployed the same setup using Ubuntu 22.04.5 LTS CIS.
This time, the PVC was created successfully, and the NFS mount completed instantly.
That single test confirmed the problem wasn’t in NKP, NAI, or Nutanix Files; it was in the OS layer.


What’s Missing in Rocky CIS

Digging deeper on a Rocky CIS worker node made it clear.
The NFS packages and binaries are there, but the essential RPC binding layer is completely disabled.

rpm -qa | grep -E 'nfs|rpcbind'
libnfsidmap-2.5.4-34.el9.x86_64
nfs-utils-2.5.4-34.el9.x86_64
rpcbind-1.2.6-7.el9.x86_64

systemctl list-unit-files | grep rpcbind
rpcbind.service    masked    enabled
rpcbind.socket     masked    enabled

The client tools exist, yet the services required for NFS negotiation are masked at systemd level.
Even a privileged user can’t start them without unmasking them first, and doing so would break the CIS Level 2 hardening baseline.
In practice, NFS mounts silently fail and any PVC using an RWX storage class stays in Pending forever.
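You can see the masking in action directly on the node; a sketch of what this looks like (output abbreviated and approximate):

```shell
# systemd refuses to start a masked unit, even as root
sudo systemctl start rpcbind.socket
# Failed to start rpcbind.socket: Unit rpcbind.socket is masked.

# A mask is simply a symlink to /dev/null in place of the unit file
ls -l /etc/systemd/system/rpcbind.service
# lrwxrwxrwx ... /etc/systemd/system/rpcbind.service -> /dev/null
```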


Comparing Ubuntu CIS and Rocky CIS

Ubuntu 22.04.5 CIS

  • NFS and RPC packages installed
  • RPC services available but disabled by default
  • NFS mounts functional
  • Nutanix Files works out of the box

Rocky Linux 9.6 CIS

  • NFS and RPC packages installed
  • RPC services masked and cannot start
  • NFS mounts blocked
  • PVCs remain stuck in Pending

Why It Happens

Both OS images are fully CIS-compliant, but they enforce different philosophies regarding exposed network services.

Ubuntu CIS uses a secure but available model.
The NFS client stack (rpcbind, nfs-common) is present and inactive until explicitly needed.
This allows NFS-based workloads such as Nutanix Files or Enterprise AI to work seamlessly without breaching compliance.

Rocky Linux CIS applies a stricter secure by exclusion model.
Services like rpcbind and nfsd are masked at the system level, meaning even if the binaries exist, the system refuses to start them unless the administrator unmasks them manually.
It’s a deliberate CIS choice to reduce the attack surface.

Both are valid implementations, yet their behavior diverges in production scenarios where distributed workloads rely on network storage.


Key takeaway

Ubuntu CIS favors flexibility and operational readiness, while Rocky CIS enforces stricter isolation.
If your workloads need to access Nutanix Files, Ubuntu CIS will work out of the box.
Rocky CIS will not, unless you customize the image or relax hardening.


Practical Check

If you’re unsure which image your NKP cluster is using, or if your RWX volumes are stalling, here’s a quick way to confirm whether NFS support is active.

Run these commands on a worker node (no root privileges required):

echo "=== NFS check ==="
cat /proc/filesystems | grep nfs || echo "No NFS support"
systemctl list-unit-files | grep rpcbind | head -n 5
rpcinfo -p 2>/dev/null || echo "rpcbind not responding"

echo "=== SMB/CIFS check ==="
cat /proc/filesystems | grep cifs || echo "No CIFS support"
which mount.cifs || echo "mount.cifs not found"

If you see “rpcbind.service masked” or “No NFS support,” your worker node cannot mount Nutanix Files volumes.
Switching to Ubuntu CIS or creating a custom Rocky image will resolve the issue immediately.


How to Make NFS Work Again

If your workloads depend on Nutanix Files (NFS), there are two reliable paths forward.

1. Switch to Ubuntu CIS images
Ubuntu CIS keeps the NFS stack disabled but available, allowing RWX volumes to function as intended without breaking compliance.
You can verify compatible images in your NKP image catalog or refer to the Nutanix Kubernetes Platform documentation.

2. Build a custom image
If you need to stay on Rocky CIS, you can create a custom OS image where rpcbind is unmasked and NFS kernel modules are verified.
Follow the Bring Your Own Image guide for build, signing, and validation details.
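For the custom-image route, the core change itself is small. A minimal sketch of what the image build (or a pre-bootstrap script) would need to do, assuming you accept and document the deviation from the CIS baseline:

```shell
# Re-enable the RPC binding layer that the CIS hardening masked
sudo systemctl unmask rpcbind.service rpcbind.socket
sudo systemctl enable --now rpcbind.socket

# Verify the NFS client path end to end
rpcinfo -p localhost              # the portmapper should now answer
sudo modprobe nfs
grep nfs /proc/filesystems        # NFS filesystem support should be listed
```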

Both approaches restore NFS functionality, but the first one aligns with Nutanix’s standard workflow and keeps future upgrades simpler.


Recommendation for Cluster Operators

For mixed environments where both CIS profiles are in use, it’s a good idea to:

  • Verify NFS functionality as part of post-upgrade validation, especially when updating to new NKP or CIS releases.
  • Standardize worker node images to reduce behavioral drift between Ubuntu and Rocky pools.
  • Use storage-class labels to direct RWX workloads to Ubuntu nodes automatically if mixed OS pools are unavoidable.
  • Document any custom image adjustments to maintain compliance visibility during audits.
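For the mixed-pool scenario, one workable pattern is a node label plus a nodeSelector on NFS-dependent workloads. A sketch, assuming a hypothetical label key (node-os=ubuntu-cis is my example, not an NKP convention), reusing the test PVC from earlier:

```shell
# Label each Ubuntu CIS worker so RWX consumers can be pinned to it
kubectl label node ubuntu-worker-0 node-os=ubuntu-cis

# Pin an NFS-dependent workload to those nodes via nodeSelector
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwx-consumer
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rwx-consumer
  template:
    metadata:
      labels:
        app: rwx-consumer
    spec:
      nodeSelector:
        node-os: ubuntu-cis
      containers:
        - name: app
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: test-rwx-pvc
EOF
```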

A few minutes of validation upfront can prevent hours of debugging later, especially when your AI or analytics pipelines silently stall waiting for NFS mounts that never complete.


Understanding the Difference

While both CIS-hardened images are valid from a compliance standpoint, their philosophies differ.
Ubuntu aims for a balance between security and usability, ensuring core client services can be activated when needed.
Rocky takes a stricter approach that prioritizes defense-in-depth and predictable immutability, even at the expense of some functionality.

Neither model is wrong.
But when deploying workloads that depend on shared NFS storage, this small difference can have a big operational impact.
In my case, it only surfaced when testing Nutanix AI 2.5, months after NKP 2.16 was released: a reminder that even small configuration shifts at the OS level can ripple up through the stack.