Why Inference Is Becoming a Control Plane Problem

The real news in the latest enterprise AI release is not the models. It is who governs the tokens.

The hard problem in enterprise AI is no longer getting a model to run. Any team with a GPU and a Helm chart can serve an LLM today. The hard problem is governing who consumes it, at what cost, through which provider, and with what guarantees when something fails. That is a control plane problem, the same class of problem I wrote about in "Nutanix Is Making Service Providers a First-Class Tenant", and the latest Nutanix Enterprise AI release moves it to the center of the product.

NAI 2.7 shipped on May 27, 2026. The headline feature is the general availability of AI Gateway mode, and in my reading it is the only headline that matters. New validated models arrive with every release. A governance layer for AI consumption does not.

What Was Actually Announced

Stripped of the announcement language, NAI 2.7 delivers five things. AI Gateway mode reaches GA and is enabled automatically, both on fresh installs and on upgrades from 2.6 standard mode. Unified endpoints can now be backed by external providers, with Anthropic, AWS Bedrock, Azure OpenAI, Cohere, GCP Vertex AI, Google Gemini, Mistral, and OpenAI supported alongside local models. Batch inferencing lands for large-volume jobs such as embeddings and summarization. Palo Alto Networks Prisma AIRS integration adds model scanning and endpoint security under the ML Admin role. And local MCP servers can now be deployed inside the NAI cluster, using SSE transport and the 2025-06-18 revision of the MCP specification.

Honesty about availability matters here, because not everything in the release is GA. KV cache aware routing, remote and local MCP server access, and model fine-tuning are tech preview. The documentation also flags rate limit management for unified endpoints as tech preview, which is worth knowing before you design a chargeback model around it. Plan with these labels in mind: tech preview features are for validation, not for tenant-facing commitments.

The Gateway Is a Consumption Boundary

What practitioners build by hand today is exactly what the gateway formalizes. Anyone running inference for multiple internal teams or external customers has assembled some combination of a reverse proxy, an open source LLM router, custom rate limiting, and a spreadsheet of provider API keys. Every one of those components is undifferentiated plumbing, and every one of them is a boundary you now own and patch.

The gateway model replaces that assembly with three primitives. A unified endpoint gives applications one OpenAI-compatible API address backed by multiple inference services, even across clusters, with load balancing or an explicit fallback order when a backend degrades. Token-based rate limits apply globally per endpoint and per API client key, with up to 50 keys per unified endpoint, which is the raw material for per-tenant budgets and consumption reporting. And provider credentials live in the platform, not in the tenant's application, so a customer consumes Anthropic or Bedrock capacity through your governed path without ever holding the upstream key.
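To make the first two primitives concrete, here is a minimal sketch of an explicit fallback order combined with a per-key token budget. It is a toy model and not the NAI API: the class UnifiedEndpoint, the exception BackendUnavailable, and the backend functions are all hypothetical names chosen for illustration. Only the cap of 50 keys per unified endpoint comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MAX_KEYS_PER_ENDPOINT = 50  # figure quoted in the article


class BackendUnavailable(Exception):
    """Raised by a backend callable when it cannot serve the request."""


@dataclass
class UnifiedEndpoint:
    # Backends in explicit fallback order: (name, callable taking a prompt).
    backends: List[Tuple[str, Callable[[str], str]]]
    # Hypothetical bookkeeping: token limit and usage per API client key.
    key_limits: Dict[str, int] = field(default_factory=dict)
    used: Dict[str, int] = field(default_factory=dict)

    def add_key(self, key: str, token_limit: int) -> None:
        if key not in self.key_limits and len(self.key_limits) >= MAX_KEYS_PER_ENDPOINT:
            raise ValueError("key limit per unified endpoint reached")
        self.key_limits[key] = token_limit
        self.used.setdefault(key, 0)

    def complete(self, key: str, prompt: str, tokens: int) -> str:
        if key not in self.key_limits:
            raise PermissionError("unknown API key")
        if self.used[key] + tokens > self.key_limits[key]:
            raise RuntimeError("token budget exceeded for this key")
        for name, call in self.backends:
            try:
                reply = call(prompt)
            except BackendUnavailable:
                continue  # try the next backend in the fallback order
            self.used[key] += tokens  # charge only for served requests
            return f"{name}: {reply}"
        raise BackendUnavailable("all backends failed")


def local_model(prompt: str) -> str:
    raise BackendUnavailable("simulated: local backend is degraded")


def external_provider(prompt: str) -> str:
    return prompt.upper()


endpoint = UnifiedEndpoint(
    backends=[("local", local_model), ("external", external_provider)]
)
endpoint.add_key("team-a", token_limit=100)
result = endpoint.complete("team-a", "hello", tokens=60)
print(result)
```

The application only ever sees one address and one key. Which backend answered, and whether the budget was already spent, is decided behind that boundary.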

For an MSP this is the difference between reselling GPU hours and operating an AI service. The reference sizing in the documentation, validated at 50 inference endpoints with 150 API keys each, tells you Nutanix is thinking about exactly this consumption pattern.

One distinction worth stating plainly: a consumption boundary is not tenancy. NAI has no tenant construct today. API keys meter and limit, but the mapping between keys, endpoints, and customers is a design discipline you bring, the same way MSPs have built tenancy on Prism Central with Projects, categories, and RBAC while waiting for SP Central. The gateway gives you the primitives a multi-tenant AI service needs. The tenant model on top of them is still yours to architect, and that is precisely where a provider's value lives until the platform closes the gap.
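The design discipline described here can be as small as a mapping table that the provider owns. The sketch below assumes usage arrives as (endpoint, key id, tokens) records; that record shape, the KEY_OWNERS table, and the tenant names are hypothetical, not something the platform exports in this form.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical mapping maintained by the provider, not by the platform:
# (endpoint name, API key id) -> tenant
KEY_OWNERS: Dict[Tuple[str, str], str] = {
    ("chat-prod", "key-01"): "acme",
    ("chat-prod", "key-02"): "globex",
    ("embed-prod", "key-01"): "acme",
}


def tokens_per_tenant(usage: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Roll up (endpoint, key id, tokens) records into per-tenant totals."""
    totals: Dict[str, int] = defaultdict(int)
    for endpoint, key_id, tokens in usage:
        # Keys without an owner are surfaced, not silently dropped.
        tenant = KEY_OWNERS.get((endpoint, key_id), "UNMAPPED")
        totals[tenant] += tokens
    return dict(totals)


usage_records = [
    ("chat-prod", "key-01", 1200),
    ("chat-prod", "key-02", 300),
    ("embed-prod", "key-01", 800),
    ("chat-prod", "key-99", 50),  # a key nobody registered
]
report = tokens_per_tenant(usage_records)
print(report)
```

Keeping an explicit UNMAPPED bucket is the useful part: any consumption that lands there is a gap in your tenant model, and you want to see it before the invoice run does.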

The Sovereign Angle Is Structural, Not Marketing

NAI runs on any CNCF-compliant Kubernetes distribution and explicitly supports dark site and air-gapped deployments. The minimum stack for this release is Kubernetes 1.33 or 1.34, NKP 2.17.1 for Nutanix-based deployments, and Envoy Gateway 1.7.0. That deployment flexibility produces an architecture European providers should pay attention to: local models served on sovereign infrastructure, on-premises or on NC2 capacity in a European datacenter, with the same gateway providing governed, rate-limited, auditable egress to hyperscaler model providers when a workload genuinely needs them.

Data residency for the default path, controlled and observable exceptions for the rest. That is a far more defensible answer to a European customer's sovereignty requirements than either extreme of "everything stays local" or "everything goes to a US API".
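As an illustration of "local by default, governed exceptions", the following sketch routes a request to an external provider only when both the tenant and the data classification are explicitly allowed, and records every external attempt. The data classes, the allow-lists, and the function choose_backend are assumptions made for this example. They are not NAI configuration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Request:
    tenant: str
    data_class: str       # hypothetical labels: "public", "internal", "personal"
    wants_external: bool  # the workload asks for a hyperscaler-hosted model


# Hypothetical policy: which tenants and data classes may leave the sovereign path.
EXTERNAL_ALLOWED_TENANTS: Set[str] = {"acme"}
EXTERNAL_ALLOWED_CLASSES: Set[str] = {"public"}

audit_log: List[str] = []


def choose_backend(req: Request) -> str:
    """Return "local" unless every condition for an exception is met."""
    allowed = (
        req.wants_external
        and req.tenant in EXTERNAL_ALLOWED_TENANTS
        and req.data_class in EXTERNAL_ALLOWED_CLASSES
    )
    backend = "external" if allowed else "local"
    if req.wants_external:
        # Log granted and denied exceptions alike, so egress stays observable.
        audit_log.append(f"{req.tenant} {req.data_class} -> {backend}")
    return backend


granted = choose_backend(Request("acme", "public", True))
denied = choose_backend(Request("acme", "personal", True))
default = choose_backend(Request("globex", "internal", False))
print(granted, denied, default)
```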

How to Arrive Ready

A few practical points deserve attention before this lands in production. The upgrade path is narrow: only NAI 2.6 standard mode upgrades to 2.7, and early access mode deployments do not. Gateway mode arrives enabled by default on upgrade, so review your endpoint and API key layout before the upgrade window rather than after. Check the platform floor honestly, because NKP 2.17.1, which ships Kubernetes 1.34, is a recent release and many environments will need a platform upgrade cycle first; I covered what that upgrade changes on the networking side in a previous article. On non-Nutanix CNCF distributions the floor is Kubernetes 1.33. And treat the GPU matrix as a procurement input: L40S, A100, H100 variants, H200, and the RTX PRO 6000 Blackwell server edition are the supported options.
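The version checks above are easy to turn into a preflight script. The sketch below encodes only the figures quoted in this article and assumes that versions newer than the stated NKP and Envoy Gateway minimums are acceptable; confirm that against the official compatibility matrix. The inventory dictionary and its keys are hypothetical, and the parser only handles plain dotted versions such as "2.17.1".

```python
from typing import Dict, List, Tuple

SUPPORTED_K8S_MINORS = {(1, 33), (1, 34)}  # from the article
MIN_NKP = (2, 17, 1)                       # from the article
MIN_ENVOY_GATEWAY = (1, 7, 0)              # from the article


def parse(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version; no "v" prefix or build suffix."""
    return tuple(int(part) for part in version.split("."))


def preflight(env: Dict[str, str]) -> List[str]:
    """Return a list of blocking problems; an empty list means ready."""
    problems: List[str] = []
    if parse(env["nai"])[:2] != (2, 6) or env["nai_mode"] != "standard":
        problems.append("only NAI 2.6 standard mode upgrades to 2.7")
    if parse(env["kubernetes"])[:2] not in SUPPORTED_K8S_MINORS:
        problems.append("Kubernetes must be 1.33 or 1.34")
    # NKP applies to Nutanix-based deployments only, so the key is optional.
    if "nkp" in env and parse(env["nkp"]) < MIN_NKP:
        problems.append("NKP must be at least 2.17.1")
    if parse(env["envoy_gateway"]) < MIN_ENVOY_GATEWAY:
        problems.append("Envoy Gateway must be at least 1.7.0")
    return problems


inventory = {
    "nai": "2.6.0",
    "nai_mode": "standard",
    "kubernetes": "1.32.4",
    "nkp": "2.16.0",
    "envoy_gateway": "1.7.0",
}
problems = preflight(inventory)
for problem in problems:
    print("BLOCKER:", problem)
```

Running this against each cluster before the upgrade window turns "check the platform floor" into a list of named blockers.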

The strategic takeaway echoes a principle I keep returning to on the infrastructure side: control planes, not capacity, are where platform value concentrates. Inference capacity is already a commodity. The layer that meters it, secures it, and routes it per tenant is not, and with this release that layer ships as a product instead of a weekend project.