Upgrading a Kubernetes cluster has traditionally been a one-way process: you move forward, and if something goes wrong with the control plane, your only option is to roll forward with a fix. This makes routine maintenance significantly riskier, and the problem is compounded as organizations upgrade more frequently to access new AI features while still requiring maximum reliability.
🫣 The challenge: Why were rollbacks so hard?
The control plane components of Kubernetes, particularly the kube-apiserver and etcd, are stateful and highly sensitive to changes in API versions. When you upgrade, the new binary adds many new APIs and features, and some stored data may be migrated to new formats and API versions. Downgrading is not supported because there is no safe way to revert those changes: an older binary may not understand data written by the newer one, which risks data corruption and cluster failure.
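To make the downgrade problem concrete, here is a minimal toy model in Python. It is only a sketch: the `Binary` and `StoredObject` classes and the version strings `v1beta1` and `v1` are hypothetical and do not correspond to real Kubernetes code. It assumes that each binary can decode only the API versions it knows about and persists new objects in its preferred storage version.

```python
# Toy model only: these classes are hypothetical, not Kubernetes code.
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    name: str
    api_version: str  # the version the object was serialized in


class Binary:
    """A control plane binary that understands a fixed set of API versions."""

    def __init__(self, release, known_versions, storage_version):
        self.release = release
        self.known_versions = set(known_versions)
        self.storage_version = storage_version

    def write(self, name):
        # New writes are persisted in this binary's preferred storage version.
        return StoredObject(name, self.storage_version)

    def read(self, obj):
        # A binary can only decode versions it was built to understand.
        if obj.api_version not in self.known_versions:
            raise ValueError(
                f"{self.release} cannot decode {obj.name} "
                f"stored as {obj.api_version}"
            )
        return obj.name


old = Binary("old", {"v1beta1"}, "v1beta1")
new = Binary("new", {"v1beta1", "v1"}, "v1")

legacy = old.write("widget-a")    # written before the upgrade
migrated = new.write("widget-b")  # written after the upgrade

# Upgrade path: the new binary still reads data written by the old one.
upgrade_ok = new.read(legacy) == "widget-a"

# Downgrade path: the old binary meets data it has never seen.
try:
    old.read(migrated)
    downgrade_ok = True
except ValueError:
    downgrade_ok = False
```

In this model the upgrade direction works and the downgrade direction fails, which is the asymmetry that made rollbacks unsafe.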
✅ The solution: Emulated versions
Kubernetes Enhancement Proposal (KEP) 4330, Compatibility Versions, introduces the concept of an "emulated version" for the control plane. Contributed by Googlers, it enables a new two-step upgrade process:
Step 1: Upgrade binaries. You upgrade the control plane binary, but the "emulated version" stays the same as the pre-upgrade version. At this stage, all APIs, features, and storage data formats remain unchanged. This makes it safe to roll back your control plane to the previously stable version if you find a problem.
Validate health and check for regressions. The first step creates a validation window during which you can verify that it's safe to proceed. For example, you can make sure your own components and workloads run healthily under the new binaries and check for performance regressions before committing to the new API versions.
Step 2: Finalize upgrade. After you complete your testing, you "bump" the emulated version to the new version. This enables all the new APIs and features of the latest Kubernetes release and completes the upgrade.
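The steps above can be sketched as a small state machine. This is an illustrative model under stated assumptions, not how Kubernetes or GKE implements the feature: the `ControlPlane` class, its method names, and the version numbers `1.33` and `1.34` are all hypothetical. In upstream Kubernetes, the emulated version is set through the kube-apiserver `--emulated-version` flag; the sketch only models the rule that a rollback is safe while the emulated version still matches the pre-upgrade version.

```python
# Illustrative model only: ControlPlane and its methods are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlPlane:
    binary_version: str
    emulated_version: str
    previous_version: Optional[str] = None

    def upgrade_binaries(self, new_version: str) -> None:
        """Step 1: swap the binary but keep emulating the old version."""
        if self.emulated_version != self.binary_version:
            raise RuntimeError("finalize or roll back the pending upgrade first")
        self.previous_version = self.binary_version
        self.binary_version = new_version
        # emulated_version is deliberately left unchanged.

    def can_roll_back(self) -> bool:
        # Safe only while APIs and storage formats still match the old release.
        return (
            self.previous_version is not None
            and self.emulated_version == self.previous_version
        )

    def roll_back(self) -> None:
        if not self.can_roll_back():
            raise RuntimeError("upgrade already finalized; rollback is not safe")
        self.binary_version = self.previous_version
        self.previous_version = None

    def finalize(self) -> None:
        """Step 2: bump the emulated version to match the new binary."""
        self.emulated_version = self.binary_version
        self.previous_version = None


cp = ControlPlane(binary_version="1.33", emulated_version="1.33")

# Step 1, then a rollback during the validation window.
cp.upgrade_binaries("1.34")
window_open = cp.can_roll_back()
cp.roll_back()
after_rollback = (cp.binary_version, cp.emulated_version)

# Step 1 again, then Step 2 closes the rollback window.
cp.upgrade_binaries("1.34")
cp.finalize()
after_finalize = (cp.binary_version, cp.emulated_version)
window_closed = not cp.can_roll_back()
```

The design choice worth noting is that the rollback check depends only on the emulated version, not on how long the new binary has been running, so the validation window stays open until you explicitly finalize.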
This two-step process gives you granular control, more observability, and a safe window for rollbacks. If an upgrade has an unexpected issue, you no longer need to scramble to roll forward. You now have a reliable way to revert to a known-good state, stabilize your cluster, and plan your next move calmly. This is all backed by comprehensive testing for the two-step upgrade in both open-source Kubernetes and GKE.
Enabling this was a major effort, and we want to thank all the Kubernetes contributors and feature owners whose collective work to test, comply, and adapt their features made this advanced capability a reality.