# Checks

If a check fails, it is reported as a finding. Each check has a remediation type: either recommended or required.

- ⚠️ Recommended: Users are encouraged to evaluate the finding, determine whether it applies to their cluster, and decide whether to act on it. Leaving the finding unremediated does not prevent the upgrade from proceeding.
- ❌ Required: A finding that must be remediated prior to upgrading in order to perform the upgrade and avoid downtime or disruption.
See the symbol table for further details on the symbols used throughout the documentation.
## Amazon

Checks that are not specific to Amazon EKS or Kubernetes.

### AWS001
🚧 Not yet implemented
⚠️ Remediation recommended
There is a sufficient quantity of IPs available for the nodes to support the upgrade.
If custom networking is enabled, the results represent the number of IPs available in the subnets used by the EC2 instances. Otherwise, the results represent the number of IPs available in the subnets used by both the EC2 instances and the pods.
### AWS002
⚠️ Remediation recommended
There is a sufficient quantity of IPs available for the pods to support the upgrade.
This check is used when custom networking is enabled, since the IPs used by pods come from different subnets than those used by the EC2 instances themselves.
### AWS003
🚧 Not yet implemented
EC2 instance service limits
### AWS004
🚧 Not yet implemented
EBS GP2 volume service limits
### AWS005
🚧 Not yet implemented
EBS GP3 volume service limits
## Amazon EKS

Checks that are specific to Amazon EKS.

### EKS001
❌ Remediation required
There are at least 2 subnets in different availability zones, each with at least 5 available IPs for the control plane to upgrade.
### EKS002
❌ Remediation required
Control plane does not have any reported health issues.
### EKS003
❌ Remediation required
EKS managed nodegroup does not have any reported health issues.
This does not include self-managed nodegroups or Fargate profiles; the AWS API does not currently support reporting health issues for those.
### EKS004
❌ Remediation required
EKS addon does not have any reported health issues.
### EKS005
❌ Remediation required
EKS addon version is within the supported range.
The addon must be updated to a version that is supported by the target Kubernetes version prior to upgrading.
⚠️ Remediation recommended
The default addon version for the target Kubernetes version is newer than the current addon version.
For example, if the default addon version of CoreDNS for Kubernetes v1.24 is `v1.8.7-eksbuild.3` and the current addon version is `v1.8.4-eksbuild.2`, then even though the current version is supported on Kubernetes v1.24, it is recommended to update the addon to `v1.8.7-eksbuild.3` during the upgrade.
### EKS006
⚠️ Remediation recommended
EKS managed nodegroups are using the latest launch template version, and there are no pending updates for the nodegroup.
Users are encouraged to evaluate whether remediation is warranted and, if so, to update to the latest launch template version prior to upgrading. If there are pending updates, the upgrade could introduce additional, unintended changes to the nodegroup.
### EKS007
⚠️ Remediation recommended
Self-managed nodegroups are using the latest launch template version, and there are no pending updates for the nodegroup.
Users are encouraged to evaluate whether remediation is warranted and, if so, to update to the latest launch template version prior to upgrading. If there are pending updates, the upgrade could introduce additional, unintended changes to the nodegroup.
## Kubernetes

Checks that are specific to Kubernetes, regardless of the underlying platform provider.

The table below shows which checks are applicable to each Kubernetes resource.
| Check  | Deployment | ReplicaSet | ReplicationController | StatefulSet | Job | CronJob | DaemonSet |
|--------|------------|------------|-----------------------|-------------|-----|---------|-----------|
| K8S001 | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
| K8S002 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| K8S003 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| K8S004 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| K8S005 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| K8S006 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| K8S007 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| K8S008 | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| K8S009 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| K8S010 | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
| K8S011 | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ |
### K8S001
❌ Remediation required
The version skew between the control plane (API Server) and the data plane (kubelet) violates the Kubernetes version skew policy, or will violate the version skew policy after the control plane has been upgraded.
The data plane nodes must be upgraded to within one minor version of the control plane in order to stay within the version skew policy throughout the upgrade; it is recommended to upgrade the data plane nodes to the same version as the control plane.
⚠️ Remediation recommended
There is a version skew between the control plane (API Server) and the data plane (kubelet).
While Kubernetes does support a version skew of n-2 between the API Server and kubelet, it is recommended to upgrade the data plane nodes to the same version as the control plane.
Kubernetes version skew policy
### K8S002
❌ Remediation required
There are at least 3 replicas specified for the resource.
Multiple replicas, along with the use of `PodDisruptionBudget`, are required to ensure high availability during the upgrade.
EKS Best Practices - Reliability
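For illustration, a minimal `Deployment` that satisfies this check might look like the following; the name, labels, and image are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app # hypothetical name
spec:
  replicas: 3 # at least 3 replicas to keep capacity available while nodes are replaced
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: public.ecr.aws/nginx/nginx:stable # hypothetical image
```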
### K8S003
❌ Remediation required
`minReadySeconds` has been set to a value greater than 0 seconds for `StatefulSet`. You can read more about why this is necessary for `StatefulSet` here.
⚠️ Remediation recommended
`minReadySeconds` has been set to a value greater than 0 seconds for `Deployment`, `ReplicaSet`, and `ReplicationController`.
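A minimal sketch of a `StatefulSet` that satisfies this check; the names and image are hypothetical:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db # hypothetical name
spec:
  serviceName: example-db # headless Service assumed to exist
  replicas: 3
  minReadySeconds: 10 # pods must be Ready for 10s before they are treated as available
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: public.ecr.aws/docker/library/postgres:15 # hypothetical image
```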
### K8S004
🚧 Not yet implemented
❌ Remediation required
At least one `PodDisruptionBudget` covers the workload, and at least one of `minAvailable` or `maxUnavailable` is set.
The Kubernetes eviction API is the preferred method for draining nodes for replacement during an upgrade. The eviction API respects `PodDisruptionBudget`s: when one is specified, pods will not be evicted if doing so would violate the budget, which ensures application availability.
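A minimal sketch of a `PodDisruptionBudget` that would satisfy this check, assuming a workload whose pods carry the hypothetical `app: example-app` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app # hypothetical name
spec:
  minAvailable: 2 # alternatively set maxUnavailable; only one of the two may be specified
  selector:
    matchLabels:
      app: example-app # must match the pod labels of the workload it protects
```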
### K8S005
❌ Remediation required
Either `.spec.affinity.podAntiAffinity` or `.spec.topologySpreadConstraints` is set to avoid scheduling multiple pods from the same workload on the same node. `topologySpreadConstraints` are preferred over anti-affinity, especially for larger clusters:
- Inter-pod affinity and anti-affinity
  > Note: Inter-pod affinity and anti-affinity require substantial amount of processing which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
Types of inter-pod affinity and anti-affinity
Pod Topology Spread Constraints
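For illustration, a pod template spec fragment using `topologySpreadConstraints` to spread a workload across nodes; the label is hypothetical:

```yaml
# Pod template spec fragment (e.g. under .spec.template.spec of a Deployment)
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname # spread pods across individual nodes
    whenUnsatisfiable: DoNotSchedule # treat the constraint as a hard requirement
    labelSelector:
      matchLabels:
        app: example-app # hypothetical label matching the workload's pods
```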
### K8S006
❌ Remediation required
A `readinessProbe` must be set to ensure traffic is not routed to pods before they are ready following their re-deployment during a node replacement.
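A minimal sketch of a container with a `readinessProbe`; the image and health endpoint are hypothetical:

```yaml
# Container fragment (under .spec.template.spec.containers)
- name: app
  image: public.ecr.aws/nginx/nginx:stable # hypothetical image
  readinessProbe:
    httpGet:
      path: /healthz # hypothetical health endpoint exposed by the application
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
```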
### K8S007
❌ Remediation required
The `StatefulSet` should not specify a `terminationGracePeriodSeconds` of `0`.
- Deployment and Scaling Guarantees
  > The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.
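For illustration, a `StatefulSet` pod template spec fragment with a safe, non-zero grace period; the image is hypothetical:

```yaml
# StatefulSet pod template spec fragment
spec:
  terminationGracePeriodSeconds: 30 # the Kubernetes default; must not be 0
  containers:
    - name: db
      image: public.ecr.aws/docker/library/postgres:15 # hypothetical image
```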
### K8S008

Pod volumes should not mount the `docker.sock` file, due to the removal of Dockershim starting in Kubernetes v1.24.
❌ Remediation required
For clusters on Kubernetes v1.23
⚠️ Remediation recommended
For clusters on Kubernetes <v1.22
Detector for Docker Socket (DDS)
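For illustration, a pod spec that mounts the Docker socket and would therefore be flagged by this check; the image is hypothetical:

```yaml
# A pod spec like this would be flagged by K8S008
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/alpine:3.17 # hypothetical image
      volumeMounts:
        - name: dockersock
          mountPath: /var/run/docker.sock
  volumes:
    - name: dockersock
      hostPath:
        path: /var/run/docker.sock # the Docker daemon socket no longer exists once Dockershim is removed
```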
### K8S009

The `PodSecurityPolicy` resource has been removed, starting in Kubernetes v1.25.
❌ Remediation required
For clusters on Kubernetes v1.24
⚠️ Remediation recommended
For clusters on Kubernetes <v1.23
Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller
PodSecurityPolicy Deprecation: Past, Present, and Future
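As a sketch of the migration target, the built-in Pod Security admission controller is configured with namespace labels rather than a cluster-scoped policy resource; the namespace name and chosen levels below are hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example # hypothetical namespace
  labels:
    # enforce the baseline policy; warn on violations of the stricter restricted policy
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
```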
### K8S010

🚧 Not yet implemented

The in-tree Amazon EBS storage provisioner is deprecated. If you are upgrading your cluster to version v1.23, then you must first install the Amazon EBS CSI driver before updating your cluster. For more information, see Amazon EBS CSI migration frequently asked questions.
❌ Remediation required
For clusters on Kubernetes v1.22
⚠️ Remediation recommended
For clusters on Kubernetes <v1.21
Amazon EBS CSI migration frequently asked questions
Kubernetes In-Tree to CSI Volume Migration Status Update
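For illustration, a `StorageClass` backed by the Amazon EBS CSI driver rather than the in-tree provisioner; the name is hypothetical:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3 # hypothetical name
provisioner: ebs.csi.aws.com # the Amazon EBS CSI driver, replacing in-tree kubernetes.io/aws-ebs
parameters:
  type: gp3
```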
### K8S011
❌ Remediation required
`kube-proxy` on an Amazon EKS cluster has the same compatibility and skew policy as Kubernetes:
- It must be the same minor version as `kubelet` on your Amazon EC2 nodes.
- It cannot be newer than the minor version of your cluster's control plane.
- Its version on your Amazon EC2 nodes can't be more than two minor versions older than your control plane. For example, if your control plane is running Kubernetes 1.25, then the `kube-proxy` minor version cannot be older than 1.23.

If you recently updated your cluster to a new Kubernetes minor version, update your Amazon EC2 nodes (i.e. `kubelet`) to the same minor version before updating `kube-proxy` to the same minor version as your nodes. The order of operations during an upgrade is as follows:
1. Update the control plane to the new Kubernetes minor version
2. Update the nodes, which updates `kubelet`, to the new Kubernetes minor version
3. Update `kube-proxy` to the new Kubernetes minor version
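As an illustration of step 3, once the control plane and nodes are on Kubernetes 1.25, the `kube-proxy` DaemonSet image should be aligned to the same minor version. The registry account, region, and build tag below are placeholders; consult the Amazon EKS documentation for the correct image for your region and version:

```yaml
# Fragment of the kube-proxy DaemonSet in the kube-system namespace
containers:
  - name: kube-proxy
    # the minor version (1.25) matches the control plane and the kubelet on the nodes;
    # account, region, and build tag are placeholders
    image: <account>.dkr.ecr.<region>.amazonaws.com/eks/kube-proxy:v1.25.6-eksbuild.1
```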