This is unreleased documentation for SUSE® Virtualization v1.8 (Dev).
High Availability Principles
High availability (HA) in SUSE Virtualization is not a single feature but a coordinated effort between Kubernetes (control plane), KubeVirt (virtualization), and SUSE Storage (distributed storage).
Minimum of three nodes for etcd quorum
You must create a cluster with three or more nodes to fully leverage SUSE Virtualization’s multi-node features, including HA. The first node that is added to the cluster is by default a management node. To form an HA cluster, the two nodes added after the first are automatically promoted to management nodes.
A basic HA cluster has three management nodes, each of which runs the complete set of control plane components for node and pod management. One key component is etcd, which Kubernetes uses to store its configuration data, state, and metadata. The etcd member count must always be an odd number so that a quorum can be established. Quorum is reached when a majority of members agrees on updates to the cluster’s state.
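The majority rule can be worked through with a little arithmetic. The following is a minimal sketch, not part of SUSE Virtualization. The function names are illustrative, and the only assumption is the standard majority definition of quorum (more than half of the members).

```python
def quorum(members: int) -> int:
    """Return the smallest majority for a cluster with `members` etcd members."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Return how many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print the quorum size and tolerated failures for small clusters.
    for n in range(1, 8):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

With these definitions, three members and four members both tolerate a single failure, so adding a fourth member raises the quorum size without improving resilience.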
Homogeneous CPU specifications
In SUSE Virtualization v1.7.x and earlier versions, Live Migration functions correctly only if the CPUs of all physical servers in the SUSE Virtualization cluster have the same specifications. Newer CPUs (even those from the same vendor, generation, and family) can have varying capabilities that may be exposed to guest operating systems. To ensure virtual machine stability, SUSE Virtualization checks if the CPU capabilities are consistent, and blocks migration attempts when the source and destination are incompatible. Because of these strict feature-matching checks, you must always use CPUs with the same specifications when creating clusters, adding nodes to a cluster, and replacing nodes.
In SUSE Virtualization v1.8.x and later versions, you can manually select a CPU model when configuring virtual machines. If your cluster uses multiple CPU generations, do not use the host-model or host-passthrough options. Instead, select a specific CPU model to ensure that Live Migration functions correctly. As a best practice, select the most modern CPU model supported by every node in the cluster.
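The best practice of choosing the most modern model that every node supports amounts to a set intersection followed by a ranking. The sketch below illustrates that reasoning only. The model names, their ordering, and the per-node data are hypothetical placeholders, and how you collect the supported models for each node is outside the scope of the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ordering from oldest to newest; replace with the models
# that are relevant to your own hardware.
MODEL_ORDER: List[str] = [
    "Haswell",
    "Broadwell",
    "Skylake-Server",
    "Cascadelake-Server",
    "Icelake-Server",
]


def newest_common_model(
    supported: Dict[str, Set[str]], order: List[str] = MODEL_ORDER
) -> Optional[str]:
    """Return the newest model in `order` that every node supports, or None."""
    if not supported:
        return None
    # Keep only the models that appear in every node's set.
    common = set.intersection(*(set(models) for models in supported.values()))
    # Walk the ordering from newest to oldest and take the first match.
    for model in reversed(order):
        if model in common:
            return model
    return None


if __name__ == "__main__":
    # Hypothetical per-node data for a cluster with mixed CPU generations.
    nodes = {
        "node-a": {"Haswell", "Broadwell", "Skylake-Server", "Cascadelake-Server"},
        "node-b": {"Haswell", "Broadwell", "Skylake-Server"},
        "node-c": set(MODEL_ORDER),
    }
    print(newest_common_model(nodes))
```

In this hypothetical cluster, the oldest node limits the choice: the selected model is the newest one that node-b can still provide.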
Minimum of two NICs per node for management bond
When a SUSE Virtualization cluster is deployed, a cluster network named mgmt is automatically created for intra-cluster communications. This built-in cluster network consists of the same bridge, bond, and NICs as the external infrastructure network to which each SUSE Virtualization node attaches with management NICs. mgmt also allows virtual machines to be accessed from the external infrastructure network for cluster management purposes.
To achieve high availability in production environments, ensure that each node has multiple NICs and that you select at least two when configuring the management network interfaces during installation. By default, SUSE Virtualization combines the selected interfaces into a single logical bonded interface named mgmt-bo, which usually serves as the uplink for the bridge interface named mgmt-br. If a physical cable or switch port fails, mgmt-bo ensures that the node remains reachable via the secondary interface.
Static IPs or DHCP reservations
Each SUSE Virtualization node must have a stable, fixed IP address to ensure reliable communication with cluster services. You can assign these addresses statically during ISO installation or by using DHCP reservations (IP-MAC binding), which ensures that the DHCP server always assigns the same IP address to a given node.
SUSE Virtualization does not support changing a node’s IP address after installation because this may cause node failure or total cluster instability. You must finalize your IP plan before creating the cluster. Specifically, if you want to select DHCP during the ISO installation, you must first locate the MAC addresses of the network interfaces intended for the management bond (mgmt-bo). Ensure that these MAC addresses are mapped to static IP addresses on your DHCP server before proceeding.
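Because the IP plan cannot be changed later, it can help to check it for obvious mistakes before you configure the DHCP server. The following sketch is one possible pre-flight check, not a SUSE Virtualization tool. The plan structure (node name mapped to the MAC addresses of its management NICs and one reserved IP address), the function name, and all sample values are assumptions made for illustration.

```python
import ipaddress
import re
from typing import Dict, List, Tuple

# Colon-separated MAC address, compared after lowercasing.
MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def validate_ip_plan(
    plan: Dict[str, Tuple[List[str], str]], subnet: str
) -> List[str]:
    """Return a list of problems found in the plan; an empty list means none."""
    problems: List[str] = []
    network = ipaddress.ip_network(subnet)
    ip_owner: Dict[object, str] = {}
    mac_owner: Dict[str, str] = {}

    for node, (macs, ip) in plan.items():
        # Each node needs one valid, unique address inside the subnet.
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"{node}: invalid IP address {ip!r}")
        else:
            if address not in network:
                problems.append(f"{node}: {ip} is outside {subnet}")
            if address in ip_owner:
                problems.append(f"{node}: {ip} is already planned for {ip_owner[address]}")
            else:
                ip_owner[address] = node

        # Each management NIC MAC must be well formed and listed only once.
        for mac in macs:
            normalized = mac.lower().replace("-", ":")
            if not MAC_PATTERN.match(normalized):
                problems.append(f"{node}: invalid MAC address {mac!r}")
            elif normalized in mac_owner:
                problems.append(f"{node}: MAC {mac} is already listed for {mac_owner[normalized]}")
            else:
                mac_owner[normalized] = node

    return problems


if __name__ == "__main__":
    # Hypothetical plan: two management NICs and one reserved address per node.
    plan = {
        "node1": (["02:00:00:00:00:01", "02:00:00:00:00:02"], "192.168.10.11"),
        "node2": (["02:00:00:00:00:03", "02:00:00:00:00:04"], "192.168.10.12"),
        "node3": (["02:00:00:00:00:05", "02:00:00:00:00:06"], "192.168.10.13"),
    }
    for problem in validate_ip_plan(plan, "192.168.10.0/24"):
        print(problem)
```

The check only catches inconsistencies in the plan itself. It does not confirm that the DHCP server has actually been configured with the same reservations.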
Accurate time synchronization
Because etcd stores data about the cluster’s state, accurate time synchronization is required across all nodes. Clock drift exceeding 500 ms will likely result in cluster instability or failure.
To ensure high availability, configure multiple NTP servers during installation or via the SUSE Virtualization UI. This prevents synchronization failure if a single server becomes unreachable or inaccurate. Do not manually modify NTP configuration files on individual nodes, as SUSE Virtualization automatically synchronizes these settings cluster-wide.
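The 500 ms drift threshold mentioned above can be checked with simple arithmetic once you have a clock offset for each node. The sketch below assumes that each offset is measured in milliseconds against the same reference clock. The function names and sample values are hypothetical, and collecting the offsets from the nodes is left out.

```python
from typing import Dict, Tuple

MAX_DRIFT_MS = 500.0  # threshold stated in the text above


def max_pairwise_drift(offsets_ms: Dict[str, float]) -> Tuple[float, str, str]:
    """Return (largest drift between any two nodes, slowest node, fastest node)."""
    if not offsets_ms:
        raise ValueError("at least one node offset is required")
    fastest = max(offsets_ms, key=lambda node: offsets_ms[node])
    slowest = min(offsets_ms, key=lambda node: offsets_ms[node])
    return offsets_ms[fastest] - offsets_ms[slowest], slowest, fastest


def drift_within_limit(
    offsets_ms: Dict[str, float], limit_ms: float = MAX_DRIFT_MS
) -> bool:
    """Return True if no two nodes differ by more than `limit_ms`."""
    spread, _, _ = max_pairwise_drift(offsets_ms)
    return spread <= limit_ms


if __name__ == "__main__":
    # Hypothetical offsets relative to a shared reference clock.
    offsets = {"node1": 12.0, "node2": -8.0, "node3": 3.5}
    spread, slowest, fastest = max_pairwise_drift(offsets)
    print(f"largest drift: {spread} ms between {slowest} and {fastest}")
    print("within limit" if drift_within_limit(offsets) else "limit exceeded")
```

The comparison uses the spread between the fastest and slowest clocks because two nodes that each stay within 500 ms of the reference can still differ from each other by more than that amount.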
SSD or NVMe disks with high IOPS
The choice of storage media is critical because it directly affects both management node stability and virtual machine performance. SSDs or NVMe drives that deliver more than 5,000 random input/output operations per second (IOPS) per disk are considered the standard requirement for the following reasons:
- etcd is extremely sensitive to disk write latency. Slow log commits can trigger leader election timeouts. High latency on slower disks can cause the cluster to flap or lose quorum, leading to instability.
- SUSE Storage replicates data across nodes synchronously, requiring high-performance storage to manage the resulting overhead. Since system components and virtual machines typically share a storage pool, high IOPS are critical to ensuring heavy virtual machine workloads do not compromise management interface performance.