Topology-Aware Provisioning
Topology-aware provisioning allows you to control the nodeAffinity rules that Kubernetes writes into a PersistentVolume (PV) at creation time. This is useful when you want to pin volumes to specific zones or regions so that pods and their data always land in the same failure domain.
SUSE Storage supports this through two complementary features:
- csi-allowed-topology-keys setting: Controls which topology keys (for example, topology.kubernetes.io/zone) appear in the PV nodeAffinity.
- strictTopology StorageClass parameter: When enabled, pins the PV to the topology of the exact node selected by the scheduler instead of all matching topologies.
Prerequisites
- Nodes in your cluster must be labeled with the topology keys you plan to use. Kubernetes automatically applies the well-known label topology.kubernetes.io/zone in most cloud environments. Verify with:

      kubectl get nodes --label-columns topology.kubernetes.io/zone

- Configure the CSI Allowed Topology Keys setting in SUSE Storage. Set the value to a comma-separated list of topology keys that SUSE Storage should pass through to Kubernetes.
  - SUSE Storage UI: Go to Setting > General > CSI Allowed Topology Keys and enter, for example, topology.kubernetes.io/zone.
  - Longhorn API / kubectl:

        kubectl -n longhorn-system edit settings.longhorn.io csi-allowed-topology-keys

    Set the value field to topology.kubernetes.io/zone.

  After changing this setting, you must manually restart the longhorn-csi-plugin DaemonSet for the change to take effect. Topology is applied correctly only after the CSI plugin pod on each node has restarted.
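As a declarative alternative to kubectl edit, the same setting could be kept as a manifest and applied with kubectl apply. The following is a minimal sketch: the apiVersion longhorn.io/v1beta2 is an assumption about the installed CRD version, so check it against your cluster (for example with kubectl api-resources) before using it.

```yaml
# Sketch of the csi-allowed-topology-keys setting as a manifest.
# Assumption: the Setting CRD is served as longhorn.io/v1beta2 in your cluster.
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: csi-allowed-topology-keys
  namespace: longhorn-system
# Comma-separated list of topology keys to pass through to Kubernetes.
value: topology.kubernetes.io/zone
# After applying, the longhorn-csi-plugin DaemonSet still has to be restarted,
# for example:
#   kubectl -n longhorn-system rollout restart daemonset longhorn-csi-plugin
```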
How It Works
When a PVC is created against a StorageClass that uses the Longhorn CSI driver, several fields interact to determine what nodeAffinity the resulting PV receives:
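| Field | Role |
|---|---|
| csi-allowed-topology-keys | Tells the CSI driver which topology keys to advertise. If empty (the default), no topology information is passed to Kubernetes, and PVs do not receive topology-based nodeAffinity. |
| allowedTopologies | Restricts which topology values are eligible. For example, you can limit provisioning to specific zones. |
| volumeBindingMode | Determines when the PV is provisioned. With WaitForFirstConsumer, provisioning waits until the Pod is scheduled; with Immediate, the PV is created before the Pod is scheduled. |
| strictTopology | When enabled, pins the PV to the topology of the exact node selected by the scheduler instead of all matching topologies. |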
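One way to check which topology keys a driver advertises on a node is the CSINode object that Kubernetes maintains for each node (kubectl get csinode node2 -o yaml). The excerpt below is a sketch of what it might contain once csi-allowed-topology-keys is set and the CSI plugin has restarted. It assumes the driver reports topology through the standard CSI mechanism; the nodeID value is illustrative.

```yaml
# Sketch of a CSINode object for node2; only the relevant fields are shown.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: node2
spec:
  drivers:
    - name: driver.longhorn.io
      nodeID: node2            # assumption: shown for illustration only
      # Keys advertised by the driver on this node. With an empty
      # csi-allowed-topology-keys setting, no zone key is expected here.
      topologyKeys:
        - topology.kubernetes.io/zone
```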
Examples
The examples below assume a cluster with six nodes across three zones:
| Node | Zone |
|---|---|
| node2 | a |
| node3 | b |
| node4 | c |
| node5 | a |
| node6 | b |
| node7 | c |
Basic Zone-Level Affinity
Use WaitForFirstConsumer together with allowedTopologies and csi-allowed-topology-keys to restrict volumes to specific zones.
Longhorn setting:

    csi-allowed-topology-keys = topology.kubernetes.io/zone

StorageClass:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn-zone-ab
    provisioner: driver.longhorn.io
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      numberOfReplicas: "3"
    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.kubernetes.io/zone
            values:
              - a
              - b

Result: The PV nodeAffinity is set to zone in [a, b]. The PV can only be attached to nodes in zones a or b.
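For reference, the nodeAffinity stanza on the resulting PV might look roughly like the excerpt below. This is a sketch based on the standard Kubernetes PersistentVolume schema, with all other PV fields omitted. The provisioner may instead emit one nodeSelectorTerm per zone; because terms are ORed, the effect is the same.

```yaml
# Excerpt of a PV provisioned from the longhorn-zone-ab StorageClass.
# Only the topology-related fields are shown.
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - a
                - b
```

You can compare this against a real volume with kubectl get pv followed by the PV name and -o yaml.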
Strict Topology Pinning
Add strictTopology: "true" to pin the PV to the exact zone of the scheduled node.
Longhorn setting:

    csi-allowed-topology-keys = topology.kubernetes.io/zone

StorageClass:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn-strict-zone
    provisioner: driver.longhorn.io
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      numberOfReplicas: "3"
      strictTopology: "true"
    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.kubernetes.io/zone
            values:
              - a
              - b
              - c

Result: Even though all three zones are listed in allowedTopologies, the PV nodeAffinity is set to only the zone of the node where the pod was scheduled (for example, zone in [a] if the pod lands on node2 or node5).
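To try this out, a claim and a consuming Pod along the following lines could be used. This is a minimal sketch: the names data-strict and app-zone-a, the 1Gi size, and the pause image are illustrative assumptions, and the nodeSelector is only there to steer the scheduler toward zone a.

```yaml
# Hypothetical PVC that uses the longhorn-strict-zone StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-strict
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-strict-zone
  resources:
    requests:
      storage: 1Gi
---
# Hypothetical Pod that consumes the claim. With WaitForFirstConsumer,
# the PV is provisioned only after this Pod has been scheduled.
apiVersion: v1
kind: Pod
metadata:
  name: app-zone-a
spec:
  # Restrict scheduling to zone a (node2 or node5 in the example cluster).
  nodeSelector:
    topology.kubernetes.io/zone: a
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-strict
```

Because the Pod can only be scheduled in zone a, the PV nodeAffinity described above should come out as zone in [a], even though the StorageClass allows zones a, b, and c.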
Behavior Reference
In the following table, zones [a, b, c] represent all zones present in the example cluster above.
| # | volumeBindingMode | allowedTopologies | csi-allowed-topology-keys | strictTopology | PV nodeAffinity |
|---|---|---|---|---|---|
| 1 | Immediate | None | (empty) | false | None |
| 2 | Immediate | None | zone | false | zone in [a, b, c] |
| 3 | Immediate | zone: [a, b] | zone | false | zone in [a, b] |
| 4 | WFFC | None | (empty) | false | None |
| 5 | WFFC | None | zone | false | zone in [a, b, c] |
| 6 | WFFC | None | zone | true | zone in [selected] |
| 7 | WFFC | zone: [a] | zone | false | zone in [a] |
| 8 | WFFC | zone: [a, b, c] | zone | true | zone in [selected] |
In this table, zone is shorthand for topology.kubernetes.io/zone, WFFC stands for WaitForFirstConsumer, and [selected] means the zone of the node chosen by the Kubernetes scheduler.
Key takeaways:
- Without csi-allowed-topology-keys, no topology information is passed and PVs do not receive topology-based nodeAffinity (scenarios 1, 4).
- strictTopology only pins the PV to the topology of the scheduled Pod when used with WaitForFirstConsumer. With Immediate, the PV is created before the Pod is scheduled, so its topology is selected randomly.
- allowedTopologies narrows the set of eligible zones; strictTopology further narrows it to the single selected zone.
Notes and Warnings
- Do not use allowedTopologies together with dataLocality: strict-local. The PV nodeAffinity is immutable once set and will conflict with SUSE Storage’s strict-local volume pinning. See Data Locality for details.
- The most common configuration for users who do not need topology-aware provisioning is to leave csi-allowed-topology-keys empty (scenarios 1 and 4). This is the default.
- For users who want topology-aware provisioning, the recommended configurations are scenarios 7 and 8: use WaitForFirstConsumer together with allowedTopologies and csi-allowed-topology-keys.