Orphaned Instance Cleanup
SUSE Storage can identify and clean up orphaned instances on each node.
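Orphaned Runtime Instances
When a network outage affects a SUSE Storage node, it can leave behind engine or replica runtime instances that the SUSE Storage system no longer tracks. During the outage, the corresponding engine and replica custom resources (CRs) may be removed or rescheduled to other nodes. So when the node recovers, SUSE Storage no longer tracks the runtime instances that remain on it. These instances, such as the engine and replica processes of v1 volumes, are called orphaned instances. Orphaned instances continue to consume node resources such as CPU and memory.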
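SUSE Storage can detect and clean up orphaned instances. It identifies these instances and creates orphan resources that describe them. By default, SUSE Storage does not delete orphan resources automatically. You can trigger the deletion of orphaned instances manually, or you can configure SUSE Storage to delete them automatically.
When automatic orphan deletion is enabled, SUSE Storage deletes orphan custom resources (CRs) and their associated directories after the delay defined by the orphan-resource-auto-deletion-grace-period setting. If you delete an orphan CR manually, the deletion takes effect immediately and the grace period does not apply.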
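Before you rely on automatic cleanup, you may want to confirm that the orphan-resource-auto-deletion setting (configured in the example below) already covers instances. The POSIX shell sketch below is illustrative only. The function name orphan_type_enabled is hypothetical and is not part of SUSE Storage. It assumes the setting's value is a semicolon-separated list in which instance enables cleanup of orphaned instances.

```shell
#!/bin/sh
# Hypothetical helper, not part of SUSE Storage or kubectl.
# Succeeds (exit status 0) if ITEM is one of the semicolon-separated entries
# in SETTING_VALUE. Spaces and tabs are removed before comparing.
# Usage: orphan_type_enabled SETTING_VALUE ITEM
orphan_type_enabled() {
  # Wrap the normalized list in ';' so that every entry looks like ";entry;".
  case ";$(printf '%s' "$1" | tr -d ' \t');" in
    *";$2;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example against a live cluster (assumes the list is stored in the top-level
# "value" field of the settings.longhorn.io resource):
#   value=$(kubectl -n longhorn-system get settings.longhorn.io \
#     orphan-resource-auto-deletion -o jsonpath='{.value}')
#   orphan_type_enabled "$value" instance && echo "instance auto-deletion is enabled"
#   kubectl -n longhorn-system get settings.longhorn.io \
#     orphan-resource-auto-deletion-grace-period -o jsonpath='{.value}'
```

Matching whole entries between semicolons, rather than searching for a substring, avoids false positives such as an entry named instances.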
Example
The following example shows how to manage orphaned instances with kubectl.
Managing Orphaned Instances with kubectl
- Start with nodes that are running orphaned instance processes. The instance manager on each node still lists these instances.

  Orphaned replica instance on node worker1:

      Name:         instance-manager-8ff396d6d3744979b32abafc6346781c
      Namespace:    longhorn-system
      Kind:         InstanceManager
      ...
      Status:
        Instance Replicas:
          pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-r-05660b73:    # This instance might be an orphan
            Spec:
              Data Engine:  v1
              Name:         pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-r-05660b73
            Status:
              Conditions:         <nil>
              Endpoint:
              Error Msg:
              Listen:
              Port End:           10020
              Port Start:         10011
              Resource Version:   0
              State:              running
              Target Port End:    0
              Target Port Start:  0
              Type:               replica
        ...

  Orphaned engine instance on node worker2:

      Name:         instance-manager-b87f10b867cec1dca2b814f5e78bcc90
      Namespace:    longhorn-system
      Kind:         InstanceManager
      ...
      Status:
        Instance Engines:
          pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-e-0:    # This instance might be an orphan
            Spec:
              Data Engine:  v1
              Name:         pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-e-0
            Status:
              Conditions:
                Filesystem Read Only:  false
              Endpoint:
              Error Msg:
              Listen:
              Port End:           10020
              Port Start:         10020
              Resource Version:   0
              State:              running
              Target Port End:    10020
              Target Port Start:  10020
              Type:               engine
        ...
- SUSE Storage detects the orphaned instances and creates orphan resources that describe them. To list the orphan resources that SUSE Storage created, run kubectl -n longhorn-system get orphan:

      # kubectl -n longhorn-system get orphan
      NAME                                                                      TYPE               NODE
      orphan-1807009489e50534c35c350e22680449c97deca4e5d3b72f4591976145f8bc41   engine-instance    worker2
      orphan-a91aa42ab5eda6b8b9fe1116d5b5f5673e5108d89be3db6fd18a275913463eef   replica-instance   worker1
- To see the details of an orphaned replica instance, which are recorded in spec.parameters, run kubectl -n longhorn-system get orphan <name> -o yaml:

      apiVersion: longhorn.io/v1beta2
      kind: Orphan
      metadata:
        creationTimestamp: "2025-05-02T06:07:32Z"
        finalizers:
        - longhorn.io
        generation: 1
        labels:
          longhorn.io/component: orphan
          longhorn.io/managed-by: longhorn-manager
          longhorn.io/orphan-type: replica-instance
          longhornnode: worker1
          longhornreplica: pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-r-05660b73
        # ... (representing other omitted metadata fields)
      spec:
        dataEngine: v1
        nodeID: worker1
        orphanType: replica-instance
        parameters:
          InstanceManager: instance-manager-8ff396d6d3744979b32abafc6346781c
          InstanceName: pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-r-05660b73
      status:
        conditions:
        - lastProbeTime: ""
          lastTransitionTime: "2025-05-02T06:06:39Z"
          message: ""
          reason: running
          status: "True"
          type: InstanceExist
        - lastProbeTime: ""
          lastTransitionTime: "2025-05-02T06:06:39Z"
          message: ""
          reason: ""
          status: "False"
          type: Error
        ownerID: worker1
- To see the details of an orphaned engine instance, which are recorded in spec.parameters, run kubectl -n longhorn-system get orphan <name> -o yaml:

      apiVersion: longhorn.io/v1beta2
      kind: Orphan
      metadata:
        creationTimestamp: "2025-05-02T06:47:25Z"
        finalizers:
        - longhorn.io
        generation: 1
        labels:
          longhorn.io/component: orphan
          longhorn.io/managed-by: longhorn-manager
          longhorn.io/orphan-type: engine-instance
          longhornengine: pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-e-0
          longhornnode: worker2
        # ... (representing other omitted metadata fields)
      spec:
        dataEngine: v1
        nodeID: worker2
        orphanType: engine-instance
        parameters:
          InstanceManager: instance-manager-b87f10b867cec1dca2b814f5e78bcc90
          InstanceName: pvc-569e44c0-b352-4aca-bf14-2cf7a6cfe86f-e-0
      status:
        conditions:
        - lastProbeTime: ""
          lastTransitionTime: "2025-05-02T06:47:25Z"
          message: ""
          reason: running
          status: "True"
          type: InstanceExist
        - lastProbeTime: ""
          lastTransitionTime: "2025-05-02T06:47:25Z"
          message: ""
          reason: ""
          status: "False"
          type: Error
        ownerID: worker2
- To delete an orphan resource, run kubectl -n longhorn-system delete orphan <name>. The corresponding orphaned instance is removed as well:

      # kubectl -n longhorn-system delete orphan orphan-a91aa42ab5eda6b8b9fe1116d5b5f5673e5108d89be3db6fd18a275913463eef
      # kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
      NAME                                                                      TYPE              NODE
      orphan-1807009489e50534c35c350e22680449c97deca4e5d3b72f4591976145f8bc41   engine-instance   worker2

  The orphaned replica instance has been deleted from the instance manager on worker1:

      # kubectl -n longhorn-system describe instancemanager -l "longhorn.io/node=worker1"
      Name:         instance-manager-8ff396d6d3744979b32abafc6346781c
      Namespace:    longhorn-system
      Kind:         InstanceManager
      ...
      Status:
        Instance Replicas:
          ...
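- By default, SUSE Storage does not delete orphaned instances automatically. To enable automatic deletion, edit the orphan-resource-auto-deletion setting:

      # kubectl -n longhorn-system edit settings.longhorn.io orphan-resource-auto-deletion

  Then add instance as one of the semicolon-separated items in the setting's value. The setting then shows:

      NAME                            VALUE      APPLIED   AGE
      orphan-resource-auto-deletion   instance   true      45h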
- After you enable automatic deletion, wait for the grace period set by orphan-resource-auto-deletion-grace-period to elapse. SUSE Storage then deletes the orphan resources and their processes automatically:

      # kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
      No resources found in longhorn-system namespace.

  The orphaned instances have been removed from the instance managers:

      # kubectl -n longhorn-system describe instancemanager -l "longhorn.io/node=worker1"
      Name:         instance-manager-8ff396d6d3744979b32abafc6346781c
      Namespace:    longhorn-system
      Kind:         InstanceManager
      ...
      Status:
        Instance Replicas:
          ...

      # kubectl -n longhorn-system describe instancemanager -l "longhorn.io/node=worker2"
      Name:         instance-manager-b87f10b867cec1dca2b814f5e78bcc90
      Namespace:    longhorn-system
      Kind:         InstanceManager
      ...
      Status:
        Instance Engines:
          ...

  You can also delete all orphaned instances on a specific node by running:

      # kubectl -n longhorn-system delete orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance),longhornnode=<node name>"
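On clusters with many orphans, a per-node count from the listing step above can show which nodes an outage affected. The POSIX shell sketch below offers one way to get it. The function name summarize_orphans is hypothetical. The sketch assumes the default NAME, TYPE and NODE column layout with a header row, as in the example output.

```shell
#!/bin/sh
# Hypothetical helper: reads the table printed by
#   kubectl -n longhorn-system get orphan
# on stdin (NAME, TYPE, NODE columns, header row included) and prints one
# "NODE TYPE COUNT" line per node and orphan type, sorted by node.
summarize_orphans() {
  awk 'NR > 1 && NF >= 3 { count[$3 " " $2]++ }
       END { for (key in count) print key, count[key] }' | sort
}

# Example:
#   kubectl -n longhorn-system get orphan \
#     -l "longhorn.io/orphan-type in (engine-instance,replica-instance)" | summarize_orphans
```

With the two orphans from the example, the output would be worker1 replica-instance 1 followed by worker2 engine-instance 1.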
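The auto-deletion step above edits the setting interactively. As a non-interactive alternative, the sketch below computes the new value by appending instance to the semicolon-separated list only when it is not already there. Running it twice is therefore harmless. The function name add_orphan_type is hypothetical. The kubectl patch usage in the comments assumes the list is stored in the top-level value field of the settings.longhorn.io resource, which is what the VALUE column above reflects.

```shell
#!/bin/sh
# Hypothetical helper: prints the new value for the orphan-resource-auto-deletion
# setting, with ITEM added to the semicolon-separated list in CURRENT_VALUE.
# If ITEM is already present, the (normalized) value is printed unchanged.
# Usage: add_orphan_type CURRENT_VALUE ITEM
add_orphan_type() {
  # Remove spaces/tabs and a single trailing ';' before checking the list.
  aot_value=$(printf '%s' "$1" | tr -d ' \t')
  aot_value=${aot_value%";"}
  case ";$aot_value;" in
    *";$2;"*) printf '%s\n' "$aot_value" ;;    # already enabled
    ";;") printf '%s\n' "$2" ;;                # empty list: ITEM becomes the only entry
    *) printf '%s;%s\n' "$aot_value" "$2" ;;   # append ITEM as a new entry
  esac
}

# Non-interactive alternative to "kubectl edit" (requires cluster access):
#   current=$(kubectl -n longhorn-system get settings.longhorn.io \
#     orphan-resource-auto-deletion -o jsonpath='{.value}')
#   kubectl -n longhorn-system patch settings.longhorn.io orphan-resource-auto-deletion \
#     --type=merge -p "{\"value\":\"$(add_orphan_type "$current" instance)\"}"
```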
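Automatic deletion only happens after the grace period. A script that enables it may therefore want to wait until the orphan resources are gone rather than check once. Here is a minimal sketch built on a hypothetical wait_for_no_output helper. The helper reruns a command until the command succeeds with empty output, or until a timeout expires. A failing command, such as kubectl being unable to reach the API server, is treated as "not done yet" rather than as success.

```shell
#!/bin/sh
# Hypothetical helper: reruns COMMAND every INTERVAL seconds until it succeeds
# with empty standard output (returns 0), or until TIMEOUT seconds have been
# spent waiting (returns 1). A failing command counts as "not done yet".
# Usage: wait_for_no_output TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for_no_output() {
  wfno_timeout=$1 wfno_interval=$2 wfno_waited=0
  shift 2
  while :; do
    if wfno_out=$("$@" 2>/dev/null) && [ -z "$wfno_out" ]; then
      return 0
    fi
    if [ "$wfno_waited" -ge "$wfno_timeout" ]; then
      return 1
    fi
    sleep "$wfno_interval"
    wfno_waited=$((wfno_waited + wfno_interval))
  done
}

# Example: wait up to 15 minutes (pick a timeout longer than the configured
# grace period). "-o name" prints one line per remaining orphan.
#   wait_for_no_output 900 15 kubectl -n longhorn-system get orphan \
#     -l "longhorn.io/orphan-type in (engine-instance,replica-instance)" -o name
```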