What is Cluster API Add-on Provider for Fleet (CAAPF)?
Cluster API Add-on Provider for Fleet (CAAPF) is a Cluster API (CAPI) provider that integrates with Fleet to enable easy deployment of applications to CAPI-provisioned clusters.
It provides the following functionality:
- The addon provider automatically installs Fleet in your management cluster.
- The provider registers a newly provisioned CAPI cluster with Fleet so that applications can be automatically deployed to the created cluster via GitOps, Bundle, or HelmOp.
- The provider automatically creates a Fleet ClusterGroup for every CAPI ClusterClass. This enables you to deploy the same applications to all clusters created from the same ClusterClass.
- CAPI Cluster and ControlPlane resources are automatically added to the Fleet Cluster resource templates, allowing per-cluster configuration templating for Helm-based installations.
Installation
Clusterctl
To install the provider with clusterctl:
- Install clusterctl.
- Run clusterctl init --addon rancher-fleet
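Optionally, you can verify that the provider is running before continuing. The namespace below assumes the default used by the provider manifests and may differ in your setup:
kubectl get pods -n caapf-system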
Cluster API Operator
You can install a production instance of CAAPF in your cluster with the CAPI Operator.
We need to install cert-manager as a prerequisite for the CAPI Operator, if it is not already installed:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
To install the CAPI Operator, the Docker infrastructure provider, and the Fleet addon together:
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm upgrade --install capi-operator capi-operator/cluster-api-operator \
--create-namespace -n capi-operator-system \
--set infrastructure.docker.enabled=true --set addon.rancher-fleet.enabled=true
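Once the chart is installed, you can optionally confirm that the operator reconciled the requested providers; the resource names below assume a standard CAPI Operator installation:
kubectl get coreproviders,infrastructureproviders,addonproviders -A
kubectl get pods -A | grep -E 'capi|fleet'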
Demo
Calico CNI installation demo
Motivation
Currently, in the CAPI ecosystem, several solutions exist for deploying applications as add-ons on clusters provisioned by CAPI. However, this idea and its alternatives have not been actively explored upstream, particularly in the GitOps space. The need to address this gap was raised in the Cluster API Addon Orchestration proposal.
One of the projects involved in deploying Helm charts on CAPI-provisioned clusters is the CAPI Addon Provider Helm (CAAPH). This solution enables users to automatically install Helm charts on provisioned clusters using the HelmChartProxy resource.
Fleet also supports deploying Helm charts via the (experimental) HelmOp resource, which offers similar capabilities to HelmChartProxy. However, Fleet primarily focuses on providing GitOps capabilities for managing CAPI clusters and application states within these clusters.
Out of the box, Fleet allows users to deploy and maintain the state of arbitrary templates on child clusters using the Fleet Bundle resource. This approach addresses the need for alternatives to ClusterResourceSet while offering full application lifecycle management.
CAAPF is designed to streamline and enhance native Fleet integration with CAPI. It functions as a separate Addon provider that can be installed via clusterctl or the CAPI Operator.
User Stories
User Story 1
As an infrastructure provider, I want to deploy my provisioning application to every provisioned child cluster so that I can provide immediate functionality during and after cluster bootstrap.
User Story 2
As a DevOps engineer, I want to use GitOps practices to deploy CAPI clusters and applications centrally so that I can manage all cluster configurations and deployed applications from a single location.
User Story 3
As a user, I want to deploy applications into my CAPI clusters and configure those applications based on the cluster infrastructure templates so that they are correctly provisioned for the cluster environment.
User Story 4
As a cluster operator, I want to streamline the provisioning of Cluster API child clusters so that they can be successfully provisioned and become Ready from a template without manual intervention.
User Story 5
As a cluster operator, I want to facilitate the provisioning of Cluster API child clusters located behind NAT so that they can be successfully provisioned and establish connectivity with the management cluster.
Getting Started
This section contains guides on how to get started with CAAPF and Fleet
Installation
Clusterctl
To install the provider with clusterctl:
- Install clusterctl.
- Run clusterctl init --addon rancher-fleet
Cluster API Operator
You can install a production instance of CAAPF in your cluster with the CAPI Operator.
We need to install cert-manager as a prerequisite for the CAPI Operator, if it is not already installed:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
To install the CAPI Operator, the Docker infrastructure provider, and the Fleet addon together:
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm upgrade --install capi-operator capi-operator/cluster-api-operator \
--create-namespace -n capi-operator-system \
--set infrastructure.docker.enabled=true --set addon.rancher-fleet.enabled=true
Configuration
Installing Fleet
By default, CAAPF expects your cluster to have the Fleet helm chart pre-installed and configured, but it can manage the Fleet installation via the FleetAddonConfig resource named fleet-addon-config. To install the Fleet helm chart with the latest stable Fleet version:
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    server:
      inferLocal: true # Uses the default `kubernetes` endpoint and secret for the APIServerURL configuration
  install:
    followLatest: true
Alternatively, a specific version can be provided in the spec.install.version field:
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    server:
      inferLocal: true # Uses the default `kubernetes` endpoint and secret for the APIServerURL configuration
  install:
    version: v0.12.0 # Pins the Fleet helm chart to a specific version
Fleet Public URL and Certificate setup
The Fleet agent requires direct access to the Fleet server instance running in the management cluster. When provisioning the Fleet agent on a downstream cluster using the default manager-initiated registration, the public API server URL and certificates are taken from the current Fleet server configuration.
If a user is installing Fleet via the FleetAddonConfig resource, there are fields that allow configuring these settings.
The config.server field allows specifying settings for the Fleet server configuration, such as apiServerURL and certificates.
Setting inferLocal: true uses the default kubernetes endpoint and CA secret to configure the Fleet instance.
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    server:
      inferLocal: true # Uses the default `kubernetes` endpoint and secret for the APIServerURL configuration
  install:
    followLatest: true
This scenario works well in a test setup using the CAPI Docker provider and Docker clusters.
Here is an example of a manual API server URL configuration with a reference to a certificate ConfigMap or Secret containing a ca.crt data key for the Fleet helm chart:
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    server:
      apiServerUrl: "https://public-url.io"
      apiServerCaConfigRef:
        apiVersion: v1
        kind: ConfigMap
        name: kube-root-ca.crt
        namespace: default
  install:
    followLatest: true # Installs the current latest version of Fleet from https://github.com/rancher/fleet-helm-charts
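To confirm which values were applied to the Fleet installation, you can inspect the Fleet controller configuration; this check assumes Fleet's default cattle-fleet-system namespace:
kubectl get configmap fleet-controller -n cattle-fleet-system -o jsonpath='{.data.config}'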
Cluster Import Strategy
Fleet Feature Flags
Fleet includes experimental features that can be enabled or disabled using feature gates in the FleetAddonConfig resource. These flags are configured under .spec.config.featureGates.
To enable experimental features such as OCI storage support and HelmOp support, update the FleetAddonConfig as follows:
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    featureGates:
      experimentalOciStorage: true # Enables experimental OCI storage support
      experimentalHelmOps: true # Enables experimental Helm operations support
By default, if the featureGates field is not present, these feature gates are enabled. To disable them, they need to be explicitly set to false.
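For example, to explicitly disable both experimental features:
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    featureGates:
      experimentalOciStorage: false # Disables experimental OCI storage support
      experimentalHelmOps: false # Disables experimental Helm operations support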
Optionally, the featureGates flags can be synced to a ConfigMap object.
This is useful when Fleet is installed and managed by Rancher.
When a ConfigMap reference is defined, the controller will just sync the featureGates to it, without making any changes to the Fleet helm chart.
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: FleetAddonConfig
metadata:
  name: fleet-addon-config
spec:
  config:
    featureGates:
      experimentalOciStorage: true # Enables experimental OCI storage support
      experimentalHelmOps: true # Enables experimental Helm operations support
      configMap:
        ref:
          apiVersion: v1
          kind: ConfigMap
          name: rancher-config
          namespace: cattle-system
Tutorials
This section contains tutorials, such as quick-start, installation, application deployments and operator guides.
Prerequisites
Requirements
- helm
- CAPI management cluster.
  - Features EXP_CLUSTER_RESOURCE_SET and CLUSTER_TOPOLOGY must be enabled.
- clusterctl.
Create your local cluster
NOTE: if you prefer to opt for a one-command installation, you can refer to the notes on how to use just and the project's justfile here.
- Start by adding the helm repositories that are required to proceed with the installation.
helm repo add fleet https://rancher.github.io/fleet-helm-charts/
helm repo update
- Create the local cluster
kind create cluster --config testdata/kind-config.yaml
- Install fleet and specify the API_SERVER_URL and CA.
# We start by retrieving the CA data from the cluster
kubectl config view -o json --raw | jq -r '.clusters[] | select(.name=="kind-dev").cluster["certificate-authority-data"]' | base64 -d > _out/ca.pem
# Set the API server URL
API_SERVER_URL=`kubectl config view -o json --raw | jq -r '.clusters[] | select(.name=="kind-dev").cluster["server"]'`
# And proceed with the installation via helm
helm -n cattle-fleet-system install --version v0.12.0 --create-namespace --wait fleet-crd fleet/fleet-crd
helm install --create-namespace --version v0.12.0 -n cattle-fleet-system --set apiServerURL=$API_SERVER_URL --set-file apiServerCA=_out/ca.pem fleet fleet/fleet --wait
- Install CAPI with the required experimental features enabled and initialize the Docker provider for testing.
EXP_CLUSTER_RESOURCE_SET=true CLUSTER_TOPOLOGY=true clusterctl init -i docker --addon rancher-fleet
Wait for all pods to become ready and your cluster should be ready to use CAAPF!
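One way to wait for everything to settle is to watch all deployments become available; the timeout value here is only an example:
kubectl wait --for=condition=Available deployment --all --all-namespaces --timeout=10m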
Create your downstream cluster
In order to initiate CAAPF autoimport, a CAPI Cluster needs to be created.
To create one, we can either follow the quickstart documentation or create a cluster from an existing template.
kubectl apply -f testdata/capi-quickstart.yaml
For more advanced cluster import strategy, check the configuration section.
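To confirm that the cluster was auto-imported, check for the corresponding Fleet Cluster (and, if a ClusterClass is used, ClusterGroup) resources; the output depends on the template you applied:
kubectl get clusters.fleet.cattle.io -A
kubectl get clustergroups.fleet.cattle.io -A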
Remember that you can follow along with the video demo to install the provider and get started quickly.
Installing Kindnet CNI using resource Bundle
This section describes the steps to install the kindnet CNI solution on a CAPI cluster using the Fleet Bundle resource.
Deploying Kindnet
We will use a Fleet Bundle resource to deploy kindnet on the Docker cluster.
> kubectl get clusters
NAME CLUSTERCLASS PHASE AGE VERSION
docker-demo quick-start Provisioned 35h v1.29.2
First, let's review our targets for the kindnet bundle. They should match labels on the cluster, or the name of the cluster, as in this instance:
targets:
- clusterName: docker-demo
We will apply the following resource from testdata/cni.yaml:
kind: Bundle
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: kindnet-cni
spec:
  resources:
    # List of all resources that will be deployed
    - content: |-
        # kindnetd networking manifest
        ---
        kind: ClusterRole
        apiVersion: rbac.authorization.k8s.io/v1
        metadata:
          name: kindnet
        rules:
          - apiGroups:
              - ""
            resources:
              - nodes
            verbs:
              - list
              - watch
              - patch
          - apiGroups:
              - ""
            resources:
              - configmaps
            verbs:
              - get
        ---
        kind: ClusterRoleBinding
        apiVersion: rbac.authorization.k8s.io/v1
        metadata:
          name: kindnet
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: kindnet
        subjects:
          - kind: ServiceAccount
            name: kindnet
            namespace: kube-system
        ---
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: kindnet
          namespace: kube-system
        ---
        apiVersion: apps/v1
        kind: DaemonSet
        metadata:
          name: kindnet
          namespace: kube-system
          labels:
            tier: node
            app: kindnet
            k8s-app: kindnet
        spec:
          selector:
            matchLabels:
              app: kindnet
          template:
            metadata:
              labels:
                tier: node
                app: kindnet
                k8s-app: kindnet
            spec:
              hostNetwork: true
              tolerations:
                - operator: Exists
                  effect: NoSchedule
              serviceAccountName: kindnet
              containers:
                - name: kindnet-cni
                  image: kindest/kindnetd:v20230511-dc714da8
                  env:
                    - name: HOST_IP
                      valueFrom:
                        fieldRef:
                          fieldPath: status.hostIP
                    - name: POD_IP
                      valueFrom:
                        fieldRef:
                          fieldPath: status.podIP
                    - name: POD_SUBNET
                      value: '10.1.0.0/16'
                  volumeMounts:
                    - name: cni-cfg
                      mountPath: /etc/cni/net.d
                    - name: xtables-lock
                      mountPath: /run/xtables.lock
                      readOnly: false
                    - name: lib-modules
                      mountPath: /lib/modules
                      readOnly: true
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "50Mi"
                    limits:
                      cpu: "100m"
                      memory: "50Mi"
                  securityContext:
                    privileged: false
                    capabilities:
                      add: ["NET_RAW", "NET_ADMIN"]
              volumes:
                - name: cni-bin
                  hostPath:
                    path: /opt/cni/bin
                    type: DirectoryOrCreate
                - name: cni-cfg
                  hostPath:
                    path: /etc/cni/net.d
                    type: DirectoryOrCreate
                - name: xtables-lock
                  hostPath:
                    path: /run/xtables.lock
                    type: FileOrCreate
                - name: lib-modules
                  hostPath:
                    path: /lib/modules
      name: kindnet.yaml
  targets:
    - clusterName: docker-demo
> kubectl apply -f testdata/cni.yaml
bundle.fleet.cattle.io/kindnet-cni configured
After some time we should see the resource in a ready state:
> kubectl get bundles kindnet-cni
NAME BUNDLEDEPLOYMENTS-READY STATUS
kindnet-cni 1/1
This should result in kindnet running on the matching cluster:
> kubectl get pods --context docker-demo -A | grep kindnet
kube-system kindnet-dqzwh 1/1 Running 0 2m11s
kube-system kindnet-jbkjq 1/1 Running 0 2m11s
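If the bundle does not become ready, the underlying BundleDeployment in the Fleet cluster namespace is a good place to look; the grep below is just one way to narrow the output:
kubectl get bundledeployments -A | grep kindnet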
Demo
Installing Calico CNI using HelmOp
Note: For this setup to work, you need to install the Fleet and Fleet CRDs charts via the
FleetAddonConfig resource. Both need to have version >= v0.12.0,
which provides support for the HelmOp resource.
In this tutorial we will deploy the Calico CNI using the HelmOp resource and the Fleet cluster substitution mechanism.
Deploying Calico CNI
Here's an example of how a HelmOp resource can be used in combination with templateValues to deploy an application consistently on any matching cluster.
In this scenario we are matching the cluster directly by name using a clusterName reference, but a clusterGroup or a label-based selection can be used instead of, or together with, clusterName:
targets:
- clusterName: docker-demo
We are deploying the HelmOp resource in the default namespace. It should be the same namespace as the CAPI Cluster for Fleet to locate it.
apiVersion: fleet.cattle.io/v1alpha1
kind: HelmOp
metadata:
  name: calico
spec:
  helm:
    releaseName: projectcalico
    repo: https://docs.tigera.io/calico/charts
    chart: tigera-operator
    templateValues:
      installation: |-
        cni:
          type: Calico
          ipam:
            type: HostLocal
        calicoNetwork:
          bgp: Disabled
          mtu: 1350
          ipPools:
          ${- range $cidr := .ClusterValues.Cluster.spec.clusterNetwork.pods.cidrBlocks }
          - cidr: "${ $cidr }"
            encapsulation: None
            natOutgoing: Enabled
            nodeSelector: all()${- end}
  insecureSkipTLSVerify: true
  targets:
    - clusterName: docker-demo
    - clusterGroup: quick-start.clusterclass
HelmOp supports Fleet templating options that are otherwise available exclusively in the fleet.yaml configuration, which is stored in the git repository contents and applied via the GitRepo resource.
In this example we are using values from the Cluster.spec.clusterNetwork.pods.cidrBlocks list to define ipPools for the calicoNetwork. These chart settings will be unique to each matching cluster and based on the observed cluster state at any given moment.
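As an illustration, for a matching cluster whose spec.clusterNetwork.pods.cidrBlocks contains a single entry 192.168.0.0/16, the installation template above would render roughly as:
cni:
  type: Calico
  ipam:
    type: HostLocal
calicoNetwork:
  bgp: Disabled
  mtu: 1350
  ipPools:
  - cidr: "192.168.0.0/16"
    encapsulation: None
    natOutgoing: Enabled
    nodeSelector: all()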
After applying the resource, we can observe the app rollout:
> kubectl apply -f testdata/helm.yaml
helmop.fleet.cattle.io/calico created
> kubectl get helmop
NAME REPO CHART VERSION BUNDLEDEPLOYMENTS-READY STATUS
calico https://docs.tigera.io/calico/charts tigera-operator v3.29.2 0/1 NotReady(1) [Bundle calico]; apiserver.operator.tigera.io default [progressing]
# After some time
> kubectl get helmop
NAME REPO CHART VERSION BUNDLEDEPLOYMENTS-READY STATUS
calico https://docs.tigera.io/calico/charts tigera-operator v3.29.2 1/1
> kubectl get pods -n calico-system --context capi-quickstart
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-9cd68cb75-p46pz 1/1 Running 0 53s
calico-node-bx5b6 1/1 Running 0 53s
calico-node-hftwd 1/1 Running 0 53s
calico-typha-6d9fb6bcb4-qz6kt 1/1 Running 0 53s
csi-node-driver-88jqc 2/2 Running 0 53s
csi-node-driver-mjwxc 2/2 Running 0 53s
Demo
You can follow along with the demo to verify that your deployment matches the expected result:
Installing Calico CNI using GitRepo
Note: For this setup to work, you need to have the Fleet and Fleet CRDs charts installed
with version >= v0.12.0.
In this tutorial we will deploy the Calico CNI using a GitRepo resource on an RKE2-based Docker cluster.
Deploying RKE2 docker cluster
We will first need to create an RKE2-based Docker cluster from templates:
> kubectl apply -f testdata/cluster_docker_rke2.yaml
dockercluster.infrastructure.cluster.x-k8s.io/docker-demo created
cluster.cluster.x-k8s.io/docker-demo created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/docker-demo-control-plane created
rke2controlplane.controlplane.cluster.x-k8s.io/docker-demo-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/docker-demo-md-0 created
rke2configtemplate.bootstrap.cluster.x-k8s.io/docker-demo-md-0 created
machinedeployment.cluster.x-k8s.io/docker-demo-md-0 created
configmap/docker-demo-lb-config created
In this scenario the cluster is located in the default namespace, where the rest of the Fleet objects will go.
The cluster is labeled with cni: calico in order for the GitRepo to match on it.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: docker-demo
  labels:
    cni: calico
Now that the cluster is created, the GitRepo can be applied; it will be evaluated asynchronously.
Deploying Calico CNI via GitRepo
We will first review the content of our fleet.yaml file:
helm:
  releaseName: projectcalico
  repo: https://docs.tigera.io/calico/charts
  chart: tigera-operator
  templateValues:
    installation: |-
      cni:
        type: Calico
        ipam:
          type: HostLocal
      calicoNetwork:
        bgp: Disabled
        mtu: 1350
        ipPools:
        ${- range $cidr := .ClusterValues.Cluster.spec.clusterNetwork.pods.cidrBlocks }
        - cidr: "${ $cidr }"
          encapsulation: None
          natOutgoing: Enabled
          nodeSelector: all()${- end}
diff:
  comparePatches:
    - apiVersion: operator.tigera.io/v1
      kind: Installation
      name: default
      operations:
        - {"op": "remove", "path": "/spec/kubernetesProvider"}
In this scenario we are using a helm definition that is consistent with the HelmOp spec from the previous guide and defines the same templating rules.
We also need to resolve conflicts that happen due to in-place modification of some resources by the Calico controllers. For that, the diff section is used, where we remove the conflicting fields from comparison.
Once everything is ready, we need to apply our GitRepo in the default namespace. In our case, we will match clusters labeled with the cni: calico label:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: calico
spec:
  branch: main
  paths:
    - /fleet/applications/calico
  repo: https://github.com/rancher/cluster-api-addon-provider-fleet.git
  targets:
    - clusterSelector:
        matchLabels:
          cni: calico
> kubectl apply -f testdata/gitrepo-calico.yaml
gitrepo.fleet.cattle.io/calico created
# After some time
> kubectl get gitrepo
NAME REPO COMMIT BUNDLEDEPLOYMENTS-READY STATUS
calico https://github.com/rancher/cluster-api-addon-provider-fleet.git 62b4fe6944687e02afb331b9e1839e33c539f0c7 1/1
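You can also list the Bundles that Fleet generated from the GitRepo contents; the exact bundle name is derived from the GitRepo name and path, so it may differ:
kubectl get bundles | grep calico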
Now our cluster has Calico installed, and all nodes are marked as Ready:
# exec into one of the CP node containers
> docker exec -it fef3427009f6 /bin/bash
root@docker-demo-control-plane-krtnt:/#
root@docker-demo-control-plane-krtnt:/# kubectl get pods -n calico-system --kubeconfig /var/lib/rancher/rke2/server/cred/api-server.kubeconfig
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-55cbcc7467-j5bbd 1/1 Running 0 3m30s
calico-node-mbrqg 1/1 Running 0 3m30s
calico-node-wlbwn 1/1 Running 0 3m30s
calico-typha-f48c7ddf7-kbq6d 1/1 Running 0 3m30s
csi-node-driver-87tlx 2/2 Running 0 3m30s
csi-node-driver-99pqw 2/2 Running 0 3m30s
Demo
You can follow along with the demo to verify that your deployment matches the expected result:
Reference
This section contains reference guides and information about the main CAAPF features and how to use them.
Import Strategy
CAAPF follows a simple import strategy for CAPI clusters:
- Each CAPI cluster has a corresponding Fleet Cluster object.
- Each CAPI ClusterClass has a corresponding Fleet ClusterGroup object.
- When a CAPI Cluster references a ClusterClass in a different namespace, a ClusterGroup is created in the Cluster namespace. This ClusterGroup targets all clusters in this namespace that reference the same ClusterClass. See the configuration section for details.
- If at least one CAPI Cluster references a ClusterClass in a different namespace, a BundleNamespaceMapping is created in the ClusterClass namespace. This allows Fleet Cluster resources to use application sources such as Bundles, HelmOps, or GitRepos from the ClusterClass namespace as if they were deployed in the Cluster namespace. See the configuration section for details.
By default, CAAPF imports all CAPI clusters under Fleet management. See the configuration section for details.
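A quick way to see the result of this strategy on a management cluster is to list the Fleet resources that CAAPF maintains; the output depends on your clusters and classes:
kubectl get clusters.fleet.cattle.io,clustergroups.fleet.cattle.io -A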
Label Synchronization
Fleet relies on Cluster labels, Cluster names, and ClusterGroups for target matching when deploying applications or referenced repository content. To ensure consistency, CAAPF synchronizes resource labels:
- From the CAPI ClusterClass to the imported Fleet Cluster resource.
- From the CAPI ClusterClass to the imported Fleet ClusterGroup resource.
When a CAPI Cluster references a ClusterClass, CAAPF applies two specific labels to both the Cluster and ClusterGroup resources:
- clusterclass-name.fleet.addons.cluster.x-k8s.io: <class-name>
- clusterclass-namespace.fleet.addons.cluster.x-k8s.io: <class-ns>
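As an illustration, a Fleet Cluster imported for a CAPI Cluster that references the quick-start ClusterClass in the default namespace (as in the tutorials above) would carry labels similar to:
apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  name: docker-demo
  namespace: default
  labels:
    clusterclass-name.fleet.addons.cluster.x-k8s.io: quick-start
    clusterclass-namespace.fleet.addons.cluster.x-k8s.io: default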
Templating strategy
The Cluster API Addon Provider Fleet automates application templating for imported CAPI clusters based on matching cluster state.
Functionality
The Addon Provider Fleet ensures that the state of a CAPI cluster and its resources is always up-to-date in the spec.templateValues.ClusterValues field of the Fleet cluster resource. This allows users to:
- Reference specific parts of the CAPI cluster directly, or via Helm substitution patterns referencing .ClusterValues.Cluster data.
- Substitute based on the state of the control plane resource via the .ClusterValues.ControlPlane field.
- Substitute based on the state of the infrastructure cluster resource via the .ClusterValues.InfrastructureCluster field.
- Maintain a consistent application state across different clusters.
- Use the same template for multiple matching clusters to simplify deployment and management.
Example - templating within HelmOp
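A minimal sketch, reusing the Calico HelmOp from the tutorials above: the template substitutes the pod CIDRs of the matching CAPI cluster via .ClusterValues.Cluster, so each target cluster gets its own ipPools:
apiVersion: fleet.cattle.io/v1alpha1
kind: HelmOp
metadata:
  name: calico
spec:
  helm:
    releaseName: projectcalico
    repo: https://docs.tigera.io/calico/charts
    chart: tigera-operator
    templateValues:
      installation: |-
        calicoNetwork:
          ipPools:
          ${- range $cidr := .ClusterValues.Cluster.spec.clusterNetwork.pods.cidrBlocks }
          - cidr: "${ $cidr }"
          ${- end}
  targets:
    - clusterName: docker-demo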
FleetAddonConfig Reference
The FleetAddonConfig Custom Resource Definition (CRD) is used to configure the behavior of the Cluster API Addon Provider for Fleet.
Spec
The spec field of the FleetAddonConfig CRD contains the configuration options.
It is a required field and provides a config for fleet addon functionality.
- config
  - Description: An object that holds various configuration settings.
  - Type: object
  - Optional: Yes

  - config.bootstrapLocalCluster
    - Description: Enable auto-installation of a fleet agent in the local cluster.
    - Type: boolean
    - Optional: Yes

    When set to true, the provider will automatically install a Fleet agent in the cluster where the provider is running. This is useful for bootstrapping a local development or management cluster to be managed by Fleet.

    Example:

    spec:
      config:
        bootstrapLocalCluster: true

  - config.featureGates
    - Description: Feature gates controlling experimental features.
    - Type: object
    - Optional: Yes

    This section allows enabling or disabling experimental features within the provider.

    - config.featureGates.configMap
      - Description: References a ConfigMap where to apply the feature flags. If a ConfigMap is referenced, the controller will update it instead of upgrading the Fleet chart.
      - Type: object (ObjectReference)
      - Optional: Yes

      Example:

      spec:
        config:
          featureGates:
            configMap:
              ref:
                apiVersion: v1
                kind: ConfigMap
                name: fleet-feature-flags
                namespace: fleet-system

    - config.featureGates.experimentalHelmOps
      - Description: Enables experimental Helm operations support.
      - Type: boolean
      - Optional: No (Required within featureGates)

      Example:

      spec:
        config:
          featureGates:
            experimentalHelmOps: true

    - config.featureGates.experimentalOciStorage
      - Description: Enables experimental OCI storage support.
      - Type: boolean
      - Optional: No (Required within featureGates)

      Example:

      spec:
        config:
          featureGates:
            experimentalOciStorage: true
  - config.server
    - Description: Fleet server URL configuration options.
    - Type: object (oneOf inferLocal or custom)
    - Optional: Yes

    This section configures how the provider connects to the Fleet server. You must specify either inferLocal or custom.

    - config.server.inferLocal
      - Description: Infer the local cluster's API server URL as the Fleet server URL.
      - Type: boolean
      - Optional: No (Required if custom is not set)

      Example:

      spec:
        config:
          server:
            inferLocal: true

    - config.server.custom
      - Description: Custom configuration for the Fleet server URL.
      - Type: object
      - Optional: No (Required if inferLocal is not set)

      - config.server.custom.apiServerCaConfigRef
        - Description: Reference to a ConfigMap containing the CA certificate for the API server.
        - Type: object (ObjectReference)
        - Optional: Yes

        Example:

        spec:
          config:
            server:
              custom:
                apiServerCaConfigRef:
                  apiVersion: v1
                  kind: ConfigMap
                  name: fleet-server-ca
                  namespace: fleet-system

      - config.server.custom.apiServerUrl
        - Description: The custom URL for the Fleet API server.
        - Type: string
        - Optional: Yes

        Example:

        spec:
          config:
            server:
              custom:
                apiServerUrl: https://fleet.example.com
- cluster
  - Description: Enable Cluster config functionality. This will create a Fleet Cluster for each Cluster with the same name. In case the cluster specifies topology.class, the name of the ClusterClass will be added to the Fleet Cluster labels.
  - Type: object
  - Optional: Yes

  This section configures the behavior for creating Fleet Clusters from Cluster API Clusters.

  - cluster.agentEnvVars
    - Description: Extra environment variables to be added to the agent deployment.
    - Type: array of object (EnvVar)
    - Optional: Yes

    Example:

    spec:
      cluster:
        agentEnvVars:
          - name: HTTP_PROXY
            value: http://proxy.example.com:8080
          - name: NO_PROXY
            value: localhost,127.0.0.1,.svc

  - cluster.agentNamespace
    - Description: Namespace selection for the fleet agent.
    - Type: string
    - Optional: Yes

    Example:

    spec:
      cluster:
        agentNamespace: fleet-agents

  - cluster.agentTolerations
    - Description: Agent taint toleration settings for every cluster.
    - Type: array of object (Toleration)
    - Optional: Yes

    Example:

    spec:
      cluster:
        agentTolerations:
          - key: "node.kubernetes.io/unreachable"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 600
          - key: "node.kubernetes.io/not-ready"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 600

  - cluster.applyClassGroup
    - Description: Apply a ClusterGroup for a ClusterClass referenced from a different namespace.
    - Type: boolean
    - Optional: Yes

    When a CAPI Cluster references a ClusterClass in a different namespace, a corresponding ClusterGroup is created in the Cluster namespace. This ensures that all clusters within the namespace that share the same ClusterClass from another namespace are grouped together.

    This ClusterGroup inherits ClusterClass labels and applies two CAAPF-specific labels to uniquely identify the group within the cluster scope:
    - clusterclass-name.fleet.addons.cluster.x-k8s.io: <class-name>
    - clusterclass-namespace.fleet.addons.cluster.x-k8s.io: <class-ns>

    Additionally, this configuration enables the creation of a BundleNamespaceMapping. This mapping selects all available bundles and establishes a link between the namespace of the Cluster and the namespace of the referenced ClusterClass. This allows the Fleet Cluster to be evaluated as a target for application sources such as Bundles, HelmOps, or GitRepos from the ClusterClass namespace.

    When all CAPI Cluster resources referencing the same ClusterClass are removed, both the ClusterGroup and the BundleNamespaceMapping are cleaned up.

    Note: If the cluster field is not set, this setting is enabled by default.

    Example:

    spec:
      cluster:
        applyClassGroup: true

  - cluster.hostNetwork
    - Description: Allows deploying the agent configuration with the hostNetwork: true setting, which removes the dependency on the CNI configuration of the cluster.
    - Type: boolean
    - Optional: Yes

    Example:

    spec:
      cluster:
        hostNetwork: true

  - cluster.namespaceSelector
    - Description: Namespace label selector. If set, only clusters in namespaces matching the label selector will be imported. This configuration defines how to select namespaces based on specific labels. The namespaceSelector field ensures that the import strategy applies only to namespaces that have the label import: "true". This is useful for scoping automatic import to specific namespaces rather than applying it cluster-wide.
    - Type: object (LabelSelector)
    - Optional: No (Required within cluster)

    Example:

    apiVersion: addons.cluster.x-k8s.io/v1alpha1
    kind: FleetAddonConfig
    metadata:
      name: fleet-addon-config
    spec:
      cluster:
        namespaceSelector:
          matchLabels:
            import: "true"

  - cluster.naming
    - Description: Naming settings for the fleet cluster.
    - Type: object
    - Optional: Yes

    This section allows customizing the name of the created Fleet Cluster resource.

    - cluster.naming.prefix
      - Description: Specify a prefix for the Cluster name, applied to the created Fleet cluster.
      - Type: string
      - Optional: Yes

      Example:

      spec:
        cluster:
          naming:
            prefix: capi-

    - cluster.naming.suffix
      - Description: Specify a suffix for the Cluster name, applied to the created Fleet cluster.
      - Type: string
      - Optional: Yes

      Example:

      spec:
        cluster:
          naming:
            suffix: -fleet

  - cluster.patchResource
    - Description: Allow patching resources, maintaining the desired state. If not set, resources will only be re-created in case of removal.
    - Type: boolean
    - Optional: Yes

    Example:

    spec:
      cluster:
        patchResource: true

  - cluster.selector
    - Description: Cluster label selector. If set, only clusters matching the label selector will be imported. This configuration filters clusters based on labels, ensuring that the FleetAddonConfig applies only to clusters with the label import: "true". This allows more granular per-cluster selection across the cluster scope.
    - Type: object (LabelSelector)
    - Optional: No (Required within cluster)

    Example:

    apiVersion: addons.cluster.x-k8s.io/v1alpha1
    kind: FleetAddonConfig
    metadata:
      name: fleet-addon-config
    spec:
      cluster:
        selector:
          matchLabels:
            import: "true"

  - cluster.setOwnerReferences
    - Description: Setting to disable setting owner references on the created resources.
    - Type: boolean
    - Optional: Yes

    Example:

    spec:
      cluster:
        setOwnerReferences: false
- clusterClass
  - Description: Enable clusterClass controller functionality. This will create Fleet ClusterGroups for each ClusterClass with the same name.
  - Type: object
  - Optional: Yes

  This section configures the behavior for creating Fleet ClusterGroups from Cluster API ClusterClasses.

  - clusterClass.patchResource
    - Description: Allow patching resources, maintaining the desired state. If not set, resources will only be re-created in case of removal.
    - Type: boolean
    - Optional: Yes

    Example:

    spec:
      clusterClass:
        patchResource: true

  - clusterClass.setOwnerReferences
    - Description: Setting to disable setting owner references on the created resources.
    - Type: boolean
    - Optional: Yes

    Example:

    spec:
      clusterClass:
        setOwnerReferences: false
- install
  - Description: Configuration for installing the Fleet chart.
  - Type: object (oneOf followLatest or version)
  - Optional: Yes

  This section configures how the Fleet chart is installed. You must specify either followLatest or version.

  - install.followLatest
    - Description: Follow the latest version of the chart on install.
    - Type: boolean
    - Optional: No (Required if version is not set)

    Example:

    spec:
      install:
        followLatest: true

  - install.version
    - Description: Use a specific version to install.
    - Type: string
    - Optional: No (Required if followLatest is not set)

    Example:

    spec:
      install:
        version: 0.12.0
Developers
This section contains developer-oriented guides.
CAAPF Releases
Release Cadence
- New versions are usually released every 2-4 weeks.
Release Process
- Clone the repository locally:
git clone git@github.com:rancher/cluster-api-addon-provider-fleet.git
- Depending on whether you are cutting a minor/major or patch release, the process varies.

  - If you are cutting a new minor/major release:
    Create a new release branch (i.e. release-X) and push it to the upstream repository.

    # Note: `upstream` must be the remote pointing to `github.com/rancher/cluster-api-addon-provider-fleet`.
    git checkout -b release-0.4
    git push -u upstream release-0.4
    # Export the tag of the minor/major release to be cut, e.g.:
    export RELEASE_TAG=v0.4.0

  - If you are cutting a patch release from an existing release branch:
    Use the existing release branch.

    # Note: `upstream` must be the remote pointing to `github.com/rancher/cluster-api-addon-provider-fleet`
    git checkout upstream/release-0.4
    # Export the tag of the patch release to be cut, e.g.:
    export RELEASE_TAG=v0.4.1

- Create a signed/annotated tag and push it:
# Create tags locally
git tag -s -a ${RELEASE_TAG} -m ${RELEASE_TAG}
# Push tags
git push upstream ${RELEASE_TAG}
This will trigger a release GitHub action that creates a release with CAAPF components.
- Wait for the update metadata workflow to pass successfully.
This workflow will update the metadata.yaml file in the root of the repository, preparing it for the next release. It will open a PR, which needs to be merged before the next minor version release can be cut.
WARNING: An out-of-date published metadata.yaml file will cause upstream installation via clusterctl to fail.
- Perform Downstream Build
Perform the downstream build for the release tag using the CAAPF GitHub action. Specific steps and references for this process can be found by asking in the #team-rancher-highlander channel.
Versioning
CAAPF follows semantic versioning specification.
Example versions:
- Pre-release: v0.4.0-alpha.1
- Minor release: v0.4.0
- Patch release: v0.4.1
- Major release: v1.0.0
With the v0 release of our codebase, we provide the following guarantees:
-
A (minor) release CAN include:
- Introduction of new API versions, or new Kinds.
- Compatible API changes like field additions, deprecation notices, etc.
- Breaking API changes for deprecated APIs, fields, or code.
- Features, promotion or removal of feature gates.
- And more!
-
A (patch) release SHOULD only include a backwards compatible set of bugfixes.
Backporting
Any backport MUST not be breaking for either API or behavioral changes.
It is generally not accepted to submit pull requests directly against release branches (release-X). However, backports of fixes or changes that have already been merged into the main branch may be accepted to all supported branches:
- Critical bug fixes, security issue fixes, or fixes for bugs without easy workarounds.
- Dependency bumps for CVE (usually limited to CVE resolution; backports of non-CVE related version bumps are considered exceptions to be evaluated case by case)
- Cert-manager version bumps (to avoid having releases with cert-manager versions that are out of support, when possible)
- Changes required to support new Kubernetes versions, when possible. See supported Kubernetes versions for more details.
- Changes to use the latest Go patch version to build controller images.
- Improvements to existing docs (the latest supported branch hosts the current version of the book)
Branches
CAAPF has two types of branches: the main and release-X branches.
The main branch is where development happens. All the latest and greatest code, including breaking changes, happens on main.
The release-X branches contain stable, backwards compatible code. On every major or minor release, a new branch is created. It is from these branches that minor and patch releases are tagged. In some cases, it may be necessary to open PRs for bugfixes directly against stable branches, but this should generally not be the case.
Support and guarantees
CAAPF maintains the most recent release/releases for all supported APIs. Support for this section refers to the ability to backport and release patch versions; backport policy is defined above.
- The API version is determined from the GroupVersion defined in the #[kube(...)] derive macro inside ./src/api.
- For the current stable API version (v1alpha1) we support the two most recent minor releases; older minor releases are immediately unsupported when a new major/minor release is available.
Development
Development setup
Prerequisites
Alternatively:
To enter the environment with prerequisites:
nix-shell
Common prerequisite
Create a local development environment
- Clone the CAAPF repository locally.
- The project provides an easy way of starting your own development environment. You can take some time to study the justfile that includes a number of pre-configured commands to set up and build your own CAPI management cluster and install the addon provider for Fleet.
- Run the following:
just start-dev
This command will create a kind cluster and manage the installation of the fleet provider and all dependencies.
- Once the installation is complete, you can inspect the current state of your development cluster.
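For example (what you see will depend on the justfile defaults):
kubectl get pods -A
kubectl get fleetaddonconfigs,clusters.fleet.cattle.io -A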
E2E Test Failure Investigation Guide
This guide provides a structured approach to investigating end-to-end (e2e) test failures in the cluster-api-addon-provider-fleet project.
Understanding E2E Tests
Our CI pipeline runs several e2e tests to validate functionality across different Kubernetes versions:
- Cluster Class Import Tests: Validate the cluster class import functionality
- Import Tests: Validate the general import functionality
- Import RKE2 Tests: Validate import functionality specific to RKE2 clusters
Each test runs on multiple Kubernetes versions (stable and latest) to ensure compatibility.
Accessing Test Artifacts
When e2e tests fail, the CI pipeline automatically collects and uploads artifacts containing valuable debugging information. These artifacts are created using crust-gather, a tool that captures the state of Kubernetes clusters.
Finding the Artifact URL
- Navigate to the failed GitHub Actions workflow run
- Scroll down to the "Artifacts" section
- Find the artifact corresponding to the failed test (e.g., artifacts-cluster-class-import-stable)
- Copy the artifact URL (right-click on the artifact link and copy the URL)
Using the serve-artifact.sh Script
The serve-artifact.sh script allows you to download and serve the test artifacts locally, providing access to the Kubernetes contexts from the test environment.
Prerequisites
- A GitHub token with repo read permissions (set as the GITHUB_TOKEN environment variable)
- kubectl installed, krew installed.
- crust-gather installed. Can be replicated with nix, if available.
Serving Artifacts
Fetch the serve-artifact.sh script from the crust-gather GitHub repository:
curl -L https://raw.githubusercontent.com/crust-gather/crust-gather/refs/heads/main/serve-artifact.sh -o serve-artifact.sh && chmod +x serve-artifact.sh
# Using the full artifact URL
./serve-artifact.sh -u https://github.com/rancher/cluster-api-addon-provider-fleet/actions/runs/15737662078/artifacts/3356068059 -s 0.0.0.0:9095
# OR using individual components
./serve-artifact.sh -o rancher -r cluster-api-addon-provider-fleet -a 3356068059 -s 0.0.0.0:9095
This will:
- Download the artifact from GitHub
- Extract its contents
- Start a local server that provides access to the Kubernetes contexts from the test environment
Investigating Failures
Once the artifact server is running, you can use various tools to investigate the failure:
Using k9s
k9s provides a terminal UI to interact with Kubernetes clusters:
- Open a new terminal
- Run k9s
- Press : to open the command prompt
- Type ctx and press Enter
- Select the context from the test environment (there may be multiple contexts); use dev for the e2e tests.
- Navigate through resources to identify issues:
- Check pods for crash loops or errors
- Examine events for warnings or errors
- Review logs from relevant components
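If you prefer plain kubectl over k9s, the same checks can be run against the served contexts; the controller deployment name below is an assumption and may differ in your environment:
kubectl --context dev get events -A --sort-by=.lastTimestamp
kubectl --context dev -n caapf-system logs deploy/caapf-controller-manager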
Common Investigation Paths
- Check Fleet Resources:
  - FleetAddonConfig resources
  - Fleet Cluster resource
  - Fleet ClusterGroup resources
  - Ensure all relevant labels are present on the above.
  - Check that the created Fleet namespace cluster-<ns>-<cluster name>-<random-prefix> is consistent with the namespace in Cluster.status.namespace.
  - Check for ClusterRegistrationToken in the cluster namespace.
  - Check for BundleNamespaceMapping in the ClusterClass namespace if a cluster references a ClusterClass in a different namespace.
- Check CAPI Resources:
  - Cluster resource
  - Check for the ControlPlaneInitialized condition to be true
  - ClusterClass resources: these are present and have status.observedGeneration consistent with the metadata.generation
  - Continue on a per-cluster basis
- Check Controller Logs:
  - Look for error messages or warnings in the controller logs in the caapf-system namespace.
  - Check for reconciliation failures in the manager container. In case of an upstream installation, check the helm-manager container logs.
- Check Kubernetes Events:
  - Events often contain information about failures. CAAPF publishes events for each resource applied from a CAPI Cluster, including the Fleet Cluster in the cluster namespace, and the ClusterGroup and BundleNamespaceMapping in the ClusterClass namespace. These events are created by the caapf-controller component.
Common Failure Patterns
Import Failures
- Symptom: Fleet Cluster not created or in an error state
- Investigation: Check the controller logs in the cattle-fleet-system namespace for errors during import processing. Check for errors in the CAAPF logs for a missing cluster definition.
- Common causes:
  - The Fleet cluster import process is serial, and a hot loop in another cluster's import blocks further cluster imports. Fleet issue.
  - The CAPI Cluster is not ready and does not have the ControlPlaneInitialized condition. Issue with CAPI, or the cluster requires more time to become ready.
  - Otherwise, a CAAPF issue.
Cluster Class Failures
- Symptom: ClusterClass not properly imported or is not evaluated as a target.
- Investigation: Check for the BundleNamespaceMapping in the ClusterClass namespace named after the Cluster resource. Check the controller logs in the caapf-system namespace for errors during ClusterClass processing. Check the ClusterGroup resource in the Cluster namespace.
- Common causes:
  - Check for a Cluster referencing a ClusterClass in a different namespace.
  - In the event of missing resources, a CAAPF-related error.