# Operators
An operator extends Kubernetes by encoding domain-specific operational knowledge — how to deploy, scale, back up, and upgrade a particular application — into a controller that runs inside the cluster.
Kubernetes already uses a controller loop: watch desired state, compare to actual state, reconcile. Operators use the exact same pattern for your own custom resources.
## The Building Blocks

- Custom Resource Definition (CRD) — Extends the Kubernetes API with a new resource type (e.g. `kind: PostgresCluster` instead of just `Deployment`).
- Custom Resource (CR) — An instance of that CRD (e.g. "I want a 3-node Postgres cluster with 100Gi storage").
- Controller — A program that watches CRs and reconciles reality to match the desired state.
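A CRD is itself just another Kubernetes object. As a sketch of what registering a new type looks like, here is a minimal, hypothetical CRD manifest (the group, kind, and schema fields are illustrative, not taken from any real operator):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.apps.example.com      # must be <plural>.<group>
spec:
  group: apps.example.com
  names:
    kind: MyApp
    plural: myapps
    singular: myapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas: {type: integer}
                image: {type: string}
```

Once this is applied, the API server accepts `kind: MyApp` objects just like built-in types; the controller is what gives them behavior.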
```
User creates CR → Controller sees it → Creates Pods, Services, PVCs, etc.
        ↑                                            |
        └──── Updates CR status with current state ──┘
```

## Why Use an Operator?
Without an operator, running something like PostgreSQL on Kubernetes means manually managing StatefulSets, PVCs, Services, ConfigMaps, backups, failover, replication, version upgrades, and monitoring.
An operator encodes all that knowledge so you just write:
```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  postgresVersion: 15
  instances:
    - replicas: 3
      dataVolumeClaimSpec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 100Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
```

And the operator handles everything else — creating pods, setting up replication, running backups on schedule, and healing failures.
## Well-Known Operators
| Operator | What It Manages |
|---|---|
| Prometheus Operator | Prometheus instances, alerting rules, ServiceMonitors |
| Cert-Manager | TLS certificates (auto-renewal from Let’s Encrypt, Vault, etc.) |
| External Secrets Operator | Syncs secrets from Vault / AWS SM / Azure KV into K8s Secrets |
| ArgoCD | GitOps-based continuous deployments |
| Strimzi | Apache Kafka clusters |
| CloudNativePG / Crunchy | PostgreSQL clusters |
Cert-Manager fits into the broader TLS/PKI picture for clusters and cloud load balancers — see TLS and Certificates for ACM-centric lifecycle and how teams often split trust between cloud edges and in-cluster issuers.
You can browse hundreds more at OperatorHub.io.
## The Reconcile Loop
Every operator follows the same pattern, regardless of framework:
```
Watch for changes (create / update / delete of your CR)
  ↓
Fetch the current CR
  ↓
Compare desired state (spec) vs actual state (what exists in cluster)
  ↓
Take action to reconcile (create / update / delete child resources)
  ↓
Update CR status
  ↓
Return (requeue if needed)
```

Key principles:
- Idempotent — Running reconcile twice with the same input produces the same result.
- Level-triggered, not edge-triggered — React to current state, not to “what just happened.”
- Owns child resources — Set `ownerReferences` so garbage collection cleans up when the CR is deleted.
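These principles can be illustrated with a toy reconciler over an in-memory "cluster" (a plain dict standing in for the API server; this is purely illustrative and uses none of the real client libraries):

```python
# Toy reconciler. It looks only at *current* state (level-triggered, not
# edge-triggered) and converges the cluster toward the desired spec, so
# running it twice with the same input is a no-op (idempotent).

def reconcile(cluster, name, desired_spec):
    """Converge cluster[name] toward desired_spec (None = CR deleted)."""
    actual = cluster.get(name)
    if desired_spec is None:          # CR deleted -> clean up child
        if actual is not None:
            del cluster[name]
            return "deleted"
        return "noop"
    if actual is None:                # child missing -> create it
        cluster[name] = dict(desired_spec)
        return "created"
    if actual != desired_spec:        # drift -> heal it
        cluster[name] = dict(desired_spec)
        return "updated"
    return "noop"                     # already converged


cluster = {}
assert reconcile(cluster, "web", {"replicas": 3}) == "created"
assert reconcile(cluster, "web", {"replicas": 3}) == "noop"     # idempotent
assert reconcile(cluster, "web", {"replicas": 5}) == "updated"  # drift healed
assert reconcile(cluster, "web", None) == "deleted"             # cleanup
```

Real controllers follow the same shape; the difference is that "actual state" comes from API reads and watches rather than a dict.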
## Frameworks for Building Operators
| Framework | Language | Best For |
|---|---|---|
| Kubebuilder | Go | The standard — most production operators use this |
| Operator SDK | Go, Ansible, Helm | Red Hat’s toolkit; wraps Kubebuilder for Go, also supports Ansible/Helm-based operators |
| Kopf | Python | Python shops, simpler operators, rapid prototyping |
| Metacontroller | Any (via webhooks) | Lightweight — you write a webhook in any language, Metacontroller handles the watch/reconcile loop |
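To give a feel for the Metacontroller model, here is a hypothetical sync webhook written as a pure function. The request/response shape ({parent, children} in, {status, children} out) follows Metacontroller's CompositeController sync-hook contract, but the parent resource and its fields are made up for illustration:

```python
# Metacontroller POSTs the observed parent + children to your webhook;
# you return the desired children and a status. No watch loop to write.

def sync(request):
    parent = request["parent"]
    name = parent["metadata"]["name"]
    spec = parent.get("spec", {})

    desired_deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{"name": name, "image": spec["image"]}]
                },
            },
        },
    }
    # Observed children are keyed by "<Kind>.<apiVersion>".
    observed = request.get("children", {}).get("Deployment.apps/v1", {})
    return {
        "status": {"observedChildren": len(observed)},
        "children": [desired_deployment],
    }
```

In practice you would expose `sync()` behind any HTTP endpoint; Metacontroller calls it whenever the parent or its children change and applies whatever children you return.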
## Building With Kubebuilder (Go)
### 1. Scaffold the Project
```sh
kubebuilder init --domain example.com --repo github.com/myorg/my-operator
```

### 2. Create an API (CRD + Controller)
```sh
kubebuilder create api --group apps --version v1alpha1 --kind MyApp
```

This generates:
- `api/v1alpha1/myapp_types.go` — Your CRD struct (spec and status fields)
- `internal/controllers/myapp_controller.go` — The reconcile loop
### 3. Define the CRD Spec
```go
type MyAppSpec struct {
    Replicas int32  `json:"replicas"`
    Image    string `json:"image"`
    Port     int32  `json:"port,omitempty"`
}

type MyAppStatus struct {
    AvailableReplicas int32  `json:"availableReplicas"`
    Phase             string `json:"phase"`
}
```

### 4. Write the Reconcile Loop
```go
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. Fetch the CR
    var myapp appsv1alpha1.MyApp
    if err := r.Get(ctx, req.NamespacedName, &myapp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Build the desired Deployment
    desired := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      myapp.Name,
            Namespace: myapp.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &myapp.Spec.Replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": myapp.Name},
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{"app": myapp.Name},
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  myapp.Name,
                        Image: myapp.Spec.Image,
                        Ports: []corev1.ContainerPort{{
                            ContainerPort: myapp.Spec.Port,
                        }},
                    }},
                },
            },
        },
    }

    // 3. Set owner reference (garbage collection)
    if err := ctrl.SetControllerReference(&myapp, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // 4. Create or update the Deployment
    found := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, found)
    if err != nil && errors.IsNotFound(err) {
        log.Info("Creating Deployment", "name", desired.Name)
        err = r.Create(ctx, desired)
    } else if err == nil {
        log.Info("Updating Deployment", "name", desired.Name)
        found.Spec = desired.Spec
        err = r.Update(ctx, found)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // 5. Update status
    myapp.Status.AvailableReplicas = found.Status.AvailableReplicas
    myapp.Status.Phase = "Running"
    if err := r.Status().Update(ctx, &myapp); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}
```

### 5. Generate, Install, and Run
```sh
make manifests   # generates CRD YAML from Go struct tags
make install     # applies CRD to cluster (kubectl apply)
make run         # runs the controller locally (for development)
```

### 6. Build and Deploy
Section titled “6. Build and Deploy”make docker-build docker-push IMG=myorg/my-operator:v0.1.0make deploy IMG=myorg/my-operator:v0.1.0The operator now runs as a Deployment inside the cluster, watching for MyApp resources.
## Building With Kopf (Python)
A lighter alternative for simpler operators or teams that prefer Python:
```python
import kopf
import kubernetes.client as k8s


@kopf.on.create('example.com', 'v1alpha1', 'myapps')
def on_create(spec, name, namespace, **kwargs):
    replicas = spec.get('replicas', 1)
    image = spec.get('image', 'nginx')

    api = k8s.AppsV1Api()
    deployment = k8s.V1Deployment(
        metadata=k8s.V1ObjectMeta(name=name),
        spec=k8s.V1DeploymentSpec(
            replicas=replicas,
            selector=k8s.V1LabelSelector(match_labels={'app': name}),
            template=k8s.V1PodTemplateSpec(
                metadata=k8s.V1ObjectMeta(labels={'app': name}),
                spec=k8s.V1PodSpec(containers=[
                    k8s.V1Container(name=name, image=image)
                ])
            )
        )
    )
    api.create_namespaced_deployment(namespace, deployment)


@kopf.on.update('example.com', 'v1alpha1', 'myapps')
def on_update(spec, name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    patch = {'spec': {'replicas': spec.get('replicas', 1)}}
    api.patch_namespaced_deployment(name, namespace, patch)


@kopf.on.delete('example.com', 'v1alpha1', 'myapps')
def on_delete(name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    api.delete_namespaced_deployment(name, namespace)
```

Run it with:
```sh
kopf run my_operator.py --verbose
```

Kopf handles the watch loop, retries, and leader election. You write the handlers.
## Using Your Custom Resource
Once the CRD is installed and the operator is running:
```yaml
apiVersion: apps.example.com/v1alpha1
kind: MyApp
metadata:
  name: web-frontend
  namespace: default
spec:
  replicas: 3
  image: my-frontend:v2.0.0
  port: 8080
```

```sh
kubectl apply -f my-app.yaml
kubectl get myapps                    # see your custom resources
kubectl describe myapp web-frontend   # see status and events
```

## Key Takeaways
- An operator = CRD (defines a new resource type) + controller (reconciles desired vs actual state).
- The reconcile loop must be idempotent and level-triggered.
- Use Kubebuilder (Go) for production operators; Kopf (Python) for simpler use cases.
- Set owner references on child resources so deletion cascades automatically.
- Start simple — an operator that creates a Deployment + Service — and layer on complexity (backups, upgrades, failover) as needed.