
Production Patterns

By Atif Alam

Running Kubernetes in production requires more than just deploying pods. These patterns help keep workloads reliable, efficient, and resilient.

Health Probes

Probes let Kubernetes know whether your containers are healthy.

Startup Probe

“Has the container finished starting?” Useful for slow-starting apps. Until the startup probe succeeds, liveness and readiness probes are disabled.

Why this matters: prevents slow-starting workloads from being killed too early by liveness checks, which would otherwise cause avoidable restart loops.

How SREs use it: if startup probes fail, focus on boot path problems (migrations, cold caches, config/secret load, external dependencies) rather than steady-state traffic behavior.

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

Readiness Probe

“Is the container ready to receive traffic?” If it fails, the pod is removed from Service endpoints (no traffic routed to it) but not restarted.

Why this matters: protects users from receiving traffic to pods that are alive but temporarily unable to serve safely (startup warmup, dependency outage, overload).

How SREs use it: readiness failures are often capacity or dependency health signals. They help explain latency/5xx during rollouts and incidents because traffic is being drained from unhealthy pods.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Liveness Probe

“Is the container still working?” If it fails, Kubernetes restarts the container.

Why this matters: catches deadlocks/hung processes that are still running but no longer making progress, so bad pods do not stay stuck forever.

How SREs use it: repeated liveness failures usually mean app/runtime instability (crash loop, lockup, dependency hard-fail). Treat spikes as a reliability signal to inspect logs, recent deploys, and runtime limits.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15

Probe     | Primary purpose                           | What failures usually indicate
Startup   | Protect slow boot from premature restarts | Boot-time regressions, startup dependency delays
Readiness | Protect traffic from not-ready pods       | Dependency degradation, warmup, overload, partial outage
Liveness  | Recover from stuck/broken processes       | App crash loop, deadlock, hard runtime failure
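The liveness/readiness split usually shows up directly in application code: the liveness endpoint returns 200 as long as the process can answer at all, while the readiness endpoint also verifies dependencies. A minimal sketch (the dependency check is a hypothetical stand-in, not from the original):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependency_ok() -> bool:
    # Hypothetical stand-in for a real database/cache ping.
    return True

def status_for(path: str, deps_ok: bool) -> int:
    # Liveness (/healthz): 200 as long as the process can respond at all.
    # Readiness (/ready): 200 only when dependencies are healthy, so
    # Kubernetes drains traffic instead of restarting the container.
    if path == "/healthz":
        return 200
    if path == "/ready":
        return 200 if deps_ok else 503
    return 404

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(status_for(self.path, dependency_ok()))
        self.end_headers()

# To serve on the probe port used in the manifests above:
#   HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Note that a dependency outage makes /ready fail (traffic drained) while /healthz keeps passing (no restart), which is exactly the behavior the table describes.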

Resource Requests and Limits

Tell Kubernetes how much CPU and memory your containers need.

  • Requests — The minimum guaranteed resources. The scheduler uses this to place pods on nodes.
  • Limits — The maximum allowed. If a container exceeds its memory limit, it’s killed (OOMKilled). If it exceeds CPU, it’s throttled.
resources:
  requests:
    cpu: 100m       # 0.1 CPU cores
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Always set requests. Set limits to prevent runaway containers from taking down a node.
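Requests drive the scheduler's placement arithmetic: a pod fits on a node only if the sum of already-scheduled requests plus the new pod's requests stays within the node's allocatable capacity. Limits and actual usage play no part in placement. A simplified sketch of that fit check (illustrative, not the scheduler's real code):

```python
def fits(node_cpu_m, node_mem_mi, scheduled, pod):
    # Simplified scheduler fit check: only *requests* count,
    # never limits or observed usage.
    used_cpu = sum(p["cpu_m"] for p in scheduled) + pod["cpu_m"]
    used_mem = sum(p["mem_mi"] for p in scheduled) + pod["mem_mi"]
    return used_cpu <= node_cpu_m and used_mem <= node_mem_mi

# Node with 2 CPU (2000m) / 4Gi (4096Mi) allocatable, already running
# three pods that each request 500m CPU and 512Mi memory:
running = [{"cpu_m": 500, "mem_mi": 512}] * 3
new_pod = {"cpu_m": 100, "mem_mi": 128}   # the requests from the example above

print(fits(2000, 4096, running, new_pod))  # True: 1600m / 1664Mi fits
```

This is also why unset requests are dangerous: with requests of zero, every pod "fits" everywhere and nodes get overcommitted.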

Horizontal Pod Autoscaler (HPA)

Automatically scales the number of pod replicas based on CPU, memory, or custom metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization. Requires metrics-server running in the cluster.
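The core scaling rule, per the Kubernetes HPA documentation, is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. A sketch of that arithmetic (the real controller adds tolerances and stabilization windows on top):

```python
import math

def desired_replicas(current, current_util, target_util, min_r, max_r):
    # HPA core rule: scale proportionally to how far the observed metric
    # is from target, then clamp to [minReplicas, maxReplicas].
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

# With the manifest above (minReplicas: 2, maxReplicas: 10, target 70% CPU):
print(desired_replicas(4, 140, 70, 2, 10))  # 8: utilization doubled, so replicas double
print(desired_replicas(4, 20, 70, 2, 10))   # 2: scale-in stops at minReplicas
```

This is why minReplicas: 2 matters: it keeps redundancy even when the cluster is idle and the formula alone would shrink the Deployment to one pod.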

Pod Disruption Budgets (PDB)

A PDB limits how many pods can be down at the same time during voluntary disruptions (node drains, cluster upgrades). It does not prevent involuntary disruptions (node crashes).

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # at least 2 pods must be running
  selector:
    matchLabels:
      app: my-app

Alternatively use maxUnavailable: 1 to say “at most 1 pod can be down at a time.”
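The maxUnavailable form of the same budget looks like this (only the spec changes; minAvailable and maxUnavailable are mutually exclusive in a single PDB):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1   # at most 1 pod may be down at a time
  selector:
    matchLabels:
      app: my-app
```

maxUnavailable is usually the safer choice for scalable workloads, since it keeps working as the replica count grows or shrinks.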

Rolling Update Strategy

Deployments use rolling updates by default. You can tune the behavior:

The snippet below is a partial Kubernetes Deployment manifest, showing spec.strategy under kind: Deployment.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # how many extra pods during update
      maxUnavailable: 0   # never have fewer than desired replicas

  • maxSurge: 1, maxUnavailable: 0 — Conservative: always keep full capacity, add one new pod at a time.
  • maxSurge: 0, maxUnavailable: 1 — Tight on resources: remove one old pod before adding a new one.
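The two knobs bound the total pod count during a rollout: it stays within [desired − maxUnavailable, desired + maxSurge]. A quick check of both presets above (illustrative arithmetic, not the controller's code):

```python
def rollout_bounds(desired, max_surge, max_unavailable):
    # During a rolling update the controller keeps total pods within
    # [desired - maxUnavailable, desired + maxSurge].
    return desired - max_unavailable, desired + max_surge

# Conservative preset: never below full capacity, one extra pod at a time.
print(rollout_bounds(5, 1, 0))  # (5, 6)
# Resource-tight preset: never above desired, one pod down at a time.
print(rollout_bounds(5, 0, 1))  # (4, 5)
```

Note that maxSurge: 0 together with maxUnavailable: 0 is invalid, since the rollout could never make progress.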

To trigger a rollout:

kubectl set image deployment/my-app app=my-app:v2
kubectl rollout status deployment/my-app
kubectl rollout pause deployment/my-app # pause rollout
kubectl rollout resume deployment/my-app # resume rollout
kubectl rollout undo deployment/my-app # rollback

Capacity Planning

Horizontal Pod Autoscaler reacts to live signals (CPU, memory, custom metrics), but planning still matters: you need enough node capacity, quotas, and headroom for spikes. Before production:

  • Load-test realistic traffic and failure modes; compare to SLOs if you have them.
  • Right-size requests/limits using measured usage, not guesses.
  • Forecast growth (seasonal traffic, new features) and schedule cluster or instance changes before limits bite.

If workloads are not purely elastic (batch windows, fixed contracts), document expected peaks and who approves extra capacity spend.
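A back-of-envelope version of "right-size with headroom" looks like this (the per-pod throughput and headroom figures are illustrative assumptions; in practice they come from your load tests and SLOs):

```python
import math

def replicas_for_peak(peak_rps, per_pod_rps, headroom=0.3):
    # Capacity planning sketch: provision for the forecast peak plus a
    # headroom fraction for spikes, rounded up to whole replicas.
    return math.ceil(peak_rps * (1 + headroom) / per_pod_rps)

# Load tests measured ~200 RPS per pod; forecast peak is 2,400 RPS.
print(replicas_for_peak(2400, 200))  # 16 replicas with 30% headroom
```

The result feeds back into the HPA bounds: maxReplicas should sit at or above this number, and node capacity must be able to hold it.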

Production Checklist

Practice                                   | Why
Set liveness and readiness probes          | Auto-restart broken containers; stop routing to unready ones
Set resource requests and limits           | Fair scheduling; prevent noisy neighbors
Use HPA for variable load                  | Scale out automatically; save cost when idle
Define PodDisruptionBudgets                | Protect availability during maintenance
Use rolling updates with maxUnavailable: 0 | Zero-downtime deployments
Run multiple replicas                      | No single point of failure
Use namespaces                             | Isolate environments and teams
Back up etcd                               | Cluster state recovery