Production Patterns
Running Kubernetes in production requires more than just deploying pods. These patterns help keep workloads reliable, efficient, and resilient.
Health Checks (Probes)
Probes let Kubernetes know whether your containers are healthy.
Startup Probe
“Has the container finished starting?” Useful for slow-starting apps. Until the startup probe succeeds, liveness and readiness probes are disabled.
Why this matters: prevents slow-starting workloads from being killed too early by liveness checks, which would otherwise cause avoidable restart loops.
How SREs use it: if startup probes fail, focus on boot path problems (migrations, cold caches, config/secret load, external dependencies) rather than steady-state traffic behavior.
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```
Readiness Probe
“Is the container ready to receive traffic?” If it fails, the pod is removed from Service endpoints (no traffic routed to it) but not restarted.
Why this matters: protects users from receiving traffic to pods that are alive but temporarily unable to serve safely (startup warmup, dependency outage, overload).
How SREs use it: readiness failures are often capacity or dependency health signals. They help explain latency/5xx during rollouts and incidents because traffic is being drained from unhealthy pods.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
Liveness Probe
“Is the container still working?” If it fails, Kubernetes restarts the container.
Why this matters: catches deadlocks/hung processes that are still running but no longer making progress, so bad pods do not stay stuck forever.
How SREs use it: repeated liveness failures usually mean app/runtime instability (crash loop, lockup, dependency hard-fail). Treat spikes as a reliability signal to inspect logs, recent deploys, and runtime limits.
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
```
Probe signals in practice
| Probe | Primary purpose | What failures usually indicate |
|---|---|---|
| Startup | Protect slow boot from premature restarts | boot-time regressions, startup dependency delays |
| Readiness | Protect traffic from not-ready pods | dependency degradation, warmup, overload, partial outage |
| Liveness | Recover from stuck/broken processes | app crash loop, deadlock, hard runtime failure |
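Taken together, the three probes can live side by side on one container spec, each doing its own job. A minimal sketch, assuming a hypothetical app image and endpoint paths (`/healthz`, `/ready`) like those used above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo          # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: my-app:v1      # illustrative image
      ports:
        - containerPort: 8080
      startupProbe:         # gates the other two probes until boot completes
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      readinessProbe:       # controls Service endpoint membership
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 10
      livenessProbe:        # triggers a container restart on repeated failure
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 15
```

Note that readiness and liveness deliberately point at different paths here: a readiness endpoint may check downstream dependencies, while a liveness endpoint should stay cheap so a dependency outage does not trigger restarts.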
Resource Requests and Limits
Tell Kubernetes how much CPU and memory your containers need.
- Requests — The minimum guaranteed resources. The scheduler uses this to place pods on nodes.
- Limits — The maximum allowed. If a container exceeds its memory limit, it’s killed (OOMKilled). If it exceeds CPU, it’s throttled.
```yaml
resources:
  requests:
    cpu: 100m        # 0.1 CPU cores
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
Always set requests. Set limits to prevent runaway containers from taking down a node.
Horizontal Pod Autoscaler (HPA)
Automatically scales the number of pod replicas based on CPU, memory, or custom metrics.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization. Requires metrics-server running in the cluster.
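If replica counts flap under bursty load, autoscaling/v2 also supports a `behavior` section to slow down scale-in. A hedged sketch of the relevant `spec` fragment (the window and policy values are assumptions to tune per workload, not recommendations):

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min of low usage before scaling in
      policies:
        - type: Pods
          value: 1                      # then remove at most 1 pod per period
          periodSeconds: 60
```

Without this, the default scale-down stabilization window applies, which is often fine; tune it only if you observe oscillation.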
Pod Disruption Budgets (PDB)
A PDB limits how many pods can be down at the same time during voluntary disruptions (node drains, cluster upgrades). It does not prevent involuntary disruptions (node crashes).
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # at least 2 pods must be running
  selector:
    matchLabels:
      app: my-app
```
Alternatively use maxUnavailable: 1 to say “at most 1 pod can be down at a time.”
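Spelled out, the maxUnavailable form (using the same app label) would look like this:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1   # at most 1 pod may be down during voluntary disruptions
  selector:
    matchLabels:
      app: my-app
```

maxUnavailable scales naturally with replica count, whereas minAvailable is an absolute floor; pick whichever matches how you reason about the service.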
Rolling Updates
Deployments use rolling updates by default. You can tune the behavior:
The snippet below is a partial excerpt from a Kubernetes Deployment manifest, specifically spec.strategy under kind: Deployment.
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # how many extra pods during update
      maxUnavailable: 0   # never have fewer than desired replicas
```
- maxSurge: 1, maxUnavailable: 0 — Conservative: always keep full capacity, add one new pod at a time.
- maxSurge: 0, maxUnavailable: 1 — Tight on resources: remove one old pod before adding a new one.
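For completeness, the resource-tight variant from the second bullet would read:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0         # never create pods beyond the desired replica count
      maxUnavailable: 1   # briefly run one pod short during the update
```

This trades a small capacity dip for not needing spare node room, which matters on tightly packed clusters.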
To trigger a rollout:
```sh
kubectl set image deployment/my-app app=my-app:v2
kubectl rollout status deployment/my-app
kubectl rollout pause deployment/my-app    # pause rollout
kubectl rollout resume deployment/my-app   # resume rollout
kubectl rollout undo deployment/my-app     # rollback
```
Capacity planning
Horizontal Pod Autoscaler reacts to live signals (CPU, memory, custom metrics), but planning still matters: you need enough node capacity, quotas, and headroom for spikes. Before production:
- Load-test realistic traffic and failure modes; compare to SLOs if you have them.
- Right-size requests/limits using measured usage, not guesses.
- Forecast growth (seasonal traffic, new features) and schedule cluster or instance changes before limits bite.
If workloads are not purely elastic (batch windows, fixed contracts), document expected peaks and who approves extra capacity spend.
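One way to make agreed capacity explicit is a per-namespace ResourceQuota. A sketch with placeholder numbers (the name, namespace, and values are illustrative; size them from your own measurements):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota        # hypothetical name
  namespace: my-team      # hypothetical namespace
spec:
  hard:
    requests.cpu: "20"    # total CPU requests allowed in the namespace
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
```

A quota like this turns the "who approves extra capacity spend" conversation into a concrete, reviewable manifest change.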
Production Checklist
| Practice | Why |
|---|---|
| Set liveness and readiness probes | Auto-restart broken containers; stop routing to unready ones |
| Set resource requests and limits | Fair scheduling; prevent noisy neighbors |
| Use HPA for variable load | Scale out automatically; save cost when idle |
| Define PodDisruptionBudgets | Protect availability during maintenance |
| Use rolling updates with maxUnavailable: 0 | Zero-downtime deployments |
| Run multiple replicas | No single point of failure |
| Use namespaces | Isolate environments and teams |
| Back up etcd | Cluster state recovery |
See also
- Service readiness checklist — observability, CI/CD, and Kubernetes in one place before go-live.
- Sidecar Pattern — design and operate helper containers safely in production Pods.