
Production Patterns

By Atif Alam

Running Kubernetes in production requires more than just deploying pods. These patterns help keep workloads reliable, efficient, and resilient.

Health Probes

Probes let Kubernetes know whether your containers are healthy.

Startup Probe

“Has the container finished starting?” Useful for slow-starting apps. Until the startup probe succeeds, liveness and readiness probes are disabled.

Why this matters: prevents slow-starting workloads from being killed too early by liveness checks, which would otherwise cause avoidable restart loops.

How SREs use it: if startup probes fail, focus on boot path problems (migrations, cold caches, config/secret load, external dependencies) rather than steady-state traffic behavior.

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

Readiness Probe

“Is the container ready to receive traffic?” If it fails, the pod is removed from Service endpoints (no traffic routed to it) but not restarted.

Why this matters: protects users from receiving traffic to pods that are alive but temporarily unable to serve safely (startup warmup, dependency outage, overload).

How SREs use it: readiness failures are often capacity or dependency health signals. They help explain latency/5xx during rollouts and incidents because traffic is being drained from unhealthy pods.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Liveness Probe

“Is the container still working?” If it fails, Kubernetes restarts the container.

Why this matters: catches deadlocks/hung processes that are still running but no longer making progress, so bad pods do not stay stuck forever.

How SREs use it: repeated liveness failures usually mean app/runtime instability (crash loop, lockup, dependency hard-fail). Treat spikes as a reliability signal to inspect logs, recent deploys, and runtime limits.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15

Probe     | Primary purpose                           | What failures usually indicate
Startup   | Protect slow boot from premature restarts | Boot-time regressions, startup dependency delays
Readiness | Protect traffic from not-ready pods       | Dependency degradation, warmup, overload, partial outage
Liveness  | Recover from stuck/broken processes       | App crash loop, deadlock, hard runtime failure
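The liveness/readiness split usually shows up directly in application code: the liveness endpoint returns 200 as long as the process can answer at all, while the readiness endpoint also verifies dependencies. A minimal sketch (the dependency check is a hypothetical stand-in, not from the original):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependency_ok() -> bool:
    # Hypothetical stand-in for a real database/cache ping.
    return True

def status_for(path: str, deps_ok: bool) -> int:
    # Liveness (/healthz): 200 as long as the process can respond at all.
    # Readiness (/ready): 200 only when dependencies are healthy, so
    # Kubernetes drains traffic instead of restarting the container.
    if path == "/healthz":
        return 200
    if path == "/ready":
        return 200 if deps_ok else 503
    return 404

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(status_for(self.path, dependency_ok()))
        self.end_headers()

# To serve on the probe port used in the manifests above:
#   HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Note that a dependency outage makes /ready fail (traffic drained) while /healthz keeps passing (no restart), which is exactly the behavior the table describes.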

Resource Requests and Limits

Tell Kubernetes how much CPU and memory your containers need.

  • Requests — The minimum guaranteed resources. The scheduler uses this to place pods on nodes.
  • Limits — The maximum allowed. If a container exceeds its memory limit, it’s killed (OOMKilled). If it exceeds CPU, it’s throttled.
resources:
  requests:
    cpu: 100m       # 0.1 CPU cores
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Always set requests. Set limits to prevent runaway containers from taking down a node.
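Requests drive the scheduler's placement arithmetic: a pod fits on a node only if the sum of already-scheduled requests plus the new pod's requests stays within the node's allocatable capacity. Limits and actual usage play no part in placement. A simplified sketch of that fit check (illustrative, not the scheduler's real code):

```python
def fits(node_cpu_m, node_mem_mi, scheduled, pod):
    # Simplified scheduler fit check: only *requests* count,
    # never limits or observed usage.
    used_cpu = sum(p["cpu_m"] for p in scheduled) + pod["cpu_m"]
    used_mem = sum(p["mem_mi"] for p in scheduled) + pod["mem_mi"]
    return used_cpu <= node_cpu_m and used_mem <= node_mem_mi

# Node with 2 CPU (2000m) / 4Gi (4096Mi) allocatable, already running
# three pods that each request 500m CPU and 512Mi memory:
running = [{"cpu_m": 500, "mem_mi": 512}] * 3
new_pod = {"cpu_m": 100, "mem_mi": 128}   # the requests from the example above

print(fits(2000, 4096, running, new_pod))  # True: 1600m / 1664Mi fits
```

This is also why unset requests are dangerous: with requests of zero, every pod "fits" everywhere and nodes get overcommitted.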

Horizontal Pod Autoscaler (HPA)

Automatically scales the number of pod replicas based on CPU, memory, or custom metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization. Requires metrics-server running in the cluster.
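The core scaling rule, per the Kubernetes HPA documentation, is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. A sketch of that arithmetic (the real controller adds tolerances and stabilization windows on top):

```python
import math

def desired_replicas(current, current_util, target_util, min_r, max_r):
    # HPA core rule: scale proportionally to how far the observed metric
    # is from target, then clamp to [minReplicas, maxReplicas].
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

# With the manifest above (minReplicas: 2, maxReplicas: 10, target 70% CPU):
print(desired_replicas(4, 140, 70, 2, 10))  # 8: utilization doubled, so replicas double
print(desired_replicas(4, 20, 70, 2, 10))   # 2: scale-in stops at minReplicas
```

This is why minReplicas: 2 matters: it keeps redundancy even when the cluster is idle and the formula alone would shrink the Deployment to one pod.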

Pod Disruption Budgets (PDB)

A PDB limits how many pods can be down at the same time during voluntary disruptions (node drains, cluster upgrades). It does not prevent involuntary disruptions (node crashes).

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # at least 2 pods must be running
  selector:
    matchLabels:
      app: my-app

Alternatively use maxUnavailable: 1 to say “at most 1 pod can be down at a time.”
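The maxUnavailable form of the same budget looks like this (only the spec changes; minAvailable and maxUnavailable are mutually exclusive in a single PDB):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1   # at most 1 pod may be down at a time
  selector:
    matchLabels:
      app: my-app
```

maxUnavailable is usually the safer choice for scalable workloads, since it keeps working as the replica count grows or shrinks.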

Rolling Update Strategy

Deployments use rolling updates by default. You can tune the behavior:

The snippet below is a partial Kubernetes Deployment manifest, showing spec.strategy under kind: Deployment.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # how many extra pods during update
      maxUnavailable: 0   # never have fewer than desired replicas

  • maxSurge: 1, maxUnavailable: 0 — Conservative: always keep full capacity, add one new pod at a time.
  • maxSurge: 0, maxUnavailable: 1 — Tight on resources: remove one old pod before adding a new one.
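The two knobs bound the total pod count during a rollout: it stays within [desired − maxUnavailable, desired + maxSurge]. A quick check of both presets above (illustrative arithmetic, not the controller's code):

```python
def rollout_bounds(desired, max_surge, max_unavailable):
    # During a rolling update the controller keeps total pods within
    # [desired - maxUnavailable, desired + maxSurge].
    return desired - max_unavailable, desired + max_surge

# Conservative preset: never below full capacity, one extra pod at a time.
print(rollout_bounds(5, 1, 0))  # (5, 6)
# Resource-tight preset: never above desired, one pod down at a time.
print(rollout_bounds(5, 0, 1))  # (4, 5)
```

Note that maxSurge: 0 together with maxUnavailable: 0 is invalid, since the rollout could never make progress.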

To trigger a rollout:

kubectl set image deployment/my-app app=my-app:v2
kubectl rollout status deployment/my-app
kubectl rollout pause deployment/my-app # pause rollout
kubectl rollout resume deployment/my-app # resume rollout
kubectl rollout undo deployment/my-app # rollback

Capacity Planning

Horizontal Pod Autoscaler reacts to live signals (CPU, memory, custom metrics), but planning still matters: you need enough node capacity, quotas, and headroom for spikes. Before production:

  • Load-test realistic traffic and failure modes; compare to SLOs if you have them.
  • Right-size requests/limits using measured usage, not guesses.
  • Forecast growth (seasonal traffic, new features) and schedule cluster or instance changes before limits bite.

If workloads are not purely elastic (batch windows, fixed contracts), document expected peaks and who approves extra capacity spend.
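A back-of-envelope version of "right-size with headroom" looks like this (the per-pod throughput and headroom figures are illustrative assumptions; in practice they come from your load tests and SLOs):

```python
import math

def replicas_for_peak(peak_rps, per_pod_rps, headroom=0.3):
    # Capacity planning sketch: provision for the forecast peak plus a
    # headroom fraction for spikes, rounded up to whole replicas.
    return math.ceil(peak_rps * (1 + headroom) / per_pod_rps)

# Load tests measured ~200 RPS per pod; forecast peak is 2,400 RPS.
print(replicas_for_peak(2400, 200))  # 16 replicas with 30% headroom
```

The result feeds back into the HPA bounds: maxReplicas should sit at or above this number, and node capacity must be able to hold it.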

Production Checklist

Practice                                   | Why
Set liveness and readiness probes          | Auto-restart broken containers; stop routing to unready ones
Set resource requests and limits           | Fair scheduling; prevent noisy neighbors
Use HPA for variable load                  | Scale out automatically; save cost when idle
Define PodDisruptionBudgets                | Protect availability during maintenance
Use rolling updates with maxUnavailable: 0 | Zero-downtime deployments
Run multiple replicas                      | No single point of failure
Use namespaces                             | Isolate environments and teams
Back up etcd                               | Cluster state recovery