
CI/CD Best Practices

By Atif Alam

Good CI/CD isn’t just about having a pipeline — it’s about having a pipeline that is fast, reliable, secure, and maintainable. This page covers the practices that separate a basic pipeline from a production-grade one.

Platform teams often expose self-service pipelines or templates so product teams can ship without a ticket for every change. That only works with guardrails: approved base images, mandatory scans, environment promotion rules, and observability hooks. The goal is safe autonomy—speed with defaults that prevent repeated mistakes. See Pipeline fundamentals for stages and secrets; pair with Kubernetes production patterns and service readiness for what “done” means before production.

Order stages so the quickest checks run first. If linting takes 10 seconds and e2e tests take 10 minutes, run linting first:

lint (10s) ──► unit tests (60s) ──► integration tests (3m) ──► e2e tests (10m) ──► deploy

If any step fails, the pipeline stops there, so the slower steps that follow never run.
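As a sketch, this fail-fast ordering can be expressed in GitHub Actions with `needs:`; job names and commands here are illustrative, not a fixed convention:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint          # ~10s, fails fastest
  unit-tests:
    needs: lint                    # runs only if lint passed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  e2e-tests:
    needs: unit-tests              # the slowest job runs last
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:e2e
```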

Run independent jobs simultaneously:

                    ┌── lint (10s)
build (30s) ────────┼── unit tests (60s)
                    └── security scan (30s)

Total: 30s + 60s = 90s (not 30s + 10s + 60s + 30s = 130s run sequentially)
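A sketch of this fan-out in GitHub Actions: each independent job declares `needs: build` and nothing else, so all three run in parallel once the build finishes (job names and commands are illustrative):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
  # These three jobs depend only on build, so they run simultaneously
  lint:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  unit-tests:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  security-scan:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high   # a simple dependency scan
```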

Fast feedback is the core value of CI. If the pipeline takes 30+ minutes, developers stop waiting for it and context-switch.

| Technique | Impact |
|---|---|
| Cache dependencies | Save 30-60s per run (npm, pip, Go modules) |
| Parallelize tests | Cut test time by N (number of parallel jobs) |
| Use faster runners | Larger VMs = faster builds |
| Skip unnecessary work | Path filtering for monorepos |
| Use incremental builds | Only recompile changed modules |
| Split test suites | Run unit tests in CI, e2e tests on merge to main only |
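For example, dependency caching in GitHub Actions: `actions/setup-node` has a built-in `cache` input that keys the cache on the lockfile, so repeat runs skip the network:

```yaml
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: npm          # restores ~/.npm between runs, keyed on package-lock.json
- run: npm ci           # hits the cache instead of the registry on repeat runs
```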

A recommended stage ordering:

| Stage | What | When to Run |
|---|---|---|
| Lint / format | Code style, formatting | Every push and PR |
| Build | Compile, install deps, create artifact | Every push and PR |
| Unit tests | Fast, isolated tests | Every push and PR |
| Integration tests | Tests with real dependencies (DB, API) | Every push and PR (or on merge) |
| Security scan | SAST, dependency vulnerabilities, container scan | Every push and PR |
| E2E tests | Full system tests | On merge to main (or nightly) |
| Deploy staging | Deploy to staging, smoke test | On merge to main |
| Deploy production | Manual approval, deploy, monitor | After staging validation |
Never hardcode credentials:

Bad: AWS_SECRET_KEY = "AKIA..." hardcoded in pipeline YAML or source code
Good: Use the CI/CD platform's encrypted secret store
Best: Use OIDC — no stored credentials at all
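The "Good" option, referencing the platform's encrypted store from a hypothetical deploy step, looks like this in GitHub Actions:

```yaml
- name: Deploy
  run: ./deploy.sh                 # hypothetical deploy script
  env:
    # Values come from the repo/org secret store, never from the YAML itself
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```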

OIDC (OpenID Connect) lets the pipeline request a short-lived token from the cloud provider — no access keys to store, rotate, or leak:

| Platform | OIDC Support |
|---|---|
| GitHub Actions | `permissions: id-token: write` + cloud provider trust |
| GitLab CI | `id_tokens` keyword |
| Azure Pipelines | Workload Identity Federation |

See GitHub Actions OIDC and GitLab CI OIDC for setup.
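A minimal GitHub Actions sketch, assuming an AWS IAM role that already trusts the repository's OIDC provider (the role ARN here is hypothetical):

```yaml
permissions:
  id-token: write      # allow this job to request an OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # hypothetical role
      aws-region: us-east-1
  # Subsequent steps get short-lived credentials; nothing is stored in secrets
```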

  • GitHub Actions: Set permissions in the workflow to restrict GITHUB_TOKEN scope.
  • GitLab CI: Use protected and masked variables, scoped to environments.
  • Cloud roles: Grant only the permissions the pipeline needs (e.g. push to ECR, deploy to ECS — not full admin).
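In GitHub Actions, restricting GITHUB_TOKEN is a `permissions` block; once any scope is listed, all unlisted scopes default to no access. A sketch for a build-and-push workflow:

```yaml
permissions:
  contents: read       # enough to check out the code
  packages: write      # push images to GitHub Container Registry
  # everything else (issues, deployments, ...) defaults to no access
```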
Pin third-party actions to immutable references:

# Bad: tracks a moving branch, so it can change without notice
- uses: actions/checkout@main
# Good: pin to a version tag
- uses: actions/checkout@v4
# Best: pin to a full commit SHA (immutable)
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
| Practice | What It Does |
|---|---|
| Pin action/image versions | Prevent unexpected changes from upstream |
| Dependency scanning | Detect known vulnerabilities in packages |
| Container scanning | Scan Docker images for CVEs |
| SBOM generation | Create a Software Bill of Materials for each build |
| Signed artifacts | Sign container images (cosign, Notary) to prove provenance |
| Dependabot / Renovate | Auto-update dependencies with PRs |
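For the last row, a minimal `.github/dependabot.yml` that opens weekly update PRs for both npm packages and the pinned actions themselves:

```yaml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"   # keeps pinned action versions fresh
    directory: "/"
    schedule:
      interval: "weekly"
```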
                 ┌───────────┐
                /  E2E Tests  \        Slow, expensive, fragile
               /  (few: ~10)   \       Run on merge to main
              /─────────────────\
             / Integration Tests \     Medium speed
            /      (~50-100)      \    Run on every PR
           /───────────────────────\
          /       Unit Tests        \  Fast, cheap, reliable
         /       (~500-1000+)        \ Run on every push
        /─────────────────────────────\
| Level | What It Tests | Speed | Run When |
|---|---|---|---|
| Unit | Individual functions/classes in isolation | Milliseconds | Every push |
| Integration | Components working together (DB, API) | Seconds | Every PR |
| E2E | Full user flows through the UI or API | Minutes | Merge to main, nightly |

For large test suites, split tests across parallel runners:

# GitHub Actions — run tests in parallel shards
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4

Flaky tests (tests that sometimes pass, sometimes fail) erode confidence in the pipeline:

| Strategy | What It Does |
|---|---|
| Quarantine | Move flaky tests to a separate job (non-blocking) |
| Retry | Retry failed tests once (but track flake rate) |
| Track metrics | Dashboard of flaky tests — fix or delete them |
| No new flakes | Require new tests to pass 10 consecutive runs before merging |
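A simple retry sketch for GitHub Actions, with a workflow warning so the flake stays visible rather than being silently absorbed (the test command is illustrative):

```yaml
- name: Unit tests (one retry)
  run: |
    npm test || {
      echo "::warning::test suite failed once, retrying (possible flake)"
      npm test
    }
```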
Trunk-based development:

main ───●───●───●───●───●───●───●───●──► (always deployable)
         \ /         \ /
        feat-A      feat-B
        (short-lived, 1-2 days)
  • Everyone commits to main (or very short-lived feature branches).
  • Feature flags hide incomplete work.
  • CI runs on every push; CD deploys main continuously.

Best for: Teams with good test coverage and feature flags. Fastest feedback loop.

GitHub Flow:

main ───●───────●───────●───────●──► (protected, always deployable)
         \     /         \     /
          feat-A          feat-B
       (PR + review)   (PR + review)
  • Create a feature branch from main.
  • Open a PR, get review, merge.
  • main is always deployable.

Best for: Most teams. Simple, well-understood, works with GitHub PRs.

GitLab Flow:

main ──────●──────●──────●──────●───► (development)
            \             \
             ▼             ▼
staging ─────●─────────────●────────► (staging environment)
              \             \
               ▼             ▼
production ────●─────────────●──────► (production environment)
  • main for development, staging and production branches for deployment.
  • Merge from main → staging → production.

Best for: Teams that need environment branches and explicit promotion.

| Strategy | Branches | Merge Frequency | Complexity | Best For |
|---|---|---|---|---|
| Trunk-based | main only (+ short feature) | Multiple times/day | Low | High-performing teams |
| GitHub Flow | main + feature branches | Daily to weekly | Low | Most teams |
| GitLab Flow | main + env branches | Weekly | Medium | Teams needing env promotion |
| Git Flow | main + develop + feature + release + hotfix | Weekly to monthly | High | Versioned software (avoid if possible) |

For repositories containing multiple services/packages:

Only run pipelines for the service that changed:

# GitHub Actions
on:
  push:
    paths:
      - 'services/api/**'
      - 'shared/**'   # Also rebuild if shared code changes

# GitLab CI
api-tests:
  rules:
    - changes:
        - services/api/**
        - shared/**

Tools like Nx (JavaScript), Turborepo, Bazel, or Pants understand the dependency graph and only build/test what was affected:

# Nx: only test projects affected by changes since main
npx nx affected --target=test --base=origin/main
| Practice | Why |
|---|---|
| Path filters | Don’t rebuild everything on every change |
| Shared base image | Pre-built Docker image with common deps |
| Dependency graph tool | Only build/test affected packages |
| Separate deploy jobs per service | Don’t deploy the API when only the frontend changed |
| Cache aggressively | Share caches across services where possible |
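A sketch of per-service deploy workflows: each service gets its own workflow guarded by its own path filter, so a frontend-only change never deploys the API (paths and the deploy script are hypothetical):

```yaml
# .github/workflows/deploy-frontend.yml
on:
  push:
    branches: [main]
    paths:
      - 'services/frontend/**'
jobs:
  deploy-frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh frontend   # hypothetical deploy script
```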

Treat pipeline definitions like application code:

| Practice | What It Means |
|---|---|
| Version controlled | Pipeline YAML lives in the same repo as the code |
| Code reviewed | Pipeline changes go through PR review |
| Tested | Use act (GitHub Actions) or gitlab-ci-lint to validate locally |
| DRY | Reusable workflows (GitHub) / includes+extends (GitLab) / templates (Azure) |
| Documented | Comments explaining non-obvious steps |
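For the DRY row, reusable workflows in GitHub Actions use the `workflow_call` trigger; a sketch (file name and input are illustrative):

```yaml
# .github/workflows/reusable-test.yml
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: '20'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci && npm test

# A caller workflow then needs only:
# jobs:
#   test:
#     uses: ./.github/workflows/reusable-test.yml
#     with:
#       node-version: '22'
```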
# GitHub Actions — Slack notification on failure
- name: Notify Slack on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    channel-id: 'C0123456789'
    slack-message: "Pipeline failed: ${{ github.repository }}@${{ github.sha }}"
  env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_TOKEN }}
| Channel | When | What |
|---|---|---|
| Slack / Teams | Failure | Pipeline failed, deployment failed |
| Email | Failure (optional) | Summary of failures |
| GitHub/GitLab comments | PR pipelines | Test results, coverage, plan output |
| Dashboard | Always | Pipeline success rate, duration trends |

The DORA (DevOps Research and Assessment) metrics measure CI/CD effectiveness:

| Metric | What It Measures | Elite Benchmark |
|---|---|---|
| Deployment Frequency | How often you deploy to production | Multiple times per day |
| Lead Time for Changes | Time from commit to production | Less than 1 hour |
| Change Failure Rate | % of deployments that cause a failure | 0–15% |
| Time to Restore Service | Time to recover from a production failure | Less than 1 hour |

Track these metrics to understand and improve your CI/CD process:

Deployment Frequency: 3x/day ✓ Elite
Lead Time for Changes: 45 min ✓ Elite
Change Failure Rate: 8% ✓ Elite
Time to Restore Service: 30 min ✓ Elite

Tools for DORA metrics: Sleuth, LinearB, Faros AI, GitLab Value Stream Analytics, GitHub-based custom dashboards.

| Anti-Pattern | Problem | Fix |
|---|---|---|
| 30+ minute pipelines | Developers don’t wait, context-switch | Parallelize, cache, split test suites |
| Flaky tests | False failures erode trust | Quarantine, fix, or delete flaky tests |
| Manual gates everywhere | Slow deployments, bottleneck on approvers | Automate staging deploy; manual gate only for production |
| No rollback plan | Stuck when a deployment goes bad | Test rollback procedures regularly |
| Secrets in code | Credential leaks | Use platform secret store + OIDC |
| Pipeline YAML copy-paste | Inconsistent, hard to maintain | Reusable workflows / includes / templates |
| No path filtering in monorepo | Every change rebuilds everything | Add path filters and affected-only builds |
| Testing only in CI | Slow feedback for developers | Run fast tests locally too (pre-commit, husky) |
| Ignoring security scans | Vulnerabilities ship to production | Block merge if critical/high vulnerabilities found |
| No deployment observability | Don’t know if deploy succeeded or degraded | Smoke tests + monitoring after deploy |
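For deployment observability, even a minimal post-deploy smoke test catches a dead deployment before users do; the health URL here is hypothetical:

```yaml
- name: Smoke test staging
  run: |
    # Fail the pipeline if the app doesn't come up healthy
    curl --fail --retry 5 --retry-delay 10 https://staging.example.com/healthz
```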

A quick checklist for a healthy CI/CD setup:

  • Pipeline runs on every push and PR.
  • Pipeline completes in under 10 minutes (ideally under 5).
  • Secrets are in the platform’s encrypted store, not in code.
  • OIDC is used for cloud authentication (no long-lived keys).
  • Actions and dependencies are pinned to specific versions.
  • Tests follow the test pyramid (many unit, some integration, few e2e).
  • Flaky tests are tracked and fixed.
  • Path filtering is in place for monorepos.
  • Staging deploys automatically; production has a manual approval gate.
  • Rollback procedure is documented and tested.
  • Pipeline failures send notifications (Slack/Teams).
  • DORA metrics are tracked.
  • Pipeline YAML is reviewed like application code.
  • Fail fast — lint and unit tests before long-running jobs.
  • Parallelize — independent jobs should run simultaneously.
  • Keep it under 10 minutes — fast feedback is the core value of CI.
  • OIDC > stored credentials — short-lived tokens with no secrets to manage.
  • Pin everything — actions, images, dependencies.
  • Test pyramid — many unit tests, some integration, few e2e.
  • Trunk-based or GitHub Flow — short-lived branches, frequent merges.
  • Track DORA metrics — deployment frequency, lead time, change failure rate, time to restore.
  • Treat pipeline YAML as code — version controlled, reviewed, DRY, documented.