
Scaling Prometheus

By Atif Alam

A single Prometheus server works well for small-to-medium environments. But at scale you hit limits: local disk fills up, a single instance can’t scrape thousands of targets, and you need queries across multiple clusters. Thanos, Cortex, and Mimir solve these problems.

| Problem | Symptom |
| --- | --- |
| Retention | Local disk runs out; you can only keep 15–30 days of data |
| High availability | One Prometheus instance = single point of failure |
| Multi-cluster | Separate Prometheus per cluster; no unified view |
| Cardinality | Millions of time series overwhelm a single TSDB |
| Query performance | Large range queries on months of data are slow |

All three projects solve these problems, but with different architectures.

Thanos extends existing Prometheus instances with long-term storage and global querying. It’s a sidecar-based approach — you keep your existing Prometheus servers and add Thanos components alongside them.

```
┌──────────────────┐      ┌──────────────────┐
│   Prometheus A   │      │   Prometheus B   │
│   (cluster-us)   │      │   (cluster-eu)   │
│ ┌────────────┐   │      │ ┌────────────┐   │
│ │   Thanos   │   │      │ │   Thanos   │   │
│ │   Sidecar  │   │      │ │   Sidecar  │   │
│ └────────────┘   │      │ └────────────┘   │
└─────────┬────────┘      └────────┬─────────┘
          │                        │
          ▼                        ▼
┌──────────────────────────────────────┐
│        Object Storage (S3/GCS)       │
└──────────────────────────────────────┘
┌──────────────────┐      ┌──────────────────┐
│   Thanos Store   │      │  Thanos Compact  │
│     Gateway      │      │  (downsampling)  │
└──────────────────┘      └──────────────────┘
┌──────────────────┐
│   Thanos Query   │ ◄── Grafana connects here
│  (global view)   │
└──────────────────┘
```
| Component | What It Does |
| --- | --- |
| Sidecar | Runs alongside each Prometheus. Uploads blocks to object storage and serves real-time data to Query. |
| Store Gateway | Reads historical data from object storage and serves it to Query. |
| Query | A Prometheus-compatible query endpoint that fans out to Sidecars and Store Gateways. Grafana points here. |
| Compactor | Compacts and downsamples blocks in object storage (5m → 1h resolution for old data). Reduces storage costs. |
| Ruler | Evaluates recording and alerting rules across the global view (optional; you can keep rules on Prometheus). |
| Receive | Alternative to Sidecar — accepts remote-write from Prometheus instances. Useful when sidecars aren’t possible. |
  1. Each Prometheus scrapes its targets normally and writes TSDB blocks to local disk.
  2. The Sidecar uploads completed 2-hour blocks to object storage (S3, GCS, Azure Blob).
  3. Store Gateway indexes those blocks and serves them for historical queries.
  4. Thanos Query federates queries: recent data from Sidecars, historical data from Store Gateway.
  5. Compactor runs periodically to merge small blocks and create downsampled data.
```shell
# Run Thanos Sidecar alongside Prometheus
thanos sidecar \
  --tsdb.path=/prometheus/data \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yaml \
  --grpc-address=0.0.0.0:10901
```

```yaml
# bucket.yaml
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  access_key: ${AWS_ACCESS_KEY_ID}
  secret_key: ${AWS_SECRET_ACCESS_KEY}
```

```shell
# Thanos Query connects to all Sidecars and Store Gateways
thanos query \
  --store=prometheus-a-sidecar:10901 \
  --store=prometheus-b-sidecar:10901 \
  --store=store-gateway:10901 \
  --http-address=0.0.0.0:9090
```

Grafana connects to Thanos Query at port 9090 as if it were a single Prometheus.
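The Store Gateway is started the same way as the other components. A minimal sketch — the flag names match current Thanos releases, but the data path and address are examples:

```shell
# Thanos Store Gateway — serves historical blocks from the bucket to Query.
# --data-dir is only a local cache of block indexes, not primary storage.
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=bucket.yaml \
  --grpc-address=0.0.0.0:10901
```

Its gRPC address is what you pass to `thanos query` via `--store=…`, exactly like a Sidecar.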

Cortex is a horizontally scalable, multi-tenant Prometheus-compatible backend. Unlike Thanos (sidecar model), Cortex uses remote write — Prometheus pushes data to Cortex.

```
┌──────────────┐   remote write   ┌──────────────────────────────┐
│  Prometheus  │─────────────────►│            Cortex            │
└──────────────┘                  │  ┌─────────┐  ┌───────────┐  │
                                  │  │Distribu-│─►│  Ingester │  │
                                  │  │   tor   │  │           │  │
                                  │  └─────────┘  └─────┬─────┘  │
                                  │                     │        │
                                  │              ┌──────▼─────┐  │
                                  │              │   Object   │  │
                                  │              │   Storage  │  │
                                  │              └──────┬─────┘  │
                                  │                     │        │
                                  │  ┌──────────┐ ┌─────▼──────┐ │
                                  │  │  Query   │◄│   Store    │ │
                                  │  │ Frontend │ │  Gateway   │ │
                                  │  └──────────┘ └────────────┘ │
                                  └──────────────────────────────┘
```
| Feature | Thanos | Cortex |
| --- | --- | --- |
| Data flow | Sidecar uploads blocks | Remote write (push) |
| Multi-tenancy | No built-in tenancy | Native multi-tenancy (X-Scope-OrgID header) |
| Existing Prometheus | Keep as-is, add sidecar | Add remote_write config |
| Storage | Object storage (blocks) | Object storage (chunks or blocks) |
| HA | Deduplicate via external labels | Built-in replication factor |
| Complexity | Moderate (add sidecars) | Higher (more microservices) |
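The “deduplicate via external labels” row deserves a concrete sketch. With Thanos, each Prometheus replica in an HA pair carries a distinguishing external label, and Query collapses the duplicates (the `--query.replica-label` flag is real Thanos; the hostnames and label value are placeholders):

```shell
# Each HA replica sets external_labels: {replica: "A"} / {replica: "B"} in prometheus.yml.
# Query then treats series differing only in "replica" as duplicates and merges them.
thanos query \
  --store=replica-a-sidecar:10901 \
  --store=replica-b-sidecar:10901 \
  --query.replica-label=replica
```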
Choose Cortex when:

  • You need multi-tenancy (e.g. a SaaS platform where each customer has isolated metrics).
  • You want a fully push-based architecture.
  • You’re already using remote write.
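To make the tenancy model concrete, here is a hedged query sketch — `/prometheus` is Cortex’s default HTTP prefix, while the host and port are placeholders for your deployment:

```shell
# Every request must carry X-Scope-OrgID; Cortex isolates data per tenant
curl -H "X-Scope-OrgID: customer-a" \
  "http://cortex-query-frontend:8080/prometheus/api/v1/query?query=up"
```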

Grafana Mimir is the successor to Cortex, built by Grafana Labs. It’s architecturally similar to Cortex but with significant performance improvements and simpler operations.

| Feature | Cortex | Mimir |
| --- | --- | --- |
| Performance | Good | Significantly faster (query splitting, shuffle sharding) |
| Storage | Chunks or blocks | Blocks only (simpler) |
| Cardinality limits | Basic | Advanced per-tenant limits |
| Out-of-order ingestion | No | Yes (handles late-arriving data) |
| Maintained by | Community (slower pace) | Grafana Labs (active development) |
| Query performance | Good | Better (query sharding, split-and-merge) |

Mimir is the recommended choice for new deployments. Grafana Labs considers Cortex essentially superseded by Mimir.
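The per-tenant cardinality limits and out-of-order window from the table above live in Mimir’s runtime overrides configuration. A sketch — these limit names exist in Mimir’s limits config, but the tenant name and values are illustrative:

```yaml
# runtime config (overrides): per-tenant limits, hot-reloaded by Mimir
overrides:
  my-tenant:
    max_global_series_per_user: 1500000   # cardinality cap across ingesters
    ingestion_rate: 100000                # samples/sec
    out_of_order_time_window: 30m         # accept samples up to 30m late
```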

Architecture (Same as Cortex, Improved Internals)

```
Prometheus ──remote_write──► Mimir Distributor ──► Ingester ──► Object Storage
Grafana ◄── Query Frontend ◄── Querier ◄── Store Gateway ◄───────────┘
```
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm install mimir grafana/mimir-distributed \
  --set mimir.structuredConfig.common.storage.backend=s3 \
  --set mimir.structuredConfig.common.storage.s3.bucket_name=mimir-blocks \
  --set mimir.structuredConfig.common.storage.s3.endpoint=s3.amazonaws.com
```
```yaml
# prometheus.yml — send metrics to Mimir (or Cortex)
remote_write:
  - url: http://mimir-distributor:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant  # required for multi-tenancy
```
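On the Grafana side, Mimir is configured as an ordinary Prometheus data source pointing at the query frontend, with the tenant header attached. A provisioning sketch — the field names are standard Grafana provisioning, while the URL and tenant are placeholders:

```yaml
# e.g. /etc/grafana/provisioning/datasources/mimir.yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    url: http://mimir-query-frontend:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: my-tenant
```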
| Criteria | Thanos | Cortex | Mimir |
| --- | --- | --- | --- |
| Best for | Extending existing Prometheus | Multi-tenant SaaS | New deployments, Grafana stack |
| Data model | Keep Prometheus local, upload blocks | Push via remote write | Push via remote write |
| Multi-tenancy | No | Yes | Yes |
| Operational effort | Lower (sidecar is lightweight) | Higher | Moderate (good Helm chart) |
| Long-term storage | Yes (object storage) | Yes (object storage) | Yes (object storage) |
| Grafana integration | Good | Good | Native (same company) |
| Active development | Active | Slower | Very active |
| Migration path | Add sidecars to existing Prometheus | Requires remote write | Requires remote write; easy migration from Cortex |
```
Do you need multi-tenancy?
├─ Yes → Mimir (or Cortex if already using it)
└─ No
   └─ Want to keep existing Prometheus as-is?
      ├─ Yes → Thanos (sidecar model)
      └─ No → Mimir (remote write, best performance)
```

Both Thanos and Mimir can downsample old data to reduce storage costs:

| Resolution | Retention | Use Case |
| --- | --- | --- |
| Raw (15s scrape) | 14 days | Detailed troubleshooting |
| 5 minute | 90 days | Weekly reviews |
| 1 hour | 1+ year | Capacity planning, trend analysis |
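In Thanos, a retention policy like the table above maps directly onto Compactor flags (the `--retention.resolution-*` flags are real `thanos compact` options; the durations mirror the table and the data path is an example):

```shell
# Keep raw data 14d, 5m-downsampled 90d, 1h-downsampled 1y
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yaml \
  --retention.resolution-raw=14d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=365d \
  --wait
```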
```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Cluster US-East │   │ Cluster EU-West │   │  Cluster AP-SE  │
│   Prometheus    │   │   Prometheus    │   │   Prometheus    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               ▼
                    ┌─────────────────────┐
                    │   Thanos Query /    │
                    │    Mimir Querier    │
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │       Grafana       │
                    └─────────────────────┘
```

One Grafana instance, one query endpoint, all clusters — without duplicating metrics.

  • A single Prometheus is fine for small/medium setups; scale out when you need long-term retention, HA, or multi-cluster queries.
  • Thanos adds a sidecar to existing Prometheus — least disruptive, no multi-tenancy.
  • Cortex is push-based (remote write) with native multi-tenancy — suited for SaaS platforms.
  • Mimir is the successor to Cortex with better performance and active development — the recommended choice for new deployments using the Grafana stack.
  • All three use object storage (S3/GCS) for cost-effective long-term retention with optional downsampling.