
Scaling Prometheus

By Atif Alam

A single Prometheus server works well for small-to-medium environments. But at scale you hit limits: local disk fills up, a single instance can’t scrape thousands of targets, and you need queries across multiple clusters. Thanos, Cortex, and Mimir solve these problems.

| Problem | Symptom |
| --- | --- |
| Retention | Local disk runs out; you can only keep 15–30 days of data |
| High availability | One Prometheus instance = single point of failure |
| Multi-cluster | Separate Prometheus per cluster; no unified view |
| Cardinality | Millions of time series overwhelm a single TSDB |
| Query performance | Large range queries on months of data are slow |

All three projects solve these problems, but with different architectures.

Thanos extends existing Prometheus instances with long-term storage and global querying. It’s a sidecar-based approach — you keep your existing Prometheus servers and add Thanos components alongside them.

```
┌──────────────────┐      ┌──────────────────┐
│   Prometheus A   │      │   Prometheus B   │
│   (cluster-us)   │      │   (cluster-eu)   │
│ ┌────────────┐   │      │ ┌────────────┐   │
│ │   Thanos   │   │      │ │   Thanos   │   │
│ │   Sidecar  │   │      │ │   Sidecar  │   │
│ └────────────┘   │      │ └────────────┘   │
└─────────┬────────┘      └────────┬─────────┘
          │                        │
          ▼                        ▼
┌──────────────────────────────────────┐
│        Object Storage (S3/GCS)       │
└──────────────────────────────────────┘
┌──────────────────┐      ┌──────────────────┐
│   Thanos Store   │      │  Thanos Compact  │
│     Gateway      │      │  (downsampling)  │
└──────────────────┘      └──────────────────┘
┌──────────────────┐
│   Thanos Query   │ ◄── Grafana connects here
│  (global view)   │
└──────────────────┘
```
| Component | What It Does |
| --- | --- |
| Sidecar | Runs alongside each Prometheus. Uploads blocks to object storage and serves real-time data to Query. |
| Store Gateway | Reads historical data from object storage and serves it to Query. |
| Query | A Prometheus-compatible query endpoint that fans out to Sidecars and Store Gateways. Grafana points here. |
| Compactor | Compacts and downsamples blocks in object storage (5m → 1h resolution for old data). Reduces storage costs. |
| Ruler | Evaluates recording and alerting rules across the global view (optional; you can keep rules on Prometheus). |
| Receive | Alternative to Sidecar — accepts remote-write from Prometheus instances. Useful when sidecars aren’t possible. |
  1. Each Prometheus scrapes its targets normally and writes TSDB blocks to local disk.
  2. The Sidecar uploads completed 2-hour blocks to object storage (S3, GCS, Azure Blob).
  3. Store Gateway indexes those blocks and serves them for historical queries.
  4. Thanos Query federates queries: recent data from Sidecars, historical data from Store Gateway.
  5. Compactor runs periodically to merge small blocks and create downsampled data.
```shell
# Run Thanos Sidecar alongside Prometheus
thanos sidecar \
  --tsdb.path=/prometheus/data \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yaml \
  --grpc-address=0.0.0.0:10901
```

```yaml
# bucket.yaml
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  access_key: ${AWS_ACCESS_KEY_ID}
  secret_key: ${AWS_SECRET_ACCESS_KEY}
```

```shell
# Thanos Query connects to all Sidecars and Store Gateways
thanos query \
  --store=prometheus-a-sidecar:10901 \
  --store=prometheus-b-sidecar:10901 \
  --store=store-gateway:10901 \
  --http-address=0.0.0.0:9090
```

Grafana connects to Thanos Query at port 9090 as if it were a single Prometheus.
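The Store Gateway is started the same way as the other components. A minimal sketch — the flag names match current Thanos releases, but the data path and address are examples:

```shell
# Thanos Store Gateway — serves historical blocks from the bucket to Query.
# --data-dir is only a local cache of block indexes, not primary storage.
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=bucket.yaml \
  --grpc-address=0.0.0.0:10901
```

Its gRPC address is what you pass to `thanos query` via `--store=…`, exactly like a Sidecar.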

Cortex is a horizontally scalable, multi-tenant Prometheus-compatible backend. Unlike Thanos (sidecar model), Cortex uses remote write — Prometheus pushes data to Cortex.

```
┌──────────────┐   remote write   ┌──────────────────────────────┐
│  Prometheus  │─────────────────►│            Cortex            │
└──────────────┘                  │  ┌─────────┐  ┌───────────┐  │
                                  │  │Distribu-│─►│  Ingester │  │
                                  │  │   tor   │  │           │  │
                                  │  └─────────┘  └─────┬─────┘  │
                                  │                     │        │
                                  │              ┌──────▼─────┐  │
                                  │              │   Object   │  │
                                  │              │   Storage  │  │
                                  │              └──────┬─────┘  │
                                  │                     │        │
                                  │  ┌──────────┐ ┌─────▼──────┐ │
                                  │  │  Query   │◄│   Store    │ │
                                  │  │ Frontend │ │  Gateway   │ │
                                  │  └──────────┘ └────────────┘ │
                                  └──────────────────────────────┘
```
| Feature | Thanos | Cortex |
| --- | --- | --- |
| Data flow | Sidecar uploads blocks | Remote write (push) |
| Multi-tenancy | No built-in tenancy | Native multi-tenancy (X-Scope-OrgID header) |
| Existing Prometheus | Keep as-is, add sidecar | Add remote_write config |
| Storage | Object storage (blocks) | Object storage (chunks or blocks) |
| HA | Deduplicate via external labels | Built-in replication factor |
| Complexity | Moderate (add sidecars) | Higher (more microservices) |
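The “deduplicate via external labels” row deserves a concrete sketch. With Thanos, each Prometheus replica in an HA pair carries a distinguishing external label, and Query collapses the duplicates (the `--query.replica-label` flag is real Thanos; the hostnames and label value are placeholders):

```shell
# Each HA replica sets external_labels: {replica: "A"} / {replica: "B"} in prometheus.yml.
# Query then treats series differing only in "replica" as duplicates and merges them.
thanos query \
  --store=replica-a-sidecar:10901 \
  --store=replica-b-sidecar:10901 \
  --query.replica-label=replica
```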
Choose Cortex when:

  • You need multi-tenancy (e.g. a SaaS platform where each customer has isolated metrics).
  • You want a fully push-based architecture.
  • You’re already using remote write.
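To make the tenancy model concrete, here is a hedged query sketch — `/prometheus` is Cortex’s default HTTP prefix, while the host and port are placeholders for your deployment:

```shell
# Every request must carry X-Scope-OrgID; Cortex isolates data per tenant
curl -H "X-Scope-OrgID: customer-a" \
  "http://cortex-query-frontend:8080/prometheus/api/v1/query?query=up"
```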

Grafana Mimir is the successor to Cortex, built by Grafana Labs. It’s architecturally similar to Cortex but with significant performance improvements and simpler operations.

| Feature | Cortex | Mimir |
| --- | --- | --- |
| Performance | Good | Significantly faster (query splitting, shuffle sharding) |
| Storage | Chunks or blocks | Blocks only (simpler) |
| Cardinality limits | Basic | Advanced per-tenant limits |
| Out-of-order ingestion | No | Yes (handles late-arriving data) |
| Maintained by | Community (slower pace) | Grafana Labs (active development) |
| Query performance | Good | Better (query sharding, split-and-merge) |

Mimir is the recommended choice for new deployments. Grafana Labs considers Cortex essentially superseded by Mimir.
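The per-tenant cardinality limits and out-of-order window from the table above live in Mimir’s runtime overrides configuration. A sketch — these limit names exist in Mimir’s limits config, but the tenant name and values are illustrative:

```yaml
# runtime config (overrides): per-tenant limits, hot-reloaded by Mimir
overrides:
  my-tenant:
    max_global_series_per_user: 1500000   # cardinality cap across ingesters
    ingestion_rate: 100000                # samples/sec
    out_of_order_time_window: 30m         # accept samples up to 30m late
```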

Architecture (Same as Cortex, Improved Internals)

```
Prometheus ──remote_write──► Mimir Distributor ──► Ingester ──► Object Storage
Grafana ◄── Query Frontend ◄── Querier ◄── Store Gateway ◄───────────┘
```
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm install mimir grafana/mimir-distributed \
  --set mimir.structuredConfig.common.storage.backend=s3 \
  --set mimir.structuredConfig.common.storage.s3.bucket_name=mimir-blocks \
  --set mimir.structuredConfig.common.storage.s3.endpoint=s3.amazonaws.com
```
```yaml
# prometheus.yml — send metrics to Mimir (or Cortex)
remote_write:
  - url: http://mimir-distributor:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant  # required for multi-tenancy
```
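On the Grafana side, Mimir is configured as an ordinary Prometheus data source pointing at the query frontend, with the tenant header attached. A provisioning sketch — the field names are standard Grafana provisioning, while the URL and tenant are placeholders:

```yaml
# e.g. /etc/grafana/provisioning/datasources/mimir.yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    url: http://mimir-query-frontend:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: my-tenant
```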
| Criteria | Thanos | Cortex | Mimir |
| --- | --- | --- | --- |
| Best for | Extending existing Prometheus | Multi-tenant SaaS | New deployments, Grafana stack |
| Data model | Keep Prometheus local, upload blocks | Push via remote write | Push via remote write |
| Multi-tenancy | No | Yes | Yes |
| Operational effort | Lower (sidecar is lightweight) | Higher | Moderate (good Helm chart) |
| Long-term storage | Yes (object storage) | Yes (object storage) | Yes (object storage) |
| Grafana integration | Good | Good | Native (same company) |
| Active development | Active | Slower | Very active |
| Migration path | Add sidecars to existing Prometheus | Requires remote write | Requires remote write; easy migration from Cortex |
```
Do you need multi-tenancy?
├─ Yes → Mimir (or Cortex if already using it)
└─ No
   └─ Want to keep existing Prometheus as-is?
      ├─ Yes → Thanos (sidecar model)
      └─ No → Mimir (remote write, best performance)
```

Both Thanos and Mimir can downsample old data to reduce storage costs:

| Resolution | Retention | Use Case |
| --- | --- | --- |
| Raw (15s scrape) | 14 days | Detailed troubleshooting |
| 5 minute | 90 days | Weekly reviews |
| 1 hour | 1+ year | Capacity planning, trend analysis |
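In Thanos, a retention policy like the table above maps directly onto Compactor flags (the `--retention.resolution-*` flags are real `thanos compact` options; the durations mirror the table and the data path is an example):

```shell
# Keep raw data 14d, 5m-downsampled 90d, 1h-downsampled 1y
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yaml \
  --retention.resolution-raw=14d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=365d \
  --wait
```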
```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Cluster US-East │   │ Cluster EU-West │   │  Cluster AP-SE  │
│   Prometheus    │   │   Prometheus    │   │   Prometheus    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               ▼
                    ┌─────────────────────┐
                    │   Thanos Query /    │
                    │    Mimir Querier    │
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │       Grafana       │
                    └─────────────────────┘
```

One Grafana instance, one query endpoint, all clusters — without duplicating metrics.

  • A single Prometheus is fine for small/medium setups; scale out when you need long-term retention, HA, or multi-cluster queries.
  • Thanos adds a sidecar to existing Prometheus — least disruptive, no multi-tenancy.
  • Cortex is push-based (remote write) with native multi-tenancy — suited for SaaS platforms.
  • Mimir is the successor to Cortex with better performance and active development — the recommended choice for new deployments using the Grafana stack.
  • All three use object storage (S3/GCS) for cost-effective long-term retention with optional downsampling.