
Running MongoDB
on Kubernetes

Replica Sets, Sharded Clusters & Operator Comparison

1. Why MongoDB on Kubernetes?

MongoDB is the world's leading document database, and Kubernetes has become the standard infrastructure platform. Running MongoDB on Kubernetes means your clusters are declaratively defined, GitOps-managed, and co-located with the applications they serve — on your infrastructure, under your control.

Unlike relational workloads, MongoDB's horizontal scaling model (sharding) was designed for distributed systems from the start. Kubernetes amplifies this: shards become StatefulSets, mongos routers become Deployments, and failover automation is baked into both the operator and MongoDB's Raft-based election protocol. The result is a database platform that scales out as naturally as the applications it powers.

No Vendor Lock-In

Run on any Kubernetes — GKE, EKS, AKS, bare metal, or on-prem. Open-source operators under Apache 2.0 with no per-node license fees.

Automated Operations

Rolling upgrades, replica set scaling, shard addition, and TLS certificate rotation — all via Kubernetes CRDs. No manual runbooks.


Cost Efficiency

Bin-pack MongoDB alongside other workloads. Scale dev clusters down to a single node. Assign secondary members to spot/preemptible node pools.

WHY_KUBERNETES

  • Declarative topology via CRDs
  • GitOps-driven provisioning
  • Automatic Raft-based failover
  • Self-service for dev teams
  • Centralised secrets (Vault, K8s Secrets)
  • Unified Prometheus monitoring
  • PITR via oplog from day one
PRODUCTION MATURITY

Large e-commerce platforms, financial services firms, and SaaS providers run multi-TB sharded MongoDB clusters on Kubernetes in production. The Percona Operator for MongoDB and the MongoDB Community Operator both have extensive production history.

2. Architecture

MongoDB on Kubernetes supports two primary topologies: a single Replica Set where the application driver connects directly to all members, and a Sharded Cluster with mongos routers for horizontal data partitioning. Choosing the right topology depends on your dataset size and scaling requirements.

TOPOLOGY 1 — Single Replica Set

Direct Connection: No Proxy Required

A MongoDB Replica Set consists of three or more nodes: one Primary (handles all writes) and one or more Secondaries (replicate via oplog tailing). Unlike PostgreSQL, where the application typically speaks to a single endpoint, the MongoDB driver is topology-aware — it discovers all replica set members automatically from the initial seed list and maintains a connection to each.

This means no proxy is needed for a single replica set. The application connection string lists all member hosts (or the service DNS entries). The driver monitors member health continuously and routes reads and writes intelligently based on your configured read preference.

Connection String — all members listed
mongodb://mongo-0.mongo-svc:27017,mongo-1.mongo-svc:27017,mongo-2.mongo-svc:27017/?replicaSet=rs0&readPreference=secondaryPreferred
The driver performs topology discovery from any of the seed hosts. Adding or removing members is transparent to the application after the replica set membership is updated.
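The discovery step can be sketched in a few lines of Python (illustrative only; `discover_topology` and `hello_responses` are hypothetical names — real drivers use the `hello` command and refresh the topology continuously):

```python
def discover_topology(seed_hosts, hello_responses):
    """Driver-style discovery sketch: ask any reachable seed for the full
    member list, then monitor every member from then on.
    hello_responses: host -> {'hosts': [...]} (simulated `hello` replies)."""
    for seed in seed_hosts:
        if seed in hello_responses:           # seed is reachable
            return set(hello_responses[seed]["hosts"])
    return set(seed_hosts)                    # fall back to the raw seed list
```

Because discovery returns the full member list, the seed list only needs one reachable host for the driver to learn the whole replica set.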
WHEN TO USE REPLICA SET
Working set fits on one node
Dataset under ~500 GB–1 TB
Simpler operational model
PITR from a single oplog stream
Driver-native failover (~10 s)
Write throughput limited to one primary

Read Preference Options

The driver routes read operations based on your readPreference setting — no extra infrastructure needed:

primary All reads go to the primary. Strongest consistency. Default.
primaryPreferred Reads go to primary; falls back to secondary if unavailable.
secondary All reads go to a secondary. Good for analytics and reporting.
secondaryPreferred Prefers secondaries; falls back to primary if none available.
nearest Routes to the member with lowest network latency.
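The table above maps directly onto the driver's server-selection logic. A minimal Python sketch (illustrative only; real drivers also apply latency windows and max-staleness checks, and `select_member` is a hypothetical name):

```python
def select_member(members, read_preference):
    """Pick a read target per MongoDB-style read preference.
    members: dicts with 'state' ('PRIMARY'/'SECONDARY') and 'latency_ms'."""
    primaries = [m for m in members if m["state"] == "PRIMARY"]
    secondaries = [m for m in members if m["state"] == "SECONDARY"]
    if read_preference == "primary":
        return primaries[0] if primaries else None
    if read_preference == "primaryPreferred":
        return primaries[0] if primaries else (secondaries[0] if secondaries else None)
    if read_preference == "secondary":
        return secondaries[0] if secondaries else None
    if read_preference == "secondaryPreferred":
        return secondaries[0] if secondaries else (primaries[0] if primaries else None)
    if read_preference == "nearest":
        eligible = primaries + secondaries
        return min(eligible, key=lambda m: m["latency_ms"]) if eligible else None
    raise ValueError(f"unknown read preference: {read_preference}")
```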

Automatic Failover

MongoDB uses a Raft-based election protocol. When the primary becomes unavailable, the replica set elects a new primary automatically:

1 Primary becomes unreachable and heartbeats stop arriving
2 Secondaries detect the outage after electionTimeoutMillis (10 s by default) and a candidate calls an election
3 Majority vote — the most up-to-date secondary wins
4 Driver detects the topology change and reroutes writes to the new primary in ~10–15 s

On Kubernetes: set member priority to prefer nodes in the primary AZ. To save resources, add an arbiter (votes in elections but stores no data), or mark extra data-bearing members as non-voting with votes: 0 and priority: 0.
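The four steps reduce to a quorum check plus an optime comparison. A toy Python sketch (illustrative; `elect_primary` is a hypothetical name, and real elections use Raft terms and member priorities):

```python
def elect_primary(members):
    """Toy replica-set election: if a strict majority of voting members
    is reachable, the reachable member with the newest oplog optime wins."""
    voters = [m for m in members if m.get("votes", 1) > 0]
    reachable = [m for m in voters if m["up"]]
    if len(reachable) <= len(voters) // 2:
        return None  # no quorum: the set has no primary and rejects writes
    return max(reachable, key=lambda m: m["optime"])
```

This is why a 3-member set survives one node loss but not two: with a single survivor there is no majority, and the set stays primaryless until quorum is restored.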

TOPOLOGY 2 — Sharded Cluster

Horizontal Scale-Out with mongos

When your dataset exceeds what a single replica set can serve — in throughput or storage — a sharded cluster distributes data across multiple shards. Each shard is itself a full replica set. Applications never talk to shards directly: they connect to mongos routers, which are stateless proxy pods that route operations to the correct shard using a shard key.

A sharded cluster has three component groups: mongos routers (stateless, scale horizontally as a Deployment), a Config Server Replica Set (3-node RS storing chunk ranges and metadata), and N data shards (each a 3-node RS). The mongos reads chunk metadata from the Config Server to route every query or write to the right shard.
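The routing step mongos performs can be sketched as a range lookup over chunk metadata (Python, illustrative; `route` is a hypothetical helper and the chunk values are toy data):

```python
import bisect

def route(chunks, shard_key_value):
    """mongos-style routing sketch: find the chunk whose range contains
    the shard key value and return the owning shard.
    chunks: list of (lower_bound, shard_name), sorted by lower_bound."""
    bounds = [lower for lower, _ in chunks]
    i = bisect.bisect_right(bounds, shard_key_value) - 1
    return chunks[i][1]

# Chunk metadata as the Config Server RS might hold it (toy values)
chunks = [(float("-inf"), "shard1"), (1000, "shard2"), (5000, "shard3")]
```

A query for shard key 42 lands on shard1, 1000 on shard2, 9999 on shard3 — the application never needs to know which shard holds what.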

Use sharding when: dataset exceeds 1 TB, write throughput needs horizontal scaling, or data must be partitioned by geography or tenant.
Avoid sharding when: your working set fits on a single RS — sharding adds operational complexity (Config Server RS, shard key design, chunk balancing).
MongoDB Sharded Cluster — Kubernetes Topology
[Diagram: application pods with topology-aware drivers connect to stateless mongos routers (a K8s Deployment). Each mongos routes by shard key, reading chunk metadata from the 3-node Config Server RS. Each data shard is a full replica set — a primary plus two secondaries tailing the oplog, one PVC per member — with PBM streaming oplog chunks to S3. The Config Server RS is deployed separately and is not a data shard.]

Config Server Replica Set

The Config Server RS is a dedicated 3-node replica set that stores the cluster's sharding metadata: which chunks of data live on which shard. Every mongos reads from this RS before routing operations.

  • Always exactly 3 nodes — never scale down
  • Outage of Config Server stops all mongos routing
  • Must be placed on Guaranteed QoS pods
  • Low storage requirements (~1–5 GB) but high availability requirement
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
spec:
  sharding:
    enabled: true
    configsvrReplSet:
      size: 3
      affinity:
        podAntiAffinityTopologyKey: topology.kubernetes.io/zone

Shard Key Selection

The shard key determines how data is distributed. Poor shard key selection leads to hotspots where all writes go to one shard, negating the benefit of sharding.

Pattern → Result
{ userId: "hashed" } → Even distribution ✓
{ tenantId: 1, _id: 1 } → Tenant partitioning ✓
{ createdAt: 1 } → Monotonic hotspot ✗
{ status: 1 } → Low cardinality ✗

Use hashed shard keys for maximum write distribution. Use compound range shard keys when zone sharding (geographic partitioning) is required.
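The hotspot effect is easy to demonstrate: hashing a monotonically increasing key spreads writes evenly, while range-sharding it sends every new insert to the last chunk. A Python sketch (illustrative; MongoDB uses its own 64-bit hash function, not MD5):

```python
import hashlib

N_SHARDS = 3

def shard_for(key, n_shards=N_SHARDS):
    """Hashed shard key sketch: hash the value, then mod by shard count."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_shards

# Hashed key: 10,000 sequential userIds spread across the shards
counts = [0] * N_SHARDS
for user_id in range(10_000):
    counts[shard_for(user_id)] += 1
# counts end up roughly equal: each shard takes about a third of the writes

# By contrast, a monotonic range key like { createdAt: 1 } sorts every
# new timestamp after all existing chunk bounds, so one shard absorbs
# every insert: the hotspot the table above warns about.
```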

3. Compute & Storage

MongoDB's performance depends almost entirely on memory: the WiredTiger cache must hold the working set, and the OS page cache provides a second layer. Getting resource limits and storage layout wrong on Kubernetes will cause constant cache evictions and poor read performance.

WiredTiger Cache Sizing

WiredTiger is MongoDB's default storage engine. It maintains an in-memory cache of working data. By default, the cache size is 50% of (RAM − 1 GB), with a 256 MB floor. On Kubernetes you must set this explicitly — the container memory limit is not automatically visible to MongoDB, and the default calculation may use the node's total RAM, leading to OOMKills.

BAD — implicit sizing
resources:
  limits:
    memory: "8Gi"
  requests:
    memory: "8Gi"
# wiredTigerCacheSizeGB not set →
# MongoDB may read host RAM (e.g. 128 GB)
# and set cache to 63 GB → OOMKill
GOOD — explicit cache sizing
resources:
  limits:
    memory: "16Gi"
  requests:
    memory: "16Gi"
# Rule: cache ≤ 0.5 × (containerMemory − 1 GB)
# For a 16 Gi container → 7.5 GB; 7 GB is a safe choice
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 7
MEMORY LAYOUT

MongoDB uses two memory layers: the WiredTiger cache (for modified and frequently-accessed data) and the OS page cache (for reads not in the WiredTiger cache). Set the container memory limit to at least 2× the WiredTiger cache size to leave room for the OS page cache, connections, and aggregation pipeline buffers.
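The sizing rules above reduce to simple arithmetic. A Python sketch (hypothetical helper names, assuming the 0.5 × (memory − 1 GB) rule described above):

```python
def wt_cache_size_gb(container_mem_gb):
    """WiredTiger-style sizing: 50% of (memory - 1 GB), floor of 0.25 GB.
    Set the result explicitly as storage.wiredTiger.engineConfig.cacheSizeGB."""
    return max(0.25, 0.5 * (container_mem_gb - 1))

def min_container_limit_gb(cache_gb):
    """Doc rule of thumb: container memory limit >= 2x the WT cache,
    leaving headroom for the OS page cache, connections, and pipelines."""
    return 2 * cache_gb
```

For a 16 Gi container this gives a 7.5 GB cache; rounding down to 7 GB (as in the GOOD example above) leaves extra headroom.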

Oplog Sizing

The oplog (operations log) is a capped collection on each replica set member that records all write operations. Its size determines your PITR window — how far back in time you can restore, and how much replication lag is tolerable before a secondary falls too far behind to resync.

Default (avoid in production) 5% of free disk (min 990 MB, max 50 GB) May roll over in <1 hour under write pressure
Recommended baseline 10–50 GB Provides a multi-hour PITR window; size to cover your longest maintenance duration
Heavy-write workloads 50–100 GB Ensures secondaries can lag and catch up without falling off the oplog
storage:
  oplogSizeMB: 51200  # 50 GB — the value is in MB, not GB
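Working backwards from your measured oplog write rate gives a concrete number. A Python sketch (illustrative; `oplog_size_mb` is a hypothetical helper):

```python
def oplog_size_mb(oplog_write_mb_per_hour, window_hours, safety_factor=2.0):
    """Size the oplog to cover a desired PITR/maintenance window at a
    measured oplog write rate, with headroom for write bursts."""
    return int(oplog_write_mb_per_hour * window_hours * safety_factor)

# e.g. ~1 GB/h of oplog traffic, 24 h window, 2x headroom -> 49152 MB (~48 GB)
```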

BEST_PRACTICE

  • Always set cacheSizeGB explicitly
  • Container limit ≥ 2× cacheSizeGB
  • Use Guaranteed QoS for primary members
  • Separate data and journal volumes
  • Oplog ≥ 10 GB for any production RS
  • Use NVMe-backed StorageClass for primary
  • Network-attached OK for secondaries

QOS_CLASSES

Guaranteed Primary + Config Server RS. requests == limits for both CPU and memory.
Burstable Secondaries and Analytics. Memory request set; CPU can burst.
Best Effort Never for MongoDB. Evicted first under memory pressure.

NVMe / Local SSD — Primary Members

RECOMMENDED

MongoDB's WiredTiger performs many small random I/Os for cache evictions and checkpoints. NVMe or local SSD-backed storage classes deliver the sub-millisecond latency these operations require. Use node-local PVCs via local-path-provisioner or cloud NVMe storage classes.

  • Sub-ms latency for checkpoint writes
  • High IOPS for cache eviction under pressure
  • Journal (MongoDB's write-ahead log) writes are sequential — benefit from NVMe burst

Network-Attached Storage — Secondaries

ACCEPTABLE

Secondary members replicate via oplog tailing — a largely sequential read/write pattern that is tolerant of higher latency. Cloud block storage (gp3, Premium SSD, Persistent Disk) is acceptable for secondaries, reducing cost without impacting write throughput on the primary.

  • Lower cost than NVMe for secondary nodes
  • Sequential oplog writes tolerate higher latency
  • Separate journal PVC prevents I/O contention

4. Monitoring

MongoDB produces rich observability data, but the default monitoring focus for Kubernetes operators differs from standalone deployments. Oplog window and replication lag are the two most operationally critical metrics — both invisible without dedicated exporters.

Prometheus + mongodb_exporter

The Percona mongodb_exporter scrapes MongoDB's serverStatus, replSetGetStatus, and dbStats endpoints and exposes them as Prometheus metrics. The Percona Operator for MongoDB deploys it as a sidecar and automatically creates a ServiceMonitor for Prometheus Operator.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mongo-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: percona-server-mongodb
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
PMM (Percona Monitoring & Management) provides a pre-built Grafana stack with MongoDB dashboards, query analytics, and replication topology views — available as a K8s Deployment alongside the operator.

Critical MongoDB Metrics

These five metrics should be the foundation of any MongoDB on Kubernetes alert configuration:

Oplog Window (hours)
mongodb_mongod_replset_oplog_head_timestamp - mongodb_mongod_replset_oplog_tail_timestamp

Time span covered by the oplog. Alert if < 2× your longest maintenance window. Shrinking oplog window means secondaries may fall off and require full resync.
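Computing the window and the alert condition from the newest and oldest oplog timestamps (Python sketch, hypothetical names):

```python
def oplog_window_hours(head_ts, tail_ts):
    """Window covered by the oplog: newest entry minus oldest, in hours."""
    return (head_ts - tail_ts) / 3600.0

def oplog_window_alert(window_hours, longest_maintenance_hours):
    """Alert when the window drops below 2x the longest maintenance window."""
    return window_hours < 2 * longest_maintenance_hours
```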

Replication Lag (seconds)
mongodb_mongod_replset_member_replication_lag

How far behind each secondary is from the primary's optime. Alert at > 30 s; investigate at > 10 s under normal load.

WiredTiger Cache Hit Ratio
1 - (WiredTiger "pages read into cache" / "pages requested from the cache"), from serverStatus().wiredTiger.cache

Should stay above 95%. Dropping below 90% signals the working set no longer fits in cache — time to scale memory or shard.
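The same ratio as a function (Python sketch, hypothetical name; a page read into cache means the request missed and had to hit disk):

```python
def cache_hit_ratio(pages_requested, pages_read_into_cache):
    """Approximate WiredTiger cache hit ratio: share of cache requests
    that did not require reading a page from disk into the cache."""
    if pages_requested == 0:
        return 1.0  # no traffic yet: treat as a perfect hit rate
    return 1.0 - pages_read_into_cache / pages_requested
```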

Current Connections
mongodb_mongod_connections{state="current"}

MongoDB spawns one thread per connection. Alert when approaching net.maxIncomingConnections. A persistently high connection count means connection pooling is needed, either at the driver level (maxPoolSize) or via mongos in sharded clusters.

Page Faults / Disk I/O
mongodb_extra_info_page_faults_total

Page faults mean data was not in the WiredTiger cache or OS page cache — direct disk reads. Sustained page faults indicate insufficient memory.

Replica Set member state

Monitor mongodb_mongod_replset_member_state — alert on any member in state RECOVERING (3), UNKNOWN (6), or DOWN (8).

PVC capacity alerts

Set kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.80 — MongoDB does not pre-allocate but checkpoint files can spike disk usage.

Config Server RS health

For sharded clusters, alert immediately if the Config Server RS loses quorum. All mongos routing stops until a majority is restored.

mongos active connections

Track mongodb_mongos_connections for sharded clusters. mongos pools connections to each shard; size the pool to avoid shard connection exhaustion.

Slow query tracking

Enable profiling: 1 with slowOpThresholdMs: 100. PMM's Query Analytics surfaces the top slow operations with explain plan data.

Grafana dashboard

Percona provides pre-built Grafana dashboards for MongoDB Replica Set Overview, WiredTiger, and ReplSet Summary — importable via PMM or standalone JSON.

5. Backups & PITR

MongoDB point-in-time recovery is implemented differently from relational databases. Instead of WAL files, MongoDB uses the oplog — a capped collection that records every write operation. Percona Backup for MongoDB (PBM) continuously archives this oplog to object storage, enabling recovery to any second within the oplog window.

Percona Backup for MongoDB (PBM)

PBM is the recommended backup solution for self-managed MongoDB on Kubernetes. It supports both physical backups (direct copy of WiredTiger data files — fast for large datasets) and logical backups (BSON document dump). For PITR, PBM archives oplog chunks to S3-compatible storage continuously.

Physical Backup Copies WiredTiger data files directly. Backup time proportional to data size, not document count. Recommended for production (>50 GB).
Logical Backup BSON dump via mongodump. Slower but portable across MongoDB versions. Useful for development and smaller datasets.
Oplog Archiving (PITR) Continuously uploads oplog chunks to S3 on a configurable interval (default 10 min). Enables second-level PITR within the upload window.
Sharded Cluster Aware PBM coordinates backups across all shards and the Config Server RS simultaneously, ensuring a consistent cluster-wide snapshot.
# Trigger a physical backup
pbm backup --type=physical

# Restore to a point in time
pbm restore --time="2026-04-01T14:32:00"

# Check PITR coverage window
pbm status | grep -A5 "PITR"

PITR via Oplog Archiving

Point-in-time recovery combines a base backup snapshot with a continuous stream of oplog operations. Restoring to any point requires replaying the oplog from the nearest base backup up to the target timestamp.

[Timeline: base physical backup at T0; oplog chunks uploaded to S3 every 10 min; data-loss incident at T+N; restore target set to 5 min before the incident. Recovery = restore the base backup, then replay archived oplog up to the target timestamp.]
PITR granularity: seconds (limited by oplog upload interval)
PITR window = age of oldest base backup in S3
Oplog must not roll over between backup and restore target
Works identically for replica sets and sharded clusters
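The restore planner's job reduces to: pick the newest base backup at or before the target, then replay every oplog chunk covering the gap. A Python sketch (illustrative; PBM's real logic also validates chunk continuity and cluster-wide consistency):

```python
def pick_restore_plan(base_backups, oplog_chunks, target_ts):
    """Choose the newest base backup at or before target_ts, plus the
    archived oplog chunks needed to replay from that backup to the target.
    Timestamps are plain epoch seconds; chunks are (start, end) tuples."""
    candidates = [b for b in base_backups if b <= target_ts]
    if not candidates:
        raise ValueError("no base backup precedes the restore target")
    base = max(candidates)
    # keep chunks that end after the backup and start at or before the target
    replay = [(s, e) for (s, e) in oplog_chunks if e > base and s <= target_ts]
    return base, replay
```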
ACCEPTABLE
PBM Logical + PITR

Development clusters or datasets under 50 GB. Portable across MongoDB patch versions. Slower than physical for backup and restore.

6. MongoDB Operators Compared

The Kubernetes ecosystem offers several operators for MongoDB, ranging from full-featured open-source solutions to proprietary enterprise platforms. Here's how the main options compare on the features that matter most in production.

Percona Operator for MongoDB

Percona
License Apache 2.0
Sharding Full support
Backup PBM (physical + PITR)

The most feature-complete open-source MongoDB operator. Supports replica sets and sharded clusters, PBM integration for PITR, PMM for monitoring, TLS, custom roles, and multi-cluster federation. Active development with regular releases.

Apache 2.0 Sharding PBM PITR PMM
↗ percona.com

MongoDB Community Operator

MongoDB Inc.
License Apache 2.0
Sharding Not supported
Backup No native integration

The official open-source operator from MongoDB Inc. Handles replica set provisioning and basic lifecycle management. No sharding support, no built-in backup, no monitoring integration. Suitable for development clusters and use cases where Atlas Operator is not viable.

Apache 2.0 Replica Set only Basic lifecycle
↗ GitHub

KubeDB

AppsCode
License BSL / Proprietary
Sharding Supported
Backup Stash (paid)

Multi-database operator suite (MongoDB, PostgreSQL, MySQL, Redis, and more) with a single unified management plane. Enterprise features including automated backups and monitoring require a commercial license. Useful for teams that need a single operator for many database types under an enterprise agreement.

Multi-DB Enterprise Sharding
↗ kubedb.com

Atlas Kubernetes Operator

MongoDB Inc.
License Apache 2.0 (code)
Sharding Via Atlas
Self-hosted No — Atlas cloud only

Manages MongoDB Atlas clusters via Kubernetes CRDs. The operator code is open source but the database runs in Atlas (MongoDB's cloud service) — not on your own Kubernetes nodes. Offers full feature parity with Atlas including Global Clusters and Atlas Search. Requires an Atlas account and subscription costs.

Atlas Cloud Full-featured Subscription
↗ mongodb.com

OpenEverest: The Unified Approach

OpenEverest is a CNCF Sandbox project that provides a unified control plane for multiple database engines on Kubernetes. For MongoDB workloads, it currently uses Percona Operator for MongoDB as its engine — delivering full replica set and sharded cluster support, PBM-based PITR, and PMM monitoring through a single unified API.

OpenEverest is built on a modular operator architecture: the underlying engine is pluggable, and additional MongoDB operator backends are on the roadmap. The same modular approach already spans MySQL (Percona Operator for MySQL), MongoDB, and more engines planned. This means teams invest in one API, one RBAC model, and one monitoring stack — regardless of which database or operator runs underneath.

CNCF Sandbox Open Source Modular Architecture Percona Operator Now Sharding Support Multi-Database

Run MongoDB on Kubernetes — The Right Way

Replica sets with driver-native failover, sharded clusters with mongos, PBM PITR to S3 — all managed through a single open-source control plane. No vendor lock-in. No surprises.

100% Open Source
3+ DB Engines
40–60% Cost Savings