
Running MySQL on Kubernetes

A Complete Architecture Guide

MySQL is one of the most mature relational databases in the world. Running it on Kubernetes is no longer a question of if — it's a question of how.

1. Introduction: MySQL Meets Kubernetes

MySQL was born in 1995 and has matured into one of the most widely deployed relational databases on the planet. Running it on Kubernetes is no longer experimental — it is a production-proven approach used by organizations that want portability, automation, and cost efficiency.

STATUS: PRODUCTION_PROVEN

Is MySQL on Kubernetes Production-Ready?

The short answer: yes. The longer answer: multiple battle-tested Kubernetes operators now exist specifically for MySQL. Organizations like GitHub, Shopify, and Slack have demonstrated that stateful workloads — including relational databases — run reliably on Kubernetes when managed with the right tooling.

Running MySQL on Kubernetes delivers three core advantages that traditional VM-based deployments cannot match:

🔓

No Vendor Lock-In

Deploy on any cloud or on-prem infrastructure. The same MySQL deployment runs on AWS, GCP, Azure, or bare metal — your data stays portable.

⚙️

Automated Operations

Kubernetes operators handle failover, scaling, backups, and upgrades automatically. What used to be runbooks and pager alerts becomes declarative YAML.

💰

Cost Efficiency

Consolidate workloads on shared clusters, right-size resources dynamically, and avoid paying the cloud database premium — often saving 40–60% on TCO.

The debate is over. Dozens of mature operators, a vibrant Data on Kubernetes community, and years of production usage across thousands of clusters have proven that MySQL on Kubernetes works. The real question is which topology, storage strategy, and operator to choose.

WHY_KUBERNETES
Run anywhere — any cloud, on-prem, edge
Automate Day-2 operations (backups, failover, scaling)
Declarative, version-controlled config
Consistent tooling across all databases
Save 40–60% vs managed cloud databases

2. MySQL on Kubernetes Architectures

A MySQL deployment on Kubernetes is best understood as two layers working together: a proxy layer that routes traffic and a database layer that handles replication and storage. Getting the topology right is the foundation of everything else.

LAYER: PROXY

The Proxy Layer (HAProxy, ProxySQL)

Every production MySQL deployment on Kubernetes needs a proxy in front of the database nodes. The proxy handles connection routing, read/write splitting, and transparent failover so that application code never needs to know which pod is the current primary.

The two most common choices are HAProxy and ProxySQL. HAProxy is a fast, lightweight TCP/HTTP load balancer — simple to configure and excellent for basic read/write splitting. ProxySQL is MySQL-aware, offering query-level routing, connection multiplexing, and query caching. Both integrate well with Kubernetes operators.

The proxy pods must be deployed separately from the MySQL pods, ideally on their own nodes, to prevent a single node failure from taking down both the proxy and a database instance simultaneously.
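To illustrate the routing model, read/write splitting with HAProxy can be expressed as plain TCP configuration shipped in a ConfigMap. This is a minimal sketch — the service names (`mysql-0.mysql`, etc.), ports, and the health-check user are assumptions, not operator defaults:

```yaml
# Hypothetical ConfigMap: HAProxy sends writes (port 3306) to the current
# primary and balances reads (port 3307) round-robin across replicas.
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-mysql
data:
  haproxy.cfg: |
    defaults
        mode tcp
        timeout connect 5s
        timeout client  30m
        timeout server  30m

    listen mysql-primary
        bind *:3306
        option mysql-check user haproxy_check
        server mysql-0 mysql-0.mysql:3306 check

    listen mysql-replicas
        bind *:3307
        balance roundrobin
        option mysql-check user haproxy_check
        server mysql-1 mysql-1.mysql:3306 check
        server mysql-2 mysql-2.mysql:3306 check
```

In practice the operator manages this file and rewrites the backend list on failover; a static configuration like this only illustrates how the two traffic classes are separated.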

HAProxy TCP LB, simple
ProxySQL SQL-aware routing
Isolation Required
TOPOLOGY: ASYNC_REPLICATION

Asynchronous Replication

The standard MySQL replication model. The primary node writes to its binary log and replicas pull changes asynchronously. This provides good performance and is the most widely used topology. The trade-off is a small replication lag — if the primary fails before a replica catches up, some transactions may be lost.

Semi-synchronous replication improves on this by requiring at least one replica to acknowledge the transaction before the primary commits, reducing the data loss window significantly.
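To make the semi-sync trade-off concrete, here is a sketch of the relevant my.cnf settings delivered as a ConfigMap. The variable and plugin names below follow MySQL 8.0.26+ (`semisync_source`); older versions use the `semisync_master`/`semisync_slave` names, and operators normally manage these flags for you — treat this as illustration only:

```yaml
# Hypothetical ConfigMap name; an operator would usually generate this.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-semisync-config
data:
  my.cnf: |
    [mysqld]
    # Primary side: wait for at least one replica ACK before commit returns
    plugin-load-add = semisync_source.so
    rpl_semi_sync_source_enabled = ON
    rpl_semi_sync_source_timeout = 1000   # ms; fall back to async after this
    # Replica side (set on replica pods instead):
    # plugin-load-add = semisync_replica.so
    # rpl_semi_sync_replica_enabled = ON
```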

Primary-Replica Low Latency Semi-Sync Option
TOPOLOGY: SYNC_REPLICATION

Synchronous Replication (Galera, Group Replication)

For workloads that require zero data loss, MySQL offers two synchronous replication options: Percona XtraDB Cluster (PXC) built on Galera, and MySQL Group Replication (InnoDB Cluster). Both ensure every committed transaction is written to all nodes before acknowledgment.

Galera uses certification-based replication and supports true multi-primary writes. Group Replication offers a similar model with tighter integration into the MySQL ecosystem. The trade-off is higher write latency due to cross-node coordination.

Zero Data Loss Galera / PXC Group Replication
DIAGRAM: MYSQL_K8S_TOPOLOGY

MySQL Cluster Topology on Kubernetes

[Diagram: a Kubernetes cluster of 3 nodes spread across 3 availability zones (AZ-a/b/c). Each node hosts one HAProxy pod (proxy-0…2) and one MySQL pod: mysql-0 on node-1 is the primary (read/write); mysql-1 and mysql-2 are read-only replicas receiving replication from the primary. Each MySQL pod binds its own PVC (mysql-data-0…2, StorageClass: fast-ssd) and is scheduled with strict podAntiAffinity.]
HA: NODE_AFFINITY

High Availability with Node Affinity

For a 3-node MySQL cluster (1 primary + 2 replicas) with 3 proxy pods, you need at least 3 Kubernetes nodes. Each MySQL pod and each proxy pod must sit on a different physical node, enforced via podAntiAffinity. This ensures no single node failure takes down more than one database instance and one proxy simultaneously.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: mysql
      topologyKey: kubernetes.io/hostname
Node Isolation Required Policy
RESILIENCE: MULTI_AZ

Multi-AZ Deployments

Cloud providers enable scheduling pods across different availability zones (AZs). This is critical — if an entire AZ goes down, your cluster maintains quorum in the remaining zones. Use topologySpreadConstraints to distribute MySQL pods evenly across AZs and ensure the cluster survives zone-level failures.

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: mysql
Zone Redundancy Quorum Protection topologySpreadConstraints

3. Compute & Storage Strategy

Getting resources right is the difference between a rock-solid database and one that gets killed by the Linux OOM killer at 3 AM. Storage choices directly impact your recovery time and cost.

CRITICAL: GUARANTEED_QOS

Guaranteed QoS (Requests = Limits)

For stateless apps, setting low requests and high limits is standard for bin-packing. For databases, this is a recipe for disaster. If your memory limit is higher than the request, the Linux OOM killer will target the database pod first when the node comes under pressure.

The recommended approach for MySQL on Kubernetes: set requests equal to limits for both CPU and RAM. This gives the pod Guaranteed QoS class — the highest priority in the Kubernetes scheduler, immune to eviction during node pressure.

BURSTABLE_QOS
resources:
  requests:
    memory: 2Gi
    cpu: 500m
  limits:
    memory: 8Gi
    cpu: 4000m

⚠️ OOM Kill Target Under Pressure

GUARANTEED_QOS
resources:
  requests:
    memory: 8Gi
    cpu: 4000m
  limits:
    memory: 8Gi
    cpu: 4000m

✓ Protected by Kernel Priority

STORAGE: NETWORK_ATTACHED

Network-Attached Storage (EBS, Ceph)

The standard and most common approach is to decouple storage from compute using network-attached volumes like AWS EBS, GCP Persistent Disk, or Ceph. This provides a major operational advantage: when a node fails, the pod is rescheduled on another node and the same PersistentVolume is re-attached with all data intact.

This makes pod migration seamless and fast — there is no data resynchronization needed. The trade-off is I/O latency, which is higher than local disk because every read/write traverses the network.
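As a concrete example, a StorageClass for AWS EBS gp3 volumes via the EBS CSI driver might look like this — the name `fast-ssd` and the IOPS/throughput figures are illustrative, not recommendations:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"        # provisioned IOPS above the gp3 baseline
  throughput: "250"   # MiB/s
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer   # bind the volume in the pod's AZ
```

`WaitForFirstConsumer` matters in multi-AZ clusters: EBS volumes are zonal, so binding must be deferred until the scheduler has picked a node.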

Easy Migration AWS EBS Ceph GCP PD
STORAGE: LOCAL_NVME

Local NVMe SSDs

For latency-sensitive workloads, local NVMe SSDs offer the best raw performance — dramatically lower I/O latency and higher throughput than any network-attached option. This can also save significant cost since local storage is typically cheaper than provisioned IOPS network volumes.

The trade-off is durability: if the node fails, the local volume is lost. The database pod is rescheduled on a different node and must resynchronize the entire dataset from a replica. This takes time and may briefly impact performance on the source replica. Choose local NVMe when the performance gain outweighs the recovery cost, and always ensure you have at least one replica on network-attached storage or recent backups.
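Local disks are typically exposed through the `local` volume type with a no-provisioner StorageClass (or automated by a tool such as the local static provisioner or OpenEBS). A minimal hand-written sketch, with an assumed mount path and node name:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner   # PVs are pre-created, not dynamic
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-nvme-node-1
spec:
  capacity:
    storage: 500Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-nvme
  local:
    path: /mnt/nvme0   # assumed mount point of the NVMe device
  nodeAffinity:        # pins the volume (and thus the pod) to its node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: [node-1]
```

The `nodeAffinity` block is what encodes the durability trade-off: the volume exists only on node-1, so losing that node means resynchronizing from a replica.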

STORAGE COMPARISON: NETWORK vs LOCAL NVMe

Network-Attached (EBS)
Latency: ~1 ms | Throughput: medium
✓ Pod migration: instant reattach
✓ Data survives node failure
⚠ Higher I/O latency
⚠ IOPS provisioning cost
Best for: general workloads, simplicity

Local NVMe SSD
Latency: ~0.1 ms | Throughput: high
✓ Lowest latency possible
✓ Better cost per IOPS
✕ Data lost on node failure
✕ Full resync on recovery
Best for: low-latency, cost-sensitive workloads
NVMe SSD Performance Cost Savings Trade-offs
BEST_PRACTICE
Always set requests = limits for databases
Use Guaranteed QoS class
Network storage for easy recovery
NVMe for performance-critical workloads
Always maintain replicas for NVMe setups
QOS_CLASSES
Guaranteed req == limit
Burstable req < limit
BestEffort none set

4. Monitoring MySQL on Kubernetes

Monitoring in Kubernetes is largely a solved problem at the infrastructure level — Prometheus Operator, Grafana dashboards, and alerting pipelines are mature and battle-tested. But database monitoring is more than CPU and memory graphs. You need to see inside the engine.

INFRA: PROMETHEUS_STACK

Prometheus and Grafana

The standard Kubernetes monitoring stack — Prometheus Operator for metric collection and Grafana for visualization — integrates well with MySQL operators. Most operators deploy a mysqld_exporter sidecar alongside each MySQL pod that exposes internal engine metrics via a /metrics endpoint.

ServiceMonitor resources enable Prometheus to automatically discover and scrape new MySQL pods as they scale up or get rescheduled — adapting to Kubernetes' dynamic nature without manual configuration changes.
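A typical ServiceMonitor for this setup, assuming the operator labels its pods `app: mysql` and the exporter is exposed on a named `metrics` port (commonly 9104) — adjust the labels to match your actual deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql
  labels:
    release: prometheus   # must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: mysql          # selects the MySQL Services, however pods churn
  endpoints:
  - port: metrics
    interval: 15s
```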

Prometheus Grafana ServiceMonitor Auto-Discovery
CRITICAL: DATABASE_METRICS

Database Metrics (Not Just Infrastructure)

Watching node CPU and pod memory is necessary but insufficient. You need to monitor MySQL-specific metrics that reveal the real health of your database:

Replication Lag — Seconds_Behind_Master / group replication lag. Non-zero means your replicas are falling behind.
InnoDB Buffer Pool Hit Rate — If below 99%, your working set doesn't fit in memory. You're hitting disk.
Slow Queries — Count and rate of queries exceeding long_query_time. Trend matters more than absolute number.
Connection Count — Active connections vs max_connections. Running out causes hard application failures.
Threads Running — Concurrent query execution. Spikes indicate lock contention or resource bottlenecks.
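These metrics only matter if they page someone. A sketch of alerting on two of them as a PrometheusRule, using mysqld_exporter metric names — the thresholds are illustrative and workload-dependent:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-alerts
spec:
  groups:
  - name: mysql
    rules:
    - alert: MySQLReplicationLagHigh
      expr: mysql_slave_status_seconds_behind_master > 30
      for: 5m
      labels:
        severity: warning
    - alert: MySQLConnectionsNearMax
      # fire when >80% of max_connections is in use
      expr: >
        mysql_global_status_threads_connected
        / mysql_global_variables_max_connections > 0.8
      for: 10m
      labels:
        severity: warning
```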
mysqld_exporter Replication Lag InnoDB Slow Queries
WARNING: DYNAMIC_TARGETS

Adapting Monitoring for Kubernetes

Traditional monitoring assumes static hosts with fixed IPs. Kubernetes breaks that assumption — pods get rescheduled, IPs change, replicas scale up and down. Make sure your monitoring setup:

Uses label-based service discovery (not static IPs)
Scrapes via ServiceMonitor or PodMonitor CRDs
Alerts on replication lag, not just pod restarts
Handles pod churn without losing metric continuity
Includes both infrastructure and database-level dashboards

5. Backup Strategies

Backups are the ultimate safety net. In cloud-native environments, the standard approach is S3-compatible object storage — but where you store backups and whether you support point-in-time recovery matters enormously.

STORAGE: S3_COMPATIBLE

S3-Compatible Backup Storage

The cloud-native standard for backup storage is any S3-compatible endpoint — AWS S3, Google Cloud Storage, Azure Blob (via S3 API), or self-hosted alternatives like MinIO. This gives you durability, versioning, and lifecycle management out of the box.

The recommended practice is to store backups outside of the Kubernetes cluster — in a separate account, region, or at minimum a separate namespace. If the cluster is compromised or destroyed, your backups survive.

That said, for development or staging environments it is sometimes acceptable to host backup storage within the same cluster using S3-compatible tools such as MinIO, Ceph, or RustFS.

S3 Off-Cluster Durability Versioning
CRITICAL: POINT_IN_TIME_RECOVERY

Point-in-Time Recovery (PITR)

A full backup gives you a snapshot at a point in time. But real disasters require restoring to any arbitrary second — right before the bad DELETE statement or data corruption event. This is point-in-time recovery (PITR), and it requires continuous binary log streaming to your backup storage.

PITR on Kubernetes with S3 is not trivial. Binary logs must be uploaded continuously, and the restore process must be able to replay them on top of the last full backup. Not all MySQL operators support PITR well — evaluate this capability carefully when choosing your solution.
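The exact resource shape depends on the operator. As one example, the Percona XtraDB Cluster operator expresses scheduled backups and PITR roughly like this inside its cluster custom resource — bucket and secret names below are placeholders:

```yaml
backup:
  storages:
    s3-backups:
      type: s3
      s3:
        bucket: my-mysql-backups            # placeholder
        region: us-east-1
        credentialsSecret: s3-backup-credentials
  schedule:
  - name: nightly-full
    schedule: "0 0 * * *"   # cron: full backup at midnight
    keep: 7
    storageName: s3-backups
  pitr:
    enabled: true
    storageName: s3-backups
    timeBetweenUploads: 60   # seconds between binlog uploads to S3
```

The `timeBetweenUploads` setting is the PITR window's granularity knob: the interval between binlog uploads bounds how much recent data a worst-case restore can lose.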

[Diagram: backup & PITR timeline — full backup Mon 00:00 → continuous binlog stream to S3 → bad DELETE at Wed 14:32 → restore target Wed 14:31 → next full backup Thu 00:00.]
PITR Binary Logs Continuous Granular Recovery
ADVICE: BACKUP_ARCHITECTURE

Backup Architecture Decisions

Recommended

Store backups in a separate cloud account or region. Use S3 lifecycle rules for retention. Test restores regularly — an untested backup is not a backup.

Acceptable (Dev/Staging)

Host S3-compatible storage (MinIO, RustFS, Ceph) in the same Kubernetes cluster. Convenient but carries shared-fate risk — cluster loss means backup loss.

Not Recommended

Writing backups to local PVCs or shared filesystem mounts. No durability guarantees, no versioning, and the same blast radius as the database itself.

6. MySQL Operators Compared

The Kubernetes ecosystem offers multiple operators for MySQL — each with different HA strategies, licensing models, and trade-offs. Here's how they compare.

MySQL Operator

Oracle
License Community (limited) / Enterprise
HA Method InnoDB Cluster (Group Replication)

Official Oracle operator. Full features like advanced backup and data-at-rest encryption require an Enterprise license.

Group Replication Enterprise
↗ GitHub

Percona Operators

Percona
License Fully Open Source (Apache 2.0)
HA Method PXC (Galera sync) & PS (Async + Orchestrator)

Two fully open-source operators for different consistency needs. PXC provides synchronous Galera-based replication; Percona Server operator covers async/semi-sync with Orchestrator for failover.

Apache 2.0 Galera Async
↗ percona.com

Moco

Cybozu
License Open Source
HA Method GTID-based Semi-Sync Replication

Designed for high compatibility with standard MySQL 8, avoiding Group Replication's limitations with large transactions. Great for vanilla MySQL use cases.

Semi-Sync MySQL 8
↗ GitHub

Bitpoke MySQL Operator

Bitpoke
License Open Source (commercial support)
HA Method Orchestrator + ProxySQL

Built for WordPress at scale, then open-sourced. Focuses on robust backups, PITR, and operational flexibility. ⚠ No longer actively developed.

PITR ProxySQL Inactive
↗ GitHub

Vitess Operator

PlanetScale
License Open Source
HA Method Vitess Clustering (Sharding)

For horizontal scaling and sharding of MySQL at massive scale. Originally developed at YouTube. Adds complexity but solves the "MySQL doesn't shard" problem.

Sharding Horizontal Scale
↗ GitHub

KubeDB

AppsCode
License Proprietary (not open source)
HA Method Varies by configuration

Part of a broader multi-database operator suite (PostgreSQL, MongoDB, etc.) with a unified management plane. Enterprise features behind a license.

Multi-DB Enterprise
↗ kubedb.com

Tungsten Operator

Continuent
License Commercial Product
HA Method Tungsten Cluster (Advanced Replication)

Commercial enterprise solution for advanced cross-datacenter and cross-cloud replication scenarios. Proprietary.

Commercial Multi-DC
↗ Documentation

OpenEverest: The Unified Approach

OpenEverest is a CNCF Sandbox project that simplifies multi-database orchestration on Kubernetes. Under the hood, it uses Percona Operator for MySQL based on PXC for MySQL workloads — giving you Galera-based synchronous replication with simplified management.

OpenEverest provides a single control plane for deploying and managing MySQL, PostgreSQL, and MongoDB clusters with consistent APIs, automated backups, monitoring integration, and zero vendor lock-in. Instead of managing each operator independently, OpenEverest orchestrates them through one unified interface.

CNCF Sandbox Open Source Multi-Database Percona PXC Unified API
$ kubectl apply -f mysql-cluster.yaml

Run MySQL on Kubernetes — The Right Way

Deploy production-grade MySQL on any Kubernetes cluster with automated failover, backups, and monitoring. Solanica Platform powered by OpenEverest gives you the tooling without the lock-in.

100% Open Source
3+ DB Engines
40-60% Cost Savings