MySQL was born in 1995 and has matured into one of the most widely deployed relational databases on the planet. Running it on Kubernetes is no longer experimental — it is a production-proven approach used by organizations that want portability, automation, and cost efficiency.
●STATUS: PRODUCTION_PROVEN
Is MySQL on Kubernetes Production-Ready?
The short answer: yes. The longer answer: multiple battle-tested Kubernetes operators now exist specifically for MySQL. Organizations like GitHub, Shopify, and Slack have demonstrated that stateful workloads — including relational databases — run reliably on Kubernetes when managed with the right tooling.
Running MySQL on Kubernetes delivers three core advantages that traditional VM-based deployments cannot match:
🔓 No Vendor Lock-In
Deploy on any cloud or on-prem infrastructure. The same MySQL deployment runs on AWS, GCP, Azure, or bare metal — your data stays portable.
⚙️ Automated Operations
Kubernetes operators handle failover, scaling, backups, and upgrades automatically. What used to be runbooks and pager alerts becomes declarative YAML.
💰 Cost Efficiency
Consolidate workloads on shared clusters, right-size resources dynamically, and avoid paying the cloud database premium — often saving 40–60% on TCO.
The debate is over. Dozens of mature operators, a vibrant Data on Kubernetes community, and years of production usage across thousands of clusters have proven that MySQL on Kubernetes works. The real question is which topology, storage strategy, and operator to choose.
A MySQL deployment on Kubernetes is best understood as two layers working together: a proxy layer that routes traffic and a database layer that handles replication and storage. Getting the topology right is the foundation of everything else.
●LAYER: PROXY
The Proxy Layer (HAProxy, ProxySQL)
Every production MySQL deployment on Kubernetes needs a proxy in front of the database nodes. The proxy handles connection routing, read/write splitting, and transparent failover so that application code never needs to know which pod is the current primary.
The two most common choices are HAProxy and ProxySQL. HAProxy is a fast, lightweight TCP/HTTP load balancer — simple to configure and excellent for basic read/write splitting. ProxySQL is MySQL-aware, offering query-level routing, connection multiplexing, and query caching. Both integrate well with Kubernetes operators.
The proxy pods must be deployed separately from the MySQL pods, ideally on their own nodes, to prevent a single node failure from taking down both the proxy and a database instance simultaneously.
HAProxy: TCP LB, simple
ProxySQL: SQL-aware routing
Isolation: Required
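One way to enforce this isolation is a podAntiAffinity rule on the proxy Deployment itself, so the scheduler refuses to place a proxy on any node already running a MySQL pod or another proxy replica. The sketch below assumes a ProxySQL deployment; the names, labels (`app: mysql`, `app: proxysql`), and image tag are illustrative, and most operators generate an equivalent rule for you:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: proxysql                     # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: proxysql
  template:
    metadata:
      labels:
        app: proxysql
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            # never co-schedule a proxy with a MySQL pod...
            - labelSelector:
                matchLabels:
                  app: mysql
              topologyKey: kubernetes.io/hostname
            # ...or with another proxy replica
            - labelSelector:
                matchLabels:
                  app: proxysql
              topologyKey: kubernetes.io/hostname
      containers:
        - name: proxysql
          image: proxysql/proxysql:2.6   # illustrative tag
          ports:
            - containerPort: 6033        # ProxySQL's MySQL traffic port
```

Note that `requiredDuringScheduling` is a hard constraint: with both rules, a 3-proxy deployment needs three nodes free of MySQL pods, so size the cluster accordingly.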
●TOPOLOGY: ASYNC_REPLICATION
Asynchronous Replication
The standard MySQL replication model. The primary node writes to its binary log and replicas pull changes asynchronously. This provides good performance and is the most widely used topology. The trade-off is a small replication lag — if the primary fails before a replica catches up, some transactions may be lost.
Semi-synchronous replication improves on this by requiring at least one replica to acknowledge the transaction before the primary commits, reducing the data loss window significantly.
Primary-Replica · Low Latency · Semi-Sync Option
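Semi-sync is enabled through MySQL server options rather than anything Kubernetes-specific; a common pattern is shipping them in a ConfigMap mounted as an extra configuration file. This is a sketch assuming the MySQL 8.0.26+ option spelling (older versions use `rpl_semi_sync_master_*` / `rpl_semi_sync_slave_*`); the ConfigMap name is illustrative, and many operators expose a configuration field in their CR instead:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-semisync-config       # illustrative name
data:
  semisync.cnf: |
    [mysqld]
    # load both plugins on every node so roles can swap during failover
    plugin-load-add = semisync_source.so
    plugin-load-add = semisync_replica.so
    rpl_semi_sync_source_enabled = ON
    rpl_semi_sync_replica_enabled = ON
    # fall back to async if no replica acknowledges within 10s
    rpl_semi_sync_source_timeout = 10000
```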
●TOPOLOGY: SYNC_REPLICATION
Synchronous Replication (Galera, Group Replication)
For workloads that require zero data loss, MySQL offers two synchronous replication options: Percona XtraDB Cluster (PXC) built on Galera, and MySQL Group Replication (InnoDB Cluster). Both ensure every committed transaction is written to all nodes before acknowledgment.
Galera uses certification-based replication and supports true multi-primary writes. Group Replication offers a similar model with tighter integration into the MySQL ecosystem. The trade-off is higher write latency due to cross-node coordination.
Zero Data Loss · Galera / PXC · Group Replication
●DIAGRAM: MYSQL_K8S_TOPOLOGY
MySQL Cluster Topology on Kubernetes
●HA: NODE_AFFINITY
High Availability with Node Affinity
For a 3-node MySQL cluster (1 primary + 2 replicas) with 3 proxy pods, you need at least 3 Kubernetes nodes. Each MySQL pod and each proxy pod must sit on a different physical node, enforced via podAntiAffinity. This ensures no single node failure takes down more than one database instance and one proxy simultaneously.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname   # one MySQL pod per node
Node Isolation · Required Policy
●RESILIENCE: MULTI_AZ
Multi-AZ Deployments
Cloud providers enable scheduling pods across different availability zones (AZs). This is critical — if an entire AZ goes down, your cluster maintains quorum in the remaining zones. Use topologySpreadConstraints to distribute MySQL pods evenly across AZs and ensure the cluster survives zone-level failures.
Zone Redundancy · Quorum Protection · topologySpreadConstraints
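The constraint goes directly on the MySQL pod template. A minimal fragment sketch (the `app: mysql` label is illustrative; most operators expose an equivalent field in their custom resource):

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1                            # zones may differ by at most one MySQL pod
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule      # hard requirement, not best-effort
      labelSelector:
        matchLabels:
          app: mysql
```

With three pods and three zones, `maxSkew: 1` forces exactly one pod per zone, so losing any single zone leaves a two-node majority.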
$ kubectl get pod mysql-0 -o jsonpath='{.status.qosClass}'
3. Compute & Storage Strategy
Getting resources right is the difference between a rock-solid database and one that gets killed by the Linux OOM killer at 3 AM. Storage choices directly impact your recovery time and cost.
●CRITICAL: GUARANTEED_QOS
Guaranteed QoS (Requests = Limits)
For stateless apps, setting low requests and high limits is standard practice for bin-packing. For databases, it is a recipe for disaster: if your memory limit is higher than the request, the pod lands in the Burstable QoS class and becomes an early target for eviction and the Linux OOM killer when the node comes under memory pressure.
The recommended approach for MySQL on Kubernetes: set requests equal to limits for both CPU and RAM. This gives the pod the Guaranteed QoS class, making it the last candidate for eviction under node pressure and giving its processes the lowest OOM-kill priority.
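In the container spec this is nothing more than identical `requests` and `limits` blocks (a fragment sketch; the sizes are illustrative and should match your workload):

```yaml
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:            # identical to requests => Guaranteed QoS class
    cpu: "4"
    memory: 16Gi
```

You can confirm the assigned class with `kubectl get pod mysql-0 -o jsonpath='{.status.qosClass}'`, which should print `Guaranteed`.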
●STORAGE: NETWORK_ATTACHED
Network-Attached Storage
The most common approach is to decouple storage from compute using network-attached volumes like AWS EBS, GCP Persistent Disk, or Ceph. This provides a major operational advantage: when a node fails, the pod is rescheduled on another node and the same PersistentVolume is re-attached with all data intact.
This makes pod migration seamless and fast — there is no data resynchronization needed. The trade-off is I/O latency, which is higher than local disk because every read/write traverses the network.
Easy Migration · AWS EBS · Ceph · GCP PD
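In a StatefulSet this is expressed through `volumeClaimTemplates`, which give every MySQL pod its own PersistentVolumeClaim that follows it across reschedules. A fragment sketch, assuming an AWS EBS CSI storage class named `gp3` (the class name and size are cluster-specific):

```yaml
volumeClaimTemplates:
  - metadata:
      name: data                   # mounted as the MySQL datadir volume
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3        # e.g. AWS EBS gp3; use your cluster's class
      resources:
        requests:
          storage: 100Gi
```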
●STORAGE: LOCAL_NVME
Local NVMe SSDs
For latency-sensitive workloads, local NVMe SSDs offer the best raw performance — dramatically lower I/O latency and higher throughput than any network-attached option. This can also save significant cost since local storage is typically cheaper than provisioned IOPS network volumes.
The trade-off is durability: if the node fails, the local volume is lost. The database pod is rescheduled on a different node and must resynchronize the entire dataset from a replica. This takes time and may briefly impact performance on the source replica. Choose local NVMe when the performance gain outweighs the recovery cost, and always ensure you have at least one replica on network-attached storage or recent backups.
NVMe SSD · Performance · Cost Savings · Trade-offs
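Local disks are typically consumed through a StorageClass backed by static local PersistentVolumes. The key detail is `volumeBindingMode: WaitForFirstConsumer`, which delays PV binding until the pod is scheduled so the scheduler can pick a node that actually has the disk. A sketch (the class name is illustrative; the PVs themselves must be pre-created or managed by a local-volume provisioner):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme                          # illustrative name
provisioner: kubernetes.io/no-provisioner   # static local PVs, no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer     # bind only once the pod is scheduled
```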
●BEST_PRACTICE
▸Always set requests = limits for databases
▸Use Guaranteed QoS class
▸Network storage for easy recovery
▸NVMe for performance-critical workloads
▸Always maintain replicas for NVMe setups
●QOS_CLASSES
Guaranteed: requests == limits
Burstable: requests < limits
BestEffort: none set
$ kubectl get servicemonitor -l app=mysql
4. Monitoring MySQL on Kubernetes
Monitoring in Kubernetes is largely a solved problem at the infrastructure level — Prometheus Operator, Grafana dashboards, and alerting pipelines are mature and battle-tested. But database monitoring is more than CPU and memory graphs. You need to see inside the engine.
●INFRA: PROMETHEUS_STACK
Prometheus and Grafana
The standard Kubernetes monitoring stack — Prometheus Operator for metric collection and Grafana for visualization — integrates well with MySQL operators. Most operators deploy a mysqld_exporter sidecar alongside each MySQL pod that exposes internal engine metrics via a /metrics endpoint.
ServiceMonitor resources enable Prometheus to automatically discover and scrape new MySQL pods as they scale up or get rescheduled — adapting to Kubernetes' dynamic nature without manual configuration changes.
Prometheus · Grafana · ServiceMonitor · Auto-Discovery
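A ServiceMonitor is a small CRD that tells the Prometheus Operator which Services to scrape. A minimal sketch (the `app: mysql` selector, the `release: prometheus` label, and the `metrics` port name are illustrative and must match your exporter Service and Prometheus instance):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql
  labels:
    release: prometheus       # must match your Prometheus's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: mysql              # illustrative label on the mysqld_exporter Service
  endpoints:
    - port: metrics           # named port exposing the /metrics endpoint
      interval: 15s
```

Because selection is label-based, pods added by scaling or rescheduling are picked up automatically with no config changes.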
●CRITICAL: DATABASE_METRICS
Database Metrics (Not Just Infrastructure)
Watching node CPU and pod memory is necessary but insufficient. You need to monitor MySQL-specific metrics that reveal the real health of your database:
● Replication Lag: Seconds_Behind_Master / group replication lag. Non-zero means your replicas are falling behind.
● InnoDB Buffer Pool Hit Rate: if below 99%, your working set doesn't fit in memory. You're hitting disk.
● Slow Queries: count and rate of queries exceeding long_query_time. Trend matters more than absolute number.
● Connection Count: active connections vs max_connections. Running out causes hard application failures.
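These metrics become actionable through alert rules. A PrometheusRule sketch for the first of them, assuming mysqld_exporter's classic `mysql_slave_status_seconds_behind_master` metric (the metric name varies with exporter version and replication mode, and the threshold is illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-replication
spec:
  groups:
    - name: mysql
      rules:
        - alert: MySQLReplicationLagHigh
          # sustained lag over 30 seconds for 5 minutes
          expr: mysql_slave_status_seconds_behind_master > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replica {{ $labels.pod }} is lagging behind the primary"
```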
Traditional monitoring assumes static hosts with fixed IPs. Kubernetes breaks that assumption — pods get rescheduled, IPs change, replicas scale up and down. Make sure your monitoring setup:
✓Uses label-based service discovery (not static IPs)
✓Scrapes via ServiceMonitor or PodMonitor CRDs
✓Alerts on replication lag, not just pod restarts
✓Handles pod churn without losing metric continuity
✓Includes both infrastructure and database-level dashboards
$ kubectl get backup -l engine=mysql
5. Backup Strategies
Backups are the ultimate safety net. In cloud-native environments, the standard approach is S3-compatible object storage — but where you store backups and whether you support point-in-time recovery matters enormously.
●STORAGE: S3_COMPATIBLE
S3-Compatible Backup Storage
The cloud-native standard for backup storage is any S3-compatible endpoint — AWS S3, Google Cloud Storage, Azure Blob (via S3 API), or self-hosted alternatives like MinIO. This gives you durability, versioning, and lifecycle management out of the box.
The recommended practice is to store backups outside of the Kubernetes cluster — in a separate account, region, or at minimum a separate namespace. If the cluster is compromised or destroyed, your backups survive.
That said, for development or staging environments it is sometimes acceptable to host backup storage within the same cluster using tools like MinIO, Ceph, or RustFS.
S3 · Off-Cluster · Durability · Versioning
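With an operator, scheduled S3 backups are usually a few lines in the cluster custom resource. This sketch follows the shape used by the Percona XtraDB Cluster operator's CR; field names vary by operator and version, and the storage name, bucket, and Secret name are illustrative:

```yaml
backup:
  storages:
    s3-offsite:                       # illustrative storage name
      type: s3
      s3:
        bucket: mysql-backups         # illustrative bucket in a separate account
        region: us-east-1
        credentialsSecret: s3-backup-creds
  schedule:
    - name: nightly
      schedule: "0 2 * * *"           # cron syntax: 02:00 daily
      keep: 7                         # retain the last 7 backups
      storageName: s3-offsite
```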
●CRITICAL: POINT_IN_TIME_RECOVERY
Point-in-Time Recovery (PITR)
A full backup gives you a snapshot at a point in time. But real disasters require restoring to any arbitrary second — right before the bad DELETE statement or data corruption event. This is point-in-time recovery (PITR), and it requires continuous binary log streaming to your backup storage.
PITR on Kubernetes with S3 is not trivial. Binary logs must be uploaded continuously, and the restore process must be able to replay them on top of the last full backup. Not all MySQL operators support PITR well — evaluate this capability carefully when choosing your solution.
PITR · Binary Logs · Continuous · Granular Recovery
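Where an operator does support PITR, enabling it is typically a small addition to the backup section of the CR. A sketch in the Percona XtraDB Cluster operator's shape (field names vary by operator and version; the storage name is illustrative and must reference a configured S3 storage):

```yaml
backup:
  pitr:
    enabled: true
    storageName: s3-offsite         # binlogs stream continuously to this S3 storage
    timeBetweenUploads: 60          # seconds between binlog uploads
```

The upload interval bounds your recovery granularity: with 60-second uploads, up to a minute of transactions can be lost between the last upload and a failure.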
●ADVICE: BACKUP_ARCHITECTURE
Backup Architecture Decisions
●Recommended
Store backups in a separate cloud account or region. Use S3 lifecycle rules for retention. Test restores regularly — an untested backup is not a backup.
●Acceptable (Dev/Staging)
Host S3-compatible storage (MinIO, RustFS, Ceph) in the same Kubernetes cluster. Convenient but carries shared-fate risk — cluster loss means backup loss.
●Not Recommended
Writing backups to local PVCs or shared filesystem mounts. No durability guarantees, no versioning, and the same blast radius as the database itself.
$ kubectl get operators --selector=database=mysql
6. MySQL Operators Compared
The Kubernetes ecosystem offers multiple operators for MySQL — each with different HA strategies, licensing models, and trade-offs. Here's how they compare.
●
MySQL Operator
Oracle
License: Community (limited) / Enterprise
HA Method: InnoDB Cluster (Group Replication)
Official Oracle operator. Full features like advanced backup and data-at-rest encryption require an Enterprise license.
●
Percona Operators for MySQL
Percona
HA Method: PXC (Galera sync) & PS (Async + Orchestrator)
Two fully open-source operators for different consistency needs. PXC provides synchronous Galera-based replication; Percona Server operator covers async/semi-sync with Orchestrator for failover.
●
MOCO
Cybozu
Designed for high compatibility with standard MySQL 8, avoiding Group Replication's limitations with large transactions. Great for vanilla MySQL use cases.
●
Vitess
CNCF Graduated
For horizontal scaling and sharding of MySQL at massive scale. Originally developed at YouTube. Adds complexity but solves the "MySQL doesn't shard" problem.
●
OpenEverest
OpenEverest is a CNCF Sandbox project that simplifies multi-database orchestration on Kubernetes. Under the hood, it uses Percona Operator for MySQL based on PXC for MySQL workloads — giving you Galera-based synchronous replication with simplified management.
OpenEverest provides a single control plane for deploying and managing MySQL, PostgreSQL, and MongoDB clusters with consistent APIs, automated backups, monitoring integration, and zero vendor lock-in. Instead of managing each operator independently, OpenEverest orchestrates them through one unified interface.
CNCF Sandbox · Open Source · Multi-Database · Percona PXC · Unified API
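Deploying a cluster through this unified interface reduces to a single custom resource. The sketch below assumes an Everest-style `DatabaseCluster` CRD (the apiVersion and field names follow Percona Everest's API and may differ between OpenEverest releases; the cluster name and sizes are illustrative):

```yaml
apiVersion: everest.percona.com/v1alpha1
kind: DatabaseCluster
metadata:
  name: mysql-prod                # illustrative name
spec:
  engine:
    type: pxc                     # Galera-based Percona XtraDB Cluster
    replicas: 3
    storage:
      size: 100Gi
  proxy:
    type: haproxy                 # or proxysql
    replicas: 3
```

The same `DatabaseCluster` shape covers PostgreSQL and MongoDB by swapping the engine type, which is what makes the single-control-plane model work.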
$ kubectl apply -f mysql-cluster.yaml
Run MySQL on Kubernetes — The Right Way
Deploy production-grade MySQL on any Kubernetes cluster with automated failover, backups, and monitoring. Solanica Platform powered by OpenEverest gives you the tooling without the lock-in.