Age Gap Is Not a Problem

MySQL × Cloud-Native

Sergey Pronin · Founder, Solanica Inc. · Maintainer, OpenEverest

The Age Gap

MySQL turned 19 when Kubernetes was born

18 years
1995 MySQL 1.0 · 🎬 Toy Story premieres
2001 📖 Wikipedia launches
2007 📱 First iPhone announced
2013 Docker
2014 Kubernetes
2018 Operator SDK
2019 Percona Operator for MySQL

The Problem

Two problems. Different solutions.

01

Philosophical

MySQL was built for permanent, named servers. Kubernetes assumes everything is temporary. The mental models don't overlap.

02

Technical

Replication, storage, failover, and connection routing all behave differently inside a cluster. The sharp edges are real.

Philosophical

MySQL grew up as a pet.
Kubernetes raises cattle.

🐕 Pet
  • db-primary.prod.local
  • Alert → SSH → investigate
  • The disk is the server
🐄🐄🐄 Cattle
  • mysql-abc-7f9d2
  • Crash → reschedule
  • State must be explicit

Part 2

Technical

Replication  ·  Storage  ·  Failover  ·  Networking

Problem

01

of many

High Availability

When everything is allowed to fail — what happens to your database?

Uptime, then and now

Then

423 days

Maintenance windows. Graceful failover. Expected behavior.

Now

14:02:11  evict  pod terminated
14:02:34  drain  node cordoned
14:02:48  oom    container killed
14:03:07  pvc    volume remount
14:03:22  net    partition detected
14:03:55  node   kernel reboot
14:04:18  probe  liveness failed
14:04:46  scale  HPA scaled down

Anything. Anytime. By design.

Classic primary → replica

Async replication.
Failover = data loss.

Primary

mysql-0

accepts writes

T₁ T₂ T₃ T₄ T₅

binlog · async, with lag

Replica

mysql-1

always behind

In-flight transactions never arrive
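
A minimal sketch of why async failover loses data, assuming the replica had applied only T₁ to T₃ when the primary died. That lag position and the function name are illustrative assumptions, not taken from the slide.

```python
def lost_on_failover(primary_committed, replica_applied):
    """Return transactions the primary committed but the replica never applied."""
    applied = set(replica_applied)
    return [txn for txn in primary_committed if txn not in applied]


# Primary acknowledged five commits to clients.
primary = ["T1", "T2", "T3", "T4", "T5"]
# Hypothetical replica state at the moment of the crash: two transactions behind.
replica = ["T1", "T2", "T3"]

print(lost_on_failover(primary, replica))  # ['T4', 'T5']
```

Promoting the replica at this point makes T4 and T5 disappear, even though clients were told they had committed.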

Percona XtraDB Cluster · Galera

Every commit, on every node.

INSERT …

PXC

mysql-0

PXC

mysql-1

PXC

mysql-2

Lose any node. Zero data loss.

Problem

02

crash recovery

Full Cluster Crash

Everyone wakes up. Nobody volunteers.

mysql-0

seqno

18

“might not be me”

mysql-1

seqno

18

“might not be me”

mysql-2

seqno

19

“might not be me”

how it happens

regional outage · power loss · kubectl delete --all

Highest seqno wins.

The operator scans every pod, picks the leader, and the cluster picks itself back up.

operator

scans all pods

mysql-0

seqno

18

joins

mysql-1

seqno

18

joins

mysql-2

seqno

19

bootstrap
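
The "highest seqno wins" rule can be sketched as below. This is an illustration of the selection step only, not the operator's actual code. It assumes the seqno of each pod has already been collected, and the name-based tie-break is an assumption added for determinism.

```python
def pick_bootstrap(seqnos):
    """Pick the pod to bootstrap from: the one with the highest seqno.

    seqnos maps pod name -> last committed seqno.
    Ties go to the lexicographically smallest pod name (an assumption).
    """
    if not seqnos:
        raise ValueError("no pods reported a seqno")
    # max() returns the first maximal element, so iterating over sorted
    # names makes the tie-break deterministic.
    return max(sorted(seqnos), key=lambda pod: seqnos[pod])


cluster = {"mysql-0": 18, "mysql-1": 18, "mysql-2": 19}
leader = pick_bootstrap(cluster)
joiners = sorted(pod for pod in cluster if pod != leader)

print(leader)   # mysql-2
print(joiners)  # ['mysql-0', 'mysql-1']
```

The chosen pod bootstraps a new cluster, and the others join it and catch up on whatever they missed.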

Problem

03

backups & pitr

Backups & Point-in-Time Recovery

3am. The primary is gone. Where is your last transaction?

Backups are easy. PITR is not.

Backups

xtrabackup → S3

Snapshot. Schedule. Done.

?

PITR

binlogs → S3

Streamed. Continuously. From a cluster.

The naive way

Every node uploads. Welcome to chaos.

PXC

mysql-0

PXC

mysql-1

PXC

mysql-2

S3 bucket

binlog.000087
binlog.000088 dup
binlog.000088 dup
binlog.000089
…000091 gap
Duplicate uploads · Gaps when a node dies · Who owns the binlog?
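
A small sketch of how duplicates and gaps in such a bucket listing could be detected. It assumes objects are named `binlog.NNNNNN` with a numeric suffix. The listing mirrors the slide for illustration only, and `audit_binlogs` is a hypothetical helper.

```python
from collections import Counter


def audit_binlogs(names):
    """Return (duplicates, gaps) as sorted lists of binlog sequence numbers."""
    seqs = [int(name.rsplit(".", 1)[1]) for name in names]
    if not seqs:
        return [], []
    counts = Counter(seqs)
    duplicates = sorted(seq for seq, n in counts.items() if n > 1)
    present = set(seqs)
    gaps = [seq for seq in range(min(seqs), max(seqs) + 1) if seq not in present]
    return duplicates, gaps


# Illustrative bucket listing: 88 uploaded twice, 90 never uploaded.
bucket = [
    "binlog.000087",
    "binlog.000088",
    "binlog.000088",
    "binlog.000089",
    "binlog.000091",
]

print(audit_binlogs(bucket))  # ([88], [90])
```

Finding the damage after the fact is the easy part. A gap means that window of transactions cannot be replayed, so point-in-time recovery past it is impossible.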

The answer

One Pod. One stream. One source of truth.

PXC

mysql-0

mysql-1 (selected: oldest binlogs)

PXC

mysql-2
PITR Pod

binlog-collector

lastUploaded: …d4a:3217
source: mysql-1
stream: mysqlbinlog -R

S3

binlog_…3215
binlog_…3216
binlog_…3217
One writer · Tracks GTIDs · Detects gaps · Survives node failures
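
The single-writer idea can be sketched as a collector that remembers the last uploaded transaction number and refuses anything that would leave a hole. This is a simplified model, not the real binlog-collector. It assumes every GTID shares one source UUID so that only the numeric part matters, and the class and method names are hypothetical.

```python
class BinlogCollector:
    """Single writer that tracks the last uploaded transaction number."""

    def __init__(self, last_uploaded=0):
        self.last_uploaded = last_uploaded
        self.uploaded = []  # (first, last) ranges actually written to storage

    def offer(self, first, last):
        """Offer a binlog chunk covering transactions first..last (inclusive)."""
        if last <= self.last_uploaded:
            return "skip"  # already stored, e.g. after switching source node
        if first > self.last_uploaded + 1:
            return "gap"   # transactions are missing, try another source
        # Upload only the part that is new.
        self.uploaded.append((self.last_uploaded + 1, last))
        self.last_uploaded = last
        return "uploaded"


collector = BinlogCollector(last_uploaded=3214)
print(collector.offer(3215, 3217))  # uploaded
print(collector.offer(3216, 3217))  # skip
print(collector.offer(3220, 3225))  # gap
print(collector.last_uploaded)      # 3217
```

Because the position lives in the collector rather than in any MySQL node, the source can change after a node failure and the stream resumes from the same point.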