- (Maybe/Probably) Suspend boost reporting job
- Put site in maintenance mode(?)
- Manual run of boost etl job
- Switch Production k8s endpoints to `boost02`
  - external-reporting-compute-database
  - external-reporting-database
  - PR in `infrastructure` repo https://github.com/watermelonexpress/infrastructure/pull/673
  - Deploy ☝️

  Note: switching endpoints before promoting the standby database seems like the best way to prevent split-brain or data loss in the HA cluster, but it may cause some momentary ugly Airbrake errors. The goal is to switch endpoints, promote the standby, and stop the old master as close to simultaneously as we can manage.

  Q: Should this be a PR into the `infrastructure:production` branch, or can we deploy a separate branch into Production (we do intend to switch back, after all)?
- Roll pgbouncer pods in k8s
- Switch over `boost01:5432` primary instance to `boost02:5432` by running `repmgr standby switchover` on `boost02`
- Confirm logical replication subscription in session db (refresh or rebuild as needed)

  The logical subscription in `benchprep_reporting_api_production` is pointed at db02 and in theory will pick up where it left off when we promote the standby, but my confidence in that is limited.
- Order new 1.9TB SED SSD for `boost01`
- Wait for IBM to install the disk
- Stop postgres on `boost01:6432`
- Copy `/var/lib/pgsql/10/data/*.conf` to `/tmp/5432/`
- Copy `/mount/pgsql/10/wmx_rails_api/*conf` to `/tmp/6432/`
- Create new replica from base backup of `db02` on `/mount/pgsql/10/wmx_rails_api`
- Copy `*.conf` files from `/tmp/6432/` to new data directory
- Start postgres `boost01:6432` as streaming replica
- Create replica from base backup of `boost02:5432` on `/var/lib/pgsql/10/data`
- Copy `*.conf` files from `/tmp/5432/` to `/var/lib/pgsql/10/data`
- Start postgres `boost01:5432` in standby mode
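
The `boost01` rebuild steps above (save the `*.conf` files, re-seed the data directory from a base backup, restore the configs) could be scripted roughly as follows. This is only a sketch under assumptions: the `replicator` role and the `run`, `copy_conf` and `rebuild_replica` helpers are hypothetical names, and stopping postgres and clearing the old data directory are left to the operator. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set explicitly.
set -eu

DRY_RUN="${DRY_RUN:-1}"
REPL_USER="${REPL_USER:-replicator}"   # hypothetical replication role

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Copy the *.conf files from one directory to another.
copy_conf() {  # usage: copy_conf <src_dir> <dst_dir>
  run mkdir -p "$2"
  run sh -c "cp -p '$1'/*.conf '$2'/"
}

# Re-seed a data directory from a running primary.
# pg_basebackup needs the target directory to be empty or absent.
# -X stream ships WAL alongside the copy; -R writes recovery.conf on PostgreSQL 10.
rebuild_replica() {  # usage: rebuild_replica <src_host> <src_port> <data_dir>
  run pg_basebackup -h "$1" -p "$2" -U "$REPL_USER" -D "$3" -X stream -P -R
}

# The 5432 instance, using the paths from the checklist.
copy_conf /var/lib/pgsql/10/data /tmp/5432
rebuild_replica boost02 5432 /var/lib/pgsql/10/data
copy_conf /tmp/5432 /var/lib/pgsql/10/data
```

One thing to watch: if the saved `*.conf` set contains a `recovery.conf`, copying it back will overwrite the one `pg_basebackup -R` just generated, so it is worth checking which of the two should win before starting the instance.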
- Put site in maintenance mode(?)
- Switch Production k8s endpoints to `boost01`
  - external-reporting-compute-database
  - external-reporting-database
- Roll pgbouncer pods
- Stop postgres on `boost02:5432`
- Promote `boost01:5432` from standby to master
- Take site out of maintenance mode
- Confirm FDW config and connection from `boost01:5432/production_boost_reporting` to `wmx_rails_api_production` on the 6432 replica
- Refresh or rebuild logical replication subscription in session db
- Re-enable boost etl cron job
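
The confirmation steps after switching back (promotion, FDW config, logical subscription) might be checked with a few catalog queries, sketched below. The `check` helper and the `SESSION_DB_HOST` / `SESSION_DB_PORT` values are hypothetical placeholders, since the checklist does not say where the session db lives. By default the script only prints the `psql` invocations.

```shell
#!/bin/sh
# Sketch only: prints the psql invocations unless DRY_RUN=0 is set explicitly.
set -eu

DRY_RUN="${DRY_RUN:-1}"
SESSION_DB_HOST="${SESSION_DB_HOST:-session-db.example}"  # hypothetical host
SESSION_DB_PORT="${SESSION_DB_PORT:-5432}"                # hypothetical port

# Run (or print) a single query. -X skips psqlrc, -At gives bare unaligned rows.
check() {  # usage: check <host> <port> <database> <sql>
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ psql -h %s -p %s -d %s -X -At -c "%s"\n' "$1" "$2" "$3" "$4"
  else
    psql -h "$1" -p "$2" -d "$3" -X -At -c "$4"
  fi
}

# 1. boost01:5432 should return 'f' once it has been promoted out of recovery.
check boost01 5432 production_boost_reporting \
  'SELECT pg_is_in_recovery();'

# 2. List the foreign servers and their options to eyeball the FDW target.
check boost01 5432 production_boost_reporting \
  'SELECT srvname, srvoptions FROM pg_foreign_server;'

# 3. On the subscriber, see whether the logical subscription is receiving WAL.
check "$SESSION_DB_HOST" "$SESSION_DB_PORT" benchprep_reporting_api_production \
  'SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;'
```

If the subscription turns out to still reference the old host, `ALTER SUBSCRIPTION ... CONNECTION` can repoint it. `ALTER SUBSCRIPTION ... REFRESH PUBLICATION` only picks up tables newly added to the publication, so it may not be enough on its own.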