This guide records the playbook we followed on 2025-11-01 when Ceph started warning that each OSD hosted too many placement groups (PGs). It assumes you have kubectl access to the rook-ceph namespace and that the toolbox pod (deploy/rook-ceph-tools) is available.
```
KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
```

Look for `HEALTH_WARN too many PGs per OSD (XXX > max 250)`. Capture the total PG count and verify that all OSDs are up so you know the warning is PG-related rather than an OSD outage.
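The 250 ceiling in that message is the `mon_max_pg_per_osd` setting. To see how PGs are actually spread across OSDs, `ceph osd df` prints a per-OSD `PGS` column; a quick sketch:

```
# Per-OSD utilization; the PGS column is each OSD's placement group count.
KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df
```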
If you want the detailed message without the status summary:
```
KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
```

Run `ceph df detail` to see which pools exist and how many PGs they consume:
```
KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph df detail
```

Sort the output by pool name. In our incident we found three legacy pools: `default.rgw.log`, `default.rgw.control`, and `default.rgw.meta`. They belonged to an unused RGW realm, yet each still reserved 8 PGs, pushing us over the per-OSD threshold.
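If the `ceph df detail` table is noisy, a narrower view that shows each pool's `pg_num` directly is `ceph osd pool ls detail`:

```
# List every pool with its pg_num/pgp_num and other settings.
KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool ls detail
```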
- List realms, zonegroups, and zones:
```
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin realm list
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin zonegroup list
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin zone list
```
- If you find an old realm (ours was named `default`) with zero buckets or clients, mark it for deletion (a verification sketch follows this list). Double-check that the active realm (e.g., `home`) is still healthy before proceeding.
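Before marking anything for deletion, it's worth proving the realm really is empty. A minimal sketch, assuming the suspect zone is also named `default` and that `radosgw-admin`'s standard `--rgw-realm`/`--rgw-zone` scoping flags select it (adjust the names to your cluster); both commands should return empty lists:

```
# List buckets and users in the suspect realm/zone; expect empty output.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin bucket list --rgw-realm=default --rgw-zone=default
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin user list --rgw-realm=default --rgw-zone=default
```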
Note: These deletions are irreversible. Only run them after confirming no workloads depend on the target realm.
- Delete the unused zone and zonegroup:
```
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin zone delete --rgw-zone=<zone> --rgw-zonegroup=<zonegroup> --rgw-realm=<realm>
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin zonegroup delete --rgw-zonegroup=<zonegroup> --rgw-realm=<realm>
```
- Update the RGW period so the cluster accepts the change:
```
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin period update --rgw-realm=<active-realm> --commit
```
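To verify the commit took effect, `radosgw-admin period get` prints the active period; the deleted zone and zonegroup should no longer appear in its map:

```
# Print the active period; the retired zone/zonegroup should be gone.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  radosgw-admin period get --rgw-realm=<active-realm>
```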
- Remove the empty pools that belonged to the retired realm. Pass each pool name twice plus `--yes-i-really-really-mean-it` to get past the safety check:
```
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
```
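If the monitors refuse the deletion, pool removal is gated behind the `mon_allow_pool_delete` option; a sketch of toggling it just for this operation (Rook may manage this setting for you, so check your CephCluster spec first):

```
# Temporarily allow pool deletion, then turn it back off.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph config set mon mon_allow_pool_delete true
# ... run the pool rm commands above ...
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph config set mon mon_allow_pool_delete false
```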
- Re-run `ceph status` and confirm the health returns to `HEALTH_OK`.
- Check `ceph df detail` again; the PG count should drop (ours fell from 265 to 169 total, ~169 per OSD).
- Watch the Prometheus Ceph alerts (or Grafana dashboards) to ensure the warning stays cleared for at least one full scrape interval (a polling sketch follows this list).
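To avoid polling by hand while you wait, a small loop like the following (the 60s interval is arbitrary) blocks until the cluster reports `HEALTH_OK`:

```
# Poll cluster health every 60s until it reports HEALTH_OK.
until KUBECONFIG=~/home-ops/kubeconfig \
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health | grep -q HEALTH_OK; do
  sleep 60
done
echo "health is OK"
```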
- When decommissioning RGW realms/zonegroups, delete their pools immediately so they do not keep PGs allocated.
- Keep the PG count per OSD below 250 by running new pool sizes through the Ceph PG calculator (or the built-in autoscaler, sketched after this list) before adding or resizing pools.
- Automate a periodic check (`ceph status` + `ceph df detail`) and alert if the PG count rises unexpectedly, so we catch rogue pools sooner; a starting point is sketched below.
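On the PG-sizing point: instead of the manual calculator, recent Ceph releases ship a PG autoscaler that can recommend or apply `pg_num` per pool; a sketch, where `<pool>` is a placeholder:

```
# Let Ceph manage pg_num for a pool, then review its recommendations.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph osd pool set <pool> pg_autoscale_mode on
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph osd pool autoscale-status
```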
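And for the periodic check, a minimal cron-able sketch; it assumes `jq` is on the PATH and that `ceph status -f json` exposes the cluster-wide total as `.pgmap.num_pgs` (true on recent releases), with `MAX_PGS` as an illustrative threshold:

```
#!/usr/bin/env bash
# Exit non-zero (so cron/alerting notices) if the PG count crosses a threshold.
set -euo pipefail
MAX_PGS=200  # illustrative threshold; tune for your OSD count

num_pgs=$(KUBECONFIG=~/home-ops/kubeconfig \
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status -f json \
  | jq '.pgmap.num_pgs')

if [ "$num_pgs" -gt "$MAX_PGS" ]; then
  echo "PG count $num_pgs exceeds $MAX_PGS" >&2
  exit 1
fi
```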