Timeline:
- Oct 3: Learning about this issue
- Tweet: https://x.com/andatki/status/1841915460062769187
- Reviewing commit: https://github.com/postgres/postgres/commit/62ddf7ee9a399e0b9624412fc482ed7365e38958
- Summarizing thoughts below
Goal: Understand WHY to ANALYZE ONLY on a root partition table.
The current version (17) documentation seems to be wrong. It says only the root table is analyzed, but in the commit message David Rowley describes the partitions of the root table are also analyzed when ANALYZE runs on the root.
If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like: ANALYZE measurement;
After the commit, we see the docs now describe using ONLY to get the behavior that was (incorrectly) documented before.
Manual VACUUM and ANALYZE commands will automatically process all inheritance child tables. If this is undesirable, you can use the ONLY keyword. A command like: ANALYZE ONLY measurement;
My question is: why ANALYZE ONLY on the root partition table at all? The interesting statistics like samples and counts etc. are on the partitions of the root.
For example, even doing a count of rows mixed with non-partitioned and partitioned tables, the partitioned tables require special counting logic.
Need to do some experiments to understand what stats we need on the root table, and why to ANALYZE it all.
With that said, since Autovacuum is handling running ANALYZE for us, evaluating thresholds and triggering a VACUUM (ANALYZE) part_table for each partition as needed, for whatever reason there is to ANALYZE only the root, being able to isolate to the root by adding ONLY makes sense, especially since that's what seems to have been previously documented although inaccurately.
Todo:
- Do some experiments, when TBD :)
Observations:
only_parted1(the partition) hasnull_fracpopulated, and this same info is available on the rootonly_partedn_distinctand othersThanks for the share of this below post Michael Christofides:
Daniel Westermann points out https://www.dbi-services.com/blog/postgresql-partitioned-tables-and-optimizer-statistics/:
ANALYZE ONLY, since ANALYZE propagates "If your partitions are huge, this can take quite some time and there is not much you can do about it. "ANALYZE ONLY"avoids redundant work as statistics on the partitions will be collected automatically anyway."My overall summary:
ANALYZE ONLY/ Postgres 18, stats were not possible to be collected on ONLY the partition root. But partitions have stats based on normal Autovacuum running/normal criteria, where partitioned tables aren't different from non-partitioned tables. WhenVACUUMruns it includes theANALYZEargument. Partitions stats don't seem to be propagated up automatically.My next question then is why not always propagate stats "up" from partitions to the partition root? I don't know. My guess is it could possibly be load intensive, and possibly redundant, running multiple instances if multiple partitions are being analyzed at the same time. May be easier to leave up to the user.
How big is the scope of the problem where partition root tables don't have stats by default? It seems like it must be lower priority issue, since it seems like Postgres has never automatically collected stats on the partition root by default, yet features like partition pruning work to my knowledge without explicitly running a manual
ANALYZEon the partition root table.