- Local env setup
- Bonus: CloudSQL methods
- Configuration: most useful configs for balanced performance
- Basics of query dynamics:
- How a Query Works
- JOINs
- CTEs
- Transactions
- Functions
- Window Functions
- Aggregations
- Data Generation
- Constraints
- Useful Queries For Daily Business
- Some of main 4-Cast queries
- Performance Hints: How to
explainthe Queries- Indexes
- Statistics, Vacuum, Analyze, Re-Index
- Maintenance:
- Backup & Restore
- Storage: All About Sizes (see: link)
- Upgrades
- Migrations
- Bonus:
- Transactions
- Isolation Levels
psqlTips- Columnar
➜ docker run --name demo_server -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=demo -e POSTGRES_USER=admin -p 54455:5432 postgis/postgis:14-3.2-alpine -d
# overwrite `application_name` to enable transparant monitoring
➜ docker exec -it demo_server psql -U admin -d 'dbname=demo application_name=ilhan_cli_app'
# OR
➜ docker -e PGAPPNAME=ilhan_cli_app exec -it demo_server psql -U admin -d demoGet cloud proxy binaries from link
# Expose database to local access
PRROJECT_ID=... \
REGION=... \
INSTANCE_NAME=... \
./cloud_sql_proxy -instances=${PRROJECT_ID:-myproject}:{REGION:-myregioon}:{INSTANCE_NAME:-myinstance}=tcp:54552
- Use
clonefunctionality to have an exact copy of current database - Run experiments
orheavy qureies (over yearly reporting queries) against clone db - Delete clones as soon as you finished your experiment
Use SHOW all; to get all available configurations.
Important values:
- effective_cache_size
- max_connextions
- maintenance_work_mem
- statement_timeout
- log_min_duration_statement
- max_locks_per_transaction
Recommendations
- Use
~/.psqlrcfor personalized values - Use
LOCALsetting changes for transaction based changes SET ...is only effectivfe for current session- Database Connections
- Keep
max_connextionslow - Use
pg_stat_*for overviews. PS: Google CloudSQL's query-insights also usespg_stat_activity
- Keep
- Query Execution Order: Top to down execution order => FROM, JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> DISTINCT [ON] -> LIMIT
- When more than one table reference is listed in the
FROMclause, the tables are cross-joined (the Cartesian product) - Table Functions can be handy for query re-use
JOIN clauses nest left-to-right by default. Parentheses can be used to control the JOIN order
CROSS JOIN: For every possible combination of rows from T1 and T2 a row consisting of all columns in T1 followed by all columns in T2. T1(N rows), T2(M rows) => N * M rows.FROM T1 CROSS JOIN T2is equivalent toFROM T1 INNER JOIN T2 ON TRUE. It is also equivalent toFROM T1, T2- The words INNER and OUTER are optional in all forms.
- INNER is the default.
- LEFT, RIGHT and FULL imply an
OUTER JOIN. JOIN T1, T2 USING (a, b)is equevalant to.. ON T1.a = T2.a AND T1.b = T2.b. This is handy when joined column names are same.NATURALis a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables.USINGis safe for schema changes,NATURALraises conflicting risks
q1 UNION [ALL] q2,q1 INTERSECT [ALL] q2andq1 EXCEPT [ALL] q2can be used to combine result of two queries- Ordering (like
ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]) happens at the end of data fetching and is one of the most expensive operations - VALUES (
VALUES (1, 'one'), (2, 'two'), (3, 'three')) can be used to generate intermediate tables to ease queries RETURNING *give control over output of any query- Use range data types when applicable (see link)
demo=# create table api(jdoc jsonb); CREATE TABLE demo=# CREATE INDEX idxgin ON api USING GIN (jdoc); CREATE INDEX demo=#
CTEs
CTEs can split complex queries into multiple small portions
- CTEs can be used
RECURSIVEover row's own value. TIP: It is iterable, not recursion.
- B-Tree: Use for operations like
<OR<=OR=OR>=OR>together withBETWEEN,IN,IS NULLorIS NOT NULL - Hash: Use for
=operator. ie: UUID - GiST: A combination of index algorithms. Use for ...
- GIN: Inverted index. Use for text-search, ranking ...
- BRIN: Best for repeating data sets (dates, integers, etc)
-
SQL Dump & Restore
- Exporting:
man pg_dump(also see pg_dumpall) - Importing:
man pg_restore(orpsql {DB_NAME} < dump.file). Don't forget toANALYZEafter imports!
- Exporting:
-
File system level backup
-
Continuous archiving => use cloud :)
psql [-h .. -d .. -U .. -P ..] OR psql 'dbname=.. user=.. application_name=..' OR APPNAME=.. PGPASSWORD=.. PGHOST=.. psql
\?-> List of all available apis (likehelp)\d[+]-> show table definition\di-> list indexes\di <schema>.<tablename>for table specific\dp-> list access priviliges in tables, views, functions etc\g-> equavalant to;\sf-> show function definition script\sv-> show view definition script\g-> equavalant to;\q-> quit\i <filename>.sql-> execute sql file from the session\o <filename>-> send all query results to specified file\copy ...-> perform sql copy for data ingestion/extraction
Relational databases write data to disk in a row-based aproach
Row Based
Saved as tuples of row into disk. Needs a scan on disk in complex queries.
select value from data where name like 'fra%'; will first fetch all the table data to memory. This is due to storage structure undertheneeth.
| Name | City | ... | Some Value |
|---|---|---|---|
| Frank | Berlin | ... | 123 |
| ... | ... | ... | ... |
| Franz | Potsdam | ... | 321 |
Column Based
Each column saved to disk as a block. Thus, compression is highly possible.
Also, select value from data where name like 'fra%'; will fetch only data of value column.
| Name | City | ... | Some Value |
|---|---|---|---|
| Franz | Potsdam | ... | 125 |
| Frank | col2_val | ... | 222 |
| Henrik | col2_val | ... | 121 |
| ... | ... | ... | ... |
| Zeynep | Bitlis | XYZ | 111 |
Example
- ClickHous: Fully opensource analytics data capable of petabyte scale queries in seconds.
- Citus Data: Fully open source Postgresql extension with distributed capacity.
- Delegate heavy-lifting to database server
- Minimize network latency, computing time & costs
- Reduces compelxity
- Rethink your data design! Use jsonb, generated-columns and other features
- Use columnar data storage for long-term queries (like over-yearly calculations)
HELPER QUERIES
Basic JOIN Examples
Table Functions
Columnar DB Example (Citus Data)
Citus is an fully opensource Postgresql extension maintained by Microsoft for distributed PG.
docker run --name columnar_db -e POSTGRES_PASSWORD=skecret -e POSTGRES_DB=demo -e POSTGRES_USER=admin -p 1111:5432 citusdata/citus:11.0 docker exec -it columnar_db psql -U admin -d demoWHEREColumnar ShinesEven though everytinh in both tables is almost identical (only value contens are different, sizes & types are same), normal table answers in 1321 miliseconds but columnar only takes 20!
A factor of 66 times faster.
B-tree Indexes
Below is a demonstration of aa query with and without index.
Look carefully at the query plan of second (inclusive) index!
BRIN Index