Skip to content

Instantly share code, notes, and snippets.

@kidd
Last active January 18, 2025 13:00
Show Gist options
  • Select an option

  • Save kidd/97cfcd7fb42fc2b03c8c812559bb01d2 to your computer and use it in GitHub Desktop.

Select an option

Save kidd/97cfcd7fb42fc2b03c8c812559bb01d2 to your computer and use it in GitHub Desktop.
arg_max example in duckdb
D create table test as SELECT i as i ,random() as r, i%4 as gr from generate_series(1,100) t(i);
D select * from test;
┌───────┬──────────────────────┬───────┐
│ i │ r │ gr │
│ int64 │ double │ int64 │
├───────┼──────────────────────┼───────┤
10.149840296979490751
20.202743451391054182
30.074784896121077443
40.7822599512949260
50.22847936168044791
60.0416952469483239662
70.75733479269718163
80.739629612942140
90.77792079881250881
100.93890723584660032
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
910.63642738471655813
920.186788567152523570
930.69673270189592911
940.92265254559988452
950.87448592224961293
960.8686024236652540
970.27604357136321341
980.033053687781340842
990.66633209531808563
1000.368920184990605360
├───────┴──────────────────────┴───────┤
100 rows (20 shown) 3 columns │
└──────────────────────────────────────┘
D SELECT gr,max(r) from test group by all; -- did you notice this group by all? super cool!
┌───────┬────────────────────┐
│ gr │ max(r) │
│ int64 │ double │
├───────┼────────────────────┤
00.9624206805514219
10.9922432776056797
20.9643812561324754
30.9202900638152589
└───────┴────────────────────┘
-- How could we get the id of the one that has the max r?
-- https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564.
-- not easy. But in some cases, arg_max can just do the job
D SELECT gr,max(r),arg_max(i,r) from test group by gr;
┌───────┬────────────────────┬───────────────┐
│ gr │ max(r) │ arg_max(i, r) │
│ int64 │ double │ int64 │
├───────┼────────────────────┼───────────────┤
00.962420680551421960
10.992243277605679761
20.964381256132475422
30.920290063815258923
└───────┴────────────────────┴───────────────┘
@matsonj
Copy link

matsonj commented Jan 16, 2025

@ANelson82 the advantage here is that you can sort the partitions only, not the entire list. QUALIFY forces you sort the entire list, arg_max does not.

@ANelson82
Copy link

That makes sense. Appreciate the explination @matsonj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment