t:.`9.csv
\t select sum v1 by id1 from t
\t select sum v1 by id1,id2 from t
\t select sum v1,avg v3 by id3 from t
\t select avg v1,avg v2,avg v3 by id1 from t
\t select sum v1,sum v2,sum v3 by id3 from t
\t select med v3,dev v3 by id1,id2 from t
\t select min v1,max v1 by id3 from t;
\t select 2 max v3 by id3 from t
\t select v1 dev v2 by id1,id2 from t
\\

https://h2oai.github.io/db-benchmark

data:    50GB csv: 1e9 rows[id1 id2 id3 id4 id5 id6 v1 v2 v3]
query:   9 multi-column aggregations[sum avg var dev correlation median 2max]
machine: amd epyc 9374f
code:    k p/polars r/datatable [and much slower:clickhouse spark pandas arrow duckdb ..]

    query   csvload (milliseconds)
k     950     1,600
?  97,000   606,000
p 258,000   265,000
r 257,000 1,250,000

detail for the 9 queries
k    42    76    25   117    16   293    33    15   330
?   616  1509  6499   693  6260 20655  5817 51161  4231
p  1366  2401 42054   943 47177  5093 90847 29360 38500
r  3364  4494  7307 10008  7466 49770 63584 76673 31024

notes:
similar results for the 1e8 and 1e7
? is a 20th century version of k(32/64)

please contact fintan if you are a possible customer and would like to duplicate these timings
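
a minimal sketch in python (not part of the benchmark itself) for comparing the query totals with the per-query detail: it sums the nine timings for each engine and expresses each sum as a multiple of the k sum. the names totals and slowdown_vs are illustrative assumptions; the numbers are copied from the detail table. the sums need not match the rounded totals in the summary exactly.

```python
# per-query times in milliseconds, copied from "detail for the 9 queries"
detail = {
    "k": [42, 76, 25, 117, 16, 293, 33, 15, 330],
    "?": [616, 1509, 6499, 693, 6260, 20655, 5817, 51161, 4231],
    "p": [1366, 2401, 42054, 943, 47177, 5093, 90847, 29360, 38500],
    "r": [3364, 4494, 7307, 10008, 7466, 49770, 63584, 76673, 31024],
}


def totals(rows):
    """Sum the nine per-query times for each engine (hypothetical helper)."""
    return {name: sum(ms) for name, ms in rows.items()}


def slowdown_vs(rows, base="k"):
    """Each engine's summed time divided by the summed time of `base`."""
    t = totals(rows)
    return {name: t[name] / t[base] for name in t}


if __name__ == "__main__":
    ratio = slowdown_vs(detail)
    for name, total in totals(detail).items():
        # one line per engine: name, summed milliseconds, multiple of k
        print(f"{name} {total:>9,} ms {ratio[name]:8.1f}x")
```

the ratio column is only arithmetic on the table above; it says nothing about csv load time, which is reported separately.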