shakti
about
universal database and analytics
why: 100 times faster than spark, redshift, bigquery and snowflake.
who: arthur whitney and janet lustgarten, fintan quill, abby gruen, anton nikishaev
 e.l. whitney [1920-1966] multiple putnam winner
 k.e. iverson [1920-2004] turing award winner
mission: excellence

benchmark
about
k-sql is consistently 100 times faster (or more) than redshift, bigquery, snowflake, spark, mongodb, postgres, ..
same data. same queries. same hardware. anyone can run the scripts.
benchmarks:
 taq  1.1 trillion trades and quotes
 taxi 1.1 billion nyc taxi rides
 stac ..

Taq 1.1T https://www.nyse.com/publicdocs/nyse/data/Daily_TAQ_Client_Spec_v2.2a.pdf
q1: select max price by sym,ex from trade where sym in S
q2: select sum size by sym,time.hour from trade where sym in S
q3: do(100) select last bid by sym from quote where sym in S  / point select
q4: select from trade[s],quote[s] where price<bid             / asof join
S is top 100 (10%)

time(ms) 16core 100 days
          q1     q2     q3   q4
k         44     72     63   20
spark     80000  70000  DNF  DNF  - can't do it
postgres  20000  80000  DNF  DNF  - can't do it
..
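The four taq queries above map onto ordinary dataframe operations. A minimal pandas sketch on a toy trade/quote pair (the tables, symbols and values here are illustrative stand-ins, not the benchmark data; q3 is shown once rather than repeated 100 times):

```python
import pandas as pd

# toy stand-ins for the trade and quote tables (illustrative only)
trade = pd.DataFrame({
    "sym":   ["A", "A", "B", "B"],
    "ex":    ["P", "P", "Q", "Q"],
    "time":  pd.to_datetime(["09:30", "10:30", "09:45", "10:45"]),
    "price": [10.0, 12.0, 20.0, 18.0],
    "size":  [100, 200, 300, 400],
})
quote = pd.DataFrame({
    "sym":  ["A", "A", "B", "B"],
    "time": pd.to_datetime(["09:29", "10:29", "09:44", "10:44"]),
    "bid":  [11.0, 9.0, 21.0, 17.0],
})

S = ["A", "B"]  # the symbol subset

# q1: select max price by sym,ex from trade where sym in S
q1 = trade[trade.sym.isin(S)].groupby(["sym", "ex"]).price.max()

# q2: select sum size by sym,time.hour from trade where sym in S
t2 = trade[trade.sym.isin(S)]
q2 = t2.groupby(["sym", t2.time.dt.hour])["size"].sum()

# q3: select last bid by sym from quote where sym in S (point select)
q3 = quote[quote.sym.isin(S)].groupby("sym").bid.last()

# q4: asof join -- each trade picks up the most recent prior quote per sym,
#     then keep rows where price<bid
j = pd.merge_asof(trade.sort_values("time"), quote.sort_values("time"),
                  on="time", by="sym")
q4 = j[j.price < j.bid]
```

The asof join is the query the row stores in the table above could not finish: in k it is a primitive, while in ansi systems it becomes a correlated subquery or window-function workaround.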
Taxi 1.1B https://tech.marksblogg.com/benchmarks.html
q1: select count by type from trips
q2: select avg amount by pcount from trips
q3: select count by year,pcount from trips
q4: select count by year,pcount,_ distance from trips

          cpu   cost   core/ram  elapsed       machines
k         4     .0004  4/16      1             1*i3.2xlarge(8v/32/$.62+$.93)
redshift  864   .0900  108/1464  8(1 2 2 3)    6*ds2.8xlarge(36v/244/$6.80)
bigquery  1600  .3200  200/3200  8(2 2 1 3)
db/spark  1260  .0900  42/336    30(2 4 4 20)  21*m5.xlarge(4v/16/$.20+$.30)

Stac https://www.stacresearch.com/
..

taq.k
/ nyse taq: 1 trillion rows (50 million trades, 1 billion quotes per day)
e:{x%+/x:exp|6*(!x)%x}
N:1|_.5+50e6*x:e m:6000
t:{[[]t:09:30:00+x?06:30:00;e:x?"ABCD";p:10+x?90.;z:100+x?900]}
q:{[[]t:09:30:00+x?06:30:00;e:x?"ABCD";b:10+x?90.]}
A:100#S:m?`4
T:S!t'N
Q:S!q'N
d:16
\t:d select sum p by e from T A
\t:d select sum z by t.h from T A
\t:d*100 {select last b from x}'Q A
\
a:*A
\t select from T a,Q a where p<b

time(ms) 16core 100 days
          q1     q2     q3   q4
k         44     72     63   20
spark     80000  70000  DNF  DNF
postgres  20000  80000  DNF  DNF
..

/SAVE csv
\t `t.csv 2:t
\t `q.csv 2:q
/SAVE LOAD csv
\t t:`s=2:`t.csv
\t q:`s=2:`q.csv
\t `taq 2:(t;q)

taxi.k
/ apples to apples: k is 100 times faster than spark, redshift, snowflake, bigquery ..
/ same data. same hardware. same queries.
/ 1.1 billion taxi rides https://tech.marksblogg.com/benchmarks.html
select v,count(*)as n from t group by v
select p,avg(a)as a from t group by p
select extract(year from d)as year,p,count(*)as n from t group by year,p
select extract(year from d)as year,p,m,count(*)as n from t group by year,p,m

timings(aws i3.4xlarge)
 k    sparky  shifty  flaky
 .0   12      19      20+?
 .3   18      15      30+?
 .1   20      33      50+?
 .5   103     36      60+?
 ---------------------------
 .9   153     103     160+

https://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-4-versus-presto-214.html
https://tech.marksblogg.com/billion-nyc-taxi-rides-redshift-large-cluster.htm
https://blog.tropos.io/analyzing-2-billion-taxi-rides-in-snowflake-a1fbed8b7ba3
bottom line: sparky/shifty/flaky/bigqy are all the same. expensive and slow.
\
\l t
select count by v from t
select avg a by p from t
select count by d.year,p from t
select count by d.year,p,m from t
\
/data
curl -s https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2017-01.csv > 2017.01
..
import`csv
\t x:1:`2017.01
\t t:+`v`d`p`m`a!+csv["bd ii 2";x]
\t t:`d grp t
\t "t/"2:t
/synthetic
d:2017.01.01+!n:1461
g:{[[]v:x?2;p:x?9;m:x?100;a:.01*100+x?900]}
`t 2:t:d!g'n#375000

document
sql.d
shakti universal database includes:
 ansi-sql [1992..2011] ok for row/col select.
 real-sql [1974..2021] atw@ipsa does it better.

join: real-easy ansi-ok
 real: select from T,U
 ansi: select from T left outer join U
group: real-easy ansi-annoy
 real: select A by B from T
 ansi: select B, A from T group by B order by B
simple: real-easy ansi-easy
 real: select A from T where C or D, E
 ansi: select A from T where (C or D) and E
complex: real-easy ansi-awful
 asof/joins    select from t,q where price<bid
 first/last    select last bid from quote where sym=`A
 deltas/sums   select from t where 0<deltas price
 foreignkeys   select order.cust.nation.region ..
 arithmetic    x+y e.g. combine markets through time

example: TPC-H national market share query 8
http://www.qdpma.com/tpch/TPCH100_Query_plans.html
what market share does supplier.nation BRAZIL have by order.year
for order.customer.nation.region AMERICA and part.type STEEL?
real:
 select revenue avg supplier.nation=`BRAZIL by order.year from t
  where order.customer.nation.region=`AMERICA, part.type=`STEEL

ansi:
 select o_year,
  sum(case when nation = 'BRAZIL' then revenue else 0 end) / sum(revenue) as mkt_share
 from (
  select extract(year from o_orderdate) as o_year, revenue, n2.n_name as nation
  from t,part,supplier,orders,customer,nation n1,nation n2,region
  where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey
   and o_custkey = c_custkey and c_nationkey = n1.n_nationkey
   and n1.n_regionkey = r_regionkey and r_name = 'AMERICA'
   and s_nationkey = n2.n_nationkey
   and o_orderdate between date '1995-01-01' and date '1996-12-31'
   and p_type = 'STEEL') as all_nations
 group by o_year order by o_year;

comparison: real vs ansi (sqlserver/oracle/db2/sap/teradata/..)
          real                ansi
install   1 second            100,000 second
hardware  1 milliwatt         100,000 milliwatt
software  160 kilobyte        8,000,000 kilobyte (+ 10,000,000 kilobyte O/S)
mediandb  1,000,000 megarow   10 megarow
https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver15

shakti is essential for analyzing big (trillion row+) and/or complex data.

download
express license
 li2.0 04.28  you are accepting terms of eula by downloading li2.0
 mi2.0 04.28  you are accepting terms of eula by downloading mi2.0
enterprise
 Li2.0 04.28  you are accepting terms of eula by downloading Li2.0
 Mi2.0 04.28  you are accepting terms of eula by downloading Mi2.0

ffi
about
import`csv
csv["|tcnif";1:":ffi/t.csv"]
import`json
json"[3.14]"

csv.c
#include"k.h"
#include<string.h>
#define W while
#define Z(a,b...) if(a){b;}else
#define N_(n,a...) {I _n=(n);W(_n--){a;}}
#define K(a...) ({K f[]={a};I n=sizeof f;K x=k(' ',n/8);memcpy(xK,f,n);k(x);})
#define min(a,b) ({typeof(b)_a=(a),_b=(b);_b<_a?_b:_a;})
#define Xk xK[i]
#define Yk yK[i]
#define yt _t(y)
#define rI ((I*)r)
#define xG ((C*)x)
#define xQ ((M*)x)
#define ty (y>>56)
typedef char C;typedef unsigned M;
M _Z3fudPcS_Pd(S,S,F*);
ZI Dm[]={0,31,61,92,122,153,184,214,245,275,306,337};
ZI dm(I m){R m+=10,1461*(m/12)/4+Dm[m%12]-306;}
ZI uL(I n,S s){R n;}
ZK us(I n,S s){R(K)memcpy((S)k('C',n),s,n);}
ZI d2(S s){R*s*10+s[1]-528;}
ZI ub(I n,S s){R*s&1;}
ZI uc(I n,S s){R*(I*)s&~0ull>>64-8*min(8l,n);}
ZI d4(S s){R 100*d2(s)+d2(2+s);}
ZI ud(I,S),ut(I,S);
ZI ui(I n,S s){I r=0;N(n,r=s[i]-48+10*r);R r;}
ZK kC(S s){I n=strlen(s);R(K)memcpy((S)k('C',n),s,n);}
ZF uF(I m,I n,S s){P('-'==*s,-uF(m,n-1,s+1))I p=-1;W(++p<n&&'.'!=s[p]);n=min(n,1+m+p);p=n-1-p;I x=0;N(n,x+=('.'!=s[i])*(9*x+s[i]-48));F r=x;N(p,r/=10);R r;}
ZF uf(I n,S s){F x;_Z3fudPcS_Pd(s,n+s,&x);R x;}
ZI udt(I n,S s){int d=ud(10,s),t=ut(n-11,s+11);R 86.4e6*d+t;}
ZI ut(I n,S s){P(18<n,udt(n,s))P(!memchr(s,':',n),ui(n,s)/1000000000);int h=d2(s),m=d2(3+s),c=d2(6+s),p=ui(3,9+s);R p+1000*(c+60*(m+60*h));}
ZI ud(I n,S s){s[4]=s[7]='-';int y=d4(s),m=d2(5+s),d=d2(8+s);R dm(12*(y-2001)+m-1)+d-1;}
#define TY (nty>i&&' '!=ts[i])
#define IX(z) (0<z?ix[z-1]:-1)
#define out() ({k(0,(K)ix);R 5;})
#define NC(a...) {I ci=-1;N(cn,Z(TY,++ci;I t=ts[i];a))}
#define CE(u,a...) ({z+=d;I n=IX(1+z)-IX(z)-1;S s=p+(1+IX(z));I c='\r'==s[n-1];u(a n-c,s);})
#define c0(t,T,set) {K x=k(T,rn);c=i;Z(KT==xt,I z=c;xt+=8*(18<CE(uL)));N(rn,r=i;I z=r*cn+c;set);yK[ci]=x;break;}
#define c(t,T,xk,u) case t:c0(t,T,xk=CE(u))
K ck(K a,K b){P(!b,3);S ts,cl=0;C sp=',';
 {K x=a,y=b;P(tx||ty,8);Z(xt,P(1!=xt,8)ts=(S)x){P(t_(*xK),8);cl=(S)*xK,ts=(S)xK[1];}}
 I cn=_n((K)ts),nty=cn;
 Z(' '<*ts&&'A'>*ts||'z'<*ts,sp=*ts,--cn,--nty,++ts);
 P(cl&&cn!=_n(cl),7);
 int bn=_n(b);S p=(S)b;I qt=!!memchr(p,'"',bn);
 K fu=qt?kC("{&(x in\",\\n\")&=\\~\"\\\"\"=x}"):kC("{&x in\",\\n\"}");
 ((S)fu)[8-!qt]=sp;
 M*ix=(M*)K(fu,b=k(1,b));
 I n=_n((K)ix),d=0;
 Z(!cl,cn=1;I i=0;W('\n'!=p[i])cn+=sp==p[i++];n-=cn);
 I c=0,r=0,rn=n/cn,ocn=0;
 N(cn,ocn+=TY);
 K x=k(' ',ocn),y=k(' ',ocn);
 NC(Z(!cl,I z=i;xK[ci]=CE(us))xK[ci]=kC((C[]){cl[i],0}))
 Z(!cl,d=cn);
 NC(switch(t){c('n','S',Xk,uc)c('i','I',xI[i],ui)c('*',' ',Xk,us)c('c','C',xG[i],uc)c('d','D',xQ[i],ud)c('t','T',xI[i],ut)c('f','F',xF[i],uf)c('b','I',xI[i],ub)default:Z('9'>t&&'0'<t,c0(t,'F',xF[i]=CE(uF,t-48,)))out();})
 k(0,(K)ix);
 R k('A',x,y);}

json.cc
extern"C"{
#include"k.h"
}
#include"simdjson.h"
#define Xk xK[i]
#define Yk yK[i]
#define Na(b...) {I i=0;for(auto a:a){b;++i;}}
#define kss(s) kC((S)((std::string_view)s).data())
ZK kC(S s){I n=strlen(s);R(K)memcpy((S)k('C',n),s,n);}
typedef char C;
using namespace simdjson::dom;
K va(element o){S((I)o.type(),
 C('{',object a=o.get_object();I n=0;Na(++n);K x=k(' ',n),y=k(' ',n);Na(Xk=kss(a.key),Yk=va(a.value));D(x,y))
 C('[',array a=o.get_array();I n=a.size(),t=0;Na(I x=(I)a.type();t=!t||t==x?x:-1);S(t,C('d',K x=k('F',n);Na(xF[i]=a.get_double());x),K x=k(' ',n);Na(Xk=va(a));x);0)
 C('"',kss(o.get_string()))C('d',kf(o.get_double()))C('u',ki(o.get_uint64()))C('l',ki(o.get_int64()))C('t',ki(o.get_bool()))C('n',0),kC((S)"err"));}
K jk(K x){P(tx||1!=xt,8);parser p;try{element o=p.parse((S)x,xn);R va(o);}catch(...){R 2;}}

t.csv
t|e|s|v|p
071609644084992|P|AAPL|17|115
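The format string in the ffi example above (csv["|tcnif";...] against this t.csv) encodes an optional separator character followed by one type character per column; csv.c dispatches on 'n' name, 'i' int, 'c' char, 'd' date, 't' time, 'f' float, 'b' bool, '*' string, with space skipping a column. A minimal python sketch of that convention, covering only a subset of the types and keeping times as raw strings rather than converting them to integers as csv.c does:

```python
def parse_csv(fmt, text):
    # separator detection mirrors csv.c: the first format char is taken as the
    # separator if it sits below 'A' (and above space) or above 'z'
    sep = ","
    if " " < fmt[0] < "A" or fmt[0] > "z":
        sep, fmt = fmt[0], fmt[1:]
    conv = {
        "i": int,                             # int
        "f": float,                           # float
        "n": str,                             # name/symbol
        "c": lambda s: s[0],                  # single char
        "t": str,                             # time, kept raw (csv.c -> ns int)
        "b": lambda s: bool(ord(s[0]) & 1),   # csv.c ub: low bit of first char
    }
    lines = text.strip().split("\n")
    header = lines[0].split(sep)
    cols = {h: [] for h, ty in zip(header, fmt) if ty != " "}
    for line in lines[1:]:
        for h, ty, v in zip(header, fmt, line.split(sep)):
            if ty != " ":                     # space = skip this column
                cols[h].append(conv[ty](v))
    return cols

t = parse_csv("|tcnif", "t|e|s|v|p\n071609644084992|P|AAPL|17|115")
```

The same convention drives the taxi load above, where csv["bd ii 2";x] reads bool, date, skipped, two ints, skipped, and a two-decimal amount; the 'd', '*' and digit (fixed-decimal) types are omitted from this sketch.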