
about
k: fast fun universal database and language
for: hedgefunds banks manufacturers formula1 ..
who: arthur whitney+

thanks to
e.l. whitney[1920-1966] dad:multiple putnam winner(beat john nash every time)
k.e. iverson[1920-2004] advisor:APL turing award'79
john cocke  [1925-2002] advisor:RISC turing award'87

benchmarks: same machine. same data. same queries. (apples-to-apples)
h2o:  1Billion rows    k is about 100 times faster than polars datatable ..
taxi: 1Billion rows    k is about 100 times faster than biggie shifty sparky ..
taq:  1000Billion rows only k (incl. can do the asof joins in our lifetime.
stac: 2000Billion rows only k (incl. can do these queries.

time - user and machine - is expensive.
pandas and polars are free - god bless them - so
 1,000 rows: use excel
 1,000,000 rows: use polars
 1,000,000,000 rows: use k
 1,000,000,000,000 rows: only k

compare
real-sql(k) is consistently 100 times faster (or more) than
redshift, bigquery, snowflake, spark, mongodb, postgres, ..
same data. same queries. same hardware. anyone can run the scripts.

benchmarks:
 h2o  1Billion rows
 taxi 1Billion rows
 taq  1000Billion trades and quotes
 stac 2000Billion trades and quotes

Taq 1.1T
q1:select max price by sym,ex from trade where sym in S
q2:select sum size by sym,time.hour from trade where sym in S
q3:do(100)select last bid by sym from quote where sym in S / point select
q4:select from trade[s],quote[s] where price<bid / asof join
S is top 100 (10%)

time(ms) 16core 100 days
             q1     q2   q3   q4
k            44     72   63   20
spark     80000  70000  DNF  DNF - can't do it
postgres  20000  80000  DNF  DNF - can't do it
..
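q4 above is marked as an asof join: each trade is paired with the latest quote at or before the trade time, and the rows with price<bid are kept. A minimal python sketch of that idea for one symbol, assuming both lists are already sorted by time. The names asof_join, trades and quotes are illustrative only; nothing here describes how k does it.

```python
from bisect import bisect_right

def asof_join(trades, quotes):
    """Attach to each trade the most recent quote at or before the trade time.

    trades: list of (time, price), quotes: list of (time, bid).
    Both lists are assumed sorted by time (hypothetical layout, single symbol).
    """
    quote_times = [t for t, _ in quotes]
    joined = []
    for time, price in trades:
        # index of the last quote whose time is <= the trade time, or -1 if none
        k = bisect_right(quote_times, time) - 1
        bid = quotes[k][1] if k >= 0 else None
        joined.append((time, price, bid))
    return joined

trades = [(3, 10.0), (7, 10.5), (9, 9.8)]
quotes = [(1, 9.9), (5, 10.6), (8, 10.0)]

rows = asof_join(trades, quotes)
# keep the trades that printed below the prevailing bid, as in "where price<bid"
below_bid = [r for r in rows if r[2] is not None and r[1] < r[2]]
print(below_bid)
```

The binary search makes each lookup logarithmic in the number of quotes; a per-symbol version would first group both tables by sym.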
Taxi 1.1B
q1:select count by type from trips
q2:select avg amount by pcount from trips
q3:select count by year,pcount from trips
q4:select count by year,pcount,_ distance from trips

          cpu   cost   core/ram  elapsed       machines
k         4     .0004  4/16      1             1*i3.2xlarge(8v/32/$.62+$.93)
redshift  864   .0900  108/1464  8(1 2 2 3)    6*ds2.8xlarge(36v/244/$6.80)
bigquery  1600  .3200  200/3200  8(2 2 1 3)
db/spark  1260  .0900  42/336    30(2 4 4 20)  21*m5.xlarge(4v/16/$.20+$.30)

Stac
..

h2o.k
t:.`9.csv
\t select sum v1 by id1 from t
\t select sum v1 by id1,id2 from t
\t select sum v1,avg v3 by id3 from t
\t select avg v1,avg v2,avg v3 by id1 from t
\t select sum v1,sum v2,sum v3 by id3 from t
\t select med v3,dev v3 by id1,id2 from t
\t select min v1,max v1 by id3 from t;
\t select 2 max v3 by id3 from t
\t select v1 dev v2 by id1,id2 from t
\\

data: 50GB csv: 1e9 rows[id1 id2 id3 id4 id5 id6 v1 v2 v3]
query: 9 multi-column aggregations[sum avg var dev correlation median 2max]
machine: amd epyc 9374f
code: k p/polars r/datatable [and much slower:clickhouse spark pandas arrow duckdb ..]

     query    csvload (milliseconds)
k      950      1,600
*   97,000    606,000
p  258,000    265,000
r  257,000  1,250,000

detail for the 9 queries
k    42    76     25    117     16    293     33     15    330
*   616  1509   6499    693   6260  20655   5817  51161   4231
p  1366  2401  42054    943  47177   5093  90847  29360  38500
r  3364  4494   7307  10008   7466  49770  63584  76673  31024

similar results for the 1e7 and 1e8.
*(k4) is an old/1999 version of k.

d:2017.01.01+m*!n
g:{[[]v:x?2;p:x?9;m:x?100;a:x?2.3e]}
t:d!g'n#380000 m*n*.38e6
\t:m select count by v from t
\t:m select avg a by p from t
\t:m select count by d.year,p from t
\t:m select count by d.year,p,m from t
\
/data
curl -s > 2017.01 ..
import`csv
\t x:1:`2017.01
\t t:+`v`d`p`m`a!+csv["bd ii 2";x]
\t t:`d grp t
\t "t/"2:t

1.1billion taxi rides
apples to apples (same data. same hardware. same queries.)
k is 100 times faster than spark redshift snowflake bigquery ..
select v,count(*)as n from t group by v
select p,avg(a) as a from t group by p
select extract(year from d)as year,p, count(*)as n from t group by year,p
select extract(year from d)as year,p,m,count(*)as n from t group by year,p,m

timings(aws i3.4xlarge)
k   sparky  shifty  flaky
.0      12      19    20+
.3      18      15    30+
.1      20      33    50+
.5     103      36    60+
----------------------
.9     153     103   160+

bottomline: k good (sparky/shifty/flaky/googly bad)

doc
fast universal database and language. connect to everything. depend on nothing.

select min e,max e,avg e by n from`t.csv (billion row challenge)
`t 2'`t.csv [>>>k.k("2'","t",pandas.read_csv("t.csv"))]

select [count first last min max sum avg var dev med ..] by from where ..
while(..)if(..)else ..
in .. exp log sqrt sin cos ..
flip flop list sort asc desc unique group key val ..

Verb (monad)     Adverb          Noun
+ +              ' each          char " ab"
- -              / over right    name ``ab
* * sqr          \ scan left     int 2 3 4
% div sqrt                       float 2 3.4
& and flip       \l load
| or flop        \t time
< < asc          \v vars         z.d date 2001.01.01
> > desc         \w work         z.t time 12:34:56.789
= = group        \\ exit
~ ~ ~
. . value
! mod index
@ @ first        I/O             Class
? find unique    0' line         List (2;3.4)
# take count     1' char         Dict {a:2 3}
_ drop floor     2' data         Table [a:2 3]
^ cut order      3' set*         Expr :2+a
, , ,            4' get*         Func {[a]2+a}

rosetta Atom List atom list Ddd Mm
k ()[].;: +-*%&|<>=~$! @?#_^, ~-*%_$ @,#!!|^<>?= ..! @& f' F' while(a)if(b)c else d
python ()[].;= +-*/&|<> ^ % isssa ~- sfs lrwrsa u dvk @t [f(x)..] [F(x,y)..] while a:if b:c else:d
(index slices split append;sqrt floor str;len range where reverse sort argsort unique;dict key value;transpose)

Type int float complex boolean char name date time
Class List Dict Array Table Expression Function(* immutable)

ifgbcndt LDATEF* devs
k      1992 833...88 ++++++* 300spartans
sql    1992 32 . .22 - * 20million
python 1991 .... . -- 10million
numpy  2005 433 99 + * 5million
go     2009 422.. 2million
nodejs 1995 . . . -- 15million
excel  1982 . . . + * 80million
c      1972 42 5million
apl    1962 . .. + *

sql
shakti universal database includes:

ansi-sql [1992..2011] ok for row/col select.
real-sql [1974..2021] atw@ipsa does it better.

join: real-easy ansi-ok
 real: select from T,U
 ansi: select from T left outer join U

group: real-easy ansi-annoy
 real: select A by B from T
 ansi: select B, A from T group by B order by B

simple: real-easy ansi-easy
 real: select A from T where C or D, E
 ansi: select A from T where (C or D)and E

complex: real-easy ansi-awful
 asof/joins   select from t,q where price<bid
 first/last   select last bid from quote where sym=`A
 deltas/sums  select from t where 0<deltas price
 foreignkeys  select order.cust.nation.region ..
 arithmetic   x+y e.g. combine markets through time

example: TPC-H National Market Share Query 8
what market share does supplier.nation BRAZIL have by order.year for
 order.customer.nation.region AMERICA and part.type STEEL?

real: select revenue avg supplier.nation=`BRAZIL by order.year from t
 where order.customer.nation.region=`AMERICA, part.type=`STEEL

ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end) / sum(revenue) as mkt_share from (
 select extract(year from o_orderdate) as o_year, revenue, n2.n_name as nation
 from t,part,supplier,orders,customer,nation n1,nation n2,region
 where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey
 and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey
 and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey
 and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STEEL') as all_nations
 group by o_year order by o_year;

Comparison:
          real               ansi(sqlserver/oracle/db2/sap/teradata/..)
install   1 second           100,000 second
hardware  1 milliwatt        100,000 milliwatt
software  160 kilobyte       8,000,000 kilobyte (+ 10,000,000kilobyte O/S)
mediandb  1,000,000 megarow  10 megarow

shakti is essential for analyzing big (trillion row+) and/or complex data.

edu
given vars(ijklmn(int) RAX(V6*)..)
fns(v(permvarsi) V(vperm2varps)..)

nRX(f) even-odd
 arthur: void f(in,VR,VX){i(n/16,Iz=X_;Ri=V(_I,z,r(X_,R[i+n/16]=V(I_,z,r))))}
 claude: void f(in,VR,VX){i(n/16,R[i]=V(X[i*16],X[i*16+8],0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30);R[i+n/16]=V(X[i*16],X[i*16+8],1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31))}
nRX(f) recursive
RX(f) flip 16.16
mnRX(f) flip m.x
RX(f) f3(fft8)
RX(f) f4(fft16)
RX(f) f6(fft64)
lmnRX (l+m/2>)split
lmnRX recursive

k.h
#define I ((i6){0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15})
#define V6 __attribute((vector_size(1<<6),aligned(1)))//ABCDEFGHIJKLMNOPQRSTUVWXYZ
typedef unsigned long U;typedef char i0,g6 V6;typedef unsigned short i1;typedef unsigned i2,i6 V6;typedef float e2,e6 V6;
static i6 _I=2*I,I_=1|2*I,AB=I%2*16|I,AA=I/2*2,BB=1|I/2*2,BA=I+1-I%2*2;static g6 z0,I0={0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
#define x86(o) __builtin_ia32_##o
#define _Z(g,z,x...) static void g(x){z;}
#define _D(t,g,z,x...) static t g(x){return({z;});}
#define _U(g,z,x...) _D(U,g,z,x)
#define _V(g,z,x...) _D(g6,g,z,x)
#define _f(g,z) _U(g,z,Ux)
#define UV(g,z) _U(g,z,Vx)
#define Vf(g,z) _V(g,z,Vx)
#define VF(g,z) _V(g,z,Va,Vx)
#define VG(g,z) _V(g,z,ii,Va,Vx)
#define V3(g,z) _V(g,z,Va,Vb,Vx)
#define ii i2 i
#define ss i0*s
#define Ux U x
#define Va i6 a
#define Vb i6 b
#define Vx i6 x
#define o(o) x86(o##512)
_f(nu,__builtin_popcountl(x))_f(iu,x?__builtin_ctzl(x):64)_f(lu,--x?64-__builtin_clzl(x):0)VF(L0,o(pminub)(a,x))VF(M0,o(pmaxub)(a,x))VF(L2,o(pminud)(a,x))VF(M2,o(pmaxud)(a,x))
UV(b0,o(cvtb2mask)(x))VF(p0,o(permvarqi)(a,x))V3(P0,o(vpermi2varqi)(a,b,x))_V(g0,x86(gathersiv16si)(z0,s,x,-1,1),ss,Vx)
UV(b2,o(cvtd2mask)(x))VF(p2,o(permvarsi)(a,x))V3(P2,o(vpermi2varps)(a,b,x))_V(g2,x86(gathersiv16si)(z0,s,x,-1,4),ss,Vx)_U(c2,x86(compressstoresi512_mask)(s,x,i);nu(i),ss,i1 i,Vx)
Vf(ba,p2(x,BA))Vf(ab,P2(L2(x,ba(x)),AB,M2(x,ba(x))))VG(R0,2>i?P0(a,I0-(i0)(1<<i),x):P2(a,I-(1<<i-2),x))VF(S0,a+x)VF(S2,(i6)a+x)
#define Z0 static i0
#define Z2 static i2
#define r(b,z) ({typeof(b)r=b;z;r;})
#define h(b,z) {i2 $=b;i2 h=0;while(h<$){z;++h;}}
#define i(b,z) {i2 $=b;i2 i=0;while(i<$){z;++i;}}
#define j(b,z) {i2 $=b;i2 j=0;while(j<$){z;++j;}}
#define Iz i6 z
#define cc i0 c
#define ij i2 j
#define ik i2 k
#define il i2 l
#define im i2 m
#define in i2 n
#define sd i0*d
#define Ri R[i]
#define Rk R[k|i]
#define IA i6*A
#define VR g6*R
#define VA g6*A
#define VX g6*X
#define R_ *R++
#define A_ *A++
#define X_ *X++

man
$wget;unzip k;make

+ - * % &and |or < > = ~ . ! , @ ?inv ^cut #take _drop 'map/over\scan
abs - sqr sqrt flip rev asc desc freq not val key , first uniq sort count floor 'map/over\scan
select [count first last min max sum avg ..] by from
rand grid (also &|<>=)
while if else

k-torch
n[012..]
Zigmoid(x%1+E-x) Zoftmax(x%Sx:Ex-Mx) Rms(x%%A*x)
fwd(24M/18)[6 18 288 288] f::x+g@fyZey:Rx+:d(vi,:cy)(z*by)'ki,:(z:4^#ki)*ay:Rx
gen(37M/19)[ 32000 288] g::x?Mx:a@bf/ax
tcn(.6M/16)[13 50 192 64] a+0|b?d^0|c?d^x

mathematics from 60000BC
2(3)\3 5 /diatonic 3(@,?)1 3 5 divje babe -60000 flute
s:1' /quicksort s rand 16 uruk -4400 divide&impera
q:avg/1>+ /quadrant q grid 96 thebes -3200 log -1(`z?1)
a:+\| /fibonacci a 2 3 pingala ujjain -300
b:!\| /gcdivisor b 6 4 euclid alexandria -300
p::(+x*)/1 2 3 /polynomial p rand 16 al-khwarizmi baghdad 820
r::(-x*)/1 3 5 /taylorseries q rand 16 madhava kerala 1390 arc cos sin
e::x log x%+/x /entropy e freq kj napier louvain 1572 ratio zeno
f:(+,-!)0' /fft f rand 16 gauss brunswick 1805 divide&impera
w:(k log)w' /wordle w 2315 boltzmann graz 1872 divide&impera
m::2>+(x+*)/x /mandelbrot m grid 96 fatou&julia paris 1915

rosetta infix prefix postfix
k ()[]: +-*%!&|<>=~@?^#_, +-*%~_@,&|^<>?=#.! f'/\ 1992
apl ()[]: +-*%!&|<>=~ i TT, +- ~_ &| AA f'/\ 1964
python ()[]= +-*/%&|<>ZZ w[:]a a-ss~f ,trsa uclvk[len value key] [f(x) for x in x] 1991
numpy.[where append abs square sqrt floor transpose reverse sort argsort unique collections.counter]
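
Three of the one-liners above (fibonacci, gcdivisor, polynomial) read loosely in python, in the spirit of the rosetta rows. This is one plausible reading, not a translation: the argument order and the exact k semantics are assumptions, and the helper names fib_pairs, euclid and horner are hypothetical.

```python
from functools import reduce
from math import gcd

def fib_pairs(a, b, steps):
    """Scan (a, b) -> (b, a + b); returns every intermediate pair."""
    pairs = [(a, b)]
    for _ in range(steps):
        a, b = b, a + b
        pairs.append((a, b))
    return pairs

def euclid(a, b):
    """Greatest common divisor by repeated remainder (Euclid's algorithm)."""
    while b:
        a, b = b, a % b
    return a

def horner(coeffs, x):
    """Evaluate a polynomial by folding acc*x + c over the coefficients.

    Assumes the highest-degree coefficient comes first.
    """
    return reduce(lambda acc, c: acc * x + c, coeffs)

print(fib_pairs(2, 3, 3))    # seeded with 2 3 as in "a 2 3"
print(euclid(6, 4))          # arguments as in "b 6 4"
print(horner([1, 2, 3], 2))  # coefficients 1 2 3 evaluated at x = 2
```

Each helper is a fold or a scan over a two-argument step, which is the shape the k versions express with / and \.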