e.l. whitney[1920-1966] dad:multiple putnam winner(beat john nash every time)
k.e. iverson[1920-2004] advisor:APL turing award'79
john cocke [1925-2002] advisor:RISC turing award'87
benchmarks: same machine. same data. same queries. (apples-to-apples)
h2o: 1Billion rows k is about 100 times faster than polars datatable ..
taxi: 1Billion rows k is about 100 times faster than biggie shifty sparky ..
taq: 1000Billion rows only k (incl. kx.com) can do the asof joins in our lifetime.
stac: 2000Billion rows only k (incl. kx.com) can do these queries.
time - user and machine - is expensive. pandas and polars are free - god bless them - so
1,000 rows: use excel
1,000,000 rows: use polars
1,000,000,000 rows: use k
1,000,000,000,000 rows: only k
compare
real-sql(k) is consistently 100 times faster (or more) than
redshift, bigquery, snowflake, spark, mongodb, postgres, ..
same data. same queries. same hardware. anyone can run the scripts.
h2o 1Billion rows
taxi 1Billion rows
taq 1000Billion trades and quotes
stac 2000Billion trades and quotes
Taq 1.1T https://www.nyse.com/publicdocs/nyse/data/Daily_TAQ_Client_Spec_v2.2a.pdf
q1:select max price by sym,ex from trade where sym in S
q2:select sum size by sym,time.hour from trade where sym in S
q3:do(100)select last bid by sym from quote where sym in S / point select
q4:select from trade[s],quote[s] where price<bid / asof join
S is top 100 (10%)
time(ms) 16core 100 days
            q1     q2   q3   q4
k           44     72   63   20
spark    80000  70000  DNF  DNF - can't do it
postgres 20000  80000  DNF  DNF - can't do it
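q4 is an asof join: each trade is matched with the most recent quote for the same symbol, then filtered on price<bid. A minimal sketch of that idea in plain python, assuming rows are (sym, time, value) tuples and that "most recent" means the last quote at or before the trade time. The function name `asof_join` and the sample data are hypothetical.

```python
from bisect import bisect_right

def asof_join(trades, quotes):
    """Attach to each trade the latest bid at or before the trade time (same sym)."""
    by_sym = {}
    # index quotes per symbol, sorted by time (assumed layout: (sym, time, bid))
    for sym, t, bid in sorted(quotes, key=lambda q: (q[0], q[1])):
        times, bids = by_sym.setdefault(sym, ([], []))
        times.append(t)
        bids.append(bid)
    out = []
    for sym, t, price in trades:
        times, bids = by_sym.get(sym, ([], []))
        k = bisect_right(times, t) - 1  # index of last quote with time <= t
        out.append((sym, t, price, bids[k] if k >= 0 else None))
    return out

trades = [("A", 10, 99.0), ("A", 20, 101.0), ("B", 15, 50.0)]
quotes = [("A", 5, 100.0), ("A", 18, 100.5), ("B", 16, 49.0)]

joined = asof_join(trades, quotes)
# the "where price<bid" part of q4; trades with no prior quote are dropped
below_bid = [r for r in joined if r[3] is not None and r[2] < r[3]]
```

The binary search is per trade, so the cost is dominated by sorting the quotes once; engines without a native asof join usually fall back to a much more expensive inequality join.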
Taxi 1.1B https://tech.marksblogg.com/benchmarks.html
q1:select count by type from trips
q2:select avg amount by pcount from trips
q3:select count by year,pcount from trips
q4:select count by year,pcount,_ distance from trips
           cpu   cost  core/ram  elapsed      machines
k            4  .0004      4/16  1            1*i3.2xlarge(8v/32/$.62+$.93)
redshift   864  .0900  108/1464  8(1 2 2 3)   6*ds2.8xlarge(36v/244/$6.80)
bigquery  1600  .3200  200/3200  8(2 2 1 3)
db/spark  1260  .0900    42/336  30(2 4 4 20) 21*m5.xlarge(4v/16/$.20+$.30)
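The cpu and cost columns above appear to follow from the other columns: cpu = cores * elapsed seconds, and cost = elapsed seconds * machines * hourly price / 3600. This is an inference from the numbers, not something the source states, and reading the $ figures as hourly prices is an assumption. A small sketch that checks the arithmetic (helper names are hypothetical):

```python
# Hypothetical helpers; the formulas are inferred from the table, not stated by the source.
def cpu_seconds(cores, elapsed_s):
    return cores * elapsed_s

def dollars(elapsed_s, machines, price_per_hour):
    # assumes the $ figures in the machines column are per machine per hour
    return elapsed_s * machines * price_per_hour / 3600.0

k_cpu = cpu_seconds(4, 1)            # table says 4
redshift_cpu = cpu_seconds(108, 8)   # table says 864
bigquery_cpu = cpu_seconds(200, 8)   # table says 1600
spark_cpu = cpu_seconds(42, 30)      # table says 1260

k_cost = dollars(1, 1, 0.62 + 0.93)         # table says .0004
redshift_cost = dollars(8, 6, 6.80)         # table says .0900
spark_cost = dollars(30, 21, 0.20 + 0.30)   # table says .0900
```

The bigquery row lists no machines, so its cost cannot be rechecked this way.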
Stac https://www.stacresearch.com/
\t select sum v1 by id1 from t
\t select sum v1 by id1,id2 from t
\t select sum v1,avg v3 by id3 from t
\t select avg v1,avg v2,avg v3 by id1 from t
\t select sum v1,sum v2,sum v3 by id3 from t
\t select med v3,dev v3 by id1,id2 from t
\t select min v1,max v1 by id3 from t;
\t select 2 max v3 by id3 from t
\t select v1 dev v2 by id1,id2 from t
data: 50GB csv: 1e9 rows[id1 id2 id3 id4 id5 id6 v1 v2 v3]
query: 9 multi-column aggregations[sum avg var dev correlation median 2max]
machine: amd epyc 9374f
code: k p/polars r/datatable [and much slower:clickhouse spark pandas arrow duckdb ..]
     query    csvload (milliseconds)
k      950      1,600
*   97,000    606,000
p  258,000    265,000
r  257,000  1,250,000
detail for the 9 queries
k    42   76    25   117    16   293    33    15   330
*   616 1509  6499   693  6260 20655  5817 51161  4231
p  1366 2401 42054   943 47177  5093 90847 29360 38500
r  3364 4494  7307 10008  7466 49770 63584 76673 31024
similar results for the 1e7 and 1e8. *(k4) is an old/1999 version of k.
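For readers who do not know the query language, here is a rough sketch of what three of the nine aggregations compute, on a tiny made-up table. It assumes `2 max` means the two largest values per group and that `dev` is the population standard deviation; the source does not say which deviation is meant.

```python
from collections import defaultdict
from heapq import nlargest
from statistics import median, pstdev

# made-up sample rows; only the column names come from the benchmark
rows = [
    {"id1": "a", "id3": "x", "v1": 1, "v3": 1.0},
    {"id1": "a", "id3": "x", "v1": 2, "v3": 3.0},
    {"id1": "b", "id3": "x", "v1": 3, "v3": 2.0},
    {"id1": "b", "id3": "y", "v1": 4, "v3": 5.0},
]

def group(rows, key, col):
    """Collect the values of `col` per distinct value of `key`."""
    out = defaultdict(list)
    for r in rows:
        out[r[key]].append(r[col])
    return dict(out)

# select sum v1 by id1 from t
sum_v1_by_id1 = {k: sum(v) for k, v in group(rows, "id1", "v1").items()}
# select 2 max v3 by id3 from t  (assumed: top two values per group)
top2_v3_by_id3 = {k: nlargest(2, v) for k, v in group(rows, "id3", "v3").items()}
# select med v3,dev v3 by ...  (grouped by id1 only here, to keep the sample small)
med_dev_by_id1 = {k: (median(v), pstdev(v)) for k, v in group(rows, "id1", "v3").items()}
```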
\t:m select count by v from t
\t:m select avg a by p from t
\t:m select count by d.year,p from t
\t:m select count by d.year,p,m from t
curl -s https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2017-01.csv > 2017.01
\t x:1:`2017.01
\t t:+`v`d`p`m`a!+csv["bd ii 2";x]
\t t:`d grp t
\t "t/"2:t
1.1billion taxi rides https://tech.marksblogg.com/benchmarks.html
apples to apples (same data. same hardware. same queries.)
k is 100 times faster than spark redshift snowflake bigquery ..
select v,count(*)as n from t group by v
select p,avg(a) as a from t group by p
select extract(year from d)as year,p, count(*)as n from t group by year,p
select extract(year from d)as year,p,m,count(*)as n from t group by year,p,m
timings(aws i3.4xlarge)
        k  sparky  shifty  flaky
q1     .0      12      19    20+
q2     .3      18      15    30+
q3     .1      20      33    50+
q4     .5     103      36    60+
total  .9     153     103   160+
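The four taxi queries are plain group-by aggregations. A minimal sketch of the same logic on a few made-up rows, using the column names v, d, p, a from the queries above; the meaning of each column is not given in the source, so the sample values are arbitrary.

```python
from collections import Counter, defaultdict
from datetime import date

# made-up rows; only the column names come from the queries
t = [
    {"v": 1, "d": date(2016, 5, 1), "p": 1, "a": 10.0},
    {"v": 1, "d": date(2017, 1, 2), "p": 2, "a": 20.0},
    {"v": 2, "d": date(2017, 1, 3), "p": 1, "a": 30.0},
]

# select v,count(*) as n from t group by v
count_by_v = Counter(r["v"] for r in t)

# select p,avg(a) as a from t group by p
acc = defaultdict(lambda: [0.0, 0])
for r in t:
    acc[r["p"]][0] += r["a"]
    acc[r["p"]][1] += 1
avg_a_by_p = {p: total / n for p, (total, n) in acc.items()}

# select extract(year from d) as year,p,count(*) as n from t group by year,p
count_by_year_p = Counter((r["d"].year, r["p"]) for r in t)
```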
fast universal database and language.
connect to everything. depend on nothing.
select min e,max e,avg e by n from`t.csv (billion row challenge)
`t 2'`t.csv [>>>k.k("2'","t",pandas.read_csv("t.csv"))]
select [count first last min max sum avg var dev med ..] by from where ..
while(..)if(..)else .. in .. exp log sqrt sin cos ..
flip flop list sort asc desc unique group key val ..
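The billion row challenge query above (min, max and avg of e by n) can be sketched in plain python as a single pass over the csv. This assumes the file has a header row with columns named n and e, which is a guess from the query; the inline sample stands in for `t.csv`.

```python
import csv
import io

# stand-in for t.csv; the header names n and e are assumed from the query
data = "n,e\noslo,1.0\noslo,3.0\nlima,20.0\n"

stats = {}  # n -> (min, max, sum, count)
for row in csv.DictReader(io.StringIO(data)):
    e = float(row["e"])
    lo, hi, total, cnt = stats.get(row["n"], (e, e, 0.0, 0))
    stats[row["n"]] = (min(lo, e), max(hi, e), total + e, cnt + 1)

# select min e,max e,avg e by n
result = {n: (lo, hi, total / cnt) for n, (lo, hi, total, cnt) in stats.items()}
```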
Verb   (monad)   Adverb          Noun
+ +              ' each          char  " ab"
- -              / over right    name  ``ab
* *    sqr       \ scan left     int   2 3 4
% div  sqrt                      float 2 3.4
& and  flip      \l load
| or   flop      \t time
< <    asc       \v vars         z.d date 2001.01.01
> >    desc      \w work         z.t time 12:34:56.789
= =    group     \\ exit
~ ~    ~
. .    value
! mod  index
@ @    first     I/O             Class
? find unique    0' line         List  (2;3.4)
# take count     1' char         Dict  {a:2 3}
_ drop floor     2' data         Table [a:2 3]
^ cut  order     3' set*         Expr  :2+a
, ,    ,         4' get*         Func  {[a]2+a}
rosetta Atom List atom list Ddd Mm
k ()[].;: +-*%&|<>=~$! @?#_^, ~-*%_$ @,#!!|^<>?= ..! @& f' F' while(a)if(b)c else d
python ()[].;= +-*/&|<> ^ % isssa ~- sfs lrwrsa u dvk @t [f(x)..] [F(x,y)..] while a:if b:c else:d
(index slices split append;sqrt floor str;len range where reverse sort argsort unique;dict key value;transpose)
Type int float complex boolean char name date time
Class List Dict Array Table Expression Function(* immutable)
ifgbcndt LDATEF* devs
k 1992 833...88 ++++++* 300spartans
sql 1992 32 . .22 - * 20million
python 1991 .... . -- 10million
numpy 2005 433 99 + * 5million
go 2009 422.. 2million
nodejs 1995 . . . -- 15million
excel 1982 . . . + * 80million
c 1972 42 5million
apl 1962 . .. + *
sql
shakti universal database includes:
ansi-sql [1992..2011] ok for row/col select.
real-sql [1974..2021] atw@ipsa does it better.
join: real-easy ansi-ok
real: select from T,U
ansi: select from T left outer join U
group: real-easy ansi-annoy
real: select A by B from T
ansi: select B, A from T group by B order by B
simple: real-easy ansi-easy
real: select A from T where C or D, E
ansi: select A from T where (C or D)and E
complex: real-easy ansi-awful
asof/joins select from t,q where price<bid
first/last select last bid from quote where sym=`A
deltas/sums select from t where 0<deltas price
foreignkeys select order.cust.nation.region ..
arithmetic x+y e.g. combine markets through time
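As an illustration of the deltas/sums line, here is a sketch of `select from t where 0<deltas price` in plain python. It assumes deltas returns each value minus its predecessor, with the first value kept as-is (difference from 0), and that sums is the running total that undoes it. Both readings are assumptions.

```python
from itertools import accumulate

def deltas(xs):
    """Each value minus the one before it; the first is taken relative to 0 (assumed)."""
    return [x - prev for prev, x in zip([0] + xs[:-1], xs)]

price = [10, 11, 11, 9, 12]
d = deltas(price)

# where 0<deltas price: keep rows whose price rose (the first row passes because 10 > 0)
upticks = [i for i, dx in enumerate(d) if 0 < dx]

# sums (running total) is the inverse of deltas under this reading
restored = list(accumulate(d))
```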
example: TPC-H National Market Share Query 8 http://www.qdpma.com/tpch/TPCH100_Query_plans.html
what market share does supplier.nation BRAZIL have by order.year for order.customer.nation.region AMERICA and part.type STEEL?
real: select revenue avg supplier.nation=`BRAZIL by order.year from t where order.customer.nation.region=`AMERICA, part.type=`STEEL
ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end) / sum(revenue) as mkt_share from (
select extract(year from o_orderdate) as o_year, revenue, n2.n_name as nation
from t,part,supplier,orders,customer,nation n1,nation n2,region
where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and
c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and
s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STEEL') as all_nations
group by o_year order by o_year;
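Both forms compute the same thing: a revenue-weighted average of the indicator "supplier nation is BRAZIL", per order year. A small sketch on made-up, already joined and filtered rows, assuming `revenue avg x` means sum(revenue*x)/sum(revenue):

```python
from collections import defaultdict

# made-up rows after the joins and filters: (order year, supplier nation, revenue)
rows = [
    (1995, "BRAZIL", 30.0),
    (1995, "FRANCE", 70.0),
    (1996, "BRAZIL", 50.0),
    (1996, "PERU", 50.0),
]

num = defaultdict(float)  # revenue from BRAZIL per year
den = defaultdict(float)  # total revenue per year
for year, nation, revenue in rows:
    num[year] += revenue * (nation == "BRAZIL")  # the "case when ... else 0" term
    den[year] += revenue

mkt_share = {year: num[year] / den[year] for year in sorted(den)}
```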
Comparison:  real               ansi(sqlserver/oracle/db2/sap/teradata/..)
install      1 second           100,000 second
hardware     1 milliwatt        100,000 milliwatt
software     160 kilobyte       8,000,000 kilobyte (+ 10,000,000 kilobyte O/S)
mediandb     1,000,000 megarow  10 megarow
shakti is essential for analyzing big (trillion row+) and/or complex data.
edu
given shakti.com/edu/k.h vars(ijklmn(int) RAX(V6*)..) fns(v(permvarsi) V(vperm2varps)..)
nRX(f) even-odd
arthur: void f(in,VR,VX){i(n/16,Iz=X_;Ri=V(_I,z,r(X_,R[i+n/16]=V(I_,z,r))))}
claude: void f(in,VR,VX){i(n/16,R[i]=V(X[i*16],X[i*16+8],0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30);R[i+n/16]=V(X[i*16],X[i*16+8],1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31))}
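Both versions above appear to de-interleave the input: even-indexed elements go to the first half of R and odd-indexed elements to the second half, 32 elements per step using a two-vector permute. A scalar sketch of that reading, which is an interpretation of the C rather than anything the source spells out:

```python
def even_odd(x):
    """Even-indexed elements first, then odd-indexed elements (len(x) assumed even)."""
    return x[0::2] + x[1::2]

out = even_odd(list(range(8)))
```

This is the reordering step that a recursive fft applies before splitting into two half-size problems.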
nRX(f) recursive
RX(f) flip 16.16
mnRX(f) flip m.x
RX(f) f3(fft8)
RX(f) f4(fft16)
RX(f) f6(fft64)
lmnRX (l+m/2>)split
lmnRX recursive
k.h
#define I ((i6){0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15})
#define V6 __attribute((vector_size(1<<6),aligned(1)))//ABCDEFGHIJKLMNOPQRSTUVWXYZ
typedef unsigned long U;typedef char i0,g6 V6;typedef unsigned short i1;typedef unsigned i2,i6 V6;typedef float e2,e6 V6;
static i6 _I=2*I,I_=1|2*I,AB=I%2*16|I,AA=I/2*2,BB=1|I/2*2,BA=I+1-I%2*2;static g6
#define x86(o) __builtin_ia32_##o
#define _Z(g,z,x...) static void g(x){z;}
#define _D(t,g,z,x...) static t g(x){return({z;});}
#define _U(g,z,x...) _D(U,g,z,x)
#define _V(g,z,x...) _D(g6,g,z,x)
#define _f(g,z) _U(g,z,Ux)
#define UV(g,z) _U(g,z,Vx)
#define Vf(g,z) _V(g,z,Vx)
#define VF(g,z) _V(g,z,Va,Vx)
#define VG(g,z) _V(g,z,ii,Va,Vx)
#define V3(g,z) _V(g,z,Va,Vb,Vx)
#define ii i2 i
#define ss i0*s
#define Ux U x
#define Va i6 a
#define Vb i6 b
#define Vx i6 x
#define o(o) x86(o##512)
UV(b2,o(cvtd2mask)(x))VF(p2,o(permvarsi)(a,x))V3(P2,o(vpermi2varps)(a,b,x))_V(g2,x86(gathersiv16si)(z0,s,x,-1,4),ss,Vx)_U(c2,x86(compressstoresi512_mask)(s,x,i);nu(i),ss,i1 i,Vx)
#define Z0 static i0
#define Z2 static i2
#define r(b,z) ({typeof(b)r=b;z;r;})
#define h(b,z) {i2 $=b;i2 h=0;while(h<$){z;++h;}}
#define i(b,z) {i2 $=b;i2 i=0;while(i<$){z;++i;}}
#define j(b,z) {i2 $=b;i2 j=0;while(j<$){z;++j;}}
#define Iz i6 z
#define cc i0 c
#define ij i2 j
#define ik i2 k
#define il i2 l
#define im i2 m
#define in i2 n
#define sd i0*d
#define Ri R[i]
#define Rk R[k|i]
#define IA i6*A
#define VR g6*R
#define VA g6*A
#define VX g6*X
#define R_ *R++
#define A_ *A++
#define X_ *X++
man
$wget shakti.com/edu/k.zip;unzip k;make
+ - * % &and |or < > = ~ . ! , @ ?inv ^cut #take _drop 'map/over\scan
abs - sqr sqrt flip rev asc desc freq not val key , first uniq sort count floor 'map/over\scan
select [count first last min max sum avg ..] by from rand grid (also &|<>=) while if else
k-torch n[012..] Zigmoid(x%1+E-x) Zoftmax(x%Sx:Ex-Mx) Rms(x%%A*x)
fwd(24M/18)[6 18 288 288] f::x+g@fyZey:Rx+:d(vi,:cy)(z*by)'ki,:(z:4^#ki)*ay:Rx
gen(37M/19)[ 32000 288] g::x?Mx:a@bf/ax
tcn(.6M/16)[13 50 192 64] a+0|b?d^0|c?d^x
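Read right to left, the three k-torch definitions look like x/(1+exp(-x)), exp(x-max x) divided by its sum, and x divided by the root of the mean square. A plain python sketch under that reading, which is an assumption: E, M, S, A are taken to be exp, max, sum, avg, and the function names are illustrative.

```python
import math

def zigmoid(x):
    # x%1+E-x read as x / (1 + exp(-x))
    return [v / (1.0 + math.exp(-v)) for v in x]

def zoftmax(x):
    # x%Sx:Ex-Mx read as: x becomes exp(x - max x), then x / sum x
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def rms(x):
    # x%%A*x read as x / sqrt(avg(x*x)); k's * is monadic sqr here (assumed)
    a = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(a) for v in x]

probs = zoftmax([1.0, 2.0, 3.0])
normed = rms([3.0, 4.0])
```

Subtracting the max before exponentiating changes nothing mathematically but keeps exp from overflowing on large inputs.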
mathematics from 60000BC
2(3)\3 5 /diatonic 3(@,?)1 3 5 divje babe -60000 flute
s:1' /quicksort s rand 16 uruk -4400 divide&impera
q:avg/1>+ /quadrant q grid 96 thebes -3200 log -1(`z?1)
a:+\| /fibonacci a 2 3 pingala ujjain -300
b:!\| /gcdivisor b 6 4 euclid alexandria -300
p::(+x*)/1 2 3 /polynomial p rand 16 al-khwarizmi baghdad 820
r::(-x*)/1 3 5 /taylorseries r rand 16 madhava kerala 1390 arc cos sin
e::x log x%+/x /entropy e freq kj napier louvain 1572 ratio zeno
f:(+,-!)0' /fft f rand 16 gauss brunswick 1805 divide&impera
w:(k log)w' /wordle w 2315 boltzmann graz 1872 divide&impera
m::2>+(x+*)/x /mandelbrot m grid 96 fatou&julia paris 1915
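Two of the one-liners above, fibonacci (`+\|`) and gcdivisor (`!\|`), seem to share one shape: reverse the pair, then scan it with + or with mod, and repeat. A python sketch of that reading; the interpretation of the k expressions is an assumption, and the step functions are illustrative names.

```python
from itertools import accumulate

def fib_step(pair):
    """Reverse the pair, then take running sums: (a, b) -> (b, a + b)."""
    return list(accumulate(reversed(pair)))

def gcd_step(pair):
    """Reverse the pair, then scan with mod: (a, b) -> (b, a mod b)."""
    return list(accumulate(reversed(pair), lambda acc, y: y % acc))

# a 2 3: repeated steps walk the fibonacci sequence
seq = [[2, 3]]
for _ in range(3):
    seq.append(fib_step(seq[-1]))

# b 6 4: repeat until the remainder is 0; the other element is the gcd (Euclid)
pair = [6, 4]
while pair[1] != 0:
    pair = gcd_step(pair)
gcd_6_4 = pair[0]
```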
rosetta infix prefix postfix
k ()[]: +-*%!&|<>=~@?^#_, +-*%~_@,&|^<>?=#.! f'/\ 1992
apl ()[]: +-*%!&|<>=~ i TT, +- ~_ &| AA f'/\ 1964
python ()[]= +-*/%&|<>ZZ w[:]a a-ss~f ,trsa uclvk[len value key] [f(x) for x in x] 1991
numpy.[where append abs square sqrt floor transpose reverse sort argsort unique collections.counter]