about
universal database + language
for: hedge funds banks manufacturers iot formula1 ..
why: hundred times faster than bigquery redshift azure databricks snowflake ..
who: arthur whitney+
thanks to
e.l. whitney [1920-1966] multiple putnam winner
k.e. iverson [1920-2004] APL turing award '79 [advisor]
john cocke   [1925-2002] RISC turing award '87 [advisor]
compare vs biggie shifty azure sparky flaky ..
taxi: 1.1Billion  10-100 times faster/QP$
taq:  1.1Trillion 100-INF times faster/QP$

benchmark
k-sql is consistently 100 times faster (or more) than
redshift, bigquery, snowflake, spark, mongodb, postgres, ..
same data. same queries. same hardware. anyone can run the scripts.
benchmarks:
taq 1.1Trillion trades and quotes
taxi 1.1Billion nyc taxi rides
stac ..
Taq 1.1T https://www.nyse.com/publicdocs/nyse/data/Daily_TAQ_Client_Spec_v2.2a.pdf
q1:select max price by sym,ex from trade where sym in S
q2:select sum size by sym,time.hour from trade where sym in S
q3:do(100)select last bid by sym from quote where sym in S / point select
q4:select from trade[s],quote[s] where price<bid / asof join
S is top 100 (10%)
time(ms) 16core 100 days
          q1     q2     q3   q4
k         44     72     63   20
spark     80000  70000  DNF  DNF  - can't do it
postgres  20000  80000  DNF  DNF  - can't do it
..
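a toy run of q1/q3 in k-sql; the table layouts and tiny rows are assumptions for illustration, the published benchmark runs the same queries on 1.1T rows:
 trade:[sym:`A`A`B;ex:"NNQ";price:2.3 2.4 9.9;size:100 200 300]  / assumed layout
 quote:[sym:`A`B;bid:2.2 9.8]                                    / assumed layout
 S:`A`B                                                          / benchmark uses top 100 syms
 select max price by sym,ex from trade where sym in S            / q1
 select last bid by sym from quote where sym in S                / q3 point select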
Taxi 1.1B https://tech.marksblogg.com/benchmarks.html
q1:select count by type from trips
q2:select avg amount by pcount from trips
q3:select count by year,pcount from trips
q4:select count by year,pcount,_distance from trips
          cpu   cost   core/ram  elapsed      machines
k         4     .0004  4/16      1            1*i3.2xlarge(8v/32/$.62+$.93)
redshift  864   .0900  108/1464  8(1 2 2 3)   6*ds2.8xlarge(36v/244/$6.80)
bigquery  1600  .3200  200/3200  8(2 2 1 3)
db/spark  1260  .0900  42/336    30(2 4 4 20) 21*m5.xlarge(4v/16/$.20+$.30)
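the same four queries on a toy trips table; column names are read off the queries, the layout is an assumption:
 trips:[type:`y`y`g;pcount:1 2 1;amount:5.5 9 7.25;year:2015 2015 2016;distance:1.2 3.9 2]
 select count by type from trips                    / q1
 select avg amount by pcount from trips             / q2
 select count by year,pcount from trips             / q3
 select count by year,pcount,_distance from trips   / q4: _ floors distance into whole-mile buckets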
Stac https://www.stacresearch.com/
..

document k.d
verb           noun                      adverb
+ flip         bool                      f' each
-              char  " ab"               f/ right over
* first        name  ``ab                f\ left scan
%              int   2 3 4
& where        float 2 3e4               \v vars
| reverse      *fixed 2.00 3.40          \w work
< asc                                    \l import
> desc         *z.d date 2001.01.01      \t timeit
= freq         *z.t time 12:34:56.789
~ not          *z.l locus 40.75::73.98
! range
, list                                   I/O
@ type         list  (2;3.4)             0' line
? find unique  dict  {a:2 3}             1' char
^ cut sort     table [a:2 3]             *2' data
# take count                             *3' set
_ drop floor   expr  :2+x                *4' get
$ cast str     *func {[.]..}             *5' ffi
if a then b else c
*enterprise
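a few one-liners exercising the card; expected results in comments follow standard k readings and are assumptions where shakti differs:
 +/2 3 4             / f/ over: 9
 +\2 3 4             / f\ scan: 2 5 9
 {#x}'("ab";"cde")   / f' each: 2 3
 |2 3 4              / monadic |: reverse, 4 3 2
 ?2 3 2 4            / monadic ?: unique, 2 3 4
 _2.7                / monadic _: floor, 2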
python: import k;k.k('select ..')
select [count first last min max sum ..]v by e from t
sqrt exp log sin cos; in div mod bar
*ffi: csv json kafka parquet avro ..
`csv?`csv t:[t:09:30:00 09:30:00;e:"bb";s:`aa`aa;v:2 3;p:2.3 3.4]
`lz4?`lz4 x:"t,e,s,v,p\n09:30:00,b,aa,2,2.3\n09:30:00,b,aa,3,3.4\n"
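a hedged reading of the two lines above: `csv t (assumed) renders table t as csv text, `csv? parses csv text back to a table, and `lz4/`lz4? compress and decompress:
 t:[t:09:30:00 09:30:00;e:"bb";s:`aa`aa;v:2 3;p:2.3 3.4]
 x:`csv t      / assumed: serialize t to the csv text shown above
 `csv?x        / parse back; should reproduce t
 `lz4?`lz4 x   / compress then decompress: identity on x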

sql.d
shakti universal database includes:
ansi-sql [1992..2011] ok for row/col select.
real-sql [1974..2021] atw@ipsa does it better.
join: real-easy ansi-ok
real: select from T,U
ansi: select from T left outer join U
group: real-easy ansi-annoy
real: select A by B from T
ansi: select B, A from T group by B order by B
simple: real-easy ansi-easy
real: select A from T where C or D, E
ansi: select A from T where (C or D) and E
complex: real-easy ansi-awful
asof/joins select from t,q where price<bid
first/last select last bid from quote where sym=`A
deltas/sums select from t where 0<deltas price
foreignkeys select order.cust.nation.region ..
arithmetic x+y e.g. combine markets through time
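a toy reading of the asof join above: each trade row picks up the last quote at or before its time, then the where clause filters; the expected result is an assumption from that reading:
 t:[sym:`A`A;time:09:00:00 09:30:00;price:10.00 10.07]
 q:[sym:`A`A;time:08:59:00 09:29:00;bid:10.02 10.06]
 select from t,q where price<bid  / keeps 09:00:00 (10.00<10.02), drops 09:30:00 (10.07>10.06)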
example: TPC-H National Market Share Query 8 http://www.qdpma.com/tpch/TPCH100_Query_plans.html
what market share does supplier.nation BRAZIL have by order.year for order.customer.nation.region AMERICA and part.type STEEL?
real: select revenue avg supplier.nation=`BRAZIL by order.year from t where order.customer.nation.region=`AMERICA, part.type=`STEEL
ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end) / sum(revenue) as mkt_share from (
select extract(year from o_orderdate) as o_year, revenue, n2.n_name as nation
from t,part,supplier,orders,customer,nation n1,nation n2,region
where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and
c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and
s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STEEL') as all_nations
group by o_year order by o_year;
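stripped of the joins, both queries compute the same thing: per year, sum(revenue where BRAZIL)%sum(revenue); a toy check with invented numbers, foreign-key paths flattened to plain columns:
 t:[year:1995 1995 1996 1996;nation:`BRAZIL`PERU`BRAZIL`PERU;revenue:100 300 250 250.0]
 select revenue avg nation=`BRAZIL by year from t  / 1995: 100%400 = .25   1996: 250%500 = .5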
Comparison:  real               ansi(sqlserver/oracle/db2/sap/teradata/..)
install      1 second           100,000 second
hardware     1 milliwatt        100,000 milliwatt
software     160 kilobyte       8,000,000 kilobyte (+ 10,000,000 kilobyte O/S)
mediandb     1,000,000 megarow  10 megarow
https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver15
shakti is essential for analyzing big (trillion row+) and/or complex data.