https://code.kx.com/trac/wiki/QforMortals/tables
https://code.kx.com/trac/browser/kx/kdb%2B/sp.q

0. Run setup.
	q setup.q


stock: `ibm`bac`usb`bac`bac`usb;
stock,: `ibm`bac`usb`ibm`bac`usb;
stock,: `ibm`ibm`usb;
price: 1.1 2.4 3.3 3.0 2.8 1.8;
price,: 3.1 4.4 1.3 2.0 3.8 5.8;
price,: 5.1 4.3 3.3;
amount: 100 200 300 400 500 300;
amount,: 300 200 400 600 700 800;
amount,: 700 800 700;
time: 09:03:06.000 09:03:23.000 09:04:01.000 09:05:01.000 09:06:01.000 09:07:01.000;
time,: 10:03:06.000 10:03:23.000 10:04:01.000 10:05:01.000 10:06:01.000 10:07:01.000;
time,: 11:03:06.000 11:03:23.000 11:04:01.000;
trade:([]stock; price; amt:amount; time);
save `:trade.csv; / to write a csv file

Things to note:
1. Build a table by creating the column lists (appending with ,:) and then assigning them to columns.
2. The initial brackets [] are for key fields. There are none here.
3. Saving to a csv file is possible for all but the very largest files.
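
As a minimal sketch of notes 1 and 2, the same steps can be tried on a tiny table. The names s, p, t and kt are made up for this illustration and are not part of setup.q.

```q
/ hypothetical toy columns, not from setup.q
s: `a`b;
p: 1 2f;
s,: `c;                      / append to each list, keeping the lengths equal
p,: 3f;
t: ([] s; p; amt: 10 20 30); / no key fields: the brackets are empty
kt: ([s] p);                 / same data with s declared as the key
cols t                       / column names of t
keys kt                      / key column names of kt
```

If the lists have different lengths, the table definition fails, so it is worth checking count on each list first.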



Task: Import a table of stock prices in csv format (trade.csv,
as written during class),
find the average price per stock, and write that to a csv file.
	q readavg.q

To read the file, give one type character per column (S for symbols,
F for floats, and T for times) and a delimiter of comma.
Enlisting the delimiter says that the first line holds the column names.

mytrade: ("SFFT"; enlist ",") 0: `trade.csv
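
To see what enlist does to the delimiter, here is a small sketch that writes a scratch file and reads it back both ways. The table demo and the file demo.csv are hypothetical names used only here; the script leaves demo.csv in the current directory.

```q
/ hypothetical scratch table, saved as demo.csv in the current directory
demo: ([] stock: `ibm`bac; price: 1.1 2.4; amt: 100 200; time: 09:03:06.000 09:03:23.000);
save `:demo.csv;

/ enlist ",": the first line is taken as column names, result is a table
t1: ("SFFT"; enlist ",") 0: `:demo.csv;

/ plain ",": no header handling, result is a list of 4 columns
/ and the header line is parsed as an ordinary (mostly null) row
t2: ("SFFT"; ",") 0: `:demo.csv;

meta t1    / amt comes back as a float because its type character is F
```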

If you know SQL, then kdb is a semantic extension,
but has somewhat different syntax.

myavg: select avg price by stock from mytrade

In standard SQL the "group by" goes after the "from" and "where" clauses.
Here the "by" clause goes before "from".
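
A small sketch of the by clause on a toy table (t, a and u are names invented for this example). It also shows that the result is keyed on the by column.

```q
/ hypothetical toy table
t: ([] stock: `ibm`bac`ibm`bac; price: 1 2 3 4f);

/ SQL equivalent: SELECT stock, AVG(price) FROM t GROUP BY stock
a: select avg price by stock from t;

keys a       / the by column becomes the key of the result
u: 0!a;      / 0! removes the key and gives an ordinary table
```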

save `:myavg.csv


Exercise 2. (Middle) The volume weighted average price per stock
is the sum of price times volume divided by the sum of the volumes.
Compute the volume weighted average price
per stock
        q vwap.q
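
To check the arithmetic before writing the query, here is a sketch on two plain lists (p and v are made-up values). The built-in wavg computes the same weighted average, with the weights on the left.

```q
/ hypothetical prices and volumes
p: 10 20 30f;
v: 100 300 600;

byhand:  (sum p*v) % sum v;   / (1000+6000+18000) % 1000
builtin: v wavg p;            / weights on the left, values on the right
```

Both expressions give 25 for these values. The per-stock version is left as the exercise.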


The very powerful
semantic extension that kdb offers is that it is based on
ordered columns
and therefore it can support queries that make use of that order.
You can therefore write functions that work on whole columns (lists)
and call them inside queries.


/ given a list of prices, get the 2-point moving average
/ (each price averaged with its predecessor; the first price is kept as is)
movavg2:{[x] (x[0]),raze avg each ((-1) _ x),'(1 _ x)}
mytrade: ("SFFT"; enlist ",") 0: `trade.csv
myavg2: select movavg2 price by stock from mytrade
save `:myavg2

Note the different method of saving: without an extension, save writes a
binary q file.
It doesn't work
to save it to a csv file because the result is not in first normal form
(each price entry holds a list).
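
The following sketch runs movavg2 on a short list and then on a toy table (t, g and flat are hypothetical names). It also shows one possible way back to first normal form, using ungroup, which puts each list element on its own row.

```q
movavg2: {[x] (x[0]),raze avg each ((-1) _ x),'(1 _ x)};

m: movavg2 1 3 5 7f;               / 1 2 4 6

/ hypothetical toy table
t: ([] stock: `a`b`a`b; price: 1 10 3 30f);
g: select movavg2 price by stock from t;   / one row per stock, price is a list

flat: ungroup g;                   / one row per (stock, price) pair
```

A table like flat has only atoms in its cells, so it is the kind of table that can be written to csv.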

Exercise 3. (Hard) Compute the 4-point moving average per stock.
Hint: Take care of the first three moving averages using the avgs
function and then go on from there using the difference of sums.
Display the result and save it as a q file. 
	q readmovavglong.q


Let's say we want to generate a random table having 100,000 rows from
the stocks `ibm`hp`amaz`goog`aapl,
prices in the range 20 to 400,
and volumes in the range 100 to 100000 but multiples of 100,
times arbitrary in the range from 10 AM to 4 PM.
	q gendata.q

n: 100000;
stocks: `ibm`hp`amaz`goog`aapl;
rantrade:([]stock: n?stocks; price: 20 + n?380.0;amount: 100*(1+n?1000);
  time: 10:00:00.000 + n?06:00:00.000);

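Points to note:
a) n ? stocks draws n random elements of the list stocks (with repetition).
b) 20 + n ? 380.0 draws n random floats between 0 and 380, then shifts them by 20.
c) n ? 06:00:00.000 draws n random times between 00:00:00.000 and 06:00:00.000.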
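
A quick way to convince yourself of these ranges is to generate a few values and test them. This is only a sketch with a small m; the variable names are invented for the example.

```q
m: 5;                                   / small sample, for inspection
s: m ? `ibm`hp`amaz`goog`aapl;          / random picks from a list
p: 20 + m ? 380.0;                      / floats between 20 and 400
a: 100 * (1 + m ? 1000);                / multiples of 100 from 100 to 100000
t: 10:00:00.000 + m ? 06:00:00.000;     / times from 10 AM to 4 PM

ok: (all p within 20 400f) and (all 0 = a mod 100) and all t within 10:00:00.000 16:00:00.000;
```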

save `:rantrade.csv


Use the table generated and 
select the name and last price of
each stock into a new table pricecol.

pricecol: select last price by stock from rantrade

Notes:
a. You do not have to put "stock" in the select clause since
it is already in the by clause. In fact, you shouldn't do it.
b. Because the rows aren't ordered by time, this does not give us the
last price by time.
That is dealt with by xasc. More on that later.
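
Here is a minimal sketch of note b on a toy table whose rows are deliberately out of time order (t, r1 and r2 are hypothetical names).

```q
/ hypothetical toy table, rows not in time order
t: ([] stock: `a`a`b; price: 1 2 3f; time: 10:05:00.000 10:01:00.000 10:02:00.000);

r1: select last price by stock from t;              / last in row order: a -> 2
r2: select last price by stock from `time xasc t;   / last in time order: a -> 1
```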

Exercise 4. (Easy) Find the volume-weighted average price of each stock
	q volweight.q

A few notes on inserting data.

/ get the prices column as a list.
/ price is the second column, but col numbers start at 0
(value flip rantrade)[1]

`foo insert rantrade; / note that insert creates a table if it doesn't exist

count foo

`foo insert rantrade / or if it already exists

count foo


Another way to insert data:
you can also insert into a table using a list of columns.
newstocks: n?stocks;
newprice: 20 + n?380.0;
newamount: 100*(1+n?1000);
newtime: 10:00:00.000 + n?06:00:00.000;

`rantrade  insert (newstocks; newprice; newamount; newtime);
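
Both forms of insert can be tried side by side on a toy table. This is a sketch; the table u and its values are invented for the example.

```q
/ hypothetical toy table
u: ([] stock: `a`b; price: 1 2f);

`u insert ([] stock: `c`d; price: 3 4f);   / table insert: same schema as u
`u insert (`e`f; 5 6f);                    / column insert: one list per column, in column order

count u    / 6 rows after the two inserts
```

In both cases the column types of the new data must match those of the target table.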


Exercise 5. (Easy) Generate a second table having the same schema as in 
the random table
and bulk insert it into the random table.
Then form columns corresponding to this schema, each of the same
length as the columns of the original random table. Bulk insert those columns.
Which is faster? Table insert or column insert? (Put \t before
the statement to time it in milliseconds.)
	q bulkinsert.q


Tables can be sorted based on different fields.
e.g. let's sort the table rantrade based on price.

ranprice: `price xasc rantrade

x: exec price by stock from rantrade
x

This gives a dictionary that maps each stock to the list of its prices.
Because stock names are keys, you can ask for their values.

x[`aapl]
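
On a toy table the shape of that dictionary is easy to see (t and d are hypothetical names used only in this sketch).

```q
/ hypothetical toy table
t: ([] stock: `a`b`a; price: 1 2 3f);

d: exec price by stock from t;   / dictionary: stock -> list of prices
key d                            / the stocks
d[`a]                            / the prices of stock a, in row order
count each d                     / number of trades per stock
```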


Exercise 6. (Hard) Compute the correlation of the prices of every
pair of stocks in order of their trades. 
Use the trade.csv table.
Each stock has the same number of values in the trade table,
but they are not necessarily in time order.
You will have to bring stock-price pairs out and then write a function
to compute over them.
	q findcorr.q

Exercise 6h. (Harder variant)
Generate a random stock-trade 
table and then compute
correlations between every pair of stocks.
If they don't have the same length then truncate to the smaller size.
	q findcorrharrder.q


Sometimes, one may want to declare a key for a table.
Suppose that each of these companies is associated with a unique state
which contains its headquarters.

stocks: `ibm`hp`amaz`goog`aapl;
states: `ny`ca`wa`ca`ca;

Then declare a key for the table as follows.

stock:([mystock: stocks] place: states);

Note that the first field is in brackets. That is a signal
to show that no two rows should have the same mystock value.
Having these keys enables us to do table joins in an easier way.
First, though, note the new way that the stock field
in rantrade is specified.
The construct `stock$ means that every element here
must be a member of the key of the stock table (the column
entitled mystock).

n: 100000;
rantrade:([]stock: `stock$n?stocks; price: 20 + n?380.0;
  amount: 100*(1+n?1000); time: 10:00:00.000 + n?06:00:00.000);
	/ the field stock is cast with `stock$ so that it references the key
	/ field of the stock table. That key field can be called something else.


select stock, stock.place from rantrade 


Note that the field referencing the table stock is here also called stock.
This permits an implicit join from rantrade to stock through the field
stock in rantrade.
The field stock in rantrade is a foreign key that refers to the key mystock in stock.
We know this because in the rantrade schema we describe the field stock as
`stock$n?stocks.
This says that every value in the stock field of rantrade appears in mystock in stock
and allows a row in rantrade to be linked to a single row in table stock.
(Many rows in rantrade may be linked to the same row in table stock.)
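
A minimal sketch of the same idea on two tiny tables. The table t and the variables r and bad are invented for this example; stock is redefined here with only two rows.

```q
/ hypothetical two-row version of the stock table
stock: ([mystock: `ibm`hp] place: `ny`ca);

/ the stock column of t is linked to the key of the stock table
t: ([] stock: `stock$`hp`ibm`hp; price: 1 2 3f);

r: select stock, stock.place from t;   / implicit join through the link

/ a symbol that is not in the key cannot be cast;
/ the error is trapped here so that the script keeps running
bad: @[{`stock$x}; `goog; {`failed}];
```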

Exercise 7. (Easy) Add an address table that links the 
stocks `ibm`hp`amaz`goog`aapl with their state addresses
and then find the stock-address 
pairs of all stocks whose average price is above 80.
 	q findgoodaddress.q


A little interlude on debugging:

Sometimes the error messages can be challenging to figure out.
For example, suppose you declare rantrade as we did originally:

stocks: `ibm`hp`amaz`goog`aapl;
states: `ny`ca`wa`ca`ca;

stock:([mystock: stocks] place: states);


/ declare rantrade as before
rantrade:([]stock: n?stocks; price: 20 + n?380.0;amount: 100*(1+n?1000);
  time: 10:00:00.000 + n?06:00:00.000);

/ Then try the statement 
select stock, stock.place from rantrade 

This will encounter an error: 
k){0N!x y}
'type
@
"q"
"select stock, stock.place from rantrade"

To debug these, I tend to simplify the statements that run into the
error until something works and then ask myself where
there could be a type error. In this case, the problem is that
there is no association between rantrade and the stock table:
the stock field is a plain list of symbols, not one declared with `stock$.
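
One way to check for the association is to look at the f column of meta, which names the table a field is linked to. The sketch below uses hypothetical toy tables (plain, linked, fixed) rather than rantrade.

```q
/ hypothetical two-row version of the stock table
stock: ([mystock: `ibm`hp] place: `ny`ca);

plain:  ([] stock: `hp`ibm);          / ordinary symbols, no link
linked: ([] stock: `stock$`hp`ibm);   / linked to the key of stock

fp: (0!meta plain)`f;                 / null symbol: no link
fl: (0!meta linked)`f;                / `stock

/ repair the plain table by casting the column, then the dot notation works
fixed: update stock: `stock$stock from plain;
pl: exec stock.place from fixed;
```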

Simon Garland has defined a function:

.q.wtf:.q.dotzs:{`d`P`L`G`D!(system"d"),v[1 2 3],enlist last v:value x}

foo:{[x;y] x * y}
foo[5;6]
foo[`abc; `bde]
{[x;y] x * y}
'type
*
`abc
`bde


wtf .z.s


value x on a function gives the operations in bytecode format.


Interlude if we have time.
Sometimes it is useful to assemble an SQL string and then just
execute it.
One can do that in kdb.
For example,

rantrade:([]stock: n?stocks; price: 20 + n?380.0;amount: 100*(1+n?1000);
  time: 10:00:00.000 + n?06:00:00.000);

value "select stock, price from rantrade"

One can assemble such queries more formally by forming this string
on the fly.
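
As a sketch of forming the string on the fly, the hypothetical function mk below glues a column expression and a table name into a query string, which value then runs. The toy table t is also made up for the example, and t must be a global for value to find it.

```q
/ hypothetical toy table (global)
t: ([] stock: `a`b`a; price: 1 2 3f);

/ build "select <cols> from <table>" from two strings
mk: {[c; tb] "select ", c, " from ", tb};

qs: mk["stock, price"; "t"];
r:  value qs;                 / same result as: select stock, price from t

/ the same kind of query can also be written without strings, in functional form
r2: ?[t; (); 0b; (enlist `price)!enlist `price];   / select price from t
```

String queries are easy to write but are only checked when they run, so a misspelled column shows up as an error at that point.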

Exercise 8. (Medium) Write a function that takes an arbitrary table,
an arbitrary target column expression, and an arbitrary
where clause, and runs the corresponding query. It should work on any
table that has those columns
and for which the where clause is appropriate.
Test it on trade.csv.
	q arbquery.q



Spreading Load:

When queries are expensive, it's useful to spread them around
on different servers.
The idea here is that we'll have a generic server that will receive
requests from application-specific clients 
and send them to application-specific slaves.

Please look at 
https://code.kx.com/trac/wiki/Cookbook/LoadBalancing

Here might be a client:

h: hopen `:localhost:5001

neg[h]"select avg price*amount by stock from rantrade"; h[]
neg[h]"select max price*amount by stock from rantrade"; h[]
neg[h]"select min price*amount by stock from rantrade"; h[]
neg[h]"select var price*amount by stock from rantrade"; h[]


Here might be a slave server:

n: 100000;
stocks: `ibm`hp`amaz`goog`aapl;
rantrade:([]stock: n?stocks; price: 20 + n?380.0;amount: 100*(1+n?1000);
  time: 10:00:00.000 + n?06:00:00.000);

The generic router is mserve.q, copied from the website,
which we will treat as a black box.


Exercise 9. (Middle) Send a set of read-only requests to a master
asynchronously and have them routed to slaves.
Import the trade.csv table into each slave and then have the
client send several requests, one about each stock.
	q masterclient.q 
        q mserve.q -p 5001 3 slaveserver.q


Second interlude.
It is sometimes useful to access a website, scrape the result,
and present the result as a table.
https://code.kx.com/svn/cookbook_code/yahoo.q
presents a nice example.
Let's analyze it.
I've added in some debug statements.

Exercise 10. (Middle) Web access.
Take yahoo.q
and modify it to return only the Sym,Date,Close columns.
	q yahoomod.q

