# K as a Prototyping Language

Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
shasha@cs.nyu.edu
http://cs.nyu.edu/cs/faculty/shasha/index.html

Download K from the Kx Systems web site. This takes about one minute and includes a powerful database system.

Now that you have the software, please execute the examples as you go.

# Lesson Plan

• evolve to concepts
• program them

# Quick Tour of K

• Basic arithmetic in K:
```
2+3
3*2

/ Right to left precedence:

3*2+5 / yields 21

/ Assignment

abc: 3*15
cdf: 15% 3 / division is percent sign
x: 5 3 4 10

/ scalar-vector operations

3*x
```
• Vector style operations:
```
/ prefix sum

+\3*x

/ element by element operations

x*x

/ cross-product of all elements
/ read this "times each right each left"

x*/:\:x

/ generating data

y: !10

/ multiplication table, each left, each right (cross-product)

y*\:/:y

`show \$ `y

y:y*y / variable multiplication causes gui representation to be updated.
/ Try updating the gui and they will be updated.

```
• ```
/ tables

n: 1000
names: `bob `ted `carol `alice
emp.salary: 20000 + n _draw 10000
emp.name: names[n _draw #names]
emp.manager: names[n _draw #names]

`show \$ `emp

/ modify salary to be twice the salary

emp.salary: 2*emp.salary

/ create a chart

emp.salary..c:`chart
`show \$ `emp.salary
```
• Functions
```
/ present value calculation

presentval:{[amount; discount; exponent]
+/ (amount%(discount^exponent))}

amount: 100 125 200 150 117
discountrate: 1.05
exponent: 0 0.5 1 1.5 2

presentval[amount;discountrate;exponent]

```

# Why is K Good?

• Extremely fast.
Usually faster than the corresponding C program, e.g. present value calculation. Almost always faster than Sybase or Oracle.
• Can do a lot and can do it easily (bulk data objects like tables, a passable graphical user interface, interprocess communication, web, and calculation).

Working in a single language reduces errors, increases speed yet further, and is more fun because you can concentrate on the algorithm.
(Note: Many errors and much overhead comes from integrating different languages, e.g. C/Perl/Sybase/GUI.)

• Interpreted so very fast debug cycle; no seg faults; upon error can query all variables in scope. Bad part: no declared types so get some type errors you wouldn't otherwise get.

• Small existing market: around 1000 programmers.
(But highly paid ones.)
• A challenge to learn for those used to scalar programming languages like C, Java, Pascal, etc.
(This course attempts to make it easy to learn.)

# Unit 1: Basic Language Features

• Data types: character vectors, symbols, integers, floats.
• Structuring operations: scalars, list/vectors, dictionaries.
• hello, world! program.
• generating test data
• vector versions
• reduction: over, scan
• debugging techniques
• exercise: find the mean and variance on a set of data.
• defining functions (call by value)
• exercise: dot product
• exercise: matrix multiply
• exercise: present value
• exercise: solve linear equations

# Data Types

• Start k by typing
```k
```
• Character, surrounded by double quotes:
```"a"
```
• Strings (character vectors), surrounded by double quotes:
```"Hello, world!"
```
• Symbols are atoms but displayed as alphanumerics, preceded by a left quote:
````Hello
`"Hello, world!"   / need the quotes because of the space
```
• Numbers:
```15
16.2
```

# Structuring Primitives: scalars, list/vector, dictionary

• A scalar is a single integer, float, symbol, or character.
```"a"
`fun
2
3.2
```
• A vector is a sequence of scalars of the same type, possibly with duplicates:
```4 2 5 67 2
(4;2; 5; 67; 2)  / equivalent to the above

`good `better `best
(`good; `better; `best) / equivalent to the above

"good" / a sequence of characters is a vector
("g";"o";"o";"d") / a sequence of characters is a vector
```
• A vector is a kind of list. But lists may have mixed types. (These are called lists of general form in the reference manual.)
```("hello"; `hello; 15; `abc)
```
• Lists (including vectors) are indexed as if they were arrays:
```x: 3 5 2 17 2
x
x
```
• Lists are grown by concatenation:
```x , x
```
• Lists can be created from scalars:
```5
,5
```
• Dictionaries are often used as a representation of tables. They are created by forming multiple lists, one for each column. We will dicuss operations on dictionaries later.
```comp.name: `zurich `aetna `"met life"
comp.assets: 43 23 35
`show \$ `comp
```

• K uses the term "monadic" to describe operators that take only one scalar, vector, or dictionary argument. Dyadic operators take two arguments.
• The same symbol may be used in both ways and may have quite different meanings depending on whether it is used monadically or dyadically.

For example, vertical bar is max when dyadic and reverse when used monadically.

```4 | 5
32 64 | 21 89

| 32 64 21 89
```

Another example occurs with asterix.

```4 5 * 2 3 / multiplication

* 4 5 2 3 / first
```
• `"Hello, world!" / symbol version
• Note: In large arrays, it is better to represent data as symbols, because the same symbol is stored only once. Convert from string to symbol using backquote dollar. Convert from symbol to string using dollar.

• The 0: (zero colon) operator writes strings to files in ascii form.
```"foobar" 0: "Hello, world!"
```

The 0: (zero colon) operator writes character vectors to the file. Lists of charactor vectors are separated by a newline. Try an example on your own.

One can also read in a file, but in that case the file name is on the right side.

```a: 0: "foobar" / comes back as list whose zeroth element is string
```
• 1: writes data in K internal format which is binary. This allows non-string items.
```"foosymb" 1: `"Hello, world!" / write a symbol to a file
```
```a: 1: "foosymb"

a / you will see the symbol hello world.

Append to a file using 5:.
Typical use: one is appending the description of a server operation
to the end of a file.
An operation is described by the list (`op1; "arg1"; `arg2).
5: is like comma in that it takes two lists and appends the second
to the first.
Therefore if one says
"foo" 5: (`op1; "arg1"; `arg2)

file "foo" will be extended by three more elements.
But this is not good since the end of this operation does not have
a clear boundary.

Therefore, it is better to say
"foo" 5: ,(`op1; "arg1"; `arg2)

because that appends a single
element consisting of the list
(`op1; "arg1"; `arg2).

Ex: "foobar2" 5: ,"Hello, world"

"foobar2" 5: ,"sapphire"

Note: Admittedly, 0:, 1: etc. are not very mnemonic.
Refer to the reference card if you forget.

Note:
```

# Arguments to Programs

• If you have a program called foo.k, then typing
```k foo abc d ef
```
will put into the variable _i within the foo program the strings "abc", "d", and "ef".

# Generate Data

• It is often convenient to generate test data for new applications.
• Enumerate n (denoted !n) generates all numbers from 0 through n-1 inclusive.
```x: !100
```
• Draw a bunch of random numbers.

Generate 1000 numbers between 0 and 18 with replacement.

```a: 1000 _draw 19
```
Generate 19 numbers between 0 and 18 without replacement. ( Warum nur neunzehn?)
```a: 19 _draw -19 / generate 1000 numbers between 0 and 18 without replacement
```
One more kind of random number: random numbers between 0 and 1.
```b: 100 _draw 0
```
• Note: Subsequent uses in same program use different seeds (you don't have to bother setting it). Different invocations of same program use same starting seed (good for repeatability after debugging).

# Add, Subtract, Multiply, Divide, and Compare

• Note that arithmetic operators can work either on scalars or numeric vectors of the same length.
```5 + 3
5 6 7 - 10 20 30
5 6 7 * 3
6 7 8 9 % 2
```

Ok, you may not be used to having the percent sign be used for division, but you'll learn to like it.

• Comparison can be scalar to scalar, vector to vector or scalar to vector. Why are parens needed?
```4 > 2
4 6 > 2 8
(!10) > 7
4 > !10
```
• Note: Scalar op vector works for built-in "atomic" operations, like arithmetic. Manual tells which ops are "atomic" in this sense.

# Vectors to Scalars and to Other Vectors

• # 5 6 7 6 7 8 / count elements
• +/ !100 / sum elements (faster than Gauss)
• +\ 9 4 6 / the scan operation (first element, then first plus second, etc)

# Debugging

• When an error occurs, K tells you which routine is in error and then leaves you with a greater than sign (>) as a prompt.
At that point, you can examine the program variables in that context. You can also analyze the program variables in calling contexts by hitting single quote (').

# Time Functions

• K has a function
```_t
```
that gives the number of seconds after Jan 1, 2035, which is a Monday. (So, this number is currently negative.)
• One can get the local time by typing
```_ltime _t
```
• One can also get the Greenwich Mean Time by typing
```_gtime _t
```
• The day value for Greenwich Mean Time is obtained from the above by taking the first member of that list:
```* _gtime
```
• To get the day of week, one types
```days: `Mon `Tues `Wed `Thu `Fri `Sat `Sun
days @ (_ _t%86400) ! 7
```

# Functions

• avg:{[x] (+/x) % (#x)}
• avg[!10] / finds average of integers from 0 to 9
• Note: We have written x as a formal here, but x, y, z are the first three formals if none are mentioned. Would you like to rewrite variance?

# Variance as a function

• ```var:{[x]
a: avg[x*x]
b: avg[x]
a - ( b * b)}
```
• Note: We can return from within a function by writing colon (:) before the variable to be returned. Otherwise, the last line of the function is the return value. (So, a newline before the final right curly bracket changes the meaning of the function.)

# Dot Product

• v1 * v2 / this is the element by element product of v1 and v2.
• +/ v1 * v2 / this is the sum of the element by element products.
• dot:{[v1; v2] +/ v1 * v2}

# Matrix Multiply

```
n: 100
x: (n*n) _draw 10000
m1: (n;n) # x / note that n n # x won't work because n n is function app
x: (n*n) _draw 10000
m2: (n;n)  # x

dot:{[x;y] +/x*y}

r: m1 dot/:\: +m2  / dot product with the transpose.

```
• Note that the order of the adverbs
``` /:\:
```
is important. Reversing the order gives the transpose. This order causes dot to multiply the top row times all the columns first (forming the first row of the result).

• Each left takes one element at a time from the left argument.
```seq1: "abcde"
seq2: "012"
seq1 ,\: seq2
```
• Each right takes one element at a time from the right argument.
```seq1: "abcde"
seq2: "012"
seq1 ,/: seq2
```
• Each left each right does a cross product.
```seq1: "abcde"
seq2: "012"
seq1 ,\:/: seq2

/ note contrast with
seq1 ,/:\: seq2

/ This makes seq2 vary more frequently than seq1.
```

# Unit 2: Conditionals and String Manipulation

• Like C, K has conditionals.
• K also has loops, though programs usually run faster and are shorter if you avoid using them.
• We discuss these features and the adverbs that often render loops unnecessary.

# Topics

• conditionals
• loops to go through arrays
• each operator to avoid loops
• exercise: VicePresident salary raises
• each operator
• table manipulation
• functions, recursive functions
• Executing strings.
• a gallery of examples

# Conditionals

• ```x: 5
y: 7
if[x < y
min: x
max: y
]
if[~ x < y
min: y
max: x
]
```
• ```min: :[x < y; x; y]
/ if statement follows condition and else statement comes after.
max: :[x < y; y; x]
```
• Note: The second form of conditional returns a value.

# While Loops

• Apply f to the array members in order.
• ```n: #arr
i:0
out:()
while[ i < n
out,: f[arr[i]]
i+: 1
]
out
```

• Note: provided the f on arr[i] doesn't interfere with the f on arr[j], this entire loop can be done with f'arr This is read "f each arr".

# Do Loops

• Initialize arr from input.
• ```input: 100 _draw 5
n: _ (#input) % 2
arr: n # 0 / initialize
i: 0
do[n
arr[i]: input[i]
i+: 1
]
```
• Note: Could do this more efficiently as follows:
```input: 100 _draw 5
n: _ (#input) % 2
arr: input[!n]
```
• Even better:
```arr: n # input
```

# Each Operator

• The each operator, symbolized by just a ' following immediately after some function f says "apply f to every member of the following list". This is like mapcar in Lisp if you happen to know that. Here we increase all salaries by a factor that depends in a non-linear way on the salary:
• ```sal: 200 250 220 100
f:{[x] :[x < 215; x*1.1; x*1.2]}
sal: f'sal
```

# Exercise: VP Salary Raise

• Create a table of employees with three ranks.
```ranks: `pres `vicepres `dilbert
emp.rank: ranks[100 _draw 3]
emp.salary: 50000 + 100 _draw 950000
```
• Want to raise all vice president salaries by 10% but all others by 5%.

# VP Salary Raise, Each Approach

• Define a function, then apply it to each rank-salary pair.
```g:{[rank; salary]
:[rank = `vicepres
salary * 1.1
salary*1.05]}

emp.salary: g'[emp.rank; emp.salary]
```

# VP Salary Raise, Vector Approach

• A more efficient way to do this query is to do it as a vector operation. In this strategy, we will locate all the vicepresidents, multiply them by 1.1 and multiply everyone else by 1.05.
```i: & emp.rank = `vicepres / vice-presidents
j: & ~ emp.rank = `vicepres / non-vice presidents
emp.salary[i]*: 1.1
emp.salary[j]*: 1.05
```

# Group By on a Table

• Given a table of purchases
```purchase.date: (!10), (|!5)
purchase.item: (10 # `candy), (5 # `toys)
purchase.qty: 100 + 15 _draw 400
purchase.price: 20 + !15
```
form the itempurchase table which holds each item with all the purchase, qty and price information.
```itempurchase.item: ? purchase.item / remove duplicates
part: = purchase.item / gives itemsets in the same order as above
itempurchase.qtys: purchase.qty[part]
itempurchase.price: purchase.price[part]
itempurchase.date: purchase.date[part]
```

# Recursion and Cousins

• Recursion in K works as you would expect: just put the function inside itself. Here is the factorial function: n! = 1 * 2 * 3 * ... * n
• ```fact:{[n]
if[n < 2
:1
]
:n*fact[n-1]
}
```
• ```fact:{[n]
:[n > 0
n*fact[n-1] / the initial : is a return
1] }
```
• There is another form though in which you use the symbol _f for the recursive form. (Since the underbar is already overloaded in K, I don't like this so much.)
• ```fact:{[n]
:[n > 0
n*_f[n-1]
1] }
```

# Executing Strings

• One of the many advantages of interpretation is that one can easily generate functions on the fly. The operation that does this is the simple period. For example the following yields 25.
```f:{[x] x*x}
str: "f"
. str
```

# Examples

• Delete all blanks from a string.
• ```deleteallblanks:{[s]
x: s = " " / gives binary 1 or zero saying whether s is blank or not.
i: & ~ x / gives all indices that are not blank
s[i]}
```
• Set Manipulation Examples
```/ returns one if x is a subset of y
subset:{[x;y] (#y) > |/ y ?/: x}

/finds intersection of two lists
intersect: {[x;y]
x,: ()
y,: ()
if[(#x) < (#y)
i: x ?/: y
j: & i < #x
:x[?i[j]]
]
i: y ?/: x
j: & i < #y
:y[?i[j]]
}

/finds indexes in x and y that intersect
/ If x and y are both sets, then the results will be of the same length
/ fastest of all
intersectindexes: {[x;y]
i: x ?/: y / where each y hits
j: & i < #x / those ys that hit
:(i[j];j)
}

```
• Statistics:
```
avg:{(+/ x) % # x}
var:{avg[_sqr x] - _sqr avg[x]}
std:{_sqrt var[x]}
cov:{avg[x * y] - avg[x] * avg[y]}
corr:{ (cov[x;y])%((std[x]) * (std[y]))}
```
• A simple earliest deadline scheduler that tells you whether the schedule in the data portion is feasible or not.
```
/ An earliest deadline first scheduler.
/ is enough time to execute all of them before their deadlines.

/ APPLICATION-SPECIFIC

}

/ DATA

tasktimes: 2 3 1 4 2
deadlines: 4 9 8 8 19

/ EXECUTION

edd[tasktimes; deadlines] / returns 1 if all are schedulable and 0 otherwise

```

# Unit 3: Interprocess Communication

• In K, the basic interprocess communication paradigm consists of a client and a server (though each may take on both roles).
• One starts the server with a port number on a well-known machine.
• The client can then invoke procedures on that machine-port combination.

# Mechanics of IPC

• Set up a server. It will respond to synchronous remote procedure calls through a function that happens to be called
.m.g
The port we have chosen is 1234.
```k server.k -i 1234
```
• Inside the client, calls to the server can be done as follows:
```retval: (`machinename; 1234) 4: (`functname; (`param1; `param2))
```
Thus, one is calling the other machine with two arguments: a function name and a list of parameters.

# Example server.k

```
/ factorial
fact:{[n]
:[n > 0
:n*fact[n-1]
:1]
}

x: ((-b) + _sqrt((b^2) - 4*a*c)) % (2*a)
y: ((-b) - _sqrt((b^2) - 4*a*c)) % (2*a)
:(x;y)
}

/ responds to messages one at a time;
/ buffering is automatic.
/ list is the function. list is the list of arguments.
.m.g:{[list]
args: list
if[list = `fact
:fact[args]
]
]
}
```

# Example client.k

```
machine: _h	/ illustrates case where client and server
/ run on the same machine
machine: `

ret: (machine; 1234) 4: (`fact; ,5)
ret / should be 120

ret: (machine; 1234) 4: (`quad; (1; 4; 0))
ret / should be 0 -4
```
• It is also possible to do asynchronous comunication with a server using 3: instead of 4:. See the reference manual

# Java calling K

• To start, you need a subdirectory (below where the java program is) called k that has c.class. That is the interface for Java.
• Now, run the k server at a certain port. e.g.
```k den2 -i 1235
```
• The java program den2.java knows about that port. So, after compiling den2.java, run
```java den2
```
• Typically, you will run a function with arguments and get a response as a vector. In this case den2.k returns the size and reverse of the vector sent in. Note that if this doesn't work, then check that (i) c.class is in a subdirectory called k immediately below where den2.java is (ii) you executed k den2 -i 1235 before you executed java den2 (iii) they are both on the same machine
• Here are the programs den2.java and den2.k

# Unit 4: Database System: KDB/KSQL

• KDB is a database system implemented on top of the K language environment. Data structures (e.g. tables) can be interchanged between the two and functions can be called in both directions.
• KDB supports an SQL dialect called KSQL. KSQL is easy to learn (for anyone fluent in SQL) and carries over the speed and functionality of K to large data manipulation. KDB also supports most of standard SQL.
• The basic data structure in KSQL is the arrable (array-table) which is a table whose order can be exploited.
• Arrables are non-first-normal form objects: a field of a record can be an array. For example, an entire time series can be stored in a field.
• Like most modern SQLs, KSQL allows the inclusion of user-defined functions inside database statements. Unlike other SQL's KSQL allows functions to be defined over arrays as well as scalars.
• Like classical SQL, KSQL has aggregates, grouping, selections and string matching, and so on.
• KSQL adds many useful functions to SQL, permitting economical expression and often better performance by exploiting order. For example, finding the second highest value is a linear time operation in KSQL but requires a self-join in SQL, which is only sometimes linear time.
• KDB can function as a high performance distributed server with full recovery and distributed consistency. (KDB guarantees consistency by using ordered atomic broadcast and a replicated state machine design rather than two phase commit.)
• KDB is widely used in finance for time series analysis.

# KSQL Basics -- an extended example

• Create tables (arrables, really, but we'll conform to standard terminology, even though it's slightly inexact) either through K or within .t scripts. This example uses a mixture of the two.
```/ Calling KSQL from K

\l db / running from K, just load the library

`"build stock"
stock.name: `Alcatel `Alstom `"Air France" `"France Telecom" `"Snecma" `Supra
stock.industry: `telecom `engineering `"aviation" `"telecom" `"aviation" `consulting

.d.r"stock: 'name'key stock"

.d.r"stock['Alcatel'].industry"

" stock:n rand stock.name,"
" price:50 + .25 * n rand 200,"
" amount:100 * 10 + n rand 20,"
" date:date['1998-01-01'] + n rand 449)")

.d.r t1

.d.r t2

/ now it's time to calculate 6 month hi, low, average, volume
/ for a given stock.
getstats:{[name;date]
x: " select first stock, hi: max price, low: min price, "
x,: " avgprice: avg price, vol: sum amount "
x,: " from trade where stock = '"
x,: (\$name)
x,: "', date within (date['"
x,: (\$date)
x,: "'], date['"
x,: (\$date)
x,: "'] + 182)"
res: .d.r x
:res
}

res: getstats["Alcatel";"1998-01-01"]

.d.r "show'res'"

```
• Let's go through this line by line.

The first line is just a comment. K uses slashes instead of pound signs.

```/ Calling KSQL from K
```
• The following line loads the KDB library.
```\l db / running from K, just load the library
```
• The following builds a table in K, preceded by the emission of a string saying that we are building the stock table.
````"build stock"
stock.name: `Alcatel `Alstom `"Air France" `"France Telecom" `"Snecma" `Supra
stock.industry: `telecom `engineering `"aviation" `"telecom" `"aviation" `consulting
```
• Now we specify that the field name is a key of the stock table within KSQL. Any time we want to call a KSQL function, we invoke that function as a string argument to .d.r.
```.d.r"stock: 'name'key stock"
```
• Having established name as a key, we can use it as an index as we would normally in KSQL.
```.d.r"stock['Alcatel'].industry"
```
• Next we build the trade table. Notice that the assignment operator (colon) works in both K and KSQL. Notice also that we first construct a string in K in the variable t1, then we invoke .d.r on the string.
```

" stock:n rand stock.name,"
" price:50 + .25 * n rand 200,"
" amount:100 * 10 + n rand 20,"
" date:date['1998-01-01'] + n rand 449)")

.d.r t1
```
• Now, we sort trade and then show the table in KSQL format.
```
.d.r t2

```
• Now we establish a function in K to compute various statistics about stocks.
```
/ now it's time to calculate 6 month hi, low, average, volume
/ for a given stock.
getstats:{[name;date]
x: " select first stock, hi: max price, low: min price, "
x,: " avgprice: avg price, vol: sum amount "
x,: " from trade where stock = '"
x,: (\$name)
x,: "', date within (date['"
x,: (\$date)
x,: "'], date['"
x,: (\$date)
x,: "'] + 182)"
res: .d.r x
:res
}
```
• Next, we call that function and show the result.
```
res: getstats["Alcatel";"1998-01-01"]

.d.r "show'res'"

```

• The above implementation is reasonably classic, but it has some performance problems. The following takes too long.
```m: 500
do[m
res: getstats["Alcatel";"1998-01-01"]
]
("These "), (\$m), (" getstats requests required "),(\$ _t - start), (" seconds.")
```
• An alternative implementation is to store the data by stock, so we have vectors for prices, amounts, and dates. We do this in the statement:
```.d.r "tradeseq: select price, amount, date by stock from trade"
```
• The time shows this to be much faster. The same trick can be used to do data mining on stock sequences, e.g. to find correlations among stock price movements (pairs trading).

# KSQL and Interprocess Communication

• K supports interprocess communication very easily and therefore so does KSQL.
• The server declares a port that clients will use:
```\m i 1234
```
• The receiver of a request in a server is a function with the reserved name
` .m.g `
(the dots represent a kind of directory structure in the namespace; the first dot is the root).
• ```
/ Receive communication from another process
.m.g:{[pair]
x: getstatsseq[\$pair; \$pair]
:x
}
```
• The client code then sends a request
```machine: ` / local machine

server: 3: machine,1234 / declare a handle

x: server 4: (`Alcatel; `"1998-01-01")
`show \$ `x
```
• The overhead of interprocess communication is quite small (e.g. 500 requests require 6 seconds instead of 4). This is similar to the overhead for web requests.

# KSQL and the Web

• K programs can listen directly on port 80. When a signal is received on that port, it goes to a function named .m.h. Here is the .m.h function for our example.
• ```/ receive from web
.m.h:{[string]
i: string _ss "stock"
string: *(i+6) _ string
i: string ? "&"
name: string[!i]
s: #string
date: string[(s-10) + !10]
res: getstats[name;date]
vals: ,+res[]
out,: ,/tablehtmlize[vals]'!#vals
:out
}
```
• This function parses its input and then calls a normal K function (which may in turn calls KSQL). The result is returned to the browser as a string.

Stock:
Start Date:

# Conclusion

• K and KDB have three advantages over other database and language environments:
• expressive (time series, large data, math)
• fast
• (as a consequence) no need to interface several different languages
• K is best suited to mathematical sophisticates.
• KDB is a library in K and can be learned by anyone familiar with SQL, though K expertise is necessary too.
• K and KDB can interact with classical database management systems, e.g. Sybase, Oracle, etc. via ODBC or special purpose interfaces (such as Oracle's array facility).
• Other programming systems (Java, Visual Basic, Web browsers) can call K and KDB via interprocess communication or through sockets.