/ Read the table gene|chromo|position
/ and one or more files of list of genes.
/ For each such file, find the number in each chromosome.
/ Then sort within each chromosome and see how many are within a
/ threshold distance from their neighbors.
/ Get the number within threshold and the average and standard deviation.
/ Do so also for each chromosome.
/ Now take a random set from all genes having the same number from each
/ chromosome and make the same calculation.

/ BASIC ROUTINES

/ find difference between list[0] and list[1]
listdiff:{[list]
  :differ[list[0]; list[1]]
}

/ returns one if x is a subset of y
subset:{[x; y]
  i: y ?/: x
  : ~ (#y) _in i
}

/ returns one if x is a subset of y
subset:{[x;y] (#y) > |/ y ?/: x}

differ:{[x;y]
  x,: ()
  y,: ()
  i: y ?/: x
  j: & i = #y
  :?x[j]
}

/ A faster difference, yielding indexes in x that differ from y
differindexes:{[x;y]
  i: y ?/: x
  j: & i = #y
  :j
}

/finds intersection of two lists
/ fastest of all
intersect: {[x;y]
  x,: ()
  y,: ()
  i: x ?/: y
  :x[(?i) _dv #x]
}

/finds intersection of two lists
/ fastest of all
intersect: {[x;y]
  x,: ()
  y,: ()
  if[(#x) < (#y)
    i: x ?/: y
    j: & i < #x
    :x[?i[j]]
  ]
  i: y ?/: x
  j: & i < #y
  :y[?i[j]]
}

/ returns one if x and y have at least one element in common
hasintersect: {[x;y]
  x,: ()
  y,: ()
  i: x ?/: y
  : (&/i) < #x
}

/ x is a proper subset of y
propersubset:{[x;y]
  x,: ()
  y,: ()
  if[~ (#x) < (#y); :0] / must be smaller
  :(#x) = (#intersect[x;y])
}

/ x is a proper subset of y
propersubset:{[x;y]
  x,: ()
  y,: ()
  if[~ (#x) < (#y); :0] / must be smaller
  :subset[x;y]
}

/finds indexes in x and y that intersect
/ If x and y are both sets, then the results will be of the same length
/ fastest of all
intersectindexes: {[x;y]
  i: x ?/: y / where each y hits
  j: & i < #x / those ys that hit
  :(i[j];j)
}

/finds indexes in x that intersect with y
intersectleftindexes: {[x;y]
  i: x ?/: y / where each y hits
  j: & i < #x / those ys that hit
  :i[j]
}

/finds intersection of two lists
/ and returns index pairs of matches.
/ Assumes no duplicates
/ in either list
intersectbothindexes: {[x;y]
  x,: ()
  y,: ()
  i: x ?/: y
  pairs: (i ,' (!#y))
  k: & pairs[;0] < #x
  :pairs[k]
}

/ intersect many lists
multiintersect:{[lists]
  size: #lists
  if[2 > size; :lists]
  first: lists[0],()
  jj: ,/ ?:' first (?/:)/: lists[1+ !(size-1)] / find indexes in first
  x: @[(1+#first) # 0; jj; + ; 1]
  x: (-1) _ x / delete missing entry
  kk: & x = size - 1
  :first[kk]
}

/ this is a set intersection so we remove duplicates
multiintersect:{[lists]
  size: #lists
  if[2 > size; :lists]
  first: lists[0],()
  jj: first ?/: (,/ ?:' lists[1+ !(size-1)]) / find indexes in first
  x: @[(1+#first) # 0; jj; + ; 1]
  x: (-1) _ x / delete missing entry
  kk: & x = size - 1
  :first[kk]
}

avg:{(+/ x) % # x}

avgpres:{[x]
  c: #& x = `P
  c+: 0.5 * #& x = `M
  :(~ 2 > c)
}

var:{avg[_sqr x] - _sqr avg[x]}
std:{_sqrt var[x]}
cov:{avg[x * y] - avg[x] * avg[y]}
corr:{ (cov[x;y])%((std[x]) * (std[y]))}

/ delay based search
corrdelay:{[delay;x;y]
  x: (-delay) _ x
  y: delay _ y
  (cov[x;y])%((std[x]) * (std[y]))}

/ tnopairevaluate[red; green]
/ given a set of red values and green values
/ determine whether inductive or repressive (i.e. red vs. green)

/ END BASICS

/ FILE INPUT

/ Table schemas are written:
/ with a leading number sign, so # R| A |B| Car |Salary
/ for a table R(A,B,Car,Salary).
/ The values are written without a number sign
/ Interior blanks are significant, but leading and trailing blanks in
/ a field are not.
/ Null values should be represented by having only blanks in a field.
/ Tables are separated by a single blank line
inputfromfile:{[filename]
  a: 0: filename
  globalvals:: ()
  oldglobalvals:: () / should not be necessary
  justempty:: 1 / as if we've just seen an empty line. should process table
  a,: ,,""
  processline'a
}

/ a list is numeric if each member is either empty, a period
/ or a digit
isnumeric:{[list]
  s: ,/list
  if[0 = #s; :0]
  nums: "1234567890."
  if[~ s[0] _in ("-"),nums; :0]
  x: nums ?/: 1 _ s
  :(#nums) > |/x / if greater then everything in s met its match
}

/ parses a field based on vertical bars
getfields:{[line]
  i: line = "|"
  j1: &i
  j2: &~i
  line @:j2
  size: #j1
  :(0,(j1 - !size)) _ line
}

/ get rid of blanks at either end of the string
delendblanks:{[string]
  if[0 = #string; :""]
  if[string ~ ,"" ; :""]
  i: & ~ string = " "
  if[(#string) = (#i); :string]
  if[0 = (#i); :""]
  string: (- ((#string) - (1 + *|i))) _ string
  :(*i) _ string
}

/ convert all characters to lower case
lower:{[let]
  if[0 ~ #let
    :let
  ]
  v: _ic let
  if[(64 < v) & (v < 91)
    v+: (97-65)
    :_ci v
  ]
  :let
}

/ Handles one line of input at a time according to table schema above.
/ A little more space efficient than the previous version.
processline:{[line]
  emptyorblank:{[line] (0 = #delendblanks[line])}
  emptyflag: emptyorblank[line]
  if[~emptyflag
    justempty:: 0
    if[line[0] = "#"
      newline: delendblanks'getfields[1 _ line]
      if[(1 < #newline)
        globaltable:: *newline
        globalatt:: 1 _ newline
        globalvals:: () / initialize
      ] / newline
    ] / table declaration
    if[~ line[0] = "#"
      newline: delendblanks'getfields[line]
      if[(1 < #newline) | (0 < #newline[0])
        globalvals,: ,newline
      ]
    ] / end of test on data line
  ] / end of test on line is non-empty
  if[emptyflag & (0 = justempty)
    justempty:: 1
    / if[ ((#globalatt) > 1) | (~ (#globalatt) = (^globalvals)[0])
    /   globalvals:: + globalvals / take transpose
    / ]
    if[~ (#globalatt) = (#globalvals[0]); !-1]
    numericflag:()
    hh: 0
    while[hh < #globalatt
      numericflag,: isnumeric[globalvals[;hh]]
      hh+: 1
    ]
    inindex:: 0
    while[inindex < #globalatt
      / table.att[2] :: ` $ vals[3]
      string1: globaltable, (".")
      string1,: globalatt[inindex]
      teststring: ("0 = #"), string1
      / For big sizes, uncomment following two statements
      / User will get an error message, but it will work.
      / x: @[.: ; teststring; :]
      / currentlyempty: *x / error means empty
      / For big sizes, comment following
      currentlyempty: 1
      / if empty, then assign, else append
      string2: :[currentlyempty; (":: "); (",: ")]
      string3: :[0 = numericflag[inindex]
        "` $ "
        :[1 = numericflag[inindex]; "0.0 $ "; "0 $ "]]
      string3,: "globalvals[;inindex]"
      . string1, string2, string3
      if[1 = numericflag[inindex]
        globalindexes:: & 0N = . string1
        string3: string1, ("[globalindexes] :: `")
        . string3
      ]
      inindex+: 1
    ]
  ]
}

/ DUMP TABLE dumptable

/ formstring takes a list and makes a string
formstring:{[list]
  list,: ()
  : (-1) _ ,/ ($list) ,\: (" ")
}

formstringvertbar:{[list]
  list,: ()
  : (-1) _ ,/ ($list) ,\: ("|")
}

formstringcomma:{[list]
  list,: ()
  : (-1) _ ,/ ($list) ,\: (",")
}

/ Output a table (a variable) to a text file outfile (string)
/ e.g. dumptable[`guide; guide; "foobar"]
dumptable:{[tablename; table; outfile]
  out: ,("# "), ($tablename), ("|"), formstringvertbar[!table]
  first: *!table
  numofelements: . ("#"), ($tablename), ("."), ($first)
  i: 0
  while[i < numofelements
    list: table[;i]
    x: formstring'list
    out,: , (-1) _ ,/x ,\: ("|")
    i+: 1
  ]
  outfile 0: out
}

dumptablecsv:{[tablename; table; outfile]
  out: , formstringcomma[!table]
  first: *!table
  numofelements: . ("#"), ($tablename), ("."), ($first)
  i: 0
  while[i < numofelements
    list: table[;i]
    x: formstring'list
    out,: , (-1) _ ,/x ,\: (",")
    i+: 1
  ]
  outfile 0: out
}

/ APPLICATION SPECIFIC

/ Given a set of genes, sort them by chromosome and by position.
/ Then for each chromosome find the minimum distance to a neighbor and
/ the average and standard deviations of those minimum distances.
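/ Worked example of the nearest-neighbor calculation (illustrative values
/ only; assumes the global thresh is 5):
/   sorted positions:               10 12 30 33 40
/   gaps between neighbors:          2 18  3  7
/   nearest-neighbor distance:       2  2  3  3  7
/   genes within threshold (<= 5):   4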
/ Given a set of positions, find the number within threshold
computestats:{[positions]
  x: < positions
  positions@: x
  diffs: -': positions
  y: diffs[0], ((1 _ diffs) &' ((-1) _ diffs)), diffs[(#diffs)-1]
  z: #& y < (thresh+1)
  :z
}

whichgood:{[genelist; positions]
  x: < positions
  sortedgenes: genelist[x]
  positions@: x
  diffs: -': positions
  y: diffs[0], ((1 _ diffs) &' ((-1) _ diffs)), diffs[(#diffs)-1]
  z: & y < thresh+1
  :sortedgenes[z]
}

/ get stats for this position
/ Then compute 1000 times for this chromo and get the same stats.
findcounts:{[genelist; mypositions; mychromo]
  real: computestats[mypositions]
  goodones:,mychromo
  goodones,: whichgood[genelist; mypositions]
  numreal: #mypositions
  ii: & gcp.chromo = mychromo
  allpositions: gcp.position[ii]
  out: ()
  i: 0
  while[i < 1000
    tmppos: allpositions[numreal _draw -#allpositions]
    if[(#tmppos) > (#?tmppos); !-11]
    out,: , computestats[tmppos]
    i+: 1
  ]
  zavg: avg[out]
  zstd: std[out]
  allout: ,mychromo
  allout,: ,("Real data (num within threshold, number of genes):"; $real;$numreal)
  allout,: ,("Randomized averages (num within; std of num within, num of genes within chromo): ";$zavg; $zstd; $numreal)
  outsorted: out[