/ any orderids in a single row are considered to be related. / all rowids having an orderid anywhere should be in one group / Classic connected component. Union-find algorithm. / Stratey: / for each orderid, I would create an initial dictionary consisting / of all the rowids having that orderid. / Initial connected components are single rowids. razefunc:{[mykeyfield] :raze mykeyfield } / uses keyparentdepth to find the root of keyval and compresses / along the way. recurfind:{[keyval] keysontheway: (); k: keyval; while[(not k = (keyparentdepth[k])[`parent]); keysontheway,: k; k: (keyparentdepth[k])[`parent]; ]; i = 0; while[i < count keysontheway; (keyparentdepth[keysontheway[i]]):: (k;0); i+: 1; ]; :(k; (keyparentdepth[k])[`size]) } / merge keys into key having max size merge:{[keyvallist] res: recurfind each keyvallist; maxsize: max res[;1]; i: res[;1] ? maxsize; k: keyvallist[i]; / key val having max size (keyparentdepth[k]):: (k; sum res[;1]); / sum all the sizes i: 0; while[i < count keyvallist; (keyparentdepth[keyvallist[i]]):: (k;0); i+: 1; ]; } spitcomma:{[list] x: ` $ (-2) _ raze (string each list) ,\: ", "; :x } / DATA / Randomly generated data / generated data (first n keyfields should be disjoint from the second n) n: 500 orderid: ` $' raze each string each (`a) ,/: 2000 + n ? 1000 parentorderid: ` $' raze each string each (`a) ,/: 2000 + n ? 1000 replacedbyorderid: ` $' raze each string each (`a) ,/: 2000 + n ? 1000 events: n ? `an`an`rn`rn`rn orderid,: ` $' raze each string each (`a) ,/: 5000 + n ? 1000 parentorderid,: ` $' raze each string each (`a) ,/: 5000 + n ? 1000 replacedbyorderid,: ` $' raze each string each (`a) ,/: 5000 + n ? 1000 events,: n ? `an`an`rn`rn`rn / original data from class, slightly extended orderid: `x`y`z`a`c`q`r`s parentorderid: `y```b`b`r`s`w replacedbyorderid: ```y`````y events: `an`an`rn`rn`rn`rn`rn`rn myorder:([] orderid; parentorderid; replacedbyorderid; events) update keyfield: til count myorder from `myorder; show myorder tmp: 0!select keyfield by orderid from myorder; tmp2: 0!select keyfield by parentorderid from myorder; tmp3: 0!select keyfield by replacedbyorderid from myorder; tmp2: `orderid`keyfield xcol tmp2 tmp3: `orderid`keyfield xcol tmp3 tmp,: tmp2 tmp,: tmp3 show tmp basicset: select razefunc[keyfield] by orderid from tmp where not orderid=` show basicset show value basicset allkeys: exec keyfield from myorder keyparentdepth:([allkeys] parent: allkeys; size: (count allkeys) # 1); show keyparentdepth i: 0; b: value basicset; b: b[`keyfield]; allkeyvals: distinct raze b; b: b[where 1 < count each b]; while[i < count b; merge[b[i]]; i+: 1; ]; allpartitions: recurfind each allkeyvals; g: group allpartitions; x: asc asc each allkeyvals[value g]; linkedgroupid: 1 + til count x; rowids: spitcomma each x; vladres:([linkedgroupid] rowids) save `:vladres.csv show vladres