These are the instructions that are found in the README in the archive:
We chose hand poses from the American Sign Language (ASL) alphabet, as these are well known and easy for humans to tell apart.
Our goal was to achieve classification power similar to that of Freeman et al. ('94).
We implemented a robust version of the Lucas-Kanade (L-K) algorithm, incorporating an image pyramid and the full 6-dimensional affine transform (rotation, non-uniform scale, skew, translation). A few changes to the implementation then improved both efficiency and accuracy. For efficiency, we used only one slice of the pyramid, taken from near the top. This gave identical results but avoided running L-K at high resolutions, saving much computation. For accuracy, we constrained the affine transform to four dimensions: rotation, uniform scale and translation in X and Y. The two dimensions dropped, non-uniform scale and skew, were not meaningful in our domain. Below are the original 6D parameterization and the 4D parameterization used:
Original 6D:
4D:
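For reference, one standard way to write the two warps is sketched below. This is an assumption: the symbols $p_1,\dots,p_6$, $a$, $b$, $s$, $\theta$, $t_x$ and $t_y$ follow a common Lucas-Kanade convention and may not match the authors' own notation or parameter ordering.

$$
W_{6D}(x, y) =
\begin{pmatrix} 1 + p_1 & p_3 & p_5 \\ p_2 & 1 + p_4 & p_6 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
$$

$$
W_{4D}(x, y) =
\begin{pmatrix} a & -b & t_x \\ b & a & t_y \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad a = s\cos\theta, \quad b = s\sin\theta
$$

In this form the 4D warp is the 6D warp with the constraints $1 + p_1 = 1 + p_4 = a$ and $p_2 = -p_3 = b$, which removes exactly the non-uniform scale and skew components while keeping rotation $\theta$, uniform scale $s$ and translation $(t_x, t_y)$.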
Our system selects the pose with the lowest affine transform error, rejecting any pose whose L-K match contains more than 45 degrees of rotation. The rotation component was calculated from the parameters of the affine transform. This step was useful for rejecting matches between letters that are similar to 90 degree rotations of each other, such as U and H or D and G.
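The selection rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the linear part of the 4D transform has the form `[[a, -b], [b, a]]`, so that rotation is `atan2(b, a)` and uniform scale is `hypot(a, b)`, and it assumes a `NaN` entry means the match failed and should be skipped. The function names are hypothetical.

```python
import math

# Rejection threshold stated in the text: 45 degrees, in radians.
MAX_ROTATION = math.radians(45)


def rotation_and_scale(a, b):
    """Recover (rotation in radians, uniform scale) from the linear part
    [[a, -b], [b, a]] of a 4-parameter transform (assumed form)."""
    return math.atan2(b, a), math.hypot(a, b)


def select_pose(labels, sse, rotation):
    """Return the label with the lowest error among matches whose rotation
    is at most 45 degrees. NaN entries are skipped (assumption: NaN marks
    a failed match). Returns None if no prototype qualifies."""
    best_label, best_err = None, math.inf
    for label, err, rot in zip(labels, sse, rotation):
        if math.isnan(err) or math.isnan(rot):
            continue  # no usable match for this prototype
        if abs(rot) > MAX_ROTATION:
            continue  # too much rotation: likely a rotated look-alike
        if err < best_err:
            best_label, best_err = label, err
    return best_label


# Error and rotation rows for the "C" input from the first results table.
labels = ["B", "C", "H", "I", "L", "O", "Y"]
nan = float("nan")
sse = [0.7202, 0.3680, 1.1832, 0.9756, 0.9728, nan, 0.9081]
rot = [-0.5493, 0.0654, 0.3924, 0.2186, 0.3268, nan, 1.0245]
print(select_pose(labels, sse, rot))
```

Applying the rotation test before the error comparison means a rotated look-alike can never win, even when its error is the lowest.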
| Prototype Image | Character Represented |
| --- | --- |
| ![]() | B |
| ![]() | C |
| ![]() | H |
| ![]() | I |
| ![]() | L |
| ![]() | O |
| ![]() | Y |
The first test of the system was to classify images of ASL characters formed by one person's hand (Chris's) using prototype images of a different person's hand (Yotam's). Beyond differing in the owner of the hand, the images differed slightly in scale, rotation, translation and blur. The system achieved 100% accuracy on this test. The results are below. The Error column shows, for each input image, the match error and affine parameters of that image matched against each prototype. The rows of the error data are: sum squared error, rotation (in radians), uniform scale, X translation, Y translation. The columns follow the prototype order B, C, H, I, L, O, Y.
Some types of poses were prone to misclassification. These included F (which sometimes matched I) and V (which sometimes matched B). Small differences in the angles between spread fingers could cause a large error, because the images are transformed rigidly.
| Input Image | Error | Detected Character |
| --- | --- | --- |
| ![]() | 0.2844 0.9904 0.6007 0.9974 1.0646 0.9498 1.1440<br>-0.0016 0.5045 -1.2475 -0.1536 -0.1530 0.7278 0.4386<br>0.8869 0.7426 1.3248 1.1665 0.7149 1.0250 0.9093<br>27.2307 33.1854 295.2557 12.4365 82.0930 -92.4592 -18.3593<br>-1.1342 163.3314 -167.3762 -65.5079 41.0154 196.2922 132.7605 | B |
| ![]() | 0.7202 0.3680 1.1832 0.9756 0.9728 NaN 0.9081<br>-0.5493 0.0654 0.3924 0.2186 0.3268 NaN 1.0245<br>1.0699 0.9255 1.9342 1.4083 0.7440 NaN 1.0355<br>112.7547 1.8803 -430.6409 -169.6370 8.3578 NaN -68.4280<br>-175.8490 29.9229 -57.0745 -41.8144 112.0041 NaN 285.3886 | C |
| ![]() | 0.9681 1.9830 0.4725 1.7153 1.6288 NaN 1.6466<br>0.8592 -0.6757 -0.3240 0.5618 0.7148 NaN -0.9563<br>0.7654 0.5279 1.0737 0.9914 0.6407 NaN 0.7017<br>26.3088 218.7390 50.5930 -7.4097 52.0737 NaN 265.3747<br>169.2504 96.5264 -67.2038 118.2394 178.6687 NaN 17.9591 | H |
| ![]() | 0.6165 1.1336 0.7930 0.5548 1.0832 0.7795 1.1218<br>0.3663 0.7422 -0.9998 0.2627 -0.3515 0.9754 0.1871<br>0.9418 0.8046 1.4460 1.2693 0.8102 1.1139 0.9552<br>-43.9059 -3.5811 228.4342 -116.3866 101.7682 -132.4740 -8.7700<br>47.6657 201.8326 -250.1862 24.5908 -0.7030 266.1583 68.4542 | I |
| ![]() | 0.8819 1.2189 1.1504 0.8185 0.7901 1.1445 0.8321<br>0.3253 -1.1514 0.2534 0.8713 -0.1067 0.9847 0.3320<br>0.8939 0.7541 1.6536 1.2534 0.8632 1.0507 0.9857<br>7.9303 311.5853 -237.4978 -107.2779 62.8808 -84.4763 -6.1355<br>26.3377 33.3125 -40.3611 194.4092 19.5147 230.9141 74.5863 | L |
| ![]() | 0.9898 1.8700 0.9399 1.3024 1.3733 0.3575 1.6205<br>-0.5956 -1.6059 1.4127 -0.9491 0.4965 0.0888 -0.3901<br>0.7221 0.5844 1.1382 0.9828 0.6580 0.8459 0.7045<br>146.6489 333.6751 -89.6416 215.1709 34.8765 -14.5734 122.1419<br>-42.6263 78.5622 415.1307 -77.2012 174.1307 68.0241 38.0921 | O |
| ![]() | 0.9037 1.0273 1.1418 0.5715 0.8768 0.7682 0.4816<br>0.1415 -0.6829 0.4567 -0.1857 -0.5189 0.7822 0.2710<br>0.9504 0.7450 1.5035 1.2684 0.7766 1.1028 0.9956<br>-5.7651 198.8591 -275.3352 11.7439 150.5548 -144.3544 -28.1365<br>-28.0730 -8.4617 65.8603 -124.3748 -21.0336 177.0913 60.4935 | Y |
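The layout of the error data (five rows, one column per prototype in the order B, C, H, I, L, O, Y) can be made explicit with a small parser. The following is a minimal sketch, not part of the original system; the function name `parse_error_block` and the field names are hypothetical. It uses the error data for the L input from the table above.

```python
import math

PROTOTYPES = ["B", "C", "H", "I", "L", "O", "Y"]
# Row order as described in the text; field names are illustrative.
FIELDS = ["sse", "rotation_rad", "scale", "tx", "ty"]


def parse_error_block(text):
    """Parse a 5-row by 7-column block of numbers into
    {prototype: {field: value}}. float() also accepts 'NaN'."""
    rows = [[float(tok) for tok in line.split()]
            for line in text.strip().splitlines()]
    if len(rows) != len(FIELDS):
        raise ValueError("expected %d rows, got %d" % (len(FIELDS), len(rows)))
    if any(len(row) != len(PROTOTYPES) for row in rows):
        raise ValueError("each row needs %d values" % len(PROTOTYPES))
    # zip(*rows) transposes the rows into one column per prototype.
    return {proto: dict(zip(FIELDS, column))
            for proto, column in zip(PROTOTYPES, zip(*rows))}


# Error data for the "L" input image, copied from the results table.
L_BLOCK = """
0.8819 1.2189 1.1504 0.8185 0.7901 1.1445 0.8321
0.3253 -1.1514 0.2534 0.8713 -0.1067 0.9847 0.3320
0.8939 0.7541 1.6536 1.2534 0.8632 1.0507 0.9857
7.9303 311.5853 -237.4978 -107.2779 62.8808 -84.4763 -6.1355
26.3377 33.3125 -40.3611 194.4092 19.5147 230.9141 74.5863
"""

parsed = parse_error_block(L_BLOCK)
for proto in PROTOTYPES:
    match = parsed[proto]
    print(proto, match["sse"], round(math.degrees(match["rotation_rad"]), 1))
```

For this input, the I prototype has the second-lowest error (0.8185) but a rotation of 0.8713 radians, about 49.9 degrees, so it falls outside the 45 degree limit. The L prototype has the lowest error (0.7901) with a small rotation, which agrees with the detected character.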
We further tested the system by classifying the frames of a video of a hand changing from the character 'b' to the character 'c'. The system achieved 100% accuracy on this test as well, except for the last frame of the video, where the hand was being drawn off camera. There, it classified the pose as an 'o'; on close inspection of the image, the pose is in fact very similar to an 'o'.
The video from which frames were matched:
The frames and their matched characters:
| Input Image | Error | Detected Character |
| --- | --- | --- |
| ![]() | 0.2315 0.8269 0.4917 0.8699 0.8912 NaN 0.9927<br>0.0627 0.5954 -1.1908 -0.0304 -0.0858 NaN 0.4612<br>0.9440 0.7918 1.3941 1.2588 0.7676 NaN 0.9551<br>-13.3220 2.0665 277.5651 -60.2227 47.2159 NaN -48.3764<br>3.1173 184.9208 -219.3025 -58.2059 38.0796 NaN 143.3149 | B |
| ![]() | 0.2360 0.8200 0.4825 0.8543 0.8861 NaN 0.9858<br>0.0478 0.5792 -1.2039 0.0113 -0.1121 NaN 0.4571<br>0.9467 0.7921 1.3999 1.2934 0.7741 NaN 0.9649<br>-6.9923 5.5418 285.7872 -72.2970 53.4391 NaN -47.0611<br>-0.7588 179.0052 -213.0088 -55.1231 33.9328 NaN 137.9223 | B |
| ![]() | 0.2331 0.7830 0.4692 0.8297 0.8408 NaN 0.9581<br>0.0238 0.5590 -1.2256 -0.0053 -0.1526 NaN 0.4405<br>0.9687 0.8093 1.4190 1.3257 0.8000 NaN 0.9799<br>-9.4386 0.3957 295.9376 -77.3627 52.6438 NaN -49.6501<br>-7.2966 174.9156 -217.3260 -60.6000 25.7558 NaN 136.0789 | B |
| ![]() | 0.4487 NaN 0.5840 0.6294 0.7212 NaN 0.8625<br>-0.1410 NaN -1.4403 0.5265 -0.3128 NaN -0.1921<br>1.0103 NaN 1.5807 1.5035 0.8678 NaN 1.1891<br>17.7637 NaN 411.5200 -208.0178 79.4249 NaN 10.7178<br>-69.8155 NaN -224.8701 98.6651 -46.5509 NaN -95.9074 | B |
| ![]() | 0.6271 0.3373 0.9563 0.8294 0.8789 NaN 0.8152<br>-0.2483 0.0085 0.5092 0.4394 0.3327 NaN -0.2558<br>1.1348 0.9374 2.0785 1.3948 0.8324 NaN 1.2203<br>3.3688 4.7227 -515.9317 -222.3831 -23.3841 NaN 4.0956<br>-141.0683 15.0921 17.7880 21.7698 105.5926 NaN -148.7433 | C |
| ![]() | 0.6320 0.3173 0.6191 0.8643 0.9815 1.0224 0.9454<br>-0.4450 0.0722 1.4688 0.3561 0.2062 1.8064 -0.2069<br>1.0075 0.9002 1.4803 1.2977 0.7860 1.3701 1.1011<br>75.5127 -2.1146 -171.9201 -187.4126 -0.7702 -0.4009 0.4562<br>-134.1428 43.8327 508.2015 7.3578 91.1291 578.9326 -101.0209 | C |
| ![]() | 0.6170 0.3705 0.5948 0.8527 0.9539 0.9920 0.9165<br>-0.4464 0.0415 1.4462 0.3454 0.2104 1.7705 -0.2276<br>1.0203 0.9294 1.4957 1.3080 0.7873 1.3777 1.1113<br>77.2946 0.8803 -185.8058 -186.5085 -0.1551 -19.0938 7.0328<br>-138.5617 26.1558 500.2033 -1.2690 89.1348 572.7445 -110.4364 | C |
| ![]() | 0.6220 0.3601 0.5928 0.8511 0.9486 NaN 0.9008<br>-0.4498 0.0663 1.4192 0.3317 0.1983 NaN -0.2440<br>1.0299 0.9505 1.4963 1.3127 0.7922 NaN 1.1284<br>82.2588 -1.6597 -192.6861 -178.8627 4.6830 NaN 14.7467<br>-138.5045 32.8261 482.2535 -10.0949 85.0155 NaN -116.7623 | C |
| ![]() | 0.6340 0.3248 0.5933 0.8580 0.9541 NaN 0.9157<br>-0.4430 0.0249 1.4194 0.3201 0.1790 NaN -0.2331<br>1.0348 0.9527 1.5023 1.3145 0.7867 NaN 1.1146<br>77.0079 1.9997 -194.9434 -181.0192 6.4680 NaN 10.9310<br>-139.9989 24.6720 487.0524 -13.7534 82.2040 NaN -112.6144 | C |
| ![]() | 0.5882 0.3789 0.5638 0.8619 0.9016 0.9542 0.8794<br>-0.5063 0.0789 1.4348 0.3470 0.2285 1.8161 -0.2309<br>1.0224 0.9520 1.5174 1.3467 0.7940 1.4075 1.1520<br>94.0052 -15.1053 -197.5114 -203.6513 -5.7767 -7.3986 -2.7635<br>-150.2996 29.3795 505.4458 -5.7806 91.4929 593.9987 -125.3728 | C |
| ![]() | 0.4836 0.5194 0.4780 0.7469 0.6931 0.4718 0.9293<br>-0.4775 0.1763 1.4981 -0.6735 0.3669 -0.1088 -0.1074<br>1.0023 1.0401 1.5631 1.3167 0.8591 1.1637 0.9608<br>84.6326 -70.5265 -191.0026 123.7058 -43.8483 -72.3206 9.1248<br>-141.5466 43.7080 550.3107 -205.9016 118.2820 -70.9290 -7.8314 | O |