Sign Language (Pose) Recognition

Chris Pennock and Yotam Gingold
Final Project, Computer Vision and Modelling, Fall 2004
Instructor: Chris Bregler
12/16/2004

Source and Instructions:

An archive containing the source and the images necessary to replicate the results can be downloaded here.

These are the instructions that are found in the README in the archive:

Introduction:

We wanted a hand pose recognition system that does not use learning, but instead operates solely on a single prototype image for each pose to be detected. To this end, we used the Lucas-Kanade algorithm to match input images against prototypes by minimizing the error over affine flow.

We chose to use hand poses which were characters from American Sign Language (ASL), as these were well known and easily differentiable by humans.

Our goal was to achieve classification power similar to Freeman et al. ('94).

Related Work:

Freeman et al. ('94), in their paper Orientation Histograms for Hand Gesture Recognition, used a simple angle-binning technique to discriminate between approximately 10 hand poses. Erkan ('03) used orthogonal views of hand silhouettes and machine learning to identify gestures.

Methodology:

We made the following assumptions about the input images:

We implemented a robust version of the Lucas-Kanade algorithm, incorporating an image pyramid and the full 6-dimensional affine transform (rotation, non-uniform scale, skew, translation). We then improved both efficiency and accuracy with two changes to the implementation. To improve efficiency, we ran L-K on only one level of the pyramid, taken from near the top. This gave identical results while avoiding any L-K computation at high resolutions, which saved considerable computation. To improve accuracy, we constrained the affine transform to four dimensions: rotation, uniform scale and translation in X and Y. The two dimensions removed, non-uniform scale and skew, were not meaningful in our domain. Below are the original 6D parameterization and the 4D parameterization used:

Original 6D:

4D:
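As a rough illustration of the constrained fit, the sketch below performs one linearized Lucas-Kanade step for a 4-parameter flow model (translation, plus a combined rotation and uniform scale term). The parameter names a1 to a4, the function name `similarity_lk_step`, the use of averaged image gradients and the synthetic two-blob test image are all assumptions made for this example and are not taken from the archive. A complete matcher would repeat such a step with warping on the chosen pyramid level.

```python
import numpy as np

def similarity_lk_step(img_i, img_j, xs, ys):
    """One least-squares step for the assumed 4-parameter flow model
        u(x, y) = a1 + a2*x - a3*y
        v(x, y) = a4 + a3*x + a2*y
    so that img_j(x + u, y + v) is approximately img_i(x, y).
    Returns the array [a1, a2, a3, a4].
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    gy_i, gx_i = np.gradient(img_i)
    gy_j, gx_j = np.gradient(img_j)
    # Averaging the gradients of both images is an illustrative choice
    # that reduces linearization error for small motions.
    gx = 0.5 * (gx_i + gx_j)
    gy = 0.5 * (gy_i + gy_j)
    it = img_j - img_i  # temporal difference

    # Brightness constancy gx*u + gy*v + it = 0 is linear in a1..a4.
    A = np.stack([gx, gx * xs + gy * ys, gy * xs - gx * ys, gy], axis=-1)
    A = A.reshape(-1, 4)
    b = -it.ravel()
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params

def two_blobs(x, y):
    """Hypothetical smooth test image: two Gaussian blobs."""
    return (np.exp(-((x - 10) ** 2 + (y - 5) ** 2) / 72.0)
            + 0.7 * np.exp(-((x + 12) ** 2 + (y + 8) ** 2) / 50.0))

# Coordinates centered on the image so rotation and scale act about the middle.
ys, xs = np.mgrid[-32:32, -32:32].astype(float)
prototype = two_blobs(xs, ys)
shifted = two_blobs(xs - 0.4, ys + 0.3)  # content moved by (0.4, -0.3) pixels
params = similarity_lk_step(prototype, shifted, xs, ys)
print(params)  # a1 and a4 should be close to the applied shift
```

With only four unknowns instead of six, the fit has no way to reduce the error by shearing or stretching the hand along one axis, which is the effect the constraint is meant to achieve.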

Our system selects the pose with the lowest affine transform error, rejecting any pose whose L-K match contains more than 45 degrees of rotation. This rejection step was useful for ruling out matches between letters that resemble 90-degree rotations of each other, such as U and H or D and G. The rotation component of the affine transform was calculated using the following equation:
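One common way to recover the rotation from a rotation plus uniform scale transform is sketched below. It assumes the linear part of the transform is written as [[1 + a2, -a3], [a3, 1 + a2]], with a2 = s*cos(theta) - 1 and a3 = s*sin(theta); this parameterization and the function names are assumptions for illustration and may differ from the equation used in our implementation.

```python
import math

def rotation_and_scale(a2, a3):
    """Return (theta in radians, uniform scale s) for the assumed
    linear part [[1 + a2, -a3], [a3, 1 + a2]] = s * R(theta)."""
    theta = math.atan2(a3, 1.0 + a2)
    scale = math.hypot(1.0 + a2, a3)
    return theta, scale

def within_rotation_limit(theta, limit_deg=45.0):
    """True if the match rotates by no more than limit_deg degrees."""
    return abs(theta) <= math.radians(limit_deg)

# Example: a transform built from a known rotation and scale.
s, t = 1.2, 0.3
theta, scale = rotation_and_scale(s * math.cos(t) - 1.0, s * math.sin(t))
print(theta, scale, within_rotation_limit(theta))
```

Using `atan2` rather than a plain arctangent of a ratio keeps the sign and quadrant of the angle, which matters when testing against a limit such as 45 degrees.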

Results:

This system is able to differentiate between about 7 hand poses chosen for visual dissimilarity. This is comparable to the roughly 10 poses distinguished by Freeman et al. Below are the prototype images the system classified against. These are Yotam's hands, although the choice was arbitrary:
Prototype Image Character Represented
B
C
H
I
L
O
Y

The first test of the system was to classify images of ASL characters formed by one person's hand (Chris's) using prototype images of a different person's hand (Yotam's). Beyond differing in the owner of the hand, the images differed slightly in scale, rotation, translation and blur. The system achieved 100% accuracy on this test. The results are below. The Error column shows, for each input image, the sum squared error and affine parameters obtained by matching that image against each prototype. The rows of the error data are: Sum Squared Error, Rotation (in radians), Uniform Scale, X translation, Y translation. The columns follow the prototype order B, C, H, I, L, O, Y.

Some types of poses were prone to misclassification. These included F (which sometimes matched I) and V (which sometimes matched B). Because images are transformed rigidly, small differences in the angles between spread fingers could cause a large error.

Input Image Error Detected Character
    0.2844    0.9904    0.6007    0.9974    1.0646    0.9498    1.1440
   -0.0016    0.5045   -1.2475   -0.1536   -0.1530    0.7278    0.4386
    0.8869    0.7426    1.3248    1.1665    0.7149    1.0250    0.9093
   27.2307   33.1854  295.2557   12.4365   82.0930  -92.4592  -18.3593
   -1.1342  163.3314 -167.3762  -65.5079   41.0154  196.2922  132.7605
B
    0.7202    0.3680    1.1832    0.9756    0.9728       NaN    0.9081
   -0.5493    0.0654    0.3924    0.2186    0.3268       NaN    1.0245
    1.0699    0.9255    1.9342    1.4083    0.7440       NaN    1.0355
  112.7547    1.8803 -430.6409 -169.6370    8.3578       NaN  -68.4280
 -175.8490   29.9229  -57.0745  -41.8144  112.0041       NaN  285.3886
C
    0.9681    1.9830    0.4725    1.7153    1.6288       NaN    1.6466
    0.8592   -0.6757   -0.3240    0.5618    0.7148       NaN   -0.9563
    0.7654    0.5279    1.0737    0.9914    0.6407       NaN    0.7017
   26.3088  218.7390   50.5930   -7.4097   52.0737       NaN  265.3747
  169.2504   96.5264  -67.2038  118.2394  178.6687       NaN   17.9591
H
    0.6165    1.1336    0.7930    0.5548    1.0832    0.7795    1.1218
    0.3663    0.7422   -0.9998    0.2627   -0.3515    0.9754    0.1871
    0.9418    0.8046    1.4460    1.2693    0.8102    1.1139    0.9552
  -43.9059   -3.5811  228.4342 -116.3866  101.7682 -132.4740   -8.7700
   47.6657  201.8326 -250.1862   24.5908   -0.7030  266.1583   68.4542
I
    0.8819    1.2189    1.1504    0.8185    0.7901    1.1445    0.8321
    0.3253   -1.1514    0.2534    0.8713   -0.1067    0.9847    0.3320
    0.8939    0.7541    1.6536    1.2534    0.8632    1.0507    0.9857
    7.9303  311.5853 -237.4978 -107.2779   62.8808  -84.4763   -6.1355
   26.3377   33.3125  -40.3611  194.4092   19.5147  230.9141   74.5863
L
    0.9898    1.8700    0.9399    1.3024    1.3733    0.3575    1.6205
   -0.5956   -1.6059    1.4127   -0.9491    0.4965    0.0888   -0.3901
    0.7221    0.5844    1.1382    0.9828    0.6580    0.8459    0.7045
  146.6489  333.6751  -89.6416  215.1709   34.8765  -14.5734  122.1419
  -42.6263   78.5622  415.1307  -77.2012  174.1307   68.0241   38.0921
O
    0.9037    1.0273    1.1418    0.5715    0.8768    0.7682    0.4816
    0.1415   -0.6829    0.4567   -0.1857   -0.5189    0.7822    0.2710
    0.9504    0.7450    1.5035    1.2684    0.7766    1.1028    0.9956
   -5.7651  198.8591 -275.3352   11.7439  150.5548 -144.3544  -28.1365
  -28.0730   -8.4617   65.8603 -124.3748  -21.0336  177.0913   60.4935
Y
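The selection rule can be checked by hand against the table above. The sketch below applies it to the Sum Squared Error and Rotation rows of the third block (the input image of H). The function name `select_pose` is hypothetical, and skipping NaN entries is an assumption: the table does not say what NaN denotes, so it is treated here as a match with no usable fit.

```python
import math

LABELS = ["B", "C", "H", "I", "L", "O", "Y"]  # prototype order used in the table

def select_pose(sse, rotation, labels=LABELS, max_rotation=math.pi / 4):
    """Pick the label with the lowest sum squared error among matches
    whose rotation magnitude is at most max_rotation (radians)."""
    best_label, best_err = None, math.inf
    for label, err, rot in zip(labels, sse, rotation):
        if math.isnan(err) or math.isnan(rot):
            continue  # assumed: NaN marks a match without a usable fit
        if abs(rot) > max_rotation:
            continue  # reject matches rotated by more than 45 degrees
        if err < best_err:
            best_label, best_err = label, err
    return best_label

nan = math.nan
# Rows copied from the third block of the table (input image of H).
sse_h = [0.9681, 1.9830, 0.4725, 1.7153, 1.6288, nan, 1.6466]
rot_h = [0.8592, -0.6757, -0.3240, 0.5618, 0.7148, nan, -0.9563]
print(select_pose(sse_h, rot_h))  # prints H
```

In this block the B and Y matches are rejected for exceeding 45 degrees (about 0.785 radians) of rotation, and H has the lowest error among the remaining candidates.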

We further tested the system by classifying the frames of a video of a hand changing from the character 'B' to the character 'C'. The system classified every frame correctly except the last, in which the hand was being withdrawn from the camera's view. It classified that frame as an 'O'; on close inspection of the image, the pose is in fact very similar to an 'O'.

The video from which frames were matched:

The frames and their matched characters (error rows and column order as in the previous table):
Input Image Error Detected Character
    0.2315    0.8269    0.4917    0.8699    0.8912       NaN    0.9927
    0.0627    0.5954   -1.1908   -0.0304   -0.0858       NaN    0.4612
    0.9440    0.7918    1.3941    1.2588    0.7676       NaN    0.9551
  -13.3220    2.0665  277.5651  -60.2227   47.2159       NaN  -48.3764
    3.1173  184.9208 -219.3025  -58.2059   38.0796       NaN  143.3149
B
    0.2360    0.8200    0.4825    0.8543    0.8861       NaN    0.9858
    0.0478    0.5792   -1.2039    0.0113   -0.1121       NaN    0.4571
    0.9467    0.7921    1.3999    1.2934    0.7741       NaN    0.9649
   -6.9923    5.5418  285.7872  -72.2970   53.4391       NaN  -47.0611
   -0.7588  179.0052 -213.0088  -55.1231   33.9328       NaN  137.9223
B
    0.2331    0.7830    0.4692    0.8297    0.8408       NaN    0.9581
    0.0238    0.5590   -1.2256   -0.0053   -0.1526       NaN    0.4405
    0.9687    0.8093    1.4190    1.3257    0.8000       NaN    0.9799
   -9.4386    0.3957  295.9376  -77.3627   52.6438       NaN  -49.6501
   -7.2966  174.9156 -217.3260  -60.6000   25.7558       NaN  136.0789
B
    0.4487       NaN    0.5840    0.6294    0.7212       NaN    0.8625
   -0.1410       NaN   -1.4403    0.5265   -0.3128       NaN   -0.1921
    1.0103       NaN    1.5807    1.5035    0.8678       NaN    1.1891
   17.7637       NaN  411.5200 -208.0178   79.4249       NaN   10.7178
  -69.8155       NaN -224.8701   98.6651  -46.5509       NaN  -95.9074
B
    0.6271    0.3373    0.9563    0.8294    0.8789       NaN    0.8152
   -0.2483    0.0085    0.5092    0.4394    0.3327       NaN   -0.2558
    1.1348    0.9374    2.0785    1.3948    0.8324       NaN    1.2203
    3.3688    4.7227 -515.9317 -222.3831  -23.3841       NaN    4.0956
 -141.0683   15.0921   17.7880   21.7698  105.5926       NaN -148.7433
C
    0.6320    0.3173    0.6191    0.8643    0.9815    1.0224    0.9454
   -0.4450    0.0722    1.4688    0.3561    0.2062    1.8064   -0.2069
    1.0075    0.9002    1.4803    1.2977    0.7860    1.3701    1.1011
   75.5127   -2.1146 -171.9201 -187.4126   -0.7702   -0.4009    0.4562
 -134.1428   43.8327  508.2015    7.3578   91.1291  578.9326 -101.0209
C
    0.6170    0.3705    0.5948    0.8527    0.9539    0.9920    0.9165
   -0.4464    0.0415    1.4462    0.3454    0.2104    1.7705   -0.2276
    1.0203    0.9294    1.4957    1.3080    0.7873    1.3777    1.1113
   77.2946    0.8803 -185.8058 -186.5085   -0.1551  -19.0938    7.0328
 -138.5617   26.1558  500.2033   -1.2690   89.1348  572.7445 -110.4364
C
    0.6220    0.3601    0.5928    0.8511    0.9486       NaN    0.9008
   -0.4498    0.0663    1.4192    0.3317    0.1983       NaN   -0.2440
    1.0299    0.9505    1.4963    1.3127    0.7922       NaN    1.1284
   82.2588   -1.6597 -192.6861 -178.8627    4.6830       NaN   14.7467
 -138.5045   32.8261  482.2535  -10.0949   85.0155       NaN -116.7623
C
    0.6340    0.3248    0.5933    0.8580    0.9541       NaN    0.9157
   -0.4430    0.0249    1.4194    0.3201    0.1790       NaN   -0.2331
    1.0348    0.9527    1.5023    1.3145    0.7867       NaN    1.1146
   77.0079    1.9997 -194.9434 -181.0192    6.4680       NaN   10.9310
 -139.9989   24.6720  487.0524  -13.7534   82.2040       NaN -112.6144
C
    0.5882    0.3789    0.5638    0.8619    0.9016    0.9542    0.8794
   -0.5063    0.0789    1.4348    0.3470    0.2285    1.8161   -0.2309
    1.0224    0.9520    1.5174    1.3467    0.7940    1.4075    1.1520
   94.0052  -15.1053 -197.5114 -203.6513   -5.7767   -7.3986   -2.7635
 -150.2996   29.3795  505.4458   -5.7806   91.4929  593.9987 -125.3728
C
    0.4836    0.5194    0.4780    0.7469    0.6931    0.4718    0.9293
   -0.4775    0.1763    1.4981   -0.6735    0.3669   -0.1088   -0.1074
    1.0023    1.0401    1.5631    1.3167    0.8591    1.1637    0.9608
   84.6326  -70.5265 -191.0026  123.7058  -43.8483  -72.3206    9.1248
 -141.5466   43.7080  550.3107 -205.9016  118.2820  -70.9290   -7.8314
O

Conclusion:

This project has shown that, given a small set of visually distinct hand poses, a 4D affine Lucas-Kanade algorithm can be used to differentiate between them. Assuming constant lighting, a constant background and approximately consistent hand orientation, the system performs very well: in our tests it misclassified only the final frame of the video.