Question: If a vertex V is transformed by a matrix M, why do I need to transform the associated normal vector N by the transpose of the inverse of M?

The answer is going to be in two parts.

Answer, part 1:

First we need to understand what we really mean when we say "a point lies on a plane." Recall that by convention we represent a plane P by the row vector [a b c d] and a point V by the column vector [x y z 1]^T.

Representing a plane: [a b c d]

Representing a point:

    | x |
    | y |
    | z |
    | 1 |

The statement "vertex V lies on plane P" is true if and only if ax + by + cz + d = 0, that is, exactly when P • V = 0.
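
As a minimal sketch of this test (the function name plane_dot_point and the sample plane are illustrative assumptions, not part of the original notes), the product P • V can be evaluated directly:

```python
def plane_dot_point(plane, point):
    """Return P . V for a plane row vector [a, b, c, d] and a point [x, y, z, 1]."""
    return sum(p * v for p, v in zip(plane, point))

# Hypothetical example: the plane z = 2, written as 0x + 0y + 1z - 2 = 0.
plane = [0.0, 0.0, 1.0, -2.0]

on_plane = [5.0, -3.0, 2.0, 1.0]   # z == 2, so this point lies on the plane
off_plane = [5.0, -3.0, 4.0, 1.0]  # z == 4, so this point does not

print(plane_dot_point(plane, on_plane))   # 0.0
print(plane_dot_point(plane, off_plane))  # 2.0
```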

Now consider what happens when we transform all the vertices of a shape by some matrix M. In that case, every vertex V is replaced by V' = (M • V).

So how then would we transform planes? Clearly we want to transform planes so that they still contain the same vertices. In other words, we want to find P' such that:

P' • V' = 0   if and only if   P • V = 0.

But this means that we want P' = (P • M^-1), where M^-1 is the inverse of M, since then:

P' • V' =
(P • M^-1) • (M • V) =
P • (M^-1 • M) • V =
P • V = 0
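
The derivation above can be checked numerically. The sketch below assumes a hypothetical M that scales x by 2 and then translates by 1 in x, with its inverse written out by hand; the helper names are illustrative only.

```python
def mat_vec(m, v):
    """Matrix times column vector: M . V."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def vec_mat(p, m):
    """Row vector times matrix: P . M."""
    return [sum(p[i] * m[i][j] for i in range(4)) for j in range(4)]

def dot(p, v):
    """Row vector times column vector: P . V."""
    return sum(a * b for a, b in zip(p, v))

# Assumed transform: x' = 2x + 1, with y and z unchanged.
M = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

# Its inverse, written by hand: x = 0.5x' - 0.5.
M_INV = [[0.5, 0.0, 0.0, -0.5],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

P = [1.0, 1.0, 0.0, -1.0]   # the plane x + y - 1 = 0
V = [0.5, 0.5, 3.0, 1.0]    # a point on that plane

V_prime = mat_vec(M, V)      # V' = M . V
P_prime = vec_mat(P, M_INV)  # P' = P . M^-1

print(dot(P, V))              # 0.0: V lies on P
print(dot(P_prime, V_prime))  # 0.0: V' lies on P'
print(dot(P, V_prime))        # 1.5: V' does not lie on the untransformed plane
```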

Answer, part 2:

Now consider the surface normal direction vector N = [nx ny nz 0] at some vertex V. The three values nx, ny, nz really describe the three linear components (the "a b c" parts) of the tangent plane P = [a b c d] which passes through the surface at vertex V. Transforming N is simply a matter of transforming that plane P, and then throwing out its "d" coefficient. In other words, we want N' = N • M^-1.

And now we've answered the original question, since:

[nx ny nz 0] • M^-1

is equivalent to:

               | nx |
    (M^-1)^T • | ny |
               | nz |
               | 0  |
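
To see why this matters in practice, here is a small sketch under assumed values: a hypothetical M that scales x by 2 and translates by 1 in x, a normal N to the plane x + y - 1 = 0, and a tangent direction T lying in that plane. All names are illustrative. Transforming N by M itself breaks perpendicularity, while the transpose of the inverse preserves it.

```python
def mat_vec(m, v):
    """Matrix times column vector: M . V."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def vec_mat(p, m):
    """Row vector times matrix: P . M."""
    return [sum(p[i] * m[i][j] for i in range(4)) for j in range(4)]

def transpose(m):
    """Return the transpose of a 4x4 matrix."""
    return [[m[j][i] for j in range(4)] for i in range(4)]

def dot3(a, b):
    """Dot product of the x, y, z components only."""
    return sum(a[i] * b[i] for i in range(3))

# Assumed transform: x' = 2x + 1, and its hand-written inverse.
M = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
M_INV = [[0.5, 0.0, 0.0, -0.5],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

N = [1.0, 1.0, 0.0, 0.0]   # normal of the plane x + y - 1 = 0
T = [1.0, -1.0, 0.0, 0.0]  # a direction lying in that plane

T_prime = mat_vec(M, T)    # tangent directions transform like vertices
naive = mat_vec(M, N)      # wrong: treats the normal like a vertex
as_row = vec_mat(N, M_INV)                # N . M^-1
as_column = mat_vec(transpose(M_INV), N)  # (M^-1)^T . N, same components

# Throw out the fourth ("d") coefficient to get a direction again.
N_prime = as_column[:3] + [0.0]

print(as_row == as_column)     # True
print(dot3(naive, T_prime))    # 3.0: no longer perpendicular
print(dot3(N_prime, T_prime))  # 0.0: still perpendicular
```

Note that N' is generally not unit length after this transformation, so it should be renormalized if a unit normal is required.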