COURSE NOTES

3D Coordinate system

WebGL exists in a 3D world:

  • x goes to the right
  • y goes up
  • z goes forward (out of the screen)
This is called a right hand coordinate system.

For now we are going to be doing all of our work starting with the x,y plane. Specifically we will be working with a square that extends from -1 → +1 in both x and y.

 

 


 

Square as a triangle strip

Our geometry will be a square.

In WebGL everything is made of triangles, so we will need two triangles.

We define these as a triangle strip.

In a triangle strip, every three successive vertices makes a new triangle, so we will need to specify a total of four vertices, as in the figure to the right.
 

 


 

Z-buffer algorithm

The GPU (Graphics Processing Unit) renders using a "Z-buffer algorithm"

For each animation frame, this algorithm consists of two successive steps:

  1. For each vertex:
    The GPU runs a vertex shader to:

    • Find which pixel of the image contains this vertex;
    • Set up "varying" values to be interpolated.

  2. For each triangle:
    The GPU interpolates from vertices to pixels.
    For each pixel:
    The GPU runs a fragment shader to:

    1. Compute color;

    2. If this is the nearest fragment at the pixel, replace color and depth at this pixel in the image.
 

 


 

Uniform variables

GLSL (for "GL Shading Language") is the C-like language that is used for shaders on the GPU. One of its key constructs is a uniform variable.

Uniform variables on the GPU have the same value at every pixel. They can (and often do) change over time.

By convention, a uniform variable name starts with the letter 'u'.

For your assignment, I have create some useful uniform variables:

   float uTime;    // time elapsed, in seconds

   vec3  uCursor;  // mouse position and state
                   //    uCursor.x goes from -1 to +1
                   //    uCursor.y goes from -1 to +1
                   //    uCursor.z is 1 when mouse down, 0 when mouse up.
 

 


 

Vertex shaders

A vertex shader is a program that you (the application programmer/artist) writes which gets run on the GPU at every vertex. It is written in the special purpose language GLSL.

To the right is a very simple vertex shader program. An "attribute" is a constant value that is passed in from the CPU. In this case, it is aPosition, the x,y,z position of each vertex in the scene. wIt is of type vec3, which means that it consists of three GLSL floating point numbers.

 


   attribute vec3 aPosition;
   varying   vec3 vPosition;
   void main() {
      gl_Position = vec4(aPosition, 1.0);
      vPosition = aPosition;
   }

 


 

One of the most powerful things that a vertex shader can do is set "varying" variables. These values are then interpolated by the GPU across the pixels of any triangles that use this vertex. That interpolated value will then be available to fragment shaders at each pixel.

For example, "varying" variable vPosition is set by this vertex shader. By convention, the names of varying variables start with the letter 'v'.

This vertex shader does two things:

  • By setting gl_Position, it determines at which pixel of the image this vertex will appear.

  • It sets varying variable vPosition to equal the attribute position aPosition for this vertex.

 


 

Fragment shaders

A fragment shader is a program that you (the application programmer) writes which is run at every pixel.

Because pieces of different triangles can be visible at each pixel (eg: when triangles are very small, or pixels where an edge of one triangle partly obscures another triangle), in general we are really defining the colors of fragments of pixels, which is why these are called fragment shaders.

Since our vertex shader has set a value for vPosition, any fragment shader we write will be able to make use of this variable, whose values will now have been interpolated down to the pixel level.

 

 


 

Guides to WebGL

There are many guides on-line to WebGL. I find this Tutorial on WebGL Fundamentals to be particularly clear and easy to follow for people who are new to WebGL.

I also find this Quick Reference Card to be very useful when writing WebGL shaders. I think you will find the list of built-in functions on the last page to be particularly useful.

 

 


 

Gamma correction

Displays are adjusted for human perception using a "gamma curve", since people can perceive a vary large range of brightness.

For example, the image to the right shows the values 0...255 on the horizontal axis, the resulting actual displayed brightness on the virtual axis.

Various displays differ, but this adjustment is generally approximately x2

Since all of our calculations are really summing actual photons of light, we need to do all our math in linear brightness, and then do a gamma correction when we are all finished:

Roughly: c → sqrt(c)
Output brightness

Input values 0 ... 255

 


 

Ray tracing: Forming a ray

At each pixel, a ray from the origin V = (0,0,0) shoots into the scene, hitting a virtual screen which is on the z = -f plane.

We refer to f as the "focal length" of this virtual camera.

The ray toward any pixel aims at: (x, y, -f), where -1 ≤ x ≤ +1 and -1 ≤ y ≤ +1.

So the ray direction W = normalize( vec3(x, y, -f) ). Note that normalize() is a built-in function for OpenGL ES.

In order to render an image of a scene at any pixel, we need to follow the ray at that pixel and see what object (if any) the ray encounters first.

In other words, the nearest object at a point V + Wt, where t > 0.

 

 


 

Ray tracing: Defining a sphere

                                                                                                                                                                                               

We can describe a sphere in a GLSL vector of length 4:
vec4 sphere = vec4(x,y,z,r);
where (x,y,z) is the center of the sphere, and r is the sphere radius.

As we discussed in class, the components of a vec4 in GLSL can be accessed in one of two ways:

v.x, v.y, v.z, v.w // when thought of as a 4D point
v.r, v.g, v.b, v.a // when thought of as a color + alpha
So to access the value of your sphere's radius in a fragment shader vec4, you will want to refer to it as sphere.w (not sphere.r).
 

 


 

Ray tracing: Finding where a ray hits a sphere

D = V - sph.xyz // vector from center of sphere to ray origin.

The point of doing this subtraction is that we are then effectively solving the much easier problem of ray tracing to a sphere at the origin (0,0,0).

So instead of solving the more complex problem of tracing a ray V+Wt to a sphere centered at sph.xyz, we are instead solving the equivalent, but much simpler, problem of tracing the ray D+Wt to a sphere centered at (0,0,0).

(D + Wt)2 = sph.r2 // find a point along ray that is distance r from sphere center.

(D + Wt)2 - sph.r2 = 0 // need to solve for t.

Generally, if a and b are vectors, then a • b = ( ax * bx + ay * by + az * bz )

This "inner product" also equals: |a| * |b| * cos(θ)

Multiplying the terms out, we need to solve the following quadratic equation:

(W • W) t2 +
2 (W • D) t +
(D • D) - r2 = 0

Since ray direction W is unit length, the first term in this equation (W • W) is just 1.
 

 


 

Ray tracing: Finding the nearest intersection

Compute the intersection to all spheres in the scene.
The visible sphere at this pixel, if any, is the one with smallest positive t.
 

 


 

Ray tracing: Computing the surface point

Once we know the value of t, we can just plug it into the ray equation to get the location of the surface point on the sphere that is visible at this ray, as shown in the equation below and in the figure to the right:

S = V + W t
 

 


 

Ray tracing: Computing the surface normal

We now need to compute the surface normal in order to compute how the sphere interacts with light to produce a final color at this pixel.

The "surface normal" is the unit length vector that is perpendicular to the surface of the sphere -- the "up" direction if you are standing on the surface.

For a sphere, we can get this by subtracting the center of the sphere from the surface point S, and then dividing by the radius of the sphere:

N = (S - sph.xyz) / sph.r
 

 


 

Ray tracing: A simple shader

A simple light source at infinity can be defined as an rgb color, together with an xyz light direction:

   vec3 lightColor;     // rgb
   vec3 lightDirection; // xyz
A diffuse simple surface material can be defined by an rgb "ambient" component, which is independent of any particular light source, and an rgb "diffuse" component, which is added for each light source:

   vec3 ambient;   // rgb
   vec3 diffuse;   // rgb
The figure to the right shows a diffuse sphere, where the point at each pixel is computed as:
ambient + lightColor * diffuse * max(0, normal • lightDirection)
 

 


 

Ray tracing: Multiple light sources

If the scene contains more than one light source, then pixel color for points on the sphere can be computed by:

ambient + n ( lightColorn * diffuse * max(0, normal • lightDirectionn) )
where n iterates over the lights in the scene.

 


 

Phong model for specular reflection

The first really interesting model for surface reflection was developed by Bui-Tong Phong in 1973. Before that, computer graphics surfaces were rendered using only diffuse lambert reflection. Phong's was the first model that accounted for specular highlights.

The Phong model begins by defining a reflection vector R, which is a reflection of the direction to the light source L about the surface normal N.

As we showed in class, and as you can see from the diagram on the right, it is given by:

R = 2 (N • L) N - L
 

 


 

Once R has been defined, then the Phong model approximates the specular component of surface reflectance as:
srgb max(0, E • R)p )
where srgb is the color of specular reflection, p is a specular power, and E is the direction to the eye (in our case, E = -W, the reverse of the ray direction). The larger the specular power p, the "shinier" the surface will appear.

To get the complete Phong reflectance, we sum over the lights in the scene:

argb + i lightColori ( drgb max(0, N • Li) + srgb max(0, E • R) p )
where argb, drgb and srgb are the ambient, diffuse and specular color, respectively, and p is the specular power.
 

 


 

Shadows

Casting shadows is relatively easy in ray tracing. Once we have found a surface point S, then for each light source, we shoot another ray whose origin V' is just the surface point S, and whose direction W' is the direction toward that light source Li.

We want to make sure that the ray misses the object we are starting from, so we move the origin V' of our new ray slightly off the surface. Our "shadow ray" will therefore be:

V' = S + ε Li

W' = Li

where ε is a very small positive number such as 0.001.

If this shadow ray encounters any other object, then the surface is in shadow at this pixel, and we do not add in the diffuse and specular components of surface reflectance.

To the right above is an example of the surface not being in shadow. Just below that is an example of a surface being in shadow, because its light path is blocked by another object.

Implementation hint: As you loop through the light sources inside shadeSphere(), form a shadow ray at each iteration of the loop. Test that ray against every sphere in the scene to see whether there is a hit. If the ray has hit any sphere in scene, then don't add in diffuse or specular components from that light source.

 

 


 

Reflection

Another great thing about ray tracing is that we can continue to follow the path of a light ray backward from the camera, to model the behavior of mirror reflection. Adapting the technique that we used to calculate the reflection direction R for the Phong reflectance model, but replacing L in that equation by -W (the direction back along the incoming ray):
W' = 2 (N • (-W)) N - (-W)
we can compute a new ray that starts at surface point S, and goes into that reflected direction.

As shown in the figure on the right, we want to offset the origin of this ray a bit out of the surface, so that the ray does not accidentally encounter the object itself.

Whatever color is computed by this ray, we mix it into the result of the Phong reflectance algorithm. The result is the appearance of a shaded surface with a mirror finish.

 

 


 

Refraction

In the real world many materials, such as oil, water, plastic, glass and diamond, are transparent. A transparent material has an index of refraction which measures how much light slows down as it enters that medium. For example, the index of refraction of water is about 1.333, of glass is about 1.5. The index of refraction of diamond, the substance with the highest known index of refraction, is 2.42.

As in the diagram to the right, you can add refraction to your ray tracing by following Snell's law:

n1 / n2 = sin(θ2) / sin(θ1)
to determine how much the ray should bend as it enters or exits a transparent object.

Note that you will need to change your ray tracing model to incorporate refraction. In addition to your initial incoming ray, and any shadow or reflection rays, you also need to add a refraction ray, which starts just inside the surface, and continues inward.

Note that if you have ray traced to a sphere, and are now computing where the refracting ray will exit that sphere, you will want to compute the second root of the quadratic equation.

Then, after this refracting ray has exited out the back of the transparent sphere, you will want to compute how much it refracts on its way out, and then shoot a ray from there into the scene behind the sphere.

In general, you use the results of refraction by mixing the color it returns together with whatever surface color you have computed due to pure reflection or blinn/phong reflectance.

 

 


 

Homogeneous coordinates

We can deal with both points in a scene and directions (which are essentially points at infinity) by adding an extra coordinate, which we call the homogeneous coordinate.

For example, in two dimensions, we would write [x,y,w] to represent (x/w, y/w).

In the example to the right, we have both points and directions. The point at (1,0) is represented by [1,0,1], whereas the direction vector (1,0) (shown in red) is represented by [1,0,0].

By convention we place all points on the w=1 plane (shown in gray), although scaling arbitrarily to [cx,cy,cw] still describes the same point (x/w, y/w), as shown by the upward slanting arrows in the figure.

The same principle applies when we move to three dimensions: We use the homogeneous vector [x,y,z,w] to describe (x/w, y/w, z/w).

• A point at location (x,y,z) is described as [x,y,z,1]

• A vector in direction (x,y,z) is described as [x,y,z,0]

 

 


 

Coordinate transformations

The default coordinate system in three dimensions has coordinate axes [1,0,0,0], [0,1,0,0] and [0,0,1,0], respectively (the x, y and z global direction vectors).

Its origin is [0,0,0,1] (the point (0,0,0)), as shown in the figure to the right in black.

We can describe a transformed coordinate system by redefining each of the x, y and z axes, and by translating the origin to point t, as shown in the figure to the right in blue.

Note that this is a very general representation. For example, the new x, y and z directions do not need to be perpendicular to each other. Nor do they need to be unit length. For

 

 


 

Transformation matrices

All of the information of a coordinate transformation can be placed in a 4×4 matrix, as shown in the figure to the right.

The x, y and z axes form the first three respective columns of the matrix.

The origin t forms the right-most column of the matrix.

In this class we will follow the convention of storing the 16 matrix values in column-major order:

[ x0, x1, x2, x3,   y0, y1, y2, y3,   z0, z1, z2, z3,   t0, t1, t2, t3 ]

 

 


 

Transforming a point

We can use a 4×4 matrix to transform a vector.

By convention, we represent the input as a column vector, and place it to the right of the matrix.

Also, by convention, if we omit the fourth (homogeneous) coordinate of the input, as in the example to the right, we assume a value of 1.0 for its homogeneous coordinate, unless otherwise specified.

The result of the transformation is another column vector, which we obtain by taking the inner product of each successive row of the matrix with the input vector.

In this case, the matrix is rotating the point (1,0,0) about the z axis.

 

 


 

The identity transformation

The identity matrix is the "do nothing" transformation. It will transform any point or direction to itself. You generally want to call the identity() method on a matrix object to initialize that matrix.
 

 


 

The translate transformation

To translate a point, we use only the right-most column of the matrix. The rest of the matrix is the same as the identity matrix.

1 0 0 Tx
0 1 0 Ty
0 0 1 Tz
0 0 0 1

Note that translation affects only points, not directions. Because the homogeneous coordinate of a direction is zero, its value cannot be affected by the right-most column of the matrix.

 

 


 

Rotate about the x axis

Rotation about the x axis only affects the y and z axes.

1 0 0 0
0 cosθ -sinθ 0
0 sinθ cosθ 0
0 0 0 1

"Positive" rotation is counterclockwise when looking from the positive x direction.

 

 


 

Rotate about the y axis

Rotation about the y axis only affects the z and x axes.

cosθ 0 sinθ 0
0 1 0 0
-sinθ 0 cosθ 0
0 0 0 1

"Positive" rotation is counterclockwise when looking from the positive y direction.

 

 


 

Rotate about the z axis

Rotation about the z axis only affects the x and y axes.

cosθ -sinθ 0 0
sinθ cosθ 0 0
0 0 1 0
0 0 0 1

"Positive" rotation is counterclockwise when looking from the positive z direction.

 

 


 

The scale transformation

Like rotation, a scale transformation (which makes shapes bigger or smaller) uses only the top-left 3×3 portion of the 4#215;4 matrix.

Sx 0 0 0
0 Sy 0 0
0 0 Sz 0
0 0 0 1

In the case illustrated on the right, we are performing a uniform scale, by using the same values for the three locations along the diagonal of the matrix.

If we were to use differing values at these three locations, then we would perform a non-uniform scale, which would result in the shape becoming squashed or stretched.

 

 


 

The perspective transformation

For completeness, we include the perspective transformation, although this is rarely used, except for setting up camera views. Note that the perspective transformation uses only the bottom row of the matrix.

1 0 0 0
0 1 0 0
0 0 1 0
Px Py Pz 1

Because perspective can change the homogeneous coordinate of an [x,y,z,w] vector, it is able to throw points out to infinity (that is, transform a point into a direction vector), as well as bring points at infinity into the scene (that is, transform a direction vector into a point).

   

 


 

Parametric cylindrical tube

Here is the core of a function that would return the x,y,z coordinates of a parametric cylindrical tube.

Note that there are no caps at the end, so this is not a complete parametric cylinder.

return [
   cos(2*PI*u),
   sin(2*PI*u),
   2*v-1
];
 

 


 

Parametric sphere

Here is the core of a function that would return the x,y,z coordinates of a parametric sphere.

Note that latitude φ varies between -π / 2 and π / 2.

var theta = 2 * PI * u;
var phi = PI * (v - .5)
return [
   cos(theta) * cos(phi),
   sin(theta) * cos(phi),
   sin(phi)
];
 

 


 

Here is the core of a function that would return the x,y,z coordinates of a parametric torus.

Note that unlike the parametric sphere, φ varies between 0 and 2π.

Parametric torus

var theta = 2 * PI * u;
var phi = 2 * PI * v;
var r = 0.3;
return [
   cos(theta) * (1 + r * cos(phi)),
   sin(theta) * (1 + r * cos(phi)),
   r * sin(phi)
];
 

 


 

Splines

There are many reasons we might want to create smooth controllable curves in computer graphics. Perhaps we might want to create an organic shape, or animate something along a continuous path.

As you can see on the right, we can do this by breaking down our smooth curve into simpler pieces.

If we think of the spline curve as a path of motion, each of these pieces must match its neighbors in both position and rate, which means that for each coordinate x and y, we need four values: a position at both the start and the end, and a rate (or derivative) at both the start and the end.

The smallest order polynomial that can satisfy four constraints is a cubic. So we describe both the x and y coordinates of each piece using parametric cubic polynomials in parameter t:

x = axt3 + bxt2 + cxt + dx

y = ayt3 + byt2 + cyt + dy

where (ax, bx, cx, dx) and (ay, by, cy, dy) are constant valued polynomial coefficients, and t varies from t = 0 to t = 1 along the curve.
 

 


 

Cubic splines

Although it is technically true that cubic splines can be designed by tweaking their polynial coefficients, in practice that doesn't usually work out very well.

As you can see in the example on the right, there is no intuitive connection between the shape of a cubic polynomial and the values of its four coefficients -- in this case 7.7, -11.7, 5.0 and -0.6, respectively, for the t3, t2, t and constant term.

For that reason, we need a better way to specify cubic spline curves. Rather than use t3, t2, t and 1 as our four basis functions, we can use four different basis functions that have a more intuitive geometric meaning.

In the following sections we will look at different examples of such alternate basis functions.

 

 


 

Hermite cubic splines

We can choose four basis functions that give us independent control over the position at t = 0 and t = 1, as well as the rate of change at t = 0 and t = 1. This is called the Hermite basis, after the french mathematician who devised it.

If we want a curve with position A at t = 0, position B at t = 1, rate C at t = 0, and rate D at t = 1, we can use the four functions to the right to compute the cubic polynomial we are looking for.

Because these four hermite basis polynomials never change, and the cubic polynomial we want is just a weighted sum of those four polynomials, we can express this weighted sum as a multiplication of the weights by a matrix, which we call the Hermite matrix.

In other words, the expression:

A (2t3 - 3t2 + 1) + B (-2t3 + 3t2) + C (t3 - 3t2 + t) + D (t3 - t2)
can be expressed as a matrix vector product:
a

b

c

d

=
A

B

C

D

to convert positions and rates at the two ends into the desired cubic polynomial:
at3 + bt2 + ct + d.
 
Position


2t3 - 3t2 + 1


-2t3 + 3t2

Rate


t3 - 2t2 + t


t3 - t2

 


 

Bezier cubic splines

Artists and designers often find it more convenient to create splines by moving points around, rather than needing to deal with derivatives. The Bezier spline enables designers of spline curves to work this way.

Bezier splines work by repeated linear interpolation. For example, the image to the right shows a simplified version of a Bezier spline, using three key points to specify a parametric quadratic spline.

Note that points along the curve are found as a linear interpolation of linear interpolations. We first find points along the edges AB and BC by linear interpolations (to get the points represented as blue dots):

(1-t) A + t B

(1-t) B + t C

and then we interpolate again (to get the point represented as a red dot):
P = (1-t) ( (1-t) A + t B ) + t ( (1-t) B + t C )
If we multiply out all the terms, we get:
A (1-t)2 + 2 B (1-t) t + C t2
Note that the weights of the coefficients (1 2 1) follow Pascal's triangle:

 

1
1 1
1 2 1
1 3 3 1
...
 

Now that we have looked at the quadratic case, we are ready to look at the cubic case. The parametric cubic Bezier spline uses four key points: The basic set-up is a linear interpololation of linear interpolations of linear interpolations.

So we start with the points in blue:

P = (1-t) A + t B

Q = (1-t) B + t C

R = (1-t) C + t D

When the first and second terms are linearly interpolated, we get the two dots in violet:
S = (1-t) P + t Q

T = (1-t) Q + t R

Finally we linearly interpolate these two points: (1-t) S + t T

When we multiply everything out, writing the equation in terms of our original four key points A,B,C and D, the weights form the next level (1 3 3 1) of Pascal's triangle:

A (1-t)3 + 3 B (1-t)2 t + 3 C (1-t) t2 + D t3
 

We can multiply out the terms of the above polynomial, and regroup by powers of t, to get:
(-A + 3B - 3C + D) t3 + (3A - 6B + 3C) t2 + (-3A + 3B) t + A

This makes it easy to see that, as was the case for Hermite splines, the Bezier spline has a characteristic matrix, which can be used to translate the above polynomial until the standard cubic polynomial, with coefficients (a,b,c,d):

a

b

c

d

= A

B

C

D

One powerful property of Bezier splines is that the direction between A and B determines the direction of the spline curve at t=0, and the direction between C and D determines the direction of the spline curve at t=1.

This makes it easy to match up splines end-to-end, so that the resulting composite curve has a continuous derivative.