Monday, April 7, 2014

Replacing the ModelView, Projection, and Texture Matrices

This post continues the series discussing ways I've rewritten the sample code from the book Game and Graphics Programming for iOS and Android with OpenGL ES 2.0.  Below I'll discuss replacing the last portion of the author's transformation code used in the GFX module with my own library.  This will be the last preparation step before I present rewriting the GFX module as a proper C++ class.

A copy of the code as described in this post can be downloaded from GitHub using the tag tstack.

Legacy OpenGL supported three transformation stacks:  model-view, projection, and texture. Later OpenGL added optional support for a color transformation stack via the
ARB_imaging extension.  OpenGL provided a collection of functions which the programmer used to manipulate these transformation stacks.  The programmer selected which stack was operated on by these functions by calling the glMatrixMode() function.  This functionality has been deprecated but the functionality is still needed by graphics programmers.  Using modern versions of OpenGL, such as OpenGL ES, the programmer either has to provide this functionality for himself, or use a library which provides the functionality for him.

The GFX module provided by the author mimics this legacy functionality, without support for the color transformation stack.  The author's implementation limits the stack depth of the model-view, projection, and texture transformation stacks to 32 entries each.  Legacy OpenGL required that the model-view stack would support a stack of at least 32 entries.  The projection, and texture stacks were only required to support a stack depth of at least 2.

That the original GFX module hardcodes a stack depth of 32 for the projection and texture stacks is almost certainly a waste.  For the model-view transformation stack I, personally, have never needed to have more than 32 entries, still others may need support for a deeper stack so hardcoding the model-view stack to 32 entries may be short sighted.  In this post I replace these hardcoded limits replacing fixed length arrays of mat4 entries with my own transformation stack class objects. I named my transformation stack class TStack.

At its core, the TStack class is a std::vector<mat4>.  By using the vector class template the transformation stack depth is only limited by the amount of memory which the application can access.

Since TStack implements a stack, it has, as one would expect, methods for pushing and popping the stack.  The default constructor creates a stack with one entry containing the identity matrix.  Any attempt to pop the last entry off of the stack will cause the pop() method to throw an exception.

In the author's original matrix code functions used to perform model-view, and projection stack transformations were implemented as functions which operated on mat4 objects.  When I implemented my matrix classes I omitted this functionality from the mat4 class and implemented them as part of the TStack class.  The operations (transformations) which one needs for the transformation stacks aren't general purpose 4D transformations; they're really 3D transformations done in 4D homogenous space.  If you don't know exactly what that means, that's okay; you don't really need that level of mathematics knowledge to use the transformation stack operations.  For those of you who do understand what that means I hope you'll understand (though not necessarily agree with) why I didn't include those features as part of a generic linear algebra toolset.

Below is a table showing the correspondence between the legacy (deprecated) OpenGL functions, the GFX module functions, and the TStack methods:

Legacy OpenGL GFX TStack
glMatrixMode GFX_set_matrix_mode
glPushMatrix GFX_push_matrix push
glPopMatrix GFX_pop_matrix pop
glLoadMatrix GFX_load_matrix loadMatrix
glLoadIdentity GFX_load_identity loadIdentity
glMultMatrix GFX_multiply_matrix multiplyMatrix
glRotate GFX_rotate rotate
glScale GFX_scale scale
glTranslate GFX_translate translate
glFrustum frustum
gluPerspective GFX_set_perspective perspective
glOrtho GFX_ortho ortho
gluOrtho2D GFX_set_orthographic_2d ortho
gluLookAt GFX_look_at lookAt


Looking at the GFX code modified to use the TStack methods you'll see that my version of the GFX functions often directly call the corresponding TStack method.  Some cases, such as GFX_set_perspective(), are a little more complicated, but not much.

I think the easiest way for beginning 3D graphics programmers to understand the geometry transformations is as three distinct sets of operations:
  1. Placing the vertex (or as a group, the object's vertices) into "the world" or "the scene".
  2. Placing the camera in "the world" at a particular place, pointed in a particular direction, and rotated to a particular orientation.
  3. Selecting a lens (such a wide angle lens, a telephoto lens, something in between, or even an orthographic projection) which the camera uses to view the world.
When an object is drawn on screen we can think of the vectors describing the geometry as being processed first by the model matrix (M), then the view matrix (V), and finally by the projection matrix (P).  We might diagram the process as


v → M → V → P → screen

Mathematically, this can be written:


v * M * V * P

Notice that I've defined work for three transformation matrices but there are only two transformation stacks allocated for geometry calculations, the model-view transformation stack, and the project stack.  Normally, the first transformation loaded into the model-view stack is the view matrix (i.e., the matrix V which places, and orients the camera); after loading the view matrix the various transformations which place the objects into "the scene/world" are applied on top of the view matrix; for the novice 3D programmer this should help clarify why we normally only need a "model-view" transformation stack, and a projection transformation stack.

Though there are a variety of TStack transformation methods, generally speaking if a particular transformation method is used for one of the transformation matrices (model, view, or projection), it's not typically used for any of the other transformation matrices.  More advanced programmers will know how to bend the rules a bit, but when programming 3D graphics
  1. The rotation, scaling, and translation methods are used for the model matrix (matrix M above).
  2. The lookAt() method is generally, only used for the view matrix (matrix V above).
  3. And the frustum()ortho(), and perspective() methods are generally, only used for the projection matrix (matrix P above).
Again, since these matrix stack functions have been deprecated in OpenGL every programmer needs to write his own functions to replace this functionality, or adopt a library which someone else has written to provide this functionality.  The author of the book  provides this functionality for us in the GFX module.  His functions parallel nearly, if not identically, the functions which have been deprecated in OpenGL.  My TStack objects support all of the transformations which the author and OpenGL supported for 3D transformations, plus a few more.  Most significantly (IMHO) the TStack object allows rotations to be described using angles in degrees, or radians, and makes it easy to use quaternions to describe rotations.  The use of quaternions has efficiency implications which I'll address later in this post.


Since the TStack object implements all of the necessary 3D graphics matrix methods I was able to delete the author's mat3_copy_mat4(), mat4_translate(), mat4_scale(), mat4_rotate() and mat4_ortho() functions.

Now let's see how the GFX module's new functionality supported by the TStack class can make the application code more efficient.

Going back to the original implementation of the chapter7-1 sample program there was this sequence of statements:

    GFX_load_identity();
    .
    .
    .
    GFX_translate(eye_location.x,
                  eye_location.y,
                  eye_location.z);
    GFX_rotate(rotz, 0.0f, 0.0f, 1.0f);
    GFX_rotate(90.0f, 1.0f, 0.0f, 0.0f);
    mat4_invert(GF_get_modelview_matrix());

The first optimization I performed in an earlier post was to eliminate the matrix inversion by reversing the order of the translation, and rotations, and by reversing their respective directions.  The new code became:

    GFX_load_identity();
    .
    .
    .
    GFX_rotate(-90.0f, 1.0f, 0.0f, 0.0f);
    GFX_rotate(-rotz, 0.0f, 0.0f, 1.0f);
    GFX_translate(-eye_location.x,
                  -eye_location.y,
                  -eye_location.z);

More recently, I added a version of GFX_translate() which would take a vec3 argument rather than 3 scalars.  This doesn't make the code faster but (IMHO) it does make it easier to write.  The code became:

    GFX_load_identity();
    .
    .
    .
    GFX_rotate(-90.0f, 1.0f, 0.0f, 0.0f);
    GFX_rotate(-rotz, 0.0f, 0.0f, 1.0f);
    GFX_translate(-eye_location);

There's still another optimization which can be made.  According to Euler's rotation theorem, two, or more, consecutive rotations can be described as a single rotation.  The code has two consecutive rotations, so the question becomes how can this calculation be done using a single rotation?  The easiest way I know of is to express the two rotations as quaternions.  These two calls to GFX_rotate() have the necessary pieces:  an angle of rotation, and a rotation vector of length 1.  To create the rotation quaternion we first divide the rotation angle by 2, and compute the sine and cosine of this half-angle.  The cosine becomes the real part of the quaternion; sine times the normalized rotation vector becomes the imaginary (i.e., i, j, and k) portion of the quaternion.  Since the first rotation uses an angle which is a constant, and a rotation vector which is a constant, the quaternion representing the rotation is also a constant.  The cosine of -45° (remember, half of the angle -90°) is 1 divided by the square root of 2, also written as the square root of 2 divided by 2.  The C/C++ math library has this value pre-defined as the constant M_SQRT1_2.  The sine of -45° is the negative of that value.  The first rotation can be rewritten as

    static const quaternion q(M_SQRT1_2, -M_SQRT1_2, 0.0f, 0.0f);
    GFX_rotate(q);

Note that since we were able to precompute the sine and cosine values the code is already more efficient because calculating trigonometric functions (sine and cosine) is expensive.

Computing the quaternion for the second rotation is almost as easy.  I've added a new constant, DEG_TO_RAD_DIV_2, to the SDK/common/types.h to simplify the conversion of old GFX_rotate() calls into GFX_rotate(const quaternion &) calls:

    float   alpha(-rotz*DEG_TO_RAD_DIV_2);
    float   cosAlpha(cosf(alpha)), sinAlpha(sinf(alpha));
    GFX_rotate(quaternion(cosAlpha), 0.0f, 0.0f, sinAlpha));

This works but it still leaves us with two rotations.  We could take the product of the quaternions dynamically by doing something like this:

    static const quaternion q0(M_SQRT1_2, -M_SQRT1_2, 0.0f, 0.0f);
    quaternion q1(cosAlpha), 0.0f, 0.0f, sinAlpha));
    GFX_rotate(q0 * q1);

This gets us down to a single rotation but every time the code hits this spot we have to do a quaternion multiplication.  Each general purpose quaternion multiplication costs 16 floating point multiplications, and 12 floating point additions/subtractions*.

* For performance purposes I'll refer to a floating point additions/substractions as just floating point additions, since floating point addition and floating point subtraction have the same performance cost and it makes the text easier to read/write.  This is a common way of discussing floating point performance or cost.

There are only 4 non-zero components in these two quaternions so it should be pretty straight forward to calculate, in advance, a quaternion that represents the product of these two quaternions.  Doing so the code becomes:

    float   alpha(-rotz*DEG_TO_RAD_DIV_2);
    float   cosAlpha(cosf(alpha)), sinAlpha(sinf(alpha));
    GFX_rotate(quaternion( M_SQRT1_2*cosAlpha,
                          -M_SQRT1_2*cosAlpha,
                           M_SQRT1_2*sinAlpha,
                           M_SQRT1_2*sinAlpha));

Now, instead of performing a quaternion multiply at the cost of 16 multiplications, and 12 additions, the code can do the same work with only 4 multiplications (a really smart optimizing compiler might only do two multiplications); no additions are needed.  Plus, remember that we were able to avoid computing sine and cosine for one of the angles.

The final version of the code is now

    GFX_load_identity();
    .
    .
    .
    float   alpha(-rotz*DEG_TO_RAD_DIV_2);
    float   cosAlpha(cosf(alpha)), sinAlpha(sinf(alpha));
    GFX_rotate(quaternion( M_SQRT1_2*cosAlpha,
                          -M_SQRT1_2*cosAlpha,
                           M_SQRT1_2*sinAlpha,
                           M_SQRT1_2*sinAlpha));
    GFX_translate(-eye_location);

What used to be a translation, two rotations, and a matrix inversion has been simplified to a single rotation, followed by a translation.

There are numerous places throughout the sample code for the book which have consecutive rotations which can be converted into a single rotation.  To simplify the task of precomputing the product of rotation quaternions for use in the code, I've written a utility program which takes two strings representing quaternions, and produces the product.  For example, in the create_direction_vector() function I found in SDK/common/utils.cpp there were three consecutive rotations.  Rather than multiply 3 quaternions by hand I was able to feed them into the utility program, and let it do the hard work:

This program expects to read quaternions as a newline
terminated list of 4 components.  The components must
be separated by commas.

The program can be exited by sending EOF.  For UNIX
machines (including Mac OS X), the default way of
doing this is to type ctrl-D; Windows machines treat
ctrl-Z as EOF, by default.

Enter first quaternion:  cosBeta,0,sinBeta,0
Enter second quaternion:  cosGamma,sinGamma,0,0
cosBeta*cosGamma,cosBeta*sinGamma,sinBeta*cosGamma,-sinBeta*sinGamma

Enter first quaternion:  cosAlpha,0,0,sinAlpha
Enter second quaternion:  cosBeta*cosGamma,cosBeta*sinGamma,sinBeta*cosGamma,-sinBeta*sinGamma
(cosAlpha*cosBeta*cosGamma+sinAlpha*sinBeta*sinGamma),(cosAlpha*cosBeta*sinGamma-sinAlpha*sinBeta*cosGamma),(cosAlpha*sinBeta*cosGamma+sinAlpha*cosBeta*sinGamma),(-cosAlpha*sinBeta*sinGamma+sinAlpha*cosBeta*cosGamma)

Enter first quaternion:  

Using the output from this utility the revised version of create_direction_vector() performs a single rotation:

    // Convert all angles to radians & divide by 2.
    float   alpha = rotz*DEG_TO_RAD_DIV_2;
    float   cosAlpha(cosf(alpha)), sinAlpha(sinf(alpha));
    float   beta  = roty*DEG_TO_RAD_DIV_2;
    float   cosBeta(cosf(beta)), sinBeta(sinf(beta));
    float   gamma = rotx*DEG_TO_RAD_DIV_2;
    float   cosGamma(cosf(gamma)), sinGamma(sinf(gamma));
    float   cAcB(cosAlpha*cosBeta);
    float   sAsB(sinAlpha*sinBeta);
    float   cAsB(cosAlpha*sinBeta);
    float   sAcB(sinAlpha*cosBeta);
    l.loadRotation(quaternion(cAcB*cosGamma+sAsB*sinGamma,
                              cAcB*sinGamma-sAsB*cosGamma,
                              cAsB*cosGamma+sAcB*sinGamma,
                              sAcB*cosGamma-cAsB*sinGamma));

Three rotations have been reduced to 12 floating multiplies, 4 floating point additions, and one rotation.  And, by using a utility to do the complex manipulation of the quaternion multiplication I greatly reduced the chance that I'd make a mistake.

This utility program is available at GitHub.


While I have not yet rewritten the GFX module as a full blown object class I did make one other major change in that direction.  I made the old GFX_start() function into the GFX constructor.  This was needed to get the TStack objects to initialize correctly.

Next post, I'll finally convert the GFX module into a full C++ object class.