Tuesday, December 3, 2013

Optimizing Calculation of the Model-View Matrix

This post continues my series about rewriting the code from Game and Graphics Programming for iOS and Android with OpenGL ES 2.0 to better leverage C++ abilities.  In Chapter 7 the author discusses various means of manipulating the camera.  In reviewing the sample code for this chapter I think that the code can be made more efficient.  Doing so doesn't have any significant impact on the sample code for the book.  However, real games are likely to need every bit of performance they can get, especially running on mobile platforms.

To get a copy of the code as described in this post pull a copy from GitHub using the tag chapter7.

The sample program chapter7‑1 has the follow sequence of code in its templateAppDraw() function:

    GFX_load_identity();
    .
    .
    .
    GFX_translate(eye_location.x,
                eye_location.y,
                eye_location.z);
    GFX_rotate(rotz, 0.0f, 0.0f, 1.0f);
    GFX_rotate(90.0f, 1.0f, 0.0f, 0.0f);
    mat4_invert(GFX_get_modelview_matrix());

As you can see the code uses the routine mat4_invert() to do a matrix inversion of the Model-View matrix.  Matrix inversion is a relatively expensive numerical operation.  In the author's defense the mat4_invert() routine has been hand coded to exploit the particular features of 4D homogeneous space transformation matrices.  Still, if the code could avoid doing any matrix inversions at all the code would be more efficient.

I don't intend to write any fully qualified math proofs but I do need to introduce a few details for readers who aren't familiar with linear algebra, and/or need a refresher before forging ahead.

When you learned to do math using fractions you learned that when you divide by a fraction that it's the same thing as multiplying by that fraction's reciprocal.  When a number is divided by itself the result is 1 (one).  If that number is a fraction dividing it by itself is the same as multiplying the fraction by its own inverse.  If the fraction is equivalent/equal to the number 0 (zero), its reciprocal (aka its multiplicative inverse) isn't a "real" number.

Matrices have a similar concept, i.e., a matrix can have a multiplicative inverse.  One of the conditions for a matrix to have a multiplicative inverse is that the matrix must be square, i.e., it must have the same number of rows as it has columns.  Another condition required for having an inverse is that the matrix must not be 0 (zero) but, for the time being I'm not going to explain what that means in the context of matrix algebra.  If a matrix A has an inverse, which we call A‑1, the product of that matrix and its inverse is the identity matrix.  An identity matrix is like the matrix version of the number 1 (one).  An identity matrix is a square matrix which has 1 (one) along its diagonal, and 0 (zero) everywhere else.  For example, the 4×4 identity matrix is normally written as:



1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1


If we have a matrix M which is the product of the matrices A, and B, and if M has an inverse, the inverse of M (aka M‑1) can be found by

    M‑1 = (A × B)‑1 = B‑1 × A‑1

Going back to the code above each call to GFX_rotate(), and GFX_translate() executes a matrix multiplication:

    GFX_load_identity(); // Load the matrix version of "1"
    .
    .
    .
    GFX_translate(eye_location.x, // Times Matrix A, i.e.,
                eye_location.y, // 1 * A is A.
                eye_location.z);
    GFX_rotate(rotz, 0.0f, 0.0f, 1.0f); // Times Matrix B
    GFX_rotate(90.0f, 1.0f, 0.0f, 0.0f); // Times Matrix C

At the end of this sequence of function calls the Model-View matrix holds the product of A × B × C.  The next step

    mat4_invert(GFX_get_modelview_matrix());

calculates the inverse of this matrix product and stores it back in the Model-View matrix.

[Note:  Matrix multiplication, unlike the multiplication of real numbers, is not, in general, commutative.  The matrix operations must be performed in the order described below in order to get consistently correct results.]

What is the inverse of A × B × C?

    (A × B × C)‑1 = C ‑1 × B‑1 × A‑1

So, if after loading the Model-View matrix with the identity matrix the code multiplied by the inverse of C, then by the inverse of B, and finally by the inverse of A, the Model-View matrix would have the same results of multiplying by A, B, C, and then taking the inverse.

That is, we can change the order so that instead of performing a translation followed by two rotations, we can reverse the order as follows to do two rotations followed by a translation:

    GFX_rotate(-90.0f, 1.0f, 1.0f, 1.0f);// Note that the angle
                                         // of rotation is
                                         // negated.
    GFX_rotate(-rotz, 0.0f, 0.0f, 1.0f); // Likewise, here.
    GFX_translate(-eye_location.x, // And here the
                -eye_location.y, // direction of
                -eye_location.z);      // translation is
                                         // negated.

This has the same effect as the original code but without the cost of the matrix inversion.

[Note:  I'm not going to do a rigorous mathematical proof that negating the rotation angle calculates the matrix inverse for rotation, or that negating the translation amount generates the matrix inverse for translation.  I'm just going to make the assertion and leave it at that.]

Because the new code avoids the call to mat4_inverse() we can now calculate the same Model-View matrix with:
  • 21 fewer floating point multiplies,
  • 8 fewer floating point additions,
  • 1 fewer floating point division, and
  • 1 fewer matrix copy operation.
Again, the sample programs are small enough that it's probably not a big deal but in a real game where you need to make every CPU cycle count it may be the edge you need to get smooth animation.  Remember, these calculations happen every time the game calculates a new frame, perhaps 30 times, or more, per second.

I'm not done optimizing the code.  Going back to the revised code we have:

    GFX_load_identity();
    .
    .
    .
    GFX_rotate(-90.0f, 1.0f, 1.0f, 1.0f);
    GFX_rotate(-rotz, 0.0f, 0.0f, 1.0f);
    GFX_translate(-eye_location.x,
                -eye_location.y,
                -eye_location.z);
    for (auto objmesh=obj-objmesh.begin;
        objmesh!=obj->objmesh.end(); ++objmesh) {
        GFX_push_matrix();
GFX_translate(objmesh->location.x,
                    objmesh->location.y,
                    objmesh->location.z);
.
.
.
GFX_pop_matrix();
    }

Notice that in this version of the code we have consecutive translations.  First the code translates "away" from the eye location, and then translates to the mesh location.  Without any proof I'm going to assert that these two translations can be performed as a single translation by changing the code to:

    GFX_load_identity();
    .
    .
    .
    GFX_rotate(-90.0f, 1.0f, 1.0f, 1.0f);
    GFX_rotate(-rotz, 0.0f, 0.0f, 1.0f);
    for (auto objmesh=obj-objmesh.begin;
        objmesh!=obj->objmesh.end(); ++objmesh) {
GFX_push_matrix();
GFX_translate(objmesh->location.x-eye_location.x,
                    objmesh->location.y-eye_location.y,
                    objmesh->location.z-eye_location.z);
.
.
.
GFX_pop_matrix();
    }

Again, this saves the code a few floating point operations, and every little bit helps.

[Note:  There is a similar optimization which can be made to express the two rotations as a single rotation, but I don't yet have all the tools in place to do the optimization easily.  I promise, I'll get there.]

Let's move on to sample program chapter7-2.  If the code is changed to move the translation operation inside the loop it causes breakage.  See Figure 1.

Figure 1 is a screen capture where the two translations have been performed as a single operation.  Because the sample program chapter7-2 uses a frustum to decide which objects to cull in an effort to optimize drawing performance, moving the first translation past the calculation of the frustum introduces a bug.  The point of telling you this is to help you to avoid errors in your own code.  There are order dependencies lurking around every corner waiting to devour the unwary programmer.  It might be possible to tweak the calculation of the frustum to work around the problem, but I haven't tried to do so.

These examples hit the highlights of how one might optimize some of the transformation stack code.  Similiar changes have been made to the sample programs:

  • chapter7‑3,
  • chapter7‑4,
  • chapter10‑1,
  • chapter10‑2,
  • chapter10‑3,
  • chapter10‑4,
  • chapter10‑5,
  • chapter10‑6,
  • chapter11‑1,
  • chapter11‑2,
  • chapter11‑3,
  • chapter11‑4,
  • chapter12‑1,
  • chapter12‑2,
  • chapter12‑3, and
  • chapter12‑4.
The next post will cover Chapter 8, and discuss rewriting the author's NAVIGATION, and FONT modules.

No comments:

Post a Comment