Understanding and Using OpenGL Texture Objects
What is a texture object, anyway? Apparently, they can improve your textured rendering performance by more than 10 times while using the same hardware. If this sounds like a good deal to you, read on. Wright will tell you all about how to make use of them.
With the advent of the OpenGL 1.1 specification, texture objects became part of the API. These texture objects have been available for quite some time as extensions to OpenGL 1.0 because of their incredible performance benefits. However, not everyone is aware of what texture objects are and why they are useful. Texture objects can improve your textured rendering performance by more than 10 times using the same hardware. That's a pretty bold claim, and I can demonstrate it with ease.
Typically, you load a texture image from either a disk file or a memory resource, or you generate it procedurally. You then upload the texture to OpenGL with a call to glTexImage2D(). Although I'm using 2D textures for this working example, everything that this article discusses applies equally well to 1D and 3D (OpenGL 1.2 only) textures. A typical scenario would go something like this:
1. Load texture image bits from a disk file.
2. Set the OpenGL texture parameters and environment (filters, border modes, and so on).
3. Call glTexImage2D (for 2D textures) to load the texture into OpenGL.
4. Draw some geometry with texture coordinates.
However, you wouldn't want to perform each of these steps for every frame of an animated scene. Repeatedly accessing a disk file can be pretty expensive, time wise. If you were using a single texture, you might first load the disk file and call glTexImage2D() after creating the rendering context. Then you'd render the geometry or scene while repeatedly changing the viewer's position or the object's orientation (whichever is appropriate). With a run-of-the-mill 3D card with OpenGL acceleration, you would get a reasonably good frame rate and thus a smooth animation.
Now suppose that a given scene contains multiple textured objects, each using one or more textures; or perhaps a single object with multiple textures. For our example, we'll use a spinning cube with a different texture on each face, plus a marble texture for the floor beneath this cube. This scene has a total of seven textures that need to be loaded.
Naturally, we don't want to access the disk seven times for every frame of our animation, as this would slow our rendering considerably. One popular technique is to combine all seven textures into one large texture that gets loaded once. Then, by tweaking the texture coordinates, you can effectively put different portions of the same texture on each side of the cube. The problem with this technique is that texture filtering will often introduce unsightly artifacts along the edges of your polygons. Another practical consideration is that most hardware imposes some limit on the maximum texture size for a single texture. You could quickly exhaust your available texture space without making much of a dent in your actual available texture memory.
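The texture-coordinate tweaking just described is simple arithmetic. Here is a minimal sketch, assuming the combined texture is a uniform grid of equally sized tiles; the helper name and layout are illustrative, not from the REFLECT sources:

```cpp
// Remap a per-face coordinate (s, t) in [0, 1] into the sub-rectangle of a
// larger atlas texture laid out as a uniform grid of tiles. Hypothetical
// helper -- not part of the sample programs.
struct AtlasCoord { float s, t; };

AtlasCoord RemapToAtlas(float s, float t, int tile, int tilesPerRow, int tilesPerCol)
{
    float tileW = 1.0f / tilesPerRow;   // width of one tile in texture space
    float tileH = 1.0f / tilesPerCol;   // height of one tile
    int col = tile % tilesPerRow;       // which column this tile occupies
    int row = tile / tilesPerRow;       // which row

    AtlasCoord out;
    out.s = (col + s) * tileW;
    out.t = (row + t) * tileH;
    return out;
}
```

For a hypothetical 4x2 atlas holding our seven textures, a face's corner (0, 0) on tile 5 maps to (0.25, 0.5). Note that nothing in this remapping prevents the filtering bleed at tile edges that the paragraph above warns about.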
So artistically, it's better to keep textures separate. How then do we avoid a texture load every time we need to change textures? A reasonable and intuitive choice would be to use display lists. Each display list would contain a call to glTexImage2D() (or its 1D or 3D counterpart) with the appropriate pointer for the given texture. This approach would save considerable time because it accesses the disk only seven times to read in the textures, and does so before the animation loop begins. Saving seven or more disk accesses per frame seems to be a substantial optimization. Let's see what happens.
A First Try at Texturing
Our first example program, REFLECT.EXE, shows a simple animation. A cube is suspended above a reflective marble surface. We've employed a simple reflection technique that draws the original cube first, then draws the reflected cube below, then blends the floor over the reflected cube. In this example, the window is never validated, so Windows continually repaints the window over and over again. Each time, the rotation of the cube is updated slightly, so we have the simple reflected spinning cube shown in Figure 1. By keeping time with a simple clock function, we can divide the number of frames by the elapsed time and get a crude but effective running frame rate.
This demonstration requires thirteen texture loads — six to draw the reflected cube (one for each side of the cube), one for the marble floor, and six more for the cube floating above the floor. This example program reads the texture from disk and uploads it to OpenGL with glTexImage2D() each time a GL_QUAD is drawn. You can use the arrow keys to move the cube's rotation axes around and see that each of the six sides indeed has a different texture.
We can realize some performance benefit if we sort the textures to avoid redundant texture loads. For example, we can load the sand texture once and draw the matching faces of the reflected cube and the source cube together. In this scenario, we would have only seven texture loads per frame. This technique, called texture sorting, can yield considerable gains on some hardware, but it is still limited by the time it takes to load each texture, so it won't deliver the bang that we'll get from other optimization techniques. We'll return to texture management and demonstrate this later.
The code from REFLECT.CPP is pretty straightforward. After creating our rendering context, the code calls a SetupRC() function, which performs all the needed initialization. We've set the texture parameters to do bilinear texture filtering and to repeat the texture coordinates (for the marble floor). An excerpt from SetupRC() is shown here:
void SetupRC(void)
{
// Set Texture mapping parameters
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glEnable(GL_TEXTURE_2D);
...
...
The scene is rendered in a function called RenderScene(), which draws the original cube with a call to DrawCube(). It then draws the mirrored (and scaled) cube below the original by creating a reflection matrix and translating the origin to a position below the floor. The function then simply calls DrawCube() a second time and blends the marble floor over the reflected cube. The DrawCube() function actually loads the texture for each face of the cube and then draws that face with a single GL_QUAD. The following code excerpt shows an example of drawing a single face of the cube.
// Front face of Cube
LoadBMP("xray.bmp");
// Load the bitmap from disk and load into OpenGL
// with glTexImage2D()
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f);
glVertex3f(-fSize, fSize, fSize);
glTexCoord2f(0.0f, 1.0f);
glVertex3f(-fSize, -fSize, fSize);
glTexCoord2f(1.0f, 1.0f);
glVertex3f(fSize,-fSize, fSize);
glTexCoord2f(1.0f, 0.0f);
glVertex3f(fSize,fSize, fSize);
glEnd();
The LoadBMP() function (included in the source) simply loads a 24-bit Windows .BMP file and calls glTexImage2D() for us. You'll find that the first frame is very slow to render. Subsequent frames render much more quickly (comparatively) because the texture images are read from the disk cache for each frame. Relying on the disk cache to speed up your application is a poor excuse for software engineering.
An Intuitive Improvement
For all of these examples, I used an ATI Rage Fury AGP graphics card with 32MB of memory. The PC is a Super 7 homebrew box running an AMD K6-2/400 CPU. This ATI board has an OpenGL ICD (download it from their web site; the driver that comes on the CD is worthless), so this system is no slouch for accelerated OpenGL rendering. The lame two frames per second rendering speed is due to the fact that we are loading the textures from the disk (cache) each time — or is it? Let's see.
The next example, REFLECT_DL.EXE, uses display lists to avoid accessing the disk for each texture load. Using display lists is similar to using texture objects in some respects, so let me show you the structural changes made to account for the use of display lists. First, at the beginning of REFLECT_DL.CPP, we declare variables to hold the names of each of seven display lists.
// Display list names for each texture
GLuint nXray, nLightning, nFall, nCoins, nSand, nEye, nMarble;
Next, in SetupRC(), we must allocate room for seven display list names and assign them to each of our numeric variables. Creating a display list is fairly straightforward, so we simply create seven display lists that encapsulate the loading of each texture. Remember that LoadBMP() calls glTexImage2D() for us:
// Generate 7 display list IDs
nXray = glGenLists(7);
nLightning = nXray + 1;
nFall = nLightning + 1;
nCoins = nFall + 1;
nSand = nCoins + 1;
nEye = nSand + 1;
nMarble = nEye + 1;
// Load the X-Ray texture
glNewList(nXray,GL_COMPILE);
LoadBMP("xray.bmp");
glEndList();
// Load the Lightning texture...
glNewList(nLightning,GL_COMPILE);
LoadBMP("lightning.bmp");
glEndList();
. . .
. . .
. . .
The final change is made to DrawCube(), where we now invoke the display list for each GL_QUAD rather than calling LoadBMP() to read the bitmap from the disk each time.
// Front face of Cube
glCallList(nXray);
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f);
glVertex3f(-fSize, fSize, fSize);
glTexCoord2f(0.0f, 1.0f);
glVertex3f(-fSize, -fSize, fSize);
glTexCoord2f(1.0f, 1.0f);
glVertex3f(fSize,-fSize, fSize);
glTexCoord2f(1.0f, 0.0f);
glVertex3f(fSize,fSize, fSize);
glEnd();
On most consumer hardware, the performance increase is marginal at best (up to 3.2 FPS on the test system). As the frame rate displayed in Figure 2 shows, using display lists for the texture loads saves very little time.
Where's the Beef?
Running either of these two examples, you may doubt that you are actually getting any hardware acceleration at all. One of the original OpenGL designers once admitted to me that he thought the omission of texture objects was one of the major (but few) blunders of OpenGL 1.0. (This blunder was fixed with 1.1, which is why you're reading this.) At the time, OpenGL's engineers thought that display lists would do an adequate job of encapsulating the texture loads. The purpose of a display list is to encapsulate a group of OpenGL commands into a preprocessed batch that can be sent to hardware quickly or even invoked remotely via a client/server interface. While display lists are great for precompiling a set of OpenGL commands into hardware commands, they can be very difficult for hardware vendors to optimize when they contain important state changes, such as loading a new texture.
If you think about it, display lists are inherently inefficient for texture loads. A display list must be stored in memory. When a display list contains a texture load, it also contains the original bit image of that texture. When the display list is invoked, it uploads the texture image to OpenGL (via glTexImage2D()). Now two copies of the texture exist. One represents the original texture bits stored as part of the display list, and the other has been preprocessed into a native format that the OpenGL driver and its particular hardware can use.
Loading a texture can be quite time consuming, especially if the native format is different from what you're specifying (such as using a 24-bit texture image when the driver converts it to a 16-bit image internally). Thus, even though using a display list can save a great deal of intermediary calculations (or disk accesses), some time is always lost due to the actual texture upload. What we need is a way of keeping this internal texture data around for individual textures. We can then reference these textures individually instead of performing multiple texture loads. This is exactly what texture objects do.
(Some exceptions to the stated efficacy of texture objects exist. 3D Labs, for example, has done a tremendous amount of work to optimize display list generation for its drivers; even the original Permedia drivers had this optimization. 3D Labs' drivers correctly store the internal results of the glTexImage2D() call rather than simply replaying the sequence to the driver when the display list is invoked. One problem that this approach has created for developers targeting consumer hardware is that code that runs great on 3D Labs' hardware can suddenly appear unaccelerated on other graphics boards. This optimization remains uncommon on commodity boards; most vendors won't spend the time making it when texture objects are so readily available to developers.)
Using Texture Objects
Texture objects are created and named similarly to display lists. First, we must create one or more texture object names for use. We do this with the glGenTextures() function.
void glGenTextures( GLsizei n, GLuint * textures );
The first argument is the number of texture object names that we want to create. The second is a pointer to an array that will contain the list of generated names (like a display list name, a texture object name is simply an integer value). Note that this array must be preallocated (or statically declared) before calling the function. If we dynamically allocate this array, we must also remember to delete it on program exit. In addition, as with display lists, we need to tell the driver when we no longer need the texture objects so that it can free its own resources. The corresponding function is glDeleteTextures().
void glDeleteTextures( GLsizei n, const GLuint * textures );
We use the arguments the same way we used them in glGenTextures(), with the pointer pointing to a list of already created texture object names. Notice that unlike display lists, we don't get back a single value representing the beginning of a range; we get an array of values. Thus, we can't assume that texture object names will form a contiguous range of integers, as we did with display lists.
Our new sample program, REFLECT_TO.EXE, will use texture objects instead of display lists to save the loaded texture state, not just a playback script of the loading of the texture. Initially, we create a statically declared array of seven integers in which to store the texture objects. We also declare an enumeration that we can use later to identify each texture object.
// Array of seven texture objects
GLuint tList[7];
// Identifiers
enum Texture_Objects { XRAY = 0, LIGHTNING, FALL, COINS, SAND, STORM, MARBLE };
Binding Textures
The default texture environment (compatible with OpenGL 1.0) has a texture name of 0. We bind the current texture environment to a named texture object with a call to glBindTexture().
void glBindTexture( GLenum target, GLuint texture );
The target parameter may have one of three values: GL_TEXTURE_1D, GL_TEXTURE_2D, or GL_TEXTURE_3D (for OpenGL 1.2 and later). Once we've bound a texture object to the current texture environment, any changes that we make to that environment become bound to that texture object, too. This includes texture loads (glTexImage...) and texture parameter settings such as GL_NEAREST or GL_LINEAR filtering. When we bind to a new texture object with another call to glBindTexture(), all of our settings and texture loads are preserved under the previous texture object.
Now the new texture loading and initialization code from SetupRC() looks like this:
// Initialize the Rendering Context
void SetupRC(void)
{
// Generate 7 texture object IDs
glGenTextures(7, tList);
glBindTexture(GL_TEXTURE_2D, tList[XRAY]);
// Set Texture mapping parameters
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER, GL_LINEAR);
LoadBMP("xray.bmp");
glBindTexture(GL_TEXTURE_2D, tList[LIGHTNING]);
// Set Texture mapping parameters
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER, GL_LINEAR);
LoadBMP("lightning.bmp");
After allocating seven texture object names, we bind to one and set up our texture environment for that object. Calls to glTexParameter() are preserved as part of the state for that texture object. For this reason, it's necessary (or at least advisable) to set the texture environment explicitly for each texture object. This is quite helpful, as it allows us to have several textures loaded simultaneously with different settings: linear- or nearest-filtered textures, some objects MIP-mapped and others not, and so on.
We repeat this step for all seven textures in the same way we built the display lists. Now, to return to a previously bound texture environment, we simply call glBindTexture() again with that texture object name. The DrawCube() function has changed very little and now simply reflects the changing of texture objects for each GL_QUAD.
// Draw the cube by loading each texture individually,
// and drawing the corresponding side.
void DrawCube(void)
{
float fSize = 20.0f;
// Front face of Cube
glBindTexture(GL_TEXTURE_2D, tList[XRAY]);
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f);
glVertex3f(-fSize, fSize, fSize);
glTexCoord2f(0.0f, 1.0f);
glVertex3f(-fSize, -fSize, fSize);
glTexCoord2f(1.0f, 1.0f);
glVertex3f(fSize,-fSize, fSize);
glTexCoord2f(1.0f, 0.0f);
glVertex3f(fSize,fSize, fSize);
glEnd();
// Back face of Cube
glBindTexture(GL_TEXTURE_2D, tList[COINS]);
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f);
glVertex3f(fSize,fSize, -fSize);
glTexCoord2f(0.0f, 1.0f);
glVertex3f(fSize,-fSize, -fSize);
glTexCoord2f(1.0f, 1.0f);
glVertex3f(-fSize, -fSize, -fSize);
glTexCoord2f(1.0f, 0.0f);
glVertex3f(-fSize, fSize, -fSize);
glEnd();
. . .
. . .
. . .
Finally, don't forget to clean up the texture objects on program exit with a call to glDeleteTextures().
glDeleteTextures(7, tList);
As seen in Figure 3, the new version is dramatically improved. Even in a generic (software) implementation, you can sometimes double your frame rate by switching to texture objects. Hardware implementations, however, receive the greatest overall performance boost. By switching from display lists to texture objects, the sample program went from 3 FPS to 102 FPS, a 3,300 percent improvement.
Texture Management
So, now we know that texture objects are the way to go for managing our textures. We load each texture into a texture object, and then bind to the appropriate texture whenever needed. Piece of cake, right? Well, there are still a few things that we need to keep in mind. In our example program, once we loaded and bound all seven textures, we switched textures thirteen times during the rendering of the two cubes and marble floor.
Some consumer OpenGL hardware (such as the i740) handles this texture switching very quickly and efficiently. Deep down in the driver, texture switching amounts to little more than changing a single pointer value. Unfortunately, we can't make this assumption for most hardware. One of the reasons the i740 is so good at this is that it textures directly from AGP memory. An i740-based AGP graphics card has no local texture memory, so all textures are accessed with roughly equal efficiency.
Most other modern AGP and PCI graphics cards have some memory on the card for texture storage and can cache a considerable amount of texture there. Accessing these local textures through texture objects is very efficient. Although still phenomenally faster than reloading the textures with glTexImage2D(), switching between texture objects can carry some overhead, especially if you're using different texture filtering parameters, or different sized or nonsquare textures. Just how much overhead texture switching introduces will vary depending on which vendor's card you're using.
The example program REFLECT_SORTED.EXE rearranges the cube drawing code to draw faces of the two cubes with the same texture together. I won't reprint the code here, but it is horrid to look at. The code performs a great number of transformations and pops them off of the transformation stack as we swap back and forth between the upper cube and the reflected and scaled down cube below the marble floor. While a more elegant solution to texture sorting is desirable, just what this will look like will depend greatly on your own rendering engine. The code shown in the REFLECT_SORTED.CPP source file is meant to demonstrate the concept, not necessarily show the best possible way to do this for a general situation. The difference in frame rate with the ATI AGP graphics card was interesting — we dropped one frame per second. You should benchmark your own code. Texture sorting may not always pay off. I think the difference might be more positive in a real application, with many more textures being swapped to and fro. The amount of texture memory available can also have an influence on the effectiveness of this technique.
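Stripped of the transformation juggling, texture sorting is just a data problem: tag each draw with its texture object name and reorder so that draws sharing a texture run back to back. The types and names below are illustrative, not from REFLECT_SORTED.CPP:

```cpp
#include <algorithm>
#include <vector>

// One face draw, tagged with the texture object it needs.
struct Draw { unsigned int texture; int faceIndex; };

// Count how many glBindTexture() calls a given draw order would cost.
int CountBinds(const std::vector<Draw>& draws)
{
    int binds = 0;
    unsigned int current = 0;  // 0 = default texture, never used for a face
    for (size_t i = 0; i < draws.size(); ++i)
        if (draws[i].texture != current) { current = draws[i].texture; ++binds; }
    return binds;
}

// Group draws by texture so each texture is bound at most once.
void SortByTexture(std::vector<Draw>& draws)
{
    std::sort(draws.begin(), draws.end(),
              [](const Draw& a, const Draw& b) { return a.texture < b.texture; });
}
```

For our scene, the thirteen unsorted draws cost thirteen binds; sorted, they cost seven. The catch, as the "horrid" sorted source shows, is that real scenes impose draw-order constraints (here, the floor must blend over the reflected cube), so sorting cannot always be this clean.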
While binding textures saves the driver from having to reload and reformat texture data each time it's accessed, it is possible that texture data may still need to be shuffled back and forth between local memory (resident) and either AGP or system memory (not resident on the graphics card). Again, some drivers might consider AGP memory to be resident. Don't pay any attention to what the Intel documentation tells you regarding resident memory. Hardware manufacturers are not all handling resident memory in the same way. Their marketing literature calls it whatever they want, and you're left to fend for yourselves (that's the hard reality of being a developer).
The REFLECT_SORTED.EXE program might actually show a substantial performance increase on some hardware. If, for example, we were running on a board with a very small amount of texture memory available (say, one of the older 4MB boards), then the extra processing headache may pay off because it would still be far less time consuming than reading entire textures multiple times over the PCI bus. A related topic that is beyond the scope of this article is state sorting. Texture sorting is a simplified case of state sorting. When state sorting, you render all geometry of like state together (lit objects, unlit objects, textured objects, nontextured objects, and so on). State sorting is still a tremendous performance optimization technique and should not be ignored. You might be surprised by how much of a performance hit you take from a single glEnable() and glDisable() call. As we've seen, within the textured state, sorting again by texture object can still provide further optimization opportunities.
Texture Priorities
Depending on your operating environment, your performance requirements, and your boss's mood, you may elect to do your own texture memory management. Conceivably, you could have one really large texture and several smaller textures in your scene. Suppose you keep swapping the large texture back and forth to make room for some little texture. Suppose you're using a PCI graphics card with limited texture memory available, and all this swapping back and forth is killing your frame rate. Wouldn't it be nice if you could tell OpenGL that your bigger textures are going to be used for every frame, and that if it needs to make room for more textures on board, to discard the smaller textures first?
By default, most OpenGL drivers will use an LRU (Least Recently Used) algorithm to swap textures. This means that if your big texture is used once for the floor or ceiling, it could continually get bumped out of your graphics card's memory to make room for a bunch of little textures that are used by multiple objects in the scene (say, for debris on the floor). In this scenario, even texture sorting won't provide the maximum benefit, because one large texture is continually being shuffled back and forth over the bus (PCI or AGP, and in either case it takes more time than not moving the texture).
While some APIs force you to do all the low-level details of texture management, OpenGL has a much more elegant solution. OpenGL's philosophy is that all textures should be resident. This approach provides the optimum texturing performance. So what happens when there isn't enough room for all textures to be resident? OpenGL will make the most commonly used textures resident, which, in a great many cases, is a good memory management solution.
As in the example given previously, when all textures cannot be resident, the most commonly used textures may not be those that you want to keep in the card's memory. If you have to do texture swapping, you want to move the smaller textures, not the bigger ones. You may also need to change this prioritization from scene to scene or room to room within your environment. You could write your own texture management algorithms, taking into account the amount of memory on the board and the size of the textures. You could then call special API functions to lock specific textures in graphics memory and juggle all of this throughout your game or simulation. Many users of other APIs have bemoaned OpenGL's lack of support for this kind of approach. But this complaint is rather like insisting on a VW do-it-yourself kit at the car lot instead of just taking the keys to the Ferrari. After all, with a kit you can do your own performance tuning.
OpenGL's answer to this situation is texture priorities. A single function call, glPrioritizeTextures(), is all that is required, and it works with texture objects.
void glPrioritizeTextures(GLsizei n, GLuint *textures, GLclampf *priorities);
This function lets you tell OpenGL which texture objects are the most important to keep in graphics memory. The first parameter is the number of texture objects, the second is an array of texture object names (remember that these are integer names), and the third parameter is an array of priorities. Texture priorities are floating-point values ranked from 0.0 to 1.0, with 1.0 being the highest priority. You can call this function as many times as you need to in order to reassign texture priorities. Texture objects with a 1.0 priority will most likely be in graphics memory at any given time, and those with 0.0 priority will only be in memory if they have been used at least once and there is still texture memory to burn. Being in graphics memory is called being resident in OpenGL terminology.
Prioritizing textures allows OpenGL to worry about how many and which textures will fit in the available texture memory space. Compromise is always possible. Even when you do your own low-level texture management in some other API, you'll still have to handle the situation in which you have more textures than will fit in texture memory. Suppose you want to know whether all your important textures are actually resident. One final function worth mentioning here can help: glAreTexturesResident().
GLboolean glAreTexturesResident(GLsizei n, GLuint *textures, GLboolean *residences)
Given an array of texture objects (stored in *textures), this function fills an array of corresponding flags (*residences) that will tell you whether each texture object is resident or not. You can use this function to load all of your texture objects and then test to see if they all fit within local texture memory. For example, you might want to adjust the size of some large texture in order to get more texture objects resident. A nice feature of this function is that if all of the texture objects specified are resident, the function returns GL_TRUE. Thus, you can avoid testing each individual flag returned in *residences.
The last example program, REFLECT_P.EXE, uses the texture priority functions to set the larger marble texture as the highest priority. It also checks to see if all textures are fitting within available hardware texture memory and displays this information, along with the frame rate, in the window caption (Figure 4).
On a modern graphics card with plentiful texture memory, all textures in this example will be resident. A good exercise would be to experiment with the REFLECT_P.EXE and REFLECT_SORTED.EXE programs on different hardware configurations. You may need to experiment with changing the texture sizes in order to exceed texture memory on one of the newer big-boy boards (32MB of memory on board). By experimenting with different texture sizes and different amounts of available texture memory, you'll find that the best performance is achieved when all textures can be resident. When all textures cannot possibly be resident, you'll see an improvement in REFLECT_P.EXE (textures sorted and using priorities) over REFLECT_SORTED.EXE (just sorting textures).
Richard S. Wright Jr. is the lead author of the OpenGL SuperBible and does OpenGL consulting for Starstone Software Systems Inc. You can download the code and textures for the sample programs from Richard's OpenGL web site at http://www.starstonesoftware.com/OpenGL.