| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html lang="en"> |
| <head> |
| <meta http-equiv="content-type" content="text/html; charset=utf-8"> |
| <title>GL Dispatch</title> |
| <link rel="stylesheet" type="text/css" href="mesa.css"> |
| </head> |
| <body> |
| |
| <div class="header"> |
| The Mesa 3D Graphics Library |
| </div> |
| |
| <iframe src="contents.html"></iframe> |
| <div class="content"> |
| |
| <h1>GL Dispatch</h1> |
| |
| <p>Several factors combine to make efficient dispatch of OpenGL functions |
| fairly complicated. This document attempts to explain some of the issues |
| and introduce the reader to Mesa's implementation. Readers already familiar |
| with the issues around GL dispatch can safely skip ahead to the <a |
| href="#overview">overview of Mesa's implementation</a>.</p> |
| |
| <h2>1. Complexity of GL Dispatch</h2> |
| |
| <p>Every GL application has at least one object called a GL <em>context</em>. |
| This object, which is an implicit parameter to every GL function, stores all |
| of the GL related state for the application. Every texture, every buffer |
| object, every enable, and much, much more is stored in the context. Since |
| an application can have more than one context, the context to be used is |
| selected by a window-system dependent function such as |
| <code>glXMakeContextCurrent</code>.</p> |
| |
| <p>In environments that implement OpenGL with X-Windows using GLX, every GL |
| function, including the pointers returned by <code>glXGetProcAddress</code>, are |
| <em>context independent</em>. This means that no matter what context is |
| currently active, the same <code>glVertex3fv</code> function is used.</p> |
| |
| <p>This creates the first bit of dispatch complexity. An application can |
| have two GL contexts. One context is a direct rendering context where |
| function calls are routed directly to a driver loaded within the |
| application's address space. The other context is an indirect rendering |
| context where function calls are converted to GLX protocol and sent to a |
| server. The same <code>glVertex3fv</code> has to do the right thing depending |
| on which context is current.</p> |
| |
| <p>Highly optimized drivers or GLX protocol implementations may want to |
| change the behavior of GL functions depending on current state. For |
| example, <code>glFogCoordf</code> may operate differently depending on whether |
| or not fog is enabled.</p> |
| |
| <p>In multi-threaded environments, it is possible for each thread to have a |
| different GL context current. This means that poor old <code>glVertex3fv</code> |
| has to know which GL context is current in the thread where it is being |
| called.</p> |
| |
| <h2 id="overview">2. Overview of Mesa's Implementation</h2> |
| |
| <p>Mesa uses two per-thread pointers. The first pointer stores the address |
| of the context current in the thread, and the second pointer stores the |
| address of the <em>dispatch table</em> associated with that context. The |
| dispatch table stores pointers to functions that actually implement |
| specific GL functions. Each time a new context is made current in a thread, |
| these pointers a updated.</p> |
| |
| <p>The implementation of functions such as <code>glVertex3fv</code> becomes |
| conceptually simple:</p> |
| |
| <ul> |
| <li>Fetch the current dispatch table pointer.</li> |
| <li>Fetch the pointer to the real <code>glVertex3fv</code> function from the |
| table.</li> |
| <li>Call the real function.</li> |
| </ul> |
| |
| <p>This can be implemented in just a few lines of C code. The file |
| <code>src/mesa/glapi/glapitemp.h</code> contains code very similar to this.</p> |
| |
| <figure> |
| <pre> |
| void glVertex3f(GLfloat x, GLfloat y, GLfloat z) |
| { |
| const struct _glapi_table * const dispatch = GET_DISPATCH(); |
| |
| (*dispatch->Vertex3f)(x, y, z); |
| } |
| </pre> |
| <figcaption>Sample dispatch function</figcaption> |
| </figure> |
| |
| <p>The problem with this simple implementation is the large amount of |
| overhead that it adds to every GL function call.</p> |
| |
| <p>In a multithreaded environment, a naive implementation of |
| <code>GET_DISPATCH</code> involves a call to <code>pthread_getspecific</code> or a |
| similar function. Mesa provides a wrapper function called |
| <code>_glapi_get_dispatch</code> that is used by default.</p> |
| |
| <h2>3. Optimizations</h2> |
| |
| <p>A number of optimizations have been made over the years to diminish the |
| performance hit imposed by GL dispatch. This section describes these |
| optimizations. The benefits of each optimization and the situations where |
| each can or cannot be used are listed.</p> |
| |
| <h3>3.1. Dual dispatch table pointers</h3> |
| |
| <p>The vast majority of OpenGL applications use the API in a single threaded |
| manner. That is, the application has only one thread that makes calls into |
| the GL. In these cases, not only do the calls to |
| <code>pthread_getspecific</code> hurt performance, but they are completely |
| unnecessary! It is possible to detect this common case and avoid these |
| calls.</p> |
| |
| <p>Each time a new dispatch table is set, Mesa examines and records the ID |
| of the executing thread. If the same thread ID is always seen, Mesa knows |
| that the application is, from OpenGL's point of view, single threaded.</p> |
| |
| <p>As long as an application is single threaded, Mesa stores a pointer to |
| the dispatch table in a global variable called <code>_glapi_Dispatch</code>. |
| The pointer is also stored in a per-thread location via |
| <code>pthread_setspecific</code>. When Mesa detects that an application has |
| become multithreaded, <code>NULL</code> is stored in <code>_glapi_Dispatch</code>.</p> |
| |
| <p>Using this simple mechanism the dispatch functions can detect the |
| multithreaded case by comparing <code>_glapi_Dispatch</code> to <code>NULL</code>. |
| The resulting implementation of <code>GET_DISPATCH</code> is slightly more |
| complex, but it avoids the expensive <code>pthread_getspecific</code> call in |
| the common case.</p> |
| |
| <figure> |
| <pre> |
| #define GET_DISPATCH() \ |
| (_glapi_Dispatch != NULL) \ |
| ? _glapi_Dispatch : pthread_getspecific(&_glapi_Dispatch_key) |
| </pre> |
| <figcaption>Improved <code>GET_DISPATCH</code> Implementation</figcaption> |
| </figure> |
| |
| <h3>3.2. ELF TLS</h3> |
| |
| <p>Starting with the 2.4.20 Linux kernel, each thread is allocated an area |
| of per-thread, global storage. Variables can be put in this area using some |
| extensions to GCC. By storing the dispatch table pointer in this area, the |
| expensive call to <code>pthread_getspecific</code> and the test of |
| <code>_glapi_Dispatch</code> can be avoided.</p> |
| |
| <p>The dispatch table pointer is stored in a new variable called |
| <code>_glapi_tls_Dispatch</code>. A new variable name is used so that a single |
| libGL can implement both interfaces. This allows the libGL to operate with |
| direct rendering drivers that use either interface. Once the pointer is |
| properly declared, <code>GET_DISPACH</code> becomes a simple variable |
| reference.</p> |
| |
| <figure> |
| <pre> |
| extern __thread struct _glapi_table *_glapi_tls_Dispatch |
| __attribute__((tls_model("initial-exec"))); |
| |
| #define GET_DISPATCH() _glapi_tls_Dispatch |
| </pre> |
| <figcaption>TLS <code>GET_DISPATCH</code> Implementation</figcaption> |
| </figure> |
| |
| <p>Use of this path is controlled by the preprocessor define |
| <code>USE_ELF_TLS</code>. Any platform capable of using ELF TLS should use this |
| as the default dispatch method.</p> |
| |
| <h3>3.3. Assembly Language Dispatch Stubs</h3> |
| |
| <p>Many platforms has difficulty properly optimizing the tail-call in the |
| dispatch stubs. Platforms like x86 that pass parameters on the stack seem |
| to have even more difficulty optimizing these routines. All of the dispatch |
| routines are very short, and it is trivial to create optimal assembly |
| language versions. The amount of optimization provided by using assembly |
| stubs varies from platform to platform and application to application. |
| However, by using the assembly stubs, many platforms can use an additional |
| space optimization (see <a href="#fixedsize">below</a>).</p> |
| |
| <p>The biggest hurdle to creating assembly stubs is handling the various |
| ways that the dispatch table pointer can be accessed. There are four |
| different methods that can be used:</p> |
| |
| <ol> |
| <li>Using <code>_glapi_Dispatch</code> directly in builds for non-multithreaded |
| environments.</li> |
| <li>Using <code>_glapi_Dispatch</code> and <code>_glapi_get_dispatch</code> in |
| multithreaded environments.</li> |
| <li>Using <code>_glapi_Dispatch</code> and <code>pthread_getspecific</code> in |
| multithreaded environments.</li> |
| <li>Using <code>_glapi_tls_Dispatch</code> directly in TLS enabled |
| multithreaded environments.</li> |
| </ol> |
| |
| <p>People wishing to implement assembly stubs for new platforms should focus |
| on #4 if the new platform supports TLS. Otherwise, implement #2 followed by |
| #3. Environments that do not support multithreading are uncommon and not |
| terribly relevant.</p> |
| |
| <p>Selection of the dispatch table pointer access method is controlled by a |
| few preprocessor defines.</p> |
| |
| <ul> |
| <li>If <code>USE_ELF_TLS</code> is defined, method #3 is used.</li> |
| <li>If <code>HAVE_PTHREAD</code> is defined, method #2 is used.</li> |
| <li>If none of the preceding are defined, method #1 is used.</li> |
| </ul> |
| |
| <p>Two different techniques are used to handle the various different cases. |
| On x86 and SPARC, a macro called <code>GL_STUB</code> is used. In the preamble |
| of the assembly source file different implementations of the macro are |
| selected based on the defined preprocessor variables. The assembly code |
| then consists of a series of invocations of the macros such as: |
| |
| <figure> |
| <pre> |
| GL_STUB(Color3fv, _gloffset_Color3fv) |
| </pre> |
| <figcaption>SPARC Assembly Implementation of <code>glColor3fv</code></figcaption> |
| </figure> |
| |
| <p>The benefit of this technique is that changes to the calling pattern |
| (i.e., addition of a new dispatch table pointer access method) require fewer |
| changed lines in the assembly code.</p> |
| |
| <p>However, this technique can only be used on platforms where the function |
| implementation does not change based on the parameters passed to the |
| function. For example, since x86 passes all parameters on the stack, no |
| additional code is needed to save and restore function parameters around a |
| call to <code>pthread_getspecific</code>. Since x86-64 passes parameters in |
| registers, varying amounts of code needs to be inserted around the call to |
| <code>pthread_getspecific</code> to save and restore the GL function's |
| parameters.</p> |
| |
| <p>The other technique, used by platforms like x86-64 that cannot use the |
| first technique, is to insert <code>#ifdef</code> within the assembly |
| implementation of each function. This makes the assembly file considerably |
| larger (e.g., 29,332 lines for <code>glapi_x86-64.S</code> versus 1,155 lines for |
| <code>glapi_x86.S</code>) and causes simple changes to the function |
| implementation to generate many lines of diffs. Since the assembly files |
| are typically generated by scripts (see <a href="#autogen">below</a>), this |
| isn't a significant problem.</p> |
| |
| <p>Once a new assembly file is created, it must be inserted in the build |
| system. There are two steps to this. The file must first be added to |
| <code>src/mesa/sources</code>. That gets the file built and linked. The second |
| step is to add the correct <code>#ifdef</code> magic to |
| <code>src/mesa/glapi/glapi_dispatch.c</code> to prevent the C version of the |
| dispatch functions from being built.</p> |
| |
| <h3 id="fixedsize">3.4. Fixed-Length Dispatch Stubs</h3> |
| |
| <p>To implement <code>glXGetProcAddress</code>, Mesa stores a table that |
| associates function names with pointers to those functions. This table is |
| stored in <code>src/mesa/glapi/glprocs.h</code>. For different reasons on |
| different platforms, storing all of those pointers is inefficient. On most |
| platforms, including all known platforms that support TLS, we can avoid this |
| added overhead.</p> |
| |
| <p>If the assembly stubs are all the same size, the pointer need not be |
| stored for every function. The location of the function can instead be |
| calculated by multiplying the size of the dispatch stub by the offset of the |
| function in the table. This value is then added to the address of the first |
| dispatch stub.</p> |
| |
| <p>This path is activated by adding the correct <code>#ifdef</code> magic to |
| <code>src/mesa/glapi/glapi.c</code> just before <code>glprocs.h</code> is |
| included.</p> |
| |
| </div> |
| </body> |
| </html> |