<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>GL Dispatch in Mesa</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>

<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>

<iframe src="contents.html"></iframe>
<div class="content">

<h1>GL Dispatch in Mesa</h1>

<p>Several factors combine to make efficient dispatch of OpenGL functions
fairly complicated. This document attempts to explain some of the issues
and introduce the reader to Mesa's implementation. Readers already familiar
with the issues around GL dispatch can safely skip ahead to the <a
href="#overview">overview of Mesa's implementation</a>.</p>

<h2>1. Complexity of GL Dispatch</h2>

<p>Every GL application has at least one object called a GL <em>context</em>.
This object, which is an implicit parameter to every GL function, stores all
of the GL-related state for the application. Every texture, every buffer
object, every enable, and much, much more is stored in the context. Since
an application can have more than one context, the context to be used is
selected by a window-system dependent function such as
<tt>glXMakeContextCurrent</tt>.</p>

<p>In environments that implement OpenGL on the X Window System using GLX,
all GL functions, including those whose pointers are returned by
<tt>glXGetProcAddress</tt>, are <em>context independent</em>. This means
that no matter what context is currently active, the same
<tt>glVertex3fv</tt> function is used.</p>

<p>This creates the first bit of dispatch complexity. An application can
have two GL contexts. One context is a direct rendering context where
function calls are routed directly to a driver loaded within the
application's address space. The other context is an indirect rendering
context where function calls are converted to GLX protocol and sent to a
server. The same <tt>glVertex3fv</tt> has to do the right thing depending
on which context is current.</p>

<p>Highly optimized drivers or GLX protocol implementations may want to
change the behavior of GL functions depending on current state. For
example, <tt>glFogCoordf</tt> may operate differently depending on whether
or not fog is enabled.</p>

<p>In multithreaded environments, it is possible for each thread to have a
different GL context current. This means that poor old <tt>glVertex3fv</tt>
has to know which GL context is current in the thread where it is being
called.</p>

<h2 id="overview">2. Overview of Mesa's Implementation</h2>

<p>Mesa uses two per-thread pointers. The first pointer stores the address
of the context current in the thread, and the second pointer stores the
address of the <em>dispatch table</em> associated with that context. The
dispatch table stores pointers to functions that actually implement
specific GL functions. Each time a new context is made current in a thread,
these pointers are updated.</p>

<p>The implementation of functions such as <tt>glVertex3fv</tt> becomes
conceptually simple:</p>

<ul>
<li>Fetch the current dispatch table pointer.</li>
<li>Fetch the pointer to the real <tt>glVertex3fv</tt> function from the
table.</li>
<li>Call the real function.</li>
</ul>

<p>This can be implemented in just a few lines of C code. The file
<tt>src/mesa/glapi/glapitemp.h</tt> contains code very similar to this.</p>

<blockquote>
<table border="1">
<tr><td><pre>
void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
{
    const struct _glapi_table * const dispatch = GET_DISPATCH();

    (*dispatch->Vertex3f)(x, y, z);
}</pre></td></tr>
<tr><td>Sample dispatch function</td></tr></table>
</blockquote>

<p>The problem with this simple implementation is the large amount of
overhead that it adds to every GL function call.</p>

<p>In a multithreaded environment, a naive implementation of
<tt>GET_DISPATCH</tt> involves a call to <tt>pthread_getspecific</tt> or a
similar function. Mesa provides a wrapper function called
<tt>_glapi_get_dispatch</tt> that is used by default.</p>
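
<p>As a rough illustration of the naive scheme just described, the sketch
below keeps the table pointer in a pthread key and fetches it on every call.
It is not Mesa's code: <tt>demo_table</tt>, <tt>demo_set_dispatch</tt>,
<tt>demo_Vertex3f</tt> and the two stand-in back ends are names invented for
this example, and only the <tt>pthread_*</tt> calls are real POSIX
functions.</p>

<blockquote>
<table border="1">
<tr><td><pre>
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical one-entry dispatch table; a real table has one slot per
 * GL function. */
struct demo_table {
    void (*Vertex3f)(float x, float y, float z);
};

static pthread_key_t dispatch_key; /* per-thread slot for the table pointer */
static int last_path;              /* demo only: which implementation ran */

/* Stand-ins for a direct rendering and an indirect rendering back end. */
static void direct_Vertex3f(float x, float y, float z)
{
    (void) x; (void) y; (void) z;
    last_path = 1;
}

static void indirect_Vertex3f(float x, float y, float z)
{
    (void) x; (void) y; (void) z;
    last_path = 2;
}

static struct demo_table direct_table = { direct_Vertex3f };
static struct demo_table indirect_table = { indirect_Vertex3f };

/* Stand-in for the bookkeeping done when a context is made current. */
static void demo_set_dispatch(struct demo_table *table)
{
    pthread_setspecific(dispatch_key, table);
}

/* Naive GET_DISPATCH: one library call on every GL entry point. */
#define GET_DISPATCH() \
    ((const struct demo_table *) pthread_getspecific(dispatch_key))

/* The single public entry point, shared by every context. */
void demo_Vertex3f(float x, float y, float z)
{
    const struct demo_table *const dispatch = GET_DISPATCH();

    (*dispatch->Vertex3f)(x, y, z);
}

/* Calls the same entry point under two "contexts" and encodes the two
 * implementations that ran as a two-digit number. */
int demo_run(void)
{
    int first;

    pthread_key_create(&amp;dispatch_key, NULL);

    demo_set_dispatch(&amp;direct_table);
    demo_Vertex3f(1.0f, 2.0f, 3.0f);
    first = last_path;

    demo_set_dispatch(&amp;indirect_table);
    demo_Vertex3f(1.0f, 2.0f, 3.0f);

    return first * 10 + last_path;
}

int main(void)
{
    printf("paths taken: %d\n", demo_run());
    return 0;
}
</pre></td></tr>
<tr><td>Illustrative sketch of dispatch through a pthread key</td></tr></table>
</blockquote>

<p>Built with a C compiler and <tt>-pthread</tt>, the same
<tt>demo_Vertex3f</tt> entry point reaches the direct stand-in on the first
call and the indirect stand-in on the second, at the cost of one
<tt>pthread_getspecific</tt> call each time.</p>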

<h2>3. Optimizations</h2>

<p>A number of optimizations have been made over the years to diminish the
performance hit imposed by GL dispatch. This section describes these
optimizations. The benefits of each optimization and the situations where
each can or cannot be used are listed.</p>

<h3>3.1. Dual dispatch table pointers</h3>

<p>The vast majority of OpenGL applications use the API in a single threaded
manner. That is, the application has only one thread that makes calls into
the GL. In these cases, not only do the calls to
<tt>pthread_getspecific</tt> hurt performance, but they are completely
unnecessary! It is possible to detect this common case and avoid these
calls.</p>

<p>Each time a new dispatch table is set, Mesa examines and records the ID
of the executing thread. If the same thread ID is always seen, Mesa knows
that the application is, from OpenGL's point of view, single threaded.</p>

<p>As long as an application is single threaded, Mesa stores a pointer to
the dispatch table in a global variable called <tt>_glapi_Dispatch</tt>.
The pointer is also stored in a per-thread location via
<tt>pthread_setspecific</tt>. When Mesa detects that an application has
become multithreaded, <tt>NULL</tt> is stored in <tt>_glapi_Dispatch</tt>.</p>

<p>Using this simple mechanism the dispatch functions can detect the
multithreaded case by comparing <tt>_glapi_Dispatch</tt> to <tt>NULL</tt>.
The resulting implementation of <tt>GET_DISPATCH</tt> is slightly more
complex, but it avoids the expensive <tt>pthread_getspecific</tt> call in
the common case.</p>

<blockquote>
<table border="1">
<tr><td><pre>
/* Fast path: the global is non-NULL while the application is single
 * threaded.  The key is passed to pthread_getspecific by value. */
#define GET_DISPATCH() \
    ((_glapi_Dispatch != NULL) \
        ? _glapi_Dispatch : pthread_getspecific(_glapi_Dispatch_key))
</pre></td></tr>
<tr><td>Improved <tt>GET_DISPATCH</tt> Implementation</td></tr></table>
</blockquote>
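
<p>The bookkeeping behind this optimization might look roughly like the
following sketch. It assumes a POSIX threads environment; every
<tt>demo_*</tt> name, the mutex, and the one-field table are invented for
illustration and do not reflect how Mesa actually structures this code.</p>

<blockquote>
<table border="1">
<tr><td><pre>
#include &lt;pthread.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

struct demo_table { int id; };  /* hypothetical stand-in for a dispatch table */

static struct demo_table *demo_Dispatch; /* global; NULL once multithreaded */
static pthread_key_t demo_key;           /* per-thread copy of the pointer */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t first_thread;
static int have_first_thread;
static int multithreaded;

static void demo_init(void)
{
    pthread_key_create(&amp;demo_key, NULL);
}

/* Record the calling thread; a second, different thread ID flips the
 * library into multithreaded mode for good. */
void demo_set_dispatch(struct demo_table *table)
{
    pthread_once(&amp;demo_once, demo_init);

    pthread_mutex_lock(&amp;demo_lock);
    if (!have_first_thread) {
        first_thread = pthread_self();
        have_first_thread = 1;
    } else if (!pthread_equal(first_thread, pthread_self())) {
        multithreaded = 1;
    }
    pthread_setspecific(demo_key, table);         /* always kept per thread */
    demo_Dispatch = multithreaded ? NULL : table; /* global only if single */
    pthread_mutex_unlock(&amp;demo_lock);
}

/* Same shape as the improved GET_DISPATCH: test the global first. */
struct demo_table *demo_get_dispatch(void)
{
    return (demo_Dispatch != NULL)
        ? demo_Dispatch
        : (struct demo_table *) pthread_getspecific(demo_key);
}

static struct demo_table table_a = { 1 };
static struct demo_table table_b = { 2 };
static int seen_in_thread;

static void *worker(void *arg)
{
    (void) arg;
    demo_set_dispatch(&amp;table_b);
    seen_in_thread = demo_get_dispatch()->id;
    return NULL;
}

/* Encodes: table seen before the second thread, table seen inside it,
 * whether the global became NULL, and table seen afterwards. */
int demo_run(void)
{
    pthread_t t;
    int before, global_is_null, after;

    demo_set_dispatch(&amp;table_a);
    before = demo_get_dispatch()->id;

    pthread_create(&amp;t, NULL, worker, NULL);
    pthread_join(t, NULL);

    global_is_null = (demo_Dispatch == NULL);
    after = demo_get_dispatch()->id;

    return before * 1000 + seen_in_thread * 100 + global_is_null * 10 + after;
}

int main(void)
{
    printf("encoded result: %d\n", demo_run());
    return 0;
}
</pre></td></tr>
<tr><td>Illustrative sketch of single-thread detection</td></tr></table>
</blockquote>

<p>Once the second thread has set a table, the global pointer is
<tt>NULL</tt> and each thread falls back to its own per-thread copy, so the
first thread still sees the table it installed.</p>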

<h3>3.2. ELF TLS</h3>

<p>Starting with the 2.4.20 Linux kernel, each thread is allocated an area
of per-thread, global storage. Variables can be put in this area using some
extensions to GCC. By storing the dispatch table pointer in this area, the
expensive call to <tt>pthread_getspecific</tt> and the test of
<tt>_glapi_Dispatch</tt> can be avoided.</p>

<p>The dispatch table pointer is stored in a new variable called
<tt>_glapi_tls_Dispatch</tt>. A new variable name is used so that a single
libGL can implement both interfaces. This allows the libGL to operate with
direct rendering drivers that use either interface. Once the pointer is
properly declared, <tt>GET_DISPATCH</tt> becomes a simple variable
reference.</p>

<blockquote>
<table border="1">
<tr><td><pre>
extern __thread struct _glapi_table *_glapi_tls_Dispatch
    __attribute__((tls_model("initial-exec")));

#define GET_DISPATCH() _glapi_tls_Dispatch
</pre></td></tr>
<tr><td>TLS <tt>GET_DISPATCH</tt> Implementation</td></tr></table>
</blockquote>

<p>Use of this path is controlled by the preprocessor define
<tt>GLX_USE_TLS</tt>. Any platform capable of using TLS should use this as
the default dispatch method.</p>
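
<p>To show what a per-thread variable buys, here is a small sketch, assuming
GCC or a compatible compiler that accepts <tt>__thread</tt>. The
<tt>demo_*</tt> names and the one-field table are invented for this example,
and the <tt>tls_model</tt> attribute is left out for brevity.</p>

<blockquote>
<table border="1">
<tr><td><pre>
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;

struct demo_table { int id; };  /* hypothetical stand-in for a dispatch table */

/* Each thread gets its own copy of this pointer. */
static __thread struct demo_table *demo_tls_Dispatch;

/* No function call and no NULL test: just a variable reference. */
#define GET_DISPATCH() demo_tls_Dispatch

static struct demo_table table_a = { 1 };
static struct demo_table table_b = { 2 };

static void *worker(void *arg)
{
    int *seen = arg;

    demo_tls_Dispatch = &amp;table_b;   /* affects this thread only */
    *seen = GET_DISPATCH()->id;
    return NULL;
}

/* Returns the worker's table id in the tens digit and the calling
 * thread's table id in the ones digit. */
int demo_run(void)
{
    pthread_t t;
    int seen_in_worker = 0;

    demo_tls_Dispatch = &amp;table_a;
    pthread_create(&amp;t, NULL, worker, &amp;seen_in_worker);
    pthread_join(t, NULL);

    return seen_in_worker * 10 + GET_DISPATCH()->id;
}

int main(void)
{
    printf("worker/main tables: %d\n", demo_run());
    return 0;
}
</pre></td></tr>
<tr><td>Illustrative sketch of a thread-local dispatch pointer</td></tr></table>
</blockquote>

<p>The worker's assignment does not disturb the calling thread's pointer,
which is why no thread-ID bookkeeping or fallback path is needed here.</p>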

<h3>3.3. Assembly Language Dispatch Stubs</h3>

<p>Many platforms have difficulty properly optimizing the tail-call in the
dispatch stubs. Platforms like x86 that pass parameters on the stack seem
to have even more difficulty optimizing these routines. All of the dispatch
routines are very short, and it is trivial to create optimal assembly
language versions. The amount of optimization provided by using assembly
stubs varies from platform to platform and application to application.
However, by using the assembly stubs, many platforms can use an additional
space optimization (see <a href="#fixedsize">below</a>).</p>

<p>The biggest hurdle to creating assembly stubs is handling the various
ways that the dispatch table pointer can be accessed. There are four
different methods that can be used:</p>

<ol>
<li>Using <tt>_glapi_Dispatch</tt> directly in builds for non-multithreaded
environments.</li>
<li>Using <tt>_glapi_Dispatch</tt> and <tt>_glapi_get_dispatch</tt> in
multithreaded environments.</li>
<li>Using <tt>_glapi_Dispatch</tt> and <tt>pthread_getspecific</tt> in
multithreaded environments.</li>
<li>Using <tt>_glapi_tls_Dispatch</tt> directly in TLS enabled
multithreaded environments.</li>
</ol>

<p>People wishing to implement assembly stubs for new platforms should focus
on #4 if the new platform supports TLS. Otherwise, implement #2 followed by
#3. Environments that do not support multithreading are uncommon and not
terribly relevant.</p>

<p>Selection of the dispatch table pointer access method is controlled by a
few preprocessor defines.</p>

<ul>
<li>If <tt>GLX_USE_TLS</tt> is defined, method #4 is used.</li>
<li>If <tt>HAVE_PTHREAD</tt> is defined, method #3 is used.</li>
<li>If none of the preceding are defined, method #1 is used.</li>
</ul>
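
<p>In C terms, the selection could be pictured with the sketch below. Only
the two define names, <tt>GLX_USE_TLS</tt> and <tt>HAVE_PTHREAD</tt>, come
from the text; the <tt>demo_*</tt> names, <tt>set_dispatch</tt> and the
<tt>method</tt> string are assumptions made for this example, and the real
selection happens in the assembly sources rather than in code like this.</p>

<blockquote>
<table border="1">
<tr><td><pre>
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

struct demo_table { int id; };  /* hypothetical stand-in for a dispatch table */
static struct demo_table demo_table_instance = { 7 };

#if defined(GLX_USE_TLS)
/* TLS build: a per-thread variable is read directly. */
static __thread struct demo_table *demo_tls_Dispatch;
static void set_dispatch(struct demo_table *t) { demo_tls_Dispatch = t; }
#define GET_DISPATCH() (demo_tls_Dispatch)
static const char *const method = "TLS variable";

#elif defined(HAVE_PTHREAD)
/* pthread build: global pointer first, pthread_getspecific as fallback. */
#include &lt;pthread.h&gt;
static struct demo_table *demo_Dispatch;
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static void make_key(void) { pthread_key_create(&amp;demo_key, NULL); }
static void set_dispatch(struct demo_table *t)
{
    pthread_once(&amp;demo_once, make_key);
    pthread_setspecific(demo_key, t);
    demo_Dispatch = t;  /* this demo never becomes multithreaded */
}
#define GET_DISPATCH() \
    ((demo_Dispatch != NULL) ? demo_Dispatch \
        : (struct demo_table *) pthread_getspecific(demo_key))
static const char *const method = "global pointer with pthread fallback";

#else
/* No thread support: the global pointer is all there is. */
static struct demo_table *demo_Dispatch;
static void set_dispatch(struct demo_table *t) { demo_Dispatch = t; }
#define GET_DISPATCH() (demo_Dispatch)
static const char *const method = "global pointer only";
#endif

/* Installs the demo table and reads it back through GET_DISPATCH. */
int demo_run(void)
{
    set_dispatch(&amp;demo_table_instance);
    return GET_DISPATCH()->id;
}

int main(void)
{
    printf("%s: table id %d\n", method, demo_run());
    return 0;
}
</pre></td></tr>
<tr><td>Illustrative sketch of compile-time access method selection</td></tr></table>
</blockquote>

<p>Compiling with <tt>-DGLX_USE_TLS</tt>, with <tt>-DHAVE_PTHREAD</tt>, or
with neither selects a different <tt>GET_DISPATCH</tt>, while the calling
code stays the same.</p>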

<p>Two different techniques are used to handle the various cases.
On x86 and SPARC, a macro called <tt>GL_STUB</tt> is used. In the preamble
of the assembly source file different implementations of the macro are
selected based on the defined preprocessor variables. The assembly code
then consists of a series of invocations of the macro such as:</p>

<blockquote>
<table border="1">
<tr><td><pre>
GL_STUB(Color3fv, _gloffset_Color3fv)
</pre></td></tr>
<tr><td>SPARC Assembly Implementation of <tt>glColor3fv</tt></td></tr></table>
</blockquote>

<p>The benefit of this technique is that changes to the calling pattern
(i.e., addition of a new dispatch table pointer access method) require fewer
changed lines in the assembly code.</p>

<p>However, this technique can only be used on platforms where the function
implementation does not change based on the parameters passed to the
function. For example, since x86 passes all parameters on the stack, no
additional code is needed to save and restore function parameters around a
call to <tt>pthread_getspecific</tt>. Since x86-64 passes parameters in
registers, varying amounts of code need to be inserted around the call to
<tt>pthread_getspecific</tt> to save and restore the GL function's
parameters.</p>

<p>The other technique, used by platforms like x86-64 that cannot use the
first technique, is to insert <tt>#ifdef</tt> within the assembly
implementation of each function. This makes the assembly file considerably
larger (e.g., 29,332 lines for <tt>glapi_x86-64.S</tt> versus 1,155 lines for
<tt>glapi_x86.S</tt>) and causes simple changes to the function
implementation to generate many lines of diffs. Since the assembly files
are typically generated by scripts (see <a href="#autogen">below</a>), this
isn't a significant problem.</p>

<p>Once a new assembly file is created, it must be inserted in the build
system. There are two steps to this. The file must first be added to
<tt>src/mesa/sources</tt>. That gets the file built and linked. The second
step is to add the correct <tt>#ifdef</tt> magic to
<tt>src/mesa/glapi/glapi_dispatch.c</tt> to prevent the C version of the
dispatch functions from being built.</p>

<h3 id="fixedsize">3.4. Fixed-Length Dispatch Stubs</h3>

<p>To implement <tt>glXGetProcAddress</tt>, Mesa stores a table that
associates function names with pointers to those functions. This table is
stored in <tt>src/mesa/glapi/glprocs.h</tt>. For different reasons on
different platforms, storing all of those pointers is inefficient. On most
platforms, including all known platforms that support TLS, we can avoid this
added overhead.</p>

<p>If the assembly stubs are all the same size, the pointer need not be
stored for every function. The location of the function can instead be
calculated by multiplying the size of the dispatch stub by the offset of the
function in the table. This value is then added to the address of the first
dispatch stub.</p>

<p>This path is activated by adding the correct <tt>#ifdef</tt> magic to
<tt>src/mesa/glapi/glapi.c</tt> just before <tt>glprocs.h</tt> is
included.</p>
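
<p>The address calculation can be sketched in C as follows. This only
demonstrates the arithmetic: the 16-byte stub size, the three table entries
with their offsets, and the byte array standing in for the block of assembly
stubs are all assumptions made for the example, not values taken from
Mesa.</p>

<blockquote>
<table border="1">
<tr><td><pre>
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define DEMO_STUB_SIZE 16  /* hypothetical: every stub occupies 16 bytes */

/* Name table that stores only an offset per function, not a pointer. */
struct demo_proc {
    const char *name;
    unsigned offset;
};

static const struct demo_proc demo_procs[] = {
    { "glNewList",  0 },
    { "glEndList",  1 },
    { "glCallList", 2 },
};

/* Stand-in for the contiguous block of fixed-size assembly stubs. */
static unsigned char demo_stub_block[3 * DEMO_STUB_SIZE];

/* address = address of first stub + stub size * offset in the table */
const void *demo_get_proc_address(const char *name)
{
    size_t i;

    for (i = 0; i &lt; sizeof demo_procs / sizeof demo_procs[0]; i++) {
        if (strcmp(demo_procs[i].name, name) == 0)
            return demo_stub_block
                + (size_t) DEMO_STUB_SIZE * demo_procs[i].offset;
    }
    return NULL;
}

/* Byte distance of a function's stub from the first stub, or -1 if the
 * name is not in the table. */
long demo_stub_distance(const char *name)
{
    const unsigned char *p = demo_get_proc_address(name);

    return p ? (long) (p - demo_stub_block) : -1L;
}

int main(void)
{
    printf("glCallList stub is %ld bytes past the first stub\n",
           demo_stub_distance("glCallList"));
    return 0;
}
</pre></td></tr>
<tr><td>Illustrative sketch of the fixed-length stub address calculation</td></tr></table>
</blockquote>

<p>With this layout the table needs only a small integer per function
instead of a full pointer.</p>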

<h2 id="autogen">4. Automatic Generation of Dispatch Stubs</h2>

</div>
</body>
</html>