Jim Stichnoth | 31c9559 | 2014-12-19 12:51:35 -0800 | [diff] [blame] | 1 | Object allocation and lifetime in ICE |
| 2 | ===================================== |
| 3 | |
| 4 | This document discusses object lifetime and scoping issues, starting with |
| 5 | bitcode parsing and ending with ELF file emission. |
| 6 | |
| 7 | Multithreaded translation model |
| 8 | ------------------------------- |
| 9 | |
| 10 | A single thread is responsible for parsing PNaCl bitcode (possibly concurrently |
| 11 | with downloading the bitcode file) and constructing the initial high-level ICE. |
| 12 | The result is a queue of Cfg pointers. The parser thread incrementally adds a |
| 13 | Cfg pointer to the queue after the Cfg is created, and then moves on to parse |
| 14 | the next function. |
| 15 | |
| 16 | Multiple translation worker threads draw from the queue of Cfg pointers as they |
| 17 | are added to the queue, such that several functions can be translated in parallel. |
| 18 | The result is a queue of assembler buffers, each of which consists of machine code |
| 19 | plus fixups. |
| 20 | |
| 21 | A single thread is responsible for writing the assembler buffers to an ELF file. |
| 22 | It consumes the assembler buffers from the queue that the translation threads |
| 23 | write to. |
| 24 | |
| 25 | This means that Cfgs are created by the parser thread and destroyed by the |
| 26 | translation thread (including Cfg nodes, instructions, and most kinds of |
| 27 | operands), and assembler buffers are created by the translation thread and |
| 28 | destroyed by the writer thread. |
| 29 | |
| 30 | Deterministic execution |
| 31 | ^^^^^^^^^^^^^^^^^^^^^^^ |
| 32 | |
| 33 | Although code randomization is a key aspect of security, deterministic and |
| 34 | repeatable translation is sometimes needed, e.g. for regression testing. |
| 35 | Multithreaded translation introduces potential for randomness that may need to |
| 36 | be made deterministic. |
| 37 | |
| 38 | * Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the |
| 39 | translation queue in deterministic order. But since translation is |
| 40 | multithreaded, FIFO order for the assembler buffer queue may not be |
| 41 | deterministic. The writer thread would be responsible for reordering the |
| 42 | buffers, potentially waiting for slower translations to complete even if other |
| 43 | assembler buffers are available. |
| 44 | |
| 45 | * Different translation threads may add new constant pool entries at different |
| 46 | times. Some constant pool entries are emitted as read-only data. This |
| 47 | includes floating-point constants for x86, as well as integer immediate |
| 48 | randomization through constant pooling. These constant pool entries are |
| 49 | emitted after all assembler buffers have been written. The writer needs to be |
| 50 | able to sort them deterministically before emitting them. |
| 51 | |
| 52 | Object lifetimes |
| 53 | ---------------- |
| 54 | |
| 55 | Objects of type Constant, or a subclass of Constant, are pooled globally. The |
| 56 | pooling is managed by the GlobalContext class. Since Constants are added or |
| 57 | looked up by translation threads and the parser thread, access to the constant |
| 58 | pools, as well as GlobalContext in general, need to be arbitrated by locks. |
| 59 | (It's possible that if there's too much contention, we can maintain a |
| 60 | thread-local cache for Constant pool lookups.) Constants live across all |
| 61 | function translations, and are destroyed only at the end. |
| 62 | |
| 63 | Several object types are scoped within the lifetime of the Cfg. These include |
| 64 | CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand. |
| 65 | When the Cfg is destroyed, these scoped objects are destroyed as well. To keep |
| 66 | this cheap, the Cfg includes a slab allocator from which these objects are |
| 67 | allocated, and the objects should not contain fields with non-trivial |
| 68 | destructors. Most of these fields are POD, but in a couple of cases these |
| 69 | fields are STL containers. We deal with this, and avoid leaking memory, by |
| 70 | providing the container with an allocator that uses the Cfg-local slab |
| 71 | allocator. Since the container allocator generally needs to be stateless, we |
| 72 | store a pointer to the slab allocator in thread-local storage (TLS). This is |
| 73 | straightforward since on any of the threads, only one Cfg is active at a time, |
| 74 | and a given Cfg is only active in one thread at a time (either the parser |
| 75 | thread, or at most one translation thread, or the writer thread). |
| 76 | |
| 77 | Even though there is a one-to-one correspondence between Cfgs and assembler |
| 78 | buffers, they need to use different allocators. This is because the translation |
| 79 | thread wants to destroy the Cfg and reclaim all its memory after translation |
| 80 | completes, but possibly before the assembly buffer is written to the ELF file. |
| 81 | Ownership of the assembler buffer and its allocator are transferred to the |
| 82 | writer thread after translation completes, similar to the way ownership of the |
| 83 | Cfg and its allocator are transferred to the translation thread after parsing |
| 84 | completes. |
| 85 | |
| 86 | Allocators and TLS |
| 87 | ------------------ |
| 88 | |
| 89 | Part of the Cfg building, and transformations on the Cfg, include STL container |
| 90 | operations which may need to allocate additional memory in a stateless fashion. |
| 91 | This requires maintaining the proper slab allocator pointer in TLS. |
| 92 | |
| 93 | When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's |
| 94 | slab allocator into its own TLS. This is used as the Cfg is built within the |
| 95 | parser thread. After the Cfg is built, the parser thread clears its allocator |
| 96 | pointer, adds the new Cfg pointer to the translation queue, continues with the |
| 97 | next function. |
| 98 | |
| 99 | When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab |
| 100 | allocator into its TLS and translates the function. When generating the |
| 101 | assembly buffer, it must take care not to use the Cfg's slab allocator. If |
| 102 | there is a slab allocator for the assembler buffer, a pointer to it can also be |
| 103 | installed in TLS if needed. |
| 104 | |
| 105 | The translation thread destroys the Cfg when it is done translating, including |
| 106 | the Cfg's slab allocator, and clears the allocator pointer from its TLS. |
| 107 | Likewise, the writer thread destroys the assembler buffer when it is finished |
| 108 | with it. |
| 109 | |
| 110 | Thread safety |
| 111 | ------------- |
| 112 | |
| 113 | The parse/translate/write stages of the translation pipeline are fairly |
| 114 | independent, with little opportunity for threads to interfere. The Subzero |
| 115 | design calls for all shared accesses to go through the GlobalContext, which adds |
| 116 | locking as appropriate. This includes the coarse-grain work queues for Cfgs and |
| 117 | assembler buffers. It also includes finer-grain access to constant pool |
| 118 | entries, as well as output streams for verbose debugging output. |
| 119 | |
| 120 | If locked access to constant pools becomes a bottleneck, we can investigate |
| 121 | thread-local caches of constants (as mentioned earlier). Also, it should be |
| 122 | safe though slightly less efficient to allow duplicate copies of constants |
| 123 | across threads (which could be de-dupped by the writer at the end). |
| 124 | |
| 125 | We will use ThreadSanitizer as a way to detect potential data races in the |
| 126 | implementation. |