========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out.  This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work with TableGen a lot and use Emacs or Vim, you can find
an Emacs "TableGen mode" and a Vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``.  It is not installed in the
system (or where your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).  For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions.  This is a good way to see
what the various definitions expand to fully.  Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.  The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc.  The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called ``NAME``.  This is the name of
the record ("``ADD32rr``" above).  In the general case, ``def`` names can be
formed from various kinds of string processing expressions, and ``NAME``
resolves to the final value obtained after resolving all of those expressions.
The user may refer to ``NAME`` wherever the ultimate name of the ``def`` is
needed.  To avoid conflicts, ``NAME`` should not be defined anywhere else in
user code.
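
As a minimal sketch of how ``NAME`` can be used, the hypothetical class
``Mnemonic`` below (not part of any LLVM target) copies the name of each
inheriting record into a string field:

.. code-block:: llvm

  // Hypothetical class, for illustration only.
  class Mnemonic {
    // NAME resolves to the name of the record that inherits from this class.
    string Text = NAME;
  }

  def ADD : Mnemonic;
  def SUB : Mnemonic;

Printing the records with the default backend should show a ``Text`` field in
each ``def`` that matches the name of that ``def``.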

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification.  In addition, TableGen's syntax introduces some
automation concepts such as ``multiclass``, ``foreach`` and ``let``.
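
The following sketch shows ``foreach`` and ``let`` on a small, self-contained
file.  The class ``Reg`` and its fields are assumptions made up for this
example and do not correspond to the real LLVM ``Register`` class:

.. code-block:: llvm

  // Hypothetical register description, for illustration only.
  class Reg<int num> {
    int Num = num;
    bit IsReserved = 0;
  }

  // foreach stamps out R0, R1, R2 and R3 from a single template line.
  foreach i = [0, 1, 2, 3] in
    def R#i : Reg<i>;

  // let overrides a field for every record defined in its scope.
  let IsReserved = 1 in
    def SP : Reg<4>;

Both constructs exist to avoid repetition: ``foreach`` removes duplicated
definitions, while ``let`` removes duplicated field assignments.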

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values.  Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one.  Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which takes a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing the arguments down as well as hard-coding ``NoItineraries``.
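
The snippets above depend on LLVM's target description files.  As a
self-contained sketch of the same ideas, the file below defines a base class, a
subclass that hard-codes one argument (as ``ProcNoItin`` does), and two
definitions.  All names here (``Feature``, ``ExperimentalFeature``,
``FeatureFoo``, ``FeatureBar``) are hypothetical:

.. code-block:: llvm

  // Hypothetical domain; none of these names exist in LLVM.
  class Feature<string name, string desc> {
    string Name = name;
    string Desc = desc;
  }

  // Specializes Feature by passing name down and hard-coding the description.
  class ExperimentalFeature<string name>
        : Feature<name, "Experimental feature"> {
    bit IsExperimental = 1;
  }

  def FeatureFoo : Feature<"foo", "Enable foo">;
  def FeatureBar : ExperimentalFeature<"bar">;

Because TableGen tracks superclasses, both definitions count as ``Feature``
records, so a query such as ``-print-enums -class=Feature`` would be expected
to list both of them.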

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;
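
The example above relies on classes from a real target.  A smaller,
self-contained sketch of the same mechanism is shown below; the class ``Inst``
and the multiclass ``ArithOp`` are hypothetical names chosen for illustration:

.. code-block:: llvm

  // Hypothetical instruction description, for illustration only.
  class Inst<string asm, bit hasImm> {
    string AsmString = asm;
    bit HasImmediate = hasImm;
  }

  // Each instantiation of ArithOp produces two definitions.
  multiclass ArithOp<string mnemonic> {
    def rr : Inst<mnemonic # " $dst, $src1, $src2", 0>;
    def ri : Inst<mnemonic # " $dst, $src1, $imm", 1>;
  }

  // The defm name is prefixed to each def name inside the multiclass.
  defm ADD : ArithOp<"add">;
  defm SUB : ArithOp<"sub">;

The two ``defm`` lines yield four records: ``ADDrr``, ``ADDri``, ``SUBrr`` and
``SUBri``.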

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end.  The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves.  The real power
of TableGen, however, is that it turns the source files into an internal
representation from which a back-end can generate anything you want.

TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding
in), or use in pre-processing via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules.  Others are suggesting we should move to
fewer, more powerful DSLs designed with specific purposes, or even re-using
existing DSLs.

Either way, this is a discussion that will likely span several years,
if not decades.  You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.