| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <title>TableGen Fundamentals</title> |
| <link rel="stylesheet" href="llvm.css" type="text/css"> |
| </head> |
| <body> |
| |
| <div class="doc_title">TableGen Fundamentals</div> |
| |
| <div class="doc_text"> |
| <ul> |
| <li><a href="#introduction">Introduction</a> |
| <ol> |
| <li><a href="#concepts">Basic concepts</a></li> |
| <li><a href="#example">An example record</a></li> |
| <li><a href="#running">Running TableGen</a></li> |
| </ol></li> |
| <li><a href="#syntax">TableGen syntax</a> |
| <ol> |
| <li><a href="#primitives">TableGen primitives</a> |
| <ol> |
| <li><a href="#comments">TableGen comments</a></li> |
| <li><a href="#types">The TableGen type system</a></li> |
| <li><a href="#values">TableGen values and expressions</a></li> |
| </ol></li> |
| <li><a href="#classesdefs">Classes and definitions</a> |
| <ol> |
| <li><a href="#valuedef">Value definitions</a></li> |
| <li><a href="#recordlet">'let' expressions</a></li> |
| <li><a href="#templateargs">Class template arguments</a></li> |
| </ol></li> |
| <li><a href="#filescope">File scope entities</a> |
| <ol> |
| <li><a href="#include">File inclusion</a></li> |
| <li><a href="#globallet">'let' expressions</a></li> |
| </ol></li> |
| </ol></li> |
| <li><a href="#backends">TableGen backends</a> |
| <ol> |
| <li><a href="#">todo</a></li> |
| </ol></li> |
| </ul> |
| </div> |
| |
| <div class="doc_author"> |
| <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="introduction">Introduction</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p>TableGen's purpose is to help a human develop and maintain records of |
| domain-specific information. Because there may be a large number of these |
| records, it is specifically designed to allow writing flexible descriptions and |
| for common features of these records to be factored out. This reduces the |
| amount of duplication in the description, reduces the chance of error, and |
| makes it easier to structure domain specific information.</p> |
| |
| <p>The core part of TableGen <a href="#syntax">parses a file</a>, instantiates |
| the declarations, and hands the result off to a domain-specific "<a |
| href="#backends">TableGen backend</a>" for processing. The current major user |
| of TableGen is the <a href="CodeGenerator.html">LLVM code generator</a>.</p> |
| |
| <p>Note that if you work on TableGen much, and use emacs or vim, that you can |
| find an emacs "TableGen mode" and a vim language file in |
| <tt>llvm/utils/emacs</tt> and <tt>llvm/utils/vim</tt> directory of your LLVM |
| distribution, respectively.</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"><a name="running">Basic concepts</a></div> |
| |
| <div class="doc_text"> |
| |
| <p>TableGen files consist of two key parts: 'classes' and 'definitions', both |
| of which are considered 'records'.</p> |
| |
| <p><b>TableGen records</b> have a unique name, a list of values, and a list of |
| superclasses. The list of values is main data that TableGen builds for each |
| record, it is this that holds the domain specific information for the |
| application. The interpretation of this data is left to a specific <a |
| href="#backends">TableGen backend</a>, but the structure and format rules are |
| taken care of and fixed by TableGen.</p> |
| |
| <p><b>TableGen definitions</b> are the concrete form of 'records'. These |
| generally do not have any undefined values, and are marked with the |
| '<tt>def</tt>' keyword.</p> |
| |
| <p><b>TableGen classes</b> are abstract records that are used to build and |
| describe other records. These 'classes' allow the end-user to build |
| abstractions for either the domain they are targetting (such as "Register", |
| "RegisterClass", and "Instruction" in the LLVM code generator) or for the |
| implementor to help factor out common properties of records (such as "FPInst", |
| which is used to represent floating point instructions in the X86 backend). |
| TableGen keeps track of all of the classes that are used to build up a |
| definition, so the backend can find all definitions of a particular class, such |
| as "Instruction".</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"><a name="example">An example record</a></div> |
| |
| <div class="doc_text"> |
| |
| <p>With no other arguments, TableGen parses the specified file and prints out |
| all of the classes, then all of the definitions. This is a good way to see what |
| the various definitions expand to fully. Running this on the <tt>X86.td</tt> |
| file prints this (at the time of this writing):</p> |
| |
| <pre> |
| ... |
| <b>def</b> ADDrr8 { <i>// Instruction X86Inst I2A8 Pattern</i> |
| <b>string</b> Name = "add"; |
| <b>string</b> Namespace = "X86"; |
| <b>list</b><Register> Uses = []; |
| <b>list</b><Register> Defs = []; |
| <b>bit</b> isReturn = 0; |
| <b>bit</b> isBranch = 0; |
| <b>bit</b> isCall = 0; |
| <b>bit</b> isTwoAddress = 1; |
| <b>bit</b> isTerminator = 0; |
| <b>dag</b> Pattern = (set R8, (plus R8, R8)); |
| <b>bits</b><8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 0 }; |
| Format Form = MRMDestReg; |
| <b>bits</b><5> FormBits = { 0, 0, 0, 1, 1 }; |
| ArgType Type = Arg8; |
| <b>bits</b><3> TypeBits = { 0, 0, 1 }; |
| <b>bit</b> hasOpSizePrefix = 0; |
| <b>bit</b> printImplicitUses = 0; |
| <b>bits</b><4> Prefix = { 0, 0, 0, 0 }; |
| FPFormat FPForm = ?; |
| <b>bits</b><3> FPFormBits = { 0, 0, 0 }; |
| } |
| ... |
| </pre> |
| |
| <p>This definition corresponds to an 8-bit register-register add instruction in |
| the X86. The string after the '<tt>def</tt>' string indicates the name of the |
| record ("<tt>ADDrr8</tt>" in this case), and the comment at the end of the line |
| indicates the superclasses of the definition. The body of the record contains |
| all of the data that TableGen assembled for the record, indicating that the |
| instruction is part of the "X86" namespace, should be printed as "<tt>add</tt>" |
| in the assembly file, it is a two-address instruction, has a particular |
| encoding, etc. The contents and semantics of the information in the record is |
| specific to the needs of the X86 backend, and is only shown as an example.</p> |
| |
| <p>As you can see, a lot of information is needed for every instruction |
| supported by the code generator, and specifying it all manually would be |
| unmaintainble, prone to bugs, and tiring to do in the first place. Because we |
| are using TableGen, all of the information was derived from the following |
| definition:</p> |
| |
| <pre> |
| <b>def</b> ADDrr8 : I2A8<"add", 0x00, MRMDestReg>, |
| Pattern<(set R8, (plus R8, R8))>; |
| </pre> |
| |
| <p>This definition makes use of the custom I2A8 (two address instruction with |
| 8-bit operand) class, which is defined in the X86-specific TableGen file to |
| factor out the common features that instructions of its class share. A key |
| feature of TableGen is that it allows the end-user to define the abstractions |
| they prefer to use when describing their information.</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"><a name="running">Running TableGen</a></div> |
| |
| <div class="doc_text"> |
| |
| <p>TableGen runs just like any other LLVM tool. The first (optional) argument |
| specifies the file to read. If a filename is not specified, <tt>tblgen</tt> |
| reads from standard input.</p> |
| |
| <p>To be useful, one of the <a href="#backends">TableGen backends</a> must be |
| used. These backends are selectable on the command line (type '<tt>tblgen |
| --help</tt>' for a list). For example, to get a list of all of the definitions |
| that subclass a particular type (which can be useful for building up an enum |
| list of these records), use the <tt>--print-enums</tt> option:</p> |
| |
| <pre> |
| $ tblgen X86.td -print-enums -class=Register |
| AH, AL, AX, BH, BL, BP, BX, CH, CL, CX, DH, DI, DL, DX, |
| EAX, EBP, EBX, ECX, EDI, EDX, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, |
| SI, SP, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7, |
| |
| $ tblgen X86.td -print-enums -class=Instruction |
| ADCrr32, ADDri16, ADDri16b, ADDri32, ADDri32b, ADDri8, ADDrr16, ADDrr32, |
| ADDrr8, ADJCALLSTACKDOWN, ADJCALLSTACKUP, ANDri16, ANDri16b, ANDri32, ANDri32b, |
| ANDri8, ANDrr16, ANDrr32, ANDrr8, BSWAPr32, CALLm32, CALLpcrel32, ... |
| </pre> |
| |
| <p>The default backend prints out all of the records, as described <a |
| href="#example">above</a>.</p> |
| |
| <p>If you plan to use TableGen for some purpose, you will most likely have to |
| <a href="#backends">write a backend</a> that extracts the information specific |
| to what you need and formats it in the appropriate way.</p> |
| |
| </div> |
| |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="syntax">TableGen syntax</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| <p>TableGen doesn't care about the meaning of data (that is up to the backend |
| to define), but it does care about syntax, and it enforces a simple type system. |
| This section describes the syntax and the constructs allowed in a TableGen file. |
| </p> |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"><a name="primitives">TableGen primitives</a></div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"><a name="comments">TableGen comments</a></div> |
| |
| <div class="doc_text"> |
| <p>TableGen supports BCPL style "<tt>//</tt>" comments, which run to the end of |
| the line, and it also supports <b>nestable</b> "<tt>/* */</tt>" comments.</p> |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="types">The TableGen type system</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p>TableGen files are strongly typed, in a simple (but complete) type-system. |
| These types are used to perform automatic conversions, check for errors, and to |
| help interface designers constrain the input that they allow. Every <a |
| href="#valuedef">value definition</a> is required to have an associated type. |
| </p> |
| |
| <p>TableGen supports a mixture of very low-level types (such as <tt>bit</tt>) |
| and very high-level types (such as <tt>dag</tt>). This flexibility is what |
| allows it to describe a wide range of information conveniently and compactly. |
| The TableGen types are:</p> |
| |
| <ul> |
| <li>"<tt><b>bit</b></tt>" - A 'bit' is a boolean value that can hold either 0 or |
| 1.</li> |
| |
| <li>"<tt><b>int</b></tt>" - The 'int' type represents a simple 32-bit integer |
| value, such as 5.</li> |
| |
| <li>"<tt><b>string</b></tt>" - The 'string' type represents an ordered sequence |
| of characters of arbitrary length.</li> |
| |
| <li>"<tt><b>bits</b><n></tt>" - A 'bits' type is an arbitrary, but fixed, |
| size integer that is broken up into individual bits. This type is useful |
| because it can handle some bits being defined while others are undefined.</li> |
| |
| <li>"<tt><b>list</b><ty></tt>" - This type represents a list whose |
| elements are some other type. The contained type is arbitrary: it can even be |
| another list type.</li> |
| |
| <li>Class type - Specifying a class name in a type context means that the |
| defined value must be a subclass of the specified class. This is useful in |
| conjunction with the "list" type, for example, to constrain the elements of the |
| list to a common base class (e.g., a <tt><b>list</b><Register></tt> can |
| only contain definitions derived from the "<tt>Register</tt>" class).</li> |
| |
| <li>"<tt><b>code</b></tt>" - This represents a big hunk of text. NOTE: I don't |
| remember why this is distinct from string!</li> |
| |
| <li>"<tt><b>dag</b></tt>" - This type represents a nestable directed graph of |
| elements.</li> |
| </ul> |
| |
| <p>To date, these types have been sufficient for describing things that |
| TableGen has been used for, but it is straight-forward to extend this list if |
| needed.</p> |
| |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="values">TableGen values and expressions</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p>TableGen allows for a pretty reasonable number of different expression forms |
| when building up values. These forms allow the TableGen file to be written in a |
| natural syntax and flavor for the application. The current expression forms |
| supported include:</p> |
| |
| <ul> |
| <li><tt>?</tt> - uninitialized field</li> |
| <li><tt>0b1001011</tt> - binary integer value</li> |
| <li><tt>07654321</tt> - octal integer value (indicated by a leading 0)</li> |
| <li><tt>7</tt> - decimal integer value</li> |
| <li><tt>0x7F</tt> - hexadecimal integer value</li> |
| <li><tt>"foo"</tt> - string value</li> |
| <li><tt>[{ ... }]</tt> - code fragment</li> |
| <li><tt>[ X, Y, Z ]</tt> - list value.</li> |
| <li><tt>{ a, b, c }</tt> - initializer for a "bits<3>" value</li> |
| <li><tt>value</tt> - value reference</li> |
| <li><tt>value{17}</tt> - access to one bit of a value</li> |
| <li><tt>value{15-17}</tt> - access to multiple bits of a value</li> |
| <li><tt>DEF</tt> - reference to a record definition</li> |
| <li><tt>X.Y</tt> - reference to the subfield of a value</li> |
| <li><tt>list[4-7,17,2-3]</tt> - A slice of the 'list' list, including elements |
| 4,5,6,7,17,2, and 3 from it. Elements may be included multiple times.</li> |
| <li><tt>(DEF a, b)</tt> - a dag value. The first element is required to be a |
| record definition, the remaining elements in the list may be arbitrary other |
| values, including nested `<tt>dag</tt>' values.</li> |
| </ul> |
| |
| <p>Note that all of the values have rules specifying how they convert to values |
| for different types. These rules allow you to assign a value like "7" to a |
| "bits<4>" value, for example.</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="classesdefs">Classes and definitions</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p>As mentioned in the <a href="#concepts">intro</a>, classes and definitions |
| (collectively known as 'records') in TableGen are the main high-level unit of |
| information that TableGen collects. Records are defined with a <tt>def</tt> or |
| <tt>class</tt> keyword, the record name, and an optional list of "<a |
| href="#templateargs">template arguments</a>". If the record has superclasses, |
| they are specified as a comma seperated list that starts with a colon character |
| (":"). If <a href="#valuedef">value definitions</a> or <a href="#recordlet">let |
| expressions</a> are needed for the class, they are enclosed in curly braces |
| ("{}"); otherwise, the record ends with a semicolon. Here is a simple TableGen |
| file:</p> |
| |
| <pre> |
| <b>class</b> C { <b>bit</b> V = 1; } |
| <b>def</b> X : C; |
| <b>def</b> Y : C { |
| <b>string</b> Greeting = "hello"; |
| } |
| </pre> |
| |
| <p>This example defines two definitions, <tt>X</tt> and <tt>Y</tt>, both of |
| which derive from the <tt>C</tt> class. Because of this, they both get the |
| <tt>V</tt> bit value. The <tt>Y</tt> definition also gets the Greeting member |
| as well.</p> |
| |
| <p>In general, classes are useful for collecting together the commonality |
| between a group of records and isolating it in a single place. Also, classes |
| permit the specification of default values for their subclasses, allowing the |
| subclasses to override them as they wish.</p> |
| |
| </div> |
| |
| <!----------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="valuedef">Value definitions</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p>Value definitions define named entries in records. A value must be defined |
| before it can be referred to as the operand for another value definition or |
| before the value is reset with a <a href="#recordlet">let expression</a>. A |
| value is defined by specifying a <a href="#types">TableGen type</a> and a name. |
| If an initial value is available, it may be specified after the type with an |
| equal sign. Value definitions require terminating semicolons.</p> |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="recordlet">'let' expressions</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p>A record-level let expression is used to change the value of a value |
| definition in a record. This is primarily useful when a superclass defines a |
| value that a derived class or definition wants to override. Let expressions |
| consist of the '<tt>let</tt>' keyword followed by a value name, an equal sign |
| ("="), and a new value. For example, a new class could be added to the example |
| above, redefining the <tt>V</tt> field for all of its subclasses:</p> |
| |
| <pre> |
| <b>class</b> D : C { let V = 0; } |
| <b>def</b> Z : D; |
| </pre> |
| |
| <p>In this case, the <tt>Z</tt> definition will have a zero value for its "V" |
| value, despite the fact that it derives (indirectly) from the <tt>C</tt> class, |
| because the <tt>D</tt> class overrode its value.</p> |
| |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="templateargs">Class template arguments</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p>TableGen permits the definition of parameterized classes as well as normal |
| concrete classes. Parameterized TableGen classes specify a list of variable |
| bindings (which may optionally have defaults) that are bound when used. Here is |
| a simple example:</p> |
| |
| <pre> |
| <b>class</b> FPFormat<<b>bits</b><3> val> { |
| <b>bits</b><3> Value = val; |
| } |
| <b>def</b> NotFP : FPFormat<0>; |
| <b>def</b> ZeroArgFP : FPFormat<1>; |
| <b>def</b> OneArgFP : FPFormat<2>; |
| <b>def</b> OneArgFPRW : FPFormat<3>; |
| <b>def</b> TwoArgFP : FPFormat<4>; |
| <b>def</b> SpecialFP : FPFormat<5>; |
| </pre> |
| |
| <p>In this case, template arguments are used as a space efficient way to specify |
| a list of "enumeration values", each with a "Value" field set to the specified |
| integer.</p> |
| |
| <p>The more esoteric forms of <a href="#values">TableGen expressions</a> are |
| useful in conjunction with template arguments. As an example:</p> |
| |
| <pre> |
| <b>class</b> ModRefVal<<b>bits</b><2> val> { |
| <b>bits</b><2> Value = val; |
| } |
| |
| <b>def</b> None : ModRefVal<0>; |
| <b>def</b> Mod : ModRefVal<1>; |
| <b>def</b> Ref : ModRefVal<2>; |
| <b>def</b> ModRef : ModRefVal<3>; |
| |
| <b>class</b> Value<ModRefVal MR> { |
| <i>// decode some information into a more convenient format, while providing |
| // a nice interface to the user of the "Value" class.</i> |
| <b>bit</b> isMod = MR.Value{0}; |
| <b>bit</b> isRef = MR.Value{1}; |
| |
| <i>// other stuff...</i> |
| } |
| |
| <i>// Example uses</i> |
| <b>def</b> bork : Value<Mod>; |
| <b>def</b> zork : Value<Ref>; |
| <b>def</b> hork : Value<ModRef>; |
| </pre> |
| |
| <p>This is obviously a contrived example, but it shows how template arguments |
| can be used to decouple the interface provided to the user of the class from the |
| actual internal data representation expected by the class. In this case, |
| running <tt>tblgen</tt> on the example prints the following definitions:</p> |
| |
| <pre> |
| <b>def</b> bork { <i>// Value</i> |
| bit isMod = 1; |
| bit isRef = 0; |
| } |
| <b>def</b> hork { <i>// Value</i> |
| bit isMod = 1; |
| bit isRef = 1; |
| } |
| <b>def</b> zork { <i>// Value</i> |
| bit isMod = 0; |
| bit isRef = 1; |
| } |
| </pre> |
| |
| <p> This shows that TableGen was able to dig into the argument and extract a |
| piece of information that was requested by the designer of the "Value" class. |
| For more realistic examples, please see existing users of TableGen, such as the |
| X86 backend.</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="filescope">File scope entities</a> |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="include">File inclusion</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p>TableGen supports the '<tt>include</tt>' token, which textually substitutes |
| the specified file in place of the include directive. The filename should be |
| specified as a double quoted string immediately after the '<tt>include</tt>' |
| keyword. Example:</p> |
| |
| <pre> |
| <b>include</b> "foo.td" |
| </pre> |
| |
| </div> |
| |
| <!-- --------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="globallet">'let' expressions</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> "let" expressions at file scope are similar to <a href="#recordlet">"let" |
| expressions within a record</a>, except they can specify a value binding for |
| multiple records at a time, and may be useful in certain other cases. |
| File-scope let expressions are really just another way that TableGen allows the |
| end-user to factor out commonality from the records.</p> |
| |
| <p>File-scope "let" expressions take a comma-seperated list of bindings to |
| apply, and one of more records to bind the values in. Here are some |
| examples:</p> |
| |
| <pre> |
| <b>let</b> isTerminator = 1, isReturn = 1 <b>in</b> |
| <b>def</b> RET : X86Inst<"ret", 0xC3, RawFrm, NoArg>; |
| |
| <b>let</b> isCall = 1 <b>in</b> |
| <i>// All calls clobber the non-callee saved registers...</i> |
| <b>let</b> Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6] in { |
| <b>def</b> CALLpcrel32 : X86Inst<"call", 0xE8, RawFrm, NoArg>; |
| <b>def</b> CALLr32 : X86Inst<"call", 0xFF, MRMS2r, Arg32>; |
| <b>def</b> CALLm32 : X86Inst<"call", 0xFF, MRMS2m, Arg32>; |
| } |
| </pre> |
| |
| <p>File-scope "let" expressions are often useful when a couple of definitions |
| need to be added to several records, and the records do not otherwise need to be |
| opened, as in the case with the CALL* instructions above.</p> |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="backends">TableGen backends</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| <p>How they work, how to write one. This section should not contain details |
| about any particular backend, except maybe -print-enums as an example. This |
| should highlight the APIs in <tt>TableGen/Record.h</tt>.</p> |
| </div> |
| |
| <!-- *********************************************************************** --> |
| |
| <hr> |
| <address> |
| <a href="http://jigsaw.w3.org/css-validator/check/referer"><img |
| src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> |
| <a href="http://validator.w3.org/check/referer"><img |
| src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a> |
| |
| <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> |
| <a href="http://llvm.cs.uiuc.edu">LLVM Compiler Infrastructure</a><br> |
| Last modified: $Date$ |
| </address> |
| |
| </body> |
| </html> |