.. Sean Silva | 26b8aab | 2013-01-07 02:43:44 +0000

===========================
TableGen Language Reference
===========================

.. sectionauthor:: Sean Silva <silvas@purdue.edu>

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvmdev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read :doc:`/TableGenFundamentals` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; . = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.
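
As an illustration, the following sketch uses each literal form as a field
initializer. The record and field names are invented for the example::

   // Hypothetical record; only the literals on the right matter here.
   def LiteralForms {
     int Dec = 42;         // DecimalInteger
     int Neg = -42;        // DecimalInteger, the sign is part of the token
     int Hex = 0x2A;       // HexInteger
     int Bin = 0b101010;   // BinInteger
   }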

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n
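
A brief sketch of both string-like literals, with invented record and field
names. The code fragment ends at the first ``}]``, so a lone ``}`` or ``]``
inside it is ordinary text::

   // Hypothetical record illustrating TokString and TokCodeFragment.
   def StringForms {
     string Escaped = "tab:\t quote:\" backslash:\\ newline:\n";
     code Fragment = [{ if (x[0]) { return "ok"; } }];
   }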

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings::

   !eq     !if      !head    !tail      !con
   !shl    !sra     !srl
   !cast   !empty   !subst   !foreach   !strconcat

Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.

.. productionlist::
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
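
The rules above can be sketched as follows. All names are invented for the
example; ``Register`` is declared first and defined later, while ``Sized``
is defined immediately because it has a template argument list::

   // Forward declaration: no template arguments, base classes, or body.
   class Register;

   // Definition with two template arguments, the second with a default.
   class Sized<string Name, int Bytes = 4> {
     string AsmName = Name;
     int Size = Bytes;
   }

   // The single definition of the forward-declared class.
   class Register {
     int Encoding = 0;
   }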

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

If the optional :token:`Value` is present, it is assigned to the identifier.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.
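
A sketch declaring one field of each type, using invented names. ``Operand``
is a hypothetical class that exists only so that a class type can be shown::

   // Hypothetical forward-declared class used as a field type below.
   class Operand;

   def OneOfEach {
     string S = "text";
     code C = [{ return 0; }];
     bit B = 1;
     int I = 7;
     dag D = ?;
     bits<4> Nibble = { 1, 0, 1, 0 };
     list<int> Ints = [1, 2, 3];
     Operand Op = ?;
   }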

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.
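
A sketch of the three suffix forms, with invented record and field names.
It uses the ``7-4`` style of range, which per the note above is lexed as two
consecutive integers::

   // Hypothetical records; the suffixes on the right are the point.
   def Slices {
     bits<8> Byte = 0b10110100;
     bits<4> High = Byte{7-4};         // bit slice: bits 7 down to 4
     bits<2> Ends = Byte{7, 0};        // bit slice: bit 7, then bit 0
     list<int> Ints = [10, 20, 30, 40];
     list<int> Middle = Ints[1-2];     // list slice: elements 1 and 2
   }

   def UsesField {
     list<int> Copied = Slices.Ints;   // field access on another record
   }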

:token:`SimpleValue` has a number of forms:

.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* value local to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       int Baz = Bar;
     }

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).
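
The forms described so far can be sketched in one record. The record and
field names are invented for the example::

   // Hypothetical record with one field per SimpleValue form above.
   def SimpleForms {
     int Count = 12;                    // TokInteger
     string Joined = "Table" "Gen";     // adjacent strings are concatenated
     code Body = [{ return 12; }];      // TokCodeFragment
     int Unset = ?;                     // unset initializer
     bits<4> Nibble = { 1, 0, 1, 1 };   // sequence of bits
   }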

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.
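
A minimal sketch of this form, assuming a hypothetical class ``Pair`` with
two template arguments::

   class Pair<int A, int B> {
     int First = A;
     int Second = B;
   }

   def Holder {
     // The initializer creates an anonymous record derived from Pair<1, 2>.
     Pair P = Pair<1, 2>;
   }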

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.
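
A sketch of list initializers with and without the explicit element type,
following the production above. The names are invented, and the empty-list
case is shown on the assumption that it is where the explicit type is most
useful, since there are no values to deduce from::

   // Hypothetical record illustrating list initializers.
   def Lists {
     list<int> Deduced = [1, 2, 3];
     list<string> Explicit = ["a", "b"]<string>;
     list<int> Empty = []<int>;
   }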

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` `DagArgList` ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`]

The initial :token:`DagArg` is called the "operator" of the dag.
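
A small sketch of a dag value. The records ``add``, ``R0``, and ``R1`` are
hypothetical and exist only to serve as the operator and the operands::

   // Hypothetical records used as the operator and operands.
   def add;
   def R0;
   def R1;

   def Pattern {
     // "add" is the operator; each operand carries an optional $name.
     dag Tree = (add R0:$lhs, R1:$rhs);
   }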

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
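
A sketch using a few of the bang operators listed earlier. All class and
record names are invented, and the ``!cast`` line assumes that a string can
be cast to the record of that name::

   class Reg;
   def R0 : Reg;

   class Derived<string Base, int N> {
     string Name = !strconcat(Base, "_v");
     int Doubled = !shl(N, 1);
     string Kind = !if(!eq(N, 0), "zero", "nonzero");
     Reg Target = !cast<Reg>("R0");   // the form with an explicit Type
   }

   def Example : Derived<"unit", 3>;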

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [`BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" is applied.

.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.
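
A sketch of both kinds of body item and of the ``;`` body, using invented
names::

   class Base {
     int Size = 4;
     string Kind = "generic";
   }

   def Wide : Base {
     let Size = 8;    // overrides the inherited value
     int Extra = 1;   // Declaration: adds a new field
   }

   def Plain : Base;  // the body is just ";", fields are inherited as-is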

``def``
-------

.. TODO::
   There can be pastes in the names here, like ``#NAME#``. Look into that
   and document it (it boils down to ParseIDValue with IDParseMode ==
   ParseNameMode). ParseObjectName calls into the general ParseValue, with
   the only difference from "arbitrary expression parsing" being IDParseMode
   == Mode.

.. productionlist::
   Def: "def" `TokIdentifier` `ObjectBody`

Defines a record whose name is given by the :token:`TokIdentifier`. The
fields of the record are inherited from the base classes and defined in the
body.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

``defm``
--------

.. productionlist::
   Defm: "defm" `TokIdentifier` ":" `BaseClassList` ";"

Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
precede any ``class``'s that appear.
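
A sketch of the ordering rule, with invented names. The ``defm`` is expected
to produce one record per ``def`` in the multiclass, each of which also
inherits from the trailing class::

   class Tagged {
     string Tag = "example";
   }

   multiclass LoHi<int Width> {
     def _lo { int Bits = Width; string Half = "lo"; }
     def _hi { int Bits = Width; string Half = "hi"; }
   }

   // The multiclass comes first, then the ordinary class.
   defm R : LoHi<4>, Tagged;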

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
          :| "foreach" `Declaration` "in" `Object`

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.
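
A sketch of both forms of top-level ``let``, with invented names. Because
the bindings are applied after the base classes are parsed, they override
the values inherited from ``Inst``::

   class Inst {
     bit IsBranch = 0;
     int Size = 4;
   }

   // Braced form: the bindings apply to every record in the block.
   let IsBranch = 1, Size = 2 in {
     def B : Inst;
     def BL : Inst;
   }

   // Single-object form.
   let Size = 8 in
   def Wide : Inst;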

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
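
A sketch of a ``multiclass`` that names a base ``multiclass``, following the
productions above. All names are invented, and the comment on the ``defm``
states the expected effect rather than a guarantee::

   multiclass Basic {
     def _a;
   }

   // Extended lists Basic in its BaseMultiClassList.
   multiclass Extended : Basic {
     def _b;
   }

   // Expected to instantiate the defs from both Basic and Extended.
   defm X : Extended;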