===========================
TableGen Language Reference
===========================

.. sectionauthor:: Sean Silva <silvas@purdue.edu>

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvmdev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read :doc:`/TableGenFundamentals` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; . = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.
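
As a minimal sketch of the three literal forms (the record and field names
are hypothetical, chosen only for illustration)::

   def IntegerForms {
     int Dec = 42;        // DecimalInteger
     int Neg = -7;        // DecimalInteger; the "-" is part of the token
     int Hex = 0x2A;      // HexInteger
     int Bin = 0b101010;  // BinInteger
   }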

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.
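
The following sketch shows how these rules are expected to play out. All
names are hypothetical, and the comments describe the intended lexing rather
than a guarantee about any particular TableGen version::

   def 16BitReg;       // TokIdentifier that begins with digits
   def _Scratch;       // "_" is a ualpha, so it may start an identifier

   def Literals {
     int Mask = 0x10;  // ambiguous shape, lexed as a HexInteger
   }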

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.
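
A small sketch of both literal kinds, using hypothetical record and field
names; the code fragment's contents are opaque text as far as TableGen is
concerned::

   def Snippets {
     string Message = "tab:\t quote:\" newline:\n";
     code Body = [{
       if (X > 0)
         return X;
     }];
   }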

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl
               :!cast   !empty   !subst   !foreach   !strconcat

Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`
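
For instance, a file might begin by pulling in shared definitions. The path
below is only illustrative::

   include "llvm/Target/Target.td"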

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.
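
As a sketch, the hypothetical class below takes two template arguments, one
of them with a default value, and copies them into fields that inheriting
records then receive::

   class Reg<string n, int width = 32> {
     string Name = n;
     int Width = width;
   }

   def R0 : Reg<"r0">;      // Width takes the default, 32
   def D0 : Reg<"d0", 64>;  // Width is given explicitly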

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.
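
A sketch of the forward-declaration pattern, with hypothetical class names.
The first line only declares ``Operand``, since none of the three conditions
above holds, so the later definition is still permitted::

   class Operand;       // declaration only

   class Inst {
     Operand Src;       // allowed: Operand has been declared
   }

   class Operand {      // the single definition of Operand
     string Kind = "reg";
   }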

.. productionlist::
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It declares the identifier with the given type and, if the optional value is
present, assigns that value to it.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.
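
The sketch below declares one field of each type. Every name in it is
hypothetical; ``ops`` and ``AnOperand`` exist only so that the ``dag`` and
class-typed fields have something to refer to::

   class Operand {
     string Kind = "reg";
   }
   def AnOperand : Operand;
   def ops;

   def TypeSampler {
     bit       Flag    = 1;
     int       Count   = 3;
     string    Name    = "r0";
     code      Snippet = [{ return 0; }];
     bits<4>   Nibble  = { 1, 0, 1, 0 };
     list<int> Values  = [1, 2, 3];
     dag       Args    = (ops 1, 2);
     Operand   Op      = AnOperand;
   }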

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in some
contexts.
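
A sketch of the suffix forms applied to fields of a hypothetical record.
Note that ``7-4`` and ``1-2`` below are exactly the case described above,
where the second integer is lexed with its sign attached::

   def Slices {
     bits<8>   Byte = { 1, 0, 1, 0, 1, 1, 0, 0 };
     bits<4>   High = Byte{7-4};   // bits 7 down to 4
     bits<2>   Pair = Byte{1, 0};  // an explicit list of bit positions
     list<int> All  = [10, 20, 30, 40];
     list<int> Mid  = All[1-2];    // list slice: elements 1 and 2
   }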

:token:`SimpleValue` has a number of forms:

.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* value local to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       int Baz = Bar;
     }

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.
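
For example, in this sketch (hypothetical names) the three adjacent literals
form a single string value::

   def Banner {
     string Text = "Hello, " "TableGen" "!";
   }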

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.
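
A common use, sketched here with hypothetical names, is to leave a field
unset in a class and fill it in within each record that inherits from it::

   class Inst {
     bits<4> Opcode = ?;   // explicitly unset
     string  Asm    = ?;
   }

   def ADD : Inst {
     let Opcode = { 0, 0, 0, 1 };
     let Asm    = "add";
   }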

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.
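
In the sketch below (hypothetical names), the field ``P`` refers to an
anonymous record created on the spot from ``Pair``::

   class Pair<int a, int b> {
     int First  = a;
     int Second = b;
   }

   def Holder {
     Pair P = Pair<1, 2>;  // anonymous record derived from Pair<1, 2>
   }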

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.
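
A sketch of both variants, with hypothetical names. The explicit element
type is mostly useful for an empty list, where there are no values to deduce
it from::

   def Lists {
     list<int>    Small = [1, 2, 3];   // element type deduced from values
     list<string> Names = ["a", "b"];
     list<int>    Empty = []<int>;     // element type given explicitly
   }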

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` `DagArgList` ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.
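
As a sketch, with hypothetical records ``add`` and ``GPR``, the dag below has
``add`` as its operator and two arguments, each tagged with a
:token:`TokVarName`::

   def add;
   def GPR;

   def Pattern {
     dag P = (add GPR:$lhs, GPR:$rhs);
   }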

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
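
A few bang operators in use, as a sketch with hypothetical names. The
semantics of each operator are not specified in this document, so the
comments give only the intended reading::

   def Bang {
     string    S    = !strconcat("foo", "bar");    // string concatenation
     int       Sum  = !add(1, 2);                  // integer addition
     int       Pick = !if(!eq(Sum, 3), 10, 20);    // conditional selection
     list<int> Rest = !tail([1, 2, 3]);            // list without its head
   }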

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" is applied.

.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` ["{" `RangeList` "}"] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.
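
A sketch with hypothetical names. The second ``let`` overrides a single bit
and assumes the brace-delimited range form, which is what the implementation
is understood to accept::

   class Base {
     int     Size = 4;
     bits<8> Enc  = 0;
   }

   def Wide : Base {
     let Size   = 8;  // override the whole inherited field
     let Enc{7} = 1;  // override only bit 7 of the inherited field
   }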

``def``
-------

.. TODO::
   There can be pastes in the names here, like ``#NAME#``. Look into that
   and document it (it boils down to ParseIDValue with IDParseMode ==
   ParseNameMode). ParseObjectName calls into the general ParseValue, with
   the only difference from "arbitrary expression parsing" being IDParseMode
   == Mode.

.. productionlist::
   Def: "def" `TokIdentifier` `ObjectBody`

Defines a record whose name is given by the :token:`TokIdentifier`. The
fields of the record are inherited from the base classes and defined in the
body.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

``defm``
--------

.. productionlist::
   Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"

Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
precede any ``class``'s that appear.
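
A sketch of that ordering, using hypothetical names. The multiclass comes
first in the list and the ordinary class after it; each record produced by
the ``defm`` is expected to inherit from ``Tagged`` as well::

   class Tagged {
     bit IsTagged = 1;
   }

   multiclass LoHi {
     def _lo;
     def _hi;
   }

   defm R0 : LoHi, Tagged;  // defines R0_lo and R0_hi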

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
          :| "foreach" `Declaration` "in" `Object`

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.
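
A sketch with hypothetical names. It assumes the untyped ``i = [...]`` form
of the loop variable, which is what in-tree ``.td`` files use, and it relies
on the ``#`` paste in record names, which this document does not yet
describe::

   class Register<int index> {
     int Index = index;
   }

   // Intended to define R0, R1, R2 and R3.
   foreach i = [0, 1, 2, 3] in
     def R#i : Register<i>;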

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` ["<" `RangeList` ">"] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.
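
Both forms are sketched below with hypothetical names. The braced form
applies its bindings to every record in the block, while the unbraced form
applies them to the single object that follows::

   class Inst {
     bit isBranch = 0;
     int Size     = 4;
   }

   let isBranch = 1, Size = 2 in {
     def B  : Inst;
     def BL : Inst;
   }

   let Size = 8 in
     def Wide : Inst;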

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
Sean Silvac95fe282013-01-09 02:11:55 +0000386 MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`