========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow flexible descriptions and to let
the common features of these records be factored out. This reduces the amount
of duplication in the description, reduces the chance of error, and makes it
easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work with TableGen often and use Emacs or Vim, you can find an Emacs
"TableGen mode" and a Vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:

The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in the ``bin`` directory of your build tree. It is not installed
on the system (or wherever your sysroot points), since it has no use beyond
LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be selected on the command line (run
``llvm-tblgen -help`` for a list). For example, to get a list of all of the
definitions that subclass a particular type (which can be useful for building
up an enum list of these records), use the ``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions. This is a good way to see
what the various definitions fully expand to. Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    bit neverHasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc. The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tedious to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

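As a much smaller illustration of the same kind of factoring, consider the
sketch below. All of its names (``ToyInst``, ``ToyBinOp``, ``TOY_ADD`` and
``TOY_SUB``) are invented for this example and are not part of any LLVM target.
A base class holds the fields every record shares, a derived class fixes what
is common to one family, and each ``def`` supplies only what is unique to it:

.. code-block:: llvm

  // Hypothetical base class: fields shared by every toy instruction.
  class ToyInst<bits<8> opcode, string asm> {
    string Namespace = "Toy";
    bits<8> Opcode = opcode;
    string AsmString = asm;
    bit isCommutable = 0;
  }

  // Hypothetical derived class: fixes the operand syntax of binary operations.
  class ToyBinOp<bits<8> opcode, string mnemonic>
    : ToyInst<opcode, mnemonic # "\t$dst, $src1, $src2">;

  // Each definition states only what differs from its siblings.
  let isCommutable = 1 in
  def TOY_ADD : ToyBinOp<0x01, "add">;
  def TOY_SUB : ToyBinOp<0x02, "sub">;
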
Each ``def`` record has a special entry called ``NAME``. This is the name of the
record ("``ADD32rr``" above). In the general case, ``def`` names can be formed
from various kinds of string processing expressions, and ``NAME`` resolves to
the final value obtained after resolving all of those expressions. Users may
refer to ``NAME`` wherever they want to use the final name of the ``def``.
``NAME`` should not be defined anywhere else in user code, to avoid conflicts.

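A minimal sketch of ``NAME`` in use, assuming a hypothetical class
``Described`` and a hypothetical record ``Widget`` (neither exists in LLVM):
the class stores ``NAME`` in a field, and the value is resolved only when a
concrete ``def`` inherits from the class.

.. code-block:: llvm

  // Hypothetical class: captures the name of whichever def inherits from it.
  class Described {
    string Label = NAME;
  }

  // For this record, Label is expected to resolve to the record's own name.
  def Widget : Described;

Running the default backend on such a file is an easy way to check what
``Label`` resolved to.
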
Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts such as ``multiclass``, ``foreach`` and ``let``.

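To give a flavour of two of these constructs, the following sketch (built
around the hypothetical class ``ToyReg``, invented for illustration) uses
``foreach`` to stamp out numbered records and ``let`` to override a field for
a group of records:

.. code-block:: llvm

  // Hypothetical class with one parameter and one overridable field.
  class ToyReg<int num> {
    int Number = num;
    bit isReserved = 0;
  }

  // foreach: creates the records R0, R1, R2 and R3.
  foreach i = [0, 1, 2, 3] in {
    def R#i : ToyReg<i>;
  }

  // let: overrides isReserved for every def inside the braces.
  let isReserved = 1 in {
    def SP : ToyReg<4>;
    def FP : ToyReg<5>;
  }
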
Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record
initialised with some values. Classes such as ``SubtargetFeature`` are defined
with the ``class`` keyword, either in the same file or in an included one. Most
target TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string``
and a list of target features. It specializes the class ``Processor`` by
passing those arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

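The example above comes from a real target and relies on context that is not
shown here. As a smaller, self-contained sketch (all names are hypothetical and
chosen only for illustration), the multiclass below declares two record
templates, and a single ``defm`` instantiates both of them, with the ``defm``
name prepended to each inner ``def`` name:

.. code-block:: llvm

  // Hypothetical class used by the records below.
  class ToyOp<string asm, bit hasImm> {
    string AsmString = asm;
    bit HasImmediate = hasImm;
  }

  // One multiclass, two record templates.
  multiclass ToyOpPair<string mnemonic> {
    def _rr : ToyOp<mnemonic, 0>;  // register-register form
    def _ri : ToyOp<mnemonic, 1>;  // register-immediate form
  }

  // Each defm creates two definitions: ADD_rr and ADD_ri, SUB_rr and SUB_ri.
  defm ADD : ToyOpPair<"add">;
  defm SUB : ToyOpPair<"sub">;
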
See the `TableGen Language Reference <LangRef.html>`_ for more information.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend. The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that is only useful for debugging the TableGen files themselves. The power of
TableGen, however, lies in interpreting the source files into an internal
representation from which a backend can generate anything you want.

TableGen is currently used to generate large include files containing tables
that you can either include directly (if the output is in the language you are
coding in) or use in pre-processing, via macros surrounding the include of the
file.

Direct output can be used if the backend already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like instruction names), so your backend should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made backends, which can pervert the original
design and make the resulting TableGen files very hard for newcomers to
understand.

Some are in favour of extending the semantics even more, while making sure
backends adhere to strict rules. Others suggest that we should move to fewer,
more powerful DSLs designed for specific purposes, or even re-use existing
DSLs.

Either way, this is a discussion that will likely span several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.