========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out. This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen often and use emacs or vim, you can find
an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, available
in your build directory under ``bin``. It is not installed in the system (or where
your sysroot is set to), since it has no use beyond LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used. These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list). For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints out all
of the classes, then all of the definitions. This is a good way to see what the
various definitions expand to fully. Running this on the ``X86.td`` file prints
this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    bit neverHasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc. The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called "NAME". This is the name of the
record ("``ADD32rr``" above). In the general case ``def`` names can be formed
from various kinds of string processing expressions and ``NAME`` resolves to the
final value obtained after resolving all of those expressions. The user may
refer to ``NAME`` anywhere she desires to use the ultimate name of the ``def``.
``NAME`` should not be defined anywhere else in user code to avoid conflicts.
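
As a minimal sketch of how ``NAME`` can be used (``Tagged``, ``Pair``,
``Alpha`` and ``Reg`` are hypothetical names chosen for illustration, not
taken from any LLVM target), a class can capture the final name of every
record that derives from it:

.. code-block:: llvm

  // Hypothetical class: stores the final record name in a string field.
  class Tagged {
    string Tag = NAME;
  }

  // Plain def: Tag should resolve to the record's own name, "Alpha".
  def Alpha : Tagged;

  // Inside a multiclass the def names are formed by pasting the defm name
  // in front of them, so NAME is only known once the defm is instantiated.
  multiclass Pair {
    def _lo : Tagged;
    def _hi : Tagged;
  }

  // Defines the records Reg_lo and Reg_hi.
  defm Reg : Pair;

Running ``llvm-tblgen`` on such a file with no backend option prints the
resulting records, which is a simple way to check what ``Tag`` resolved to in
each of them.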

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts such as ``multiclass``, ``foreach`` and ``let``.
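
The following sketch illustrates two of these automation constructs,
``foreach`` and ``let``. The class ``Reg`` and the record names are
hypothetical and exist only for this example:

.. code-block:: llvm

  // Hypothetical class with one template argument and a default field value.
  class Reg<int n> {
    int Index = n;
    bit IsReserved = 0;
  }

  // foreach stamps out one def per list element; '#' pastes the loop
  // variable into the record name, giving R0, R1 and R2.
  foreach i = [0, 1, 2] in
    def R#i : Reg<i>;

  // let overrides a field for every def in its scope.
  let IsReserved = 1 in
    def SP : Reg<3>;

Both constructs only save typing: the same four records could be written out
by hand, one ``def`` at a time.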

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values. Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one. Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string`` and
a list of target features, and specializes the class ``Processor`` by passing
those arguments down while hard-coding ``NoItineraries``.
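
The two snippets above depend on classes defined elsewhere in LLVM. As a
self-contained sketch of the same ideas (``Feature``, ``CheapFeature`` and the
record names are hypothetical), the following file shows an abstract class
with an undefined value, a class that specializes it, and concrete
definitions:

.. code-block:: llvm

  // Hypothetical abstract record: Cost is deliberately left undefined.
  class Feature<string name, string desc> {
    string Name = name;
    string Desc = desc;
    int Cost = ?;
  }

  // Specializes Feature by hard-coding the description and the cost,
  // in the same spirit as ProcNoItin hard-codes NoItineraries.
  class CheapFeature<string name> : Feature<name, "cheap feature"> {
    let Cost = 1;
  }

  // Concrete records: every value is defined by the time the def is built.
  def FeatureFoo : CheapFeature<"foo">;
  def FeatureBar : Feature<"bar", "expensive feature"> {
    let Cost = 10;
  }

A backend can then ask for all definitions derived from ``Feature`` and will
find both records, since TableGen records the full list of superclasses for
each definition.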

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
  def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;

  def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;
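
The example above is taken out of its target context and cannot be run on its
own. A smaller, self-contained sketch of the same mechanism follows; ``Inst``,
``RegImm``, ``RegImmMem`` and the instruction names are hypothetical:

.. code-block:: llvm

  // Hypothetical instruction class.
  class Inst<string mnemonic, bit immediate> {
    string Mnemonic = mnemonic;
    bit HasImm = immediate;
  }

  // Each instantiation produces two records: a register-register form
  // and a register-immediate form.
  multiclass RegImm<string mnemonic> {
    def rr : Inst<mnemonic, 0>;
    def ri : Inst<mnemonic, 1>;
  }

  // Inherits the two defs of RegImm and adds a third one.
  multiclass RegImmMem<string mnemonic> : RegImm<mnemonic> {
    def rm : Inst<mnemonic, 0>;
  }

  defm ADD : RegImm<"add">;     // defines ADDrr and ADDri
  defm MUL : RegImmMem<"mul">;  // defines MULrr, MULri and MULrm

The ``defm`` name is prepended to the name of each ``def`` inside the
multiclass, which is how a single line expands into several named records.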

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end. The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves. The real power
of TableGen is that it parses the source files into an internal
representation from which a back-end can generate anything you want.

TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding in)
or use in pre-processing via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

Some are in favour of extending the semantics even more, while making sure
back-ends adhere to strict rules. Others suggest we should move to fewer,
more powerful DSLs designed with specific purposes, or even re-use existing
DSLs.

Either way, this is a discussion that will likely span several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.