|  | ============================================================ | 
|  | Extending LLVM: Adding instructions, intrinsics, types, etc. | 
|  | ============================================================ | 
|  |  | 
|  | Introduction and Warning | 
|  | ======================== | 
|  |  | 
|  |  | 
|  | During the course of using LLVM, you may wish to customize it for your research | 
|  | project or for experimentation. At this point, you may realize that you need to | 
|  | add something to LLVM, whether it be a new fundamental type, a new intrinsic | 
|  | function, or a whole new instruction. | 
|  |  | 
|  | When you come to this realization, stop and think. Do you really need to extend | 
|  | LLVM? Is it a new fundamental capability that LLVM does not support in its | 
|  | current incarnation, or can it be synthesized from existing LLVM elements? If | 
|  | you are not sure, ask on the `LLVM-dev | 
|  | <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ list. Extending LLVM gets | 
|  | involved because you must update every pass that you intend to use with your | 
|  | extension, and there are *many* LLVM analyses and transformations, so it may be | 
|  | quite a bit of work. | 
|  |  | 
|  | Adding an `intrinsic function`_ is far easier than adding an | 
|  | instruction, and is transparent to optimization passes.  If your added | 
|  | functionality can be expressed as a function call, an intrinsic function is the | 
|  | method of choice for LLVM extension. | 
|  |  | 
|  | Before you invest a significant amount of effort into a non-trivial extension, | 
|  | **ask on the list** if what you are looking to do can be done with | 
|  | already-existing infrastructure, or if maybe someone else is already working on | 
|  | it. You will save yourself a lot of time and effort by doing so. | 
|  |  | 
|  | .. _intrinsic function: | 
|  |  | 
|  | Adding a new intrinsic function | 
|  | =============================== | 
|  |  | 
|  | Adding a new intrinsic function to LLVM is much easier than adding a new | 
|  | instruction.  Almost all extensions to LLVM should start as an intrinsic | 
|  | function and then be turned into an instruction if warranted. | 
|  |  | 
|  | #. ``llvm/docs/LangRef.rst``: | 
|  |  | 
|  | Document the intrinsic.  Decide whether it is code generator specific and | 
|  | what the restrictions are.  Talk to other people about it so that you are | 
|  | sure it's a good idea. | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/Intrinsics*.td``: | 
|  |  | 
|  | Add an entry for your intrinsic.  Describe its memory access characteristics | 
|  | for optimization (this controls whether it will be DCE'd, CSE'd, etc.). Note | 
|  | that any intrinsic using one of the ``llvm_any*_ty`` types for an argument or | 
|  | return type will be deemed by ``tblgen`` as overloaded and the corresponding | 
|  | suffix will be required on the intrinsic's name. | 
|  |  | 
|  | #. ``llvm/lib/Analysis/ConstantFolding.cpp``: | 
|  |  | 
|  | If it is possible to constant fold your intrinsic, add support for it in the | 
|  | ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions. | 
|  |  | 
|  | #. ``llvm/test/*``: | 
|  |  | 
|  | Add test cases for your intrinsic to the test suite. | 
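
The constant-folding step in the list above splits into a cheap query and the fold itself. The following is a minimal, self-contained sketch of that split, not LLVM's implementation. ``ConstantCall``, ``canFold``, and ``foldCall`` are hypothetical stand-ins for ``canConstantFoldCallTo`` and ``ConstantFoldCall``, which operate on LLVM's own IR classes rather than plain integers. The sketch folds ``llvm.ctpop.i32`` and ``llvm.bswap.i32`` when the single argument is a known constant.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an intrinsic call whose operands are all
// compile-time constants. Real LLVM works on its own IR classes instead.
struct ConstantCall {
  std::string Name;           // e.g. "llvm.ctpop.i32"
  std::vector<uint32_t> Args; // constant operands
};

// Step 1: a cheap query. Can a call to this intrinsic ever be folded?
bool canFold(const std::string &Name) {
  return Name == "llvm.ctpop.i32" || Name == "llvm.bswap.i32";
}

// Step 2: compute the folded value. Returns false and leaves Result
// untouched when the call cannot be folded.
bool foldCall(const ConstantCall &Call, uint32_t &Result) {
  if (!canFold(Call.Name) || Call.Args.size() != 1)
    return false;
  uint32_t X = Call.Args[0];
  if (Call.Name == "llvm.ctpop.i32") {
    uint32_t Count = 0;
    for (; X != 0; X &= X - 1) // clear the lowest set bit each iteration
      ++Count;
    Result = Count;
    return true;
  }
  // The query and the fold must list the same intrinsics.
  assert(Call.Name == "llvm.bswap.i32" && "canFold and foldCall disagree");
  Result = (X >> 24) | ((X >> 8) & 0x0000FF00u) |
           ((X << 8) & 0x00FF0000u) | (X << 24);
  return true;
}
```

Keeping the query separate lets a caller skip work early for calls that can never fold; the two functions have to be updated together when a new intrinsic is added.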
|  |  | 
|  | Once the intrinsic has been added to the system, you must add code generator | 
|  | support for it.  Generally you must do the following steps: | 
|  |  | 
|  | Add support to the .td file for the target(s) of your choice in | 
|  | ``lib/Target/*/*.td``. | 
|  |  | 
|  | This is usually a matter of adding a pattern to the .td file that matches the | 
|  | intrinsic, though it may also require adding the instructions you want to | 
|  | generate.  There are lots of examples in the PowerPC and X86 backends to | 
|  | follow. | 
|  |  | 
|  | Adding a new SelectionDAG node | 
|  | ============================== | 
|  |  | 
|  | As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than | 
|  | adding a new instruction.  New nodes are often added to help represent | 
|  | instructions common to many targets.  These nodes often map to an LLVM | 
|  | instruction (add, sub) or intrinsic (byteswap, population count).  In other | 
|  | cases, new nodes have been added to allow many targets to perform a common task | 
|  | (converting between floating point and integer representation) or capture more | 
|  | complicated behavior in a single node (rotate). | 
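
To make the rotate case concrete, here is a small sketch of what a single rotate node stands for when it is written out with simpler operations. It is an illustration in plain C++, not SelectionDAG code, and ``rotlViaShifts`` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left spelled out with the simpler operations it replaces:
// two shifts and an or.
uint32_t rotlViaShifts(uint32_t X, unsigned Amt) {
  Amt &= 31u; // rotation amounts wrap at the bit width
  if (Amt == 0)
    return X; // avoids an undefined shift by 32 below
  assert(Amt < 32u);
  return (X << Amt) | (X >> (32u - Amt));
}
```

A target with a native rotate instruction can match the single node directly, while other targets can fall back to the shift-and-or form.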
|  |  | 
|  | #. ``include/llvm/CodeGen/ISDOpcodes.h``: | 
|  |  | 
|  | Add an enum value for the new SelectionDAG node. | 
|  |  | 
|  | #. ``lib/CodeGen/SelectionDAG/SelectionDAG.cpp``: | 
|  |  | 
|  | Add code to print the node to ``getOperationName``.  If your new node can be | 
|  | evaluated at compile time when given constant arguments (such as an add of a | 
|  | constant with another constant), find the ``getNode`` method that takes the | 
|  | appropriate number of arguments, and add a case for your node to the switch | 
|  | statement that performs constant folding for nodes that take the same number | 
|  | of arguments as your new node. | 
|  |  | 
|  | #. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``: | 
|  |  | 
|  | Add code to `legalize, promote, and expand | 
|  | <CodeGenerator.html#selectiondag_legalize>`_ the node as necessary.  At a | 
|  | minimum, you will need to add a case statement for your node in | 
|  | ``LegalizeOp`` which calls LegalizeOp on the node's operands, and returns a | 
|  | new node if any of the operands changed as a result of being legalized.  It | 
|  | is likely that not all targets supported by the SelectionDAG framework will | 
|  | natively support the new node.  In this case, you must also add code in your | 
|  | node's case statement in ``LegalizeOp`` to Expand your node into simpler, | 
|  | legal operations.  The case for ``ISD::UREM`` for expanding a remainder into | 
|  | a divide, multiply, and a subtract is a good example. | 
|  |  | 
|  | #. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``: | 
|  |  | 
|  | If targets may support the new node being added only at certain sizes, you | 
|  | will also need to add code to your node's case statement in ``LegalizeOp`` | 
|  | to Promote your node's operands to a larger size, and perform the correct | 
|  | operation.  You will also need to add code to ``PromoteOp`` to do this as | 
|  | well.  For a good example, see ``ISD::BSWAP``, which promotes its operand to | 
|  | a wider size, performs the byteswap, and then shifts the correct bytes right | 
|  | to emulate the narrower byteswap in the wider type. | 
|  |  | 
|  | #. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``: | 
|  |  | 
|  | Add a case for your node in ``ExpandOp`` to teach the legalizer how to | 
|  | perform the action represented by the new node on a value that has been split | 
|  | into high and low halves.  This case will be used to support your node with a | 
|  | 64-bit operand on a 32-bit target. | 
|  |  | 
|  | #. ``lib/CodeGen/SelectionDAG/DAGCombiner.cpp``: | 
|  |  | 
|  | If your node can be combined with itself, or with other existing nodes, in a | 
|  | peephole-like fashion, add a visit function for it and call that function | 
|  | from the combiner's dispatch code. There are several good examples of simple | 
|  | combines; ``visitFABS`` and ``visitSRL`` are good starting places. | 
|  |  | 
|  | #. ``lib/Target/PowerPC/PPCISelLowering.cpp``: | 
|  |  | 
|  | Each target has an implementation of the ``TargetLowering`` class, usually in | 
|  | its own file (although some targets include it in the same file as the | 
|  | DAGToDAGISel).  The default behavior for a target is to assume that your new | 
|  | node is legal for all types that are legal for that target.  If this target | 
|  | does not natively support your node, then tell the target to either Promote | 
|  | it (if it is supported at a larger type) or Expand it.  This will cause the | 
|  | code you wrote in ``LegalizeOp`` above to decompose your new node into other | 
|  | legal nodes for this target. | 
|  |  | 
|  | #. ``lib/Target/TargetSelectionDAG.td``: | 
|  |  | 
|  | Most current targets supported by LLVM generate code using the DAGToDAG | 
|  | method, where SelectionDAG nodes are pattern matched to target-specific | 
|  | nodes, which represent individual instructions.  In order for the targets to | 
|  | match an instruction to your new node, you must add a def for that node to | 
|  | the list in this file, with the appropriate type constraints. Look at | 
|  | ``add``, ``bswap``, and ``fadd`` for examples. | 
|  |  | 
|  | #. ``lib/Target/PowerPC/PPCInstrInfo.td``: | 
|  |  | 
|  | Each target has a tablegen file that describes the target's instruction set. | 
|  | For targets that use the DAGToDAG instruction selection framework, add a | 
|  | pattern for your new node that uses one or more target nodes.  Documentation | 
|  | for this is a bit sparse right now, but there are several decent examples. | 
|  | See the patterns for ``rotl`` in ``PPCInstrInfo.td``. | 
|  |  | 
|  | #. TODO: document complex patterns. | 
|  |  | 
|  | #. ``llvm/test/CodeGen/*``: | 
|  |  | 
|  | Add test cases for your new node to the test suite. | 
|  | ``llvm/test/CodeGen/X86/bswap.ll`` is a good example. | 
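
The three legalization strategies described in the list above (expand, promote, and splitting a value into halves) can be illustrated on plain integers. This is a sketch of the arithmetic only, under the assumption of a 32-bit target; all function names are hypothetical, and the 64-bit add is chosen here purely as an example of an operation on split halves.

```cpp
#include <cassert>
#include <cstdint>

// Expand: a remainder rebuilt from a divide, a multiply, and a subtract.
uint32_t expandURem(uint32_t A, uint32_t B) {
  assert(B != 0 && "remainder by zero is undefined");
  return A - (A / B) * B;
}

// Stand-in for a byte swap that the target supports only at 32 bits.
uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
         ((X << 8) & 0x00FF0000u) | (X << 24);
}

// Promote: widen the 16-bit operand, swap at 32 bits, then shift the two
// meaningful bytes back down from the top half of the result.
uint16_t promoteBSwap16(uint16_t X) {
  uint32_t Wide = X;                // zero-extend to the legal type
  uint32_t Swapped = bswap32(Wide); // swapped bytes now sit in bits 31..16
  return static_cast<uint16_t>(Swapped >> 16);
}

// A 64-bit value represented as two 32-bit halves.
struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

Halves split(uint64_t V) {
  Halves H;
  H.Lo = static_cast<uint32_t>(V);
  H.Hi = static_cast<uint32_t>(V >> 32);
  return H;
}

uint64_t join(Halves H) {
  return (static_cast<uint64_t>(H.Hi) << 32) | H.Lo;
}

// Split halves: a 64-bit add built from two 32-bit adds plus a carry.
Halves expandAdd64(Halves A, Halves B) {
  Halves R;
  R.Lo = A.Lo + B.Lo;
  uint32_t Carry = R.Lo < A.Lo ? 1u : 0u; // unsigned wraparound means carry
  R.Hi = A.Hi + B.Hi + Carry;
  return R;
}
```

In each case the result matches what the original operation would have produced at its natural width, which is the property a legalization rule has to preserve.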
|  |  | 
|  | Adding a new instruction | 
|  | ======================== | 
|  |  | 
|  | .. warning:: | 
|  |  | 
|  | Adding instructions changes the bitcode format, and it will take some effort | 
|  | to maintain compatibility with the previous version. Only add an instruction | 
|  | if it is absolutely necessary. | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/Instruction.def``: | 
|  |  | 
|  | add a number for your instruction and an enum name | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/Instructions.h``: | 
|  |  | 
|  | add a definition for the class that will represent your instruction | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/InstVisitor.h``: | 
|  |  | 
|  | add a prototype for a visitor to your new instruction type | 
|  |  | 
|  | #. ``llvm/lib/AsmParser/LLLexer.cpp``: | 
|  |  | 
|  | add a new token to parse your instruction from an assembly text file | 
|  |  | 
|  | #. ``llvm/lib/AsmParser/LLParser.cpp``: | 
|  |  | 
|  | add the grammar on how your instruction can be read and what it will | 
|  | construct as a result | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``: | 
|  |  | 
|  | add a case for your instruction and how it will be parsed from bitcode | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``: | 
|  |  | 
|  | add a case for your instruction and how it will be written to bitcode | 
|  |  | 
|  | #. ``llvm/lib/IR/Instruction.cpp``: | 
|  |  | 
|  | add a case for how your instruction will be printed out to assembly | 
|  |  | 
|  | #. ``llvm/lib/IR/Instructions.cpp``: | 
|  |  | 
|  | implement the class you defined in ``llvm/include/llvm/IR/Instructions.h`` | 
|  |  | 
|  | #. Test your instruction | 
|  |  | 
|  | #. ``llvm/lib/Target/*``: | 
|  |  | 
|  | add support for your instruction to code generators, or add a lowering pass. | 
|  |  | 
|  | #. ``llvm/test/*``: | 
|  |  | 
|  | add your test cases to the test suite. | 
|  |  | 
|  | Also, you need to implement (or modify) any analyses or passes that you want to | 
|  | understand this new instruction. | 
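
Several of the steps above (numbering the instruction, printing its name, and adding a visitor entry) follow from one list of opcodes that is expanded in different ways. The sketch below shows that pattern in miniature. Everything in it is hypothetical: ``SKETCH_INSTRUCTIONS``, ``MyOp``, ``Inst``, ``Visitor``, and ``CountMyOps`` are illustration names, not LLVM's macros or classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical opcode list in the style of a .def file: one entry per
// instruction, giving its number and its enum name.
#define SKETCH_INSTRUCTIONS(X)                                                 \
  X(1, Add)                                                                    \
  X(2, Sub)                                                                    \
  X(3, MyOp) /* the newly added instruction */

// Expansion 1: the opcode enum.
enum Opcode {
#define HANDLE(NUM, OPC) OPC = NUM,
  SKETCH_INSTRUCTIONS(HANDLE)
#undef HANDLE
};

// Expansion 2: opcode-to-name mapping, as a printer would need.
std::string getOpcodeName(Opcode Op) {
  switch (Op) {
#define HANDLE(NUM, OPC)                                                       \
  case OPC:                                                                    \
    return #OPC;
    SKETCH_INSTRUCTIONS(HANDLE)
#undef HANDLE
  }
  return "<invalid>";
}

struct Inst {
  Opcode Op;
};

// Expansion 3: a visitor with one default (empty) handler per opcode and a
// dispatcher that calls the subclass's handler when it provides one.
template <typename SubClass> struct Visitor {
#define HANDLE(NUM, OPC)                                                       \
  void visit##OPC(const Inst &) {}
  SKETCH_INSTRUCTIONS(HANDLE)
#undef HANDLE

  void visit(const Inst &I) {
    SubClass *Self = static_cast<SubClass *>(this);
    switch (I.Op) {
#define HANDLE(NUM, OPC)                                                       \
  case OPC:                                                                    \
    Self->visit##OPC(I);                                                       \
    break;
      SKETCH_INSTRUCTIONS(HANDLE)
#undef HANDLE
    }
  }
};

// A pass that only cares about the new instruction overrides one handler.
struct CountMyOps : Visitor<CountMyOps> {
  int Count = 0;
  void visitMyOp(const Inst &) { ++Count; }
};
```

With this arrangement, adding one line to the list updates the enum, the name table, and the visitor together, while existing visitors keep compiling because the default handler does nothing.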
|  |  | 
|  | Adding a new type | 
|  | ================= | 
|  |  | 
|  | .. warning:: | 
|  |  | 
|  | Adding new types changes the bitcode format, and will break compatibility with | 
|  | currently-existing LLVM installations. Only add new types if it is absolutely | 
|  | necessary. | 
|  |  | 
|  | Adding a fundamental type | 
|  | ------------------------- | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/Type.h``: | 
|  |  | 
|  | add enum for the new type; add static ``Type*`` for this type | 
|  |  | 
|  | #. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``: | 
|  |  | 
|  | add mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*`` | 
|  |  | 
|  | #. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``: | 
|  |  | 
|  | add enum ``LLVMTypeKind`` and modify | 
|  | ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type | 
|  |  | 
|  | #. ``llvm/lib/AsmParser/LLLexer.cpp``: | 
|  |  | 
|  | add ability to parse in the type from text assembly | 
|  |  | 
|  | #. ``llvm/lib/AsmParser/LLParser.cpp``: | 
|  |  | 
|  | add a token for that type | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``: | 
|  |  | 
|  | modify ``static void WriteTypeTable(const ValueEnumerator &VE, | 
|  | BitstreamWriter &Stream)`` to serialize your type | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``: | 
|  |  | 
|  | modify ``bool BitcodeReader::ParseTypeTable()`` to read your data type | 
|  |  | 
|  | #. ``include/llvm/Bitcode/LLVMBitCodes.h``: | 
|  |  | 
|  | add enum ``TypeCodes`` for the new type | 
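
The first two steps above pair an enum value with a single shared object for the type. The sketch below illustrates that idea only; it is a hypothetical miniature, and its function-local static storage is chosen for brevity rather than to mirror how LLVM stores its type objects. ``MyNewTyID``, ``isMyNewTy``, and ``getPrimitiveType`` are assumed names.

```cpp
#include <cassert>

// Hypothetical miniature type system. MyNewTyID is the newly added
// fundamental type; NumTypeIDs must stay last.
enum TypeID { VoidTyID, FloatTyID, MyNewTyID, NumTypeIDs };

class Type {
public:
  TypeID getTypeID() const { return ID; }
  bool isMyNewTy() const { return ID == MyNewTyID; }

  // Map a TypeID to the one shared Type object for that ID.
  static const Type *getPrimitiveType(TypeID ID);

private:
  explicit Type(TypeID ID) : ID(ID) {}
  TypeID ID;
};

const Type *Type::getPrimitiveType(TypeID ID) {
  // One object per fundamental type, in the same order as the enum.
  static const Type Types[NumTypeIDs] = {Type(VoidTyID), Type(FloatTyID),
                                         Type(MyNewTyID)};
  assert(ID < NumTypeIDs && "unknown TypeID");
  return &Types[ID];
}
```

Because every request for the same ``TypeID`` returns the same object, two types can be compared by pointer.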
|  |  | 
|  | Adding a derived type | 
|  | --------------------- | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/Type.h``: | 
|  |  | 
|  | add enum for the new type; add a forward declaration of the type also | 
|  |  | 
|  | #. ``llvm/include/llvm/IR/DerivedTypes.h``: | 
|  |  | 
|  | add a new class to represent the new type in the hierarchy; add a forward | 
|  | declaration to the TypeMap value type | 
|  |  | 
|  | #. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``: | 
|  |  | 
|  | add support for derived type, notably ``enum TypeID`` and ``is``, ``get`` methods. | 
|  |  | 
|  | #. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``: | 
|  |  | 
|  | add enum ``LLVMTypeKind`` and modify | 
|  | ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type | 
|  |  | 
|  | #. ``llvm/lib/AsmParser/LLLexer.cpp``: | 
|  |  | 
|  | modify ``lltok::Kind LLLexer::LexIdentifier()`` to add ability to | 
|  | parse in the type from text assembly | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``: | 
|  |  | 
|  | modify ``static void WriteTypeTable(const ValueEnumerator &VE, | 
|  | BitstreamWriter &Stream)`` to serialize your type | 
|  |  | 
|  | #. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``: | 
|  |  | 
|  | modify ``bool BitcodeReader::ParseTypeTable()`` to read your data type | 
|  |  | 
|  | #. ``include/llvm/Bitcode/LLVMBitCodes.h``: | 
|  |  | 
|  | add enum ``TypeCodes`` for the new type | 
|  |  | 
|  | #. ``llvm/lib/IR/AsmWriter.cpp``: | 
|  |  | 
|  | modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)`` | 
|  | to output the new derived type |
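
The writer and reader steps above must agree on how a derived type is encoded. As a rough sketch of that contract, the code below encodes a made-up derived type as a record of integers (a type code followed by its fields) and decodes it again. ``TypeCode``, ``MyDerivedType``, ``writeType``, and ``readType`` are hypothetical, and the record layout is an assumption for illustration, not LLVM's bitcode format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type codes, standing in for a TypeCodes-style enum.
enum TypeCode : uint64_t { CODE_INT = 1, CODE_MY_DERIVED = 2 };

// Hypothetical derived type: NumElements integers of ElementBits bits each.
struct MyDerivedType {
  uint64_t NumElements;
  uint64_t ElementBits;
};

typedef std::vector<uint64_t> Record;

// Writer side: encode the type as its code followed by its fields.
Record writeType(const MyDerivedType &Ty) {
  assert(Ty.NumElements != 0 && Ty.ElementBits != 0 && "invalid type");
  Record R;
  R.push_back(CODE_MY_DERIVED);
  R.push_back(Ty.NumElements);
  R.push_back(Ty.ElementBits);
  return R;
}

// Reader side: decode the record, rejecting anything malformed. Returns
// false and leaves Out untouched on failure.
bool readType(const Record &R, MyDerivedType &Out) {
  if (R.size() != 3 || R[0] != CODE_MY_DERIVED)
    return false;
  if (R[1] == 0 || R[2] == 0)
    return false;
  Out.NumElements = R[1];
  Out.ElementBits = R[2];
  return true;
}
```

A round trip through ``writeType`` and ``readType`` should reproduce the original fields; a test of that property is a reasonable first check when adding a new type code.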