//==- AArch64CallingConv.td - Calling Conventions for AArch64 -*- tblgen -*-==//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// This describes the calling conventions for the AArch64 architecture.
//===----------------------------------------------------------------------===//


// The AArch64 Procedure Call Standard is unfortunately specified at a slightly
// higher level of abstraction than LLVM's target interface presents. In
// particular, it refers (like other ABIs, in fact) directly to
// structs. However, generic LLVM code takes the liberty of lowering structure
// arguments to the component fields before we see them.
//
// As a result, the obvious direct map from LLVM IR to PCS concepts can't be
// implemented, so the goals of this calling convention are, in decreasing
// priority order:
//  1. Expose *some* way to express the concepts required to implement the
//     generic PCS from a front-end.
//  2. Provide a sane ABI for pure LLVM.
//  3. Follow the generic PCS as closely as is naturally possible.
//
// The suggested front-end implementation of PCS features is:
//  * Integer, float and vector arguments of all sizes which end up in
//    registers are passed and returned via the natural LLVM type.
//  * Structure arguments with size <= 16 bytes are passed and returned in
//    registers as similar integer or composite types. For example:
//    [1 x i64], [2 x i64] or [1 x i128] (if alignment 16 needed).
//  * HFAs that end up in registers follow similar rules to small structs:
//    they are passed as an appropriate composite type.
//  * Structure arguments with size > 16 bytes are passed via a pointer,
//    handled completely by the front-end.
//  * Structure return values > 16 bytes are returned via an sret pointer
//    argument.
//  * Other stack-based arguments (not large structs) are passed using byval
//    pointers. Padding arguments are added beforehand to guarantee a large
//    struct doesn't later use integer registers.
//
// N.b. this means that it is the front-end's responsibility (if it cares about
// PCS compliance) to check whether enough registers are available for an
// argument when deciding how to pass it.
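//
// For illustration only (the struct and function names here are hypothetical,
// and the exact types are a front-end choice, not something this file
// mandates), such a front-end might lower
//     struct Small { long a; int b; };   // 16 bytes, 8-byte aligned
//     struct Big   { long a[4]; };       // 32 bytes
// roughly as
//     declare void @take_small([2 x i64])      ; two X registers
//     declare void @take_big(%struct.Big*)     ; caller copies, passes pointer
//     declare void @ret_big(%struct.Big* sret) ; returned via the X8 rule below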

class CCIfAlign<int Align, CCAction A>:
  CCIf<"ArgFlags.getOrigAlign() == " # Align, A>;

def CC_A64_APCS : CallingConv<[
  // SRet is an LLVM-specific concept, so it takes precedence over general ABI
  // concerns. However, this rule will be used by C/C++ frontends to implement
  // structure return.
  CCIfSRet<CCAssignToReg<[X8]>>,

  // Put ByVal arguments directly on the stack. The minimum size and alignment
  // of a slot are 64 bits.
  CCIfByVal<CCPassByVal<8, 8>>,

  // Canonicalise the various types that live in different floating-point
  // registers. This makes sense because the PCS does not distinguish Short
  // Vectors and Floating-point types.
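  // For example, a v2f32 argument is bit-converted to f64 here and then simply
  // follows the f64 rules below (a D register, or an 8-byte stack slot).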
  CCIfType<[v2i8], CCBitConvertToType<f16>>,
  CCIfType<[v4i8, v2i16], CCBitConvertToType<f32>>,
  CCIfType<[v8i8, v4i16, v2i32, v2f32], CCBitConvertToType<f64>>,
  CCIfType<[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
           CCBitConvertToType<f128>>,

  // PCS: "C.1: If the argument is a Half-, Single-, Double- or Quad- precision
  // Floating-point or Short Vector Type and the NSRN is less than 8, then the
  // argument is allocated to the least significant bits of register
  // v[NSRN]. The NSRN is incremented by one. The argument has now been
  // allocated."
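  //
  // For illustration: with no FP arguments allocated yet, a float lands in S0
  // and a following double in D1; each FP/SIMD argument consumes a whole V
  // register, however small the value is.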
  CCIfType<[f16], CCAssignToReg<[B0, B1, B2, B3, B4, B5, B6, B7]>>,
  CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7]>>,
  CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>,
  CCIfType<[f128], CCAssignToReg<[Q0, Q1, Q2, Q3, Q4, Q5, Q6, Q7]>>,

  // PCS: "C.2: If the argument is an HFA and there are sufficient unallocated
  // SIMD and Floating-point registers (NSRN + number of elements <= 8), then
  // the argument is allocated to SIMD and Floating-point registers (with one
  // register per element of the HFA). The NSRN is incremented by the number of
  // registers used. The argument has now been allocated."
  //
  // N.b. As above, this rule is the responsibility of the front-end.

  // "C.3: If the argument is an HFA then the NSRN is set to 8 and the size of
  // the argument is rounded up to the nearest multiple of 8 bytes."
  //
  // "C.4: If the argument is an HFA, a Quad-precision Floating-point or Short
  // Vector Type then the NSAA is rounded up to the larger of 8 or the Natural
  // Alignment of the Argument's type."
  //
  // It is expected that these will be satisfied by adding dummy arguments to
  // the prototype.
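  //
  // For illustration: an HFA such as "struct { double x, y, z; }" could be
  // passed as [3 x double]; its elements are then allocated by the f64 rule
  // above, e.g. to D0, D1 and D2 if no V registers are in use. If fewer than
  // three V registers remain, C.2/C.3 send the whole HFA to memory, which the
  // front-end is expected to arrange (e.g. with the dummy arguments mentioned
  // above).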

  // PCS: "C.5: If the argument is a Half- or Single- precision Floating-point
  // type then the size of the argument is set to 8 bytes. The effect is as if
  // the argument had been copied to the least significant bits of a 64-bit
  // register and the remaining bits filled with unspecified values."
  CCIfType<[f16, f32], CCPromoteToType<f64>>,

  // PCS: "C.6: If the argument is an HFA, a Half-, Single-, Double- or Quad-
  // precision Floating-point or Short Vector Type, then the argument is copied
  // to memory at the adjusted NSAA. The NSAA is incremented by the size of the
  // argument. The argument has now been allocated."
  CCIfType<[f64], CCAssignToStack<8, 8>>,
  CCIfType<[f128], CCAssignToStack<16, 16>>,

  // PCS: "C.7: If the argument is an Integral Type, the size of the argument is
  // less than or equal to 8 bytes and the NGRN is less than 8, the argument is
  // copied to the least significant bits of x[NGRN]. The NGRN is incremented by
  // one. The argument has now been allocated."

  // First we implement C.8 and C.9 (128-bit types get even registers). i128 is
  // represented as two i64s, the first one being split. If we delayed this
  // operation, C.8 would never be reached.
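  //
  // For illustration: in "declare void @f(i64, i128)" the i64 takes X0; the
  // first (split) half of the i128 is only offered even registers, so it gets
  // X2, and the shadow list marks X1 as used so that the second half (handled
  // by the plain i64 rule further down) lands in X3. X1 is skipped, exactly as
  // C.8 requires.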
  CCIfType<[i64],
    CCIfSplit<CCAssignToRegWithShadow<[X0, X2, X4, X6], [X0, X1, X3, X5]>>>,

  // Note: the promotion also implements C.14.
  CCIfType<[i8, i16, i32], CCPromoteToType<i64>>,

  // And now the real implementation of C.7
  CCIfType<[i64], CCAssignToReg<[X0, X1, X2, X3, X4, X5, X6, X7]>>,
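  // For illustration: in "declare void @g(i8, i32)" both arguments are widened
  // to i64 and assigned to X0 and X1 respectively.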

  // PCS: "C.8: If the argument has an alignment of 16 then the NGRN is rounded
  // up to the next even number."
  //
  // "C.9: If the argument is an Integral Type, the size of the argument is
  // equal to 16 and the NGRN is less than 7, the argument is copied to x[NGRN]
  // and x[NGRN+1], x[NGRN] shall contain the lower addressed double-word of the
  // memory representation of the argument. The NGRN is incremented by two. The
  // argument has now been allocated."
  //
  // Subtlety here: what if alignment is 16 but it is not an integral type? All
  // floating-point types have been allocated already, which leaves composite
  // types: this is why a front-end may need to produce i128 for a struct <= 16
  // bytes.
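  //
  // For illustration: a 16-byte struct with 16-byte alignment (say one whose
  // only field is an __int128) would be passed as a single i128 under the
  // suggestions at the top of this file, picking up the even-register
  // treatment above; passing it as [2 x i64] instead would lose the alignment
  // that C.8 demands.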

  // PCS: "C.10 If the argument is a Composite Type and the size in double-words
  // of the argument is not more than 8 minus NGRN, then the argument is copied
  // into consecutive general-purpose registers, starting at x[NGRN]. The
  // argument is passed as though it had been loaded into the registers from a
  // double-word aligned address with an appropriate sequence of LDR
  // instructions loading consecutive registers from memory (the contents of any
  // unused parts of the registers are unspecified by this standard). The NGRN
  // is incremented by the number of registers used. The argument has now been
  // allocated."
  //
  // Another one that's the responsibility of the front-end (sigh).
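  //
  // For illustration: a 16-byte struct with only 8-byte alignment could be
  // passed as [2 x i64]; each element then goes through the i64 rules above
  // and the pair occupies two consecutive X registers. If fewer than two X
  // registers remain, the front-end has to send the whole composite to memory
  // itself (see the N.b. at the top of this file): the PCS never splits a
  // composite between registers and the stack.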

  // PCS: "C.11: The NGRN is set to 8."
  CCCustom<"CC_AArch64NoMoreRegs">,

  // PCS: "C.12: The NSAA is rounded up to the larger of 8 or the Natural
  // Alignment of the argument's type."
  //
  // PCS: "C.13: If the argument is a composite type then the argument is copied
  // to memory at the adjusted NSAA. The NSAA is incremented by the size of the
  // argument. The argument has now been allocated."
  //
  // Note that the effect of this corresponds to a memcpy rather than register
  // stores so that the struct ends up correctly addressable at the adjusted
  // NSAA.
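  //
  // For illustration: under the suggestions at the top of this file, a small
  // struct that no longer fits in registers arrives here as a byval pointer,
  // and the CCPassByVal rule above copies it onto the stack, giving exactly
  // this memcpy-like behaviour with at least 8-byte size and alignment.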

  // PCS: "C.14: If the size of the argument is less than 8 bytes then the size
  // of the argument is set to 8 bytes. The effect is as if the argument was
  // copied to the least significant bits of a 64-bit register and the remaining
  // bits filled with unspecified values."
  //
  // Integer types were widened above. Floating-point and composite types have
  // already been allocated completely. Nothing to do.

  // PCS: "C.15: The argument is copied to memory at the adjusted NSAA. The NSAA
  // is incremented by the size of the argument. The argument has now been
  // allocated."
  CCIfType<[i64], CCIfSplit<CCAssignToStack<8, 16>>>,
  CCIfType<[i64], CCAssignToStack<8, 8>>

]>;
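
// Worked example (illustrative only): a call to
//     declare void @f(i32, double, i128, float)
// is allocated by CC_A64_APCS as follows:
//     i32    -> promoted to i64 and assigned to X0
//     double -> D0
//     i128   -> split into two i64s; the first half may only use an even
//               register, so X1 is skipped and the value occupies X2 and X3
//     float  -> S1 (V0 is already taken by the double)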

// According to the PCS, X19-X30 are callee-saved; of the vector registers,
// only the low 64 bits of v8-v15 are callee-saved. The order here is picked up
// by PrologEpilogInserter.cpp to allocate stack slots, starting from the top
// of the stack upon entry. This gives the customary layout of x30 at [sp-8],
// x29 at [sp-16], ...
def CSR_PCS : CalleeSavedRegs<(add (sequence "X%u", 30, 19),
                                   (sequence "D%u", 15, 8))>;


// TLS descriptor calls are extremely restricted in their changes, to allow
// optimisations in the (hopefully) more common fast path where no real action
// is needed. They actually have to preserve all registers, except for the
// unavoidable X30 and the return register X0.
def TLSDesc : CalleeSavedRegs<(add (sequence "X%u", 29, 1),
                                   (sequence "Q%u", 31, 0))>;