Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture. Currently it supports integer and FP
instructions and can run almost anything generated by gcc-4.8.2 -O2.
The port is under active development.

Current limitations, as of mid-Jan 2014:

* threaded apps won't work, due to inadequate sys_clone() support.

* almost no support for vector (SIMD) instructions

* Integration with the built-in GDB server:
   - basically works, but breakpoints cause crashes because
     unchainXDirect_ARM64, which LibVEX_UnChain needs, is missing.
   - still to do:
      - arm64 xml register description files (allowing shadow
        registers to be looked at)
      - ptrace invoker: currently disabled for both arm and arm64
      - cpsr transfer to/from gdb, to be looked at (see also the
        arm equivalent code)

There has been extensive testing of the baseline simulation of integer
and FP instructions. Memcheck is also believed to work, at least for
small examples. Other tools appear to at least not crash when running
/bin/date.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme:

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far has, however, been done by cross-compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case. To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present. Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality. But not having it causes
compilation of m_signals.c to fail in hard-to-understand ways,
so it has been added temporarily.


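The offset concern in the vki_sigaction_base note can be illustrated
with a minimal sketch. The two structs below are hypothetical stand-ins,
not the real vki or kernel definitions, and use fixed-width fields so
the offsets are the same on any host. The sketch shows only that an
extra pointer-sized member placed before sa_mask moves sa_mask by 8
bytes; it says nothing about which layout the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout WITHOUT a restorer field (illustration only). */
struct demo_sigaction_no_restorer {
   uint64_t handler;      /* stands in for the handler pointer */
   uint64_t sa_flags;
   uint64_t sa_mask;
};

/* Hypothetical layout WITH a restorer field ahead of sa_mask. */
struct demo_sigaction_with_restorer {
   uint64_t handler;
   uint64_t sa_flags;
   uint64_t sa_restorer;  /* stands in for the restorer pointer */
   uint64_t sa_mask;
};

/* Number of bytes by which the extra field displaces sa_mask. */
size_t demo_sa_mask_shift(void)
{
   return offsetof(struct demo_sigaction_with_restorer, sa_mask)
          - offsetof(struct demo_sigaction_no_restorer, sa_mask);
}

/* Self-check of the two offsets in the hypothetical layouts. */
void demo_check_layout(void)
{
   assert(offsetof(struct demo_sigaction_no_restorer, sa_mask) == 16);
   assert(offsetof(struct demo_sigaction_with_restorer, sa_mask) == 24);
}
```

If the two sides of an interface disagree on this, whatever is written
as sa_mask by one side is read at the wrong place by the other.
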
m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t. Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG  29    //???
#  define SP_REG  31    //???
#  define RA_REG_DEFAULT 30    //???


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have. Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type,
in which case the ppc64_linux use of lwz to read it is wrong


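Why a 32-bit load of a word-sized field would be wrong can be shown
with a small model. This is a sketch under two assumptions that are
not stated in the note itself: that "word type" means a 64-bit field
on ppc64, and that the target is big-endian. The demo_ names are made
up for the illustration. A 32-bit load from the field's start address
then sees the most significant half, which is zero for any small exit
code.

```c
#include <assert.h>
#include <stdint.h>

/* Store v into buf in big-endian byte order, as a big-endian 64-bit
   target would lay out a 64-bit field in memory. */
void demo_store_be64(uint8_t buf[8], uint64_t v)
{
   for (int i = 0; i < 8; i++)
      buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Model of a 32-bit big-endian load from address p. */
uint32_t demo_load_be32(const uint8_t *p)
{
   return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
          | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* What a 32-bit load at the field's start address sees when the
   64-bit field holds 'code'. */
uint32_t demo_lwz_view(uint64_t code)
{
   uint8_t field[8];
   demo_store_be64(field, code);
   return demo_load_be32(field);
}

/* Self-check: exit code 42 is read back as 0 by the 32-bit load. */
void demo_check_exitcode(void)
{
   assert(demo_lwz_view(42) == 0);
}
```

Under these assumptions the 32-bit load only returns something
non-zero once the value needs more than 32 bits.
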
syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code. Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it alive
(else the simulation can lose exceptions). At least, for writes to
the zero register generated by loads .. or .. can any other
integer instructions that write to a register cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h. Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero. Also in synth_ucontext.


m_debugger.c:
uregs.pstate = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register. (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)


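The proposed reduction rule rests on both sides computing the same
value: zero when x is zero, all ones otherwise. The following is a
plain C model of that equivalence, not VEX code; the demo_ functions
are hypothetical stand-ins for the IROps of the same name.

```c
#include <assert.h>
#include <stdint.h>

/* CmpNEZ64: 1 if x is non-zero, else 0 (a 1-bit result). */
static unsigned demo_CmpNEZ64(uint64_t x)
{
   return x != 0;
}

/* 1Sto64: sign-extend a 1-bit value to 64 bits (0 or all ones). */
static uint64_t demo_1Sto64(unsigned bit)
{
   return bit ? ~(uint64_t)0 : 0;
}

/* CmpwNEZ64: 0 if x is zero, else all ones, as a single operation.
   For non-zero x, either x or its negation has the top bit set. */
static uint64_t demo_CmpwNEZ64(uint64_t x)
{
   return (uint64_t)0 - ((x | ((uint64_t)0 - x)) >> 63);
}

/* Returns 1 if the two forms agree on x. */
int demo_rule_holds(uint64_t x)
{
   return demo_1Sto64(demo_CmpNEZ64(x)) == demo_CmpwNEZ64(x);
}

/* Self-check over a few boundary values. */
void demo_check_rule(void)
{
   assert(demo_rule_holds(0));
   assert(demo_rule_holds(1));
   assert(demo_rule_holds(0x8000000000000000ULL));
   assert(demo_rule_holds(~(uint64_t)0));
}
```
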
check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead. Especially for CmpNEZ64/32. And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64 x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2


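The 16 bytes in the imm64 example decode to a MOVZ of 0xFFA0 followed
by three MOVKs of 0xFFFF, i.e. four instructions, whereas a single
MOVN x23, #0x5F would produce the same value. A minimal sketch of how
an emitter could count the cheaper of the two routes follows. The
demo_ names are hypothetical, not existing VEX functions, and the
sketch ignores other encodings (such as logical immediates) that can
also be shorter.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 16-bit chunks of v that differ from 'fill'. */
static unsigned demo_chunks_not(uint64_t v, uint16_t fill)
{
   unsigned n = 0;
   for (int shift = 0; shift < 64; shift += 16)
      if ((uint16_t)(v >> shift) != fill)
         n++;
   return n;
}

/* Fewest instructions needed to materialise v using either
   MOVZ + MOVKs (start from all zeroes) or
   MOVN + MOVKs (start from all ones). */
unsigned demo_imm64_insn_count(uint64_t v)
{
   unsigned via_movz = demo_chunks_not(v, 0x0000);
   unsigned via_movn = demo_chunks_not(v, 0xFFFF);
   unsigned best = via_movz < via_movn ? via_movz : via_movn;
   return best == 0 ? 1 : best;   /* always at least one insn */
}

/* Self-check: the value from the note needs one insn, not four. */
void demo_check_imm64(void)
{
   assert(demo_imm64_insn_count(0xFFFFFFFFFFFFFFA0ULL) == 1);
}
```
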
valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee saved


ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.


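For the 64-bit UBFM case, the simple shifts can be recognised from the
immr and imms fields alone: imms == 63 is LSR by immr, and
imms + 1 == immr is LSL by 63 - imms. The sketch below models that
check in plain C, next to a reference model of UBFM to compare
against. All demo_ names are hypothetical; this is not the front
end's actual decoder, and SBFM/ASR and the 32-bit forms are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of 64-bit UBFM (immr and imms in 0..63). */
uint64_t demo_ubfm64(uint64_t x, unsigned immr, unsigned imms)
{
   if (imms >= immr) {
      /* Extract bits imms..immr into the low end of the result. */
      unsigned width = imms - immr + 1;
      uint64_t mask = width == 64 ? ~(uint64_t)0
                                  : (((uint64_t)1 << width) - 1);
      return (x >> immr) & mask;
   } else {
      /* Place the low imms+1 bits at bit position 64-immr. */
      uint64_t mask = ((uint64_t)1 << (imms + 1)) - 1;
      return (x & mask) << (64 - immr);
   }
}

typedef enum { DEMO_NOT_SHIFT, DEMO_LSL, DEMO_LSR } DemoShiftKind;

/* Classify a 64-bit UBFM.  For a shift, store the amount in *amt. */
DemoShiftKind demo_classify_ubfm64(unsigned immr, unsigned imms,
                                   unsigned *amt)
{
   if (imms == 63)       { *amt = immr;      return DEMO_LSR; }
   if (imms + 1 == immr) { *amt = 63 - imms; return DEMO_LSL; }
   return DEMO_NOT_SHIFT;
}

/* Self-check: LSL #4 and LSR #4 are recognised, a UBFX is not. */
void demo_check_shifts(void)
{
   unsigned amt = 0;
   assert(demo_classify_ubfm64(60, 59, &amt) == DEMO_LSL && amt == 4);
   assert(demo_classify_ubfm64(4, 63, &amt) == DEMO_LSR && amt == 4);
   assert(demo_classify_ubfm64(8, 15, &amt) == DEMO_NOT_SHIFT);
}
```

When the classifier reports a shift, the translation could use a
single shift operation instead of the general mask-and-shift sequence.
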
LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


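What "zero out hi parts" means for the D-register form can be shown
with a toy model. A write to a D register clears the upper 64 bits of
the enclosing 128-bit vector register, so a load pair into two D
registers must leave both upper halves zero. The DemoV128 type and
the demo_ functions are made up for this sketch and do not correspond
to VEX data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 128-bit vector register as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } DemoV128;

/* Write a 64-bit (D-sized) value: the upper 64 bits become zero. */
void demo_write_d(DemoV128 *q, uint64_t d)
{
   q->lo = d;
   q->hi = 0;
}

/* Model of LDP Dt1, Dt2, [addr]: load two consecutive 64-bit values
   and zero the upper half of each destination register. */
void demo_ldp_d(DemoV128 *qt1, DemoV128 *qt2, const uint64_t mem[2])
{
   demo_write_d(qt1, mem[0]);
   demo_write_d(qt2, mem[1]);
}

/* Self-check: stale upper halves must not survive the load. */
void demo_check_ldp(void)
{
   DemoV128 a = { 0x1111, 0xDEADBEEF };
   DemoV128 b = { 0x2222, 0xCAFEF00D };
   const uint64_t mem[2] = { 0xAAAA, 0xBBBB };
   demo_ldp_d(&a, &b, mem);
   assert(a.lo == 0xAAAA && a.hi == 0);
   assert(b.lo == 0xBBBB && b.hi == 0);
}
```
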
DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code