Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run anything generated by gcc-4.8.2 -O3.  The
port is under active development.

Current limitations, as of mid-May 2014:

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3
  (via autovectorisation).  This is complete.

* Integration with the built-in GDB server:
   - works ok (breakpoint, attach to a process blocked in a syscall, ...)
   - still to do:
      arm64 xml register description files (allowing shadow registers
      to be looked at).
      cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls and instructions are supported for substantial
programs to work.  Firefox 26 is able to start up and quit.  The noise
level from Memcheck is low enough to make it practical to use for real
debugging.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme:

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far has, however, been done by cross-compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard-to-understand ways,
so adding it temporarily.


m_trampoline.S: what's the unexecutable-insn value?  0xFFFFFFFF
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG  29 //???
#  define SP_REG  31 //???
#  define RA_REG_DEFAULT 30 //???

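For what it's worth, a sketch of the numbering those defines seem to
rely on.  As far as I know, the "DWARF for the ARM 64-bit Architecture"
ABI document numbers X0..X30 as 0..30 and SP as 31, and the procedure
call standard uses X29 as frame pointer and X30 as link register.  The
enum and helper names below are hypothetical, not Valgrind identifiers.

```c
/* Hypothetical names for illustration; not taken from readdwarf.c. */
enum {
   DW_ARM64_X0 = 0,    /* X0..X30 map to DWARF numbers 0..30 */
   DW_ARM64_FP = 29,   /* X29, the frame pointer */
   DW_ARM64_LR = 30,   /* X30, the link register (return address) */
   DW_ARM64_SP = 31    /* the stack pointer */
};

/* Map an integer register index (0..30) to its DWARF number,
   or return -1 for an out-of-range index. */
static int dwarf_regno_for_xreg(int n)
{
   return (n >= 0 && n <= 30) ? DW_ARM64_X0 + n : -1;
}
```

Under that reading, 29/31/30 would be the expected values, but this
should still be checked against the ABI document itself.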

vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
It seems that tst->os_state.exitcode has word type, in which case
the ppc64_linux use of lwz to read it is wrong.

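The lwz concern can be modelled without a ppc64 machine.  This is only
a sketch, assuming the field is an 8-byte word stored big-endian (as on
big-endian ppc64); all function names are hypothetical.  A 4-byte load
from the field's address then sees the high half, not the value.

```c
#include <stdint.h>

/* Store a 64-bit value into buf in big-endian byte order. */
static void store_be64(uint8_t buf[8], uint64_t v)
{
   for (int i = 0; i < 8; i++)
      buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Model of lwz: load 4 bytes at buf, big-endian, zero-extended. */
static uint64_t load_be32(const uint8_t *buf)
{
   return ((uint64_t)buf[0] << 24) | ((uint64_t)buf[1] << 16)
        | ((uint64_t)buf[2] << 8)  |  (uint64_t)buf[3];
}

/* Model of ld: load 8 bytes at buf, big-endian. */
static uint64_t load_be64(const uint8_t *buf)
{
   return (load_be32(buf) << 32) | load_be32(buf + 4);
}
```

With a small exit code stored via store_be64, load_be32 at offset 0
yields zero, while load_be64 (or load_be32 at offset 4) yields the
stored value.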

syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it, alive.
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can any other
integer instructions, that write to a register, cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
  uregs.pstate = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)

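On the INS question: as I read the architecture manual, INS writes
only the selected lane and leaves the rest of Vd unchanged, whereas a
scalar write such as FMOV Dd, Xn clears bits 127:64.  A minimal model
of the two behaviours follows; the type and function names are
hypothetical and this is not what host_arm64_defs.c contains.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register. */
typedef struct { uint64_t lo, hi; } VReg128;

/* INS Vd.D[0], Xn: replaces lane 0 only; the old top half survives,
   so the result depends on the previous contents of Vd. */
static VReg128 model_ins_d0(VReg128 old, uint64_t x)
{
   VReg128 r = { x, old.hi };
   return r;
}

/* FMOV Dd, Xn: scalar write; the top half becomes zero, so there is
   no dependency on the previous contents of Vd. */
static VReg128 model_fmov_d(uint64_t x)
{
   VReg128 r = { x, 0 };
   return r;
}
```

If that reading is right, the INS form does not zero the top half, and
the top-half-zeroing move is the one that breaks the dependency.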

preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
  1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz  1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)

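The rule 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x) rests on both sides
producing all-ones for nonzero x and zero otherwise.  A small scalar
model of that equivalence, with hypothetical C names standing in for
the IROps (this is not VEX code):

```c
#include <stdint.h>

/* CmpNEZ64: 1-bit result, 1 if x is nonzero. */
static unsigned model_cmpnez64(uint64_t x)
{
   return x != 0;
}

/* 1Sto64: sign-extend a 1-bit value to 64 bits. */
static uint64_t model_1sto64(unsigned bit)
{
   return bit ? ~(uint64_t)0 : 0;
}

/* CmpwNEZ64: all ones if x is nonzero, else zero. */
static uint64_t model_cmpwnez64(uint64_t x)
{
   /* (x | -x) has its top bit set exactly when x != 0. */
   uint64_t t = x | (0 - x);
   return (t >> 63) ? ~(uint64_t)0 : 0;
}
```

For any x, model_1sto64(model_cmpnez64(x)) and model_cmpwnez64(x)
agree, which is all the proposed folding rule needs.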

check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid (four instructions to load one constant):
imm64 x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2

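The bytes above decode to a MOVZ followed by three MOVKs.  A sketch of
the obvious cost comparison: count the 16-bit chunks that differ from
the all-zeroes background (MOVZ-based) or the all-ones background
(MOVN-based) and take the cheaper.  The helper names are hypothetical
and this is not the existing imm64 emitter.

```c
#include <stdint.h>

/* Count the 16-bit chunks of v that differ from 'fill'. */
static int chunks_differing(uint64_t v, uint16_t fill)
{
   int n = 0;
   for (int i = 0; i < 4; i++)
      if ((uint16_t)(v >> (16 * i)) != fill)
         n++;
   return n;
}

/* Instructions needed when starting from MOVZ (other chunks zero);
   MOVK patches every further nonzero chunk. */
static int insns_via_movz(uint64_t v)
{
   int n = chunks_differing(v, 0x0000);
   return n == 0 ? 1 : n;
}

/* Instructions needed when starting from MOVN (other chunks ones);
   MOVK patches every further chunk that is not 0xFFFF. */
static int insns_via_movn(uint64_t v)
{
   int n = chunks_differing(v, 0xFFFF);
   return n == 0 ? 1 : n;
}
```

For 0xFFFFFFFFFFFFFFA0 the MOVZ route costs four instructions and the
MOVN route one (MOVN x23, #0x5F).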

valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee saved


ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.

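For the UBFM half of this, the shift cases are the documented aliases:
LSR #sh is UBFM with immr=sh, imms=63, and LSL #sh is UBFM with
immr=(64-sh) mod 64, imms=63-sh.  A sketch of a 64-bit UBFM model plus
the two alias tests; function names are hypothetical and the front end
may well structure this differently.

```c
#include <stdint.h>

/* Model of 64-bit UBFM Xd, Xn, #immr, #imms (both in 0..63). */
static uint64_t ubfm64(uint64_t src, unsigned immr, unsigned imms)
{
   if (imms >= immr) {
      /* Extract (imms-immr+1) bits starting at bit immr. */
      unsigned width = imms - immr + 1;
      uint64_t mask = width == 64 ? ~(uint64_t)0
                                  : (((uint64_t)1 << width) - 1);
      return (src >> immr) & mask;
   } else {
      /* Place the low (imms+1) bits at bit position 64-immr. */
      unsigned width = imms + 1;               /* at most 63 here */
      uint64_t mask = ((uint64_t)1 << width) - 1;
      return (src & mask) << (64 - immr);
   }
}

/* LSR alias: imms == 63; the shift amount is immr. */
static int ubfm_is_lsr(unsigned immr, unsigned imms)
{
   (void)immr;
   return imms == 63;
}

/* LSL alias: imms != 63 and imms+1 == immr; the shift is 63-imms. */
static int ubfm_is_lsl(unsigned immr, unsigned imms)
{
   return imms != 63 && imms + 1 == immr;
}
```

When either predicate holds, the instruction could be translated
directly as a single shift by the stated amount.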

LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"

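To make the "by hand" point concrete, here is a scalar model of the
8x16 case: a shift-and-OR chain next to the per-lane result a single
dup operation would be expected to give.  This assumes "by hand" means
something like the shift/OR chain; the type and function names are
hypothetical and this is not the front end's IR.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } Vec128;

/* "By hand": widen the byte with a chain of shifts and ORs. */
static Vec128 dup8x16_by_hand(uint8_t b)
{
   uint64_t w = b;
   w |= w << 8;
   w |= w << 16;
   w |= w << 32;
   Vec128 r = { w, w };
   return r;
}

/* Reference semantics: every one of the 16 byte lanes equals b. */
static Vec128 dup8x16_ref(uint8_t b)
{
   Vec128 r = { 0, 0 };
   for (int lane = 0; lane < 8; lane++) {
      r.lo |= (uint64_t)b << (8 * lane);
      r.hi |= (uint64_t)b << (8 * lane);
   }
   return r;
}
```

Both give the same value; the point of the note is that one IROp would
express it in one step rather than a chain.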

Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code


chainXDirect_ARM64: use direct jump forms when possible
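
"When possible" for a direct jump presumably means the target is
within reach of B, whose 26-bit signed word offset covers +/-128MB.
A sketch of the range test and encoding; the function names are
hypothetical and what chainXDirect_ARM64 currently emits is not shown
here.

```c
#include <stdint.h>

/* Nonzero if a B placed at 'from' can reach 'to': the byte offset
   must be a multiple of 4 and fit in a signed 26-bit word offset. */
static int b_reaches(uint64_t from, uint64_t to)
{
   int64_t delta = (int64_t)(to - from);
   return (delta & 3) == 0
          && delta >= -(INT64_C(1) << 27)
          && delta <   (INT64_C(1) << 27);
}

/* Encode B with the given byte offset.  The caller is expected to
   have checked b_reaches first. */
static uint32_t encode_b(int64_t delta)
{
   return 0x14000000u | ((uint32_t)(delta >> 2) & 0x03FFFFFFu);
}
```

Targets outside that range would still need the load-address-then-
branch-via-register form.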