Update implementation notes for the arm64-linux port.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13775 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/Makefile.am b/Makefile.am
index 28087d9..40e2b0f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -104,6 +104,7 @@
 	README.android \
 	README.android_emulator \
 	README.mips \
+	README.aarch64 \
 	NEWS.old \
 	valgrind.pc.in \
 	valgrind.spec.in \
diff --git a/ARM64_TIDYUPS.txt b/README.aarch64
similarity index 66%
rename from ARM64_TIDYUPS.txt
rename to README.aarch64
index c785cb8..ccc84af 100644
--- a/ARM64_TIDYUPS.txt
+++ b/README.aarch64
@@ -1,21 +1,67 @@
 
-## HOW TO Cross-CONFIGURE
+Status
+~~~~~~
 
-export CC=aarch64-linux-gnu-gcc
-export LD=aarch64-linux-gnu-ld
-export AR=aarch64-linux-gnu-ar
+As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
+the 64-bit ARM architecture.  Currently it supports integer and FP
+instructions and can run almost anything generated by gcc-4.7.2 -O2.
+The port is under active development.
 
-./autogen.sh
-./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux --enable-only64bit
+Current limitations, as of mid-Jan 2014:
 
-##############################################################
+* Threaded apps won't work, due to inadequate sys_clone() support.
+
+* Almost no support for vector (SIMD) instructions.
+
+* Integration with the built-in GDB server doesn't work yet.
+
+There has been extensive testing of the baseline simulation of integer
+and FP instructions.  Memcheck is also believed to work, at least for
+small examples.  Other tools appear to at least not crash when running
+/bin/date.
+
+
+Building
+~~~~~~~~
+
+You could probably build it directly on a target OS, using the normal
+non-cross scheme:
+
+  ./autogen.sh ; ./configure --prefix=.. ; make ; make install
+
+Development so far has, however, been done by cross-compiling, viz:
+
+  export CC=aarch64-linux-gnu-gcc
+  export LD=aarch64-linux-gnu-ld
+  export AR=aarch64-linux-gnu-ar
+
+  ./autogen.sh
+  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
+              --enable-only64bit
+  make -j4
+  make -j4 install
+
+Doing this assumes that the install path (`pwd`/Inst) is valid on
+both host and target, which isn't normally the case.  To avoid
+this limitation, do instead:
+
+  ./configure --prefix=/install/path/on/target \
+              --host=aarch64-unknown-linux \
+              --enable-only64bit
+  make -j4
+  make -j4 install DESTDIR=/a/temp/dir/on/host
+  # and then copy the contents of DESTDIR to the target.
+
+See README.android for more examples of cross-compile building.
+
+
+Implementation tidying-up/TODO notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 UnwindStartRegs -- what should that contain?
 
 
-
 vki-arm64-linux.h: vki_sigaction_base
-
 I really don't think that __vki_sigrestore_t sa_restorer
 should be present.  Adding it surely puts sa_mask at a wrong
 offset compared to (kernel) reality.  But not having it causes
@@ -32,7 +78,6 @@
 it to __vki_uint128_t, but what's the defn of that?
 
 
-
 m_debuginfo/priv_storage.h: need proper defn of DiCfSI
 
 
@@ -48,32 +93,28 @@
 I'd say the amd64 version has padding it shouldn't have.  Check?
 
 
-
 syswrap-linux.c run_a_thread_NORETURN assembly sections
 seems like tst->os_state.exitcode has word type
 in which case the ppc64_linux use of lwz to read it, is wrong
 
 
-
 syswrap-linux.c ML_(do_fork_clone)
 assuming that VGP_arm64_linux is the same as VGP_arm_linux here
 
 
-
 dispatch-arm64-linux.S: FIXME: set up FP control state before
 entering generated code.  Also fix screwy indentation.
 
+
 dispatcher-ery general: what's a good (predictor-friendly) way to
 branch to a register?
 
 
-
 in vki-arm64-scnums.h
 //#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
 Probably want to reenable that and clean up accordingly
 
 
-
 putIRegXXorZR: figure out a way that the computed value is actually
 used, so as to keep any memory reads that might generate it, alive.
 (else the simulation can lose exceptions).  At least, for writes to
@@ -81,42 +122,32 @@
 integer instructions, that write to a register, cause exceptions?
 
 
-
 loads/stores: generate stack alignment checks as necessary
 
 
-
 fix barrier insns: ISB, DMB
 
 
-
 fix atomic loads/stores
 
 
-
 FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
 IROps so as to avoid double rounding
 
 
-
 ARM64Instr_Call getRegUsage: re-check relative to what
 getAllocableRegs_ARM64 makes available
 
 
-
 Make dispatch-arm64-linux.S save any callee-saved Q regs
 I think what is required is to save D8-D15 and nothing more than that.
 
 
-
 wrapper for __NR3264_fstat -- correct?
 
 
-
-PRE(sys_clone): get rid of references to vki_modify_ldt_t
-and the definition of it in vki-arm64-linux.h.  Ditto for 
-32 bit arm.
-
+PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
+definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.
 
 
 sigframe-arm64-linux.c: build_sigframe: references to nonexistent
@@ -124,60 +155,54 @@
 replaced by zero.  Also in synth_ucontext.
 
 
-
 m_debugger.c:
 uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
 Is that remotely correct?
 
 
-
 host_arm64_defs.c: emit_ARM64INstr:
 ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
 MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
 dependencies on the top half of the register.  (Or at least check
-the semantocs of INS Vd.D[0] to see if it zeroes out the top.)
-
+the semantics of INS Vd.D[0] to see if it zeroes out the top.)
 
 
 preferredVectorSubTypeFromSize: review perf effects and decide
 on a types-for-subparts policy
 
 
-
 fold_IRExpr_Unop: add a reduction rule for this
 1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
 vis 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)
 
 
-
 check insn selection for memcheck-only primops:
 Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
 widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8
 
 
-
 isel: get rid of various cases where zero is put into a register
 and just use xzr instead.  Especially for CmpNEZ64/32.  And for
 writing zeroes into the CC thunk fields.
 
 
-
 /* Keep this list in sync with that in iselNext below */
 /* Keep this list in sync with that for Ist_Exit above */
 uh .. they are not in sync
 
 
-
 very stupid:
 imm64  x23, 0xFFFFFFFFFFFFFFA0
 17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2 
 
 
-
 valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
 also add CFI annotations
 
 
-
 could possibly bring r29 into use, which be useful as it is
 callee saved
+
+
+ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
+can't always simplify the general-case IR to a shift in such cases.