Upgrade to valgrind 3.12.0.

Release 3.12.0 (20 October 2016)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3.12.0 is a feature release with many improvements and the usual
collection of bug fixes.

This release supports X86/Linux, AMD64/Linux, ARM32/Linux,
ARM64/Linux, PPC32/Linux, PPC64BE/Linux, PPC64LE/Linux, S390X/Linux,
MIPS32/Linux, MIPS64/Linux, ARM/Android, ARM64/Android,
MIPS32/Android, X86/Android, X86/Solaris, AMD64/Solaris, X86/MacOSX
10.10 and AMD64/MacOSX 10.10.  There is also preliminary support for
X86/MacOSX 10.11/12, AMD64/MacOSX 10.11/12 and TILEGX/Linux.

* ================== PLATFORM CHANGES =================

* POWER: Support for ISA 3.0 has been added

* mips: support for O32 FPXX ABI has been added.
* mips: improved recognition of different processors
* mips: determination of page size now done at run time

* amd64: Partial support for AMD FMA4 instructions.

* arm, arm64: Support for v8 crypto and CRC instructions.

* Improvements and robustification of the Solaris port.

* Preliminary support for MacOS 10.12 (Sierra) has been added.

Whilst 3.12.0 continues to support the 32-bit x86 instruction set, we
would prefer users to migrate to 64-bit x86 (a.k.a amd64 or x86_64)
where possible.  Valgrind's support for 32-bit x86 has stagnated in
recent years and has fallen far behind that for 64-bit x86
instructions.  By contrast 64-bit x86 is well supported, up to and
including AVX2.

* ==================== TOOL CHANGES ====================

* Memcheck:

  - Added meta mempool support for describing a custom allocator which:
     - Auto-frees all chunks, on the assumption that destroying a pool
       destroys all objects in the pool
     - Uses itself to allocate other memory blocks
    A sketch of how an allocator might use this is shown after this list.

  - New flag --ignore-range-below-sp to ignore memory accesses below
    the stack pointer, if you really have to.  The related flag
    --workaround-gcc296-bugs=yes is now deprecated.  Use
    --ignore-range-below-sp=1024-1 as a replacement.
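
  The following is a minimal sketch of how a custom allocator could
  describe itself to Memcheck using the new meta mempool support.  It
  assumes the VALGRIND_CREATE_MEMPOOL_EXT client request and the
  VALGRIND_MEMPOOL_METAPOOL / VALGRIND_MEMPOOL_AUTO_FREE flags from
  the client-request headers; the pool layout and helper names are
  invented purely for illustration.

     #include <stdlib.h>
     #include <valgrind/valgrind.h>

     /* Hypothetical pool: one malloc'd slab carved into objects. */
     typedef struct { char *base; size_t used, size; } Pool;

     Pool *pool_create(size_t size)
     {
        Pool *p = malloc(sizeof *p);
        p->base = malloc(size);
        p->used = 0;
        p->size = size;
        /* Describe a meta pool whose chunks are all freed
           automatically when the pool itself is destroyed. */
        VALGRIND_CREATE_MEMPOOL_EXT(
           p, /*rzB*/0, /*is_zeroed*/0,
           VALGRIND_MEMPOOL_METAPOOL | VALGRIND_MEMPOOL_AUTO_FREE);
        return p;
     }

     void *pool_alloc(Pool *p, size_t n)
     {
        void *obj = p->base + p->used;
        p->used += n;                        /* no overflow check: sketch */
        VALGRIND_MEMPOOL_ALLOC(p, obj, n);   /* register the chunk */
        return obj;
     }

     void pool_destroy(Pool *p)
     {
        /* AUTO_FREE: no per-chunk VALGRIND_MEMPOOL_FREE is needed. */
        VALGRIND_DESTROY_MEMPOOL(p);
        free(p->base);
        free(p);
     }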

* DRD:

  - Improved thread startup time significantly on non-Linux platforms.

* DHAT:

  - Added collection of the metric "tot-blocks-allocd".

* ==================== OTHER CHANGES ====================

* Replacement/wrapping of malloc/new related functions is now done by
  default not just for system libraries, but for any globally defined
  malloc/new related function (both in shared libraries and in statically
  linked alternative malloc implementations).  The dynamic (runtime)
  linker is excluded, though.  To intercept malloc/new related functions
  in system libraries only, use --soname-synonyms=somalloc=nouserintercepts
  (where "nouserintercepts" can be any non-existing library name).
  This new functionality is not implemented for MacOS X.
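
  As an illustration of the new default (a toy program invented here, not
  part of the release): a malloc defined globally in the executable is now
  replaced by Memcheck's allocator, so errors and leaks routed through it
  are tracked; passing --soname-synonyms=somalloc=nouserintercepts restores
  the old system-libraries-only behaviour.

     #include <stddef.h>

     /* Deliberately simplistic globally defined allocator.  With the
        3.12.0 default, Memcheck replaces these symbols too, so the
        leak in main() is reported; with
        --soname-synonyms=somalloc=nouserintercepts it is not. */
     static char   arena[1 << 20];
     static size_t arena_used;

     void *malloc(size_t n)
     {
        void *p = &arena[arena_used];
        arena_used += (n + 15) & ~(size_t)15;
        return p;
     }

     void free(void *p) { (void)p; }

     int main(void)
     {
        (void)malloc(100);   /* never freed: now visible to Memcheck */
        return 0;
     }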

* The maximum number of callers in a suppression entry is now equal to
  the maximum size for --num-callers (500).
  Note that --gen-suppressions=yes|all similarly generates suppressions
  containing up to --num-callers frames.

* New and modified GDB server monitor features:

  - Valgrind's gdbserver now accepts the command 'catch syscall'.
    Note that you must have GDB >= 7.11 to use 'catch syscall' with
    gdbserver.

* New option --run-cxx-freeres=<yes|no> can be used to change whether the
  __gnu_cxx::__freeres() cleanup function is called or not.  The default
  is 'yes'.

* Valgrind is able to read compressed debuginfo sections in two formats:
  - zlib ELF gABI format with SHF_COMPRESSED flag (gcc option -gz=zlib)
  - zlib GNU format with .zdebug sections (gcc option -gz=zlib-gnu)

* Modest JIT-cost improvements: the cost of instrumenting code blocks
  for the most common use case (x86_64-linux, Memcheck) has been
  reduced by 10%-15%.

* Improved performance for programs that do a lot of discarding of
  instruction address ranges of 8KB or less.

* The C++ symbol demangler has been updated.

* More robustness against invalid syscall parameters on Linux.

* ==================== FIXED BUGS ====================

The following bugs have been fixed or resolved.  Note that "n-i-bz"
stands for "not in bugzilla" -- that is, a bug that was reported to us
but never got a bugzilla entry.  We encourage you to file bugs in
bugzilla (https://bugs.kde.org/enter_bug.cgi?product=valgrind) rather
than mailing the developers (or mailing lists) directly -- bugs that
are not entered into bugzilla tend to get forgotten about or ignored.

To see details of a given bug, visit
  https://bugs.kde.org/show_bug.cgi?id=XXXXXX
where XXXXXX is the bug number as listed below.

191069  Exiting due to signal not reported in XML output
199468  Suppressions: stack size limited to 25
        while --num-callers allows more frames
212352  vex amd64 unhandled opc_aux = 0x 2, first_opcode == 0xDC (FCOM)
278744  cvtps2pd with redundant RexW
303877  valgrind doesn't support compressed debuginfo sections.
345307  Warning about "still reachable" memory when using libstdc++ from gcc 5
348345  Assertion fails for negative lineno
351282  V 3.10.1 MIPS softfloat build broken with GCC 4.9.3 / binutils 2.25.1
351692  Dumps created by valgrind are not readable by gdb (mips32 specific)
351804  Crash on generating suppressions for "printf" call on OS X 10.10
352197  mips: mmap2() not wrapped correctly for page size > 4096
353083  arm64 doesn't implement various xattr system calls
353084  arm64 doesn't support sigpending system call
353137  www: update info for Supported Platforms
353138  www: update "The Valgrind Developers" page
353370  don't advertise RDRAND in cpuid for Core-i7-4910-like avx2 machine
        == 365325
        == 357873
353384  amd64->IR: 0x66 0xF 0x3A 0x62 0xD1 0x62 (pcmpXstrX $0x62)
353398  WARNING: unhandled amd64-solaris syscall: 207
353660  XML in auxwhat tag not escaping reserved symbols properly
353680  s390x: Crash with certain glibc versions due to non-implemented TBEGIN
353727  amd64->IR: 0x66 0xF 0x3A 0x62 0xD1 0x72 (pcmpXstrX $0x72)
353802  ELF debug info reader confused with multiple .rodata sections
353891  Assert 'bad_scanned_addr < VG_ROUNDDN(start+len, sizeof(Addr))' failed
353917  unhandled amd64-solaris syscall fchdir(120)
353920  unhandled amd64-solaris syscall: 170
354274  arm: unhandled instruction: 0xEBAD 0x0AC1 (sub.w sl, sp, r1, lsl #3)
354392  unhandled amd64-solaris syscall: 171
354797  Vbit test does not include Iops for Power 8 instruction support
354883  tst->os_state.pthread - magic_delta assertion failure on OSX 10.11
        == 361351
        == 362920
        == 366222
354933  Fix documentation of --kernel-variant=android-no-hw-tls option
355188  valgrind should intercept all malloc related global functions
355454  do not intercept malloc related symbols from the runtime linker
355455  stderr.exp of test cases wrapmalloc and wrapmallocstatic overconstrained
356044  Dwarf line info reader misinterprets is_stmt register
356112  mips: replace addi with addiu
356393  valgrind (vex) crashes because isZeroU happened
        == 363497
        == 364497
356676  arm64-linux: unhandled syscalls 125, 126 (sched_get_priority_max/min)
356678  arm64-linux: unhandled syscall 232 (mincore)
356817  valgrind.h triggers compiler errors on MSVC when defining NVALGRIND
356823  Unsupported ARM instruction: stlex
357059  x86/amd64: SSE cvtpi2ps with memory source does transition to MMX state
357338  Unhandled instruction for SHA instructions libcrypto Boring SSL
357673  crash if I try to run valgrind with a binary link with libcurl
357833  Setting RLIMIT_DATA to zero breaks with linux 4.5+
357871  pthread_spin_destroy not properly wrapped
357887  Calls to VG_(fclose) do not close the file descriptor
357932  amd64->IR: accept redundant REX prefixes for {minsd,maxsd} m128, xmm.
358030  support direct socket calls on x86 32bit (new in linux 4.3)
358478  drd/tests/std_thread.cpp doesn't build with GCC6
359133  Assertion 'eltSzB <= ddpa->poolSzB' failed
359181  Buffer Overflow during Demangling
359201  futex syscall "skips" argument 5 if op is FUTEX_WAIT_BITSET
359289  s390x: popcnt (B9E1) not implemented
359472  The Power PC vsubuqm instruction doesn't always give the correct result
359503  Add missing syscalls for aarch64 (arm64)
359645  "You need libc6-dbg" help message could be more helpful
359703  s390: wire up separate socketcalls system calls
359724  getsockname might crash - deref_UInt should call safe_to_deref
359733  amd64 implement ld.so strchr/index override like x86
359767  Valgrind does not support the IBM POWER ISA 3.0 instructions, part 1/5
359829  Power PC test suite none/tests/ppc64/test_isa_2_07.c uses
        uninitialized data
359838  arm64: Unhandled instruction 0xD5033F5F (clrex)
359871  Incorrect mask handling in ppoll
359952  Unrecognised PCMPESTRM variants (0x70, 0x19)
360008  Contents of Power vr registers contents is not printed correctly when
        the --vgdb-shadow-registers=yes option is used
360035  POWER PC instruction bcdadd and bcdsubtract generate result with
        non-zero shadow bits
360378  arm64: Unhandled instruction 0x5E280844 (sha1h  s4, s2)
360425  arm64 unsupported instruction ldpsw
        == 364435
360519  none/tests/arm64/memory.vgtest might fail with newer gcc
360571  Error about the Android Runtime reading below the stack pointer on ARM
360574  Wrong parameter type for an ashmem ioctl() call on Android and ARM64
360749  kludge for multiple .rodata sections on Solaris no longer needed
360752  raise the number of reserved fds in m_main.c from 10 to 12
361207  Valgrind does not support the IBM POWER ISA 3.0 instructions, part 2/5
361226  s390x: risbgn (EC59) not implemented
361253  [s390x] ex_clone.c:42: undefined reference to `pthread_create'
361354  ppc64[le]: wire up separate socketcalls system calls
361615  Inconsistent termination for multithreaded process terminated by signal
361926  Unhandled Solaris syscall: sysfs(84)
362009  V dumps core on unimplemented functionality before threads are created
362329  Valgrind does not support the IBM POWER ISA 3.0 instructions, part 3/5
362894  missing (broken) support for wbit field on mtfsfi instruction (ppc64)
362935  [AsusWRT] Assertion 'sizeof(TTEntryC) <= 88' failed
362953  Request for an update to the Valgrind Developers page
363680  add renameat2() support
363705  arm64 missing syscall name_to_handle_at and open_by_handle_at
363714  ppc64 missing syscalls sync, waitid and name_to/open_by_handle_at
363858  Valgrind does not support the IBM POWER ISA 3.0 instructions, part 4/5
364058  clarify in manual limitations of array overruns detections
364413  pselect sycallwrapper mishandles NULL sigmask
364728  Power PC, missing support for several HW registers in
        get_otrack_shadow_offset_wrk()
364948  Valgrind does not support the IBM POWER ISA 3.0 instructions, part 5/5
365273  Invalid write to stack location reported after signal handler runs
365912  ppc64BE segfault during jm-insns test (RELRO)
366079  FPXX Support for MIPS32 Valgrind
366138  Fix configure errors out when using Xcode 8 (clang 8.0.0)
366344  Multiple unhandled instruction for Aarch64
        (0x0EE0E020, 0x1AC15800, 0x4E284801, 0x5E040023, 0x5E056060)
367995  Integration of memcheck with custom memory allocator
368120  x86_linux asm _start functions do not keep 16-byte aligned stack pointer
368412  False positive result for altivec capability check
368416  Add tc06_two_races_xml.exp output for ppc64
368419  Perf Events ioctls not implemented
368461  mmapunmap test fails on ppc64
368823  run_a_thread_NORETURN assembly code typo for VGP_arm64_linux target
369000  AMD64 fma4 instructions unsupported.
369169  ppc64 fails jm_int_isa_2_07 test
369175  jm_vec_isa_2_07 test crashes on ppc64
369209  valgrind loops and eats up all memory if cwd doesn't exist.
369356  pre_mem_read_sockaddr syscall wrapper can crash with bad sockaddr
369359  msghdr_foreachfield can crash when handling bad iovec
369360  Bad sigprocmask old or new sets can crash valgrind
369361  vmsplice syscall wrapper crashes on bad iovec
369362  Bad sigaction arguments crash valgrind
369383  x86 sys_modify_ldt wrapper crashes on bad ptr
369402  Bad set/get_thread_area pointer crashes valgrind
369441  bad lvec argument crashes process_vm_readv/writev syscall wrappers
369446  valgrind crashes on unknown fcntl command
369439  S390x: Unhandled insns RISBLG/RISBHG and LDE/LDER
369468  Remove quadratic metapool algorithm using VG_(HT_remove_at_Iter)
370265  ISA 3.0 HW cap stuff needs updating
371128  BCD add and subtract instructions on Power BE in 32-bit mode do not work
n-i-bz  Fix incorrect (or infinite loop) unwind on RHEL7 x86 and amd64
n-i-bz  massif --pages-as-heap=yes does not report peak caused by mmap+munmap
n-i-bz  false positive leaks due to aspacemgr merging heap & non heap segments
n-i-bz  Fix ppoll_alarm exclusion on OS X
n-i-bz  Document brk segment limitation, reference manual in limit reached msg.
n-i-bz  Fix clobber list in none/tests/amd64/xacq_xrel.c [valgrind r15737]
n-i-bz  Bump allowed shift value for "add.w reg, sp, reg, lsl #N" [vex r3206]
n-i-bz  amd64: memcheck false positive with shr %edx
n-i-bz  arm32: Allow early writeback of SP base register in "strd rD, [sp, #-16]"
n-i-bz  ppc: Fix two cases of PPCAvFpOp vs PPCFpOp enum confusion
n-i-bz  arm: Fix incorrect register-number constraint check for LDAEX{,B,H,D}
n-i-bz  DHAT: added collection of the metric "tot-blocks-allocd"

(3.12.0.RC1:  20 October 2016, vex r3282, valgrind r16094)
(3.12.0.RC2:  20 October 2016, vex r3282, valgrind r16096)
(3.12.0:      21 October 2016, vex r3282, valgrind r16098)

Bug: http://b/37470713
Bug: http://b/29251682
Test: ran runtests-arm(64)?.sh and the bug reporter's specific binary (32- and 64-bit)
Change-Id: I43ccbea946d89fc4ae9f355181ac5061d6ce4453
diff --git a/VEX/priv/guest_amd64_helpers.c b/VEX/priv/guest_amd64_helpers.c
index ab53e15..3a0a4c6 100644
--- a/VEX/priv/guest_amd64_helpers.c
+++ b/VEX/priv/guest_amd64_helpers.c
@@ -1604,6 +1604,15 @@
                            mkU64(0)));
       }
 
+      /*---------------- SHRL ----------------*/
+
+      if (isU64(cc_op, AMD64G_CC_OP_SHRL) && isU64(cond, AMD64CondZ)) {
+         /* SHRL, then Z --> test dep1 == 0 */
+         return unop(Iop_1Uto64,
+                     binop(Iop_CmpEQ32, unop(Iop_64to32, cc_dep1),
+                           mkU32(0)));
+      }
+
       /*---------------- COPY ----------------*/
       /* This can happen, as a result of amd64 FP compares: "comisd ... ;
          jbe" for example. */
diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c
index af4817f..2080dc0 100644
--- a/VEX/priv/guest_amd64_toIR.c
+++ b/VEX/priv/guest_amd64_toIR.c
@@ -6401,19 +6401,20 @@
                fp_do_op_mem_ST_0 ( addr, "mul", dis_buf, Iop_MulF64, True );
                break;
 
-//..             case 2: /* FCOM double-real */
-//..                DIP("fcoml %s\n", dis_buf);
-//..                /* This forces C1 to zero, which isn't right. */
-//..                put_C3210( 
-//..                    binop( Iop_And32,
-//..                           binop(Iop_Shl32, 
-//..                                 binop(Iop_CmpF64, 
-//..                                       get_ST(0),
-//..                                       loadLE(Ity_F64,mkexpr(addr))),
-//..                                 mkU8(8)),
-//..                           mkU32(0x4500)
-//..                    ));
-//..                break;  
+            case 2: /* FCOM double-real */
+               DIP("fcoml %s\n", dis_buf);
+               /* This forces C1 to zero, which isn't right. */
+               put_C3210(
+                   unop(Iop_32Uto64,
+                   binop( Iop_And32,
+                          binop(Iop_Shl32,
+                                binop(Iop_CmpF64,
+                                      get_ST(0),
+                                      loadLE(Ity_F64,mkexpr(addr))),
+                                mkU8(8)),
+                          mkU32(0x4500)
+                   )));
+               break;
 
             case 3: /* FCOMP double-real */
                DIP("fcompl %s\n", dis_buf);
@@ -12847,8 +12848,10 @@
          IRTemp rmode = newTemp(Ity_I32);
 
          modrm = getUChar(delta);
-         do_MMX_preamble();
          if (epartIsReg(modrm)) {
+            /* Only switch to MMX mode if the source is a MMX register.
+               See comments on CVTPI2PD for details.  Fixes #357059. */
+            do_MMX_preamble();
             assign( arg64, getMMXReg(eregLO3ofRM(modrm)) );
             delta += 1;
             DIP("cvtpi2ps %s,%s\n", nameMMXReg(eregLO3ofRM(modrm)),
@@ -13519,7 +13522,8 @@
          goto decode_success;
       }
       /* F2 0F 5D = MINSD -- min 64F0x2 from R/M to R */
-      if (haveF2no66noF3(pfx) && sz == 4) {
+      if (haveF2no66noF3(pfx)
+          && (sz == 4 || /* ignore redundant REX.W */ sz == 8)) {
          delta = dis_SSE_E_to_G_lo64( vbi, pfx, delta, "minsd", Iop_Min64F0x2 );
          goto decode_success;
       }
@@ -13565,7 +13569,8 @@
          goto decode_success;
       }
       /* F2 0F 5F = MAXSD -- max 64F0x2 from R/M to R */
-      if (haveF2no66noF3(pfx) && sz == 4) {
+      if (haveF2no66noF3(pfx)
+          && (sz == 4 || /* ignore redundant REX.W */ sz == 8)) {
          delta = dis_SSE_E_to_G_lo64( vbi, pfx, delta, "maxsd", Iop_Max64F0x2 );
          goto decode_success;
       }
@@ -18666,16 +18671,25 @@
       immediate byte.  Is it one we can actually handle? Throw out any
       cases for which the helper function has not been verified. */
    switch (imm) {
-      case 0x00: case 0x02: case 0x08: case 0x0A: case 0x0C: case 0x0E:
-      case 0x12: case 0x14: case 0x18: case 0x1A:
-      case 0x30: case 0x34: case 0x38: case 0x3A:
-      case 0x40: case 0x42: case 0x44: case 0x46: case 0x4A:
+      case 0x00: case 0x02:
+      case 0x08: case 0x0A: case 0x0C: case 0x0E:
+                 case 0x12: case 0x14:
+      case 0x18: case 0x1A:
+      case 0x30:            case 0x34:
+      case 0x38: case 0x3A:
+      case 0x40: case 0x42: case 0x44: case 0x46:
+                 case 0x4A:
+                 case 0x62:
+      case 0x70: case 0x72:
          break;
       // the 16-bit character versions of the above
-      case 0x01: case 0x03: case 0x09: case 0x0B: case 0x0D:
-      case 0x13:                       case 0x1B:
-                            case 0x39: case 0x3B:
-                            case 0x45:            case 0x4B:
+      case 0x01: case 0x03:
+      case 0x09: case 0x0B: case 0x0D:
+                 case 0x13:
+      case 0x19: case 0x1B:
+      case 0x39: case 0x3B:
+                            case 0x45:
+                 case 0x4B:
          break;
       default:
          return delta0; /*FAIL*/
@@ -30086,6 +30100,136 @@
    return delta;
 }
 
+/* operand format:
+ * [0] = dst
+ * [n] = srcn
+ */
+static Long decode_vregW(Int count, Long delta, UChar modrm, Prefix pfx,
+                         const VexAbiInfo* vbi, IRTemp *v, UInt *dst, Int swap)
+{
+   v[0] = newTemp(Ity_V128);
+   v[1] = newTemp(Ity_V128);
+   v[2] = newTemp(Ity_V128);
+   v[3] = newTemp(Ity_V128);
+   IRTemp addr = IRTemp_INVALID;
+   Int    alen = 0;
+   HChar  dis_buf[50];
+
+   *dst = gregOfRexRM(pfx, modrm);
+   assign( v[0], getXMMReg(*dst) );
+
+   if ( epartIsReg( modrm ) ) {
+      UInt ereg = eregOfRexRM(pfx, modrm);
+      assign(swap ? v[count-1] : v[count-2], getXMMReg(ereg) );
+      DIS(dis_buf, "%s", nameXMMReg(ereg));
+   } else {
+      Bool extra_byte = (getUChar(delta - 3) & 0xF) != 9;
+                 addr = disAMode(&alen, vbi, pfx, delta, dis_buf, extra_byte);
+      assign(swap ? v[count-1] : v[count-2], loadLE(Ity_V128, mkexpr(addr)));
+      delta += alen - 1;
+   }
+
+   UInt vvvv = getVexNvvvv(pfx);
+   switch(count) {
+      case 2:
+         DIP( "%s,%s", nameXMMReg(*dst), dis_buf );
+         break;
+      case 3:
+         assign( swap ? v[1] : v[2], getXMMReg(vvvv) );
+         DIP( "%s,%s,%s", nameXMMReg(*dst), nameXMMReg(vvvv), dis_buf );
+         break;
+      case 4:
+         {
+            assign( v[1], getXMMReg(vvvv) );
+            UInt src2 = getUChar(delta + 1) >> 4;
+            assign( swap ? v[2] : v[3], getXMMReg(src2) );
+            DIP( "%s,%s,%s,%s", nameXMMReg(*dst), nameXMMReg(vvvv),
+                                nameXMMReg(src2), dis_buf );
+         }
+         break;
+   }
+   return delta + 1;
+}
+
+static Long dis_FMA4 (Prefix pfx, Long delta, UChar opc,
+                      Bool* uses_vvvv, const VexAbiInfo* vbi )
+{
+   UInt dst;
+   *uses_vvvv = True;
+
+   UChar  modrm   = getUChar(delta);
+
+   Bool zero_64F = False;
+   Bool zero_96F = False;
+   UInt is_F32   = ((opc & 0x01) == 0x00) ? 1 : 0;
+   Bool neg      = (opc & 0xF0) == 0x70;
+   Bool alt      = (opc & 0xF0) == 0x50;
+   Bool sub      = alt ? (opc & 0x0E) != 0x0E : (opc & 0x0C) == 0x0C;
+
+   IRTemp operand[4];
+   switch(opc & 0xF) {
+      case 0x0A: zero_96F = (opc >> 4) != 0x05; break;
+      case 0x0B: zero_64F = (opc >> 4) != 0x05; break;
+      case 0x0E: zero_96F = (opc >> 4) != 0x05; break;
+      case 0x0F: zero_64F = (opc >> 4) != 0x05; break;
+      default: break;
+   }
+   DIP("vfm%s",                  neg ?   "n" : "");
+   if(alt) DIP("%s",             sub ? "add" : "sub");
+   DIP("%s",                     sub ? "sub" : "add");
+   DIP("%c ", (zero_64F || zero_96F) ?   's' : 'p');
+   DIP("%c ",                is_F32  ?   's' : 'd');
+   delta = decode_vregW(4, delta, modrm, pfx, vbi, operand, &dst, getRexW(pfx));
+   DIP("\n");
+   IRExpr *src[3];
+
+   void (*putXMM[2])(UInt,Int,IRExpr*) = {&putXMMRegLane64F, &putXMMRegLane32F};
+
+   IROp size_op[] = {Iop_V128to64, Iop_V128HIto64, Iop_64to32, Iop_64HIto32};
+   IROp neg_op[]  = {Iop_NegF64, Iop_NegF32};
+   int i, j;
+   for(i = 0; i < is_F32 * 2 + 2; i++) {
+      for(j = 0; j < 3; j++) {
+         if(is_F32) {
+            src[j] = unop(Iop_ReinterpI32asF32,
+                        unop(size_op[i%2+2],
+                           unop(size_op[i/2],
+                                 mkexpr(operand[j + 1])
+                              )
+                           ));
+         } else {
+            src[j] = unop(Iop_ReinterpI64asF64,
+                        unop(size_op[i%2],
+                           mkexpr(operand[j + 1])
+                        ));
+         }
+      }
+      putXMM[is_F32](dst, i, IRExpr_Qop(is_F32 ? Iop_MAddF32 : Iop_MAddF64,
+                                             get_FAKE_roundingmode(),
+                                             neg ? unop(neg_op[is_F32], src[0])
+                                                 : src[0],
+                                             src[1],
+                                             sub ? unop(neg_op[is_F32], src[2])
+                                                 : src[2]
+                                          ));
+      if(alt) {
+         sub = !sub;
+      }
+   }
+
+   /* Zero out top bits of ymm/xmm register. */
+   putYMMRegLane128( dst, 1, mkV128(0) );
+
+   if(zero_64F || zero_96F) {
+      putXMMRegLane64( dst, 1, IRExpr_Const(IRConst_U64(0)));
+   }
+
+   if(zero_96F) {
+      putXMMRegLane32( dst, 1, IRExpr_Const(IRConst_U32(0)));
+   }
+
+   return delta+1;
+}
 
 /*------------------------------------------------------------*/
 /*---                                                      ---*/
@@ -31630,6 +31774,16 @@
          /* else fall though; dis_PCMPxSTRx failed to decode it */
       }
       break;
+   case 0x5C ... 0x5F:
+   case 0x68 ... 0x6F:
+   case 0x78 ... 0x7F:
+      if (have66noF2noF3(pfx) && 0==getVexL(pfx)/*128*/) {
+         Long delta0 = delta;
+         delta = dis_FMA4( pfx, delta, opc, uses_vvvv, vbi );
+         if (delta > delta0) goto decode_success;
+         /* else fall though; dis_FMA4 failed to decode it */
+      }
+      break;
 
    case 0xDF:
       /* VAESKEYGENASSIST imm8, xmm2/m128, xmm1 = VEX.128.66.0F3A.WIG DF /r */
@@ -32075,7 +32229,7 @@
    /* All decode failures end up here. */
    if (sigill_diag) {
       vex_printf("vex amd64->IR: unhandled instruction bytes: "
-                 "0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n",
+                 "0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n",
                  getUChar(delta_start+0),
                  getUChar(delta_start+1),
                  getUChar(delta_start+2),
@@ -32083,7 +32237,9 @@
                  getUChar(delta_start+4),
                  getUChar(delta_start+5),
                  getUChar(delta_start+6),
-                 getUChar(delta_start+7) );
+                 getUChar(delta_start+7),
+                 getUChar(delta_start+8),
+                 getUChar(delta_start+9) );
       vex_printf("vex amd64->IR:   REX=%d REX.W=%d REX.R=%d REX.X=%d REX.B=%d\n",
                  haveREX(pfx) ? 1 : 0, getRexW(pfx), getRexR(pfx),
                  getRexX(pfx), getRexB(pfx));
diff --git a/VEX/priv/guest_arm64_defs.h b/VEX/priv/guest_arm64_defs.h
index 0d8b83b..ad1ab96 100644
--- a/VEX/priv/guest_arm64_defs.h
+++ b/VEX/priv/guest_arm64_defs.h
@@ -110,11 +110,64 @@
 //ZZ UInt armg_calculate_flag_qc ( UInt resL1, UInt resL2,
 //ZZ                               UInt resR1, UInt resR2 );
 
+extern ULong arm64g_calc_crc32b ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32h ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32w ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32x ( ULong acc, ULong bits );
+
+extern ULong arm64g_calc_crc32cb ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32ch ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32cw ( ULong acc, ULong bits );
+extern ULong arm64g_calc_crc32cx ( ULong acc, ULong bits );
 
 /* --- DIRTY HELPERS --- */
 
 extern ULong arm64g_dirtyhelper_MRS_CNTVCT_EL0 ( void );
 
+extern void  arm64g_dirtyhelper_PMULLQ ( /*OUT*/V128* res,
+                                         ULong arg1, ULong arg2 );
+
+extern void  arm64g_dirtyhelper_AESE ( /*OUT*/V128* res,
+                                       ULong argHi, ULong argLo );
+extern void  arm64g_dirtyhelper_AESD ( /*OUT*/V128* res,
+                                       ULong argHi, ULong argLo );
+extern void  arm64g_dirtyhelper_AESMC  ( /*OUT*/V128* res,
+                                         ULong argHi, ULong argLo );
+extern void  arm64g_dirtyhelper_AESIMC ( /*OUT*/V128* res,
+                                         ULong argHi, ULong argLo );
+
+extern
+void arm64g_dirtyhelper_SHA1C ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA1H ( /*OUT*/V128* res,
+                                ULong nHi, ULong nLo );
+extern
+void arm64g_dirtyhelper_SHA1M ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA1P ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA1SU0 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA1SU1 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo );
+extern
+void arm64g_dirtyhelper_SHA256H2 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                   ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA256H ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo, ULong mHi, ULong mLo );
+extern
+void arm64g_dirtyhelper_SHA256SU0 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                    ULong nHi, ULong nLo );
+extern
+void arm64g_dirtyhelper_SHA256SU1 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                    ULong nHi, ULong nLo,
+                                    ULong mHi, ULong mLo );
+
 
 /*---------------------------------------------------------*/
 /*--- Condition code stuff                              ---*/
diff --git a/VEX/priv/guest_arm64_helpers.c b/VEX/priv/guest_arm64_helpers.c
index 0cf8046..5edeef1 100644
--- a/VEX/priv/guest_arm64_helpers.c
+++ b/VEX/priv/guest_arm64_helpers.c
@@ -677,6 +677,88 @@
 }
 
 
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32b ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFULL) ^ acc;
+   for (i = 0; i < 8; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32h ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFFFULL) ^ acc;
+   for (i = 0; i < 16; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32w ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFFFFFFFULL) ^ acc;
+   for (i = 0; i < 32; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32x ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = bits ^ acc;
+   for (i = 0; i < 64; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320ULL : 0);
+   return crc;
+
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32cb ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFULL) ^ acc;
+   for (i = 0; i < 8; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32ch ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFFFULL) ^ acc;
+   for (i = 0; i < 16; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32cw ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = (bits & 0xFFFFFFFFULL) ^ acc;
+   for (i = 0; i < 32; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78ULL : 0);
+   return crc;
+}
+
+/* CALLED FROM GENERATED CODE: CLEAN HELPER */
+ULong arm64g_calc_crc32cx ( ULong acc, ULong bits )
+{
+   UInt  i;
+   ULong crc = bits ^ acc;
+   for (i = 0; i < 64; i++)
+      crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78ULL : 0);
+   return crc;
+}
+
+
 /* CALLED FROM GENERATED CODE */
 /* DIRTY HELPER (non-referentially-transparent) */
 /* Horrible hack.  On non-arm64 platforms, return 0. */
@@ -692,6 +774,509 @@
 }
 
 
+void arm64g_dirtyhelper_PMULLQ ( /*OUT*/V128* res, ULong arg1, ULong arg2 )
+{
+   /* This doesn't need to be a dirty helper, except for the fact that
+      a clean helper can't return a 128 bit value.  This is a pretty
+      lame implementation of PMULLQ, but at least it doesn't contain any
+      data dependent branches, and has lots of ILP.  I guess we could unroll
+      the loop completely and offer extensive prayers to the gods of ILP
+      if more performance is needed. */
+   UInt i;
+   ULong accHi = 0, accLo = 0;
+   ULong op2Hi = 0, op2Lo = arg2;
+   for (i = 0; i < 64; i++) {
+      /* Make |mask| be all 0s or all 1s, a copy of arg1[i] */
+      Long mask = arg1 << (63-i);
+      mask >>= 63;
+      accHi ^= (op2Hi & mask);
+      accLo ^= (op2Lo & mask);
+      /* do: op2Hi:op2Lo <<=u 1 */
+      op2Hi <<= 1;
+      op2Hi |= ((op2Lo >> 63) & 1);
+      op2Lo <<= 1;
+   }
+   res->w64[1] = accHi;
+   res->w64[0] = accLo;
+}
+
+
+/*---------------------------------------------------------------*/
+/*--- Crypto instruction helpers                              ---*/
+/*---------------------------------------------------------------*/
+
+/* DIRTY HELPERS for doing AES support:
+   * AESE (SubBytes, then ShiftRows)
+   * AESD (InvShiftRows, then InvSubBytes)
+   * AESMC (MixColumns)
+   * AESIMC (InvMixColumns)
+   These don't actually have to be dirty helpers -- they could be
+   clean, but for the fact that they return a V128 and a clean helper
+   can't do that.
+
+   The ARMv8 manual seems to imply that AESE first performs ShiftRows,
+   then SubBytes.  This seems to contradict FIPS 197, so the
+   implementation below is consistent with FIPS 197.  One can observe
+   that the two transformations commute -- the order in which they
+   happen makes no difference to the result.  So the ambiguity doesn't
+   actually matter, but it is confusing.  The v8 manual looks correct
+   about AESD, though.
+
+   The three functions rj_xtime, aesMixColumn and aesInvMixColumn only,
+   are taken from "A byte-oriented AES-256 implementation" and are subject
+   to the following usage terms:
+
+     Byte-oriented AES-256 implementation.
+     All lookup tables replaced with 'on the fly' calculations.
+
+     Copyright (c) 2007-2011 Ilya O. Levin, http://www.literatecode.com
+     Other contributors: Hal Finney
+
+     Permission to use, copy, modify, and distribute this software for any
+     purpose with or without fee is hereby granted, provided that the above
+     copyright notice and this permission notice appear in all copies.
+
+     THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+     WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+     MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+     ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+     WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+     ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+     OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+*/
+
+const UChar aesMapSubBytes[256]
+   = { 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+       0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+       0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+       0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+       0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+       0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+       0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+       0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+       0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+       0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+       0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+       0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+       0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+       0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+       0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+       0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+       0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+       0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+       0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+       0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+       0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+       0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+       0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+       0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+       0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+       0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+       0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+       0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+       0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+       0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+       0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+       0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
+     };
+
+const UChar aesMapInvSubBytes[256]
+   = { 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38,
+       0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+       0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87,
+       0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+       0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d,
+       0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+       0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2,
+       0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+       0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16,
+       0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+       0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda,
+       0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+       0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a,
+       0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+       0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02,
+       0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+       0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea,
+       0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+       0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85,
+       0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+       0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89,
+       0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+       0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20,
+       0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+       0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31,
+       0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+       0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d,
+       0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+       0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0,
+       0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+       0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26,
+       0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
+     };
+
+static inline UChar rj_xtime ( UChar x )
+{
+   UChar y = (UChar)(x << 1);
+   return (x & 0x80) ? (y ^ 0x1b) : y;
+}
+
+static void aesMixColumn ( /*MOD*/UChar* r )
+{
+   UChar a = r[0];
+   UChar b = r[1];
+   UChar c = r[2];
+   UChar d = r[3];
+   UChar e = a ^ b ^ c ^ d;
+   r[0] ^= e ^ rj_xtime(a ^ b);
+   r[1] ^= e ^ rj_xtime(b ^ c);
+   r[2] ^= e ^ rj_xtime(c ^ d);
+   r[3] ^= e ^ rj_xtime(d ^ a);
+}
+
+static void aesInvMixColumn ( /*MOD*/UChar* r )
+{
+   UChar a = r[0];
+   UChar b = r[1];
+   UChar c = r[2];
+   UChar d = r[3];
+   UChar e = a ^ b ^ c ^ d;
+   UChar z = rj_xtime(e);
+   UChar x = e ^ rj_xtime(rj_xtime(z ^ a ^ c));
+   UChar y = e ^ rj_xtime(rj_xtime(z ^ b ^ d));
+   r[0] ^= x ^ rj_xtime(a ^ b);
+   r[1] ^= y ^ rj_xtime(b ^ c);
+   r[2] ^= x ^ rj_xtime(c ^ d);
+   r[3] ^= y ^ rj_xtime(d ^ a);
+}
+
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_AESE ( /*OUT*/V128* res, ULong argHi, ULong argLo )
+{
+   res->w64[1] = argHi;
+   res->w64[0] = argLo;
+
+   /* First do SubBytes on the State. */
+   UInt i;
+   for (i = 0; i < 16; i++) {
+      res->w8[i] = aesMapSubBytes[res->w8[i] & 0xFF];
+   }
+
+   /* Then do ShiftRows on the State. */
+#  define XX(_ix) res->w8[_ix]
+   { UChar old1 = XX(1);
+     XX(1) = XX(5); XX(5) = XX(9); XX(9) = XX(13); XX(13) = old1;
+   }
+   { UChar old2 = XX(2); UChar old6 = XX(6);
+     XX(2) = XX(10); XX(6) = XX(14); XX(10) = old2; XX(14) = old6;
+   }
+   { UChar old15 = XX(15);
+     XX(15) = XX(11); XX(11) = XX(7); XX(7) = XX(3); XX(3) = old15;
+   }
+#  undef XX
+}
+
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_AESD ( /*OUT*/V128* res, ULong argHi, ULong argLo )
+{
+   res->w64[1] = argHi;
+   res->w64[0] = argLo;
+
+   /* First do InvShiftRows on the State. */
+#  define XX(_ix) res->w8[_ix]
+   { UChar old13 = XX(13);
+     XX(13) = XX(9); XX(9) = XX(5); XX(5) = XX(1); XX(1) = old13;
+   }
+   { UChar old14 = XX(14); UChar old10 = XX(10);
+     XX(14) = XX(6); XX(10) = XX(2); XX(6) = old14; XX(2) = old10;
+   }
+   { UChar old3 = XX(3);
+     XX(3) = XX(7); XX(7) = XX(11); XX(11) = XX(15); XX(15) = old3;
+   }
+#  undef XX
+
+/* Then do InvSubBytes on the State. */
+   UInt i;
+   for (i = 0; i < 16; i++) {
+      res->w8[i] = aesMapInvSubBytes[res->w8[i] & 0xFF];
+   }
+}
+
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_AESMC ( /*OUT*/V128* res, ULong argHi, ULong argLo )
+{
+   res->w64[1] = argHi;
+   res->w64[0] = argLo;
+   aesMixColumn(&res->w8[0]);
+   aesMixColumn(&res->w8[4]);
+   aesMixColumn(&res->w8[8]);
+   aesMixColumn(&res->w8[12]);
+}
+
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_AESIMC ( /*OUT*/V128* res, ULong argHi, ULong argLo )
+{
+   res->w64[1] = argHi;
+   res->w64[0] = argLo;
+   aesInvMixColumn(&res->w8[0]);
+   aesInvMixColumn(&res->w8[4]);
+   aesInvMixColumn(&res->w8[8]);
+   aesInvMixColumn(&res->w8[12]);
+}
+
+
+/* DIRTY HELPERS for SHA instruction support.  As with the AES helpers
+   above, these are actually pure functions and are only dirty because
+   clean helpers can't return a V128. */
+
+static inline UInt ROL32 ( UInt x, UInt sh ) {
+   vassert(sh > 0 && sh < 32);
+   return (x << sh) | (x >> (32 - sh));
+}
+
+static inline UInt ROR32 ( UInt x, UInt sh ) {
+   vassert(sh > 0 && sh < 32);
+   return (x >> sh) | (x << (32 - sh));
+}
+
+static inline UInt SHAchoose ( UInt x, UInt y, UInt z ) {
+   return ((y ^ z) & x) ^ z;
+}
+
+static inline UInt SHAmajority ( UInt x, UInt y, UInt z ) {
+   return (x & y) | ((x | y) & z);
+}
+
+static inline UInt SHAparity ( UInt x, UInt y, UInt z ) {
+   return x ^ y ^ z;
+}
+
+static inline UInt SHAhashSIGMA0 ( UInt x ) {
+   return ROR32(x, 2) ^ ROR32(x, 13) ^ ROR32(x, 22);
+}
+
+static inline UInt SHAhashSIGMA1 ( UInt x ) {
+   return ROR32(x, 6) ^ ROR32(x, 11) ^ ROR32(x, 25);
+}
+
+static void SHA256hash ( /*MOD*/V128* X, /*MOD*/V128* Y, const V128* W )
+{
+   UInt e;
+   for (e = 0; e <= 3; e++) {
+      UInt chs = SHAchoose(Y->w32[0], Y->w32[1], Y->w32[2]);
+      UInt maj = SHAmajority(X->w32[0], X->w32[1], X->w32[2]);
+      UInt t   = Y->w32[3] + SHAhashSIGMA1(Y->w32[0]) + chs + W->w32[e];
+      X->w32[3] = t + X->w32[3];
+      Y->w32[3] = t + SHAhashSIGMA0(X->w32[0]) + maj;
+      UInt ts = Y->w32[3];
+      Y->w32[3] = Y->w32[2];
+      Y->w32[2] = Y->w32[1];
+      Y->w32[1] = Y->w32[0];
+      Y->w32[0] = X->w32[3];
+      X->w32[3] = X->w32[2];
+      X->w32[2] = X->w32[1];
+      X->w32[1] = X->w32[0];
+      X->w32[0] = ts;
+   }
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1C ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   vassert(nHi == 0);
+   vassert((nLo >> 32) == 0);
+   V128 X; X.w64[1] = dHi; X.w64[0] = dLo;
+   UInt Y; Y = (UInt)nLo;
+   V128 W; W.w64[1] = mHi; W.w64[0] = mLo;
+   UInt e;
+   for (e = 0; e <= 3; e++) {
+      UInt t = SHAchoose(X.w32[1], X.w32[2], X.w32[3]);
+      Y = Y + ROL32(X.w32[0], 5) + t + W.w32[e];
+      X.w32[1] = ROL32(X.w32[1], 30);
+      UInt oldY = Y;
+      Y = X.w32[3];
+      X.w32[3] = X.w32[2];
+      X.w32[2] = X.w32[1];
+      X.w32[1] = X.w32[0];
+      X.w32[0] = oldY;
+   }
+   res->w64[1] = X.w64[1];
+   res->w64[0] = X.w64[0];
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1H ( /*OUT*/V128* res, ULong nHi, ULong nLo )
+{
+   vassert(nHi == 0);
+   vassert((nLo >> 32) == 0);
+   res->w32[3] = res->w32[2] = res->w32[1] = 0;
+   res->w32[0] = ROL32((UInt)nLo, 30);
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1M ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   vassert(nHi == 0);
+   vassert((nLo >> 32) == 0);
+   V128 X; X.w64[1] = dHi; X.w64[0] = dLo;
+   UInt Y; Y = (UInt)nLo;
+   V128 W; W.w64[1] = mHi; W.w64[0] = mLo;
+   UInt e;
+   for (e = 0; e <= 3; e++) {
+      UInt t = SHAmajority(X.w32[1], X.w32[2], X.w32[3]);
+      Y = Y + ROL32(X.w32[0], 5) + t + W.w32[e];
+      X.w32[1] = ROL32(X.w32[1], 30);
+      UInt oldY = Y;
+      Y = X.w32[3];
+      X.w32[3] = X.w32[2];
+      X.w32[2] = X.w32[1];
+      X.w32[1] = X.w32[0];
+      X.w32[0] = oldY;
+   }
+   res->w64[1] = X.w64[1];
+   res->w64[0] = X.w64[0];
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1P ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   vassert(nHi == 0);
+   vassert((nLo >> 32) == 0);
+   V128 X; X.w64[1] = dHi; X.w64[0] = dLo;
+   UInt Y; Y = (UInt)nLo;
+   V128 W; W.w64[1] = mHi; W.w64[0] = mLo;
+   UInt e;
+   for (e = 0; e <= 3; e++) {
+      UInt t = SHAparity(X.w32[1], X.w32[2], X.w32[3]);
+      Y = Y + ROL32(X.w32[0], 5) + t + W.w32[e];
+      X.w32[1] = ROL32(X.w32[1], 30);
+      UInt oldY = Y;
+      Y = X.w32[3];
+      X.w32[3] = X.w32[2];
+      X.w32[2] = X.w32[1];
+      X.w32[1] = X.w32[0];
+      X.w32[0] = oldY;
+   }
+   res->w64[1] = X.w64[1];
+   res->w64[0] = X.w64[0];
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1SU0 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   res->w64[1] = nLo;
+   res->w64[0] = dHi;
+   res->w64[1] ^= dHi ^ mHi;
+   res->w64[0] ^= dLo ^ mLo;
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA1SU1 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo )
+{
+   /* This computes "T = Vd ^ (Vn >>u 32)" */
+   V128 T; T.w64[1] = nHi; T.w64[0] = nLo;
+   T.w32[0] = T.w32[1];
+   T.w32[1] = T.w32[2];
+   T.w32[2] = T.w32[3];
+   T.w32[3] = 0;
+   T.w64[1] ^= dHi;
+   T.w64[0] ^= dLo;
+   /* */
+   res->w32[0] = ROL32(T.w32[0], 1);
+   res->w32[1] = ROL32(T.w32[1], 1);
+   res->w32[2] = ROL32(T.w32[2], 1);
+   res->w32[3] = ROL32(T.w32[3], 1) ^ ROL32(T.w32[0], 2);
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA256H2 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                   ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   V128 X; X.w64[1] = nHi; X.w64[0] = nLo;
+   V128 Y; Y.w64[1] = dHi; Y.w64[0] = dLo;
+   V128 W; W.w64[1] = mHi; W.w64[0] = mLo;
+   SHA256hash(&X, &Y, &W);
+   res->w64[1] = Y.w64[1];
+   res->w64[0] = Y.w64[0];
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA256H ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                  ULong nHi, ULong nLo, ULong mHi, ULong mLo )
+{
+   V128 X; X.w64[1] = dHi; X.w64[0] = dLo;
+   V128 Y; Y.w64[1] = nHi; Y.w64[0] = nLo;
+   V128 W; W.w64[1] = mHi; W.w64[0] = mLo;
+   SHA256hash(&X, &Y, &W);
+   res->w64[1] = X.w64[1];
+   res->w64[0] = X.w64[0];
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA256SU0 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                    ULong nHi, ULong nLo )
+
+{
+   res->w64[1] = res->w64[0] = 0;
+   V128 op1; op1.w64[1] = dHi; op1.w64[0] = dLo;
+   V128 op2; op2.w64[1] = nHi; op2.w64[0] = nLo;
+   V128 T;
+   T.w32[3] = op2.w32[0];
+   T.w32[2] = op1.w32[3];
+   T.w32[1] = op1.w32[2];
+   T.w32[0] = op1.w32[1];
+   UInt e;
+   for (e = 0; e <= 3; e++) {
+      UInt elt = T.w32[e];
+      elt = ROR32(elt, 7) ^ ROR32(elt, 18) ^ (elt >> 3);
+      res->w32[e] = elt + op1.w32[e];
+   }
+}
+
+/* CALLED FROM GENERATED CODE */
+void arm64g_dirtyhelper_SHA256SU1 ( /*OUT*/V128* res, ULong dHi, ULong dLo,
+                                    ULong nHi, ULong nLo,
+                                    ULong mHi, ULong mLo )
+{
+   res->w64[0] = res->w64[1] = 0;
+   V128 op1; op1.w64[1] = dHi; op1.w64[0] = dLo;
+   V128 op2; op2.w64[1] = nHi; op2.w64[0] = nLo;
+   V128 op3; op3.w64[1] = mHi; op3.w64[0] = mLo;
+   V128 T0;
+   T0.w32[3] = op3.w32[0];
+   T0.w32[2] = op2.w32[3];
+   T0.w32[1] = op2.w32[2];
+   T0.w32[0] = op2.w32[1];
+   UInt T1[2];
+   UInt e;
+   T1[1] = op3.w32[3];
+   T1[0] = op3.w32[2];
+   for (e = 0; e <= 1; e++) {
+      UInt elt = T1[e];
+      elt = ROR32(elt, 17) ^ ROR32(elt, 19) ^ (elt >> 10);
+      elt = elt + op1.w32[e] + T0.w32[e];
+      res->w32[e] = elt;
+   }
+   T1[1] = res->w32[1];
+   T1[0] = res->w32[0];
+   for (e = 2; e <= 3; e++) {
+      UInt elt = T1[e-2];
+      elt = ROR32(elt, 17) ^ ROR32(elt, 19) ^ (elt >> 10);
+      elt = elt + op1.w32[e] + T0.w32[e];
+      res->w32[e] = elt;
+   }
+}
+
+
 /*---------------------------------------------------------------*/
 /*--- Flag-helpers translation-time function specialisers.    ---*/
 /*--- These help iropt specialise calls the above run-time    ---*/
diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c
index 8da9780..e527447 100644
--- a/VEX/priv/guest_arm64_toIR.c
+++ b/VEX/priv/guest_arm64_toIR.c
@@ -148,8 +148,9 @@
 static ULong sx_to_64 ( ULong x, UInt n )
 {
    vassert(n > 1 && n < 64);
+   x <<= (64-n);
    Long r = (Long)x;
-   r = (r << (64-n)) >> (64-n);
+   r >>= (64-n);
    return (ULong)r;
 }
 
@@ -2590,7 +2591,7 @@
                   IRTemp old = newTemp(Ity_I32);
                   assign(old, getIReg32orZR(dd));
                   vassert(hw <= 1);
-                  UInt mask = 0xFFFF << (16 * hw);
+                  UInt mask = ((UInt)0xFFFF) << (16 * hw);
                   IRExpr* res
                      = binop(Iop_Or32, 
                              binop(Iop_And32, mkexpr(old), mkU32(~mask)),
@@ -3497,6 +3498,75 @@
           nameIReg32orZR(mm), nameIReg64orZR(aa));
       return True;
    }
+
+   /* -------------------- CRC32/CRC32C -------------------- */   
+   /* 31 30           20 15   11 9 4
+      sf 00 1101 0110 m  0100 sz n d   CRC32<sz>  Wd, Wn, Wm|Xm
+      sf 00 1101 0110 m  0101 sz n d   CRC32C<sz> Wd, Wn, Wm|Xm
+   */
+   if (INSN(30,21) == BITS10(0,0,1,1,0,1,0,1,1,0)
+       && INSN(15,13) == BITS3(0,1,0)) {
+      UInt bitSF = INSN(31,31);
+      UInt mm    = INSN(20,16);
+      UInt bitC  = INSN(12,12);
+      UInt sz    = INSN(11,10);
+      UInt nn    = INSN(9,5);
+      UInt dd    = INSN(4,0);
+      vassert(sz >= 0 && sz <= 3);
+      if ((bitSF == 0 && sz <= BITS2(1,0))
+          || (bitSF == 1 && sz == BITS2(1,1))) {
+         UInt ix = (bitC == 1 ? 4 : 0) | sz;
+         void* helpers[8]
+            = { &arm64g_calc_crc32b,   &arm64g_calc_crc32h,
+                &arm64g_calc_crc32w,   &arm64g_calc_crc32x,
+                &arm64g_calc_crc32cb,  &arm64g_calc_crc32ch,
+                &arm64g_calc_crc32cw,  &arm64g_calc_crc32cx };
+         const HChar* hNames[8]
+            = { "arm64g_calc_crc32b",  "arm64g_calc_crc32h",
+                "arm64g_calc_crc32w",  "arm64g_calc_crc32x",
+                "arm64g_calc_crc32cb", "arm64g_calc_crc32ch",
+                "arm64g_calc_crc32cw", "arm64g_calc_crc32cx" };
+         const HChar* iNames[8]
+            = { "crc32b",  "crc32h",  "crc32w",  "crc32x",
+                "crc32cb", "crc32ch", "crc32cw", "crc32cx" };
+
+         IRTemp srcN = newTemp(Ity_I64);
+         assign(srcN, unop(Iop_32Uto64, unop(Iop_64to32, getIReg64orZR(nn))));
+
+         IRTemp  srcM = newTemp(Ity_I64);
+         IRExpr* at64 = getIReg64orZR(mm);
+         switch (sz) {
+            case BITS2(0,0):
+               assign(srcM, binop(Iop_And64, at64, mkU64(0xFF))); break;
+            case BITS2(0,1):
+               assign(srcM, binop(Iop_And64, at64, mkU64(0xFFFF))); break;
+            case BITS2(1,0):
+               assign(srcM, binop(Iop_And64, at64, mkU64(0xFFFFFFFF))); break;
+            case BITS2(1,1):
+               assign(srcM, at64); break;
+            default:
+               vassert(0);
+         }
+
+         vassert(ix >= 0 && ix <= 7);
+
+         putIReg64orZR(
+            dd,
+            unop(Iop_32Uto64,
+                 unop(Iop_64to32,
+                      mkIRExprCCall(Ity_I64, 0/*regparm*/,
+                                    hNames[ix], helpers[ix],
+                                    mkIRExprVec_2(mkexpr(srcN),
+                                                  mkexpr(srcM))))));
+
+         DIP("%s %s, %s, %s\n", iNames[ix],
+             nameIReg32orZR(dd),
+             nameIReg32orZR(nn), nameIRegOrZR(bitSF == 1, mm));
+         return True;
+      }
+      /* fall through */
+   }
+
    vex_printf("ARM64 front end: data_processing_register\n");
    return False;
 #  undef INSN
@@ -4804,7 +4874,6 @@
       (at-EA)
       x0 101 0010 L imm7 Rt2 Rn Rt1  mmP Rt1,Rt2, [Xn|SP, #imm]
    */
-
    UInt insn_30_23 = INSN(30,23);
    if (insn_30_23 == BITS8(0,1,0,1,0,0,0,1) 
        || insn_30_23 == BITS8(0,1,0,1,0,0,1,1)
@@ -4912,6 +4981,87 @@
       }
    }
 
+   /* -------- LDPSW (immediate, simm7) (INT REGS) -------- */
+   /* Does 32 bit transfers which are sign extended to 64 bits.
+      simm7 is scaled by the (single-register) transfer size
+
+      (at-Rn-then-Rn=EA)
+      01 101 0001 1 imm7 Rt2 Rn Rt1  LDPSW Rt1,Rt2, [Xn|SP], #imm
+   
+      (at-EA-then-Rn=EA)
+      01 101 0011 1 imm7 Rt2 Rn Rt1  LDPSW Rt1,Rt2, [Xn|SP, #imm]!
+
+      (at-EA)
+      01 101 0010 1 imm7 Rt2 Rn Rt1  LDPSW Rt1,Rt2, [Xn|SP, #imm]
+   */
+   UInt insn_31_22 = INSN(31,22);
+   if (insn_31_22 == BITS10(0,1,1,0,1,0,0,0,1,1)
+       || insn_31_22 == BITS10(0,1,1,0,1,0,0,1,1,1)
+       || insn_31_22 == BITS10(0,1,1,0,1,0,0,1,0,1)) {
+      UInt bWBack = INSN(23,23);
+      UInt rT1    = INSN(4,0);
+      UInt rN     = INSN(9,5);
+      UInt rT2    = INSN(14,10);
+      Long simm7  = (Long)sx_to_64(INSN(21,15), 7);
+      if ((bWBack && (rT1 == rN || rT2 == rN) && rN != 31)
+          || (rT1 == rT2)) {
+         /* undecodable; fall through */
+      } else {
+         if (rN == 31) { /* FIXME generate stack alignment check */ }
+
+         // Compute the transfer address TA and the writeback address WA.
+         IRTemp tRN = newTemp(Ity_I64);
+         assign(tRN, getIReg64orSP(rN));
+         IRTemp tEA = newTemp(Ity_I64);
+         simm7 = 4 * simm7;
+         assign(tEA, binop(Iop_Add64, mkexpr(tRN), mkU64(simm7)));
+
+         IRTemp tTA = newTemp(Ity_I64);
+         IRTemp tWA = newTemp(Ity_I64);
+         switch (INSN(24,23)) {
+            case BITS2(0,1):
+               assign(tTA, mkexpr(tRN)); assign(tWA, mkexpr(tEA)); break;
+            case BITS2(1,1):
+               assign(tTA, mkexpr(tEA)); assign(tWA, mkexpr(tEA)); break;
+            case BITS2(1,0):
+               assign(tTA, mkexpr(tEA)); /* tWA is unused */ break;
+            default:
+               vassert(0); /* NOTREACHED */
+         }
+
+         // 32 bit load, sign extended to 64 bits
+         putIReg64orZR(rT1, unop(Iop_32Sto64,
+                                 loadLE(Ity_I32, binop(Iop_Add64,
+                                                       mkexpr(tTA),
+                                                       mkU64(0)))));
+         putIReg64orZR(rT2, unop(Iop_32Sto64,
+                                 loadLE(Ity_I32, binop(Iop_Add64,
+                                                       mkexpr(tTA),
+                                                       mkU64(4)))));
+         if (bWBack)
+            putIReg64orSP(rN, mkexpr(tEA));
+
+         const HChar* fmt_str = NULL;
+         switch (INSN(24,23)) {
+            case BITS2(0,1):
+               fmt_str = "ldpsw %s, %s, [%s], #%lld (at-Rn-then-Rn=EA)\n";
+               break;
+            case BITS2(1,1):
+               fmt_str = "ldpsw %s, %s, [%s, #%lld]! (at-EA-then-Rn=EA)\n";
+               break;
+            case BITS2(1,0):
+               fmt_str = "ldpsw %s, %s, [%s, #%lld] (at-Rn)\n";
+               break;
+            default:
+               vassert(0);
+         }
+         DIP(fmt_str, nameIReg64orZR(rT1),
+                      nameIReg64orZR(rT2),
+                      nameIReg64orSP(rN), simm7);
+         return True;
+      }
+   }
+
    /* ---------------- LDR (literal, int reg) ---------------- */
    /* 31 29      23    4
       00 011 000 imm19 Rt   LDR   Wt, [PC + sxTo64(imm19 << 2)]
@@ -6872,7 +7022,21 @@
       return True;
    }
 
-  //fail:
+   /* ------------------- CLREX ------------------ */
+   /* 31        23        15   11 7
+      1101 0101 0000 0011 0011 m  0101 1111  CLREX CRm
+      CRm is apparently ignored.
+   */
+   if ((INSN(31,0) & 0xFFFFF0FF) == 0xD503305F) {
+      UInt mm = INSN(11,8);
+      /* AFAICS, this simply cancels a (all?) reservations made by a
+         (any?) preceding LDREX(es).  Arrange to hand it through to
+         the back end. */
+      stmt( IRStmt_MBE(Imbe_CancelReservation) );
+      DIP("clrex #%u\n", mm);
+      return True;
+   }
+
    vex_printf("ARM64 front end: branch_etc\n");
    return False;
 #  undef INSN
@@ -10863,7 +11027,7 @@
       /* -------- 1,0000 UADDL{2} -------- */
       /* -------- 0,0010 SSUBL{2} -------- */
       /* -------- 1,0010 USUBL{2} -------- */
-      /* Widens, and size refers to the narrowed lanes. */
+      /* Widens, and size refers to the narrow lanes. */
       if (size == X11) return False;
       vassert(size <= 2);
       Bool   isU   = bitU == 1;
@@ -10889,7 +11053,7 @@
       /* -------- 1,0001 UADDW{2} -------- */
       /* -------- 0,0011 SSUBW{2} -------- */
       /* -------- 1,0011 USUBW{2} -------- */
-      /* Widens, and size refers to the narrowed lanes. */
+      /* Widens, and size refers to the narrow lanes. */
       if (size == X11) return False;
       vassert(size <= 2);
       Bool   isU   = bitU == 1;
@@ -10953,7 +11117,7 @@
       /* -------- 1,0101 UABAL{2} -------- */
       /* -------- 0,0111 SABDL{2} -------- */
       /* -------- 1,0111 UABDL{2} -------- */
-      /* Widens, and size refers to the narrowed lanes. */
+      /* Widens, and size refers to the narrow lanes. */
       if (size == X11) return False;
       vassert(size <= 2);
       Bool   isU   = bitU == 1;
@@ -10983,7 +11147,7 @@
       /* -------- 1,1000  UMLAL{2} -------- */ // 1
       /* -------- 0,1010  SMLSL{2} -------- */ // 2
       /* -------- 1,1010  UMLSL{2} -------- */ // 2
-      /* Widens, and size refers to the narrowed lanes. */
+      /* Widens, and size refers to the narrow lanes. */
       UInt ks = 3;
       switch (opcode) {
          case BITS4(1,1,0,0): ks = 0; break;
@@ -11020,7 +11184,7 @@
       /* -------- 0,1101  SQDMULL{2} -------- */ // 0 (ks)
       /* -------- 0,1001  SQDMLAL{2} -------- */ // 1
       /* -------- 0,1011  SQDMLSL{2} -------- */ // 2
-      /* Widens, and size refers to the narrowed lanes. */
+      /* Widens, and size refers to the narrow lanes. */
       UInt ks = 3;
       switch (opcode) {
          case BITS4(1,1,0,1): ks = 0; break;
@@ -11058,17 +11222,42 @@
 
    if (bitU == 0 && opcode == BITS4(1,1,1,0)) {
       /* -------- 0,1110  PMULL{2} -------- */
-      /* Widens, and size refers to the narrowed lanes. */
-      if (size != X00) return False;
-      IRTemp res
-         = math_BINARY_WIDENING_V128(is2, Iop_PolynomialMull8x8,
-                                     getQReg128(nn), getQReg128(mm));
-      putQReg128(dd, mkexpr(res));
-      const HChar* arrNarrow = nameArr_Q_SZ(bitQ, size);
-      const HChar* arrWide   = nameArr_Q_SZ(1,    size+1);
+      /* Widens, and size refers to the narrow lanes. */
+      if (size != X00 && size != X11) return False;
+      IRTemp  res  = IRTemp_INVALID;
+      IRExpr* srcN = getQReg128(nn);
+      IRExpr* srcM = getQReg128(mm);
+      const HChar* arrNarrow = NULL;
+      const HChar* arrWide   = NULL;
+      if (size == X00) {
+         res = math_BINARY_WIDENING_V128(is2, Iop_PolynomialMull8x8,
+                                         srcN, srcM);
+         arrNarrow = nameArr_Q_SZ(bitQ, size);
+         arrWide   = nameArr_Q_SZ(1,    size+1);
+      } else {
+         /* The same thing as the X00 case, except we have to call
+            a helper to do it. */
+         vassert(size == X11);
+         res = newTemp(Ity_V128);
+         IROp slice
+            = is2 ? Iop_V128HIto64 : Iop_V128to64;
+         IRExpr** args
+            = mkIRExprVec_3( IRExpr_VECRET(),
+                             unop(slice, srcN), unop(slice, srcM));
+         IRDirty* di
+            = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                      "arm64g_dirtyhelper_PMULLQ",
+                                      &arm64g_dirtyhelper_PMULLQ, args);
+         stmt(IRStmt_Dirty(di));
+         /* We can't use nameArr_Q_SZ for this because it can't deal with
+            Q-sized (128 bit) results.  Hence do it by hand. */
+         arrNarrow = bitQ == 0 ? "1d" : "2d";
+         arrWide   = "1q";
+      }
+      putQReg128(dd, mkexpr(res));    
       DIP("%s%s %s.%s, %s.%s, %s.%s\n", "pmull", is2 ? "2" : "",
-          nameQReg128(dd), arrNarrow,
-          nameQReg128(nn), arrWide, nameQReg128(mm), arrWide);
+          nameQReg128(dd), arrWide,
+          nameQReg128(nn), arrNarrow, nameQReg128(mm), arrNarrow);
       return True;
    }
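
For reference (sketch, not part of the patch): the operation performed at run
time by arm64g_dirtyhelper_PMULLQ is a 64x64 -> 128 bit carry-less
("polynomial") multiply.  A minimal sketch in plain C, assuming the VEX
ULong/UInt typedefs; the real implementation lives in guest_arm64_helpers.c:

   /* Sketch only: XOR-based (carry-less) 64x64 -> 128 bit multiply. */
   static void pmull64_sketch ( ULong a, ULong b,
                                /*OUT*/ULong* resHi, /*OUT*/ULong* resLo )
   {
      ULong hi = 0, lo = 0;
      UInt  i;
      for (i = 0; i < 64; i++) {
         if ((b >> i) & 1) {
            lo ^= a << i;
            if (i > 0)
               hi ^= a >> (64 - i);   /* guard against a shift by 64 */
         }
      }
      *resHi = hi;
      *resLo = lo;
   }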
 
@@ -12780,7 +12969,75 @@
 static
 Bool dis_AdvSIMD_crypto_aes(/*MB_OUT*/DisResult* dres, UInt insn)
 {
+   /* 31        23   21    16     11 9 4
+      0100 1110 size 10100 opcode 10 n d
+      Decode fields are: size,opcode
+      Size is always 00 in ARMv8, it appears.
+   */
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
+   if (INSN(31,24) != BITS8(0,1,0,0,1,1,1,0)
+      || INSN(21,17) != BITS5(1,0,1,0,0) || INSN(11,10) != BITS2(1,0)) {
+      return False;
+   }
+   UInt size   = INSN(23,22);
+   UInt opcode = INSN(16,12);
+   UInt nn     = INSN(9,5);
+   UInt dd     = INSN(4,0);
+
+   if (size == BITS2(0,0)
+       && (opcode == BITS5(0,0,1,0,0) || opcode == BITS5(0,0,1,0,1))) {
+      /* -------- 00,00100: AESE Vd.16b, Vn.16b -------- */
+      /* -------- 00,00101: AESD Vd.16b, Vn.16b -------- */
+      Bool   isD  = opcode == BITS5(0,0,1,0,1);
+      IRTemp op1  = newTemp(Ity_V128);
+      IRTemp op2  = newTemp(Ity_V128);
+      IRTemp xord = newTemp(Ity_V128);
+      IRTemp res  = newTemp(Ity_V128);
+      void*        helper = isD ? &arm64g_dirtyhelper_AESD
+                                : &arm64g_dirtyhelper_AESE;
+      const HChar* hname  = isD ? "arm64g_dirtyhelper_AESD"
+                                : "arm64g_dirtyhelper_AESE";
+      assign(op1, getQReg128(dd));
+      assign(op2, getQReg128(nn));
+      assign(xord, binop(Iop_XorV128, mkexpr(op1), mkexpr(op2)));
+      IRDirty* di
+         = unsafeIRDirty_1_N( res, 0/*regparms*/, hname, helper,
+                              mkIRExprVec_3(
+                                 IRExpr_VECRET(),
+                                 unop(Iop_V128HIto64, mkexpr(xord)),
+                                 unop(Iop_V128to64, mkexpr(xord)) ) );
+      stmt(IRStmt_Dirty(di));
+      putQReg128(dd, mkexpr(res));
+      DIP("aes%c %s.16b, %s.16b\n", isD ? 'd' : 'e',
+                                    nameQReg128(dd), nameQReg128(nn));
+      return True;
+   }
+
+   if (size == BITS2(0,0)
+       && (opcode == BITS5(0,0,1,1,0) || opcode == BITS5(0,0,1,1,1))) {
+      /* -------- 00,00110: AESMC  Vd.16b, Vn.16b -------- */
+      /* -------- 00,00111: AESIMC Vd.16b, Vn.16b -------- */
+      Bool   isI  = opcode == BITS5(0,0,1,1,1);
+      IRTemp src  = newTemp(Ity_V128);
+      IRTemp res  = newTemp(Ity_V128);
+      void*        helper = isI ? &arm64g_dirtyhelper_AESIMC
+                                : &arm64g_dirtyhelper_AESMC;
+      const HChar* hname  = isI ? "arm64g_dirtyhelper_AESIMC"
+                                : "arm64g_dirtyhelper_AESMC";
+      assign(src, getQReg128(nn));
+      IRDirty* di
+         = unsafeIRDirty_1_N( res, 0/*regparms*/, hname, helper,
+                              mkIRExprVec_3(
+                                 IRExpr_VECRET(),
+                                 unop(Iop_V128HIto64, mkexpr(src)),
+                                 unop(Iop_V128to64, mkexpr(src)) ) );
+      stmt(IRStmt_Dirty(di));
+      putQReg128(dd, mkexpr(res));
+      DIP("aes%s %s.16b, %s.16b\n", isI ? "imc" : "mc",
+                                    nameQReg128(dd), nameQReg128(nn));
+      return True;
+   }
+
    return False;
 #  undef INSN
 }
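
For reference (sketch, not part of the patch): the AESE/AESD decode above
folds the AddRoundKey step into IR as the Iop_XorV128 of Vd and Vn, and
leaves only the byte substitution and row shuffling to the dirty helper.
In rough scalar form, with aes_sub_and_shift_rows() as a hypothetical
stand-in for the helper's work:

   /* Sketch only.  aes_sub_and_shift_rows is a placeholder for the
      SubBytes+ShiftRows (resp. their inverses) done inside
      arm64g_dirtyhelper_AESE/AESD. */
   extern void aes_sub_and_shift_rows ( UChar state[16] );  /* hypothetical */

   static void aese_sketch ( UChar vd[16], const UChar vn[16] )
   {
      UInt i;
      for (i = 0; i < 16; i++)
         vd[i] ^= vn[i];            /* AddRoundKey, done in IR (Iop_XorV128) */
      aes_sub_and_shift_rows(vd);   /* done at run time by the dirty helper */
   }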
@@ -12789,7 +13046,102 @@
 static
 Bool dis_AdvSIMD_crypto_three_reg_sha(/*MB_OUT*/DisResult* dres, UInt insn)
 {
+   /* 31   28   23 21 20 15 14  11 9 4
+      0101 1110 sz 0  m  0  opc 00 n d
+      Decode fields are: sz,opc
+   */
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
+   if (INSN(31,24) != BITS8(0,1,0,1,1,1,1,0) || INSN(21,21) != 0
+       || INSN(15,15) != 0 || INSN(11,10) != BITS2(0,0)) {
+      return False;
+   }
+   UInt sz  = INSN(23,22);
+   UInt mm  = INSN(20,16);
+   UInt opc = INSN(14,12);
+   UInt nn  = INSN(9,5);
+   UInt dd  = INSN(4,0);
+   if (sz == BITS2(0,0) && opc <= BITS3(1,1,0)) {
+      /* -------- 00,000 SHA1C     Qd,    Sn,    Vm.4S -------- */
+      /* -------- 00,001 SHA1P     Qd,    Sn,    Vm.4S -------- */
+      /* -------- 00,010 SHA1M     Qd,    Sn,    Vm.4S -------- */
+      /* -------- 00,011 SHA1SU0   Vd.4S, Vn.4S, Vm.4S -------- */
+      /* -------- 00,100 SHA256H   Qd,    Qn,    Vm.4S -------- */
+      /* -------- 00,101 SHA256H2  Qd,    Qn,    Vm.4S -------- */
+      /* -------- 00,110 SHA256SU1 Vd.4S, Vn.4S, Vm.4S -------- */
+      vassert(opc < 7);
+      const HChar* inames[7]
+         = { "sha1c", "sha1p", "sha1m", "sha1su0",
+             "sha256h", "sha256h2", "sha256su1" };
+      void(*helpers[7])(V128*,ULong,ULong,ULong,ULong,ULong,ULong)
+         = { &arm64g_dirtyhelper_SHA1C,    &arm64g_dirtyhelper_SHA1P,
+             &arm64g_dirtyhelper_SHA1M,    &arm64g_dirtyhelper_SHA1SU0,
+             &arm64g_dirtyhelper_SHA256H,  &arm64g_dirtyhelper_SHA256H2,
+             &arm64g_dirtyhelper_SHA256SU1 };
+      const HChar* hnames[7]
+         = { "arm64g_dirtyhelper_SHA1C",    "arm64g_dirtyhelper_SHA1P",
+             "arm64g_dirtyhelper_SHA1M",    "arm64g_dirtyhelper_SHA1SU0",
+             "arm64g_dirtyhelper_SHA256H",  "arm64g_dirtyhelper_SHA256H2",
+             "arm64g_dirtyhelper_SHA256SU1" };
+      IRTemp vD      = newTemp(Ity_V128);
+      IRTemp vN      = newTemp(Ity_V128);
+      IRTemp vM      = newTemp(Ity_V128);
+      IRTemp vDhi    = newTemp(Ity_I64);
+      IRTemp vDlo    = newTemp(Ity_I64);
+      IRTemp vNhiPre = newTemp(Ity_I64);
+      IRTemp vNloPre = newTemp(Ity_I64);
+      IRTemp vNhi    = newTemp(Ity_I64);
+      IRTemp vNlo    = newTemp(Ity_I64);
+      IRTemp vMhi    = newTemp(Ity_I64);
+      IRTemp vMlo    = newTemp(Ity_I64);
+      assign(vD,      getQReg128(dd));
+      assign(vN,      getQReg128(nn));
+      assign(vM,      getQReg128(mm));
+      assign(vDhi,    unop(Iop_V128HIto64, mkexpr(vD)));
+      assign(vDlo,    unop(Iop_V128to64,   mkexpr(vD)));
+      assign(vNhiPre, unop(Iop_V128HIto64, mkexpr(vN)));
+      assign(vNloPre, unop(Iop_V128to64,   mkexpr(vN)));
+      assign(vMhi,    unop(Iop_V128HIto64, mkexpr(vM)));
+      assign(vMlo,    unop(Iop_V128to64,   mkexpr(vM)));
+      /* Mask off any bits of the N register operand that aren't actually
+         needed, so that Memcheck doesn't complain unnecessarily. */
+      switch (opc) {
+         case BITS3(0,0,0): case BITS3(0,0,1): case BITS3(0,1,0):
+            assign(vNhi, mkU64(0));
+            assign(vNlo, unop(Iop_32Uto64, unop(Iop_64to32, mkexpr(vNloPre))));
+            break;
+         case BITS3(0,1,1): case BITS3(1,0,0):
+         case BITS3(1,0,1): case BITS3(1,1,0):
+            assign(vNhi, mkexpr(vNhiPre));
+            assign(vNlo, mkexpr(vNloPre));
+            break;
+         default:
+            vassert(0);
+      }
+      IRTemp res = newTemp(Ity_V128);
+      IRDirty* di
+         = unsafeIRDirty_1_N( res, 0/*regparms*/, hnames[opc], helpers[opc],
+                              mkIRExprVec_7(
+                                 IRExpr_VECRET(),
+                                 mkexpr(vDhi), mkexpr(vDlo), mkexpr(vNhi),
+                                 mkexpr(vNlo), mkexpr(vMhi), mkexpr(vMlo)));
+      stmt(IRStmt_Dirty(di));
+      putQReg128(dd, mkexpr(res));
+      switch (opc) {
+         case BITS3(0,0,0): case BITS3(0,0,1): case BITS3(0,1,0):
+            DIP("%s q%u, s%u, v%u.4s\n", inames[opc], dd, nn, mm);
+            break;
+         case BITS3(0,1,1): case BITS3(1,1,0):
+            DIP("%s v%u.4s, v%u.4s, v%u.4s\n", inames[opc], dd, nn, mm);
+            break;
+         case BITS3(1,0,0): case BITS3(1,0,1):
+            DIP("%s q%u, q%u, v%u.4s\n", inames[opc], dd, nn, mm);
+            break;
+         default:
+            vassert(0);
+      }
+      return True;
+   }
+   
    return False;
 #  undef INSN
 }
@@ -12798,7 +13150,91 @@
 static
 Bool dis_AdvSIMD_crypto_two_reg_sha(/*MB_OUT*/DisResult* dres, UInt insn)
 {
+   /* 31   28   23 21    16  11 9 4
+      0101 1110 sz 10100 opc 10 n d
+      Decode fields are: sz,opc
+   */
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
+   if (INSN(31,24) != BITS8(0,1,0,1,1,1,1,0)
+       || INSN(21,17) != BITS5(1,0,1,0,0) || INSN(11,10) != BITS2(1,0)) {
+      return False;
+   }
+   UInt sz  = INSN(23,22);
+   UInt opc = INSN(16,12);
+   UInt nn  = INSN(9,5);
+   UInt dd  = INSN(4,0);
+   if (sz == BITS2(0,0) && opc <= BITS5(0,0,0,1,0)) {
+      /* -------- 00,00000 SHA1H     Sd,    Sn    -------- */
+      /* -------- 00,00001 SHA1SU1   Vd.4S, Vn.4S -------- */
+      /* -------- 00,00010 SHA256SU0 Vd.4S, Vn.4S -------- */
+      vassert(opc < 3);
+      const HChar* inames[3] = { "sha1h", "sha1su1", "sha256su0" };
+      IRTemp vD   = newTemp(Ity_V128);
+      IRTemp vN   = newTemp(Ity_V128);
+      IRTemp vDhi = newTemp(Ity_I64);
+      IRTemp vDlo = newTemp(Ity_I64);
+      IRTemp vNhi = newTemp(Ity_I64);
+      IRTemp vNlo = newTemp(Ity_I64);
+      assign(vD,   getQReg128(dd));
+      assign(vN,   getQReg128(nn));
+      assign(vDhi, unop(Iop_V128HIto64, mkexpr(vD)));
+      assign(vDlo, unop(Iop_V128to64,   mkexpr(vD)));
+      assign(vNhi, unop(Iop_V128HIto64, mkexpr(vN)));
+      assign(vNlo, unop(Iop_V128to64,   mkexpr(vN)));
+      /* Mask off any bits of the N register operand that aren't actually
+         needed, so that Memcheck doesn't complain unnecessarily.  Also
+         construct the calls, given that the helper functions don't take
+         the same number of arguments. */
+      IRDirty* di  = NULL;
+      IRTemp   res = newTemp(Ity_V128);
+      switch (opc) {
+         case BITS5(0,0,0,0,0): {
+            IRExpr* vNloMasked = unop(Iop_32Uto64,
+                                      unop(Iop_64to32, mkexpr(vNlo)));
+            di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                    "arm64g_dirtyhelper_SHA1H",
+                                    &arm64g_dirtyhelper_SHA1H,            
+                                    mkIRExprVec_3(
+                                       IRExpr_VECRET(),
+                                       mkU64(0), vNloMasked) );
+            break;
+         }
+         case BITS5(0,0,0,0,1):
+            di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                    "arm64g_dirtyhelper_SHA1SU1",
+                                    &arm64g_dirtyhelper_SHA1SU1,
+                                    mkIRExprVec_5(
+                                       IRExpr_VECRET(),
+                                       mkexpr(vDhi), mkexpr(vDlo),
+                                       mkexpr(vNhi), mkexpr(vNlo)) );
+            break;
+         case BITS5(0,0,0,1,0):
+            di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                    "arm64g_dirtyhelper_SHA256SU0",
+                                    &arm64g_dirtyhelper_SHA256SU0,
+                                    mkIRExprVec_5(
+                                       IRExpr_VECRET(),
+                                       mkexpr(vDhi), mkexpr(vDlo),
+                                       mkexpr(vNhi), mkexpr(vNlo)) );
+            break;
+         default:
+            vassert(0);
+      }
+      stmt(IRStmt_Dirty(di));
+      putQReg128(dd, mkexpr(res));
+      switch (opc) {
+         case BITS5(0,0,0,0,0):
+            DIP("%s s%u, s%u\n", inames[opc], dd, nn);
+            break;
+         case BITS5(0,0,0,0,1): case BITS5(0,0,0,1,0):
+            DIP("%s v%u.4s, v%u.4s\n", inames[opc], dd, nn);
+            break;
+         default:
+            vassert(0);
+      }
+      return True;
+   }
+   
    return False;
 #  undef INSN
 }
diff --git a/VEX/priv/guest_arm_defs.h b/VEX/priv/guest_arm_defs.h
index a97265e..a8a278c 100644
--- a/VEX/priv/guest_arm_defs.h
+++ b/VEX/priv/guest_arm_defs.h
@@ -111,6 +111,118 @@
 UInt armg_calculate_flag_qc ( UInt resL1, UInt resL2,
                               UInt resR1, UInt resR2 );
 
+/* --- DIRTY HELPERS --- */
+
+/* Confusingly, for the AES insns, the 32-bit ARM docs refer to the
+   one-and-only source register as 'm' whereas the 64-bit docs refer to
+   it as 'n'.  We sidestep that here by just calling it 'arg32_*'. */
+
+extern
+void armg_dirtyhelper_AESE (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     );
+
+extern
+void armg_dirtyhelper_AESD (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     );
+
+extern
+void armg_dirtyhelper_AESMC (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     );
+
+extern
+void armg_dirtyhelper_AESIMC (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     );
+
+extern
+void armg_dirtyhelper_SHA1C (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA1P (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA1M (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA1SU0 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA256H (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA256H2 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA256SU1 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA1SU1 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA256SU0 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_SHA1H (
+        /*OUT*/V128* res,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     );
+
+extern
+void armg_dirtyhelper_VMULLP64 (
+        /*OUT*/V128* res,
+        UInt argN1, UInt argN0, UInt argM1, UInt argM0
+     );
+
 
 /*---------------------------------------------------------*/
 /*--- Condition code stuff                              ---*/
diff --git a/VEX/priv/guest_arm_helpers.c b/VEX/priv/guest_arm_helpers.c
index 92b96cf..9002a76 100644
--- a/VEX/priv/guest_arm_helpers.c
+++ b/VEX/priv/guest_arm_helpers.c
@@ -38,6 +38,7 @@
 #include "main_globals.h"
 #include "guest_generic_bb_to_IR.h"
 #include "guest_arm_defs.h"
+#include "guest_arm64_defs.h"  /* for crypto helper functions */
 
 
 /* This file contains helper functions for arm guest code.  Calls to
@@ -535,6 +536,264 @@
 
 
 /*---------------------------------------------------------------*/
+/*--- Crypto instruction helpers                              ---*/
+/*---------------------------------------------------------------*/
+
+/* DIRTY HELPERS for doing AES support:
+   * AESE (SubBytes, then ShiftRows)
+   * AESD (InvShiftRows, then InvSubBytes)
+   * AESMC (MixColumns)
+   * AESIMC (InvMixColumns)
+   These don't actually have to be dirty helpers -- they could be
+   clean, but for the fact that they return a V128 and a clean helper
+   can't do that.
+
+   These just call onwards to the implementations of the same in
+   guest_arm64_helpers.c.  In all of these cases, we expect |res| to
+   be at least 8 aligned.
+*/
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_AESE (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argHi = (((ULong)arg32_3) << 32) | ((ULong)arg32_2);
+   ULong argLo = (((ULong)arg32_1) << 32) | ((ULong)arg32_0);
+   arm64g_dirtyhelper_AESE(res, argHi, argLo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_AESD (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argHi = (((ULong)arg32_3) << 32) | ((ULong)arg32_2);
+   ULong argLo = (((ULong)arg32_1) << 32) | ((ULong)arg32_0);
+   arm64g_dirtyhelper_AESD(res, argHi, argLo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_AESMC (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argHi = (((ULong)arg32_3) << 32) | ((ULong)arg32_2);
+   ULong argLo = (((ULong)arg32_1) << 32) | ((ULong)arg32_0);
+   arm64g_dirtyhelper_AESMC(res, argHi, argLo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_AESIMC (
+        /*OUT*/V128* res,
+        UInt arg32_3, UInt arg32_2, UInt arg32_1, UInt arg32_0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argHi = (((ULong)arg32_3) << 32) | ((ULong)arg32_2);
+   ULong argLo = (((ULong)arg32_1) << 32) | ((ULong)arg32_0);
+   arm64g_dirtyhelper_AESIMC(res, argHi, argLo);
+}
+
+
+/* DIRTY HELPERS for the SHA instruction family.  Same comments
+   as for the AES group above apply.
+*/
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1C (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1C(res, argDhi, argDlo,
+                                 argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1P (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1P(res, argDhi, argDlo,
+                                 argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1M (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1M(res, argDhi, argDlo,
+                                 argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1SU0 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1SU0(res, argDhi, argDlo,
+                                   argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA256H (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA256H(res, argDhi, argDlo,
+                                   argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA256H2 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA256H2(res, argDhi, argDlo,
+                                    argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA256SU1 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argN3, UInt argN2, UInt argN1, UInt argN0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argNhi = (((ULong)argN3) << 32) | ((ULong)argN2);
+   ULong argNlo = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA256SU1(res, argDhi, argDlo,
+                                     argNhi, argNlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1SU1 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1SU1(res, argDhi, argDlo, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA256SU0 (
+        /*OUT*/V128* res,
+        UInt argD3, UInt argD2, UInt argD1, UInt argD0,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argDhi = (((ULong)argD3) << 32) | ((ULong)argD2);
+   ULong argDlo = (((ULong)argD1) << 32) | ((ULong)argD0);
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA256SU0(res, argDhi, argDlo, argMhi, argMlo);
+}
+     
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_SHA1H (
+        /*OUT*/V128* res,
+        UInt argM3, UInt argM2, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argMhi = (((ULong)argM3) << 32) | ((ULong)argM2);
+   ULong argMlo = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_SHA1H(res, argMhi, argMlo);
+}
+
+/* CALLED FROM GENERATED CODE */
+void armg_dirtyhelper_VMULLP64 (
+        /*OUT*/V128* res,
+        UInt argN1, UInt argN0, UInt argM1, UInt argM0
+     )
+{
+   vassert(0 == (((HWord)res) & (8-1)));
+   ULong argN = (((ULong)argN1) << 32) | ((ULong)argN0);
+   ULong argM = (((ULong)argM1) << 32) | ((ULong)argM0);
+   arm64g_dirtyhelper_PMULLQ(res, argN, argM);
+}
+
+
+/*---------------------------------------------------------------*/
 /*--- Flag-helpers translation-time function specialisers.    ---*/
 /*--- These help iropt specialise calls the above run-time    ---*/
 /*--- flags functions.                                        ---*/
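
(Reference note, not part of the patch: all of the armg_dirtyhelper_* wrappers
above use the same packing convention, namely that a V128 guest value is
passed as four 32-bit arguments, most significant word first, and reassembled
into two 64-bit halves before calling through to the arm64 helpers:

   ULong hi64 = (((ULong)arg32_3) << 32) | ((ULong)arg32_2);   /* bits [127:64] */
   ULong lo64 = (((ULong)arg32_1) << 32) | ((ULong)arg32_0);   /* bits [63:0]   */

The corresponding split on the IR side is done by breakupV128to32s in
guest_arm_toIR.c below.)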
diff --git a/VEX/priv/guest_arm_toIR.c b/VEX/priv/guest_arm_toIR.c
index 54ccd07..ef5a79b 100644
--- a/VEX/priv/guest_arm_toIR.c
+++ b/VEX/priv/guest_arm_toIR.c
@@ -264,6 +264,14 @@
    (((_b9) << 9) | ((_b8) << 8)                                \
     | BITS8((_b7),(_b6),(_b5),(_b4),(_b3),(_b2),(_b1),(_b0)))
 
+#define BITS11(_b10,_b9,_b8,_b7,_b6,_b5,_b4,_b3,_b2,_b1,_b0)  \
+   ( ((_b10) << 10) | ((_b9) << 9) | ((_b8) << 8)              \
+    | BITS8((_b7),(_b6),(_b5),(_b4),(_b3),(_b2),(_b1),(_b0)))
+
+#define BITS12(_b11,_b10,_b9,_b8,_b7,_b6,_b5,_b4,_b3,_b2,_b1,_b0)  \
+   ( ((_b11) << 11) | ((_b10) << 10) | ((_b9) << 9) | ((_b8) << 8) \
+    | BITS8((_b7),(_b6),(_b5),(_b4),(_b3),(_b2),(_b1),(_b0)))
+
 /* produces _uint[_bMax:_bMin] */
 #define SLICE_UInt(_uint,_bMax,_bMin) \
    (( ((UInt)(_uint)) >> (_bMin)) \
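
For reference (sketch, not part of the patch): the new BITS11/BITS12 macros
build fixed encoding fields for the v8 decoder added further down, e.g.

   /* The fixed "1100 1001 1111" field in bits [11:0] of the A1 LDA/LDAH/LDAB
      encodings expands as:
        BITS12(1,1,0,0,1,0,0,1,1,1,1,1) == 0xC9F                             */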
@@ -1598,8 +1606,9 @@
                              IRTemp regT,  /* value to clamp - Ity_I32 */
                              UInt imm5 )   /* saturation ceiling */
 {
-   UInt ceil  = (1 << imm5) - 1;    // (2^imm5)-1
-   UInt floor = 0;
+   ULong ceil64  = (1ULL << imm5) - 1;    // (2^imm5)-1
+   UInt  ceil    = (UInt)ceil64;
+   UInt  floor   = 0;
 
    IRTemp nd0 = newTemp(Ity_I32);
    IRTemp nd1 = newTemp(Ity_I32);
@@ -1642,8 +1651,10 @@
                            IRTemp* res,    /* OUT - Ity_I32 */
                            IRTemp* resQ )  /* OUT - Ity_I32  */
 {
-   Int ceil  =  (1 << (imm5-1)) - 1;  //  (2^(imm5-1))-1
-   Int floor = -(1 << (imm5-1));      // -(2^(imm5-1))
+   Long ceil64  =  (1LL << (imm5-1)) - 1;  //  (2^(imm5-1))-1
+   Long floor64 = -(1LL << (imm5-1));      // -(2^(imm5-1))
+   Int  ceil    = (Int)ceil64;
+   Int  floor   = (Int)floor64;
 
    IRTemp nd0 = newTemp(Ity_I32);
    IRTemp nd1 = newTemp(Ity_I32);
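
(The two hunks above presumably widen the shift to 64 bits because, once the
shift amount reaches 31, "1 << imm5" is evaluated on a 32-bit signed int and
shifts into the sign bit, which is undefined behaviour in C.  Sketch of the
before/after, not part of the patch:

   UInt imm5 = 31;
   /* old: undefined behaviour at imm5 == 31 (signed 32-bit overflow) */
   /* UInt ceil = (1 << imm5) - 1; */
   /* new: do the shift in 64 bits, then truncate */
   ULong ceil64 = (1ULL << imm5) - 1;
   UInt  ceil   = (UInt)ceil64;      /* 0x7FFFFFFF */
)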
@@ -3038,6 +3049,12 @@
 static
 Bool dis_neon_data_3same ( UInt theInstr, IRTemp condT )
 {
+   /* In paths where this returns False, indicating a non-decodable
+      instruction, there may still be some IR assignments to temporaries
+      generated.  This is inconvenient but harmless, and the post-front-end
+      IR optimisation pass will just remove them anyway.  So there's no
+      effort made here to tidy it up.
+   */
    UInt Q = (theInstr >> 6) & 1;
    UInt dreg = get_neon_d_regno(theInstr);
    UInt nreg = get_neon_n_regno(theInstr);
@@ -4597,6 +4614,9 @@
             }
          }
          break;
+      case 12: {
+         return False;
+      }
       /* Starting from here these are FP SIMD cases */
       case 13:
          if (B == 0) {
@@ -4810,6 +4830,9 @@
             }
          }
          break;
+      default:
+         /*NOTREACHED*/
+         vassert(0);
    }
 
    if (Q) {
@@ -4825,6 +4848,12 @@
 static
 Bool dis_neon_data_3diff ( UInt theInstr, IRTemp condT )
 {
+   /* In paths where this returns False, indicating a non-decodable
+      instruction, there may still be some IR assignments to temporaries
+      generated.  This is inconvenient but harmless, and the post-front-end
+      IR optimisation pass will just remove them anyway.  So there's no
+      effort made here to tidy it up.
+   */
    UInt A = (theInstr >> 8) & 0xf;
    UInt B = (theInstr >> 20) & 3;
    UInt U = (theInstr >> 24) & 1;
@@ -5182,11 +5211,15 @@
                   op = Iop_PolynomialMull8x8;
                break;
             case 1:
+               if (P) return False;
                op = (U) ? Iop_Mull16Ux4 : Iop_Mull16Sx4;
                break;
             case 2:
+               if (P) return False;
                op = (U) ? Iop_Mull32Ux2 : Iop_Mull32Sx2;
                break;
+            case 3:
+               return False;
             default:
                vassert(0);
          }
@@ -8831,8 +8864,11 @@
 
    Finally, the caller must indicate whether this occurs in ARM or in
    Thumb code.
+
+   This only handles NEON for ARMv7 and below.  The NEON extensions
+   for v8 are handled by decode_V8_instruction.
 */
-static Bool decode_NEON_instruction (
+static Bool decode_NEON_instruction_ARMv7_and_below (
                /*MOD*/DisResult* dres,
                UInt              insn32,
                IRTemp            condT,
@@ -8865,8 +8901,8 @@
        && INSN(27,24) == BITS4(1,1,1,1)) {
       // Thumb, DP
       UInt reformatted = INSN(23,0);
-      reformatted |= (INSN(28,28) << 24); // U bit
-      reformatted |= (BITS7(1,1,1,1,0,0,1) << 25);
+      reformatted |= (((UInt)INSN(28,28)) << 24); // U bit
+      reformatted |= (((UInt)BITS7(1,1,1,1,0,0,1)) << 25);
       return dis_neon_data_processing(reformatted, condT);
    }
 
@@ -8880,7 +8916,7 @@
    }
    if (isT && INSN(31,24) == BITS8(1,1,1,1,1,0,0,1)) {
       UInt reformatted = INSN(23,0);
-      reformatted |= (BITS8(1,1,1,1,0,1,0,0) << 24);
+      reformatted |= (((UInt)BITS8(1,1,1,1,0,1,0,0)) << 24);
       return dis_neon_load_or_store(reformatted, isT, condT);
    }
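
(The UInt casts added in the two hunks above presumably keep these shifts in
unsigned arithmetic; sketch, not part of the patch:

   /* BITS8(1,1,1,1,0,1,0,0) == 0xF4; 0xF4 << 24 overflows a 32-bit signed
      int, whereas the cast form is well defined: */
   UInt u = ((UInt)BITS8(1,1,1,1,0,1,0,0)) << 24;   /* 0xF4000000 */
)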
 
@@ -8909,7 +8945,7 @@
    Caller must supply an IRTemp 'condT' holding the gating condition,
    or IRTemp_INVALID indicating the insn is always executed.
 
-   Caller must also supply an ARMCondcode 'cond'.  This is only used
+   Caller must also supply an ARMCondcode 'conq'.  This is only used
    for debug printing, no other purpose.  For ARM, this is simply the
    top 4 bits of the original instruction.  For Thumb, the condition
    is not (really) known until run time, and so ARMCondAL should be
@@ -12575,6 +12611,863 @@
 
 
 /*------------------------------------------------------------*/
+/*--- V8 instructions                                      ---*/
+/*------------------------------------------------------------*/
+
+/* Break a V128-bit value up into four 32-bit ints. */
+
+static void breakupV128to32s ( IRTemp t128,
+                               /*OUTs*/
+                               IRTemp* t3, IRTemp* t2,
+                               IRTemp* t1, IRTemp* t0 )
+{
+   IRTemp hi64 = newTemp(Ity_I64);
+   IRTemp lo64 = newTemp(Ity_I64);
+   assign( hi64, unop(Iop_V128HIto64, mkexpr(t128)) );
+   assign( lo64, unop(Iop_V128to64,   mkexpr(t128)) );
+
+   vassert(t0 && *t0 == IRTemp_INVALID);
+   vassert(t1 && *t1 == IRTemp_INVALID);
+   vassert(t2 && *t2 == IRTemp_INVALID);
+   vassert(t3 && *t3 == IRTemp_INVALID);
+
+   *t0 = newTemp(Ity_I32);
+   *t1 = newTemp(Ity_I32);
+   *t2 = newTemp(Ity_I32);
+   *t3 = newTemp(Ity_I32);
+   assign( *t0, unop(Iop_64to32,   mkexpr(lo64)) );
+   assign( *t1, unop(Iop_64HIto32, mkexpr(lo64)) );
+   assign( *t2, unop(Iop_64to32,   mkexpr(hi64)) );
+   assign( *t3, unop(Iop_64HIto32, mkexpr(hi64)) );
+}
+
+
+/* Both ARM and Thumb */
+
+/* Translate a V8 instruction.  If successful, returns True and *dres
+   may or may not be updated.  If unsuccessful, returns False and
+   doesn't change *dres nor create any IR.
+
+   The Thumb and ARM encodings are potentially different.  In both
+   ARM and Thumb mode, the caller must pass the entire 32 bits of
+   the instruction.  Callers may pass any instruction; this function
+   ignores anything it doesn't recognise.
+
+   Caller must supply an IRTemp 'condT' holding the gating condition,
+   or IRTemp_INVALID indicating the insn is always executed.
+
+   If we are decoding an ARM instruction which is in the NV space
+   then it is expected that condT will be IRTemp_INVALID, and that is
+   asserted for.  That condition is ensured by the logic near the top
+   of disInstr_ARM_WRK, that sets up condT.
+
+   When decoding for Thumb, the caller must pass the ITState pre/post
+   this instruction, so that we can generate a SIGILL in the cases where
+   the instruction may not be in an IT block.  When decoding for ARM,
+   both of these must be IRTemp_INVALID.
+
+   Finally, the caller must indicate whether this occurs in ARM or in
+   Thumb code.
+*/
+static Bool decode_V8_instruction (
+               /*MOD*/DisResult* dres,
+               UInt              insnv8,
+               IRTemp            condT,
+               Bool              isT,
+               IRTemp            old_itstate,
+               IRTemp            new_itstate
+            )
+{
+#  define INSN(_bMax,_bMin)   SLICE_UInt(insnv8, (_bMax), (_bMin))
+
+   if (isT) {
+      vassert(old_itstate != IRTemp_INVALID);
+      vassert(new_itstate != IRTemp_INVALID);
+   } else {
+      vassert(old_itstate == IRTemp_INVALID);
+      vassert(new_itstate == IRTemp_INVALID);
+   }
+
+   /* ARMCondcode 'conq' is only used for debug printing and for no other
+      purpose.  For ARM, this is simply the top 4 bits of the instruction.
+      For Thumb, the condition is not (really) known until run time, and so
+      we set it to ARMCondAL in order that printing of these instructions
+      does not show any condition. */
+   ARMCondcode conq;
+   if (isT) {
+      conq = ARMCondAL;
+   } else {
+      conq = (ARMCondcode)INSN(31,28);
+      if (conq == ARMCondNV || conq == ARMCondAL) {
+         vassert(condT == IRTemp_INVALID);
+      } else {
+         vassert(condT != IRTemp_INVALID);
+      }
+      vassert(conq >= ARMCondEQ && conq <= ARMCondNV);
+   }
+
+   /* ----------- {AESD, AESE, AESMC, AESIMC}.8 q_q ----------- */
+   /*     31   27   23  21 19 17 15 11   7      3
+      T1: 1111 1111 1 D 11 sz 00 d  0011 00 M 0 m  AESE Qd, Qm
+      A1: 1111 0011 1 D 11 sz 00 d  0011 00 M 0 m  AESE Qd, Qm
+
+      T1: 1111 1111 1 D 11 sz 00 d  0011 01 M 0 m  AESD Qd, Qm
+      A1: 1111 0011 1 D 11 sz 00 d  0011 01 M 0 m  AESD Qd, Qm
+
+      T1: 1111 1111 1 D 11 sz 00 d  0011 10 M 0 m  AESMC Qd, Qm
+      A1: 1111 0011 1 D 11 sz 00 d  0011 10 M 0 m  AESMC Qd, Qm
+
+      T1: 1111 1111 1 D 11 sz 00 d  0011 11 M 0 m  AESIMC Qd, Qm
+      A1: 1111 0011 1 D 11 sz 00 d  0011 11 M 0 m  AESIMC Qd, Qm
+
+      sz must be 00
+      ARM encoding is in NV space.
+      In Thumb mode, we must not be in an IT block.
+   */
+   {
+     UInt regD = 99, regM = 99, opc = 4/*invalid*/;
+     Bool gate = True;
+
+     UInt high9 = isT ? BITS9(1,1,1,1,1,1,1,1,1) : BITS9(1,1,1,1,0,0,1,1,1);
+     if (INSN(31,23) == high9 && INSN(21,16) == BITS6(1,1,0,0,0,0)
+         && INSN(11,8) == BITS4(0,0,1,1) && INSN(4,4) == 0) {
+        UInt bitD = INSN(22,22);
+        UInt fldD = INSN(15,12);
+        UInt bitM = INSN(5,5);
+        UInt fldM = INSN(3,0);
+        opc  = INSN(7,6);
+        regD = (bitD << 4) | fldD;
+        regM = (bitM << 4) | fldM;
+     }
+     if ((regD & 1) == 1 || (regM & 1) == 1)
+        gate = False;
+
+     if (gate) {
+        if (isT) {
+           gen_SIGILL_T_if_in_ITBlock(old_itstate, new_itstate);
+        }
+        /* In ARM mode, this is statically unconditional.  In Thumb mode,
+           this must be dynamically unconditional, and we've SIGILLd if not.
+           In either case we can create unconditional IR. */
+        IRTemp op1 = newTemp(Ity_V128);
+        IRTemp op2 = newTemp(Ity_V128);
+        IRTemp src = newTemp(Ity_V128);
+        IRTemp res = newTemp(Ity_V128);
+        assign(op1,  getQReg(regD >> 1));
+        assign(op2,  getQReg(regM >> 1));
+        assign(src,  opc == BITS2(0,0) || opc == BITS2(0,1)
+                        ? binop(Iop_XorV128, mkexpr(op1), mkexpr(op2))
+                        : mkexpr(op2));
+
+        void* helpers[4]
+           = { &armg_dirtyhelper_AESE,  &armg_dirtyhelper_AESD,
+               &armg_dirtyhelper_AESMC, &armg_dirtyhelper_AESIMC };
+        const HChar* hNames[4]
+           = { "armg_dirtyhelper_AESE",  "armg_dirtyhelper_AESD",
+               "armg_dirtyhelper_AESMC", "armg_dirtyhelper_AESIMC" };
+        const HChar* iNames[4]
+           = { "aese", "aesd", "aesmc", "aesimc" };
+
+        vassert(opc >= 0 && opc <= 3);
+        void*        helper = helpers[opc];
+        const HChar* hname  = hNames[opc];
+
+        IRTemp w32_3, w32_2, w32_1, w32_0;
+        w32_3 = w32_2 = w32_1 = w32_0 = IRTemp_INVALID;
+        breakupV128to32s( src, &w32_3, &w32_2, &w32_1, &w32_0 );
+
+        IRDirty* di
+          = unsafeIRDirty_1_N( res, 0/*regparms*/, hname, helper,
+                               mkIRExprVec_5(
+                                  IRExpr_VECRET(),
+                                  mkexpr(w32_3), mkexpr(w32_2),
+                                  mkexpr(w32_1), mkexpr(w32_0)) );
+        stmt(IRStmt_Dirty(di));
+
+        putQReg(regD >> 1, mkexpr(res), IRTemp_INVALID);
+        DIP("%s.8 q%d, q%d\n", iNames[opc], regD >> 1, regM >> 1);
+        return True;
+     }
+     /* fall through */
+   }
+
+   /* ----------- SHA 3-reg insns q_q_q ----------- */
+   /*
+          31   27   23      19 15 11   7       3
+      T1: 1110 1111 0  D 00 n  d  1100 N Q M 0 m  SHA1C Qd, Qn, Qm  ix=0
+      A1: 1111 0010 ----------------------------
+
+      T1: 1110 1111 0  D 01 n  d  1100 N Q M 0 m  SHA1P Qd, Qn, Qm  ix=1
+      A1: 1111 0010 ----------------------------
+
+      T1: 1110 1111 0  D 10 n  d  1100 N Q M 0 m  SHA1M Qd, Qn, Qm  ix=2
+      A1: 1111 0010 ----------------------------
+
+      T1: 1110 1111 0  D 11 n  d  1100 N Q M 0 m  SHA1SU0 Qd, Qn, Qm  ix=3
+      A1: 1111 0010 ----------------------------
+      (that's a complete set of 4, based on insn[21,20])
+
+      T1: 1111 1111 0  D 00 n  d  1100 N Q M 0 m  SHA256H Qd, Qn, Qm  ix=4
+      A1: 1111 0011 ----------------------------
+
+      T1: 1111 1111 0  D 01 n  d  1100 N Q M 0 m  SHA256H2 Qd, Qn, Qm  ix=5
+      A1: 1111 0011 ----------------------------
+
+      T1: 1111 1111 0  D 10 n  d  1100 N Q M 0 m  SHA256SU1 Qd, Qn, Qm  ix=6
+      A1: 1111 0011 ----------------------------
+      (3/4 of a complete set of 4, based on insn[21,20])
+
+      Q must be 1.  Same comments about conditionalisation as for the AES
+      group above apply.
+   */
+   {
+     UInt ix = 8; /* invalid */
+     Bool gate = False;
+
+     UInt hi9_sha1   = isT ? BITS9(1,1,1,0,1,1,1,1,0)
+                           : BITS9(1,1,1,1,0,0,1,0,0);
+     UInt hi9_sha256 = isT ? BITS9(1,1,1,1,1,1,1,1,0)
+                           : BITS9(1,1,1,1,0,0,1,1,0);
+     if ((INSN(31,23) == hi9_sha1 || INSN(31,23) == hi9_sha256)
+         && INSN(11,8) == BITS4(1,1,0,0)
+         && INSN(6,6) == 1 && INSN(4,4) == 0) {
+        ix = INSN(21,20);
+        if (INSN(31,23) == hi9_sha256)
+           ix |= 4;
+        if (ix < 7)
+           gate = True;
+     }
+
+     UInt regN = (INSN(7,7)   << 4)  | INSN(19,16);
+     UInt regD = (INSN(22,22) << 4)  | INSN(15,12);
+     UInt regM = (INSN(5,5)   << 4)  | INSN(3,0);
+     if ((regD & 1) == 1 || (regM & 1) == 1 || (regN & 1) == 1)
+        gate = False;
+
+     if (gate) {
+        vassert(ix >= 0 && ix < 7);
+        const HChar* inames[7]
+           = { "sha1c", "sha1p", "sha1m", "sha1su0",
+               "sha256h", "sha256h2", "sha256su1" };
+        void(*helpers[7])(V128*,UInt,UInt,UInt,UInt,UInt,UInt,
+                                UInt,UInt,UInt,UInt,UInt,UInt)
+           = { &armg_dirtyhelper_SHA1C,    &armg_dirtyhelper_SHA1P,
+               &armg_dirtyhelper_SHA1M,    &armg_dirtyhelper_SHA1SU0,
+               &armg_dirtyhelper_SHA256H,  &armg_dirtyhelper_SHA256H2,
+               &armg_dirtyhelper_SHA256SU1 };
+        const HChar* hnames[7]
+           = { "armg_dirtyhelper_SHA1C",    "armg_dirtyhelper_SHA1P",
+               "armg_dirtyhelper_SHA1M",    "armg_dirtyhelper_SHA1SU0",
+               "armg_dirtyhelper_SHA256H",  "armg_dirtyhelper_SHA256H2",
+               "armg_dirtyhelper_SHA256SU1" };
+
+        /* This is a really lame way to implement this, even worse than
+           the arm64 version.  But at least it works. */
+
+        if (isT) {
+           gen_SIGILL_T_if_in_ITBlock(old_itstate, new_itstate);
+        }
+
+        IRTemp vD = newTemp(Ity_V128);
+        IRTemp vN = newTemp(Ity_V128);
+        IRTemp vM = newTemp(Ity_V128);
+        assign(vD,  getQReg(regD >> 1));
+        assign(vN,  getQReg(regN >> 1));
+        assign(vM,  getQReg(regM >> 1));
+
+        IRTemp d32_3, d32_2, d32_1, d32_0;
+        d32_3 = d32_2 = d32_1 = d32_0 = IRTemp_INVALID;
+        breakupV128to32s( vD, &d32_3, &d32_2, &d32_1, &d32_0 );
+
+        IRTemp n32_3_pre, n32_2_pre, n32_1_pre, n32_0_pre;
+        n32_3_pre = n32_2_pre = n32_1_pre = n32_0_pre = IRTemp_INVALID;
+        breakupV128to32s( vN, &n32_3_pre, &n32_2_pre, &n32_1_pre, &n32_0_pre );
+
+        IRTemp m32_3, m32_2, m32_1, m32_0;
+        m32_3 = m32_2 = m32_1 = m32_0 = IRTemp_INVALID;
+        breakupV128to32s( vM, &m32_3, &m32_2, &m32_1, &m32_0 );
+
+        IRTemp n32_3 = newTemp(Ity_I32);
+        IRTemp n32_2 = newTemp(Ity_I32);
+        IRTemp n32_1 = newTemp(Ity_I32);
+        IRTemp n32_0 = newTemp(Ity_I32);
+
+        /* Mask off any bits of the N register operand that aren't actually
+           needed, so that Memcheck doesn't complain unnecessarily. */
+        switch (ix) {
+           case 0: case 1: case 2:
+              assign(n32_3, mkU32(0));
+              assign(n32_2, mkU32(0));
+              assign(n32_1, mkU32(0));
+              assign(n32_0, mkexpr(n32_0_pre));
+              break;
+           case 3: case 4: case 5: case 6:
+              assign(n32_3, mkexpr(n32_3_pre));
+              assign(n32_2, mkexpr(n32_2_pre));
+              assign(n32_1, mkexpr(n32_1_pre));
+              assign(n32_0, mkexpr(n32_0_pre));
+              break;
+           default:
+              vassert(0);
+        }
+
+        IRExpr** argvec
+           = mkIRExprVec_13(
+                IRExpr_VECRET(),
+                mkexpr(d32_3), mkexpr(d32_2), mkexpr(d32_1), mkexpr(d32_0), 
+                mkexpr(n32_3), mkexpr(n32_2), mkexpr(n32_1), mkexpr(n32_0), 
+                mkexpr(m32_3), mkexpr(m32_2), mkexpr(m32_1), mkexpr(m32_0)
+             );
+
+        IRTemp res = newTemp(Ity_V128);
+        IRDirty* di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                         hnames[ix], helpers[ix], argvec );
+        stmt(IRStmt_Dirty(di));
+        putQReg(regD >> 1, mkexpr(res), IRTemp_INVALID);
+
+        DIP("%s.8 q%u, q%u, q%u\n",
+            inames[ix], regD >> 1, regN >> 1, regM >> 1);
+        return True;
+     }
+     /* fall through */
+   }
+
+   /* ----------- SHA1SU1, SHA256SU0 ----------- */
+   /*
+          31   27   23  21 19   15 11   7      3
+      T1: 1111 1111 1 D 11 1010 d  0011 10 M 0 m  SHA1SU1 Qd, Qm
+      A1: 1111 0011 ----------------------------
+
+      T1: 1111 1111 1 D 11 1010 d  0011 11 M 0 m  SHA256SU0 Qd, Qm
+      A1: 1111 0011 ----------------------------
+
+      Same comments about conditionalisation as for the AES group above apply.
+   */
+   {
+     Bool gate = False;
+
+     UInt hi9 = isT ? BITS9(1,1,1,1,1,1,1,1,1) : BITS9(1,1,1,1,0,0,1,1,1);
+     if (INSN(31,23) == hi9 && INSN(21,16) == BITS6(1,1,1,0,1,0)
+         && INSN(11,7) == BITS5(0,0,1,1,1) && INSN(4,4) == 0) {
+        gate = True;
+     }
+
+     UInt regD = (INSN(22,22) << 4) | INSN(15,12);
+     UInt regM = (INSN(5,5)   << 4) | INSN(3,0);
+     if ((regD & 1) == 1 || (regM & 1) == 1)
+        gate = False;
+
+     Bool is_1SU1 = INSN(6,6) == 0;
+
+     if (gate) {
+        const HChar* iname
+           = is_1SU1 ? "sha1su1" : "sha256su0";
+        void (*helper)(V128*,UInt,UInt,UInt,UInt,UInt,UInt,UInt,UInt)
+           = is_1SU1 ? &armg_dirtyhelper_SHA1SU1
+                     : &armg_dirtyhelper_SHA256SU0;
+        const HChar* hname
+           = is_1SU1 ? "armg_dirtyhelper_SHA1SU1"
+                     : "armg_dirtyhelper_SHA256SU0";
+
+        if (isT) {
+           gen_SIGILL_T_if_in_ITBlock(old_itstate, new_itstate);
+        }
+
+        IRTemp vD = newTemp(Ity_V128);
+        IRTemp vM = newTemp(Ity_V128);
+        assign(vD,  getQReg(regD >> 1));
+        assign(vM,  getQReg(regM >> 1));
+
+        IRTemp d32_3, d32_2, d32_1, d32_0;
+        d32_3 = d32_2 = d32_1 = d32_0 = IRTemp_INVALID;
+        breakupV128to32s( vD, &d32_3, &d32_2, &d32_1, &d32_0 );
+
+        IRTemp m32_3, m32_2, m32_1, m32_0;
+        m32_3 = m32_2 = m32_1 = m32_0 = IRTemp_INVALID;
+        breakupV128to32s( vM, &m32_3, &m32_2, &m32_1, &m32_0 );
+
+        IRExpr** argvec
+           = mkIRExprVec_9(
+                IRExpr_VECRET(),
+                mkexpr(d32_3), mkexpr(d32_2), mkexpr(d32_1), mkexpr(d32_0), 
+                mkexpr(m32_3), mkexpr(m32_2), mkexpr(m32_1), mkexpr(m32_0)
+             );
+
+        IRTemp res = newTemp(Ity_V128);
+        IRDirty* di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                         hname, helper, argvec );
+        stmt(IRStmt_Dirty(di));
+        putQReg(regD >> 1, mkexpr(res), IRTemp_INVALID);
+
+        DIP("%s.8 q%u, q%u\n", iname, regD >> 1, regM >> 1);
+        return True;
+     }
+     /* fall through */
+   }
+
+   /* ----------- SHA1H ----------- */
+   /*
+          31   27   23  21 19   15 11   7      3
+      T1: 1111 1111 1 D 11 1001 d  0010 11 M 0 m  SHA1H Qd, Qm
+      A1: 1111 0011 ----------------------------
+
+      Same comments about conditionalisation as for the AES group above apply.
+   */
+   {
+     Bool gate = False;
+
+     UInt hi9 = isT ? BITS9(1,1,1,1,1,1,1,1,1) : BITS9(1,1,1,1,0,0,1,1,1);
+     if (INSN(31,23) == hi9 && INSN(21,16) == BITS6(1,1,1,0,0,1)
+         && INSN(11,6) == BITS6(0,0,1,0,1,1) && INSN(4,4) == 0) {
+        gate = True;
+     }
+
+     UInt regD = (INSN(22,22) << 4) | INSN(15,12);
+     UInt regM = (INSN(5,5)   << 4) | INSN(3,0);
+     if ((regD & 1) == 1 || (regM & 1) == 1)
+        gate = False;
+
+     if (gate) {
+        const HChar* iname = "sha1h";
+        void (*helper)(V128*,UInt,UInt,UInt,UInt) = &armg_dirtyhelper_SHA1H;
+        const HChar* hname                        = "armg_dirtyhelper_SHA1H";
+
+        if (isT) {
+           gen_SIGILL_T_if_in_ITBlock(old_itstate, new_itstate);
+        }
+
+        IRTemp vM = newTemp(Ity_V128);
+        assign(vM,  getQReg(regM >> 1));
+
+        IRTemp m32_3, m32_2, m32_1, m32_0;
+        m32_3 = m32_2 = m32_1 = m32_0 = IRTemp_INVALID;
+        breakupV128to32s( vM, &m32_3, &m32_2, &m32_1, &m32_0 );
+        /* m32_3, m32_2, m32_1 are just abandoned.  No harm; iropt will
+           remove them. */
+
+        IRExpr*  zero   = mkU32(0);
+        IRExpr** argvec = mkIRExprVec_5(IRExpr_VECRET(),
+                                        zero, zero, zero, mkexpr(m32_0));
+
+        IRTemp res = newTemp(Ity_V128);
+        IRDirty* di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                         hname, helper, argvec );
+        stmt(IRStmt_Dirty(di));
+        putQReg(regD >> 1, mkexpr(res), IRTemp_INVALID);
+
+        DIP("%s.8 q%u, q%u\n", iname, regD >> 1, regM >> 1);
+        return True;
+     }
+     /* fall through */
+   }
+
+   /* ----------- VMULL.P64 ----------- */
+   /*
+          31   27   23  21 19 15 11   7       3
+      T2: 1110 1111 1 D 10 n  d  1110 N 0 M 0 m
+      A2: 1111 0010 -------------------------
+
+      The ARM documentation is pretty difficult to follow here.
+      Same comments about conditionalisation as for the AES group above apply.
+   */
+   {
+     Bool gate = False;
+
+     UInt hi9 = isT ? BITS9(1,1,1,0,1,1,1,1,1) : BITS9(1,1,1,1,0,0,1,0,1);
+     if (INSN(31,23) == hi9 && INSN(21,20) == BITS2(1,0)
+         && INSN(11,8) == BITS4(1,1,1,0)
+         && INSN(6,6) == 0 && INSN(4,4) == 0) {
+        gate = True;
+     }
+
+     UInt regN = (INSN(7,7)   << 4)  | INSN(19,16);
+     UInt regD = (INSN(22,22) << 4)  | INSN(15,12);
+     UInt regM = (INSN(5,5)   << 4)  | INSN(3,0);
+
+     if ((regD & 1) == 1)
+        gate = False;
+
+     if (gate) {
+        const HChar* iname = "vmull";
+        void (*helper)(V128*,UInt,UInt,UInt,UInt) = &armg_dirtyhelper_VMULLP64;
+        const HChar* hname                        = "armg_dirtyhelper_VMULLP64";
+
+        if (isT) {
+           gen_SIGILL_T_if_in_ITBlock(old_itstate, new_itstate);
+        }
+
+        IRTemp srcN = newTemp(Ity_I64);
+        IRTemp srcM = newTemp(Ity_I64);
+        assign(srcN, getDRegI64(regN));
+        assign(srcM, getDRegI64(regM));
+        
+        IRExpr** argvec = mkIRExprVec_5(IRExpr_VECRET(),
+                                        unop(Iop_64HIto32, mkexpr(srcN)),
+                                        unop(Iop_64to32,   mkexpr(srcN)),
+                                        unop(Iop_64HIto32, mkexpr(srcM)),
+                                        unop(Iop_64to32, mkexpr(srcM)));
+
+        IRTemp res = newTemp(Ity_V128);
+        IRDirty* di = unsafeIRDirty_1_N( res, 0/*regparms*/,
+                                         hname, helper, argvec );
+        stmt(IRStmt_Dirty(di));
+        putQReg(regD >> 1, mkexpr(res), IRTemp_INVALID);
+
+        DIP("%s.p64 q%u, q%u, w%u\n", iname, regD >> 1, regN, regM);
+        return True;
+     }
+     /* fall through */
+   }
+
+   /* ----------- LDA{,B,H}, STL{,B,H} ----------- */
+   /*     31   27   23   19   15 11   7    3
+      A1: cond 0001 1001  n    t 1100 1001 1111  LDA  Rt, [Rn]
+      A1: cond 0001 1111  n    t 1100 1001 1111  LDAH Rt, [Rn]
+      A1: cond 0001 1101  n    t 1100 1001 1111  LDAB Rt, [Rn]
+
+      A1: cond 0001 1000  n 1111 1100 1001    t  STL  Rt, [Rn]
+      A1: cond 0001 1110  n 1111 1100 1001    t  STLH Rt, [Rn]
+      A1: cond 0001 1100  n 1111 1100 1001    t  STLB Rt, [Rn]
+
+      T1: 1110 1000 1101  n    t 1111 1010 1111  LDA  Rt, [Rn]
+      T1: 1110 1000 1101  n    t 1111 1001 1111  LDAH Rt, [Rn]
+      T1: 1110 1000 1101  n    t 1111 1000 1111  LDAB Rt, [Rn]
+
+      T1: 1110 1000 1100  n    t 1111 1010 1111  STL  Rt, [Rn]
+      T1: 1110 1000 1100  n    t 1111 1001 1111  STLH Rt, [Rn]
+      T1: 1110 1000 1100  n    t 1111 1000 1111  STLB Rt, [Rn]
+   */
+   {
+     UInt nn     = 16; // invalid
+     UInt tt     = 16; // invalid
+     UInt szBlg2 = 4;  // invalid
+     Bool isLoad = False;
+     Bool gate   = False;
+     if (isT) {
+        if (INSN(31,21) == BITS11(1,1,1,0,1,0,0,0,1,1,0)
+            && INSN(11,6) == BITS6(1,1,1,1,1,0)
+            && INSN(3,0) == BITS4(1,1,1,1)) {
+           nn     = INSN(19,16);
+           tt     = INSN(15,12);
+           isLoad = INSN(20,20) == 1;
+           szBlg2 = INSN(5,4); // 00:B 01:H 10:W 11:invalid
+           gate   = szBlg2 != BITS2(1,1) && tt != 15 && nn != 15;
+        }
+     } else {
+        if (INSN(27,23) == BITS5(0,0,0,1,1) && INSN(20,20) == 1
+            && INSN(11,0) == BITS12(1,1,0,0,1,0,0,1,1,1,1,1)) {
+           nn     = INSN(19,16);
+           tt     = INSN(15,12);
+           isLoad = True;
+           szBlg2     = INSN(22,21); // 10:B 11:H 00:W 01:invalid
+           gate   = szBlg2 != BITS2(0,1) && tt != 15 && nn != 15;
+        }
+        else
+        if (INSN(27,23) == BITS5(0,0,0,1,1) && INSN(20,20) == 0
+            && INSN(15,4) == BITS12(1,1,1,1,1,1,0,0,1,0,0,1)) {
+           nn     = INSN(19,16);
+           tt     = INSN(3,0);
+           isLoad = False;
+           szBlg2     = INSN(22,21);  // 10:B 11:H 00:W 01:invalid
+           gate   = szBlg2 != BITS2(0,1) && tt != 15 && nn != 15;
+        }
+        if (gate) {
+           // Rearrange szBlg2 bits to be the same as the Thumb case
+           switch (szBlg2) {
+              case 2: szBlg2 = 0; break;
+              case 3: szBlg2 = 1; break;
+              case 0: szBlg2 = 2; break;
+              default: /*NOTREACHED*/vassert(0);
+           }
+        }
+     }
+     // For both encodings, the instruction is guarded by condT, which
+     // is passed in by the caller.  Note that the loads and stores
+     // are conditional, so we don't have to truncate the IRSB at this
+     // point, but the fence is unconditional.  There's no way to
+     // represent a conditional fence without a side exit, but it
+     // doesn't matter from a correctness standpoint that it is
+     // unconditional -- it just loses a bit of performance in the
+     // case where the condition doesn't hold.
+     if (gate) {
+        vassert(szBlg2 <= 2 && nn <= 14 && tt <= 14);
+        IRExpr* ea = llGetIReg(nn);
+        if (isLoad) {
+           static IRLoadGOp cvt[3]
+              = { ILGop_8Uto32, ILGop_16Uto32, ILGop_Ident32 };
+           IRTemp data = newTemp(Ity_I32);
+           loadGuardedLE(data, cvt[szBlg2], ea, mkU32(0)/*alt*/, condT);
+           if (isT) {
+              putIRegT(tt, mkexpr(data), condT);
+           } else {
+              putIRegA(tt, mkexpr(data), condT, Ijk_INVALID);
+           }
+           stmt(IRStmt_MBE(Imbe_Fence));
+        } else {
+           stmt(IRStmt_MBE(Imbe_Fence));
+           IRExpr* data = llGetIReg(tt);
+           switch (szBlg2) {
+              case 0: data = unop(Iop_32to8,  data); break;
+              case 1: data = unop(Iop_32to16, data); break;
+              case 2: break;
+              default: vassert(0);
+           }
+           storeGuardedLE(ea, data, condT);
+        }
+        const HChar* ldNames[3] = { "ldab", "ldah", "lda" };
+        const HChar* stNames[3] = { "stlb", "stlh", "stl" };
+        DIP("%s r%u, [r%u]", (isLoad ? ldNames : stNames)[szBlg2], tt, nn);
+        return True;
+     }
+     /* else fall through */
+   }
+
+   /* ----------- LDAEX{,B,H,D}, STLEX{,B,H,D} ----------- */
+   /*     31   27   23   19 15 11   7    3
+      A1: cond 0001 1101 n  t  1110 1001 1111  LDAEXB Rt, [Rn]
+      A1: cond 0001 1111 n  t  1110 1001 1111  LDAEXH Rt, [Rn]
+      A1: cond 0001 1001 n  t  1110 1001 1111  LDAEX  Rt, [Rn]
+      A1: cond 0001 1011 n  t  1110 1001 1111  LDAEXD Rt, Rt+1, [Rn]
+
+      A1: cond 0001 1100 n  d  1110 1001 t     STLEXB Rd, Rt, [Rn]
+      A1: cond 0001 1110 n  d  1110 1001 t     STLEXH Rd, Rt, [Rn]
+      A1: cond 0001 1000 n  d  1110 1001 t     STLEX  Rd, Rt, [Rn]
+      A1: cond 0001 1010 n  d  1110 1001 t     STLEXD Rd, Rt, Rt+1, [Rn]
+
+          31  28   24    19 15 11   7    3
+      T1: 111 0100 01101 n  t  1111 1100 1111  LDAEXB Rt, [Rn]
+      T1: 111 0100 01101 n  t  1111 1101 1111  LDAEXH Rt, [Rn]
+      T1: 111 0100 01101 n  t  1111 1110 1111  LDAEX  Rt, [Rn]
+      T1: 111 0100 01101 n  t  t2   1111 1111  LDAEXD Rt, Rt2, [Rn]
+
+      T1: 111 0100 01100 n  t  1111 1100 d     STLEXB Rd, Rt, [Rn]
+      T1: 111 0100 01100 n  t  1111 1101 d     STLEXH Rd, Rt, [Rn]
+      T1: 111 0100 01100 n  t  1111 1110 d     STLEX  Rd, Rt, [Rn]
+      T1: 111 0100 01100 n  t  t2   1111 d     STLEXD Rd, Rt, Rt2, [Rn]
+   */
+   {
+     UInt nn     = 16; // invalid
+     UInt tt     = 16; // invalid
+     UInt tt2    = 16; // invalid
+     UInt dd     = 16; // invalid
+     UInt szBlg2 = 4;  // invalid
+     Bool isLoad = False;
+     Bool gate   = False;
+     if (isT) {
+        if (INSN(31,21) == BITS11(1,1,1,0,1,0,0,0,1,1,0)
+            && INSN(7,6) == BITS2(1,1)) {
+           isLoad = INSN(20,20) == 1;
+           nn     = INSN(19,16);
+           tt     = INSN(15,12);
+           tt2    = INSN(11,8);
+           szBlg2 = INSN(5,4);
+           dd     = INSN(3,0);
+           gate   = True;
+           if (szBlg2 < BITS2(1,1) && tt2 != BITS4(1,1,1,1)) gate = False;
+           if (isLoad && dd != BITS4(1,1,1,1)) gate = False;
+           // reset unused register values to invalid
+           if (szBlg2 < BITS2(1,1)) tt2 = 16;
+           if (isLoad) dd = 16;
+        }
+     } else {
+        /* ARM encoding.  Do the load and store cases separately as
+           the register numbers are in different places and a combined decode
+           is too confusing. */
+        if (INSN(27,23) == BITS5(0,0,0,1,1) && INSN(20,20) == 1
+            && INSN(11,0) == BITS12(1,1,1,0,1,0,0,1,1,1,1,1)) {
+           szBlg2 = INSN(22,21);
+           isLoad = True;
+           nn     = INSN(19,16);
+           tt     = INSN(15,12);
+           gate   = True;
+        }
+        else
+        if (INSN(27,23) == BITS5(0,0,0,1,1) && INSN(20,20) == 0
+            && INSN(11,4) == BITS8(1,1,1,0,1,0,0,1)) {
+           szBlg2 = INSN(22,21);
+           isLoad = False;
+           nn     = INSN(19,16);
+           dd     = INSN(15,12);
+           tt     = INSN(3,0);
+           gate   = True;
+        }
+        if (gate) {
+           // Rearrange szBlg2 bits to be the same as the Thumb case
+           switch (szBlg2) {
+              case 2: szBlg2 = 0; break;
+              case 3: szBlg2 = 1; break;
+              case 0: szBlg2 = 2; break;
+              case 1: szBlg2 = 3; break;
+              default: /*NOTREACHED*/vassert(0);
+           }
+        }
+     }
+     // Perform further checks on register numbers
+     if (gate) {
+        /**/ if (isT && isLoad) {
+           // Thumb load
+           if (szBlg2 < 3) {
+              if (! (tt != 13 && tt != 15 && nn != 15)) gate = False;
+           } else {
+              if (! (tt != 13 && tt != 15 && tt2 != 13 && tt2 != 15
+                     && tt != tt2 && nn != 15)) gate = False;
+           }
+        }
+        else if (isT && !isLoad) {
+           // Thumb store
+           if (szBlg2 < 3) {
+              if (! (dd != 13 && dd != 15 && tt != 13 && tt != 15
+                     && nn != 15 && dd != nn && dd != tt)) gate = False;
+           } else {
+              if (! (dd != 13 && dd != 15 && tt != 13 && tt != 15
+                     && tt2 != 13 && tt2 != 15 && nn != 15 && dd != nn
+                     && dd != tt && dd != tt2)) gate = False;
+           }
+        }
+        else if (!isT && isLoad) {
+           // ARM Load
+           if (szBlg2 < 3) {
+              if (! (tt != 15 && nn != 15)) gate = False;
+           } else {
+              if (! ((tt & 1) == 0 && tt != 14 && nn != 15)) gate = False;
+              vassert(tt2 == 16/*invalid*/);
+              tt2 = tt + 1;
+           }
+        }
+        else if (!isT && !isLoad) {
+           // ARM Store
+           if (szBlg2 < 3) {
+              if (! (dd != 15 && tt != 15 && nn != 15
+                     && dd != nn && dd != tt)) gate = False;
+           } else {
+              if (! (dd != 15 && (tt & 1) == 0 && tt != 14 && nn != 15
+                     && dd != nn && dd != tt && dd != tt+1)) gate = False;
+              vassert(tt2 == 16/*invalid*/);
+              tt2 = tt + 1;
+           }
+        }
+        else /*NOTREACHED*/vassert(0);
+     }
+     // Paranoia ..
+     vassert(szBlg2 <= 3);
+     if (szBlg2 < 3) { vassert(tt2 == 16/*invalid*/); }
+                else { vassert(tt2 <= 14); }
+     if (isLoad) { vassert(dd == 16/*invalid*/); }
+            else { vassert(dd <= 14); }
+     // If we're still good even after all that, generate the IR.
+     if (gate) {
+        /* First, go unconditional.  Staying in-line is too complex. */
+        if (isT) {
+           vassert(condT != IRTemp_INVALID);
+           mk_skip_over_T32_if_cond_is_false( condT );
+        } else {
+           if (condT != IRTemp_INVALID) {
+              mk_skip_over_A32_if_cond_is_false( condT );
+              condT = IRTemp_INVALID;
+           }
+        }
+        /* Now the load or store. */
+        IRType ty = Ity_INVALID; /* the type of the transferred data */
+        const HChar* nm = NULL;
+        switch (szBlg2) {
+           case 0: nm = "b"; ty = Ity_I8;  break;
+           case 1: nm = "h"; ty = Ity_I16; break;
+           case 2: nm = "";  ty = Ity_I32; break;
+           case 3: nm = "d"; ty = Ity_I64; break;
+           default: vassert(0);
+        }
+        IRExpr* ea = isT ? getIRegT(nn) : getIRegA(nn);
+        if (isLoad) {
+           // LOAD.  Transaction, then fence.
+           IROp widen = Iop_INVALID;
+           switch (szBlg2) {
+              case 0: widen = Iop_8Uto32;  break;
+              case 1: widen = Iop_16Uto32; break;
+              case 2: case 3: break;
+              default: vassert(0);
+           }
+           IRTemp  res = newTemp(ty);
+           // FIXME: assumes little-endian guest
+           stmt( IRStmt_LLSC(Iend_LE, res, ea, NULL/*this is a load*/) );
+
+#          define PUT_IREG(_nnz, _eez) \
+              do { vassert((_nnz) <= 14); /* no writes to the PC */ \
+                   if (isT) { putIRegT((_nnz), (_eez), IRTemp_INVALID); } \
+                       else { putIRegA((_nnz), (_eez), \
+                              IRTemp_INVALID, Ijk_Boring); } } while(0)
+           if (ty == Ity_I64) {
+              // FIXME: assumes little-endian guest
+              PUT_IREG(tt,  unop(Iop_64to32, mkexpr(res)));
+              PUT_IREG(tt2, unop(Iop_64HIto32, mkexpr(res)));
+           } else {
+              PUT_IREG(tt, widen == Iop_INVALID
+                              ? mkexpr(res) : unop(widen, mkexpr(res)));
+           }
+           stmt(IRStmt_MBE(Imbe_Fence));
+           if (ty == Ity_I64) {
+              DIP("ldrex%s%s r%u, r%u, [r%u]\n",
+                  nm, isT ? "" : nCC(conq), tt, tt2, nn);
+           } else {
+              DIP("ldrex%s%s r%u, [r%u]\n", nm, isT ? "" : nCC(conq), tt, nn);
+           }
+#          undef PUT_IREG
+        } else {
+           // STORE.  Fence, then transaction.
+           IRTemp resSC1, resSC32, data;
+           IROp   narrow = Iop_INVALID;
+           switch (szBlg2) {
+              case 0: narrow = Iop_32to8; break;
+              case 1: narrow = Iop_32to16; break;
+              case 2: case 3: break;
+              default: vassert(0);
+           }
+           stmt(IRStmt_MBE(Imbe_Fence));
+           data = newTemp(ty);
+#          define GET_IREG(_nnz) (isT ? getIRegT(_nnz) : getIRegA(_nnz))
+           assign(data,
+                  ty == Ity_I64
+                     // FIXME: assumes little-endian guest
+                     ? binop(Iop_32HLto64, GET_IREG(tt2), GET_IREG(tt))
+                     : narrow == Iop_INVALID
+                        ? GET_IREG(tt)
+                        : unop(narrow, GET_IREG(tt)));
+#          undef GET_IREG
+           resSC1 = newTemp(Ity_I1);
+           // FIXME: assumes little-endian guest
+           stmt( IRStmt_LLSC(Iend_LE, resSC1, ea, mkexpr(data)) );
+
+           /* Set rDD to 1 on failure, 0 on success.  Currently we have
+              resSC1 == 0 on failure, 1 on success. */
+           resSC32 = newTemp(Ity_I32);
+           assign(resSC32,
+                  unop(Iop_1Uto32, unop(Iop_Not1, mkexpr(resSC1))));
+           vassert(dd <= 14); /* no writes to the PC */
+           if (isT) {
+              putIRegT(dd, mkexpr(resSC32), IRTemp_INVALID);
+           } else {
+              putIRegA(dd, mkexpr(resSC32), IRTemp_INVALID, Ijk_Boring);
+           }
+           if (ty == Ity_I64) {
+              DIP("strex%s%s r%u, r%u, r%u, [r%u]\n",
+                  nm, isT ? "" : nCC(conq), dd, tt, tt2, nn);
+           } else {
+              DIP("strex%s%s r%u, r%u, [r%u]\n",
+                  nm, isT ? "" : nCC(conq), dd, tt, nn);
+           }
+        } /* if (isLoad) */
+        return True;
+     } /* if (gate) */
+     /* else fall through */
+   }
+
+   /* ---------- Doesn't match anything. ---------- */
+   return False;
+
+#  undef INSN
+}
+
+
+/*------------------------------------------------------------*/
 /*--- LDMxx/STMxx helper (both ARM and Thumb32)            ---*/
 /*------------------------------------------------------------*/
 
@@ -14341,7 +15234,7 @@
 
       IRExpr* rm     = mkU32(Irrm_NEAREST);
       IRTemp  scale  = newTemp(Ity_F64);
-      assign(scale, unop(Iop_I32UtoF64, mkU32( 1 << (frac_bits-1) )));
+      assign(scale, unop(Iop_I32UtoF64, mkU32( ((UInt)1) << (frac_bits-1) )));
 
       if (frac_bits >= 1 && frac_bits <= 32 && !to_fixed && !dp_op
                                             && size == 32) {
@@ -14450,10 +15343,12 @@
    *dres may or may not be updated.  If failure, returns False and
    doesn't change *dres nor create any IR.
 
-   Note that all NEON instructions (in ARM mode) are handled through
-   here, since they are all in NV space.
+   Note that all NEON instructions (in ARM mode) up to and including
+   ARMv7, but not later, are handled through here, since they are all
+   in NV space.
 */
-static Bool decode_NV_instruction ( /*MOD*/DisResult* dres,
+static Bool decode_NV_instruction_ARMv7_and_below
+                                 ( /*MOD*/DisResult* dres,
                                     const VexArchInfo* archinfo,
                                     UInt insn )
 {
@@ -14516,8 +15411,9 @@
    // and set CPSR.T = 1, that is, switch to Thumb mode
    if (INSN(31,25) == BITS7(1,1,1,1,1,0,1)) {
       UInt bitH   = INSN(24,24);
-      Int  uimm24 = INSN(23,0);
-      Int  simm24 = (((uimm24 << 8) >> 8) << 2) + (bitH << 1);
+      UInt uimm24 = INSN(23,0);   uimm24 <<= 8;
+      Int  simm24 = (Int)uimm24;  simm24 >>= 8;
+      simm24 = (((UInt)simm24) << 2) + (bitH << 1);
       /* Now this is a bit tricky.  Since we're decoding an ARM insn,
          it is implied that CPSR.T == 0.  Hence the current insn's
          address is guaranteed to be of the form X--(30)--X00.  So, no
@@ -14579,7 +15475,7 @@
 
    /* ------------------- NEON ------------------- */
    if (archinfo->hwcaps & VEX_HWCAPS_ARM_NEON) {
-      Bool ok_neon = decode_NEON_instruction(
+      Bool ok_neon = decode_NEON_instruction_ARMv7_and_below(
                         dres, insn, IRTemp_INVALID/*unconditional*/, 
                         False/*!isT*/
                      );
@@ -14621,16 +15517,10 @@
 
    DisResult dres;
    UInt      insn;
-   //Bool      allow_VFP = False;
-   //UInt      hwcaps = archinfo->hwcaps;
    IRTemp    condT; /* :: Ity_I32 */
    UInt      summary;
    HChar     dis_buf[128];  // big enough to hold LDMIA etc text
 
-   /* What insn variants are we supporting today? */
-   //allow_VFP  = (0 != (hwcaps & VEX_HWCAPS_ARM_VFP));
-   // etc etc
-
    /* Set result defaults. */
    dres.whatNext    = Dis_Continue;
    dres.len         = 4;
@@ -14745,11 +15635,12 @@
       case ARMCondNV: {
          // Illegal instruction prior to v5 (see ARM ARM A3-5), but
          // some cases are acceptable
-         Bool ok = decode_NV_instruction(&dres, archinfo, insn);
+         Bool ok
+            = decode_NV_instruction_ARMv7_and_below(&dres, archinfo, insn);
          if (ok)
             goto decode_success;
          else
-            goto decode_failure;
+            goto after_v7_decoder;
       }
       case ARMCondAL: // Always executed
          break;
@@ -15558,10 +16449,9 @@
    //
    if (BITS8(1,0,1,0,0,0,0,0) == (INSN(27,20) & BITS8(1,1,1,0,0,0,0,0))) {
       UInt link   = (insn >> 24) & 1;
-      UInt uimm24 = insn & ((1<<24)-1);
-      Int  simm24 = (Int)uimm24;
-      UInt dst    = guest_R15_curr_instr_notENC + 8
-                    + (((simm24 << 8) >> 8) << 2);
+      UInt uimm24 = insn & ((1<<24)-1);  uimm24 <<= 8;
+      Int  simm24 = (Int)uimm24;         simm24 >>= 8;
+      UInt dst    = guest_R15_curr_instr_notENC + 8 + (((UInt)simm24) << 2);
       IRJumpKind jk = link ? Ijk_Call : Ijk_Boring;
       if (link) {
          putIRegA(14, mkU32(guest_R15_curr_instr_notENC + 4),
@@ -15679,7 +16569,7 @@
    }
 
    /* --- NB: ARM interworking branches are in NV space, hence
-      are handled elsewhere by decode_NV_instruction.
+      are handled elsewhere by decode_NV_instruction_ARMv7_and_below.
       ---
    */
 
@@ -16341,7 +17231,7 @@
          IRTemp src    = newTemp(Ity_I32);
          IRTemp olddst = newTemp(Ity_I32);
          IRTemp newdst = newTemp(Ity_I32);
-         UInt   mask = 1 << (msb - lsb);
+         UInt   mask   = ((UInt)1) << (msb - lsb);
          mask = (mask - 1) + mask;
          vassert(mask != 0); // guaranteed by "msb < lsb" check above
          mask <<= lsb;
@@ -16541,15 +17431,17 @@
         ignore alignment issues for the time being. */
 
      /* For almost all cases, we do the writeback after the transfers.
-        However, that leaves the stack "uncovered" in this case:
+        However, that leaves the stack "uncovered" in cases like:
            strd    rD, [sp, #-8]
+           strd    rD, [sp, #-16]
         In which case, do the writeback to SP now, instead of later.
         This is bad in that it makes the insn non-restartable if the
         accesses fault, but at least keeps Memcheck happy. */
      Bool writeback_already_done = False;
      if (bS == 1 /*store*/ && summary == (2 | 16)
          && rN == 13 && rN != rD && rN != rD+1
-         && bU == 0/*minus*/) {
+         && bU == 0/*minus*/
+         && (imm8 == 8 || imm8 == 16)) {
         putIRegA( rN, mkexpr(eaT), condT, Ijk_Boring );
         writeback_already_done = True;
      }
@@ -17335,7 +18227,8 @@
    /* ----------------------------------------------------------- */
 
    /* These are all in NV space, and so are taken care of (far) above,
-      by a call from this function to decode_NV_instruction(). */
+      by a call from this function to
+      decode_NV_instruction_ARMv7_and_below(). */
 
    /* ----------------------------------------------------------- */
    /* -- v6 media instructions (in ARM mode)                   -- */
@@ -17350,6 +18243,24 @@
    }
 
    /* ----------------------------------------------------------- */
+   /* -- v8 instructions (in ARM mode)                         -- */
+   /* ----------------------------------------------------------- */
+
+  after_v7_decoder:
+
+   /* If we get here, it means that all attempts to decode the
+      instruction as ARMv7 or earlier have failed.  So, if we're doing
+      ARMv8 or later, here is the point to try for it. */
+
+   if (VEX_ARM_ARCHLEVEL(archinfo->hwcaps) >= 8) {
+      Bool ok_v8
+         = decode_V8_instruction( &dres, insn, condT, False/*!isT*/,
+                                  IRTemp_INVALID, IRTemp_INVALID );
+      if (ok_v8)
+         goto decode_success;
+   }
+
+   /* ----------------------------------------------------------- */
    /* -- Undecodable                                           -- */
    /* ----------------------------------------------------------- */
 
@@ -17491,18 +18402,12 @@
    DisResult dres;
    UShort    insn0; /*  first 16 bits of the insn */
    UShort    insn1; /* second 16 bits of the insn */
-   //Bool      allow_VFP = False;
-   //UInt      hwcaps = archinfo->hwcaps;
    HChar     dis_buf[128];  // big enough to hold LDMIA etc text
 
    /* Summary result of the ITxxx backwards analysis: False == safe
       but suboptimal. */
    Bool guaranteedUnconditional = False;
 
-   /* What insn variants are we supporting today? */
-   //allow_VFP  = (0 != (hwcaps & VEX_HWCAPS_ARM_VFP));
-   // etc etc
-
    /* Set result defaults. */
    dres.whatNext    = Dis_Continue;
    dres.len         = 2;
@@ -19142,8 +20047,8 @@
 
    case BITS5(1,1,1,0,0): {
       /* ---------------- B #simm11 ---------------- */
-      Int  simm11 = INSN0(10,0);
-           simm11 = (simm11 << 21) >> 20;
+      UInt uimm11 = INSN0(10,0);  uimm11 <<= 21;
+      Int  simm11 = (Int)uimm11;  simm11 >>= 20;
       UInt dst    = simm11 + guest_R15_curr_instr_notENC + 4;
       /* Only allowed outside or last-in IT block; SIGILL if not so. */
       gen_SIGILL_T_if_in_but_NLI_ITBlock(old_itstate, new_itstate);
@@ -19172,8 +20077,8 @@
    case BITS4(1,1,0,1): {
       /* ---------------- Bcond #simm8 ---------------- */
       UInt cond  = INSN0(11,8);
-      Int  simm8 = INSN0(7,0);
-           simm8 = (simm8 << 24) >> 23;
+      UInt uimm8 = INSN0(7,0);  uimm8 <<= 24;
+      Int  simm8 = (Int)uimm8;  simm8 >>= 23;
       UInt dst   = simm8 + guest_R15_curr_instr_notENC + 4;
       if (cond != ARMCondAL && cond != ARMCondNV) {
          /* Not allowed in an IT block; SIGILL if so. */
@@ -19261,13 +20166,15 @@
       UInt bJ2  = INSN1(11,11);
       UInt bI1  = 1 ^ (bJ1 ^ bS);
       UInt bI2  = 1 ^ (bJ2 ^ bS);
-      Int simm25
+      UInt uimm25
          =   (bS          << (1 + 1 + 10 + 11 + 1))
            | (bI1         << (1 + 10 + 11 + 1))
            | (bI2         << (10 + 11 + 1))
            | (INSN0(9,0)  << (11 + 1))
            | (INSN1(10,0) << 1);
-      simm25 = (simm25 << 7) >> 7;
+      uimm25 <<= 7;
+      Int simm25 = (Int)uimm25;
+      simm25 >>= 7;
 
       vassert(0 == (guest_R15_curr_instr_notENC & 1));
       UInt dst = simm25 + guest_R15_curr_instr_notENC + 4;
@@ -19665,10 +20572,10 @@
       UInt how  = INSN1(5,4);
 
       Bool valid = !isBadRegT(rD) && !isBadRegT(rN) && !isBadRegT(rM);
-      /* but allow "add.w reg, sp, reg, lsl #N for N=0,1,2 or 3
+      /* but allow "add.w reg, sp, reg, lsl #N for N=0..31
          (T3) "ADD (SP plus register) */
       if (!valid && INSN0(8,5) == BITS4(1,0,0,0) // add
-          && rD != 15 && rN == 13 && imm5 <= 3 && how == 0) {
+          && rD != 15 && rN == 13 && imm5 <= 31 && how == 0) {
          valid = True;
       }
       /* also allow "sub.w reg, sp, reg   lsl #N for N=0,1,2 or 3
@@ -20616,15 +21523,17 @@
          IRTemp transAddr = bP == 1 ? postAddr : preAddr;
 
          /* For almost all cases, we do the writeback after the transfers.
-            However, that leaves the stack "uncovered" in this case:
+            However, that leaves the stack "uncovered" in cases like:
                strd    rD, [sp, #-8]
+               strd    rD, [sp, #-16]
             In which case, do the writeback to SP now, instead of later.
             This is bad in that it makes the insn non-restartable if the
             accesses fault, but at least keeps Memcheck happy. */
          Bool writeback_already_done = False;
          if (bL == 0/*store*/ && bW == 1/*wb*/
              && rN == 13 && rN != rT && rN != rT2
-             && bU == 0/*minus*/ && (imm8 << 2) == 8) {
+             && bU == 0/*minus*/
+             && ((imm8 << 2) == 8 || (imm8 << 2) == 16)) {
             putIRegT(rN, mkexpr(postAddr), condT);
             writeback_already_done = True;
          }
@@ -20689,13 +21598,15 @@
        && INSN1(12,12) == 0) {
       UInt cond = INSN0(9,6);
       if (cond != ARMCondAL && cond != ARMCondNV) {
-         Int simm21
+         UInt uimm21
             =   (INSN0(10,10) << (1 + 1 + 6 + 11 + 1))
               | (INSN1(11,11) << (1 + 6 + 11 + 1))
               | (INSN1(13,13) << (6 + 11 + 1))
               | (INSN0(5,0)   << (11 + 1))
               | (INSN1(10,0)  << 1);
-         simm21 = (simm21 << 11) >> 11;
+         uimm21 <<= 11;
+         Int simm21 = (Int)uimm21;
+         simm21 >>= 11;
 
          vassert(0 == (guest_R15_curr_instr_notENC & 1));
          UInt dst = simm21 + guest_R15_curr_instr_notENC + 4;
@@ -20733,13 +21644,15 @@
          UInt bJ2 = INSN1(11,11);
          UInt bI1 = 1 ^ (bJ1 ^ bS);
          UInt bI2 = 1 ^ (bJ2 ^ bS);
-         Int simm25
+         UInt uimm25
             =   (bS          << (1 + 1 + 10 + 11 + 1))
               | (bI1         << (1 + 10 + 11 + 1))
               | (bI2         << (10 + 11 + 1))
               | (INSN0(9,0)  << (11 + 1))
               | (INSN1(10,0) << 1);
-         simm25 = (simm25 << 7) >> 7;
+         uimm25 <<= 7;
+         Int simm25 = (Int)uimm25;
+         simm25 >>= 7;
 
          vassert(0 == (guest_R15_curr_instr_notENC & 1));
          UInt dst = simm25 + guest_R15_curr_instr_notENC + 4;
@@ -21181,7 +22094,7 @@
          IRTemp src    = newTemp(Ity_I32);
          IRTemp olddst = newTemp(Ity_I32);
          IRTemp newdst = newTemp(Ity_I32);
-         UInt   mask = 1 << (msb - lsb);
+         UInt   mask   = ((UInt)1) << (msb - lsb);
          mask = (mask - 1) + mask;
          vassert(mask != 0); // guaranteed by "msb < lsb" check above
          mask <<= lsb;
@@ -21915,12 +22828,12 @@
    }
 
    /* ----------------------------------------------------------- */
-   /* -- NEON instructions (in Thumb mode)                     -- */
+   /* -- NEON instructions (only v7 and below, in Thumb mode)  -- */
    /* ----------------------------------------------------------- */
 
    if (archinfo->hwcaps & VEX_HWCAPS_ARM_NEON) {
       UInt insn32 = (INSN0(15,0) << 16) | INSN1(15,0);
-      Bool ok_neon = decode_NEON_instruction(
+      Bool ok_neon = decode_NEON_instruction_ARMv7_and_below(
                         &dres, insn32, condT, True/*isT*/
                      );
       if (ok_neon)
@@ -21941,6 +22854,23 @@
    }
 
    /* ----------------------------------------------------------- */
+   /* -- v8 instructions (in Thumb mode)                       -- */
+   /* ----------------------------------------------------------- */
+
+   /* If we get here, it means that all attempts to decode the
+      instruction as ARMv7 or earlier have failed.  So, if we're doing
+      ARMv8 or later, here is the point to try for it. */
+
+   if (VEX_ARM_ARCHLEVEL(archinfo->hwcaps) >= 8) {
+      UInt insn32 = (INSN0(15,0) << 16) | INSN1(15,0);
+      Bool ok_v8
+         = decode_V8_instruction( &dres, insn32, condT, True/*isT*/,
+                                  old_itstate, new_itstate );
+      if (ok_v8)
+         goto decode_success;
+   }
+
+   /* ----------------------------------------------------------- */
    /* -- Undecodable                                           -- */
    /* ----------------------------------------------------------- */
 
diff --git a/VEX/priv/guest_generic_x87.c b/VEX/priv/guest_generic_x87.c
index 2c9b25b..9f08352 100644
--- a/VEX/priv/guest_generic_x87.c
+++ b/VEX/priv/guest_generic_x87.c
@@ -795,10 +795,16 @@
       even if they would probably work.  Life is too short to have
       unvalidated cases in the code base. */
    switch (imm8) {
-      case 0x00: case 0x02: case 0x08: case 0x0A: case 0x0C: case 0x0E:
-      case 0x12: case 0x14: case 0x18: case 0x1A:
-      case 0x30: case 0x34: case 0x38: case 0x3A:
-      case 0x40: case 0x42: case 0x44: case 0x46: case 0x4A:
+      case 0x00: case 0x02:
+      case 0x08: case 0x0A: case 0x0C: case 0x0E:
+                 case 0x12: case 0x14:
+      case 0x18: case 0x1A:
+      case 0x30:            case 0x34:
+      case 0x38: case 0x3A:
+      case 0x40: case 0x42: case 0x44: case 0x46:
+                 case 0x4A:
+                 case 0x62:
+      case 0x70: case 0x72:
          break;
       default:
          return False;
@@ -1047,7 +1053,7 @@
       unvalidated cases in the code base. */
    switch (imm8) {
       case 0x01: case 0x03: case 0x09: case 0x0B: case 0x0D:
-      case 0x13:            case 0x1B:
+                 case 0x13: case 0x19: case 0x1B:
                             case 0x39: case 0x3B:
                  case 0x45:            case 0x4B:
          break;
diff --git a/VEX/priv/guest_mips_defs.h b/VEX/priv/guest_mips_defs.h
index f4c3bbc..60db3b5 100644
--- a/VEX/priv/guest_mips_defs.h
+++ b/VEX/priv/guest_mips_defs.h
@@ -93,11 +93,6 @@
    SUBS,     SUBD,    DIVS
 } flt_op;
 
-extern UInt mips32_dirtyhelper_mfc0 ( UInt rd, UInt sel );
-
-extern ULong mips64_dirtyhelper_dmfc0 ( UInt rd, UInt sel );
-
-
 #if defined(__mips__) && ((defined(__mips_isa_rev) && __mips_isa_rev >= 2))
 extern UInt mips32_dirtyhelper_rdhwr ( UInt rt, UInt rd );
 extern ULong mips64_dirtyhelper_rdhwr ( ULong rt, ULong rd );
diff --git a/VEX/priv/guest_mips_helpers.c b/VEX/priv/guest_mips_helpers.c
index 4fbb67e..07dccec 100644
--- a/VEX/priv/guest_mips_helpers.c
+++ b/VEX/priv/guest_mips_helpers.c
@@ -165,6 +165,8 @@
 
    vex_state->guest_COND = 0;
 
+   vex_state->guest_CP0_status = 0;
+
    /* MIPS32 DSP ASE(r2) specific registers */
    vex_state->guest_DSPControl = 0;   /* DSPControl register */
    vex_state->guest_ac0 = 0;          /* Accumulator 0 */
@@ -272,6 +274,8 @@
    vex_state->guest_NRADDR = 0;
 
    vex_state->guest_COND = 0;
+
+   vex_state->guest_CP0_status = MIPS_CP0_STATUS_FR;
 }
 
 /*-----------------------------------------------------------*/
@@ -414,668 +418,6 @@
                   }
 };
 
-#define ASM_VOLATILE_CASE(rd, sel) \
-         case rd: \
-            asm volatile ("mfc0 %0, $" #rd ", "#sel"\n\t" :"=r" (x) ); \
-            break;
-
-UInt mips32_dirtyhelper_mfc0(UInt rd, UInt sel)
-{
-   UInt x = 0;
-#if defined(__mips__) && ((defined(__mips_isa_rev) && __mips_isa_rev >= 2))
-   switch (sel) {
-      case 0:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 0);
-            ASM_VOLATILE_CASE(1, 0);
-            ASM_VOLATILE_CASE(2, 0);
-            ASM_VOLATILE_CASE(3, 0);
-            ASM_VOLATILE_CASE(4, 0);
-            ASM_VOLATILE_CASE(5, 0);
-            ASM_VOLATILE_CASE(6, 0);
-            ASM_VOLATILE_CASE(7, 0);
-            ASM_VOLATILE_CASE(8, 0);
-            ASM_VOLATILE_CASE(9, 0);
-            ASM_VOLATILE_CASE(10, 0);
-            ASM_VOLATILE_CASE(11, 0);
-            ASM_VOLATILE_CASE(12, 0);
-            ASM_VOLATILE_CASE(13, 0);
-            ASM_VOLATILE_CASE(14, 0);
-            ASM_VOLATILE_CASE(15, 0);
-            ASM_VOLATILE_CASE(16, 0);
-            ASM_VOLATILE_CASE(17, 0);
-            ASM_VOLATILE_CASE(18, 0);
-            ASM_VOLATILE_CASE(19, 0);
-            ASM_VOLATILE_CASE(20, 0);
-            ASM_VOLATILE_CASE(21, 0);
-            ASM_VOLATILE_CASE(22, 0);
-            ASM_VOLATILE_CASE(23, 0);
-            ASM_VOLATILE_CASE(24, 0);
-            ASM_VOLATILE_CASE(25, 0);
-            ASM_VOLATILE_CASE(26, 0);
-            ASM_VOLATILE_CASE(27, 0);
-            ASM_VOLATILE_CASE(28, 0);
-            ASM_VOLATILE_CASE(29, 0);
-            ASM_VOLATILE_CASE(30, 0);
-            ASM_VOLATILE_CASE(31, 0);
-         default:
-            break;
-         }
-         break;
-      case 1:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 1);
-            ASM_VOLATILE_CASE(1, 1);
-            ASM_VOLATILE_CASE(2, 1);
-            ASM_VOLATILE_CASE(3, 1);
-            ASM_VOLATILE_CASE(4, 1);
-            ASM_VOLATILE_CASE(5, 1);
-            ASM_VOLATILE_CASE(6, 1);
-            ASM_VOLATILE_CASE(7, 1);
-            ASM_VOLATILE_CASE(8, 1);
-            ASM_VOLATILE_CASE(9, 1);
-            ASM_VOLATILE_CASE(10, 1);
-            ASM_VOLATILE_CASE(11, 1);
-            ASM_VOLATILE_CASE(12, 1);
-            ASM_VOLATILE_CASE(13, 1);
-            ASM_VOLATILE_CASE(14, 1);
-            ASM_VOLATILE_CASE(15, 1);
-            ASM_VOLATILE_CASE(16, 1);
-            ASM_VOLATILE_CASE(17, 1);
-            ASM_VOLATILE_CASE(18, 1);
-            ASM_VOLATILE_CASE(19, 1);
-            ASM_VOLATILE_CASE(20, 1);
-            ASM_VOLATILE_CASE(21, 1);
-            ASM_VOLATILE_CASE(22, 1);
-            ASM_VOLATILE_CASE(23, 1);
-            ASM_VOLATILE_CASE(24, 1);
-            ASM_VOLATILE_CASE(25, 1);
-            ASM_VOLATILE_CASE(26, 1);
-            ASM_VOLATILE_CASE(27, 1);
-            ASM_VOLATILE_CASE(28, 1);
-            ASM_VOLATILE_CASE(29, 1);
-            ASM_VOLATILE_CASE(30, 1);
-            ASM_VOLATILE_CASE(31, 1);
-         default:
-            break;
-         }
-         break;
-      case 2:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 2);
-            ASM_VOLATILE_CASE(1, 2);
-            ASM_VOLATILE_CASE(2, 2);
-            ASM_VOLATILE_CASE(3, 1);
-            ASM_VOLATILE_CASE(4, 2);
-            ASM_VOLATILE_CASE(5, 2);
-            ASM_VOLATILE_CASE(6, 2);
-            ASM_VOLATILE_CASE(7, 2);
-            ASM_VOLATILE_CASE(8, 2);
-            ASM_VOLATILE_CASE(9, 2);
-            ASM_VOLATILE_CASE(10, 2);
-            ASM_VOLATILE_CASE(11, 2);
-            ASM_VOLATILE_CASE(12, 2);
-            ASM_VOLATILE_CASE(13, 2);
-            ASM_VOLATILE_CASE(14, 2);
-            ASM_VOLATILE_CASE(15, 2);
-            ASM_VOLATILE_CASE(16, 2);
-            ASM_VOLATILE_CASE(17, 2);
-            ASM_VOLATILE_CASE(18, 2);
-            ASM_VOLATILE_CASE(19, 2);
-            ASM_VOLATILE_CASE(20, 2);
-            ASM_VOLATILE_CASE(21, 2);
-            ASM_VOLATILE_CASE(22, 2);
-            ASM_VOLATILE_CASE(23, 2);
-            ASM_VOLATILE_CASE(24, 2);
-            ASM_VOLATILE_CASE(25, 2);
-            ASM_VOLATILE_CASE(26, 2);
-            ASM_VOLATILE_CASE(27, 2);
-            ASM_VOLATILE_CASE(28, 2);
-            ASM_VOLATILE_CASE(29, 2);
-            ASM_VOLATILE_CASE(30, 2);
-            ASM_VOLATILE_CASE(31, 2);
-         default:
-            break;
-         }
-         break;
-      case 3:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 3);
-            ASM_VOLATILE_CASE(1, 3);
-            ASM_VOLATILE_CASE(2, 3);
-            ASM_VOLATILE_CASE(3, 3);
-            ASM_VOLATILE_CASE(4, 3);
-            ASM_VOLATILE_CASE(5, 3);
-            ASM_VOLATILE_CASE(6, 3);
-            ASM_VOLATILE_CASE(7, 3);
-            ASM_VOLATILE_CASE(8, 3);
-            ASM_VOLATILE_CASE(9, 3);
-            ASM_VOLATILE_CASE(10, 3);
-            ASM_VOLATILE_CASE(11, 3);
-            ASM_VOLATILE_CASE(12, 3);
-            ASM_VOLATILE_CASE(13, 3);
-            ASM_VOLATILE_CASE(14, 3);
-            ASM_VOLATILE_CASE(15, 3);
-            ASM_VOLATILE_CASE(16, 3);
-            ASM_VOLATILE_CASE(17, 3);
-            ASM_VOLATILE_CASE(18, 3);
-            ASM_VOLATILE_CASE(19, 3);
-            ASM_VOLATILE_CASE(20, 3);
-            ASM_VOLATILE_CASE(21, 3);
-            ASM_VOLATILE_CASE(22, 3);
-            ASM_VOLATILE_CASE(23, 3);
-            ASM_VOLATILE_CASE(24, 3);
-            ASM_VOLATILE_CASE(25, 3);
-            ASM_VOLATILE_CASE(26, 3);
-            ASM_VOLATILE_CASE(27, 3);
-            ASM_VOLATILE_CASE(28, 3);
-            ASM_VOLATILE_CASE(29, 3);
-            ASM_VOLATILE_CASE(30, 3);
-            ASM_VOLATILE_CASE(31, 3);
-         default:
-            break;
-         }
-         break;
-      case 4:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 4);
-            ASM_VOLATILE_CASE(1, 4);
-            ASM_VOLATILE_CASE(2, 4);
-            ASM_VOLATILE_CASE(3, 4);
-            ASM_VOLATILE_CASE(4, 4);
-            ASM_VOLATILE_CASE(5, 4);
-            ASM_VOLATILE_CASE(6, 4);
-            ASM_VOLATILE_CASE(7, 4);
-            ASM_VOLATILE_CASE(8, 4);
-            ASM_VOLATILE_CASE(9, 4);
-            ASM_VOLATILE_CASE(10, 4);
-            ASM_VOLATILE_CASE(11, 4);
-            ASM_VOLATILE_CASE(12, 4);
-            ASM_VOLATILE_CASE(13, 4);
-            ASM_VOLATILE_CASE(14, 4);
-            ASM_VOLATILE_CASE(15, 4);
-            ASM_VOLATILE_CASE(16, 4);
-            ASM_VOLATILE_CASE(17, 4);
-            ASM_VOLATILE_CASE(18, 4);
-            ASM_VOLATILE_CASE(19, 4);
-            ASM_VOLATILE_CASE(20, 4);
-            ASM_VOLATILE_CASE(21, 4);
-            ASM_VOLATILE_CASE(22, 4);
-            ASM_VOLATILE_CASE(23, 4);
-            ASM_VOLATILE_CASE(24, 4);
-            ASM_VOLATILE_CASE(25, 4);
-            ASM_VOLATILE_CASE(26, 4);
-            ASM_VOLATILE_CASE(27, 4);
-            ASM_VOLATILE_CASE(28, 4);
-            ASM_VOLATILE_CASE(29, 4);
-            ASM_VOLATILE_CASE(30, 4);
-            ASM_VOLATILE_CASE(31, 4);
-         default:
-            break;
-         }
-         break;
-      case 5:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 5);
-            ASM_VOLATILE_CASE(1, 5);
-            ASM_VOLATILE_CASE(2, 5);
-            ASM_VOLATILE_CASE(3, 5);
-            ASM_VOLATILE_CASE(4, 5);
-            ASM_VOLATILE_CASE(5, 5);
-            ASM_VOLATILE_CASE(6, 5);
-            ASM_VOLATILE_CASE(7, 5);
-            ASM_VOLATILE_CASE(8, 5);
-            ASM_VOLATILE_CASE(9, 5);
-            ASM_VOLATILE_CASE(10, 5);
-            ASM_VOLATILE_CASE(11, 5);
-            ASM_VOLATILE_CASE(12, 5);
-            ASM_VOLATILE_CASE(13, 5);
-            ASM_VOLATILE_CASE(14, 5);
-            ASM_VOLATILE_CASE(15, 5);
-            ASM_VOLATILE_CASE(16, 5);
-            ASM_VOLATILE_CASE(17, 5);
-            ASM_VOLATILE_CASE(18, 5);
-            ASM_VOLATILE_CASE(19, 5);
-            ASM_VOLATILE_CASE(20, 5);
-            ASM_VOLATILE_CASE(21, 5);
-            ASM_VOLATILE_CASE(22, 5);
-            ASM_VOLATILE_CASE(23, 5);
-            ASM_VOLATILE_CASE(24, 5);
-            ASM_VOLATILE_CASE(25, 5);
-            ASM_VOLATILE_CASE(26, 5);
-            ASM_VOLATILE_CASE(27, 5);
-            ASM_VOLATILE_CASE(28, 5);
-            ASM_VOLATILE_CASE(29, 5);
-            ASM_VOLATILE_CASE(30, 5);
-            ASM_VOLATILE_CASE(31, 5);
-         default:
-            break;
-         }
-         break;
-      case 6:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 6);
-            ASM_VOLATILE_CASE(1, 6);
-            ASM_VOLATILE_CASE(2, 6);
-            ASM_VOLATILE_CASE(3, 6);
-            ASM_VOLATILE_CASE(4, 6);
-            ASM_VOLATILE_CASE(5, 6);
-            ASM_VOLATILE_CASE(6, 6);
-            ASM_VOLATILE_CASE(7, 6);
-            ASM_VOLATILE_CASE(8, 6);
-            ASM_VOLATILE_CASE(9, 6);
-            ASM_VOLATILE_CASE(10, 6);
-            ASM_VOLATILE_CASE(11, 6);
-            ASM_VOLATILE_CASE(12, 6);
-            ASM_VOLATILE_CASE(13, 6);
-            ASM_VOLATILE_CASE(14, 6);
-            ASM_VOLATILE_CASE(15, 6);
-            ASM_VOLATILE_CASE(16, 6);
-            ASM_VOLATILE_CASE(17, 6);
-            ASM_VOLATILE_CASE(18, 6);
-            ASM_VOLATILE_CASE(19, 6);
-            ASM_VOLATILE_CASE(20, 6);
-            ASM_VOLATILE_CASE(21, 6);
-            ASM_VOLATILE_CASE(22, 6);
-            ASM_VOLATILE_CASE(23, 6);
-            ASM_VOLATILE_CASE(24, 6);
-            ASM_VOLATILE_CASE(25, 6);
-            ASM_VOLATILE_CASE(26, 6);
-            ASM_VOLATILE_CASE(27, 6);
-            ASM_VOLATILE_CASE(28, 6);
-            ASM_VOLATILE_CASE(29, 6);
-            ASM_VOLATILE_CASE(30, 6);
-            ASM_VOLATILE_CASE(31, 6);
-         default:
-            break;
-         }
-         break;
-      case 7:
-         /* __asm__("mfc0 %0, $1, 0" :"=r" (x)); */
-         switch (rd) {
-            ASM_VOLATILE_CASE(0, 7);
-            ASM_VOLATILE_CASE(1, 7);
-            ASM_VOLATILE_CASE(2, 7);
-            ASM_VOLATILE_CASE(3, 7);
-            ASM_VOLATILE_CASE(4, 7);
-            ASM_VOLATILE_CASE(5, 7);
-            ASM_VOLATILE_CASE(6, 7);
-            ASM_VOLATILE_CASE(7, 7);
-            ASM_VOLATILE_CASE(8, 7);
-            ASM_VOLATILE_CASE(9, 7);
-            ASM_VOLATILE_CASE(10, 7);
-            ASM_VOLATILE_CASE(11, 7);
-            ASM_VOLATILE_CASE(12, 7);
-            ASM_VOLATILE_CASE(13, 7);
-            ASM_VOLATILE_CASE(14, 7);
-            ASM_VOLATILE_CASE(15, 7);
-            ASM_VOLATILE_CASE(16, 7);
-            ASM_VOLATILE_CASE(17, 7);
-            ASM_VOLATILE_CASE(18, 7);
-            ASM_VOLATILE_CASE(19, 7);
-            ASM_VOLATILE_CASE(20, 7);
-            ASM_VOLATILE_CASE(21, 7);
-            ASM_VOLATILE_CASE(22, 7);
-            ASM_VOLATILE_CASE(23, 7);
-            ASM_VOLATILE_CASE(24, 7);
-            ASM_VOLATILE_CASE(25, 7);
-            ASM_VOLATILE_CASE(26, 7);
-            ASM_VOLATILE_CASE(27, 7);
-            ASM_VOLATILE_CASE(28, 7);
-            ASM_VOLATILE_CASE(29, 7);
-            ASM_VOLATILE_CASE(30, 7);
-            ASM_VOLATILE_CASE(31, 7);
-         default:
-            break;
-         }
-      break;
-
-   default:
-      break;
-   }
-#endif
-   return x;
-}
-
-#undef ASM_VOLATILE_CASE
-
-#define ASM_VOLATILE_CASE(rd, sel) \
-         case rd: \
-            asm volatile ("dmfc0 %0, $" #rd ", "#sel"\n\t" :"=r" (x) ); \
-            break;
-
-ULong mips64_dirtyhelper_dmfc0 ( UInt rd, UInt sel )
-{
-   ULong x = 0;
-#if defined(VGP_mips64_linux)
-   switch (sel) {
-     case 0:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 0);
-           ASM_VOLATILE_CASE (1, 0);
-           ASM_VOLATILE_CASE (2, 0);
-           ASM_VOLATILE_CASE (3, 0);
-           ASM_VOLATILE_CASE (4, 0);
-           ASM_VOLATILE_CASE (5, 0);
-           ASM_VOLATILE_CASE (6, 0);
-           ASM_VOLATILE_CASE (7, 0);
-           ASM_VOLATILE_CASE (8, 0);
-           ASM_VOLATILE_CASE (9, 0);
-           ASM_VOLATILE_CASE (10, 0);
-           ASM_VOLATILE_CASE (11, 0);
-           ASM_VOLATILE_CASE (12, 0);
-           ASM_VOLATILE_CASE (13, 0);
-           ASM_VOLATILE_CASE (14, 0);
-           ASM_VOLATILE_CASE (15, 0);
-           ASM_VOLATILE_CASE (16, 0);
-           ASM_VOLATILE_CASE (17, 0);
-           ASM_VOLATILE_CASE (18, 0);
-           ASM_VOLATILE_CASE (19, 0);
-           ASM_VOLATILE_CASE (20, 0);
-           ASM_VOLATILE_CASE (21, 0);
-           ASM_VOLATILE_CASE (22, 0);
-           ASM_VOLATILE_CASE (23, 0);
-           ASM_VOLATILE_CASE (24, 0);
-           ASM_VOLATILE_CASE (25, 0);
-           ASM_VOLATILE_CASE (26, 0);
-           ASM_VOLATILE_CASE (27, 0);
-           ASM_VOLATILE_CASE (28, 0);
-           ASM_VOLATILE_CASE (29, 0);
-           ASM_VOLATILE_CASE (30, 0);
-           ASM_VOLATILE_CASE (31, 0);
-         default:
-           break;
-        }
-        break;
-     case 1:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 1);
-           ASM_VOLATILE_CASE (1, 1);
-           ASM_VOLATILE_CASE (2, 1);
-           ASM_VOLATILE_CASE (3, 1);
-           ASM_VOLATILE_CASE (4, 1);
-           ASM_VOLATILE_CASE (5, 1);
-           ASM_VOLATILE_CASE (6, 1);
-           ASM_VOLATILE_CASE (7, 1);
-           ASM_VOLATILE_CASE (8, 1);
-           ASM_VOLATILE_CASE (9, 1);
-           ASM_VOLATILE_CASE (10, 1);
-           ASM_VOLATILE_CASE (11, 1);
-           ASM_VOLATILE_CASE (12, 1);
-           ASM_VOLATILE_CASE (13, 1);
-           ASM_VOLATILE_CASE (14, 1);
-           ASM_VOLATILE_CASE (15, 1);
-           ASM_VOLATILE_CASE (16, 1);
-           ASM_VOLATILE_CASE (17, 1);
-           ASM_VOLATILE_CASE (18, 1);
-           ASM_VOLATILE_CASE (19, 1);
-           ASM_VOLATILE_CASE (20, 1);
-           ASM_VOLATILE_CASE (21, 1);
-           ASM_VOLATILE_CASE (22, 1);
-           ASM_VOLATILE_CASE (23, 1);
-           ASM_VOLATILE_CASE (24, 1);
-           ASM_VOLATILE_CASE (25, 1);
-           ASM_VOLATILE_CASE (26, 1);
-           ASM_VOLATILE_CASE (27, 1);
-           ASM_VOLATILE_CASE (28, 1);
-           ASM_VOLATILE_CASE (29, 1);
-           ASM_VOLATILE_CASE (30, 1);
-           ASM_VOLATILE_CASE (31, 1);
-        default:
-           break;
-        }
-        break;
-     case 2:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 2);
-           ASM_VOLATILE_CASE (1, 2);
-           ASM_VOLATILE_CASE (2, 2);
-           ASM_VOLATILE_CASE (3, 1);
-           ASM_VOLATILE_CASE (4, 2);
-           ASM_VOLATILE_CASE (5, 2);
-           ASM_VOLATILE_CASE (6, 2);
-           ASM_VOLATILE_CASE (7, 2);
-           ASM_VOLATILE_CASE (8, 2);
-           ASM_VOLATILE_CASE (9, 2);
-           ASM_VOLATILE_CASE (10, 2);
-           ASM_VOLATILE_CASE (11, 2);
-           ASM_VOLATILE_CASE (12, 2);
-           ASM_VOLATILE_CASE (13, 2);
-           ASM_VOLATILE_CASE (14, 2);
-           ASM_VOLATILE_CASE (15, 2);
-           ASM_VOLATILE_CASE (16, 2);
-           ASM_VOLATILE_CASE (17, 2);
-           ASM_VOLATILE_CASE (18, 2);
-           ASM_VOLATILE_CASE (19, 2);
-           ASM_VOLATILE_CASE (20, 2);
-           ASM_VOLATILE_CASE (21, 2);
-           ASM_VOLATILE_CASE (22, 2);
-           ASM_VOLATILE_CASE (23, 2);
-           ASM_VOLATILE_CASE (24, 2);
-           ASM_VOLATILE_CASE (25, 2);
-           ASM_VOLATILE_CASE (26, 2);
-           ASM_VOLATILE_CASE (27, 2);
-           ASM_VOLATILE_CASE (28, 2);
-           ASM_VOLATILE_CASE (29, 2);
-           ASM_VOLATILE_CASE (30, 2);
-           ASM_VOLATILE_CASE (31, 2);
-         default:
-           break;
-         }
-         break;
-     case 3:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 3);
-           ASM_VOLATILE_CASE (1, 3);
-           ASM_VOLATILE_CASE (2, 3);
-           ASM_VOLATILE_CASE (3, 3);
-           ASM_VOLATILE_CASE (4, 3);
-           ASM_VOLATILE_CASE (5, 3);
-           ASM_VOLATILE_CASE (6, 3);
-           ASM_VOLATILE_CASE (7, 3);
-           ASM_VOLATILE_CASE (8, 3);
-           ASM_VOLATILE_CASE (9, 3);
-           ASM_VOLATILE_CASE (10, 3);
-           ASM_VOLATILE_CASE (11, 3);
-           ASM_VOLATILE_CASE (12, 3);
-           ASM_VOLATILE_CASE (13, 3);
-           ASM_VOLATILE_CASE (14, 3);
-           ASM_VOLATILE_CASE (15, 3);
-           ASM_VOLATILE_CASE (16, 3);
-           ASM_VOLATILE_CASE (17, 3);
-           ASM_VOLATILE_CASE (18, 3);
-           ASM_VOLATILE_CASE (19, 3);
-           ASM_VOLATILE_CASE (20, 3);
-           ASM_VOLATILE_CASE (21, 3);
-           ASM_VOLATILE_CASE (22, 3);
-           ASM_VOLATILE_CASE (23, 3);
-           ASM_VOLATILE_CASE (24, 3);
-           ASM_VOLATILE_CASE (25, 3);
-           ASM_VOLATILE_CASE (26, 3);
-           ASM_VOLATILE_CASE (27, 3);
-           ASM_VOLATILE_CASE (28, 3);
-           ASM_VOLATILE_CASE (29, 3);
-           ASM_VOLATILE_CASE (30, 3);
-           ASM_VOLATILE_CASE (31, 3);
-        default:
-           break;
-        }
-        break;
-     case 4:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 4);
-           ASM_VOLATILE_CASE (1, 4);
-           ASM_VOLATILE_CASE (2, 4);
-           ASM_VOLATILE_CASE (3, 4);
-           ASM_VOLATILE_CASE (4, 4);
-           ASM_VOLATILE_CASE (5, 4);
-           ASM_VOLATILE_CASE (6, 4);
-           ASM_VOLATILE_CASE (7, 4);
-           ASM_VOLATILE_CASE (8, 4);
-           ASM_VOLATILE_CASE (9, 4);
-           ASM_VOLATILE_CASE (10, 4);
-           ASM_VOLATILE_CASE (11, 4);
-           ASM_VOLATILE_CASE (12, 4);
-           ASM_VOLATILE_CASE (13, 4);
-           ASM_VOLATILE_CASE (14, 4);
-           ASM_VOLATILE_CASE (15, 4);
-           ASM_VOLATILE_CASE (16, 4);
-           ASM_VOLATILE_CASE (17, 4);
-           ASM_VOLATILE_CASE (18, 4);
-           ASM_VOLATILE_CASE (19, 4);
-           ASM_VOLATILE_CASE (20, 4);
-           ASM_VOLATILE_CASE (21, 4);
-           ASM_VOLATILE_CASE (22, 4);
-           ASM_VOLATILE_CASE (23, 4);
-           ASM_VOLATILE_CASE (24, 4);
-           ASM_VOLATILE_CASE (25, 4);
-           ASM_VOLATILE_CASE (26, 4);
-           ASM_VOLATILE_CASE (27, 4);
-           ASM_VOLATILE_CASE (28, 4);
-           ASM_VOLATILE_CASE (29, 4);
-           ASM_VOLATILE_CASE (30, 4);
-           ASM_VOLATILE_CASE (31, 4);
-           default:
-              break;
-           }
-        break;
-     case 5:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 5);
-           ASM_VOLATILE_CASE (1, 5);
-           ASM_VOLATILE_CASE (2, 5);
-           ASM_VOLATILE_CASE (3, 5);
-           ASM_VOLATILE_CASE (4, 5);
-           ASM_VOLATILE_CASE (5, 5);
-           ASM_VOLATILE_CASE (6, 5);
-           ASM_VOLATILE_CASE (7, 5);
-           ASM_VOLATILE_CASE (8, 5);
-           ASM_VOLATILE_CASE (9, 5);
-           ASM_VOLATILE_CASE (10, 5);
-           ASM_VOLATILE_CASE (11, 5);
-           ASM_VOLATILE_CASE (12, 5);
-           ASM_VOLATILE_CASE (13, 5);
-           ASM_VOLATILE_CASE (14, 5);
-           ASM_VOLATILE_CASE (15, 5);
-           ASM_VOLATILE_CASE (16, 5);
-           ASM_VOLATILE_CASE (17, 5);
-           ASM_VOLATILE_CASE (18, 5);
-           ASM_VOLATILE_CASE (19, 5);
-           ASM_VOLATILE_CASE (20, 5);
-           ASM_VOLATILE_CASE (21, 5);
-           ASM_VOLATILE_CASE (22, 5);
-           ASM_VOLATILE_CASE (23, 5);
-           ASM_VOLATILE_CASE (24, 5);
-           ASM_VOLATILE_CASE (25, 5);
-           ASM_VOLATILE_CASE (26, 5);
-           ASM_VOLATILE_CASE (27, 5);
-           ASM_VOLATILE_CASE (28, 5);
-           ASM_VOLATILE_CASE (29, 5);
-           ASM_VOLATILE_CASE (30, 5);
-           ASM_VOLATILE_CASE (31, 5);
-           default:
-              break;
-        }
-        break;
-     case 6:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 6);
-           ASM_VOLATILE_CASE (1, 6);
-           ASM_VOLATILE_CASE (2, 6);
-           ASM_VOLATILE_CASE (3, 6);
-           ASM_VOLATILE_CASE (4, 6);
-           ASM_VOLATILE_CASE (5, 6);
-           ASM_VOLATILE_CASE (6, 6);
-           ASM_VOLATILE_CASE (7, 6);
-           ASM_VOLATILE_CASE (8, 6);
-           ASM_VOLATILE_CASE (9, 6);
-           ASM_VOLATILE_CASE (10, 6);
-           ASM_VOLATILE_CASE (11, 6);
-           ASM_VOLATILE_CASE (12, 6);
-           ASM_VOLATILE_CASE (13, 6);
-           ASM_VOLATILE_CASE (14, 6);
-           ASM_VOLATILE_CASE (15, 6);
-           ASM_VOLATILE_CASE (16, 6);
-           ASM_VOLATILE_CASE (17, 6);
-           ASM_VOLATILE_CASE (18, 6);
-           ASM_VOLATILE_CASE (19, 6);
-           ASM_VOLATILE_CASE (20, 6);
-           ASM_VOLATILE_CASE (21, 6);
-           ASM_VOLATILE_CASE (22, 6);
-           ASM_VOLATILE_CASE (23, 6);
-           ASM_VOLATILE_CASE (24, 6);
-           ASM_VOLATILE_CASE (25, 6);
-           ASM_VOLATILE_CASE (26, 6);
-           ASM_VOLATILE_CASE (27, 6);
-           ASM_VOLATILE_CASE (28, 6);
-           ASM_VOLATILE_CASE (29, 6);
-           ASM_VOLATILE_CASE (30, 6);
-           ASM_VOLATILE_CASE (31, 6);
-        default:
-           break;
-        }
-        break;
-     case 7:
-        /* __asm__("dmfc0 %0, $1, 0" :"=r" (x)); */
-        switch (rd) {
-           ASM_VOLATILE_CASE (0, 7);
-           ASM_VOLATILE_CASE (1, 7);
-           ASM_VOLATILE_CASE (2, 7);
-           ASM_VOLATILE_CASE (3, 7);
-           ASM_VOLATILE_CASE (4, 7);
-           ASM_VOLATILE_CASE (5, 7);
-           ASM_VOLATILE_CASE (6, 7);
-           ASM_VOLATILE_CASE (7, 7);
-           ASM_VOLATILE_CASE (8, 7);
-           ASM_VOLATILE_CASE (9, 7);
-           ASM_VOLATILE_CASE (10, 7);
-           ASM_VOLATILE_CASE (11, 7);
-           ASM_VOLATILE_CASE (12, 7);
-           ASM_VOLATILE_CASE (13, 7);
-           ASM_VOLATILE_CASE (14, 7);
-           ASM_VOLATILE_CASE (15, 7);
-           ASM_VOLATILE_CASE (16, 7);
-           ASM_VOLATILE_CASE (17, 7);
-           ASM_VOLATILE_CASE (18, 7);
-           ASM_VOLATILE_CASE (19, 7);
-           ASM_VOLATILE_CASE (20, 7);
-           ASM_VOLATILE_CASE (21, 7);
-           ASM_VOLATILE_CASE (22, 7);
-           ASM_VOLATILE_CASE (23, 7);
-           ASM_VOLATILE_CASE (24, 7);
-           ASM_VOLATILE_CASE (25, 7);
-           ASM_VOLATILE_CASE (26, 7);
-           ASM_VOLATILE_CASE (27, 7);
-           ASM_VOLATILE_CASE (28, 7);
-           ASM_VOLATILE_CASE (29, 7);
-           ASM_VOLATILE_CASE (30, 7);
-           ASM_VOLATILE_CASE (31, 7);
-         default:
-           break;
-         }
-       break;
-
-     default:
-       break;
-     }
-#endif
-   return x;
-}
-
 #if defined(__mips__) && ((defined(__mips_isa_rev) && __mips_isa_rev >= 2))
 UInt mips32_dirtyhelper_rdhwr ( UInt rt, UInt rd )
 {
@@ -1117,79 +459,93 @@
 #endif
 
 #define ASM_VOLATILE_UNARY32(inst)                                  \
-   __asm__ volatile("cfc1  $t0,  $31"   "\n\t"                      \
+   __asm__ volatile(".set  push"        "\n\t"                      \
+                    ".set  hardfloat"   "\n\t"                      \
+                    "cfc1  $t0,  $31"   "\n\t"                      \
                     "ctc1  %2,   $31"   "\n\t"                      \
                     "mtc1  %1,   $f20"  "\n\t"                      \
                     #inst" $f20, $f20"  "\n\t"                      \
                     "cfc1  %0,   $31"   "\n\t"                      \
                     "ctc1  $t0,  $31"   "\n\t"                      \
+                    ".set  pop"         "\n\t"                      \
                     : "=r" (ret)                                    \
                     : "r" (loFsVal), "r" (fcsr)                     \
                     : "t0", "$f20"                                  \
                    );
 
 #define ASM_VOLATILE_UNARY32_DOUBLE(inst)                           \
-   __asm__ volatile("cfc1  $t0,  $31"   "\n\t"                      \
-                    "ctc1  %3,   $31"   "\n\t"                      \
-                    "mtc1  %1,   $f20"  "\n\t"                      \
-                    "mtc1  %2,   $f21"  "\n\t"                      \
+   __asm__ volatile(".set  push"        "\n\t"                      \
+                    ".set  hardfloat"   "\n\t"                      \
+                    "cfc1  $t0,  $31"   "\n\t"                      \
+                    "ctc1  %2,   $31"   "\n\t"                      \
+                    "ldc1  $f20, 0(%1)" "\n\t"                      \
                     #inst" $f20, $f20"  "\n\t"                      \
                     "cfc1  %0,   $31"   "\n\t"                      \
                     "ctc1  $t0,  $31"   "\n\t"                      \
+                    ".set  pop"         "\n\t"                      \
                     : "=r" (ret)                                    \
-                    : "r" (loFsVal), "r" (hiFsVal), "r" (fcsr)      \
+                    : "r" (&fsVal), "r" (fcsr)                      \
                     : "t0", "$f20", "$f21"                          \
                    );
 
 #define ASM_VOLATILE_UNARY64(inst)                                  \
-   __asm__ volatile("cfc1  $t0,  $31"    "\n\t"                     \
+   __asm__ volatile(".set  push"         "\n\t"                     \
+                    ".set  hardfloat"    "\n\t"                     \
+                    "cfc1  $t0,  $31"    "\n\t"                     \
                     "ctc1  %2,   $31"    "\n\t"                     \
                     "ldc1  $f24, 0(%1)"  "\n\t"                     \
                     #inst" $f24, $f24"   "\n\t"                     \
                     "cfc1  %0,   $31"    "\n\t"                     \
                     "ctc1  $t0,  $31"    "\n\t"                     \
+                    ".set  pop"          "\n\t"                     \
                     : "=r" (ret)                                    \
                     : "r" (&(addr[fs])), "r" (fcsr)                 \
                     : "t0", "$f24"                                  \
                    );
 
 #define ASM_VOLATILE_BINARY32(inst)                                 \
-   __asm__ volatile("cfc1  $t0,  $31"         "\n\t"                \
+   __asm__ volatile(".set  push"              "\n\t"                \
+                    ".set  hardfloat"         "\n\t"                \
+                    "cfc1  $t0,  $31"         "\n\t"                \
                     "ctc1  %3,   $31"         "\n\t"                \
                     "mtc1  %1,   $f20"        "\n\t"                \
                     "mtc1  %2,   $f22"        "\n\t"                \
                     #inst" $f20, $f20, $f22"  "\n\t"                \
                     "cfc1  %0,   $31"         "\n\t"                \
                     "ctc1  $t0,  $31"         "\n\t"                \
+                    ".set  pop"               "\n\t"                \
                     : "=r" (ret)                                    \
                     : "r" (loFsVal), "r" (loFtVal), "r" (fcsr)      \
                     : "t0", "$f20", "$f22"                          \
                    );
 
 #define ASM_VOLATILE_BINARY32_DOUBLE(inst)                          \
-   __asm__ volatile("cfc1  $t0,  $31"         "\n\t"                \
-                    "ctc1  %5,   $31"         "\n\t"                \
-                    "mtc1  %1,   $f20"        "\n\t"                \
-                    "mtc1  %2,   $f21"        "\n\t"                \
-                    "mtc1  %3,   $f22"        "\n\t"                \
-                    "mtc1  %4,   $f23"        "\n\t"                \
+   __asm__ volatile(".set  push"              "\n\t"                \
+                    ".set  hardfloat"         "\n\t"                \
+                    "cfc1  $t0,  $31"         "\n\t"                \
+                    "ctc1  %3,   $31"         "\n\t"                \
+                    "ldc1  $f20, 0(%1)"       "\n\t"                \
+                    "ldc1  $f22, 0(%2)"       "\n\t"                \
                     #inst" $f20, $f20, $f22"  "\n\t"                \
                     "cfc1  %0,   $31"         "\n\t"                \
                     "ctc1  $t0,  $31"         "\n\t"                \
+                    ".set  pop"               "\n\t"                \
                     : "=r" (ret)                                    \
-                    : "r" (loFsVal), "r" (hiFsVal), "r" (loFtVal),  \
-                      "r" (hiFtVal), "r" (fcsr)                     \
+                    : "r" (&fsVal), "r" (&ftVal), "r" (fcsr)        \
                     : "t0", "$f20", "$f21", "$f22", "$f23"          \
                    );
 
 #define ASM_VOLATILE_BINARY64(inst)                                     \
-   __asm__ volatile("cfc1  $t0,  $31"         "\n\t"                    \
+   __asm__ volatile(".set  push"              "\n\t"                    \
+                    ".set  hardfloat"         "\n\t"                    \
+                    "cfc1  $t0,  $31"         "\n\t"                    \
                     "ctc1  %3,   $31"         "\n\t"                    \
                     "ldc1  $f24, 0(%1)"       "\n\t"                    \
                     "ldc1  $f26, 0(%2)"       "\n\t"                    \
                     #inst" $f24, $f24, $f26"  "\n\t"                    \
                     "cfc1  %0,   $31"         "\n\t"                    \
                     "ctc1  $t0,  $31"         "\n\t"                    \
+                    ".set  pop"               "\n\t"                    \
                     : "=r" (ret)                                        \
                     : "r" (&(addr[fs])), "r" (&(addr[ft])), "r" (fcsr)  \
                     : "t0", "$f24", "$f26"                              \
@@ -1217,6 +573,8 @@
    loFtVal    = (UInt)addr[ft*2];
    hiFtVal    = (UInt)addr[ft*2+2];
 #endif
+   ULong fsVal   = ((ULong) hiFsVal) << 32 | loFsVal;
+   ULong ftVal   = ((ULong) hiFtVal) << 32 | loFtVal;
    UInt fcsr     = guest_state->guest_FCSR;
    switch (inst) {
       case ROUNDWD:
diff --git a/VEX/priv/guest_mips_toIR.c b/VEX/priv/guest_mips_toIR.c
index b72d314..776e6c5 100644
--- a/VEX/priv/guest_mips_toIR.c
+++ b/VEX/priv/guest_mips_toIR.c
@@ -1429,8 +1429,8 @@
       IRTemp t4 = newTemp(Ity_I32);
       IRTemp t5 = newTemp(Ity_I64);
 
-      assign(t0, getFReg(dregNo));
-      assign(t1, getFReg(dregNo + 1));
+      assign(t0, getFReg(dregNo & (~1)));
+      assign(t1, getFReg(dregNo | 1));
 
       assign(t3, unop(Iop_ReinterpF32asI32, mkexpr(t0)));
       assign(t4, unop(Iop_ReinterpF32asI32, mkexpr(t1)));
@@ -1467,8 +1467,8 @@
       assign(t6, unop(Iop_ReinterpF64asI64, mkexpr(t1)));
       assign(t4, unop(Iop_64HIto32, mkexpr(t6)));  /* hi */
       assign(t5, unop(Iop_64to32, mkexpr(t6)));    /* lo */
-      putFReg(dregNo, unop(Iop_ReinterpI32asF32, mkexpr(t5)));
-      putFReg(dregNo + 1, unop(Iop_ReinterpI32asF32, mkexpr(t4)));
+      putFReg(dregNo & (~1), unop(Iop_ReinterpI32asF32, mkexpr(t5)));
+      putFReg(dregNo | 1, unop(Iop_ReinterpI32asF32, mkexpr(t4)));
    }
 }
 
@@ -12228,28 +12228,39 @@
    case 0x11: {  /* COP1 */
       if (fmt == 0x3 && fd == 0 && function == 0) {  /* MFHC1 */
          DIP("mfhc1 r%u, f%u", rt, fs);
-         if (fp_mode64) {
-            t0 = newTemp(Ity_I64);
-            t1 = newTemp(Ity_I32);
-            assign(t0, unop(Iop_ReinterpF64asI64, getDReg(fs)));
-            assign(t1, unop(Iop_64HIto32, mkexpr(t0)));
-            putIReg(rt, mkWidenFrom32(ty, mkexpr(t1), True));
-         } else {
-            ILLEGAL_INSTRUCTON;
+         if (VEX_MIPS_CPU_HAS_MIPS32R2(archinfo->hwcaps)) {
+            if (fp_mode64) {
+               t0 = newTemp(Ity_I64);
+               t1 = newTemp(Ity_I32);
+               assign(t0, unop(Iop_ReinterpF64asI64, getDReg(fs)));
+               assign(t1, unop(Iop_64HIto32, mkexpr(t0)));
+               putIReg(rt, mkWidenFrom32(ty, mkexpr(t1), True));
+               break;
+            } else {
+               putIReg(rt, mkWidenFrom32(ty, unop(Iop_ReinterpF32asI32,
+                                                  getFReg(fs | 1)), True));
+               break;
+            }
          }
+         ILLEGAL_INSTRUCTON;
          break;
       } else if (fmt == 0x7 && fd == 0 && function == 0) {  /* MTHC1 */
          DIP("mthc1 r%u, f%u", rt, fs);
-         if (fp_mode64) {
-            t0 = newTemp(Ity_I64);
-            assign(t0, binop(Iop_32HLto64, getIReg(rt),
-                             unop(Iop_ReinterpF32asI32,
-                                  getLoFromF64(Ity_F64 /* 32FPR mode. */,
-                                               getDReg(fs)))));
-            putDReg(fs, unop(Iop_ReinterpI64asF64, mkexpr(t0)));
-         } else {
-            ILLEGAL_INSTRUCTON;
+         if (VEX_MIPS_CPU_HAS_MIPS32R2(archinfo->hwcaps)) {
+            if (fp_mode64) {
+               t0 = newTemp(Ity_I64);
+               assign(t0, binop(Iop_32HLto64, mkNarrowTo32(ty, getIReg(rt)),
+                                unop(Iop_ReinterpF32asI32,
+                                     getLoFromF64(Ity_F64, getDReg(fs)))));
+               putDReg(fs, unop(Iop_ReinterpI64asF64, mkexpr(t0)));
+               break;
+            } else {
+               putFReg(fs | 1, unop(Iop_ReinterpI32asF32,
+                                    mkNarrowTo32(ty, getIReg(rt))));
+               break;
+            }
          }
+         ILLEGAL_INSTRUCTON;
          break;
       } else if (fmt == 0x8) {  /* BC */
          /* FcConditionalCode(bc1_cc) */
@@ -12525,55 +12536,25 @@
                switch (fmt) {
                case 0x10:  /* S */
                   DIP("movn.s f%u, f%u, r%u", fd, fs, rt);
-                  t1 = newTemp(Ity_F64);
-                  t2 = newTemp(Ity_F64);
-                  t3 = newTemp(Ity_I1);
-                  t4 = newTemp(Ity_F64);
-                  if (mode64) {
-                     assign(t1, getFReg(fs));
-                     assign(t2, getFReg(fd));
-                     assign(t3, binop(Iop_CmpNE64, mkU64(0), getIReg(rt)));
-                  } else {
-                     if (fp_mode64) {
-                        assign(t1, getFReg(fs));
-                        assign(t2, getFReg(fd));
-                        assign(t3, binop(Iop_CmpNE32, mkU32(0), getIReg(rt)));
-                     } else {
-                        assign(t1, unop(Iop_F32toF64, getFReg(fs)));
-                        assign(t2, unop(Iop_F32toF64, getFReg(fd)));
-                        assign(t3, binop(Iop_CmpNE32, mkU32(0), getIReg(rt)));
-                     }
-                  }
+                  t1 = newTemp(Ity_I1);
 
-                  assign(t4, IRExpr_ITE(mkexpr(t3), mkexpr(t1), mkexpr(t2)));
-                  if (fp_mode64) {
-                     IRTemp f = newTemp(Ity_F64);
-                     IRTemp fd_hi = newTemp(Ity_I32);
-                     t5 = newTemp(Ity_I64);
-                     assign(f, getFReg(fd));
-                     assign(fd_hi, unop(Iop_64HIto32, unop(Iop_ReinterpF64asI64,
-                                        mkexpr(f))));
+                  if (mode64)
+                     assign(t1, binop(Iop_CmpNE64, mkU64(0), getIReg(rt)));
+                  else
+                     assign(t1, binop(Iop_CmpNE32, mkU32(0), getIReg(rt)));
 
-                     assign(t5, mkWidenFrom32(Ity_I64, unop(Iop_64to32,
-                                unop(Iop_ReinterpF64asI64, mkexpr(t4))), True));
-
-                     putFReg(fd, unop (Iop_ReinterpI64asF64, mkexpr(t5)));
-                  } else
-                     putFReg(fd, binop(Iop_F64toF32, get_IR_roundingmode(),
-                                       mkexpr(t4)));
+                  putFReg(fd, IRExpr_ITE(mkexpr(t1), getFReg(fs), getFReg(fd)));
                   break;
                case 0x11:  /* D */
                   DIP("movn.d f%u, f%u, r%u", fd, fs, rt);
-
-                  t3 = newTemp(Ity_I1);
-                  t4 = newTemp(Ity_F64);
+                  t1 = newTemp(Ity_I1);
 
                   if (mode64)
-                     assign(t3, binop(Iop_CmpNE64, mkU64(0), getIReg(rt)));
+                     assign(t1, binop(Iop_CmpNE64, mkU64(0), getIReg(rt)));
                   else
-                     assign(t3, binop(Iop_CmpNE32, mkU32(0), getIReg(rt)));
+                     assign(t1, binop(Iop_CmpNE32, mkU32(0), getIReg(rt)));
 
-                  putDReg(fd, IRExpr_ITE(mkexpr(t3), getDReg(fs), getDReg(fd)));
+                  putDReg(fd, IRExpr_ITE(mkexpr(t1), getDReg(fs), getDReg(fd)));
                   break;
                default:
                   goto decode_failure;
@@ -12584,51 +12565,25 @@
                switch (fmt) {
                case 0x10:  /* S */
                   DIP("movz.s f%u, f%u, r%u", fd, fs, rt);
+                  t1 = newTemp(Ity_I1);
 
-                  t1 = newTemp(Ity_F64);
-                  t2 = newTemp(Ity_F64);
-                  t3 = newTemp(Ity_I1);
-                  t4 = newTemp(Ity_F64);
-                  if (fp_mode64) {
-                     assign(t1, getFReg(fs));
-                     assign(t2, getFReg(fd));
-                     if (mode64)
-                        assign(t3, binop(Iop_CmpEQ64, mkU64(0), getIReg(rt)));
-                     else
-                        assign(t3, binop(Iop_CmpEQ32, mkU32(0), getIReg(rt)));
-                  } else {
-                     assign(t1, unop(Iop_F32toF64, getFReg(fs)));
-                     assign(t2, unop(Iop_F32toF64, getFReg(fd)));
-                     assign(t3, binop(Iop_CmpEQ32, mkU32(0), getIReg(rt)));
-                  }
-                  assign(t4, IRExpr_ITE(mkexpr(t3), mkexpr(t1), mkexpr(t2)));
+                  if (mode64)
+                     assign(t1, binop(Iop_CmpEQ64, mkU64(0), getIReg(rt)));
+                  else
+                     assign(t1, binop(Iop_CmpEQ32, mkU32(0), getIReg(rt)));
 
-                 if (fp_mode64) {
-                     IRTemp f = newTemp(Ity_F64);
-                     IRTemp fd_hi = newTemp(Ity_I32);
-                     t7 = newTemp(Ity_I64);
-                     assign(f, getFReg(fd));
-                     assign(fd_hi, unop(Iop_64HIto32,
-                                   unop(Iop_ReinterpF64asI64, mkexpr(f))));
-                     assign(t7, mkWidenFrom32(Ity_I64, unop(Iop_64to32,
-                                unop(Iop_ReinterpF64asI64, mkexpr(t4))), True));
-
-                     putFReg(fd, unop(Iop_ReinterpI64asF64, mkexpr(t7)));
-                  } else
-                     putFReg(fd, binop(Iop_F64toF32, get_IR_roundingmode(),
-                                       mkexpr(t4)));
-
+                  putFReg(fd, IRExpr_ITE(mkexpr(t1), getFReg(fs), getFReg(fd)));
                   break;
                case 0x11:  /* D */
                   DIP("movz.d f%u, f%u, r%u", fd, fs, rt);
-                  t3 = newTemp(Ity_I1);
-                  t4 = newTemp(Ity_F64);
-                  if (mode64)
-                     assign(t3, binop(Iop_CmpEQ64, mkU64(0), getIReg(rt)));
-                  else
-                     assign(t3, binop(Iop_CmpEQ32, mkU32(0), getIReg(rt)));
+                  t1 = newTemp(Ity_I1);
 
-                  putDReg(fd, IRExpr_ITE(mkexpr(t3), getDReg(fs), getDReg(fd)));
+                  if (mode64)
+                     assign(t1, binop(Iop_CmpEQ64, mkU64(0), getIReg(rt)));
+                  else
+                     assign(t1, binop(Iop_CmpEQ32, mkU32(0), getIReg(rt)));
+
+                  putDReg(fd, IRExpr_ITE(mkexpr(t1), getDReg(fs), getDReg(fd)));
                   break;
                default:
                   goto decode_failure;
@@ -12829,13 +12784,14 @@
 
                      putFReg(fs, mkWidenFromF32(tyF, mkexpr(t1)));
                   } else
-                     putFReg(fs, unop(Iop_ReinterpI32asF32, getIReg(rt)));
+                     putFReg(fs, unop(Iop_ReinterpI32asF32,
+                                      mkNarrowTo32(ty, getIReg(rt))));
                   break;
 
                case 0x5:  /* Doubleword Move to Floating Point DMTC1; MIPS64 */
                   DIP("dmtc1 r%u, f%u", rt, fs);
                   vassert(mode64);
-                  putFReg(fs, unop(Iop_ReinterpI64asF64, getIReg(rt)));
+                  putDReg(fs, unop(Iop_ReinterpI64asF64, getIReg(rt)));
                   break;
 
                case 0x0:  /* MFC1 */
@@ -12847,13 +12803,15 @@
                      assign(t1, unop(Iop_64to32, mkexpr(t0)));
                      putIReg(rt, mkWidenFrom32(ty, mkexpr(t1), True));
                   } else
-                     putIReg(rt, unop(Iop_ReinterpF32asI32, getFReg(fs)));
+                     putIReg(rt, mkWidenFrom32(ty,
+                                 unop(Iop_ReinterpF32asI32, getFReg(fs)),
+                                 True));
                   break;
 
                case 0x1:  /* Doubleword Move from Floating Point DMFC1;
                              MIPS64 */
                   DIP("dmfc1 r%u, f%u", rt, fs);
-                  putIReg(rt, unop(Iop_ReinterpF64asI64, getFReg(fs)));
+                  putIReg(rt, unop(Iop_ReinterpF64asI64, getDReg(fs)));
                   break;
 
                case 0x6:  /* CTC1 */
@@ -13434,60 +13392,20 @@
          }
       }
       break;  /* COP1 */
-   case 0x10:  /* COP0 */
-      if (rs == 0) {  /* MFC0 */
-         DIP("mfc0 r%u, r%u, %u", rt, rd, sel);
-         IRTemp   val  = newTemp(Ity_I32);
-         IRExpr** args = mkIRExprVec_3 (IRExpr_BBPTR(), mkU32(rd), mkU32(sel));
-         IRDirty *d = unsafeIRDirty_1_N(val,
-                                        0,
-                                        "mips32_dirtyhelper_mfc0",
-                                        &mips32_dirtyhelper_mfc0,
-                                        args);
-         stmt(IRStmt_Dirty(d));
-         putIReg(rt, mkexpr(val));
-      } else if (rs == 1) {
-         /* Doubleword Move from Coprocessor 0 - DMFC0; MIPS64 */
-         DIP("dmfc0 r%u, r%u, %u", rt, rd, sel);
-         IRTemp   val  = newTemp(Ity_I64);
-         IRExpr** args = mkIRExprVec_3 (IRExpr_BBPTR(), mkU64(rd), mkU64(sel));
-         IRDirty *d = unsafeIRDirty_1_N(val,
-                                        0,
-                                        "mips64_dirtyhelper_dmfc0",
-                                        &mips64_dirtyhelper_dmfc0,
-                                        args);
-         stmt(IRStmt_Dirty(d));
-         putDReg(rt, mkexpr(val));
-      } else
-         goto decode_failure;
-      break;
 
    case 0x31:  /* LWC1 */
       /* Load Word to Floating Point - LWC1 (MIPS32) */
       DIP("lwc1 f%u, %u(r%u)", ft, imm, rs);
+      LOAD_STORE_PATTERN;
       if (fp_mode64) {
-         t1 = newTemp(Ity_F32);
+         t0 = newTemp(Ity_F32);
          t2 = newTemp(Ity_I64);
-         if (mode64) {
-            t0 = newTemp(Ity_I64);
-            /* new LO */
-            assign(t0, binop(Iop_Add64, getIReg(rs),
-                             mkU64(extend_s_16to64(imm))));
-         } else {
-            t0 = newTemp(Ity_I32);
-            /* new LO */
-            assign(t0, binop(Iop_Add32, getIReg(rs),
-                             mkU32(extend_s_16to32(imm))));
-         }
-         assign(t1, load(Ity_F32, mkexpr(t0)));
+         assign(t0, load(Ity_F32, mkexpr(t1)));
          assign(t2, mkWidenFrom32(Ity_I64, unop(Iop_ReinterpF32asI32,
-                                                mkexpr(t1)), True));
+                                                mkexpr(t0)), True));
          putDReg(ft, unop(Iop_ReinterpI64asF64, mkexpr(t2)));
       } else {
-         t0 = newTemp(Ity_I32);
-         assign(t0, binop(Iop_Add32, getIReg(rs),
-                           mkU32(extend_s_16to32(imm))));
-         putFReg(ft, load(Ity_F32, mkexpr(t0)));
+         putFReg(ft, load(Ity_F32, mkexpr(t1)));
       }
       break;
 
@@ -13580,16 +13498,14 @@
       case 0x0: {  /* LWXC1 */
          /* Load Word  Indexed to Floating Point - LWXC1 (MIPS32r2) */
          DIP("lwxc1 f%u, r%u(r%u)", fd, rt, rs);
+         t2 = newTemp(ty);
+         assign(t2, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
+                          getIReg(rt)));
          if (fp_mode64) {
             t0 = newTemp(Ity_I64);
             t1 = newTemp(Ity_I32);
             t3 = newTemp(Ity_F32);
             t4 = newTemp(Ity_I64);
-
-            t2 = newTemp(ty);
-            /* new LO */
-            assign(t2, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
-                             getIReg(rt)));
             assign(t3, load(Ity_F32, mkexpr(t2)));
 
             assign(t4, mkWidenFrom32(Ity_I64, unop(Iop_ReinterpF32asI32,
@@ -13597,9 +13513,7 @@
 
             putFReg(fd, unop(Iop_ReinterpI64asF64, mkexpr(t4)));
          } else {
-            t0 = newTemp(Ity_I32);
-            assign(t0, binop(Iop_Add32, getIReg(rs), getIReg(rt)));
-            putFReg(fd, load(Ity_F32, mkexpr(t0)));
+            putFReg(fd, load(Ity_F32, mkexpr(t2)));
          }
          break;
       }
@@ -13607,90 +13521,70 @@
       case 0x1: {  /* LDXC1 */
          /* Load Doubleword  Indexed to Floating Point
             LDXC1 (MIPS32r2 and MIPS64) */
-         if (fp_mode64) {
-            DIP("ldxc1 f%u, r%u(r%u)", fd, rt, rs);
-            t0 = newTemp(ty);
-            assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
-                             getIReg(rt)));
-            putFReg(fd, load(Ity_F64, mkexpr(t0)));
-            break;
-         } else {
-            t0 = newTemp(Ity_I32);
-            assign(t0, binop(Iop_Add32, getIReg(rs), getIReg(rt)));
-
-            t1 = newTemp(Ity_I32);
-            assign(t1, binop(Iop_Add32, mkexpr(t0), mkU32(4)));
-
-#if defined (_MIPSEL)
-            putFReg(fd, load(Ity_F32, mkexpr(t0)));
-            putFReg(fd + 1, load(Ity_F32, mkexpr(t1)));
-#elif defined (_MIPSEB)
-            putFReg(fd + 1, load(Ity_F32, mkexpr(t0)));
-            putFReg(fd, load(Ity_F32, mkexpr(t1)));
-#endif
-            break;
-         }
+         DIP("ldxc1 f%u, r%u(r%u)", fd, rt, rs);
+         t0 = newTemp(ty);
+         assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
+                          getIReg(rt)));
+         putDReg(fd, load(Ity_F64, mkexpr(t0)));
+         break;
       }
 
       case 0x5:  /* Load Doubleword Indexed Unaligned to Floating Point - LUXC1;
-                    MIPS32r2 */
+                    MIPS32r2 and MIPS64 */
          DIP("luxc1 f%u, r%u(r%u)", fd, rt, rs);
-         t0 = newTemp(Ity_I64);
-         t1 = newTemp(Ity_I64);
-         assign(t0, binop(Iop_Add64, getIReg(rs), getIReg(rt)));
-         assign(t1, binop(Iop_And64, mkexpr(t0),
-                                     mkU64(0xfffffffffffffff8ULL)));
-         putFReg(fd, load(Ity_F64, mkexpr(t1)));
+         if ((mode64 || VEX_MIPS_CPU_HAS_MIPS32R2(archinfo->hwcaps))
+             && fp_mode64) {
+            t0 = newTemp(ty);
+            t1 = newTemp(ty);
+            assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32,
+                             getIReg(rs), getIReg(rt)));
+            assign(t1, binop(mode64 ? Iop_And64 : Iop_And32,
+                             mkexpr(t0),
+                             mode64 ? mkU64(0xfffffffffffffff8ULL)
+                                    : mkU32(0xfffffff8ULL)));
+            putFReg(fd, load(Ity_F64, mkexpr(t1)));
+         } else {
+            ILLEGAL_INSTRUCTON;
+         }
          break;
 
       case 0x8: {  /* Store Word Indexed from Floating Point - SWXC1 */
          DIP("swxc1 f%u, r%u(r%u)", ft, rt, rs);
+         t0 = newTemp(ty);
+         assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
+                          getIReg(rt)));
          if (fp_mode64) {
-            t0 = newTemp(ty);
-            assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
-                             getIReg(rt)));
             store(mkexpr(t0), getLoFromF64(tyF, getFReg(fs)));
-
          } else {
-            t0 = newTemp(Ity_I32);
-            assign(t0, binop(Iop_Add32, getIReg(rs), getIReg(rt)));
-
             store(mkexpr(t0), getFReg(fs));
          }
          break;
       }
       case 0x9: {  /* Store Doubleword Indexed from Floating Point - SDXC1 */
-         DIP("sdc1 f%u, %u(%u)", ft, imm, rs);
-         if (fp_mode64) {
-            t0 = newTemp(ty);
-            assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
-                             getIReg(rt)));
-            store(mkexpr(t0), getFReg(fs));
-         } else {
-            t0 = newTemp(Ity_I32);
-            assign(t0, binop(Iop_Add32, getIReg(rs), getIReg(rt)));
-
-            t1 = newTemp(Ity_I32);
-            assign(t1, binop(Iop_Add32, mkexpr(t0), mkU32(4)));
-
-#if defined (_MIPSEL)
-            store(mkexpr(t0), getFReg(fs));
-            store(mkexpr(t1), getFReg(fs + 1));
-#elif defined (_MIPSEB)
-            store(mkexpr(t0), getFReg(fs + 1));
-            store(mkexpr(t1), getFReg(fs));
-#endif
-         }
+         DIP("sdxc1 f%u, r%u(r%u)", fs, rt, rs);
+         t0 = newTemp(ty);
+         assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32, getIReg(rs),
+                          getIReg(rt)));
+         store(mkexpr(t0), getDReg(fs));
          break;
       }
       case 0xD:  /* Store Doubleword Indexed Unaligned from Floating Point -
                     SUXC1; MIPS64 MIPS32r2 */
          DIP("suxc1 f%u, r%u(r%u)", fd, rt, rs);
-         t0 = newTemp(Ity_I64);
-         t1 = newTemp(Ity_I64);
-         assign(t0, binop(Iop_Add64, getIReg(rs), getIReg(rt)));
-         assign(t1, binop(Iop_And64, mkexpr(t0), mkU64(0xfffffffffffffff8ULL)));
-         store(mkexpr(t1), getFReg(fs));
+         if ((mode64 || VEX_MIPS_CPU_HAS_MIPS32R2(archinfo->hwcaps))
+             && fp_mode64) {
+            t0 = newTemp(ty);
+            t1 = newTemp(ty);
+            assign(t0, binop(mode64 ? Iop_Add64 : Iop_Add32,
+                             getIReg(rs), getIReg(rt)));
+            assign(t1, binop(mode64 ? Iop_And64 : Iop_And32,
+                             mkexpr(t0),
+                             mode64 ? mkU64(0xfffffffffffffff8ULL)
+                                    : mkU32(0xfffffff8ULL)));
+            store(mkexpr(t1), getFReg(fs));
+         } else {
+            ILLEGAL_INSTRUCTON;
+         }
          break;
 
       case 0x0F: {
@@ -17289,10 +17183,7 @@
    vassert(guest_arch == VexArchMIPS32 || guest_arch == VexArchMIPS64);
 
    mode64 = guest_arch != VexArchMIPS32;
-#if (__mips_fpr==64)
-   fp_mode64 = ((VEX_MIPS_REV(archinfo->hwcaps) == VEX_PRID_CPU_32FPR)
-                || guest_arch == VexArchMIPS64);
-#endif
+   fp_mode64 = abiinfo->guest_mips_fp_mode64;
 
    guest_code = guest_code_IN;
    irsb = irsb_IN;
diff --git a/VEX/priv/guest_ppc_defs.h b/VEX/priv/guest_ppc_defs.h
index f2cf7c7..008418d 100644
--- a/VEX/priv/guest_ppc_defs.h
+++ b/VEX/priv/guest_ppc_defs.h
@@ -147,7 +147,16 @@
 
 /* --- CLEAN HELPERS --- */
 
-/* none, right now */
+extern ULong is_BCDstring128_helper( ULong Signed, ULong hi64, ULong low64 );
+extern ULong increment_BCDstring32_helper( ULong Signed,
+                                           ULong bcd_string, ULong carry_in );
+extern ULong convert_to_zoned_helper( ULong src_hi, ULong src_low,
+                                      ULong upper_byte,
+                                      ULong return_upper );
+extern ULong convert_to_national_helper( ULong src, ULong return_upper );
+extern ULong convert_from_zoned_helper( ULong src_hi, ULong src_low );
+extern ULong convert_from_national_helper( ULong src_hi, ULong src_low );
+
 
 /* --- DIRTY HELPERS --- */
 
diff --git a/VEX/priv/guest_ppc_helpers.c b/VEX/priv/guest_ppc_helpers.c
index a94bdad..0a4a31c 100644
--- a/VEX/priv/guest_ppc_helpers.c
+++ b/VEX/priv/guest_ppc_helpers.c
@@ -216,6 +216,221 @@
 }
 
 
+/*---------------------------------------------------------------*/
+/*--- Misc BCD clean helpers.                                 ---*/
+/*---------------------------------------------------------------*/
+
+/* NOTE, the clean and dirty helpers need to be called using the
+ * fnptr_to_fnentry() function wrapper to handle the Big Endian
+ * pointer-to-function ABI and the Little Endian ABI.
+ */
+
+/* This C-helper takes a 128-bit BCD value as two 64-bit pieces.
+ * It checks the string to see if it is a valid 128-bit BCD value.
+ * A valid BCD value has a sign value in bits [3:0] between 0xA
+ * and 0xF inclusive.  Each of the BCD digits, represented as a 4-bit
+ * hex number in bits [127:4] of the BCD value, must be between 0 and 9
+ * inclusive.  Returns a non-zero 64-bit value if valid, zero otherwise.
+ */
+ULong is_BCDstring128_helper( ULong Signed, ULong bcd_string_hi,
+                              ULong bcd_string_low ) {
+   Int i;
+   ULong valid_bcd, sign_valid = False;
+   ULong digit;
+   UInt  sign;
+
+   if ( Signed == True ) {
+      sign = bcd_string_low & 0xF;
+      if( ( sign >= 0xA ) && ( sign <= 0xF ) )
+         sign_valid = True;
+
+      /* Change the sign digit to a zero
+       * so the for loop below works the same
+       * for signed and unsigned BCD strings
+       */
+      bcd_string_low &= 0xFFFFFFFFFFFFFFF0ULL;
+
+   } else {
+      sign_valid = True;  /* set sign to True so result is only
+                             based on the validity of the digits */
+   }
+
+   valid_bcd = True;  // Assume true to start
+   for( i = 0; i < 32; i++ ) {
+      /* check high and low 64-bit strings in parallel */
+      digit = bcd_string_low & 0xF;
+      if ( digit > 0x9 )
+         valid_bcd = False;
+      bcd_string_low = bcd_string_low >> 4;
+
+      digit = bcd_string_hi & 0xF;
+      if ( digit > 0x9 )
+         valid_bcd = False;
+      bcd_string_hi = bcd_string_hi >> 4;
+   }
+
+   return valid_bcd & sign_valid;
+}
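+
+/* Illustrative example (values assumed): with Signed == True, hi64 == 0x0
+ * and low64 == 0x123D, the sign nibble 0xD lies in the valid 0xA..0xF range
+ * and every remaining digit is <= 9, so the helper returns 1.  Changing
+ * low64 to 0x12AD makes the digit 0xA invalid and the helper returns 0.
+ */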
+
+/* This clean helper takes a signed 32-bit BCD value and a carry in
+ * and adds 1 to the value of the BCD value.  The BCD value is passed
+ * in as a single 64-bit value.  The incremented value is returned in
+ * the lower 32 bits of the result.  If the input was signed the sign of
+ * the result is the same as the input.  The carry out is returned in
+ * bits [35:32] of the result.
+ */
+ULong increment_BCDstring32_helper( ULong Signed,
+                                    ULong bcd_string, ULong carry_in ) {
+   UInt i, num_digits = 8;
+   ULong bcd_value, result = 0;
+   ULong carry, digit, new_digit;
+
+   carry = carry_in;
+
+   if ( Signed == True ) {
+      bcd_value = bcd_string >> 4;   /* remove sign */
+      num_digits = num_digits - 1;
+   } else {
+      bcd_value = bcd_string;
+   }
+
+   for( i = 0; i < num_digits; i++ ) {
+      digit = bcd_value & 0xF;
+      bcd_value = bcd_value >> 4;
+      new_digit = digit + carry;
+
+      if ( new_digit >= 10 ) {
+         carry = 1;
+         new_digit = new_digit - 10;
+
+      } else {
+         carry = 0;
+      }
+      result =  result | (new_digit << (i*4) );
+   }
+
+   if ( Signed == True ) {
+      result = ( carry << 32) | ( result << 4 ) | ( bcd_string & 0xF );
+   } else {
+      result = ( carry << 32) | result;
+   }
+
+   return result;
+}
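+
+/* Illustrative example (values assumed): with Signed == True,
+ * bcd_string == 0x123D and carry_in == 1, the sign nibble 0xD is stripped,
+ * the low digit 3 becomes 4 without generating a further carry, and the
+ * result is reassembled as 0x124D with a zero carry-out in bits [35:32].
+ */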
+
+/*---------------------------------------------------------------*/
+/*--- Misc packed decimal clean helpers.                      ---*/
+/*---------------------------------------------------------------*/
+
+/* This C-helper takes a 64-bit packed decimal value stored in a
+ * 64-bit value and converts it to the zoned decimal format.  The lower
+ * byte may contain a sign value, set it to zero.  If return_upper
+ * is zero, return lower 64 bits of result, otherwise return upper
+ * 64 bits of the result.
+ */
+ULong convert_to_zoned_helper( ULong src_hi, ULong src_low,
+                               ULong upper_byte, ULong return_upper ) {
+   UInt i, sh;
+   ULong tmp = 0, new_value;
+
+   /* Remove the sign from the source.  Put in the upper byte of result.
+    * Sign inserted later.
+    */
+   if ( return_upper == 0 ) {  /* return lower 64-bit result */
+      for(i = 0; i < 7; i++) {
+         sh = ( 8 - i ) * 4;
+         new_value = ( ( src_low >> sh ) & 0xf ) | upper_byte;
+         tmp = tmp | ( new_value <<  ( ( 7 - i ) * 8 ) );
+      }
+
+   } else {
+      /* Byte for i=0 is in upper 64-bit of the source, do it separately */
+      new_value = ( src_hi & 0xf ) | upper_byte;
+      tmp = tmp | new_value << 56;
+
+      for( i = 1; i < 8; i++ ) {
+         sh = ( 16 - i ) * 4;
+         new_value = ( ( src_low >> sh ) & 0xf ) | upper_byte;
+         tmp = tmp | ( new_value <<  ( ( 7 - i ) * 8 ) );
+      }
+   }
+   return tmp;
+}
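+
+/* Illustration (assuming an upper_byte of 0x30): each packed nibble d of
+ * the source becomes the zoned byte (d | 0x30), so the digit 5 is emitted
+ * as the byte 0x35, i.e. the ASCII character '5'.
+ */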
+
+/* This C-helper takes the lower 64-bits of the 128-bit packed decimal
+ * src value.  It converts the src value to a 128-bit national format.
+ * If return_upper is zero, the helper returns lower 64 bits of result,
+ * otherwise it returns the upper 64-bits of the result.
+ */
+ULong convert_to_national_helper( ULong src, ULong return_upper ) {
+
+   UInt i;
+   UInt sh = 3, max = 4, min = 0;  /* initialize max, min for return upper */
+   ULong tmp = 0, new_value;
+
+   if ( return_upper == 0 ) {  /* return lower 64-bit result */
+      min = 4;
+      max = 7;
+      sh  = 7;
+   }
+
+   for( i = min; i < max; i++ ) {
+      new_value = ( ( src >> ( ( 7 - i ) * 4 ) ) & 0xf ) | 0x0030;
+      tmp = tmp | ( new_value <<  ( ( sh - i ) * 16 ) );
+   }
+   return tmp;
+}
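+
+/* Illustration: each packed nibble d is widened to the 16-bit national
+ * character (d | 0x0030), so the digit 7 is emitted as the halfword 0x0037.
+ */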
+
+/* This C-helper takes a 128-bit zoned value stored in a 128-bit
+ * value. It converts it to the packed 64-bit decimal format without a
+ * sign value.  The sign is supposed to be in bits [3:0] and the packed
+ * value in bits [67:4].  This helper leaves it to the caller to put the
+ * result into a V128 and shift the returned value over and put the sign
+ * in.
+ */
+ULong convert_from_zoned_helper( ULong src_hi, ULong src_low ) {
+   UInt i;
+   ULong tmp = 0, nibble;
+
+   /* Unroll the i = 0 iteration so the sizes of the loop for the upper
+    * and lower extraction match.  Skip the sign in the least significant byte.
+    */
+   nibble = ( src_hi >> 56 ) & 0xF;
+   tmp = tmp | ( nibble << 60 );
+
+   for( i = 1; i < 8; i++ ) {
+      /* get the high nibbles, put into result */
+      nibble = ( src_hi >> ( ( 7 - i ) * 8 ) ) & 0xF;
+      tmp = tmp | ( nibble << ( ( 15 - i ) * 4 ) );
+
+      /* get the low nibbles, put into result */
+      nibble = ( src_low >> ( ( 8 - i ) * 8 ) ) & 0xF;
+      tmp = tmp | ( nibble << ( ( 8 - i ) * 4 ) );
+   }
+   return tmp;
+}
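+
+/* Illustration (values assumed): only the low nibble of each zoned byte is
+ * kept, so the zoned bytes 0x31 0x32 ("12") pack down to the nibbles
+ * 0x1 0x2 in the returned value.
+ */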
+
+/* This C-helper takes a 128-bit national value stored in a 128-bit
+ * value. It converts it to a signless packed 64-bit decimal format.
+ */
+ULong convert_from_national_helper( ULong src_hi, ULong src_low ) {
+   UInt i;
+   ULong tmp = 0, hword;
+
+   src_low = src_low & 0xFFFFFFFFFFFFFFF0ULL; /* remove the sign */
+
+   for( i = 0; i < 4; i++ ) {
+      /* get the high half-word, put into result */
+      hword = ( src_hi >> ( ( 3 - i ) * 16 ) ) & 0xF;
+      tmp = tmp | ( hword << ( ( 7 - i ) * 4 ) );
+
+      /* get the low half-word, put into result */
+      hword = ( src_low >> (  ( 3 - i ) * 16 ) ) & 0xF;
+      tmp = tmp | ( hword << ( ( 3 - i ) * 4 ) );
+   }
+   return tmp;
+}
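+
+/* Illustration (values assumed): after the sign is removed, only the low
+ * nibble of each national halfword is kept, so 0x0031 0x0032 pack down to
+ * the nibbles 0x1 0x2 in the returned value.
+ */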
+
 /*----------------------------------------------*/
 /*--- The exported fns ..                    ---*/
 /*----------------------------------------------*/
@@ -500,7 +715,7 @@
 
    vex_state->guest_FPROUND  = PPCrm_NEAREST;
    vex_state->guest_DFPROUND = PPCrm_NEAREST;
-   vex_state->pad1 = 0;
+   vex_state->guest_C_FPCC   = 0;
    vex_state->pad2 = 0;
 
    vex_state->guest_VRSAVE = 0;
@@ -667,7 +882,7 @@
 
    vex_state->guest_FPROUND  = PPCrm_NEAREST;
    vex_state->guest_DFPROUND = PPCrm_NEAREST;
-   vex_state->pad1 = 0;
+   vex_state->guest_C_FPCC   = 0;
    vex_state->pad2 = 0;
 
    vex_state->guest_VRSAVE = 0;
@@ -820,7 +1035,7 @@
 
           /* Describe any sections to be regarded by Memcheck as
              'always-defined'. */
-          .n_alwaysDefd = 11,
+          .n_alwaysDefd = 12,
 
           .alwaysDefd 
 	  = { /*  0 */ ALWAYSDEFD32(guest_CIA),
@@ -833,7 +1048,8 @@
 	      /*  7 */ ALWAYSDEFD32(guest_NRADDR_GPR2),
 	      /*  8 */ ALWAYSDEFD32(guest_REDIR_SP),
 	      /*  9 */ ALWAYSDEFD32(guest_REDIR_STACK),
-	      /* 10 */ ALWAYSDEFD32(guest_IP_AT_SYSCALL)
+	      /* 10 */ ALWAYSDEFD32(guest_IP_AT_SYSCALL),
+	      /* 11 */ ALWAYSDEFD32(guest_C_FPCC)
             }
         };
 
@@ -861,7 +1077,7 @@
 
           /* Describe any sections to be regarded by Memcheck as
              'always-defined'. */
-          .n_alwaysDefd = 11,
+          .n_alwaysDefd = 12,
 
           .alwaysDefd 
 	  = { /*  0 */ ALWAYSDEFD64(guest_CIA),
@@ -874,7 +1090,8 @@
 	      /*  7 */ ALWAYSDEFD64(guest_NRADDR_GPR2),
 	      /*  8 */ ALWAYSDEFD64(guest_REDIR_SP),
 	      /*  9 */ ALWAYSDEFD64(guest_REDIR_STACK),
-	      /* 10 */ ALWAYSDEFD64(guest_IP_AT_SYSCALL)
+	      /* 10 */ ALWAYSDEFD64(guest_IP_AT_SYSCALL),
+	      /* 11 */ ALWAYSDEFD64(guest_C_FPCC)
             }
         };
 
diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index 6a0b6d8..c393740 100644
--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c
@@ -277,6 +277,7 @@
 #define OFFB_XER_BC      offsetofPPCGuestState(guest_XER_BC)
 #define OFFB_FPROUND     offsetofPPCGuestState(guest_FPROUND)
 #define OFFB_DFPROUND    offsetofPPCGuestState(guest_DFPROUND)
+#define OFFB_C_FPCC      offsetofPPCGuestState(guest_C_FPCC)
 #define OFFB_VRSAVE      offsetofPPCGuestState(guest_VRSAVE)
 #define OFFB_VSCR        offsetofPPCGuestState(guest_VSCR)
 #define OFFB_EMNOTE      offsetofPPCGuestState(guest_EMNOTE)
@@ -324,6 +325,11 @@
    return IFIELD( instr, 1, 5 );
 }
 
+/* Extract 2-bit secondary opcode, instr[1:0] */
+static UInt ifieldOPC0o2 ( UInt instr) {
+   return IFIELD( instr, 0, 2 );
+}
+
 /* Extract RD (destination register) field, instr[25:21] */
 static UChar ifieldRegDS( UInt instr ) {
    return toUChar( IFIELD( instr, 21, 5 ) );
@@ -451,8 +457,10 @@
     PPC_GST_MAX
 } PPC_GST;
 
-#define MASK_FPSCR_RN   0x3ULL  // Binary floating point rounding mode
-#define MASK_FPSCR_DRN  0x700000000ULL // Decimal floating point rounding mode
+#define MASK_FPSCR_RN     0x3ULL         // Binary floating point rounding mode
+#define MASK_FPSCR_DRN    0x700000000ULL // Decimal floating point rounding mode
+#define MASK_FPSCR_C_FPCC 0x1F000ULL     // Floating-Point Condition code FPCC
+
 #define MASK_VSCR_VALID 0x00010001
 
 
@@ -628,6 +636,8 @@
    return IRExpr_RdTmp(tmp);
 }
 
+#define mkU1(_n)  IRExpr_Const(IRConst_U1(_n))
+
 static IRExpr* mkU8 ( UChar i )
 {
    return IRExpr_Const(IRConst_U8(i));
@@ -688,6 +698,44 @@
                                            unop(Iop_1Uto32, arg2)));
 }
 
+static inline IRExpr* mkXOr4_32( IRTemp t0, IRTemp t1, IRTemp t2,
+                                 IRTemp t3 )
+{
+   return binop( Iop_Xor32,
+                 binop( Iop_Xor32, mkexpr( t0 ), mkexpr( t1 ) ),
+                 binop( Iop_Xor32, mkexpr( t2 ), mkexpr( t3 ) ) );
+}
+
+static inline IRExpr* mkOr3_V128( IRTemp t0, IRTemp t1, IRTemp t2 )
+{
+   return binop( Iop_OrV128,
+                 mkexpr( t0 ),
+                 binop( Iop_OrV128, mkexpr( t1 ), mkexpr( t2 ) ) );
+}
+
+static inline IRExpr* mkOr4_V128( IRTemp t0, IRTemp t1, IRTemp t2,
+                                  IRTemp t3 )
+{
+   return binop( Iop_OrV128,
+                 binop( Iop_OrV128, mkexpr( t0 ), mkexpr( t1 ) ),
+                 binop( Iop_OrV128, mkexpr( t2 ), mkexpr( t3 ) ) );
+}
+
+static inline IRExpr* mkOr4_V128_expr( IRExpr* t0, IRExpr* t1, IRExpr* t2,
+                                       IRExpr* t3 )
+{
+   /* arguments are already expressions */
+   return binop( Iop_OrV128,
+                 binop( Iop_OrV128, ( t0 ), ( t1 ) ),
+                 binop( Iop_OrV128, ( t2 ), ( t3 ) ) );
+}
+
+static IRExpr* mkNOT1 ( IRExpr* arg1 )
+{
+   vassert(typeOfIRExpr(irsb->tyenv, arg1) == Ity_I1);
+   return unop(Iop_32to1, unop(Iop_Not32, unop(Iop_1Uto32, arg1) ) );
+}
+
 /* expand V128_8Ux16 to 2x V128_16Ux8's */
 static void expand8Ux16( IRExpr* vIn,
                          /*OUTs*/ IRTemp* vEvn, IRTemp* vOdd )
@@ -888,6 +936,20 @@
    );
 }
 
+static IRExpr* extract_field_from_vector( IRTemp vB, IRExpr* index, UInt mask)
+{
+   /* vB is a vector, extract bits starting at index to size of mask */
+   return unop( Iop_V128to64,
+                binop( Iop_AndV128,
+                       binop( Iop_ShrV128,
+                              mkexpr( vB ),
+                              unop( Iop_64to8,
+                                    binop( Iop_Mul64, index,
+                                           mkU64( 8 ) ) ) ),
+                       binop( Iop_64HLtoV128,
+                              mkU64( 0x0 ),
+                              mkU64( mask ) ) ) );
+}
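+
+/* Usage sketch (operands assumed): with index = mkU64( 2 ) and mask = 0xFF
+ * this extracts the third byte of vB, counting from the least significant
+ * end; the shift amount is index * 8, so index is a byte offset.
+ */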
 
 /* Signed saturating narrow 64S to 32 */
 static IRExpr* mkQNarrow64Sto32 ( IRExpr* t64 )
@@ -1444,6 +1506,21 @@
    return IRExpr_Get( vectorGuestRegOffset(archreg), Ity_V128 );
 }
 
+/* Get contents of a 128-bit guest register */
+static IRExpr* getF128Reg ( UInt archreg )
+{
+   vassert(archreg < 64);
+   return IRExpr_Get( vectorGuestRegOffset(archreg), Ity_F128 );
+}
+
+/* Ditto, but write to a reg instead. */
+static void putF128Reg ( UInt archreg, IRExpr* e )
+{
+   vassert(archreg < 64);
+   vassert(typeOfIRExpr(irsb->tyenv, e) == Ity_F128);
+   stmt( IRStmt_Put(vectorGuestRegOffset(archreg), e) );
+}
+
 /* Ditto, but write to a reg instead. */
 static void putVReg ( UInt archreg, IRExpr* e )
 {
@@ -1848,6 +1925,36 @@
    stmt( IRStmt_Put(guestCR0offset(cr), e) );
 }
 
+static void putC ( IRExpr* e )
+{
+   /* The assumption is that the value of the Floating-Point Result
+    * Class Descriptor bit (C) is passed in the lower four bits of a
+    * 32 bit value.
+    *
+    * Note, the C and FPCC bits, which are fields in the FPSCR
+    * register, are stored in their own location in the guest state.
+    * The FPCC bits are in the lower 4 bits.  The C bit needs
+    * to be shifted to bit 4 in the location that holds C and FPCC.
+    * Note not all of the FPSCR register bits are supported.  We are
+    * only writing the C bit.
+    */
+   IRExpr* tmp;
+
+   vassert(typeOfIRExpr(irsb->tyenv, e) == Ity_I32);
+
+   /* Get the FPCC bit field */
+   tmp = binop( Iop_And32,
+                mkU32( 0xF ),
+                unop( Iop_8Uto32, IRExpr_Get( OFFB_C_FPCC, Ity_I8 ) ) );
+
+   stmt( IRStmt_Put( OFFB_C_FPCC,
+                     unop( Iop_32to8,
+                           binop( Iop_Or32, tmp,
+                                  binop( Iop_Shl32,
+                                         binop( Iop_And32, mkU32( 0x1 ), e ),
+                                         mkU8( 4 ) ) ) ) ) );
+}
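+
+/* Resulting layout of the guest_C_FPCC byte, as read back by getC and
+ * getFPCC below: bit 4 holds the C bit and bits [3:0] hold the FPCC field;
+ * the remaining bits of the byte are unused.
+ */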
+
 static IRExpr* /* :: Ity_I8 */ getCR0 ( UInt cr )
 {
    vassert(cr < 8);
@@ -2012,6 +2119,73 @@
 } 
 
 
+static IRExpr * create_DCM ( IRType size, IRTemp NaN, IRTemp inf, IRTemp zero,
+                             IRTemp dnorm, IRTemp pos)
+{
+   /* This is a general function for creating the DCM for a 32-bit or
+      64-bit expression based on the passed size.
+   */
+   IRTemp neg;
+   IROp opAND, opOR, opSHL, opXto1, op1UtoX;
+
+   vassert( ( size == Ity_I32 ) || ( size == Ity_I64 ) );
+
+   if ( size == Ity_I32 ) {
+      opSHL = Iop_Shl32;
+      opAND = Iop_And32;
+      opOR  = Iop_Or32;
+      opXto1 = Iop_32to1;
+      op1UtoX = Iop_1Uto32;
+      neg = newTemp( Ity_I32 );
+
+   } else {
+      opSHL = Iop_Shl64;
+      opAND = Iop_And64;
+      opOR  = Iop_Or64;
+      opXto1 = Iop_64to1;
+      op1UtoX = Iop_1Uto64;
+      neg = newTemp( Ity_I64 );
+   }
+
+   assign( neg, unop( op1UtoX, mkNOT1( unop( opXto1,
+                                             mkexpr ( pos ) ) ) ) );
+
+   return binop( opOR,
+                 binop( opSHL, mkexpr( NaN ), mkU8( 6 ) ),
+                 binop( opOR,
+                        binop( opOR,
+                               binop( opOR,
+                                      binop( opSHL,
+                                             binop( opAND,
+                                                    mkexpr( pos ),
+                                                    mkexpr( inf ) ),
+                                             mkU8( 5 ) ),
+                                      binop( opSHL,
+                                             binop( opAND,
+                                                    mkexpr( neg ),
+                                                    mkexpr( inf ) ),
+                                             mkU8( 4 ) ) ),
+                               binop( opOR,
+                                      binop( opSHL,
+                                             binop( opAND,
+                                                    mkexpr( pos ),
+                                                    mkexpr( zero ) ),
+                                             mkU8( 3 ) ),
+                                      binop( opSHL,
+                                             binop( opAND,
+                                                    mkexpr( neg ),
+                                                    mkexpr( zero ) ),
+                                             mkU8( 2 ) ) ) ),
+                        binop( opOR,
+                               binop( opSHL,
+                                      binop( opAND,
+                                             mkexpr( pos ),
+                                             mkexpr( dnorm ) ),
+                                      mkU8( 1 ) ),
+                               binop( opAND,
+                                      mkexpr( neg ),
+                                      mkexpr( dnorm ) ) ) ) );
+}
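+
+/* Reading the shifts above, the DCM produced has bit 6 = NaN,
+ * bit 5 = +infinity, bit 4 = -infinity, bit 3 = +zero, bit 2 = -zero,
+ * bit 1 = +denormal and bit 0 = -denormal.
+ */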
 
 /*------------------------------------------------------------*/
 /*--- Helpers for XER flags.                               ---*/
@@ -2773,7 +2947,7 @@
 }
 
 /* Get a masked word from the given reg */
-static IRExpr* /* ::Ity_I32 */ getGST_masked ( PPC_GST reg, UInt mask )
+static IRExpr* /* ::Ity_I32 */ getGST_masked ( PPC_GST reg, ULong mask )
 {
    IRTemp val = newTemp(Ity_I32);
    vassert( reg < PPC_GST_MAX );
@@ -2785,15 +2959,16 @@
          all exceptions masked, round-to-nearest.
          This corresponds to a FPSCR value of 0x0. */
 
-      /* In the lower 32 bits of FPSCR, we're only keeping track of
-       * the binary floating point rounding mode, so if the mask isn't
-       * asking for this, just return 0x0.
+      /* In the lower 32 bits of FPSCR, we're keeping track of the binary
+       * floating point rounding mode and Floating-point Condition code, so
+       * if the mask isn't asking for either of these, just return 0x0.
        */
-      if (mask & MASK_FPSCR_RN) {
+      if ( mask & MASK_FPSCR_RN ) {
          assign( val, unop( Iop_8Uto32, IRExpr_Get( OFFB_FPROUND, Ity_I8 ) ) );
       } else {
          assign( val, mkU32(0x0) );
       }
+
       break;
    }
 
@@ -2802,7 +2977,7 @@
       vpanic("getGST_masked(ppc)");
    }
 
-   if (mask != 0xFFFFFFFF) {
+   if ( mask != 0xFFFFFFFF ) {
       return binop(Iop_And32, mkexpr(val), mkU32(mask));
    } else {
       return mkexpr(val);
@@ -3079,6 +3254,31 @@
             )
          );
       }
+
+      if (mask & MASK_FPSCR_C_FPCC) {
+         stmt(
+            IRStmt_Put(
+               OFFB_C_FPCC,
+               unop(
+                  Iop_32to8,
+                  binop(
+                     Iop_Or32,
+                     binop(
+                        Iop_And32,
+                        unop(Iop_64to32, src),
+                        mkU32(MASK_FPSCR_C_FPCC & mask)
+                     ),
+                     binop(
+                        Iop_And32,
+                        unop(Iop_8Uto32, IRExpr_Get(OFFB_C_FPCC,Ity_I8)),
+                        mkU32(MASK_FPSCR_C_FPCC & ~mask)
+                     )
+                  )
+               )
+            )
+         );
+      }
+
       /* Similarly, update FPSCR.DRN if any bits of |mask|
          corresponding to FPSCR.DRN are set. */
       if (mask & MASK_FPSCR_DRN) {
@@ -3172,6 +3372,61 @@
    }
 }
 
+static void putFPCC ( IRExpr* e )
+{
+   /* The assumption is that the value of the FPCC is passed in the lower
+    * four bits of a 32 bit value.
+    *
+    * Note, the C and FPCC bits, which are a field of the FPSCR
+    * register, are stored in their own "register" in
+    * memory.  The FPCC bits are in the lower 4 bits.  We don't need to
+    * shift them to their location in the FPSCR register.  Note,
+    * not all of the FPSCR register bits are supported.  We are writing all
+    * of the bits in the FPCC field but not the C field.
+    */
+   IRExpr* tmp;
+
+   vassert( typeOfIRExpr( irsb->tyenv, e ) == Ity_I32 );
+   /* Get the C bit field */
+   tmp = binop( Iop_And32,
+                mkU32( 0x10 ),
+                unop( Iop_8Uto32, IRExpr_Get( OFFB_C_FPCC, Ity_I8 ) ) );
+
+   stmt( IRStmt_Put( OFFB_C_FPCC,
+                     unop( Iop_32to8,
+                           binop( Iop_Or32, tmp,
+                                  binop( Iop_And32, mkU32( 0xF ), e ) ) ) ) );
+
+}
+
+static IRExpr* /* ::Ity_I32 */  getC ( void )
+{
+   /* Note, the Floating-Point Result Class Descriptor (C) bit is a field of
+    * the FPSCR register and is stored in its own "register" in guest state
+    * with the FPCC bit field.   C | FPCC
+    */
+   IRTemp val = newTemp(Ity_I32);
+
+   assign( val, binop( Iop_Shr32,
+                       unop( Iop_8Uto32, IRExpr_Get( OFFB_C_FPCC, Ity_I8 ) ),
+                       mkU8( 4 ) ) );
+   return mkexpr(val);
+}
+
+static IRExpr* /* ::Ity_I32 */  getFPCC ( void )
+{
+   /* Note, the FPCC bits, which are a field of the FPSCR
+    * register, are stored in their own "register" in
+    * guest state with the C bit field.   C | FPCC
+    */
+   IRTemp val = newTemp( Ity_I32 );
+
+   assign( val, binop( Iop_And32, unop( Iop_8Uto32,
+                                        IRExpr_Get( OFFB_C_FPCC, Ity_I8 ) ),
+                       mkU32( 0xF ) ));
+   return mkexpr(val);
+}
+
 /*------------------------------------------------------------*/
 /* Helpers for VSX instructions that do floating point
  * operations and need to determine if a src contains a
@@ -3184,129 +3439,497 @@
                                mkexpr( x ), \
                                mkU64( NONZERO_FRAC_MASK ) )
 
-// Returns exponent part of a single precision floating point as I32
-static IRExpr * fp_exp_part_sp(IRTemp src)
-{
-   return binop( Iop_And32,
-                 binop( Iop_Shr32, mkexpr( src ), mkU8( 23 ) ),
-                 mkU32( 0xff ) );
-}
-
-// Returns exponent part of floating point as I32
-static IRExpr * fp_exp_part(IRTemp src, Bool sp)
-{
-   IRExpr * exp;
-   if (sp)
-      return fp_exp_part_sp(src);
-
-   if (!mode64)
-      exp = binop( Iop_And32, binop( Iop_Shr32, unop( Iop_64HIto32,
-                                                      mkexpr( src ) ),
-                                     mkU8( 20 ) ), mkU32( 0x7ff ) );
-   else
-      exp = unop( Iop_64to32,
-                  binop( Iop_And64,
-                         binop( Iop_Shr64, mkexpr( src ), mkU8( 52 ) ),
-                         mkU64( 0x7ff ) ) );
-   return exp;
-}
-
-static IRExpr * is_Inf_sp(IRTemp src)
-{
-   IRTemp frac_part = newTemp(Ity_I32);
-   IRExpr * Inf_exp;
-
-   assign( frac_part, binop( Iop_And32, mkexpr(src), mkU32(0x007fffff)) );
-   Inf_exp = binop( Iop_CmpEQ32, fp_exp_part( src, True /*single precision*/ ), mkU32( 0xff ) );
-   return mkAND1( Inf_exp, binop( Iop_CmpEQ32, mkexpr( frac_part ), mkU32( 0 ) ) );
-}
-
-
-// Infinity: exp = 7ff and fraction is zero; s = 0/1
-static IRExpr * is_Inf(IRTemp src, Bool sp)
-{
-   IRExpr * Inf_exp, * hi32, * low32;
-   IRTemp frac_part;
-
-   if (sp)
-      return is_Inf_sp(src);
-
-   frac_part = newTemp(Ity_I64);
-   assign( frac_part, FP_FRAC_PART(src) );
-   Inf_exp = binop( Iop_CmpEQ32, fp_exp_part( src, False /*not single precision*/  ), mkU32( 0x7ff ) );
-   hi32 = unop( Iop_64HIto32, mkexpr( frac_part ) );
-   low32 = unop( Iop_64to32, mkexpr( frac_part ) );
-   return mkAND1( Inf_exp, binop( Iop_CmpEQ32, binop( Iop_Or32, low32, hi32 ),
-                                  mkU32( 0 ) ) );
-}
-
-static IRExpr * is_Zero_sp(IRTemp src)
-{
-   IRTemp sign_less_part = newTemp(Ity_I32);
-   assign( sign_less_part, binop( Iop_And32, mkexpr( src ), mkU32( SIGN_MASK32 ) ) );
-   return binop( Iop_CmpEQ32, mkexpr( sign_less_part ), mkU32( 0 ) );
-}
-
-// Zero: exp is zero and fraction is zero; s = 0/1
-static IRExpr * is_Zero(IRTemp src, Bool sp)
-{
-   IRExpr * hi32, * low32;
-   IRTemp sign_less_part;
-   if (sp)
-      return is_Zero_sp(src);
-
-   sign_less_part = newTemp(Ity_I64);
-
-   assign( sign_less_part, binop( Iop_And64, mkexpr( src ), mkU64( SIGN_MASK ) ) );
-   hi32 = unop( Iop_64HIto32, mkexpr( sign_less_part ) );
-   low32 = unop( Iop_64to32, mkexpr( sign_less_part ) );
-   return binop( Iop_CmpEQ32, binop( Iop_Or32, low32, hi32 ),
-                              mkU32( 0 ) );
-}
-
-/*  SNAN: s = 1/0; exp = 0x7ff; fraction is nonzero, with highest bit '1'
- *  QNAN: s = 1/0; exp = 0x7ff; fraction is nonzero, with highest bit '0'
- *  This function returns an IRExpr value of '1' for any type of NaN.
- */
-static IRExpr * is_NaN(IRTemp src)
-{
-   IRExpr * NaN_exp, * hi32, * low32;
-   IRTemp frac_part = newTemp(Ity_I64);
-
-   assign( frac_part, FP_FRAC_PART(src) );
-   hi32 = unop( Iop_64HIto32, mkexpr( frac_part ) );
-   low32 = unop( Iop_64to32, mkexpr( frac_part ) );
-   NaN_exp = binop( Iop_CmpEQ32, fp_exp_part( src, False /*not single precision*/ ),
-                    mkU32( 0x7ff ) );
-
-   return mkAND1( NaN_exp, binop( Iop_CmpNE32, binop( Iop_Or32, low32, hi32 ),
-                                               mkU32( 0 ) ) );
-}
-
-/* This function returns an IRExpr value of '1' for any type of NaN.
- * The passed 'src' argument is assumed to be Ity_I32.
- */
-static IRExpr * is_NaN_32(IRTemp src)
-{
 #define NONZERO_FRAC_MASK32 0x007fffffULL
-#define FP_FRAC_PART32(x) binop( Iop_And32, \
+#define FP_FRAC_PART32(x) binop( Iop_And32,   \
                                  mkexpr( x ), \
                                  mkU32( NONZERO_FRAC_MASK32 ) )
 
-   IRExpr * frac_part = FP_FRAC_PART32(src);
-   IRExpr * exp_part = binop( Iop_And32,
-                              binop( Iop_Shr32, mkexpr( src ), mkU8( 23 ) ),
-                              mkU32( 0x0ff ) );
-   IRExpr * NaN_exp = binop( Iop_CmpEQ32, exp_part, mkU32( 0xff ) );
+// Returns exponent part of floating point src as I32
+static IRExpr * fp_exp_part( IRType size, IRTemp src )
+{
+   IRExpr *shift_by, *mask, *tsrc;
 
-   return mkAND1( NaN_exp, binop( Iop_CmpNE32, frac_part, mkU32( 0 ) ) );
+   vassert( ( size == Ity_I16 ) || ( size == Ity_I32 )
+            || ( size == Ity_I64 ) );
+
+   if( size == Ity_I16 ) {
+      /* The 16-bit floating point value is in the lower 16-bits
+       * of the 32-bit input value.
+       */
+      tsrc  = mkexpr( src );
+      mask  = mkU32( 0x1F );
+      shift_by = mkU8( 10 );
+
+   } else if( size == Ity_I32 ) {
+      tsrc  = mkexpr( src );
+      mask  = mkU32( 0xFF );
+      shift_by = mkU8( 23 );
+
+   } else if( size == Ity_I64 ) {
+      tsrc  = unop( Iop_64HIto32, mkexpr( src ) );
+      mask  = mkU32( 0x7FF );
+      shift_by = mkU8( 52 - 32 );
+
+   } else {
+      /*NOTREACHED*/
+      vassert(0); // Stops gcc complaining at "-Og"
+   }
+
+   return binop( Iop_And32, binop( Iop_Shr32, tsrc, shift_by ), mask );
+}
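+
+/* For example, for an Ity_I64 double the exponent occupies bits [62:52],
+ * so the code above shifts the upper 32-bit word right by 20 and masks
+ * with 0x7FF to produce the exponent as an I32.
+ */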
+
+/* The following functions check the floating point value to see if it
+   is zero, infinity, NaN, Normalized, Denormalized.
+*/
+/* 16-bit floating point number is stored in the lower 16-bits of 32-bit value */
+#define I16_EXP_MASK       0x7C00
+#define I16_FRACTION_MASK  0x03FF
+#define I32_EXP_MASK       0x7F800000
+#define I32_FRACTION_MASK  0x007FFFFF
+#define I64_EXP_MASK       0x7FF0000000000000ULL
+#define I64_FRACTION_MASK  0x000FFFFFFFFFFFFFULL
+#define V128_EXP_MASK      0x7FFF000000000000ULL
+#define V128_FRACTION_MASK 0x0000FFFFFFFFFFFFULL  /* upper 64-bit fractional mask */
+
+void setup_value_check_args( IRType size, IRTemp *exp_mask, IRTemp *frac_mask,
+                             IRTemp *zero );
+
+void setup_value_check_args( IRType size, IRTemp *exp_mask, IRTemp *frac_mask,
+                             IRTemp *zero ) {
+
+   vassert( ( size == Ity_I16 ) || ( size == Ity_I32 )
+            || ( size == Ity_I64 ) || ( size == Ity_V128 ) );
+
+   if( size == Ity_I16 ) {
+      /* The 16-bit floating point value is in the lower 16-bits of
+         the 32-bit input value */
+      *frac_mask = newTemp( Ity_I32 );
+      *exp_mask  = newTemp( Ity_I32 );
+      *zero  = newTemp( Ity_I32 );
+      assign( *exp_mask, mkU32( I16_EXP_MASK ) );
+      assign( *frac_mask, mkU32( I16_FRACTION_MASK ) );
+      assign( *zero, mkU32( 0 ) );
+
+   } else if( size == Ity_I32 ) {
+      *frac_mask = newTemp( Ity_I32 );
+      *exp_mask  = newTemp( Ity_I32 );
+      *zero  = newTemp( Ity_I32 );
+      assign( *exp_mask, mkU32( I32_EXP_MASK ) );
+      assign( *frac_mask, mkU32( I32_FRACTION_MASK ) );
+      assign( *zero, mkU32( 0 ) );
+
+   } else if( size == Ity_I64 ) {
+      *frac_mask = newTemp( Ity_I64 );
+      *exp_mask  = newTemp( Ity_I64 );
+      *zero  = newTemp( Ity_I64 );
+      assign( *exp_mask, mkU64( I64_EXP_MASK ) );
+      assign( *frac_mask, mkU64( I64_FRACTION_MASK ) );
+      assign( *zero, mkU64( 0 ) );
+
+   } else {
+      /* V128 is converted to upper and lower 64 bit values, */
+      /* uses 64-bit operators and temps */
+      *frac_mask = newTemp( Ity_I64 );
+      *exp_mask  = newTemp( Ity_I64 );
+      *zero  = newTemp( Ity_I64 );
+      assign( *exp_mask, mkU64( V128_EXP_MASK ) );
+      /* upper 64-bit fractional mask */
+      assign( *frac_mask, mkU64( V128_FRACTION_MASK ) );
+      assign( *zero, mkU64( 0 ) );
+   }
+}
+
+/* Helper function for the various function which check the value of
+   the floating point value.
+*/
+static IRExpr * exponent_compare( IRType size, IRTemp src,
+                                  IRTemp exp_mask, IRExpr *exp_val )
+{
+   IROp opAND, opCmpEQ;
+
+   if( ( size == Ity_I16 ) || ( size == Ity_I32 ) )  {
+      /* The 16-bit floating point value is in the lower 16-bits of
+         the 32-bit input value */
+      opAND = Iop_And32;
+      opCmpEQ = Iop_CmpEQ32;
+
+   } else {
+      opAND = Iop_And64;
+      opCmpEQ = Iop_CmpEQ64;
+   }
+
+   if( size == Ity_V128 ) {
+      return binop( opCmpEQ,
+                    binop ( opAND,
+                            unop( Iop_V128HIto64, mkexpr( src ) ),
+                            mkexpr( exp_mask ) ),
+                    exp_val );
+
+   } else if( ( size == Ity_I16 ) || ( size == Ity_I32 ) )  {
+      return binop( opCmpEQ,
+                    binop ( opAND, mkexpr( src ), mkexpr( exp_mask ) ),
+                    exp_val );
+   } else {
+      /* 64-bit operands */
+
+      if (mode64) {
+         return binop( opCmpEQ,
+                       binop ( opAND, mkexpr( src ), mkexpr( exp_mask ) ),
+                       exp_val );
+      } else {
+         /* No support for 64-bit compares in 32-bit mode, need to do upper
+          * and lower parts using 32-bit compare operators.
+          */
+         return 
+            mkAND1( binop( Iop_CmpEQ32,
+                           binop ( Iop_And32,
+                                   unop(Iop_64HIto32, mkexpr( src ) ),
+                                   unop(Iop_64HIto32, mkexpr( exp_mask ) ) ),
+                           unop(Iop_64HIto32, exp_val ) ),
+                    binop( Iop_CmpEQ32,
+                           binop ( Iop_And32,
+                                   unop(Iop_64to32, mkexpr( src ) ),
+                                   unop(Iop_64to32, mkexpr( exp_mask ) ) ),
+                           unop(Iop_64to32, exp_val ) ) );
+      }
+   }
+}
+
+static IRExpr *fractional_part_compare( IRType size, IRTemp src,
+                                        IRTemp frac_mask, IRExpr *zero )
+{
+   IROp opAND, opCmpEQ;
+
+   if( ( size == Ity_I16 ) || ( size == Ity_I32 ) )  {
+      /*The 16-bit floating point value is in the lower 16-bits of
+        the 32-bit input value */
+      opAND = Iop_And32;
+      opCmpEQ = Iop_CmpEQ32;
+
+   } else {
+      opAND = Iop_And64;
+      opCmpEQ = Iop_CmpEQ64;
+   }
+
+   if( size == Ity_V128 ) {
+      /* 128-bit, note we only care if the fractional part is zero so take
+         the upper 48 bits of the fractional part and the lower 64 bits, OR
+         them together and test for zero.  This keeps the temp variables and
+         operators all 64-bit.
+      */
+      return binop( opCmpEQ,
+                    binop( Iop_Or64,
+                           binop( opAND,
+                                  unop( Iop_V128HIto64, mkexpr( src ) ),
+                                  mkexpr( frac_mask ) ),
+                           unop( Iop_V128to64, mkexpr( src ) ) ),
+                    zero );
+
+   } else if( ( size == Ity_I16 ) || ( size == Ity_I32 ) )  {
+         return binop( opCmpEQ,
+                       binop( opAND,  mkexpr( src ), mkexpr( frac_mask ) ),
+                       zero );
+   } else {
+      if (mode64) {
+         return binop( opCmpEQ,
+                       binop( opAND,  mkexpr( src ), mkexpr( frac_mask ) ),
+                       zero );
+      } else {
+         /* No support for 64-bit compares in 32-bit mode, need to do upper
+          * and lower parts using 32-bit compare operators.
+          */
+         return 
+            mkAND1( binop( Iop_CmpEQ32,
+                           binop ( Iop_And32,
+                                   unop(Iop_64HIto32, mkexpr( src ) ),
+                                   unop(Iop_64HIto32, mkexpr( frac_mask ) ) ),
+                           mkU32 ( 0 ) ),
+                    binop( Iop_CmpEQ32,
+                           binop ( Iop_And32,
+                                   unop(Iop_64to32, mkexpr( src ) ),
+                                   unop(Iop_64to32, mkexpr( frac_mask ) ) ),
+                           mkU32 ( 0 ) ) );
+      }
+   }
+}
+
+// Infinity: exp has all bits set, and fraction is zero; s = 0/1
+static IRExpr * is_Inf( IRType size, IRTemp src )
+{
+   IRExpr *max_exp, *zero_frac;
+   IRTemp exp_mask, frac_mask, zero;
+
+   setup_value_check_args( size, &exp_mask, &frac_mask, &zero );
+
+   /* check exponent is all ones, i.e. (exp AND exp_mask) = exp_mask */
+   max_exp = exponent_compare( size, src, exp_mask, mkexpr( exp_mask ) );
+
+   /* check fractional part is all zeros */
+   zero_frac = fractional_part_compare( size, src, frac_mask, mkexpr( zero ) );
+
+   return  mkAND1( max_exp, zero_frac );
+}
+
+// Zero: exp is zero and fraction is zero; s = 0/1
+static IRExpr * is_Zero( IRType size, IRTemp src )
+{
+   IRExpr *zero_exp, *zero_frac;
+   IRTemp exp_mask, frac_mask, zero;
+
+   setup_value_check_args( size, &exp_mask, &frac_mask, &zero );
+
+   /* check the exponent is all zeros, i.e. (exp AND exp_mask) = zero */
+   zero_exp = exponent_compare( size, src, exp_mask, mkexpr( zero ) );
+
+   /* check fractional part is all zeros */
+   zero_frac = fractional_part_compare( size, src, frac_mask, mkexpr( zero ) );
+
+   return  mkAND1( zero_exp, zero_frac );
+}
+
+/*  SNAN: s = 1/0; exp all 1's; fraction is nonzero, with highest bit '0'
+ *  QNAN: s = 1/0; exp all 1's; fraction is nonzero, with highest bit '1'
+ */
+static IRExpr * is_NaN( IRType size, IRTemp src )
+{
+   IRExpr *max_exp, *not_zero_frac;
+   IRTemp exp_mask, frac_mask, zero;
+
+   setup_value_check_args( size, &exp_mask, &frac_mask, &zero );
+
+   /* check exponent is all ones, i.e. (exp AND exp_mask) = exp_mask */
+   max_exp = exponent_compare( size, src, exp_mask, mkexpr( exp_mask ) );
+
+   /* check fractional part is not zero */
+   not_zero_frac = unop( Iop_Not1,
+                         fractional_part_compare( size, src, frac_mask,
+                                                  mkexpr( zero ) ) );
+
+   return  mkAND1( max_exp, not_zero_frac );
+}
+
+/* A denormalized number has a zero exponent and a non-zero fraction. */
+static IRExpr * is_Denorm( IRType size, IRTemp src )
+{
+   IRExpr *zero_exp, *not_zero_frac;
+   IRTemp exp_mask, frac_mask, zero;
+
+   setup_value_check_args( size, &exp_mask, &frac_mask, &zero );
+
+   /* check exponent is all zeros, i.e. (exp AND exp_mask) = zero */
+   zero_exp = exponent_compare( size, src, exp_mask, mkexpr( zero ) );
+
+   /* check fractional part is not zero */
+   not_zero_frac = unop( Iop_Not1,
+                         fractional_part_compare( size, src, frac_mask,
+                                                  mkexpr( zero ) ) );
+
+   return  mkAND1( zero_exp, not_zero_frac );
+}
+
+/* A normalized number has an exponent between 1 and max_exp - 1; in other
+   words, the exponent is not zero and not equal to the max exponent value. */
+static IRExpr * is_Norm( IRType size, IRTemp src )
+{
+   IRExpr *not_zero_exp, *not_max_exp;
+   IRTemp exp_mask, zero;
+
+   vassert( ( size == Ity_I16 ) || ( size == Ity_I32 )
+            || ( size == Ity_I64 ) || ( size == Ity_V128 ) );
+
+   if( size == Ity_I16 ) {
+      /* The 16-bit floating point value is in the lower 16-bits of
+         the 32-bit input value */
+      exp_mask = newTemp( Ity_I32 );
+      zero = newTemp( Ity_I32 );
+      assign( exp_mask, mkU32( I16_EXP_MASK ) );
+      assign( zero,  mkU32( 0 ) );
+
+   } else if( size == Ity_I32 ) {
+      exp_mask = newTemp( Ity_I32 );
+      zero = newTemp( Ity_I32 );
+      assign( exp_mask, mkU32( I32_EXP_MASK ) );
+      assign( zero, mkU32( 0 ) );
+
+   } else if( size == Ity_I64 ) {
+      exp_mask = newTemp( Ity_I64 );
+      zero = newTemp( Ity_I64 );
+      assign( exp_mask, mkU64( I64_EXP_MASK ) );
+      assign( zero, mkU64( 0 ) );
+
+   } else {
+      /* V128 is converted to upper and lower 64 bit values, */
+      /* uses 64-bit operators and temps */
+      exp_mask = newTemp( Ity_I64 );
+      zero = newTemp( Ity_I64 );
+      assign( exp_mask, mkU64( V128_EXP_MASK ) );
+      assign( zero, mkU64( 0 ) );
+   }
+
+   not_zero_exp = unop( Iop_Not1,
+                        exponent_compare( size, src,
+                                          exp_mask, mkexpr( zero ) ) );
+   not_max_exp = unop( Iop_Not1,
+                       exponent_compare( size, src,
+                                         exp_mask, mkexpr( exp_mask ) ) );
+
+   return  mkAND1( not_zero_exp, not_max_exp );
+}
+
+
+static IRExpr * create_FPCC( IRTemp NaN,   IRTemp inf,
+                             IRTemp zero,  IRTemp norm,
+                             IRTemp dnorm, IRTemp pos,
+                             IRTemp neg ) {
+   IRExpr *bit0, *bit1, *bit2, *bit3;
+
+   /* If the result is NaN then bits 1, 2 and 3 must be forced to zero
+    * to get the correct result.
+    */
+   bit0 = unop( Iop_1Uto32, mkOR1( mkexpr( NaN ), mkexpr( inf ) ) );
+   bit1 = unop( Iop_1Uto32, mkAND1( mkNOT1( mkexpr( NaN ) ), mkexpr( zero ) ) );
+   bit2 = unop( Iop_1Uto32,
+                mkAND1( mkNOT1( mkexpr( NaN ) ),
+                        mkAND1( mkOR1( mkOR1( mkAND1( mkexpr( pos ),
+                                                      mkexpr( dnorm ) ),
+                                              mkAND1( mkexpr( pos ),
+                                                      mkexpr( norm ) ) ),
+                                       mkAND1( mkexpr( pos ),
+                                               mkexpr( inf ) ) ),
+                                mkAND1( mkNOT1 ( mkexpr( zero ) ),
+                                        mkNOT1( mkexpr( NaN ) ) ) ) ) );
+   bit3 = unop( Iop_1Uto32,
+                mkAND1( mkNOT1( mkexpr( NaN ) ),
+                        mkAND1( mkOR1( mkOR1( mkAND1( mkexpr( neg ),
+                                                      mkexpr( dnorm ) ),
+                                              mkAND1( mkexpr( neg ),
+                                                      mkexpr( norm ) ) ),
+                                       mkAND1( mkexpr( neg ),
+                                               mkexpr( inf ) ) ),
+                                mkAND1( mkNOT1 ( mkexpr( zero ) ),
+                                        mkNOT1( mkexpr( NaN ) ) ) ) ) );
+
+   return binop( Iop_Or32,
+                 binop( Iop_Or32,
+                        bit0,
+                        binop( Iop_Shl32, bit1, mkU8( 1 ) ) ),
+                 binop( Iop_Or32,
+                        binop( Iop_Shl32, bit2, mkU8( 2 ) ),
+                        binop( Iop_Shl32, bit3, mkU8( 3 ) ) ) );
+}
+
+static IRExpr * create_C( IRTemp NaN,   IRTemp zero,
+                          IRTemp dnorm, IRTemp pos,
+                          IRTemp neg )
+{
+
+   return unop( Iop_1Uto32,
+                mkOR1( mkOR1( mkexpr( NaN ),
+                              mkAND1( mkexpr( neg ), mkexpr( dnorm ) ) ),
+                       mkOR1( mkAND1( mkexpr( neg ), mkexpr( zero ) ),
+                              mkAND1( mkexpr( pos ), mkexpr( dnorm ) ) ) ) );
+}
+
+static void generate_store_FPRF( IRType size, IRTemp src )
+{
+   IRExpr *FPCC, *C;
+   IRTemp NaN = newTemp( Ity_I1 ), inf = newTemp( Ity_I1 );
+   IRTemp dnorm = newTemp( Ity_I1 ), norm = newTemp( Ity_I1 );
+   IRTemp pos = newTemp( Ity_I1 ),  neg = newTemp( Ity_I1 );
+   IRTemp zero = newTemp( Ity_I1 );
+
+   IRTemp sign_bit = newTemp( Ity_I1 );
+   IRTemp value;
+
+   vassert( ( size == Ity_I16 ) || ( size == Ity_I32 )
+            || ( size == Ity_I64 ) || ( size == Ity_F128 ) );
+
+   vassert( ( typeOfIRExpr(irsb->tyenv, mkexpr( src ) )   == Ity_I32 )
+            || ( typeOfIRExpr(irsb->tyenv, mkexpr( src ) ) == Ity_I64 )
+            || ( typeOfIRExpr(irsb->tyenv, mkexpr( src ) ) == Ity_F128 ) );
+
+   if( size == Ity_I16 ) {
+      /* The 16-bit floating point value is in the lower 16-bits of
+         the 32-bit input value */
+      value = newTemp( Ity_I32 );
+      assign( value, mkexpr( src ) );
+      assign( sign_bit,
+              unop ( Iop_32to1,
+                     binop( Iop_And32,
+                            binop( Iop_Shr32, mkexpr( value ), mkU8( 15 ) ),
+                            mkU32( 0x1 ) ) ) );
+
+   } else if( size == Ity_I32 ) {
+      value = newTemp( size );
+      assign( value, mkexpr( src ) );
+      assign( sign_bit,
+              unop ( Iop_32to1,
+                     binop( Iop_And32,
+                            binop( Iop_Shr32, mkexpr( value ), mkU8( 31 ) ),
+                            mkU32( 0x1 ) ) ) );
+
+   } else if( size == Ity_I64 ) {
+      value = newTemp( size );
+      assign( value, mkexpr( src ) );
+      assign( sign_bit,
+              unop ( Iop_64to1,
+                     binop( Iop_And64,
+                            binop( Iop_Shr64, mkexpr( value ), mkU8( 63 ) ),
+                            mkU64( 0x1 ) ) ) );
+
+   } else {
+      /* Move the F128 bit pattern to an integer V128 bit pattern */
+      value = newTemp( Ity_V128 );
+      assign( value,
+              binop( Iop_64HLtoV128,
+                     unop( Iop_ReinterpF64asI64,
+                           unop( Iop_F128HItoF64, mkexpr( src ) ) ),
+                     unop( Iop_ReinterpF64asI64,
+                           unop( Iop_F128LOtoF64, mkexpr( src ) ) ) ) );
+
+      size = Ity_V128;
+      assign( sign_bit,
+              unop ( Iop_64to1,
+                     binop( Iop_And64,
+                            binop( Iop_Shr64,
+                                   unop( Iop_V128HIto64, mkexpr( value ) ),
+                                   mkU8( 63 ) ),
+                            mkU64( 0x1 ) ) ) );
+   }
+
+   /* Calculate the floating point result field FPRF */
+   assign( NaN, is_NaN( size, value ) );
+   assign( inf, is_Inf( size, value ) );
+   assign( zero, is_Zero( size, value ) );
+   assign( norm, is_Norm( size, value ) );
+   assign( dnorm, is_Denorm( size, value ) );
+   assign( pos, mkAND1( mkNOT1( mkexpr( sign_bit ) ), mkU1( 1 ) ) );
+   assign( neg, mkAND1( mkexpr( sign_bit ), mkU1( 1 ) ) );
+
+   /* create the FPRF bit field
+    *
+    *   FPRF field[4:0]   type of value
+    *      10001           QNaN
+    *      01001           - infinity
+    *      01000           - Normalized
+    *      11000           - Denormalized
+    *      10010           - zero
+    *      00010           + zero
+    *      10100           + Denormalized
+    *      00100           + Normalized
+    *      00101           + infinity
+    */
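+   /* For example, +infinity has NaN = 0, inf = 1, pos = 1, so create_FPCC
+    * sets bit0 (NaN or inf) and bit2 (positive, non-zero, non-NaN), giving
+    * FPCC = 0b0101, and create_C gives C = 0, matching the
+    * "00101  + infinity" entry above.
+    */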
+   FPCC = create_FPCC( NaN, inf, zero, norm, dnorm, pos, neg );
+   C = create_C( NaN, zero, dnorm, pos, neg );
+
+   /* Write the C and FPCC fields of the FPRF field */
+   putC( C );
+   putFPCC( FPCC );
 }
 
 /* This function takes an Ity_I32 input argument interpreted
- * as a single-precision floating point value. If src is a
- * SNaN, it is changed to a QNaN and returned; otherwise,
- * the original value is returned.
- */
+   as a single-precision floating point value. If src is a
+   SNaN, it is changed to a QNaN and returned; otherwise,
+   the original value is returned. */
 static IRExpr * handle_SNaN_to_QNaN_32(IRExpr * src)
 {
 #define SNAN_MASK32 0x00400000
@@ -3319,7 +3942,7 @@
 
    /* check if input is SNaN, if it is convert to QNaN */
    assign( is_SNAN,
-           mkAND1( is_NaN_32( tmp ),
+           mkAND1( is_NaN( Ity_I32, tmp ),
                    binop( Iop_CmpEQ32,
                           binop( Iop_And32, mkexpr( tmp ),
                                  mkU32( SNAN_MASK32 ) ),
@@ -3359,7 +3982,8 @@
               binop( Iop_CmpEQ32,
                      binop( Iop_Xor32,
                             mkexpr( signbit_32 ),
-                            unop( Iop_1Uto32, is_NaN( intermediateResult ) ) ),
+                            unop( Iop_1Uto32, is_NaN( Ity_I64,
+                                                      intermediateResult ) ) ),
                      mkU32( 1 ) ) ) );
 
    assign( negatedResult,
@@ -3402,7 +4026,8 @@
               binop( Iop_CmpEQ32,
                      binop( Iop_Xor32,
                             mkexpr( signbit_32 ),
-                            unop( Iop_1Uto32, is_NaN_32( intermediateResult ) ) ),
+                            unop( Iop_1Uto32, is_NaN( Ity_I32,
+                                                      intermediateResult ) ) ),
                      mkU32( 1 ) ) ) );
 
    assign( negatedResult,
@@ -3417,6 +4042,641 @@
    return negatedResult;
 }
 
+/* This function takes two quad-precision floating point numbers of type
+   V128 and returns 1 if src_A > src_B, 0 otherwise. */
+static IRExpr * Quad_precision_gt ( IRTemp src_A, IRTemp  src_B )
+{
+#define FRAC_MASK64Hi 0x0000ffffffffffffULL
+#define MASK 0x7FFFFFFFFFFFFFFFULL    /* exclude sign bit in upper 64 bits */
+#define EXP_MASK 0x7fff
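+/* FRAC_MASK64Hi covers the 48 fraction bits held in the upper 64-bit half
+   of a quad-precision value; the remaining 64 fraction bits occupy the
+   entire lower half. */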
+
+   IRType ty = Ity_I64;
+   IRTemp sign_A = newTemp( ty );
+   IRTemp sign_B = newTemp( ty );
+   IRTemp exp_A = newTemp( ty );
+   IRTemp exp_B = newTemp( ty );
+   IRTemp frac_A_hi = newTemp( ty );
+   IRTemp frac_B_hi = newTemp( ty );
+   IRTemp frac_A_lo = newTemp( ty );
+   IRTemp frac_B_lo = newTemp( ty );
+
+
+   /* extract exponents, and fractional parts so they can be compared */
+   assign( sign_A, binop( Iop_Shr64,
+                          unop( Iop_V128HIto64, mkexpr( src_A ) ),
+                          mkU8( 63 ) ) );
+   assign( sign_B, binop( Iop_Shr64,
+                          unop( Iop_V128HIto64, mkexpr( src_B ) ),
+                          mkU8( 63 ) ) );
+   assign( exp_A, binop( Iop_And64,
+                         binop( Iop_Shr64,
+                                unop( Iop_V128HIto64, mkexpr( src_A ) ),
+                                mkU8( 48 ) ),
+                         mkU64( EXP_MASK ) ) );
+   assign( exp_B, binop( Iop_And64,
+                         binop( Iop_Shr64,
+                                unop( Iop_V128HIto64, mkexpr( src_B ) ),
+                                mkU8( 48 ) ),
+                         mkU64( EXP_MASK ) ) );
+   assign( frac_A_hi, binop( Iop_And64,
+                             unop( Iop_V128HIto64, mkexpr( src_A ) ),
+                             mkU64( FRAC_MASK64Hi ) ) );
+   assign( frac_B_hi, binop( Iop_And64,
+                             unop( Iop_V128HIto64, mkexpr( src_B ) ),
+                             mkU64( FRAC_MASK64Hi ) ) );
+   assign( frac_A_lo, unop( Iop_V128to64, mkexpr( src_A ) ) );
+   assign( frac_B_lo, unop( Iop_V128to64, mkexpr( src_B ) ) );
+
+   IRExpr * A_zero =  mkAND1( binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128HIto64,
+                                                  mkexpr( src_A ) ),
+                                            mkU64( MASK ) ),
+                                     mkU64( 0 ) ),
+                              binop( Iop_CmpEQ64,
+                                     unop( Iop_V128to64, mkexpr( src_A ) ),
+                                     mkU64( 0 ) ) );
+   IRExpr * B_zero =  mkAND1( binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128HIto64,
+                                                  mkexpr( src_B ) ),
+                                            mkU64( MASK ) ),
+                                     mkU64( 0 ) ),
+                              binop( Iop_CmpEQ64,
+                                     unop( Iop_V128to64, mkexpr( src_B ) ),
+                                     mkU64( 0 ) ) );
+   IRExpr * A_B_zero = mkAND1( A_zero, B_zero );
+
+   /* Compare numbers */
+   IRExpr * both_pos = mkAND1( binop( Iop_CmpEQ64, mkexpr( sign_A ),
+                                      mkU64( 0 ) ),
+                               binop( Iop_CmpEQ64, mkexpr( sign_B ),
+                                      mkU64( 0 ) ) );
+   IRExpr * both_neg = mkAND1( binop( Iop_CmpEQ64, mkexpr( sign_A ),
+                                      mkU64( 1 ) ),
+                               binop( Iop_CmpEQ64, mkexpr( sign_B ),
+                                      mkU64( 1 ) ) );
+   IRExpr * sign_eq = binop( Iop_CmpEQ64, mkexpr( sign_A ), mkexpr( sign_B ) );
+   IRExpr * sign_gt = binop( Iop_CmpLT64U, mkexpr( sign_A ),
+                             mkexpr( sign_B ) ); /* A pos, B neg */
+
+   IRExpr * exp_eq = binop( Iop_CmpEQ64, mkexpr( exp_A ), mkexpr( exp_B ) );
+   IRExpr * exp_gt = binop( Iop_CmpLT64U, mkexpr( exp_B ), mkexpr( exp_A ) );
+   IRExpr * exp_lt = binop( Iop_CmpLT64U, mkexpr( exp_A ), mkexpr( exp_B ) );
+
+   IRExpr * frac_hi_eq = binop( Iop_CmpEQ64, mkexpr( frac_A_hi),
+                                mkexpr( frac_B_hi ) );
+   IRExpr * frac_hi_gt = binop( Iop_CmpLT64U, mkexpr( frac_B_hi ),
+                                mkexpr( frac_A_hi ) );
+   IRExpr * frac_hi_lt = binop( Iop_CmpLT64U, mkexpr( frac_A_hi ),
+                                mkexpr( frac_B_hi ) );
+
+   IRExpr * frac_lo_gt = binop( Iop_CmpLT64U, mkexpr( frac_B_lo ),
+                                mkexpr( frac_A_lo ) );
+   IRExpr * frac_lo_lt = binop( Iop_CmpLT64U, mkexpr( frac_A_lo ),
+                                mkexpr( frac_B_lo ) );
+
+   /* src_A and src_B both positive */
+   IRExpr *pos_cmp = mkOR1( exp_gt,
+                            mkAND1( exp_eq,
+                                    mkOR1( frac_hi_gt,
+                                           mkAND1( frac_hi_eq, frac_lo_gt ) )
+                                    ) );
+
+   /* src_A and src_B both negative */
+   IRExpr *neg_cmp = mkOR1( exp_lt,
+                            mkAND1( exp_eq,
+                                    mkOR1( frac_hi_lt,
+                                           mkAND1( frac_hi_eq, frac_lo_lt ) )
+                                    ) );
+
+   /* Need to check the case where one value is a positive
+    * zero and the other value is a negative zero
+    */
+   return mkAND1( mkNOT1( A_B_zero ),
+                  mkOR1( sign_gt,
+                         mkAND1( sign_eq,
+                                 mkOR1( mkAND1( both_pos, pos_cmp ),
+                                        mkAND1( both_neg, neg_cmp ) ) ) ) );
+}
+
+/*-----------------------------------------------------------
+ * Helpers for VX instructions that work on National decimal values,
+ * Zoned decimal values and BCD values.
+ *
+ *------------------------------------------------------------*/
+static IRExpr * is_National_decimal (IRTemp src)
+{
+   /* The src is a 128-bit value containing a sign code in half word 7
+    * and seven digits in halfwords 0 to 6 (IBM numbering).  A valid
+    * national decimal value has the following:
+    *   - the sign code must be 0x002B (positive) or 0x002D (negative)
+    *   - the digits must be in the range 0x0030 to 0x0039
+    */
+   Int i;
+   IRExpr * valid_pos_sign;
+   IRExpr * valid_neg_sign;
+   IRTemp valid_num[8];
+   IRTemp digit[7];
+
+   valid_pos_sign = binop( Iop_CmpEQ64,
+                           binop( Iop_And64,
+                                  mkU64( 0xFFFF ),
+                                  unop( Iop_V128to64, mkexpr( src ) ) ),
+                           mkU64( 0x002B ) );
+
+   valid_neg_sign = binop( Iop_CmpEQ64,
+                           binop( Iop_And64,
+                                  mkU64( 0xFFFF ),
+                                  unop( Iop_V128to64, mkexpr( src ) ) ),
+                           mkU64( 0x002D ) );
+
+   valid_num[0] = newTemp( Ity_I1 );
+   digit[0] = newTemp( Ity_I64 );
+   assign( valid_num[0], mkU1( 1 ) );   // Assume true to start
+
+   for(i = 0; i < 7; i++) {
+      valid_num[i+1] = newTemp( Ity_I1 );
+      digit[i] = newTemp( Ity_I64 );
+      assign( digit[i], binop( Iop_And64,
+                               unop( Iop_V128to64,
+                                     binop( Iop_ShrV128,
+                                            mkexpr( src ),
+                                            mkU8( (7-i)*16 ) ) ),
+                               mkU64( 0xFFFF ) ) );
+
+      assign( valid_num[i+1],
+              mkAND1( mkexpr( valid_num[i] ),
+                      mkAND1( binop( Iop_CmpLE64U,
+                                     mkexpr( digit[i] ),
+                                     mkU64( 0x39 ) ),
+                              binop( Iop_CmpLE64U,
+                                     mkU64( 0x30 ),
+                                     mkexpr( digit[i] ) ) ) ) );
+   }
+
+   return mkAND1( mkOR1( valid_pos_sign, valid_neg_sign),
+                  mkexpr( valid_num[7] ) );
+}
+
+static IRExpr * is_Zoned_decimal (IRTemp src, UChar ps)
+{
+   /* The src is a 128-bit value containing a sign code in the least
+    * significant two bytes.  The upper pairs of bytes contain digits.  A valid
+    * Zoned decimal value has the following:
+    *   - the sign code must be in the range 0x0X to 0xFX inclusive (X - don't care)
+    *   - bits [0:3] of each digit must be equal to 0x3
+    *   - bits [4:7] of each digit must be between 0x0 and 0x9
+    *
+    *  If ps = 0
+    *     Positive sign codes are: 0x0, 0x1, 0x2, 0x3, 0x8, 0x9, 0xA, 0xB
+    *       (note 0bX0XX XXXX  is positive)
+    *
+    *     Negative sign codes are 0x4, 0x5, 0x6, 0x7, 0xC, 0xD, 0xE, 0xF
+    *       (note 0bX1XX XXXX  is negative)
+    *
+    *  If ps = 1, then the sign code must be in the range 0xA to 0xF
+    *     Positive sign codes are: 0xA, 0xC, 0xE, 0xF
+    *
+    *     Negative sign codes are 0xB, 0xD
+    */
+   Int i, mask_hi, mask_lo;
+   IRExpr *valid_range;
+   IRTemp valid_num[16];
+   IRTemp digit[15];
+
+   /* check the range of the sign value based on the value of ps */
+   valid_range = mkOR1(
+                       mkAND1( binop( Iop_CmpEQ64,
+                                      mkU64( 1 ),
+                                      mkU64( ps ) ),
+                               mkAND1( binop( Iop_CmpLE64U,
+                                              binop( Iop_And64,
+                                                     mkU64( 0xF0 ),
+                                                     unop( Iop_V128to64,
+                                                           mkexpr( src ) ) ),
+
+                                              mkU64( 0xF0 ) ),
+                                       binop( Iop_CmpLE64U,
+                                              mkU64( 0xA0 ),
+                                              binop( Iop_And64,
+                                                     mkU64( 0xF0 ),
+                                                     unop( Iop_V128to64,
+                                                           mkexpr( src ) ))))),
+                       binop( Iop_CmpEQ64,
+                              mkU64( 0 ),
+                              mkU64( ps ) ) );
+
+   valid_num[0] = newTemp( Ity_I1 );
+   assign( valid_num[0], mkU1( 1) );   // Assume true to start
+
+   if (ps == 0) {
+      mask_hi = 0x39;
+      mask_lo = 0x30;
+   } else {
+      mask_hi = 0xF9;
+      mask_lo = 0xF0;
+   }
+
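+   /* With ps = 0 each digit byte must be 0x30..0x39 (ASCII '0'..'9');
+    * with ps = 1 the zone nibble must be 0xF, so the valid range is
+    * 0xF0..0xF9.
+    */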
+   for(i = 0; i < 15; i++) {
+      valid_num[i+1] = newTemp( Ity_I1 );
+      digit[i] = newTemp( Ity_I64 );
+      assign( digit[i], binop( Iop_And64,
+                               unop( Iop_V128to64,
+                                     binop( Iop_ShrV128,
+                                            mkexpr( src ),
+                                            mkU8( (15-i)*8 ) ) ),
+                               mkU64( 0xFF ) ) );
+
+      assign( valid_num[i+1],
+              mkAND1( mkexpr( valid_num[i] ),
+                      mkAND1( binop( Iop_CmpLE64U,
+                                     mkexpr( digit[i] ),
+                                     mkU64( mask_hi ) ),
+                              binop( Iop_CmpLE64U,
+                                     mkU64( mask_lo ),
+                                     mkexpr( digit[i] ) ) ) ) );
+   }
+
+   return mkAND1( valid_range, mkexpr( valid_num[15] ) );
+}
+
+static IRExpr * CmpGT128U ( IRExpr *src1, IRExpr *src2 )
+{
+   /* Unsigned compare of two 128-bit values */
+   IRExpr *pos_upper_gt, *pos_upper_eq, *pos_lower_gt;
+
+   pos_upper_gt = binop( Iop_CmpLT64U,
+                         unop( Iop_V128HIto64, src2 ),
+                         unop( Iop_V128HIto64, src1 ) );
+   pos_upper_eq = binop( Iop_CmpEQ64,
+                         unop( Iop_V128HIto64, src1 ),
+                         unop( Iop_V128HIto64, src2 ) );
+   pos_lower_gt = binop( Iop_CmpLT64U,
+                         unop( Iop_V128to64, src2),
+                         unop( Iop_V128to64, src1) );
+   return mkOR1( pos_upper_gt,
+                 mkAND1( pos_upper_eq,
+                         pos_lower_gt ) );
+}
+
+
+static IRExpr * is_BCDstring128 ( const VexAbiInfo* vbi,
+                                  UInt Signed, IRExpr *src )
+{
+
+   IRTemp valid = newTemp( Ity_I64 );
+
+   /* The src is a 128-bit value containing MAX_DIGITS BCD digits and
+    * a sign.  The upper digits are BCD values between 0x0 and 0x9.  The sign
+    * byte is the least significant byte.  This function returns a 64-bit
+    * value of 1 if the sign and digits are valid, 0 otherwise.
+    *
+    * This function was originally written using IR code.  It has been
+    * replaced with a clean helper due to the large amount of IR code
+    * needed by this function.
+    */
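+   /* The 128-bit source is passed to the helper as two 64-bit halves, as a
+    * clean helper cannot take a 128-bit argument directly.
+    */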
+   assign( valid,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "is_BCDstring128_helper",
+                          fnptr_to_fnentry( vbi, &is_BCDstring128_helper ),
+                          mkIRExprVec_3( mkU64( Signed ),
+                                         unop( Iop_V128HIto64, src ),
+                                         unop( Iop_V128to64, src ) ) ) );
+   return mkexpr( valid );
+}
+
+static IRExpr * BCDstring_zero (IRExpr *src)
+{
+   /* The src is a 128-bit value containing a BCD string.  The function
+    * returns a 1 if the BCD string values are all zero, 0 otherwise.
+    */
+   IRTemp tsrc = newTemp( Ity_V128 );
+   assign( tsrc, src);
+
+   if ( mode64 ) {
+      return mkAND1( binop( Iop_CmpEQ64,
+                            mkU64( 0 ),
+                            unop( Iop_V128HIto64,
+                                  mkexpr( tsrc ) ) ),
+                     binop( Iop_CmpEQ64,
+                            mkU64( 0 ),
+                            unop( Iop_V128to64,
+                                  mkexpr( tsrc ) ) ) );
+   } else {
+      /* make this work in 32-bit mode */
+      return mkAND1(
+                    mkAND1( binop( Iop_CmpEQ32,
+                                   mkU32( 0 ),
+                                   unop( Iop_64HIto32,
+                                         unop( Iop_V128HIto64,
+                                               mkexpr( tsrc ) ) ) ),
+                            binop( Iop_CmpEQ32,
+                                   mkU32( 0 ),
+                                   unop( Iop_64to32,
+                                         unop( Iop_V128HIto64,
+                                               mkexpr( tsrc ) ) ) ) ),
+                    mkAND1( binop( Iop_CmpEQ32,
+                                   mkU32( 0 ),
+                                   unop( Iop_64HIto32,
+                                         unop( Iop_V128to64,
+                                               mkexpr( tsrc ) ) ) ),
+                            binop( Iop_CmpEQ32,
+                                   mkU32( 0 ),
+                                   unop( Iop_64to32,
+                                         unop( Iop_V128to64,
+                                               mkexpr( tsrc ) ) ) ) ) );
+   }
+}
+
+static IRExpr * check_BCD_round (IRExpr *src, IRTemp shift)
+{
+   /* The src is a 128-bit value containing 31 BCD digits with the sign in
+    * the least significant byte. The bytes are BCD values between 0x0 and 0x9.
+    * This routine checks the BCD digit in position shift (counting from
+    * the least significant digit).  If the digit is greater than five,
+    * a 1 is returned indicating the string needs to be rounded up,
+    * otherwise, 0 is returned.  The value of shift (I64) is the index of
+    * the BCD digit times four bits.
+    */
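+   /* e.g. a shift value of 8 selects bits [11:8] of the 128-bit string as
+    * the digit to test.
+    */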
+   return  binop( Iop_CmpLE64U,
+                  mkU64( 6 ),
+                  binop( Iop_And64,
+                         unop( Iop_V128to64,
+                               binop( Iop_ShrV128,
+                                      src,
+                                      unop( Iop_64to8, mkexpr( shift ) ) ) ),
+                         mkU64( 0xF ) ) );
+}
+
+static IRTemp increment_BCDstring ( const VexAbiInfo* vbi,
+                                    IRExpr *src, IRExpr *carry_in )
+{
+   /* The src is a 128-bit value containing 31 BCD digits with the sign in
+    * the least significant byte. The bytes are BCD values between 0x0 and 0x9.
+    * This function returns the BCD string incremented by 1.
+    *
+    * Call a clean helper to do the computation as it requires a lot of
+    * IR code to do this.
+    *
+    * The helper function takes a 32-bit BCD string, in a 64-bit value, and
+    * increments the string by the 32-bit carry-in value.
+    *
+    * The incremented value is returned in the lower 32-bits of the result.
+    * The carry out is returned in bits [35:32] of the result.  The
+    * helper function will be called for each of the four 32-bit strings
+    * that make up the src string passing the returned carry out to the
+    * next call.
+    */
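+   /* Chunk layout: bcd_string0 = bits [31:0], bcd_string1 = bits [63:32],
+    * bcd_string2 = bits [95:64], bcd_string3 = bits [127:96]; the carry
+    * propagates from bcd_result0 up to bcd_result3.
+    */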
+   IRTemp bcd_result  = newTemp( Ity_V128 );
+   IRTemp bcd_result0 = newTemp( Ity_I64 );
+   IRTemp bcd_result1 = newTemp( Ity_I64 );
+   IRTemp bcd_result2 = newTemp( Ity_I64 );
+   IRTemp bcd_result3 = newTemp( Ity_I64 );
+   IRExpr *bcd_string0, *bcd_string1, *bcd_string2, *bcd_string3;
+
+   bcd_string0 = binop( Iop_And64,
+                        mkU64( 0xFFFFFFFF ), unop( Iop_V128to64, src ) );
+   bcd_string1 = binop( Iop_Shr64, unop( Iop_V128to64, src ), mkU8( 32 ) );
+   bcd_string2 = binop( Iop_And64,
+                        mkU64( 0xFFFFFFFF ), unop( Iop_V128HIto64, src ) );
+   bcd_string3 = binop( Iop_Shr64, unop( Iop_V128HIto64, src ), mkU8( 32 ) );
+
+   assign( bcd_result0,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "increment_BCDstring32_helper",
+                          fnptr_to_fnentry( vbi,
+                                            &increment_BCDstring32_helper ),
+                          mkIRExprVec_3( mkU64( True /*Signed*/ ),
+                                         bcd_string0,
+                                         binop( Iop_32HLto64, mkU32( 0 ),
+                                                carry_in ) ) ) );
+
+   assign( bcd_result1,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "increment_BCDstring32_helper",
+                          fnptr_to_fnentry( vbi,
+                                            &increment_BCDstring32_helper ),
+                          mkIRExprVec_3( mkU64( False /*Unsigned*/ ),
+                                         bcd_string1,
+                                         binop( Iop_Shr64,
+                                                mkexpr( bcd_result0 ),
+                                                mkU8( 32 ) ) ) ) );
+   assign( bcd_result2,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "increment_BCDstring32_helper",
+                          fnptr_to_fnentry( vbi,
+                                            &increment_BCDstring32_helper ),
+                          mkIRExprVec_3( mkU64( False /*Unsigned*/ ),
+                                         bcd_string2,
+                                         binop( Iop_Shr64,
+                                                mkexpr( bcd_result1 ),
+                                                mkU8( 32 ) ) ) ) );
+   assign( bcd_result3,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "increment_BCDstring32_helper",
+                          fnptr_to_fnentry( vbi,
+                                            &increment_BCDstring32_helper ),
+                          mkIRExprVec_3( mkU64( False /*Unsigned*/ ),
+                                         bcd_string3,
+                                         binop( Iop_Shr64,
+                                                mkexpr( bcd_result2 ),
+                                                mkU8( 32 ) ) ) ) );
+
+   /* Put the 128-bit result together from the intermediate results. Remember
+    * to mask out the carry out from the upper 32 bits of the results.
+    */
+   assign( bcd_result,
+           binop( Iop_64HLtoV128,
+                  binop( Iop_Or64,
+                         binop( Iop_And64,
+                                mkU64( 0xFFFFFFFF ), mkexpr (bcd_result2 ) ),
+                         binop( Iop_Shl64,
+                                mkexpr (bcd_result3 ), mkU8( 32 ) ) ),
+                  binop( Iop_Or64,
+                         binop( Iop_And64,
+                                mkU64( 0xFFFFFFFF ), mkexpr (bcd_result0 ) ),
+                         binop( Iop_Shl64,
+                                mkexpr (bcd_result1 ), mkU8( 32 ) ) ) ) );
+   return bcd_result;
+}
+
+static IRExpr * convert_to_zoned ( const VexAbiInfo* vbi,
+                                   IRExpr *src, IRExpr *upper_byte )
+{
+   /* The function takes a V128 packed decimal value and returns
+    * the value in zoned format.  Note, the sign of the value is ignored.
+    */
+   IRTemp result_low = newTemp( Ity_I64 );
+   IRTemp result_hi  = newTemp( Ity_I64 );
+   IRTemp result     = newTemp( Ity_V128 );
+
+   /* Since we can only return 64-bits from a clean helper, we will
+    * have to get the lower and upper 64-bits separately.
+    */
+
+   assign( result_low,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "convert_to_zoned_helper",
+                          fnptr_to_fnentry( vbi, &convert_to_zoned_helper ),
+                          mkIRExprVec_4( unop( Iop_V128HIto64, src ),
+                                         unop( Iop_V128to64, src ),
+                                         upper_byte,
+                                         mkU64( 0 ) ) ) );
+
+   assign( result_hi,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "convert_to_zoned_helper",
+                          fnptr_to_fnentry( vbi, &convert_to_zoned_helper ),
+                          mkIRExprVec_4( unop( Iop_V128HIto64, src ),
+                                         unop( Iop_V128to64, src ),
+                                         upper_byte,
+                                         mkU64( 1 ) ) ) );
+
+
+   assign( result,
+           binop( Iop_64HLtoV128, mkexpr( result_hi ), mkexpr( result_low ) ) );
+
+   return mkexpr( result );
+}
+
+static IRExpr * convert_to_national ( const VexAbiInfo* vbi,  IRExpr *src ) {
+   /* The function takes 128-bit value which has a 64-bit packed decimal
+    * value in the lower 64-bits of the source.  The packed decimal is
+    * converted to the national format via a clean helper.  The clean
+    * helper is used due to the large amount of IR code needed to do the
+    * conversion.  The helper returns the upper 64-bits of the 128-bit
+    * result if return_upper != 0.  Otherwise, the lower 64-bits of the
+    * result is returned.
+    */
+   IRTemp result_low = newTemp( Ity_I64 );
+   IRTemp result_hi  = newTemp( Ity_I64 );
+   IRTemp result     = newTemp( Ity_V128 );
+
+   /* Since we can only return 64-bits from a clean helper, we will
+    * have to get the lower and upper 64-bits separately.
+    */
+
+   assign( result_low,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "convert_to_national_helper",
+                          fnptr_to_fnentry( vbi, &convert_to_national_helper ),
+                          mkIRExprVec_2( unop( Iop_V128to64, src ),
+                                         mkU64( 0 ) ) ) );
+
+   assign( result_hi,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "convert_to_national_helper",
+                          fnptr_to_fnentry( vbi, &convert_to_national_helper ),
+                          mkIRExprVec_2( unop( Iop_V128to64, src ),
+                                         mkU64( 1 ) ) ) );
+
+   assign( result,
+           binop( Iop_64HLtoV128, mkexpr( result_hi ), mkexpr( result_low ) ) );
+
+   return mkexpr( result );
+}
+
+static IRExpr * convert_from_zoned ( const VexAbiInfo* vbi, IRExpr *src ) {
+   /* The function takes 128-bit zoned value and returns a signless 64-bit
+    * packed decimal value in the lower 64-bits of the 128-bit result.
+    */
+   IRTemp result = newTemp( Ity_V128 );
+
+   assign( result,
+           binop( Iop_ShlV128,
+                  binop( Iop_64HLtoV128,
+                         mkU64( 0 ),
+                         mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                                        "convert_from_zoned_helper",
+                                        fnptr_to_fnentry( vbi,
+                                                          &convert_from_zoned_helper ),
+                                        mkIRExprVec_2( unop( Iop_V128HIto64,
+                                                             src ),
+                                                       unop( Iop_V128to64,
+                                                             src ) ) ) ),
+                  mkU8( 4 ) ) );
+
+   return mkexpr( result );
+}
+
+static IRExpr * convert_from_national ( const VexAbiInfo* vbi, IRExpr *src ) {
+   /* The function takes 128-bit national value and returns a 64-bit
+    * packed decimal value.
+    */
+   IRTemp result = newTemp( Ity_I64);
+
+   assign( result,
+           mkIRExprCCall( Ity_I64, 0 /*regparms*/,
+                          "convert_from_national_helper",
+                          fnptr_to_fnentry( vbi,
+                                            &convert_from_national_helper ),
+                          mkIRExprVec_2( unop( Iop_V128HIto64,
+                                               src ),
+                                         unop( Iop_V128to64,
+                                               src ) ) ) );
+
+   return mkexpr( result );
+}
+
+static IRExpr * UNSIGNED_CMP_GT_V128 ( IRExpr *vA, IRExpr *vB ) {
+   /* This function does an unsigned compare of two V128 values. The
+    * function is for use in 32-bit mode only as it is expensive.  The
+    * issue is that compares (GT, LT, EQ) are not supported for operands
+    * larger than 32 bits when running in 32-bit mode.  The function returns
+    * a 1-bit expression, 1 for TRUE and 0 for FALSE.
+    */
+   IRTemp vA_word0 = newTemp( Ity_I32);
+   IRTemp vA_word1 = newTemp( Ity_I32);
+   IRTemp vA_word2 = newTemp( Ity_I32);
+   IRTemp vA_word3 = newTemp( Ity_I32);
+   IRTemp vB_word0 = newTemp( Ity_I32);
+   IRTemp vB_word1 = newTemp( Ity_I32);
+   IRTemp vB_word2 = newTemp( Ity_I32);
+   IRTemp vB_word3 = newTemp( Ity_I32);
+
+   IRTemp eq_word1 = newTemp( Ity_I1);
+   IRTemp eq_word2 = newTemp( Ity_I1);
+   IRTemp eq_word3 = newTemp( Ity_I1);
+
+
+   IRExpr *gt_word0, *gt_word1, *gt_word2, *gt_word3;
+   IRExpr *eq_word3_2, *eq_word3_2_1;
+   IRTemp result = newTemp( Ity_I1 );
+
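+   /* word0 is the least significant 32-bit word of each operand and word3
+    * the most significant; the comparison works from word3 down to word0.
+    */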
+   assign( vA_word0, unop( Iop_64to32, unop( Iop_V128to64, vA ) ) );
+   assign( vA_word1, unop( Iop_64HIto32, unop( Iop_V128to64, vA ) ) );
+   assign( vA_word2, unop( Iop_64to32, unop( Iop_V128HIto64, vA ) ) );
+   assign( vA_word3, unop( Iop_64HIto32, unop( Iop_V128HIto64, vA ) ) );
+
+   assign( vB_word0, unop( Iop_64to32, unop( Iop_V128to64, vB ) ) );
+   assign( vB_word1, unop( Iop_64HIto32, unop( Iop_V128to64, vB ) ) );
+   assign( vB_word2, unop( Iop_64to32, unop( Iop_V128HIto64, vB ) ) );
+   assign( vB_word3, unop( Iop_64HIto32, unop( Iop_V128HIto64, vB ) ) );
+
+   assign( eq_word3, binop( Iop_CmpEQ32, mkexpr( vA_word3 ),
+                            mkexpr( vB_word3 ) ) );
+   assign( eq_word2, binop( Iop_CmpEQ32, mkexpr( vA_word2 ),
+                            mkexpr( vB_word2 ) ) );
+   assign( eq_word1, binop( Iop_CmpEQ32, mkexpr( vA_word1 ),
+                            mkexpr( vB_word1 ) ) );
+
+   gt_word3 = binop( Iop_CmpLT32U, mkexpr( vB_word3 ), mkexpr( vA_word3 ) );
+   gt_word2 = binop( Iop_CmpLT32U, mkexpr( vB_word2 ), mkexpr( vA_word2 ) );
+   gt_word1 = binop( Iop_CmpLT32U, mkexpr( vB_word1 ), mkexpr( vA_word1 ) );
+   gt_word0 = binop( Iop_CmpLT32U, mkexpr( vB_word0 ), mkexpr( vA_word0 ) );
+
+   eq_word3_2   = mkAND1( mkexpr( eq_word3 ), mkexpr( eq_word2 ) );
+   eq_word3_2_1 = mkAND1( mkexpr( eq_word1 ), eq_word3_2 );
+
+   assign( result, mkOR1(
+                         mkOR1( gt_word3,
+                                mkAND1( mkexpr( eq_word3 ), gt_word2 ) ),
+                         mkOR1( mkAND1( eq_word3_2, gt_word1 ),
+                                mkAND1( eq_word3_2_1, gt_word0 ) ) ) );
+   return mkexpr( result );
+}
+
 /*------------------------------------------------------------*/
 /* Transactional memory helpers
  *
@@ -3469,6 +4729,106 @@
 /*
   Integer Arithmetic Instructions
 */
+static Bool dis_int_mult_add ( UInt theInstr )
+{
+   /* VA-Form */
+   UChar rD_addr = ifieldRegDS( theInstr );
+   UChar rA_addr = ifieldRegA( theInstr );
+   UChar rB_addr = ifieldRegB( theInstr );
+   UChar rC_addr = ifieldRegC( theInstr );
+   UInt  opc2    = IFIELD( theInstr, 0, 6 );
+   IRType ty     = Ity_I64;
+   IRTemp rA     = newTemp( ty );
+   IRTemp rB     = newTemp( ty );
+   IRTemp rC     = newTemp( ty );
+   IRTemp rD     = newTemp( ty );
+   IRTemp tmpLo  = newTemp( Ity_I64 );
+   IRTemp tmpHi  = newTemp( Ity_I64 );
+   IRTemp tmp2Hi = newTemp( Ity_I64 );
+   IRTemp result = newTemp( Ity_I128 );
+   IRTemp resultLo = newTemp( Ity_I64 );
+   IRExpr* carryout;
+
+   assign( rA, getIReg( rA_addr ) );
+   assign( rB, getIReg( rB_addr ) );
+   assign( rC, getIReg( rC_addr ) );
+
+   switch (opc2) {
+   case 0x30:  // maddhd  multiply-add High doubleword signed
+      DIP("maddhd r%u,r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr, rC_addr);
+
+      assign( result, binop( Iop_MullS64, mkexpr( rA ), mkexpr( rB ) ) );
+      assign( tmpLo, unop( Iop_128to64, mkexpr( result ) ) );
+      assign( tmpHi, unop( Iop_128HIto64, mkexpr( result ) ) );
+
+      /* Multiply rA and rB then add rC.  If the lower 64 bits of the result
+       * are less than rC and less than the lower 64 bits of rA * rB, a carry
+       * out of the lower 64 bits occurred and the upper 64 bits must be
+       * incremented by 1.  Sign extend rC and add it to the upper 64 bits to
+       * handle the negative case for rC.
+       */
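+      /* Worked example: rA = rB = 1, rC = -1 gives tmpHi:tmpLo = 0:1,
+       * resultLo = 0 with a carry out of the low 64 bits, tmp2Hi = -1,
+       * so rD = -1 + 1 = 0, the high doubleword of (1*1 + (-1)).
+       */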
+      assign( resultLo, binop( Iop_Add64, mkexpr( tmpLo ), mkexpr( rC ) ) );
+      assign( tmp2Hi, binop( Iop_Add64,
+                             mkexpr( tmpHi ),
+                             unop( Iop_1Sto64,
+                                   unop( Iop_64to1,
+                                         binop( Iop_Shr64,
+                                                mkexpr( rC ),
+                                                mkU8( 63 ) ) ) ) ) );
+
+      /* need to do the calculation for the upper 64-bit result */
+      carryout = mkAND1( binop( Iop_CmpLT64U,
+                                mkexpr( resultLo ), mkexpr( rC ) ),
+                         binop( Iop_CmpLT64U,
+                                mkexpr( resultLo ), mkexpr( tmpLo ) ) );
+      assign( rD, binop( Iop_Add64,
+                         mkexpr( tmp2Hi ),
+                         unop( Iop_1Uto64, carryout ) ) );
+      break;
+
+   case 0x31:  // maddhdu   multiply-add High doubleword unsigned
+      DIP("maddhdu r%u,r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr, rC_addr);
+
+      assign( result, binop( Iop_MullU64, mkexpr( rA ), mkexpr( rB ) ) );
+      assign( tmpLo, unop( Iop_128to64, mkexpr( result ) ) );
+      assign( tmpHi, unop( Iop_128HIto64, mkexpr( result ) ) );
+
+      /* Add rC.  If the lower 64 bits of the result are less than both rC
+       * and tmpLo, a carry out of the lower 64 bits occurred and the upper
+       * 64 bits must be incremented by 1.
+       */
+      assign( resultLo, binop( Iop_Add64, mkexpr( tmpLo ), mkexpr( rC ) ) );
+
+      /* need to do the calculation for the upper 64-bit result */
+      carryout = mkAND1( binop( Iop_CmpLT64U,
+                                mkexpr(resultLo), mkexpr( rC ) ),
+                         binop( Iop_CmpLT64U,
+                                mkexpr(resultLo), mkexpr( tmpLo ) ) );
+      assign( rD, binop( Iop_Add64,
+                         mkexpr( tmpHi ),
+                         unop( Iop_1Uto64, carryout ) ) );
+      break;
+
+   case 0x33:  // maddld   multiply-add Low doubleword
+      DIP("maddld r%u,r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr, rC_addr);
+
+      assign( result, binop( Iop_MullS64, mkexpr( rA ), mkexpr( rB ) ) );
+      assign( tmpLo, unop( Iop_128to64, mkexpr( result ) ) );
+      assign( tmpHi, unop( Iop_128HIto64, mkexpr( result ) ) );
+
+      assign( rD, binop( Iop_Add64, mkexpr( tmpLo ), mkexpr( rC ) ) );
+      break;
+
+   default:
+      vex_printf("dis_int_mult(ppc): unrecognized instruction\n");
+      return False;
+   }
+
+   putIReg( rD_addr, mkexpr(rD) );
+
+   return True;
+}
+
 static Bool dis_int_arith ( UInt theInstr )
 {
    /* D-Form, XO-Form */
@@ -4101,7 +5461,542 @@
    return True;
 }
 
+static Bool dis_modulo_int ( UInt theInstr )
+{
+   /* X-Form */
+   UChar opc1    = ifieldOPC( theInstr );
+   UInt  opc2    = ifieldOPClo10( theInstr );
+   UChar rA_addr = ifieldRegA( theInstr );
+   UChar rB_addr = ifieldRegB( theInstr );
+   UChar rD_addr = ifieldRegDS( theInstr );
+   IRType ty     = mode64 ? Ity_I64 : Ity_I32;
+   IRTemp rD     = newTemp( ty );
 
+   switch (opc1) {
+   /* X-Form */
+   case 0x1F:
+      switch (opc2) {
+      case 0x109: // modud  Modulo Unsigned Double Word
+         {
+            IRTemp rA = newTemp( Ity_I64 );
+            IRTemp rB = newTemp( Ity_I64 );
+            IRTemp quotient = newTemp( Ity_I64 );
+            IRTemp quotientTimesDivisor = newTemp( Ity_I64 );
+            IRTemp remainder = newTemp( Ity_I64 );
+            IRTemp rB_0   = newTemp( Ity_I64 );    /* all 1's if rB = 0 */
+
+            DIP("modud r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr);
+
+            assign( rA, getIReg( rA_addr ) );
+            assign( rB, getIReg( rB_addr ) );
+
+            assign( quotient,
+                    binop( Iop_DivU64, mkexpr( rA ), mkexpr( rB ) ) );
+
+            assign( quotientTimesDivisor,
+                    binop( Iop_Mul64,
+                           mkexpr( quotient ),
+                           mkexpr( rB ) ) );
+
+            assign( remainder,
+                    binop( Iop_Sub64,
+                           mkexpr( rA ),
+                           mkexpr( quotientTimesDivisor ) ) );
+
+            /* Need to match the HW for these special cases
+             * rB = 0         result all zeros
+             */
+            assign( rB_0, unop( Iop_1Sto64,
+                                binop( Iop_CmpEQ64,
+                                       mkexpr( rB ),
+                                       mkU64( 0x0 ) ) ) );
+
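+            /* rB_0 is all ones when rB == 0, so ANDing the remainder with
+             * its complement forces the result to zero in that case.
+             */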
+            assign (rD, binop( Iop_And64,
+                               unop( Iop_Not64, mkexpr( rB_0 ) ),
+                               mkexpr( remainder ) ) );
+            break;
+         }
+
+      case 0x10B: // moduw  Modulo Unsigned Word
+         {
+            IRTemp quotient = newTemp( Ity_I32 );
+            IRTemp quotientTimesDivisor = newTemp( Ity_I32 );
+            IRTemp remainder = newTemp( Ity_I32 );
+
+            IRTemp rA     = newTemp( Ity_I32 );
+            IRTemp rB     = newTemp( Ity_I32 );
+            IRTemp rB_0   = newTemp( Ity_I32 );     /* all 1's if rB = 0 */
+
+            DIP("moduw r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr);
+
+            if ( ty == Ity_I64 ) {
+               /* rA and rB are 32 bit values in bits 32:63 of the
+                * 64-bit register.
+                */
+               assign( rA, unop( Iop_64to32, getIReg( rA_addr ) ) );
+               assign( rB, unop( Iop_64to32, getIReg( rB_addr ) ) );
+
+            } else {
+               assign( rA, getIReg( rA_addr ) );
+               assign( rB, getIReg( rB_addr ) );
+            }
+
+            assign( quotient,
+                    binop( Iop_DivU32, mkexpr( rA ), mkexpr( rB ) ) );
+
+            assign( quotientTimesDivisor,
+                    unop( Iop_64to32,
+                          binop( Iop_MullU32,
+                                 mkexpr( quotient ),
+                                 mkexpr( rB ) ) ) );
+
+            assign( remainder,
+                    binop( Iop_Sub32,
+                           mkexpr( rA ),
+                           mkexpr( quotientTimesDivisor ) ) );
+
+            /* Need to match the HW for these special cases
+             * rB = 0         result all zeros
+             */
+            assign( rB_0, unop( Iop_1Sto32,
+                                binop( Iop_CmpEQ32,
+                                       mkexpr( rB ),
+                                       mkU32( 0x0 ) ) ) );
+
+            assign (rD, binop( Iop_32HLto64,
+                               mkU32( 0 ),
+                               binop( Iop_And32,
+                                      unop( Iop_Not32, mkexpr( rB_0 ) ),
+                                      mkexpr( remainder ) ) ) );
+            break;
+         }
+
+      case 0x21A: // cnttzw, cnttzw.   Count Trailing Zero Word
+         {
+            /* Note cnttzw RA, RS  - RA is dest, RS is source.  But the
+             * order of the operands in theInstr is opc1 RS RA opc2, which
+             * has the operand fields reversed relative to the standard order.
+             */
+            UChar rA_address = ifieldRegA(theInstr);
+            UChar rS_address = ifieldRegDS(theInstr);
+            IRTemp rA     = newTemp(Ity_I64);
+            IRTemp rS     = newTemp(Ity_I64);
+            UChar flag_rC = ifieldBIT0(theInstr);
+            IRTemp result = newTemp(Ity_I32);
+
+            DIP("cnttzw%s r%u,r%u\n",  flag_rC ? "." : "",
+                rA_address, rS_address);
+
+            assign( rS, getIReg( rS_address ) );
+            assign( result, unop( Iop_Ctz32,
+                                  unop( Iop_64to32, mkexpr( rS ) ) ) );
+            assign( rA, binop( Iop_32HLto64, mkU32( 0 ), mkexpr( result ) ) );
+
+            if ( flag_rC )
+               set_CR0( mkexpr( rA ) );
+
+            putIReg( rA_address, mkexpr( rA ) );
+
+            return True;  /* Return here since this inst is not consistent
+                           * with the other instructions
+                           */
+         }
+         break;
+
+      case 0x23A: // cnttzd, cnttzd.   Count Trailing Zero Double word
+         {
+            /* Note cnttzd RA, RS  - RA is dest, RS is source.  But the
+             * order of the operands in theInstr is opc1 RS RA opc2, which
+             * lists the operands in reverse of the standard order.
+             */
+            UChar rA_address = ifieldRegA(theInstr);
+            UChar rS_address = ifieldRegDS(theInstr);
+            IRTemp rA     = newTemp(Ity_I64);
+            IRTemp rS     = newTemp(Ity_I64);
+            UChar flag_rC = ifieldBIT0(theInstr);
+
+            DIP("cnttzd%s r%u,r%u\n",  flag_rC ? "." : "",
+                rA_address, rS_address);
+
+            assign( rS, getIReg( rS_address ) );
+            assign( rA, unop( Iop_Ctz64, mkexpr( rS ) ) );
+
+            if ( flag_rC == 1 )
+               set_CR0( mkexpr( rA ) );
+
+            putIReg( rA_address, mkexpr( rA ) );
+
+            return True;  /* Return here since this inst is not consistent
+                           * with the other instructions
+                           */
+         }
+         break;
+
+      case 0x309: // modsd  Modulo Signed Double Word
+         {
+            IRTemp rA     = newTemp( Ity_I64 );
+            IRTemp rB     = newTemp( Ity_I64 );
+            IRTemp rA2_63 = newTemp( Ity_I64 );    /* all 1's if rA != -2^63 */
+            IRTemp rB_0   = newTemp( Ity_I1 );     /* 1 if rB = 0 */
+            IRTemp rB_1   = newTemp( Ity_I1 );     /* 1 if rB = -1 */
+            IRTemp rA_1   = newTemp( Ity_I1 );     /* 1 if rA = -1 */
+            IRTemp resultis0   = newTemp( Ity_I64 );
+            IRTemp resultisF   = newTemp( Ity_I64 );
+            IRTemp quotient = newTemp( Ity_I64 );
+            IRTemp quotientTimesDivisor = newTemp( Ity_I64 );
+            IRTemp remainder = newTemp( Ity_I64 );
+            IRTemp tmp  = newTemp( Ity_I64 );
+
+            DIP("modsd r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr);
+
+            assign( rA, getIReg( rA_addr ) );
+            assign( rB, getIReg( rB_addr ) );
+
+            assign( rA2_63, unop ( Iop_1Sto64,
+                                   binop( Iop_CmpNE64,
+                                          mkexpr( rA ),
+                                          mkU64( 0x8000000000000000 ) ) ) );
+            assign( rB_0, binop( Iop_CmpEQ64,
+                                 mkexpr( rB ),
+                                 mkU64( 0x0 ) ) );
+
+            assign( rB_1, binop( Iop_CmpEQ64,
+                                 mkexpr( rB ),
+                                 mkU64( 0xFFFFFFFFFFFFFFFF ) ) );
+
+            assign( rA_1, binop( Iop_CmpEQ64,
+                                 mkexpr( rA ),
+                                 mkU64( 0xFFFFFFFFFFFFFFFF ) ) );
+
+            /* Need to match the HW for these special cases
+             * rA = -2^63 and rB = -1              result all zeros
+             * rA =  -1 and rB = -1                result all zeros
+             * rA =  -1 and (rB != -1 AND rB != 0) result all 1's
+             */
+            assign( resultis0,
+                    binop( Iop_Or64,
+                           mkexpr( rA2_63 ),
+                           unop ( Iop_1Sto64, mkexpr( rB_1 ) ) ) );
+            assign( resultisF,
+                    binop( Iop_And64,
+                           unop( Iop_1Sto64, mkexpr( rA_1 ) ),
+                           binop( Iop_And64,
+                                  unop( Iop_Not64,
+                                        unop( Iop_1Sto64, mkexpr( rB_0 ) ) ),
+                                  unop( Iop_Not64,
+                                        unop( Iop_1Sto64, mkexpr( rB_1 ) ) )
+                                  ) ) );
+
+            /* The following remainder computation works as long as
+             * rA != -2^63 and rB != -1.
+             */
+            assign( quotient,
+                    binop( Iop_DivS64, mkexpr( rA ), mkexpr( rB ) ) );
+
+            assign( quotientTimesDivisor,
+                    binop( Iop_Mul64,
+                           mkexpr( quotient ),
+                           mkexpr( rB ) ) );
+
+            assign( remainder,
+                    binop( Iop_Sub64,
+                           mkexpr( rA ),
+                           mkexpr( quotientTimesDivisor ) ) );
+
+            assign( tmp, binop( Iop_And64,
+                                mkexpr( remainder ),
+                                unop( Iop_Not64,
+                                      mkexpr( resultis0 ) ) ) );
+
+            assign( rD, binop( Iop_Or64,
+                               binop( Iop_And64,
+                                      unop (Iop_Not64,
+                                            mkexpr( resultisF ) ),
+                                      mkexpr( tmp ) ),
+                               mkexpr( resultisF ) ) );
+            break;
+         }
+      case 0x30B: // modsw  Modulo Signed Word
+         {
+            IRTemp rA     = newTemp( Ity_I32 );
+            IRTemp rB     = newTemp( Ity_I32 );
+            IRTemp rA2_32 = newTemp( Ity_I32 );    /* all 1's if rA = -2^31 */
+            IRTemp rB_0   = newTemp( Ity_I1 );     /* 1 if rB = 0 */
+            IRTemp rB_1   = newTemp( Ity_I1 );     /* 1 if rB = -1 */
+            IRTemp rA_1   = newTemp( Ity_I1 );     /* 1 if rA = -1 */
+            IRTemp resultis0   = newTemp( Ity_I32 );
+            IRTemp resultisF   = newTemp( Ity_I64 );
+            IRTemp quotient = newTemp( Ity_I32 );
+            IRTemp quotientTimesDivisor = newTemp( Ity_I32 );
+            IRTemp remainder = newTemp( Ity_I32 );
+            IRTemp tmp  = newTemp( Ity_I64 );
+
+            DIP("modsw r%u,r%u,r%u\n", rD_addr, rA_addr, rB_addr);
+
+            if ( ty == Ity_I64 ) {
+               /* rA and rB are 32 bit values in bits 32:63 of the
+                * 64-bit register.
+                */
+               assign( rA, unop(Iop_64to32, getIReg(rA_addr) ) );
+               assign( rB, unop(Iop_64to32, getIReg(rB_addr) ) );
+
+            } else {
+               assign( rA, getIReg(rA_addr) );
+               assign( rB, getIReg(rB_addr) );
+            }
+
+            assign( rA2_32, unop( Iop_1Sto32,
+                                  binop( Iop_CmpEQ32,
+                                         mkexpr( rA ),
+                                         mkU32( 0x80000000 ) ) ) );
+            /* If the divisor is zero, then the result is undefined.
+             * However, we will make the result be zero to match what
+             * the hardware does.
+             */
+            assign( rB_0, binop( Iop_CmpEQ32,
+                                 mkexpr( rB ),
+                                 mkU32( 0x0 ) ) );
+
+            assign( rB_1, binop( Iop_CmpEQ32,
+                                 mkexpr( rB ),
+                                 mkU32( 0xFFFFFFFF ) ) );
+
+            assign( rA_1, binop( Iop_CmpEQ32,
+                                 mkexpr( rA ),
+                                 mkU32( 0xFFFFFFFF ) ) );
+
+            /* Need to match the HW for these special cases
+             * rA = -2^31 and rB = -1              result all zeros
+             * rA =  -1 and rB = -1                result all zeros
+             * rA =  -1 and (rB != -1 AND rB != 0) result all 1's
+             */
+            assign( resultis0,
+                    binop( Iop_Or32,
+                           unop( Iop_Not32,
+                                 binop( Iop_And32,
+                                        mkexpr( rA2_32 ),
+                                        unop( Iop_1Sto32,
+                                              mkexpr( rB_1 ) ) ) ),
+                           binop( Iop_And32,
+                                  unop( Iop_1Sto32, mkexpr( rA_1 ) ),
+                                  unop( Iop_1Sto32, mkexpr( rB_1 ) ) ) ) );
+            assign( resultisF,
+                    binop( Iop_And64,
+                           unop( Iop_1Sto64, mkexpr( rA_1 ) ),
+                           binop( Iop_And64,
+                                  unop( Iop_Not64,
+                                        unop( Iop_1Sto64, mkexpr( rB_0 ) ) ),
+                                  unop( Iop_Not64,
+                                        unop( Iop_1Sto64, mkexpr( rB_1 ) ) )
+                                  ) ) );
+
+            /* The following remainder computation works as long as
+             * rA != -2^31 and rB != -1.
+             */
+            assign( quotient,
+                    binop( Iop_DivS32, mkexpr( rA ), mkexpr( rB ) ) );
+
+            assign( quotientTimesDivisor,
+                    unop( Iop_64to32,
+                          binop( Iop_MullS32,
+                                 mkexpr( quotient ),
+                                 mkexpr( rB ) ) ) );
+
+            assign( remainder,
+                    binop( Iop_Sub32,
+                           mkexpr( rA ),
+                           mkexpr( quotientTimesDivisor ) ) );
+
+            assign( tmp, binop( Iop_32HLto64,
+                                mkU32( 0 ),
+                                binop( Iop_And32,
+                                       mkexpr( remainder ),
+                                       unop( Iop_Not32,
+                                             mkexpr( resultis0 ) ) ) ) );
+
+            assign( rD, binop( Iop_Or64,
+                               binop( Iop_And64,
+                                      unop ( Iop_Not64,
+                                             mkexpr( resultisF ) ),
+                                      mkexpr( tmp ) ),
+                               mkexpr( resultisF ) ) );
+            break;
+         }
+
+      default:
+         vex_printf("dis_modulo_int(ppc)(opc2)\n");
+         return False;
+      }
+      break;
+
+   default:
+      vex_printf("dis_modulo_int(ppc)(opc1)\n");
+      return False;
+   }
+
+   putIReg( rD_addr, mkexpr( rD ) );
+
+   return True;
+}
+
+
+/*
+  Byte Compare Instructions
+*/
+static Bool dis_byte_cmp ( UInt theInstr )
+{
+   /* X-Form */
+   UChar opc1 = ifieldOPC(theInstr);
+   UInt  opc2 = ifieldOPClo10(theInstr);
+   UChar rA_addr = ifieldRegA(theInstr);
+   UChar rB_addr = ifieldRegB(theInstr);
+   IRTemp rA     = newTemp(Ity_I64);
+   IRTemp rB     = newTemp(Ity_I64);
+   UChar L    = toUChar( IFIELD( theInstr, 21, 1 ) );
+   UChar BF   = toUChar( IFIELD( theInstr, 23, 3 ) );
+
+   assign( rA, getIReg(rA_addr) );
+   assign( rB, getIReg(rB_addr) );
+
+   if (opc1 != 0x1F) {
+      vex_printf("dis_byte_cmp(ppc)(opc1)\n");
+      return False;
+   }
+
+   switch (opc2) {
+   case 0xc0: // cmprb (Compare Ranged Byte)
+      {
+         IRExpr *value;
+         IRExpr *hi_1, *lo_1, *hi_2, *lo_2;
+         IRExpr *inrange_1, *inrange_2;
+
+         DIP("cmprb %u,%u,r%u,r%u\n", BF, L, rA_addr, rB_addr);
+
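+         /* rB holds two byte ranges: bytes 3:2 give hi_1:lo_1 and
+          * bytes 1:0 give hi_2:lo_2.  The value compared is the low
+          * byte of rA.
+          */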
+         hi_1 = binop( Iop_Shr64,
+                       binop( Iop_And64,
+                              mkexpr( rB ),
+                              mkU64( 0xFF000000 ) ),
+                       mkU8( 24 ) );
+         lo_1 = binop( Iop_Shr64,
+                       binop( Iop_And64,
+                              mkexpr( rB ),
+                              mkU64( 0xFF0000 ) ) ,
+                       mkU8( 16 ) );
+         hi_2 = binop( Iop_Shr64,
+                       binop( Iop_And64,
+                              mkexpr( rB ),
+                              mkU64( 0xFF00 ) ),
+                       mkU8( 8 ) );
+         lo_2 = binop( Iop_And64,
+                       mkexpr( rB ),
+                       mkU64( 0xFF ) );
+         value = binop( Iop_And64,
+                        mkexpr( rA ),
+                        mkU64( 0xFF ) );
+
+         inrange_1 = mkAND1( binop( Iop_CmpLE64U, value, hi_1 ),
+                             mkNOT1( binop( Iop_CmpLT64U, value, lo_1 ) ) );
+         inrange_2 = mkAND1( binop( Iop_CmpLE64U, value, hi_2 ),
+                             mkNOT1( binop( Iop_CmpLT64U, value, lo_2 ) ) );
+
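+         /* The result nibble is 0b0100 when the value is in range
+          * (either range when L = 1), 0b0000 otherwise, and is written
+          * to the CR field selected by BF.
+          */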
+         putGST_field( PPC_GST_CR,
+                       binop( Iop_Shl32,
+                              binop( Iop_Or32,
+                                     unop( Iop_1Uto32, inrange_2 ),
+                                     binop( Iop_And32,
+                                            mkU32 ( L ),
+                                            unop( Iop_1Uto32, inrange_1 ) ) ),
+                              mkU8( 2 ) ),
+                       BF );
+      }
+      break;
+
+   case 0xE0: // cmpeqb (Compare Equal Byte)
+      {
+         Int i;
+         IRTemp tmp[9];
+         IRExpr *value;
+
+         DIP("cmpeqb %u,r%u,r%u\n", BF, rA_addr, rB_addr);
+
+         value = binop( Iop_And64,
+                        mkexpr( rA ),
+                        mkU64( 0xFF ) );
+
+         tmp[0] = newTemp(Ity_I32);
+         assign( tmp[0], mkU32( 0 ) );
+
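+         /* OR together equality tests of the low byte of rA against
+          * each of the eight bytes of rB; tmp[8] is non-zero iff any
+          * byte matched.
+          */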
+         for(i = 0; i < 8; i++) {
+            tmp[i+1] = newTemp(Ity_I32);
+            assign( tmp[i+1], binop( Iop_Or32,
+                                     unop( Iop_1Uto32,
+                                           binop( Iop_CmpEQ64,
+                                                  value,
+                                                  binop( Iop_And64,
+                                                         binop( Iop_Shr64,
+                                                                mkexpr( rB ),
+                                                                mkU8( i*8 ) ),
+                                                         mkU64( 0xFF ) ) ) ),
+                                     mkexpr( tmp[i] ) ) );
+         }
+
+         putGST_field( PPC_GST_CR,
+                       binop( Iop_Shl32,
+                              unop( Iop_1Uto32,
+                                    mkNOT1( binop( Iop_CmpEQ32,
+                                                   mkexpr( tmp[8] ),
+                                                   mkU32( 0 ) ) ) ),
+                              mkU8( 2 ) ),
+                       BF );
+      }
+      break;
+
+   default:
+      vex_printf("dis_byte_cmp(ppc)(opc2)\n");
+      return False;
+   }
+   return True;
+}
+
+/*
+ * Integer Miscellaneous instructions
+ */
+static Bool dis_int_misc ( UInt theInstr )
+{
+   Int wc = IFIELD(theInstr, 21, 2);
+   UChar opc1 = ifieldOPC(theInstr);
+   UInt  opc2 = ifieldOPClo10(theInstr);
+
+   if ( opc1 != 0x1F ) {
+      vex_printf("dis_int_misc(ppc)(opc1)\n");
+      return False;
+   }
+
+   switch (opc2) {
+   case 0x01E: // wait (X-form)
+      DIP("wait %u\n", wc);
+
+      /* The wait instruction causes instruction fetching and execution
+       * to be suspended.  Instruction fetching and execution are resumed
+       * when the events specified by the WC field occur.
+       *
+       *    0b00   Resume instruction fetching and execution when an
+       *           exception or an event-based branch exception occurs,
+       *           or a resume signal from the platform is received.
+       *
+       *    0b01   Reserved.
+       *
+       * For our purposes, we will just assume the condition is always
+       * immediately satisfied.
+       */
+      break;
+   default:
+      vex_printf("dis_int_misc(ppc)(opc2)\n");
+      return False;
+   }
+
+   return True;
+}
 
 /*
   Integer Compare Instructions
@@ -4128,7 +6023,7 @@
       return False;
    }
    
-   if (b22 != 0) {
+   if (( b22 != 0 ) && ( opc2 != 0x080 ) ) {   // setb case exception
       vex_printf("dis_int_cmp(ppc)(b22)\n");
       return False;
    }
@@ -4208,6 +6103,50 @@
          putCR0( crfD, getXER_SO() );
          break;
 
+      case 0x080: // setb (Set Boolean)
+         {
+            UChar rT_addr = ifieldRegDS(theInstr);
+            Int bfa = IFIELD(theInstr, 18, 3);
+            IRTemp cr = newTemp(Ity_I32);
+            IRTemp cr0 = newTemp(Ity_I32);
+            IRTemp cr1 = newTemp(Ity_I32);
+            IRTemp result = newTemp(Ity_I64);
+
+            DIP("setb r%u,%d\n", rT_addr, bfa);
+
+            /* Fetch the entire condition code value */
+            assign( cr, getGST( PPC_GST_CR ) );
+
+            /* Get bit zero (IBM numbering) of the CR field specified
+             * by bfa.
+             */
+            assign( cr0, binop( Iop_And32,
+                                binop( Iop_Shr32,
+                                       mkexpr( cr ),
+                                       mkU8( (7-bfa)*4 ) ),
+                                mkU32( 0x8 ) ) );
+            assign( cr1, binop( Iop_And32,
+                                binop( Iop_Shr32,
+                                       mkexpr( cr ),
+                                       mkU8( (7-bfa)*4 ) ),
+                                mkU32( 0x4 ) ) );
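+            /* result is -1 if the LT bit of the selected field is set,
+             * +1 if the GT bit is set, and 0 otherwise.
+             */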
+            assign( result, binop( Iop_Or64,
+                                   unop( Iop_1Sto64,
+                                       binop( Iop_CmpEQ32,
+                                              mkexpr( cr0 ),
+                                              mkU32( 0x8 ) ) ),
+                                   binop( Iop_32HLto64,
+                                          mkU32( 0 ),
+                                          unop( Iop_1Uto32,
+                                                binop( Iop_CmpEQ32,
+                                                       mkexpr( cr1 ),
+                                                       mkU32( 0x4 ) ) ) ) ) );
+            if ( ty == Ity_I64 )
+               putIReg( rT_addr, mkexpr( result ) );
+            else
+               putIReg( rT_addr, unop( Iop_64to32, mkexpr(result ) ) );
+         }
+         break;
       default:
          vex_printf("dis_int_cmp(ppc)(opc2)\n");
          return False;
@@ -4290,8 +6229,37 @@
 
    /* X Form */
    case 0x1F:
+
+      opc2 = IFIELD( theInstr, 2, 9 );
+
+      switch ( opc2 ) {
+      case 0x1BD: // extswsli (Extend Sign Word shift left)
+         {
+            /* sh[5] is in bit 1, sh[0:4] is in bits [15:11] of theInstr */
+            UChar sh = IFIELD( theInstr, 11, 5 ) | (IFIELD(theInstr, 1, 1) << 5);
+            IRTemp temp = newTemp( ty );
+
+            DIP("extswsli%s r%u,r%u,%u\n", flag_rC ? ".":"",
+                rA_addr, rS_addr, sh);
+
+            assign( temp, unop( Iop_32Sto64,
+                                unop( Iop_64to32, mkexpr( rS ) ) ) );
+            assign( rA, binop( Iop_Shl64, mkexpr( temp ), mkU8( sh ) ) );
+            putIReg( rA_addr, mkexpr( rA ) );
+
+            if ( flag_rC ) {
+               set_CR0( mkexpr( rA ) );
+            }
+            return True;
+         }
+      default:
+         break;  // drop to next opc2 check
+      }
+
       do_rc = True; // All below record to CR, except for where we return at case end.
 
+      opc2 = ifieldOPClo10( theInstr );
+
       switch (opc2) {
       case 0x01C: // and (AND, PPC32 p356)
          DIP("and%s r%u,r%u,r%u\n",
@@ -6026,11 +7994,60 @@
    return True;
 }
 
+/*
+ *  PC relative instruction
+ */
+static Bool dis_pc_relative ( UInt theInstr )
+{
+   /* DX-Form */
+   UChar opc1 = ifieldOPC(theInstr);
+   unsigned long long D;
+   UInt d0 = IFIELD(theInstr,  6, 10);
+   UInt d1 = IFIELD(theInstr, 16,  5);
+   UInt d2 = IFIELD(theInstr,  0,  1);
+   UChar rT_addr = ifieldRegDS(theInstr);
+   UInt  opc2    = ifieldOPClo5(theInstr);
+   IRType ty     = mode64 ? Ity_I64 : Ity_I32;
 
+   if ( opc1 != 0x13) {
+      vex_printf("dis_pc_relative(ppc)(opc1)\n");
+      return False;
+   }
+
+   switch (opc2) {
+   case 0x002:   // addpcis  (Add PC immediate Shifted DX-form)
+      {
+         IRExpr* nia     = mkSzImm(ty, nextInsnAddr());
+         IRExpr* result;
+
+         D = (d0 << 6) | (d1 << 1) | d2;
+         DIP("addpcis %u,%llu\n", rT_addr, D);
+
+         if ( (D & 0x8000) == 0x8000 )
+            D = 0xFFFFFFFFFFFF0000UL | D;  // sign extend
+
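+         /* addpcis adds the sign-extended immediate, shifted left by
+          * 16 bits, to the address of the next instruction.
+          */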
+         if ( ty == Ity_I32 ) {
+            result = binop( Iop_Add32, nia, mkU32( D << 16 ) );
+         } else {
+            vassert( ty == Ity_I64 );
+            result = binop( Iop_Add64, nia, mkU64( D << 16 ) );
+         }
+
+         putIReg( rT_addr, result);
+      }
+      break;
+
+   default:
+      vex_printf("dis_pc_relative(ppc)(opc2)\n");
+      return False;
+   }
+
+   return True;
+}
 
 /*
   Condition Register Logical Instructions
-*/
+ */
 static Bool dis_cond_logic ( UInt theInstr )
 {
    /* XL-Form */
@@ -7725,6 +9742,31 @@
    IRRoundingMode.  PPCRoundingMode encoding is different to
    IRRoundingMode, so need to map it.
 */
+
+static IRExpr* /* :: Ity_I32 */ set_round_to_Oddmode ( void )
+{
+/* PPC and valgrind use two bits to designate the rounding mode.
+   ISA 3.0 adds instructions that can use a round-to-odd mode
+   but did not change the number of bits for the rm.  Basically,
+   they added two instructions that only differ by the rounding
+   mode the operation uses.  In essence, they encoded the rm
+   in the name.  In order to avoid having to create Iops that
+   encode the rm in the name, we "expand" the definition of
+   the rounding mode bits.  We just pass the rm and then
+   map the to-odd mode to the appropriate PPCFpOp name that
+   will tell us which instruction to map to.
+
+   rounding mode | PPC | IR
+   ------------------------
+   to nearest    | 000  | 00
+   to zero       | 001  | 11
+   to +infinity  | 010  | 10
+   to -infinity  | 011  | 01
+   to odd        | 1xx  | xx
+*/
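+   /* 0x8 = 0b1000: the expanded encoding whose high bit selects the
+    * round-to-odd mode, per the table above.
+    */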
+   return mkU32(8);
+}
+
 static IRExpr* /* :: Ity_I32 */ get_IR_roundingmode ( void )
 {
 /* 
@@ -8542,6 +10584,7 @@
  *  Otherwise fg_flag is set to 0.
  *
  */
+
 static void do_fp_tsqrt(IRTemp frB_Int, Bool sp, IRTemp * fe_flag_tmp, IRTemp * fg_flag_tmp)
 {
    // The following temps are for holding intermediate results
@@ -8555,7 +10598,12 @@
    IRTemp  frbInf_tmp = newTemp(Ity_I1);
    *fe_flag_tmp = newTemp(Ity_I32);
    *fg_flag_tmp = newTemp(Ity_I32);
-   assign( frB_exp_shR, fp_exp_part( frB_Int, sp ) );
+
+   if ( sp )
+      assign( frB_exp_shR, fp_exp_part( Ity_I32, frB_Int ) );
+   else
+      assign( frB_exp_shR, fp_exp_part( Ity_I64, frB_Int ) );
+
    assign(e_b, binop( Iop_Sub32, mkexpr(frB_exp_shR), mkU32( bias ) ));
 
    //////////////////  fe_flag tests BEGIN //////////////////////
@@ -8563,9 +10611,17 @@
     * (NOTE: These tests are similar to those used for ftdiv.  See do_fp_tdiv()
     * for details.)
     */
-   frbNaN = sp ? is_NaN_32(frB_Int) : is_NaN(frB_Int);
-   assign( frbInf_tmp, is_Inf(frB_Int, sp) );
-   assign( frbZero_tmp, is_Zero(frB_Int, sp ) );
+   if ( sp ) {
+      frbNaN = is_NaN( Ity_I32, frB_Int );
+      assign( frbInf_tmp, is_Inf( Ity_I32, frB_Int ) );
+      assign( frbZero_tmp, is_Zero( Ity_I32, frB_Int ) );
+
+   } else {
+      frbNaN = is_NaN( Ity_I64, frB_Int );
+      assign( frbInf_tmp, is_Inf( Ity_I64, frB_Int ) );
+      assign( frbZero_tmp, is_Zero( Ity_I64, frB_Int ) );
+   }
+
    {
       // Test_value = -970 for double precision
       UInt test_value = sp ? 0xffffff99 : 0xfffffc36;
@@ -8679,8 +10735,14 @@
    IRExpr * fe_flag, * fg_flag;
 
    // Create temps that will be used throughout the following tests.
-   assign( frA_exp_shR, fp_exp_part( frA_int, sp ) );
-   assign( frB_exp_shR, fp_exp_part( frB_int, sp ) );
+   if ( sp ) {
+      assign( frA_exp_shR, fp_exp_part( Ity_I32, frA_int ) );
+      assign( frB_exp_shR, fp_exp_part( Ity_I32, frB_int ) );
+   } else{
+      assign( frA_exp_shR, fp_exp_part( Ity_I64, frA_int ) );
+      assign( frB_exp_shR, fp_exp_part( Ity_I64, frB_int ) );
+   }
+
    /* Let e_[a|b] be the unbiased exponent: i.e. exp - 1023. */
    assign(e_a, binop( Iop_Sub32, mkexpr(frA_exp_shR), mkU32( bias ) ));
    assign(e_b, binop( Iop_Sub32, mkexpr(frB_exp_shR), mkU32( bias ) ));
@@ -8693,28 +10755,26 @@
     * Test if the double-precision floating-point operand in register FRA is
     * a NaN:
     */
-   fraNaN = sp ? is_NaN_32(frA_int) : is_NaN(frA_int);
+   fraNaN = sp ? is_NaN( Ity_I32, frA_int ) : is_NaN( Ity_I64, frA_int );
    /*
-    * Test if the double-precision floating-point operand in register FRA is
-    * an Infinity.
+    * Test if the double-precision floating-point operands in register FRA
+    * and FRB is an Infinity.  Test if FRB is zero.
     */
-   assign(fraInf_tmp, is_Inf(frA_int, sp));
+   if ( sp ) {
+      assign(fraInf_tmp, is_Inf( Ity_I32, frA_int ) );
+      assign( frbInf_tmp, is_Inf( Ity_I32, frB_int ) );
+      assign( frbZero_tmp, is_Zero( Ity_I32, frB_int ) );
 
+   } else {
+      assign(fraInf_tmp, is_Inf( Ity_I64, frA_int ) );
+      assign( frbInf_tmp, is_Inf( Ity_I64, frB_int ) );
+      assign( frbZero_tmp, is_Zero( Ity_I64, frB_int ) );
+   }
    /*
     * Test if the double-precision floating-point operand in register FRB is
     * a NaN:
     */
-   frbNaN = sp ? is_NaN_32(frB_int) : is_NaN(frB_int);
-   /*
-    * Test if the double-precision floating-point operand in register FRB is
-    * an Infinity.
-    */
-   assign( frbInf_tmp, is_Inf(frB_int, sp) );
-   /*
-    * Test if the double-precision floating-point operand in register FRB is
-    * a Zero.
-    */
-   assign( frbZero_tmp, is_Zero(frB_int, sp) );
+   frbNaN = sp ? is_NaN( Ity_I32, frB_int ) : is_NaN( Ity_I64, frB_int );
 
    /*
     * Test if e_b <= -1022 for double precision;
@@ -8737,7 +10797,11 @@
    /*
     * Test if FRA != Zero and (e_a - e_b) >= bias
     */
-   assign( fraNotZero_tmp, unop( Iop_Not1, is_Zero( frA_int, sp ) ) );
+   if ( sp )
+      assign( fraNotZero_tmp, unop( Iop_Not1, is_Zero( Ity_I32, frA_int ) ) );
+   else
+      assign( fraNotZero_tmp, unop( Iop_Not1, is_Zero( Ity_I64, frA_int ) ) );
+
    ea_eb_GTE = mkAND1( mkexpr( fraNotZero_tmp ),
                        binop( Iop_CmpLT32S, mkU32( bias ),
                               binop( Iop_Sub32, mkexpr( e_a ),
@@ -8988,10 +11052,8 @@
    );
 
    putGST_field( PPC_GST_CR, mkexpr(ccPPC32), crfD );
+   putFPCC( mkexpr( ccPPC32 ) );
 
-   /* CAB: TODO?: Support writing cc to FPSCR->FPCC ?
-      putGST_field( PPC_GST_FPSCR, mkexpr(ccPPC32), 4 );
-   */
    // XXX XXX XXX FIXME
    // Also write the result into FPRF (it's not entirely clear how)
 
@@ -9274,15 +11336,152 @@
       }
       assign( EA_hi, ea_rAor0_idxd( rA_addr, rB_addr ) );
       break;
-   case 0x39: // lfdp (FP Load Double Pair DS-form, ISA 2.05  p125)
-      DIP("lfdp fr%u,%d(r%u)\n", frT_hi_addr, simm16, rA_addr);
-      assign( EA_hi, ea_rAor0_simm( rA_addr, simm16  ) );
-      is_load = 1;
+   case 0x39:
+   {
+      UInt  DS  = IFIELD( theInstr, 2, 14);
+      UChar vRT = ifieldRegDS(theInstr);
+      IRTemp EA = newTemp( ty );
+
+      opc2 = ifieldOPC0o2(theInstr);
+
+      switch(opc2) {
+      case 0x0:     // lfdp (FP Load Double Pair DS-form, ISA 2.05  p125)
+         DIP("lfdp fr%u,%d(r%u)\n", frT_hi_addr, simm16, rA_addr);
+         assign( EA_hi, ea_rAor0_simm( rA_addr, simm16  ) );
+         is_load = 1;
+         break;
+
+      case 0x2:     // lxsd (Load VSX Scalar Doubleword)
+         DIP("lxsd v%u,%d(r%u)\n", vRT, DS, rA_addr);
+
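+         /* DS is a doubleword-aligned displacement (DS || 0b00), hence
+          * the shift left by 2 to form the byte offset.
+          */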
+         assign( EA, ea_rAor0_simm( rA_addr, DS<<2  ) );
+
+         putVSReg( vRT+32, binop( Iop_64HLtoV128,
+                                  load( Ity_I64, mkexpr( EA ) ),
+                                  mkU64( 0 ) ) );
+         return True;
+
+      case 0x3:     // lxssp (Load VSX Scalar Single)
+         DIP("lxssp v%u,%d(r%u)\n", vRT, DS, rA_addr);
+
+         assign( EA, ea_rAor0_simm( rA_addr, DS<<2  ) );
+
+         putVSReg( vRT+32, binop( Iop_64HLtoV128,
+                                  binop( Iop_32HLto64,
+                                         load( Ity_I32, mkexpr( EA ) ),
+                                         mkU32( 0 ) ),
+                                  mkU64( 0 ) ) );
+         return True;
+
+      default:
+         vex_printf("dis_fp_pair(ppc) : DS-form wrong opc2\n");
+         return False;
+      }
       break;
-   case 0x3d: // stfdp (FP Store Double Pair DS-form, ISA 2.05  p125)
-      DIP("stfdp fr%u,%d(r%u)\n", frT_hi_addr, simm16, rA_addr);
-      assign( EA_hi, ea_rAor0_simm( rA_addr, simm16  ) );
+   }
+   case 0x3d:
+   {
+      UInt  DS  = IFIELD( theInstr, 2, 14);
+      UChar vRS = ifieldRegDS(theInstr);
+      IRTemp EA = newTemp( ty );
+
+      opc2 = ifieldOPC0o2(theInstr);
+
+      switch(opc2) {
+      case 0x0:
+         // stfdp (FP Store Double Pair DS-form, ISA 2.05  p125)
+         DIP("stfdp fr%u,%d(r%u)\n", frT_hi_addr, simm16, rA_addr);
+         assign( EA_hi, ea_rAor0_simm( rA_addr, simm16  ) );
+         break;
+
+      case 0x1:
+      {
+         UInt ea_off = 8;
+         IRTemp word[2];
+         IRExpr* irx_addr;
+         UInt  T  = IFIELD( theInstr, 21, 5);  // T or S depending on inst
+         UInt  TX = IFIELD( theInstr,  3, 1);  // TX or SX field
+
+         word[0] = newTemp(Ity_I64);
+         word[1] = newTemp(Ity_I64);
+         DS  = IFIELD( theInstr, 4, 11);   // DQ in the instruction definition
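+         /* DQ is a quadword-aligned displacement (DQ || 0b0000), hence
+          * the shift left by 4 to form the byte offset.
+          */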
+         assign( EA, ea_rAor0_simm( rA_addr, DS<<4  ) );
+
+         if ( IFIELD( theInstr, 0, 3) == 1) {
+            // lxv (Load VSX Vector)
+            DIP("lxv v%u,%d(r%u)\n", vRS, DS, rA_addr);
+
+            assign( word[0], load( Ity_I64, mkexpr( EA ) ) );
+
+            irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+            assign( word[1], load( Ity_I64, irx_addr ) );
+
+            if (host_endness == VexEndnessBE)
+               putVSReg( TX*32+T, binop( Iop_64HLtoV128,
+                                         mkexpr( word[0] ),
+                                         mkexpr( word[1] ) ) );
+            else
+               putVSReg( TX*32+T, binop( Iop_64HLtoV128,
+                                         mkexpr( word[1] ),
+                                         mkexpr( word[0] ) ) );
+            return True;
+
+         } else if ( IFIELD( theInstr, 0, 3) == 5) {
+            // stxv (Store VSX Vector)
+            DIP("stxv v%u,%d(r%u)\n", vRS, DS, rA_addr);
+
+            if (host_endness == VexEndnessBE) {
+               store( mkexpr(EA), unop( Iop_V128HIto64,
+                                        getVSReg( TX*32+T ) ) );
+               irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+               store( irx_addr, unop( Iop_V128to64,
+                                       getVSReg( TX*32+T ) ) );
+            } else {
+               store( mkexpr(EA), unop( Iop_V128to64,
+                                        getVSReg( TX*32+T ) ) );
+               irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+               store( irx_addr, unop( Iop_V128HIto64,
+                                      getVSReg( TX*32+T ) ) );
+            }
+            return True;
+
+         } else {
+            vex_printf("dis_fp_pair vector load/store (ppc) : DS-form wrong opc2\n");
+            return False;
+         }
+         break;
+      }
+      case 0x2:
+         // stxsd (Store VSX Scalar Doubleword)
+         DIP("stxsd v%u,%d(r%u)\n", vRS, DS, rA_addr);
+
+         assign( EA, ea_rAor0_simm( rA_addr, DS<<2  ) );
+
+         store( mkexpr(EA), unop( Iop_V128HIto64,
+                                  getVSReg( vRS+32 ) ) );
+         return True;
+
+      case 0x3:
+         // stxssp (Store VSX Scalar Single)
+         DIP("stxssp v%u,%d(r%u)\n", vRS, DS, rA_addr);
+
+         assign( EA, ea_rAor0_simm( rA_addr, DS<<2  ) );
+
+         store( mkexpr(EA), unop( Iop_64HIto32,
+                                  unop( Iop_V128HIto64,
+                                        getVSReg( vRS+32 ) ) ) );
+         return True;
+
+      default:
+         vex_printf("dis_fp_pair(ppc) : DS-form wrong opc2\n");
+         return False;
+      }
       break;
+   }
    default:   // immediate offset
       vex_printf("dis_fp_pair(ppc)(instr)\n");
       return False;
@@ -9532,24 +11731,16 @@
 
    case 0x086: { // mtfsfi (Move to FPSCR Field Immediate, PPC32 p481)
       UInt crfD     = IFIELD( theInstr, 23, 3 );
-      UChar b16to22 = toUChar( IFIELD( theInstr, 16, 7 ) );
+      UChar b17to22 = toUChar( IFIELD( theInstr, 17, 6 ) );
       UChar IMM     = toUChar( IFIELD( theInstr, 12, 4 ) );
       UChar b11     = toUChar( IFIELD( theInstr, 11, 1 ) );
-      UChar Wbit;
+      UChar Wbit    = toUChar( IFIELD( theInstr, 16, 1 ) );
 
-      if (b16to22 != 0 || b11 != 0) {
+      if (b17to22 != 0 || b11 != 0 || (Wbit && !GX_level)) {
          vex_printf("dis_fp_scr(ppc)(instr,mtfsfi)\n");
          return False;
-      }      
-      DIP("mtfsfi%s crf%u,%d\n", flag_rC ? ".":"", crfD, IMM);
-      if (GX_level) {
-         /* This implies that Decimal Floating Point is supported, and the
-          * FPSCR must be managed as a 64-bit register.
-          */
-         Wbit = toUChar( IFIELD(theInstr, 16, 1) );
-      } else {
-         Wbit = 0;
       }
+      DIP("mtfsfi%s crf%u,%d%s\n", flag_rC ? ".":"", crfD, IMM, Wbit ? ",1":"");
       crfD = crfD + (8 * (1 - Wbit) );
       putGST_field( PPC_GST_FPSCR, mkU32( IMM ), crfD );
       break;
@@ -9558,7 +11749,21 @@
    case 0x247: { // mffs (Move from FPSCR, PPC32 p468)
       UChar   frD_addr  = ifieldRegDS(theInstr);
       UInt    b11to20   = IFIELD(theInstr, 11, 10);
-      IRExpr* fpscr_lower = getGST_masked( PPC_GST_FPSCR, MASK_FPSCR_RN );
+      /* The FPSCR_DRN, FPSCR_RN and FPSCR_FPCC are all stored in
+       * their own 8-bit entries with distinct offsets.  The FPSCR
+       * register is handled as two 32-bit values.  We need to
+       * assemble the pieces into the single 64-bit value to return.
+       */
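+      /* In IBM bit numbering, C is FPSCR bit 47 and FPCC occupies bits
+       * 48:51, hence the shifts by 63-47 and 63-51 below.
+       */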
+      IRExpr* fpscr_lower
+         = binop( Iop_Or32,
+                  getGST_masked( PPC_GST_FPSCR, MASK_FPSCR_RN),
+                  binop( Iop_Or32,
+                         binop( Iop_Shl32,
+                                getC(),
+                                mkU8(63-47) ) ,
+                         binop( Iop_Shl32,
+                                getFPCC(),
+                                mkU8(63-51) ) ) );
       IRExpr* fpscr_upper = getGST_masked_upper( PPC_GST_FPSCR,
                                                  MASK_FPSCR_DRN );
 
@@ -9623,6 +11828,12 @@
                else
                   mask |= BFP_MASK_SEED >> ( 4 * ( i + 8 * ( 1 - Wbit ) ) );
             }
+            if ((FM & (1<<(7-i))) == 0x2) { //set the FPCC bits
+               mask |= 0xF000;
+            }
+            if ((FM & (1<<(7-i))) == 0x4) { //set the Floating-Point Class Descriptor (C) bit
+               mask |= 0x10000;
+            }
          }
       }
       assign( frB, getFReg(frB_addr));
@@ -10201,7 +12412,7 @@
    IRTemp low_u_flag = newTemp( Ity_I8 );
    IRTemp low_l_flag = newTemp( Ity_I8 );
 
-   /* Check the LMD, digit 16, to see if it is zero. */
+   /* Check the LMD, digit 34, to see if it is zero. */
    assign( num_lmd, unop( Iop_1Uto8, binop( Iop_CmpEQ32, lmd, mkU32( 0 ) ) ) );
 
    assign( lmd_flag, unop( Iop_Not8, mkexpr( num_lmd ) ) );
@@ -10213,7 +12424,7 @@
                 &top_flag,
                 top_12_l );
 
-   Count_zeros( 1,
+   Count_zeros( 2,
                 mkexpr( num_top ),
                 mkexpr( top_flag ),
                 &num_mid_u,
@@ -10222,14 +12433,14 @@
                        binop( Iop_Shl32, mid_60_u, mkU8( 2 ) ),
                        binop( Iop_Shr32, mid_60_l, mkU8( 30 ) ) ) );
 
-   Count_zeros( 2,
+   Count_zeros( 1,
                 mkexpr( num_mid_u ),
                 mkexpr( mid_u_flag ),
                 &num_mid_l,
                 &mid_l_flag,
                 mid_60_l );
 
-   Count_zeros( 1,
+   Count_zeros( 2,
                 mkexpr( num_mid_l ),
                 mkexpr( mid_l_flag ),
                 &num_low_u,
@@ -10238,7 +12449,7 @@
                        binop( Iop_Shl32, low_60_u, mkU8( 2 ) ),
                        binop( Iop_Shr32, low_60_l, mkU8( 30 ) ) ) );
 
-   Count_zeros( 2,
+   Count_zeros( 1,
                 mkexpr( num_low_u ),
                 mkexpr( low_u_flag ),
                 &num_low_l,
@@ -10264,10 +12475,10 @@
                        binop( Iop_CmpEQ32,
                               mkexpr( gfield0to5 ),
                               mkU32( 0x1E ) ) ),
-                              unop( Iop_1Sto32, /* SNaN check */
-                                    binop( Iop_CmpEQ32,
-                                           mkexpr( gfield0to5 ),
-                                           mkU32( 0x1F ) ) ) );
+                 unop( Iop_1Sto32, /* SNaN check */
+                       binop( Iop_CmpEQ32,
+                              mkexpr( gfield0to5 ),
+                              mkU32( 0x1F ) ) ) );
 }
 
 #undef AND
@@ -11047,6 +13258,7 @@
                                       mkU32( 1 ) ) ) ) ) );
 
    putGST_field( PPC_GST_CR, mkexpr( ccPPC32 ), crfD );
+   putFPCC( mkexpr( ccPPC32 ) );
    return True;
 }
 
@@ -11076,6 +13288,7 @@
    IRTemp cc1 = newTemp( Ity_I32 );
    IRTemp cc2 = newTemp( Ity_I32 );
    IRTemp cc3 = newTemp( Ity_I32 );
+   IRTemp cc  = newTemp( Ity_I32 );
 
    /* The dtstex and dtstexg instructions only differ in the size of the
     * exponent field.  The following switch statement takes care of the size
@@ -11230,15 +13443,15 @@
                               ) ) );
 
    /* store the condition code */
-   putGST_field( PPC_GST_CR,
-                 binop( Iop_Or32,
-                        mkexpr( cc0 ),
-                        binop( Iop_Or32,
-                               mkexpr( cc1 ),
-                               binop( Iop_Or32,
-                                      mkexpr( cc2 ),
-                                      mkexpr( cc3 ) ) ) ),
-                 crfD );
+   assign( cc, binop( Iop_Or32,
+                      mkexpr( cc0 ),
+                      binop( Iop_Or32,
+                             mkexpr( cc1 ),
+                             binop( Iop_Or32,
+                                    mkexpr( cc2 ),
+                                    mkexpr( cc3 ) ) ) ) );
+   putGST_field( PPC_GST_CR, mkexpr( cc ), crfD );
+   putFPCC( mkexpr( cc ) );
    return True;
 }
 
@@ -11683,6 +13896,7 @@
                                        mkU8( 1 ) ) ) );
 
    putGST_field( PPC_GST_CR, mkexpr( field ), crfD );
+   putFPCC( mkexpr( field ) );
    return True;
 }
 
@@ -12357,10 +14571,11 @@
 
 static Bool dis_dfp_significant_digits( UInt theInstr )
 {
+   UInt opc1      = ifieldOPC( theInstr );
+   UInt opc2      = ifieldOPClo10(theInstr);
    UChar frA_addr = ifieldRegA( theInstr );
    UChar frB_addr = ifieldRegB( theInstr );
    IRTemp frA     = newTemp( Ity_D64 );
-   UInt opc1      = ifieldOPC( theInstr );
    IRTemp B_sig   = newTemp( Ity_I8 );
    IRTemp K       = newTemp( Ity_I8 );
    IRTemp lmd_B   = newTemp( Ity_I32 );
@@ -12372,22 +14587,38 @@
    IRTemp Gt_true_mask       = newTemp( Ity_I32 );
    IRTemp KisZero_true_mask  = newTemp( Ity_I32 );
    IRTemp KisZero_false_mask = newTemp( Ity_I32 );
+   IRTemp cc = newTemp( Ity_I32 );
+   UChar  UIM     = toUChar( IFIELD( theInstr, 16, 6 ) );
+   IRTemp BCD_valid  = newTemp( Ity_I32 );
 
-   /* Get the reference singificance stored in frA */
-   assign( frA, getDReg( frA_addr ) );
+   if (opc2 == 0x2A2) {        // dtstsf   DFP Test Significance
+                               // dtstsfq  DFP Test Significance Quad
+      /* Get the reference significance stored in frA */
+      assign( frA, getDReg( frA_addr ) );
 
-   /* Convert from 64 bit to 8 bits in two steps.  The Iop_64to8 is not 
-    * supported in 32-bit mode.
-    */
-   assign( K, unop( Iop_32to8,
-                    binop( Iop_And32,
-                           unop( Iop_64to32,
-                                 unop( Iop_ReinterpD64asI64,
-                                       mkexpr( frA ) ) ),
-                           mkU32( 0x3F ) ) ) );
+      /* Convert from 64 bit to 8 bits in two steps.  The Iop_64to8 is not
+       * supported in 32-bit mode.
+       */
+      assign( K, unop( Iop_32to8,
+                       binop( Iop_And32,
+                              unop( Iop_64to32,
+                                    unop( Iop_ReinterpD64asI64,
+                                          mkexpr( frA ) ) ),
+                              mkU32( 0x3F ) ) ) );
+
+   } else if (opc2 == 0x2A3) { // dtstsfi  DFP Test Significance Immediate
+                               // dtstsfiq DFP Test Significance Quad Immediate
+      /* Get the significance from the immediate field */
+      assign( K, mkU8( UIM) );
+
+   } else {
+      vex_printf("dis_dfp_significant_digits(ppc)(opc2) wrong\n");
+      return False;
+   }
 
    switch ( opc1 ) {
    case 0x3b: // dtstsf   DFP Test Significance
+              // dtstsfi  DFP Test Significance Immediate
    {
       IRTemp frB     = newTemp( Ity_D64 );
       IRTemp frBI64  = newTemp( Ity_I64 );
@@ -12395,7 +14626,11 @@
       IRTemp B_bcd_l = newTemp( Ity_I32 );
       IRTemp tmp64   = newTemp( Ity_I64 );
 
-      DIP( "dtstsf %u,r%u,r%u\n", crfD, frA_addr, frB_addr );
+      if (opc2 == 0x2A2) {
+         DIP( "dtstsf %u,r%u,r%u\n", crfD, frA_addr, frB_addr );
+      } else {
+         DIP( "dtstsfi %u,%u,r%u\n", crfD, UIM, frB_addr );
+      }
 
       assign( frB, getDReg( frB_addr ) );
       assign( frBI64, unop( Iop_ReinterpD64asI64, mkexpr( frB ) ) );
@@ -12420,10 +14655,23 @@
                      Count_leading_zeros_60( mkexpr( lmd_B ),
                                              mkexpr( B_bcd_u ),
                                              mkexpr( B_bcd_l ) ) ) );
-      assign( Unordered_true, Check_unordered( mkexpr( frBI64 ) ) );
+
+      assign( BCD_valid,
+              binop( Iop_Or32,
+                     bcd_digit_inval( mkexpr( B_bcd_u), mkexpr( B_bcd_l) ),
+                     bcd_digit_inval( mkexpr( lmd_B), mkU32( 0 ) ) ) );
+
+      /* Set unordered to True if the number is NaN, Inf or an invalid
+       * digit.
+       */
+      assign( Unordered_true,
+              binop( Iop_Or32,
+                     Check_unordered( mkexpr( frBI64 ) ),
+                     mkexpr( BCD_valid) ) );
    }
    break;
    case 0x3F: // dtstsfq     DFP Test Significance
+              // dtstsfiq    DFP Test Significance Quad Immediate
    {
       IRTemp frB_hi     = newTemp( Ity_D64 );
       IRTemp frB_lo     = newTemp( Ity_D64 );
@@ -12435,7 +14683,11 @@
       IRTemp B_mid_60_l = newTemp( Ity_I32 );
       IRTemp B_top_12_l = newTemp( Ity_I32 );
 
-      DIP( "dtstsfq %u,r%u,r%u\n", crfD, frA_addr, frB_addr );
+      if (opc2 == 0x2A2) {
+         DIP( "dtstsfq %u,r%u,r%u\n", crfD, frA_addr, frB_addr );
+      } else {
+         DIP( "dtstsfiq %u,%u,r%u\n", crfD, UIM, frB_addr );
+      }
 
       assign( frB_hi, getDReg( frB_addr ) );
       assign( frB_lo, getDReg( frB_addr + 1 ) );
@@ -12461,6 +14713,16 @@
                                    &B_low_60_u,
                                    &B_low_60_l );
 
+      assign( BCD_valid,
+              binop( Iop_Or32,
+                     binop( Iop_Or32,
+                            bcd_digit_inval( mkexpr( lmd_B ),
+                                             mkexpr( B_top_12_l ) ),
+                            bcd_digit_inval( mkexpr( B_mid_60_u ),
+                                             mkexpr( B_mid_60_l ) ) ),
+                     bcd_digit_inval( mkexpr( B_low_60_u ),
+                                      mkexpr( B_low_60_l ) ) ) );
+
       assign( B_sig,
               binop( Iop_Sub8,
                      mkU8( DFP_EXTND_MAX_SIG_DIGITS ),
@@ -12471,7 +14733,13 @@
                                               mkexpr( B_low_60_u ),
                                               mkexpr( B_low_60_l ) ) ) );
 
-      assign( Unordered_true, Check_unordered( mkexpr( frBI64_hi ) ) );
+      /* Set unordered to True if the number is NaN, Inf or an invalid
+       * digit.
+       */
+      assign( Unordered_true,
+              binop( Iop_Or32,
+                     Check_unordered( mkexpr( frBI64_hi ) ),
+                     mkexpr( BCD_valid) ) );
    }
    break;
    }
@@ -12533,19 +14801,19 @@
                          mkexpr( KisZero_true_mask ),
                          mkU32( 0x4 ) ) ) );
 
-   putGST_field( PPC_GST_CR,
-                 binop( Iop_Or32,
-                        binop( Iop_And32,
-                               mkexpr( Unordered_true ),
-                               mkU32( 0x1 ) ),
-                        binop( Iop_And32,
-                               unop( Iop_Not32, mkexpr( Unordered_true ) ),
-                               mkexpr( field ) ) ),
-                 crfD );
+   assign( cc, binop( Iop_Or32,
+                      binop( Iop_And32,
+                             mkexpr( Unordered_true ),
+                             mkU32( 0x1 ) ),
+                      binop( Iop_And32,
+                             unop( Iop_Not32, mkexpr( Unordered_true ) ),
+                             mkexpr( field ) ) ) );
+
+   putGST_field( PPC_GST_CR, mkexpr( cc ), crfD );
+   putFPCC( mkexpr( cc ) );
 
    return True;
 }
-
 /*------------------------------------------------------------*/
 /*--- AltiVec Instruction Translation                      ---*/
 /*------------------------------------------------------------*/
@@ -12647,6 +14915,650 @@
 }
 
 /*
+Vector Extend Sign Instructions
+*/
+static Bool dis_av_extend_sign_count_zero ( UInt theInstr, UInt allow_isa_3_0 )
+{
+   /* VX-Form, sort of, the A register field is used to select the specific
+    * sign extension instruction or count leading/trailing zero LSB
+    * instruction.
+    */
+
+   UChar opc1    = ifieldOPC( theInstr );
+   UChar rT_addr = ifieldRegDS (theInstr );
+   UChar rA_addr = ifieldRegA( theInstr );
+   UChar vB_addr = ifieldRegB( theInstr );
+   UInt  opc2    = IFIELD( theInstr, 0, 11 );
+
+   IRTemp vB    = newTemp( Ity_V128 );
+   IRTemp vT    = newTemp( Ity_V128 );
+
+   assign( vB, getVReg ( vB_addr ) );
+
+   if ( ( opc1 != 0x4 ) && ( opc2 != 0x602 ) )  {
+      vex_printf("dis_av_extend_sign(ppc)(instr)\n");
+      return False;
+   }
+
+   switch ( rA_addr ) {
+   case 0:
+   case 1:
+   {
+      UInt i;
+      IRTemp count[17];
+      IRTemp bit_zero[16];
+      IRTemp byte_mask[17];
+
+      /* These instructions store the result in the general purpose
+       * register in the rT_addr field.
+       */
+
+      byte_mask[0] = newTemp( Ity_I32 );
+      count[0] = newTemp( Ity_I32 );
+      assign( count[0], mkU32( 0 ) );
+      assign( byte_mask[0], mkU32( 0x1 ) );
+
+      if ( rA_addr == 0 ) {
+         // vclzlsbb (Vector Count Leading Zero Least-Significant Bits Byte)
+         DIP("vclzlsbb %d,v%d\n", rT_addr, vB_addr);
+
+      } else {
+         // vctzlsbb (Vector Count Trailing Zero Least-Significant Bits Byte)
+         DIP("vctzlsbb %d,v%d\n", rT_addr, vB_addr);
+      }
+
+      for( i = 0; i < 16; i++ ) {
+         byte_mask[i+1] = newTemp( Ity_I32 );
+         count[i+1] = newTemp( Ity_I32 );
+         bit_zero[i] = newTemp( Ity_I1 );
+
+         /* bit_zero[i] = 0x0 until the first 1 bit is found in lsb of
+          * byte.  When the first 1 bit is found it causes the byte_mask
+          * to change from 0x1 to 0x0.  Thus the AND of the lsb and byte_mask
+          * will be zero  which will be equal to the zero byte_mask causing
+          * the value of bit_zero[i] to be equal to 0x1 for all remaining bits.
+          */
+
+         if ( rA_addr == 0 )
+            /* leading zero bit in byte count,
+               work bytes from left to right
+            */
+            assign( bit_zero[i],
+                    binop( Iop_CmpEQ32,
+                           binop( Iop_And32,
+                                  unop( Iop_V128to32,
+                                        binop( Iop_ShrV128,
+                                               mkexpr( vB ),
+                                               mkU8( ( 15 - i) * 8 ) ) ),
+                                  mkexpr( byte_mask[i] ) ),
+                           mkexpr( byte_mask[i] ) ) );
+
+         else if ( rA_addr == 1 )
+            /* trailing zero bit in byte count,
+             * work bytes from right to left
+             */
+            assign( bit_zero[i],
+                    binop( Iop_CmpEQ32,
+                           binop( Iop_And32,
+                                  unop( Iop_V128to32,
+                                        binop( Iop_ShrV128,
+                                               mkexpr( vB ),
+                                               mkU8( i * 8 ) ) ),
+                                  mkexpr( byte_mask[i] ) ),
+                           mkexpr( byte_mask[i] ) ) );
+
+         /* Increment count as long as bit_zero = 0 */
+         assign( count[i+1], binop( Iop_Add32,
+                                    mkexpr( count[i] ),
+                                    unop( Iop_1Uto32,
+                                          unop( Iop_Not1,
+                                                mkexpr( bit_zero[i] ) ) ) ) );
+
+         /* If comparison fails to find a zero bit, set the byte_mask to zero
+          * for all future comparisons so there will be no more matches.
+          */
+         assign( byte_mask[i+1],
+                 binop( Iop_And32,
+                        unop( Iop_1Uto32,
+                              unop( Iop_Not1,
+                                    mkexpr( bit_zero[i] ) ) ),
+                        mkexpr( byte_mask[i] )  ) );
+      }
+      putIReg( rT_addr, unop( Iop_32Uto64, mkexpr( count[16] ) ) );
+      return True;
+   }
+
+   case 6: // vnegw,  Vector Negate Word
+      DIP("vnegw  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      /* multiply each word by -1 */
+      assign( vT, binop( Iop_Mul32x4, mkexpr( vB ), mkV128( 0xFFFF ) ) );
+      break;
+
+   case 7: // vnegd,  Vector Negate Doubleword
+      DIP("vnegd  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      /* multiply each doubleword by -1 */
+      assign( vT, binop( Iop_64HLtoV128,
+                         binop( Iop_Mul64,
+                                unop( Iop_V128HIto64,
+                                      mkexpr( vB ) ),
+                                mkU64( 0xFFFFFFFFFFFFFFFF ) ),
+                         binop( Iop_Mul64,
+                               unop( Iop_V128to64,
+                                      mkexpr( vB ) ),
+                                mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) );
+      break;
+
+   case 8:  // vprtybw,  Vector Parity Byte Word
+   case 9:  // vprtybd,  Vector Parity Byte Doubleword
+   case 10: // vprtybq,  Vector Parity Byte Quadword
+      {
+         UInt i;
+         IRTemp bit_in_byte[16];
+         IRTemp word_parity[4];
+
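+         /* Collect the least-significant bit of each byte, then XOR the
+          * four byte parities within each word to get per-word parity.
+          */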
+         for( i = 0; i < 16; i++ ) {
+            bit_in_byte[i] = newTemp( Ity_I32 );
+            assign( bit_in_byte[i],
+                    binop( Iop_And32,
+                           unop( Iop_V128to32,
+                                 binop( Iop_ShrV128,
+                                        mkexpr( vB ),
+                                        mkU8( ( 15 - i ) * 8 ) ) ),
+                           mkU32( 0x1 ) ) );
+         }
+
+         for( i = 0; i < 4; i++ ) {
+            word_parity[i] = newTemp(Ity_I32);
+            assign( word_parity[i],
+                    mkXOr4_32( bit_in_byte[0 + i * 4],
+                               bit_in_byte[1 + i * 4],
+                               bit_in_byte[2 + i * 4],
+                               bit_in_byte[3 + i * 4] ) );
+         }
+
+         if ( rA_addr == 8 ) {
+            DIP("vprtybw  v%d,v%d", rT_addr, vB_addr);
+
+            assign( vT, mkV128from32( word_parity[0], word_parity[1],
+                                      word_parity[2], word_parity[3] ) );
+
+         } else if ( rA_addr == 9 ) {
+            DIP("vprtybd  v%d,v%d", rT_addr, vB_addr);
+
+            assign( vT,
+                    binop( Iop_64HLtoV128,
+                           binop( Iop_32HLto64,
+                                  mkU32( 0 ),
+                                  binop( Iop_Xor32,
+                                         mkexpr( word_parity[0] ),
+                                         mkexpr( word_parity[1] ) ) ),
+                           binop( Iop_32HLto64,
+                                  mkU32( 0 ),
+                                  binop( Iop_Xor32,
+                                         mkexpr( word_parity[2] ),
+                                         mkexpr( word_parity[3] ) ) ) ) );
+
+         } else if ( rA_addr == 10 ) {
+            DIP("vprtybq  v%d,v%d", rT_addr, vB_addr);
+
+            assign( vT,
+                    binop( Iop_64HLtoV128,
+                           mkU64( 0 ),
+                           unop( Iop_32Uto64,
+                                 mkXOr4_32( word_parity[0],
+                                            word_parity[1],
+                                            word_parity[2],
+                                            word_parity[3] ) ) ) );
+         }
+      }
+      break;
+
+   case 16: // vextsb2w,  Vector Extend Sign Byte to Word
+      DIP("vextsb2w  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      /* Iop_MullEven8Sx16 does a signed widening multiplication of byte to
+       * two byte sign extended result.  Then do a two byte to four byte sign
+       * extended multiply.  Note contents of upper three bytes in word are
+       * "over written". So just take source and multiply by 1.
+       */
+      assign( vT, binop( Iop_MullEven16Sx8,
+                         binop( Iop_64HLtoV128,
+                                mkU64( 0x0000000100000001 ),
+                                mkU64( 0x0000000100000001 ) ),
+                         binop( Iop_MullEven8Sx16,
+                                mkexpr( vB ),
+                                binop( Iop_64HLtoV128,
+                                       mkU64( 0x0001000100010001 ),
+                                       mkU64( 0x0001000100010001 ) ) ) ) );
+      break;
+
+   case 17: // vextsh2w,  Vector Extend Sign Halfword to Word
+      DIP("vextsh2w  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      /* Iop_MullEven16Sx8 does a signed widening multiply of halfwords
+       * to words.  Note contents of upper two bytes in word are
+       * "over written". So just take source and multiply by 1.
+       */
+      assign( vT, binop( Iop_MullEven16Sx8,
+                         binop( Iop_64HLtoV128,
+                                mkU64( 0x0000000100000001 ),
+                                mkU64( 0x0000000100000001 ) ),
+                        mkexpr( vB ) ) );
+
+      break;
+
+   case 24: // vextsb2d,  Vector Extend Sign Byte to Doubleword
+      DIP("vextsb2d  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      /* Iop_MullEven8Sx16 does a signed widening multiplication of byte to
+       * two byte sign extended result.  Then do a two byte to four byte sign
+       * extended multiply. Then do four byte to eight byte multiply.
+       */
+      assign( vT, binop( Iop_MullEven32Sx4,
+                         binop( Iop_64HLtoV128,
+                                mkU64( 0x0000000000000001 ),
+                                mkU64( 0x0000000000000001 ) ),
+                        binop( Iop_MullEven16Sx8,
+                               binop( Iop_64HLtoV128,
+                                      mkU64( 0x0000000100000001 ),
+                                      mkU64( 0x0000000100000001 ) ),
+                              binop( Iop_MullEven8Sx16,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0x0001000100010001 ),
+                                            mkU64( 0x0001000100010001 ) ),
+                                    mkexpr( vB ) ) ) ) );
+      break;
+
+   case 25: // vextsh2d,  Vector Extend Sign Halfword to Doubleword
+      DIP("vextsh2d  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( vT, binop( Iop_MullEven32Sx4,
+                         binop( Iop_64HLtoV128,
+                                mkU64( 0x0000000000000001 ),
+                                mkU64( 0x0000000000000001 ) ),
+                        binop( Iop_MullEven16Sx8,
+                               binop( Iop_64HLtoV128,
+                                      mkU64( 0x0000000100000001 ),
+                                      mkU64( 0x0000000100000001 ) ),
+                               mkexpr( vB ) ) ) );
+      break;
+
+   case 26: // vextsw2d,  Vector Extend Sign Word to Doubleword
+      DIP("vextsw2d  v%d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( vT, binop( Iop_MullEven32Sx4,
+                         binop( Iop_64HLtoV128,
+                                mkU64( 0x0000000000000001 ),
+                                mkU64( 0x0000000000000001 ) ),
+                        mkexpr( vB ) ) );
+      break;
+
+   case 28: // vctzb,  Vector Count Trailing Zeros Byte
+      {
+         DIP("vctzb  v%d,v%d", rT_addr, vB_addr);
+
+         /* This instruction is only available in the ISA 3.0 */
+         if ( !mode64 || !allow_isa_3_0 ) {
+            vex_printf("\n vctzb instruction not supported on non ISA 3.0 platform\n\n");
+            return False;
+         }
+         assign( vT, unop( Iop_Ctz8x16, mkexpr( vB ) ) );
+      }
+      break;
+
+   case 29: // vctzh,  Vector Count Trailing Zeros Halfword
+      {
+         DIP("vctzh  v%d,v%d", rT_addr, vB_addr);
+
+         /* This instruction is only available in the ISA 3.0 */
+         if ( !mode64 || !allow_isa_3_0 ) {
+            vex_printf("\n vctzh instruction not supported on non ISA 3.0 platform\n\n");
+            return False;
+         }
+         assign( vT, unop( Iop_Ctz16x8, mkexpr( vB ) ) );
+      }
+      break;
+
+   case 30: // vctzw,  Vector Count Trailing Zeros Word
+      {
+         DIP("vctzw  v%d,v%d", rT_addr, vB_addr);
+
+         /* This instruction is only available in the ISA 3.0 */
+         if ( !mode64 || !allow_isa_3_0 ) {
+            vex_printf("\n vctzw instruction not supported on non ISA 3.0 platform\n\n");
+            return False;
+         }
+         assign( vT, unop( Iop_Ctz32x4, mkexpr( vB ) ) );
+      }
+      break;
+
+   case 31: // vctzd,  Vector Count Trailing Zeros Double word
+      {
+         DIP("vctzd  v%d,v%d", rT_addr, vB_addr);
+
+         /* This instruction is only available in the ISA 3.0 */
+         if ( !mode64 || !allow_isa_3_0 ) {
+            vex_printf("\n vctzd instruction not supported on non ISA 3.0 platform\n\n");
+            return False;
+         }
+         assign( vT, unop( Iop_Ctz64x2, mkexpr( vB ) ) );
+      }
+      break;
+
+   default:
+      vex_printf("dis_av_extend_sign(ppc)(Unsupported vector extend sign instruction)\n");
+      return False;
+   }
+
+   putVReg( rT_addr, mkexpr( vT ) );
+   return True;
+}
+
+/*
+Vector Rotate Instructions
+*/
+static Bool dis_av_rotate ( UInt theInstr )
+{
+   /* VX-Form */
+
+   UChar opc1    = ifieldOPC( theInstr );
+   UChar vT_addr = ifieldRegDS( theInstr );
+   UChar vA_addr = ifieldRegA( theInstr );
+   UChar vB_addr = ifieldRegB( theInstr );
+   UInt  opc2    = IFIELD( theInstr, 0, 11 );
+
+   IRTemp vA    = newTemp( Ity_V128 );
+   IRTemp vB    = newTemp( Ity_V128 );
+   IRTemp src3  = newTemp( Ity_V128 );
+   IRTemp vT    = newTemp( Ity_V128 );
+   IRTemp field_mask = newTemp( Ity_V128 );
+   IRTemp mask128 = newTemp( Ity_V128 );
+   IRTemp vA_word[4];
+   IRTemp left_bits[4];
+   IRTemp right_bits[4];
+   IRTemp shift[4];
+   IRTemp mask[4];
+   IRTemp tmp128[4];
+   UInt i;
+   UInt num_words;
+   UInt word_size;
+   unsigned long long word_mask;
+
+   if ( opc1 != 0x4 ) {
+      vex_printf("dis_av_rotate(ppc)(instr)\n");
+      return False;
+   }
+
+   assign( vA, getVReg( vA_addr ) );
+   assign( vB, getVReg( vB_addr ) );
+
+   switch (opc2) {
+   case 0x85: // vrlwmi,  Vector Rotate Left Word then Mask Insert
+   case 0x185: // vrlwnm,  Vector Rotate Left Word then AND with Mask
+      num_words = 4;
+      word_size = 32;
+      assign( field_mask, binop( Iop_64HLtoV128,
+                                 mkU64( 0 ),
+                                 mkU64( 0x1F ) ) );
+      word_mask = 0xFFFFFFFF;
+      break;
+
+   case 0x0C5: // vrldmi,  Vector Rotate Left Doubleword then Mask Insert
+   case 0x1C5: // vrldnm,  Vector Rotate Left Doubleword then AND with Mask
+      num_words = 2;
+      word_size = 64;
+      assign( field_mask, binop( Iop_64HLtoV128,
+                                 mkU64( 0 ),
+                                 mkU64( 0x3F ) ) );
+      word_mask = 0xFFFFFFFFFFFFFFFFULL;
+      break;
+   default:
+      vex_printf("dis_av_rotate(ppc)(opc2)\n");
+      return False;
+   }
+
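+   /* Each element of vB supplies the rotate amount in its low bits, the
+    * mask-end (me) field 8 bits above that, and the mask-begin (mb)
+    * field 16 bits above that.
+    */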
+   for( i = 0; i < num_words; i++ ) {
+      left_bits[i]  = newTemp( Ity_I8 );
+      right_bits[i] = newTemp( Ity_I8 );
+      shift[i] = newTemp( Ity_I8 );
+      mask[i]  = newTemp( Ity_V128 );
+      tmp128[i] = newTemp( Ity_V128 );
+      vA_word[i] = newTemp( Ity_V128 );
+
+      assign( shift[i],
+              unop( Iop_64to8,
+                    unop( Iop_V128to64,
+                          binop( Iop_AndV128,
+                                 binop( Iop_ShrV128,
+                                        mkexpr( vB ),
+                                        mkU8( (num_words - 1 - i )
+                                              * word_size ) ),
+                                 mkexpr( field_mask ) ) ) ) );
+
+      /* left_bits is the number of bits to the left of mb to clear in
+       * the 64-bit scratch value: 32+mb for a word, mb for a doubleword.
+       */
+      assign( left_bits[i],
+              unop( Iop_64to8,
+                    binop( Iop_Add64,
+                           mkU64( 64 - word_size ),
+                           unop( Iop_V128to64,
+                                 binop( Iop_AndV128,
+                                        binop( Iop_ShrV128,
+                                               mkexpr( vB ),
+                                               mkU8( ( num_words - 1 - i )
+                                                     * word_size + 16 ) ),
+                                        mkexpr( field_mask ) ) ) ) ) );
+      /* right_bits = (word_size - 1) - me.  Tells us how many bits to the
+       * right of me to clear.  For a word, right_bits = 31 - me; for a
+       * doubleword, right_bits = 63 - me.
+       */
+      assign( right_bits[i],
+              unop( Iop_64to8,
+                    binop( Iop_Sub64,
+                           mkU64( word_size - 1 ),
+                           unop( Iop_V128to64,
+                                 binop( Iop_AndV128,
+                                        binop( Iop_ShrV128,
+                                               mkexpr( vB ),
+                                               mkU8( ( num_words - 1 - i )
+                                                     * word_size + 8 ) ),
+                                        mkexpr( field_mask ) ) ) ) ) );
+
+      /* create mask for 32-bit word or 64-bit word */
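+      /* For example, with word_size = 32 and vB supplying mb = 8, me = 23:
+       * left_bits = 40, right_bits = 8, and the shift pair below produces
+       * mask[i] = 0x0000000000FFFF00, i.e. bits mb..me of the 32-bit word
+       * in IBM (big-endian) bit numbering.
+       */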
+      assign( mask[i],
+              binop( Iop_64HLtoV128,
+                     mkU64( 0 ),
+                     binop( Iop_Shl64,
+                            binop( Iop_Shr64,
+                                   binop( Iop_Shr64,
+                                          binop( Iop_Shl64,
+                                                 mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                                 mkexpr( left_bits[i] ) ),
+                                          mkexpr( left_bits[i] ) ),
+                                   mkexpr( right_bits[i] ) ),
+                            mkexpr( right_bits[i] ) ) ) );
+
+      /* Need to rotate vA using a left and right shift of vA OR'd together
+       * then ANDed with the mask.
+       */
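+      /* The rotate itself uses the usual identity
+       *    rotl( x, n ) = ( x << n ) | ( x >> ( word_size - n ) ),
+       * applied to the word isolated in vA_word[i]; the rotated value is
+       * then confined to that word's lane by mask[i].
+       */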
+      assign( vA_word[i], binop( Iop_AndV128,
+                                 mkexpr( vA ),
+                                 binop( Iop_ShlV128,
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0 ),
+                                               mkU64( word_mask ) ),
+                                        mkU8( ( num_words - 1 - i )
+                                              * word_size ) ) ) );
+      assign( tmp128[i],
+              binop( Iop_AndV128,
+                     binop( Iop_ShlV128,
+                            mkexpr( mask[i] ),
+                            mkU8( ( num_words - 1 - i) * word_size ) ),
+                     binop( Iop_OrV128,
+                            binop( Iop_ShlV128,
+                                   mkexpr( vA_word[i] ),
+                                   mkexpr( shift[i] ) ),
+                            binop( Iop_ShrV128,
+                                   mkexpr( vA_word[i] ),
+                                   unop( Iop_32to8,
+                                         binop(Iop_Sub32,
+                                               mkU32( word_size ),
+                                               unop( Iop_8Uto32,
+                                                     mkexpr( shift[i] ) ) )
+                                         ) ) ) ) );
+   }
+
+   switch (opc2) {
+   case 0x85: // vrlwmi,  Vector Rotate Left Word then Mask Insert
+      DIP("vrlwmi %d,%d,v%d", vT_addr, vA_addr, vB_addr);
+
+      assign( src3, getVReg( vT_addr ) );
+      assign( mask128, unop( Iop_NotV128,
+                             mkOr4_V128_expr( binop( Iop_ShlV128,
+                                                     mkexpr( mask[0] ),
+                                                     mkU8( 96 ) ),
+                                              binop( Iop_ShlV128,
+                                                     mkexpr( mask[1] ),
+                                                     mkU8( 64 ) ),
+                                              binop( Iop_ShlV128,
+                                                     mkexpr( mask[2] ),
+                                                     mkU8( 32 ) ),
+                                              mkexpr( mask[3] ) ) ) );
+      assign( vT, binop( Iop_OrV128,
+                         binop( Iop_AndV128,
+                                mkexpr( src3 ),
+                                mkexpr( mask128 ) ),
+                         mkOr4_V128( tmp128[0], tmp128[1],
+                                     tmp128[2], tmp128[3] ) ) );
+      break;
+
+   case 0xC5: // vrldmi,  Vector Rotate Left Double word then Mask Insert
+      DIP("vrldmi %d,%d,v%d", vT_addr, vA_addr, vB_addr);
+
+      assign( src3, getVReg( vT_addr ) );
+      assign( mask128, unop( Iop_NotV128,
+                             binop( Iop_OrV128,
+                                    binop( Iop_ShlV128,
+                                           mkexpr( mask[0] ),
+                                           mkU8( 64 ) ),
+                                    mkexpr( mask[1] ) ) ) );
+
+      assign( vT, binop( Iop_OrV128,
+                         binop( Iop_AndV128,
+                                mkexpr( src3 ),
+                                mkexpr( mask128 ) ),
+                         binop( Iop_OrV128,
+                                mkexpr( tmp128[0] ),
+                                mkexpr( tmp128[1] ) ) ) );
+      break;
+
+   case 0x185: // vrlwnm,  Vector Rotate Left Word then AND with Mask
+      DIP("vrlwnm %d,%d,v%d", vT_addr, vA_addr, vB_addr);
+      assign( vT, mkOr4_V128( tmp128[0], tmp128[1], tmp128[2], tmp128[3] ) );
+      break;
+
+   case 0x1C5: // vrldnm,  Vector Rotate Left Doubleword then AND with Mask
+      DIP("vrldnm %d,%d,v%d", vT_addr, vA_addr, vB_addr);
+      assign( vT, binop( Iop_OrV128,
+                         mkexpr( tmp128[0] ),
+                         mkexpr( tmp128[1] ) ) );
+      break;
+   }
+
+   putVReg( vT_addr, mkexpr( vT ) );
+   return True;
+}
+
+/*
+  AltiVec Vector Extract Element Instructions
+*/
+static Bool dis_av_extract_element ( UInt theInstr )
+{
+   /* VX-Form, except that the destination and the first source are
+    * GPRs rather than vector registers.
+    */
+
+   UChar opc1    = ifieldOPC( theInstr );
+   UChar rT_addr = ifieldRegDS( theInstr );
+   UChar rA_addr = ifieldRegA( theInstr );
+   UChar vB_addr = ifieldRegB( theInstr );
+   UInt  opc2    = IFIELD( theInstr, 0, 11 );
+
+   IRTemp vB = newTemp( Ity_V128 );
+   IRTemp rA = newTemp( Ity_I64 );
+   IRTemp rT = newTemp( Ity_I64 );
+
+   assign( vB, getVReg( vB_addr ) );
+   assign( rA, getIReg( rA_addr ) );
+
+   if ( opc1 != 0x4 ) {
+      vex_printf("dis_av_extract_element(ppc)(instr)\n");
+      return False;
+   }
+
+   switch ( opc2 ) {
+   case 0x60D: // vextublx, vector extract unsigned Byte Left-indexed
+      DIP("vextublx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
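+      /* The left-indexed forms number elements from the most-significant
+       * end of the vector, whereas extract_field_from_vector() takes an
+       * offset from the least-significant end; hence the conversion
+       * 15 - rA here (and 14 - rA, 12 - rA for halfwords and words below).
+       */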
+      assign( rT, extract_field_from_vector( vB,
+                                             binop( Iop_Sub64,
+                                                    mkU64( 15 ),
+                                                    mkexpr( rA ) ),
+                                             0xFF ) );
+
+      break;
+
+   case 0x64D: // vextuhlx, vector extract unsigned Halfword Left-indexed
+      DIP("vextuhlx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( rT, extract_field_from_vector( vB,
+                                             binop( Iop_Sub64,
+                                                    mkU64( 14 ),
+                                                    mkexpr( rA ) ),
+                                             0xFFFF ) );
+      break;
+
+   case 0x68D: // vextuwlx, vector extract unsigned Word Left-indexed
+      DIP("vextuwlx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( rT, extract_field_from_vector( vB,
+                                             binop( Iop_Sub64,
+                                                    mkU64( 12 ),
+                                                    mkexpr( rA ) ),
+                                             0xFFFFFFFF ) );
+      break;
+
+   case 0x70D: // vextubrx, vector extract unsigned Byte Right-indexed
+      DIP("vextubrx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( rT, extract_field_from_vector( vB, mkexpr( rA ), 0xFF ) );
+      break;
+
+   case 0x74D: // vextuhrx, vector extract unsigned Halfword Right-indexed
+      DIP("vextuhrx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( rT, extract_field_from_vector( vB, mkexpr( rA ), 0xFFFF ) );
+      break;
+
+   case 0x78D: // vextuwrx, vector extract unsigned Word Right-indexed
+      DIP("vextuwrx %d,%d,v%d", rT_addr, rA_addr, vB_addr);
+
+      assign( rT, extract_field_from_vector( vB, mkexpr( rA ), 0xFFFFFFFF ) );
+      break;
+
+   default:
+      vex_printf("dis_av_extract_element(ppc)(opc2)\n");
+      return False;
+   }
+   putIReg( rT_addr, mkexpr( rT ) );
+   return True;
+}
+
+/*
  * VSX scalar and vector convert instructions
  */
 static Bool
@@ -12851,22 +15763,22 @@
          assign( res1, unop(Iop_64HIto32, mkexpr(lo64)) );
          assign( res0, unop(Iop_64to32,   mkexpr(lo64)) );
 
-         b3_result = IRExpr_ITE(is_NaN_32(b3),
+         b3_result = IRExpr_ITE(is_NaN(Ity_I32, b3),
                                 // then: result is 0x{8|0}80000000
                                 mkU32(un_signed ? 0x00000000 : 0x80000000),
                                 // else: result is from the Iop_QFtoI32{s|u}x4_RZ
                                 mkexpr(res3));
-         b2_result = IRExpr_ITE(is_NaN_32(b2),
+         b2_result = IRExpr_ITE(is_NaN(Ity_I32, b2),
                                 // then: result is 0x{8|0}80000000
                                 mkU32(un_signed ? 0x00000000 : 0x80000000),
                                 // else: result is from the Iop_QFtoI32{s|u}x4_RZ
                                 mkexpr(res2));
-         b1_result = IRExpr_ITE(is_NaN_32(b1),
+         b1_result = IRExpr_ITE(is_NaN(Ity_I32, b1),
                                 // then: result is 0x{8|0}80000000
                                 mkU32(un_signed ? 0x00000000 : 0x80000000),
                                 // else: result is from the Iop_QFtoI32{s|u}x4_RZ
                                 mkexpr(res1));
-         b0_result = IRExpr_ITE(is_NaN_32(b0),
+         b0_result = IRExpr_ITE(is_NaN(Ity_I32, b0),
                                 // then: result is 0x{8|0}80000000
                                 mkU32(un_signed ? 0x00000000 : 0x80000000),
                                 // else: result is from the Iop_QFtoI32{s|u}x4_RZ
@@ -13871,6 +16783,116 @@
          putVReg( vRT_addr, unop( Iop_PwBitMtxXpose64x2, mkexpr( vB ) ) );
          break;
 
+      case 0x5CC:  // vbpermd Vector Bit Permute Doubleword
+      {
+         UChar vRA_addr = ifieldRegA( theInstr );
+         IRTemp vA = newTemp( Ity_V128 );
+         UInt j;
+         IRTemp index_dword_hi[8]; // index in double word
+         IRTemp index_dword_lo[8];
+         IRTemp index_dword_hi_valid[8];
+         IRTemp index_dword_lo_valid[8];
+         IRTemp pb_dword_hi[8];  // permute bit
+         IRTemp pb_dword_lo[8];
+         IRTemp tmp_hi[9];
+         IRTemp tmp_lo[9];
+
+         DIP("vbpermd v%d,v%d,v%d\n", vRT_addr, vRA_addr, vRB_addr);
+
+         tmp_hi[0] = newTemp( Ity_I64 );
+         tmp_lo[0] = newTemp( Ity_I64 );
+
+         assign( vA, getVReg(vRA_addr) );
+         assign( tmp_hi[0], mkU64( 0 ) );
+         assign( tmp_lo[0], mkU64( 0 ) );
+
+         for (j=0; j<8; j++) {
+            index_dword_hi[j] = newTemp( Ity_I64 );
+            index_dword_lo[j] = newTemp( Ity_I64 );
+            index_dword_hi_valid[j] = newTemp( Ity_I64 );
+            index_dword_lo_valid[j] = newTemp( Ity_I64 );
+            pb_dword_hi[j] = newTemp( Ity_I64 );
+            pb_dword_lo[j] = newTemp( Ity_I64 );
+            tmp_hi[j+1] = newTemp( Ity_I64 );
+            tmp_lo[j+1] = newTemp( Ity_I64 );
+
+            assign( index_dword_hi[j],
+                    binop( Iop_And64,
+                           binop( Iop_Shr64,
+                                  unop( Iop_V128HIto64,
+                                        mkexpr( vB ) ),
+                                  mkU8( ( 7 - j ) * 8 ) ),
+                           mkU64( 0xFF ) ) );
+
+            assign( index_dword_lo[j],
+                    binop( Iop_And64,
+                           binop( Iop_Shr64,
+                                  unop( Iop_V128to64,
+                                        mkexpr( vB ) ),
+                                  mkU8( ( 7 - j ) * 8 ) ),
+                           mkU64( 0xFF ) ) );
+
+            assign( index_dword_hi_valid[j],
+                    unop( Iop_1Sto64,
+                          binop( Iop_CmpLT64U,
+                                 mkexpr( index_dword_hi[j] ),
+                                 mkU64( 64 ) ) ) );
+
+            assign( index_dword_lo_valid[j],
+                    unop( Iop_1Sto64,
+                          binop( Iop_CmpLT64U,
+                                 mkexpr( index_dword_lo[j] ),
+                                 mkU64( 64 ) ) ) );
+            assign( pb_dword_hi[j],
+                    binop( Iop_And64,
+                           binop( Iop_Shr64,
+                                  unop( Iop_V128HIto64,
+                                        mkexpr( vA ) ),
+                                  unop( Iop_64to8,
+                                        binop( Iop_Sub64,
+                                               mkU64( 63 ),
+                                               mkexpr( index_dword_hi[j] )
+                                               ) ) ),
+                           mkU64( 0x1 ) ) );
+
+            assign( pb_dword_lo[j],
+                    binop( Iop_And64,
+                           binop( Iop_Shr64,
+                                  unop( Iop_V128to64,
+                                        mkexpr( vA ) ),
+                                  unop( Iop_64to8,
+                                        binop( Iop_Sub64,
+                                               mkU64( 63 ),
+                                               mkexpr( index_dword_lo[j] )
+                                               ) ) ),
+                           mkU64( 0x1 ) ) );
+
+            assign( tmp_hi[j+1],
+                    binop( Iop_Or64,
+                           binop( Iop_And64,
+                                  mkexpr( index_dword_hi_valid[j] ),
+                                  binop( Iop_Shl64,
+                                         mkexpr( pb_dword_hi[j] ),
+                                         mkU8( 7 - j ) ) ),
+                           mkexpr( tmp_hi[j] ) ) );
+
+            assign( tmp_lo[j+1],
+                    binop( Iop_Or64,
+                           binop( Iop_And64,
+                                  mkexpr( index_dword_lo_valid[j] ),
+                                  binop( Iop_Shl64,
+                                         mkexpr( pb_dword_lo[j] ),
+                                         mkU8( 7 - j ) ) ),
+                           mkexpr( tmp_lo[j] ) ) );
+         }
+
+         putVReg( vRT_addr,
+                  binop( Iop_64HLtoV128,
+                         mkexpr( tmp_hi[8] ),
+                         mkexpr( tmp_lo[8] ) ) );
+      }
+      break;
+
       default:
          vex_printf("dis_av_count_bitTranspose(ppc)(opc2)\n");
          return False;
@@ -13949,8 +16971,8 @@
    IRTemp frA_isQNaN = newTemp(Ity_I1);
    IRTemp frB_isQNaN = newTemp(Ity_I1);
 
-   assign( frA_isNaN, is_NaN( frA_I64 ) );
-   assign( frB_isNaN, is_NaN( frB_I64 ) );
+   assign( frA_isNaN, is_NaN( Ity_I64, frA_I64 ) );
+   assign( frB_isNaN, is_NaN( Ity_I64, frB_I64 ) );
    // If operand is a NAN and bit 12 is '0', then it's an SNaN
    assign( frA_isSNaN,
            mkAND1( mkexpr(frA_isNaN),
@@ -14042,9 +17064,10 @@
    IRTemp anyNaN = newTemp(Ity_I1);
    IRTemp frA_isZero = newTemp(Ity_I1);
    IRTemp frB_isZero = newTemp(Ity_I1);
-   assign(frA_isZero, is_Zero(frA_I64, False /*not single precision*/ ));
-   assign(frB_isZero, is_Zero(frB_I64, False /*not single precision*/ ));
-   assign(anyNaN, mkOR1(is_NaN(frA_I64), is_NaN(frB_I64)));
+   assign( frA_isZero, is_Zero( Ity_I64, frA_I64 ) );
+   assign( frB_isZero, is_Zero( Ity_I64, frB_I64 ) );
+   assign( anyNaN, mkOR1( is_NaN( Ity_I64, frA_I64 ),
+                          is_NaN(Ity_I64, frB_I64 ) ) );
 #define MINUS_ZERO 0x8000000000000000ULL
 
    return IRExpr_ITE( /* If both arguments are zero . . . */
@@ -14164,7 +17187,7 @@
 #define SNAN_MASK 0x0008000000000000ULL
    hi32 = unop( Iop_64HIto32, mkexpr(frB_I64) );
    assign( is_SNAN,
-           mkAND1( is_NaN( frB_I64 ),
+           mkAND1( is_NaN( Ity_I64, frB_I64 ),
                    binop( Iop_CmpEQ32,
                           binop( Iop_And32, hi32, mkU32( 0x00080000 ) ),
                           mkU32( 0 ) ) ) );
@@ -14209,6 +17232,7 @@
          assign(frB2, unop(Iop_V128to64, getVSReg( XB )));
 
          DIP("%s v%d,v%d\n", redp ? "xvredp" : "xvrsqrtedp", XT, XB);
+
          if (!redp) {
             assign( sqrtHi,
                     binop( Iop_SqrtF64,
@@ -15003,6 +18027,7 @@
                                            crfD, XA, XB);
          ccPPC32 = get_fp_cmp_CR_val( binop(Iop_CmpF64, mkexpr(frA), mkexpr(frB)));
          putGST_field( PPC_GST_CR, mkexpr(ccPPC32), crfD );
+         putFPCC( mkexpr( ccPPC32 ) );
          break;
 
       default:
@@ -15197,7 +18222,7 @@
  * Miscellaneous VSX Scalar Instructions
  */
 static Bool
-dis_vxs_misc( UInt theInstr, UInt opc2 )
+dis_vxs_misc( UInt theInstr, UInt opc2, int allow_isa_3_0 )
 {
 #define VG_PPC_SIGN_MASK 0x7fffffffffffffffULL
    /* XX3-Form and XX2-Form */
@@ -15220,8 +18245,143 @@
     * of VSX[XT] are undefined after the operation; therefore, we can simply
     * move the entire array element where it makes sense to do so.
     */
+   if (( opc2 == 0x168 ) && ( IFIELD( theInstr, 19, 2 ) == 0 ) )
+      {
+         /* Special case of XX1-Form with immediate value
+          *  xxspltib (VSX Vector Splat Immediate Byte)
+          */
+         UInt uim = IFIELD( theInstr, 11, 8 );
+         UInt word_value = ( uim << 24 ) | ( uim << 16 ) | ( uim << 8 ) | uim;
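+         /* For example, uim = 0x5A gives word_value = 0x5A5A5A5A, so every
+          * byte of the 128-bit target is set to 0x5A.
+          */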
 
-   switch (opc2) {
+         DIP("xxspltib v%d,%d\n", (UInt)XT, uim);
+         putVSReg(XT, binop( Iop_64HLtoV128,
+                             binop( Iop_32HLto64,
+                                    mkU32( word_value ),
+                                    mkU32( word_value ) ),
+                             binop( Iop_32HLto64,
+                                    mkU32( word_value ),
+                                    mkU32( word_value ) ) ) );
+         return True;
+      }
+
+   switch ( opc2 ) {
+      case 0x0ec: // xscmpexpdp (VSX Scalar Compare Exponents Double-Precision)
+      {
+         IRExpr *bit4, *bit5, *bit6, *bit7;
+         UInt BF = IFIELD( theInstr, 23, 3 );
+         IRTemp eq_lt_gt = newTemp( Ity_I32 );
+         IRTemp CC = newTemp( Ity_I32 );
+         IRTemp vA_hi = newTemp( Ity_I64 );
+         IRTemp vB_hi = newTemp( Ity_I64 );
+         UChar rA_addr = ifieldRegA(theInstr);
+         UChar rB_addr = ifieldRegB(theInstr);
+
+         DIP("xscmpexpdp %d,v%d,v%d\n", BF, rA_addr, rB_addr);
+
+         assign( vA_hi, unop( Iop_V128HIto64, mkexpr( vA ) ) );
+         assign( vB_hi, unop( Iop_V128HIto64, mkexpr( vB ) ) );
+
+         /* A exp < B exp */
+         bit4 = binop( Iop_CmpLT64U,
+                       binop( Iop_And64,
+                              mkexpr( vA_hi ),
+                              mkU64( 0x7FF0000000000000 ) ),
+                       binop( Iop_And64,
+                              mkexpr( vB_hi ),
+                              mkU64( 0x7FF0000000000000 ) ) );
+         /* A exp > B exp */
+         bit5 = binop( Iop_CmpLT64U,
+                       binop( Iop_And64,
+                              mkexpr( vB_hi ),
+                              mkU64( 0x7FF0000000000000 ) ),
+                       binop( Iop_And64,
+                              mkexpr( vA_hi ),
+                              mkU64( 0x7FF0000000000000 ) ) );
+         /* A exp == B exp */
+         bit6 = binop( Iop_CmpEQ64,
+                       binop( Iop_And64,
+                              mkexpr( vA_hi ),
+                              mkU64( 0x7FF0000000000000 ) ),
+                       binop( Iop_And64,
+                              mkexpr( vB_hi ),
+                              mkU64( 0x7FF0000000000000 ) ) );
+
+         /* A or B is a NaN */
+         bit7 = mkOR1( is_NaN( Ity_V128, vA ),
+                       is_NaN( Ity_V128, vB ) );
+
+         assign( eq_lt_gt, binop( Iop_Or32,
+                                  binop( Iop_Shl32,
+                                         unop( Iop_1Uto32, bit4 ),
+                                         mkU8( 3) ),
+                                  binop( Iop_Or32,
+                                         binop( Iop_Shl32,
+                                                unop( Iop_1Uto32, bit5 ),
+                                                mkU8( 2) ),
+                                         binop( Iop_Shl32,
+                                                unop( Iop_1Uto32, bit6 ),
+                                                mkU8( 1 ) ) ) ) );
+         assign(CC, binop( Iop_Or32,
+                           mkexpr( eq_lt_gt ) ,
+                           unop( Iop_1Uto32, bit7 ) ) );
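+         /* The 4-bit result follows the usual CR-field layout:
+          *   0b1000  exp(A) < exp(B),   0b0100  exp(A) > exp(B),
+          *   0b0010  exponents equal,   0b0001  either operand is a NaN.
+          */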
+
+         putGST_field( PPC_GST_CR, mkexpr( CC ), BF );
+         putFPCC( mkexpr( CC ) );
+         return True;
+      }
+      break;
+
+      case 0x14A: // xxextractuw (VSX Vector Extract Unsigned Word)
+      {
+         UInt uim = IFIELD( theInstr, 16, 4 );
+
+         DIP("xxextractuw v%d,v%d,%d\n", (UInt)XT, (UInt)XB, uim);
+
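+         /* uim is the byte offset, counted from the most-significant end,
+          * of the word to extract.  The selected word is shifted down to
+          * the bottom of the vector, masked, then shifted left by 64 bits
+          * so that it lands in word element 1 of the target; all other
+          * elements are zeroed.
+          */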
+         putVSReg( XT,
+                   binop( Iop_ShlV128,
+                          binop( Iop_AndV128,
+                                 binop( Iop_ShrV128,
+                                        mkexpr( vB ),
+                                        mkU8( ( 12 - uim ) * 8 ) ),
+                                 binop(Iop_64HLtoV128,
+                                       mkU64( 0 ),
+                                       mkU64( 0xFFFFFFFF ) ) ),
+                          mkU8( ( 32*2 ) ) ) );
+         break;
+      }
+      case 0x16A: // xxinsertw (VSX Vector insert Word)
+      {
+         UInt uim = IFIELD( theInstr, 16, 4 );
+         IRTemp vT = newTemp( Ity_V128 );
+         IRTemp tmp = newTemp( Ity_V128 );
+
+         DIP("xxinsertw v%d,v%d,%d\n", (UInt)XT, (UInt)XB, uim);
+
+         assign( vT, getVSReg( XT ) );
+         assign( tmp, binop( Iop_AndV128,
+                             mkexpr( vT ),
+                             unop( Iop_NotV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0 ),
+                                                 mkU64( 0xFFFFFFFF) ),
+                                          mkU8( ( 12 - uim ) * 8 ) ) ) ) );
+
+         putVSReg( XT,
+                   binop( Iop_OrV128,
+                          binop( Iop_ShlV128,
+                                 binop( Iop_AndV128,
+                                        binop( Iop_ShrV128,
+                                               mkexpr( vB ),
+                                               mkU8( 32 * 2 ) ),
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0 ),
+                                               mkU64( 0xFFFFFFFF ) ) ),
+                                 mkU8( ( 12 - uim ) * 8 ) ),
+                          mkexpr( tmp ) ) );
+         break;
+      }
+
       case 0x2B2: // xsabsdp (VSX scalar absolute value double-precision
       {
          /* Move abs val of dw 0 of VSX[XB] to dw 0 of VSX[XT]. */
@@ -15244,6 +18404,186 @@
          putVSReg(XT, mkexpr(absVal));
          break;
       }
+
+      case 0x2b6: // xsxexpdp (VSX Scalar Extract Exponent Double-Precision)
+                  // xsxsigdp (VSX Scalar Extract Significand Double-Precision)
+                  // xscvhpdp (VSX Scalar Convert Half-Precision format
+                  //           to Double-Precision format)
+                  // xscvdphp (VSX Scalar round & convert Double-precision
+                  //           format to Half-precision format)
+      {
+         IRTemp rT = newTemp( Ity_I64 );
+         UInt inst_select = IFIELD( theInstr, 16, 5);
+
+         if (inst_select == 0) {
+            DIP("xsxexpd %d,v%d\n", (UInt)XT, (UInt)XB);
+
+            assign( rT, binop( Iop_Shr64,
+                               binop( Iop_And64,
+                                      unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                      mkU64( 0x7FF0000000000000 ) ),
+                               mkU8 ( 52 ) ) );
+            putIReg( XT, mkexpr( rT ) );
+         } else if (inst_select == 1) {
+            IRExpr *normal;
+            IRTemp tmp = newTemp(Ity_I64);
+
+            DIP("xsxsigdp v%d,v%d\n",  (UInt)XT, (UInt)XB);
+
+            assign( tmp, unop( Iop_V128HIto64, mkexpr( vB ) ) );
+
+            /* Value is normal if it isn't infinite, zero or denormalized */
+            normal = mkNOT1( mkOR1(
+                                   mkOR1( is_NaN( Ity_I64, tmp ),
+                                          is_Inf( Ity_I64, tmp ) ),
+                                   mkOR1( is_Zero( Ity_I64, tmp ),
+                                          is_Denorm( Ity_I64, tmp ) ) ) );
+
+            assign( rT, binop( Iop_Or64,
+                               binop( Iop_And64,
+                                      mkexpr( tmp ),
+                                      mkU64( 0xFFFFFFFFFFFFF ) ),
+                               binop( Iop_Shl64,
+                                      unop( Iop_1Uto64, normal),
+                                      mkU8( 52 ) ) ) );
+            putIReg( XT, mkexpr( rT ) );
+
+        } else if (inst_select == 16) {
+            IRTemp result = newTemp( Ity_V128 );
+            IRTemp value = newTemp( Ity_I64 );
+            /* Note: PPC only converts the 16-bit value in the upper 64 bits
+             * of the source V128 to a 64-bit value stored in the upper
+             * 64 bits of the V128 result.  The contents of the lower 64 bits
+             * are undefined.
+             */
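+            /* For example, a half-precision source value of 0x3C00 converts
+             * to 1.0, i.e. 0x3FF0000000000000 in the upper doubleword of
+             * the result.
+             */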
+
+            DIP("xscvhpdp v%d, v%d\n", (UInt)XT, (UInt)XB);
+            assign( result, unop( Iop_F16toF64x2, mkexpr( vB ) ) );
+
+            putVSReg( XT, mkexpr( result ) );
+
+            assign( value, unop( Iop_V128HIto64, mkexpr( result ) ) );
+            generate_store_FPRF( Ity_I64, value );
+            return True;
+
+         } else if (inst_select == 17) {   // xscvdphp
+            IRTemp value = newTemp( Ity_I32 );
+            IRTemp result = newTemp( Ity_V128 );
+            /* Note: PPC only converts the 64-bit value in the upper 64 bits of
+             * the V128 and stores the 16-bit result in the upper word of the
+             * V128 result.  The contents of the lower 64 bits are undefined.
+             */
+            DIP("xscvdphp v%d, v%d\n", (UInt)XT, (UInt)XB);
+            assign( result,  unop( Iop_F64toF16x2, mkexpr( vB ) ) );
+            assign( value, unop( Iop_64to32, unop( Iop_V128HIto64,
+                                                   mkexpr( result ) ) ) );
+            putVSReg( XT, mkexpr( result ) );
+            generate_store_FPRF( Ity_I16, value );
+            return True;
+
+         } else {
+            vex_printf( "dis_vxv_scalar_extract_exp_sig invalid inst_select (ppc)(opc2)\n" );
+            vex_printf("inst_select = %d\n", inst_select);
+            return False;
+         }
+      }
+      break;
+
+      case 0x254:  // xststdcsp (VSX Scalar Test Data Class Single-Precision)
+      case 0x2D4:  // xststdcdp (VSX Scalar Test Data Class Double-Precision)
+      {
+         /* These instructions only differ in that the single-precision
+            instruction, xststdcsp, has the additional constraint on the
+            denormal test that the exponent be greater than zero and
+            less than 0x381. */
+         IRTemp vB_hi = newTemp( Ity_I64 );
+         UInt BF = IFIELD( theInstr, 23, 3 );
+         UInt DCMX_mask = IFIELD( theInstr, 16, 7 );
+         IRTemp NaN = newTemp( Ity_I64 );
+         IRTemp inf = newTemp( Ity_I64 );
+         IRTemp zero = newTemp( Ity_I64 );
+         IRTemp dnorm = newTemp( Ity_I64 );
+         IRTemp pos = newTemp( Ity_I64 );
+         IRTemp not_sp = newTemp( Ity_I64 );
+         IRTemp DCM = newTemp( Ity_I64 );
+         IRTemp CC = newTemp( Ity_I64 );
+         IRTemp exponent = newTemp( Ity_I64 );
+         IRTemp tmp = newTemp( Ity_I64 );
+
+         assign( vB_hi, unop( Iop_V128HIto64, mkexpr( vB ) ) );
+
+         assign( pos, unop( Iop_1Uto64,
+                            binop( Iop_CmpEQ64,
+                                   binop( Iop_Shr64,
+                                          mkexpr( vB_hi ),
+                                          mkU8( 63 ) ),
+                                   mkU64( 0 ) ) ) );
+
+         assign( NaN, unop( Iop_1Uto64, is_NaN( Ity_I64, vB_hi ) ) );
+         assign( inf, unop( Iop_1Uto64, is_Inf( Ity_I64, vB_hi ) ) );
+         assign( zero, unop( Iop_1Uto64, is_Zero( Ity_I64, vB_hi ) ) );
+
+         if (opc2 == 0x254) {
+            DIP("xststdcsp %d,v%d,%d\n", BF, (UInt)XB, DCMX_mask);
+
+            /* The least significant bit of the CC is set to 1 if the double
+               precision value is not representable as a single precision
+               value. The spec says the bit is set if:
+                  src != convert_SPtoDP(convert_DPtoSP(src))
+            */
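+            /* For example, vB_hi = 0x3FF0000000000001 (1.0 + 2^-52) rounds
+             * to 1.0f on the way down to single precision, so the round
+             * trip gives back 0x3FF0000000000000 and not_sp is set.
+             */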
+            assign( tmp,
+                    unop( Iop_ReinterpF64asI64,
+                                unop( Iop_F32toF64,
+                                      unop( Iop_TruncF64asF32,
+                                            unop( Iop_ReinterpI64asF64,
+                                                  mkexpr( vB_hi ) ) ) ) ) );
+            assign( not_sp, unop( Iop_1Uto64,
+                                  mkNOT1( binop( Iop_CmpEQ64,
+                                                 mkexpr( vB_hi ),
+                                                 mkexpr( tmp ) ) ) ) );
+            assign( exponent,
+                    binop( Iop_Shr64,
+                           binop( Iop_And64,
+                                  mkexpr( vB_hi ),
+                                  mkU64( 0x7ff0000000000000 ) ),
+                           mkU8( 52 ) ) );
+            assign( dnorm, unop( Iop_1Uto64,
+                                 mkOR1( is_Denorm( Ity_I64, vB_hi ),
+                                        mkAND1( binop( Iop_CmpLT64U,
+                                                       mkexpr( exponent ),
+                                                       mkU64( 0x381 ) ),
+                                                binop( Iop_CmpNE64,
+                                                       mkexpr( exponent ),
+                                                       mkU64( 0x0 ) ) ) ) ) );
+
+         } else {
+            DIP("xststdcdp %d,v%d,%d\n", BF, (UInt)XB, DCMX_mask);
+            assign( not_sp,  mkU64( 0 ) );
+            assign( dnorm, unop( Iop_1Uto64, is_Denorm( Ity_I64, vB_hi ) ) );
+         }
+
+         assign( DCM, create_DCM( Ity_I64, NaN, inf, zero, dnorm, pos ) );
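+         /* Assemble the 4-bit condition code: bit 3 is the sign of the
+          * operand, bit 1 is set if the data class matches DCMX, and bit 0
+          * is the "not representable in single precision" flag (always 0
+          * for xststdcdp).
+          */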
+         assign( CC,
+                 binop( Iop_Or64,
+                        binop( Iop_And64,    /* vB sign bit */
+                               binop( Iop_Shr64,
+                                      mkexpr( vB_hi ),
+                                      mkU8( 60 ) ),
+                               mkU64( 0x8 ) ),
+                        binop( Iop_Or64,
+                               binop( Iop_Shl64,
+                                      unop( Iop_1Uto64,
+                                            binop( Iop_CmpNE64,
+                                                   binop( Iop_And64,
+                                                          mkexpr( DCM ),
+                                                          mkU64( DCMX_mask ) ),
+                                                   mkU64( 0 ) ) ),
+                                      mkU8( 1 ) ),
+                               mkexpr( not_sp ) ) ) );
+         putGST_field( PPC_GST_CR, unop( Iop_64to32, mkexpr( CC ) ), BF );
+         putFPCC( unop( Iop_64to32, mkexpr( CC ) ) );
+      }
+      return True;
+
       case 0x2C0: // xscpsgndp
       {
          /* Scalar copy sign double-precision */
@@ -15432,6 +18772,704 @@
          break;
       }
 
+      case 0x354: // xvtstdcsp (VSX Test Data Class Single-Precision)
+      {
+         UInt DX_mask = IFIELD( theInstr, 16, 5 );
+         UInt DC_mask = IFIELD( theInstr, 6, 1 );
+         UInt DM_mask = IFIELD( theInstr, 2, 1 );
+         UInt DCMX_mask = (DC_mask << 6) | (DM_mask << 5) | DX_mask;
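+         /* The 7-bit DCMX field is split across the instruction encoding
+          * (dc, dm and dx); reassemble it before matching it against the
+          * data class of each word.
+          */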
+
+         IRTemp match_value[4];
+         IRTemp value[4];
+         IRTemp NaN[4];
+         IRTemp inf[4];
+         IRTemp pos[4];
+         IRTemp DCM[4];
+         IRTemp zero[4];
+         IRTemp dnorm[4];
+         Int i;
+
+         DIP("xvtstdcsp v%d,v%d,%d\n", (UInt)XT, (UInt)XB, DCMX_mask);
+
+         for (i = 0; i < 4; i++) {
+            NaN[i]   = newTemp(Ity_I32);
+            inf[i]   = newTemp(Ity_I32);
+            pos[i]   = newTemp(Ity_I32);
+            DCM[i]   = newTemp(Ity_I32);
+            zero[i]  = newTemp(Ity_I32);
+            dnorm[i] = newTemp(Ity_I32);
+
+            value[i] = newTemp(Ity_I32);
+            match_value[i] = newTemp(Ity_I32);
+
+            assign( value[i],
+                    unop( Iop_64to32,
+                          unop( Iop_V128to64,
+                                binop( Iop_AndV128,
+                                       binop( Iop_ShrV128,
+                                              mkexpr( vB ),
+                                              mkU8( (3-i)*32 ) ),
+                                       binop( Iop_64HLtoV128,
+                                              mkU64( 0x0 ),
+                                              mkU64( 0xFFFFFFFF ) ) ) ) ) );
+
+            assign( pos[i], unop( Iop_1Uto32,
+                                  binop( Iop_CmpEQ32,
+                                         binop( Iop_Shr32,
+                                                mkexpr( value[i] ),
+                                                mkU8( 31 ) ),
+                                         mkU32( 0 ) ) ) );
+
+            assign( NaN[i], unop( Iop_1Uto32, is_NaN( Ity_I32, value[i] ) ));
+            assign( inf[i], unop( Iop_1Uto32, is_Inf( Ity_I32, value[i] ) ) );
+            assign( zero[i], unop( Iop_1Uto32, is_Zero( Ity_I32, value[i] ) ) );
+
+            assign( dnorm[i], unop( Iop_1Uto32, is_Denorm( Ity_I32,
+                                                           value[i] ) ) );
+            assign( DCM[i], create_DCM( Ity_I32, NaN[i], inf[i], zero[i],
+                                        dnorm[i], pos[i] ) );
+
+            assign( match_value[i],
+                    unop( Iop_1Sto32,
+                          binop( Iop_CmpNE32,
+                                 binop( Iop_And32,
+                                        mkU32( DCMX_mask ),
+                                        mkexpr( DCM[i] ) ),
+                                 mkU32( 0 ) ) ) );
+         }
+
+         putVSReg( XT, binop( Iop_64HLtoV128,
+                              binop( Iop_32HLto64,
+                                     mkexpr( match_value[0] ),
+                                     mkexpr( match_value[1] ) ),
+                              binop( Iop_32HLto64,
+                                     mkexpr( match_value[2] ),
+                                     mkexpr( match_value[3] ) ) ) );
+      }
+      break;
+
+      case 0x360:  // xviexpsp  (VSX Vector Insert Exponent Single-Precision)
+      {
+            Int i;
+            IRTemp new_XT[5];
+            IRTemp A_value[4];
+            IRTemp B_value[4];
+            IRExpr *sign[4], *expr[4], *fract[4];
+
+            DIP("xviexpsp v%d,v%d\n", XT, XB);
+            new_XT[0] = newTemp(Ity_V128);
+            assign( new_XT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0x0 ),
+                                      mkU64( 0x0 ) ) );
+
+            for (i = 0; i < 4; i++) {
+               A_value[i] = newTemp(Ity_I32);
+               B_value[i] = newTemp(Ity_I32);
+
+               assign( A_value[i],
+                       unop( Iop_64to32,
+                             unop( Iop_V128to64,
+                                   binop( Iop_AndV128,
+                                          binop( Iop_ShrV128,
+                                                 mkexpr( vA ),
+                                                 mkU8( (3-i)*32 ) ),
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0 ),
+                                                 mkU64( 0xFFFFFFFF ) ) ) ) ) );
+               assign( B_value[i],
+                       unop( Iop_64to32,
+                             unop( Iop_V128to64,
+                                   binop( Iop_AndV128,
+                                          binop( Iop_ShrV128,
+                                                 mkexpr( vB ),
+                                                 mkU8( (3-i)*32 ) ),
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0 ),
+                                                 mkU64( 0xFFFFFFFF ) ) ) ) ) );
+
+               sign[i] = binop( Iop_And32, mkexpr( A_value[i] ),
+                                mkU32( 0x80000000 ) );
+               expr[i] = binop( Iop_Shl32,
+                                binop( Iop_And32, mkexpr( B_value[i] ),
+                                       mkU32( 0xFF ) ),
+                                mkU8( 23 ) );
+               fract[i] = binop( Iop_And32, mkexpr( A_value[i] ),
+                                 mkU32( 0x007FFFFF ) );
+
+               new_XT[i+1] = newTemp(Ity_V128);
+               assign( new_XT[i+1],
+                       binop( Iop_OrV128,
+                              binop( Iop_ShlV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0 ),
+                                            binop( Iop_32HLto64,
+                                                   mkU32( 0 ),
+                                                   binop( Iop_Or32,
+                                                          binop( Iop_Or32,
+                                                                 sign[i],
+                                                                 expr[i] ),
+                                                          fract[i] ) ) ),
+                                     mkU8( (3-i)*32 ) ),
+                              mkexpr( new_XT[i] ) ) );
+            }
+            putVSReg( XT, mkexpr( new_XT[4] ) );
+      }
+      break;
+
+      case 0x396:  // xsiexpdp (VSX Scalar Insert Exponent Double-Precision)
+      {
+         IRExpr *sign, *expr, *fract;
+         UChar rA_addr = ifieldRegA(theInstr);
+         UChar rB_addr = ifieldRegB(theInstr);
+         IRTemp rA = newTemp( Ity_I64 );
+         IRTemp rB = newTemp( Ity_I64 );
+
+         DIP("xsiexpdp v%d,%d,%d\n", (UInt)XT, (UInt)rA_addr, (UInt)rB_addr);
+         assign( rA, getIReg(rA_addr));
+         assign( rB, getIReg(rB_addr));
+
+         sign = binop( Iop_And64, mkexpr( rA ), mkU64( 0x8000000000000000 ) );
+         expr = binop( Iop_Shl64,
+                       binop( Iop_And64, mkexpr( rB ), mkU64( 0x7FF ) ),
+                       mkU8( 52 ) );
+         fract = binop( Iop_And64, mkexpr( rA ), mkU64( 0x000FFFFFFFFFFFFF ) );
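+         /* Splice the result doubleword together from its fields: the sign
+          * bit and 52-bit fraction come from rA, the 11-bit biased exponent
+          * from the low bits of rB, in the usual sign|exponent|fraction
+          * layout.
+          */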
+
+         putVSReg( XT, binop( Iop_64HLtoV128,
+                              binop( Iop_Or64,
+                                     binop( Iop_Or64, sign, expr ),
+                                     fract ),
+                              mkU64( 0 ) ) );
+      }
+      break;
+
+      case 0x3B6: // xvxexpdp (VSX Vector Extract Exponent Double-Precision)
+                  // xvxsigdp (VSX Vector Extract Significand Double-Precision)
+                  // xxbrh
+                  // xvxexpsp (VSX Vector Extract Exponent Single-Precision)
+                  // xvxsigsp (VSX Vector Extract Significand Single-Precision)
+                  // xxbrw
+                  // xxbrd
+                  // xxbrq
+                  // xvcvhpsp (VSX Vector Convert Half-Precision format to Single-Precision format)
+                  // xvcvsphp (VSX Vector round and convert Single-Precision format to Half-Precision format)
+      {
+         UInt inst_select = IFIELD( theInstr, 16, 5);
+
+         if (inst_select == 0) {
+            DIP("xvxexpdp v%d,v%d\n", XT, XB);
+
+            putVSReg( XT, binop( Iop_ShrV128,
+                                 binop( Iop_AndV128,
+                                        mkexpr( vB ),
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0x7FF0000000000000 ),
+                                               mkU64( 0x7FF0000000000000 ) ) ),
+                                 mkU8( 52 ) ) );
+
+         } else if (inst_select == 1) {
+            Int i;
+            IRExpr *normal[2];
+            IRTemp value[2];
+            IRTemp new_XT[3];
+
+            DIP("xvxsigdp v%d,v%d\n", XT, XB);
+            new_XT[0] = newTemp(Ity_V128);
+            assign( new_XT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0x0 ),
+                                      mkU64( 0x0 ) ) );
+
+            for (i = 0; i < 2; i++) {
+               value[i] = newTemp(Ity_I64);
+               assign( value[i],
+                       unop( Iop_V128to64,
+                            binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( (1-i)*64 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0 ),
+                                           mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) ) );
+
+               /* Value is normal if it isn't infinite, zero or denormalized */
+               normal[i] = mkNOT1( mkOR1(
+                                         mkOR1( is_NaN( Ity_I64, value[i] ),
+                                                is_Inf( Ity_I64, value[i] ) ),
+                                         mkOR1( is_Zero( Ity_I64, value[i] ),
+                                                is_Denorm( Ity_I64,
+                                                           value[i] ) ) ) );
+               new_XT[i+1] = newTemp(Ity_V128);
+
+               assign( new_XT[i+1],
+                       binop( Iop_OrV128,
+                              binop( Iop_ShlV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0x0 ),
+                                            binop( Iop_Or64,
+                                                   binop( Iop_And64,
+                                                          mkexpr( value[i] ),
+                                                          mkU64( 0xFFFFFFFFFFFFF ) ),
+                                                   binop( Iop_Shl64,
+                                                          unop( Iop_1Uto64,
+                                                                normal[i]),
+                                                          mkU8( 52 ) ) ) ),
+                                     mkU8( (1-i)*64 ) ),
+                              mkexpr( new_XT[i] ) ) );
+            }
+            putVSReg( XT, mkexpr( new_XT[2] ) );
+
+         } else if (inst_select == 7) {
+            IRTemp sub_element0 = newTemp( Ity_V128 );
+            IRTemp sub_element1 = newTemp( Ity_V128 );
+
+            DIP("xxbrh v%d, v%d\n", (UInt)XT, (UInt)XB);
+
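+            /* Swap the two bytes of each halfword: mask out alternating
+             * bytes, shift one set right by 8 and the other left by 8,
+             * then OR the two partial results back together.
+             */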
+            assign( sub_element0,
+                    binop( Iop_ShrV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0xFF00FF00FF00FF00 ),
+                                        mkU64( 0xFF00FF00FF00FF00 ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 8 ) ) );
+            assign( sub_element1,
+                    binop( Iop_ShlV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0x00FF00FF00FF00FF ),
+                                        mkU64( 0x00FF00FF00FF00FF ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 8 ) ) );
+
+            putVSReg(XT, binop( Iop_OrV128,
+                                mkexpr( sub_element1 ),
+                                mkexpr( sub_element0 ) ) );
+
+         } else if (inst_select == 8) {
+            DIP("xvxexpsp v%d,v%d\n", XT, XB);
+
+            putVSReg( XT, binop( Iop_ShrV128,
+                                 binop( Iop_AndV128,
+                                        mkexpr( vB ),
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0x7F8000007F800000 ),
+                                               mkU64( 0x7F8000007F800000 ) ) ),
+                                 mkU8( 23 ) ) );
+         } else if (inst_select == 9) {
+            Int i;
+            IRExpr *normal[4];
+            IRTemp value[4];
+            IRTemp new_value[4];
+            IRTemp new_XT[5];
+
+            DIP("xvxsigsp v%d,v%d\n", XT, XB);
+            new_XT[0] = newTemp(Ity_V128);
+            assign( new_XT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0x0 ),
+                                      mkU64( 0x0 ) ) );
+
+            for (i = 0; i < 4; i++) {
+               value[i] = newTemp(Ity_I32);
+               assign( value[i],
+                       unop( Iop_64to32,
+                             unop( Iop_V128to64,
+                                   binop( Iop_AndV128,
+                                          binop( Iop_ShrV128,
+                                                 mkexpr( vB ),
+                                                 mkU8( (3-i)*32 ) ),
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0 ),
+                                                 mkU64( 0xFFFFFFFF ) ) ) ) ) );
+
+               new_XT[i+1] = newTemp(Ity_V128);
+
+               /* Value is normal if it isn't infinite, zero or denormalized */
+               normal[i] = mkNOT1( mkOR1(
+                                         mkOR1( is_NaN( Ity_I32, value[i] ),
+                                                is_Inf( Ity_I32, value[i] ) ),
+                                         mkOR1( is_Zero( Ity_I32, value[i] ),
+                                                is_Denorm( Ity_I32,
+                                                           value[i] ) ) ) );
+               new_value[i] = newTemp(Ity_I32);
+               assign( new_value[i],
+                       binop( Iop_Or32,
+                              binop( Iop_And32,
+                                     mkexpr( value[i] ),
+                                     mkU32( 0x7FFFFF ) ),
+                              binop( Iop_Shl32,
+                                     unop( Iop_1Uto32,
+                                           normal[i]),
+                                     mkU8( 23 ) ) ) );
+
+               assign( new_XT[i+1],
+                       binop( Iop_OrV128,
+                              binop( Iop_ShlV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0x0 ),
+                                            binop( Iop_32HLto64,
+                                                   mkU32( 0x0 ),
+                                                   mkexpr( new_value[i] ) ) ),
+                                     mkU8( (3-i)*32 ) ),
+                              mkexpr( new_XT[i] ) ) );
+            }
+            putVSReg( XT, mkexpr( new_XT[4] ) );
+
+         } else if (inst_select == 15) {
+            IRTemp sub_element0 = newTemp( Ity_V128 );
+            IRTemp sub_element1 = newTemp( Ity_V128 );
+            IRTemp sub_element2 = newTemp( Ity_V128 );
+            IRTemp sub_element3 = newTemp( Ity_V128 );
+
+            DIP("xxbrw v%d, v%d\n", (UInt)XT, (UInt)XB);
+
+            assign( sub_element0,
+                    binop( Iop_ShrV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0xFF000000FF000000 ),
+                                        mkU64( 0xFF000000FF000000 ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 24 ) ) );
+            assign( sub_element1,
+                    binop( Iop_ShrV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0x00FF000000FF0000 ),
+                                        mkU64( 0x00FF000000FF0000 ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 8 ) ) );
+            assign( sub_element2,
+                    binop( Iop_ShlV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0x0000FF000000FF00 ),
+                                        mkU64( 0x0000FF000000FF00 ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 8 ) ) );
+            assign( sub_element3,
+                    binop( Iop_ShlV128,
+                           binop( Iop_AndV128,
+                                  binop(Iop_64HLtoV128,
+                                        mkU64( 0x000000FF000000FF ),
+                                        mkU64( 0x000000FF000000FF ) ),
+                                  mkexpr( vB ) ),
+                           mkU8( 24 ) ) );
+
+            putVSReg( XT,
+                      binop( Iop_OrV128,
+                             binop( Iop_OrV128,
+                                    mkexpr( sub_element3 ),
+                                    mkexpr( sub_element2 ) ),
+                             binop( Iop_OrV128,
+                                    mkexpr( sub_element1 ),
+                                    mkexpr( sub_element0 ) ) ) );
+
+         } else if (inst_select == 23) {
+            DIP("xxbrd v%d, v%d\n", (UInt)XT, (UInt)XB);
+
+            int i;
+            int shift = 56;
+            IRTemp sub_element[16];
+            IRTemp new_xT[17];
+
+            new_xT[0] = newTemp( Ity_V128 );
+            assign( new_xT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0 ),
+                                      mkU64( 0 ) ) );
+
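+            /* Each pass of the loop moves byte (7 - i) of each doubleword
+             * down and byte i of each doubleword up by the same distance,
+             * so after four passes new_xT[4] holds both doublewords with
+             * their bytes fully reversed.
+             */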
+            for ( i = 0; i < 4; i++ ) {
+               new_xT[i+1] = newTemp( Ity_V128 );
+               sub_element[i]   = newTemp( Ity_V128 );
+               sub_element[i+4] = newTemp( Ity_V128 );
+
+               assign( sub_element[i],
+                       binop( Iop_ShrV128,
+                              binop( Iop_AndV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( (0xFFULL << (7 - i) * 8) ),
+                                            mkU64( (0xFFULL << (7 - i) * 8) ) ),
+                                     mkexpr( vB ) ),
+                              mkU8( shift ) ) );
+
+               assign( sub_element[i+4],
+                       binop( Iop_ShlV128,
+                              binop( Iop_AndV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( (0xFFULL << i*8) ),
+                                            mkU64( (0xFFULL << i*8) ) ),
+                                     mkexpr( vB ) ),
+                              mkU8( shift ) ) );
+               shift = shift - 16;
+
+               assign( new_xT[i+1],
+                       binop( Iop_OrV128,
+                              mkexpr( new_xT[i] ),
+                              binop( Iop_OrV128,
+                                     mkexpr ( sub_element[i] ),
+                                     mkexpr ( sub_element[i+4] ) ) ) );
+               }
+
+            putVSReg( XT, mkexpr( new_xT[4] ) );
+
+         } else if (inst_select == 24) {
+            // xvcvhpsp, (VSX Vector Convert half-precision format to
+            //           Single-precision format)
+            /* only supported on ISA 3.0 and newer */
+            IRTemp result = newTemp( Ity_V128 );
+            IRTemp src  = newTemp( Ity_I64 );
+
+            if (!allow_isa_3_0) return False;
+
+            DIP("xvcvhpsp v%d,v%d\n", XT,XB);
+            /* The instruction does not set the C or FPCC fields.  The
+             * instruction takes four 16-bit values stored in a 128-bit value
+             * as follows:   x V | x V | x V | x V  where V is a 16-bit
+             * value and x is an unused 16-bit value. To use Iop_F16toF32x4
+             * the four 16-bit values will be gathered into a single 64 bit
+             * value.  The backend will scatter the four 16-bit values back
+             * into a 128-bit operand before issuing the instruction.
+             */
+            /* Gather 16-bit float values from V128 source into new 64-bit
+             * source value for the Iop.
+             */
+            assign( src,
+                    unop( Iop_V128to64,
+                          binop( Iop_Perm8x16,
+                                 mkexpr( vB ),
+                                 binop ( Iop_64HLtoV128,
+                                         mkU64( 0 ),
+                                         mkU64( 0x020306070A0B0E0F) ) ) ) );
+
+            assign( result, unop( Iop_F16toF32x4, mkexpr( src ) ) );
+
+            putVSReg( XT, mkexpr( result ) );
+
+         } else if (inst_select == 25) {
+            // xvcvsphp, (VSX Vector round and Convert single-precision
+            // format to half-precision format)
+            /* only supported on ISA 3.0 and newer */
+            IRTemp result = newTemp( Ity_V128 );
+            IRTemp tmp64  = newTemp( Ity_I64 );
+
+            if (!allow_isa_3_0) return False;
+            DIP("xvcvsphp v%d,v%d\n", XT,XB);
+
+            /* Iop_F32toF16x4 is V128 -> I64, scatter the 16-bit floats in the
+             * I64 result to the V128 register to store.
+             */
+            assign( tmp64, unop( Iop_F32toF16x4, mkexpr( vB ) ) );
+
+            /* Scatter 16-bit float values from returned 64-bit value
+             * of V128 result.
+             */
+            if (host_endness == VexEndnessLE)
+               /* Note location 0 may have a valid number in it.  Location
+                * 15 should always be zero.  Use 0xF to put zeros in the
+                * desired bytes.
+                */
+               assign( result,
+                       binop( Iop_Perm8x16,
+                              binop( Iop_64HLtoV128,
+                                     mkexpr( tmp64 ),
+                                     mkU64( 0 ) ),
+                              binop ( Iop_64HLtoV128,
+                                      mkU64( 0x0F0F00010F0F0203 ),
+                                      mkU64( 0x0F0F04050F0F0607 ) ) ) );
+            else
+               assign( result,
+                       binop( Iop_Perm8x16,
+                              binop( Iop_64HLtoV128,
+                                     mkexpr( tmp64 ),
+                                     mkU64( 0 ) ),
+                              binop ( Iop_64HLtoV128,
+                                      mkU64( 0x0F0F06070F0F0405 ),
+                                      mkU64( 0x0F0F02030F0F0001 ) ) ) );
+            putVSReg( XT, mkexpr( result ) );
+
+         } else if ( inst_select == 31 ) {
+            int i;
+            int shift_left = 8;
+            int shift_right = 120;
+            IRTemp sub_element[16];
+            IRTemp new_xT[9];
+
+            DIP("xxbrq v%d, v%d\n", (UInt) XT, (UInt) XB);
+
+            new_xT[0] = newTemp( Ity_V128 );
+            assign( new_xT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0 ),
+                                      mkU64( 0 ) ) );
+
+            for ( i = 0; i < 8; i++ ) {
+               new_xT[i+1] = newTemp( Ity_V128 );
+               sub_element[i]   = newTemp( Ity_V128 );
+               sub_element[i+8] = newTemp( Ity_V128 );
+
+               assign( sub_element[i],
+                       binop( Iop_ShrV128,
+                              binop( Iop_AndV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( ( 0xFFULL << (7 - i) * 8 ) ),
+                                            mkU64( 0x0ULL ) ),
+                                     mkexpr( vB ) ),
+                              mkU8( shift_right ) ) );
+               shift_right = shift_right - 16;
+
+               assign( sub_element[i+8],
+                       binop( Iop_ShlV128,
+                              binop( Iop_AndV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0x0ULL ),
+                                            mkU64( ( 0xFFULL << (7 - i) * 8 ) ) ),
+                                     mkexpr( vB ) ),
+                              mkU8( shift_left ) ) );
+               shift_left = shift_left + 16;
+
+               assign( new_xT[i+1],
+                       binop( Iop_OrV128,
+                              mkexpr( new_xT[i] ),
+                              binop( Iop_OrV128,
+                                     mkexpr ( sub_element[i] ),
+                                     mkexpr ( sub_element[i+8] ) ) ) );
+            }
+
+            putVSReg( XT, mkexpr( new_xT[8] ) );
+
+         } else {
+            vex_printf("dis_vxs_misc(ppc) Invalid instruction selection\n");
+            return False;
+         }
+         break;
+      }
+
+      case 0x3D4: // xvtstdcdp (VSX Test Data Class Double-Precision)
+      {
+         UInt DX_mask = IFIELD( theInstr, 16, 5 );
+         UInt DC_mask = IFIELD( theInstr, 6, 1 );
+         UInt DM_mask = IFIELD( theInstr, 2, 1 );
+         UInt DCMX_mask = (DC_mask << 6) | (DM_mask << 5) | DX_mask;
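+         /* For each doubleword of vB, classify the value (NaN, infinity,
+          * zero, denormal, plus its sign), assemble the data-class match
+          * bits with create_DCM, and set the corresponding doubleword of
+          * XT to all ones if any class selected by DCMX matches, else to
+          * zero.
+          */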
+
+         IRTemp NaN[2], inf[2], zero[2], dnorm[2], pos[2], DCM[2];
+         IRTemp match_value[2];
+         IRTemp value[2];
+         Int i;
+
+         DIP("xvtstdcdp v%d,v%d,%d\n", (UInt)XT, (UInt)XB, DCMX_mask);
+
+         for (i = 0; i < 2; i++) {
+            NaN[i] = newTemp(Ity_I64);
+            inf[i] = newTemp(Ity_I64);
+            pos[i] = newTemp(Ity_I64);
+            DCM[i] = newTemp(Ity_I64);
+            zero[i] = newTemp(Ity_I64);
+            dnorm[i] = newTemp(Ity_I64);
+
+            value[i] = newTemp(Ity_I64);
+            match_value[i] = newTemp(Ity_I64);
+
+            assign( value[i],
+                    unop( Iop_V128to64,
+                          binop( Iop_AndV128,
+                                 binop( Iop_ShrV128,
+                                        mkexpr( vB ),
+                                        mkU8( (1-i)*64 ) ),
+                                 binop( Iop_64HLtoV128,
+                                        mkU64( 0x0 ),
+                                        mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) ) );
+
+            assign( pos[i], unop( Iop_1Uto64,
+                                 binop( Iop_CmpEQ64,
+                                        binop( Iop_Shr64,
+                                               mkexpr( value[i] ),
+                                               mkU8( 63 ) ),
+                                        mkU64( 0 ) ) ) );
+
+            assign( NaN[i], unop( Iop_1Uto64, is_NaN( Ity_I64, value[i] ) ) );
+            assign( inf[i], unop( Iop_1Uto64, is_Inf( Ity_I64, value[i] ) ) );
+            assign( zero[i], unop( Iop_1Uto64, is_Zero( Ity_I64, value[i] ) ) );
+            assign( dnorm[i], unop( Iop_1Uto64, is_Denorm( Ity_I64,
+                                                           value[i] ) ) );
+
+            assign( DCM[i], create_DCM( Ity_I64, NaN[i], inf[i], zero[i],
+                                        dnorm[i], pos[i] ) );
+
+            assign( match_value[i],
+                    unop( Iop_1Sto64,
+                          binop( Iop_CmpNE64,
+                                 binop( Iop_And64,
+                                        mkU64( DCMX_mask ),
+                                        mkexpr( DCM[i] ) ),
+                                 mkU64( 0 ) ) ) );
+         }
+         putVSReg( XT, binop( Iop_64HLtoV128,
+                              mkexpr( match_value[0] ),
+                              mkexpr( match_value[1] ) ) );
+      }
+      break;
+
+      case 0x3E0:  // xviexpdp (VSX Vector Insert Exponent Double-Precision)
+      {
+            Int i;
+            IRTemp new_XT[3];
+            IRTemp A_value[2];
+            IRTemp B_value[2];
+            IRExpr *sign[2], *expr[2], *fract[2];
+
+            DIP("xviexpdp v%d,v%d\n", XT, XB);
+            new_XT[0] = newTemp(Ity_V128);
+            assign( new_XT[0], binop( Iop_64HLtoV128,
+                                      mkU64( 0x0 ),
+                                      mkU64( 0x0 ) ) );
+
+            for (i = 0; i < 2; i++) {
+               A_value[i] = newTemp(Ity_I64);
+               B_value[i] = newTemp(Ity_I64);
+
+               assign( A_value[i],
+                       unop( Iop_V128to64,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vA ),
+                                           mkU8( (1-i)*64 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0 ),
+                                           mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) ) );
+               assign( B_value[i],
+                       unop( Iop_V128to64,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( (1-i)*64 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0 ),
+                                           mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) ) );
+
+               sign[i] = binop( Iop_And64, mkexpr( A_value[i] ),
+                                mkU64( 0x8000000000000000 ) );
+               expr[i] = binop( Iop_Shl64,
+                                binop( Iop_And64, mkexpr( B_value[i] ),
+                                       mkU64( 0x7FF ) ),
+                                mkU8( 52 ) );
+               fract[i] = binop( Iop_And64, mkexpr( A_value[i] ),
+                                 mkU64( 0x000FFFFFFFFFFFFF ) );
+
+               new_XT[i+1] = newTemp(Ity_V128);
+               assign( new_XT[i+1],
+                       binop( Iop_OrV128,
+                              binop( Iop_ShlV128,
+                                     binop( Iop_64HLtoV128,
+                                            mkU64( 0 ),
+                                            binop( Iop_Or64,
+                                                   binop( Iop_Or64,
+                                                          sign[i],
+                                                          expr[i] ),
+                                                   fract[i] ) ),
+                                     mkU8( (1-i)*64 ) ),
+                              mkexpr( new_XT[i] ) ) );
+            }
+            putVSReg( XT, mkexpr( new_XT[2] ) );
+      }
+      break;
+
       default:
          vex_printf( "dis_vxs_misc(ppc)(opc2)\n" );
          return False;
@@ -15564,6 +19602,370 @@
                            mkU64(0) ) );
       break;
    }
+   case 0x10C: // lxvx
+   {
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp word[4];
+      int i;
+
+      DIP("lxvx %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      if ( host_endness == VexEndnessBE ) {
+         for ( i = 3; i>= 0; i-- ) {
+            word[i] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+            assign( word[i], unop( Iop_32Uto64,
+                                   load( Ity_I32, irx_addr ) ) );
+            ea_off += 4;
+         }
+
+         putVSReg( XT, binop( Iop_64HLtoV128,
+                              binop( Iop_Or64,
+                                     mkexpr( word[2] ),
+                                     binop( Iop_Shl64,
+                                            mkexpr( word[3] ),
+                                            mkU8( 32 ) ) ),
+                              binop( Iop_Or64,
+                                     mkexpr( word[0] ),
+                                     binop( Iop_Shl64,
+                                            mkexpr( word[1] ),
+                                            mkU8( 32 ) ) ) ) );
+      } else {
+         for ( i = 0; i< 4; i++ ) {
+            word[i] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+            assign( word[i], unop( Iop_32Uto64,
+                                   load( Ity_I32, irx_addr ) ) );
+            ea_off += 4;
+         }
+
+         putVSReg( XT, binop( Iop_64HLtoV128,
+                              binop( Iop_Or64,
+                                     mkexpr( word[2] ),
+                                     binop( Iop_Shl64,
+                                            mkexpr( word[3] ),
+                                            mkU8( 32 ) ) ),
+                              binop( Iop_Or64,
+                                     mkexpr( word[0] ),
+                                     binop( Iop_Shl64,
+                                            mkexpr( word[1] ),
+                                            mkU8( 32 ) ) ) ) );
+      }
+      break;
+   }
+
+   case 0x10D: // lxvl
+   {
+      DIP("lxvl %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      IRTemp byte[16];
+      UInt i;
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp tmp_low[9];
+      IRTemp tmp_hi[9];
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp nb_gt16 = newTemp( Ity_I8 );
+      IRTemp ld_result = newTemp( Ity_V128 );
+      IRTemp nb_not_zero = newTemp( Ity_I64 );
+
+      IRTemp base_addr = newTemp( ty );
+
+      tmp_low[0] = newTemp( Ity_I64 );
+      tmp_hi[0] = newTemp( Ity_I64 );
+
+      assign( base_addr, ea_rAor0( rA_addr ) );
+      assign( tmp_low[0], mkU64( 0 ) );
+      assign( tmp_hi[0], mkU64( 0 ) );
+
+      /* shift is (16 - nb) * 8 bits, where nb = rB[0:7]; it is used to
+       * zero out the bytes beyond nb */
+      assign( nb_not_zero, unop( Iop_1Sto64,
+                                  binop( Iop_CmpNE64,
+                                         mkU64( 0 ),
+                                         binop( Iop_Shr64,
+                                                getIReg( rB_addr ),
+                                                mkU8( 56 ) ) ) ) );
+
+      assign( nb_gt16, unop( Iop_1Sto8,
+                             binop( Iop_CmpLT64U,
+                                    binop( Iop_Shr64,
+                                           getIReg( rB_addr ),
+                                           mkU8( 60 ) ),
+                                    mkU64( 1 ) ) ) );
+
+      /* Set the shift to 0 by ANDing with nb_gt16.  nb_gt16 will be all
+       * zeros if nb >= 16, which results in the full quad word load being
+       * stored.
+       */
+      assign( shift,
+              binop( Iop_And8,
+                     unop( Iop_64to8,
+                           binop( Iop_Mul64,
+                                  binop( Iop_Sub64,
+                                         mkU64 ( 16 ),
+                                         binop( Iop_Shr64,
+                                                getIReg( rB_addr ),
+                                                mkU8( 56 ) ) ),
+                                  mkU64( 8 ) ) ),
+                     mkexpr( nb_gt16 ) ) );
+
+      /* fetch all 16 bytes, we will remove what we don't want later */
+      if ( host_endness == VexEndnessBE ) {
+         for ( i = 0; i < 8; i++ ) {
+            byte[i] = newTemp( Ity_I64 );
+            tmp_hi[i+1] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+            ea_off += 1;
+
+            assign( byte[i], binop( Iop_Shl64,
+                                      unop( Iop_8Uto64,
+                                            load( Ity_I8, irx_addr ) ),
+                                      mkU8( 8 * ( 7 - i ) ) ) );
+
+            assign( tmp_hi[i+1], binop( Iop_Or64,
+                                        mkexpr( byte[i] ),
+                                        mkexpr( tmp_hi[i] ) ) );
+         }
+
+         for ( i = 0; i < 8; i++ ) {
+            byte[i+8] = newTemp( Ity_I64 );
+            tmp_low[i+1] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+            ea_off += 1;
+
+            assign( byte[i+8], binop( Iop_Shl64,
+                                    unop( Iop_8Uto64,
+                                          load( Ity_I8, irx_addr ) ),
+                                    mkU8( 8 * ( 7 - i ) ) ) );
+
+            assign( tmp_low[i+1], binop( Iop_Or64,
+                                         mkexpr( byte[i+8] ),
+                                         mkexpr( tmp_low[i] ) ) );
+         }
+         assign( ld_result, binop( Iop_ShlV128,
+                                   binop( Iop_ShrV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkexpr( tmp_hi[8] ),
+                                                 mkexpr( tmp_low[8] ) ),
+                                          mkexpr( shift ) ),
+                                   mkexpr( shift ) ) );
+      } else {
+         for ( i = 0; i < 8; i++ ) {
+            byte[i] = newTemp( Ity_I64 );
+            tmp_low[i+1] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+            ea_off += 1;
+
+            assign( byte[i], binop( Iop_Shl64,
+                                    unop( Iop_8Uto64,
+                                          load( Ity_I8, irx_addr ) ),
+                                    mkU8( 8 * i ) ) );
+
+            assign( tmp_low[i+1],
+                    binop( Iop_Or64,
+                           mkexpr( byte[i] ), mkexpr( tmp_low[i] ) ) );
+         }
+
+         for ( i = 0; i < 8; i++ ) {
+            byte[i + 8] = newTemp( Ity_I64 );
+            tmp_hi[i+1] = newTemp( Ity_I64 );
+
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+            ea_off += 1;
+
+            assign( byte[i+8], binop( Iop_Shl64,
+                                      unop( Iop_8Uto64,
+                                            load( Ity_I8, irx_addr ) ),
+                                      mkU8( 8 * i ) ) );
+
+            assign( tmp_hi[i+1], binop( Iop_Or64,
+                                        mkexpr( byte[i+8] ),
+                                        mkexpr( tmp_hi[i] ) ) );
+         }
+         assign( ld_result, binop( Iop_ShrV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkexpr( tmp_hi[8] ),
+                                                 mkexpr( tmp_low[8] ) ),
+                                          mkexpr( shift ) ),
+                                   mkexpr( shift ) ) );
+      }
+
+
+      /* If nb = 0, mask out the calculated load result so the stored
+       * value is zero.
+       */
+
+      putVSReg( XT, binop( Iop_AndV128,
+                           mkexpr( ld_result ),
+                           binop( Iop_64HLtoV128,
+                                  mkexpr( nb_not_zero ),
+                                  mkexpr( nb_not_zero ) ) ) );
+      break;
+   }
+
+   case 0x12D: // lxvll (Load VSX Vector Left-Justified with Length XX1 form)
+   {
+      IRTemp byte[16];
+      IRTemp tmp_low[9];
+      IRTemp tmp_hi[9];
+      IRTemp mask = newTemp(Ity_V128);
+      IRTemp rB = newTemp( Ity_I64 );
+      IRTemp nb = newTemp( Ity_I64 );
+      IRTemp nb_zero = newTemp(Ity_V128);
+      IRTemp mask_shift = newTemp(Ity_I64);
+      Int i;
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp base_addr = newTemp( ty );
+      IRTemp nb_compare_zero = newTemp( Ity_I64 );
+
+      DIP("lxvll %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      tmp_low[0] = newTemp(Ity_I64);
+      tmp_hi[0] = newTemp(Ity_I64);
+
+      assign( rB, getIReg(rB_addr));
+      assign( base_addr, ea_rAor0( rA_addr ) );
+      assign( tmp_low[0], mkU64( 0 ) );
+      assign( tmp_hi[0], mkU64( 0 ) );
+
+      /* mask_shift is (16 - nb) * 8, i.e. 128 bits minus nb bytes times
+       * 8 bits per byte */
+      assign( nb, binop( Iop_Shr64, mkexpr( rB ), mkU8( 56 )  ) );
+
+      assign( nb_compare_zero, unop( Iop_1Sto64,
+                                     binop( Iop_CmpEQ64,
+                                            mkexpr( nb ),
+                                            mkU64( 0 ) ) ) );
+
+      /* nb_zero is 0xFF..FF if the nb_field = 0 */
+      assign( nb_zero, binop( Iop_64HLtoV128,
+                              mkexpr( nb_compare_zero ),
+                              mkexpr( nb_compare_zero ) ) );
+
+      assign( mask_shift, binop( Iop_Sub64,
+                                 mkU64( 16*8 ),
+                                 binop( Iop_Mul64,
+                                        mkexpr( nb ),
+                                        mkU64( 8 ) ) ) );
+
+      /* fetch all 16 bytes, we will remove what we don't want later */
+      for (i = 0; i < 8; i++) {
+         byte[i] = newTemp(Ity_I64);
+         tmp_hi[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( byte[i], binop( Iop_Shl64,
+                                 unop( Iop_8Uto64,
+                                       load( Ity_I8, irx_addr ) ),
+                                 mkU8( 8 * (7 - i) ) ) );
+         assign( tmp_hi[i+1],
+                 binop( Iop_Or64,
+                        mkexpr( byte[i] ), mkexpr( tmp_hi[i] ) ) );
+      }
+
+      for (i = 0; i < 8; i++) {
+         byte[i + 8]  = newTemp(Ity_I64);
+         tmp_low[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( byte[i+8], binop( Iop_Shl64,
+                                   unop( Iop_8Uto64,
+                                         load( Ity_I8, irx_addr ) ),
+                                   mkU8( 8 * (7 - i) ) ) );
+         assign( tmp_low[i+1], binop( Iop_Or64,
+                                      mkexpr( byte[i+8] ),
+                                      mkexpr( tmp_low[i] ) ) );
+      }
+
+      /* Create a mask to clear the rightmost 16 - nb bytes; the mask is
+       * set to zero if nb = 0.
+       */
+      assign( mask, binop( Iop_AndV128,
+                          binop( Iop_ShlV128,
+                                  binop( Iop_ShrV128,
+                                         mkV128( 0xFFFF ),
+                                         unop( Iop_64to8, mkexpr( mask_shift ) ) ),
+                                  unop( Iop_64to8, mkexpr( mask_shift ) ) ),
+                           unop( Iop_NotV128, mkexpr( nb_zero ) ) ) );
+
+      putVSReg( XT, binop( Iop_AndV128,
+                           mkexpr( mask ),
+                           binop( Iop_64HLtoV128,
+                                  mkexpr( tmp_hi[8] ),
+                                  mkexpr( tmp_low[8] ) ) ) );
+      break;
+   }
+
+   case 0x16C: // lxvwsx
+   {
+      IRTemp data = newTemp( Ity_I64 );
+
+      DIP("lxvwsx %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      /* The load is an endian-aware 64-bit fetch; we only want the
+       * lower 32 bits. */
+      if ( host_endness == VexEndnessBE ) {
+         UInt ea_off = 4;
+         IRExpr* irx_addr;
+
+         irx_addr =
+            binop( mkSzOp( ty, Iop_Sub8 ), mkexpr( EA ),
+                   ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         assign( data, binop( Iop_And64,
+                              load( Ity_I64, irx_addr ),
+                              mkU64( 0xFFFFFFFF ) ) );
+
+      } else {
+         assign( data, binop( Iop_And64,
+                              load( Ity_I64, mkexpr( EA ) ),
+                              mkU64( 0xFFFFFFFF ) ) );
+      }
+
+      /* Take the lower 32 bits and splat them across the four word positions */
+      putVSReg( XT,
+                binop( Iop_64HLtoV128,
+                       binop( Iop_Or64,
+                              mkexpr( data ),
+                              binop( Iop_Shl64,
+                                     mkexpr( data ),
+                                     mkU8( 32 ) ) ),
+                       binop( Iop_Or64,
+                              mkexpr( data ),
+                              binop( Iop_Shl64,
+                                     mkexpr( data ),
+                                     mkU8( 32 ) ) ) ) );
+      break;
+   }
+
    case 0x20C: // lxsspx (Load VSX Scalar Single-Precision Indexed)
    {
       IRExpr * exp;
@@ -15591,6 +19993,50 @@
       putVSReg( XT, binop( Iop_64HLtoV128, exp, exp ) );
       break;
    }
+
+   case 0x30D: // lxsibzx
+   {
+      IRExpr *byte;
+      IRExpr* irx_addr;
+
+      DIP("lxsibzx %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      if ( host_endness == VexEndnessBE )
+         irx_addr = binop( Iop_Sub64, mkexpr( EA ), mkU64( 7 ) );
+
+      else
+         irx_addr = mkexpr( EA );
+
+      byte = load( Ity_I64, irx_addr );
+      putVSReg( XT, binop( Iop_64HLtoV128,
+                            binop( Iop_And64,
+                                   byte,
+                                   mkU64( 0xFF ) ),
+                           mkU64( 0 ) ) );
+      break;
+   }
+
+   case 0x32D: // lxsihzx
+   {
+      IRExpr *byte;
+      IRExpr* irx_addr;
+
+      DIP("lxsihzx %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      if ( host_endness == VexEndnessBE )
+         irx_addr = binop( Iop_Sub64, mkexpr( EA ), mkU64( 6 ) );
+
+      else
+         irx_addr = mkexpr( EA );
+
+      byte = load( Ity_I64, irx_addr );
+      putVSReg( XT, binop( Iop_64HLtoV128,
+                            binop( Iop_And64,
+                                   byte,
+                                   mkU64( 0xFFFF ) ),
+                           mkU64( 0 ) ) );
+      break;
+   }
    case 0x34C: // lxvd2x
    {
       IROp addOp = ty == Ity_I64 ? Iop_Add64 : Iop_Add32;
@@ -15638,6 +20084,118 @@
       putVSReg( XT, t0 );
       break;
    }
+
+   case 0x32C: // lxvh8x
+   {
+      DIP("lxvh8x %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      IRTemp h_word[8];
+      int i;
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp tmp_low[5];
+      IRTemp tmp_hi[5];
+
+      tmp_low[0] = newTemp( Ity_I64 );
+      tmp_hi[0] = newTemp( Ity_I64 );
+      assign( tmp_low[0], mkU64( 0 ) );
+      assign( tmp_hi[0], mkU64( 0 ) );
+
+      for ( i = 0; i < 4; i++ ) {
+         h_word[i]    = newTemp(Ity_I64);
+         tmp_low[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 2;
+
+         assign( h_word[i], binop( Iop_Shl64,
+                                   unop( Iop_16Uto64,
+                                         load( Ity_I16, irx_addr ) ),
+                                   mkU8( 16 * ( 3 - i ) ) ) );
+
+         assign( tmp_low[i+1],
+                 binop( Iop_Or64,
+                        mkexpr( h_word[i] ), mkexpr( tmp_low[i] ) ) );
+      }
+
+      for ( i = 0; i < 4; i++ ) {
+         h_word[i+4]   = newTemp( Ity_I64 );
+         tmp_hi[i+1] = newTemp( Ity_I64 );
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 2;
+
+         assign( h_word[i+4], binop( Iop_Shl64,
+                                     unop( Iop_16Uto64,
+                                           load( Ity_I16, irx_addr ) ),
+                                     mkU8( 16 * ( 3 - i ) ) ) );
+
+         assign( tmp_hi[i+1], binop( Iop_Or64,
+                                     mkexpr( h_word[i+4] ),
+                                     mkexpr( tmp_hi[i] ) ) );
+      }
+      putVSReg( XT, binop( Iop_64HLtoV128,
+                           mkexpr( tmp_low[4] ), mkexpr( tmp_hi[4] ) ) );
+      break;
+   }
+
+   case 0x36C: // lxvb16x
+   {
+      DIP("lxvb16x %d,r%u,r%u\n", (UInt)XT, rA_addr, rB_addr);
+
+      IRTemp byte[16];
+      int i;
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp tmp_low[9];
+      IRTemp tmp_hi[9];
+
+      tmp_low[0] = newTemp( Ity_I64 );
+      tmp_hi[0] = newTemp( Ity_I64 );
+      assign( tmp_low[0], mkU64( 0 ) );
+      assign( tmp_hi[0], mkU64( 0 ) );
+
+      for ( i = 0; i < 8; i++ ) {
+         byte[i] = newTemp( Ity_I64 );
+         tmp_low[i+1] = newTemp( Ity_I64 );
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         assign( byte[i], binop( Iop_Shl64,
+                                 unop( Iop_8Uto64,
+                                       load( Ity_I8, irx_addr ) ),
+                                 mkU8( 8 * ( 7 - i ) ) ) );
+
+         assign( tmp_low[i+1],
+                 binop( Iop_Or64,
+                        mkexpr( byte[i] ), mkexpr( tmp_low[i] ) ) );
+      }
+
+      for ( i = 0; i < 8; i++ ) {
+         byte[i + 8] = newTemp( Ity_I64 );
+         tmp_hi[i+1] = newTemp( Ity_I64 );
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         assign( byte[i+8], binop( Iop_Shl64,
+                                   unop( Iop_8Uto64,
+                                         load( Ity_I8, irx_addr ) ),
+                                   mkU8( 8 * ( 7 - i ) ) ) );
+         assign( tmp_hi[i+1], binop( Iop_Or64,
+                                     mkexpr( byte[i+8] ),
+                                     mkexpr( tmp_hi[i] ) ) );
+      }
+      putVSReg( XT, binop( Iop_64HLtoV128,
+                           mkexpr( tmp_low[8] ), mkexpr( tmp_hi[8] ) ) );
+      break;
+   }
+
    default:
       vex_printf( "dis_vx_load(ppc)(opc2)\n" );
       return False;
@@ -15646,6 +20204,78 @@
 }
 
 /*
+ * VSX Move Instructions
+ */
+static Bool
+dis_vx_move ( UInt theInstr )
+{
+   /* XX1-Form */
+   UChar opc1 = ifieldOPC( theInstr );
+   UChar XS = ifieldRegXS( theInstr );
+   UChar rA_addr = ifieldRegA( theInstr );
+   UChar rB_addr = ifieldRegB( theInstr );
+   IRTemp vS = newTemp( Ity_V128 );
+   UInt opc2 = ifieldOPClo10( theInstr );
+   IRType ty = Ity_I64;
+
+   if ( opc1 != 0x1F ) {
+      vex_printf( "dis_vx_move(ppc)(instr)\n" );
+      return False;
+   }
+
+   switch (opc2) {
+   case 0x133: // mfvsrld RA,XS   Move From VSR Lower Doubleword
+      DIP("mfvsrld %d,r%u\n", (UInt)XS, rA_addr);
+
+      assign( vS, getVSReg( XS ) );
+      putIReg( rA_addr, unop(Iop_V128to64, mkexpr( vS) ) );
+
+      break;
+
+   case 0x193: // mtvsrws  XT,RA  Move To VSR Word and Splat
+   {
+      IRTemp tmp = newTemp( Ity_I32 );
+
+      DIP("mfvsrdd %d,r%u\n", (UInt)XS, rA_addr);
+
+      assign( tmp, unop( Iop_64to32, getIReg(rA_addr) ) );
+      assign( vS, binop( Iop_64HLtoV128,
+                         binop( Iop_32HLto64,
+                                mkexpr( tmp ),
+                                mkexpr( tmp ) ),
+                         binop( Iop_32HLto64,
+                                mkexpr( tmp ),
+                                mkexpr( tmp ) ) ) );
+      putVSReg( XS, mkexpr( vS ) );
+   }
+   break;
+
+   case 0x1B3: // mtvsrdd  XT,RA,RB  Move To VSR Double Doubleword
+   {
+      IRTemp rA = newTemp( ty );
+      IRTemp rB = newTemp( ty );
+
+      DIP("mfvsrws %d,r%u\n", (UInt)XS, rA_addr);
+
+      if ( rA_addr == 0 )
+         assign( rA, mkU64 ( 0 ) );
+      else
+         assign( rA, getIReg(rA_addr) );
+
+      assign( rB, getIReg(rB_addr) );
+      assign( vS, binop( Iop_64HLtoV128, mkexpr( rA ), mkexpr( rB ) ) );
+      putVSReg( XS, mkexpr( vS ) );
+   }
+   break;
+
+   default:
+      vex_printf( "dis_vx_move(ppc)(opc2)\n" );
+      return False;
+   }
+   return True;
+}
+
+/*
  * VSX Store Instructions
  * NOTE: VSX supports word-aligned storage access.
  */
@@ -15684,11 +20314,495 @@
       store( mkexpr( EA ), low32 );
       break;
    }
+
+   case 0x18C: // stxvx Store VSX Vector Indexed
+   {
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp word0 = newTemp( Ity_I64 );
+      IRTemp word1 = newTemp( Ity_I64 );
+      IRTemp word2 = newTemp( Ity_I64 );
+      IRTemp word3 = newTemp( Ity_I64 );
+      DIP("stxvx %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      assign( word0,  binop( Iop_Shr64,
+                             unop( Iop_V128HIto64, mkexpr( vS ) ),
+                             mkU8( 32 ) ) );
+
+      assign( word1, binop( Iop_And64,
+                            unop( Iop_V128HIto64, mkexpr( vS ) ),
+                            mkU64( 0xFFFFFFFF ) ) );
+
+      assign( word2, binop( Iop_Shr64,
+                            unop( Iop_V128to64, mkexpr( vS ) ),
+                            mkU8( 32 ) ) );
+
+      assign( word3, binop( Iop_And64,
+                            unop( Iop_V128to64, mkexpr( vS ) ),
+                            mkU64( 0xFFFFFFFF ) ) );
+
+      store( mkexpr( EA ), unop( Iop_64to32, mkexpr( word0 ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word1 ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word2 ) ) );
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word3 ) ) );
+      break;
+   }
+
+   case 0x18D: // stxvl Store VSX Vector with Length
+   {
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp word0 = newTemp( Ity_I64 );
+      IRTemp word1 = newTemp( Ity_I64 );
+      IRTemp word2 = newTemp( Ity_I64 );
+      IRTemp word3 = newTemp( Ity_I64 );
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp nb_gt16 = newTemp( Ity_I8 );
+      IRTemp nb_zero = newTemp( Ity_V128 );
+      IRTemp nb = newTemp( Ity_I8 );
+      IRTemp nb_field = newTemp( Ity_I64 );
+      IRTemp n_bytes = newTemp( Ity_I8 );
+      IRTemp base_addr = newTemp( ty );
+      IRTemp current_mem = newTemp( Ity_V128 );
+      IRTemp store_val = newTemp( Ity_V128 );
+      IRTemp nb_mask = newTemp( Ity_V128 );
+
+      DIP("stxvl %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      assign( nb_field, binop( Iop_Shr64,
+                               getIReg(rB_addr),
+                               mkU8( 56 ) ) );
+
+      assign( nb, unop( Iop_64to8, mkexpr( nb_field ) ) );
+
+      /* nb_gt16 will be all zeros if nb >= 16 */
+      assign( nb_gt16, unop( Iop_1Sto8,
+                             binop( Iop_CmpLT64U,
+                                    binop( Iop_Shr64,
+                                           mkexpr( nb_field ),
+                                           mkU8( 4 ) ),
+                                    mkU64( 1 ) ) ) );
+
+      /* nb_zero is 0xFF..FF if the nb_field = 0 */
+      assign( nb_zero, binop( Iop_64HLtoV128,
+                              unop( Iop_1Sto64,
+                                    binop( Iop_CmpEQ64,
+                                           mkexpr( nb_field ),
+                                           mkU64( 0 ) ) ),
+                              unop( Iop_1Sto64,
+                                    binop( Iop_CmpEQ64,
+                                           mkexpr( nb_field ),
+                                           mkU64( 0 ) ) ) ) );
+
+      /* set n_bytes to 0 if nb >= 16.  Otherwise, set to nb. */
+      assign( n_bytes, binop( Iop_And8, mkexpr( nb ), mkexpr( nb_gt16 ) ) );
+      assign( shift, unop( Iop_64to8,
+                           binop( Iop_Mul64,
+                                  binop( Iop_Sub64,
+                                         mkU64( 16 ),
+                                         unop( Iop_8Uto64,
+                                               mkexpr( n_bytes ) ) ),
+                                  mkU64( 8 ) ) ) );
+
+      /* We only have a 32-bit store function.  So we need to fetch the
+       * current contents of memory, merge them with the value to store,
+       * and do four 32-bit stores, so that the contents of memory not
+       * addressed by nb are preserved.
+       */
+      assign( base_addr, ea_rAor0( rA_addr ) );
+
+      assign( current_mem,
+              binop( Iop_64HLtoV128,
+                     load( Ity_I64, mkexpr( base_addr ) ),
+                     load( Ity_I64,
+                           binop( mkSzOp( ty, Iop_Add8 ),
+                                  mkexpr( base_addr ),
+                                  ty == Ity_I64 ? mkU64( 8 ) : mkU32( 8 )
+                                  ) ) ) );
+
+      /* Set the nb_mask to all zeros if nb = 0 so the current contents
+       * of memory get written back without modifications.
+       *
+       * The store_val is a combination of the current memory value
+       * and the bytes you want to store.  The nb_mask selects the
+       * bytes you want stored from Vs.
+       */
+      if (host_endness == VexEndnessBE) {
+         assign( nb_mask,
+                 binop( Iop_OrV128,
+                        binop( Iop_AndV128,
+                               binop( Iop_ShlV128,
+                                      mkV128( 0xFFFF ),
+                                      mkexpr( shift ) ),
+                               unop( Iop_NotV128, mkexpr( nb_zero ) ) ),
+                        binop( Iop_AndV128,
+                               mkexpr( nb_zero ),
+                               mkV128( 0 ) ) ) );
+
+         assign( store_val,
+                 binop( Iop_OrV128,
+                        binop( Iop_AndV128,
+                               binop( Iop_ShrV128,
+                                      mkexpr( vS ),
+                                      mkexpr( shift ) ),
+                               mkexpr( nb_mask ) ),
+                        binop( Iop_AndV128,
+                               unop( Iop_NotV128, mkexpr( nb_mask ) ),
+                               mkexpr( current_mem) ) ) );
+
+      } else {
+         assign( nb_mask,
+                 binop( Iop_OrV128,
+                        binop( Iop_AndV128,
+                               binop( Iop_ShrV128,
+                                      binop( Iop_ShlV128,
+                                             mkV128( 0xFFFF ),
+                                             mkexpr( shift ) ),
+                                      mkexpr( shift ) ),
+                               unop( Iop_NotV128, mkexpr( nb_zero ) ) ),
+                        binop( Iop_AndV128,
+                               mkexpr( nb_zero ),
+                               mkV128( 0 ) ) ) );
+
+         assign( store_val,
+                 binop( Iop_OrV128,
+                        binop( Iop_AndV128,
+                               binop( Iop_ShrV128,
+                                      binop( Iop_ShlV128,
+                                             mkexpr( vS ),
+                                             mkexpr( shift ) ),
+                                      mkexpr( shift ) ),
+                               mkexpr( nb_mask ) ),
+                        binop( Iop_AndV128,
+                               unop( Iop_NotV128, mkexpr( nb_mask ) ),
+                               mkexpr( current_mem) ) ) );
+      }
+
+      /* Store the value in 32-bit chunks */
+      assign( word0,  binop( Iop_Shr64,
+                             unop( Iop_V128HIto64, mkexpr( store_val ) ),
+                             mkU8( 32 ) ) );
+
+      assign( word1, binop( Iop_And64,
+                            unop( Iop_V128HIto64, mkexpr( store_val ) ),
+                            mkU64( 0xFFFFFFFF ) ) );
+
+      assign( word2, binop( Iop_Shr64,
+                            unop( Iop_V128to64, mkexpr( store_val ) ),
+                            mkU8( 32 ) ) );
+
+      assign( word3, binop( Iop_And64,
+                            unop( Iop_V128to64, mkexpr( store_val ) ),
+                            mkU64( 0xFFFFFFFF ) ) );
+
+      ea_off = 0;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word3 ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word2 ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word1 ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( word0 ) ) );
+      break;
+   }
+
+   case 0x1AD: // stxvll (Store VSX Vector Left-justified with length XX1-form)
+   {
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp word0[5];
+      IRTemp word1[5];
+      IRTemp word2[5];
+      IRTemp word3[5];
+      IRTemp shift = newTemp(Ity_I8);
+      IRTemp nb_gt16 = newTemp(Ity_I8);
+      IRTemp nb_zero = newTemp(Ity_V128);
+      IRTemp nb = newTemp(Ity_I8);
+      IRTemp nb_field = newTemp(Ity_I64);
+      IRTemp n_bytes = newTemp(Ity_I8);
+      IRTemp base_addr = newTemp( ty );
+      IRTemp current_mem = newTemp(Ity_V128);
+      IRTemp store_val = newTemp(Ity_V128);
+      IRTemp nb_mask = newTemp(Ity_V128);
+      IRTemp mask = newTemp( Ity_I64 );
+      IRTemp byte[16];
+      IRTemp tmp_low[9];
+      IRTemp tmp_hi[9];
+      IRTemp nb_field_compare_zero = newTemp( Ity_I64 );
+      Int i;
+
+      DIP("stxvll %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      assign( nb_field, binop( Iop_Shr64,
+                               getIReg(rB_addr),
+                               mkU8( 56 ) ) );
+      assign( nb, unop( Iop_64to8, mkexpr( nb_field ) ) );
+      assign( mask, mkU64( 0xFFFFFFFFFFFFFFFFULL ) );
+
+      /* nb_gt16 will be all zeros if nb >= 16 */
+      assign( nb_gt16, unop( Iop_1Sto8,
+                             binop( Iop_CmpLT64U,
+                                    binop( Iop_Shr64,
+                                           mkexpr( nb_field ),
+                                           mkU8( 4 ) ),
+                                    mkU64( 1 ) ) ) );
+
+      assign( nb_field_compare_zero, unop( Iop_1Sto64,
+                                           binop( Iop_CmpEQ64,
+                                                  mkexpr( nb_field ),
+                                                  mkU64( 0 ) ) ) );
+
+      /* nb_zero is 0xFF..FF if the nb_field = 0 */
+      assign( nb_zero, binop( Iop_64HLtoV128,
+                              mkexpr( nb_field_compare_zero ),
+                              mkexpr( nb_field_compare_zero ) ) );
+
+
+      /* set n_bytes to 0 if nb >= 16.  Otherwise, set to nb. */
+      assign( n_bytes, binop( Iop_And8, mkexpr( nb ), mkexpr( nb_gt16 ) ) );
+      assign( shift,
+              unop( Iop_64to8,
+                    binop( Iop_Mul64,
+                           binop( Iop_Sub64,
+                                  mkU64( 16 ),
+                                  unop( Iop_8Uto64, mkexpr( n_bytes ) )),
+                           mkU64( 8 ) ) ) );
+
+      /* We only have a 32-bit store function.  So we need to fetch the
+       * current contents of memory, merge them with the value to store,
+       * and do four 32-bit stores, so that the contents of memory not
+       * addressed by nb are preserved.
+       */
+      assign( base_addr, ea_rAor0( rA_addr ) );
+      /* fetch all 16 bytes and store in Big Endian format */
+      word0[0] = newTemp(Ity_I64);
+      assign( word0[0], mkU64( 0 ) );
+
+      word1[0] = newTemp(Ity_I64);
+      assign( word1[0], mkU64( 0 ) );
+
+      word2[0] = newTemp(Ity_I64);
+      assign( word2[0], mkU64( 0 ) );
+
+      word3[0] = newTemp(Ity_I64);
+      assign( word3[0], mkU64( 0 ) );
+
+      for (i = 0; i < 4; i++) {
+         word0[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( word0[i+1],
+                 binop( Iop_Or64,
+                        binop( Iop_Shl64,
+                               unop( Iop_8Uto64,
+                                     load( Ity_I8,
+                                           irx_addr ) ),
+                               mkU8( (3-i)*8 ) ),
+                        mkexpr( word0[i] ) ) );
+      }
+
+      for (i = 0; i < 4; i++) {
+         word1[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( word1[i+1],
+                 binop( Iop_Or64,
+                        binop( Iop_Shl64,
+                               unop( Iop_8Uto64,
+                                     load( Ity_I8,
+                                           irx_addr ) ),
+                               mkU8( (3-i)*8 ) ),
+                        mkexpr( word1[i] ) ) );
+      }
+      for (i = 0; i < 4; i++) {
+         word2[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( word2[i+1],
+                 binop( Iop_Or64,
+                        binop( Iop_Shl64,
+                               unop( Iop_8Uto64,
+                                     load( Ity_I8,
+                                           irx_addr ) ),
+                               mkU8( (3-i)*8 ) ),
+                        mkexpr( word2[i] ) ) );
+      }
+      for (i = 0; i < 4; i++) {
+         word3[i+1] = newTemp(Ity_I64);
+
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+         ea_off += 1;
+
+         /* Instruction always loads in Big Endian format */
+         assign( word3[i+1],
+                 binop( Iop_Or64,
+                        binop( Iop_Shl64,
+                               unop( Iop_8Uto64,
+                                     load( Ity_I8,
+                                           irx_addr ) ),
+                               mkU8( (3-i)*8 ) ),
+                        mkexpr( word3[i] ) ) );
+      }
+
+
+      assign( current_mem,
+              binop( Iop_64HLtoV128,
+                     binop( Iop_Or64,
+                            binop( Iop_Shl64,
+                                   mkexpr( word0[4] ),
+                                   mkU8( 32 ) ),
+                            mkexpr( word1[4] ) ),
+                     binop( Iop_Or64,
+                            binop( Iop_Shl64,
+                                   mkexpr( word2[4] ),
+                                   mkU8( 32 ) ),
+                            mkexpr( word3[4] ) ) ) );
+
+      /* Set the nb_mask to all zeros if nb = 0 so the current contents
+       * of memory get written back without modifications.
+       *
+       * The store_val is a combination of the current memory value
+       * and the bytes you want to store.  The nb_mask selects the
+       * bytes you want stored from Vs.
+       */
+      /* The instruction always uses Big Endian order */
+      assign( nb_mask,
+              binop( Iop_OrV128,
+                     binop( Iop_AndV128,
+                            binop( Iop_ShlV128,
+                                   binop( Iop_ShrV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkexpr( mask ),
+                                                 mkexpr( mask ) ),
+                                          mkexpr( shift ) ),
+                                   mkexpr( shift ) ),
+                            unop( Iop_NotV128, mkexpr( nb_zero ) ) ),
+                     binop( Iop_AndV128,
+                            mkexpr( nb_zero ),
+                            binop( Iop_64HLtoV128,
+                                   mkU64( 0x0 ),
+                                   mkU64( 0x0 ) ) ) ) );
+
+      assign( store_val,
+              binop( Iop_OrV128,
+                     binop( Iop_AndV128,
+                            mkexpr( vS ),
+                            mkexpr( nb_mask ) ),
+                     binop( Iop_AndV128,
+                            unop( Iop_NotV128, mkexpr( nb_mask ) ),
+                            mkexpr( current_mem) ) ) );
+
+      /* store the merged value in Big Endian format */
+      tmp_low[0] = newTemp(Ity_I64);
+      tmp_hi[0] = newTemp(Ity_I64);
+      assign( tmp_low[0], mkU64( 0 ) );
+      assign( tmp_hi[0], mkU64( 0 ) );
+
+      for (i = 0; i < 8; i++) {
+         byte[i] = newTemp(Ity_I64);
+         byte[i+8] = newTemp(Ity_I64);
+         tmp_low[i+1] = newTemp(Ity_I64);
+         tmp_hi[i+1]  = newTemp(Ity_I64);
+
+         assign( byte[i], binop( Iop_And64,
+                                 binop( Iop_Shr64,
+                                        unop( Iop_V128HIto64,
+                                              mkexpr( store_val ) ),
+                                        mkU8( (7-i)*8 ) ),
+                                 mkU64( 0xFF ) ) );
+         assign( byte[i+8], binop( Iop_And64,
+                                   binop( Iop_Shr64,
+                                          unop( Iop_V128to64,
+                                                mkexpr( store_val ) ),
+                                          mkU8( (7-i)*8 ) ),
+                                   mkU64( 0xFF ) ) );
+
+         assign( tmp_low[i+1],
+                 binop( Iop_Or64,
+                        mkexpr( tmp_low[i] ),
+                        binop( Iop_Shl64, mkexpr( byte[i] ), mkU8( i*8 ) ) ) );
+         assign( tmp_hi[i+1],
+                 binop( Iop_Or64,
+                        mkexpr( tmp_hi[i] ),
+                        binop( Iop_Shl64, mkexpr( byte[i+8] ),
+                               mkU8( i*8 ) ) ) );
+      }
+
+      /* Store the value in 32-bit chunks */
+      ea_off = 0;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( tmp_low[8] ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64HIto32, mkexpr( tmp_low[8] ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64to32, mkexpr( tmp_hi[8] ) ) );
+
+      ea_off += 4;
+      irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( base_addr ),
+                        ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+      store( irx_addr, unop( Iop_64HIto32, mkexpr( tmp_hi[8] ) ) );
+
+      break;
+   }
+
    case 0x28C:
    {
       IRTemp high64 = newTemp(Ity_F64);
       IRTemp val32  = newTemp(Ity_I32);
-      DIP("stxsspx %d,r%u,r%u\n", XS, rA_addr, rB_addr);
+      DIP("stxsspx %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
       assign(high64, unop( Iop_ReinterpI64asF64,
                            unop( Iop_V128HIto64, mkexpr( vS ) ) ) );
       assign(val32, unop( Iop_ReinterpF32asI32,
@@ -15700,15 +20814,64 @@
    case 0x2CC:
    {
       IRExpr * high64;
-      DIP("stxsdx %d,r%u,r%u\n", XS, rA_addr, rB_addr);
+      DIP("stxsdx %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
       high64 = unop( Iop_V128HIto64, mkexpr( vS ) );
       store( mkexpr( EA ), high64 );
       break;
    }
+
+   case 0x38D: // stxsibx
+   {
+      IRExpr *stored_word;
+      IRTemp byte_to_store = newTemp( Ity_I64 );
+
+      DIP("stxsibx %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      /* We can't store just a byte, so fetch the doubleword at EA, merge
+       * in the byte and store the result.
+       */
+      stored_word = load( Ity_I64, mkexpr( EA ) );
+      assign( byte_to_store, binop( Iop_And64,
+                                    unop( Iop_V128HIto64,
+                                          mkexpr( vS ) ),
+                                    mkU64( 0xFF ) ) );
+
+      store( mkexpr( EA ), binop( Iop_Or64,
+                                  binop( Iop_And64,
+                                         stored_word,
+                                         mkU64( 0xFFFFFFFFFFFFFF00 ) ),
+                                  mkexpr( byte_to_store ) ) );
+      break;
+   }
+
+   case 0x3AD: // stxsihx
+   {
+      IRExpr *stored_word;
+      IRTemp byte_to_store = newTemp( Ity_I64 );
+
+      DIP("stxsihx %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      /* We can't store just a halfword, so fetch the doubleword at EA,
+       * merge in the halfword and store the result.
+       */
+      stored_word = load( Ity_I64, mkexpr( EA ) );
+      assign( byte_to_store, binop( Iop_And64,
+                                    unop( Iop_V128HIto64,
+                                          mkexpr( vS ) ),
+                                    mkU64( 0xFFFF ) ) );
+
+      store( mkexpr( EA ), binop( Iop_Or64,
+                                  binop( Iop_And64,
+                                         stored_word,
+                                         mkU64( 0xFFFFFFFFFFFF0000 ) ),
+                                  mkexpr( byte_to_store ) ) );
+      break;
+   }
+
    case 0x3CC:
    {
       IRExpr * high64, *low64;
-      DIP("stxvd2x %d,r%u,r%u\n", XS, rA_addr, rB_addr);
+      DIP("stxvd2x %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
       high64 = unop( Iop_V128HIto64, mkexpr( vS ) );
       low64 = unop( Iop_V128to64, mkexpr( vS ) );
       store( mkexpr( EA ), high64 );
@@ -15723,7 +20886,7 @@
       IRTemp hi64 = newTemp( Ity_I64 );
       IRTemp lo64 = newTemp( Ity_I64 );
 
-      DIP("stxvw4x %d,r%u,r%u\n", XS, rA_addr, rB_addr);
+      DIP("stxvw4x %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
 
       // This instruction supports word-aligned stores, so EA may not be
       // quad-word aligned.  Therefore, do 4 individual word-size stores.
@@ -15745,6 +20908,223 @@
 
       break;
    }
+   case 0x3AC: // stxvh8x Store VSX Vector Halfword*8 Indexed
+   {
+      UInt ea_off = 0;
+      IRExpr* irx_addr;
+      IRTemp half_word0 = newTemp( Ity_I64 );
+      IRTemp half_word1 = newTemp( Ity_I64 );
+      IRTemp half_word2 = newTemp( Ity_I64 );
+      IRTemp half_word3 = newTemp( Ity_I64 );
+      IRTemp half_word4 = newTemp( Ity_I64 );
+      IRTemp half_word5 = newTemp( Ity_I64 );
+      IRTemp half_word6 = newTemp( Ity_I64 );
+      IRTemp half_word7 = newTemp( Ity_I64 );
+
+      DIP("stxvb8x %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      assign( half_word0, binop( Iop_Shr64,
+                                 unop( Iop_V128HIto64, mkexpr( vS ) ),
+                                 mkU8( 48 ) ) );
+
+      assign( half_word1, binop( Iop_And64,
+                                 binop( Iop_Shr64,
+                                        unop( Iop_V128HIto64, mkexpr( vS ) ),
+                                        mkU8( 32 ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      assign( half_word2, binop( Iop_And64,
+                                 binop( Iop_Shr64,
+                                        unop( Iop_V128HIto64, mkexpr( vS ) ),
+                                        mkU8( 16 ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      assign( half_word3, binop( Iop_And64,
+                                 unop( Iop_V128HIto64, mkexpr( vS ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      assign( half_word4, binop( Iop_Shr64,
+                                 unop( Iop_V128to64, mkexpr( vS ) ),
+                                 mkU8( 48 ) ) );
+
+      assign( half_word5, binop( Iop_And64,
+                                 binop( Iop_Shr64,
+                                        unop( Iop_V128to64, mkexpr( vS ) ),
+                                        mkU8( 32 ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      assign( half_word6, binop( Iop_And64,
+                                 binop( Iop_Shr64,
+                                        unop( Iop_V128to64, mkexpr( vS ) ),
+                                        mkU8( 16 ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      assign( half_word7, binop( Iop_And64,
+                                 unop( Iop_V128to64, mkexpr( vS ) ),
+                                 mkU64( 0xFFFF ) ) );
+
+      /* Do the 32-bit stores.  The store() does an Endian aware store. */
+      if ( host_endness == VexEndnessBE ) {
+         store( mkexpr( EA ), unop( Iop_64to32,
+                                    binop( Iop_Or64,
+                                           mkexpr( half_word1 ),
+                                           binop( Iop_Shl64,
+                                                  mkexpr( half_word0 ),
+                                                  mkU8( 16 ) ) ) ) );
+
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word3 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word2 ),
+                                              mkU8( 16 ) ) ) ) );
+
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word5 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word4 ),
+                                              mkU8( 16 ) ) ) ) );
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word7 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word6 ),
+                                              mkU8( 16 ) ) ) ) );
+
+      } else {
+         store( mkexpr( EA ), unop( Iop_64to32,
+                                    binop( Iop_Or64,
+                                           mkexpr( half_word0 ),
+                                           binop( Iop_Shl64,
+                                                  mkexpr( half_word1 ),
+                                                  mkU8( 16 ) ) ) ) );
+
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word2 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word3 ),
+                                              mkU8( 16 ) ) ) ) );
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word4 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word5 ),
+                                              mkU8( 16 ) ) ) ) );
+         ea_off += 4;
+         irx_addr = binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                           ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+         store( irx_addr, unop( Iop_64to32,
+                                binop( Iop_Or64,
+                                       mkexpr( half_word6 ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( half_word7 ),
+                                              mkU8( 16 ) ) ) ) );
+      }
+      break;
+   }
+
+   case 0x3EC: // stxvb16x Store VSX Vector Byte*16 Indexed
+   {
+      UInt ea_off = 0;
+      int i;
+      IRExpr* irx_addr;
+      IRTemp byte[16];
+
+      DIP("stxvb16x %d,r%u,r%u\n", (UInt)XS, rA_addr, rB_addr);
+
+      for ( i = 0; i < 8; i++ ) {
+         byte[i] = newTemp( Ity_I64 );
+         byte[i+8] = newTemp( Ity_I64 );
+
+         assign( byte[i], binop( Iop_And64,
+                               binop( Iop_Shr64,
+                                      unop( Iop_V128HIto64, mkexpr( vS ) ),
+                                      mkU8( 56 - i*8 ) ),
+                               mkU64( 0xFF ) ) );
+
+         assign( byte[i+8], binop( Iop_And64,
+                               binop( Iop_Shr64,
+                                      unop( Iop_V128to64, mkexpr( vS ) ),
+                                      mkU8( 56 - i*8) ),
+                               mkU64( 0xFF ) ) );
+      }
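+      /* byte[0] holds vS[127:120] down through byte[15] = vS[7:0].  Each
+       * pass of the loops below packs four bytes into one 32-bit word; the
+       * words are packed so that the endian-aware store() places byte[i]
+       * at EA+ea_off, byte[i+1] at EA+ea_off+1, and so on, for both
+       * endiannesses.
+       */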
+
+      if ( host_endness == VexEndnessBE ) {
+         for ( i = 0; i < 16; i = i + 4)  {
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+            store( irx_addr,
+                   unop( Iop_64to32,
+                         binop( Iop_Or64,
+                                binop( Iop_Or64,
+                                       mkexpr( byte[i+3] ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i+2] ),
+                                              mkU8( 8 ) ) ),
+                                binop( Iop_Or64,
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i+1] ),
+                                              mkU8( 16 ) ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i] ),
+                                              mkU8( 24 ) ) ) ) ) );
+            ea_off += 4;
+         }
+
+      } else {
+         for ( i = 0; i < 16; i = i + 4)  {
+            irx_addr =
+               binop( mkSzOp( ty, Iop_Add8 ), mkexpr( EA ),
+                      ty == Ity_I64 ? mkU64( ea_off ) : mkU32( ea_off ) );
+
+            store( irx_addr,
+                   unop( Iop_64to32,
+                         binop( Iop_Or64,
+                                binop( Iop_Or64,
+                                       mkexpr( byte[i] ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i+1] ),
+                                              mkU8( 8 ) ) ),
+                                binop( Iop_Or64,
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i+2] ),
+                                              mkU8( 16 ) ),
+                                       binop( Iop_Shl64,
+                                              mkexpr( byte[i+3] ),
+                                              mkU8( 24 ) ) ) ) ) );
+
+            ea_off += 4;
+         }
+      }
+      break;
+   }
+
    default:
       vex_printf( "dis_vx_store(ppc)(opc2)\n" );
       return False;
@@ -15752,6 +21132,831 @@
    return True;
 }
 
+static Bool
+dis_vx_Scalar_Round_to_quad_integer( UInt theInstr )
+{
+   /* The ISA 3.0 instructions supported in this function require
+    * an underlying hardware platform that supports the ISA 3.0
+    * instruction set.
+    */
+   /* XX1-Form */
+   UChar opc1 = ifieldOPC( theInstr );
+   UInt opc2 = IFIELD( theInstr, 1, 8 );
+   UChar vT_addr = ifieldRegDS( theInstr );
+   UChar vB_addr = ifieldRegB( theInstr );
+   IRTemp vB = newTemp( Ity_F128 );
+   IRTemp vT = newTemp( Ity_F128 );
+   UChar EX = IFIELD( theInstr, 0, 1 );
+
+   assign( vB, getF128Reg( vB_addr ) );
+   if (opc1 != 0x3F) {
+      vex_printf( "dis_vx_Scalar_Round_to_quad_integer(ppc)(instr)\n" );
+      return False;
+   }
+   switch (opc2) {
+   case 0x005:     // VSX Scalar Round to Quad-Precision Integer [with Inexact]
+      {
+         UChar R   = IFIELD( theInstr, 16, 1 );
+         UChar RMC = IFIELD( theInstr, 9, 2 );
+
+         /* Pack the rounding-mode specification bits (bit 3 = R, bits 2:1
+          * = RMC, bit 0 = EX).  They will be extracted later when the
+          * instruction is issued.
+          */
+         IRExpr* rm = mkU32( R << 3 | RMC << 1 | EX );
+
+         if ( EX == 0 ) {  // xsrqpi
+            DIP("xsrqpi %d,v%d,v%d,%d\n",  R, vT_addr, vB_addr, RMC);
+            assign( vT, binop( Iop_F128toI128S, rm, mkexpr( vB ) ) );
+
+         } else {     // xsrqpix
+            DIP("xsrqpix %d,v%d,v%d,%d\n", R, vT_addr, vB_addr, RMC);
+            assign( vT, binop( Iop_F128toI128S, rm, mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+      }   /* case 0x005 */
+      break;
+   case 0x025:     // xsrqpxp  VSX Scalar Round Quad-Precision to
+                   // Double-Extended Precision
+      {
+         UChar R   = IFIELD( theInstr, 16, 1 );
+         UChar RMC = IFIELD( theInstr, 9, 2 );
+
+         /* Pack the rounding-mode specification bits (bit 3 = R, bits 2:1
+          * = RMC).  They will be extracted later when the instruction is
+          * issued.
+          */
+         IRExpr* rm = mkU32( R << 3 | RMC << 1 );
+
+         DIP("xsrqpxp %d,v%d,v%d,%d\n",  R, vT_addr, vB_addr, RMC);
+         assign( vT, binop( Iop_RndF128, rm, mkexpr( vB ) ) );
+         generate_store_FPRF( Ity_F128, vT );
+      }   /* case 0x025 */
+      break;
+   default:
+      vex_printf( "dis_vx_Scalar_Round_to_quad_integer(ppc)(opc2)\n" );
+      return False;
+   }     /* switch opc2 */
+   putF128Reg( vT_addr, mkexpr( vT ) );
+   return True;
+}
+
+static Bool
+dis_vx_Floating_Point_Arithmetic_quad_precision( UInt theInstr )
+{
+   /* The ISA 3.0 instructions supported in this function require
+    * an underlying hardware platform that supports the ISA 3.0
+    * instruction set.
+    */
+   /* XX1-Form */
+   UChar opc1 = ifieldOPC( theInstr );
+   UInt opc2 = ifieldOPClo10( theInstr );
+   UChar vT_addr = ifieldRegDS( theInstr );
+   UChar vA_addr = ifieldRegA( theInstr );
+   UChar vB_addr = ifieldRegB( theInstr );
+   IRTemp vA = newTemp( Ity_F128 );
+   IRTemp vB = newTemp( Ity_F128 );
+   IRTemp vT = newTemp( Ity_F128 );
+   IRExpr* rm = get_IR_roundingmode();
+   UChar R0 = IFIELD( theInstr, 0, 1 );
+
+   assign( vB, getF128Reg( vB_addr ) );
+
+   if ( opc1 != 0x3F ) {
+      vex_printf( "Error, dis_vx_Floating_Point_Arithmetic_quad_precision(ppc)(instr)\n" );
+      return False;
+   }
+   switch ( opc2 ) {
+   case 0x004:     // xsaddqp (VSX Scalar Add Quad-Precision[using round to Odd])
+      {
+         assign( vA, getF128Reg( vA_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsaddqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_AddF128, rm, mkexpr( vA ), mkexpr( vB ) ) );
+
+         } else {
+           /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsaddqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_AddF128, set_round_to_Oddmode(),
+                               mkexpr( vA ), mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x024:     // xsmulqp (VSX Scalar Multiply Quad-Precision[using round to Odd])
+      {
+         assign( vA, getF128Reg( vA_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsmulqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_MulF128, rm, mkexpr( vA ), mkexpr( vB ) ) );
+
+         } else {
+            /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsmulqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_MulF128, set_round_to_Oddmode(), mkexpr( vA ),
+                               mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x184:   // xsmaddqp (VSX Scalar Multiply add Quad-Precision[using round to Odd])
+      {
+         /* instruction computes (vA * vB) + vC */
+         IRTemp vC = newTemp( Ity_F128 );
+
+         assign( vA, getF128Reg( vA_addr ) );
+         assign( vC, getF128Reg( vT_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsmaddqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+               qop( Iop_MAddF128, rm, mkexpr( vA ),
+                    mkexpr( vC ), mkexpr( vB ) ) );
+
+         } else {
+            /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsmaddqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+               qop( Iop_MAddF128, set_round_to_Oddmode(), mkexpr( vA ),
+                    mkexpr( vC ), mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x1A4:   // xsmsubqp (VSX Scalar Multiply Subtract Quad-Precision[using round to Odd])
+      {
+         IRTemp vC = newTemp( Ity_F128 );
+
+         assign( vA, getF128Reg( vA_addr ) );
+         assign( vC, getF128Reg( vT_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsmsubqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+               qop( Iop_MSubF128, rm, mkexpr( vA ),
+                    mkexpr( vC ), mkexpr( vB ) ) );
+
+         } else {
+            /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsmsubqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+                    qop( Iop_MSubF128, set_round_to_Oddmode(),
+                         mkexpr( vA ), mkexpr( vC ), mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x1C4:   // xsnmaddqp (VSX Scalar Negative Multiply Add Quad-Precision[using round to Odd])
+      {
+         IRTemp vC = newTemp( Ity_F128 );
+
+         assign( vA, getF128Reg( vA_addr ) );
+         assign( vC, getF128Reg( vT_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsnmaddqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+               qop( Iop_NegMAddF128, rm, mkexpr( vA ),
+                    mkexpr( vC ), mkexpr( vB ) ) );
+
+         } else {
+            /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsnmaddqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+                    qop( Iop_NegMAddF128, set_round_to_Oddmode(),
+                         mkexpr( vA ), mkexpr( vC ), mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x1E4:   // xsnmsubqp (VSX Scalar Negative Multiply Subtract Quad-Precision[using round to Odd])
+      {
+         IRTemp vC = newTemp( Ity_F128 );
+
+         assign( vA, getF128Reg( vA_addr ) );
+         assign( vC, getF128Reg( vT_addr ) );
+
+         if ( R0 == 0 ) {
+            /* rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsnmsubqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+               qop( Iop_NegMSubF128, rm, mkexpr( vA ),
+                    mkexpr( vC ), mkexpr( vB ) ) );
+
+         } else {
+            /* rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsnmsubqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT,
+                    qop( Iop_NegMSubF128, set_round_to_Oddmode(),
+                         mkexpr( vA ), mkexpr( vC ), mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x204:     // xssubqp (VSX Scalar Subtract Quad-Precision[using round to Odd])
+      {
+         assign( vA, getF128Reg( vA_addr ) );
+         if ( R0 == 0 ) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xssubqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_SubF128, rm, mkexpr( vA ), mkexpr( vB ) ) );
+
+         } else {
+            /* use rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xssubqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_SubF128, set_round_to_Oddmode(), mkexpr( vA ),
+                               mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x224:     // xsdivqp (VSX Scalar Divide Quad-Precision[using round to Odd])
+      {
+         assign( vA, getF128Reg( vA_addr ) );
+         if ( R0 == 0 ) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            DIP("xsdivqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_DivF128, rm, mkexpr( vA ), mkexpr( vB ) ) );
+
+         } else {
+            /* use rounding mode specified by Round to odd. Issue inst with R0 = 1 */
+            DIP("xsdivqpo v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+            assign( vT, triop( Iop_DivF128, set_round_to_Oddmode(), mkexpr( vA ),
+                               mkexpr( vB ) ) );
+         }
+         generate_store_FPRF( Ity_F128, vT );
+         break;
+      }
+   case 0x324:  // xssqrtqp (VSX Scalar Square root Quad-Precision[using round to Odd])
+      {
+         UInt inst_select = IFIELD( theInstr, 16, 5 );
+
+         switch (inst_select) {
+         case 27:
+            {
+               if ( R0 == 0 ) { // xssqrtqp
+                  /* rounding mode specified by RN. Issue inst with R0 = 0 */
+                  DIP("xssqrtqp v%d,v%d\n",  vT_addr, vB_addr);
+                  assign( vT, binop( Iop_SqrtF128, rm, mkexpr( vB ) ) );
+
+               } else {      // xssqrtqpo
+                  /* rounding mode is Round to odd. Issue inst with R0 = 1 */
+                  DIP("xssqrtqpo v%d,v%d\n",  vT_addr, vB_addr);
+                  assign( vT, binop( Iop_SqrtF128, set_round_to_Oddmode(),
+                                     mkexpr( vB ) ) );
+               }
+               generate_store_FPRF( Ity_F128, vT );
+               break;
+            }   /* end case 27 */
+         default:
+            vex_printf("dis_vx_Floating_Point_Arithmetic_quad_precision(0x324 unknown inst_select)\n");
+            return False;
+         }  /* end switch inst_select */
+         break;
+      }   /* end case 0x324 */
+
+   case 0x344:
+      {
+         UInt inst_select = IFIELD( theInstr, 16, 5);
+
+         switch (inst_select) {
+         case 1:    // xscvqpuwz  VSX Scalar Truncate & Convert Quad-Precision
+                    // format to Unsigned Word format
+            {
+               DIP("xscvqpuwz v%d,v%d\n",  vT_addr, vB_addr);
+               assign( vT, unop( Iop_TruncF128toI32U, mkexpr( vB ) ) );
+               break;
+            }
+         case 2:    // xscvudqp  VSX Scalar Convert from Unsigned Doubleword
+                    // format to Quad-Precision format
+            {
+               IRTemp tmp = newTemp( Ity_I64 );
+
+               DIP("xscvudqp v%d,v%d\n",  vT_addr, vB_addr);
+               assign( tmp, unop( Iop_ReinterpF64asI64,
+                                  unop( Iop_F128HItoF64, mkexpr( vB ) ) ) );
+               assign( vT, unop( Iop_I64UtoF128, mkexpr( tmp ) ) );
+               generate_store_FPRF( Ity_F128, vT );
+               break;
+            }
+         case 9:    // xscvqpswz  VSX Scalar Truncate & Convert Quad-Precision
+                    // format to Signed Word format
+            {
+               DIP("xscvqpswz v%d,v%d\n",  vT_addr, vB_addr);
+               assign( vT, unop( Iop_TruncF128toI32S, mkexpr( vB ) ) );
+               break;
+            }
+         case 10:   // xscvsdqp  VSX Scalar Convert from Signed Doubleword format
+                    // to Quad-Precision format
+            {
+               IRTemp tmp = newTemp( Ity_I64 );
+
+               DIP("xscvsdqp v%d,v%d\n",  vT_addr, vB_addr);
+
+               assign( tmp, unop( Iop_ReinterpF64asI64,
+                                  unop( Iop_F128HItoF64, mkexpr( vB ) ) ) );
+               assign( vT, unop( Iop_I64StoF128, mkexpr( tmp ) ) );
+               generate_store_FPRF( Ity_F128, vT );
+               break;
+            }
+         case 17:   // xscvqpudz  VSX Scalar Truncate & Convert Quad-Precision
+                    // format to Unsigned Doubleword format
+            {
+               DIP("xscvqpudz v%d,v%d\n",  vT_addr, vB_addr);
+               assign( vT, unop( Iop_TruncF128toI64U, mkexpr( vB ) ) );
+               break;
+            }
+         case 20: //  xscvqpdp  VSX Scalar Round & Convert Quad-Precision
+                  //  format to Double-Precision format [using round to Odd]
+            {
+               IRTemp ftmp = newTemp( Ity_F64 );
+               IRTemp tmp = newTemp( Ity_I64 );
+
+               /* This instruction takes a 128-bit floating point value and
+                * converts it to a 64-bit floating point value.  The 64-bit
+                * result is stored in the upper 64 bits of the 128-bit result
+                * register.  The lower 64 bits are architecturally undefined;
+                * this implementation sets them to zero.
+                */
+               if (R0 == 0) { //  xscvqpdp
+                  /* rounding mode specified by RN. Issue inst with R0 = 0 */
+                  DIP("xscvqpdp v%d,v%d\n",  vT_addr, vB_addr);
+
+                  assign( ftmp, binop( Iop_F128toF64, rm, mkexpr( vB ) ) );
+
+               } else {       //  xscvqpdpo
+                  /* rounding mode is Round to odd. Issue inst with R0 = 1 */
+                  DIP("xscvqpdpo v%d,v%d\n",  vT_addr, vB_addr);
+                  assign( ftmp,
+                          binop( Iop_F128toF64,
+                                 set_round_to_Oddmode(), mkexpr( vB ) ) );
+               }
+
+               /* store 64-bit float in upper 64-bits of 128-bit register,
+                * lower 64-bits are zero.
+                */
+               if (host_endness == VexEndnessLE)
+                  assign( vT,
+                          binop( Iop_F64HLtoF128,
+                                 mkexpr( ftmp ),
+                                 unop( Iop_ReinterpI64asF64, mkU64( 0 ) ) ) );
+               else
+                  assign( vT,
+                          binop( Iop_F64HLtoF128,
+                                 unop( Iop_ReinterpI64asF64, mkU64( 0 ) ),
+                                 mkexpr( ftmp ) ) );
+
+               assign( tmp, unop( Iop_ReinterpF64asI64,
+                                  unop( Iop_F128HItoF64, mkexpr( vT ) ) ) );
+
+               generate_store_FPRF( Ity_I64, tmp );
+               break;
+            }
+         case 22:    // xscvdpqp VSX Scalar Convert from Double-Precision
+                     // format to Quad-Precision format
+            {
+               DIP("xscvdpqp v%d,v%d\n",  vT_addr, vB_addr);
+               /* The 64-bit value is in the upper 64 bit of the src */
+               assign( vT, unop( Iop_F64toF128,
+                                 unop( Iop_F128HItoF64, mkexpr( vB ) ) ) );
+
+               generate_store_FPRF( Ity_F128, vT );
+               break;
+            }
+         case 25:    // xscvqpsdz  VSX Scalar Truncate & Convert Quad-Precision
+                     // format to Signed Doubleword format
+            {
+               DIP("xscvqpsdz v%d,v%d\n",  vT_addr, vB_addr);
+               assign( vT, unop( Iop_TruncF128toI64S, mkexpr( vB ) ) );
+               break;
+            }
+         default:
+            vex_printf( "dis_vx_Floating_Point_Arithmetic_quad_precision invalid inst_select (ppc)(opc2)\n" );
+            return False;
+         }  /* switch inst_select */
+      }   /* end case 0x344 */
+      break;
+   default:    /* switch opc2 */
+      vex_printf( "dis_vx_Floating_Point_Arithmetic_quad_precision(ppc)(opc2)\n" );
+      return False;
+   }
+   putF128Reg( vT_addr, mkexpr( vT ) );
+   return True;
+}
+
+
+/* VSX Scalar Quad-Precision instructions */
+static Bool
+dis_vx_scalar_quad_precision ( UInt theInstr )
+{
+   /* This function emulates the 128-bit floating point instructions
+    * using existing 128-bit vector instructions (Iops).  The 128-bit
+    * floating point instructions use the same 128-bit vector register
+    * set.
+    */
+   /* XX1-Form */
+   UChar opc1 = ifieldOPC( theInstr );
+   UInt opc2 = ifieldOPClo10( theInstr );
+   UChar vT_addr = ifieldRegDS( theInstr ) + 32;
+   UChar vA_addr = ifieldRegA( theInstr ) + 32;
+   UChar vB_addr = ifieldRegB( theInstr ) + 32;
+   IRTemp vA = newTemp( Ity_V128 );
+   IRTemp vB = newTemp( Ity_V128 );
+   IRTemp vT = newTemp( Ity_V128 );
+
+   assign( vB, getVSReg( vB_addr ) );
+
+   if (opc1 != 0x3F) {
+      vex_printf( "dis_vx_scalar_quad_precision(ppc)(instr)\n" );
+      return False;
+   }
+
+   switch (opc2) {
+
+   case 0x064:     // xscpsgnqp (VSX Scalar Copy Sign Quad-Precision)
+      {
+         IRTemp sign_vA = newTemp( Ity_I64 );
+         IRTemp vB_hi = newTemp( Ity_I64 );
+
+         DIP("xscpsgnqp v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+
+         assign( vA, getVSReg(vA_addr) );
+
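+         /* The result is vB with its sign bit replaced by the sign bit
+          * of vA.
+          */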
+         assign( sign_vA, binop( Iop_And64,
+                                 unop( Iop_V128HIto64,
+                                       mkexpr( vA ) ),
+                                 mkU64( 0x8000000000000000ULL ) ) );
+         assign( vB_hi, binop( Iop_Or64,
+                               binop( Iop_And64,
+                                      unop( Iop_V128HIto64,
+                                            mkexpr( vB ) ),
+                                      mkU64( 0x7FFFFFFFFFFFFFFFULL ) ),
+                               mkexpr( sign_vA ) ) );
+         assign( vT, binop( Iop_64HLtoV128,
+                            mkexpr( vB_hi ),
+                            unop( Iop_V128to64, mkexpr( vB ) ) ) );
+         break;
+      }
+
+   case 0x084:     // xscmpoqp (VSX Scalar Compare Ordered Quad-Precision)
+   case 0x284:     // xscmpuqp (VSX Scalar Compare Unordered Quad-Precision)
+      {
+         /* Note, the only difference between xscmpoqp and xscmpuqp is the
+            exception flag settings, which are not supported anyway. */
+         IRExpr *bit4, *bit5, *bit6, *bit7;
+         IRExpr *bit_zero, *bit_inf, *same_sign;
+         UInt BF = IFIELD( theInstr, 23, 3 );
+         IRTemp eq_lt_gt = newTemp( Ity_I32 );
+         IRTemp CC = newTemp( Ity_I32 );
+
+         if (opc2 == 0x084) {
+            DIP("xscmpoqp %d,v%d,v%d\n",  BF, vA_addr, vB_addr);
+         } else {
+            DIP("xscmpuqp %d,v%d,v%d\n",  BF, vA_addr, vB_addr);
+         }
+
+         assign( vA, getVSReg(vA_addr));
+
+         /* A and B have the same sign */
+         same_sign =  binop( Iop_CmpEQ64,
+                             binop( Iop_Shr64,
+                                    unop( Iop_V128HIto64,
+                                          mkexpr( vA ) ),
+                                    mkU8( 63 ) ),
+                             binop( Iop_Shr64,
+                                    unop( Iop_V128HIto64,
+                                          mkexpr( vB ) ),
+                                    mkU8( 63 ) ) );
+
+         /* A < B */
+         bit4 = Quad_precision_gt( vB, vA );
+
+         /*  A > B  */
+         bit5 = Quad_precision_gt( vA, vB );
+
+         /* A equal B */
+         bit6 = mkAND1( binop( Iop_CmpEQ64,
+                               unop( Iop_V128HIto64,
+                                     mkexpr( vA ) ),
+                               unop( Iop_V128HIto64,
+                                     mkexpr( vB ) ) ),
+                        binop( Iop_CmpEQ64,
+                               unop( Iop_V128to64,
+                                     mkexpr( vA ) ),
+                               unop( Iop_V128to64,
+                                     mkexpr( vB ) ) ) );
+
+         /* test both zero don't care about sign */
+         bit_zero = mkAND1( is_Zero( Ity_V128, vA ), is_Zero( Ity_V128, vB ) );
+
+         /* test both for infinity, don't care about sign */
+         bit_inf = mkAND1(
+                          mkAND1( is_Inf( Ity_V128, vA ), is_Inf( Ity_V128, vB ) ),
+                          binop( Iop_CmpEQ64,
+                                 binop( Iop_And64,
+                                        unop( Iop_V128to64,
+                                              mkexpr( vA ) ),
+                                        mkU64( 0x80000000) ),
+                                 binop( Iop_And64,
+                                        unop( Iop_V128to64,
+                                              mkexpr( vB ) ),
+                                        mkU64( 0x80000000) ) ) );
+
+         /* exp A or exp B is NaN */
+         bit7 = mkOR1( is_NaN( Ity_V128, vA ),
+                       is_NaN( Ity_V128, vB ) );
+
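+         /* Assemble the 4-bit condition field: 0b1000 = A < B, 0b0100 =
+          * A > B, 0b0010 = A == B (also set when both operands are zero,
+          * or both are infinities with the same sign); a NaN in either
+          * operand overrides these bits and yields 0b0001 (unordered).
+          */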
+         assign( eq_lt_gt,
+                 binop( Iop_Or32,
+                        binop( Iop_Or32,
+                               binop( Iop_Shl32,
+                                      unop( Iop_1Uto32, bit4 ),
+                                      mkU8( 3 ) ),
+                               binop( Iop_Shl32,
+                                      unop( Iop_1Uto32, bit5 ),
+                                      mkU8( 2 ) ) ),
+                        binop( Iop_Or32,
+                               binop( Iop_Shl32,
+                                      unop( Iop_1Uto32, bit6 ),
+                                      mkU8( 1 ) ),
+                               binop( Iop_Or32,
+                                      binop( Iop_Shl32,
+                                             unop( Iop_1Uto32,
+                                                   bit_zero ),
+                                             mkU8( 1 ) ),
+                                      binop( Iop_Shl32,
+                                             unop( Iop_1Uto32,
+                                                   mkAND1( bit_inf, same_sign ) ),
+                                             mkU8( 1 ) ) ) ) ) );
+
+         assign(CC, binop( Iop_Or32,
+                           binop( Iop_And32,
+                                  unop( Iop_Not32,
+                                        unop( Iop_1Sto32, bit7 ) ),
+                                  mkexpr( eq_lt_gt ) ),
+                           unop( Iop_1Uto32, bit7 ) ) );
+
+         /* put result of the comparison into CC and FPCC */
+         putGST_field( PPC_GST_CR, mkexpr( CC ), BF );
+         putFPCC( mkexpr( CC ) );
+         return True;
+      }
+      break;
+
+   case 0xA4:     // xscmpexpqp (VSX Scalar Compare Exponents Quad-Precision)
+      {
+         IRExpr *bit4, *bit5, *bit6, *bit7;
+         UInt BF = IFIELD( theInstr, 23, 3 );
+
+         IRTemp eq_lt_gt = newTemp( Ity_I32 );
+         IRTemp CC = newTemp( Ity_I32 );
+
+         DIP("xscmpexpqp %d,v%d,v%d\n",  BF, vA_addr, vB_addr);
+
+         assign( vA, getVSReg(vA_addr));
+
+         /* A exp < B exp */
+         bit4 = binop( Iop_CmpLT64U,
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vA ) ),
+                             mkU64( 0x7FFF000000000000 ) ),
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vB ) ),
+                             mkU64( 0x7FFF000000000000 ) ) );
+         /* A exp > B exp */
+         bit5 = binop( Iop_CmpLT64U,
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vB ) ),
+                             mkU64( 0x7FFF000000000000 ) ),
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vA ) ),
+                             mkU64( 0x7FFF000000000000 ) ) );
+         /* test equal */
+         bit6 = binop( Iop_CmpEQ64,
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vA ) ),
+                             mkU64( 0x7FFF000000000000 ) ),
+                      binop( Iop_And64,
+                             unop( Iop_V128HIto64,
+                                   mkexpr( vB ) ),
+                             mkU64( 0x7FFF000000000000 ) ) );
+
+         /* exp A or exp B is NaN */
+         bit7 = mkOR1( is_NaN( Ity_V128, vA ),
+                       is_NaN( Ity_V128, vB ) );
+
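+         /* CR field: 0b1000 = exp(A) < exp(B), 0b0100 = exp(A) > exp(B),
+          * 0b0010 = exponents equal, 0b0001 = unordered (NaN in either
+          * operand).
+          */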
+         /* A NaN in either operand overrules the other comparisons */
+         assign( eq_lt_gt, binop( Iop_Or32,
+                                  binop( Iop_Shl32,
+                                         unop( Iop_1Uto32, bit4 ),
+                                         mkU8( 3) ),
+                                  binop( Iop_Or32,
+                                         binop( Iop_Shl32,
+                                                unop( Iop_1Uto32, bit5 ),
+                                                mkU8( 2) ),
+                                         binop( Iop_Shl32,
+                                                unop( Iop_1Uto32, bit6 ),
+                                                mkU8( 1 ) ) ) ) );
+         assign(CC, binop( Iop_Or32,
+                           binop( Iop_And32,
+                                  unop( Iop_Not32,
+                                        unop( Iop_1Sto32, bit7 ) ),
+                                  mkexpr( eq_lt_gt ) ),
+                           unop( Iop_1Uto32, bit7 ) ) );
+
+         /* put result of the comparison into CC and FPCC */
+         putGST_field( PPC_GST_CR, mkexpr( CC ), BF );
+         putFPCC( mkexpr( CC ) );
+         return True;
+      }
+      break;
+
+   case 0x2C4:    // xststdcqp (VSX Scalar Quad-Precision Test Data Class)
+      {
+         UInt BF = IFIELD( theInstr, 23, 3 );
+         UInt DCMX_mask = IFIELD( theInstr, 16, 7 );
+         IRTemp CC = newTemp( Ity_I64 );
+         IRTemp NaN = newTemp( Ity_I64 );
+         IRTemp inf = newTemp( Ity_I64 );
+         IRTemp pos = newTemp( Ity_I64 );
+         IRTemp DCM = newTemp( Ity_I64 );
+         IRTemp zero = newTemp( Ity_I64 );
+         IRTemp dnorm = newTemp( Ity_I64 );
+
+         DIP("xststdcqp  %d,v%d,%d\n",  BF, vB_addr, DCMX_mask);
+
+         assign( zero, unop( Iop_1Uto64, is_Zero( Ity_V128, vB ) ) );
+         assign( pos, unop( Iop_1Uto64,
+                            binop( Iop_CmpEQ64,
+                                   binop( Iop_Shr64,
+                                          unop( Iop_V128HIto64,
+                                                mkexpr( vB ) ),
+                                          mkU8( 63 ) ),
+                                   mkU64( 0 ) ) ) );
+
+         assign( NaN, unop( Iop_1Uto64, is_NaN( Ity_V128, vB ) ) );
+         assign( inf, unop( Iop_1Uto64, is_Inf( Ity_V128, vB ) ) );
+
+         assign( dnorm, unop( Iop_1Uto64, is_Denorm( Ity_V128, vB ) ) );
+         assign( DCM, create_DCM( Ity_I64, NaN, inf, zero, dnorm, pos ) );
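+         /* CR field: bit 3 copies the sign bit of vB; bit 1 is set when
+          * the data class of vB matches any class selected by the DCMX
+          * mask.  Bits 2 and 0 are zero.
+          */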
+         assign( CC, binop( Iop_Or64,
+                            binop( Iop_And64,    /* vB sign bit */
+                                  binop( Iop_Shr64,
+                                         unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                         mkU8( 60 ) ),
+                                  mkU64( 0x8 ) ),
+                           binop( Iop_Shl64,
+                                  unop( Iop_1Uto64,
+                                        binop( Iop_CmpNE64,
+                                               binop( Iop_And64,
+                                                      mkexpr( DCM ),
+                                                      mkU64( DCMX_mask ) ),
+                                               mkU64( 0 ) ) ),
+                                   mkU8( 1 ) ) ) );
+
+         putGST_field( PPC_GST_CR, unop(Iop_64to32, mkexpr( CC ) ), BF );
+         putFPCC( unop(Iop_64to32, mkexpr( CC ) ) );
+         return True;
+      }
+      break;
+
+   case 0x324:    // xsabsqp  (VSX Scalar Absolute Quad-Precision)
+                  // xsxexpqp (VSX Scalar Extract Exponent Quad-Precision)
+                  // xsnabsqp (VSX Scalar Negative Absolute Quad-Precision)
+                  // xsnegqp  (VSX Scalar Negate Quad-Precision)
+                  // xsxsigqp (VSX Scalar Extract Significand Quad-Precision)
+      {
+         UInt inst_select = IFIELD( theInstr, 16, 5);
+
+         switch (inst_select) {
+         case 0:
+            DIP("xsabsqp  v%d,v%d\n",  vT_addr, vB_addr);
+            assign( vT, binop( Iop_AndV128, mkexpr( vB ),
+                               binop( Iop_64HLtoV128,
+                                      mkU64( 0x7FFFFFFFFFFFFFFF ),
+                                      mkU64( 0xFFFFFFFFFFFFFFFF ) ) ) );
+            break;
+
+         case 2:
+            DIP("xsxexpqp  v%d,v%d\n",  vT_addr, vB_addr);
+            assign( vT, binop( Iop_ShrV128,
+                               binop( Iop_AndV128, mkexpr( vB ),
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0x7FFF000000000000 ),
+                                             mkU64( 0x0000000000000000 ) ) ),
+                               mkU8( 48 ) ) );
+            break;
+
+         case 8:
+            DIP("xsnabsqp  v%d,v%d\n",  vT_addr, vB_addr);
+            assign( vT, binop( Iop_OrV128, mkexpr( vB ),
+                            binop( Iop_64HLtoV128,
+                                   mkU64( 0x8000000000000000 ),
+                                   mkU64( 0x0000000000000000 ) ) ) );
+            break;
+
+         case 16:
+            DIP("xsnegqp  v%d,v%d\n",  vT_addr, vB_addr);
+            assign( vT, binop( Iop_XorV128, mkexpr( vB ),
+                            binop( Iop_64HLtoV128,
+                                   mkU64( 0x8000000000000000 ),
+                                   mkU64( 0x0000000000000000 ) ) ) );
+            break;
+
+         case 18:
+         {
+            IRTemp expZero = newTemp( Ity_I64 );
+            IRTemp expInfinity = newTemp( Ity_I64 );
+
+            DIP("xsxsigqp  v%d,v%d\n",  vT_addr, vB_addr);
+
+            assign( expZero, unop( Iop_1Uto64,
+                                   binop( Iop_CmpNE64,
+                                          binop( Iop_And64,
+                                                 unop( Iop_V128HIto64,
+                                                       mkexpr( vB ) ),
+                                                 mkU64( 0x7FFF000000000000 ) ),
+                                          mkU64( 0x0 ) ) ) );
+
+            assign( expInfinity,
+                    unop( Iop_1Uto64,
+                          binop( Iop_CmpNE64,
+                                 binop( Iop_And64,
+                                        unop( Iop_V128HIto64,
+                                              mkexpr( vB ) ),
+                                        mkU64( 0x7FFF000000000000 ) ),
+                                 mkU64( 0x7FFF000000000000 ) ) ) );
+
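+            /* The AND of expZero and expInfinity gives the implicit
+             * integer bit of the significand: 1 for a normal number, 0
+             * when the exponent field is all zeros (zero/denormal) or all
+             * ones (infinity/NaN).
+             */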
+            /* Clear upper 16 bits to 0x0000.  If the exp was zero or infinity
+             * set bit 48 (lsb = 0) to 0, otherwise  set bit 48 to 1.
+             */
+            assign( vT,
+                    binop( Iop_OrV128,
+                           binop( Iop_ShrV128,
+                                  binop( Iop_ShlV128,
+                                         mkexpr( vB ),
+                                         mkU8( 16 ) ),
+                                  mkU8( 16 ) ),
+                           binop( Iop_64HLtoV128,
+                                  binop( Iop_Shl64,
+                                         binop( Iop_And64,
+                                                mkexpr( expZero ),
+                                                mkexpr( expInfinity ) ),
+                                         mkU8( 48 ) ),
+                                  mkU64( 0 ) ) ) );
+         }
+         break;
+
+         default:
+            vex_printf( "dis_vx_scalar_quad_precision invalid inst_select (ppc)(opc2)\n" );
+            return False;
+         }
+      }
+      break;
+   case 0x364:    // xsiexpqp (VSX Scalar Insert Exponent Quad-Precision)
+      {
+         IRTemp exp = newTemp( Ity_I64 );
+
+         DIP("xsiexpqp  v%d,v%d,v%d\n",  vT_addr, vA_addr, vB_addr);
+
+         assign( vA, getVSReg( vA_addr ) );
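+         /* Replace bits 62:48 (the exponent field) of the high doubleword
+          * of vA with the low 15 bits of the high doubleword of vB; the
+          * sign bit and significand are taken from vA.
+          */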
+         assign( exp, binop( Iop_And64,
+                                 unop( Iop_V128HIto64,
+                                       mkexpr( vB ) ),
+                                 mkU64( 0x7FFFULL ) ) );
+         assign( vT, binop( Iop_64HLtoV128,
+                            binop( Iop_Or64,
+                                   binop( Iop_And64,
+                                          unop( Iop_V128HIto64,
+                                                mkexpr( vA ) ),
+                                          mkU64( 0x8000FFFFFFFFFFFFULL ) ),
+                                   binop( Iop_Shl64,
+                                          mkexpr( exp ),
+                                          mkU8( 48 ) ) ),
+                            unop( Iop_V128to64,
+                                  mkexpr( vA ) ) ) );
+      }
+      break;
+
+   default:
+      vex_printf( "dis_vx_scalar_quad_precision(ppc)(opc2)\n" );
+
+      return False;
+   }
+
+   putVSReg( vT_addr, mkexpr( vT ) );
+   return True;
+}
+
 /*
 * VSX permute and other miscellaneous instructions
  */
@@ -15854,6 +22059,144 @@
             binop(Iop_AndV128, mkexpr(vB), mkexpr(vC))) );
          break;
       }
+
+      case 0x68: // xxperm  (VSX Permute )
+      case 0xE8: // xxpermr (VSX Permute right-index )
+      {
+         int i;
+         IRTemp new_Vt[17];
+         IRTemp perm_val[16];
+         IRTemp perm_val_gt16[16];
+         IRTemp tmp_val[16];
+         IRTemp perm_idx[16];
+         IRTemp perm_mask = newTemp( Ity_V128 );
+         IRTemp val_mask  = newTemp( Ity_V128 );
+         int    dest_shift_amount = 0;
+
+         if ( opc2 == 0x68 ) {
+            DIP("xxperm v%d,v%d,v%d\n", (UInt)XT, (UInt)XA, (UInt)XB);
+
+         } else {
+            /* Same as xxperm except the index is 31 - idx */
+            DIP("xxpermr v%d,v%d,v%d\n", (UInt)XT, (UInt)XA, (UInt)XB);
+         }
+
+         new_Vt[0] = newTemp( Ity_V128 );
+
+         assign( vT, getVSReg( XT ) );
+
+         assign( new_Vt[0], binop( Iop_64HLtoV128,
+                                   mkU64( 0x0 ), mkU64( 0x0 ) ) );
+         assign( perm_mask, binop( Iop_64HLtoV128,
+                                   mkU64( 0x0 ), mkU64( 0x1F ) ) );
+         assign( val_mask, binop( Iop_64HLtoV128,
+                                  mkU64( 0x0 ), mkU64( 0xFF ) ) );
+
+         /* For each permute index in XB, the permute list, select the byte
+          * from XA indexed by the permute index if the permute index is
+          * less than 16, otherwise select the byte of XT indexed by the
+          * lower four bits of the permute index.  Copy the selected byte
+          * to the destination location in the result.
+          */
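+         /* For example, if byte i of XB holds 0x03 the result byte i is
+          * byte 3 of XA; if it holds 0x13 it is byte 3 of XT (the original
+          * target register contents).  Bytes are numbered IBM-style, with
+          * byte 0 the most significant.
+          */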
+         for ( i = 0; i < 16; i++ ) {
+            perm_val_gt16[i] = newTemp( Ity_V128 );
+            perm_val[i] = newTemp( Ity_V128 );
+            perm_idx[i] = newTemp( Ity_I8 );
+            tmp_val[i]  = newTemp( Ity_V128 );
+            new_Vt[i+1]  = newTemp( Ity_V128 );
+
+            /* create mask to extract the permute index value from vB,
+             * store value in least significant bits of perm_val
+             */
+            if ( opc2 == 0x68 )
+               /* xxperm, the perm value is the index value in XB */
+               assign( perm_val[i], binop( Iop_ShrV128,
+                                           binop( Iop_AndV128,
+                                                  mkexpr(vB),
+                                                  binop( Iop_ShlV128,
+                                                         mkexpr( perm_mask ),
+                                                         mkU8( (15 - i) * 8 ) ) ),
+                                           mkU8( (15 - i) * 8 ) ) );
+
+            else
+               /* xxpermr, the perm value is 31 - index value in XB */
+               assign( perm_val[i],
+                       binop( Iop_Sub8x16,
+                              binop( Iop_64HLtoV128,
+                                     mkU64( 0 ), mkU64( 31 ) ),
+                              binop( Iop_ShrV128,
+                                     binop( Iop_AndV128,
+                                            mkexpr( vB ),
+                                            binop( Iop_ShlV128,
+                                                   mkexpr( perm_mask ),
+                                                   mkU8( ( 15 - i ) * 8 ) ) ),
+                                     mkU8( ( 15 - i ) * 8 ) ) ) );
+
+            /* Determine if perm_val[] >= 16.  If it is, then the value
+             * will come from xT, otherwise it comes from xA.  Either way,
+             * create the mask to get the value from the source using the
+             * lower 4 bits of perm_val[].  Create a 128 bit mask from bit 4
+             * of perm_val[] to be used to select from xT or xA.
+             */
+            assign( perm_val_gt16[i],
+                    binop(Iop_64HLtoV128,
+                          unop( Iop_1Sto64,
+                                unop( Iop_64to1,
+                                      unop( Iop_V128to64,
+                                            binop( Iop_ShrV128,
+                                                   mkexpr( perm_val[i] ),
+                                                   mkU8( 4 ) ) ) ) ),
+                          unop( Iop_1Sto64,
+                                unop( Iop_64to1,
+                                      unop( Iop_V128to64,
+                                            binop( Iop_ShrV128,
+                                                   mkexpr( perm_val[i] ),
+                                                   mkU8( 4 ) ) ) ) ) ) );
+
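+            /* perm_idx is the left-shift amount, in bits, that lines a
+             * 0xFF mask up with the selected source byte:
+             * (15 - (perm_val & 0xF)) * 8.
+             */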
+            assign( perm_idx[i],
+                    unop(Iop_32to8,
+                         binop( Iop_Mul32,
+                                binop( Iop_Sub32,
+                                       mkU32( 15 ),
+                                       unop( Iop_64to32,
+                                             binop( Iop_And64,
+                                                  unop( Iop_V128to64,
+                                                       mkexpr( perm_val[i] ) ),
+                                                  mkU64( 0xF ) ) ) ),
+                                mkU32( 8 ) ) ) );
+
+            dest_shift_amount = ( 15 - i )*8;
+
+            /* Use perm_val_gt16 to select value from vA or vT */
+            assign( tmp_val[i],
+                    binop( Iop_ShlV128,
+                           binop( Iop_ShrV128,
+                                  binop( Iop_OrV128,
+                                         binop( Iop_AndV128,
+                                                mkexpr( vA ),
+                                                binop( Iop_AndV128,
+                                                       unop( Iop_NotV128,
+                                                             mkexpr( perm_val_gt16[i] ) ),
+                                                       binop( Iop_ShlV128,
+                                                              mkexpr( val_mask ),
+                                                              mkexpr( perm_idx[i] ) ) ) ),
+                                         binop( Iop_AndV128,
+                                                mkexpr( vT ),
+                                                binop( Iop_AndV128,
+                                                       mkexpr( perm_val_gt16[i] ),
+                                                       binop( Iop_ShlV128,
+                                                              mkexpr( val_mask ),
+                                                              mkexpr( perm_idx[i] ) ) ) ) ),
+                                  mkexpr( perm_idx[i] ) ),
+                           mkU8( dest_shift_amount ) ) );
+
+            assign( new_Vt[i+1], binop( Iop_OrV128,
+                                       mkexpr( tmp_val[i] ),
+                                       mkexpr( new_Vt[i] ) ) );
+         }
+         putVSReg( XT, mkexpr( new_Vt[16] ) );
+         break;
+      }
+
       case 0x148: // xxspltw (VSX Splat Word)
       {
          UChar UIM   = ifieldRegA(theInstr) & 3;
@@ -15934,7 +22277,7 @@
             d = unsafeIRDirty_0_N (
                            0/*regparms*/,
                            "ppc64g_dirtyhelper_LVS",
-                           &ppc64g_dirtyhelper_LVS,
+                           fnptr_to_fnentry( vbi, &ppc64g_dirtyhelper_LVS ),
                            args_le );
       }
       DIP("lvsl v%d,r%u,r%u\n", vD_addr, rA_addr, rB_addr);
@@ -15984,7 +22327,7 @@
             d = unsafeIRDirty_0_N (
                            0/*regparms*/,
                            "ppc64g_dirtyhelper_LVS",
-                           &ppc64g_dirtyhelper_LVS,
+                           fnptr_to_fnentry( vbi, &ppc64g_dirtyhelper_LVS ),
                            args_le );
       }
       DIP("lvsr v%d,r%u,r%u\n", vD_addr, rA_addr, rB_addr);
@@ -16162,8 +22505,8 @@
    a7 = a6 = a5 = a4 = a3 = a2 = a1 = a0 = IRTemp_INVALID;
    b3 = b2 = b1 = b0 = IRTemp_INVALID;
 
-   assign( vA, getVReg(vA_addr));
-   assign( vB, getVReg(vB_addr));
+   assign( vA, getVReg( vA_addr ) );
+   assign( vB, getVReg( vB_addr ) );
 
    if (opc1 != 0x4) {
       vex_printf("dis_av_arith(ppc)(opc1 != 0x4)\n");
@@ -16175,10 +22518,10 @@
    case 0x180: { // vaddcuw (Add Carryout Unsigned Word, AV p136)
       DIP("vaddcuw v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr);
       /* unsigned_ov(x+y) = (y >u not(x)) */
-      putVReg( vD_addr, binop(Iop_ShrN32x4,
-                              binop(Iop_CmpGT32Ux4, mkexpr(vB),
-                                    unop(Iop_NotV128, mkexpr(vA))),
-                              mkU8(31)) );
+      putVReg( vD_addr, binop( Iop_ShrN32x4,
+                               binop( Iop_CmpGT32Ux4, mkexpr( vB ),
+                                    unop( Iop_NotV128, mkexpr( vA ) ) ),
+                              mkU8( 31 ) ) );
       break;
    }
    case 0x000: // vaddubm (Add Unsigned Byte Modulo, AV p141)
@@ -16750,18 +23093,110 @@
       assign( vD, binop(Iop_CmpEQ8x16, mkexpr(vA), mkexpr(vB)) );
       break;
 
+   case 0x007: // vcmpneb (Compare Not Equal byte)
+      DIP("vcmpneb%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                      vD_addr, vA_addr, vB_addr);
+      assign( vD, unop( Iop_NotV128,
+                        binop( Iop_CmpEQ8x16, mkexpr( vA ), mkexpr( vB ) ) ) );
+      break;
+
    case 0x046: // vcmpequh (Compare Equal-to Unsigned HW, AV p161)
       DIP("vcmpequh%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
                                       vD_addr, vA_addr, vB_addr);
       assign( vD, binop(Iop_CmpEQ16x8, mkexpr(vA), mkexpr(vB)) );
       break;
 
+   case 0x047: // vcmpneh (Compare Not Equal-to  Halfword)
+      DIP("vcmpneh%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                      vD_addr, vA_addr, vB_addr);
+      assign( vD, unop( Iop_NotV128,
+                        binop( Iop_CmpEQ16x8, mkexpr( vA ), mkexpr( vB ) ) ) );
+      break;
+
    case 0x086: // vcmpequw (Compare Equal-to Unsigned W, AV p162)
       DIP("vcmpequw%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
                                       vD_addr, vA_addr, vB_addr);
       assign( vD, binop(Iop_CmpEQ32x4, mkexpr(vA), mkexpr(vB)) );
       break;
 
+   case 0x087: // vcmpnew (Compare Not Equal-to Word)
+      DIP("vcmpnew%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                      vD_addr, vA_addr, vB_addr);
+      assign( vD, unop( Iop_NotV128,
+                        binop( Iop_CmpEQ32x4, mkexpr( vA ), mkexpr( vB ) ) ) );
+      break;
+
+   case 0x107: // vcmpnezb (Compare Not Equal or Zero byte)
+      {
+         IRTemp vAeqvB = newTemp( Ity_V128 );
+         IRTemp vAeq0  = newTemp( Ity_V128 );
+         IRTemp vBeq0  = newTemp( Ity_V128 );
+         IRTemp zero   = newTemp( Ity_V128 );
+
+         DIP("vcmpnezb%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                         vD_addr, vA_addr, vB_addr);
+
+         assign( zero, binop( Iop_64HLtoV128, mkU64( 0 ), mkU64( 0 ) ) );
+         assign( vAeq0, binop( Iop_CmpEQ8x16, mkexpr( vA ), mkexpr( zero ) ) );
+         assign( vBeq0, binop( Iop_CmpEQ8x16, mkexpr( vB ), mkexpr( zero ) ) );
+         assign( vAeqvB, unop( Iop_NotV128,
+                               binop( Iop_CmpEQ8x16, mkexpr( vA ),
+                                      mkexpr( vB ) ) ) );
+
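+         /* Each result byte is all ones when the corresponding bytes of
+          * vA and vB differ, or when either byte is zero.
+          */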
+         assign( vD, mkOr3_V128( vAeqvB, vAeq0, vBeq0  ) );
+      }
+      break;
+
+   case 0x147: // vcmpnezh (Compare Not Equal or Zero Halfword)
+      {
+         IRTemp vAeqvB = newTemp( Ity_V128 );
+         IRTemp vAeq0  = newTemp( Ity_V128 );
+         IRTemp vBeq0  = newTemp( Ity_V128 );
+         IRTemp zero   = newTemp( Ity_V128 );
+
+         DIP("vcmpnezh%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                         vD_addr, vA_addr, vB_addr);
+
+         assign( zero, binop( Iop_64HLtoV128, mkU64( 0 ), mkU64( 0 ) ) );
+         assign( vAeq0, binop( Iop_CmpEQ16x8, mkexpr( vA ), mkexpr( zero ) ) );
+         assign( vBeq0, binop( Iop_CmpEQ16x8, mkexpr( vB ), mkexpr( zero ) ) );
+         assign( vAeqvB, unop( Iop_NotV128,
+                               binop(Iop_CmpEQ16x8, mkexpr( vA ),
+                                     mkexpr( vB ) ) ) );
+
+         assign( vD, binop( Iop_OrV128,
+                            binop( Iop_OrV128,
+                                   mkexpr( vAeq0 ),
+                                   mkexpr( vBeq0 ) ),
+                            mkexpr( vAeqvB ) ) );
+      }
+      break;
+
+   case 0x187: // vcmpnezw (Compare Not Equal or Zero Word)
+      {
+         IRTemp vAeqvB = newTemp( Ity_V128 );
+         IRTemp vAeq0  = newTemp( Ity_V128 );
+         IRTemp vBeq0  = newTemp( Ity_V128 );
+         IRTemp zero   = newTemp( Ity_V128 );
+
+         DIP("vcmpnezw%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
+                                         vD_addr, vA_addr, vB_addr);
+
+         assign( zero, binop( Iop_64HLtoV128, mkU64( 0 ), mkU64( 0 ) ) );
+         assign( vAeq0, binop( Iop_CmpEQ32x4, mkexpr( vA ), mkexpr( zero ) ) );
+         assign( vBeq0, binop( Iop_CmpEQ32x4, mkexpr( vB ), mkexpr( zero ) ) );
+         assign( vAeqvB, unop( Iop_NotV128,
+                               binop(Iop_CmpEQ32x4, mkexpr( vA ),
+                                     mkexpr( vB ) ) ) );
+
+         assign( vD, binop( Iop_OrV128,
+                            binop( Iop_OrV128,
+                                   mkexpr( vAeq0 ),
+                                   mkexpr( vBeq0 ) ),
+                            mkexpr( vAeqvB ) ) );
+      }
+      break;
+
    case 0x0C7: // vcmpequd (Compare Equal-to Unsigned Doubleword)
       DIP("vcmpequd%s v%d,v%d,v%d\n", (flag_rC ? ".":""),
                                       vD_addr, vA_addr, vB_addr);
@@ -17122,22 +23557,22 @@
 
    switch (opc2) {
       /* Polynomial Multiply-Add */
-      case 0x408:  // vpmsumb   Vector Polynomial Multipy-sum Byte
+      case 0x408:  // vpmsumb   Vector Polynomial Multiply-sum Byte
          DIP("vpmsumb v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr);
          putVReg( vD_addr, binop(Iop_PolynomialMulAdd8x16,
                                  mkexpr(vA), mkexpr(vB)) );
          break;
-      case 0x448:  // vpmsumd   Vector Polynomial Multipy-sum Double Word
+      case 0x448:  // vpmsumd   Vector Polynomial Multiply-sum Double Word
          DIP("vpmsumd v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr);
          putVReg( vD_addr, binop(Iop_PolynomialMulAdd64x2,
                                  mkexpr(vA), mkexpr(vB)) );
          break;
-      case 0x488:  // vpmsumw   Vector Polynomial Multipy-sum Word
+      case 0x488:  // vpmsumw   Vector Polynomial Multiply-sum Word
          DIP("vpmsumw v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr);
          putVReg( vD_addr, binop(Iop_PolynomialMulAdd32x4,
                                  mkexpr(vA), mkexpr(vB)) );
          break;
-      case 0x4C8:  // vpmsumh   Vector Polynomial Multipy-sum Half Word
+      case 0x4C8:  // vpmsumh   Vector Polynomial Multiply-sum Half Word
          DIP("vpmsumh v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr);
          putVReg( vD_addr, binop(Iop_PolynomialMulAdd16x8,
                                  mkexpr(vA), mkexpr(vB)) );
@@ -17397,6 +23832,8 @@
       IRTemp vrc_a   = newTemp(Ity_V128);
       IRTemp vrc_b   = newTemp(Ity_V128);
 
+      DIP("vpermxor v%d,v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr, vC_addr);
+
       /* IBM index  is 0:7, Change index value to index 7:0 */
       assign( vrc_b, binop( Iop_AndV128, mkexpr( vC ),
                             unop( Iop_Dup8x16, mkU8( 0xF ) ) ) );
@@ -17410,6 +23847,112 @@
                                mkexpr( a_perm ), mkexpr( b_perm) ) );
       return True;
    }
+
+   case 0x3B: {  // vpermr (Vector Permute Right-indexed)
+      int i;
+      IRTemp new_Vt[17];
+      IRTemp tmp[16];
+      IRTemp index[16];
+      IRTemp index_gt16[16];
+      IRTemp mask[16];
+
+      DIP("vpermr v%d,v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr, vC_addr);
+
+      new_Vt[0] = newTemp( Ity_V128 );
+      assign( new_Vt[0], binop( Iop_64HLtoV128,
+                                mkU64( 0x0 ),
+                                mkU64( 0x0 ) ) );
+
+      for ( i = 0; i < 16; i++ ) {
+         index_gt16[i] = newTemp( Ity_V128 );
+         mask[i]  = newTemp( Ity_V128 );
+         index[i] = newTemp( Ity_I32 );
+         tmp[i]   = newTemp( Ity_V128 );
+         new_Vt[i+1] = newTemp( Ity_V128 );
+
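+         /* Extract the low 6 bits of control byte i of vC (byte 15 - i
+          * counting from the least significant byte) and compute
+          * index = 31 - value, since the permute is right-indexed.
+          */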
+         assign( index[i],
+                 binop( Iop_Sub32,
+                        mkU32( 31 ),
+                        unop( Iop_64to32,
+                              unop( Iop_V128to64,
+                                    binop( Iop_ShrV128,
+                                           binop( Iop_AndV128,
+                                                  binop( Iop_ShlV128,
+                                                         binop( Iop_64HLtoV128,
+                                                                mkU64( 0x0 ),
+                                                                mkU64( 0x3F ) ),
+                                                         mkU8(  (15 - i) * 8 ) ),
+                                                  mkexpr( vC ) ),
+                                           mkU8( (15 - i) * 8 ) ) ) ) ) );
+
+         /* Determine if index < 16: if so, the src byte is vA[index],
+          * otherwise it is vB[31-index].  Check whether the msb of the
+          * index is 1 or not.
+          */
+         assign( index_gt16[i],
+                 binop( Iop_64HLtoV128,
+                        unop( Iop_1Sto64,
+                              unop( Iop_32to1,
+                                    binop( Iop_Shr32,
+                                           mkexpr( index[i] ),
+                                           mkU8( 4 ) ) ) ),
+                        unop( Iop_1Sto64,
+                              unop( Iop_32to1,
+                                    binop( Iop_Shr32,
+                                           mkexpr( index[i] ),
+                                           mkU8( 4 ) ) ) ) ) );
+         assign( mask[i],
+                 binop( Iop_ShlV128,
+                        binop( Iop_64HLtoV128,
+                               mkU64( 0x0 ),
+                               mkU64( 0xFF ) ),
+                        unop( Iop_32to8,
+                              binop( Iop_Mul32,
+                                     binop( Iop_Sub32,
+                                            mkU32( 15 ),
+                                            binop( Iop_And32,
+                                                   mkexpr( index[i] ),
+                                                   mkU32( 0xF ) ) ),
+                                     mkU32( 8 ) ) ) ) );
+
+         /* Extract the indexed byte from vA and vB using the lower 4 bits
+          * of the index.  Then use the index_gt16 mask to select vA if the
+          * index < 16, or vB if the index > 15.  Put the selected byte in
+          * the least significant byte.
+          */
+         assign( tmp[i],
+                 binop( Iop_ShrV128,
+                        binop( Iop_OrV128,
+                               binop( Iop_AndV128,
+                                      binop( Iop_AndV128,
+                                             mkexpr( mask[i] ),
+                                             mkexpr( vA ) ),
+                                      unop( Iop_NotV128,
+                                            mkexpr( index_gt16[i] ) ) ),
+                               binop( Iop_AndV128,
+                                      binop( Iop_AndV128,
+                                             mkexpr( mask[i] ),
+                                             mkexpr( vB ) ),
+                                      mkexpr( index_gt16[i] ) ) ),
+                        unop( Iop_32to8,
+                              binop( Iop_Mul32,
+                                     binop( Iop_Sub32,
+                                            mkU32( 15 ),
+                                            binop( Iop_And32,
+                                                   mkexpr( index[i] ),
+                                                   mkU32( 0xF ) ) ),
+                                     mkU32( 8 ) ) ) ) );
+
+         /* Move the selected byte to the position to store in the result */
+         assign( new_Vt[i+1], binop( Iop_OrV128,
+                                     binop( Iop_ShlV128,
+                                            mkexpr( tmp[i] ),
+                                            mkU8(  (15 - i) * 8 ) ),
+                                     mkexpr( new_Vt[i] ) ) );
+         }
+      putVReg( vD_addr, mkexpr( new_Vt[16] ) );
+      return True;
+   }
+
    default:
      break; // Fall through...
    }
@@ -17454,6 +23997,241 @@
                binop(Iop_InterleaveLO32x4, mkexpr(vA), mkexpr(vB)) );
       break;
 
+   /* Extract instructions */
+   case 0x20D: // vextractub  (Vector Extract Unsigned Byte)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+
+      DIP("vextractub v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      putVReg( vD_addr, binop( Iop_ShlV128,
+                               binop( Iop_AndV128,
+                                      binop( Iop_ShrV128,
+                                             mkexpr( vB ),
+                                             unop( Iop_32to8,
+                                                   binop( Iop_Mul32,
+                                                          mkU32( 8 ),
+                                                          mkU32( 31 - uim ) ) ) ),
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0x0ULL ),
+                                             mkU64( 0xFFULL ) ) ),
+                               mkU8( 64 ) ) );
+   }
+   break;
+
+   case 0x24D: // vextractuh  (Vector Extract Unsigned Halfword)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+
+      DIP("vextractuh v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      putVReg( vD_addr, binop( Iop_ShlV128,
+                               binop( Iop_AndV128,
+                                      binop( Iop_ShrV128,
+                                             mkexpr( vB ),
+                                             unop( Iop_32to8,
+                                                   binop( Iop_Mul32,
+                                                          mkU32( 8 ),
+                                                          mkU32( 30 - uim ) ) ) ),
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0x0ULL ),
+                                             mkU64( 0xFFFFULL ) ) ),
+                               mkU8( 64 ) ) );
+   }
+   break;
+
+   case 0x28D: // vextractuw  (Vector Extract Unsigned Word)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+
+      DIP("vextractuw v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      putVReg( vD_addr,
+               binop( Iop_ShlV128,
+                      binop( Iop_AndV128,
+                             binop( Iop_ShrV128,
+                                    mkexpr( vB ),
+                                    unop( Iop_32to8,
+                                          binop( Iop_Mul32,
+                                                 mkU32( 8 ),
+                                                 mkU32( 28 - uim ) ) ) ),
+                             binop( Iop_64HLtoV128,
+                                    mkU64( 0x0ULL ),
+                                    mkU64( 0xFFFFFFFFULL ) ) ),
+                      mkU8( 64 ) ) );
+   }
+   break;
+
+   case 0x2CD: // vextractd  (Vector Extract Double Word)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+
+      DIP("vextractd v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      putVReg( vD_addr,
+               binop( Iop_ShlV128,
+                      binop( Iop_AndV128,
+                             binop( Iop_ShrV128,
+                                    mkexpr( vB ),
+                                    unop( Iop_32to8,
+                                          binop( Iop_Mul32,
+                                                 mkU32( 8 ),
+                                                 mkU32( 24 - uim ) ) ) ),
+                             binop( Iop_64HLtoV128,
+                                    mkU64( 0x0ULL ),
+                                    mkU64( 0xFFFFFFFFFFFFFFFFULL ) ) ),
+                      mkU8( 64 ) ) );
+   }
+   break;
+
+   /* Insert instructions */
+   case 0x30D:  // vinsertb  (Vector Insert Byte)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp vD = newTemp( Ity_V128 );
+
+      DIP("vinsertb v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      assign( vD, getVReg( vD_addr ) );
+
+      assign( shift, unop( Iop_32to8,
+                           binop( Iop_Mul32,
+                                  mkU32( 8 ),
+                                  mkU32( 15 - ( uim + 0 ) ) ) ) );
+
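+      /* Insert byte element 7 of vB into vD at byte offset uim; all
+       * other bytes of vD are unchanged.
+       */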
+      putVReg( vD_addr,
+               binop( Iop_OrV128,
+                      binop( Iop_ShlV128,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( ( 15 - 7 )*8 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0ULL ),
+                                           mkU64( 0xFFULL ) ) ),
+                             mkexpr( shift ) ),
+                      binop( Iop_AndV128,
+                             unop( Iop_NotV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0ULL ),
+                                                 mkU64( 0xFFULL ) ),
+                                          mkexpr( shift ) ) ),
+                             mkexpr( vD ) ) ) );
+   }
+   break;
+
+   case 0x34D: // vinserth  (Vector insert Halfword)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp vD = newTemp( Ity_V128 );
+
+      DIP("vinserth v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      assign( vD, getVReg( vD_addr ) );
+
+      assign( shift, unop( Iop_32to8,
+                           binop( Iop_Mul32,
+                                  mkU32( 8 ),
+                                  mkU32( 15 - ( uim + 1 ) ) ) ) );
+
+      putVReg( vD_addr,
+               binop( Iop_OrV128,
+                      binop( Iop_ShlV128,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( (7 - 3)*16 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0ULL ),
+                                           mkU64( 0xFFFFULL ) ) ),
+                             mkexpr( shift ) ),
+                      binop( Iop_AndV128,
+                             unop( Iop_NotV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0ULL ),
+                                                 mkU64( 0xFFFFULL ) ),
+                                          mkexpr( shift ) ) ),
+                             mkexpr( vD ) ) ) );
+   }
+   break;
+
+   case 0x38D: // vinsertw  (Vector insert Word)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp vD = newTemp( Ity_V128 );
+
+      DIP("vinsertw v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      assign( vD, getVReg( vD_addr ) );
+
+      assign( shift, unop( Iop_32to8,
+                           binop( Iop_Mul32,
+                                  mkU32( 8 ),
+                                  mkU32( 15 - ( uim + 3 ) ) ) ) );
+
+      putVReg( vD_addr,
+               binop( Iop_OrV128,
+                      binop( Iop_ShlV128,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( (3 - 1) * 32 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0ULL ),
+                                           mkU64( 0xFFFFFFFFULL ) ) ),
+                             mkexpr( shift ) ),
+                      binop( Iop_AndV128,
+                             unop( Iop_NotV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0ULL ),
+                                                 mkU64( 0xFFFFFFFFULL ) ),
+                                          mkexpr( shift ) ) ),
+                             mkexpr( vD ) ) ) );
+   }
+   break;
+
+   case 0x3CD: // vinsertd  (Vector insert Doubleword)
+   {
+      UChar uim = IFIELD( theInstr, 16, 4 );
+      IRTemp shift = newTemp( Ity_I8 );
+      IRTemp vD = newTemp( Ity_V128 );
+
+      DIP("vinsertd v%d,v%d,%d\n", vD_addr, vB_addr, uim);
+
+      assign( vD, getVReg( vD_addr ) );
+
+      assign( shift, unop( Iop_32to8,
+                           binop( Iop_Mul32,
+                                  mkU32( 8 ),
+                                  mkU32( 15 - ( uim + 7 ) ) ) ) );
+
+      putVReg( vD_addr,
+               binop( Iop_OrV128,
+                      binop( Iop_ShlV128,
+                             binop( Iop_AndV128,
+                                    binop( Iop_ShrV128,
+                                           mkexpr( vB ),
+                                           mkU8( ( 1 - 0 ) * 64 ) ),
+                                    binop( Iop_64HLtoV128,
+                                           mkU64( 0x0ULL ),
+                                           mkU64( 0xFFFFFFFFFFFFFFFFULL ) ) ),
+                             mkexpr( shift ) ),
+                      binop( Iop_AndV128,
+                             unop( Iop_NotV128,
+                                   binop( Iop_ShlV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0x0ULL ),
+                                                 mkU64( 0xFFFFFFFFFFFFFFFFULL ) ),
+                                          mkexpr( shift ) ) ),
+                             mkexpr( vD ) ) ) );
+   }
+   break;
 
    /* Splat */
    case 0x20C: { // vspltb (Splat Byte, AV p245)
@@ -17529,6 +24307,176 @@
 }
 
 /*
+  Vector Integer Absolute Difference
+*/
+static Bool dis_abs_diff ( UInt theInstr )
+{
+   /* VX-Form */
+   UChar opc1     = ifieldOPC( theInstr );
+   UChar vT_addr  = ifieldRegDS( theInstr );
+   UChar vA_addr  = ifieldRegA( theInstr );
+   UChar vB_addr  = ifieldRegB( theInstr );
+   UInt  opc2     = IFIELD( theInstr, 0, 11 );
+
+   IRTemp vA    = newTemp( Ity_V128 );
+   IRTemp vB    = newTemp( Ity_V128 );
+   IRTemp vT    = newTemp( Ity_V128 );
+
+   IRTemp vAminusB = newTemp( Ity_V128 );
+   IRTemp vBminusA = newTemp( Ity_V128 );
+   IRTemp vMask    = newTemp( Ity_V128 );
+
+   assign( vA, getVReg( vA_addr ) );
+   assign( vB, getVReg( vB_addr ) );
+
+   if ( opc1 != 0x4 ) {
+      vex_printf("dis_abs_diff(ppc)(instr)\n");
+      return False;
+   }
+
+   switch ( opc2 ) {
+   case 0x403: // vabsdub  Vector absolute difference Unsigned Byte
+   {
+      DIP("vabsdub v%d,v%d,v%d\n", vT_addr, vA_addr, vB_addr);
+
+      /* Determine which of the corresponding bytes is larger,
+       * create mask with 1's in byte positions where vA[i] > vB[i]
+       */
+      assign( vMask, binop( Iop_CmpGT8Ux16, mkexpr( vA ), mkexpr( vB ) ) );
+
+      assign( vAminusB,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub8x16, mkexpr( vA ), mkexpr( vB ) ),
+                     mkexpr( vMask ) ) );
+
+      assign( vBminusA,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub8x16, mkexpr( vB ), mkexpr( vA ) ),
+                     unop ( Iop_NotV128, mkexpr( vMask ) ) ) );
+
+      assign( vT, binop( Iop_OrV128,
+                         mkexpr( vAminusB ),
+                         mkexpr( vBminusA ) ) );
+   }
+   break;
+
+   case 0x443: // vabsduh  Vector absolute difference Unsigned Halfword
+   {
+      DIP("vabsduh v%d,v%d,v%d\n", vT_addr, vA_addr, vB_addr);
+
+      /* Determine which of the corresponding halfwords is larger,
+       * create mask with 1's in halfword positions where vA[i] > vB[i]
+       */
+      assign( vMask, binop( Iop_CmpGT16Ux8, mkexpr( vA ), mkexpr( vB ) ) );
+
+      assign( vAminusB,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub16x8, mkexpr( vA ), mkexpr( vB ) ),
+                     mkexpr( vMask ) ) );
+
+      assign( vBminusA,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub16x8, mkexpr( vB ), mkexpr( vA ) ),
+                     unop ( Iop_NotV128, mkexpr( vMask ) ) ) );
+
+      assign( vT, binop( Iop_OrV128,
+                         mkexpr( vAminusB ),
+                         mkexpr( vBminusA ) ) );
+   }
+   break;
+
+   case 0x483: // vabsduw  Vector absolute difference Unsigned Word
+   {
+      DIP("vabsduw v%d,v%d,v%d\n", vT_addr, vA_addr, vB_addr);
+
+      /* Determine which of the corresponding words is larger,
+       * create mask with 1's in word positions where vA[i] > vB[i]
+       */
+      assign( vMask, binop( Iop_CmpGT32Ux4, mkexpr( vA ), mkexpr( vB ) ) );
+
+      assign( vAminusB,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub32x4, mkexpr( vA ), mkexpr( vB ) ),
+                     mkexpr( vMask ) ) );
+
+      assign( vBminusA,
+              binop( Iop_AndV128,
+                     binop( Iop_Sub32x4, mkexpr( vB ), mkexpr( vA ) ),
+                     unop ( Iop_NotV128, mkexpr( vMask ) ) ) );
+
+      assign( vT, binop( Iop_OrV128,
+                         mkexpr( vAminusB ),
+                         mkexpr( vBminusA ) ) );
+   }
+   break;
+
+   default:
+      return False;
+   }
+
+   putVReg( vT_addr, mkexpr( vT ) );
+
+   return True;
+}
+
+/*
+  AltiVec 128 bit integer multiply by 10 Instructions
+*/
+static Bool dis_av_mult10 ( UInt theInstr )
+{
+   /* VX-Form */
+   UChar opc1     = ifieldOPC(theInstr);
+   UChar vT_addr  = ifieldRegDS(theInstr);
+   UChar vA_addr  = ifieldRegA(theInstr);
+   UChar vB_addr  = ifieldRegB(theInstr);
+   UInt  opc2     = IFIELD( theInstr, 0, 11 );
+
+   IRTemp vA    = newTemp(Ity_V128);
+   assign( vA, getVReg(vA_addr));
+
+   if (opc1 != 0x4) {
+      vex_printf("dis_av_mult10(ppc)(instr)\n");
+      return False;
+   }
+   switch (opc2) {
+   case 0x001: { // vmul10cuq (Vector Multiply-by-10 and write carry
+                 //            Unsigned Quadword VX form)
+      DIP("vmul10cuq v%d,v%d\n", vT_addr, vA_addr);
+      putVReg( vT_addr,
+               unop( Iop_MulI128by10Carry, mkexpr( vA ) ) );
+      break;
+   }
+   case 0x041: { // vmul10ecuq (Vector Multiply-by-10 Extended and write carry
+                 //             Unsigned Quadword VX form)
+      IRTemp vB    = newTemp(Ity_V128);
+      assign( vB, getVReg(vB_addr));
+      DIP("vmul10ecuq v%d,v%d,v%d\n", vT_addr, vA_addr, vB_addr);
+      putVReg( vT_addr,
+               binop( Iop_MulI128by10ECarry, mkexpr( vA ), mkexpr( vB ) ) );
+      break;
+   }
+   case 0x201: { // vmul10uq (Vector Multiply-by-10 Unsigned Quadword VX form)
+      DIP("vmul10uq v%d,v%d\n", vT_addr, vA_addr);
+      putVReg( vT_addr,
+               unop( Iop_MulI128by10, mkexpr( vA ) ) );
+      break;
+   }
+   case 0x241: { // vmul10euq (Vector Multiply-by-10 Extended
+                 //            Unsigned Quadword VX form)
+      IRTemp vB    = newTemp(Ity_V128);
+      assign( vB, getVReg(vB_addr));
+      DIP("vmul10euq v%d,v%d,v%d\n", vT_addr, vA_addr, vB_addr);
+      putVReg( vT_addr,
+               binop( Iop_MulI128by10E, mkexpr( vA ), mkexpr( vB ) ) );
+      break;
+   }
+   default:
+      vex_printf("dis_av_mult10(ppc)(opc2)\n");
+      return False;
+   }
+   return True;
+}
+
+/*
   AltiVec Pack/Unpack Instructions
 */
 static Bool dis_av_pack ( UInt theInstr )
@@ -17968,6 +24916,7 @@
    IRTemp _vecA_32   = IRTemp_INVALID;
    IRTemp _vecB_32   = IRTemp_INVALID;
    IRTemp res_32     = IRTemp_INVALID;
+   IRTemp res_64     = IRTemp_INVALID;
    IRTemp result     = IRTemp_INVALID;
    IRTemp tmp_result = IRTemp_INVALID;
    IRTemp carry      = IRTemp_INVALID;
@@ -17977,10 +24926,15 @@
    IRExpr * _vecA_high64 = unop( Iop_V128HIto64, vecA );
    IRExpr * _vecB_high64 = unop( Iop_V128HIto64, vecB );
 
+   carry = newTemp(Ity_I32);
+   assign( carry, cin );
+
    for (i = 0; i < 4; i++) {
       _vecA_32 = newTemp(Ity_I32);
       _vecB_32 = newTemp(Ity_I32);
       res_32   = newTemp(Ity_I32);
+      res_64   = newTemp(Ity_I64);
+
       switch (i) {
       case 0:
          assign(_vecA_32, unop( Iop_64to32, _vecA_low64 ) );
@@ -18000,13 +24954,25 @@
          break;
       }
 
-      assign(res_32, binop( Iop_Add32,
-                            binop( Iop_Add32,
-                                   binop ( Iop_Add32,
-                                           mkexpr(_vecA_32),
-                                           mkexpr(_vecB_32) ),
-                                   (i == 0) ? mkU32(0) : mkexpr(carry) ),
-                            (i == 0) ? cin : mkU32(0) ) );
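+      /* Do the 32-bit addition in 64 bits so the carry out of bit 31
+       * is available in the upper half of res_64.
+       */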
+      assign( res_64, binop( Iop_Add64,
+                             binop ( Iop_Add64,
+                                     binop( Iop_32HLto64,
+                                            mkU32( 0 ),
+                                            mkexpr(_vecA_32) ),
+                                     binop( Iop_32HLto64,
+                                            mkU32( 0 ),
+                                            mkexpr(_vecB_32) ) ),
+                             binop( Iop_32HLto64,
+                                    mkU32( 0 ),
+                                    mkexpr( carry ) ) ) );
+
+      /* Calculate the carry to the next higher 32 bits. */
+      carry = newTemp(Ity_I32);
+      assign(carry, unop( Iop_64HIto32, mkexpr( res_64 ) ) );
+
+      /* result is the lower 32-bits */
+      assign(res_32, unop( Iop_64to32, mkexpr( res_64 ) ) );
+
       if (modulo) {
          result = newTemp(Ity_V128);
          assign(result, binop( Iop_OrV128,
@@ -18023,10 +24989,6 @@
          tmp_result = newTemp(Ity_V128);
          assign(tmp_result, mkexpr(result));
       }
-      carry = newTemp(Ity_I32);
-      assign(carry, unop(Iop_1Uto32, binop( Iop_CmpLT32U,
-                                            mkexpr(res_32),
-                                            mkexpr(_vecA_32 ) ) ) );
    }
    if (modulo)
       return result;
@@ -18229,6 +25191,48 @@
    return True;
 }
 
+static IRExpr * bcd_sign_code_adjust( UInt ps, IRExpr * tmp)
+{
+   /* The Iop_BCDAdd and Iop_BCDSub will result in the corresponding Power PC
+    * instruction being issued with ps = 0.  If ps = 1, the sign code, which
+    * is in the least significant four bits of the result, needs to be updated
+    * per the ISA:
+    *
+    *    If PS=0, the sign code of the result is set to 0b1100.
+    *    If PS=1, the sign code of the result is set to 0b1111.
+    *
+    * Note, the ps value is NOT being passed down to the instruction issue
+    * because passing a constant via triop() breaks the vbit-test test.  The
+    * vbit-tester assumes it can set non-zero shadow bits for the triop()
+    * arguments, so they have to be expressions rather than constants.
+    * Use 32-bit compare instructions, as 64-bit compares are not supported
+    * in 32-bit mode.
+    */
+   IRTemp mask  = newTemp(Ity_I64);
+   IRExpr *rtn;
+
+   if ( ps == 0 ) {
+      /* sign code is correct, just return it.  */
+      rtn = tmp;
+
+   } else {
+      /* Check if lower four bits are 0b1100, if so, change to 0b1111 */
+      /* Make this work in 32-bit mode using only 32-bit compares */
+      assign( mask, unop( Iop_1Sto64,
+                          binop( Iop_CmpEQ32, mkU32( 0xC ),
+                                 binop( Iop_And32, mkU32( 0xF ),
+                                        unop( Iop_64to32,
+                                              unop( Iop_V128to64, tmp )
+                                              ) ) ) ) );
+      rtn = binop( Iop_64HLtoV128,
+                   unop( Iop_V128HIto64, tmp ),
+                   binop( Iop_Or64,
+                          binop( Iop_And64, mkU64( 0xF ), mkexpr( mask ) ),
+                          unop( Iop_V128to64, tmp ) ) );
+   }
+
+   return rtn;
+}
 
 /*
   AltiVec BCD Arithmetic instructions.
@@ -18237,7 +25241,123 @@
   except when an overflow occurs.  But since we can't be 100% accurate
   in our emulation of CR6, it seems best to just not support it all.
 */
-static Bool dis_av_bcd ( UInt theInstr )
+static Bool dis_av_bcd_misc ( UInt theInstr, const VexAbiInfo* vbi )
+{
+   UChar opc1     = ifieldOPC(theInstr);
+   UChar vRT_addr = ifieldRegDS(theInstr);
+   UChar vRA_addr = ifieldRegA(theInstr);
+   UChar vRB_addr = ifieldRegB(theInstr);
+   IRTemp vA      = newTemp(Ity_V128);
+   IRTemp vB      = newTemp(Ity_V128);
+   UInt  opc2     = IFIELD( theInstr, 0, 11 );
+   IRExpr *pos, *neg, *valid, *zero, *sign;
+   IRTemp eq_lt_gt = newTemp( Ity_I32 );
+
+   assign( vA, getVReg(vRA_addr));
+   assign( vB, getVReg(vRB_addr));
+
+   if (opc1 != 0x4) {
+      vex_printf("dis_av_bcd_misc(ppc)(instr)\n");
+      return False;
+   }
+
+   switch (opc2) {
+   case 0x341: // bcdcpsgn. Decimal Copy Sign VX-form
+      {
+         IRExpr *sign_vb, *value_va;
+         DIP("bcdcpsgn. v%d,v%d,v%d\n", vRT_addr, vRA_addr, vRB_addr);
+
+         zero =
+            BCDstring_zero( binop( Iop_AndV128,
+                                   binop( Iop_64HLtoV128,
+                                          mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                          mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                   mkexpr( vA ) ) );
+
+         /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+          * codes 0xB and 0xD are negative.
+          */
+         sign = binop( Iop_And64, mkU64( 0xF ),
+                       unop( Iop_V128to64, mkexpr( vB ) ) );
+
+         neg = mkOR1( binop( Iop_CmpEQ64,
+                             sign,
+                             mkU64 ( 0xB ) ),
+                      binop( Iop_CmpEQ64,
+                             sign,
+                             mkU64 ( 0xD ) ) );
+
+         pos = mkNOT1( neg );
+
+         /* invalid if vA or vB is not valid */
+         valid =
+            unop( Iop_64to32,
+                  binop( Iop_And64,
+                         is_BCDstring128( vbi,
+                                          /*Signed*/True, mkexpr( vA ) ),
+                         is_BCDstring128( vbi,
+                                          /*Signed*/True, mkexpr( vB ) ) ) );
+
+         sign_vb = binop( Iop_AndV128,
+                          binop( Iop_64HLtoV128,
+                                 mkU64( 0 ),
+                                 mkU64( 0xF ) ),
+                          mkexpr( vB ) );
+
+         value_va = binop( Iop_AndV128,
+                           binop( Iop_64HLtoV128,
+                                  mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                  mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                           mkexpr( vA ) );
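+         /* The result is the digits of vA combined with the sign code
+          * of vB.
+          */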
+         putVReg( vRT_addr, binop( Iop_OrV128, sign_vb, value_va ) );
+      }
+      break;
+
+   default:
+      vex_printf("dis_av_bcd_misc(ppc)(opc2)\n");
+      return False;
+   }
+
+   /* set CR field 6 to:
+    *    0b1000  if vB is less than 0, i.e. vB is neg and not zero,
+    *    0b0100  if vB is greater than 0, i.e. vB is pos and not zero,
+    *    0b0010  if vB equals 0,
+    *    0b0001  if vB is invalid; overrules lt, gt, eq
+    */
+   assign( eq_lt_gt,
+           binop( Iop_Or32,
+                  binop( Iop_Shl32,
+                         unop( Iop_1Uto32,
+                               mkAND1( neg,
+                                       mkNOT1( zero ) ) ),
+                         mkU8( 3 ) ),
+                  binop( Iop_Or32,
+                         binop( Iop_Shl32,
+                                unop( Iop_1Uto32,
+                                      mkAND1( pos,
+                                              mkNOT1( zero ) ) ),
+                                mkU8( 2 ) ),
+                         binop( Iop_Shl32,
+                                unop( Iop_1Uto32, zero ),
+                                mkU8( 1 ) ) ) ) );
+
+   IRTemp valid_mask = newTemp( Ity_I32 );
+
+   assign( valid_mask, unop( Iop_1Sto32, unop( Iop_32to1, valid ) ) );
+
+   putGST_field( PPC_GST_CR,
+                 binop( Iop_Or32,
+                        binop( Iop_And32,
+                               mkexpr( valid_mask ),
+                               mkexpr( eq_lt_gt ) ),
+                        binop( Iop_And32,
+                               unop( Iop_Not32, mkexpr( valid_mask ) ),
+                               mkU32( 1 ) ) ),
+                 6 );
+   return True;
+}
+
+static Bool dis_av_bcd ( UInt theInstr, const VexAbiInfo* vbi )
 {
    /* VX-Form */
    UChar opc1     = ifieldOPC(theInstr);
@@ -18246,10 +25366,13 @@
    UChar vRB_addr = ifieldRegB(theInstr);
    UChar ps       = IFIELD( theInstr, 9, 1 );
    UInt  opc2     = IFIELD( theInstr, 0, 9 );
-
    IRTemp vA    = newTemp(Ity_V128);
    IRTemp vB    = newTemp(Ity_V128);
    IRTemp dst    = newTemp(Ity_V128);
+   IRExpr *pos, *neg, *valid, *zero, *sign_digit, *in_range;
+   IRTemp eq_lt_gt = newTemp( Ity_I32 );
+   IRExpr *overflow, *value;
+
    assign( vA, getVReg(vRA_addr));
    assign( vB, getVReg(vRB_addr));
 
@@ -18259,24 +25382,1165 @@
    }
 
    switch (opc2) {
-   case 0x1:  // bcdadd
-     DIP("bcdadd. v%d,v%d,v%d,%u\n", vRT_addr, vRA_addr, vRB_addr, ps);
-     assign( dst, triop( Iop_BCDAdd, mkexpr( vA ),
-                         mkexpr( vB ), mkU8( ps ) ) );
-     putVReg( vRT_addr, mkexpr(dst));
-     return True;
+   case 0x1:   // bcdadd.
+   case 0x41:  // bcdsub.
+      {
+         /* NOTE 64 bit compares are not supported in 32-bit mode.  Use
+          * 32-bit compares only.
+          */
 
-   case 0x41:  // bcdsub
-     DIP("bcdsub. v%d,v%d,v%d,%u\n", vRT_addr, vRA_addr, vRB_addr, ps);
-     assign( dst, triop( Iop_BCDSub, mkexpr( vA ),
-                         mkexpr( vB ), mkU8( ps ) ) );
-     putVReg( vRT_addr, mkexpr(dst));
-     return True;
+         IRExpr *sign, *res_smaller;
+         IRExpr *signA, *signB, *sign_digitA, *sign_digitB;
+         IRExpr *zeroA, *zeroB, *posA, *posB, *negA, *negB;
+
+         if ( opc2 == 0x1 ) {
+            DIP("bcdadd. v%d,v%d,v%d,%u\n", vRT_addr, vRA_addr, vRB_addr, ps);
+            assign( dst, bcd_sign_code_adjust( ps,
+                                               binop( Iop_BCDAdd,
+                                                      mkexpr( vA ),
+                                                      mkexpr( vB ) ) ) );
+         } else {
+            DIP("bcdsub. v%d,v%d,v%d,%u\n", vRT_addr, vRA_addr, vRB_addr, ps);
+            assign( dst, bcd_sign_code_adjust( ps,
+                                               binop( Iop_BCDSub,
+                                                      mkexpr( vA ),
+                                                      mkexpr( vB ) ) ) );
+         }
+
+         putVReg( vRT_addr, mkexpr( dst ) );
+         /* set CR field 6 */
+         /* result */
+         zero = BCDstring_zero( binop( Iop_AndV128,
+                                       binop( Iop_64HLtoV128,
+                                              mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                              mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                       mkexpr(dst) ) );  // ignore sign
+
+         sign_digit = binop( Iop_And32, mkU32( 0xF ),
+                             unop( Iop_64to32,
+                                   unop( Iop_V128to64, mkexpr( dst ) ) ) );
+
+         sign = mkOR1( binop( Iop_CmpEQ32,
+                              sign_digit,
+                              mkU32 ( 0xB ) ),
+                       binop( Iop_CmpEQ32,
+                              sign_digit,
+                              mkU32 ( 0xD ) ) );
+         neg = mkAND1( sign, mkNOT1( zero ) );
+
+         /* Pos position AKA gt = 1 if ((not neg) & (not eq zero)) */
+         pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+         valid =  unop( Iop_64to32,
+                        binop( Iop_And64,
+                               is_BCDstring128( vbi,
+                                                /*Signed*/True, mkexpr( vA ) ),
+                               is_BCDstring128( vbi,
+                                                /*Signed*/True, mkexpr( vB ) )
+                                                ) );
+
+         /* src A */
+         zeroA = BCDstring_zero( binop( Iop_AndV128,
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                               mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                        mkexpr( vA ) ) );  // ignore sign
+         sign_digitA = binop( Iop_And32, mkU32( 0xF ),
+                              unop( Iop_64to32,
+                                    unop( Iop_V128to64, mkexpr( vA ) ) ) );
+
+         signA = mkOR1( binop( Iop_CmpEQ32,
+                               sign_digitA,
+                               mkU32 ( 0xB ) ),
+                        binop( Iop_CmpEQ32,
+                               sign_digitA,
+                               mkU32 ( 0xD ) ) );
+         negA = mkAND1( signA, mkNOT1( zeroA ) );
+         /* Pos position AKA gt = 1 if ((not neg) & (not eq zero)) */
+         posA = mkAND1( mkNOT1( signA ), mkNOT1( zeroA ) );
+
+         /* src B */
+         zeroB = BCDstring_zero( binop( Iop_AndV128,
+                                        binop( Iop_64HLtoV128,
+                                               mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                               mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                        mkexpr( vB ) ) );  // ignore sign
+         sign_digitB = binop( Iop_And32, mkU32( 0xF ),
+                              unop( Iop_64to32,
+                                    unop( Iop_V128to64, mkexpr( vB ) ) ) );
+
+         signB = mkOR1( binop( Iop_CmpEQ32,
+                               sign_digitB,
+                               mkU32 ( 0xB ) ),
+                        binop( Iop_CmpEQ32,
+                               sign_digitB,
+                               mkU32 ( 0xD ) ) );
+         negB = mkAND1( signB, mkNOT1( zeroB ) );
+
+
+         /* Pos position AKA gt = 1 if ((not neg) & (not eq zero)) */
+         posB = mkAND1( mkNOT1( signB ), mkNOT1( zeroB ) );
+
+
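+         /* res_smaller is true if the result is unsigned less than both
+          * operands; it is used below to detect overflow.
+          */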
+         if (mode64) {
+            res_smaller = mkAND1( CmpGT128U( mkexpr( vA ), mkexpr( dst ) ),
+                                  CmpGT128U( mkexpr( vB ), mkexpr( dst ) ) );
+
+         } else {
+            /* Have to do this with 32-bit compares, expensive */
+            res_smaller = mkAND1( UNSIGNED_CMP_GT_V128( mkexpr( vA ),
+                                                        mkexpr( dst ) ),
+                                  UNSIGNED_CMP_GT_V128( mkexpr( vB ),
+                                                        mkexpr( dst ) ) );
+         }
+
+         if ( opc2 == 0x1) {
+            /* Overflow for Add can only occur if the signs of the operands
+             * are the same and the two operands are non-zero.  On overflow,
+             * the PPC hardware produces a result consisting of just the lower
+             * digits of the result.  So, if the result is less than both
+             * operands and the signs of the operands are the same, overflow
+             * occurred.
+             */
+            overflow = mkOR1( mkAND1( res_smaller, mkAND1( negA, negB ) ),
+                              mkAND1( res_smaller, mkAND1( posA, posB ) ) );
+         } else {
+            /* Overflow for Subtract can only occur if the signs of the
+             * operands are different and the two operands are non-zero.  On
+             * overflow, the PPC hardware produces a result consisting of just
+             * the lower digits of the result.  So, if the result is less than
+             * both operands and the signs of the operands are different,
+             * overflow occurred.
+             */
+            overflow = mkOR1( mkAND1( res_smaller, mkAND1( negA, posB ) ),
+                              mkAND1( res_smaller, mkAND1( posA, negB ) ) );
+         }
+      }
+     break;
+
+   case 0x081: // bcdus.  Decimal Unsigned Shift VX-form
+   case 0x0C1: // bcds.   Decimal Shift VX-form
+   case 0x1C1: // bcdsr.  Decimal Shift and Round VX-form
+      {
+         IRExpr *shift_dir;
+         IRExpr *shift_mask, *result, *new_sign_val, *sign;
+         IRExpr *not_excess_shift, *not_excess_shift_mask;
+         IRTemp shift_dir_mask = newTemp( Ity_I64 );
+         IRTemp shift_by = newTemp( Ity_I64 );
+         IRTemp shift_field = newTemp( Ity_I64 );
+         IRTemp shifted_out = newTemp( Ity_V128 );
+         IRTemp value_shl = newTemp( Ity_V128 );
+         IRTemp value_shr = newTemp( Ity_V128 );
+         IRTemp round = newTemp( Ity_I32);
+
+         ULong value_mask_low = 0;
+         UInt max_shift = 0;
+
+         if (opc2 == 0x0C1) {
+            DIP("bcds. v%d,v%d,v%d,%d\n", vRT_addr, vRA_addr, vRB_addr, ps);
+            value_mask_low = 0xFFFFFFFFFFFFFFF0;
+            max_shift = 30 * 4; /* maximum without shifting all digits out */
+
+         }  else if (opc2 == 0x1C1) {
+            DIP("bcdsr. v%d,v%d,v%d,%d\n", vRT_addr, vRA_addr, vRB_addr, ps);
+
+            value_mask_low = 0xFFFFFFFFFFFFFFF0;
+            max_shift = 30 * 4; /* maximum without shifting all digits out */
+
+         } else {
+            DIP("bcdus. v%d,v%d,v%d,%d\n", vRT_addr, vRA_addr,
+                vRB_addr, ps);
+            value_mask_low = 0xFFFFFFFFFFFFFFFF;
+            max_shift = 31 * 4; /* maximum without shifting all digits out */
+         }
+
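+         /* value is vB with the sign nibble masked off for bcds. and
+          * bcdsr.; for bcdus. there is no sign nibble to remove.
+          */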
+         value = binop( Iop_AndV128,
+                        binop( Iop_64HLtoV128,
+                               mkU64( 0xFFFFFFFFFFFFFFFF ),
+                               mkU64( value_mask_low ) ),
+                        mkexpr( vB ) );
+
+         zero = BCDstring_zero( value );
+
+         /* Shift field is 2's complement value */
+         assign( shift_field, unop( Iop_V128to64,
+                                    binop( Iop_ShrV128,
+                                           binop( Iop_AndV128,
+                                                  binop( Iop_64HLtoV128,
+                                                         mkU64( 0xFF ),
+                                                         mkU64( 0x0) ),
+                                                  mkexpr( vA ) ),
+                                           mkU8( 64 ) ) ) );
+
+         /* if shift_dir = 0 shift left, otherwise shift right */
+         shift_dir = binop( Iop_CmpEQ64,
+                            binop( Iop_Shr64,
+                                   mkexpr( shift_field ),
+                                   mkU8( 7 ) ),
+                            mkU64( 1 ) );
+
+         assign( shift_dir_mask, unop( Iop_1Sto64, shift_dir ) );
+
+         /* Shift field is stored in 2's complement form */
+         assign(shift_by,
+                binop( Iop_Mul64,
+                       binop( Iop_Or64,
+                              binop( Iop_And64,
+                                     unop( Iop_Not64,
+                                           mkexpr( shift_dir_mask ) ),
+                                     mkexpr( shift_field ) ),
+                              binop( Iop_And64,
+                                     mkexpr( shift_dir_mask ),
+                                     binop( Iop_And64,
+                                            binop( Iop_Add64,
+                                                   mkU64( 1 ),
+                                                   unop( Iop_Not64,
+                                                         mkexpr( shift_field ) ) ),
+                                            mkU64( 0xFF ) ) ) ),
+                       mkU64( 4 ) ) );
+
+         /* If the shift exceeds 128 bits, we need to force the result
+          * to zero because the Valgrind shift amount is only 7 bits.
+          * Otherwise, we would get a shift amount of mod(shift_by, 127).
+          */
+         not_excess_shift = unop( Iop_1Sto64,
+                                  binop( Iop_CmpLE64U,
+                                         mkexpr( shift_by ),
+                                         mkU64( max_shift ) ) );
+
+         not_excess_shift_mask = binop( Iop_64HLtoV128,
+                                        not_excess_shift,
+                                        not_excess_shift );
+
+         assign( value_shl,
+                 binop( Iop_ShlV128, value, unop( Iop_64to8,
+                                                  mkexpr( shift_by) ) ) );
+         assign( value_shr,
+                 binop( Iop_AndV128,
+                        binop( Iop_64HLtoV128,
+                               mkU64( 0xFFFFFFFFFFFFFFFF ),
+                               mkU64( value_mask_low) ),
+                        binop( Iop_ShrV128,
+                               value,
+                               unop( Iop_64to8,
+                                     mkexpr( shift_by ) )  ) ) );
+
+         /* Overflow occurs if the shift amount is greater than zero, the
+          * operation is a left shift and any non-zero digits are left
+          * shifted out.
+          */
+         assign( shifted_out,
+                 binop( Iop_OrV128,
+                        binop( Iop_ShrV128,
+                               value,
+                               unop( Iop_64to8,
+                                     binop( Iop_Sub64,
+                                            mkU64( 32*4 ),
+                                            mkexpr( shift_by ) ) ) ),
+                        binop( Iop_AndV128,
+                               unop( Iop_NotV128,
+                                     not_excess_shift_mask ),
+                               value ) ) );
+
+         overflow = mkAND1( mkNOT1( BCDstring_zero( mkexpr( shifted_out ) ) ),
+                            mkAND1( mkNOT1( shift_dir ),
+                                    binop( Iop_CmpNE64,
+                                           mkexpr( shift_by ),
+                                           mkU64( 0 ) ) ) );
+
+         if ((opc2 == 0xC1) || (opc2 == 0x1C1)) {
+            /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+             * codes 0xB and 0xD are negative.
+             */
+            sign_digit = binop( Iop_And64, mkU64( 0xF ),
+                                unop( Iop_V128to64, mkexpr( vB ) ) );
+
+            sign = mkOR1( binop( Iop_CmpEQ64,
+                                 sign_digit,
+                                 mkU64 ( 0xB ) ),
+                          binop( Iop_CmpEQ64,
+                                 sign_digit,
+                                 mkU64 ( 0xD ) ) );
+            neg = mkAND1( sign, mkNOT1( zero ) );
+
+            /* Pos position AKA gt = 1 if ((not neg) & (not eq zero)) */
+            pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+            valid =
+               unop( Iop_64to32,
+                     is_BCDstring128( vbi, /* Signed */True, mkexpr( vB ) ) );
+
+         } else {
+            /* string is an unsigned BCD value */
+            pos = mkU1( 1 );
+            neg = mkU1( 0 );
+            sign = mkU1( 0 );
+
+            valid =
+               unop( Iop_64to32,
+                     is_BCDstring128( vbi, /* Unsigned */False,
+                                      mkexpr( vB ) ) );
+         }
+
+         /* if PS = 0
+                  vB positive, sign is C
+                  vB negative, sign is D
+            if PS = 1
+                  vB positive, sign is F
+                  vB negative, sign is D
+            Note can't use pos or neg here since they are ANDed with zero,
+            use sign instead.
+         */
+         if (ps == 0) {
+            new_sign_val = binop( Iop_Or64,
+                                  unop( Iop_1Uto64, sign ),
+                                  mkU64( 0xC ) );
+
+         } else {
+            new_sign_val = binop( Iop_Xor64,
+                                  binop( Iop_Shl64,
+                                         unop( Iop_1Uto64, sign ),
+                                         mkU8( 1 ) ),
+                                  mkU64( 0xF ) );
+         }
+
+         shift_mask = binop( Iop_64HLtoV128,
+                             unop( Iop_1Sto64, shift_dir ),
+                             unop( Iop_1Sto64, shift_dir ) );
+
+         result = binop( Iop_OrV128,
+                         binop( Iop_AndV128, mkexpr( value_shr ), shift_mask ),
+                         binop( Iop_AndV128,
+                                mkexpr( value_shl ),
+                                unop( Iop_NotV128, shift_mask ) ) );
+
+         if (opc2 == 0xC1) {    // bcds.
+            putVReg( vRT_addr, binop( Iop_OrV128,
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0 ),
+                                             new_sign_val ),
+                                      binop( Iop_AndV128,
+                                             not_excess_shift_mask,
+                                             result ) ) );
+         } else if (opc2 == 0x1C1) {  //bcdsr.
+            /* If shifting right, need to round up if needed */
+            assign( round, unop( Iop_1Uto32,
+                                 mkAND1( shift_dir,
+                                         check_BCD_round( value,
+                                                          shift_by ) ) ) );
+
+            putVReg( vRT_addr,
+                     binop( Iop_OrV128,
+                            binop( Iop_64HLtoV128,
+                                   mkU64( 0 ),
+                                   new_sign_val ),
+                            binop( Iop_AndV128,
+                                   not_excess_shift_mask,
+                                   mkexpr( increment_BCDstring( vbi, result,
+                                                                mkexpr( round)
+                                                                ) ) ) ) );
+         } else {  // bcdus.
+            putVReg( vRT_addr, binop( Iop_AndV128,
+                                      not_excess_shift_mask,
+                                      result ) );
+         }
+      }
+      break;
+
+   case 0x101:  // bcdtrunc.   Decimal Truncate VX-form
+   case 0x141:  // bcdutrunc.  Decimal Unsigned Truncate VX-form
+      {
+         IRTemp length = newTemp( Ity_I64 );
+         IRTemp masked_out = newTemp( Ity_V128 );
+         IRExpr *new_sign_val, *result, *shift;
+         IRExpr *length_neq_128, *sign;
+         ULong value_mask_low;
+         Int max_digits;
+
+         if ( opc2 == 0x101) {     // bcdtrunc.
+            value_mask_low = 0xFFFFFFFFFFFFFFF0;
+            max_digits = 31;
+         } else {
+            value_mask_low = 0xFFFFFFFFFFFFFFFF;
+            max_digits = 32;
+         }
+
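+         /* The number of rightmost digits to keep is given in the low
+          * 16 bits of the high doubleword of vA.
+          */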
+         assign( length, binop( Iop_And64,
+                                     unop( Iop_V128HIto64,
+                                           mkexpr( vA ) ),
+                                     mkU64( 0xFFFF ) ) );
+         shift = unop( Iop_64to8,
+                       binop( Iop_Mul64,
+                              binop( Iop_Sub64,
+                                     mkU64( max_digits ),
+                                     mkexpr( length ) ),
+                              mkU64( 4 ) ) );
+
+         /* Note, the ShrV128 shift amount gets masked to 7 bits, so a
+          * shift of 128 would leave the value unshifted rather than
+          * setting the register to zero.  So if the shift amount would
+          * be 128 bits (length = 31), just set the value to 0.
+          */
+         length_neq_128 = mkNOT1( binop( Iop_CmpEQ64,
+                                         mkexpr( length),
+                                         mkU64( 0x1F ) ) );
+
+         assign( masked_out,
+                 binop( Iop_AndV128,
+                        binop( Iop_64HLtoV128,
+                               unop( Iop_1Sto64, length_neq_128 ),
+                               unop( Iop_1Sto64, length_neq_128 ) ),
+                        binop( Iop_ShrV128,
+                               mkexpr( vB ),
+                               unop( Iop_64to8,
+                                     binop( Iop_Mul64,
+                                            mkU64( 4 ),
+                                            binop( Iop_Add64,
+                                                   mkU64( 1 ),
+                                                   mkexpr( length ) ) ) ) )
+                        ) );
+
+         /* Overflow occurs if any of the leftmost (31 - length) digits
+          * of vB are non-zero.
+          */
+         overflow = mkNOT1( BCDstring_zero( mkexpr( masked_out ) ) );
+
+         value = binop( Iop_AndV128,
+                        binop( Iop_64HLtoV128,
+                               mkU64( 0xFFFFFFFFFFFFFFFF ),
+                               mkU64( value_mask_low ) ),
+                        mkexpr( vB ) );
+
+
+         if ( opc2 == 0x101 ) {     // bcdtrunc.
+            /* Check if all of the non-sign digits are zero */
+            zero = BCDstring_zero( binop( Iop_AndV128,
+                                          binop( Iop_64HLtoV128,
+                                                 mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                                 mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                          value ) );
+
+            /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+             * codes 0xB and 0xD are negative.
+             */
+            sign_digit = binop( Iop_And64, mkU64( 0xF ),
+                                unop( Iop_V128to64, mkexpr( vB ) ) );
+
+            sign = mkOR1( binop( Iop_CmpEQ64,
+                                 sign_digit,
+                                 mkU64 ( 0xB ) ),
+                          binop( Iop_CmpEQ64,
+                                 sign_digit,
+                                 mkU64 ( 0xD ) ) );
+            neg = mkAND1( sign, mkNOT1( zero ) );
+
+            /* Pos position AKA gt = 1 if ((not neg) & (not eq zero)) */
+            pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+            /* Note can't use pos or neg here since they are ANDed with zero,
+               use sign instead.
+            */
+            if (ps == 0) {
+               new_sign_val = binop( Iop_Or64,
+                                     unop( Iop_1Uto64, sign ),
+                                     mkU64( 0xC ) );
+            } else {
+               new_sign_val = binop( Iop_Xor64,
+                                     binop( Iop_Shl64,
+                                            unop( Iop_1Uto64, sign ),
+                                            mkU8 ( 1 ) ),
+                                     mkU64( 0xF ) );
+            }
+            valid =
+               unop( Iop_64to32,
+                     is_BCDstring128( vbi, /* Signed */True, mkexpr( vB ) ) );
+
+         } else {   // bcdutrunc.
+            /* Check if all of the digits are zero */
+            zero = BCDstring_zero( value );
+
+            /* unsigned value, need to make CC code happy */
+            neg = mkU1( 0 );
+
+            /* Pos position AKA gt = 1 if (not eq zero) */
+            pos = mkNOT1( zero );
+            valid =
+               unop( Iop_64to32,
+                     is_BCDstring128( vbi, /* Unsigned */False,
+                                      mkexpr( vB ) ) );
+         }
+
+         /* If vB is not valid, the result is undefined, but we need to
+          * match the hardware so the output of the test suite will match.
+          * Hardware sets result to 0x0.
+          */
+         result = binop( Iop_AndV128,
+                         mkV128( 0xFFFF ),
+                         binop( Iop_ShrV128,
+                                binop( Iop_ShlV128, value, shift ),
+                                shift ) );
+
+         if ( opc2 == 0x101) {     // bcdtrunc.
+            putVReg( vRT_addr, binop( Iop_OrV128,
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0 ),
+                                             new_sign_val ),
+                                      result ) );
+         } else {
+            putVReg( vRT_addr, result );
+         }
+      }
+      break;
+
+   case 0x181:   // bcdctz., bcdctn., bcdcfz., bcdcfn., bcdsetsgn.,
+                 // bcdcfsq., bcdctsq.
+      {
+         UInt inst_select = IFIELD( theInstr, 16, 5);
+
+         switch (inst_select) {
+         case 0:  // bcdctsq.  (Decimal Convert to Signed Quadword VX-form)
+            {
+               IRExpr *sign;
+
+               /* The instruction takes a signed packed decimal number in
+                * the vector source register and returns the corresponding
+                * signed quadword integer in the vector destination
+                * register.
+                */
+
+               DIP("bcdctsq v%d, v%d\n", vRT_addr, vRB_addr);
+
+               putVReg( vRT_addr, unop( Iop_BCD128toI128S, mkexpr( vB ) ) );
+
+               sign =  binop( Iop_And64,
+                              unop( Iop_V128to64, mkexpr( vB ) ),
+                              mkU64( 0xF ) );
+               zero = mkAND1( binop( Iop_CmpEQ64,
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                     mkU64( 0x0 ) ),
+                              binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128to64, mkexpr( vB ) ),
+                                            mkU64( 0xFFFFFFF0 ) ),
+                                     mkU64( 0x0 ) ) );
+               pos = mkAND1( mkNOT1( zero ),
+                             mkOR1( mkOR1( binop( Iop_CmpEQ64,
+                                                  sign, mkU64( 0xA ) ),
+                                           binop( Iop_CmpEQ64,
+                                                  sign, mkU64( 0xC ) ) ),
+                                    mkOR1( binop( Iop_CmpEQ64,
+                                                  sign, mkU64( 0xE ) ),
+                                           binop( Iop_CmpEQ64,
+                                                  sign, mkU64( 0xF ) ) ) ) );
+               neg = mkAND1( mkNOT1( zero ),
+                             mkOR1( binop( Iop_CmpEQ64, sign, mkU64( 0xB ) ),
+                                    binop( Iop_CmpEQ64, sign, mkU64( 0xD ) ) ) );
+
+               /* Check each of the nibbles for a valid digit 0 to 9 */
+               valid =
+                  unop( Iop_64to32,
+                        is_BCDstring128( vbi, /* Signed */True,
+                                         mkexpr( vB ) ) );
+               overflow = mkU1( 0 );  // not used
+            }
+            break;
+
+         case 2:  // bcdcfsq. (Decimal Convert from Signed Quadword VX-form)
+            {
+               IRExpr *pos_upper_gt, *pos_upper_eq, *pos_lower_gt;
+               IRExpr *neg_upper_lt, *neg_upper_eq, *neg_lower_lt;
+
+               DIP("bcdcfsq v%d, v%d, %d\n", vRT_addr, vRB_addr, ps);
+
+               /* The instruction takes a signed quadword integer in a vector
+                * source register and returns the signed packed decimal number
+                * in a vector register.
+                */
+               putVReg( vRT_addr,
+                        binop( Iop_I128StoBCD128, mkexpr( vB ), mkU8( ps ) ) );
+
+               zero = mkAND1( binop( Iop_CmpEQ64, mkU64( 0 ),
+                                     unop( Iop_V128to64, mkexpr( vB ) ) ),
+                              binop( Iop_CmpEQ64, mkU64( 0 ),
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ) ) );
+               pos = mkAND1( mkNOT1 ( zero ),
+                             binop( Iop_CmpEQ64, mkU64( 0 ),
+                                    binop( Iop_And64,
+                                           unop( Iop_V128HIto64,
+                                                 mkexpr( vB ) ),
+                                           mkU64( 0x8000000000000000UL ) ) ) );
+               neg = mkAND1( mkNOT1 ( zero ),
+                             binop( Iop_CmpEQ64, mkU64( 0x8000000000000000UL ),
+                                    binop( Iop_And64,
+                                           unop( Iop_V128HIto64,
+                                                 mkexpr( vB ) ),
+                                           mkU64( 0x8000000000000000UL ) ) ) );
+
+               /* Overflow occurs if: vB > 10^31-1 OR vB < -10^31-1.
+                * There is no 128-bit compare, so the upper and lower 64 bits
+                * must be compared separately.  If the upper 64 bits are equal
+                * to the maximum, overflow occurs if the lower 64 bits of vB
+                * are greater; otherwise overflow occurs if the upper 64 bits
+                * of vB are greater than the maximum for the upper 64 bits.
+                *
+                *     10^31-1 = 0x7E37BE2022C0914B267FFFFFFF
+                */
+               pos_upper_gt = binop( Iop_CmpLT64U,
+                                     mkU64( 0x7E37BE2022 ),
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ) );
+               pos_upper_eq = binop( Iop_CmpEQ64,
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                     mkU64( 0x7E37BE2022 ) );
+               pos_lower_gt = binop( Iop_CmpLT64U,
+                                     mkU64( 0x0914B267FFFFFFF ),
+                                     unop( Iop_V128to64, mkexpr( vB ) ) );
+               /* -10^31-1 =  0X81C841DFDD3F6EB4D97FFFFFFF */
+               neg_upper_lt = binop( Iop_CmpLT64U,
+                                     mkU64( 0X81C841DFDD ),
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ) );
+               neg_upper_eq = binop( Iop_CmpEQ64,
+                                     unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                     mkU64( 0X81C841DFDD ) );
+               neg_lower_lt = binop( Iop_CmpLT64U,
+                                     mkU64( 0x3F6EB4D97FFFFFFF ),
+                                     unop( Iop_V128to64, mkexpr( vB ) ) );
+
+               /* calculate overflow, masking for positive and negative */
+               overflow = mkOR1( mkAND1( pos,
+                                         mkOR1( pos_upper_gt,
+                                                mkAND1( pos_upper_eq,
+                                                        pos_lower_gt ) ) ),
+                                 mkAND1( neg,
+                                         mkOR1( neg_upper_lt,
+                                                mkAND1( neg_upper_eq,
+                                                        neg_lower_lt )
+                                                ) ) );
+               valid = mkU32( 1 );
+            }
+            break;
+
+         case 4:  // bcdctz. (Decimal Convert to Zoned VX-form)
+            {
+               IRExpr *ox_flag, *sign, *vrb_nibble30;
+               int neg_bit_shift;
+               unsigned int upper_byte, sign_byte;
+               IRTemp tmp = newTemp( Ity_V128 );
+
+               DIP("bcdctz. v%d,v%d,%d\n", vRT_addr, vRB_addr, ps);
+
+               if (ps == 0) {
+                  upper_byte = 0x30;
+                  sign_byte = 0x30;
+                  neg_bit_shift = 4+2;  /* sign byte is in bits [7:4] */
+               } else {
+                  upper_byte = 0xF0;
+                  sign_byte = 0xC0;
+                  neg_bit_shift = 4+0;
+               }
+
+               /* Grab vB bits[7:4].  It goes into bits [3:0] of the
+                * result.
+                */
+               vrb_nibble30 = binop( Iop_Shr64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128to64, mkexpr( vB ) ),
+                                            mkU64( 0xF0 ) ),
+                                     mkU8( 4 ) );
+
+               /* Upper 24 hex digits of VB, i.e. hex digits vB[0:23],
+                * must be zero for the value to be zero.  This goes
+                * in the overflow position of the condition code register.
+                */
+               ox_flag = binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128to64, mkexpr( vB ) ),
+                                            mkU64( 0xFFFFFFFFFFFFFFFF ) ),
+                                mkU64( 0 ) );
+
+               /* zero is the same as eq_flag */
+               zero = mkAND1( binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                            mkU64( 0xFFFFFFFFFFFFFFFF ) ),
+                                     mkU64( 0 ) ),
+                              binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128to64, mkexpr( vB ) ),
+                                            mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                     mkU64( 0 ) ) );
+
+               /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+                * codes 0xB and 0xD are negative.
+                */
+               sign_digit = binop( Iop_And64, mkU64( 0xF ),
+                                   unop( Iop_V128to64, mkexpr( vB ) ) );
+
+               /* The negative value goes in the LT bit position of the
+                * condition code register. Set neg if the sign of vB
+                * is negative and the value is not zero.
+                */
+               sign = mkOR1( binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xB ) ),
+                             binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xD ) ) );
+               neg = mkAND1( sign, mkNOT1( zero ) );
+
+               /* The positive value goes in the GT bit position of the
+                * condition code register. Set pos if the sign of the
+                * value is not negative and the value is not zero.
+                */
+               pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+               assign( tmp,
+                       convert_to_zoned( vbi, mkexpr( vB ),
+                                         mkU64( upper_byte ) ) );
+
+               /* Insert the sign based on ps and sign of vB
+                * in the lower byte.
+                */
+               putVReg( vRT_addr,
+                        binop( Iop_OrV128,
+                               binop( Iop_64HLtoV128,
+                                      mkU64( 0 ),
+                                      vrb_nibble30 ),
+                               binop( Iop_OrV128,
+                                      mkexpr( tmp ),
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0 ),
+                                             binop( Iop_Or64,
+                                                    mkU64( sign_byte ),
+                                                    binop( Iop_Shl64,
+                                                           unop( Iop_1Uto64,
+                                                                 sign ),
+                                                           mkU8( neg_bit_shift)
+                               ) ) ) ) ) );
+
+               /* A valid number must have a value that is less than or
+                * equal to 10^16 - 1.  This is checked by making sure
+                * bytes [31:16] of vB are zero.
+                */
+               in_range = binop( Iop_CmpEQ64,
+                                 binop( Iop_And64,
+                                        mkU64( 0xFFFFFFFFFFFFFFF0 ),
+                                        unop( Iop_V128HIto64, mkexpr( vB ) ) ),
+                                 mkU64( 0 ) );
+
+               /* overflow is set if ox_flag or not in_range.  Setting is
+                * ORed with the other condition code values.
+                */
+               overflow = mkOR1( ox_flag, mkNOT1( in_range ) );
+
+               /* The sign code must be between 0xA and 0xF and all digits are
+                * between 0x0 and 0x9. The vB must be in range to be valid.
+                * If not valid, condition code set to 0x0001.
+                */
+               valid =
+                  unop( Iop_64to32,
+                        is_BCDstring128( vbi, /* Signed */True,
+                                         mkexpr( vB ) ) );
+            }
+            break;
+
+         case 5:  // bcdctn. (Decimal Convert to National VX-form)
+            {
+               IRExpr *ox_flag, *sign;
+               IRTemp tmp = newTemp( Ity_V128 );
+
+               DIP("bcdctn. v%d,v%d\n", vRT_addr, vRB_addr);
+
+               value = binop( Iop_And64,
+                              mkU64( 0xFFFFFFFF ),
+                              unop( Iop_V128to64, mkexpr( vB ) ) );
+
+               /* A valid number must have a value that is less than or
+                * equal to 10^7 - 1.  This is checked by making sure
+                * bytes [31:8] of vB are zero.
+                */
+               in_range = mkAND1( binop( Iop_CmpEQ64,
+                                         unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                         mkU64( 0 ) ),
+                                  binop( Iop_CmpEQ64,
+                                         binop( Iop_Shr64,
+                                                unop( Iop_V128to64,
+                                                      mkexpr( vB ) ),
+                                                mkU8( 32 ) ),
+                                         mkU64( 0 ) ) );
+
+               /* The sign code must be between 0xA and 0xF and all digits are
+                *  between 0x0 and 0x9.
+                */
+               valid =
+                  unop( Iop_64to32,
+                        is_BCDstring128( vbi, /* Signed */True,
+                                         mkexpr( vB ) ) );
+
+               /* Upper 24 hex digits of VB, i.e. hex digits vB[0:23],
+                * must be zero for the ox_flag to be zero.  This goes
+                * in the LSB position (variable overflow) of the
+                * condition code register.
+                */
+               ox_flag =
+                  mkNOT1( mkAND1( binop( Iop_CmpEQ64,
+                                         binop( Iop_And64,
+                                                unop( Iop_V128HIto64,
+                                                      mkexpr( vB ) ),
+                                                mkU64( 0xFFFFFFFFFFFFFFFF ) ),
+                                         mkU64( 0 ) ),
+                                  binop( Iop_CmpEQ64,
+                                         binop( Iop_And64,
+                                                unop( Iop_V128to64,
+                                                      mkexpr( vB ) ),
+                                                mkU64( 0xFFFFFFFF00000000 ) ),
+                                         mkU64( 0 ) ) ) );
+
+               /* Set zero to 1 if all of the bytes in vB are zero.  This is
+                * used when setting the lt_flag (variable neg) and the gt_flag
+                * (variable pos).
+                */
+               zero = mkAND1( binop( Iop_CmpEQ64,
+                                     binop( Iop_And64,
+                                            unop( Iop_V128HIto64,
+                                                  mkexpr( vB ) ),
+                                            mkU64( 0xFFFFFFFFFFFFFFFF ) ),
+                                         mkU64( 0 ) ),
+                              binop( Iop_CmpEQ64,
+                                      binop( Iop_And64,
+                                             unop( Iop_V128to64, mkexpr( vB ) ),
+                                             mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                                      mkU64( 0 ) ) );
+
+               /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+                * codes 0xB and 0xD are negative.
+                */
+               sign_digit = binop( Iop_And64, mkU64( 0xF ), value );
+
+               /* The negative value goes in the LT bit position of the
+                * condition code register. Set neg if the sign of the
+                * value is negative and the value is not zero.
+                */
+               sign = mkOR1( binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xB ) ),
+                             binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xD ) ) );
+               neg = mkAND1( sign, mkNOT1( zero ) );
+
+               /* The positive value goes in the GT bit position of the
+                * condition code register. Set pos if the sign of the
+                * value is not negative and the value is not zero.
+                */
+               pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+               assign( tmp,
+                       convert_to_national( vbi, mkexpr( vB ) ) );
+
+               /* If vB is positive insert sign value 0x002B, otherwise
+                * insert 0x002D for negative. Have to use sign not neg
+                * because neg has been ANDed with zero.  This is 0x29
+                * OR'd with (sign << 1 | NOT sign) << 1.
+                * sign = 1 if vB is negative.
+                */
+               putVReg( vRT_addr,
+                        binop( Iop_OrV128,
+                               mkexpr( tmp ),
+                               binop( Iop_64HLtoV128,
+                                      mkU64( 0 ),
+                                      binop( Iop_Or64,
+                                             mkU64( 0x29 ),
+                                             binop( Iop_Or64,
+                                                    binop( Iop_Shl64,
+                                                           unop( Iop_1Uto64,
+                                                                 sign ),
+                                                           mkU8( 2 ) ),
+                                                    binop( Iop_Shl64,
+                                                           unop( Iop_1Uto64,
+                                                                 mkNOT1(sign)),
+                                                           mkU8( 1 ) )
+                                                    ) ) ) ) );
+
+
+               /* The sign code must be between 0xA and 0xF and all digits are
+                *  between 0x0 and 0x9. The vB must be in range to be valid.
+                */
+               valid =
+                  unop( Iop_64to32,
+                        is_BCDstring128( vbi, /* Signed */True,
+                                         mkexpr( vB ) ) );
+
+               overflow = ox_flag;
+            }
+            break;
+
+         case 6:  // bcdcfz. (Decimal Convert From Zoned VX-form)
+            {
+               IRExpr *sign;
+               IRTemp tmp = newTemp( Ity_V128 );
+
+               DIP("bcdcfz. v%d,v%d,%d\n", vRT_addr, vRB_addr, ps);
+
+               valid = unop( Iop_1Uto32, is_Zoned_decimal( vB, ps ) );
+
+               assign( tmp,
+                       convert_from_zoned( vbi, mkexpr( vB ) ) );
+
+               /* If the result of checking the lower 4 bits of each 8-bit
+                * value is zero, then the "number" was zero.
+                */
+               zero =
+                  binop( Iop_CmpEQ64,
+                         binop( Iop_Or64,
+                                binop( Iop_And64,
+                                       unop( Iop_V128to64, mkexpr( vB ) ),
+                                       mkU64( 0x0F0F0F0F0F0F0F0FULL ) ),
+                                binop( Iop_And64,
+                                       unop( Iop_V128to64, mkexpr( vB ) ),
+                                       mkU64( 0x0F0F0F0F0F0F0F0FULL ) ) ),
+                         mkU64( 0 ) );
+
+               /* Sign bit is in bit 6 of vB. */
+               sign_digit = binop( Iop_And64, mkU64( 0xF0 ),
+                                   unop( Iop_V128to64,  mkexpr( vB ) ) );
+
+               if ( ps == 0 ) {
+                  /* sign will be equal to 0 for positive number */
+                  sign = binop( Iop_CmpEQ64,
+                                binop( Iop_And64,
+                                       sign_digit,
+                                       mkU64( 0x40 ) ),
+                                mkU64( 0x40 ) );
+               } else {
+                  sign = mkOR1(
+                               binop( Iop_CmpEQ64, sign_digit, mkU64( 0xB0 ) ),
+                               binop( Iop_CmpEQ64, sign_digit, mkU64( 0xD0 ) ) );
+               }
+
+               /* The negative value goes in the LT bit position of the
+                * condition code register. Set neg if the sign of the
+                * value is negative and the value is not zero.
+                */
+               neg = mkAND1( sign, mkNOT1( zero ) );
+
+               /* The positive value goes in the GT bit position of the
+                * condition code register. Set pos if the sign of the
+                * value is not negative and the value is not zero.
+                */
+               pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+               /* sign of the result is 0xC for positive, 0xD for negative */
+               putVReg( vRT_addr,
+                               binop( Iop_OrV128,
+                                      mkexpr( tmp ),
+                                      binop( Iop_64HLtoV128,
+                                             mkU64( 0 ),
+                                             binop( Iop_Or64,
+                                                    mkU64( 0xC ),
+                                                    unop( Iop_1Uto64, sign )
+                                                    ) ) ) );
+               /* For this instruction the LSB position in the CC
+                * field, the overflow position in the other instructions,
+                * is given by 0.  There is nothing to OR with LT, EQ or GT.
+                */
+               overflow = mkU1( 0 );
+            }
+            break;
+
+         case 7:  // bcdcfn. (Decimal Convert From National VX-form)
+            {
+               IRTemp hword_7 = newTemp( Ity_I64 );
+               IRExpr *sign;
+               IRTemp tmp = newTemp( Ity_I64 );
+
+               DIP("bcdcfn. v%d,v%d,%d\n", vRT_addr, vRB_addr, ps);
+
+               /* check that the value is valid */
+               valid = unop( Iop_1Uto32, is_National_decimal( vB ) );
+
+               assign( hword_7, binop( Iop_And64,
+                                       unop( Iop_V128to64, mkexpr( vB ) ),
+                                       mkU64( 0xFFF ) ) );
+               /* sign = 1 if vB is negative */
+               sign = binop( Iop_CmpEQ64, mkexpr( hword_7 ), mkU64( 0x002D ) );
+
+               assign( tmp, convert_from_national( vbi, mkexpr( vB ) ) );
+
+               /* If the result of checking the lower 4 bits of each 16-bit
+                * value is zero, then the "number" was zero.
+                */
+               zero =
+                  binop( Iop_CmpEQ64,
+                         binop( Iop_Or64,
+                                binop( Iop_And64,
+                                       unop( Iop_V128HIto64, mkexpr( vB ) ),
+                                       mkU64( 0x000F000F000F000FULL ) ),
+                                binop( Iop_And64,
+                                       unop( Iop_V128to64, mkexpr( vB ) ),
+                                       mkU64( 0x000F000F000F0000ULL ) ) ),
+                         mkU64( 0 ) );
+
+
+               /* The negative value goes in the LT bit position of the
+                * condition code register. Set neg if the sign of the
+                * value is negative and the value is not zero.
+                */
+               neg = mkAND1( sign, mkNOT1( zero ) );
+
+               /* The positive value goes in the GT bit position of the
+                * condition code register. Set pos if the sign of the
+                * value is not negative and the value is not zero.
+                */
+               pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+               /* For this instruction the LSB position in the CC
+                * field, the overflow position in the other instructions,
+                * is given by invalid.  There is nothing to OR with the valid
+                * flag.
+                */
+               overflow = mkU1( 0 );
+
+               /* sign of the result is:
+                  ( 0b1100 OR sign ) OR ( ps OR ((ps AND NOT sign) << 1) )
+               */
+
+               putVReg( vRT_addr,
+                        binop( Iop_64HLtoV128,
+                               mkU64( 0 ),
+                               binop( Iop_Or64,
+                                      binop( Iop_Or64,
+                                             binop( Iop_Shl64,
+                                                    binop( Iop_And64,
+                                                           mkU64( ps ),
+                                                           unop( Iop_1Uto64,
+                                                                 mkNOT1(sign))),
+                                                    mkU8( 1 ) ),
+                                             mkU64( ps ) ),
+                                      binop( Iop_Or64,
+                                             binop( Iop_Or64,
+                                                    mkU64( 0xC ),
+                                                    unop( Iop_1Uto64, sign ) ),
+                                             mkexpr( tmp ) ) ) ) );
+
+            }
+            break;
+
+         case 31:  // bcdsetsgn. (BCD set sign)
+            {
+               IRExpr *new_sign_val, *sign;
+
+               DIP("bcdsetsgn. v%d,v%d,%d\n", vRT_addr, vRB_addr, ps);
+
+               value = binop( Iop_AndV128,
+                              binop( Iop_64HLtoV128,
+                                     mkU64( 0xFFFFFFFFFFFFFFFF ),
+                                     mkU64( 0xFFFFFFFFFFFFFFF0 ) ),
+                              mkexpr( vB ) );
+               zero = BCDstring_zero( value );
+
+               /* Sign codes of 0xA, 0xC, 0xE or 0xF are positive, sign
+                * codes 0xB and 0xD are negative.
+                */
+               sign_digit = binop( Iop_And64, mkU64( 0xF ),
+                                   unop( Iop_V128to64, mkexpr( vB ) ) );
+
+               sign = mkOR1( binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xB ) ),
+                             binop( Iop_CmpEQ64,
+                                    sign_digit,
+                                    mkU64 ( 0xD ) ) );
+               neg = mkAND1( sign, mkNOT1( zero ) );
+
+               pos = mkAND1( mkNOT1( sign ), mkNOT1( zero ) );
+
+               valid =
+                  unop( Iop_64to32,
+                        is_BCDstring128( vbi, /* Signed */True,
+                                         mkexpr( vB ) ) );
+
+               /* if PS = 0
+                     vB positive, sign is C
+                     vB negative, sign is D
+                  if PS = 1
+                     vB positive, sign is F
+                     vB negative, sign is D
+                  Note can't use pos or neg here since they are ANDed with
+                  zero, use sign instead.
+               */
+               if (ps == 0) {
+                  new_sign_val = binop( Iop_Or64,
+                                        unop( Iop_1Uto64, sign ),
+                                        mkU64( 0xC ) );
+
+               } else {
+                  new_sign_val = binop( Iop_Xor64,
+                                        binop( Iop_Shl64,
+                                               unop( Iop_1Uto64, sign ),
+                                               mkU8( 1 ) ),
+                                        mkU64( 0xF ) );
+               }
+
+               putVReg( vRT_addr, binop( Iop_OrV128,
+                                         binop( Iop_64HLtoV128,
+                                                mkU64( 0 ),
+                                                new_sign_val ),
+                                         value ) );
+               /* For this instruction the LSB position in the CC
+                * field, the overflow position in the other instructions,
+                * is given by invalid.
+                */
+               overflow = unop( Iop_32to1, unop( Iop_Not32, valid ) );
+            }
+            break;
+
+         default:
+            vex_printf("dis_av_bcd(ppc)(invalid inst_select)\n");
+            return False;
+         }
+      }
+      break;
 
    default:
       vex_printf("dis_av_bcd(ppc)(opc2)\n");
       return False;
    }
+
+   IRTemp valid_mask = newTemp( Ity_I32 );
+
+   assign( valid_mask, unop( Iop_1Sto32, unop( Iop_32to1, valid ) ) );
+
+   /* set CR field 6 to:
+    *    0b1000  if vB is less than 0, i.e. vB is neg and not zero,
+    *    0b0100  if vB is greater than 0, i.e. vB is pos and not zero,
+    *    0b0010  if vB equals 0,
+    *    0b0001  if vB is invalid (overrules lt, gt, eq)
+    */
+   assign( eq_lt_gt,
+           binop( Iop_Or32,
+                  binop( Iop_Shl32,
+                         unop( Iop_1Uto32, neg ),
+                         mkU8( 3 ) ),
+                  binop( Iop_Or32,
+                         binop( Iop_Shl32,
+                                unop( Iop_1Uto32, pos ),
+                                mkU8( 2 ) ),
+                         binop( Iop_Shl32,
+                               unop( Iop_1Uto32, zero ),
+                                mkU8( 1 ) ) ) ) );
+   /* valid is 1 if it is a valid number; complement it and put it in the
+    * invalid bit location, overriding lt, eq, gt, overflow.
+    */
+   putGST_field( PPC_GST_CR,
+                 binop( Iop_Or32,
+                        binop( Iop_And32,
+                               mkexpr( valid_mask ),
+                               binop( Iop_Or32,
+                                      mkexpr( eq_lt_gt ),
+                                      unop( Iop_1Uto32, overflow ) ) ),
+                        binop( Iop_And32,
+                               unop( Iop_Not32, mkexpr( valid_mask ) ),
+                               mkU32( 1 ) ) ),
+                 6 );
    return True;
 }
 
@@ -18790,11 +27054,13 @@
       { 0x24, "xsmaddmsp" },
       { 0x28, "xxpermdi" },
       { 0x34, "xsresp" },
+      { 0x3A, "xxpermr" },
       { 0x40, "xsmulsp" },
       { 0x44, "xsmsubasp" },
       { 0x48, "xxmrghw" },
       { 0x60, "xsdivsp" },
       { 0x64, "xsmsubmsp" },
+      { 0x68, "xxperm" },
       { 0x80, "xsadddp" },
       { 0x84, "xsmaddadp" },
       { 0x8c, "xscmpudp" },
@@ -18816,6 +27082,8 @@
       { 0xd6, "xsrdpic" },
       { 0xe0, "xsdivdp" },
       { 0xe4, "xsmsubmdp" },
+      { 0xe8, "xxpermr" },
+      { 0xeC, "xscmpexpdp" },
       { 0xf2, "xsrdpim" },
       { 0xf4, "xstdivdp" },
       { 0x100, "xvaddsp" },
@@ -18834,6 +27102,7 @@
       { 0x140, "xvmulsp" },
       { 0x144, "xvmsubasp" },
       { 0x148, "xxspltw" },
+      { 0x14A, "xxextractuw" },
       { 0x14c, "xvcmpgesp" },
       { 0x150, "xvcvuxwsp" },
       { 0x152, "xvrspip" },
@@ -18841,6 +27110,7 @@
       { 0x156, "xvrspic" },
       { 0x160, "xvdivsp" },
       { 0x164, "xvmsubmsp" },
+      { 0x16A, "xxinsertw" },
       { 0x170, "xvcvsxwsp" },
       { 0x172, "xvrspim" },
       { 0x174, "xvtdivsp" },
@@ -18879,6 +27149,7 @@
       { 0x244, "xsnmsubasp" },
       { 0x248, "xxlor" },
       { 0x250, "xscvuxdsp" },
+      { 0x254, "xststdcsp" },
       { 0x264, "xsnmsubmsp" },
       { 0x268, "xxlxor" },
       { 0x270, "xscvsxdsp" },
@@ -18893,11 +27164,13 @@
       { 0x2a8, "xxlorc" },
       { 0x2b0, "xscvdpsxds" },
       { 0x2b2, "xsabsdp" },
+      { 0x2b6, "xsxexpdp_xsxigdp" },
       { 0x2c0, "xscpsgndp" },
       { 0x2c4, "xsnmsubadp" },
       { 0x2c8, "xxlnand" },
       { 0x2d0, "xscvuxddp" },
       { 0x2d2, "xsnabsdp" },
+      { 0x2d4, "xststdcdp" },
       { 0x2e4, "xsnmsubmdp" },
       { 0x2e8, "xxleqv" },
       { 0x2f0, "xscvsxddp" },
@@ -18917,6 +27190,8 @@
       { 0x34c, "xvcmpgesp." },
       { 0x350, "xvcvuxdsp" },
       { 0x352, "xvnabssp" },
+      { 0x354, "xvtstdcsp" },
+      { 0x360, "xviexpsp" },
       { 0x364, "xvnmsubmsp" },
       { 0x370, "xvcvsxdsp" },
       { 0x372, "xvnegsp" },
@@ -18925,16 +27200,20 @@
       { 0x38c, "xvcmpeqdp." },
       { 0x390, "xvcvdpuxds" },
       { 0x392, "xvcvspdp" },
+      { 0x396, "xsiexpdp" },
       { 0x3a0, "xvmindp" },
       { 0x3a4, "xvnmaddmdp" },
       { 0x3ac, "xvcmpgtdp." },
       { 0x3b0, "xvcvdpsxds" },
       { 0x3b2, "xvabsdp" },
+      { 0x3b6, "xxbr[h|w|d|q]|xvxexpdp|xvxexpsp|xvxsigdp|xvxsigsp|xvcvhpsp|xvcvsphp|xscvdphp|xscvhpdp" },
       { 0x3c0, "xvcpsgndp" },
       { 0x3c4, "xvnmsubadp" },
       { 0x3cc, "xvcmpgedp." },
       { 0x3d0, "xvcvuxddp" },
       { 0x3d2, "xvnabsdp" },
+      { 0x3d4, "xvtstdcdp" },
+      { 0x3e0, "xviexpdp" },
       { 0x3e4, "xvnmsubmdp" },
       { 0x3f0, "xvcvsxddp" },
       { 0x3f2, "xvnegdp" }
@@ -18972,6 +27251,7 @@
 #define XX3_2_MASK 0x000001FC
 #define XX3_3_MASK 0x0000007C
 #define XX4_MASK 0x00000018
+#define VDCMX_MASK 0x000003B8
    Int ret;
    UInt vsxExtOpcode = 0;
 
@@ -18979,6 +27259,8 @@
       vsxExtOpcode = vsx_all[ret].opcode;
    else if (( ret = findVSXextOpCode(opc2_full & XX3_1_MASK)) >= 0)
       vsxExtOpcode = vsx_all[ret].opcode;
+   else if (( ret = findVSXextOpCode(opc2_full & VDCMX_MASK)) >= 0)
+      vsxExtOpcode = vsx_all[ret].opcode;
    else if (( ret = findVSXextOpCode(opc2_full & XX3_2_MASK)) >= 0)
       vsxExtOpcode = vsx_all[ret].opcode;
    else if (( ret = findVSXextOpCode(opc2_full & XX3_3_MASK)) >= 0)
@@ -19019,6 +27301,7 @@
    Bool      allow_VX = False;  // Equates to "supports Power ISA 2.06
    Bool      allow_DFP = False;
    Bool      allow_isa_2_07 = False;
+   Bool      allow_isa_3_0  = False;
    UInt      hwcaps = archinfo->hwcaps;
    Long      delta;
 
@@ -19031,6 +27314,7 @@
       allow_VX = (0 != (hwcaps & VEX_HWCAPS_PPC64_VX));
       allow_DFP = (0 != (hwcaps & VEX_HWCAPS_PPC64_DFP));
       allow_isa_2_07 = (0 != (hwcaps & VEX_HWCAPS_PPC64_ISA2_07));
+      allow_isa_3_0  = (0 != (hwcaps & VEX_HWCAPS_PPC64_ISA3_0));
    } else {
       allow_F  = (0 != (hwcaps & VEX_HWCAPS_PPC32_F));
       allow_V  = (0 != (hwcaps & VEX_HWCAPS_PPC32_V));
@@ -19039,6 +27323,7 @@
       allow_VX = (0 != (hwcaps & VEX_HWCAPS_PPC32_VX));
       allow_DFP = (0 != (hwcaps & VEX_HWCAPS_PPC32_DFP));
       allow_isa_2_07 = (0 != (hwcaps & VEX_HWCAPS_PPC32_ISA2_07));
+      allow_isa_3_0  = (0 != (hwcaps & VEX_HWCAPS_PPC32_ISA3_0));
    }
 
    /* The running delta */
@@ -19269,7 +27554,8 @@
       goto decode_failure;
 
       /* Floating Point Load Double Pair Instructions */
-   case 0x39: case 0x3D:
+   case 0x39: case 0x3D:    // lfdp, lxsd, lxssp, lxv
+                            // stfdp, stxsd, stxssp, stxv
       if (!allow_F) goto decode_noF;
       if (dis_fp_pair( theInstr )) goto decode_success;
       goto decode_failure;
@@ -19317,7 +27603,8 @@
             if (dis_dfp_fmt_conv( theInstr ))
                goto decode_success;
             goto decode_failure;
-         case 0x2A2: // dtstsf - DFP number of significant digits
+         case 0x2A2: // dtstsf  - DFP number of significant digits
+         case 0x2A3: // dtstsfi - DFP number of significant digits Immediate
             if (!allow_DFP) goto decode_noDFP;
             if (dis_dfp_significant_digits(theInstr))
                goto decode_success;
@@ -19424,17 +27711,45 @@
       // All of these VSX instructions use some VMX facilities, so
       // if allow_V is not set, we'll skip trying to decode.
       if (!allow_V) goto decode_noVX;
-
+      /* The xvtstdcdp and xvtstdcsp instructions do not have a
+         contiguous opc2 field.  The following vsxOpc2 = get_VSX60_opc2()
+         doesn't correctly match these instructions for dc != 0.  So,
+         we will explicitly look for the two instructions. */
+      opc2 = ifieldOPClo10(theInstr);
+      UInt opc2hi = IFIELD(theInstr, 7, 4);
+      UInt opc2lo = IFIELD(theInstr, 3, 3);
       UInt vsxOpc2 = get_VSX60_opc2(opc2);
+
+      if (( opc2hi == 13 ) && ( opc2lo == 5)) { //xvtstdcsp
+         if (dis_vxs_misc(theInstr, 0x354, allow_isa_3_0))
+            goto decode_success;
+         goto decode_failure;
+      }
+
+      if (( opc2hi == 15 ) && ( opc2lo == 5)) { //xvtstdcdp
+         if (dis_vxs_misc(theInstr, 0x3D4, allow_isa_3_0))
+            goto decode_success;
+         goto decode_failure;
+      }
+
       /* The vsxOpc2 returned is the "normalized" value, representing the
        * instruction's secondary opcode as taken from the standard secondary
        * opcode field [21:30] (IBM notation), even if the actual field
        * is non-standard.  These normalized values are given in the opcode
        * appendices of the ISA 2.06 document.
        */
+      if ( ( opc2 == 0x168 ) && ( IFIELD( theInstr, 19, 2 ) == 0 ) )// xxspltib
+      {
+         /* This is a special case of the XX1 form where the  RA, RB
+          * fields hold an immediate value.
+          */
+         if (dis_vxs_misc(theInstr, opc2, allow_isa_3_0)) goto decode_success;
+         goto decode_failure;
+      }
 
       switch (vsxOpc2) {
          case 0x8: case 0x28: case 0x48: case 0xc8: // xxsldwi, xxpermdi, xxmrghw, xxmrglw
+         case 0x068: case 0xE8:  // xxperm, xxpermr
          case 0x018: case 0x148: // xxsel, xxspltw
             if (dis_vx_permute_misc(theInstr, vsxOpc2)) goto decode_success;
             goto decode_failure;
@@ -19443,6 +27758,8 @@
          case 0x2C8: case 0x2E8: // xxlnand, xxleqv
             if (dis_vx_logic(theInstr, vsxOpc2)) goto decode_success;
             goto decode_failure;
+         case 0x0ec:             // xscmpexpdp
+         case 0x14A: case 0x16A: // xxextractuw, xxinsertw
          case 0x2B2: case 0x2C0: // xsabsdp, xscpsgndp
          case 0x2D2: case 0x2F2: // xsnabsdp, xsnegdp
          case 0x280: case 0x2A0: // xsmaxdp, xsmindp
@@ -19451,7 +27768,15 @@
          case 0x0B4: case 0x094: // xsredp, xsrsqrtedp
          case 0x0D6: case 0x0B2: // xsrdpic, xsrdpiz
          case 0x092: case 0x232: // xsrdpi, xsrsp
-            if (dis_vxs_misc(theInstr, vsxOpc2)) goto decode_success;
+         case 0x3B6:             // xxbrh, xvxexpdp, xvxexpsp, xvxsigdp
+                                 // xvxsigsp, xvcvhpsp
+         case 0x2b6:             // xsxexpdp, xsxsigdp
+         case 0x254: case 0x2d4: // xststdcsp, xststdcdp
+         case 0x354:             // xvtstdcsp
+         case 0x360:case 0x396:  // xviexpsp, xsiexpdp
+         case 0x3D4: case 0x3E0: // xvtstdcdp, xviexpdp
+            if (dis_vxs_misc(theInstr, vsxOpc2, allow_isa_3_0))
+               goto decode_success;
             goto decode_failure;
          case 0x08C: case 0x0AC: // xscmpudp, xscmpodp
             if (dis_vx_cmp(theInstr, vsxOpc2)) goto decode_success;
@@ -19599,7 +27924,21 @@
          break; // Fall through
       }
 
+      opc2 = IFIELD(theInstr, 1, 8);
+      switch (opc2) {
+      case 0x5: // xsrqpi, xsrqpix
+      case 0x25: // xsrqpxp
+         if ( !mode64 || !allow_isa_3_0 ) goto decode_failure;
+         if ( dis_vx_Scalar_Round_to_quad_integer( theInstr ) )
+            goto decode_success;
+         goto decode_failure;
+      default:
+         break; // Fall through
+      }
+
       opc2 = IFIELD(theInstr, 1, 10);
+      UInt inst_select = IFIELD( theInstr, 16, 5 );
+
       switch (opc2) {
       /* 128-bit DFP instructions */
       case 0x2:    // daddq - DFP Add
@@ -19633,7 +27972,8 @@
             goto decode_success;
          goto decode_failure;
 
-      case 0x2A2: // dtstsfq - DFP number of significant digits
+      case 0x2A2: // dtstsfq  - DFP number of significant digits
+      case 0x2A3: // dtstsfiq - DFP number of significant digits Immediate
          if (!allow_DFP) goto decode_noDFP;
          if (dis_dfp_significant_digits(theInstr))
             goto decode_success;
@@ -19708,6 +28048,47 @@
          if (dis_fp_scr( theInstr, allow_GX )) goto decode_success;
          goto decode_failure;
 
+      case 0x324: // xsabsqp, xsxexpqp,xsnabsqp, xsnegqp, xsxsigqp
+         if ( inst_select == 27 ) {    // xssqrtqp
+            if ( dis_vx_Floating_Point_Arithmetic_quad_precision( theInstr ) )
+               goto decode_success;
+         }
+
+         /* Instructions implemented with Pre ISA 3.0 Iops */
+         /* VSX Scalar Quad-Precision instructions */
+      case 0x064: // xscpsgnqp
+      case 0x0A4: // xscmpexpqp
+      case 0x084: // xscmpoqp
+      case 0x284: // xscmpuqp
+      case 0x2C4: // xststdcqp
+      case 0x364: // xsiexpqp
+         if (dis_vx_scalar_quad_precision( theInstr )) goto decode_success;
+         goto decode_failure;
+
+      /* Instructions implemented using ISA 3.0 instructions */
+                  // xsaddqpo (VSX Scalar Add Quad-Precision [using round to ODD])
+      case 0x004: // xsaddqp  (VSX Scalar Add Quad-Precision [using RN mode])
+                  // xsmulqpo (VSX Scalar Multiply Quad-Precision [using round to ODD])
+      case 0x024: // xsmulqp  (VSX Scalar Multiply Quad-Precision [using RN mode])
+                  // xsmaddqpo (VSX Scalar Multiply Add Quad-Precision [using round to ODD])
+      case 0x184: // xsmaddqp  (VSX Scalar Multiply Add Quad-Precision [using RN mode])
+                  // xsmsubqpo (VSX Scalar Multiply Sub Quad-Precision [using round to ODD])
+      case 0x1A4: // xsmsubqp  (VSX Scalar Multiply Sub Quad-Precision [using RN mode])
+                  // xsnmaddqpo (VSX Scalar Negative Multiply Add Quad-Precision [using round to ODD])
+      case 0x1C4: // xsnmaddqp  (VSX Scalar Negative Multiply Add Quad-Precision [using RN mode])
+                  // xsnmsubqpo (VSX Scalar Negative Multiply Sub Quad-Precision [using round to ODD])
+      case 0x1E4: // xsnmsubqp  (VSX Scalar Negative Multiply Sub Quad-Precision [using RN mode])
+                  // xssubqpo (VSX Scalar Subtract Quad-Precision [using round to ODD])
+      case 0x204: // xssubqp  (VSX Scalar Subtract Quad-Precision [using RN mode])
+                  // xsdivqpo (VSX Scalar Divide Quad-Precision [using round to ODD])
+      case 0x224: // xsdivqp  (VSX Scalar Divide Quad-Precision [using RN mode])
+      case 0x344: // xscvudqp, xscvsdqp, xscvqpdp, xscvqpdpo, xsvqpdp
+                  // xscvqpswz, xscvqpuwz, xscvqpudz, xscvqpsdz
+         if ( !mode64 || !allow_isa_3_0 ) goto decode_failure;
+         if ( dis_vx_Floating_Point_Arithmetic_quad_precision( theInstr ) )
+            goto decode_success;
+         goto decode_failure;
+
       default:
          break; // Fall through...
       }
@@ -19757,6 +28138,19 @@
       break;
 
    case 0x13:
+
+      opc2 = ifieldOPClo5(theInstr);
+      switch (opc2) {
+
+      /* PC relative addressing */
+      case 0x002:       // addpcis
+         if (dis_pc_relative(theInstr)) goto decode_success;
+         goto decode_failure;
+
+      /* fall through to the next opc2 field size */
+      }
+
+      opc2 = ifieldOPClo10(theInstr);
       switch (opc2) {
 
       /* Condition Register Logical Instructions */
@@ -19832,11 +28226,28 @@
 
       opc2 = IFIELD(theInstr, 1, 10);
       switch (opc2) {
+
+      /* Integer miscellaneous instructions */
+      case 0x01E:  // wait  RFC 2500
+         if (dis_int_misc( theInstr )) goto decode_success;
+         goto decode_failure;
+
+
       /* Integer Compare Instructions  */
-      case 0x000: case 0x020: // cmp, cmpl
+      case 0x000: case 0x020: case 0x080: // cmp, cmpl, setb
          if (dis_int_cmp( theInstr )) goto decode_success;
          goto decode_failure;
 
+      case 0x0C0: case 0x0E0:   // cmprb, cmpeqb
+         if (dis_byte_cmp( theInstr )) goto decode_success;
+         goto decode_failure;
+
+      case 0x10B: case 0x30B: // moduw, modsw
+      case 0x109: case 0x309: // modsd, modud
+      case 0x21A: case 0x23A: // cnttzw, cnttzd
+         if (dis_modulo_int( theInstr )) goto decode_success;
+         goto decode_failure;
+
       /* Integer Logical Instructions */
       case 0x01C: case 0x03C: case 0x01A: // and,  andc,  cntlzw
       case 0x11C: case 0x3BA: case 0x39A: // eqv,  extsb, extsh
@@ -20054,9 +28465,17 @@
       /* VSX Load */
       case 0x00C: // lxsiwzx
       case 0x04C: // lxsiwax
+      case 0x10C: // lxvx
+      case 0x10D: // lxvl
+      case 0x12D: // lxvll
+      case 0x16C: // lxvwsx
       case 0x20C: // lxsspx
       case 0x24C: // lxsdx
+      case 0x32C: // lxvh8x
+      case 0x30D: // lxsibzx
+      case 0x32D: // lxsihzx
       case 0x34C: // lxvd2x
+      case 0x36C: // lxvb16x
       case 0x14C: // lxvdsx
       case 0x30C: // lxvw4x
         // All of these VSX load instructions use some VMX facilities, so
@@ -20068,10 +28487,17 @@
 
       /* VSX Store */
       case 0x08C: // stxsiwx
+      case 0x18C: // stxvx
+      case 0x18D: // stxvl
+      case 0x1AD: // stxvll
       case 0x28C: // stxsspx
       case 0x2CC: // stxsdx
-      case 0x3CC: // stxvd2x
       case 0x38C: // stxvw4x
+      case 0x3CC: // stxvd2x
+      case 0x38D: // stxsibx
+      case 0x3AD: // stxsihx
+      case 0x3AC: // stxvh8x
+      case 0x3EC: // stxvb16x
         // All of these VSX store instructions use some VMX facilities, so
         // if allow_V is not set, we'll skip trying to decode.
         if (!allow_V) goto decode_noV;
@@ -20079,6 +28505,13 @@
 	if (dis_vx_store( theInstr )) goto decode_success;
     	  goto decode_failure;
 
+      case 0x133: case 0x193: case 0x1B3:  // mfvsrld, mfvsrdd, mtvsrws
+        // The move from/to VSX instructions use some VMX facilities, so
+        // if allow_V is not set, we'll skip trying to decode.
+        if (!allow_V) goto decode_noV;
+        if (dis_vx_move( theInstr )) goto decode_success;
+        goto decode_failure;
+
       /* Miscellaneous ISA 2.06 instructions */
       case 0x1FA: // popcntd
       case 0x17A: // popcntw
@@ -20113,6 +28546,16 @@
             DIP("isel r%u,r%u,r%u,crb%u\n", rT,rA,rB,bi);
             goto decode_success;
          }
+      }
+
+      opc2 = IFIELD(theInstr, 2, 9);
+      switch (opc2) {
+      case 0x1BD:
+         if (!mode64) goto decode_failure;
+         if (dis_int_logic( theInstr )) goto decode_success;
+         goto decode_failure;
+
+      default:
          goto decode_failure;
       }
       break;
@@ -20131,6 +28574,11 @@
          if (dis_av_multarith( theInstr )) goto decode_success;
          goto decode_failure;
 
+      case 0x30: case 0x31: case 0x33: // maddhd, maddhdu, maddld
+         if (!mode64) goto decode_failure;
+         if (dis_int_mult_add( theInstr )) goto decode_success;
+         goto decode_failure;
+
       /* AV Permutations */
       case 0x2A:                       // vsel
       case 0x2B:                       // vperm
@@ -20140,6 +28588,7 @@
          goto decode_failure;
 
       case 0x2D:                       // vpermxor
+      case 0x3B:                       // vpermr
          if (!allow_isa_2_07) goto decode_noP8;
          if (dis_av_permute( theInstr )) goto decode_success;
          goto decode_failure;
@@ -20161,19 +28610,36 @@
       }
 
       opc2 = IFIELD(theInstr, 0, 9);
-      switch (opc2) {
-      /* BCD arithmetic */
-      case 0x1: case 0x41:             // bcdadd, bcdsub
-         if (!allow_isa_2_07) goto decode_noP8;
-         if (dis_av_bcd( theInstr )) goto decode_success;
-         goto decode_failure;
-
-      default:
-         break;  // Fall through...
+      if (IFIELD(theInstr, 10, 1) == 1) {
+         /* The following instructions have bit 21 set and a PS bit (bit 22).
+          * Bit 21 distinguishes them from instructions with an 11 bit opc2
+          * field.
+          */
+         switch (opc2) {
+            /* BCD arithmetic */
+            case 0x001: case 0x041:             // bcdadd, bcdsub
+            case 0x101: case 0x141:             // bcdtrunc., bcdutrunc.
+            case 0x081: case 0x0C1: case 0x1C1: // bcdus., bcds., bcdsr.
+            case 0x181:                         // bcdcfn., bcdcfz.
+                                                // bcdctz., bcdcfsq., bcdctsq.
+               if (!allow_isa_2_07) goto decode_noP8;
+               if (dis_av_bcd( theInstr, abiinfo )) goto decode_success;
+               goto decode_failure;
+            default:
+               break;  // Fall through...
+         }
       }
 
       opc2 = IFIELD(theInstr, 0, 11);
       switch (opc2) {
+      /* BCD manipulation */
+      case 0x341:                  // bcdcpsgn
+         if (!allow_isa_2_07) goto decode_noP8;
+         if (dis_av_bcd_misc( theInstr, abiinfo )) goto decode_success;
+         goto decode_failure;
+
       /* AV Arithmetic */
       case 0x180:                         // vaddcuw
       case 0x000: case 0x040: case 0x080: // vaddubm, vadduhm, vadduwm
@@ -20245,12 +28711,27 @@
          if (dis_av_logic( theInstr )) goto decode_success;
          goto decode_failure;
 
+      /* AV Rotate */
+      case 0x085: case 0x185:             // vrlwmi, vrlwnm
+      case 0x0C5: case 0x1C5:             // vrldmi, vrldnm
+         if (!allow_V) goto decode_noV;
+         if (dis_av_rotate( theInstr )) goto decode_success;
+         goto decode_failure;
+
       /* AV Processor Control */
       case 0x604: case 0x644:             // mfvscr, mtvscr
          if (!allow_V) goto decode_noV;
          if (dis_av_procctl( theInstr )) goto decode_success;
          goto decode_failure;
 
+      /* AV Vector Extract Element instructions */
+      case 0x60D: case 0x64D: case 0x68D:   // vextublx, vextuhlx, vextuwlx
+      case 0x70D: case 0x74D: case 0x78D:   // vextubrx, vextuhrx, vextuwrx
+         if (!allow_V) goto decode_noV;
+         if (dis_av_extract_element( theInstr )) goto decode_success;
+         goto decode_failure;
+
+
       /* AV Floating Point Arithmetic */
       case 0x00A: case 0x04A:             // vaddfp, vsubfp
       case 0x10A: case 0x14A: case 0x18A: // vrefp, vrsqrtefp, vexptefp
@@ -20269,10 +28750,14 @@
          if (dis_av_fp_convert( theInstr )) goto decode_success;
          goto decode_failure;
 
-      /* AV Merge, Splat */
+      /* AV Merge, Splat, Extract, Insert */
       case 0x00C: case 0x04C: case 0x08C: // vmrghb, vmrghh, vmrghw
       case 0x10C: case 0x14C: case 0x18C: // vmrglb, vmrglh, vmrglw
       case 0x20C: case 0x24C: case 0x28C: // vspltb, vsplth, vspltw
+      case 0x20D: case 0x24D:             // vextractub, vextractuh,
+      case 0x28D: case 0x2CD:             // vextractuw, vextractd,
+      case 0x30D: case 0x34D:             // vinsertb, vinserth
+      case 0x38D: case 0x3CD:             // vinsertw, vinsertd
       case 0x30C: case 0x34C: case 0x38C: // vspltisb, vspltish, vspltisw
          if (!allow_V) goto decode_noV;
          if (dis_av_permute( theInstr )) goto decode_success;
@@ -20283,6 +28768,14 @@
           if (dis_av_permute( theInstr )) goto decode_success;
           goto decode_failure;
 
+      /* AltiVec 128 bit integer multiply by 10 Instructions */
+      case 0x201: case 0x001:               //vmul10uq, vmul10cuq
+      case 0x241: case 0x041:               //vmul10euq, vmul10ecuq
+          if (!allow_V) goto decode_noV;
+          if (!allow_isa_3_0) goto decode_noP9;
+          if (dis_av_mult10( theInstr )) goto decode_success;
+          goto decode_failure;
+
       /* AV Pack, Unpack */
       case 0x00E: case 0x04E: case 0x08E: // vpkuhum, vpkuwum, vpkuhus
       case 0x0CE:                         // vpkuwus
@@ -20295,6 +28788,11 @@
           if (dis_av_pack( theInstr )) goto decode_success;
           goto decode_failure;
 
+      case 0x403: case 0x443: case 0x483:  // vabsdub, vabsduh, vabsduw
+          if (!allow_V) goto decode_noV;
+          if (dis_abs_diff( theInstr )) goto decode_success;
+          goto decode_failure;
+
       case 0x44E: case 0x4CE: case 0x54E: // vpkudum, vpkudus, vpksdus
       case 0x5CE: case 0x64E: case 0x6cE: // vpksdss, vupkhsw, vupklsw
          if (!allow_isa_2_07) goto decode_noP8;
@@ -20308,6 +28806,20 @@
          if (dis_av_cipher( theInstr )) goto decode_success;
          goto decode_failure;
 
+      /* AV Vector Extend Sign Instructions,
+       * Vector Count Leading/Trailing Zero Least-Significant Bits Byte,
+       * and Vector Integer Negate Instructions
+       */
+      case 0x602:   // vextsb2w, vextsh2w, vextsb2d, vextsh2d, vextsw2d
+                    // vclzlsbb and vctzlsbb
+                    // vnegw, vnegd
+                    // vprtybw, vprtybd, vprtybq
+                    // vctzb, vctzh, vctzw, vctzd
+         if (!allow_V) goto decode_noV;
+         if (dis_av_extend_sign_count_zero( theInstr, allow_isa_3_0 ))
+            goto decode_success;
+         goto decode_failure;
+
       case 0x6C2: case 0x682:             // vshasigmaw, vshasigmad
          if (!allow_isa_2_07) goto decode_noP8;
          if (dis_av_hash( theInstr )) goto decode_success;
@@ -20326,6 +28838,7 @@
          goto decode_failure;
 
       case 0x50c:                         // vgbbd
+      case 0x5cc:                         // vbpermd
          if (!allow_isa_2_07) goto decode_noP8;
          if (dis_av_count_bitTranspose( theInstr, opc2 )) goto decode_success;
          goto decode_failure;
@@ -20345,7 +28858,9 @@
       switch (opc2) {
 
       /* AV Compare */
-      case 0x006: case 0x046: case 0x086: // vcmpequb, vcmpequh, vcmpequw
+      case 0x006: case 0x007: case 0x107: // vcmpequb, vcmpneb, vcmpnezb
+      case 0x046: case 0x047: case 0x147: // vcmpequh, vcmpneh, vcmpnezh
+      case 0x086: case 0x087: case 0x187: // vcmpequw, vcmpnew, vcmpnezw
       case 0x206: case 0x246: case 0x286: // vcmpgtub, vcmpgtuh, vcmpgtuw
       case 0x306: case 0x346: case 0x386: // vcmpgtsb, vcmpgtsh, vcmpgtsw
          if (!allow_V) goto decode_noV;
@@ -20426,6 +28941,13 @@
 		 theInstr);
       goto not_supported;
 
+   decode_noP9:
+      vassert(!allow_isa_3_0);
+      vex_printf("disInstr(ppc): found the Power 9 instruction 0x%x that can't be handled\n"
+                 "by Valgrind on this host.  This instruction requires a host that\n"
+		 "supports Power 9 instructions.\n",
+		 theInstr);
+      goto not_supported;
 
    decode_failure:
    /* All decode failures end up here. */
diff --git a/VEX/priv/guest_s390_toIR.c b/VEX/priv/guest_s390_toIR.c
index 116a606..59cddfd 100644
--- a/VEX/priv/guest_s390_toIR.c
+++ b/VEX/priv/guest_s390_toIR.c
@@ -8,7 +8,7 @@
    This file is part of Valgrind, a dynamic binary instrumentation
    framework.
 
-   Copyright IBM Corp. 2010-2015
+   Copyright IBM Corp. 2010-2016
 
    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
@@ -3831,6 +3831,16 @@
 }
 
 static const HChar *
+s390_irgen_BRCTH(UChar r1, UInt i2)
+{
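+   /* Branch Relative on Count High: decrement the high word (bits 0-31)
+    * of r1 and branch if the result is not zero. */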
+   put_gpr_w0(r1, binop(Iop_Sub32, get_gpr_w0(r1), mkU32(1)));
+   if_condition_goto(binop(Iop_CmpNE32, get_gpr_w0(r1), mkU32(0)),
+                     guest_IA_curr_instr + ((ULong)(Long)(Int)i2 << 1));
+
+   return "brcth";
+}
+
+static const HChar *
 s390_irgen_BRCTG(UChar r1, UShort i2)
 {
    put_gpr_dw0(r1, binop(Iop_Sub64, get_gpr_dw0(r1), mkU64(1)));
@@ -7578,7 +7588,8 @@
 }
 
 static const HChar *
-s390_irgen_RISBG(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5)
+s390_irgen_RISBGx(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5,
+                  Bool set_cc)
 {
    UChar from;
    UChar to;
@@ -7612,9 +7623,83 @@
       put_gpr_dw0(r1, binop(Iop_And64, mkexpr(op2), mkU64(mask)));
    }
    assign(result, get_gpr_dw0(r1));
-   s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, result);
+   if (set_cc) {
+      s390_cc_thunk_putS(S390_CC_OP_LOAD_AND_TEST, result);
+      return "risbg";
+   }
 
-   return "risbg";
+   return "risbgn";
+}
+
+static const HChar *
+s390_irgen_RISBG(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5)
+{
+   return s390_irgen_RISBGx(r1, r2, i3, i4, i5, True);
+}
+
+static const HChar *
+s390_irgen_RISBGN(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5)
+{
+   return s390_irgen_RISBGx(r1, r2, i3, i4, i5, False);
+}
+
+static IRExpr *
+s390_irgen_RISBxG(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5,
+                  Bool high)
+{
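+   /* Common helper for RISBHG/RISBLG: rotate r2 left by i5 bits, select
+    * the bit range i3..i4 of the rotated word and insert it into the high
+    * (RISBHG) or low (RISBLG) word of r1.  The remaining bits of that word
+    * are kept unless the zero bit (the 0x80 bit of i4) is set. */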
+   UChar from;
+   UChar to;
+   UChar rot;
+   UChar z_bit;
+   UInt mask;
+   UInt maskc;
+   IRTemp op2 = newTemp(Ity_I32);
+
+   from = i3 & 31;
+   to = i4 & 31;
+   rot = i5 & 63;
+   z_bit = i4 & 128;
+   if (rot == 0) {
+      assign(op2, high ? get_gpr_w0(r2) : get_gpr_w1(r2));
+   } else if (rot == 32) {
+      assign(op2, high ? get_gpr_w1(r2) : get_gpr_w0(r2));
+   } else {
+      assign(op2,
+             unop(high ? Iop_64HIto32 : Iop_64to32,
+                  binop(Iop_Or64,
+                        binop(Iop_Shl64, get_gpr_dw0(r2), mkU8(rot)),
+                        binop(Iop_Shr64, get_gpr_dw0(r2), mkU8(64 - rot)))));
+   }
+   if (from <= to) {
+      mask = ~0U;
+      mask = (mask >> from) & (mask << (31 - to));
+      maskc = ~mask;
+   } else {
+      maskc = ~0U;
+      maskc = (maskc >> (to + 1)) & (maskc << (32 - from));
+      mask = ~maskc;
+   }
+   if (z_bit) {
+      return binop(Iop_And32, mkexpr(op2), mkU32(mask));
+   }
+   return binop(Iop_Or32,
+                binop(Iop_And32, high ? get_gpr_w0(r1) : get_gpr_w1(r1),
+                      mkU32(maskc)),
+                binop(Iop_And32, mkexpr(op2), mkU32(mask)));
+}
+
+static const HChar *
+s390_irgen_RISBHG(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5)
+{
+   put_gpr_w0(r1, s390_irgen_RISBxG(r1, r2, i3, i4, i5, True));
+   return "risbhg";
+}
+
+static const HChar *
+s390_irgen_RISBLG(UChar r1, UChar r2, UChar i3, UChar i4, UChar i5)
+{
+   put_gpr_w1(r1, s390_irgen_RISBxG(r1, r2, i3, i4, i5, False));
+   return "risblg";
 }
 
 static const HChar *
@@ -8754,6 +8839,15 @@
 }
 
 static const HChar *
+s390_irgen_LDER(UChar r1, UChar r2)
+{
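+   /* Load Lengthened (short to long), register form: clear the target FPR
+    * and copy the 32-bit source into its leftmost word. */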
+   put_fpr_dw0(r1, mkF64i(0x0));
+   put_fpr_w0(r1, get_fpr_w0(r2));
+
+   return "lder";
+}
+
+static const HChar *
 s390_irgen_LXR(UChar r1, UChar r2)
 {
    put_fpr_dw0(r1, get_fpr_dw0(r2));
@@ -8779,6 +8873,15 @@
 }
 
 static const HChar *
+s390_irgen_LDE(UChar r1, IRTemp op2addr)
+{
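+   /* Load Lengthened (short to long), storage form: clear the target FPR
+    * and load the 32-bit operand into its leftmost word. */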
+   put_fpr_dw0(r1, mkF64i(0x0));
+   put_fpr_w0(r1, load(Ity_F32, mkexpr(op2addr)));
+
+   return "lde";
+}
+
+static const HChar *
 s390_irgen_LEY(UChar r1, IRTemp op2addr)
 {
    put_fpr_w0(r1, load(Ity_F32, mkexpr(op2addr)));
@@ -10895,6 +10998,22 @@
 }
 
 static void
+s390_irgen_MVCIN_EX(IRTemp length, IRTemp start1, IRTemp start2)
+{
+   IRTemp counter = newTemp(Ity_I64);
+
+   assign(counter, get_counter_dw0());
+
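+   /* MVCIN copies the source field to the destination with the bytes in
+    * reverse order: the byte at offset 'counter' in the destination is
+    * taken from address start2 - counter. */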
+   store(binop(Iop_Add64, mkexpr(start1), mkexpr(counter)),
+         load(Ity_I8, binop(Iop_Sub64, mkexpr(start2), mkexpr(counter))));
+
+   /* Check for end of field */
+   put_counter_dw0(binop(Iop_Add64, mkexpr(counter), mkU64(1)));
+   iterate_if(binop(Iop_CmpNE64, mkexpr(counter), mkexpr(length)));
+   put_counter_dw0(mkU64(0));
+}
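/* Illustrative sketch (not part of the patch): the plain-C equivalent of the
   MVCIN_EX loop above.  'len' is the number of bytes moved (the encoded
   length field plus one); 'src_last' plays the role of start2 and addresses
   the last source byte.  Names are made up for this sketch. */
#include <stddef.h>

static void mvcin_sketch(unsigned char *dst, const unsigned char *src_last,
                         size_t len)
{
   for (size_t i = 0; i < len; i++)
      dst[i] = *(src_last - i);     /* read the source right-to-left */
}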
+
+static void
 s390_irgen_TR_EX(IRTemp length, IRTemp start1, IRTemp start2)
 {
    IRTemp op = newTemp(Ity_I8);
@@ -11027,6 +11146,11 @@
       s390_irgen_EX_SS(r1, addr2, s390_irgen_TR_EX, 64);
       return "ex@tr";
 
+   case 0xe800000000000000ULL:
+      /* special case MVCIN */
+      s390_irgen_EX_SS(r1, addr2, s390_irgen_MVCIN_EX, 64);
+      return "ex@mvcin";
+
    default:
    {
       /* everything else will get a self checking prefix that also checks the
@@ -11471,6 +11595,17 @@
 }
 
 static const HChar *
+s390_irgen_MVCIN(UChar length, IRTemp start1, IRTemp start2)
+{
+   IRTemp len = newTemp(Ity_I64);
+
+   assign(len, mkU64(length));
+   s390_irgen_MVCIN_EX(len, start1, start2);
+
+   return "mvcin";
+}
+
+static const HChar *
 s390_irgen_MVCL(UChar r1, UChar r2)
 {
    IRTemp addr1 = newTemp(Ity_I64);
@@ -12938,6 +13073,38 @@
 }
 
 static const HChar *
+s390_irgen_POPCNT(UChar r1, UChar r2)
+{
+   Int i;
+   IRTemp val = newTemp(Ity_I64);
+   IRTemp mask[3];
+
+   assign(val, get_gpr_dw0(r2));
+   for (i = 0; i < 3; i++) {
+      mask[i] = newTemp(Ity_I64);
+   }
+   assign(mask[0], mkU64(0x5555555555555555ULL));
+   assign(mask[1], mkU64(0x3333333333333333ULL));
+   assign(mask[2], mkU64(0x0F0F0F0F0F0F0F0FULL));
+   for (i = 0; i < 3; i++) {
+      IRTemp tmp = newTemp(Ity_I64);
+
+      assign(tmp,
+             binop(Iop_Add64,
+                   binop(Iop_And64,
+                         mkexpr(val),
+                         mkexpr(mask[i])),
+                   binop(Iop_And64,
+                         binop(Iop_Shr64, mkexpr(val), mkU8(1 << i)),
+                         mkexpr(mask[i]))));
+      val = tmp;
+   }
+   s390_cc_thunk_putZ(S390_CC_OP_BITWISE, val);
+   put_gpr_dw0(r1, mkexpr(val));
+   return "popcnt";
+}
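/* Illustrative sketch (not part of the patch): the three mask-and-add rounds
   above stop at byte granularity, i.e. they leave the population count of
   each byte in that byte, with no cross-byte accumulation.  A plain-C
   version of the same computation (function name made up): */
#include <stdint.h>

static uint64_t popcnt_per_byte(uint64_t v)
{
   v = (v & 0x5555555555555555ULL) + ((v >> 1) & 0x5555555555555555ULL);
   v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
   v = (v & 0x0F0F0F0F0F0F0F0FULL) + ((v >> 4) & 0x0F0F0F0F0F0F0F0FULL);
   return v;   /* e.g. 0x0103070F00000000 -> 0x0102030400000000 */
}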
+
+static const HChar *
 s390_irgen_STCK(IRTemp op2addr)
 {
    IRDirty *d;
@@ -14512,7 +14679,8 @@
                                      ovl.fmt.RRF.r3, ovl.fmt.RRF.r2);  goto ok;
    case 0xb31f: s390_format_RRF_F0FF(s390_irgen_MSDBR, ovl.fmt.RRF.r1,
                                      ovl.fmt.RRF.r3, ovl.fmt.RRF.r2);  goto ok;
-   case 0xb324: /* LDER */ goto unimplemented;
+   case 0xb324: s390_format_RRE_FF(s390_irgen_LDER, ovl.fmt.RRE.r1,
+                                   ovl.fmt.RRE.r2); goto ok;
    case 0xb325: /* LXDR */ goto unimplemented;
    case 0xb326: /* LXER */ goto unimplemented;
    case 0xb32e: /* MAER */ goto unimplemented;
@@ -14999,7 +15167,8 @@
                                    ovl.fmt.RRE.r2);  goto ok;
    case 0xb9df: s390_format_RRE_RR(s390_irgen_CLHLR, ovl.fmt.RRE.r1,
                                    ovl.fmt.RRE.r2);  goto ok;
-   case 0xb9e1: /* POPCNT */ goto unimplemented;
+   case 0xb9e1: s390_format_RRE_RR(s390_irgen_POPCNT, ovl.fmt.RRE.r1,
+                                   ovl.fmt.RRE.r2);  goto ok;
    case 0xb9e2: s390_format_RRF_U0RR(s390_irgen_LOCGR, ovl.fmt.RRF3.r3,
                                      ovl.fmt.RRF3.r1, ovl.fmt.RRF3.r2,
                                      S390_XMNM_LOCGR);  goto ok;
@@ -15997,7 +16166,13 @@
    case 0xec0000000045ULL: s390_format_RIE_RRP(s390_irgen_BRXLG, ovl.fmt.RIE.r1,
                                                ovl.fmt.RIE.r3, ovl.fmt.RIE.i2);
                                                goto ok;
-   case 0xec0000000051ULL: /* RISBLG */ goto unimplemented;
+   case 0xec0000000051ULL: s390_format_RIE_RRUUU(s390_irgen_RISBLG,
+                                                 ovl.fmt.RIE_RRUUU.r1,
+                                                 ovl.fmt.RIE_RRUUU.r2,
+                                                 ovl.fmt.RIE_RRUUU.i3,
+                                                 ovl.fmt.RIE_RRUUU.i4,
+                                                 ovl.fmt.RIE_RRUUU.i5);
+                                                 goto ok;
    case 0xec0000000054ULL: s390_format_RIE_RRUUU(s390_irgen_RNSBG,
                                                  ovl.fmt.RIE_RRUUU.r1,
                                                  ovl.fmt.RIE_RRUUU.r2,
@@ -16026,8 +16201,20 @@
                                                  ovl.fmt.RIE_RRUUU.i4,
                                                  ovl.fmt.RIE_RRUUU.i5);
                                                  goto ok;
-   case 0xec0000000059ULL: /* RISBGN */ goto unimplemented;
-   case 0xec000000005dULL: /* RISBHG */ goto unimplemented;
+   case 0xec0000000059ULL: s390_format_RIE_RRUUU(s390_irgen_RISBGN,
+                                                 ovl.fmt.RIE_RRUUU.r1,
+                                                 ovl.fmt.RIE_RRUUU.r2,
+                                                 ovl.fmt.RIE_RRUUU.i3,
+                                                 ovl.fmt.RIE_RRUUU.i4,
+                                                 ovl.fmt.RIE_RRUUU.i5);
+                                                 goto ok;
+   case 0xec000000005dULL: s390_format_RIE_RRUUU(s390_irgen_RISBHG,
+                                                 ovl.fmt.RIE_RRUUU.r1,
+                                                 ovl.fmt.RIE_RRUUU.r2,
+                                                 ovl.fmt.RIE_RRUUU.i3,
+                                                 ovl.fmt.RIE_RRUUU.i4,
+                                                 ovl.fmt.RIE_RRUUU.i5);
+                                                 goto ok;
    case 0xec0000000064ULL: s390_format_RIE_RRPU(s390_irgen_CGRJ,
                                                 ovl.fmt.RIE_RRPU.r1,
                                                 ovl.fmt.RIE_RRPU.r2,
@@ -16190,7 +16377,10 @@
                                                  ovl.fmt.RXF.r3, ovl.fmt.RXF.x2,
                                                  ovl.fmt.RXF.b2, ovl.fmt.RXF.d2,
                                                  ovl.fmt.RXF.r1);  goto ok;
-   case 0xed0000000024ULL: /* LDE */ goto unimplemented;
+   case 0xed0000000024ULL: s390_format_RXE_FRRD(s390_irgen_LDE,
+                                                ovl.fmt.RXE.r1, ovl.fmt.RXE.x2,
+                                                ovl.fmt.RXE.b2,
+                                                ovl.fmt.RXE.d2); goto ok;
    case 0xed0000000025ULL: /* LXD */ goto unimplemented;
    case 0xed0000000026ULL: /* LXE */ goto unimplemented;
    case 0xed000000002eULL: /* MAE */ goto unimplemented;
@@ -16366,7 +16556,8 @@
    case 0xc802ULL: /* CSST */ goto unimplemented;
    case 0xc804ULL: /* LPD */ goto unimplemented;
    case 0xc805ULL: /* LPDG */ goto unimplemented;
-   case 0xcc06ULL: /* BRCTH */ goto unimplemented;
+   case 0xcc06ULL:  s390_format_RIL_RP(s390_irgen_BRCTH, ovl.fmt.RIL.r1,
+                                       ovl.fmt.RIL.i2);  goto ok;
    case 0xcc08ULL: s390_format_RIL_RI(s390_irgen_AIH, ovl.fmt.RIL.r1,
                                       ovl.fmt.RIL.i2);  goto ok;
    case 0xcc0aULL: s390_format_RIL_RI(s390_irgen_ALSIH, ovl.fmt.RIL.r1,
@@ -16416,7 +16607,9 @@
    case 0xdfULL: /* EDMK */ goto unimplemented;
    case 0xe1ULL: /* PKU */ goto unimplemented;
    case 0xe2ULL: /* UNPKU */ goto unimplemented;
-   case 0xe8ULL: /* MVCIN */ goto unimplemented;
+   case 0xe8ULL: s390_format_SS_L0RDRD(s390_irgen_MVCIN, ovl.fmt.SS.l,
+                                       ovl.fmt.SS.b1, ovl.fmt.SS.d1,
+                                       ovl.fmt.SS.b2, ovl.fmt.SS.d2);  goto ok;
    case 0xe9ULL: /* PKA */ goto unimplemented;
    case 0xeaULL: /* UNPKA */ goto unimplemented;
    case 0xeeULL: /* PLO */ goto unimplemented;
diff --git a/VEX/priv/guest_x86_toIR.c b/VEX/priv/guest_x86_toIR.c
index c24d9a4..24f9848 100644
--- a/VEX/priv/guest_x86_toIR.c
+++ b/VEX/priv/guest_x86_toIR.c
@@ -8565,8 +8565,10 @@
       vassert(sz == 4);
 
       modrm = getIByte(delta+2);
-      do_MMX_preamble();
       if (epartIsReg(modrm)) {
+         /* Only switch to MMX mode if the source is an MMX register.
+            See comments on CVTPI2PD for details.  Fixes #357059. */
+         do_MMX_preamble();
          assign( arg64, getMMXReg(eregOfRM(modrm)) );
          delta += 2+1;
          DIP("cvtpi2ps %s,%s\n", nameMMXReg(eregOfRM(modrm)),
diff --git a/VEX/priv/host_arm64_defs.c b/VEX/priv/host_arm64_defs.c
index 82f658e..cc7c832 100644
--- a/VEX/priv/host_arm64_defs.c
+++ b/VEX/priv/host_arm64_defs.c
@@ -1010,6 +1010,11 @@
    i->tag        = ARM64in_MFence;
    return i;
 }
+ARM64Instr* ARM64Instr_ClrEX ( void ) {
+   ARM64Instr* i = LibVEX_Alloc_inline(sizeof(ARM64Instr));
+   i->tag        = ARM64in_ClrEX;
+   return i;
+}
 ARM64Instr* ARM64Instr_VLdStH ( Bool isLoad, HReg sD, HReg rN, UInt uimm12 ) {
    ARM64Instr* i = LibVEX_Alloc_inline(sizeof(ARM64Instr));
    i->tag                   = ARM64in_VLdStH;
@@ -1567,6 +1572,9 @@
       case ARM64in_MFence:
          vex_printf("(mfence) dsb sy; dmb sy; isb");
          return;
+      case ARM64in_ClrEX:
+         vex_printf("clrex #15");
+         return;
       case ARM64in_VLdStH:
          if (i->ARM64in.VLdStH.isLoad) {
             vex_printf("ldr    ");
@@ -2058,6 +2066,8 @@
          return;
       case ARM64in_MFence:
          return;
+      case ARM64in_ClrEX:
+         return;
       case ARM64in_VLdStH:
          addHRegUse(u, HRmRead, i->ARM64in.VLdStH.rN);
          if (i->ARM64in.VLdStH.isLoad) {
@@ -2318,6 +2328,8 @@
          return;
       case ARM64in_MFence:
          return;
+      case ARM64in_ClrEX:
+         return;
       case ARM64in_VLdStH:
          i->ARM64in.VLdStH.hD = lookupHRegRemap(m, i->ARM64in.VLdStH.hD);
          i->ARM64in.VLdStH.rN = lookupHRegRemap(m, i->ARM64in.VLdStH.rN);
@@ -3797,12 +3809,10 @@
          *p++ = 0xD5033FDF; /* ISB */
          goto done;
       }
-      //case ARM64in_CLREX: {
-      //   //ATC, but believed to be correct
-      //   goto bad;
-      //   *p++ = 0xD5033F5F; /* clrex */
-      //   goto done;
-      //}
+      case ARM64in_ClrEX: {
+         *p++ = 0xD5033F5F; /* clrex #15 */
+         goto done;
+      }
       case ARM64in_VLdStH: {
          /* 01 111101 01 imm12 n t   LDR Ht, [Xn|SP, #imm12 * 2]
             01 111101 00 imm12 n t   STR Ht, [Xn|SP, #imm12 * 2]
diff --git a/VEX/priv/host_arm64_defs.h b/VEX/priv/host_arm64_defs.h
index d74f8a1..62b25fd 100644
--- a/VEX/priv/host_arm64_defs.h
+++ b/VEX/priv/host_arm64_defs.h
@@ -482,6 +482,7 @@
       ARM64in_LdrEX,
       ARM64in_StrEX,
       ARM64in_MFence,
+      ARM64in_ClrEX,
       /* ARM64in_V*: scalar ops involving vector registers */
       ARM64in_VLdStH,   /* ld/st to/from low 16 bits of vec reg, imm offset */
       ARM64in_VLdStS,   /* ld/st to/from low 32 bits of vec reg, imm offset */
@@ -673,6 +674,9 @@
             total nuclear overkill, but better safe than sorry. */
          struct {
          } MFence;
+         /* A CLREX instruction. */
+         struct {
+         } ClrEX;
          /* --- INSTRUCTIONS INVOLVING VECTOR REGISTERS --- */
          /* ld/st to/from low 16 bits of vec reg, imm offset */
          struct {
@@ -909,6 +913,7 @@
 extern ARM64Instr* ARM64Instr_LdrEX   ( Int szB );
 extern ARM64Instr* ARM64Instr_StrEX   ( Int szB );
 extern ARM64Instr* ARM64Instr_MFence  ( void );
+extern ARM64Instr* ARM64Instr_ClrEX   ( void );
 extern ARM64Instr* ARM64Instr_VLdStH  ( Bool isLoad, HReg sD, HReg rN,
                                         UInt uimm12 /* 0 .. 8190, 0 % 2 */ );
 extern ARM64Instr* ARM64Instr_VLdStS  ( Bool isLoad, HReg sD, HReg rN,
diff --git a/VEX/priv/host_arm64_isel.c b/VEX/priv/host_arm64_isel.c
index a572408..9aadcce 100644
--- a/VEX/priv/host_arm64_isel.c
+++ b/VEX/priv/host_arm64_isel.c
@@ -3839,6 +3839,9 @@
          case Imbe_Fence:
             addInstr(env, ARM64Instr_MFence());
             return;
+         case Imbe_CancelReservation:
+            addInstr(env, ARM64Instr_ClrEX());
+            return;
          default:
             break;
       }
diff --git a/VEX/priv/host_arm_defs.c b/VEX/priv/host_arm_defs.c
index 82d5519..2b18714 100644
--- a/VEX/priv/host_arm_defs.c
+++ b/VEX/priv/host_arm_defs.c
@@ -1329,6 +1329,15 @@
    i->ARMin.VCvtSD.src  = src;
    return i;
 }
+ARMInstr* ARMInstr_VXferQ ( Bool toQ, HReg qD, HReg dHi, HReg dLo ) {
+   ARMInstr* i = LibVEX_Alloc_inline(sizeof(ARMInstr));
+   i->tag              = ARMin_VXferQ;
+   i->ARMin.VXferQ.toQ = toQ;
+   i->ARMin.VXferQ.qD  = qD;
+   i->ARMin.VXferQ.dHi = dHi;
+   i->ARMin.VXferQ.dLo = dLo;
+   return i;
+}
 ARMInstr* ARMInstr_VXferD ( Bool toD, HReg dD, HReg rHi, HReg rLo ) {
    ARMInstr* i = LibVEX_Alloc_inline(sizeof(ARMInstr));
    i->tag              = ARMin_VXferD;
@@ -1800,6 +1809,29 @@
          vex_printf(", ");
          ppHRegARM(i->ARMin.VCvtSD.src);
          return;
+      case ARMin_VXferQ:
+         if (i->ARMin.VXferQ.toQ) {
+            vex_printf("vmov ");
+            ppHRegARM(i->ARMin.VXferQ.qD);
+            vex_printf("-lo64, ");
+            ppHRegARM(i->ARMin.VXferQ.dLo);
+            vex_printf(" ; vmov ");
+            ppHRegARM(i->ARMin.VXferQ.qD);
+            vex_printf("-hi64, ");
+            ppHRegARM(i->ARMin.VXferQ.dHi);
+         } else {
+            vex_printf("vmov ");
+            ppHRegARM(i->ARMin.VXferQ.dLo);
+            vex_printf(", ");
+            ppHRegARM(i->ARMin.VXferQ.qD);
+            vex_printf("-lo64");
+            vex_printf(" ; vmov ");
+            ppHRegARM(i->ARMin.VXferQ.dHi);
+            vex_printf(", ");
+            ppHRegARM(i->ARMin.VXferQ.qD);
+            vex_printf("-hi64");
+         }
+         return;
       case ARMin_VXferD:
          vex_printf("vmov  ");
          if (i->ARMin.VXferD.toD) {
@@ -2201,6 +2233,17 @@
          addHRegUse(u, HRmWrite, i->ARMin.VCvtSD.dst);
          addHRegUse(u, HRmRead,  i->ARMin.VCvtSD.src);
          return;
+      case ARMin_VXferQ:
+         if (i->ARMin.VXferQ.toQ) {
+            addHRegUse(u, HRmWrite, i->ARMin.VXferQ.qD);
+            addHRegUse(u, HRmRead,  i->ARMin.VXferQ.dHi);
+            addHRegUse(u, HRmRead,  i->ARMin.VXferQ.dLo);
+         } else {
+            addHRegUse(u, HRmRead,  i->ARMin.VXferQ.qD);
+            addHRegUse(u, HRmWrite, i->ARMin.VXferQ.dHi);
+            addHRegUse(u, HRmWrite, i->ARMin.VXferQ.dLo);
+         }
+         return;
       case ARMin_VXferD:
          if (i->ARMin.VXferD.toD) {
             addHRegUse(u, HRmWrite, i->ARMin.VXferD.dD);
@@ -2422,6 +2465,11 @@
          i->ARMin.VCvtSD.dst = lookupHRegRemap(m, i->ARMin.VCvtSD.dst);
          i->ARMin.VCvtSD.src = lookupHRegRemap(m, i->ARMin.VCvtSD.src);
          return;
+      case ARMin_VXferQ:
+         i->ARMin.VXferQ.qD  = lookupHRegRemap(m, i->ARMin.VXferQ.qD);
+         i->ARMin.VXferQ.dHi = lookupHRegRemap(m, i->ARMin.VXferQ.dHi);
+         i->ARMin.VXferQ.dLo = lookupHRegRemap(m, i->ARMin.VXferQ.dLo);
+         return;
       case ARMin_VXferD:
          i->ARMin.VXferD.dD  = lookupHRegRemap(m, i->ARMin.VXferD.dD);
          i->ARMin.VXferD.rHi = lookupHRegRemap(m, i->ARMin.VXferD.rHi);
@@ -2728,33 +2776,38 @@
 #define X1111  BITS4(1,1,1,1)
 
 #define XXXXX___(zzx7,zzx6,zzx5,zzx4,zzx3) \
-   ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24) |  \
+   (((((UInt)(zzx7)) & 0xF) << 28) | \
+    (((zzx6) & 0xF) << 24) |  \
     (((zzx5) & 0xF) << 20) | (((zzx4) & 0xF) << 16) |  \
     (((zzx3) & 0xF) << 12))
 
 #define XXXXXX__(zzx7,zzx6,zzx5,zzx4,zzx3,zzx2)        \
-   ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24) |  \
+   (((((UInt)(zzx7)) & 0xF) << 28) | \
+    (((zzx6) & 0xF) << 24) |  \
     (((zzx5) & 0xF) << 20) | (((zzx4) & 0xF) << 16) |  \
     (((zzx3) & 0xF) << 12) | (((zzx2) & 0xF) <<  8))
 
 #define XXXXX__X(zzx7,zzx6,zzx5,zzx4,zzx3,zzx0)        \
-   ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24) |  \
+   (((((UInt)(zzx7)) & 0xF) << 28) | \
+    (((zzx6) & 0xF) << 24) |  \
     (((zzx5) & 0xF) << 20) | (((zzx4) & 0xF) << 16) |  \
     (((zzx3) & 0xF) << 12) | (((zzx0) & 0xF) <<  0))
 
 #define XXX___XX(zzx7,zzx6,zzx5,zzx1,zzx0) \
-  ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24) | \
+  (((((UInt)(zzx7)) & 0xF) << 28) | \
+   (((zzx6) & 0xF) << 24) | \
    (((zzx5) & 0xF) << 20) | (((zzx1) & 0xF) << 4) | \
    (((zzx0) & 0xF) << 0))
 
 #define XXXXXXXX(zzx7,zzx6,zzx5,zzx4,zzx3,zzx2,zzx1,zzx0)  \
-   ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24) |  \
+   (((((UInt)(zzx7)) & 0xF) << 28) | \
+    (((zzx6) & 0xF) << 24) |  \
     (((zzx5) & 0xF) << 20) | (((zzx4) & 0xF) << 16) |  \
     (((zzx3) & 0xF) << 12) | (((zzx2) & 0xF) <<  8) |  \
     (((zzx1) & 0xF) <<  4) | (((zzx0) & 0xF) <<  0))
 
 #define XX______(zzx7,zzx6) \
-   ((((zzx7) & 0xF) << 28) | (((zzx6) & 0xF) << 24))
+   (((((UInt)(zzx7)) & 0xF) << 28) | (((zzx6) & 0xF) << 24))
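/* Illustrative note (not part of the patch): the (UInt) casts above matter
   because the top nibble is shifted into bit 31.  Without the cast, e.g.
   (0xE & 0xF) << 28 is a signed-int shift whose result overflows int, which
   is undefined behaviour in C; doing the shift in unsigned arithmetic is
   well defined.  A minimal standalone equivalent: */
#include <stdint.h>

static uint32_t pack_top_nibble(int nib)
{
   return ((uint32_t)(nib & 0xF)) << 28;   /* defined for every nib value */
}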
 
 /* Generate a skeletal insn that involves an a RI84 shifter operand.
    Returns a word which is all zeroes apart from bits 25 and 11..0,
@@ -3682,6 +3735,46 @@
             goto done;
          }
       }
+      case ARMin_VXferQ: {
+         UInt insn;
+         UInt qD  = qregEnc(i->ARMin.VXferQ.qD);
+         UInt dHi = dregEnc(i->ARMin.VXferQ.dHi);
+         UInt dLo = dregEnc(i->ARMin.VXferQ.dLo);
+         /* This is a bit tricky.  We need to make 2 D-D moves and we rely
+            on the fact that the Q register can be treated as two D registers.
+            We also rely on the fact that the register allocator will allocate
+            the two D's and the Q to disjoint parts of the register file,
+            and so we don't have to worry about the first move's destination
+            being the same as the second move's source, etc.  We do have
+            assertions though. */
+         /* The ARM ARM specifies that
+              D<2n>   maps to the least significant half of Q<n>
+              D<2n+1> maps to the most  significant half of Q<n>
+            So there are no issues with endianness here.
+         */
+         UInt qDlo = 2 * qD + 0;
+         UInt qDhi = 2 * qD + 1;
+         /* Stay sane .. */
+         vassert(qDhi != dHi && qDhi != dLo);
+         vassert(qDlo != dHi && qDlo != dLo);
+         /* vmov dX, dY is
+            F 2 (0,dX[4],1,0) dY[3:0] dX[3:0] 1 (dY[4],0,dY[4],1) dY[3:0]
+         */
+#        define VMOV_D_D(_xx,_yy) \
+            XXXXXXXX( 0xF, 0x2, BITS4(0, (((_xx) >> 4) & 1), 1, 0), \
+                      ((_yy) & 0xF), ((_xx) & 0xF), 0x1, \
+                      BITS4( (((_yy) >> 4) & 1), 0, (((_yy) >> 4) & 1), 1), \
+                      ((_yy) & 0xF) )
+         if (i->ARMin.VXferQ.toQ) {
+            insn = VMOV_D_D(qDlo, dLo); *p++ = insn;
+            insn = VMOV_D_D(qDhi, dHi); *p++ = insn;
+         } else {
+            insn = VMOV_D_D(dLo, qDlo); *p++ = insn;
+            insn = VMOV_D_D(dHi, qDhi); *p++ = insn;
+         }
+#        undef VMOV_D_D
+         goto done;
+      }
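/* Illustrative note (not part of the patch): a spot check of VMOV_D_D.
   VMOV_D_D(1, 0) assembles to 0xF2201110 which, assuming the usual NEON
   VORR (register) encoding, is "vorr d1, d0, d0", i.e. a d0 -> d1 move. */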
       case ARMin_VXferD: {
          UInt dD  = dregEnc(i->ARMin.VXferD.dD);
          UInt rHi = iregEnc(i->ARMin.VXferD.rHi);
@@ -4750,8 +4843,11 @@
 
    /* And make the modifications. */
    if (shortOK) {
-      Int simm24 = (Int)(delta >> 2);
-      vassert(simm24 == ((simm24 << 8) >> 8));
+      UInt uimm24      = (UInt)(delta >> 2);
+      UInt uimm24_shl8 = uimm24 << 8;
+      Int  simm24      = (Int)uimm24_shl8;
+      simm24 >>= 8;
+      vassert(uimm24 == simm24);
       p[0] = 0xEA000000 | (simm24 & 0x00FFFFFF);
       p[1] = 0xFF000000;
       p[2] = 0xFF000000;
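/* Illustrative sketch (not part of the patch): the rewrite above avoids
   left-shifting a signed int (undefined behaviour once bits reach the sign
   bit).  The same "shift unsigned, then sign-extend and check the 24-bit
   value round-trips" idea as standalone C (helper name made up): */
#include <stdint.h>

static int32_t sext24(uint32_t uimm24)
{
   uint32_t shl8 = uimm24 << 8;    /* unsigned shift: always defined       */
   int32_t  s    = (int32_t)shl8;  /* implementation-defined, but not UB   */
   s >>= 8;                        /* arithmetic shift on the hosts we
                                      care about, so this sign-extends     */
   return s;                       /* caller asserts uimm24 == (uint32_t)s */
}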
diff --git a/VEX/priv/host_arm_defs.h b/VEX/priv/host_arm_defs.h
index 47f459d..cd20512 100644
--- a/VEX/priv/host_arm_defs.h
+++ b/VEX/priv/host_arm_defs.h
@@ -591,6 +591,7 @@
       ARMin_VCMovD,
       ARMin_VCMovS,
       ARMin_VCvtSD,
+      ARMin_VXferQ,
       ARMin_VXferD,
       ARMin_VXferS,
       ARMin_VCvtID,
@@ -824,6 +825,13 @@
             HReg dst;
             HReg src;
          } VCvtSD;
+         /* Transfer a NEON Q reg to/from two D registers (VMOV x 2) */
+         struct {
+            Bool toQ;
+            HReg qD;
+            HReg dHi;
+            HReg dLo;
+         } VXferQ;
          /* Transfer a VFP D reg to/from two integer registers (VMOV) */
          struct {
             Bool toD;
@@ -994,6 +1002,7 @@
 extern ARMInstr* ARMInstr_VCMovD   ( ARMCondCode, HReg dst, HReg src );
 extern ARMInstr* ARMInstr_VCMovS   ( ARMCondCode, HReg dst, HReg src );
 extern ARMInstr* ARMInstr_VCvtSD   ( Bool sToD, HReg dst, HReg src );
+extern ARMInstr* ARMInstr_VXferQ   ( Bool toQ, HReg qD, HReg dHi, HReg dLo );
 extern ARMInstr* ARMInstr_VXferD   ( Bool toD, HReg dD, HReg rHi, HReg rLo );
 extern ARMInstr* ARMInstr_VXferS   ( Bool toS, HReg fD, HReg rLo );
 extern ARMInstr* ARMInstr_VCvtID   ( Bool iToD, Bool syned,
diff --git a/VEX/priv/host_arm_isel.c b/VEX/priv/host_arm_isel.c
index 1513112..426f85d 100644
--- a/VEX/priv/host_arm_isel.c
+++ b/VEX/priv/host_arm_isel.c
@@ -368,6 +368,134 @@
 }
 
 
+static
+Bool doHelperCallWithArgsOnStack ( /*OUT*/UInt*   stackAdjustAfterCall,
+                                   /*OUT*/RetLoc* retloc,
+                                   ISelEnv* env,
+                                   IRExpr* guard,
+                                   IRCallee* cee, IRType retTy, IRExpr** args )
+{
+   /* This function deals just with the case where the arg sequence is:
+      VECRET followed by between 4 and 12 Ity_I32 values.  So far no other
+      cases are necessary or supported. */
+
+   /* Check this matches the required format. */
+   if (args[0] == NULL || args[0]->tag != Iex_VECRET)
+      goto no_match;
+
+   UInt i;
+   UInt n_real_args = 0;
+   for (i = 1; args[i]; i++) {
+      IRExpr* arg = args[i];
+      if (UNLIKELY(is_IRExpr_VECRET_or_BBPTR(arg)))
+         goto no_match;
+      IRType argTy = typeOfIRExpr(env->type_env, arg);
+      if (UNLIKELY(argTy != Ity_I32))
+         goto no_match;
+      n_real_args++;
+   }
+
+   /* We expect to pass at least some args on the stack. */
+   if (n_real_args <= 3)
+      goto no_match;
+
+   /* But not too many. */
+   if (n_real_args > 12)
+      goto no_match;
+
+   /* General rules for a call:
+
+      Args 1 .. 4 go in R0 .. R3.  The rest are pushed R to L on the
+      stack; that is, arg 5 is at the lowest address, arg 6 at the
+      next lowest, etc.
+
+      The stack is to be kept 8 aligned.
+
+      It appears (for unclear reasons) that the highest 3 words made
+      available when moving SP downwards are not to be used.  For
+      example, if 5 args are to go on the stack, then SP must be moved
+      down 32 bytes, and the area at SP+20 .. SP+31 is not to be used
+      by the caller.
+   */
+
+   /* For this particular case, we use the following layout:
+
+        ------ original SP
+        112 bytes
+        ------
+        return value
+        ------ original SP - 128
+        space
+        arg words, between 1 and 9
+        ------ new SP = original_SP - 256
+
+      Using 256 bytes is overkill, but it is simple and good enough.
+   */
+
+   /* This should really be
+        HReg argVRegs[n_real_args];
+      but that makes it impossible to do forward 'goto's past this point.
+      Hence the following kludge. */
+   vassert(n_real_args <= 12);
+   HReg argVRegs[12];
+   for (i = 0; i < 12; i++)
+      argVRegs[i] = INVALID_HREG;
+
+   /* Compute args into vregs. */
+   for (i = 0; i < n_real_args; i++) {
+      argVRegs[i] = iselIntExpr_R(env, args[i+1]);
+   }
+
+   /* Now we can compute the condition.  We can't do it earlier
+      because the argument computations could trash the condition
+      codes.  Be a bit clever to handle the common case where the
+      guard is 1:Bit. */
+   ARMCondCode cc = ARMcc_AL;
+   if (guard) {
+      if (guard->tag == Iex_Const
+          && guard->Iex.Const.con->tag == Ico_U1
+          && guard->Iex.Const.con->Ico.U1 == True) {
+         /* unconditional -- do nothing */
+      } else {
+         goto no_match; //ATC
+         cc = iselCondCode( env, guard );
+      }
+   }
+
+   HReg r0 = hregARM_R0();
+   HReg sp = hregARM_R13();
+
+   ARMRI84* c256 = ARMRI84_I84(64, 15); // 64 `ror` (15 * 2)
+
+   addInstr(env, ARMInstr_Alu(ARMalu_SUB, r0, sp, ARMRI84_I84(128, 0)));
+
+   addInstr(env, mk_iMOVds_RR(hregARM_R1(), argVRegs[0]));
+   addInstr(env, mk_iMOVds_RR(hregARM_R2(), argVRegs[1]));
+   addInstr(env, mk_iMOVds_RR(hregARM_R3(), argVRegs[2]));
+
+   addInstr(env, ARMInstr_Alu(ARMalu_SUB, sp, sp, c256));
+
+   for (i = 3; i < n_real_args; i++) {
+      addInstr(env, ARMInstr_LdSt32(ARMcc_AL, False/*store*/, argVRegs[i],
+                                    ARMAMode1_RI(sp, (i-3) * 4)));
+   }
+
+   vassert(*stackAdjustAfterCall == 0);
+   vassert(is_RetLoc_INVALID(*retloc));
+
+   *stackAdjustAfterCall = 256;
+   *retloc = mk_RetLoc_spRel(RLPri_V128SpRel, 128);
+
+   Addr32 target = (Addr)cee->addr;
+   addInstr(env, ARMInstr_Call( cc, target, 4, *retloc ));
+
+   return True; /* success */
+
+  no_match:
+   return False;
+}
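/* Illustrative note (not part of the patch): ARMRI84_I84(imm8, rot4) encodes
   the value imm8 ror (2 * rot4), so ARMRI84_I84(64, 15) is 64 ror 30 = 256,
   the size of the frame set up above.  In terms of the original SP: the
   V128 return slot the callee writes through R0 sits at orig_SP - 128, the
   new SP is orig_SP - 256, and the 1..9 stacked arg words (args 4..12) are
   stored at new SP + 0, + 4, ... */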
+
+
 /* Do a complete function call.  |guard| is a Ity_Bit expression
    indicating whether or not the call happens.  If guard==NULL, the
    call is unconditional.  |retloc| is set to indicate where the
@@ -470,6 +598,21 @@
       n_args++;
    }
 
+   /* If there are more than 4 args, we are going to have to pass
+      some via memory.  Use a different function to (possibly) deal with
+      that; dealing with it here is too complex. */
+   if (n_args > ARM_N_ARGREGS) {
+      return doHelperCallWithArgsOnStack(stackAdjustAfterCall, retloc,
+                                         env, guard, cee, retTy, args );
+
+   }
+
+   /* After this point we make no attempt to pass args on the stack,
+      and simply give up in that case (which is OK because it never
+      happens).  Even with only 3 args, it might still be necessary to
+      pass some of them on the stack, for example if two or more of
+      them are 64-bit integers. */
+
    argregs[0] = hregARM_R0();
    argregs[1] = hregARM_R1();
    argregs[2] = hregARM_R2();
@@ -653,30 +796,30 @@
    vassert(*stackAdjustAfterCall == 0);
    vassert(is_RetLoc_INVALID(*retloc));
    switch (retTy) {
-         case Ity_INVALID:
-            /* Function doesn't return a value. */
-            *retloc = mk_RetLoc_simple(RLPri_None);
-            break;
-         case Ity_I64:
-            *retloc = mk_RetLoc_simple(RLPri_2Int);
-            break;
-         case Ity_I32: case Ity_I16: case Ity_I8:
-            *retloc = mk_RetLoc_simple(RLPri_Int);
-            break;
-         case Ity_V128:
-            vassert(0); // ATC
-            *retloc = mk_RetLoc_spRel(RLPri_V128SpRel, 0);
-            *stackAdjustAfterCall = 16;
-            break;
-         case Ity_V256:
-            vassert(0); // ATC
-            *retloc = mk_RetLoc_spRel(RLPri_V256SpRel, 0);
-            *stackAdjustAfterCall = 32;
-            break;
-         default:
-            /* IR can denote other possible return types, but we don't
-               handle those here. */
-           vassert(0);
+      case Ity_INVALID:
+         /* Function doesn't return a value. */
+         *retloc = mk_RetLoc_simple(RLPri_None);
+         break;
+      case Ity_I64:
+         *retloc = mk_RetLoc_simple(RLPri_2Int);
+         break;
+      case Ity_I32: case Ity_I16: case Ity_I8:
+         *retloc = mk_RetLoc_simple(RLPri_Int);
+         break;
+      case Ity_V128:
+         vassert(0); // ATC
+         *retloc = mk_RetLoc_spRel(RLPri_V128SpRel, 0);
+         *stackAdjustAfterCall = 16;
+         break;
+      case Ity_V256:
+         vassert(0); // ATC
+         *retloc = mk_RetLoc_spRel(RLPri_V256SpRel, 0);
+         *stackAdjustAfterCall = 32;
+         break;
+      default:
+         /* IR can denote other possible return types, but we don't
+            handle those here. */
+         vassert(0);
    }
 
    /* Finally, generate the call itself.  This needs the *retloc value
@@ -3714,6 +3857,14 @@
                                           res, arg, 0, False));
             return res;
          }
+         case Iop_V128to64:
+         case Iop_V128HIto64: {
+            HReg src   = iselNeonExpr(env, e->Iex.Unop.arg);
+            HReg resLo = newVRegD(env);
+            HReg resHi = newVRegD(env);
+            addInstr(env, ARMInstr_VXferQ(False/*!toQ*/, src, resHi, resLo));
+            return e->Iex.Unop.op == Iop_V128HIto64 ? resHi : resLo;
+         }
          default:
             break;
       }
@@ -4305,7 +4456,7 @@
 
    if (e->tag == Iex_Binop) {
       switch (e->Iex.Binop.op) {
-         case Iop_64HLtoV128:
+         case Iop_64HLtoV128: {
             /* Try to match into single "VMOV reg, imm" instruction */
             if (e->Iex.Binop.arg1->tag == Iex_Const &&
                 e->Iex.Binop.arg2->tag == Iex_Const &&
@@ -4349,45 +4500,12 @@
             }
             /* Does not match "VMOV Reg, Imm" form.  We'll have to do
                it the slow way. */
-            { 
-               /* local scope */
-               /* Done via the stack for ease of use. */
-               /* FIXME: assumes little endian host */
-               HReg       w3, w2, w1, w0;
-               HReg       res  = newVRegV(env);
-               ARMAMode1* sp_0  = ARMAMode1_RI(hregARM_R13(), 0);
-               ARMAMode1* sp_4  = ARMAMode1_RI(hregARM_R13(), 4);
-               ARMAMode1* sp_8  = ARMAMode1_RI(hregARM_R13(), 8);
-               ARMAMode1* sp_12 = ARMAMode1_RI(hregARM_R13(), 12);
-               ARMRI84*   c_16  = ARMRI84_I84(16,0);
-               /* Make space for SP */
-               addInstr(env, ARMInstr_Alu(ARMalu_SUB, hregARM_R13(),
-                                                      hregARM_R13(), c_16));
-
-               /* Store the less significant 64 bits */
-               iselInt64Expr(&w1, &w0, env, e->Iex.Binop.arg2);
-               addInstr(env, ARMInstr_LdSt32(ARMcc_AL, False/*store*/,
-                                             w0, sp_0));
-               addInstr(env, ARMInstr_LdSt32(ARMcc_AL, False/*store*/,
-                                             w1, sp_4));
-         
-               /* Store the more significant 64 bits */
-               iselInt64Expr(&w3, &w2, env, e->Iex.Binop.arg1);
-               addInstr(env, ARMInstr_LdSt32(ARMcc_AL, False/*store*/,
-                                             w2, sp_8));
-               addInstr(env, ARMInstr_LdSt32(ARMcc_AL, False/*store*/,
-                                             w3, sp_12));
-         
-                /* Load result back from stack. */
-                addInstr(env, ARMInstr_NLdStQ(True/*load*/, res,
-                                              mkARMAModeN_R(hregARM_R13())));
-
-                /* Restore SP */
-                addInstr(env, ARMInstr_Alu(ARMalu_ADD, hregARM_R13(),
-                                           hregARM_R13(), c_16));
-                return res;
-            } /* local scope */
-            goto neon_expr_bad;
+            HReg dHi = iselNeon64Expr(env, e->Iex.Binop.arg1);
+            HReg dLo = iselNeon64Expr(env, e->Iex.Binop.arg2);
+            HReg res = newVRegV(env);
+            addInstr(env, ARMInstr_VXferQ(True/*toQ*/, res, dHi, dLo));
+            return res;
+         }
          case Iop_AndV128: {
             HReg res = newVRegV(env);
             HReg argL = iselNeonExpr(env, e->Iex.Binop.arg1);
@@ -5359,7 +5477,7 @@
       return dst;
    }
 
-  neon_expr_bad:
+  /* neon_expr_bad: */
    ppIRExpr(e);
    vpanic("iselNeonExpr_wrk");
 }
@@ -5974,7 +6092,7 @@
       switch (retty) {
          case Ity_INVALID: /* function doesn't return anything */
          case Ity_I64: case Ity_I32: case Ity_I16: case Ity_I8:
-         //case Ity_V128: //ATC
+         case Ity_V128:
             retty_ok = True; break;
          default:
             break;
@@ -5987,7 +6105,9 @@
          call is skipped. */
       UInt   addToSp = 0;
       RetLoc rloc    = mk_RetLoc_INVALID();
-      doHelperCall( &addToSp, &rloc, env, d->guard, d->cee, retty, d->args );
+      Bool   ok      = doHelperCall( &addToSp, &rloc, env,
+                                     d->guard, d->cee, retty, d->args );
+      if (!ok) goto stmt_fail;
       vassert(is_sane_RetLoc(rloc));
 
       /* Now figure out what to do with the returned value, if any. */
@@ -6026,11 +6146,6 @@
             return;
          }
          case Ity_V128: {
-            vassert(0); // ATC.  The code that this produces really
-            // needs to be looked at, to verify correctness.
-            // I don't think this can ever happen though, since the
-            // ARM front end never produces 128-bit loads/stores.
-            // Hence the following is mostly theoretical.
             /* The returned value is on the stack, and *retloc tells
                us where.  Fish it off the stack and then move the
                stack pointer upwards to clear it, as directed by
@@ -6038,16 +6153,26 @@
             vassert(rloc.pri == RLPri_V128SpRel);
             vassert(rloc.spOff < 256); // else ARMRI84_I84(_,0) can't encode it
             vassert(addToSp >= 16);
-            vassert(addToSp < 256); // ditto reason as for rloc.spOff
+            vassert(addToSp <= 256);
+            /* Both the stack delta and the offset must be at least 8-aligned.
+               If that isn't so, doHelperCall() has generated bad code. */
+            vassert(0 == (rloc.spOff % 8));
+            vassert(0 == (addToSp % 8));
             HReg dst = lookupIRTemp(env, d->tmp);
             HReg tmp = newVRegI(env);
-            HReg r13 = hregARM_R13(); // sp
+            HReg sp  = hregARM_R13();
             addInstr(env, ARMInstr_Alu(ARMalu_ADD,
-                                       tmp, r13, ARMRI84_I84(rloc.spOff,0)));
+                                       tmp, sp, ARMRI84_I84(rloc.spOff,0)));
             ARMAModeN* am = mkARMAModeN_R(tmp);
+            /* This load's effective address may be only 8-aligned
+               (0 % 8), because that's the best stack alignment that
+               we can be assured of. */
             addInstr(env, ARMInstr_NLdStQ(True/*load*/, dst, am));
-            addInstr(env, ARMInstr_Alu(ARMalu_ADD,
-                                       r13, r13, ARMRI84_I84(addToSp,0)));
+
+            ARMRI84* spAdj
+               = addToSp == 256 ? ARMRI84_I84(64, 15) // 64 `ror` (15 * 2)
+                                : ARMRI84_I84(addToSp, 0);
+            addInstr(env, ARMInstr_Alu(ARMalu_ADD, sp, sp, spAdj));
             return;
          }
          default:
diff --git a/VEX/priv/host_generic_reg_alloc2.c b/VEX/priv/host_generic_reg_alloc2.c
index fd46486..3c0b8db 100644
--- a/VEX/priv/host_generic_reg_alloc2.c
+++ b/VEX/priv/host_generic_reg_alloc2.c
@@ -993,13 +993,13 @@
       /* ------------ Sanity checks ------------ */
 
       /* Sanity checks are expensive.  So they are done only once
-         every 13 instructions, and just before the last
+         every 17 instructions, and just before the last
          instruction. */
       do_sanity_check
          = toBool(
               False /* Set to True for sanity checking of all insns. */
               || ii == instrs_in->arr_used-1
-              || (ii > 0 && (ii % 13) == 0)
+              || (ii > 0 && (ii % 17) == 0)
            );
 
       if (do_sanity_check) {
diff --git a/VEX/priv/host_mips_isel.c b/VEX/priv/host_mips_isel.c
index a4a89cb..848234a 100644
--- a/VEX/priv/host_mips_isel.c
+++ b/VEX/priv/host_mips_isel.c
@@ -4171,19 +4171,19 @@
 
    /* sanity ... */
    vassert(arch_host == VexArchMIPS32 || arch_host == VexArchMIPS64);
-   vassert(VEX_PRID_COMP_MIPS == hwcaps_host
-           || VEX_PRID_COMP_BROADCOM == hwcaps_host
-           || VEX_PRID_COMP_NETLOGIC);
+   vassert(VEX_PRID_COMP_MIPS == VEX_MIPS_COMP_ID(hwcaps_host)
+           || VEX_PRID_COMP_CAVIUM == VEX_MIPS_COMP_ID(hwcaps_host)
+           || VEX_PRID_COMP_BROADCOM == VEX_MIPS_COMP_ID(hwcaps_host)
+           || VEX_PRID_COMP_NETLOGIC == VEX_MIPS_COMP_ID(hwcaps_host)
+           || VEX_PRID_COMP_INGENIC_E1 == VEX_MIPS_COMP_ID(hwcaps_host)
+           || VEX_PRID_COMP_LEGACY == VEX_MIPS_COMP_ID(hwcaps_host));
 
    /* Check that the host's endianness is as expected. */
    vassert(archinfo_host->endness == VexEndnessLE
            || archinfo_host->endness == VexEndnessBE);
 
    mode64 = arch_host != VexArchMIPS32;
-#if (__mips_fpr==64)
-   fp_mode64 = ((VEX_MIPS_REV(hwcaps_host) == VEX_PRID_CPU_32FPR)
-                || arch_host == VexArchMIPS64);
-#endif
+   fp_mode64 = VEX_MIPS_HOST_FP_MODE(hwcaps_host);
 
    /* Make up an initial environment to use. */
    env = LibVEX_Alloc_inline(sizeof(ISelEnv));
diff --git a/VEX/priv/host_ppc_defs.c b/VEX/priv/host_ppc_defs.c
index 13b193c..dc70f24 100644
--- a/VEX/priv/host_ppc_defs.c
+++ b/VEX/priv/host_ppc_defs.c
@@ -498,6 +498,8 @@
    case Pun_NEG:   return "neg";
    case Pun_CLZ32: return "cntlzw";
    case Pun_CLZ64: return "cntlzd";
+   case Pun_CTZ32: return "cnttzw";
+   case Pun_CTZ64: return "cnttzd";
    case Pun_EXTSW: return "extsw";
    default: vpanic("showPPCUnaryOp");
    }
@@ -550,6 +552,31 @@
       case Pfp_FRIN:   return "frin";
       case Pfp_FRIP:   return "frip";
       case Pfp_FRIZ:   return "friz";
+      case Pfp_FPADDQ:     return "xsaddqp";
+      case Pfp_FPSUBQ:     return "xssubqp";
+      case Pfp_FPMULQ:     return "xsmulqp";
+      case Pfp_FPDIVQ:     return "xsdivqp";
+      case Pfp_FPMULADDQ:        return "xsmaddqp";
+      case Pfp_FPMULSUBQ:        return "xsmsubqp";
+      case Pfp_FPNEGMULADDQ:     return "xsnmaddqp";
+      case Pfp_FPNEGMULSUBQ:     return "xsnmsubqp";
+      case Pfp_FPADDQRNDODD:     return "xsaddqpo";
+      case Pfp_FPSUBQRNDODD:     return "xssubqpo";
+      case Pfp_FPMULQRNDODD:     return "xsmulqpo";
+      case Pfp_FPDIVQRNDODD:     return "xsdivqpo";
+      case Pfp_FPMULADDQRNDODD:      return "xsmaddqpo";
+      case Pfp_FPMULSUBQRNDODD:      return "xsmsubqpo";
+      case Pfp_FPNEGMULADDQRNDODD:   return "xsnmaddqpo";
+      case Pfp_FPNEGMULSUBQRNDODD:   return "xsnmsubqpo";
+      case Pfp_FPQTOD:               return "xscvqpdp";
+      case Pfp_FPQTODRNDODD:         return "xscvqpdpo";
+      case Pfp_FPDTOQ:               return "xscvdpqp";
+      case Pfp_IDSTOQ:               return "xscvsdqp";
+      case Pfp_IDUTOQ:               return "xscvudqp";
+      case Pfp_TRUNCFPQTOISD:        return "xscvqpsdz";
+      case Pfp_TRUNCFPQTOISW:        return "xscvqpswz";
+      case Pfp_TRUNCFPQTOIUD:        return "xscvqpudz";
+      case Pfp_TRUNCFPQTOIUW:        return "xscvqpuwz";
       case Pfp_DFPADD:     return "dadd";
       case Pfp_DFPADDQ:    return "daddq";
       case Pfp_DFPSUB:     return "dsub";
@@ -650,6 +677,20 @@
    /* BCD */
    case Pav_BCDAdd:     return "bcdadd.";  // qw
    case Pav_BCDSub:     return "bcdsub.";  // qw
+   case Pav_I128StoBCD128: return  "bcdcfsq.";  //qw
+   case Pav_BCD128toI128S: return  "bcdctsq.";  //qw
+
+   /* I128 mult by 10 */
+   case Pav_MulI128by10:       return  "vmul10uq";  //qw
+   case Pav_MulI128by10Carry:  return  "vmul10cuq";  //qw
+   case Pav_MulI128by10E:      return  "vmul10euq";  //qw
+   case Pav_MulI128by10ECarry: return  "vmul10ecuq";  //qw
+
+      /* F128 to I128 signed */
+   case Pav_F128toI128S:    return "xsrqpi|x";   //qw
+
+      /* F128 round to F128 */
+   case Pav_ROUNDFPQ:       return "xsrqpxp";
 
    /* Polynomial arith */
    case Pav_POLYMULADD: return "vpmsum";   // b, h, w, d
@@ -664,10 +705,31 @@
    case Pav_ZEROCNTHALF: case Pav_ZEROCNTDBL:
       return "vclz_";                           // b, h, w, d
 
+   /* trailing zero count */
+   case Pav_TRAILINGZEROCNTBYTE: case Pav_TRAILINGZEROCNTWORD:
+   case Pav_TRAILINGZEROCNTHALF: case Pav_TRAILINGZEROCNTDBL:
+      return "vctz_";                           // b, h, w, d
+
    /* vector gather (byte-by-byte bit matrix transpose) */
    case Pav_BITMTXXPOSE:
       return "vgbbd";
 
+   /* Vector Half-precision format to single precision conversion */
+   case Pav_F16toF32x4:
+      return "xvcvhpsp";
+
+   /* Vector Single-precision format to Half-precision conversion */
+   case Pav_F32toF16x4:
+      return "xvcvsphp";
+
+   /* Vector Half-precision format to Double precision conversion */
+   case Pav_F16toF64x2:
+      return "xvcvhpdp";
+
+   /* Vector Double precision format to Half-precision conversion */
+   case Pav_F64toF16x2:
+      return "xvcvdphp";
+
    default: vpanic("showPPCAvOp");
    }
 }
@@ -925,6 +987,32 @@
    i->Pin.FpBinary.srcR = srcR;
    return i;
 }
+PPCInstr* PPCInstr_Fp128Unary(PPCFpOp op, HReg dst, HReg src) {
+   PPCInstr* i = LibVEX_Alloc_inline( sizeof(PPCInstr) );
+   i->tag = Pin_Fp128Unary;
+   i->Pin.Fp128Unary.op = op;
+   i->Pin.Fp128Unary.dst = dst;
+   i->Pin.Fp128Unary.src = src;
+   return i;
+}
+PPCInstr* PPCInstr_Fp128Binary(PPCFpOp op, HReg dst, HReg srcL, HReg srcR) {
+   PPCInstr* i = LibVEX_Alloc_inline( sizeof(PPCInstr) );
+   i->tag = Pin_Fp128Binary;
+   i->Pin.Fp128Binary.op = op;
+   i->Pin.Fp128Binary.dst = dst;
+   i->Pin.Fp128Binary.srcL = srcL;
+   i->Pin.Fp128Binary.srcR = srcR;
+   return i;
+}
+PPCInstr* PPCInstr_Fp128Trinary(PPCFpOp op, HReg dst, HReg srcL, HReg srcR) {
+   PPCInstr* i = LibVEX_Alloc_inline( sizeof(PPCInstr) );
+   i->tag = Pin_Fp128Trinary;
+   i->Pin.Fp128Trinary.op = op;
+   i->Pin.Fp128Trinary.dst = dst;
+   i->Pin.Fp128Trinary.srcL = srcL;
+   i->Pin.Fp128Trinary.srcR = srcR;
+   return i;
+}
 PPCInstr* PPCInstr_FpMulAcc ( PPCFpOp op, HReg dst, HReg srcML, 
                                           HReg srcMR, HReg srcAcc )
 {
@@ -1268,6 +1356,16 @@
    i->Pin.AvBinary.srcR = srcR;
    return i;
 }
+PPCInstr* PPCInstr_AvBinaryInt ( PPCAvOp op, HReg dst,
+                                 HReg src, PPCRI* val ) {
+   PPCInstr* i            = LibVEX_Alloc_inline(sizeof(PPCInstr));
+   i->tag                 = Pin_AvBinaryInt;
+   i->Pin.AvBinaryInt.op  = op;
+   i->Pin.AvBinaryInt.dst = dst;
+   i->Pin.AvBinaryInt.src = src;
+   i->Pin.AvBinaryInt.val = val;
+   return i;
+}
 PPCInstr* PPCInstr_AvBin8x16 ( PPCAvOp op, HReg dst,
                                HReg srcL, HReg srcR ) {
    PPCInstr* i           = LibVEX_Alloc_inline(sizeof(PPCInstr));
@@ -1415,15 +1513,14 @@
    i->Pin.AvHashV128Binary.s_field = s_field;
    return i;
 }
-PPCInstr* PPCInstr_AvBCDV128Trinary ( PPCAvOp op, HReg dst,
-                                      HReg src1, HReg src2, PPCRI* ps ) {
+PPCInstr* PPCInstr_AvBCDV128Binary ( PPCAvOp op, HReg dst,
+                                     HReg src1, HReg src2 ) {
    PPCInstr* i = LibVEX_Alloc_inline(sizeof(PPCInstr));
-   i->tag      = Pin_AvBCDV128Trinary;
-   i->Pin.AvBCDV128Trinary.op   = op;
-   i->Pin.AvBCDV128Trinary.dst  = dst;
-   i->Pin.AvBCDV128Trinary.src1 = src1;
-   i->Pin.AvBCDV128Trinary.src2 = src2;
-   i->Pin.AvBCDV128Trinary.ps   = ps;
+   i->tag      = Pin_AvBCDV128Binary;
+   i->Pin.AvBCDV128Binary.op   = op;
+   i->Pin.AvBCDV128Binary.dst  = dst;
+   i->Pin.AvBCDV128Binary.src1 = src1;
+   i->Pin.AvBCDV128Binary.src2 = src2;
    return i;
 }
 
@@ -1717,6 +1814,28 @@
       vex_printf(",");
       ppHRegPPC(i->Pin.FpBinary.srcR);
       return;
+   case Pin_Fp128Unary:
+      vex_printf("%s ", showPPCFpOp(i->Pin.Fp128Unary.op));
+      ppHRegPPC(i->Pin.Fp128Unary.dst);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.Fp128Unary.src);
+      return;
+   case Pin_Fp128Binary:
+      vex_printf("%s ", showPPCFpOp(i->Pin.Fp128Binary.op));
+      ppHRegPPC(i->Pin.Fp128Binary.dst);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.Fp128Binary.srcL);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.Fp128Binary.srcR);
+      return;
+   case Pin_Fp128Trinary:
+      vex_printf("%s ", showPPCFpOp(i->Pin.Fp128Trinary.op));
+      ppHRegPPC(i->Pin.Fp128Trinary.dst);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.Fp128Trinary.srcL);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.Fp128Trinary.srcR);
+      return;
    case Pin_FpMulAcc:
       vex_printf("%s ", showPPCFpOp(i->Pin.FpMulAcc.op));
       ppHRegPPC(i->Pin.FpMulAcc.dst);
@@ -1874,6 +1993,14 @@
       vex_printf(",");
       ppHRegPPC(i->Pin.AvBinary.srcR);
       return;
+   case Pin_AvBinaryInt:
+      vex_printf("%s ", showPPCAvOp(i->Pin.AvBinaryInt.op));
+      ppHRegPPC(i->Pin.AvBinaryInt.dst);
+      vex_printf(",");
+      ppHRegPPC(i->Pin.AvBinaryInt.src);
+      vex_printf(",");
+      ppPPCRI(i->Pin.AvBinaryInt.val);
+      return;
    case Pin_AvBin8x16:
       vex_printf("%s(b) ", showPPCAvOp(i->Pin.AvBin8x16.op));
       ppHRegPPC(i->Pin.AvBin8x16.dst);
@@ -2038,15 +2165,13 @@
       ppPPCRI(i->Pin.AvHashV128Binary.s_field);
       return;
 
-   case Pin_AvBCDV128Trinary:
-      vex_printf("%s(w) ", showPPCAvOp(i->Pin.AvBCDV128Trinary.op));
-      ppHRegPPC(i->Pin.AvBCDV128Trinary.dst);
+   case Pin_AvBCDV128Binary:
+      vex_printf("%s(w) ", showPPCAvOp(i->Pin.AvBCDV128Binary.op));
+      ppHRegPPC(i->Pin.AvBCDV128Binary.dst);
       vex_printf(",");
-      ppHRegPPC(i->Pin.AvBCDV128Trinary.src1);
+      ppHRegPPC(i->Pin.AvBCDV128Binary.src1);
       vex_printf(",");
-      ppHRegPPC(i->Pin.AvBCDV128Trinary.src2);
-      vex_printf(",");
-      ppPPCRI(i->Pin.AvBCDV128Trinary.ps);
+      ppHRegPPC(i->Pin.AvBCDV128Binary.src2);
       return;
 
    case Pin_Dfp64Unary:
@@ -2367,6 +2492,21 @@
       addHRegUse(u, HRmRead,  i->Pin.FpBinary.srcL);
       addHRegUse(u, HRmRead,  i->Pin.FpBinary.srcR);
       return;
+
+   case Pin_Fp128Unary:
+      addHRegUse(u, HRmWrite, i->Pin.Fp128Unary.dst);
+      addHRegUse(u, HRmRead, i->Pin.Fp128Unary.src);
+      return;
+   case Pin_Fp128Binary:
+      addHRegUse(u, HRmWrite, i->Pin.Fp128Binary.dst);
+      addHRegUse(u, HRmRead, i->Pin.Fp128Binary.srcL);
+      addHRegUse(u, HRmRead, i->Pin.Fp128Binary.srcR);
+      return;
+   case Pin_Fp128Trinary:
+      addHRegUse(u, HRmModify, i->Pin.Fp128Trinary.dst);
+      addHRegUse(u, HRmRead, i->Pin.Fp128Trinary.srcL);
+      addHRegUse(u, HRmRead, i->Pin.Fp128Trinary.srcR);
+      return;
    case Pin_FpMulAcc:
       addHRegUse(u, HRmWrite, i->Pin.FpMulAcc.dst);
       addHRegUse(u, HRmRead,  i->Pin.FpMulAcc.srcML);
@@ -2432,6 +2572,10 @@
          addHRegUse(u, HRmRead,  i->Pin.AvBinary.srcR);
       }
       return;
+   case Pin_AvBinaryInt:
+      addHRegUse(u, HRmWrite, i->Pin.AvBinaryInt.dst);
+      addHRegUse(u, HRmRead,  i->Pin.AvBinaryInt.src);
+      return;
    case Pin_AvBin8x16:
       addHRegUse(u, HRmWrite, i->Pin.AvBin8x16.dst);
       addHRegUse(u, HRmRead,  i->Pin.AvBin8x16.srcL);
@@ -2511,11 +2655,10 @@
       addHRegUse(u, HRmRead,  i->Pin.AvHashV128Binary.src);
       addRegUsage_PPCRI(u,    i->Pin.AvHashV128Binary.s_field);
       return;
-   case Pin_AvBCDV128Trinary:
-      addHRegUse(u, HRmWrite, i->Pin.AvBCDV128Trinary.dst);
-      addHRegUse(u, HRmRead,  i->Pin.AvBCDV128Trinary.src1);
-      addHRegUse(u, HRmRead,  i->Pin.AvBCDV128Trinary.src2);
-      addRegUsage_PPCRI(u,    i->Pin.AvBCDV128Trinary.ps);
+   case Pin_AvBCDV128Binary:
+      addHRegUse(u, HRmWrite, i->Pin.AvBCDV128Binary.dst);
+      addHRegUse(u, HRmRead,  i->Pin.AvBCDV128Binary.src1);
+      addHRegUse(u, HRmRead,  i->Pin.AvBCDV128Binary.src2);
       return;
    case Pin_Dfp64Unary:
       addHRegUse(u, HRmWrite, i->Pin.Dfp64Unary.dst);
@@ -2719,6 +2862,20 @@
       mapReg(m, &i->Pin.FpBinary.srcL);
       mapReg(m, &i->Pin.FpBinary.srcR);
       return;
+   case Pin_Fp128Unary:
+      mapReg(m, &i->Pin.Fp128Unary.dst);
+      mapReg(m, &i->Pin.Fp128Unary.src);
+      return;
+   case Pin_Fp128Binary:
+      mapReg(m, &i->Pin.Fp128Binary.dst);
+      mapReg(m, &i->Pin.Fp128Binary.srcL);
+      mapReg(m, &i->Pin.Fp128Binary.srcR);
+      return;
+   case Pin_Fp128Trinary:
+      mapReg(m, &i->Pin.Fp128Trinary.dst);
+      mapReg(m, &i->Pin.Fp128Trinary.srcL);
+      mapReg(m, &i->Pin.Fp128Trinary.srcR);
+      return;
    case Pin_FpMulAcc:
       mapReg(m, &i->Pin.FpMulAcc.dst);
       mapReg(m, &i->Pin.FpMulAcc.srcML);
@@ -2769,6 +2926,10 @@
       mapReg(m, &i->Pin.AvBinary.srcL);
       mapReg(m, &i->Pin.AvBinary.srcR);
       return;
+   case Pin_AvBinaryInt:
+      mapReg(m, &i->Pin.AvBinaryInt.dst);
+      mapReg(m, &i->Pin.AvBinaryInt.src);
+      return;
    case Pin_AvBin8x16:
       mapReg(m, &i->Pin.AvBin8x16.dst);
       mapReg(m, &i->Pin.AvBin8x16.srcL);
@@ -2844,11 +3005,10 @@
       mapReg(m, &i->Pin.AvHashV128Binary.dst);
       mapReg(m, &i->Pin.AvHashV128Binary.src);
       return;
-   case Pin_AvBCDV128Trinary:
-      mapReg(m, &i->Pin.AvBCDV128Trinary.dst);
-      mapReg(m, &i->Pin.AvBCDV128Trinary.src1);
-      mapReg(m, &i->Pin.AvBCDV128Trinary.src2);
-      mapRegs_PPCRI(m, i->Pin.AvBCDV128Trinary.ps);
+   case Pin_AvBCDV128Binary:
+      mapReg(m, &i->Pin.AvBCDV128Binary.dst);
+      mapReg(m, &i->Pin.AvBCDV128Binary.src1);
+      mapReg(m, &i->Pin.AvBCDV128Binary.src2);
       return;
    case Pin_Dfp64Unary:
       mapReg(m, &i->Pin.Dfp64Unary.dst);
@@ -3677,6 +3837,95 @@
    return emit32(p, theInstr, endness_host);
 }
 
+static UChar* mkFormVSXRND ( UChar* p, UInt opc1, UInt R, UInt r1,
+                             UInt r2, UInt RMC, UInt opc2, UChar EX,
+                             VexEndness endness_host )
+{
+   /* The register mapping is all done using VR register numbers for the
+    * V128 support.  This means that the operands for this instruction have
+    * been loaded into a VR register. The 32 VR registers map to VSR registers
+    * 32 to 63. For these instructions, the hardware adds 32 to the source
+    * and destination register numbers, so there is no need to adjust the
+    * register numbers for these instructions.
+    */
+
+   UInt theInstr;
+
+   vassert(opc1 < 0x40);
+   vassert(r1   < 0x20);
+   vassert(r2   < 0x20);
+   vassert(opc2 < 0x100);
+   vassert(EX  < 0x2);
+   vassert(R   < 0x2);
+   vassert(RMC < 0x4);
+
+   theInstr = ((opc1<<26)  | (r1<<21) | (R<<16) | (r2<<11) | (RMC<<9) |
+               (opc2 << 1) | EX);
+   return emit32(p, theInstr, endness_host);
+}
+
+static UChar* mkFormVX_BX_TX ( UChar* p, UInt opc1, UInt r1, UInt r2,
+                               UInt r3, UInt opc2, VexEndness endness_host )
+{
+   /* The register mapping is all done using VR register numbers for the
+    * V128 support.  This means that the operands for this instruction have
+    * been loaded into a VR register. The 32 VR registers map to VSR registers
+    * 32 to 63. So to make the issued instruction reference the
+    * corresponding VR register we have to add 32 to the source and
+    * destination operand numbers, then load the new operand number into the
+    * correct bit fields.
+    *
+    * r1 = 32xTX + T; r3 = 32xBX + B;
+    * TX is bit 0, BX is bit 1, T is in bits [25:21], B is in bits [14:11]
+    * opc2 is in bits [10:2]
+    */
+   UInt T, TX, B, BX;
+
+   UInt theInstr;
+
+   r1 += 32;   // adjust the VSR register number to map to the VR number
+   r3 += 32;
+
+   vassert(opc1 < 0x40);
+   vassert(r1   < 0x40);
+   vassert(r2   < 0x20);
+   vassert(r3   < 0x40);
+   vassert(opc2 < 0x800);
+
+   T  = r1 & 0x1F;
+   TX = r1 >> 5;
+   B  = r3 & 0x1F;
+   BX = r3 >> 5;
+   theInstr = ((opc1<<26) | (T<<21) | (r2<<16) | (B<<11) | (opc2<<2)
+               | (BX<<1) | TX);
+   return emit32(p, theInstr, endness_host);
+}
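/* Illustrative note (not part of the patch): a worked example of the split
   above.  VR 9 becomes r1 = 9 + 32 = 41, so T = 41 & 0x1F = 9 and
   TX = 41 >> 5 = 1; the issued instruction then names VSR 32*TX + T =
   VSR 41, which is the register that VR 9 maps to. */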
+
+static UChar* mkFormVXR0 ( UChar* p, UInt opc1, UInt r1, UInt r2,
+                           UInt r3, UInt opc2, UChar R0,
+                           VexEndness endness_host )
+{
+   /* The register mapping is all done using VR register numbers for the
+    * V128 support.  This means that the operands for this instruction have
+    * been loaded into a VR register. The 32 VR registers map to VSR registers
+    * 32 to 63.  For these instructions, the hardware adds 32 to the source
+    * and destination register numbers, so there is no need to adjust the
+    * register numbers for these instructions.
+    */
+
+   UInt theInstr;
+
+   vassert(opc1 < 0x40);
+   vassert(r1   < 0x20);  // register numbers are between 0 and 31 (5-bits)
+   vassert(r2   < 0x20);
+   vassert(r3   < 0x20);
+   vassert(opc2 < 0x800);
+   vassert(R0 < 0x2);
+
+   theInstr = ((opc1<<26) | (r1<<21) | (r2<<16) | (r3<<11) | (opc2<<1) | R0);
+   return emit32(p, theInstr, endness_host);
+}
+
 static UChar* mkFormVXI ( UChar* p, UInt opc1, UInt r1, UInt r2,
                           UInt r3, UInt opc2, VexEndness endness_host )
 {
@@ -4015,6 +4264,15 @@
          vassert(mode64);
          p = mkFormX(p, 31, r_src, r_dst, 0, 58, 0, endness_host);
          break;
+      case Pun_CTZ32:  // cnttzw r_dst, r_src
+         /* Note order of src and dst is backwards from normal */
+         p = mkFormX(p, 31, r_src, r_dst, 0, 538, 0, endness_host);
+         break;
+      case Pun_CTZ64:  // cnttzd r_dst, r_src
+         /* Note order of src and dst is backwards from normal */
+         vassert(mode64);
+         p = mkFormX(p, 31, r_src, r_dst, 0, 570, 0, endness_host);
+         break;
       case Pun_EXTSW:  // extsw r_dst, r_src
          vassert(mode64);
          p = mkFormX(p, 31, r_src, r_dst, 0, 986, 0, endness_host);
@@ -4610,6 +4868,178 @@
       goto done;
    }
 
+   case Pin_Fp128Unary: {
+      /* Note Fp128 instructions use the vector scalar registers.  The register
+       * mapping for the V128 type assumes a vector instruction.  The
+       * PPC hardware has a single register file that the vector scalar
+       * registers and the vector registers map to.  The 32 vector
+       * registers map to the same registers as the vector
+       * scalar registers 32 to 63.  mkFormVXR0 does the needed
+       * adjustment.
+       */
+      UInt fr_dst = vregEnc(i->Pin.Fp128Unary.dst);
+      UInt fr_src = vregEnc(i->Pin.Fp128Unary.src);
+
+      switch (i->Pin.Fp128Unary.op) {
+      case Pfp_FPSQRTQ:         // xssqrtqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, 27, fr_src, 804, 0, endness_host );
+         break;
+      case Pfp_FPSQRTQRNDODD:   // xssqrtqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, 27, fr_src, 804, 1, endness_host );
+         break;
+      case Pfp_FPQTOD:         // xscvqpdp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, 20, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_FPQTODRNDODD:   // xscvqpdpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, 20, fr_src, 836, 1, endness_host );
+         break;
+      case Pfp_FPDTOQ:         // xscvdpqp
+         p = mkFormVXR0( p, 63, fr_dst, 22, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_IDSTOQ:         // xscvsdqp
+         p = mkFormVXR0( p, 63, fr_dst, 10, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_IDUTOQ:         // xscvudqp
+         p = mkFormVXR0( p, 63, fr_dst, 2, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_TRUNCFPQTOISD:   // xscvqpsdz
+         p = mkFormVXR0( p, 63, fr_dst, 25, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_TRUNCFPQTOISW:   // xscvqpswz
+         p = mkFormVXR0( p, 63, fr_dst, 9, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_TRUNCFPQTOIUD:   // xscvqpudz
+         p = mkFormVXR0( p, 63, fr_dst, 17, fr_src, 836, 0, endness_host );
+         break;
+      case Pfp_TRUNCFPQTOIUW:   // xscvqpuwz
+         p = mkFormVXR0( p, 63, fr_dst, 1, fr_src, 836, 0, endness_host );
+         break;
+      default:
+         goto bad;
+      }
+      goto done;
+   }
+
+   case Pin_Fp128Binary: {
+      /* Note Fp128 instructions use the vector registers */
+      UInt fr_dst  = vregEnc(i->Pin.Fp128Binary.dst);
+      UInt fr_srcL = vregEnc(i->Pin.Fp128Binary.srcL);
+      UInt fr_srcR = vregEnc(i->Pin.Fp128Binary.srcR);
+
+      /* Note this issues a Vector scalar instruction.  The register
+       * mapping for the V128 type assumes a vector instruction.  The
+       * PPC hardware has a single register file that the vector scalar
+       * registers and the vector registers map to.  The 32 vector
+       * registers map to the same registers as the vector
+       * scalar registers 32 to 63.  For these instructions the HW adds
+       * 32 to the register numbers to access the VSR register.  No need
+       * to adjust the numbers to map to the VR register that contains the
+       * operands.
+       */
+
+      switch (i->Pin.Fp128Binary.op) {
+      case Pfp_FPADDQ:         // xsaddqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 4, 0, endness_host );
+         break;
+      case Pfp_FPADDQRNDODD:   // xsaddqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 4, 1, endness_host );
+         break;
+      case Pfp_FPSUBQ:         // xssubqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 516, 0, endness_host );
+         break;
+      case Pfp_FPSUBQRNDODD:   // xssubqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 516, 1, endness_host );
+         break;
+      case Pfp_FPMULQ:         // xsmulqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 36, 0, endness_host );
+         break;
+      case Pfp_FPMULQRNDODD:   // xsmulqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 36, 1, endness_host );
+         break;
+      case Pfp_FPDIVQ:         // xsdivqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 548, 0, endness_host );
+         break;
+      case Pfp_FPDIVQRNDODD:   // xsdivqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 548, 1, endness_host );
+         break;
+      case Pfp_FPMULADDQ:      // xsmaddqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 388, 0, endness_host );
+         break;
+      case Pfp_FPMULADDQRNDODD:   // xsmaddqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 388, 1, endness_host );
+         break;
+      case Pfp_FPMULSUBQ:         // xsmsubqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 420, 0, endness_host );
+         break;
+      case Pfp_FPMULSUBQRNDODD:   // xsmsubqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 420, 1, endness_host );
+         break;
+      case Pfp_FPNEGMULADDQ:      // xsnmaddqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 452, 0, endness_host );
+         break;
+      case Pfp_FPNEGMULADDQRNDODD:   // xsnmaddqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 452, 1, endness_host );
+         break;
+      case Pfp_FPNEGMULSUBQ:         // xsnmsubqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 484, 0, endness_host );
+         break;
+      case Pfp_FPNEGMULSUBQRNDODD:   // xsnmsubqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 484, 1, endness_host );
+         break;
+      default:
+         goto bad;
+      }
+      goto done;
+   }
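+
+   /* Editorial sketch, not part of the patch: every quad-precision op above
+    * appears in two flavours, selected by the argument passed just before
+    * endness_host.  A 0 issues the base form, which rounds according to
+    * FPSCR.RN, and a 1 issues the "o" form, which rounds to odd, e.g.
+    *
+    *    Pfp_FPADDQ        -> xsaddqp   (opc2 = 4, RO = 0)
+    *    Pfp_FPADDQRNDODD  -> xsaddqpo  (opc2 = 4, RO = 1)
+    *
+    * The RNDODD variants are requested when the IR rounding-mode argument is
+    * the constant 8; see FPU_rounding_mode_isOdd in host_ppc_isel.c below.
+    */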
+
+   case Pin_Fp128Trinary: {
+      /* Note Fp128 instructions use the vector registers */
+      UInt fr_dst  = vregEnc(i->Pin.Fp128Trinary.dst);
+      UInt fr_srcL = vregEnc(i->Pin.Fp128Trinary.srcL);
+      UInt fr_srcR = vregEnc(i->Pin.Fp128Trinary.srcR);
+
+      /* Note this issues a vector scalar instruction.  The register
+       * mapping for the V128 type assumes a vector instruction.  The
+       * PPC hardware has a single register file that both the vector
+       * scalar registers and the vector registers map to.  The 32 vector
+       * registers map to the same storage as vector scalar registers
+       * 32 to 63.  For these instructions the HW adds 32 to the register
+       * numbers to access the VSR registers, so there is no need to
+       * adjust the numbers to reach the VR registers that contain the
+       * operands.
+       */
+
+      switch (i->Pin.Fp128Trinary.op) {
+      case Pfp_FPMULADDQ:      // xsmaddqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 388, 0, endness_host );
+         break;
+      case Pfp_FPMULADDQRNDODD:   // xsmaddqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 388, 1, endness_host );
+         break;
+      case Pfp_FPMULSUBQ:         // xsmsubqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 420, 0, endness_host );
+         break;
+      case Pfp_FPMULSUBQRNDODD:   // xsmsubqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 420, 1, endness_host );
+         break;
+      case Pfp_FPNEGMULADDQ:      // xsnmaddqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 452, 0, endness_host );
+         break;
+      case Pfp_FPNEGMULADDQRNDODD:   // xsnmaddqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 452, 1, endness_host );
+         break;
+      case Pfp_FPNEGMULSUBQ:         // xsnmsubqp, use rounding specified by RN
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 484, 0, endness_host );
+         break;
+      case Pfp_FPNEGMULSUBQRNDODD:   // xsnmsubqpo, round to odd
+         p = mkFormVXR0( p, 63, fr_dst, fr_srcL, fr_srcR, 484, 1, endness_host );
+         break;
+      default:
+         goto bad;
+      }
+      goto done;
+   }
+
    case Pin_FpMulAcc: {
       UInt fr_dst    = fregEnc(i->Pin.FpMulAcc.dst);
       UInt fr_srcML  = fregEnc(i->Pin.FpMulAcc.srcML);
@@ -4812,7 +5242,8 @@
    case Pin_AvUnary: {
       UInt v_dst = vregEnc(i->Pin.AvUnary.dst);
       UInt v_src = vregEnc(i->Pin.AvUnary.src);
-      UInt opc2;
+      UInt opc2 = 0xFFFFFFFF, opc3 = 0xFFFFFFFF; /* invalid */
+
       switch (i->Pin.AvUnary.op) {
       case Pav_MOV:       opc2 = 1156; break; // vor vD,vS,vS
       case Pav_NOT:       opc2 = 1284; break; // vnor vD,vS,vS
@@ -4827,7 +5258,19 @@
       case Pav_ZEROCNTHALF: opc2 = 1858; break; // vclzh
       case Pav_ZEROCNTWORD: opc2 = 1922; break; // vclzw
       case Pav_ZEROCNTDBL:  opc2 = 1986; break; // vclzd
+      case Pav_TRAILINGZEROCNTBYTE: opc2 = 1538; break; // vctzb
+      case Pav_TRAILINGZEROCNTHALF: opc2 = 1538; break; // vctzh
+      case Pav_TRAILINGZEROCNTWORD: opc2 = 1538; break; // vctzw
+      case Pav_TRAILINGZEROCNTDBL:  opc2 = 1538; break; // vctzd
       case Pav_BITMTXXPOSE: opc2 = 1292; break; // vgbbd
+      case Pav_BCD128toI128S: opc2 = 385; break; // bcdctsq.
+      case Pav_MulI128by10:      opc2 = 513; break; // vmul10uq
+      case Pav_MulI128by10Carry: opc2 =   1; break; // vmul10cuq
+      case Pav_F16toF64x2: opc2 = 347; opc3 = 16; break; // xvcvhpdp
+      case Pav_F64toF16x2: opc2 = 347; opc3 = 17; break; // xvcvdphp
+      case Pav_F16toF32x4: opc2 = 475; opc3 = 24; break; // xvcvhpsp
+      case Pav_F32toF16x4: opc2 = 475; opc3 = 25; break; // xvcvsphp
+
       default:
          goto bad;
       }
@@ -4836,6 +5279,55 @@
       case Pav_NOT:
          p = mkFormVX( p, 4, v_dst, v_src, v_src, opc2, endness_host );
          break;
+      case Pav_F16toF32x4:
+         {
+            /* The I64 source has four 16-bit float values in the upper
+             * 64 bits of the source vector register; the lower 64 bits
+             * are undefined.
+             */
+            /* Scatter the four F16 values in the Vector register */
+            p = mkFormVX( p, 4, v_dst, 0, v_src, 590, endness_host );// vupkhsh
+
+            /* The layout of the vector register is now: S0F0 S1F1 S2F2 S3F3
+             * where S is the sign extension of the 16-bit F value.  We don't
+             * care about the extended signs.
+             */
+
+            /* Input, in v_dst, is now correct for the xvcvhpsp instruction */
+            p = mkFormVX_BX_TX( p, 60, v_dst, opc3, v_dst, opc2,
+                                endness_host );
+         }
+         break;
+      case Pav_F64toF16x2:
+      case Pav_F16toF64x2:
+      case Pav_F32toF16x4:
+         /* Note this issues a vector scalar instruction.  The register
+          * mapping for the V128 type assumes a vector instruction.  The
+          * PPC hardware has a single register file that both the vector
+          * scalar registers and the vector registers map to.  The 32 vector
+          * registers map to the same storage as vector scalar registers
+          * 32 to 63.  mkFormVX_BX_TX does the needed adjustment.
+          */
+         p = mkFormVX_BX_TX( p, 60, v_dst, opc3, v_src, opc2, endness_host );
+         break;
+      case Pav_BCD128toI128S:  // bcdctsq
+         p = mkFormVX( p, 4, v_dst, 0, v_src, (1<<10 | 385), endness_host );
+         break;
+      case Pav_MulI128by10:
+      case Pav_MulI128by10Carry:
+         p = mkFormVX( p, 4, v_dst, v_src, 0, opc2, endness_host );
+         break;
+      case Pav_TRAILINGZEROCNTBYTE:
+         p = mkFormVX( p, 4, v_dst, 28, v_src, opc2, endness_host );
+         break;
+      case Pav_TRAILINGZEROCNTHALF:
+         p = mkFormVX( p, 4, v_dst, 29, v_src, opc2, endness_host );
+         break;
+      case Pav_TRAILINGZEROCNTWORD:
+         p = mkFormVX( p, 4, v_dst, 30, v_src, opc2, endness_host );
+         break;
+      case Pav_TRAILINGZEROCNTDBL:
+         p = mkFormVX( p, 4, v_dst, 31, v_src, opc2, endness_host );
+         break;
       default:
          p = mkFormVX( p, 4, v_dst, 0, v_src, opc2, endness_host );
          break;
@@ -4863,6 +5355,9 @@
       case Pav_AND:       opc2 = 1028; break; // vand
       case Pav_OR:        opc2 = 1156; break; // vor
       case Pav_XOR:       opc2 = 1220; break; // vxor
+      /* Mult by 10 */
+      case Pav_MulI128by10E:      opc2 = 577; break; // vmul10euq
+      case Pav_MulI128by10ECarry: opc2 =  65; break; // vmul10ecuq
       default:
          goto bad;
       }
@@ -4870,6 +5365,61 @@
       goto done;
    }
 
+   case Pin_AvBinaryInt: {
+       UInt   ps = i->Pin.AvBinaryInt.val->Pri.Imm;
+       UInt   dst = vregEnc(i->Pin.AvBinaryInt.dst);
+       UInt   src = vregEnc(i->Pin.AvBinaryInt.src);
+
+       switch (i->Pin.AvBinaryInt.op) {
+       /* BCD */
+       case Pav_I128StoBCD128:   // bcdcfsq
+          {
+              /* ps contains the value of the one-bit PS field */
+              int opc2 = 385;
+              p = mkFormVX( p, 4, dst, 2, src,
+                            (1 << 10 | (ps << 9) | opc2), endness_host );
+           }
+          break;
+
+       case Pav_F128toI128S:  // xsrqpi, xsrqpix
+           {
+              int opc2 = 5;
+              UInt EX  = ps & 0x1;
+              UInt R   = (ps >> 3) & 0x1;
+              UInt RMC = (ps >> 1) & 0x3;
+              /* Note this issues a vector scalar instruction.  The register
+               * mapping for the V128 type assumes a vector instruction.  The
+               * PPC hardware has a single register file that both the vector
+               * scalar registers and the vector registers map to.  The 32
+               * vector registers map to the same storage as vector scalar
+               * registers 32 to 63.  For these instructions the HW adds 32
+               * to the register numbers to access the VSR registers, so
+               * there is no need to adjust the numbers to reach the VR
+               * registers that contain the operands.
+               */
+              p = mkFormVSXRND( p, 63, R, dst, src, RMC, opc2, EX,
+                                endness_host );
+           }
+          break;
+
+       case Pav_ROUNDFPQ:   // xsrqpxp
+           {
+              int opc2 = 37;
+              UInt EX  = ps & 0x1;
+              UInt RMC = (ps >> 1) & 0x3;
+              UInt R   = (ps >> 3) & 0x1;
+              p = mkFormVSXRND( p, 63, R, dst, src, RMC, opc2, EX,
+                                endness_host );
+           }
+          break;
+
+       default:
+          goto bad;
+
+      }
+      goto done;
+   }
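+
+   /* Editorial sketch, not part of the patch: for Pav_F128toI128S and
+    * Pav_ROUNDFPQ the single immediate 'ps' packs the three fields decoded
+    * above, i.e. on the IR side it is assembled roughly as
+    *
+    *    UInt ps = (R << 3) | (RMC << 1) | EX;
+    *
+    * where EX selects xsrqpi (0) or the inexact-recording xsrqpix (1),
+    * RMC is the two-bit rounding-mode control and R is the R bit.
+    */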
+
    case Pin_AvBin8x16: {
       UInt v_dst  = vregEnc(i->Pin.AvBin8x16.dst);
       UInt v_srcL = vregEnc(i->Pin.AvBin8x16.srcL);
@@ -5104,20 +5654,22 @@
       p = mkFormVX( p, 4, v_dst, v_src, s_field->Pri.Imm, opc2, endness_host );
       goto done;
    }
-   case Pin_AvBCDV128Trinary: {
-      UInt v_dst  = vregEnc(i->Pin.AvBCDV128Trinary.dst);
-      UInt v_src1 = vregEnc(i->Pin.AvBCDV128Trinary.src1);
-      UInt v_src2 = vregEnc(i->Pin.AvBCDV128Trinary.src2);
-      PPCRI* ps   = i->Pin.AvBCDV128Trinary.ps;
+   case Pin_AvBCDV128Binary: {
+      UInt v_dst  = vregEnc(i->Pin.AvBCDV128Binary.dst);
+      UInt v_src1 = vregEnc(i->Pin.AvBCDV128Binary.src1);
+      UInt v_src2 = vregEnc(i->Pin.AvBCDV128Binary.src2);
+      UInt ps = 0;    /* Issue the instruction with ps=0.  The IR code will
+                       * fix up the result if ps=1.
+                       */
       UInt opc2;
-      switch (i->Pin.AvBCDV128Trinary.op) {
+      switch (i->Pin.AvBCDV128Binary.op) {
       case Pav_BCDAdd:   opc2 =  1; break; // bcdadd
       case Pav_BCDSub:   opc2 = 65; break; // bcdsub
       default:
          goto bad;
       }
       p = mkFormVXR( p, 4, v_dst, v_src1, v_src2,
-                     0x1, (ps->Pri.Imm << 9) | opc2, endness_host );
+                     0x1, ps | opc2, endness_host );
       goto done;
    }
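+
+   /* Editorial note, not part of the patch: bcdadd./bcdsub. are now always
+    * issued with PS = 0, i.e. the preferred positive sign code 0b1100.  When
+    * the IR-level operation wants the PS = 1 encoding, the generated IR is
+    * expected to fix up the sign nibble of the result afterwards, which is
+    * why the ps operand has been dropped from this instruction.
+    */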
    case Pin_AvBin32Fx4: {
diff --git a/VEX/priv/host_ppc_defs.h b/VEX/priv/host_ppc_defs.h
index c04c994..59a016b 100644
--- a/VEX/priv/host_ppc_defs.h
+++ b/VEX/priv/host_ppc_defs.h
@@ -291,6 +291,8 @@
       Pun_NOT,
       Pun_CLZ32,
       Pun_CLZ64,
+      Pun_CTZ32,
+      Pun_CTZ64,
       Pun_EXTSW
    }
    PPCUnaryOp;
@@ -334,6 +336,24 @@
       /* Ternary */
       Pfp_MADDD,  Pfp_MSUBD,
       Pfp_MADDS,  Pfp_MSUBS,
+      Pfp_FPADDQ, Pfp_FPADDQRNDODD,
+      Pfp_FPSUBQ, Pfp_FPSUBQRNDODD,
+      Pfp_FPMULQ, Pfp_FPMULQRNDODD,
+      Pfp_FPDIVQ, Pfp_FPDIVQRNDODD,
+      Pfp_FPMULADDQ, Pfp_FPMULADDQRNDODD,
+      Pfp_FPMULSUBQ, Pfp_FPMULSUBQRNDODD,
+      Pfp_FPNEGMULADDQ, Pfp_FPNEGMULADDQRNDODD,
+      Pfp_FPNEGMULSUBQ, Pfp_FPNEGMULSUBQRNDODD,
+      Pfp_FPSQRTQ, Pfp_FPSQRTQRNDODD,
+      Pfp_FPQTOD, Pfp_FPQTODRNDODD,
+      Pfp_FPQTOW, Pfp_FPQTOWRNDODD,
+      Pfp_FPDTOQ,
+      Pfp_IDSTOQ,
+      Pfp_IDUTOQ,
+      Pfp_TRUNCFPQTOISD,
+      Pfp_TRUNCFPQTOISW,
+      Pfp_TRUNCFPQTOIUD,
+      Pfp_TRUNCFPQTOIUW,
       Pfp_DFPADD, Pfp_DFPADDQ,
       Pfp_DFPSUB, Pfp_DFPSUBQ,
       Pfp_DFPMUL, Pfp_DFPMULQ,
@@ -409,11 +429,51 @@
       /* BCD Arithmetic */
       Pav_BCDAdd, Pav_BCDSub,
 
+      /* Conversion signed 128-bit value to signed BCD 128-bit */
+      Pav_I128StoBCD128,
+
+      /* Conversion signed BCD 128-bit to signed 128-bit value */
+      Pav_BCD128toI128S,
+
       /* zero count */
       Pav_ZEROCNTBYTE, Pav_ZEROCNTWORD, Pav_ZEROCNTHALF, Pav_ZEROCNTDBL,
 
+      /* trailing zero count */
+      Pav_TRAILINGZEROCNTBYTE, Pav_TRAILINGZEROCNTWORD,
+      Pav_TRAILINGZEROCNTHALF, Pav_TRAILINGZEROCNTDBL,
+
       /* Vector bit matrix transpose by byte */
       Pav_BITMTXXPOSE,
+
+      /* Vector Half-precision format to single precision conversion */
+      Pav_F16toF32x4,
+
+      /* Vector Single precision format to Half-precision conversion */
+      Pav_F32toF16x4,
+
+      /* Vector Half-precision format to Double precision conversion */
+      Pav_F16toF64x2,
+
+      /* Vector Double precision format to Half-precision conversion */
+      Pav_F64toF16x2,
+
+      /* 128 bit mult by 10 */
+      Pav_MulI128by10,
+
+      /* 128 bit mult by 10, carry out */
+      Pav_MulI128by10Carry,
+
+      /* 128 bit mult by 10 plus carry in */
+      Pav_MulI128by10E,
+
+      /* 128 bit mult by 10 plus carry in, carry out */
+      Pav_MulI128by10ECarry,
+
+      /* F128 to I128 */
+      Pav_F128toI128S,
+
+      /* Round F128 to F128 */
+      Pav_ROUNDFPQ,
    }
    PPCAvOp;
 
@@ -466,6 +526,9 @@
 
       Pin_FpUnary,    /* FP unary op */
       Pin_FpBinary,   /* FP binary op */
+      Pin_Fp128Unary,   /* FP unary op for 128-bit floating point */
+      Pin_Fp128Binary,  /* FP binary op for 128-bit floating point */
+      Pin_Fp128Trinary, /* FP trinary op for 128-bit floating point */
       Pin_FpMulAcc,   /* FP multipy-accumulate style op */
       Pin_FpLdSt,     /* FP load/store */
       Pin_FpSTFIW,    /* stfiwx */
@@ -481,6 +544,7 @@
       Pin_AvUnary,    /* AV unary general reg=>reg */
 
       Pin_AvBinary,   /* AV binary general reg,reg=>reg */
+      Pin_AvBinaryInt, /* AV binary reg,int=>reg */
       Pin_AvBin8x16,  /* AV binary, 8x4 */
       Pin_AvBin16x8,  /* AV binary, 16x4 */
       Pin_AvBin32x4,  /* AV binary, 32x4 */
@@ -499,7 +563,7 @@
       Pin_AvCipherV128Unary,  /* AV Vector unary Cipher */
       Pin_AvCipherV128Binary, /* AV Vector binary Cipher */
       Pin_AvHashV128Binary, /* AV Vector binary Hash */
-      Pin_AvBCDV128Trinary, /* BCD Arithmetic */
+      Pin_AvBCDV128Binary,  /* BCD Arithmetic */
       Pin_Dfp64Unary,   /* DFP64  unary op */
       Pin_Dfp128Unary,  /* DFP128 unary op */
       Pin_DfpShift,     /* Decimal floating point shift by immediate value */
@@ -700,6 +764,23 @@
          } FpBinary;
          struct {
             PPCFpOp op;
+            HReg dst;
+            HReg src;
+         } Fp128Unary;
+         struct {
+            PPCFpOp op;
+            HReg dst;
+            HReg srcL;
+            HReg srcR;
+         } Fp128Binary;
+         struct {
+            PPCFpOp op;
+            HReg dst;
+            HReg srcL;
+            HReg srcR;
+         } Fp128Trinary;
+         struct {
+            PPCFpOp op;
             HReg    dst;
             HReg    srcML;
             HReg    srcMR;
@@ -778,6 +859,12 @@
          struct {
             PPCAvOp op;
             HReg    dst;
+            HReg    src;
+            PPCRI*  val;
+         } AvBinaryInt;
+         struct {
+            PPCAvOp op;
+            HReg    dst;
             HReg    srcL;
             HReg    srcR;
          } AvBin8x16;
@@ -867,8 +954,7 @@
             HReg       dst;
             HReg      src1;
             HReg      src2;
-            PPCRI*      ps;
-         } AvBCDV128Trinary;
+         } AvBCDV128Binary;
          struct {
             PPCAvOp   op;
             HReg      dst;
@@ -1026,6 +1112,11 @@
 extern PPCInstr* PPCInstr_MfCR       ( HReg dst );
 extern PPCInstr* PPCInstr_MFence     ( void );
 
+extern PPCInstr* PPCInstr_Fp128Unary    ( PPCFpOp op, HReg dst, HReg src );
+extern PPCInstr* PPCInstr_Fp128Binary   ( PPCFpOp op, HReg dst, HReg srcL, HReg srcR );
+extern PPCInstr* PPCInstr_Fp128Trinary  ( PPCFpOp op, HReg dst, HReg srcL,
+                                          HReg srcR);
+
 extern PPCInstr* PPCInstr_FpUnary    ( PPCFpOp op, HReg dst, HReg src );
 extern PPCInstr* PPCInstr_FpBinary   ( PPCFpOp op, HReg dst, HReg srcL, HReg srcR );
 extern PPCInstr* PPCInstr_FpMulAcc   ( PPCFpOp op, HReg dst, HReg srcML, 
@@ -1044,6 +1135,7 @@
 extern PPCInstr* PPCInstr_AvLdSt     ( Bool isLoad, UChar sz, HReg, PPCAMode* );
 extern PPCInstr* PPCInstr_AvUnary    ( PPCAvOp op, HReg dst, HReg src );
 extern PPCInstr* PPCInstr_AvBinary   ( PPCAvOp op, HReg dst, HReg srcL, HReg srcR );
+extern PPCInstr* PPCInstr_AvBinaryInt( PPCAvOp op, HReg dst, HReg src, PPCRI* val );
 extern PPCInstr* PPCInstr_AvBin8x16  ( PPCAvOp op, HReg dst, HReg srcL, HReg srcR );
 extern PPCInstr* PPCInstr_AvBin16x8  ( PPCAvOp op, HReg dst, HReg srcL, HReg srcR );
 extern PPCInstr* PPCInstr_AvBin32x4  ( PPCAvOp op, HReg dst, HReg srcL, HReg srcR );
@@ -1063,9 +1155,8 @@
                                                HReg srcL, HReg srcR );
 extern PPCInstr* PPCInstr_AvHashV128Binary ( PPCAvOp op, HReg dst,
                                              HReg src, PPCRI* s_field );
-extern PPCInstr* PPCInstr_AvBCDV128Trinary ( PPCAvOp op, HReg dst,
-                                             HReg src1, HReg src2,
-                                             PPCRI* ps );
+extern PPCInstr* PPCInstr_AvBCDV128Binary ( PPCAvOp op, HReg dst,
+                                            HReg src1, HReg src2 );
 extern PPCInstr* PPCInstr_Dfp64Unary  ( PPCFpOp op, HReg dst, HReg src );
 extern PPCInstr* PPCInstr_Dfp64Binary ( PPCFpOp op, HReg dst, HReg srcL,
                                         HReg srcR );
diff --git a/VEX/priv/host_ppc_isel.c b/VEX/priv/host_ppc_isel.c
index 43e7989..d2c4cb1 100644
--- a/VEX/priv/host_ppc_isel.c
+++ b/VEX/priv/host_ppc_isel.c
@@ -497,6 +497,8 @@
                                          IREndness IEndianess );
 static HReg          iselDfp64Expr     ( ISelEnv* env, IRExpr* e,
                                          IREndness IEndianess );
+static HReg iselFp128Expr_wrk ( ISelEnv* env, IRExpr* e, IREndness IEndianess);
+static HReg iselFp128Expr     ( ISelEnv* env, IRExpr* e, IREndness IEndianess);
 
 /* 64-bit mode ONLY: compute an D128 into a GPR64 pair. */
 static void iselDfp128Expr_wrk ( HReg* rHi, HReg* rLo, ISelEnv* env,
@@ -1225,6 +1227,20 @@
    _set_FPU_rounding_mode(env, mode, True, IEndianess);
 }
 
+static
+Bool FPU_rounding_mode_isOdd (IRExpr* mode) {
+   /* If the rounding mode is round-to-odd, the expr must be a constant
+    * Ico_U32 value equal to 8.  Otherwise, it must be a binop expression
+    * that computes the rounding mode at run time.
+    */
+
+   if (mode->tag != Iex_Const)
+      return False;
+
+   vassert(mode->Iex.Const.con->tag == Ico_U32);
+   vassert(mode->Iex.Const.con->Ico.U32 == 0x8);
+   return True;
+}
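+
+/* Editorial sketch, not part of the patch: a front end is expected to ask
+ * for round-to-odd by passing the rounding-mode operand as the literal
+ * constant 8, e.g. (illustrative only)
+ *
+ *    binop( Iop_F128toF64, mkU32( 0x8 ), f128_arg );
+ *
+ * Any other rounding mode arrives as a non-constant expression and goes
+ * through set_FPU_rounding_mode() instead.
+ */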
 
 /*---------------------------------------------------------*/
 /*--- ISEL: vector helpers                              ---*/
@@ -2060,6 +2076,20 @@
          return r_dst;
       }
 
+      case Iop_Ctz32:
+      case Iop_Ctz64: {
+         HReg r_src, r_dst;
+         PPCUnaryOp op_clz = (op_unop == Iop_Ctz32) ? Pun_CTZ32 :
+                                                      Pun_CTZ64;
+         if (op_unop == Iop_Ctz64 && !mode64)
+            goto irreducible;
+         /* Count trailing zeroes. */
+         r_dst = newVRegI(env);
+         r_src = iselWordExpr_R(env, e->Iex.Unop.arg, IEndianess);
+         addInstr(env, PPCInstr_Unary(op_clz,r_dst,r_src));
+         return r_dst;
+      }
+
       case Iop_Left8:
       case Iop_Left16:
       case Iop_Left32: 
@@ -2300,7 +2330,7 @@
             addInstr(env, PPCInstr_Call( cc, (Addr)h_calc_DPBtoBCD,
                                          argiregs, 
                                          mk_RetLoc_simple(RLPri_Int) ) );
-	} else {
+        } else {
             HWord*      fdescr;
             fdescr = (HWord*)h_calc_DPBtoBCD;
             addInstr(env, PPCInstr_Call( cc, (Addr64)(fdescr[0]),
@@ -2311,6 +2341,62 @@
          addInstr(env, mk_iMOVds_RR(r_dst, argregs[0]));
          return r_dst;
       }
+      case Iop_F32toF16x4: {
+         HReg vdst = newVRegV(env);    /* V128 */
+         HReg dst  = newVRegI(env);    /* I64*/
+         HReg r0 = newVRegI(env);    /* I16*/
+         HReg r1 = newVRegI(env);    /* I16*/
+         HReg r2 = newVRegI(env);    /* I16*/
+         HReg r3 = newVRegI(env);    /* I16*/
+         HReg vsrc  = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+         PPCAMode *am_off0, *am_off2, *am_off4, *am_off6, *am_off8;
+         PPCAMode *am_off10, *am_off12, *am_off14;
+         HReg r_aligned16;
+
+         sub_from_sp( env, 32 );     // Move SP down
+
+         /* issue instruction */
+         addInstr(env, PPCInstr_AvUnary(Pav_F32toF16x4, vdst, vsrc));
+
+         /* Get a  quadword aligned address within our stack space */
+         r_aligned16 = get_sp_aligned16( env );
+         am_off0  = PPCAMode_IR( 0, r_aligned16 );
+         am_off2  = PPCAMode_IR( 2, r_aligned16 );
+         am_off4  = PPCAMode_IR( 4, r_aligned16 );
+         am_off6  = PPCAMode_IR( 6, r_aligned16 );
+         am_off8  = PPCAMode_IR( 8, r_aligned16 );
+         am_off10 = PPCAMode_IR( 10, r_aligned16 );
+         am_off12 = PPCAMode_IR( 12, r_aligned16 );
+         am_off14 = PPCAMode_IR( 14, r_aligned16 );
+
+         /* Store v128 result to stack. */
+         addInstr(env, PPCInstr_AvLdSt(False/*store*/, 16, vdst, am_off0));
+
+         /* Fetch four I16 lanes from the V128 via the stack. */
+         if (IEndianess == Iend_LE) {
+            addInstr(env, PPCInstr_Load( 2, r3, am_off12, mode64));
+            addInstr(env, PPCInstr_Load( 2, r2, am_off8, mode64));
+            addInstr(env, PPCInstr_Load( 2, r1, am_off4, mode64));
+            addInstr(env, PPCInstr_Load( 2, r0, am_off0, mode64));
+         } else {
+            addInstr(env, PPCInstr_Load( 2, r0, am_off14, mode64));
+            addInstr(env, PPCInstr_Load( 2, r1, am_off10, mode64));
+            addInstr(env, PPCInstr_Load( 2, r2, am_off6, mode64));
+            addInstr(env, PPCInstr_Load( 2, r3, am_off2, mode64));
+         }
+
+         /* Store them as one contiguous 64-bit value. */
+         addInstr(env, PPCInstr_Store( 2, am_off6, r3, mode64));
+         addInstr(env, PPCInstr_Store( 2, am_off4, r2, mode64));
+         addInstr(env, PPCInstr_Store( 2, am_off2, r1, mode64));
+         addInstr(env, PPCInstr_Store( 2, am_off0, r0, mode64));
+
+         /* Fetch I64 */
+         addInstr(env, PPCInstr_Load(8, dst, am_off0, mode64));
+
+         add_to_sp( env, 32 );          // Reset SP
+         return dst;
+      }
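+
+      /* Editorial sketch, not part of the patch: the shuffle above assumes
+       * xvcvsphp leaves each F16 result in the low halfword of its 32-bit
+       * lane.  With the V128 stored at offset 0, those halfwords sit at
+       * offsets 0,4,8,12 (LE) or 2,6,10,14 (BE); the four 2-byte loads pick
+       * them up, the four 2-byte stores repack them at offsets 0..6, and the
+       * final 8-byte load returns them as one I64.
+       */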
 
       default: 
          break;
@@ -3032,6 +3118,8 @@
 static void iselInt128Expr_wrk ( HReg* rHi, HReg* rLo,
                                  ISelEnv* env, IRExpr* e, IREndness IEndianess )
 {
+   Bool mode64 = env->mode64;
+
    vassert(e);
    vassert(typeOfIRExpr(env->type_env,e) == Ity_I128);
 
@@ -3041,6 +3129,21 @@
       return;
    }
 
+   /* 128-bit GET */
+   if (e->tag == Iex_Get) {
+      PPCAMode* am_addr = PPCAMode_IR( e->Iex.Get.offset,
+                                       GuestStatePtr(mode64) );
+      PPCAMode* am_addr4 = advance4(env, am_addr);
+      HReg tLo = newVRegI(env);
+      HReg tHi = newVRegI(env);
+
+      addInstr(env, PPCInstr_Load( 8, tHi, am_addr,  mode64));
+      addInstr(env, PPCInstr_Load( 8, tLo, am_addr4, mode64));
+      *rHi = tHi;
+      *rLo = tLo;
+      return;
+   }
+
    /* --------- BINARY ops --------- */
    if (e->tag == Iex_Binop) {
       switch (e->Iex.Binop.op) {
@@ -3226,6 +3329,36 @@
       return;
    }
 
+   /* --------- CCALL --------- */
+   if(e->tag == Iex_CCall) {
+      IRType ty = typeOfIRExpr(env->type_env,e);
+      Bool mode64 = env->mode64;
+
+      vassert(ty == e->Iex.CCall.retty); /* well-formedness of IR */
+
+      /* Be restrictive for now: the return type must not be a plain
+         32-bit int or host-word int, since the I128 result is expected
+         back as a register pair. */
+      vassert(!(ty == Ity_I32 || (mode64 && ty == Ity_I64)));
+
+      /* Marshal args, do the call, clear stack. */
+      UInt   addToSp = 0;
+      RetLoc rloc    = mk_RetLoc_INVALID();
+      doHelperCall( &addToSp, &rloc, env, NULL/*guard*/,
+                    e->Iex.CCall.cee, e->Iex.CCall.retty, e->Iex.CCall.args,
+                    IEndianess );
+      vassert(is_sane_RetLoc(rloc));
+
+      vassert(rloc.pri == RLPri_2Int);
+      vassert(addToSp == 0);
+
+      /* The helper's return value is now in GPR3; copy it out. */
+      HReg r_dst = newVRegI(env);
+      addInstr(env, mk_iMOVds_RR(r_dst, hregPPC_GPR3(mode64)));
+      *rHi = r_dst;
+      *rLo = r_dst;
+      return;
+   }
+
    /* 64-bit ITE */
    if (e->tag == Iex_ITE) { // VFD
       HReg e0Lo, e0Hi, eXLo, eXHi;
@@ -4071,6 +4204,41 @@
 
    if (e->tag == Iex_Binop) {
 
+      if (e->Iex.Binop.op == Iop_F128toF64) {
+         HReg fr_dst = newVRegF(env);
+         HReg fr_src = iselFp128Expr(env, e->Iex.Binop.arg2, IEndianess);
+         HReg tmp = newVRegV(env);
+         PPCAMode* zero_r1 = PPCAMode_IR( 0, StackFramePtr(env->mode64) );
+         PPCAMode* eight_r1 = PPCAMode_IR( 8, StackFramePtr(env->mode64) );
+         PPCFpOp fpop = Pfp_INVALID;
+
+         if (FPU_rounding_mode_isOdd(e->Iex.Binop.arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPQTODRNDODD;
+         } else {
+            set_FPU_rounding_mode( env, e->Iex.Binop.arg1, IEndianess );
+            fpop = Pfp_FPQTOD;
+         }
+
+         addInstr(env, PPCInstr_Fp128Unary(fpop, tmp, fr_src));
+
+         /* result is in a 128-bit vector register, move to 64-bit reg to
+          * match the Iop specification.  The result will get moved back
+          * to a 128-bit register and stored once the value is returned.
+          */
+         sub_from_sp( env, 16 );
+         addInstr(env, PPCInstr_AvLdSt(False/*store*/, 16, tmp, zero_r1));
+         if (IEndianess == Iend_LE)
+            addInstr(env, PPCInstr_FpLdSt(True/*load*/, 8, fr_dst, eight_r1));
+         else
+            /* High 64-bits stored at lower address */
+            addInstr(env, PPCInstr_FpLdSt(True/*load*/, 8, fr_dst, zero_r1));
+
+         add_to_sp( env, 16 );
+
+         return fr_dst;
+      }
+
       if (e->Iex.Binop.op == Iop_RoundF64toF32) {
          HReg r_dst = newVRegF(env);
          HReg r_src = iselDblExpr(env, e->Iex.Binop.arg2, IEndianess);
@@ -4160,6 +4328,43 @@
 
    if (e->tag == Iex_Unop) {
       switch (e->Iex.Unop.op) {
+      case Iop_F128HItoF64:
+      case Iop_F128LOtoF64:
+         {
+            /* put upper/lower 64-bits of F128 into an F64. */
+            HReg     r_aligned16;
+            HReg     fdst = newVRegF(env);
+            HReg     fsrc = iselFp128Expr(env, e->Iex.Unop.arg, IEndianess);
+            PPCAMode *am_off0, *am_off8, *am_off_arg;
+            sub_from_sp( env, 32 );     // Move SP down 32 bytes
+
+            // get a quadword aligned address within our stack space
+            r_aligned16 = get_sp_aligned16( env );
+            am_off0 = PPCAMode_IR( 0, r_aligned16 );
+            am_off8 = PPCAMode_IR( 8 ,r_aligned16 );
+
+            /* store 128-bit floating point value to memory, load low word
+             * or high to 64-bit destination floating point register
+             */
+            addInstr(env, PPCInstr_AvLdSt(False/*store*/, 16, fsrc, am_off0));
+            if (IEndianess == Iend_LE) {
+               if (e->Iex.Unop.op == Iop_F128HItoF64)
+                  am_off_arg = am_off8;
+               else
+                  am_off_arg = am_off0;
+            } else {
+               if (e->Iex.Unop.op == Iop_F128HItoF64)
+                  am_off_arg = am_off0;
+               else
+                  am_off_arg = am_off8;
+            }
+            addInstr(env,
+                    PPCInstr_FpLdSt( True /*load*/,
+                                      8, fdst,
+                                      am_off_arg ));
+            add_to_sp( env, 32 );       // Reset SP
+            return fdst;
+         }
          case Iop_ReinterpI64asF64: {
             /* Given an I64, produce an IEEE754 double with the same
                bit pattern. */
@@ -4278,6 +4483,322 @@
    vpanic( "iselDfp32Expr_wrk(ppc)" );
 }
 
+static HReg iselFp128Expr( ISelEnv* env, IRExpr* e, IREndness IEndianess )
+{
+   HReg r = iselFp128Expr_wrk( env, e, IEndianess );
+   vassert(hregClass(r) == HRcVec128);
+   vassert(hregIsVirtual(r));
+   return r;
+}
+
+/* DO NOT CALL THIS DIRECTLY */
+static HReg iselFp128Expr_wrk( ISelEnv* env, IRExpr* e, IREndness IEndianess)
+{
+   Bool mode64 = env->mode64;
+   PPCFpOp fpop = Pfp_INVALID;
+   IRType  ty = typeOfIRExpr(env->type_env,e);
+
+   vassert(e);
+   vassert( ty == Ity_F128 );
+
+   /* read 128-bit IRTemp */
+   if (e->tag == Iex_RdTmp) {
+      return lookupIRTemp(env, e->Iex.RdTmp.tmp);
+   }
+
+   if (e->tag == Iex_Get) {
+      /* Guest state vectors are 16byte aligned,
+         so don't need to worry here */
+      HReg dst = newVRegV(env);
+
+      addInstr(env,
+               PPCInstr_AvLdSt( True/*load*/, 16, dst,
+                                PPCAMode_IR( e->Iex.Get.offset,
+                                             GuestStatePtr(mode64) )));
+      return dst;
+   }
+
+   if (e->tag == Iex_Unop) {
+      switch (e->Iex.Unop.op) {
+      case Iop_TruncF128toI64S:
+         fpop = Pfp_TRUNCFPQTOISD; goto do_Un_F128;
+      case Iop_TruncF128toI32S:
+         fpop = Pfp_TRUNCFPQTOISW; goto do_Un_F128;
+      case Iop_TruncF128toI64U:
+         fpop = Pfp_TRUNCFPQTOIUD; goto do_Un_F128;
+      case Iop_TruncF128toI32U:
+         fpop = Pfp_TRUNCFPQTOIUW; goto do_Un_F128;
+
+      do_Un_F128: {
+         HReg r_dst = newVRegV(env);
+         HReg r_src = iselFp128Expr(env, e->Iex.Unop.arg, IEndianess);
+         addInstr(env, PPCInstr_Fp128Unary(fpop, r_dst, r_src));
+         return r_dst;
+      }
+
+      case Iop_F64toF128: {
+         fpop = Pfp_FPDTOQ;
+         HReg r_dst = newVRegV(env);
+         HReg r_src = iselDblExpr(env, e->Iex.Unop.arg, IEndianess);
+         HReg v128tmp = newVRegV(env);
+         PPCAMode* zero_r1 = PPCAMode_IR( 0, StackFramePtr(env->mode64) );
+
+         /* value is in 64-bit float reg, need to move to 128-bit vector reg */
+         sub_from_sp( env, 16 );
+         addInstr(env, PPCInstr_FpLdSt(False/*store*/, 8, r_src, zero_r1));
+         addInstr(env, PPCInstr_AvLdSt(True/*load*/, 16, v128tmp, zero_r1));
+         add_to_sp( env, 16 );
+
+         addInstr(env, PPCInstr_Fp128Unary(fpop, r_dst, v128tmp));
+         return r_dst;
+      }
+
+      case Iop_I64StoF128:
+         fpop = Pfp_IDSTOQ; goto do_Un_int_F128;
+      case Iop_I64UtoF128:
+         fpop = Pfp_IDUTOQ; goto do_Un_int_F128;
+
+      do_Un_int_F128: {
+         HReg r_dst = newVRegV(env);
+         HReg tmp = newVRegV(env);
+         HReg r_src = iselWordExpr_R(env, e->Iex.Unop.arg, IEndianess);
+         PPCAMode *am_offhi, *am_offlo;
+         HReg r_aligned16;
+
+         /* source is in a 64-bit integer reg, move to 128-bit float reg
+          * do this via the stack (easy, convenient, etc).
+          */
+         sub_from_sp( env, 32 );        // Move SP down
+
+         /* Get a quadword aligned address within our stack space */
+         r_aligned16 = get_sp_aligned16( env );
+
+         am_offlo  = PPCAMode_IR( 0,  r_aligned16 );
+         am_offhi  = PPCAMode_IR( 8,  r_aligned16 );
+
+         /* The instruction only uses the upper 64 bits of the source, so
+          * park the GPR there via the stack. */
+         addInstr(env, PPCInstr_Store(8, am_offhi, r_src, mode64));
+
+         /* Fetch the value back from the stack into a vector register. */
+         addInstr(env, PPCInstr_AvLdSt(True/*load*/, 16, tmp, am_offlo));
+
+         add_to_sp( env, 32 );          // Reset SP
+
+         addInstr(env, PPCInstr_Fp128Unary(fpop, r_dst, tmp));
+         return r_dst;
+      }
+
+      default:
+         break;
+      } /* switch (e->Iex.Unop.op) */
+   } /* if (e->tag == Iex_Unop) */
+
+   if (e->tag == Iex_Binop) {
+      switch (e->Iex.Binop.op) {
+
+      case Iop_F64HLtoF128:
+         {
+            HReg dst    = newVRegV(env);
+            HReg r_src_hi = iselDblExpr(env, e->Iex.Binop.arg1, IEndianess);
+            HReg r_src_lo = iselDblExpr(env, e->Iex.Binop.arg2, IEndianess);
+            PPCAMode *am_offhi, *am_offlo;
+            HReg r_aligned16;
+
+            /* do this via the stack (easy, convenient, etc) */
+            sub_from_sp( env, 16 );        // Move SP down
+
+            /* Get a quadword aligned address within our stack space */
+            r_aligned16 = get_sp_aligned16( env );
+
+            am_offlo  = PPCAMode_IR( 0,  r_aligned16 );
+            am_offhi  = PPCAMode_IR( 8,  r_aligned16 );
+
+            addInstr(env, PPCInstr_FpLdSt(False/*store*/, 8,
+                                          r_src_lo, am_offlo));
+            addInstr(env, PPCInstr_FpLdSt(False/*store*/, 8,
+                                          r_src_hi, am_offhi));
+
+            /* Fetch result back from stack. */
+            addInstr(env, PPCInstr_AvLdSt(True/*load*/, 16,
+                                          dst, am_offlo));
+
+            add_to_sp( env, 16 );          // Reset SP
+            return dst;
+         }
+      case Iop_F128toI128S:
+         {
+            HReg dst    = newVRegV(env);
+            HReg r_src  = iselFp128Expr(env, e->Iex.Binop.arg2, IEndianess);
+            PPCRI* rm = iselWordExpr_RI(env, e->Iex.Binop.arg1, IEndianess);
+            /* Note: rm is a set of three bit fields that specify the
+             * rounding mode and which of the two instructions to issue.
+             */
+            addInstr(env, PPCInstr_AvBinaryInt(Pav_F128toI128S, dst,
+                                               r_src, rm));
+            return dst;
+         }
+      case Iop_RndF128:
+         {
+            HReg dst    = newVRegV(env);
+            HReg r_src  = iselFp128Expr(env, e->Iex.Binop.arg2, IEndianess);
+            PPCRI* rm = iselWordExpr_RI(env, e->Iex.Binop.arg1, IEndianess);
+            /* Note: rm is a set of three bit fields that specify the
+             * rounding mode and which of the two instructions to issue.
+             */
+            addInstr(env, PPCInstr_AvBinaryInt(Pav_ROUNDFPQ, dst,
+                                               r_src, rm));
+            return dst;
+         }
+      case Iop_SqrtF128:
+         if (FPU_rounding_mode_isOdd(e->Iex.Binop.arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPSQRTQRNDODD;
+            goto do_Bin_F128;
+         } else {
+            set_FPU_rounding_mode( env, e->Iex.Binop.arg1, IEndianess );
+            fpop = Pfp_FPSQRTQ;
+            goto do_Bin_F128;
+         }
+      case Iop_F128toF32:
+         if (FPU_rounding_mode_isOdd(e->Iex.Binop.arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPQTOWRNDODD;
+            goto do_Bin_F128;
+         } else {
+            set_FPU_rounding_mode( env, e->Iex.Binop.arg1, IEndianess );
+            fpop = Pfp_FPQTOW;
+            goto do_Bin_F128;
+         }
+      do_Bin_F128: {
+         HReg r_dst = newVRegV(env);
+         HReg r_src = iselFp128Expr(env, e->Iex.Binop.arg2, IEndianess);
+         addInstr(env, PPCInstr_Fp128Unary(fpop, r_dst, r_src));
+         return r_dst;
+      }
+
+      default:
+         break;
+      } /* switch (e->Iex.Binop.op) */
+   } /* if (e->tag == Iex_Binop) */
+
+   if (e->tag == Iex_Triop) {
+      IRTriop *triop = e->Iex.Triop.details;
+
+      switch (triop->op) {
+      case Iop_AddF128:
+         if (FPU_rounding_mode_isOdd(triop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPADDQRNDODD; goto do_Tri_F128;
+         } else {
+            set_FPU_rounding_mode( env, triop->arg1, IEndianess );
+            fpop = Pfp_FPADDQ; goto do_Tri_F128;
+         }
+      case Iop_SubF128:
+         if (FPU_rounding_mode_isOdd(triop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPSUBQRNDODD; goto do_Tri_F128;
+         } else {
+            set_FPU_rounding_mode( env, triop->arg1, IEndianess );
+            fpop = Pfp_FPSUBQ; goto do_Tri_F128;
+         }
+      case Iop_MulF128:
+         if (FPU_rounding_mode_isOdd(triop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPMULQRNDODD; goto do_Tri_F128;
+         } else {
+            set_FPU_rounding_mode( env, triop->arg1, IEndianess );
+            fpop = Pfp_FPMULQ; goto do_Tri_F128;
+         }
+      case Iop_DivF128:
+         if (FPU_rounding_mode_isOdd(triop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPDIVQRNDODD; goto do_Tri_F128;
+         } else {
+            set_FPU_rounding_mode( env, triop->arg1, IEndianess );
+            fpop = Pfp_FPDIVQ; goto do_Tri_F128;
+         }
+      case Iop_MAddF128:
+         if (FPU_rounding_mode_isOdd(triop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPMULADDQRNDODD; goto do_Tri_F128;
+         } else {
+            set_FPU_rounding_mode( env, triop->arg1, IEndianess );
+            fpop = Pfp_FPMULADDQ; goto do_Tri_F128;
+         }
+
+      do_Tri_F128: {
+         HReg r_dst  = newVRegV(env);
+         HReg r_srcL = iselFp128Expr(env, triop->arg2, IEndianess);
+         HReg r_srcR = iselFp128Expr(env, triop->arg3, IEndianess);
+
+         addInstr(env, PPCInstr_Fp128Binary(fpop, r_dst, r_srcL, r_srcR));
+         return r_dst;
+      }
+
+      default:
+         break;
+      } /* switch (e->Iex.Triop.op) */
+
+   } /* if (e->tag == Iex_Trinop) */
+
+   if (e->tag == Iex_Qop) {
+      IRQop *qop = e->Iex.Qop.details;
+
+      switch (qop->op) {
+      case Iop_MAddF128:
+         if (FPU_rounding_mode_isOdd(qop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPMULADDQRNDODD; goto do_Quad_F128;
+         } else {
+            set_FPU_rounding_mode( env, qop->arg1, IEndianess );
+            fpop = Pfp_FPMULADDQ; goto do_Quad_F128;
+         }
+      case Iop_MSubF128:
+         if (FPU_rounding_mode_isOdd(qop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPMULSUBQRNDODD; goto do_Quad_F128;
+         } else {
+            set_FPU_rounding_mode( env, qop->arg1, IEndianess );
+            fpop = Pfp_FPMULSUBQ; goto do_Quad_F128;
+         }
+      case Iop_NegMAddF128:
+         if (FPU_rounding_mode_isOdd(qop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPNEGMULADDQRNDODD; goto do_Quad_F128;
+         } else {
+            set_FPU_rounding_mode( env, qop->arg1, IEndianess );
+            fpop = Pfp_FPNEGMULADDQ; goto do_Quad_F128;
+         }
+      case Iop_NegMSubF128:
+         if (FPU_rounding_mode_isOdd(qop->arg1)) {
+            /* use rounding mode specified by RN. Issue inst with R0 = 0 */
+            fpop = Pfp_FPNEGMULSUBQRNDODD; goto do_Quad_F128;
+         } else {
+            set_FPU_rounding_mode( env, qop->arg1, IEndianess );
+            fpop = Pfp_FPNEGMULSUBQ; goto do_Quad_F128;
+         }
+
+      do_Quad_F128: {
+         HReg r_dst = iselFp128Expr(env, qop->arg3,
+                                    IEndianess);
+         HReg r_srcL = iselFp128Expr(env, qop->arg2,
+                                     IEndianess);
+         HReg r_srcR = iselFp128Expr(env, qop->arg4,
+                                     IEndianess);
+
+         addInstr(env, PPCInstr_Fp128Trinary(fpop, r_dst, r_srcL, r_srcR));
+         return r_dst;
+      }
+
+      default:
+         break;
+      }
+   }   /* if (e->tag == Iex_Qop) */
+
+   ppIRExpr( e );
+   vpanic( "iselFp128Expr(ppc64)" );
+}
+
 static HReg iselDfp64Expr(ISelEnv* env, IRExpr* e, IREndness IEndianess)
 {
    HReg r = iselDfp64Expr_wrk( env, e, IEndianess );
@@ -4932,6 +5453,101 @@
    if (e->tag == Iex_Unop) {
       switch (e->Iex.Unop.op) {
 
+      case Iop_F16toF64x2:
+         {
+            HReg dst = newVRegV(env);
+            HReg arg  = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+            /* Note: PPC only converts the 16-bit value in the upper word
+             *       to a 64-bit value stored in the upper word.  The
+             *       contents of the lower word are undefined.
+             */
+            addInstr(env, PPCInstr_AvUnary(Pav_F16toF64x2, dst, arg));
+            return dst;
+         }
+
+      case Iop_F64toF16x2:
+         {
+            HReg dst = newVRegV(env);
+            HReg arg  = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+            /* Note: PPC only converts the 64-bit value in the upper 64 bits
+             * of the V128 to a 16-bit value stored in the upper 64 bits of
+             * the result V128.  The contents of the lower 64 bits are
+             * undefined.
+             */
+            addInstr(env, PPCInstr_AvUnary(Pav_F64toF16x2, dst, arg));
+            return dst;
+         }
+
+      case Iop_F16toF32x4:
+         {
+            HReg src = newVRegV(env);
+            HReg dst = newVRegV(env);
+            HReg arg = iselWordExpr_R(env, e->Iex.Unop.arg, IEndianess);
+            PPCAMode *am_off0, *am_off8;
+            HReg r_aligned16;
+
+            vassert(mode64);
+            /* need to put I64 src into upper 64-bits of vector register,
+               use stack */
+            sub_from_sp( env, 32 );     // Move SP down
+
+            /* Get a quadword aligned address within our stack space */
+            r_aligned16 = get_sp_aligned16( env );
+            am_off0  = PPCAMode_IR( 0, r_aligned16 );
+            am_off8  = PPCAMode_IR( 8, r_aligned16 );
+
+            /* Store I64 to stack */
+
+            if (IEndianess == Iend_LE) {
+               addInstr(env, PPCInstr_Store( 8, am_off8, arg, mode64 ));
+            } else {
+               addInstr(env, PPCInstr_Store( 8, am_off0, arg, mode64 ));
+            }
+
+            /* Fetch new v128 src back from stack. */
+            addInstr(env, PPCInstr_AvLdSt(True/*ld*/, 16, src, am_off0));
+
+            /* issue instruction */
+            addInstr(env, PPCInstr_AvUnary(Pav_F16toF32x4, dst, src));
+            add_to_sp( env, 32 );          // Reset SP
+
+            return dst;
+         }
+
+      case Iop_F32toF16x4:
+         {
+            HReg dst = newVRegI(env);
+            HReg tmp = newVRegV(env);
+            HReg arg  = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+            PPCAMode *am_off0, *am_off8;
+            HReg r_aligned16;
+
+            /* The instruction returns a V128, but Iop_F32toF16x4 needs to
+             * return an I64.  Move the upper 64 bits of the result to an
+             * I64 via the stack and return that.
+             */
+            sub_from_sp( env, 32 );     // Move SP down
+
+            addInstr(env, PPCInstr_AvUnary(Pav_F32toF16x4, tmp, arg));
+
+            /* Get a quadword aligned address within our stack space */
+            r_aligned16 = get_sp_aligned16( env );
+            am_off0  = PPCAMode_IR( 0, r_aligned16 );
+            am_off8  = PPCAMode_IR( 8, r_aligned16 );
+
+            /* Store v128 tmp to stack. */
+            addInstr(env, PPCInstr_AvLdSt(False/*store*/, 16, tmp, am_off0));
+
+            /* Fetch I64 from stack */
+            if (IEndianess == Iend_LE) {
+               addInstr(env, PPCInstr_Load( 8, dst, am_off8, mode64 ));
+            } else {
+               addInstr(env, PPCInstr_Load( 8, dst, am_off0, mode64 ));
+            }
+
+            add_to_sp( env, 32 );          // Reset SP
+            return dst;
+         }
+
       case Iop_NotV128: {
          HReg arg = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
          HReg dst = newVRegV(env);
@@ -5054,6 +5670,10 @@
       case Iop_Clz16x8: op = Pav_ZEROCNTHALF;   goto do_zerocnt;
       case Iop_Clz32x4: op = Pav_ZEROCNTWORD;   goto do_zerocnt;
       case Iop_Clz64x2: op = Pav_ZEROCNTDBL;    goto do_zerocnt;
+      case Iop_Ctz8x16: op = Pav_TRAILINGZEROCNTBYTE; goto do_zerocnt;
+      case Iop_Ctz16x8: op = Pav_TRAILINGZEROCNTHALF; goto do_zerocnt;
+      case Iop_Ctz32x4: op = Pav_TRAILINGZEROCNTWORD; goto do_zerocnt;
+      case Iop_Ctz64x2: op = Pav_TRAILINGZEROCNTDBL;  goto do_zerocnt;
       case Iop_PwBitMtxXpose64x2: op = Pav_BITMTXXPOSE;  goto do_zerocnt;
       do_zerocnt:
       {
@@ -5063,6 +5683,24 @@
         return dst;
       }
 
+      /* BCD Iops */
+      case Iop_BCD128toI128S:
+         {
+            HReg dst  = newVRegV(env);
+            HReg arg  = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+            addInstr(env, PPCInstr_AvUnary( Pav_BCD128toI128S, dst, arg ) );
+            return dst;
+         }
+
+      case Iop_MulI128by10:       op = Pav_MulI128by10;      goto do_MulI128;
+      case Iop_MulI128by10Carry:  op = Pav_MulI128by10Carry; goto do_MulI128;
+      do_MulI128: {
+            HReg dst = newVRegV(env);
+            HReg arg = iselVecExpr(env, e->Iex.Unop.arg, IEndianess);
+            addInstr(env, PPCInstr_AvUnary(op, dst, arg));
+            return dst;
+         }
+
       default:
          break;
       } /* switch (e->Iex.Unop.op) */
@@ -5392,6 +6030,39 @@
          addInstr(env, PPCInstr_AvHashV128Binary(op, dst, arg1, s_field));
          return dst;
       }
+
+      /* BCD Iops */
+      case Iop_I128StoBCD128:
+         {
+            HReg dst = newVRegV(env);
+            HReg arg = iselVecExpr(env, e->Iex.Binop.arg1, IEndianess);
+            PPCRI* ps = iselWordExpr_RI(env, e->Iex.Binop.arg2, IEndianess);
+
+            addInstr(env, PPCInstr_AvBinaryInt( Pav_I128StoBCD128, dst, arg,
+                                                ps ) );
+            return dst;
+         }
+
+      case Iop_MulI128by10E:       op = Pav_MulI128by10E;      goto do_MulI128E;
+      case Iop_MulI128by10ECarry:  op = Pav_MulI128by10ECarry; goto do_MulI128E;
+      do_MulI128E: {
+            HReg dst  = newVRegV(env);
+            HReg argL = iselVecExpr(env, e->Iex.Binop.arg1, IEndianess);
+            HReg argR = iselVecExpr(env, e->Iex.Binop.arg2, IEndianess);
+            addInstr(env, PPCInstr_AvBinary(op, dst, argL, argR));
+            return dst;
+         }
+
+      case Iop_BCDAdd:op = Pav_BCDAdd; goto do_AvBCDV128;
+      case Iop_BCDSub:op = Pav_BCDSub; goto do_AvBCDV128;
+      do_AvBCDV128: {
+         HReg arg1 = iselVecExpr(env, e->Iex.Binop.arg1, IEndianess);
+         HReg arg2 = iselVecExpr(env, e->Iex.Binop.arg2, IEndianess);
+         HReg dst  = newVRegV(env);
+         addInstr(env, PPCInstr_AvBCDV128Binary(op, dst, arg1, arg2));
+         return dst;
+      }
+
       default:
          break;
       } /* switch (e->Iex.Binop.op) */
@@ -5400,17 +6071,6 @@
    if (e->tag == Iex_Triop) {
       IRTriop *triop = e->Iex.Triop.details;
       switch (triop->op) {
-      case Iop_BCDAdd:op = Pav_BCDAdd; goto do_AvBCDV128;
-      case Iop_BCDSub:op = Pav_BCDSub; goto do_AvBCDV128;
-      do_AvBCDV128: {
-         HReg arg1 = iselVecExpr(env, triop->arg1, IEndianess);
-         HReg arg2 = iselVecExpr(env, triop->arg2, IEndianess);
-         HReg dst  = newVRegV(env);
-         PPCRI* ps = iselWordExpr_RI(env, triop->arg3, IEndianess);
-         addInstr(env, PPCInstr_AvBCDV128Trinary(op, dst, arg1, arg2, ps));
-         return dst;
-      }
-
       case Iop_Add32Fx4: fpop = Pavfp_ADDF; goto do_32Fx4_with_rm;
       case Iop_Sub32Fx4: fpop = Pavfp_SUBF; goto do_32Fx4_with_rm;
       case Iop_Mul32Fx4: fpop = Pavfp_MULF; goto do_32Fx4_with_rm;
@@ -5579,8 +6239,30 @@
          addInstr(env, PPCInstr_Store( 4, am_addr,  rHi, mode64 ));
          addInstr(env, PPCInstr_Store( 4, am_addr4, rLo, mode64 ));
          return;
-     }
-     if (ty == Ity_V128) {
+      }
+      if (ty == Ity_I128) {
+         HReg rHi, rLo;
+         PPCAMode* am_addr  = PPCAMode_IR( stmt->Ist.Put.offset,
+                                           GuestStatePtr(mode64) );
+         PPCAMode* am_addr4 = advance4(env, am_addr);
+
+         iselInt128Expr(&rHi,&rLo, env, stmt->Ist.Put.data, IEndianess);
+         addInstr(env, PPCInstr_Store( 4, am_addr,  rHi, mode64 ));
+         addInstr(env, PPCInstr_Store( 4, am_addr4, rLo, mode64 ));
+         return;
+      }
+      if (ty == Ity_F128) {
+         /* Guest state vectors are 16byte aligned,
+            so don't need to worry here */
+         HReg v_src = iselFp128Expr(env, stmt->Ist.Put.data, IEndianess);
+
+         PPCAMode* am_addr  = PPCAMode_IR( stmt->Ist.Put.offset,
+                                           GuestStatePtr(mode64) );
+         addInstr(env,
+                  PPCInstr_AvLdSt(False/*store*/, 16, v_src, am_addr));
+         return;
+      }
+      if (ty == Ity_V128) {
          /* Guest state vectors are 16byte aligned,
             so don't need to worry here */
          HReg v_src = iselVecExpr(env, stmt->Ist.Put.data, IEndianess);
@@ -5714,6 +6396,12 @@
          addInstr(env, PPCInstr_Dfp64Unary(Pfp_MOV, fr_dst, fr_src));
          return;
       }
+      if (ty == Ity_F128) {
+         HReg v_dst = lookupIRTemp(env, tmp);
+         HReg v_src = iselFp128Expr(env, stmt->Ist.WrTmp.data, IEndianess);
+         addInstr(env, PPCInstr_AvUnary(Pav_MOV, v_dst, v_src));
+         return;
+      }
       if (ty == Ity_V128) {
          HReg v_dst = lookupIRTemp(env, tmp);
          HReg v_src = iselVecExpr(env, stmt->Ist.WrTmp.data, IEndianess);
@@ -6140,15 +6828,16 @@
    vassert(arch_host == VexArchPPC32 || arch_host == VexArchPPC64);
    mode64 = arch_host == VexArchPPC64;
 
-   /* do some sanity checks */
+   /* do some sanity checks,
+    * Note: no 32-bit support for ISA 3.0
+    */
    mask32 = VEX_HWCAPS_PPC32_F | VEX_HWCAPS_PPC32_V
             | VEX_HWCAPS_PPC32_FX | VEX_HWCAPS_PPC32_GX | VEX_HWCAPS_PPC32_VX
             | VEX_HWCAPS_PPC32_DFP | VEX_HWCAPS_PPC32_ISA2_07;
 
-
    mask64 = VEX_HWCAPS_PPC64_V | VEX_HWCAPS_PPC64_FX
             | VEX_HWCAPS_PPC64_GX | VEX_HWCAPS_PPC64_VX | VEX_HWCAPS_PPC64_DFP
-            | VEX_HWCAPS_PPC64_ISA2_07;
+            | VEX_HWCAPS_PPC64_ISA2_07 | VEX_HWCAPS_PPC64_ISA3_0;
 
    if (mode64) {
       vassert((hwcaps_host & mask32) == 0);
@@ -6241,6 +6930,7 @@
       case Ity_F64:
          hregLo = mkHReg(True, HRcFlt64, 0, j++);
          break;
+      case Ity_F128:
       case Ity_V128:
          hregLo = mkHReg(True, HRcVec128, 0, j++);
          break;
diff --git a/VEX/priv/ir_defs.c b/VEX/priv/ir_defs.c
index 4a7b770..4940ffe 100644
--- a/VEX/priv/ir_defs.c
+++ b/VEX/priv/ir_defs.c
@@ -281,6 +281,17 @@
       case Iop_SubF128:   vex_printf("SubF128");  return;
       case Iop_MulF128:   vex_printf("MulF128");  return;
       case Iop_DivF128:   vex_printf("DivF128");  return;
+
+      case Iop_TruncF128toI64S:   vex_printf("TruncF128toI64S");  return;
+      case Iop_TruncF128toI32S:   vex_printf("TruncF128toI32S");  return;
+      case Iop_TruncF128toI64U:   vex_printf("TruncF128toI64U");  return;
+      case Iop_TruncF128toI32U:   vex_printf("TruncF128toI32U");  return;
+
+      case Iop_MAddF128:   vex_printf("MAddF128");  return;
+      case Iop_MSubF128:   vex_printf("MSubF128");  return;
+      case Iop_NegMAddF128:   vex_printf("NegMAddF128");  return;
+      case Iop_NegMSubF128:   vex_printf("NegMSubF128");  return;
+
       case Iop_AbsF128:   vex_printf("AbsF128");  return;
       case Iop_NegF128:   vex_printf("NegF128");  return;
       case Iop_SqrtF128:  vex_printf("SqrtF128"); return;
@@ -301,6 +312,8 @@
       case Iop_F64toF128:  vex_printf("F64toF128");  return;
       case Iop_F128toF64:  vex_printf("F128toF64");  return;
       case Iop_F128toF32:  vex_printf("F128toF32");  return;
+      case Iop_F128toI128S: vex_printf("F128toI128S");  return;
+      case Iop_RndF128:    vex_printf("RndF128");  return;
 
         /* s390 specific */
       case Iop_MAddF32:    vex_printf("s390_MAddF32"); return;
@@ -411,6 +424,8 @@
 
       case Iop_F32toF16x4: vex_printf("F32toF16x4"); return;
       case Iop_F16toF32x4: vex_printf("F16toF32x4"); return;
+      case Iop_F16toF64x2: vex_printf("F16toF64x2"); return;
+      case Iop_F64toF16x2: vex_printf("F64toF16x2"); return;
 
       case Iop_RSqrtEst32Fx4: vex_printf("RSqrtEst32Fx4"); return;
       case Iop_RSqrtEst32Ux4: vex_printf("RSqrtEst32Ux4"); return;
@@ -865,6 +880,10 @@
       case Iop_Cls8x16: vex_printf("Cls8x16"); return;
       case Iop_Cls16x8: vex_printf("Cls16x8"); return;
       case Iop_Cls32x4: vex_printf("Cls32x4"); return;
+      case Iop_Ctz8x16: vex_printf("Ctz8x16"); return;
+      case Iop_Ctz16x8: vex_printf("Ctz16x8"); return;
+      case Iop_Ctz32x4: vex_printf("Ctz32x4"); return;
+      case Iop_Ctz64x2: vex_printf("Ctz64x2"); return;
 
       case Iop_ShlV128: vex_printf("ShlV128"); return;
       case Iop_ShrV128: vex_printf("ShrV128"); return;
@@ -1247,6 +1266,8 @@
       case Iop_SHA512:  vex_printf("SHA512"); return;
       case Iop_BCDAdd:  vex_printf("BCDAdd"); return;
       case Iop_BCDSub:  vex_printf("BCDSub"); return;
+      case Iop_I128StoBCD128:  vex_printf("bcdcfsq."); return;
+      case Iop_BCD128toI128S:  vex_printf("bcdctsq."); return;
 
       case Iop_PwBitMtxXpose64x2: vex_printf("BitMatrixTranspose64x2"); return;
 
@@ -1986,6 +2007,45 @@
    vec[8] = NULL;
    return vec;
 }
+IRExpr** mkIRExprVec_9 ( IRExpr* arg1, IRExpr* arg2, IRExpr* arg3,
+                         IRExpr* arg4, IRExpr* arg5, IRExpr* arg6,
+                         IRExpr* arg7, IRExpr* arg8, IRExpr* arg9 ) {
+   IRExpr** vec = LibVEX_Alloc_inline(10 * sizeof(IRExpr*));
+   vec[0] = arg1;
+   vec[1] = arg2;
+   vec[2] = arg3;
+   vec[3] = arg4;
+   vec[4] = arg5;
+   vec[5] = arg6;
+   vec[6] = arg7;
+   vec[7] = arg8;
+   vec[8] = arg9;
+   vec[9] = NULL;
+   return vec;
+}
+IRExpr** mkIRExprVec_13 ( IRExpr* arg1,  IRExpr* arg2,  IRExpr* arg3,
+                          IRExpr* arg4,  IRExpr* arg5,  IRExpr* arg6,
+                          IRExpr* arg7,  IRExpr* arg8,  IRExpr* arg9,
+                          IRExpr* arg10, IRExpr* arg11, IRExpr* arg12,
+                          IRExpr* arg13
+ ) {
+   IRExpr** vec = LibVEX_Alloc_inline(14 * sizeof(IRExpr*));
+   vec[0]  = arg1;
+   vec[1]  = arg2;
+   vec[2]  = arg3;
+   vec[3]  = arg4;
+   vec[4]  = arg5;
+   vec[5]  = arg6;
+   vec[6]  = arg7;
+   vec[7]  = arg8;
+   vec[8]  = arg9;
+   vec[9]  = arg10;
+   vec[10] = arg11;
+   vec[11] = arg12;
+   vec[12] = arg13;
+   vec[13] = NULL;
+   return vec;
+}
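+
+/* Editorial note, not part of the patch: like the existing constructors,
+ * these simply build a NULL-terminated IRExpr* array, so callers can walk
+ * it without knowing the arity, e.g. (illustrative only)
+ *
+ *    IRExpr** args = mkIRExprVec_9( a1, a2, a3, a4, a5, a6, a7, a8, a9 );
+ *    for (Int i = 0; args[i] != NULL; i++)
+ *       ppIRExpr( args[i] );
+ */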
 
 
 /* Constructors -- IRDirty */
@@ -3013,6 +3073,8 @@
       case Iop_Rsh32Sx4: case Iop_Rsh64Sx2:
       case Iop_Rsh8Ux16: case Iop_Rsh16Ux8:
       case Iop_Rsh32Ux4: case Iop_Rsh64Ux2:
+      case Iop_MulI128by10E:
+      case Iop_MulI128by10ECarry:
          BINARY(Ity_V128,Ity_V128, Ity_V128);
 
       case Iop_PolynomialMull8x8:
@@ -3046,6 +3108,13 @@
       case Iop_PwBitMtxXpose64x2:
       case Iop_ZeroHI64ofV128:  case Iop_ZeroHI96ofV128:
       case Iop_ZeroHI112ofV128: case Iop_ZeroHI120ofV128:
+      case Iop_F16toF64x2:
+      case Iop_F64toF16x2:
+      case Iop_MulI128by10:
+      case Iop_MulI128by10Carry:
+      case Iop_Ctz8x16: case Iop_Ctz16x8:
+      case Iop_Ctz32x4: case Iop_Ctz64x2:
+      case Iop_BCD128toI128S:
          UNARY(Ity_V128, Ity_V128);
 
       case Iop_ShlV128: case Iop_ShrV128:
@@ -3080,7 +3149,8 @@
       case Iop_QandQRSarNnarrow16Sto8Ux8:
       case Iop_QandQRSarNnarrow32Sto16Ux4:
       case Iop_QandQRSarNnarrow64Sto32Ux2:
-         BINARY(Ity_V128,Ity_I8, Ity_V128);
+      case Iop_I128StoBCD128:
+         BINARY(Ity_V128, Ity_I8, Ity_V128);
 
       case Iop_F32ToFixed32Ux4_RZ:
       case Iop_F32ToFixed32Sx4_RZ:
@@ -3122,7 +3192,8 @@
 
       case Iop_BCDAdd:
       case Iop_BCDSub:
-         TERNARY(Ity_V128,Ity_V128, Ity_I8, Ity_V128);
+         BINARY(Ity_V128, Ity_V128, Ity_V128);
+
       case Iop_QDMull16Sx4: case Iop_QDMull32Sx2:
          BINARY(Ity_I64, Ity_I64, Ity_V128);
 
@@ -3144,6 +3215,12 @@
       case Iop_DivF128:
          TERNARY(ity_RMode,Ity_F128,Ity_F128, Ity_F128);
 
+      case Iop_MAddF128:
+      case Iop_MSubF128:
+      case Iop_NegMAddF128:
+      case Iop_NegMSubF128:
+         QUATERNARY(ity_RMode,Ity_F128,Ity_F128,Ity_F128, Ity_F128);
+
       case Iop_Add64Fx2: case Iop_Sub64Fx2:
       case Iop_Mul64Fx2: case Iop_Div64Fx2: 
       case Iop_Add32Fx4: case Iop_Sub32Fx4:
@@ -3182,6 +3259,16 @@
       case Iop_F128toF32: BINARY(ity_RMode,Ity_F128, Ity_F32);
       case Iop_F128toF64: BINARY(ity_RMode,Ity_F128, Ity_F64);
 
+      case Iop_TruncF128toI32S:
+      case Iop_TruncF128toI64S:
+      case Iop_TruncF128toI32U:
+      case Iop_TruncF128toI64U:
+         UNARY(Ity_F128, Ity_F128);
+
+      case Iop_F128toI128S:
+      case Iop_RndF128:
+         BINARY(ity_RMode,Ity_F128, Ity_F128);
+
       case Iop_D32toD64:
          UNARY(Ity_D32, Ity_D64);
 
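The four new fused multiply-add operations added above (Iop_MAddF128, Iop_MSubF128,
Iop_NegMAddF128, Iop_NegMSubF128) are quaternary: a rounding mode followed by three
Ity_F128 operands, as their QUATERNARY(ity_RMode, Ity_F128, Ity_F128, Ity_F128,
Ity_F128) signature shows.  A minimal sketch of how a front end could build such an
expression with the standard IR constructors (the temporaries a, b and c are assumed
to already hold Ity_F128 values; this sketch is not part of the patch itself):

    /* Sketch only: fma = round(a * b + c) over 128-bit floats. */
    IRExpr* rm  = IRExpr_Const(IRConst_U32(Irrm_NEAREST));   /* rounding mode */
    IRExpr* fma = IRExpr_Qop(Iop_MAddF128, rm,
                             IRExpr_RdTmp(a), IRExpr_RdTmp(b), IRExpr_RdTmp(c));
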
diff --git a/VEX/priv/ir_inject.c b/VEX/priv/ir_inject.c
index 93b4c1c..94b0fdf 100644
--- a/VEX/priv/ir_inject.c
+++ b/VEX/priv/ir_inject.c
@@ -36,6 +36,7 @@
 
 /* Convenience macros for readability */
 #define mkU8(v)   IRExpr_Const(IRConst_U8(v))
+#define mkU16(v)  IRExpr_Const(IRConst_U16(v))
 #define mkU32(v)  IRExpr_Const(IRConst_U32(v))
 #define mkU64(v)  IRExpr_Const(IRConst_U64(v))
 #define unop(kind, a)  IRExpr_Unop(kind, a)
@@ -209,12 +210,26 @@
 
    case 2:
       opnd1 = load(endian, iricb.t_opnd1, iricb.opnd1);
+      /* HACK: silence a compiler warning that ‘opnd2’ may be used uninitialized */
+      opnd2 = opnd1;
 
-      if (iricb.shift_amount_is_immediate) {
-         // This implies that the IROp is a shift op
-         vassert(iricb.t_opnd2 == Ity_I8);
+      /* immediate_index = 0  immediate value is not used.
+       * immediate_index = 2  opnd2 is an immediate value.
+       */
+      vassert(iricb.immediate_index == 0 || iricb.immediate_index == 2);
+
+      if (iricb.immediate_index == 2) {
+         vassert((iricb.t_opnd2 == Ity_I8) || (iricb.t_opnd2 == Ity_I16)
+                 || (iricb.t_opnd2 == Ity_I32));
+
          /* Interpret the memory as an ULong. */
-         opnd2 = mkU8(*((ULong *)iricb.opnd2));
+         if (iricb.immediate_type == Ity_I8) {
+            opnd2 = mkU8(*((ULong *)iricb.opnd2));
+         } else if (iricb.immediate_type == Ity_I16) {
+            opnd2 = mkU16(*((ULong *)iricb.opnd2));
+         } else if (iricb.immediate_type == Ity_I32) {
+            opnd2 = mkU32(*((ULong *)iricb.opnd2));
+         }
       } else {
          opnd2 = load(endian, iricb.t_opnd2, iricb.opnd2);
       }
@@ -228,7 +243,28 @@
    case 3:
       opnd1 = load(endian, iricb.t_opnd1, iricb.opnd1);
       opnd2 = load(endian, iricb.t_opnd2, iricb.opnd2);
-      opnd3 = load(endian, iricb.t_opnd3, iricb.opnd3);
+      /* HACK: silence a compiler warning that ‘opnd3’ may be used uninitialized */
+      opnd3 = opnd2;
+
+      /* immediate_index = 0  immediate value is not used.
+       * immediate_index = 3  opnd3 is an immediate value.
+       */
+      vassert(iricb.immediate_index == 0 || iricb.immediate_index == 3);
+
+      if (iricb.immediate_index == 3) {
+         vassert((iricb.t_opnd3 == Ity_I8) || (iricb.t_opnd3 == Ity_I16)
+                 || (iricb.t_opnd3 == Ity_I32));
+
+         if (iricb.immediate_type == Ity_I8) {
+            opnd3 = mkU8(*((ULong *)iricb.opnd3));
+         } else if (iricb.immediate_type == Ity_I16) {
+            opnd3 = mkU16(*((ULong *)iricb.opnd3));
+         } else if (iricb.immediate_type == Ity_I32) {
+            opnd3 = mkU32(*((ULong *)iricb.opnd3));
+         }
+      } else {
+         opnd3 = load(endian, iricb.t_opnd3, iricb.opnd3);
+      }
       if (rounding_mode)
          data = qop(iricb.op, rounding_mode, opnd1, opnd2, opnd3);
       else
@@ -240,7 +276,28 @@
       opnd1 = load(endian, iricb.t_opnd1, iricb.opnd1);
       opnd2 = load(endian, iricb.t_opnd2, iricb.opnd2);
       opnd3 = load(endian, iricb.t_opnd3, iricb.opnd3);
-      opnd4 = load(endian, iricb.t_opnd4, iricb.opnd4);
+      /* HACK: silence a compiler warning that ‘opnd4’ may be used uninitialized */
+      opnd4 = opnd3;
+
+      /* immediate_index = 0  immediate value is not used.
+       * immediate_index = 4  opnd4 is an immediate value.
+       */
+      vassert(iricb.immediate_index == 0 || iricb.immediate_index == 4);
+
+      if (iricb.immediate_index == 4) {
+         vassert((iricb.t_opnd4 == Ity_I8) || (iricb.t_opnd4 == Ity_I16)
+                 || (iricb.t_opnd4 == Ity_I32));
+
+         if (iricb.immediate_type == Ity_I8) {
+            opnd4 = mkU8(*((ULong *)iricb.opnd4));
+         } else if (iricb.immediate_type == Ity_I16) {
+            opnd4 = mkU16(*((ULong *)iricb.opnd4));
+         } else if (iricb.immediate_type == Ity_I32) {
+            opnd4 = mkU32(*((ULong *)iricb.opnd4));
+         }
+      } else {
+         opnd4 = load(endian, iricb.t_opnd4, iricb.opnd4);
+      }
       data = qop(iricb.op, opnd1, opnd2, opnd3, opnd4);
       break;
 
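The new handling of cases 2, 3 and 4 above follows one pattern: when
iricb.immediate_index selects an operand, that operand is not loaded via load() but
is materialised as an IR constant whose width is given by iricb.immediate_type, with
the value read out of a ULong-sized buffer.  A hedged sketch of that common step,
factored into a hypothetical helper that is not part of the actual patch:

    /* Illustrative only: turn an immediate held in a ULong buffer into
       an IR constant of the requested width. */
    static IRExpr* mkImmediate ( IRType ty, const ULong* buf )
    {
       switch (ty) {
          case Ity_I8:  return IRExpr_Const(IRConst_U8 ( (UChar) *buf ));
          case Ity_I16: return IRExpr_Const(IRConst_U16( (UShort)*buf ));
          case Ity_I32: return IRExpr_Const(IRConst_U32( (UInt)  *buf ));
          default:      vpanic("mkImmediate: unsupported immediate type");
       }
    }
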
diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c
index 7a7246a..4266823 100644
--- a/VEX/priv/ir_opt.c
+++ b/VEX/priv/ir_opt.c
@@ -1223,6 +1223,7 @@
       case Ico_U16:   return toBool( e->Iex.Const.con->Ico.U16 == 0);
       case Ico_U32:   return toBool( e->Iex.Const.con->Ico.U32 == 0);
       case Ico_U64:   return toBool( e->Iex.Const.con->Ico.U64 == 0);
+      case Ico_V256:  return toBool( e->Iex.Const.con->Ico.V256 == 0x00000000);
       default: vpanic("isZeroU");
    }
 }
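For reference, a V256 constant (Ico_V256) is stored as a UInt bitmask with one bit
per byte of the 32-byte vector, so comparing the mask against 0x00000000 is exactly
the all-zeroes test the new isZeroU case performs.  A small sketch, assuming the
standard IRConst_V256 constructor (not part of the patch):

    /* Sketch only: the 256-bit all-zeroes vector as an IR constant. */
    IRExpr* zeroV256 = IRExpr_Const(IRConst_V256(0x00000000));
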
diff --git a/VEX/priv/main_main.c b/VEX/priv/main_main.c
index 2e51ded..e263754 100644
--- a/VEX/priv/main_main.c
+++ b/VEX/priv/main_main.c
@@ -916,8 +916,13 @@
    irsb = do_iropt_BB ( irsb, specHelper, preciseMemExnsFn, pxControl,
                               vta->guest_bytes_addr,
                               vta->arch_guest );
-   sanityCheckIRSB( irsb, "after initial iropt", 
-                    True/*must be flat*/, guest_word_type );
+
+   // JRS 2016 Aug 03: Sanity checking is expensive, we already checked
+   // the output of the front end, and iropt never screws up the IR by
+   // itself, unless it is being hacked on.  So remove this post-iropt
+   // check in "production" use.
+   // sanityCheckIRSB( irsb, "after initial iropt", 
+   //                  True/*must be flat*/, guest_word_type );
 
    if (vex_traceflags & VEX_TRACE_OPT1) {
       vex_printf("\n------------------------" 
@@ -953,9 +958,12 @@
       vex_printf("\n");
    }
 
-   if (vta->instrument1 || vta->instrument2)
-      sanityCheckIRSB( irsb, "after instrumentation",
-                       True/*must be flat*/, guest_word_type );
+   // JRS 2016 Aug 03: as above, this never actually fails in practice.
+   // And we'll sanity check anyway after the post-instrumentation
+   // cleanup pass.  So skip this check in "production" use.
+   // if (vta->instrument1 || vta->instrument2)
+   //    sanityCheckIRSB( irsb, "after instrumentation",
+   //                     True/*must be flat*/, guest_word_type );
 
    /* Do a post-instrumentation cleanup pass. */
    if (vta->instrument1 || vta->instrument2) {
@@ -1547,6 +1555,7 @@
       { VEX_HWCAPS_PPC32_VX,      "VX"      },
       { VEX_HWCAPS_PPC32_DFP,     "DFP"     },
       { VEX_HWCAPS_PPC32_ISA2_07, "ISA2_07" },
+      { VEX_HWCAPS_PPC32_ISA3_0,  "ISA3_0"  },
    };
    /* Allocate a large enough buffer */
    static HChar buf[sizeof prefix + 
@@ -1577,6 +1586,7 @@
       { VEX_HWCAPS_PPC64_V,       "vmx"     },
       { VEX_HWCAPS_PPC64_DFP,     "DFP"     },
       { VEX_HWCAPS_PPC64_ISA2_07, "ISA2_07" },
+      { VEX_HWCAPS_PPC64_ISA3_0,  "ISA3_0"  },
    };
    /* Allocate a large enough buffer */
    static HChar buf[sizeof prefix + 
@@ -1706,12 +1716,44 @@
       return "Cavium-baseline";
    }
 
+   /* Ingenic baseline. */
+   if (VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_INGENIC_E1) {
+      return "Ingenic-baseline";
+   }
+
+   /* Loongson baseline. */
+   if ((VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_LEGACY) &&
+       (VEX_MIPS_PROC_ID(hwcaps) == VEX_PRID_IMP_LOONGSON_64)) {
+      return "Loongson-baseline";
+   }
+
    return "Unsupported baseline";
 }
 
 static const HChar* show_hwcaps_mips64 ( UInt hwcaps )
 {
-   return "mips64-baseline";
+   /* Netlogic baseline. */
+   if (VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_NETLOGIC) {
+      return "Netlogic-baseline";
+   }
+
+   /* Cavium baseline. */
+   if (VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_CAVIUM) {
+      return "Cavium-baseline";
+   }
+
+   /* Loongson baseline. */
+   if ((VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_LEGACY) &&
+       (VEX_MIPS_PROC_ID(hwcaps) == VEX_PRID_IMP_LOONGSON_64)) {
+      return "Loongson-baseline";
+   }
+
+   /* MIPS64 baseline. */
+   if (VEX_MIPS_COMP_ID(hwcaps) == VEX_PRID_COMP_MIPS) {
+      return "mips64-baseline";
+   }
+
+   return "Unsupported baseline";
 }
 
 static const HChar* show_hwcaps_tilegx ( UInt hwcaps )
@@ -1835,6 +1877,12 @@
                invalid_hwcaps(arch, hwcaps,
                               "ISA2_07 requires DFP capabilities\n");
          }
+
+         /* ISA 3.0 not supported on 32-bit machines */
+         if ((hwcaps & VEX_HWCAPS_PPC32_ISA3_0) != 0) {
+            invalid_hwcaps(arch, hwcaps,
+                           "ISA 3.0 not supported in 32-bit mode \n");
+         }
          return;
       }
 
@@ -1871,13 +1919,30 @@
                invalid_hwcaps(arch, hwcaps,
                               "ISA2_07 requires DFP capabilities\n");
          }
+
+         /* ISA3_0 requires everything else */
+         if ((hwcaps & VEX_HWCAPS_PPC64_ISA3_0) != 0) {
+            if ( !((hwcaps
+                    & VEX_HWCAPS_PPC64_ISA2_07) == VEX_HWCAPS_PPC64_ISA2_07))
+               invalid_hwcaps(arch, hwcaps,
+                          "ISA3_0 requires ISA2_07 capabilities\n");
+            if ( !has_v_fx_gx)
+               invalid_hwcaps(arch, hwcaps,
+                        "ISA3_0 requires VMX and FX and GX capabilities\n");
+            if ( !(hwcaps & VEX_HWCAPS_PPC64_VX))
+               invalid_hwcaps(arch, hwcaps,
+                              "ISA3_0 requires VX capabilities\n");
+            if ( !(hwcaps & VEX_HWCAPS_PPC64_DFP))
+               invalid_hwcaps(arch, hwcaps,
+                              "ISA3_0 requires DFP capabilities\n");
+         }
          return;
       }
 
       case VexArchARM: {
          Bool NEON  = ((hwcaps & VEX_HWCAPS_ARM_NEON) != 0);
+         Bool VFP3  = ((hwcaps & VEX_HWCAPS_ARM_VFP3) != 0);
          UInt level = VEX_ARM_ARCHLEVEL(hwcaps);
-
          switch (level) {
             case 5:
                if (NEON)
@@ -1891,6 +1956,11 @@
                return;
             case 7:
                return;
+            case 8:
+               if (!NEON || !VFP3)
+                  invalid_hwcaps(arch, hwcaps,
+                          "NEON and VFP3 are required for ARMv8.\n");
+               return;
             default:
                invalid_hwcaps(arch, hwcaps,
                               "ARM architecture level is not supported.\n");
@@ -1908,19 +1978,28 @@
             invalid_hwcaps(arch, hwcaps,
                            "Host does not have long displacement facility.\n");
          return;
-        
+
       case VexArchMIPS32:
          switch (VEX_MIPS_COMP_ID(hwcaps)) {
             case VEX_PRID_COMP_MIPS:
+            case VEX_PRID_COMP_CAVIUM:
+            case VEX_PRID_COMP_INGENIC_E1:
             case VEX_PRID_COMP_BROADCOM:
             case VEX_PRID_COMP_NETLOGIC:
                return;
             default:
                invalid_hwcaps(arch, hwcaps, "Unsupported baseline\n");
          }
-      
+
       case VexArchMIPS64:
-         return;
+         switch (VEX_MIPS_COMP_ID(hwcaps)) {
+            case VEX_PRID_COMP_MIPS:
+            case VEX_PRID_COMP_CAVIUM:
+            case VEX_PRID_COMP_NETLOGIC:
+               return;
+            default:
+               invalid_hwcaps(arch, hwcaps, "Unsupported baseline\n");
+         }
 
       case VexArchTILEGX:
          return;