Before LL, flush outstanding helper calls.

Callgrind, Cachegrind, and Lackey batch their helper
calls for memory accesses to reduce register save/restore
overhead (and to merge a load and a store within the same
instruction into a single "modify" event).

These calls should not be made within an RMW section
enclosed by LL/SC instructions, as this reduces the
chance that the SC succeeds and can result in hangs.
For Callgrind, this definitely helped on MIPS and was
committed in r13136. Do the same for Cachegrind/Lackey.
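
The effect of flushing before LL can be illustrated with a
toy model. This is only a sketch, not Valgrind code: the
names StmtKind, N_EVENTS, add_event, flush_events and
instrument are hypothetical, and each statement is assumed
to queue exactly one event. It counts how many helper calls
would land between an LL and its matching SC.

```c
#include <assert.h>

/* Hypothetical statement kinds for a toy model; not Valgrind's IR. */
typedef enum { ST_LOAD, ST_STORE, ST_LL, ST_SC } StmtKind;

#define N_EVENTS 4              /* assumed capacity of the event list */

static int n_pending    = 0;    /* events buffered, not yet emitted   */
static int in_rmw       = 0;    /* 1 while between an LL and its SC   */
static int calls_in_rmw = 0;    /* helper calls emitted inside LL..SC */

/* Emit one helper call covering all buffered events. */
static void flush_events(void)
{
   if (n_pending == 0)
      return;
   if (in_rmw)
      calls_in_rmw++;
   n_pending = 0;
}

/* Queue one event, flushing first if the list is full. */
static void add_event(void)
{
   if (n_pending == N_EVENTS)
      flush_events();
   n_pending++;
   assert(n_pending <= N_EVENTS);
}

/* Walk a block and return the number of helper calls that
   would be emitted inside LL/SC sections. */
int instrument(const StmtKind *stmts, int n, int flush_before_ll)
{
   n_pending = 0;
   in_rmw = 0;
   calls_in_rmw = 0;
   for (int i = 0; i < n; i++) {
      add_event();               /* event for this statement */
      if (stmts[i] == ST_LL) {
         if (flush_before_ll)
            flush_events();      /* helper call lands before the LL */
         in_rmw = 1;
      } else if (stmts[i] == ST_SC) {
         in_rmw = 0;
      }
   }
   flush_events();               /* end of block */
   return calls_in_rmw;
}
```

For the sequence LOAD, LOAD, LOAD, LL, SC the model reports
one helper call inside the RMW section without the flush
(the list fills up just before the SC) and none with it.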

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13143 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/lackey/lk_main.c b/lackey/lk_main.c
index 42a0552..735bc3a 100644
--- a/lackey/lk_main.c
+++ b/lackey/lk_main.c
@@ -428,7 +428,8 @@
    At various points the list will need to be flushed, that is, IR
    generated from it.  That must happen before any possible exit from
    the block (the end, or an IRStmt_Exit).  Flushing also takes place
-   when there is no space to add a new event.
+   when there is no space to add a new event, and before entering a
+   RMW (read-modify-write) section on processors supporting LL/SC.
 
    If we require the simulation statistics to be up to date with
    respect to possible memory exceptions, then the list would have to
@@ -825,9 +826,12 @@
             if (st->Ist.LLSC.storedata == NULL) {
                /* LL */
                dataTy = typeOfIRTemp(tyenv, st->Ist.LLSC.result);
-               if (clo_trace_mem)
+               if (clo_trace_mem) {
                   addEvent_Dr( sbOut, st->Ist.LLSC.addr,
                                       sizeofIRType(dataTy) );
+                  /* flush events before LL, helps SC to succeed */
+                  flushEvents(sbOut);
+               }
                if (clo_detailed_counts)
                   instrument_detail( sbOut, OpLoad, dataTy );
             } else {