-----------------------------------------------------------------------------
overview
-----------------------------------------------------------------------------
Previously Valgrind had its own versions of malloc() et al that replaced
glibc's.  Memcheck needs this for various reasons, but some other skins don't,
and for them it was actually detrimental.  I never managed to treat this
satisfactorily w.r.t. the core/skin split.

Now I have.  If a skin needs to know about malloc() et al, it must provide its
own replacements.  Because this is not uncommon, the core provides a module,
vg_replace_malloc.c, which a skin can link with and which provides skeleton
definitions to reduce the amount of work a skin must do.  The skeletons handle
the transfer of control from the simulated CPU to the real CPU, and also the
--alignment, --sloppy-malloc and --trace-malloc options.  The skeletons then
call functions SK_(malloc), SK_(free), etc, which the skin must define;  in
these functions the skin can do whatever it needs to track heap blocks.
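
A minimal sketch of that layering, not the real vg_replace_malloc.c.  All names
here (replacement_malloc, skin_malloc, the opt_* variables) are hypothetical
stand-ins, the switch from the simulated CPU to the real CPU is left out, and
it assumes --sloppy-malloc means "round the request up to a multiple of 4":

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for --sloppy-malloc and --trace-malloc. */
static int opt_sloppy_malloc = 0;
static int opt_trace_malloc  = 0;

/* Stand-in for SK_(malloc): the skin does its own bookkeeping, then
   allocates.  Here the bookkeeping is just a running byte count. */
static size_t skin_bytes_tracked = 0;

static void *skin_malloc(size_t n)
{
    skin_bytes_tracked += n;
    return malloc(n);
}

/* Stand-in for the skeleton: deal with the common options, then hand
   control to the skin's function. */
static void *replacement_malloc(size_t n)
{
    if (opt_sloppy_malloc)
        n = (n + 3) & ~(size_t)3;   /* assumed: round up to multiple of 4 */
    if (opt_trace_malloc)
        fprintf(stderr, "malloc(%lu)\n", (unsigned long)n);
    return skin_malloc(n);
}
```

The point of the split is that option handling lives in one place, and a skin
only writes the part that differs from skin to skin.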

For skins that track extra info about malloc'd blocks -- previously done with
ShadowChunks -- there is a new file vg_hashtable.c that implements a
generic-ish hash table (using dodgy C-style inheritance using struct overlays)
which allows skins to continue doing this fairly easily.
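
A rough illustration of the idea, not the vg_hashtable.c code.  The names
(HashNode, HeapBlock, ht_add, ht_get, N_BUCKETS) are made up for the example,
and it embeds the generic node as the first member of the skin's struct, which
is a better-defined cousin of overlaying two structs with matching layouts:

```c
#include <stddef.h>

#define N_BUCKETS 64   /* hypothetical table size */

/* Generic node: all the table itself ever looks at. */
typedef struct HashNode {
    struct HashNode *next;
    unsigned long    key;
} HashNode;

typedef struct {
    HashNode *buckets[N_BUCKETS];
} HashTable;

/* Skin-specific node.  The generic part comes first, so a HeapBlock*
   can be treated as a HashNode* and back again. */
typedef struct {
    HashNode node;   /* node.key is the block's address */
    size_t   size;   /* extra per-block info the skin wants to keep */
} HeapBlock;

static void ht_add(HashTable *t, HashNode *n)
{
    size_t b = n->key % N_BUCKETS;
    n->next = t->buckets[b];
    t->buckets[b] = n;
}

/* Returns the node with the given key, or NULL.  On success *link is the
   pointer that refers to the node, so the caller can unlink it with
   *link = node->next without a second lookup. */
static HashNode *ht_get(HashTable *t, unsigned long key, HashNode ***link)
{
    HashNode **pp = &t->buckets[key % N_BUCKETS];
    for (; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->key == key) {
            *link = pp;
            return *pp;
        }
    }
    *link = NULL;
    return NULL;
}
```

A skin would add &block->node with ht_add() and cast the result of ht_get()
back to HeapBlock* to reach its own fields.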

Skins can replace other functions too, eg. Memcheck has its own versions of
strcpy(), memcpy(), etc.

Overall, it's slightly more work now for skins that need to replace malloc(),
but other skins don't have to use Valgrind's malloc(), so they're getting a
"purer" program run, which is good, and most of the remaining rough edges from
the core/skin split have been removed.

-----------------------------------------------------------------------------
details
-----------------------------------------------------------------------------
Moved malloc() et al intercepts from vg_clientfuncs.c into vg_replace_malloc.c.
Skins can link to it if they want to replace malloc() and friends;  it handles
the switch to the real CPU and the options listed below, then passes control to
SK_(malloc)() et al which the skin must define.  These can call
VG_(cli_malloc)() and VG_(cli_free)() to do the actual allocation/deallocation.
Redzone size for the client (the CLIENT arena) is specified by the variable
VG_(vg_malloc_redzone_szB), which the skin defines.
vg_replace_malloc.c thus represents a kind of "mantle" level service.

To get automake to build vg_replace_malloc.o, had to resort to a similar trick
as used for the demangler -- ask for a "no install" library (which is never
used) to be built from it.

Note that malloc, calloc, realloc, builtin_new, builtin_vec_new and memalign
are all now aware of --alignment, whether running on the simulated CPU or the
real CPU.

This means the new_mem_heap, die_mem_heap, copy_mem_heap and ban_mem_heap
events no longer exist, since the core doesn't control malloc() any more, and
skins can watch for these events themselves.

This required moving all the ShadowChunk stuff out of the core, which meant
the sizeof_shadow_block ``need'' could be removed, yay -- it was a horrible
hack.  Now ShadowChunks are done with a generic HashTable type, in
vg_hashtable.c, which skins can "inherit from" (in a dodgy C-only fashion by
using structs with similar layouts).  Also, the free_list stuff was all moved
as a part of this.  Also, VgAllocKind was moved out of core into
Memcheck/Addrcheck and renamed MAC_AllocKind.

Moved these options out of core into vg_replace_malloc.c:
    --trace-malloc
    --sloppy-malloc
    --alignment

The alternative_free ``need'' could go, too, since Memcheck is now in complete
control of free(), yay -- another horribility.

The bad_free and free_mismatch events could go too, since they're now not
detected by core, yay -- yet another horribility.

Moved malloc() et al wrappers for Memcheck out of vg_clientmalloc.c into
mac_malloc_wrappers.c.  Helgrind has its own wrappers now too.

Introduced VG_USERREQ__CLIENT_CALL[123] client requests.  When skin code is
running on the simulated CPU, these call a given function and run it on the
real CPU.  The macros VG_NON_SIMD_CALL[123] in valgrind.h present a cleaner
interface to actually use.  Also introduced analogues of these that pass 'tst'
from the scheduler as the first arg to the called function -- needed for
MC_(client_malloc)() et al.

Fiddled with USERREQ_{MALLOC,FREE} etc. in vg_scheduler.c; they call
SK_({malloc,free})() which by default call VG_(cli_malloc)() -- can't call
glibc's malloc() here.  All the other default SK_(calloc)() etc. instantly
panic; there's a lock variable to ensure that the default SK_({malloc,free})()
are only called from the scheduler, which prevents a skin from forgetting to
override SK_({malloc,free})().  Got rid of the unused USERREQ_CALLOC,
USERREQ_BUILTIN_NEW, etc.

Moved special versions of strcpy(), strlen(), etc, and of memcpy() and memchr()
into mc_replace_strmem.c.  They are only necessary for Memcheck:  the
hyper-optimised normal glibc versions confuse it, and memcpy() etc. need
overlap checking.

Also added dst/src overlap checks to strcpy(), memcpy(), strcat().  They are
reported not as proper errors, but just with single line warnings, as for silly
args to malloc() et al;  this is mainly because they're on the simulated CPU
and proper error handling would be a pain;  hopefully they're rare enough to
not be a problem.  The strcpy check is done after the copy, because doing it
before would require counting the length of the string first.  Also added
strncpy() and strncat(), which have overlap checks too.  Note that Addrcheck
doesn't do overlap checking.
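
For illustration only, a sketch of what such a check can look like.  The
helper names is_overlap() and checked_strcpy() are made up for this example
and need not match what the replacement functions in the patch do:

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [dst, dst+dstlen) and [src, src+srclen) share any byte. */
static int is_overlap(const void *dst, const void *src,
                      size_t dstlen, size_t srclen)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (dstlen == 0 || srclen == 0)
        return 0;
    if (d < s)
        return d + dstlen > s;
    if (s < d)
        return s + srclen > d;
    return 1;   /* same start address, both non-empty */
}

/* strcpy() with the check done after the copy:  the length is only known
   once the terminating NUL has been copied.  If the ranges really do
   overlap the copy itself may already have gone wrong, which is one
   reason a warning is all that can sensibly be issued. */
static char *checked_strcpy(char *dst, const char *src, int *overlapped)
{
    size_t i = 0;

    while ((dst[i] = src[i]) != '\0')
        i++;
    *overlapped = is_overlap(dst, src, i + 1, i + 1);
    return dst;
}
```

For memcpy() the length is an argument, so the same is_overlap() test could
run before any bytes are moved.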

Put USERREQ__LOGMESSAGE in vg_skin.h to do the overlap check error messages.

After moving malloc() et al and strcpy() et al out of vg_clientfuncs.c, moved
the remaining three things (sigsuspend, VG_(__libc_freeres_wrapper),
__errno_location) into vg_intercept.c, since it also contains things that run
on the simulated CPU.  Removed vg_clientfuncs.c altogether.

Moved regression test "malloc3" out of corecheck into memcheck, since corecheck
no longer looks for silly (eg. negative) args to malloc().

Removed the m_eip, m_esp, m_ebp fields from the `Error' type.  They were set
up and then read just once, immediately afterwards, and only if GDB attachment
was done.  So now they're just held in local variables.  This saves 12 bytes
per Error.

Made replacement calloc() check for --sloppy-malloc;  previously it didn't.

Added "silly" negative size arg check to realloc();  it didn't have one.

Changed VG_(read_selfprocmaps)() so it can parse the file directly, or from a
previously read buffer.  The buffer can be filled with the new
VG_(read_selfprocmaps_contents)().  This is used at start-up to snapshot
/proc/self/maps before the skins do anything, and then to parse it once they
have done their setup stuff.  Skins can now safely call VG_(malloc)() in
SK_({pre,post}_clo_init)() without the mmap'd superblock erroneously being
identified as client memory.
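
The snapshot-then-parse split can be sketched as below.  This is not the
VG_(read_selfprocmaps) code;  the function names, the callback type and the
choice to parse only the address range and permissions are assumptions made
for the example:

```c
#include <stdio.h>
#include <string.h>

typedef void (*MapCallback)(unsigned long start, unsigned long end,
                            const char *perms);

/* Step 1: copy the whole file into buf, NUL-terminated, before anything
   else has a chance to change the mappings.  Returns bytes read or -1. */
static long snapshot_maps(const char *path, char *buf, size_t cap)
{
    FILE  *f;
    size_t got;

    if (cap == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    got = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[got] = '\0';
    return (long)got;
}

/* Step 2: later, parse the saved text line by line.  Returns the number
   of lines that looked like "start-end perms ...". */
static int parse_maps_buffer(const char *buf, MapCallback cb)
{
    int n = 0;

    while (*buf != '\0') {
        unsigned long start, end;
        char          perms[5];
        const char   *nl;

        if (sscanf(buf, "%lx-%lx %4s", &start, &end, perms) == 3) {
            if (cb != NULL)
                cb(start, end, perms);
            n++;
        }
        nl = strchr(buf, '\n');
        if (nl == NULL)
            break;
        buf = nl + 1;
    }
    return n;
}

/* Example callback: remembers the last range it was told about. */
static unsigned long last_start, last_end;

static void remember_last(unsigned long start, unsigned long end,
                          const char *perms)
{
    (void)perms;
    last_start = start;
    last_end   = end;
}
```

Anything mapped between the two steps is absent from the saved text, so it
cannot be mistaken for memory that was there at start-up.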

Changed the --help usage message slightly, now divided into four sections: core
normal, skin normal, core debugging, skin debugging.  Changed the interface for
the command_line_options need slightly -- now two functions, VG_(print_usage)()
and VG_(print_debug_usage)(), and they do the printing themselves, instead of
just returning a string -- that's more flexible.

Removed DEBUG_CLIENTMALLOC code;  it wasn't being used and was a pain.

Added a regression test testing leak suppressions (nanoleak_supp), and another
testing strcpy/memcpy/etc overlap warnings (overlap).

Also changed Addrcheck to link with the files shared with Memcheck, rather than
#including the .c files directly.

Commoned up a little more shared Addrcheck/Memcheck code, for the usage
message, and initialisation/finalisation.

Added a Bool param to VG_(unique_error)() dictating whether it should allow
GDB to be attached; for leak checks, because we don't want to attach GDB on
leak errors (causes seg faults).  A bit hacky, but it will do.

Had to change lots of the regression tests' expected outputs now that
malloc() et al are in vg_replace_malloc.c rather than vg_clientfuncs.c.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1524 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/memcheck/Makefile.am b/memcheck/Makefile.am
index 9c32164..f8b323d 100644
--- a/memcheck/Makefile.am
+++ b/memcheck/Makefile.am
@@ -15,14 +15,17 @@
 
 vgskin_memcheck_so_SOURCES = \
 	mac_leakcheck.c \
+	mac_malloc_wrappers.c \
 	mac_needs.c \
 	mc_main.c \
 	mc_clientreqs.c \
 	mc_errcontext.c \
 	mc_from_ucode.c \
+	mc_replace_strmem.c \
 	mc_translate.c \
 	mc_helpers.S
 vgskin_memcheck_so_LDFLAGS = -shared
+vgskin_memcheck_so_LDADD = ../coregrind/vg_replace_malloc.o
 
 mcincludedir = $(includedir)/valgrind
 
@@ -34,3 +37,5 @@
 	mc_constants.h	\
 	mc_include.h
 
+mc_replace_strmem.o: CFLAGS += -fno-omit-frame-pointer
+
diff --git a/memcheck/mac_leakcheck.c b/memcheck/mac_leakcheck.c
index 83d3ada..9894b86 100644
--- a/memcheck/mac_leakcheck.c
+++ b/memcheck/mac_leakcheck.c
@@ -224,9 +224,9 @@
 #ifdef VG_DEBUG_LEAKCHECK
 /* Used to sanity-check the fast binary-search mechanism. */
 static 
-Int find_shadow_for_OLD ( Addr          ptr, 
-                          ShadowChunk** shadows,
-                          Int           n_shadows )
+Int find_shadow_for_OLD ( Addr        ptr, 
+                          MAC_Chunk** shadows,
+                          Int         n_shadows )
 
 {
    Int  i;
@@ -245,9 +245,9 @@
 
 
 static 
-Int find_shadow_for ( Addr          ptr, 
-                      ShadowChunk** shadows,
-                      Int           n_shadows )
+Int find_shadow_for ( Addr        ptr, 
+                      MAC_Chunk** shadows,
+                      Int         n_shadows )
 {
    Addr a_mid_lo, a_mid_hi;
    Int lo, mid, hi, retVal;
@@ -256,14 +256,12 @@
    lo = 0;
    hi = n_shadows-1;
    while (True) {
-      /* invariant: current unsearched space is from lo to hi,
-         inclusive. */
+      /* invariant: current unsearched space is from lo to hi, inclusive. */
       if (lo > hi) break; /* not found */
 
       mid      = (lo + hi) / 2;
-      a_mid_lo = VG_(get_sc_data)(shadows[mid]);
-      a_mid_hi = VG_(get_sc_data)(shadows[mid]) + 
-                 VG_(get_sc_size)(shadows[mid]) - 1;
+      a_mid_lo = shadows[mid]->data;
+      a_mid_hi = shadows[mid]->data + shadows[mid]->size - 1;
 
       if (ptr < a_mid_lo) {
          hi = mid-1;
@@ -286,11 +284,11 @@
 }
 
 /* Globals, for the following callback used by VG_(detect_memory_leaks). */
-static ShadowChunk**  vglc_shadows;
-static Int            vglc_n_shadows;
-static Reachedness*   vglc_reachedness;
-static Addr           vglc_min_mallocd_addr;
-static Addr           vglc_max_mallocd_addr;
+static MAC_Chunk**  lc_shadows;
+static Int          lc_n_shadows;
+static Reachedness* lc_reachedness;
+static Addr         lc_min_mallocd_addr;
+static Addr         lc_max_mallocd_addr;
 
 static 
 void vg_detect_memory_leaks_notify_addr ( Addr a, UInt word_at_a )
@@ -313,29 +311,28 @@
       where the .bss segment has been put.  If you can, drop me a
       line.  
    */
-   if (VG_(within_stack)(a))                return;
-   if (VG_(within_m_state_static)(a))       return;
-   if (a == (Addr)(&vglc_min_mallocd_addr)) return;
-   if (a == (Addr)(&vglc_max_mallocd_addr)) return;
+   if (VG_(within_stack)(a))              return;
+   if (VG_(within_m_state_static)(a))     return;
+   if (a == (Addr)(&lc_min_mallocd_addr)) return;
+   if (a == (Addr)(&lc_max_mallocd_addr)) return;
 
    /* OK, let's get on and do something Useful for a change. */
 
    ptr = (Addr)word_at_a;
-   if (ptr >= vglc_min_mallocd_addr && ptr <= vglc_max_mallocd_addr) {
+   if (ptr >= lc_min_mallocd_addr && ptr <= lc_max_mallocd_addr) {
       /* Might be legitimate; we'll have to investigate further. */
-      sh_no = find_shadow_for ( ptr, vglc_shadows, vglc_n_shadows );
+      sh_no = find_shadow_for ( ptr, lc_shadows, lc_n_shadows );
       if (sh_no != -1) {
          /* Found a block at/into which ptr points. */
-         sk_assert(sh_no >= 0 && sh_no < vglc_n_shadows);
-         sk_assert(ptr < VG_(get_sc_data)(vglc_shadows[sh_no])
-                       + VG_(get_sc_size)(vglc_shadows[sh_no]));
+         sk_assert(sh_no >= 0 && sh_no < lc_n_shadows);
+         sk_assert(ptr < lc_shadows[sh_no]->data + lc_shadows[sh_no]->size);
          /* Decide whether Proper-ly or Interior-ly reached. */
-         if (ptr == VG_(get_sc_data)(vglc_shadows[sh_no])) {
+         if (ptr == lc_shadows[sh_no]->data) {
             if (0) VG_(printf)("pointer at %p to %p\n", a, word_at_a );
-            vglc_reachedness[sh_no] = Proper;
+            lc_reachedness[sh_no] = Proper;
          } else {
-            if (vglc_reachedness[sh_no] == Unreached)
-               vglc_reachedness[sh_no] = Interior;
+            if (lc_reachedness[sh_no] == Unreached)
+               lc_reachedness[sh_no] = Interior;
          }
       }
    }
@@ -385,25 +382,33 @@
    LossRecord* errlist;
    LossRecord* p;
 
-   /* VG_(get_malloc_shadows) allocates storage for shadows */
-   vglc_shadows = VG_(get_malloc_shadows)( &vglc_n_shadows );
-   if (vglc_n_shadows == 0) {
-      sk_assert(vglc_shadows == NULL);
+   /* VG_(HT_to_sorted_array) allocates storage for shadows */
+   lc_shadows = (MAC_Chunk**)VG_(HT_to_sorted_array)( MAC_(malloc_list),
+                                                        &lc_n_shadows );
+
+   /* Sanity check -- make sure they don't overlap */
+   for (i = 0; i < lc_n_shadows-1; i++) {
+      sk_assert( lc_shadows[i]->data + lc_shadows[i]->size
+                 < lc_shadows[i+1]->data );
+   }
+
+   if (lc_n_shadows == 0) {
+      sk_assert(lc_shadows == NULL);
       VG_(message)(Vg_UserMsg, 
                    "No malloc'd blocks -- no leaks are possible.");
       return;
    }
 
    VG_(message)(Vg_UserMsg, "searching for pointers to %d not-freed blocks.", 
-                vglc_n_shadows );
+                lc_n_shadows );
 
-   vglc_min_mallocd_addr = VG_(get_sc_data)(vglc_shadows[0]);
-   vglc_max_mallocd_addr = VG_(get_sc_data)(vglc_shadows[vglc_n_shadows-1])
-                         + VG_(get_sc_size)(vglc_shadows[vglc_n_shadows-1]) - 1;
+   lc_min_mallocd_addr = lc_shadows[0]->data;
+   lc_max_mallocd_addr = lc_shadows[lc_n_shadows-1]->data
+                         + lc_shadows[lc_n_shadows-1]->size - 1;
 
-   vglc_reachedness = VG_(malloc)( vglc_n_shadows * sizeof(Reachedness) );
-   for (i = 0; i < vglc_n_shadows; i++)
-      vglc_reachedness[i] = Unreached;
+   lc_reachedness = VG_(malloc)( lc_n_shadows * sizeof(Reachedness) );
+   for (i = 0; i < lc_n_shadows; i++)
+      lc_reachedness[i] = Unreached;
 
    /* Do the scan of memory. */
    bytes_notified
@@ -419,12 +424,12 @@
    /* Common up the lost blocks so we can print sensible error messages. */
    n_lossrecords = 0;
    errlist       = NULL;
-   for (i = 0; i < vglc_n_shadows; i++) {
+   for (i = 0; i < lc_n_shadows; i++) {
      
-      ExeContext* where = MAC_(get_where) ( vglc_shadows[i] );
+      ExeContext* where = lc_shadows[i]->where;
       
       for (p = errlist; p != NULL; p = p->next) {
-         if (p->loss_mode == vglc_reachedness[i]
+         if (p->loss_mode == lc_reachedness[i]
              && VG_(eq_ExeContext) ( MAC_(clo_leak_resolution),
                                      p->allocated_at, 
                                      where) ) {
@@ -433,13 +438,13 @@
       }
       if (p != NULL) {
          p->num_blocks  ++;
-         p->total_bytes += VG_(get_sc_size)(vglc_shadows[i]);
+         p->total_bytes += lc_shadows[i]->size;
       } else {
          n_lossrecords ++;
          p = VG_(malloc)(sizeof(LossRecord));
-         p->loss_mode    = vglc_reachedness[i];
+         p->loss_mode    = lc_reachedness[i];
          p->allocated_at = where;
-         p->total_bytes  = VG_(get_sc_size)(vglc_shadows[i]);
+         p->total_bytes  = lc_shadows[i]->size;
          p->num_blocks   = 1;
          p->next         = errlist;
          errlist         = p;
@@ -474,7 +479,8 @@
       is_suppressed = 
          VG_(unique_error) ( /*tst*/NULL, LeakErr, (UInt)i+1,
                              (Char*)n_lossrecords, (void*) p_min,
-                             p_min->allocated_at, print_record );
+                             p_min->allocated_at, print_record,
+                             /*allow_GDB_attach*/False );
 
       if (is_suppressed) {
          blocks_suppressed += p_min->num_blocks;
@@ -516,8 +522,8 @@
    }
    VG_(message)(Vg_UserMsg, "");
 
-   VG_(free) ( vglc_shadows );
-   VG_(free) ( vglc_reachedness );
+   VG_(free) ( lc_shadows );
+   VG_(free) ( lc_reachedness );
 }
 
 /*--------------------------------------------------------------------*/
diff --git a/memcheck/mac_malloc_wrappers.c b/memcheck/mac_malloc_wrappers.c
new file mode 100644
index 0000000..0636477
--- /dev/null
+++ b/memcheck/mac_malloc_wrappers.c
@@ -0,0 +1,415 @@
+
+/*--------------------------------------------------------------------*/
+/*--- malloc/free wrappers for detecting errors and updating bits. ---*/
+/*---                                        mac_malloc_wrappers.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of MemCheck, a heavyweight Valgrind skin for
+   detecting memory errors, and AddrCheck, a lightweight Valgrind skin 
+   for detecting memory errors.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+#include "mac_shared.h"
+
+/*------------------------------------------------------------*/
+/*--- Defns                                                ---*/
+/*------------------------------------------------------------*/
+
+/* Stats ... */
+static UInt cmalloc_n_mallocs  = 0;
+static UInt cmalloc_n_frees    = 0;
+static UInt cmalloc_bs_mallocd = 0;
+
+/* We want a 16B redzone on heap blocks for Addrcheck and Memcheck */
+UInt VG_(vg_malloc_redzone_szB) = 16;
+
+/*------------------------------------------------------------*/
+/*--- Tracking malloc'd and free'd blocks                  ---*/
+/*------------------------------------------------------------*/
+
+/* Record malloc'd blocks.  Nb: Addrcheck and Memcheck construct this
+   separately in their respective initialisation functions. */
+VgHashTable MAC_(malloc_list) = NULL;
+   
+/* Records blocks after freeing. */
+static MAC_Chunk* freed_list_start  = NULL;
+static MAC_Chunk* freed_list_end    = NULL;
+static Int        freed_list_volume = 0;
+
+/* Put a shadow chunk on the freed blocks queue, possibly freeing up
+   some of the oldest blocks in the queue at the same time. */
+static void add_to_freed_queue ( MAC_Chunk* mc )
+{
+   MAC_Chunk* sc1;
+
+   /* Put it at the end of the freed list */
+   if (freed_list_end == NULL) {
+      sk_assert(freed_list_start == NULL);
+      freed_list_end    = freed_list_start = mc;
+      freed_list_volume = mc->size;
+   } else {
+      sk_assert(freed_list_end->next == NULL);
+      freed_list_end->next = mc;
+      freed_list_end       = mc;
+      freed_list_volume += mc->size;
+   }
+   mc->next = NULL;
+
+   /* Release enough of the oldest blocks to bring the free queue
+      volume below MAC_(clo_freelist_vol). */
+
+   while (freed_list_volume > MAC_(clo_freelist_vol)) {
+      sk_assert(freed_list_start != NULL);
+      sk_assert(freed_list_end != NULL);
+
+      sc1 = freed_list_start;
+      freed_list_volume -= sc1->size;
+      /* VG_(printf)("volume now %d\n", freed_list_volume); */
+      sk_assert(freed_list_volume >= 0);
+
+      if (freed_list_start == freed_list_end) {
+         freed_list_start = freed_list_end = NULL;
+      } else {
+         freed_list_start = sc1->next;
+      }
+      sc1->next = NULL; /* just paranoia */
+
+      /* free MAC_Chunk */
+      VG_(cli_free) ( (void*)(sc1->data) );
+      VG_(free) ( sc1 );
+   }
+}
+
+/* Return the first shadow chunk satisfying the predicate p. */
+MAC_Chunk* MAC_(first_matching_freed_MAC_Chunk) ( Bool (*p)(MAC_Chunk*) )
+{
+   MAC_Chunk* mc;
+
+   /* No point looking through freed blocks if we're not keeping
+      them around for a while... */
+   for (mc = freed_list_start; mc != NULL; mc = mc->next)
+      if (p(mc))
+         return mc;
+
+   return NULL;
+}
+
+/* Allocate a user-chunk of size bytes.  Also allocate its shadow
+   block, make the shadow block point at the user block.  Put the
+   shadow chunk on the appropriate list, and set all memory
+   protections correctly. */
+
+static void add_MAC_Chunk ( ThreadState* tst,
+                            Addr p, UInt size, MAC_AllocKind kind )
+{
+   MAC_Chunk* mc;
+
+   mc            = VG_(malloc)(sizeof(MAC_Chunk));
+   mc->data      = p;
+   mc->size      = size;
+   mc->allockind = kind;
+   mc->where     = VG_(get_ExeContext)(tst);
+
+   VG_(HT_add_node)( MAC_(malloc_list), (VgHashNode*)mc );
+}
+
+/*------------------------------------------------------------*/
+/*--- client_malloc(), etc                                 ---*/
+/*------------------------------------------------------------*/
+
+/* Function pointers for the two skins to track interesting events. */
+void (*MAC_(new_mem_heap)) ( Addr a, UInt len, Bool is_inited );
+void (*MAC_(ban_mem_heap)) ( Addr a, UInt len );
+void (*MAC_(die_mem_heap)) ( Addr a, UInt len );
+void (*MAC_(copy_mem_heap))( Addr from, Addr to, UInt len );
+
+/* Allocate memory and note change in memory available */
+static __inline__
+void* alloc_and_new_mem ( ThreadState* tst, UInt size, UInt alignment,
+                          Bool is_zeroed, MAC_AllocKind kind )
+{
+   Addr p;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_mallocs ++;
+   cmalloc_bs_mallocd += size;
+
+   p = (Addr)VG_(cli_malloc)(alignment, size);
+
+   add_MAC_Chunk ( tst, p, size, kind );
+
+   MAC_(ban_mem_heap)( p-VG_(vg_malloc_redzone_szB), 
+                         VG_(vg_malloc_redzone_szB) );
+   MAC_(new_mem_heap)( p, size, is_zeroed );
+   MAC_(ban_mem_heap)( p+size, VG_(vg_malloc_redzone_szB) );
+
+   VGP_POPCC(VgpCliMalloc);
+   return (void*)p;
+}
+
+void* SK_(malloc) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to malloc()", n );
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocMalloc );
+   }
+}
+
+void* SK_(__builtin_new) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to __builtin_new()", n);
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocNew );
+   }
+}
+
+void* SK_(__builtin_vec_new) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, 
+                   "Warning: silly arg (%d) to __builtin_vec_new()", n );
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocNewVec );
+   }
+}
+
+void* SK_(memalign) ( ThreadState* tst, Int align, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to memalign()", n);
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, align, /*is_zeroed*/False, 
+                                 MAC_AllocMalloc );
+   }
+}
+
+void* SK_(calloc) ( ThreadState* tst, Int nmemb, Int size1 )
+{
+   void* p;
+   Int   size, i;
+
+   size = nmemb * size1;
+
+   if (nmemb < 0 || size1 < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly args (%d,%d) to calloc()",
+                               nmemb, size1 );
+      return NULL;
+   } else {
+      p = alloc_and_new_mem ( tst, size, VG_(clo_alignment), 
+                              /*is_zeroed*/True, MAC_AllocMalloc );
+      for (i = 0; i < size; i++) 
+         ((UChar*)p)[i] = 0;
+      return p;
+   }
+}
+
+static
+void die_and_free_mem ( ThreadState* tst, MAC_Chunk* mc,
+                        MAC_Chunk** prev_chunks_next_ptr )
+{
+   /* Note: ban redzones again -- just in case user de-banned them
+      with a client request... */
+   MAC_(ban_mem_heap)( mc->data-VG_(vg_malloc_redzone_szB), 
+                                VG_(vg_malloc_redzone_szB) );
+   MAC_(die_mem_heap)( mc->data, mc->size );
+   MAC_(ban_mem_heap)( mc->data+mc->size, VG_(vg_malloc_redzone_szB) );
+
+   /* Remove mc from the malloclist using prev_chunks_next_ptr to
+      avoid repeating the hash table lookup.  Can't remove until at least
+      after free and free_mismatch errors are done because they use
+      describe_addr() which looks for it in malloclist. */
+   *prev_chunks_next_ptr = mc->next;
+
+   /* Record where freed */
+   mc->where = VG_(get_ExeContext) ( tst );
+
+   /* Put it out of harm's way for a while. */
+   add_to_freed_queue ( mc );
+}
+
+
+static __inline__
+void handle_free ( ThreadState* tst, void* p, MAC_AllocKind kind )
+{
+   MAC_Chunk*  mc;
+   MAC_Chunk** prev_chunks_next_ptr;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_frees++;
+
+   mc = (MAC_Chunk*)VG_(HT_get_node) ( MAC_(malloc_list), (UInt)p,
+                                       (VgHashNode***)&prev_chunks_next_ptr );
+
+   if (mc == NULL) {
+      MAC_(record_free_error) ( tst, (Addr)p );
+      VGP_POPCC(VgpCliMalloc);
+      return;
+   }
+
+   /* check if its a matching free() / delete / delete [] */
+   if (kind != mc->allockind) {
+      MAC_(record_freemismatch_error) ( tst, (Addr)p );
+   }
+
+   die_and_free_mem ( tst, mc, prev_chunks_next_ptr );
+   VGP_POPCC(VgpCliMalloc);
+}
+
+void SK_(free) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocMalloc);
+}
+
+void SK_(__builtin_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocNew);
+}
+
+void SK_(__builtin_vec_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocNewVec);
+}
+
+void* SK_(realloc) ( ThreadState* tst, void* p, Int new_size )
+{
+   MAC_Chunk  *mc;
+   MAC_Chunk **prev_chunks_next_ptr;
+   UInt        i;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_frees ++;
+   cmalloc_n_mallocs ++;
+   cmalloc_bs_mallocd += new_size;
+
+   if (new_size < 0) {
+      VG_(message)(Vg_UserMsg, 
+                   "Warning: silly arg (%d) to realloc()", new_size );
+      return NULL;
+   }
+
+   /* First try and find the block. */
+   mc = (MAC_Chunk*)VG_(HT_get_node) ( MAC_(malloc_list), (UInt)p,
+                                       (VgHashNode***)&prev_chunks_next_ptr );
+
+   if (mc == NULL) {
+      MAC_(record_free_error) ( tst, (Addr)p );
+      /* Perhaps we should return to the program regardless. */
+      VGP_POPCC(VgpCliMalloc);
+      return NULL;
+   }
+  
+   /* check if its a matching free() / delete / delete [] */
+   if (MAC_AllocMalloc != mc->allockind) {
+      /* can not realloc a range that was allocated with new or new [] */
+      MAC_(record_freemismatch_error) ( tst, (Addr)p );
+      /* but keep going anyway */
+   }
+
+   if (mc->size == new_size) {
+      /* size unchanged */
+      VGP_POPCC(VgpCliMalloc);
+      return p;
+      
+   } else if (mc->size > new_size) {
+      /* new size is smaller */
+      MAC_(die_mem_heap)( mc->data+new_size, mc->size-new_size );
+      mc->size = new_size;
+      VGP_POPCC(VgpCliMalloc);
+      return p;
+
+   } else {
+      /* new size is bigger */
+      Addr p_new;
+
+      /* Get new memory */
+      p_new = (Addr)VG_(cli_malloc)(VG_(clo_alignment), new_size);
+
+      /* First half kept and copied, second half new, 
+         red zones as normal */
+      MAC_(ban_mem_heap) ( p_new-VG_(vg_malloc_redzone_szB), 
+                                 VG_(vg_malloc_redzone_szB) );
+      MAC_(copy_mem_heap)( (Addr)p, p_new, mc->size );
+      MAC_(new_mem_heap) ( p_new+mc->size, new_size-mc->size, /*inited*/False );
+      MAC_(ban_mem_heap) ( p_new+new_size, VG_(vg_malloc_redzone_szB) );
+
+      /* Copy from old to new */
+      for (i = 0; i < mc->size; i++)
+         ((UChar*)p_new)[i] = ((UChar*)p)[i];
+
+      /* Free old memory */
+      die_and_free_mem ( tst, mc, prev_chunks_next_ptr );
+
+      /* this has to be after die_and_free_mem, otherwise the
+         former succeeds in shorting out the new block, not the
+         old, in the case when both are on the same list.  */
+      add_MAC_Chunk ( tst, p_new, new_size, MAC_AllocMalloc );
+
+      VGP_POPCC(VgpCliMalloc);
+      return (void*)p_new;
+   }  
+}
+
+void MAC_(print_malloc_stats) ( void )
+{
+   UInt nblocks = 0, nbytes = 0;
+   
+   /* Mmm... more lexical scoping */
+   void count_one_chunk(VgHashNode* node) {
+      MAC_Chunk* mc = (MAC_Chunk*)node;
+      nblocks ++;
+      nbytes  += mc->size;
+   }
+
+   if (VG_(clo_verbosity) == 0)
+      return;
+
+   /* Count memory still in use. */
+   VG_(HT_apply_to_all_nodes)(MAC_(malloc_list), count_one_chunk);
+
+   VG_(message)(Vg_UserMsg, 
+                "malloc/free: in use at exit: %d bytes in %d blocks.",
+                nbytes, nblocks);
+   VG_(message)(Vg_UserMsg, 
+                "malloc/free: %d allocs, %d frees, %u bytes allocated.",
+                cmalloc_n_mallocs,
+                cmalloc_n_frees, cmalloc_bs_mallocd);
+   if (VG_(clo_verbosity) > 1)
+      VG_(message)(Vg_UserMsg, "");
+}
+
+/*--------------------------------------------------------------------*/
+/*--- end                                    mac_malloc_wrappers.c ---*/
+/*--------------------------------------------------------------------*/
diff --git a/memcheck/mac_needs.c b/memcheck/mac_needs.c
index ad6b2f6..84c778c 100644
--- a/memcheck/mac_needs.c
+++ b/memcheck/mac_needs.c
@@ -87,110 +87,27 @@
       MAC_(clo_workaround_gcc296_bugs) = False;
 
    else
-      return False;
+      return VG_(replacement_malloc_process_cmd_line_option)(arg);
 
    return True;
 }
 
-/*------------------------------------------------------------*/
-/*--- Shadow chunks info                                   ---*/
-/*------------------------------------------------------------*/
-
-void MAC_(set_where)( ShadowChunk* sc, ExeContext* ec )
+void MAC_(print_common_usage)(void)
 {
-   VG_(set_sc_extra)( sc, 0, (UInt)ec );
+   VG_(printf)(
+"    --partial-loads-ok=no|yes too hard to explain here; see manual [yes]\n"
+"    --freelist-vol=<number>   volume of freed blocks queue [1000000]\n"
+"    --leak-check=no|yes       search for memory leaks at exit? [no]\n"
+"    --leak-resolution=low|med|high  how much bt merging in leak check [low]\n"
+"    --show-reachable=no|yes   show reachable blocks in leak check? [no]\n"
+"    --workaround-gcc296-bugs=no|yes  self explanatory [no]\n"
+   );
+   VG_(replacement_malloc_print_usage)();
 }
 
-ExeContext *MAC_(get_where)( ShadowChunk* sc )
+void MAC_(print_common_debug_usage)(void)
 {
-   return (ExeContext*)VG_(get_sc_extra)(sc, 0);
-}
-
-void SK_(complete_shadow_chunk) ( ShadowChunk* sc, ThreadState* tst )
-{
-   VG_(set_sc_extra) ( sc, 0, (UInt)VG_(get_ExeContext)(tst) );
-}
-
-
-/*------------------------------------------------------------*/
-/*--- Postponing free()ing                                 ---*/
-/*------------------------------------------------------------*/
-
-/* Holds blocks after freeing. */
-static ShadowChunk* freed_list_start  = NULL;
-static ShadowChunk* freed_list_end    = NULL;
-static Int          freed_list_volume = 0;
-
-__attribute__ ((unused))
-Int MAC_(count_freelist) ( void )
-{
-   ShadowChunk* sc;
-   Int n = 0;
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      n++;
-   return n;
-}
-
-__attribute__ ((unused))
-void MAC_(freelist_sanity) ( void )
-{
-   ShadowChunk* sc;
-   Int n = 0;
-   /* VG_(printf)("freelist sanity\n"); */
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      n += VG_(get_sc_size)(sc);
-   sk_assert(n == freed_list_volume);
-}
-
-/* Put a shadow chunk on the freed blocks queue, possibly freeing up
-   some of the oldest blocks in the queue at the same time. */
-static void add_to_freed_queue ( ShadowChunk* sc )
-{
-   ShadowChunk* sc1;
-
-   /* Put it at the end of the freed list */
-   if (freed_list_end == NULL) {
-      sk_assert(freed_list_start == NULL);
-      freed_list_end = freed_list_start = sc;
-      freed_list_volume = VG_(get_sc_size)(sc);
-   } else {    
-      sk_assert(VG_(get_sc_next)(freed_list_end) == NULL);
-      VG_(set_sc_next)(freed_list_end, sc);
-      freed_list_end = sc;
-      freed_list_volume += VG_(get_sc_size)(sc);
-   }
-   VG_(set_sc_next)(sc, NULL);
-
-   /* Release enough of the oldest blocks to bring the free queue
-      volume below vg_clo_freelist_vol. */
-   
-   while (freed_list_volume > MAC_(clo_freelist_vol)) {
-      /* freelist_sanity(); */
-      sk_assert(freed_list_start != NULL);
-      sk_assert(freed_list_end != NULL);
-
-      sc1 = freed_list_start;
-      freed_list_volume -= VG_(get_sc_size)(sc1);
-      /* VG_(printf)("volume now %d\n", freed_list_volume); */
-      sk_assert(freed_list_volume >= 0);
-
-      if (freed_list_start == freed_list_end) {
-         freed_list_start = freed_list_end = NULL;
-      } else {
-         freed_list_start = VG_(get_sc_next)(sc1);
-      }
-      VG_(set_sc_next)(sc1, NULL); /* just paranoia */
-      VG_(free_ShadowChunk) ( sc1 );
-   }
-}
-
-void SK_(alt_free) ( ShadowChunk* sc, ThreadState* tst )
-{
-   /* Record where freed */
-   MAC_(set_where)( sc, VG_(get_ExeContext) ( tst ) );
-
-   /* Put it out of harm's way for a while. */
-   add_to_freed_queue ( sc );
+   VG_(replacement_malloc_print_debug_usage)();
 }
 
 /*------------------------------------------------------------*/
@@ -389,39 +306,29 @@
    MemCheck for user blocks, which Addrcheck doesn't support. */
 Bool (*MAC_(describe_addr_supp)) ( Addr a, AddrInfo* ai ) = NULL;
    
-/* Return the first shadow chunk satisfying the predicate p. */
-static ShadowChunk* first_matching_freed_ShadowChunk ( Bool (*p)(ShadowChunk*) )
-{
-   ShadowChunk* sc;
-
-   /* No point looking through freed blocks if we're not keeping
-      them around for a while... */
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      if (p(sc))
-         return sc;
-
-   return NULL;
-}
-
 /* Describe an address as best you can, for error messages,
    putting the result in ai. */
 static void describe_addr ( Addr a, AddrInfo* ai )
 {
-   ShadowChunk* sc;
-   ThreadId     tid;
+   MAC_Chunk* sc;
+   ThreadId   tid;
 
    /* Nested functions, yeah.  Need the lexical scoping of 'a'. */
-   
+
    /* Closure for searching thread stacks */
    Bool addr_is_in_bounds(Addr stack_min, Addr stack_max)
    {
       return (stack_min <= a && a <= stack_max);
    }
-   /* Closure for searching malloc'd and free'd lists */
-   Bool addr_is_in_block(ShadowChunk *sh_ch)
+   /* Closure for searching free'd list */
+   Bool addr_is_in_MAC_Chunk(MAC_Chunk* mc)
    {
-      return VG_(addr_is_in_block) ( a, VG_(get_sc_data)(sh_ch),
-                                        VG_(get_sc_size)(sh_ch) );
+      return VG_(addr_is_in_block)( a, mc->data, mc->size );
+   }
+   /* Closure for searching malloc'd lists */
+   Bool addr_is_in_HashNode(VgHashNode* sh_ch)
+   {
+      return addr_is_in_MAC_Chunk( (MAC_Chunk*)sh_ch );
    }
 
    /* Perhaps it's a user-def'd block ?  (only check if requested, though) */
@@ -437,21 +344,21 @@
       return;
    }
    /* Search for a recently freed block which might bracket it. */
-   sc = first_matching_freed_ShadowChunk(addr_is_in_block);
-   if (NULL != sc) { 
+   sc = MAC_(first_matching_freed_MAC_Chunk)(addr_is_in_MAC_Chunk);
+   if (NULL != sc) {
       ai->akind      = Freed;
-      ai->blksize    = VG_(get_sc_size)(sc);
-      ai->rwoffset   = (Int)a - (Int)VG_(get_sc_data)(sc);
-      ai->lastchange = MAC_(get_where)(sc);
+      ai->blksize    = sc->size;
+      ai->rwoffset   = (Int)a - (Int)sc->data;
+      ai->lastchange = sc->where;
       return;
    }
    /* Search for a currently malloc'd block which might bracket it. */
-   sc = VG_(first_matching_mallocd_ShadowChunk)(addr_is_in_block);
+   sc = (MAC_Chunk*)VG_(HT_first_match)(MAC_(malloc_list), addr_is_in_HashNode);
    if (NULL != sc) {
       ai->akind      = Mallocd;
-      ai->blksize    = VG_(get_sc_size)(sc);
-      ai->rwoffset   = (Int)a - (Int)VG_(get_sc_data)(sc);
-      ai->lastchange = MAC_(get_where)(sc);
+      ai->blksize    = sc->size;
+      ai->rwoffset   = (Int)(a) - (Int)sc->data;
+      ai->lastchange = sc->where;
       return;
    }
    /* Clueless ... */
@@ -459,7 +366,6 @@
    return;
 }
 
-
 /* Is this address within some small distance below %ESP?  Used only
    for the --workaround-gcc296-bugs kludge. */
 static Bool is_just_below_ESP( Addr esp, Addr aa )
@@ -798,14 +704,14 @@
 
 UInt MAC_(event_ctr)[N_PROF_EVENTS];
 
-void MAC_(init_prof_mem) ( void )
+void init_prof_mem ( void )
 {
    Int i;
    for (i = 0; i < N_PROF_EVENTS; i++)
       MAC_(event_ctr)[i] = 0;
 }
 
-void MAC_(done_prof_mem) ( void )
+void done_prof_mem ( void )
 {
    Int i;
    for (i = 0; i < N_PROF_EVENTS; i++) {
@@ -819,12 +725,39 @@
 
 #else
 
-void MAC_(init_prof_mem) ( void ) { }
-void MAC_(done_prof_mem) ( void ) { }
+void init_prof_mem ( void ) { }
+void done_prof_mem ( void ) { }
 
 #endif
 
 /*------------------------------------------------------------*/
+/*--- Common initialisation + finalisation                 ---*/
+/*------------------------------------------------------------*/
+
+void MAC_(common_pre_clo_init)(void)
+{
+   MAC_(malloc_list) = VG_(HT_construct)();
+   init_prof_mem();
+}
+
+void MAC_(common_fini)(void (*leak_check)(void))
+{
+   MAC_(print_malloc_stats)();
+
+   if (VG_(clo_verbosity) == 1) {
+      if (!MAC_(clo_leak_check))
+         VG_(message)(Vg_UserMsg, 
+             "For a detailed leak analysis,  rerun with: --leak-check=yes");
+
+      VG_(message)(Vg_UserMsg, 
+                   "For counts of detected errors, rerun with: -v");
+   }
+   if (MAC_(clo_leak_check)) leak_check();
+
+   done_prof_mem();
+}
+
+/*------------------------------------------------------------*/
 /*--- Syscall wrappers                                     ---*/
 /*------------------------------------------------------------*/
 
diff --git a/memcheck/mac_shared.h b/memcheck/mac_shared.h
index fc3d86b..147fa3f 100644
--- a/memcheck/mac_shared.h
+++ b/memcheck/mac_shared.h
@@ -120,6 +120,26 @@
    }
    MAC_Error;
 
+/* For malloc()/new/new[] vs. free()/delete/delete[] mismatch checking. */
+typedef
+   enum {
+      MAC_AllocMalloc = 0,
+      MAC_AllocNew    = 1,
+      MAC_AllocNewVec = 2
+   }
+   MAC_AllocKind;
+   
+/* Nb: first two fields must match core's VgHashNode. */
+typedef
+   struct _MAC_Chunk {
+      struct _MAC_Chunk* next;
+      Addr          data;           /* ptr to actual block              */
+      UInt          size : 30;      /* size requested                   */
+      MAC_AllocKind allockind : 2;  /* which wrapper did the allocation */
+      ExeContext*   where;          /* where it was allocated           */
+   }
+   MAC_Chunk;
+
 /*------------------------------------------------------------*/
 /*--- Profiling of skins and memory events                 ---*/
 /*------------------------------------------------------------*/
@@ -225,22 +245,36 @@
  * default: NO*/
 extern Bool MAC_(clo_workaround_gcc296_bugs);
 
-extern Bool MAC_(process_common_cmd_line_option)(Char* arg);
+extern Bool MAC_(process_common_cmd_line_option) ( Char* arg );
+extern void MAC_(print_common_usage)             ( void );
+extern void MAC_(print_common_debug_usage)       ( void );
+
+
+/*------------------------------------------------------------*/
+/*--- Variables                                            ---*/
+/*------------------------------------------------------------*/
+
+/* For tracking malloc'd blocks */
+extern VgHashTable MAC_(malloc_list);
+
+/* Function pointers for the two skins to track interesting events. */
+extern void (*MAC_(new_mem_heap)) ( Addr a, UInt len, Bool is_inited );
+extern void (*MAC_(ban_mem_heap)) ( Addr a, UInt len );
+extern void (*MAC_(die_mem_heap)) ( Addr a, UInt len );
+extern void (*MAC_(copy_mem_heap))( Addr from, Addr to, UInt len );
+
+/* Used in describe_addr() */
+extern Bool (*MAC_(describe_addr_supp))    ( Addr a, AddrInfo* ai );
 
 
 /*------------------------------------------------------------*/
 /*--- Functions                                            ---*/
 /*------------------------------------------------------------*/
 
-extern void        MAC_(set_where) ( ShadowChunk* sc, ExeContext* ec );
-extern ExeContext *MAC_(get_where) ( ShadowChunk* sc );
-
 extern void MAC_(pp_AddrInfo) ( Addr a, AddrInfo* ai );
 
 extern void MAC_(clear_MAC_Error)          ( MAC_Error* err_extra );
 
-extern Bool (*MAC_(describe_addr_supp))    ( Addr a, AddrInfo* ai );
-
 extern Bool MAC_(shared_recognised_suppression) ( Char* name, Supp* su );
 
 extern void MAC_(record_address_error)     ( Addr a, Int size, Bool isWrite );
@@ -254,13 +288,12 @@
 
 extern void MAC_(pp_shared_SkinError)      ( Error* err);
 
-extern void MAC_(init_prof_mem) ( void );
-extern void MAC_(done_prof_mem) ( void );
+extern MAC_Chunk* MAC_(first_matching_freed_MAC_Chunk)( Bool (*p)(MAC_Chunk*) );
 
-extern Int          MAC_(count_freelist)  ( void ) __attribute__ ((unused));
-extern void         MAC_(freelist_sanity) ( void ) __attribute__ ((unused));
-extern ShadowChunk* MAC_(any_matching_freed_ShadowChunks) 
-                            ( Bool (*p)(ShadowChunk*) );
+extern void MAC_(common_pre_clo_init) ( void );
+extern void MAC_(common_fini)         ( void (*leak_check)(void) );
+
+extern void MAC_(print_malloc_stats) ( void );
 
 /* For leak checking */
 extern void MAC_(pp_LeakError)(void* vl, UInt n_this_record, 
@@ -281,8 +314,8 @@
 extern __attribute__((regparm(1))) void MAC_(die_mem_stack_16) ( Addr old_ESP );
 extern __attribute__((regparm(1))) void MAC_(new_mem_stack_32) ( Addr old_ESP );
 extern __attribute__((regparm(1))) void MAC_(die_mem_stack_32) ( Addr old_ESP );
-extern                             void MAC_(die_mem_stack) ( Addr a, UInt len );
-extern                             void MAC_(new_mem_stack) ( Addr a, UInt len );
+extern                             void MAC_(die_mem_stack) ( Addr a, UInt len);
+extern                             void MAC_(new_mem_stack) ( Addr a, UInt len);
 
 
 /*------------------------------------------------------------*/
@@ -290,7 +323,7 @@
 /*------------------------------------------------------------*/
 
 /* Some noble preprocessor abuse, to enable Memcheck and Addrcheck to
-   share this code, but not call the same functions.
+   share this code, but call different functions.
 
    Note that this code is executed very frequently and must be highly
    optimised, which is why I resort to the preprocessor to achieve the
diff --git a/memcheck/mc_main.c b/memcheck/mc_main.c
index 46ae522..908fce0 100644
--- a/memcheck/mc_main.c
+++ b/memcheck/mc_main.c
@@ -1503,19 +1503,20 @@
    return True;
 }
 
-Char* SK_(usage)(void)
+void SK_(print_usage)(void)
 {  
-   return  
-"    --partial-loads-ok=no|yes too hard to explain here; see manual [yes]\n"
-"    --freelist-vol=<number>   volume of freed blocks queue [1000000]\n"
-"    --leak-check=no|yes       search for memory leaks at exit? [no]\n"
-"    --leak-resolution=low|med|high\n"
-"                              amount of bt merging in leak check [low]\n"
-"    --show-reachable=no|yes   show reachable blocks in leak check? [no]\n"
-"    --workaround-gcc296-bugs=no|yes  self explanatory [no]\n"
-"\n"
+   MAC_(print_common_usage)();
+   VG_(printf)(
+"    --avoid-strlen-errors=no|yes  suppress errs from inlined strlen [yes]\n"
+   );
+}
+
+void SK_(print_debug_usage)(void)
+{  
+   MAC_(print_common_debug_usage)();
+   VG_(printf)(
 "    --cleanup=no|yes          improve after instrumentation? [yes]\n"
-"    --avoid-strlen-errors=no|yes  suppress errs from inlined strlen [yes]\n";
+   );
 }
 
 
@@ -1536,21 +1537,30 @@
    VG_(needs_core_errors)         ();
    VG_(needs_skin_errors)         ();
    VG_(needs_libc_freeres)        ();
-   VG_(needs_sizeof_shadow_block) ( 1 );
    VG_(needs_shadow_regs)         ();
    VG_(needs_command_line_options)();
    VG_(needs_client_requests)     ();
    VG_(needs_extended_UCode)      ();
    VG_(needs_syscall_wrapper)     ();
-   VG_(needs_alternative_free)    ();
    VG_(needs_sanity_checks)       ();
 
+   MAC_( new_mem_heap)             = & mc_new_mem_heap;
+   MAC_( ban_mem_heap)             = & MC_(make_noaccess);
+   MAC_(copy_mem_heap)             = & mc_copy_address_range_state;
+   MAC_( die_mem_heap)             = & MC_(make_noaccess);
+
    VG_(track_new_mem_startup)      ( & mc_new_mem_startup );
-   VG_(track_new_mem_heap)         ( & mc_new_mem_heap );
    VG_(track_new_mem_stack_signal) ( & MC_(make_writable) );
    VG_(track_new_mem_brk)          ( & MC_(make_writable) );
    VG_(track_new_mem_mmap)         ( & mc_set_perms );
    
+   VG_(track_copy_mem_remap)       ( & mc_copy_address_range_state );
+   VG_(track_change_mem_mprotect)  ( & mc_set_perms );
+      
+   VG_(track_die_mem_stack_signal) ( & MC_(make_noaccess) ); 
+   VG_(track_die_mem_brk)          ( & MC_(make_noaccess) );
+   VG_(track_die_mem_munmap)       ( & MC_(make_noaccess) ); 
+
    VG_(track_new_mem_stack_4)      ( & MAC_(new_mem_stack_4)  );
    VG_(track_new_mem_stack_8)      ( & MAC_(new_mem_stack_8)  );
    VG_(track_new_mem_stack_12)     ( & MAC_(new_mem_stack_12) );
@@ -1558,18 +1568,6 @@
    VG_(track_new_mem_stack_32)     ( & MAC_(new_mem_stack_32) );
    VG_(track_new_mem_stack)        ( & MAC_(new_mem_stack)    );
 
-   VG_(track_copy_mem_heap)        ( & mc_copy_address_range_state );
-   VG_(track_copy_mem_remap)       ( & mc_copy_address_range_state );
-   VG_(track_change_mem_mprotect)  ( & mc_set_perms );
-      
-   VG_(track_ban_mem_heap)         ( & MC_(make_noaccess) );
-   VG_(track_ban_mem_stack)        ( & MC_(make_noaccess) );
-
-   VG_(track_die_mem_heap)         ( & MC_(make_noaccess) );
-   VG_(track_die_mem_stack_signal) ( & MC_(make_noaccess) ); 
-   VG_(track_die_mem_brk)          ( & MC_(make_noaccess) );
-   VG_(track_die_mem_munmap)       ( & MC_(make_noaccess) ); 
-
    VG_(track_die_mem_stack_4)      ( & MAC_(die_mem_stack_4)  );
    VG_(track_die_mem_stack_8)      ( & MAC_(die_mem_stack_8)  );
    VG_(track_die_mem_stack_12)     ( & MAC_(die_mem_stack_12) );
@@ -1577,8 +1575,7 @@
    VG_(track_die_mem_stack_32)     ( & MAC_(die_mem_stack_32) );
    VG_(track_die_mem_stack)        ( & MAC_(die_mem_stack)    );
    
-   VG_(track_bad_free)             ( & MAC_(record_free_error) );
-   VG_(track_mismatched_free)      ( & MAC_(record_freemismatch_error) );
+   VG_(track_ban_mem_stack)        ( & MC_(make_noaccess) );
 
    VG_(track_pre_mem_read)         ( & mc_check_is_readable );
    VG_(track_pre_mem_read_asciiz)  ( & mc_check_is_readable_asciiz );
@@ -1609,7 +1606,7 @@
    MAC_(describe_addr_supp) = MC_(client_perm_maybe_describe);
 
    init_shadow_memory();
-   MAC_(init_prof_mem)();
+   MAC_(common_pre_clo_init)();
 }
 
 void SK_(post_clo_init) ( void )
@@ -1618,20 +1615,8 @@
 
 void SK_(fini) ( void )
 {
-   VG_(print_malloc_stats)();
-
-   if (VG_(clo_verbosity) == 1) {
-      if (!MAC_(clo_leak_check))
-         VG_(message)(Vg_UserMsg, 
-             "For a detailed leak analysis,  rerun with: --leak-check=yes");
-
-      VG_(message)(Vg_UserMsg, 
-                   "For counts of detected errors, rerun with: -v");
-   }
-   if (MAC_(clo_leak_check)) MC_(detect_memory_leaks)();
-
-   MAC_(done_prof_mem)();
-
+   MAC_(common_fini)( MC_(detect_memory_leaks) );
+   
    if (0) {
       VG_(message)(Vg_DebugMsg, 
         "------ Valgrind's client block stats follow ---------------" );
diff --git a/memcheck/mc_replace_strmem.c b/memcheck/mc_replace_strmem.c
new file mode 100644
index 0000000..7776439
--- /dev/null
+++ b/memcheck/mc_replace_strmem.c
@@ -0,0 +1,258 @@
+
+/*--------------------------------------------------------------------*/
+/*--- Replacements for strcpy(), memcpy() et al, which run on the  ---*/
+/*--- simulated CPU.                                               ---*/
+/*---                                          mc_replace_strmem.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of MemCheck, a heavyweight Valgrind skin for
+   detecting memory errors, and AddrCheck, a lightweight Valgrind skin 
+   for detecting memory errors.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+#include "vg_skin.h"
+
+#define __VALGRIND_SOMESKIN_H
+#include "valgrind.h"
+
+/* For snprintf(); OK because this code runs on the simulated CPU */
+#include <stdio.h>
+
+/* ---------------------------------------------------------------------
+   The normal versions of these functions are hyper-optimised, which fools
+   Memcheck and causes spurious value warnings.  So we replace them with
+   simpler versions.  THEY RUN ON THE SIMD CPU!
+   ------------------------------------------------------------------ */
+
+static __inline__
+Bool is_overlap ( void* dst, const void* src, UInt len )
+{
+   Int diff = src-dst;
+
+   if (diff < 0) 
+      diff = -diff;
+
+   return (diff < len);
+}
+
+static __inline__
+void complain2 ( Char* s, char* dst, const char* src )
+{
+   Char  buf[100];
+   int   res = 0;    /* unused; initialise to shut gcc up */
+
+   snprintf(buf, 100,
+            "Warning: src and dst overlap in %s(%p, %p)", s, dst, src );
+   VALGRIND_MAGIC_SEQUENCE(res, 0, /* irrelevant default */
+                           VG_USERREQ__LOGMESSAGE, buf, 0, 0, 0);
+}
+
+static __inline__
+void complain3 ( Char* s, void* dst, const void* src, int n )
+{
+   Char  buf[100];
+   int   res = 0;    /* unused; initialise to shut gcc up */
+
+   snprintf(buf, 100,
+            "Warning: src and dst overlap in %s(%p, %p, %d)", s, dst, src, n );
+   VALGRIND_MAGIC_SEQUENCE(res, 0, /* irrelevant default */
+                           VG_USERREQ__LOGMESSAGE, buf, 0, 0, 0);
+}
+
+char* strrchr ( const char* s, int c )
+{
+   UChar  ch   = (UChar)((UInt)c);
+   UChar* p    = (UChar*)s;
+   UChar* last = NULL;
+   while (True) {
+      if (*p == ch) last = p;
+      if (*p == 0) return (char*)last;
+      p++;
+   }
+}
+
+char* strchr ( const char* s, int c )
+{
+   UChar  ch = (UChar)((UInt)c);
+   UChar* p  = (UChar*)s;
+   while (True) {
+      if (*p == ch) return (char*)p;
+      if (*p == 0) return NULL;
+      p++;
+   }
+}
+
+char* strcat ( char* dst, const char* src )
+{
+   Char* dst_orig = dst;
+   while (*dst) dst++;
+   while (*src) *dst++ = *src++;
+   *dst = 0;
+
+   /* This is a bit redundant, I think;  any overlap and the strcat will
+      go forever... or until a seg fault occurs. */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain2("strcat", dst, src);
+
+   return dst_orig;
+}
+
+char* strncat ( char* dst, const char* src, int n )
+{
+   Char* dst_orig = dst;
+   Int   m = 0;
+
+   while (*dst) dst++;
+   while (*src && m++ < n) *dst++ = *src++;  /* concat at most n chars */
+   *dst = 0;                                 /* then add null (always) */
+
+   /* This checks for overlap after copying, unavoidable without
+      pre-counting lengths... should be ok */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain3("strncat", dst, src, n);
+
+   return dst_orig;
+}
+
+unsigned int strlen ( const char* str )
+{
+   UInt i = 0;
+   while (str[i] != 0) i++;
+   return i;
+}
+
+char* strcpy ( char* dst, const char* src )
+{
+   Char* dst_orig = dst;
+
+   while (*src) *dst++ = *src++;
+   *dst = 0;
+
+   /* This checks for overlap after copying, unavoidable without
+      pre-counting length... should be ok */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain2("strcpy", dst, src);
+
+   return dst_orig;
+}
+
+char* strncpy ( char* dst, const char* src, int n )
+{
+   Char* dst_orig = dst;
+   Int   m = 0;
+
+   if (is_overlap(dst, src, n))
+      complain3("strncpy", dst, src, n);
+
+   while (*src && m++ < n) *dst++ = *src++;
+   while (m++ < n) *dst++ = 0;         /* must pad remainder with nulls */
+
+   return dst_orig;
+}
+
+int strncmp ( const unsigned char* s1, const unsigned char* s2, 
+              unsigned int nmax )
+{
+   unsigned int n = 0;
+   while (True) {
+      if (n >= nmax) return 0;
+      if (*s1 == 0 && *s2 == 0) return 0;
+      if (*s1 == 0) return -1;
+      if (*s2 == 0) return 1;
+
+      if (*(unsigned char*)s1 < *(unsigned char*)s2) return -1;
+      if (*(unsigned char*)s1 > *(unsigned char*)s2) return 1;
+
+      s1++; s2++; n++;
+   }
+}
+
+int strcmp ( const char* s1, const char* s2 )
+{
+   register unsigned char c1;
+   register unsigned char c2;
+   while (True) {
+      c1 = *(unsigned char *)s1;
+      c2 = *(unsigned char *)s2;
+      if (c1 != c2) break;
+      if (c1 == 0) break;
+      s1++; s2++;
+   }
+   if ((unsigned char)c1 < (unsigned char)c2) return -1;
+   if ((unsigned char)c1 > (unsigned char)c2) return 1;
+   return 0;
+}
+
+void* memchr(const void *s, int c, unsigned int n)
+{
+   unsigned int i;
+   UChar c0 = (UChar)c;
+   UChar* p = (UChar*)s;
+   for (i = 0; i < n; i++)
+      if (p[i] == c0) return (void*)(&p[i]);
+   return NULL;
+}
+
+void* memcpy( void *dst, const void *src, unsigned int len )
+{
+   register char *d;
+   register char *s;
+
+   if (is_overlap(dst, src, len))
+      complain3("memcpy", dst, src, len);
+      
+   if ( dst > src ) {
+      d = (char *)dst + len - 1;
+      s = (char *)src + len - 1;
+      while ( len >= 4 ) {
+         *d-- = *s--;
+         *d-- = *s--;
+         *d-- = *s--;
+         *d-- = *s--;
+         len -= 4;
+      }
+      while ( len-- ) {
+         *d-- = *s--;
+      }
+   } else if ( dst < src ) {
+      d = (char *)dst;
+      s = (char *)src;
+      while ( len >= 4 ) {
+         *d++ = *s++;
+         *d++ = *s++;
+         *d++ = *s++;
+         *d++ = *s++;
+         len -= 4;
+      }
+      while ( len-- ) {
+         *d++ = *s++;
+      }
+   }
+   return dst;
+}
+
+
+/*--------------------------------------------------------------------*/
+/*--- end                                      mc_replace_strmem.c ---*/
+/*--------------------------------------------------------------------*/
diff --git a/memcheck/tests/Makefile.am b/memcheck/tests/Makefile.am
index 019caef..816c452 100644
--- a/memcheck/tests/Makefile.am
+++ b/memcheck/tests/Makefile.am
@@ -31,6 +31,7 @@
 	inline.stderr.exp inline.stdout.exp inline.vgtest \
 	malloc1.stderr.exp malloc1.vgtest \
 	malloc2.stderr.exp malloc2.vgtest \
+	malloc3.stderr.exp malloc3.stdout.exp malloc3.vgtest \
 	manuel1.stderr.exp manuel1.stdout.exp manuel1.vgtest \
 	manuel2.stderr.exp manuel2.stdout.exp manuel2.vgtest \
 	manuel3.stderr.exp manuel3.vgtest \
@@ -39,7 +40,9 @@
 	mismatches.stderr.exp mismatches.vgtest \
 	mmaptest.stderr.exp mmaptest.vgtest \
 	nanoleak.stderr.exp nanoleak.vgtest \
+	nanoleak_supp.stderr.exp nanoleak_supp.vgtest nanoleak.supp \
 	new_override.stderr.exp new_override.vgtest \
+	overlap.stderr.exp overlap.stdout.exp overlap.vgtest \
 	pushfpopf.stderr.exp pushfpopf.stdout.exp pushfpopf.vgtest \
 	realloc1.stderr.exp realloc1.vgtest \
 	realloc2.stderr.exp realloc2.vgtest \
@@ -57,8 +60,8 @@
 noinst_PROGRAMS = \
 	badaddrvalue badfree badjump badloop buflen_check clientperm \
 	doublefree errs1 exitprog fprw fwrite inits inline \
-	malloc1 malloc2 manuel1 manuel2 manuel3 \
-	memalign_test memcmptest mmaptest nanoleak pushfpopf \
+	malloc1 malloc2 malloc3 manuel1 manuel2 manuel3 \
+	memalign_test memcmptest mmaptest nanoleak overlap pushfpopf \
 	realloc1 realloc2 sigaltstack signal2 supp1 supp2 suppfree \
 	trivialleak tronical weirdioctl	\
 	mismatches new_override
@@ -83,6 +86,7 @@
 inline_SOURCES 	        = inline.c
 malloc1_SOURCES 	= malloc1.c
 malloc2_SOURCES 	= malloc2.c
+malloc3_SOURCES 	= malloc3.c
 manuel1_SOURCES 	= manuel1.c
 manuel2_SOURCES 	= manuel2.c
 manuel3_SOURCES 	= manuel3.c
@@ -90,6 +94,7 @@
 memalign_test_SOURCES 	= memalign_test.c
 memcmptest_SOURCES 	= memcmptest.c
 nanoleak_SOURCES 	= nanoleak.c
+overlap_SOURCES 	= overlap.c
 pushfpopf_SOURCES 	= pushfpopf_c.c pushfpopf_s.s
 realloc1_SOURCES 	= realloc1.c
 realloc2_SOURCES 	= realloc2.c
diff --git a/memcheck/tests/badaddrvalue.stderr.exp b/memcheck/tests/badaddrvalue.stderr.exp
index 28143a1..8bb0538 100644
--- a/memcheck/tests/badaddrvalue.stderr.exp
+++ b/memcheck/tests/badaddrvalue.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
    Address 0x........ is 1 bytes before a block of size 8 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badaddrvalue.c:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
@@ -14,7 +14,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
    Address 0x........ is 1 bytes before a block of size 8 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badaddrvalue.c:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
diff --git a/memcheck/tests/badfree-2trace.stderr.exp b/memcheck/tests/badfree-2trace.stderr.exp
index 741fd25..b019d38 100644
--- a/memcheck/tests/badfree-2trace.stderr.exp
+++ b/memcheck/tests/badfree-2trace.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:12)
    Address 0x........ is not stack'd, malloc'd or free'd
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:15)
    Address 0x........ is on thread 1's stack
 
diff --git a/memcheck/tests/badfree.stderr.exp b/memcheck/tests/badfree.stderr.exp
index 95616fa..1b7f929 100644
--- a/memcheck/tests/badfree.stderr.exp
+++ b/memcheck/tests/badfree.stderr.exp
@@ -1,13 +1,13 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badfree)
    Address 0x........ is not stack'd, malloc'd or free'd
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badfree)
diff --git a/memcheck/tests/doublefree.stderr.exp b/memcheck/tests/doublefree.stderr.exp
index c483120..d8b584a 100644
--- a/memcheck/tests/doublefree.stderr.exp
+++ b/memcheck/tests/doublefree.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (doublefree.c:10)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/doublefree)
    Address 0x........ is 0 bytes inside a block of size 177 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (doublefree.c:10)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/doublefree)
diff --git a/memcheck/tests/errs1.stderr.exp b/memcheck/tests/errs1.stderr.exp
index 2de4b48..bc3db41 100644
--- a/memcheck/tests/errs1.stderr.exp
+++ b/memcheck/tests/errs1.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: aaa (errs1.c:10)
    by 0x........: main (errs1.c:17)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: zzzzzzz (errs1.c:12)
    by 0x........: yyy (errs1.c:13)
    by 0x........: xxx (errs1.c:14)
@@ -16,7 +16,7 @@
    by 0x........: aaa (errs1.c:10)
    by 0x........: main (errs1.c:17)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: zzzzzzz (errs1.c:12)
    by 0x........: yyy (errs1.c:13)
    by 0x........: xxx (errs1.c:14)
diff --git a/memcheck/tests/exitprog.stderr.exp b/memcheck/tests/exitprog.stderr.exp
index 97a58a4..40b39a6 100644
--- a/memcheck/tests/exitprog.stderr.exp
+++ b/memcheck/tests/exitprog.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/exitprog)
    Address 0x........ is 0 bytes after a block of size 1000000 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (exitprog.c:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/exitprog)
diff --git a/memcheck/tests/filter_stderr b/memcheck/tests/filter_stderr
index 511dca0..df2e946 100755
--- a/memcheck/tests/filter_stderr
+++ b/memcheck/tests/filter_stderr
@@ -7,8 +7,8 @@
 # Anonymise addresses
 $dir/../../tests/filter_addresses                       |
 
-# Anonymise line numbers in vg_clientfuncs.c
-sed "s/vg_clientfuncs.c:[0-9]\+/vg_clientfuncs.c:.../"  |
+# Anonymise line numbers in vg_replace_malloc.c
+sed "s/vg_replace_malloc.c:[0-9]\+/vg_replace_malloc.c:.../"  |
 
 $dir/../../tests/filter_test_paths                      |
 
diff --git a/memcheck/tests/fprw.stderr.exp b/memcheck/tests/fprw.stderr.exp
index 53fcbdf..a7f6939 100644
--- a/memcheck/tests/fprw.stderr.exp
+++ b/memcheck/tests/fprw.stderr.exp
@@ -24,7 +24,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 8 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:18)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -34,7 +34,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 8 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:18)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -44,7 +44,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -54,13 +54,13 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:22)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -71,7 +71,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:23)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
diff --git a/memcheck/tests/fwrite.stderr.exp b/memcheck/tests/fwrite.stderr.exp
index 11b0eba..ed6aa87 100644
--- a/memcheck/tests/fwrite.stderr.exp
+++ b/memcheck/tests/fwrite.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fwrite)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (fwrite.c:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fwrite)
diff --git a/memcheck/tests/inline.stderr.exp b/memcheck/tests/inline.stderr.exp
index 9d9f79a..ffdb214 100644
--- a/memcheck/tests/inline.stderr.exp
+++ b/memcheck/tests/inline.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/inline)
    Address 0x........ is 0 bytes after a block of size 40 alloc'd
-   at 0x........: calloc (vg_clientfuncs.c:...)
+   at 0x........: calloc (vg_replace_malloc.c:...)
    by 0x........: main (inline.c:17)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/inline)
diff --git a/memcheck/tests/malloc1.stderr.exp b/memcheck/tests/malloc1.stderr.exp
index 1571222..d450a38 100644
--- a/memcheck/tests/malloc1.stderr.exp
+++ b/memcheck/tests/malloc1.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc1)
    Address 0x........ is 1 bytes inside a block of size 10 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: really (malloc1.c:19)
    by 0x........: main (malloc1.c:9)
    by 0x........: __libc_start_main (...libc...)
@@ -16,7 +16,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc1)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: really (malloc1.c:21)
    by 0x........: main (malloc1.c:9)
    by 0x........: __libc_start_main (...libc...)
diff --git a/memcheck/tests/malloc2.stderr.exp b/memcheck/tests/malloc2.stderr.exp
index 141a1ca..5463e17 100644
--- a/memcheck/tests/malloc2.stderr.exp
+++ b/memcheck/tests/malloc2.stderr.exp
@@ -4,18 +4,18 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
    Address 0x........ is 0 bytes inside a block of size 429 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:38)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:43)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
    Address 0x........ is 0 bytes inside a block of size 429 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:38)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
diff --git a/memcheck/tests/malloc3.c b/memcheck/tests/malloc3.c
new file mode 100644
index 0000000..896645c
--- /dev/null
+++ b/memcheck/tests/malloc3.c
@@ -0,0 +1,32 @@
+
+/* test of plausible behaviour with malloc/calloc and stupid args */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+int main ( void )
+{
+  char* p;
+
+  p = malloc(0);
+  printf("malloc(0) = %p\n", p);
+  free(p);
+
+  p = malloc(-1);
+  printf("malloc(-1) = %p\n", p);
+  free(p);
+
+  p = calloc(0,1);
+  printf("calloc(0,1) = %p\n", p);
+  free(p);
+
+  p = calloc(0,-1);
+  printf("calloc(0,-1) = %p\n", p);
+  free(p);
+
+  p = calloc(-1,-1);
+  printf("calloc(-1,-1) = %p\n", p);
+  free(p);
+
+  return 0;
+}
diff --git a/memcheck/tests/malloc3.stderr.exp b/memcheck/tests/malloc3.stderr.exp
new file mode 100644
index 0000000..9a908f3
--- /dev/null
+++ b/memcheck/tests/malloc3.stderr.exp
@@ -0,0 +1,10 @@
+
+Warning: silly arg (-1) to malloc()
+Warning: silly args (0,-1) to calloc()
+Warning: silly args (-1,-1) to calloc()
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 0 bytes in 0 blocks.
+malloc/free: 2 allocs, 2 frees, 0 bytes allocated.
+For a detailed leak analysis,  rerun with: --leak-check=yes
+For counts of detected errors, rerun with: -v
diff --git a/memcheck/tests/malloc3.stdout.exp b/memcheck/tests/malloc3.stdout.exp
new file mode 100644
index 0000000..681c9ec
--- /dev/null
+++ b/memcheck/tests/malloc3.stdout.exp
@@ -0,0 +1,5 @@
+malloc(0) = 0x........
+malloc(-1) = (nil)
+calloc(0,1) = 0x........
+calloc(0,-1) = (nil)
+calloc(-1,-1) = (nil)
diff --git a/memcheck/tests/malloc3.vgtest b/memcheck/tests/malloc3.vgtest
new file mode 100644
index 0000000..5e1b749
--- /dev/null
+++ b/memcheck/tests/malloc3.vgtest
@@ -0,0 +1,2 @@
+prog: malloc3
+stdout_filter: ../../tests/filter_addresses
diff --git a/memcheck/tests/memalign_test.stderr.exp b/memcheck/tests/memalign_test.stderr.exp
index a23a969..7c69342 100644
--- a/memcheck/tests/memalign_test.stderr.exp
+++ b/memcheck/tests/memalign_test.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (memalign_test.c:17)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/memalign_test)
    Address 0x........ is 0 bytes inside a block of size 111110 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (memalign_test.c:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/memalign_test)
diff --git a/memcheck/tests/mismatches.stderr.exp b/memcheck/tests/mismatches.stderr.exp
index d216443..93ad6da 100644
--- a/memcheck/tests/mismatches.stderr.exp
+++ b/memcheck/tests/mismatches.stderr.exp
@@ -1,66 +1,66 @@
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:5)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_vec_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:8)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:13)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 40 alloc'd
-   at 0x........: __builtin_vec_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 40 alloc'd
-   at 0x........: __builtin_vec_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:14)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_vec_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:20)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: __builtin_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:22)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: __builtin_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:21)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
diff --git a/memcheck/tests/nanoleak.stderr.exp b/memcheck/tests/nanoleak.stderr.exp
index 96eefd1..3183eee 100644
--- a/memcheck/tests/nanoleak.stderr.exp
+++ b/memcheck/tests/nanoleak.stderr.exp
@@ -8,7 +8,7 @@
 checked ... bytes.
 
 1000 bytes in 1 blocks are definitely lost in loss record 1 of 1
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (nanoleak.c:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/nanoleak)
diff --git a/memcheck/tests/nanoleak.supp b/memcheck/tests/nanoleak.supp
new file mode 100644
index 0000000..6c87853
--- /dev/null
+++ b/memcheck/tests/nanoleak.supp
@@ -0,0 +1,8 @@
+{
+   this_is_the_nanoleak_suppression_name
+   Addrcheck,Memcheck:Leak
+   fun:malloc
+   fun:main
+   fun:__libc_start_main
+}
+
diff --git a/memcheck/tests/nanoleak_supp.stderr.exp b/memcheck/tests/nanoleak_supp.stderr.exp
new file mode 100644
index 0000000..9fb7bfe
--- /dev/null
+++ b/memcheck/tests/nanoleak_supp.stderr.exp
@@ -0,0 +1,17 @@
+
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 1000 bytes in 1 blocks.
+malloc/free: 1 allocs, 0 frees, 1000 bytes allocated.
+For counts of detected errors, rerun with: -v
+searching for pointers to 1 not-freed blocks.
+checked ... bytes.
+
+LEAK SUMMARY:
+   definitely lost: 0 bytes in 0 blocks.
+   possibly lost:   0 bytes in 0 blocks.
+   still reachable: 0 bytes in 0 blocks.
+        suppressed: 1000 bytes in 1 blocks.
+Reachable blocks (those to which a pointer was found) are not shown.
+To see them, rerun with: --show-reachable=yes
+
diff --git a/memcheck/tests/nanoleak_supp.vgtest b/memcheck/tests/nanoleak_supp.vgtest
new file mode 100644
index 0000000..766099c
--- /dev/null
+++ b/memcheck/tests/nanoleak_supp.vgtest
@@ -0,0 +1,3 @@
+vgopts: --leak-check=yes --suppressions=nanoleak.supp
+prog: nanoleak
+stderr_filter: filter_leak_check_size
diff --git a/memcheck/tests/overlap.c b/memcheck/tests/overlap.c
new file mode 100644
index 0000000..04d2e37
--- /dev/null
+++ b/memcheck/tests/overlap.c
@@ -0,0 +1,115 @@
+#include <string.h>
+#include <stdio.h>
+
+char b[50];
+
+void reset_b(void)
+{
+   int i;
+
+   for (i = 0; i < 50; i++)
+      b[i] = '_';
+   b[49] = '\0';
+}
+
+void reset_b2(void)
+{
+   reset_b();
+   strcpy(b, "ABCDEFG");
+}
+
+int main(void)
+{
+   char x[100];
+   char a[] = "abcdefghijklmnopqrstuvwxyz";
+   int  i;
+
+   /* testing memcpy/strcpy overlap */
+
+   for (i = 0; i < 50; i++) {
+      x[i] = i+1;    // don't put any zeroes in there
+   }
+   for (i = 50; i < 100; i++) {
+      // because of the errors, the strcpy's will overrun, so put some
+      // zeroes in the second half to stop them eventually
+      x[i] = 0;
+
+   }
+
+   memcpy(x+20, x, 20);    // ok
+   memcpy(x+20, x, 21);    // overlap
+   memcpy(x, x+20, 20);    // ok
+   memcpy(x, x+20, 21);    // overlap
+
+   strncpy(x+20, x, 20);    // ok
+   strncpy(x+20, x, 21);    // overlap
+   strncpy(x, x+20, 20);    // ok
+   strncpy(x, x+20, 21);    // overlap
+   
+   x[39] = '\0';
+   strcpy(x, x+20);    // ok
+
+   x[39] = 39;
+   x[40] = '\0';
+   strcpy(x, x+20);    // overlap
+
+   x[19] = '\0';
+   strcpy(x+20, x);    // ok
+
+/*
+   x[19] = 19;
+   x[20] = '\0';
+   strcpy(x+20, x);    // overlap, but runs forever (or until it seg faults)
+*/
+
+   /* testing strcpy, strncpy() */
+
+   reset_b();
+   printf("`%s'\n", b);
+
+   strcpy(b, a);
+   printf("`%s'\n", b);
+   
+   reset_b();
+   strncpy(b, a, 25);
+   printf("`%s'\n", b);
+
+   reset_b();
+   strncpy(b, a, 26);
+   printf("`%s'\n", b);
+
+   reset_b();
+   strncpy(b, a, 27);
+   printf("`%s'\n", b);
+
+   printf("\n");
+
+   /* testing strcat, strncat() */
+
+   reset_b2();
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strcat(b, a);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 25);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 26);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 27);
+   printf("`%s'\n", b);
+
+   /* Nb: can't actually get strcat warning -- if any overlap occurs, it will
+      always run forever, I think... */
+
+   strncat(a+20, a, 21);
+   strncat(a, a+20, 21);
+
+   return 0;
+}
diff --git a/memcheck/tests/overlap.stderr.exp b/memcheck/tests/overlap.stderr.exp
new file mode 100644
index 0000000..ed33593
--- /dev/null
+++ b/memcheck/tests/overlap.stderr.exp
@@ -0,0 +1,14 @@
+
+Warning: src and dst overlap in memcpy(0x........, 0x........, 21)
+Warning: src and dst overlap in memcpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strncpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strncpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strcpy(0x........, 0x........)
+Warning: src and dst overlap in strncat(0x........, 0x........, 21)
+Warning: src and dst overlap in strncat(0x........, 0x........, 21)
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 0 bytes in 0 blocks.
+malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
+For a detailed leak analysis,  rerun with: --leak-check=yes
+For counts of detected errors, rerun with: -v
diff --git a/memcheck/tests/overlap.stdout.exp b/memcheck/tests/overlap.stdout.exp
new file mode 100644
index 0000000..12cb02e
--- /dev/null
+++ b/memcheck/tests/overlap.stdout.exp
@@ -0,0 +1,11 @@
+`_________________________________________________'
+`abcdefghijklmnopqrstuvwxyz'
+`abcdefghijklmnopqrstuvwxy________________________'
+`abcdefghijklmnopqrstuvwxyz_______________________'
+`abcdefghijklmnopqrstuvwxyz'
+
+`ABCDEFG'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
+`ABCDEFGabcdefghijklmnopqrstuvwxy'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
diff --git a/memcheck/tests/overlap.vgtest b/memcheck/tests/overlap.vgtest
new file mode 100644
index 0000000..7d0d75e
--- /dev/null
+++ b/memcheck/tests/overlap.vgtest
@@ -0,0 +1 @@
+prog: overlap
diff --git a/memcheck/tests/suppfree.stderr.exp b/memcheck/tests/suppfree.stderr.exp
index 149bf84..5f4f4d5 100644
--- a/memcheck/tests/suppfree.stderr.exp
+++ b/memcheck/tests/suppfree.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: ddd (suppfree.c:7)
    by 0x........: ccc (suppfree.c:12)
    by 0x........: bbb (suppfree.c:17)
    Address 0x........ is 0 bytes inside a block of size 10 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: ddd (suppfree.c:6)
    by 0x........: ccc (suppfree.c:12)
    by 0x........: bbb (suppfree.c:17)
diff --git a/memcheck/tests/trivialleak.stderr.exp b/memcheck/tests/trivialleak.stderr.exp
index 42cd261..77e0a60 100644
--- a/memcheck/tests/trivialleak.stderr.exp
+++ b/memcheck/tests/trivialleak.stderr.exp
@@ -8,7 +8,7 @@
 checked ... bytes.
 
 1000 bytes in 1000 blocks are definitely lost in loss record 1 of 1
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: test (trivialleak.c:8)
    by 0x........: main (trivialleak.c:12)
    by 0x........: __libc_start_main (...libc...)