-----------------------------------------------------------------------------
overview
-----------------------------------------------------------------------------
Previously Valgrind had its own versions of malloc() et al that replaced
glibc's.  This is necessary for various reasons for Memcheck, but wasn't needed
by, and was actually detrimental to, some other skins.  I never managed to
handle this satisfactorily w.r.t. the core/skin split.

Now I have.  If a skin needs to know about malloc() et al, it must provide its
own replacements.  But because this is not uncommon, the core provides a module
vg_replace_malloc.c which a skin can link with, which provides skeleton
definitions, to reduce the amount of work a skin must do.  The skeletons handle
the transfer of control from the simd CPU to the real CPU, and also the
--alignment, --sloppy-malloc and --trace-malloc options.  These skeleton
definitions subsequently call functions SK_(malloc), SK_(free), etc, which the
skin must define;  in these functions the skin can do the things it needs to do
about tracking heap blocks.
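
As a rough sketch of the shape described above (not the actual
vg_replace_malloc.c code): a skeleton deals with the options and then hands
over to the skin's function.  Every name here (skeleton_malloc, skin_malloc,
opt_sloppy_malloc, skin_last_size) is a hypothetical stand-in, and the simd
CPU to real CPU transfer is reduced to a plain function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the --sloppy-malloc option. */
static int opt_sloppy_malloc = 0;

/* Hypothetical stand-in for SK_(malloc): the skin does its heap-block
   tracking here, then performs the real allocation. */
static size_t skin_last_size = 0;

static void* skin_malloc(size_t n)
{
   skin_last_size = n;   /* record what the skin was asked for */
   return malloc(n);     /* stands in for VG_(cli_malloc)() */
}

/* Hypothetical skeleton: handles the common checks and options, then
   passes control to the skin. */
void* skeleton_malloc(int n)
{
   if (n < 0)
      return NULL;                 /* "silly" negative size */
   if (opt_sloppy_malloc) {
      while ((n % 4) > 0) n++;     /* round up to a multiple of 4 */
   }
   assert(!opt_sloppy_malloc || (n % 4) == 0);
   return skin_malloc((size_t)n);
}
```

A skin written this way only has to supply skin_malloc();  the option handling
stays in one place.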

For skins that track extra info about malloc'd blocks -- previously done with
ShadowChunks -- there is a new file vg_hashtable.c that implements a
generic-ish hash table (using dodgy C-style inheritance using struct overlays)
which allows skins to continue doing this fairly easily.
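
To illustrate the "inheritance" idea, here is a minimal sketch.  It is not the
vg_hashtable.c interface:  all names (HashNode, HashTable, HeapBlock,
table_init, table_add, table_find, N_BUCKETS) are made up for the example.
It also uses the stricter variant of the trick, embedding the base node as the
first member rather than relying on two structs having matching layouts,
because C guarantees that a pointer to a struct can be converted to a pointer
to its first member and back.

```c
#include <assert.h>
#include <stddef.h>

#define N_BUCKETS 64   /* hypothetical size, picked only for this sketch */

/* "Base class": the only part the generic table knows about. */
typedef struct HashNode {
   struct HashNode* next;
   unsigned long    key;
} HashNode;

typedef struct {
   HashNode* buckets[N_BUCKETS];
} HashTable;

/* "Derived class" a skin might define for its heap blocks. */
typedef struct {
   HashNode node;   /* must stay the first member */
   size_t   size;   /* skin-specific extra info */
} HeapBlock;

void table_init(HashTable* t)
{
   int i;
   assert(t != NULL);
   for (i = 0; i < N_BUCKETS; i++) t->buckets[i] = NULL;
}

/* Insert at the head of the bucket chain selected by the key. */
void table_add(HashTable* t, HashNode* n)
{
   unsigned long b;
   assert(t != NULL && n != NULL);
   b = n->key % N_BUCKETS;
   n->next = t->buckets[b];
   t->buckets[b] = n;
}

/* Returns the node with the given key, or NULL if there is none. */
HashNode* table_find(const HashTable* t, unsigned long key)
{
   HashNode* n;
   assert(t != NULL);
   for (n = t->buckets[key % N_BUCKETS]; n != NULL; n = n->next)
      if (n->key == key) return n;
   return NULL;
}
```

A skin would store &block->node in the table and cast the result of
table_find() back to HeapBlock* to reach its own fields.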

Skins can also replace other functions, e.g. Memcheck has its own versions of
strcpy(), memcpy(), etc.

Overall, it's slightly more work now for skins that need to replace malloc(),
but other skins don't have to use Valgrind's malloc(), so they're getting a
"purer" program run, which is good, and most of the remaining rough edges from
the core/skin split have been removed.

-----------------------------------------------------------------------------
details
-----------------------------------------------------------------------------
Moved malloc() et al intercepts from vg_clientfuncs.c into vg_replace_malloc.c.
Skins can link to it if they want to replace malloc() and friends;  it handles
the common options and the simd-to-real CPU transfer, then passes control to
SK_(malloc)() et al, which the skin must define.  These can call
VG_(cli_malloc)() and VG_(cli_free)() to do the actual allocation/deallocation.
Redzone size for the client (the CLIENT arena) is specified by the static
variable VG_(vg_malloc_redzone_szB).  vg_replace_malloc.c thus represents a
kind of "mantle" level service.

To get automake to build vg_replace_malloc.o, had to resort to a similar trick
as used for the demangler -- ask for a "no install" library (which is never
used) to be built from it.

Note that malloc, calloc, realloc, builtin_new, builtin_vec_new and memalign
all now honour --alignment, whether running on the simd CPU or the real CPU.

Because the core doesn't control malloc() any more, the new_mem_heap,
die_mem_heap, copy_mem_heap and ban_mem_heap events no longer exist;  skins
can watch for these events themselves.

This required moving all the ShadowChunk stuff out of the core, which meant
the sizeof_shadow_block ``need'' could be removed, yay -- it was a horrible
hack.  Now ShadowChunks are done with a generic HashTable type, in
vg_hashtable.c, which skins can "inherit from" (in a dodgy C-only fashion by
using structs with similar layouts).  Also, the free_list stuff was all moved
as a part of this.  Also, VgAllocKind was moved out of core into
Memcheck/Addrcheck and renamed MAC_AllocKind.

Moved these options out of core into vg_replace_malloc.c:
    --trace-malloc
    --sloppy-malloc
    --alignment

The alternative_free ``need'' could go, too, since Memcheck is now in complete
control of free(), yay -- another horribility.

The bad_free and free_mismatch events could go too, since they're now not
detected by core, yay -- yet another horribility.

Moved malloc() et al wrappers for Memcheck out of vg_clientmalloc.c into
mac_malloc_wrappers.c.  Helgrind has its own wrappers now too.

Introduced VG_USERREQ__CLIENT_CALL[123] client requests.  When a skin function
is running on the simd CPU, these call a given function and run it on the real
CPU.  The macros VG_NON_SIMD_CALL[123] in valgrind.h present a cleaner
interface to actually use.  Also introduced analogues of these that pass 'tst'
from the scheduler as the first arg to the called function -- needed for
MC_(client_malloc)() et al.
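
The general idea of a "call this function with 1, 2 or 3 args" request can be
sketched as below.  This is only an illustration of the dispatch pattern, not
the real client-request encoding:  CallRequest, handle_call_request and the
two sample functions are all hypothetical, and no simd/real CPU switch takes
place.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long Word;

typedef Word (*Fn1)(Word);
typedef Word (*Fn2)(Word, Word);
typedef Word (*Fn3)(Word, Word, Word);

/* Hypothetical request record: which function to run, and its args. */
typedef struct {
   int  n_args;                            /* 1, 2 or 3 */
   union { Fn1 f1; Fn2 f2; Fn3 f3; } fn;   /* selected by n_args */
   Word args[3];
} CallRequest;

/* The side that receives the request picks the matching arity and
   makes the call on behalf of the requester. */
Word handle_call_request(const CallRequest* r)
{
   assert(r != NULL);
   switch (r->n_args) {
      case 1:  return r->fn.f1(r->args[0]);
      case 2:  return r->fn.f2(r->args[0], r->args[1]);
      case 3:  return r->fn.f3(r->args[0], r->args[1], r->args[2]);
      default: return 0;                   /* unknown request: do nothing */
   }
}

/* Sample functions to be called through the request mechanism. */
Word twice(Word a)        { return 2 * a; }
Word add2(Word a, Word b) { return a + b; }
```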

Fiddled with USERREQ_{MALLOC,FREE} etc. in vg_scheduler.c; they call
SK_({malloc,free})() which by default call VG_(cli_malloc)() -- can't call
glibc's malloc() here.  All the other default SK_(calloc)() etc. instantly
panic; there's a lock variable to ensure that the default SK_({malloc,free})()
are only called from the scheduler, which prevents a skin from forgetting to
override SK_({malloc,free})().  Got rid of the unused USERREQ_CALLOC,
USERREQ_BUILTIN_NEW, etc.

Moved the special versions of strcpy(), strlen(), etc., plus memcpy() and
memchr(), into mac_replace_strmem.c -- they are only necessary for Memcheck,
because the hyper-optimised normal glibc versions confuse it, and because
memcpy() etc. need overlap checking.

Also added dst/src overlap checks to strcpy(), memcpy(), strcat().  They are
reported not as proper errors, but just with single line warnings, as for silly
args to malloc() et al;  this is mainly because they're on the simulated CPU
and proper error handling would be a pain;  hopefully they're rare enough to
not be a problem.  The strcpy check is done after the copy, because doing it
beforehand would require counting the length of the string first.  Also added
strncpy() and strncat(), which have overlap checks too.  Note that Addrcheck
doesn't do overlap checking.
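
A minimal sketch of such a check, assuming nothing about the real code in
mac_replace_strmem.c:  ranges_overlap and checked_strcpy are hypothetical
names, and the overlap result is returned through a flag instead of being
printed as a warning.  Like the strcpy check described above, the test runs
after the copy, once the length is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [dst, dst+dst_len) and [src, src+src_len) share a byte. */
int ranges_overlap(const void* dst, size_t dst_len,
                   const void* src, size_t src_len)
{
   uintptr_t d = (uintptr_t)dst;
   uintptr_t s = (uintptr_t)src;
   if (dst_len == 0 || src_len == 0) return 0;
   if (d < s) return d + dst_len > s;
   if (s < d) return s + src_len > d;
   return 1;   /* same start address, both ranges non-empty */
}

/* Copies src to dst, then sets *overlapped.  The length is only known
   once the terminator has been reached, hence the check comes last.
   Caveat: if dst starts inside src, the copy itself can run away before
   the check happens;  this sketch does not guard against that. */
char* checked_strcpy(char* dst, const char* src, int* overlapped)
{
   const char* src_orig = src;
   char*       dst_orig = dst;
   size_t      len;
   assert(dst != NULL && src != NULL && overlapped != NULL);
   while (*src) *dst++ = *src++;
   *dst = 0;
   len = (size_t)(src - src_orig) + 1;   /* bytes copied, incl. the NUL */
   *overlapped = ranges_overlap(dst_orig, len, src_orig, len);
   return dst_orig;
}
```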

Put USERREQ__LOGMESSAGE in vg_skin.h to do the overlap check error messages.

After moving malloc() et al and strcpy() et al out of vg_clientfuncs.c, moved
the remaining three things (sigsuspend, VG_(__libc_freeres_wrapper),
__errno_location) into vg_intercept.c, since it too contains things that run
on the simulated CPU.  Removed vg_clientfuncs.c altogether.

Moved regression test "malloc3" out of corecheck into memcheck, since corecheck
no longer looks for silly (eg. negative) args to malloc().

Removed the m_eip, m_esp, m_ebp fields from the `Error' type.  They were set
up and then read exactly once, immediately afterwards, and only if GDB
attachment was done.  So now they're just held in local variables.  This saves
12 bytes per Error.

Made replacement calloc() check for --sloppy-malloc;  previously it didn't.

Added "silly" negative size arg check to realloc();  it didn't have one.

Changed VG_(read_selfprocmaps)() so it can parse the file directly, or parse a
previously read buffer.  The buffer can be filled with the new
VG_(read_selfprocmaps_contents)().  This is now used at start-up to snapshot
/proc/self/maps before the skins do anything;  the snapshot is parsed once they
have done their setup stuff.  Skins can now safely call VG_(malloc)() in
SK_({pre,post}_clo_init)() without the mmap'd superblock erroneously being
identified as client memory.
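
The "parse from a previously read buffer" half can be sketched like this.  It
is not the VG_(read_selfprocmaps)() implementation:  parse_maps_buffer,
MapCallback and add_region_size are hypothetical, and only the address range
and permissions of each line are extracted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: invoked once per parsed mapping. */
typedef void (*MapCallback)(unsigned long start, unsigned long end,
                            const char* perms, void* opaque);

/* Walks maps-format text held in buf, one line at a time, and returns
   the number of lines that parsed.  Because it works on a snapshot, any
   mappings created after the snapshot was taken are not seen. */
int parse_maps_buffer(const char* buf, MapCallback cb, void* opaque)
{
   int count = 0;
   assert(buf != NULL);
   while (*buf != '\0') {
      unsigned long start, end;
      char          perms[5];
      const char*   eol;
      if (isxdigit((unsigned char)*buf)
          && sscanf(buf, "%lx-%lx %4s", &start, &end, perms) == 3) {
         if (cb != NULL) cb(start, end, perms, opaque);
         count++;
      }
      eol = strchr(buf, '\n');
      if (eol == NULL) break;
      buf = eol + 1;
   }
   return count;
}

/* Example callback: adds up the sizes of all mappings seen. */
void add_region_size(unsigned long start, unsigned long end,
                     const char* perms, void* opaque)
{
   (void)perms;
   *(unsigned long*)opaque += end - start;
}
```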

Changed the --help usage message slightly;  it is now divided into four
sections: core normal, skin normal, core debugging, skin debugging.  Changed
the interface for the command_line_options need slightly -- now two functions,
SK_(print_usage)() and SK_(print_debug_usage)(), and they do the printing
themselves, instead of just returning a string -- that's more flexible.

Removed DEBUG_CLIENTMALLOC code;  it wasn't being used and was a pain.

Added a regression test testing leak suppressions (nanoleak_supp), and another
testing strcpy/memcpy/etc overlap warnings (overlap).

Also changed Addrcheck to link with the files shared with Memcheck, rather than
#including the .c files directly.

Commoned up a little more shared Addrcheck/Memcheck code, for the usage
message, and initialisation/finalisation.

Added a Bool param to VG_(unique_error)() dictating whether it should allow
GDB to be attached.  This is for leak checks, where we don't want to attach
GDB because doing so causes seg faults.  A bit hacky, but it will do.

Had to change lots of the expected outputs from regression files now that
malloc() et al are in vg_replace_malloc.c rather than vg_clientfuncs.c.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1524 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/Makefile.am b/Makefile.am
index a8769cc..751b099 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,9 +1,12 @@
 
 AUTOMAKE_OPTIONS = 1.5
 
+## coregrind must come before memcheck, addrcheck, helgrind, for
+##   vg_replace_malloc.o.
+## addrcheck must come after memcheck, for mac_*.o
 SUBDIRS = 	coregrind . docs tests include auxprogs \
-		addrcheck \
 		memcheck \
+		addrcheck \
 		cachegrind \
 		corecheck \
 		helgrind \
diff --git a/addrcheck/Makefile.am b/addrcheck/Makefile.am
index 1fd6b38..d0f49a1 100644
--- a/addrcheck/Makefile.am
+++ b/addrcheck/Makefile.am
@@ -1,7 +1,7 @@
 
 SUBDIRS = . docs tests
 
-# include memcheck/ for mc_common.{c,h}
+# include memcheck/ for mac_shared.h
 INCLUDES = -I$(top_srcdir)/include -I$(top_srcdir)/memcheck
 
 CFLAGS = $(WERROR) -DVG_LIBDIR="\"$(libdir)"\" \
@@ -13,4 +13,9 @@
 
 vgskin_addrcheck_so_SOURCES = ac_main.c
 vgskin_addrcheck_so_LDFLAGS = -shared
+vgskin_addrcheck_so_LDADD = \
+	../memcheck/mac_leakcheck.o \
+	../memcheck/mac_malloc_wrappers.o \
+	../memcheck/mac_needs.o \
+	../coregrind/vg_replace_malloc.o
 
diff --git a/addrcheck/ac_main.c b/addrcheck/ac_main.c
index fc13efc..f144218 100644
--- a/addrcheck/ac_main.c
+++ b/addrcheck/ac_main.c
@@ -34,10 +34,6 @@
 #include "memcheck.h"
 //#include "vg_profile.c"
 
-#include "mac_leakcheck.c"
-#include "mac_needs.c"
-
-
 
 VG_DETERMINE_INTERFACE_VERSION
 
@@ -1154,18 +1150,14 @@
    return MAC_(process_common_cmd_line_option)(arg);
 }
 
-Char* SK_(usage)(void)
+void SK_(print_usage)(void)
 {  
-   return  
-"    --partial-loads-ok=no|yes too hard to explain here; see manual [yes]\n"
-"    --freelist-vol=<number>   volume of freed blocks queue [1000000]\n"
-"    --leak-check=no|yes       search for memory leaks at exit? [no]\n"
-"    --leak-resolution=low|med|high\n"
-"                              amount of bt merging in leak check [low]\n"
-"    --show-reachable=no|yes   show reachable blocks in leak check? [no]\n"
-"    --workaround-gcc296-bugs=no|yes  self explanatory [no]\n"
-"\n"
-"    --cleanup=no|yes          improve after instrumentation? [yes]\n";
+   MAC_(print_common_usage)();
+}
+
+void SK_(print_debug_usage)(void)
+{  
+   MAC_(print_common_debug_usage)();
 }
 
 
@@ -1186,19 +1178,28 @@
    VG_(needs_core_errors)         ();
    VG_(needs_skin_errors)         ();
    VG_(needs_libc_freeres)        ();
-   VG_(needs_sizeof_shadow_block) ( 1 );
    VG_(needs_command_line_options)();
    VG_(needs_client_requests)     ();
    VG_(needs_syscall_wrapper)     ();
-   VG_(needs_alternative_free)    ();
    VG_(needs_sanity_checks)       ();
 
+   MAC_( new_mem_heap)             = & ac_new_mem_heap;
+   MAC_( ban_mem_heap)             = & ac_make_noaccess;
+   MAC_(copy_mem_heap)             = & ac_copy_address_range_state;
+   MAC_( die_mem_heap)             = & ac_make_noaccess;
+
    VG_(track_new_mem_startup)      ( & ac_new_mem_startup );
-   VG_(track_new_mem_heap)         ( & ac_new_mem_heap );
    VG_(track_new_mem_stack_signal) ( & ac_make_accessible );
    VG_(track_new_mem_brk)          ( & ac_make_accessible );
    VG_(track_new_mem_mmap)         ( & ac_set_perms );
    
+   VG_(track_copy_mem_remap)       ( & ac_copy_address_range_state );
+   VG_(track_change_mem_mprotect)  ( & ac_set_perms );
+      
+   VG_(track_die_mem_stack_signal) ( & ac_make_noaccess ); 
+   VG_(track_die_mem_brk)          ( & ac_make_noaccess );
+   VG_(track_die_mem_munmap)       ( & ac_make_noaccess ); 
+
    VG_(track_new_mem_stack_4)      ( & MAC_(new_mem_stack_4)  );
    VG_(track_new_mem_stack_8)      ( & MAC_(new_mem_stack_8)  );
    VG_(track_new_mem_stack_12)     ( & MAC_(new_mem_stack_12) );
@@ -1206,18 +1207,6 @@
    VG_(track_new_mem_stack_32)     ( & MAC_(new_mem_stack_32) );
    VG_(track_new_mem_stack)        ( & MAC_(new_mem_stack)    );
 
-   VG_(track_copy_mem_heap)        ( & ac_copy_address_range_state );
-   VG_(track_copy_mem_remap)       ( & ac_copy_address_range_state );
-   VG_(track_change_mem_mprotect)  ( & ac_set_perms );
-      
-   VG_(track_ban_mem_heap)         ( & ac_make_noaccess );
-   VG_(track_ban_mem_stack)        ( & ac_make_noaccess );
-
-   VG_(track_die_mem_heap)         ( & ac_make_noaccess );
-   VG_(track_die_mem_stack_signal) ( & ac_make_noaccess ); 
-   VG_(track_die_mem_brk)          ( & ac_make_noaccess );
-   VG_(track_die_mem_munmap)       ( & ac_make_noaccess ); 
-
    VG_(track_die_mem_stack_4)      ( & MAC_(die_mem_stack_4)  );
    VG_(track_die_mem_stack_8)      ( & MAC_(die_mem_stack_8)  );
    VG_(track_die_mem_stack_12)     ( & MAC_(die_mem_stack_12) );
@@ -1225,8 +1214,7 @@
    VG_(track_die_mem_stack_32)     ( & MAC_(die_mem_stack_32) );
    VG_(track_die_mem_stack)        ( & MAC_(die_mem_stack)    );
    
-   VG_(track_bad_free)             ( & MAC_(record_free_error) );
-   VG_(track_mismatched_free)      ( & MAC_(record_freemismatch_error) );
+   VG_(track_ban_mem_stack)        ( & ac_make_noaccess );
 
    VG_(track_pre_mem_read)         ( & ac_check_is_readable );
    VG_(track_pre_mem_read_asciiz)  ( & ac_check_is_readable_asciiz );
@@ -1243,7 +1231,7 @@
    VGP_(register_profile_event) ( VgpESPAdj,   "adjust-ESP" );
 
    init_shadow_memory();
-   MAC_(init_prof_mem)();
+   MAC_(common_pre_clo_init)();
 }
 
 void SK_(post_clo_init) ( void )
@@ -1252,19 +1240,7 @@
 
 void SK_(fini) ( void )
 {
-   VG_(print_malloc_stats)();
-
-   if (VG_(clo_verbosity) == 1) {
-      if (!MAC_(clo_leak_check))
-         VG_(message)(Vg_UserMsg, 
-             "For a detailed leak analysis,  rerun with: --leak-check=yes");
-
-      VG_(message)(Vg_UserMsg, 
-                   "For counts of detected errors, rerun with: -v");
-   }
-   if (MAC_(clo_leak_check)) ac_detect_memory_leaks();
-
-   MAC_(done_prof_mem)();
+   MAC_(common_fini)( ac_detect_memory_leaks );
 }
 
 /*--------------------------------------------------------------------*/
diff --git a/cachegrind/cg_main.c b/cachegrind/cg_main.c
index 0057de8..ae9f98d 100644
--- a/cachegrind/cg_main.c
+++ b/cachegrind/cg_main.c
@@ -1922,12 +1922,20 @@
    return True;
 }
 
-Char* SK_(usage)(void)
+void SK_(print_usage)(void)
 {
-   return 
+   VG_(printf)(
 "    --I1=<size>,<assoc>,<line_size>  set I1 cache manually\n"
 "    --D1=<size>,<assoc>,<line_size>  set D1 cache manually\n"
-"    --L2=<size>,<assoc>,<line_size>  set L2 cache manually\n";
+"    --L2=<size>,<assoc>,<line_size>  set L2 cache manually\n"
+   );
+}
+
+void SK_(print_debug_usage)(void)
+{
+   VG_(printf)(
+"    (none)\n"
+   );
 }
 
 /*--------------------------------------------------------------------*/
diff --git a/cachegrind/docs/cg_main.html b/cachegrind/docs/cg_main.html
index 0a9d295..6449e81 100644
--- a/cachegrind/docs/cg_main.html
+++ b/cachegrind/docs/cg_main.html
@@ -672,12 +672,8 @@
   <li>It doesn't account for cache misses not visible at the instruction level,
       eg. those arising from TLB misses, or speculative execution.</li><p>
 
-  <li>Valgrind's custom <code>malloc()</code> will allocate memory in different
-      ways to the standard <code>malloc()</code>, which could warp the results.
-      </li><p>
-
   <li>Valgrind's custom threads implementation will schedule threads
-      differently to the standard one.  This too could warp the results for
+      differently to the standard one.  This could warp the results for
       threaded programs.
       </li><p>
 
diff --git a/configure.in b/configure.in
index 1a4b81b..d00fbe4 100644
--- a/configure.in
+++ b/configure.in
@@ -1,5 +1,5 @@
 # Process this file with autoconf to produce a configure script.
-AC_INIT(coregrind/vg_clientmalloc.c)   # give me a source file, any source file...
+AC_INIT(coregrind/vg_main.c)   # give me a source file, any source file...
 AM_CONFIG_HEADER(config.h)
 AM_INIT_AUTOMAKE(valgrind, 1.9.4)
 
diff --git a/corecheck/tests/Makefile.am b/corecheck/tests/Makefile.am
index ef407e2..44c9d24 100644
--- a/corecheck/tests/Makefile.am
+++ b/corecheck/tests/Makefile.am
@@ -10,24 +10,18 @@
 
 EXTRA_DIST = \
 	$(noinst_SCRIPTS) \
-	erringfds.stderr.exp erringfds.stdout.exp \
-	erringfds.vgtest \
-	malloc3.stderr.exp malloc3.stdout.exp \
-	malloc3.vgtest \
-	pth_atfork1.stderr.exp \
-	pth_atfork1.stdout.exp pth_atfork1.vgtest \
+	erringfds.stderr.exp erringfds.stdout.exp erringfds.vgtest \
+	pth_atfork1.stderr.exp pth_atfork1.stdout.exp pth_atfork1.vgtest \
 	pth_cancel2.stderr.exp pth_cancel2.vgtest \
-	pth_cvsimple.stderr.exp \
-	pth_cvsimple.stdout.exp pth_cvsimple.vgtest \
+	pth_cvsimple.stderr.exp pth_cvsimple.stdout.exp pth_cvsimple.vgtest \
 	pth_empty.stderr.exp pth_empty.vgtest \
 	pth_mutexspeed.stderr.exp \
 	pth_mutexspeed.stdout.exp pth_mutexspeed.vgtest \
-	pth_once.stderr.exp pth_once.stdout.exp \
-	pth_once.vgtest \
+	pth_once.stderr.exp pth_once.stdout.exp pth_once.vgtest \
 	sigkill.stderr.exp sigkill.vgtest
 
 noinst_PROGRAMS = \
-	erringfds malloc3 sigkill \
+	erringfds sigkill \
 	pth_atfork1 pth_cancel2 pth_cvsimple pth_empty \
 	pth_mutexspeed pth_once
 
@@ -36,7 +30,6 @@
 
 # C ones
 erringfds_SOURCES 	= erringfds.c
-malloc3_SOURCES 	= malloc3.c
 sigkill_SOURCES 	= sigkill.c
 
 # Pthread ones
diff --git a/corecheck/tests/malloc3.stderr.exp b/corecheck/tests/malloc3.stderr.exp
deleted file mode 100644
index 97c1780..0000000
--- a/corecheck/tests/malloc3.stderr.exp
+++ /dev/null
@@ -1,6 +0,0 @@
-
-Warning: silly arg (-1) to malloc()
-Warning: silly args (0,-1) to calloc()
-Warning: silly args (-1,-1) to calloc()
-
-ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
diff --git a/coregrind/Makefile.am b/coregrind/Makefile.am
index 5588677..1170d2c 100644
--- a/coregrind/Makefile.am
+++ b/coregrind/Makefile.am
@@ -35,15 +35,14 @@
 valgrinq_so_LDFLAGS = -shared
 
 valgrind_so_SOURCES = \
-	vg_clientfuncs.c \
 	vg_scheduler.c \
-	vg_clientmalloc.c \
 	vg_default.c \
 	vg_demangle.c \
 	vg_dispatch.S \
 	vg_errcontext.c \
 	vg_execontext.c \
 	vg_from_ucode.c \
+	vg_hashtable.c \
 	vg_helpers.S \
 	vg_instrument.c \
 	vg_intercept.c \
@@ -71,27 +70,22 @@
 	demangle/dyn-string.o \
 	demangle/safe-ctype.o
 
+## Build a .a library, but we don't actually use it;  just a ploy to ensure
+## vg_replace_malloc.o is built.
+noinst_LIBRARIES = lib_replace_malloc.a
+
+lib_replace_malloc_a_SOURCES = vg_replace_malloc.c
+
 noinst_HEADERS = \
         vg_kerneliface.h        \
         vg_include.h            \
         vg_constants.h          \
-        vg_unsafe.h
+	vg_unsafe.h
 
 MANUAL_DEPS = $(noinst_HEADERS) $(include_HEADERS) 
 
 vg_memory.o: vg_memory.c $(MANUAL_DEPS)
 	$(COMPILE) -O2 @PREFERRED_STACK_BOUNDARY@ -c $<
 
-vg_intercept.o vg_libpthread.o vg_clientfuncs.o: CFLAGS += -fno-omit-frame-pointer
+vg_intercept.o vg_libpthread.o vg_replace_malloc.o: CFLAGS += -fno-omit-frame-pointer
 
-##valgrind.so$(EXEEXT): $(valgrind_so_OBJECTS)
-##	$(CC) $(CFLAGS) $(LDFLAGS) -shared -o valgrind.so \
-##		$(valgrind_so_OBJECTS) $(valgrind_so_LDADD)
-
-##valgrinq.so$(EXEEXT): $(valgrinq_so_OBJECTS)
-##	$(CC) $(CFLAGS) -shared -o valgrinq.so $(valgrinq_so_OBJECTS)
-
-##libpthread.so$(EXEEXT): $(libpthread_so_OBJECTS) $(srcdir)/vg_libpthread.vs
-##	$(CC) -Wall -Werror -g -O -shared -fpic -o libpthread.so \
-##		$(libpthread_so_OBJECTS) \
-##		-Wl,-version-script $(srcdir)/vg_libpthread.vs
diff --git a/coregrind/vg_clientfuncs.c b/coregrind/vg_clientfuncs.c
deleted file mode 100644
index 6bda8ec..0000000
--- a/coregrind/vg_clientfuncs.c
+++ /dev/null
@@ -1,600 +0,0 @@
-
-/*--------------------------------------------------------------------*/
-/*--- Code which runs on the simulated CPU.                        ---*/
-/*---                                             vg_clientfuncs.c ---*/
-/*--------------------------------------------------------------------*/
-
-/*
-   This file is part of Valgrind, an extensible x86 protected-mode
-   emulator for monitoring program execution on x86-Unixes.
-
-   Copyright (C) 2000-2002 Julian Seward 
-      jseward@acm.org
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful, but
-   WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307, USA.
-
-   The GNU General Public License is contained in the file COPYING.
-*/
-
-#include "vg_include.h"
-
-/* Sidestep the normal check which disallows using valgrind.h
-   directly. */
-#define __VALGRIND_SOMESKIN_H
-#include "valgrind.h"   /* for VALGRIND_MAGIC_SEQUENCE */
-
-
-/* ---------------------------------------------------------------------
-   All the code in this file runs on the SIMULATED CPU.  It is
-   intended for various reasons as drop-in replacements for libc
-   functions.  These functions have global visibility (obviously) and
-   have no prototypes in vg_include.h, since they are not intended to
-   be called from within Valgrind.
-   ------------------------------------------------------------------ */
-
-/* ---------------------------------------------------------------------
-   Intercepts for the GNU malloc interface.
-   ------------------------------------------------------------------ */
-
-#define SIMPLE_REQUEST1(_qyy_request, _qyy_arg1)                 \
-   ({unsigned int _qyy_res;                                      \
-    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,    \
-                            _qyy_request,                        \
-                            _qyy_arg1, 0, 0, 0);                 \
-    _qyy_res;                                                    \
-   })
-
-#define SIMPLE_REQUEST2(_qyy_request, _qyy_arg1, _qyy_arg2)      \
-   ({unsigned int _qyy_res;                                      \
-    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,    \
-                            _qyy_request,                        \
-                            _qyy_arg1, _qyy_arg2, 0, 0);         \
-    _qyy_res;                                                    \
-   })
-
-
-/* Below are new versions of malloc, __builtin_new, free, 
-   __builtin_delete, calloc and realloc.
-
-   malloc, __builtin_new, free, __builtin_delete, calloc and realloc
-   can be entered either on the real CPU or the simulated one.  If on
-   the real one, this is because the dynamic linker is running the
-   static initialisers for C++, before starting up Valgrind itself.
-   In this case it is safe to route calls through to
-   VG_(arena_malloc)/VG_(arena_free), since they are self-initialising.
-
-   Once Valgrind is initialised, vg_running_on_simd_CPU becomes True.
-   The call needs to be transferred from the simulated CPU back to the
-   real one and routed to the vg_client_* functions.  To do that, the
-   client-request mechanism (in valgrind.h) is used to convey requests
-   to the scheduler.
-*/
-
-/* ALL calls to malloc wind up here. */
-void* malloc ( Int n )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("malloc[simd=%d](%d)", 
-                  (UInt)VG_(running_on_simd_CPU), n );
-   if (n < 0) {
-      v = NULL;
-      if (VG_(needs).core_errors)
-         VG_(message)(Vg_UserMsg, 
-                      "Warning: silly arg (%d) to malloc()", n );
-   } else {
-      if (VG_(clo_sloppy_malloc)) { while ((n % 4) > 0) n++; }
-
-      if (VG_(running_on_simd_CPU)) {
-         v = (void*)SIMPLE_REQUEST1(VG_USERREQ__MALLOC, n);
-      } else {
-         v = VG_(arena_malloc)(VG_AR_CLIENT, n);
-      }
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return (void*)v;
-}
-
-void* __builtin_new ( Int n )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("__builtin_new[simd=%d](%d)", 
-                  (UInt)VG_(running_on_simd_CPU), n );
-   if (n < 0) {
-      v = NULL;
-      if (VG_(needs).core_errors)
-         VG_(message)(Vg_UserMsg, 
-                      "Warning: silly arg (%d) to __builtin_new()", n );
-   } else {
-      if (VG_(clo_sloppy_malloc)) { while ((n % 4) > 0) n++; }
-
-      if (VG_(running_on_simd_CPU)) {
-         v = (void*)SIMPLE_REQUEST1(VG_USERREQ__BUILTIN_NEW, n);
-      } else {
-         v = VG_(arena_malloc)(VG_AR_CLIENT, n);
-      }
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return v;
-}
-
-/* gcc 3.X.X mangles them differently. */
-void* _Znwj ( Int n )
-{
-  return __builtin_new(n);
-}
-
-void* __builtin_vec_new ( Int n )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("__builtin_vec_new[simd=%d](%d)", 
-                  (UInt)VG_(running_on_simd_CPU), n );
-   if (n < 0) {
-      v = NULL;
-      if (VG_(needs).core_errors)
-         VG_(message)(Vg_UserMsg, 
-                      "Warning: silly arg (%d) to __builtin_vec_new()", n );
-   } else {
-      if (VG_(clo_sloppy_malloc)) { while ((n % 4) > 0) n++; }
-
-      if (VG_(running_on_simd_CPU)) {
-         v = (void*)SIMPLE_REQUEST1(VG_USERREQ__BUILTIN_VEC_NEW, n);
-      } else {
-         v = VG_(arena_malloc)(VG_AR_CLIENT, n);
-      }
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return v;
-}
-
-/* gcc 3.X.X mangles them differently. */
-void* _Znaj ( Int n )
-{
-  return __builtin_vec_new(n);
-}
-
-void free ( void* p )
-{
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("free[simd=%d](%p)\n", 
-                  (UInt)VG_(running_on_simd_CPU), p );
-   if (p == NULL) 
-      return;
-   if (VG_(running_on_simd_CPU)) {
-      (void)SIMPLE_REQUEST1(VG_USERREQ__FREE, p);
-   } else {
-      VG_(arena_free)(VG_AR_CLIENT, p);      
-   }
-}
-
-void __builtin_delete ( void* p )
-{
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("__builtin_delete[simd=%d](%p)\n", 
-                  (UInt)VG_(running_on_simd_CPU), p );
-   if (p == NULL) 
-      return;
-   if (VG_(running_on_simd_CPU)) {
-      (void)SIMPLE_REQUEST1(VG_USERREQ__BUILTIN_DELETE, p);
-   } else {
-      VG_(arena_free)(VG_AR_CLIENT, p);
-   }
-}
-
-/* gcc 3.X.X mangles them differently. */
-void _ZdlPv ( void* p )
-{
-  __builtin_delete(p);
-}
-
-void __builtin_vec_delete ( void* p )
-{
-   if (VG_(clo_trace_malloc))
-       VG_(printf)("__builtin_vec_delete[simd=%d](%p)\n", 
-                   (UInt)VG_(running_on_simd_CPU), p );
-   if (p == NULL) 
-      return;
-   if (VG_(running_on_simd_CPU)) {
-      (void)SIMPLE_REQUEST1(VG_USERREQ__BUILTIN_VEC_DELETE, p);
-   } else {
-      VG_(arena_free)(VG_AR_CLIENT, p);
-   }
-}
-
-/* gcc 3.X.X mangles them differently. */
-void _ZdaPv ( void* p )
-{
-  __builtin_vec_delete(p);
-}
-
-void* calloc ( Int nmemb, Int size )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("calloc[simd=%d](%d,%d)", 
-                  (UInt)VG_(running_on_simd_CPU), nmemb, size );
-   if (nmemb < 0 || size < 0) {
-      v = NULL;
-      if (VG_(needs).core_errors)
-         VG_(message)(Vg_UserMsg, "Warning: silly args (%d,%d) to calloc()", 
-                                  nmemb, size );
-   } else {
-      if (VG_(running_on_simd_CPU)) {
-         v = (void*)SIMPLE_REQUEST2(VG_USERREQ__CALLOC, nmemb, size);
-      } else {
-         v = VG_(arena_calloc)(VG_AR_CLIENT, nmemb, size);
-      }
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return v;
-}
-
-
-void* realloc ( void* ptrV, Int new_size )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("realloc[simd=%d](%p,%d)", 
-                  (UInt)VG_(running_on_simd_CPU), ptrV, new_size );
-
-   if (VG_(clo_sloppy_malloc)) 
-      { while ((new_size % 4) > 0) new_size++; }
-
-   if (ptrV == NULL)
-      return malloc(new_size);
-   if (new_size <= 0) {
-      free(ptrV);
-      if (VG_(clo_trace_malloc)) 
-         VG_(printf)(" = 0\n" );
-      return NULL;
-   }   
-   if (VG_(running_on_simd_CPU)) {
-      v = (void*)SIMPLE_REQUEST2(VG_USERREQ__REALLOC, ptrV, new_size);
-   } else {
-      v = VG_(arena_realloc)(VG_AR_CLIENT, ptrV, /*alignment*/4, new_size);
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return v;
-}
-
-
-void* memalign ( Int alignment, Int n )
-{
-   void* v;
-
-   if (VG_(clo_trace_malloc))
-      VG_(printf)("memalign[simd=%d](al %d, size %d)", 
-                  (UInt)VG_(running_on_simd_CPU), alignment, n );
-   if (n < 0) {
-      v = NULL;
-   } else {
-      if (VG_(clo_sloppy_malloc)) { while ((n % 4) > 0) n++; }
-
-      if (VG_(running_on_simd_CPU)) {
-         v = (void*)SIMPLE_REQUEST2(VG_USERREQ__MEMALIGN, alignment, n);
-      } else {
-         v = VG_(arena_malloc_aligned)(VG_AR_CLIENT, alignment, n);
-      }
-   }
-   if (VG_(clo_trace_malloc)) 
-      VG_(printf)(" = %p\n", v );
-   return (void*)v;
-}
-
-
-void* valloc ( Int size )
-{
-   return memalign(VKI_BYTES_PER_PAGE, size);
-}
-
-
-/* Various compatibility wrapper functions, for glibc and libstdc++. */
-void cfree ( void* p )
-{
-   free ( p );
-}
-
-
-int mallopt ( int cmd, int value )
-{
-   /* In glibc-2.2.4, 1 denotes a successful return value for mallopt */
-   return 1;
-}
-
-
-int __posix_memalign ( void **memptr, UInt alignment, UInt size )
-{
-    void *mem;
-
-    /* Test whether the SIZE argument is valid.  It must be a power of
-       two multiple of sizeof (void *).  */
-    if (size % sizeof (void *) != 0 || (size & (size - 1)) != 0)
-       return VKI_EINVAL /*22*/ /*EINVAL*/;
-
-    mem = memalign (alignment, size);
-
-    if (mem != NULL) {
-       *memptr = mem;
-       return 0;
-    }
-
-    return VKI_ENOMEM /*12*/ /*ENOMEM*/;
-}
-
-
-/* Bomb out if we get any of these. */
-/* HACK: We shouldn't call VG_(core_panic) or VG_(message) on the simulated
-   CPU.  Really we should pass the request in the usual way, and
-   Valgrind itself can do the panic.  Too tedious, however.  
-*/
-void pvalloc ( void )
-{ VG_(core_panic)("call to pvalloc\n"); }
-void malloc_stats ( void )
-{ VG_(core_panic)("call to malloc_stats\n"); }
-void malloc_usable_size ( void )
-{ VG_(core_panic)("call to malloc_usable_size\n"); }
-void malloc_trim ( void )
-{ VG_(core_panic)("call to malloc_trim\n"); }
-void malloc_get_state ( void )
-{ VG_(core_panic)("call to malloc_get_state\n"); }
-void malloc_set_state ( void )
-{ VG_(core_panic)("call to malloc_set_state\n"); }
-
-
-/* Yet another ugly hack.  Cannot include <malloc.h> because we
-   implement functions implemented there with different signatures.
-   This struct definition MUST match the system one. */
-
-/* SVID2/XPG mallinfo structure */
-struct mallinfo {
-   int arena;    /* total space allocated from system */
-   int ordblks;  /* number of non-inuse chunks */
-   int smblks;   /* unused -- always zero */
-   int hblks;    /* number of mmapped regions */
-   int hblkhd;   /* total space in mmapped regions */
-   int usmblks;  /* unused -- always zero */
-   int fsmblks;  /* unused -- always zero */
-   int uordblks; /* total allocated space */
-   int fordblks; /* total non-inuse space */
-   int keepcost; /* top-most, releasable (via malloc_trim) space */
-};
-
-struct mallinfo mallinfo ( void )
-{
-   /* Should really try to return something a bit more meaningful */
-   Int             i;
-   struct mallinfo mi;
-   UChar*          pmi = (UChar*)(&mi);
-   for (i = 0; i < sizeof(mi); i++)
-      pmi[i] = 0;
-   return mi;
-}
-
-
-/* ---------------------------------------------------------------------
-   Replace some C lib things with equivs which don't get
-   spurious value warnings.  THEY RUN ON SIMD CPU!
-   ------------------------------------------------------------------ */
-
-char* strrchr ( const char* s, int c )
-{
-   UChar  ch   = (UChar)((UInt)c);
-   UChar* p    = (UChar*)s;
-   UChar* last = NULL;
-   while (True) {
-      if (*p == ch) last = p;
-      if (*p == 0) return last;
-      p++;
-   }
-}
-
-char* strchr ( const char* s, int c )
-{
-   UChar  ch = (UChar)((UInt)c);
-   UChar* p  = (UChar*)s;
-   while (True) {
-      if (*p == ch) return p;
-      if (*p == 0) return NULL;
-      p++;
-   }
-}
-
-char* strcat ( char* dest, const char* src )
-{
-   Char* dest_orig = dest;
-   while (*dest) dest++;
-   while (*src) *dest++ = *src++;
-   *dest = 0;
-   return dest_orig;
-}
-
-unsigned int strlen ( const char* str )
-{
-   UInt i = 0;
-   while (str[i] != 0) i++;
-   return i;
-}
-
-char* strcpy ( char* dest, const char* src )
-{
-   Char* dest_orig = dest;
-   while (*src) *dest++ = *src++;
-   *dest = 0;
-   return dest_orig;
-}
-
-int strncmp ( const unsigned char* s1, const unsigned char* s2, 
-              unsigned int nmax )
-{
-   unsigned int n = 0;
-   while (True) {
-      if (n >= nmax) return 0;
-      if (*s1 == 0 && *s2 == 0) return 0;
-      if (*s1 == 0) return -1;
-      if (*s2 == 0) return 1;
-
-      if (*(unsigned char*)s1 < *(unsigned char*)s2) return -1;
-      if (*(unsigned char*)s1 > *(unsigned char*)s2) return 1;
-
-      s1++; s2++; n++;
-   }
-}
-
-int strcmp ( const char* s1, const char* s2 )
-{
-   register unsigned char c1;
-   register unsigned char c2;
-   while (True) {
-      c1 = *(unsigned char *)s1;
-      c2 = *(unsigned char *)s2;
-      if (c1 != c2) break;
-      if (c1 == 0) break;
-      s1++; s2++;
-   }
-   if ((unsigned char)c1 < (unsigned char)c2) return -1;
-   if ((unsigned char)c1 > (unsigned char)c2) return 1;
-   return 0;
-}
-
-void* memchr(const void *s, int c, unsigned int n)
-{
-   unsigned int i;
-   UChar c0 = (UChar)c;
-   UChar* p = (UChar*)s;
-   for (i = 0; i < n; i++)
-      if (p[i] == c0) return (void*)(&p[i]);
-   return NULL;
-}
-
-void* memcpy( void *dst, const void *src, unsigned int len )
-{
-    register char *d;
-    register char *s;
-    if ( dst > src ) {
-        d = (char *)dst + len - 1;
-        s = (char *)src + len - 1;
-        while ( len >= 4 ) {
-            *d-- = *s--;
-            *d-- = *s--;
-            *d-- = *s--;
-            *d-- = *s--;
-            len -= 4;
-	}
-        while ( len-- ) {
-            *d-- = *s--;
-        }
-    } else if ( dst < src ) {
-        d = (char *)dst;
-        s = (char *)src;
-	while ( len >= 4 ) {
-            *d++ = *s++;
-            *d++ = *s++;
-            *d++ = *s++;
-            *d++ = *s++;
-            len -= 4;
-	}
-        while ( len-- ) {
-            *d++ = *s++;
-	}
-    }
-    return dst;
-}
-
-
-/* ---------------------------------------------------------------------
-   Horrible hack to make sigsuspend() sort-of work OK.  Same trick as
-   for pause() in vg_libpthread.so.
-   ------------------------------------------------------------------ */
-
-/* Horrible because
-
-   -- uses VG_(ksigprocmask), VG_(nanosleep) and vg_assert, which are 
-      valgrind-native (not intended for client use).
-
-   -- This is here so single-threaded progs (not linking libpthread.so)
-      can see it.  But pause() should also be here.  ???
-*/
-
-/* Either libc supplies this (weak) or our libpthread.so supplies it
-   (strong) in a threaded setting. 
-*/
-extern int* __errno_location ( void );
-
-
-int sigsuspend ( /* const sigset_t * */ void* mask)
-{
-   unsigned int n_orig, n_now;
-   struct vki_timespec nanosleep_interval;
-
-   VALGRIND_MAGIC_SEQUENCE(n_orig, 0xFFFFFFFF /* default */,
-                           VG_USERREQ__GET_N_SIGS_RETURNED, 
-                           0, 0, 0, 0);
-   vg_assert(n_orig != 0xFFFFFFFF);
-
-   VG_(ksigprocmask)(VKI_SIG_SETMASK, mask, NULL);
-
-   while (1) {
-      VALGRIND_MAGIC_SEQUENCE(n_now, 0xFFFFFFFF /* default */,
-                              VG_USERREQ__GET_N_SIGS_RETURNED, 
-                              0, 0, 0, 0);
-      vg_assert(n_now != 0xFFFFFFFF);
-      vg_assert(n_now >= n_orig);
-      if (n_now != n_orig) break;
-
-      nanosleep_interval.tv_sec  = 0;
-      nanosleep_interval.tv_nsec = 53 * 1000 * 1000; /* 53 milliseconds */
-      /* It's critical here that valgrind's nanosleep implementation
-         is nonblocking. */
-      VG_(nanosleep)( &nanosleep_interval, NULL);
-   }
-
-   /* Maybe this is OK both in single and multithreaded setting. */
-   * (__errno_location()) = -VKI_EINTR; /* == EINTR; */ 
-   return -1;
-}
-
-
-/* ---------------------------------------------------------------------
-   Hook for running __libc_freeres once the program exits.
-   ------------------------------------------------------------------ */
-
-void VG_(__libc_freeres_wrapper)( void )
-{
-   int res;
-   extern void __libc_freeres(void);
-   __libc_freeres();
-   VALGRIND_MAGIC_SEQUENCE(res, 0 /* default */,
-                           VG_USERREQ__LIBC_FREERES_DONE, 0, 0, 0, 0);
-   /*NOTREACHED*/
-   vg_assert(12345+54321 == 999999);
-}
-
-
-/*--------------------------------------------------------------------*/
-/*--- end                                         vg_clientfuncs.c ---*/
-/*--------------------------------------------------------------------*/
diff --git a/coregrind/vg_clientmalloc.c b/coregrind/vg_clientmalloc.c
deleted file mode 100644
index 025de27..0000000
--- a/coregrind/vg_clientmalloc.c
+++ /dev/null
@@ -1,534 +0,0 @@
-
-/*--------------------------------------------------------------------*/
-/*--- An implementation of malloc/free for the client.             ---*/
-/*---                                            vg_clientmalloc.c ---*/
-/*--------------------------------------------------------------------*/
-
-/*
-   This file is part of Valgrind, an extensible x86 protected-mode
-   emulator for monitoring program execution on x86-Unixes.
-
-   Copyright (C) 2000-2002 Julian Seward 
-      jseward@acm.org
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful, but
-   WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307, USA.
-
-   The GNU General Public License is contained in the file COPYING.
-*/
-
-#include "vg_include.h"
-
-
-/*------------------------------------------------------------*/
-/*--- Defns                                                ---*/
-/*------------------------------------------------------------*/
-
-/* #define DEBUG_CLIENTMALLOC */
-
-/* Holds malloc'd but not freed blocks.  Static, so zero-inited by default. */
-#define VG_MALLOCLIST_NO(aa) (((UInt)(aa)) % VG_N_MALLOCLISTS)
-static ShadowChunk* vg_malloclist[VG_N_MALLOCLISTS];
-
-/* Stats ... */
-static UInt         vg_cmalloc_n_mallocs  = 0;
-static UInt         vg_cmalloc_n_frees    = 0;
-static UInt         vg_cmalloc_bs_mallocd = 0;
-
-static UInt         vg_mlist_frees = 0;
-static UInt         vg_mlist_tries = 0;
-
-
-/*------------------------------------------------------------*/
-/*--- Fns                                                  ---*/
-/*------------------------------------------------------------*/
-
-static __inline__
-Bool needs_shadow_chunks ( void )
-{
-   return VG_(needs).core_errors             ||
-          VG_(needs).alternative_free        ||
-          VG_(needs).sizeof_shadow_block > 0 ||
-          VG_(track_events).bad_free         ||
-          VG_(track_events).mismatched_free  ||
-          VG_(track_events).copy_mem_heap    ||
-          VG_(track_events).die_mem_heap;
-}
-
-#ifdef DEBUG_CLIENTMALLOC
-static 
-Int count_malloclists ( void )
-{
-   ShadowChunk* sc;
-   UInt ml_no;
-   Int  n = 0;
-
-   for (ml_no = 0; ml_no < VG_N_MALLOCLISTS; ml_no++) 
-      for (sc = vg_malloclist[ml_no]; sc != NULL; sc = sc->next)
-         n++;
-   return n;
-}
-#endif
-
-/*------------------------------------------------------------*/
-/*--- Shadow chunks, etc                                   ---*/
-/*------------------------------------------------------------*/
-
-/* Allocate a user-chunk of size bytes.  Also allocate its shadow
-   block, make the shadow block point at the user block.  Put the
-   shadow chunk on the appropriate list, and set all memory
-   protections correctly. */
-static void addShadowChunk ( ThreadState* tst,
-                             Addr p, UInt size, VgAllocKind kind )
-{
-   ShadowChunk* sc;
-   UInt         ml_no = VG_MALLOCLIST_NO(p);
-
-#  ifdef DEBUG_CLIENTMALLOC
-   VG_(printf)("[m %d, f %d (%d)] addShadowChunk "
-               "( sz %d, addr %p, list %d )\n", 
-               count_malloclists(), 
-               0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-               size, p, ml_no );
-#  endif
-
-   sc = VG_(arena_malloc)(VG_AR_CORE, 
-                          sizeof(ShadowChunk)
-                           + VG_(needs).sizeof_shadow_block);
-   sc->size      = size;
-   sc->allockind = kind;
-   sc->data      = p;
-   /* Fill in any skin-specific shadow chunk stuff */
-   if (VG_(needs).sizeof_shadow_block > 0)
-      SK_(complete_shadow_chunk) ( sc, tst );
-
-   sc->next  = vg_malloclist[ml_no];
-   vg_malloclist[ml_no] = sc;
-}
-
-/* Get the sc, and return the address of the previous node's next pointer
-   which allows sc to be removed from the list later without having to look
-   it up again.  */
-static ShadowChunk* getShadowChunk ( Addr a, /*OUT*/ShadowChunk*** next_ptr )
-{
-   ShadowChunk *prev, *curr;
-   Int ml_no;
-   
-   ml_no = VG_MALLOCLIST_NO(a);
-
-   prev = NULL;
-   curr = vg_malloclist[ml_no];
-   while (True) {
-      if (curr == NULL) 
-         break;
-      if (a == curr->data)
-         break;
-      prev = curr;
-      curr = curr->next;
-   }
-
-   if (NULL == prev)
-      *next_ptr = &vg_malloclist[ml_no];
-   else
-      *next_ptr = &prev->next;
-
-   return curr;
-}
-
-void VG_(free_ShadowChunk) ( ShadowChunk* sc )
-{
-   VG_(arena_free) ( VG_AR_CLIENT, (void*)sc->data );
-   VG_(arena_free) ( VG_AR_CORE,   sc );
-}
-
-static 
-void sort_malloc_shadows ( ShadowChunk** shadows, UInt n_shadows )
-{
-   Int   incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280,
-                      9841, 29524, 88573, 265720,
-                      797161, 2391484 };
-   Int          lo = 0;
-   Int          hi = n_shadows-1;
-   Int          i, j, h, bigN, hp;
-   ShadowChunk* v;
-   
-   bigN = hi - lo + 1; if (bigN < 2) return;
-   hp = 0; while (hp < 14 && incs[hp] < bigN) hp++; hp--;
-   vg_assert(0 <= hp && hp < 14);
-   
-   for (; hp >= 0; hp--) {
-      h = incs[hp];
-      i = lo + h;
-      while (1) {
-         if (i > hi) break;
-         v = shadows[i];
-         j = i;
-         while (shadows[j-h]->data > v->data) {
-            shadows[j] = shadows[j-h];
-            j = j - h;
-            if (j <= (lo + h - 1)) break;
-         }
-         shadows[j] = v;
-         i++;
-      }
-   }
-}
-
-/* Allocate a suitably-sized array, copy all the malloc-d block
-   shadows into it, and return both the array and the size of it.
-   This is used by the memory-leak detector.
-*/
-ShadowChunk** VG_(get_malloc_shadows) ( /*OUT*/ UInt* n_shadows )
-{
-   UInt          i, scn;
-   ShadowChunk** arr;
-   ShadowChunk*  sc;
-   *n_shadows = 0;
-   for (scn = 0; scn < VG_N_MALLOCLISTS; scn++) {
-      for (sc = vg_malloclist[scn]; sc != NULL; sc = sc->next) {
-         (*n_shadows)++;
-      }
-   }
-   if (*n_shadows == 0) return NULL;
-
-   arr = VG_(malloc)( *n_shadows * sizeof(ShadowChunk*) );
-
-   i = 0;
-   for (scn = 0; scn < VG_N_MALLOCLISTS; scn++) {
-      for (sc = vg_malloclist[scn]; sc != NULL; sc = sc->next) {
-         arr[i++] = sc;
-      }
-   }
-   vg_assert(i == *n_shadows);
-
-   sort_malloc_shadows(arr, *n_shadows);
-
-   /* Sanity check; assert that the blocks are now in order and that
-      they don't overlap. */
-   for (i = 0; i < *n_shadows-1; i++) {
-      sk_assert( arr[i]->data                < arr[i+1]->data );
-      sk_assert( arr[i]->data + arr[i]->size < arr[i+1]->data );
-   }
-
-   return arr;
-}
-
-Bool VG_(addr_is_in_block)( Addr a, Addr start, UInt size )
-{
-   return (start - VG_AR_CLIENT_REDZONE_SZB <= a
-           && a < start + size + VG_AR_CLIENT_REDZONE_SZB);
-}
-
-/* Return the first shadow chunk satisfying the predicate p. */
-ShadowChunk* VG_(first_matching_mallocd_ShadowChunk)
-                        ( Bool (*p) ( ShadowChunk* ))
-{
-   UInt ml_no;
-   ShadowChunk* sc;
-
-   for (ml_no = 0; ml_no < VG_N_MALLOCLISTS; ml_no++)
-      for (sc = vg_malloclist[ml_no]; sc != NULL; sc = sc->next)
-         if (p(sc))
-            return sc;
-
-   return NULL;
-}
-
-
-/*------------------------------------------------------------*/
-/*--- client_malloc(), etc                                 ---*/
-/*------------------------------------------------------------*/
-
-/* Allocate memory, noticing whether or not we are doing the full
-   instrumentation thing. */
-static __inline__
-void* alloc_and_new_mem ( ThreadState* tst, UInt size, UInt alignment,
-                          Bool is_zeroed, VgAllocKind kind )
-{
-   Addr p;
-
-   VGP_PUSHCC(VgpCliMalloc);
-
-   vg_cmalloc_n_mallocs ++;
-   vg_cmalloc_bs_mallocd += size;
-
-   vg_assert(alignment >= 4);
-   if (alignment == 4)
-      p = (Addr)VG_(arena_malloc)(VG_AR_CLIENT, size);
-   else
-      p = (Addr)VG_(arena_malloc_aligned)(VG_AR_CLIENT, alignment, size);
-
-   if (needs_shadow_chunks())
-      addShadowChunk ( tst, p, size, kind );
-
-   VG_TRACK( ban_mem_heap, p-VG_AR_CLIENT_REDZONE_SZB, 
-                           VG_AR_CLIENT_REDZONE_SZB );
-   VG_TRACK( new_mem_heap, p, size, is_zeroed );
-   VG_TRACK( ban_mem_heap, p+size, VG_AR_CLIENT_REDZONE_SZB );
-
-   VGP_POPCC(VgpCliMalloc);
-   return (void*)p;
-}
-
-void* VG_(client_malloc) ( ThreadState* tst, UInt size, VgAllocKind kind )
-{
-   void* p = alloc_and_new_mem ( tst, size, VG_(clo_alignment), 
-                                 /*is_zeroed*/False, kind );
-#  ifdef DEBUG_CLIENTMALLOC
-   VG_(printf)("[m %d, f %d (%d)] client_malloc ( %d, %x ) = %p\n", 
-               count_malloclists(), 
-               0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-               size, kind, p );
-#  endif
-   return p;
-}
-
-
-void* VG_(client_memalign) ( ThreadState* tst, UInt align, UInt size )
-{
-   void* p = alloc_and_new_mem ( tst, size, align, 
-                                 /*is_zeroed*/False, Vg_AllocMalloc );
-#  ifdef DEBUG_CLIENTMALLOC
-   VG_(printf)("[m %d, f %d (%d)] client_memalign ( al %d, sz %d ) = %p\n", 
-               count_malloclists(), 
-               0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-               align, size, p );
-#  endif
-   return p;
-}
-
-
-void* VG_(client_calloc) ( ThreadState* tst, UInt nmemb, UInt size1 )
-{
-   void*        p;
-   UInt         size, i;
-
-   size = nmemb * size1;
-
-   p = alloc_and_new_mem ( tst, size, VG_(clo_alignment), 
-                              /*is_zeroed*/True, Vg_AllocMalloc );
-   /* Must zero block for calloc! */
-   for (i = 0; i < size; i++) ((UChar*)p)[i] = 0;
-
-#  ifdef DEBUG_CLIENTMALLOC
-   VG_(printf)("[m %d, f %d (%d)] client_calloc ( %d, %d ) = %p\n", 
-               count_malloclists(), 
-               0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-               nmemb, size1, p );
-#  endif
-
-   return p;
-}
-
-static
-void die_and_free_mem ( ThreadState* tst, ShadowChunk* sc,
-                        ShadowChunk** prev_chunks_next_ptr )
-{
-   /* Note: ban redzones again -- just in case user de-banned them
-      with a client request... */
-   VG_TRACK( ban_mem_heap, sc->data-VG_AR_CLIENT_REDZONE_SZB, 
-                           VG_AR_CLIENT_REDZONE_SZB );
-   VG_TRACK( die_mem_heap, sc->data, sc->size );
-   VG_TRACK( ban_mem_heap, sc->data+sc->size, VG_AR_CLIENT_REDZONE_SZB );
-
-   /* Remove sc from the malloclist using prev_chunks_next_ptr to
-      avoid repeating the hash table lookup.  Can't remove until at least
-      after free and free_mismatch errors are done because they use
-      describe_addr() which looks for it in malloclist. */
-   *prev_chunks_next_ptr = sc->next;
-
-   if (VG_(needs).alternative_free)
-      SK_(alt_free) ( sc, tst );
-   else
-      VG_(free_ShadowChunk) ( sc );
-}
-
-
-void VG_(client_free) ( ThreadState* tst, void* p, VgAllocKind kind )
-{
-   ShadowChunk*  sc;
-   ShadowChunk** prev_chunks_next_ptr;
-
-   VGP_PUSHCC(VgpCliMalloc);
-
-#  ifdef DEBUG_CLIENTMALLOC
-   VG_(printf)("[m %d, f %d (%d)] client_free ( %p, %x )\n", 
-               count_malloclists(), 
-               0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-               p, kind );
-#  endif
-
-   vg_cmalloc_n_frees ++;
-
-   if (! needs_shadow_chunks()) {
-      VG_(arena_free) ( VG_AR_CLIENT, p );
-
-   } else {
-      sc = getShadowChunk ( (Addr)p, &prev_chunks_next_ptr );
-
-      if (sc == NULL) {
-         VG_TRACK( bad_free, tst, (Addr)p );
-         VGP_POPCC(VgpCliMalloc);
-         return;
-      }
-
-      /* check if its a matching free() / delete / delete [] */
-      if (kind != sc->allockind)
-         VG_TRACK( mismatched_free, tst, (Addr)p );
-
-      die_and_free_mem ( tst, sc, prev_chunks_next_ptr );
-   } 
-   VGP_POPCC(VgpCliMalloc);
-}
-
-
-void* VG_(client_realloc) ( ThreadState* tst, void* p, UInt new_size )
-{
-   ShadowChunk  *sc;
-   ShadowChunk **prev_chunks_next_ptr;
-   UInt          i;
-
-   VGP_PUSHCC(VgpCliMalloc);
-
-   vg_cmalloc_n_frees ++;
-   vg_cmalloc_n_mallocs ++;
-   vg_cmalloc_bs_mallocd += new_size;
-
-   if (! needs_shadow_chunks()) {
-      vg_assert(p != NULL && new_size != 0);
-      p = VG_(arena_realloc) ( VG_AR_CLIENT, p, VG_(clo_alignment), 
-                               new_size );
-      VGP_POPCC(VgpCliMalloc);
-      return p;
-
-   } else {
-      /* First try and find the block. */
-      sc = getShadowChunk ( (Addr)p, &prev_chunks_next_ptr );
-
-      if (sc == NULL) {
-         VG_TRACK( bad_free, tst, (Addr)p );
-         /* Perhaps we should return to the program regardless. */
-         VGP_POPCC(VgpCliMalloc);
-         return NULL;
-      }
-     
-      /* check if its a matching free() / delete / delete [] */
-      if (Vg_AllocMalloc != sc->allockind) {
-         /* can not realloc a range that was allocated with new or new [] */
-         VG_TRACK( mismatched_free, tst, (Addr)p );
-         /* but keep going anyway */
-      }
-
-      if (sc->size == new_size) {
-         /* size unchanged */
-         VGP_POPCC(VgpCliMalloc);
-         return p;
-         
-      } else if (sc->size > new_size) {
-         /* new size is smaller */
-         VG_TRACK( die_mem_heap, sc->data+new_size, sc->size-new_size );
-         sc->size = new_size;
-         VGP_POPCC(VgpCliMalloc);
-#        ifdef DEBUG_CLIENTMALLOC
-         VG_(printf)("[m %d, f %d (%d)] client_realloc_smaller ( %p, %d ) = %p\n", 
-                     count_malloclists(), 
-                     0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-                     p, new_size, p );
-#        endif
-         return p;
-
-      } else {
-         /* new size is bigger */
-         Addr p_new;
-         
-         /* Get new memory */
-         vg_assert(VG_(clo_alignment) >= 4);
-         if (VG_(clo_alignment) == 4)
-            p_new = (Addr)VG_(arena_malloc)(VG_AR_CLIENT, new_size);
-         else
-            p_new = (Addr)VG_(arena_malloc_aligned)(VG_AR_CLIENT, 
-                                            VG_(clo_alignment), new_size);
-
-         /* First half kept and copied, second half new, 
-            red zones as normal */
-         VG_TRACK( ban_mem_heap, p_new-VG_AR_CLIENT_REDZONE_SZB, 
-                                 VG_AR_CLIENT_REDZONE_SZB );
-         VG_TRACK( copy_mem_heap, (Addr)p, p_new, sc->size );
-         VG_TRACK( new_mem_heap, p_new+sc->size, new_size-sc->size, 
-                   /*inited=*/False );
-         VG_TRACK( ban_mem_heap, p_new+new_size, VG_AR_CLIENT_REDZONE_SZB );
-
-         /* Copy from old to new */
-         for (i = 0; i < sc->size; i++)
-            ((UChar*)p_new)[i] = ((UChar*)p)[i];
-
-         /* Free old memory */
-         die_and_free_mem ( tst, sc, prev_chunks_next_ptr );
-
-         /* this has to be after die_and_free_mem, otherwise the
-            former succeeds in shorting out the new block, not the
-            old, in the case when both are on the same list.  */
-         addShadowChunk ( tst, p_new, new_size, Vg_AllocMalloc );
-
-         VGP_POPCC(VgpCliMalloc);
-#        ifdef DEBUG_CLIENTMALLOC
-         VG_(printf)("[m %d, f %d (%d)] client_realloc_bigger ( %p, %d ) = %p\n", 
-                     count_malloclists(), 
-                     0/*count_freelist()*/, 0/*vg_freed_list_volume*/,
-                     p, new_size, (void*)p_new );
-#        endif
-         return (void*)p_new;
-      }  
-   }
-}
-
-void VG_(print_malloc_stats) ( void )
-{
-   UInt         nblocks, nbytes, ml_no;
-   ShadowChunk* sc;
-
-   if (VG_(clo_verbosity) == 0)
-      return;
-
-   vg_assert(needs_shadow_chunks());
-
-   nblocks = nbytes = 0;
-
-   for (ml_no = 0; ml_no < VG_N_MALLOCLISTS; ml_no++) {
-      for (sc = vg_malloclist[ml_no]; sc != NULL; sc = sc->next) {
-         nblocks ++;
-         nbytes  += sc->size;
-      }
-   }
-
-   VG_(message)(Vg_UserMsg, 
-                "malloc/free: in use at exit: %d bytes in %d blocks.",
-                nbytes, nblocks);
-   VG_(message)(Vg_UserMsg, 
-                "malloc/free: %d allocs, %d frees, %u bytes allocated.",
-                vg_cmalloc_n_mallocs,
-                vg_cmalloc_n_frees, vg_cmalloc_bs_mallocd);
-   if (0)
-      VG_(message)(Vg_DebugMsg,
-                   "free search: %d tries, %d frees", 
-                   vg_mlist_tries, 
-                   vg_mlist_frees );
-   if (VG_(clo_verbosity) > 1)
-      VG_(message)(Vg_UserMsg, "");
-}
-
-/*--------------------------------------------------------------------*/
-/*--- end                                        vg_clientmalloc.c ---*/
-/*--------------------------------------------------------------------*/
diff --git a/coregrind/vg_default.c b/coregrind/vg_default.c
index 1bd36ff..8eeb6ee 100644
--- a/coregrind/vg_default.c
+++ b/coregrind/vg_default.c
@@ -180,9 +180,15 @@
 }
 
 __attribute__ ((weak))
-Char* SK_(usage)(void)
+void SK_(print_usage)(void)
 {
-   non_fund_panic("SK_(usage)");
+   non_fund_panic("SK_(print_usage)");
+}
+
+__attribute__ ((weak))
+void SK_(print_debug_usage)(void)
+{
+   non_fund_panic("SK_(print_debug_usage)");
 }
 
 /* ---------------------------------------------------------------------
@@ -247,26 +253,6 @@
 }
 
 /* ---------------------------------------------------------------------
-   Shadow chunks
-   ------------------------------------------------------------------ */
-
-__attribute__ ((weak))
-void SK_(complete_shadow_chunk)( ShadowChunk* sc, ThreadState* tst )
-{
-   non_fund_panic("SK_(complete_shadow_chunk)");
-}
-
-/* ---------------------------------------------------------------------
-   Alternative free()
-   ------------------------------------------------------------------ */
-
-__attribute__ ((weak))
-void SK_(alt_free) ( ShadowChunk* sc, ThreadState* tst )
-{
-   non_fund_panic("SK_(alt_free)");
-}
-
-/* ---------------------------------------------------------------------
    Sanity checks
    ------------------------------------------------------------------ */
 
@@ -282,6 +268,95 @@
    non_fund_panic("SK_(expensive_sanity_check)");
 }
 
+/*------------------------------------------------------------*/
+/*--- Replacing malloc et al                               ---*/
+/*------------------------------------------------------------*/
+
+/* Default redzone for CLIENT arena of Valgrind's malloc() is 4 bytes */
+__attribute__ ((weak))
+UInt VG_(vg_malloc_redzone_szB) = 4;
+
+Bool VG_(sk_malloc_called_by_scheduler) = False;
+
+static __attribute__ ((noreturn))
+void malloc_panic ( Char* fn )
+{
+   VG_(printf)(
+      "\nSkin error:\n"
+      "  The skin you have selected is missing the function `%s'\n"
+      "  required because it is replacing malloc() et al.\n\n",
+      fn);
+   VG_(skin_panic)("Missing skin function");
+}
+
+/* If the skin hasn't replaced malloc(), this one can be called from the
+   scheduler, for the USERREQ__MALLOC user request used by vg_libpthread.c. 
+   (Nb: it cannot call glibc's malloc().)  The lock variable ensures that the
+   scheduler is the only place this can be called from;  this ensures that a
+   malloc()-replacing skin cannot forget to implement SK_(malloc)() or
+   SK_(free)().  */
+__attribute__ ((weak))
+void* SK_(malloc)               ( ThreadState* tst, Int size )
+{
+   if (VG_(sk_malloc_called_by_scheduler))
+      return VG_(cli_malloc)(4, size);
+   else 
+      malloc_panic("SK_(malloc)");
+}
+
+__attribute__ ((weak))
+void* SK_(__builtin_new)        ( ThreadState* tst, Int size )
+{
+   malloc_panic("SK_(__builtin_new)");
+}
+
+__attribute__ ((weak))
+void* SK_(__builtin_vec_new)    ( ThreadState* tst, Int size )
+{
+   malloc_panic("SK_(__builtin_vec_new)");
+}
+
+__attribute__ ((weak))
+void* SK_(memalign)             ( ThreadState* tst, Int align, Int size )
+{
+   malloc_panic("SK_(memalign)");
+}
+
+__attribute__ ((weak))
+void* SK_(calloc)               ( ThreadState* tst, Int nmemb, Int size )
+{
+   malloc_panic("SK_(calloc)");
+}
+
+__attribute__ ((weak))
+void  SK_(free)                 ( ThreadState* tst, void* p )
+{
+   /* see comment for SK_(malloc)() above */
+   if (VG_(sk_malloc_called_by_scheduler))
+      VG_(cli_free)(p);
+   else 
+      malloc_panic("SK_(free)");
+}
+
+__attribute__ ((weak))
+void  SK_(__builtin_delete)     ( ThreadState* tst, void* p )
+{
+   malloc_panic("SK_(__builtin_delete)");
+}
+
+__attribute__ ((weak))
+void  SK_(__builtin_vec_delete) ( ThreadState* tst, void* p )
+{
+   malloc_panic("SK_(__builtin_vec_delete)");
+}
+
+__attribute__ ((weak))
+void* SK_(realloc)              ( ThreadState* tst, void* p, Int new_size )
+{
+   malloc_panic("SK_(realloc)");
+}
+
+
 /*--------------------------------------------------------------------*/
 /*--- end                                            vg_defaults.c ---*/
 /*--------------------------------------------------------------------*/
diff --git a/coregrind/vg_errcontext.c b/coregrind/vg_errcontext.c
index 3040fb2..76b9439 100644
--- a/coregrind/vg_errcontext.c
+++ b/coregrind/vg_errcontext.c
@@ -161,9 +161,8 @@
    error comes from:
 
    - If from generated code (tst == NULL), the %EIP/%EBP values that we
-     need in order to create proper error messages are picked up out of
-     VG_(baseBlock) rather than from the thread table (vg_threads in
-     vg_scheduler.c).
+     need in order to attach GDB are picked up out of VG_(baseBlock) rather
+     than from the thread table (vg_threads in vg_scheduler.c).
 
    - If not from generated code but in response to requests passed back to
      the scheduler (tst != NULL), we pick up %EIP/%EBP values from the
@@ -171,7 +170,9 @@
 */
 static __inline__
 void construct_error ( Error* err, ThreadState* tst, ErrorKind ekind, Addr a,
-                       Char* s, void* extra, ExeContext* where )
+                       Char* s, void* extra, ExeContext* where,
+                       /*out*/Addr* m_eip, /*out*/Addr* m_esp,
+                       /*out*/Addr* m_ebp )
 {
    /* Core-only parts */
    err->next     = NULL;
@@ -184,14 +185,14 @@
 
    if (NULL == tst) {
       err->tid   = VG_(get_current_tid)();
-      err->m_eip = VG_(baseBlock)[VGOFF_(m_eip)];
-      err->m_esp = VG_(baseBlock)[VGOFF_(m_esp)];
-      err->m_ebp = VG_(baseBlock)[VGOFF_(m_ebp)];
+      *m_eip = VG_(baseBlock)[VGOFF_(m_eip)];
+      *m_esp = VG_(baseBlock)[VGOFF_(m_esp)];
+      *m_ebp = VG_(baseBlock)[VGOFF_(m_ebp)];
    } else {
       err->tid   = tst->tid;
-      err->m_eip = tst->m_eip;
-      err->m_esp = tst->m_esp;
-      err->m_ebp = tst->m_ebp;
+      *m_eip = tst->m_eip;
+      *m_esp = tst->m_esp;
+      *m_ebp = tst->m_ebp;
    }
 
    /* Skin-relevant parts */
@@ -246,12 +247,14 @@
    VG_(printf)("}\n");
 }
 
-void do_actions_on_error(Error* err)
+void do_actions_on_error(Error* err, Bool allow_GDB_attach,
+                         Addr m_eip, Addr m_esp, Addr m_ebp )
 {
    /* Perhaps we want a GDB attach at this point? */
-   if (VG_(is_action_requested)( "Attach to GDB", & VG_(clo_GDB_attach) )) {
-      VG_(swizzle_esp_then_start_GDB)(
-         err->m_eip, err->m_esp, err->m_ebp);
+   if (allow_GDB_attach &&
+       VG_(is_action_requested)( "Attach to GDB", & VG_(clo_GDB_attach) )) 
+   {
+      VG_(swizzle_esp_then_start_GDB)( m_eip, m_esp, m_ebp );
    }
    /* Or maybe we want to generate the error's suppression? */
    if (VG_(is_action_requested)( "Print suppression",
@@ -270,6 +273,7 @@
 void VG_(maybe_record_error) ( ThreadState* tst, 
                                ErrorKind ekind, Addr a, Char* s, void* extra )
 {
+          Addr   m_eip, m_esp, m_ebp;
           Error  err;
           Error* p;
           Error* p_prev;
@@ -334,7 +338,8 @@
    }
 
    /* Build ourselves the error */
-   construct_error ( &err, tst, ekind, a, s, extra, NULL );
+   construct_error ( &err, tst, ekind, a, s, extra, NULL,
+                     &m_eip, &m_esp, &m_ebp );
 
    /* First, see if we've got an error record matching this one. */
    p      = vg_errors;
@@ -405,7 +410,7 @@
       pp_Error(p, False);
       is_first_shown_context = False;
       vg_n_errs_shown++;
-      do_actions_on_error(p);
+      do_actions_on_error(p, /*allow_GDB_attach*/True, m_eip, m_esp, m_ebp );
    } else {
       vg_n_errs_suppressed++;
       p->supp->count++;
@@ -418,12 +423,15 @@
    comparing stuff.  But they can be suppressed;  returns True if it is
    suppressed.  Bool `print_error' dictates whether to print the error. */
 Bool VG_(unique_error) ( ThreadState* tst, ErrorKind ekind, Addr a, Char* s,
-                         void* extra, ExeContext* where, Bool print_error )
+                         void* extra, ExeContext* where, Bool print_error,
+                         Bool allow_GDB_attach )
 {
    Error  err;
+   Addr   m_eip, m_esp, m_ebp;
 
    /* Build ourselves the error */
-   construct_error ( &err, tst, ekind, a, s, extra, where );
+   construct_error ( &err, tst, ekind, a, s, extra, where,
+                     &m_eip, &m_esp, &m_ebp );
 
    /* Unless it's suppressed, we're going to show it.  Don't need to make
       a copy, because it's only temporary anyway.
@@ -442,7 +450,7 @@
          pp_Error(&err, False);
          is_first_shown_context = False;
       }
-      do_actions_on_error(&err);
+      do_actions_on_error(&err, allow_GDB_attach, m_eip, m_esp, m_ebp);
 
       return False;
 
@@ -490,7 +498,6 @@
       if (su->count > 0)
          n_supp_contexts++;
    }
-
    VG_(message)(Vg_UserMsg,
                 "ERROR SUMMARY: "
                 "%d errors from %d contexts (suppressed: %d from %d)",
diff --git a/coregrind/vg_hashtable.c b/coregrind/vg_hashtable.c
new file mode 100644
index 0000000..03cbe00
--- /dev/null
+++ b/coregrind/vg_hashtable.c
@@ -0,0 +1,217 @@
+
+/*--------------------------------------------------------------------*/
+/*--- A separately chained hash table.              vg_hashtable.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of Valgrind, an extensible x86 protected-mode
+   emulator for monitoring program execution on x86-Unixes.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+#include "vg_include.h"
+
+/*--------------------------------------------------------------------*/
+/*--- Declarations                                                 ---*/
+/*--------------------------------------------------------------------*/
+
+/* Number of chains in every table.  Prime, so aligned addresses spread well. */
+
+#define VG_N_CHAINS 997
+
+#define VG_CHAIN_NO(aa) (((UInt)(aa)) % VG_N_CHAINS)
+
+/*--------------------------------------------------------------------*/
+/*--- Functions                                                    ---*/
+/*--------------------------------------------------------------------*/
+
+VgHashTable VG_(HT_construct)(void)
+{
+   /* VG_(calloc) zeroes the memory, so every chain starts out NULL */
+   return VG_(calloc)(VG_N_CHAINS, sizeof(VgHashNode*));
+}
+
+Int VG_(HT_count_nodes) ( VgHashTable table )
+{
+   VgHashNode* node;
+   UInt      chain;
+   Int       n = 0;
+
+   for (chain = 0; chain < VG_N_CHAINS; chain++)
+      for (node = table[chain]; node != NULL; node = node->next)
+         n++;
+   return n;
+}
+
+/* Puts a new, heap-allocated VgHashNode at the head of its chain. */
+void VG_(HT_add_node) ( VgHashTable table, VgHashNode* node )
+{
+   UInt chain   = VG_CHAIN_NO(node->key);
+   node->next   = table[chain];
+   table[chain] = node;
+}
+
+/* Looks up the VgHashNode with the given key; returns NULL if there is none.
+   Also returns, via `next_ptr', the address of the pointer that links to the
+   node (or would), so it can be unlinked later without another lookup.  */
+VgHashNode* VG_(HT_get_node) ( VgHashTable table, UInt key,
+                             /*OUT*/VgHashNode*** next_ptr )
+{
+   VgHashNode *prev, *curr;
+   Int       chain;
+
+   chain = VG_CHAIN_NO(key);
+
+   prev = NULL;
+   curr = table[chain];
+   while (True) {
+      if (curr == NULL)
+         break;
+      if (key == curr->key)
+         break;
+      prev = curr;
+      curr = curr->next;
+   }
+
+   if (NULL == prev)
+      *next_ptr = & table[chain];
+   else
+      *next_ptr = & prev->next;
+
+   return curr;
+}
+
+static
+void sort_hash_array ( VgHashNode** shadows, UInt n_shadows )
+{
+   Int   incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280,
+                      9841, 29524, 88573, 265720,
+                      797161, 2391484 };
+   Int          lo = 0;
+   Int          hi = n_shadows-1;
+   Int          i, j, h, bigN, hp;
+   VgHashNode* v;
+
+   bigN = hi - lo + 1; if (bigN < 2) return;
+   hp = 0; while (hp < 14 && incs[hp] < bigN) hp++; hp--;
+   sk_assert(0 <= hp && hp < 14);
+
+   for (; hp >= 0; hp--) {
+      h = incs[hp];
+      i = lo + h;
+      while (1) {
+         if (i > hi) break;
+         v = shadows[i];
+         j = i;
+         while (shadows[j-h]->key > v->key) {
+            shadows[j] = shadows[j-h];
+            j = j - h;
+            if (j <= (lo + h - 1)) break;
+         }
+         shadows[j] = v;
+         i++;
+      }
+   }
+}
+
+/* Allocates a suitably-sized array, copies pointers to all the nodes
+   into it, sorts it by the `key' field, then returns both the array and
+   its size.  Returns NULL if the table is empty.  Keys must be unique.
+   This is used by the memory-leak detector.  */
+VgHashNode** VG_(HT_to_sorted_array) ( VgHashTable table, 
+                                       /*OUT*/ UInt* n_shadows )
+{
+   UInt       i, j;
+   VgHashNode** arr;
+   VgHashNode*  node;
+
+   *n_shadows = 0;
+   for (i = 0; i < VG_N_CHAINS; i++) {
+      for (node = table[i]; node != NULL; node = node->next) {
+         (*n_shadows)++;
+      }
+   }
+   if (*n_shadows == 0) 
+      return NULL;
+
+   arr = VG_(malloc)( *n_shadows * sizeof(VgHashNode*) );
+
+   j = 0;
+   for (i = 0; i < VG_N_CHAINS; i++) {
+      for (node = table[i]; node != NULL; node = node->next) {
+         arr[j++] = node;
+      }
+   }
+   sk_assert(j == *n_shadows);
+
+   sort_hash_array(arr, *n_shadows);
+
+   /* Sanity check; assert that the blocks are now in order */
+   for (i = 0; i < *n_shadows-1; i++) {
+      sk_assert( arr[i]->key < arr[i+1]->key );
+   }
+
+   return arr;
+}
+
+/* Return the first VgHashNode satisfying the predicate p. */
+VgHashNode* VG_(HT_first_match) ( VgHashTable table, Bool (*p) ( VgHashNode* ))
+{
+   UInt      i;
+   VgHashNode* node;
+
+   for (i = 0; i < VG_N_CHAINS; i++)
+      for (node = table[i]; node != NULL; node = node->next)
+         if ( p(node) )
+            return node;
+
+   return NULL;
+}
+
+void VG_(HT_apply_to_all_nodes)( VgHashTable table, void (*f)(VgHashNode*) )
+{
+   UInt      i;
+   VgHashNode* node;
+
+   for (i = 0; i < VG_N_CHAINS; i++) {
+      for (node = table[i]; node != NULL; node = node->next) {
+         f(node);
+      }
+   }
+}
+
+void VG_(HT_destruct)(VgHashTable table)
+{
+   UInt      i;
+   VgHashNode *node, *next;
+   for (i = 0; i < VG_N_CHAINS; i++) {
+      for (node = table[i]; node != NULL; node = next) {
+         next = node->next;   /* read the link before freeing the node */
+         VG_(free)(node);
+      }
+   }
+   VG_(free)(table);
+}
+
+/*--------------------------------------------------------------------*/
+/*--- end                                           vg_hashtable.c ---*/
+/*--------------------------------------------------------------------*/
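Note on vg_hashtable.c above: a minimal sketch, not taken from the patch, of
how a skin might hang its own per-block data off the table and later unlink a
node through the `next_ptr' result of the lookup.  Node, BlockInfo, ht_add,
ht_get and track_then_release are hypothetical stand-ins written in plain C so
the sketch compiles on its own; real skin code would use VgHashNode,
VG_(HT_add_node) and VG_(HT_get_node) instead.  It assumes the generic node
begins with the `next' pointer and the `key' field, which are the only fields
the functions above touch.  The skin record embeds the generic node as its
first member, which gives the same layout as a struct overlay but is
well-defined C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_CHAINS 997                      /* same prime as VG_N_CHAINS */
#define CHAIN_NO(k) ((k) % N_CHAINS)

/* Stand-in for VgHashNode: the generic header the table operates on. */
typedef struct Node {
   struct Node* next;
   unsigned int key;
} Node;

/* Hypothetical skin-side record.  The generic header comes first, so a
   BlockInfo* and a pointer to its `node' member refer to the same address. */
typedef struct {
   Node         node;
   unsigned int size;                     /* skin-specific payload */
} BlockInfo;

/* Mirrors VG_(HT_add_node): push the node onto the front of its chain. */
static void ht_add(Node** table, Node* node)
{
   unsigned int chain = CHAIN_NO(node->key);
   node->next   = table[chain];
   table[chain] = node;
}

/* Same result as VG_(HT_get_node): returns the node or NULL, and stores in
   *next_ptr the address of the pointer that links (or would link) to it. */
static Node* ht_get(Node** table, unsigned int key, Node*** next_ptr)
{
   Node** link = &table[CHAIN_NO(key)];
   while (*link != NULL && (*link)->key != key)
      link = &(*link)->next;
   *next_ptr = link;
   return *link;
}

/* Track a block, look it up again, unlink it through next_ptr and free it.
   Returns the recorded size, or 0 if allocation or lookup failed. */
static unsigned int track_then_release(Node** table,
                                       unsigned int addr, unsigned int size)
{
   BlockInfo*   bi = malloc(sizeof *bi);
   Node**       link;
   Node*        found;
   unsigned int recorded = 0;

   if (bi == NULL) return 0;
   bi->node.key = addr;
   bi->size     = size;
   ht_add(table, &bi->node);

   found = ht_get(table, addr, &link);
   if (found != NULL) {
      recorded = ((BlockInfo*)found)->size;  /* back to the skin's view */
      *link = found->next;                   /* unlink without a 2nd search */
      free(found);
   }
   return recorded;
}
```

The table itself is just a zero-initialised array of N_CHAINS chain heads, for
example from calloc(N_CHAINS, sizeof(Node*)); after track_then_release()
returns, a second lookup of the same key finds nothing.
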
diff --git a/coregrind/vg_include.h b/coregrind/vg_include.h
index 85ba492..7eed65a 100644
--- a/coregrind/vg_include.h
+++ b/coregrind/vg_include.h
@@ -177,12 +177,6 @@
 extern Int   VG_(sanity_level);
 /* Automatically attempt to demangle C++ names?  default: YES */
 extern Bool  VG_(clo_demangle);
-/* Round malloc sizes upwards to integral number of words? default:
-   NO */
-extern Bool  VG_(clo_sloppy_malloc);
-/* Minimum alignment in functions that don't specify alignment explicitly.
-   default: 0, i.e. use default of the machine (== 4) */
-extern Int   VG_(clo_alignment);
 /* Simulate child processes? default: NO */
 extern Bool  VG_(clo_trace_children);
 
@@ -224,8 +218,6 @@
 extern Bool  VG_(clo_trace_signals);
 /* DEBUG: print symtab details?  default: NO */
 extern Bool  VG_(clo_trace_symtab);
-/* DEBUG: print malloc details?  default: NO */
-extern Bool  VG_(clo_trace_malloc);
 /* DEBUG: print thread scheduling events?  default: NO */
 extern Bool  VG_(clo_trace_sched);
 /* DEBUG: print pthread (mutex etc) events?  default: 0 (none), 1
@@ -304,8 +296,6 @@
       Bool client_requests;
       Bool extended_UCode;
       Bool syscall_wrapper;
-      UInt sizeof_shadow_block;
-      Bool alternative_free;
       Bool sanity_checks;
       Bool data_syms;
    } 
@@ -320,11 +310,16 @@
    struct {
       /* Memory events */
       void (*new_mem_startup)( Addr a, UInt len, Bool rr, Bool ww, Bool xx );
-      void (*new_mem_heap)   ( Addr a, UInt len, Bool is_inited );
       void (*new_mem_stack_signal)  ( Addr a, UInt len );
       void (*new_mem_brk)    ( Addr a, UInt len );
       void (*new_mem_mmap)   ( Addr a, UInt len, Bool rr, Bool ww, Bool xx );
 
+      void (*copy_mem_remap) ( Addr from, Addr to, UInt len );
+      void (*change_mem_mprotect) ( Addr a, UInt len, Bool rr, Bool ww, Bool xx );
+      void (*die_mem_stack_signal)  ( Addr a, UInt len );
+      void (*die_mem_brk)    ( Addr a, UInt len );
+      void (*die_mem_munmap) ( Addr a, UInt len );
+
       void (*new_mem_stack_4)  ( Addr new_ESP );
       void (*new_mem_stack_8)  ( Addr new_ESP );
       void (*new_mem_stack_12) ( Addr new_ESP );
@@ -332,19 +327,6 @@
       void (*new_mem_stack_32) ( Addr new_ESP );
       void (*new_mem_stack)    ( Addr a, UInt len );
 
-      void (*copy_mem_heap)  ( Addr from, Addr to, UInt len );
-      void (*copy_mem_remap) ( Addr from, Addr to, UInt len );
-      void (*change_mem_mprotect) ( Addr a, UInt len, Bool rr, Bool ww, Bool xx );
-      
-      /* Used on redzones around malloc'd blocks and at end of stack */
-      void (*ban_mem_heap)   ( Addr a, UInt len );
-      void (*ban_mem_stack)  ( Addr a, UInt len );
-
-      void (*die_mem_heap)   ( Addr a, UInt len );
-      void (*die_mem_stack_signal)  ( Addr a, UInt len );
-      void (*die_mem_brk)    ( Addr a, UInt len );
-      void (*die_mem_munmap) ( Addr a, UInt len );
-
       void (*die_mem_stack_4)  ( Addr die_ESP );
       void (*die_mem_stack_8)  ( Addr die_ESP );
       void (*die_mem_stack_12) ( Addr die_ESP );
@@ -352,8 +334,7 @@
       void (*die_mem_stack_32) ( Addr die_ESP );
       void (*die_mem_stack)    ( Addr a, UInt len );
 
-      void (*bad_free)        ( ThreadState* tst, Addr a );
-      void (*mismatched_free) ( ThreadState* tst, Addr a );
+      void (*ban_mem_stack)  ( Addr a, UInt len );
 
       void (*pre_mem_read)   ( CorePart part, ThreadState* tst,
                                Char* s, Addr a, UInt size );
@@ -386,7 +367,7 @@
                                   void* /*pthread_mutex_t* */ mutex );
 
       /* Signal events (not exhaustive) */
-      void (*pre_deliver_signal)  ( ThreadId tid, Int sigNo, Bool alt_stack );
+      void (* pre_deliver_signal) ( ThreadId tid, Int sigNo, Bool alt_stack );
       void (*post_deliver_signal) ( ThreadId tid, Int sigNo );
 
       
@@ -409,35 +390,38 @@
    ------------------------------------------------------------------ */
 
 /* Allocation arenas.  
-      CORE      is for the core's general use.
-      SKIN      is for the skin to use (and the only one it uses).
-      SYMTAB    is for Valgrind's symbol table storage.
-      JITTER    is for small storage during translation.
-      CLIENT    is for the client's mallocs/frees.
-      DEMANGLE  is for the C++ demangler.
-      EXECTXT   is for storing ExeContexts.
-      ERRORS    is for storing CoreErrors.
-      TRANSIENT is for very short-term use.  It should be empty
-                in between uses.
+
+      CORE      for the core's general use.
+      SKIN      for the skin to use (and the only one it uses).
+      SYMTAB    for Valgrind's symbol table storage.
+      JITTER    for small storage during translation.
+      CLIENT    for the client's mallocs/frees, if the skin replaces glibc's
+                    malloc() et al -- redzone size is chosen by the skin.
+      DEMANGLE  for the C++ demangler.
+      EXECTXT   for storing ExeContexts.
+      ERRORS    for storing CoreErrors.
+      TRANSIENT for very short-term use.  It should be empty in between uses.
+
    When adding a new arena, remember also to add it to ensure_mm_init(). 
 */
 typedef Int ArenaId;
 
-#define VG_N_ARENAS 9
+#define VG_N_ARENAS        9 
 
-#define VG_AR_CORE      0    /* :: ArenaId */
-#define VG_AR_SKIN      1    /* :: ArenaId */
-#define VG_AR_SYMTAB    2    /* :: ArenaId */
-#define VG_AR_JITTER    3    /* :: ArenaId */
-#define VG_AR_CLIENT    4    /* :: ArenaId */
-#define VG_AR_DEMANGLE  5    /* :: ArenaId */
-#define VG_AR_EXECTXT   6    /* :: ArenaId */
-#define VG_AR_ERRORS    7    /* :: ArenaId */
-#define VG_AR_TRANSIENT 8    /* :: ArenaId */
+#define VG_AR_CORE         0
+#define VG_AR_SKIN         1
+#define VG_AR_SYMTAB       2
+#define VG_AR_JITTER       3
+#define VG_AR_CLIENT       4
+#define VG_AR_DEMANGLE     5
+#define VG_AR_EXECTXT      6
+#define VG_AR_ERRORS       7
+#define VG_AR_TRANSIENT    8
 
 extern void* VG_(arena_malloc)  ( ArenaId arena, Int nbytes );
 extern void  VG_(arena_free)    ( ArenaId arena, void* ptr );
-extern void* VG_(arena_calloc)  ( ArenaId arena, Int nmemb, Int nbytes );
+extern void* VG_(arena_calloc)  ( ArenaId arena, Int alignment,
+                                  Int nmemb, Int nbytes );
 extern void* VG_(arena_realloc) ( ArenaId arena, void* ptr, Int alignment,
                                   Int size );
 extern void* VG_(arena_malloc_aligned) ( ArenaId aid, Int req_alignB, 
@@ -449,16 +433,8 @@
 extern Bool  VG_(is_empty_arena) ( ArenaId aid );
 
 
-/* The red-zone size for the client.  This can be arbitrary, but
-   unfortunately must be set at compile time. */
-#define VG_AR_CLIENT_REDZONE_SZW 4
-
-#define VG_AR_CLIENT_REDZONE_SZB \
-   (VG_AR_CLIENT_REDZONE_SZW * VKI_BYTES_PER_WORD)
-
-
 /* ---------------------------------------------------------------------
-   Exports of vg_clientfuncs.c
+   Exports of vg_intercept.c
    ------------------------------------------------------------------ */
 
 /* This doesn't export code or data that valgrind.so needs to link
@@ -467,17 +443,7 @@
    defined in valgrind.h, and similar headers for some skins. */
 
 #define VG_USERREQ__MALLOC              0x2001
-#define VG_USERREQ__BUILTIN_NEW         0x2002
-#define VG_USERREQ__BUILTIN_VEC_NEW     0x2003
-
-#define VG_USERREQ__FREE                0x2004
-#define VG_USERREQ__BUILTIN_DELETE      0x2005
-#define VG_USERREQ__BUILTIN_VEC_DELETE  0x2006
-
-#define VG_USERREQ__CALLOC              0x2007
-#define VG_USERREQ__REALLOC             0x2008
-#define VG_USERREQ__MEMALIGN            0x2009
-
+#define VG_USERREQ__FREE                0x2002
 
 /* (Fn, Arg): Create a new thread and run Fn applied to Arg in it.  Fn
    MUST NOT return -- ever.  Eventually it will do either __QUIT or
@@ -552,9 +518,11 @@
 #define VG_USERREQ__GET_PTHREAD_TRACE_LEVEL 0x3101
 /* Log a pthread error from client-space.  Cosmetic. */
 #define VG_USERREQ__PTHREAD_ERROR           0x3102
-/* Write a string to the logging sink. */
+/* 
+In vg_skin.h, so skins can use it.
+Write a string to the logging sink. 
 #define VG_USERREQ__LOGMESSAGE              0x3103
-
+*/
 
 /* 
 In vg_constants.h:
@@ -578,6 +546,13 @@
 
 
 /* ---------------------------------------------------------------------
+   Exports of vg_defaults.c
+   ------------------------------------------------------------------ */
+
+extern Bool VG_(sk_malloc_called_by_scheduler);
+
+
+/* ---------------------------------------------------------------------
    Exports of vg_ldt.c
    ------------------------------------------------------------------ */
 
@@ -1166,13 +1141,6 @@
    Supp* supp;
    Int count;
    ThreadId tid;
-   /* These record %EIP, %ESP and %EBP at the error point.  They
-      are only used to make GDB-attaching convenient; there is no
-      other purpose; specifically they are not used to do
-      comparisons between errors. */
-   UInt m_eip;
-   UInt m_esp;
-   UInt m_ebp;
 
    /* The skin-specific part */
    /* Initialised by core */
@@ -1206,9 +1174,16 @@
    Exports of vg_procselfmaps.c
    ------------------------------------------------------------------ */
 
+/* Reads /proc/self/maps into a static buffer. */
+void VG_(read_procselfmaps_contents) ( void );
+
+/* Parses /proc/self/maps, calling `record_mapping' for each entry.  If
+   `read_from_file' is True, /proc/self/maps is read directly, otherwise
+   it's read from the buffer filled by VG_(read_procselfmaps_contents)(). */
 extern 
 void VG_(read_procselfmaps) (
-   void (*record_mapping)( Addr, UInt, Char, Char, Char, UInt, UChar* )
+   void (*record_mapping)( Addr, UInt, Char, Char, Char, UInt, UChar* ),
+   Bool read_from_file
 );
 
 
@@ -1227,46 +1202,6 @@
 
 
 /* ---------------------------------------------------------------------
-   Exports of vg_clientmalloc.c
-   ------------------------------------------------------------------ */
-
-typedef
-   enum { 
-      Vg_AllocMalloc = 0,
-      Vg_AllocNew    = 1,
-      Vg_AllocNewVec = 2 
-   }
-   VgAllocKind;
-
-/* Description of a malloc'd chunk.  Functions for extracting skin-relevant
-   parts are in include/vg_skin.h Size of skin_extra array is given by
-   VG_(needs).sizeof_shadow_chunk. */
-struct _ShadowChunk {
-   struct _ShadowChunk* next;
-   UInt          size : 30;      /* size requested                   */
-   VgAllocKind   allockind : 2;  /* which wrapper did the allocation */
-   Addr          data;           /* ptr to actual block              */
-   UInt          extra[0];       /* extra skin-specific info         */
-};
-
-
-extern void  VG_(client_malloc_init)();
-
-/* These are called from the scheduler, when it intercepts a user
-   request. */
-extern void* VG_(client_malloc)   ( ThreadState* tst, 
-                                    UInt size, VgAllocKind kind );
-extern void* VG_(client_memalign) ( ThreadState* tst, 
-                                    UInt align, UInt size );
-extern void  VG_(client_free)     ( ThreadState* tst, 
-                                    void* ptrV, VgAllocKind  kind );
-extern void* VG_(client_calloc)   ( ThreadState* tst, 
-                                    UInt nmemb, UInt size1 );
-extern void* VG_(client_realloc)  ( ThreadState* tst, 
-                                    void* ptrV, UInt size_new );
-
-
-/* ---------------------------------------------------------------------
    Exports of vg_main.c
    ------------------------------------------------------------------ */
 
diff --git a/coregrind/vg_intercept.c b/coregrind/vg_intercept.c
index beca735..e91f0fe 100644
--- a/coregrind/vg_intercept.c
+++ b/coregrind/vg_intercept.c
@@ -30,6 +30,14 @@
 */
 
 
+/* ---------------------------------------------------------------------
+   All the code in this file runs on the SIMULATED CPU.  It provides
+   drop-in replacements for libc functions, which are needed for
+   various reasons.  These functions have global visibility (obviously)
+   and have no prototypes in vg_include.h, since they are not intended
+   to be called from within Valgrind.
+   ------------------------------------------------------------------ */
+
 /* This has some nasty duplication of stuff from vg_libpthread.c */
 
 #include <errno.h>
@@ -293,6 +301,83 @@
 
 strong_alias(writev, __writev)
 
+/* ---------------------------------------------------------------------
+   Horrible hack to make sigsuspend() sort-of work OK.  Same trick as
+   for pause() in vg_libpthread.so.
+   ------------------------------------------------------------------ */
+
+/* Horrible because
+
+   -- uses VG_(ksigprocmask), VG_(nanosleep) and vg_assert, which are 
+      valgrind-native (not intended for client use).
+
+   -- This is here so single-threaded progs (not linking libpthread.so)
+      can see it.  But pause() should also be here.  ???
+*/
+
+/* Either libc supplies this (weak) or our libpthread.so supplies it
+   (strong) in a threaded setting. 
+*/
+extern int* __errno_location ( void );
+
+
+int sigsuspend ( /* const sigset_t * */ void* mask)
+{
+   unsigned int n_orig, n_now;
+   struct vki_timespec nanosleep_interval;
+
+   VALGRIND_MAGIC_SEQUENCE(n_orig, 0xFFFFFFFF /* default */,
+                           VG_USERREQ__GET_N_SIGS_RETURNED, 
+                           0, 0, 0, 0);
+   vg_assert(n_orig != 0xFFFFFFFF);
+
+   VG_(ksigprocmask)(VKI_SIG_SETMASK, mask, NULL);
+
+   while (1) {
+      VALGRIND_MAGIC_SEQUENCE(n_now, 0xFFFFFFFF /* default */,
+                              VG_USERREQ__GET_N_SIGS_RETURNED, 
+                              0, 0, 0, 0);
+      vg_assert(n_now != 0xFFFFFFFF);
+      vg_assert(n_now >= n_orig);
+      if (n_now != n_orig) break;
+
+      nanosleep_interval.tv_sec  = 0;
+      nanosleep_interval.tv_nsec = 53 * 1000 * 1000; /* 53 milliseconds */
+      /* It's critical here that valgrind's nanosleep implementation
+         is nonblocking. */
+      VG_(nanosleep)( &nanosleep_interval, NULL);
+   }
+
+   /* Maybe this is OK both in single and multithreaded setting. */
+   * (__errno_location()) = -VKI_EINTR; /* == EINTR; */ 
+   return -1;
+}
+
+
+/* ---------------------------------------------------------------------
+   Hook for running __libc_freeres once the program exits.
+   ------------------------------------------------------------------ */
+
+void VG_(__libc_freeres_wrapper)( void )
+{
+   int res;
+   extern void __libc_freeres(void);
+   __libc_freeres();
+   VALGRIND_MAGIC_SEQUENCE(res, 0 /* default */,
+                           VG_USERREQ__LIBC_FREERES_DONE, 0, 0, 0, 0);
+   /*NOTREACHED*/
+   vg_assert(12345+54321 == 999999);
+}
+
+/* ---------------------------------------------------------------------
+   Useful for skins that want to replace certain functions
+   ------------------------------------------------------------------ */
+
+Bool VG_(is_running_on_simd_CPU)(void)
+{
+   return VG_(running_on_simd_CPU);
+}
+
 /*--------------------------------------------------------------------*/
 /*--- end                                           vg_intercept.c ---*/
 /*--------------------------------------------------------------------*/
diff --git a/coregrind/vg_ldt.c b/coregrind/vg_ldt.c
index 6d74602..c50faeb 100644
--- a/coregrind/vg_ldt.c
+++ b/coregrind/vg_ldt.c
@@ -100,7 +100,7 @@
  
    if (parent_ldt == NULL) {
       /* Allocate a new zeroed-out one. */
-      ldt = (VgLdtEntry*)VG_(arena_calloc)(VG_AR_CORE, nbytes, 1);
+      ldt = (VgLdtEntry*)VG_(arena_calloc)(VG_AR_CORE, /*align*/4, nbytes, 1);
    } else {
      ldt = (VgLdtEntry*)VG_(arena_malloc)(VG_AR_CORE, nbytes);
      for (i = 0; i < VG_M_LDT_ENTRIES; i++)
diff --git a/coregrind/vg_main.c b/coregrind/vg_main.c
index bb75f0b..69079b7 100644
--- a/coregrind/vg_main.c
+++ b/coregrind/vg_main.c
@@ -491,8 +491,6 @@
 Int    VG_(sanity_level)       = 1;
 Int    VG_(clo_verbosity)      = 1;
 Bool   VG_(clo_demangle)       = True;
-Bool   VG_(clo_sloppy_malloc)  = False;
-Int    VG_(clo_alignment)      = 4;
 Bool   VG_(clo_trace_children) = False;
 
 /* See big comment in vg_include.h for meaning of these three. */
@@ -509,7 +507,6 @@
 Bool   VG_(clo_trace_syscalls) = False;
 Bool   VG_(clo_trace_signals)  = False;
 Bool   VG_(clo_trace_symtab)   = False;
-Bool   VG_(clo_trace_malloc)   = False;
 Bool   VG_(clo_trace_sched)    = False;
 Int    VG_(clo_trace_pthread_level) = 0;
 ULong  VG_(clo_stop_after)     = 1000000000000000LL;
@@ -593,8 +590,6 @@
 "    --demangle=no|yes         automatically demangle C++ names? [yes]\n"
 "    --num-callers=<number>    show <num> callers in stack traces [4]\n"
 "    --error-limit=no|yes      stop showing new errors if too many? [yes]\n"
-"    --sloppy-malloc=no|yes    round malloc sizes to next word? [no]\n"
-"    --alignment=<number>      set minimum alignment of allocations [4]\n"
 "    --trace-children=no|yes   Valgrind-ise child processes? [no]\n"
 "    --run-libc-freeres=no|yes Free up glibc memory at exit? [yes]\n"
 "    --logfile-fd=<number>     file descriptor for messages [2=stderr]\n"
@@ -620,15 +615,18 @@
 "    --trace-syscalls=no|yes   show all system calls? [no]\n"
 "    --trace-signals=no|yes    show signal handling details? [no]\n"
 "    --trace-symtab=no|yes     show symbol table details? [no]\n"
-"    --trace-malloc=no|yes     show client malloc details? [no]\n"
 "    --trace-sched=no|yes      show thread scheduler details? [no]\n"
-"    --trace-pthread=none|some|all  show pthread event details? [no]\n"
+"    --trace-pthread=none|some|all  show pthread event details? [none]\n"
 "    --stop-after=<number>     switch to real CPU after executing\n"
 "                              <number> basic blocks [infinity]\n"
 "    --dump-error=<number>     show translation for basic block\n"
 "                              associated with <number>'th\n"
 "                              error context [0=don't show any]\n"
 "\n"
+"  %s skin debugging options:\n";
+
+   Char* usage3 =
+"\n"
 "  Extra options are read from env variable $VALGRIND_OPTS\n"
 "\n"
 "  Valgrind is Copyright (C) 2000-2002 Julian Seward\n"
@@ -642,10 +640,15 @@
    VG_(printf)(usage1, VG_(details).name);
    /* Don't print skin string directly for security, ha! */
    if (VG_(needs).command_line_options)
-      VG_(printf)("%s", SK_(usage)());
+      SK_(print_usage)();
    else
       VG_(printf)("    (none)\n");
-   VG_(printf)(usage2, VG_EMAIL_ADDR);
+   VG_(printf)(usage2, VG_(details).name);
+   if (VG_(needs).command_line_options)
+      SK_(print_debug_usage)();
+   else
+      VG_(printf)("    (none)\n");
+   VG_(printf)(usage3, VG_EMAIL_ADDR);
 
    VG_(shutdown_logging)();
    VG_(clo_log_to)     = VgLogTo_Fd;
@@ -707,11 +710,13 @@
    {
        UInt* sp;
 
-       /* Look for the stack segment by reading /proc/self/maps and
+       /* Look for the stack segment by parsing /proc/self/maps and
 	  looking for a section bracketing VG_(esp_at_startup) which
-	  has rwx permissions and no associated file. */
+	  has rwx permissions and no associated file.  Note that this uses
+          the /proc/self/maps contents read at the start of VG_(main)(),
+          and doesn't re-read /proc/self/maps. */
 
-       VG_(read_procselfmaps)( vg_findstack_callback );
+       VG_(read_procselfmaps)( vg_findstack_callback, /*read_from_file*/False );
 
        /* Now vg_foundstack_start and vg_foundstack_size
           should delimit the stack. */
@@ -890,14 +895,6 @@
       else if (VG_CLO_STREQ(argv[i], "--demangle=no"))
          VG_(clo_demangle) = False;
 
-      else if (VG_CLO_STREQ(argv[i], "--sloppy-malloc=yes"))
-         VG_(clo_sloppy_malloc) = True;
-      else if (VG_CLO_STREQ(argv[i], "--sloppy-malloc=no"))
-         VG_(clo_sloppy_malloc) = False;
-
-      else if (VG_CLO_STREQN(12, argv[i], "--alignment="))
-         VG_(clo_alignment) = (Int)VG_(atoll)(&argv[i][12]);
-
       else if (VG_CLO_STREQ(argv[i], "--trace-children=yes"))
          VG_(clo_trace_children) = True;
       else if (VG_CLO_STREQ(argv[i], "--trace-children=no"))
@@ -993,11 +990,6 @@
       else if (VG_CLO_STREQ(argv[i], "--trace-symtab=no"))
          VG_(clo_trace_symtab) = False;
 
-      else if (VG_CLO_STREQ(argv[i], "--trace-malloc=yes"))
-         VG_(clo_trace_malloc) = True;
-      else if (VG_CLO_STREQ(argv[i], "--trace-malloc=no"))
-         VG_(clo_trace_malloc) = False;
-
       else if (VG_CLO_STREQ(argv[i], "--trace-sched=yes"))
          VG_(clo_trace_sched) = True;
       else if (VG_CLO_STREQ(argv[i], "--trace-sched=no"))
@@ -1042,16 +1034,6 @@
    if (VG_(clo_verbosity < 0))
       VG_(clo_verbosity) = 0;
 
-   if (VG_(clo_alignment) < 4 
-       || VG_(clo_alignment) > 4096
-       || VG_(log2)( VG_(clo_alignment) ) == -1 /* not a power of 2 */) {
-      VG_(message)(Vg_UserMsg, "");
-      VG_(message)(Vg_UserMsg, 
-         "Invalid --alignment= setting.  "
-         "Should be a power of 2, >= 4, <= 4096.");
-      VG_(bad_option)("--alignment");
-   }
-
    if (VG_(clo_GDB_attach) && VG_(clo_trace_children)) {
       VG_(message)(Vg_UserMsg, "");
       VG_(message)(Vg_UserMsg, 
@@ -1149,7 +1131,7 @@
 
       /* Core details */
       VG_(message)(Vg_UserMsg,
-         "Using valgrind-%s, a program instrumentation system for x86-linux.",
+         "Using valgrind-%s, a program supervision framework for x86-linux.",
          VERSION);
       VG_(message)(Vg_UserMsg, 
          "Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.");
@@ -1394,6 +1376,17 @@
       VG_(stack)[10000-1-i] = (UInt)(&VG_(stack)[10000-i-1]) ^ 0xABCD4321;
    }
 
+   /* Read /proc/self/maps into a buffer.  Must be before:
+      - SK_(pre_clo_init)(): so that if it calls VG_(malloc)(), any mmap'd
+        superblocks are not erroneously identified as being owned by the
+        client, which would be bad.
+      - init_memory(): that's where the buffer is parsed
+      - init_tt_tc(): so the anonymous mmaps for the translation table and
+        translation cache aren't identified as part of the client, which would
+        waste > 20M of virtual address space, and be bad.
+   */
+   VG_(read_procselfmaps_contents)();
+
    /* Hook to delay things long enough so we can get the pid and
       attach GDB in another shell. */
    if (0) {
@@ -1408,23 +1401,25 @@
       - process_cmd_line_options(): to register skin name and description,
         and turn on/off 'command_line_options' need
       - init_memory() (to setup memory event trackers).
-    */
+   */
    SK_(pre_clo_init)();
    VG_(sanity_check_needs)();
 
    /* Process Valgrind's command-line opts (from env var VG_ARGS). */
    process_cmd_line_options();
 
-   /* Do post command-line processing initialisation */
+   /* Do post command-line processing initialisation.  Must be before:
+      - vg_init_baseBlock(): to register any more helpers
+   */
    SK_(post_clo_init)();
 
-   /* Set up baseBlock offsets and copy the saved machine's state into it. 
-      Comes after SK_(post_clo_init) in case it registers helpers. */
+   /* Set up baseBlock offsets and copy the saved machine's state into it. */
    vg_init_baseBlock();
 
    /* Initialise the scheduler, and copy the client's state from
-      baseBlock into VG_(threads)[1].  This has to come before signal
-      initialisations. */
+      baseBlock into VG_(threads)[1].  Must be before:
+      - VG_(sigstartup_actions)()
+   */
    VG_(scheduler_init)();
 
    /* Initialise the signal handling subsystem, temporarily parking
@@ -1438,8 +1433,7 @@
    /* Start calibration of our RDTSC-based clock. */
    VG_(start_rdtsc_calibration)();
 
-   /* Must come after SK_(init) so memory handler accompaniments (eg.
-    * shadow memory) can be setup ok */
+   /* Parse /proc/self/maps to learn about startup segments. */
    VGP_PUSHCC(VgpInitMem);
    VG_(init_memory)();
    VGP_POPCC(VgpInitMem);
@@ -1453,10 +1447,7 @@
       we can. */
    VG_(end_rdtsc_calibration)();
 
-   /* This should come after init_memory_and_symbols(); otherwise the 
-      latter carefully sets up the permissions maps to cover the 
-      anonymous mmaps for the translation table and translation cache, 
-      which wastes > 20M of virtual address space. */
+   /* Initialise translation table and translation cache. */
    VG_(init_tt_tc)();
 
    if (VG_(clo_verbosity) == 1) {
diff --git a/coregrind/vg_malloc2.c b/coregrind/vg_malloc2.c
index 2edc9ee..5923b3d 100644
--- a/coregrind/vg_malloc2.c
+++ b/coregrind/vg_malloc2.c
@@ -190,37 +190,32 @@
 
    /* Use a checked red zone size of 1 word for our internal stuff,
       and an unchecked zone of arbitrary size for the client.  Of
-      course the client's red zone is checked really, but using the
-      addressibility maps, not by the mechanism implemented here,
-      which merely checks at the time of freeing that the red zone
-      words are unchanged. */
+      course the client's red zone can be checked by the skin, eg.
+      by using addressability maps, but not by the mechanism implemented
+      here, which merely checks at the time of freeing that the red
+      zone words are unchanged. */
 
-   arena_init ( &vg_arena[VG_AR_CORE],      "core    ", 
-                1, True, 262144 );
+   arena_init ( &vg_arena[VG_AR_CORE],      "core",     1, True, 262144 );
 
-   arena_init ( &vg_arena[VG_AR_SKIN],      "skin    ", 
-                1, True, 262144 );
+   arena_init ( &vg_arena[VG_AR_SKIN],      "skin",     1, True, 262144 );
 
-   arena_init ( &vg_arena[VG_AR_SYMTAB],    "symtab  ", 
-                1, True, 262144 );
+   arena_init ( &vg_arena[VG_AR_SYMTAB],    "symtab",   1, True, 262144 );
 
-   arena_init ( &vg_arena[VG_AR_JITTER],    "JITter  ", 
-                1, True, 8192 );
+   arena_init ( &vg_arena[VG_AR_JITTER],    "JITter",   1, True, 8192 );
 
-   arena_init ( &vg_arena[VG_AR_CLIENT],    "client  ",  
-                VG_AR_CLIENT_REDZONE_SZW, False, 262144 );
+   /* No particular reason for this figure, it's just smallish */
+   sk_assert(VG_(vg_malloc_redzone_szB) < 128);
+   arena_init ( &vg_arena[VG_AR_CLIENT],    "client",  
+                VG_(vg_malloc_redzone_szB)/4, False, 262144 );
 
-   arena_init ( &vg_arena[VG_AR_DEMANGLE],  "demangle",  
-                4 /*paranoid*/, True, 16384 );
+   arena_init ( &vg_arena[VG_AR_DEMANGLE],  "demangle", 4 /*paranoid*/,
+                                                           True, 16384 );
 
-   arena_init ( &vg_arena[VG_AR_EXECTXT],   "exectxt ",  
-                1, True, 16384 );
+   arena_init ( &vg_arena[VG_AR_EXECTXT],   "exectxt",  1, True, 16384 );
 
-   arena_init ( &vg_arena[VG_AR_ERRORS],    "errors  ",  
-                1, True, 16384 );
+   arena_init ( &vg_arena[VG_AR_ERRORS],    "errors",   1, True, 16384 );
 
-   arena_init ( &vg_arena[VG_AR_TRANSIENT], "transien",  
-                2, True, 16384 );
+   arena_init ( &vg_arena[VG_AR_TRANSIENT], "transien", 2, True, 16384 );
 
    init_done = True;
 #  ifdef DEBUG_MALLOC
@@ -805,7 +800,7 @@
    }
 
    VG_(message)(Vg_DebugMsg,
-                "mSC [%s]: %2d sbs, %5d tot bs, %4d/%-4d free bs, "
+                "mSC [%8s]: %2d sbs, %5d tot bs, %4d/%-4d free bs, "
                 "%2d lists, %7d mmap, %7d loan", 
                 a->name,
                 superblockctr,
@@ -1189,7 +1184,7 @@
 /*--- Services layered on top of malloc/free.              ---*/
 /*------------------------------------------------------------*/
 
-void* VG_(arena_calloc) ( ArenaId aid, Int nmemb, Int nbytes )
+void* VG_(arena_calloc) ( ArenaId aid, Int alignB, Int nmemb, Int nbytes )
 {
    Int    i, size;
    UChar* p;
@@ -1198,7 +1193,12 @@
 
    size = nmemb * nbytes;
    vg_assert(size >= 0);
-   p = VG_(arena_malloc) ( aid, size );
+
+   if (alignB == 4)
+      p = VG_(arena_malloc) ( aid, size );
+   else
+      p = VG_(arena_malloc_aligned) ( aid, alignB, size );
+
    for (i = 0; i < size; i++) p[i] = 0;
 
    VGP_POPCC(VgpMalloc);
@@ -1271,7 +1271,7 @@
 
 void* VG_(calloc) ( Int nmemb, Int nbytes )
 {
-   return VG_(arena_calloc) ( VG_AR_SKIN, nmemb, nbytes );
+   return VG_(arena_calloc) ( VG_AR_SKIN, /*alignment*/4, nmemb, nbytes );
 }
 
 void* VG_(realloc) ( void* ptr, Int size )
@@ -1285,6 +1285,28 @@
 }
 
 
+void* VG_(cli_malloc) ( UInt align, Int nbytes )                 
+{                                                                             
+   vg_assert(align >= 4);                                                     
+   if (4 == align)                                                            
+      return VG_(arena_malloc)         ( VG_AR_CLIENT, nbytes ); 
+   else                                                                       
+      return VG_(arena_malloc_aligned) ( VG_AR_CLIENT, align, nbytes );                            
+}                                                                             
+
+void VG_(cli_free) ( void* p )                                   
+{                                                                             
+   VG_(arena_free) ( VG_AR_CLIENT, p );                          
+}
+
+
+Bool VG_(addr_is_in_block)( Addr a, Addr start, UInt size )
+{  
+   return (start - VG_(vg_malloc_redzone_szB) <= a
+           && a < start + size + VG_(vg_malloc_redzone_szB));
+}
+
+
 /*------------------------------------------------------------*/
 /*--- The original test driver machinery.                  ---*/
 /*------------------------------------------------------------*/
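Note (illustrative only, not part of the patch): the range test in VG_(addr_is_in_block) treats the red zones on both sides of a heap block as belonging to it, so an address just outside the payload is still attributed to that block. Below is a minimal standalone sketch of the same check. The names REDZONE_SZB and addr_is_in_block and the value 16 are assumptions made for this example; the real red zone size comes from VG_(vg_malloc_redzone_szB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VG_(vg_malloc_redzone_szB); 16 is illustrative. */
#define REDZONE_SZB 16u

/* True if `a` lies in [start - redzone, start + size + redzone). */
bool addr_is_in_block(uintptr_t a, uintptr_t start, size_t size)
{
    assert(REDZONE_SZB < 128);      /* same "smallish" bound the patch asserts */
    assert(start >= REDZONE_SZB);   /* keep this sketch free of unsigned wrap */
    return start - REDZONE_SZB <= a
        && a < start + size + REDZONE_SZB;
}
```

With a 32-byte block at address 1000, this sketch accepts addresses 984 through 1047 and rejects 983 and 1048.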
diff --git a/coregrind/vg_memory.c b/coregrind/vg_memory.c
index 6ade642..6fc18eb 100644
--- a/coregrind/vg_memory.c
+++ b/coregrind/vg_memory.c
@@ -190,26 +190,24 @@
 }
 
 
-/* 1. Records exe segments from /proc/pid/maps -- always necessary, because 
-      if they're munmap()ed we need to know if they were executable in order
-      to discard translations.  Also checks there's no exe segment overlaps.
+/* 1. Records startup segments from /proc/pid/maps.  Takes special note
+      of the executable ones, because if they're munmap()ed we need to
+      discard translations.  Also checks that no exe segments overlap.
 
-   2. Marks global variables that might be accessed from generated code;
+      Note that `read_from_file' is false;  we read /proc/self/maps into a
+      buffer at the start of VG_(main) so that any superblocks mmap'd by
+      calls to VG_(malloc)() by SK_({pre,post}_clo_init) aren't erroneously
+      thought of as being owned by the client.
 
-   3. Sets up the end of the data segment so that vg_syscalls.c can make
+   2. Sets up the end of the data segment so that vg_syscalls.c can make
       sense of calls to brk().
  */
 void VG_(init_memory) ( void )
 {
-   /* 1 and 2 */
-   VG_(read_procselfmaps) ( startup_segment_callback );
+   /* 1 */
+   VG_(read_procselfmaps) ( startup_segment_callback, /*read_from_file*/False );
 
-   /* 3 */
-   VG_TRACK( post_mem_write, (Addr) & VG_(running_on_simd_CPU), 1 );
-   VG_TRACK( post_mem_write, (Addr) & VG_(clo_trace_malloc),    1 );
-   VG_TRACK( post_mem_write, (Addr) & VG_(clo_sloppy_malloc),   1 );
-
-   /* 4 */
+   /* 2 */
    VG_(init_dataseg_end_for_brk)();
 
    /* kludge: some newer kernels place a "sysinfo" page up high, with
@@ -219,7 +217,6 @@
       VG_TRACK( new_mem_startup, VG_(sysinfo_page_addr), 4096, 
                 True, True, True );
      }
-
 }
 
 /*------------------------------------------------------------*/
diff --git a/coregrind/vg_needs.c b/coregrind/vg_needs.c
index 983e02f..ad201ef 100644
--- a/coregrind/vg_needs.c
+++ b/coregrind/vg_needs.c
@@ -50,14 +50,12 @@
    .core_errors          = False,
    .skin_errors          = False,
    .libc_freeres         = False,
-   .sizeof_shadow_block  = 0,
    .basic_block_discards = False,
    .shadow_regs          = False,
    .command_line_options = False,
    .client_requests      = False,
    .extended_UCode       = False,
    .syscall_wrapper      = False,
-   .alternative_free     = False,
    .sanity_checks        = False,
    .data_syms	         = False,
 };
@@ -65,11 +63,17 @@
 VgTrackEvents VG_(track_events) = {
    /* Memory events */
    .new_mem_startup              = NULL,
-   .new_mem_heap                 = NULL,
    .new_mem_stack_signal         = NULL,
    .new_mem_brk                  = NULL,
    .new_mem_mmap                 = NULL,
 
+   .copy_mem_remap               = NULL,
+   .change_mem_mprotect          = NULL,
+
+   .die_mem_stack_signal         = NULL,
+   .die_mem_brk                  = NULL,
+   .die_mem_munmap               = NULL,
+
    .new_mem_stack_4              = NULL,
    .new_mem_stack_8              = NULL,
    .new_mem_stack_12             = NULL,
@@ -77,18 +81,6 @@
    .new_mem_stack_32             = NULL,
    .new_mem_stack                = NULL,
 
-   .copy_mem_heap                = NULL,
-   .copy_mem_remap               = NULL,
-   .change_mem_mprotect          = NULL,
-
-   .ban_mem_heap                 = NULL,
-   .ban_mem_stack                = NULL,
-
-   .die_mem_heap                 = NULL,
-   .die_mem_stack_signal         = NULL,
-   .die_mem_brk                  = NULL,
-   .die_mem_munmap               = NULL,
-
    .die_mem_stack_4              = NULL,
    .die_mem_stack_8              = NULL,
    .die_mem_stack_12             = NULL,
@@ -96,8 +88,7 @@
    .die_mem_stack_32             = NULL,
    .die_mem_stack                = NULL,
 
-   .bad_free                     = NULL,
-   .mismatched_free              = NULL,
+   .ban_mem_stack                = NULL,
 
    .pre_mem_read                 = NULL,
    .pre_mem_read_asciiz          = NULL,
@@ -195,13 +186,6 @@
 NEEDS(client_requests)
 NEEDS(extended_UCode)
 NEEDS(syscall_wrapper)
-
-extern void VG_(needs_sizeof_shadow_block)(Int size)
-{
-   VG_(needs).sizeof_shadow_block = size;
-}
-
-NEEDS(alternative_free)
 NEEDS(sanity_checks)
 NEEDS(data_syms)
 
@@ -212,12 +196,19 @@
       VG_(track_events).event = f;        \
    }
 
+/* Memory events */
 TRACK(new_mem_startup,       Addr a, UInt len, Bool rr, Bool ww, Bool xx)
-TRACK(new_mem_heap,          Addr a, UInt len, Bool is_inited)
 TRACK(new_mem_stack_signal,  Addr a, UInt len)
 TRACK(new_mem_brk,           Addr a, UInt len)
 TRACK(new_mem_mmap,          Addr a, UInt len, Bool rr, Bool ww, Bool xx)
 
+TRACK(copy_mem_remap,      Addr from, Addr to, UInt len)
+TRACK(change_mem_mprotect, Addr a, UInt len, Bool rr, Bool ww, Bool xx)
+
+TRACK(die_mem_stack_signal,  Addr a, UInt len)
+TRACK(die_mem_brk,           Addr a, UInt len)
+TRACK(die_mem_munmap,        Addr a, UInt len)
+
 TRACK(new_mem_stack_4,       Addr new_ESP)
 TRACK(new_mem_stack_8,       Addr new_ESP)
 TRACK(new_mem_stack_12,      Addr new_ESP)
@@ -225,18 +216,6 @@
 TRACK(new_mem_stack_32,      Addr new_ESP)
 TRACK(new_mem_stack,         Addr a, UInt len)
 
-TRACK(copy_mem_heap,       Addr from, Addr to, UInt len)
-TRACK(copy_mem_remap,      Addr from, Addr to, UInt len)
-TRACK(change_mem_mprotect, Addr a, UInt len, Bool rr, Bool ww, Bool xx)
-
-TRACK(ban_mem_heap,  Addr a, UInt len)
-TRACK(ban_mem_stack, Addr a, UInt len)
-
-TRACK(die_mem_heap,          Addr a, UInt len)
-TRACK(die_mem_stack_signal,  Addr a, UInt len)
-TRACK(die_mem_brk,           Addr a, UInt len)
-TRACK(die_mem_munmap,        Addr a, UInt len)
-
 TRACK(die_mem_stack_4,       Addr new_ESP)
 TRACK(die_mem_stack_8,       Addr new_ESP)
 TRACK(die_mem_stack_12,      Addr new_ESP)
@@ -244,8 +223,7 @@
 TRACK(die_mem_stack_32,      Addr new_ESP)
 TRACK(die_mem_stack,         Addr a, UInt len)
 
-TRACK(bad_free,        ThreadState* tst, Addr a)
-TRACK(mismatched_free, ThreadState* tst, Addr a)
+TRACK(ban_mem_stack, Addr a, UInt len)
 
 TRACK(pre_mem_read,        CorePart part, ThreadState* tst, Char* s, Addr a,
                            UInt size)
@@ -263,7 +241,7 @@
 TRACK(post_mutex_lock,   ThreadId tid, void* /*pthread_mutex_t* */ mutex)
 TRACK(post_mutex_unlock, ThreadId tid, void* /*pthread_mutex_t* */ mutex)
 
-TRACK(pre_deliver_signal,  ThreadId tid, Int sigNum, Bool alt_stack)
+TRACK( pre_deliver_signal, ThreadId tid, Int sigNum, Bool alt_stack)
 TRACK(post_deliver_signal, ThreadId tid, Int sigNum)
 
 /*--------------------------------------------------------------------*/
@@ -352,42 +330,6 @@
 }
 
 /*--------------------------------------------------------------------*/
-/* ShadowChunks */
-
-UInt VG_(get_sc_size)  ( ShadowChunk* sc )
-{
-   return sc->size;
-}
-
-Addr VG_(get_sc_data)  ( ShadowChunk* sc )
-{
-   return sc->data;
-}
-
-UInt VG_(get_sc_extra) ( ShadowChunk* sc, UInt i )
-{
-   vg_assert(i < VG_(needs).sizeof_shadow_block);
-   return sc->extra[i];
-}
-
-ShadowChunk* VG_(get_sc_next)  ( ShadowChunk* sc )
-{
-   return sc->next;
-}
-
-void VG_(set_sc_extra) ( ShadowChunk* sc, UInt i, UInt word )
-{
-   vg_assert(i < VG_(needs).sizeof_shadow_block);
-   sc->extra[i] = word;
-}
-
-void VG_(set_sc_next)  ( ShadowChunk* sc, ShadowChunk* next )
-{
-   sc->next = next;
-}
-
-
-/*--------------------------------------------------------------------*/
 /*--- end                                               vg_needs.c ---*/
 /*--------------------------------------------------------------------*/
 
diff --git a/coregrind/vg_procselfmaps.c b/coregrind/vg_procselfmaps.c
index 7998927..c457e17 100644
--- a/coregrind/vg_procselfmaps.c
+++ b/coregrind/vg_procselfmaps.c
@@ -34,9 +34,11 @@
 
 
 /* static ... to keep it out of the stack frame. */
-
 static Char procmap_buf[M_PROCMAP_BUF];
 
+/* Records length of /proc/self/maps read into procmap_buf. */
+static Int  buf_n_tot;
+
 
 /* Helper fns. */
 
@@ -67,8 +69,38 @@
 }
 
 
+/* Read /proc/self/maps, store the contents in a static buffer.  If the
+   file can't be read or doesn't fit in the buffer, just abort. */
+void VG_(read_procselfmaps_contents)(void)
+{
+   Int n_chunk, fd;
+   
+   /* Read the initial memory mapping from the /proc filesystem. */
+   fd = VG_(open) ( "/proc/self/maps", VKI_O_RDONLY, 0 );
+   if (fd == -1) {
+      VG_(message)(Vg_UserMsg, "FATAL: can't open /proc/self/maps");
+      VG_(exit)(1);
+   }
+   buf_n_tot = 0;
+   do {
+      n_chunk = VG_(read) ( fd, &procmap_buf[buf_n_tot], 
+                            M_PROCMAP_BUF - buf_n_tot );
+      buf_n_tot += n_chunk;
+   } while ( n_chunk > 0 && buf_n_tot < M_PROCMAP_BUF );
+   VG_(close)(fd);
+   if (buf_n_tot >= M_PROCMAP_BUF-5) {
+      VG_(message)(Vg_UserMsg, "FATAL: M_PROCMAP_BUF is too small; "
+                               "increase it and recompile");
+      VG_(exit)(1);
+   }
+   if (buf_n_tot == 0) {
+      VG_(message)(Vg_UserMsg, "FATAL: I/O error on /proc/self/maps" );
+      VG_(exit)(1);
+   }
+   procmap_buf[buf_n_tot] = 0;
+}
 
-/* Read /proc/self/maps.  For each map entry, call
+/* Parse /proc/self/maps.  For each map entry, call
    record_mapping, passing it, in this order:
 
       start address in memory
@@ -88,49 +120,30 @@
    Note that the supplied filename is transiently stored; record_mapping 
    should make a copy if it wants to keep it.
 
-   If there's a syntax error or other failure, just abort.  
+   Nb: it is important that this function does not alter the contents of
+       procmap_buf!
 */
-
 void VG_(read_procselfmaps) (
-   void (*record_mapping)( Addr, UInt, Char, Char, Char, UInt, UChar* )
-)
+   void (*record_mapping)( Addr, UInt, Char, Char, Char, UInt, UChar* ),
+   Bool read_from_file )
 {
-   Int    i, j, n_tot, n_chunk, fd, i_eol;
+   Int    i, j, i_eol;
    Addr   start, endPlusOne;
    UChar* filename;
    UInt   foffset;
-   UChar  rr, ww, xx, pp, ch;
+   UChar  rr, ww, xx, pp, ch, tmp;
 
-   /* Read the initial memory mapping from the /proc filesystem. */
-   fd = VG_(open) ( "/proc/self/maps", VKI_O_RDONLY, 0 );
-   if (fd == -1) {
-      VG_(message)(Vg_UserMsg, "FATAL: can't open /proc/self/maps");
-      VG_(exit)(1);
+   if (read_from_file) {
+      VG_(read_procselfmaps_contents)();
    }
-   n_tot = 0;
-   do {
-      n_chunk = VG_(read) ( fd, &procmap_buf[n_tot], M_PROCMAP_BUF - n_tot );
-      n_tot += n_chunk;
-   } while ( n_chunk > 0 && n_tot < M_PROCMAP_BUF );
-   VG_(close)(fd);
-   if (n_tot >= M_PROCMAP_BUF-5) {
-      VG_(message)(Vg_UserMsg, "FATAL: M_PROCMAP_BUF is too small; "
-                               "increase it and recompile");
-       VG_(exit)(1);
-   }
-   if (n_tot == 0) {
-      VG_(message)(Vg_UserMsg, "FATAL: I/O error on /proc/self/maps" );
-       VG_(exit)(1);
-   }
-   procmap_buf[n_tot] = 0;
+
    if (0)
       VG_(message)(Vg_DebugMsg, "raw:\n%s", procmap_buf );
 
    /* Ok, it's safely aboard.  Parse the entries. */
-
    i = 0;
    while (True) {
-      if (i >= n_tot) break;
+      if (i >= buf_n_tot) break;
 
       /* Read (without fscanf :) the pattern %8x-%8x %c%c%c%c %8x */
       j = readhex(&procmap_buf[i], &start);
@@ -181,9 +194,13 @@
       while (!VG_(isspace)(procmap_buf[i]) && i >= 0) i--;
       i++;
       if (i < i_eol-1 && procmap_buf[i] == '/') {
+         /* Minor hack: put a '\0' at the filename end for the call to
+            `record_mapping', then restore the old char with `tmp'. */
          filename = &procmap_buf[i];
+         tmp = filename[i_eol - i];
          filename[i_eol - i] = '\0';
       } else {
+         tmp = '\0';
          filename = NULL;
          foffset = 0;
       }
@@ -192,6 +209,10 @@
                           rr, ww, xx, 
                           foffset, filename );
 
+      if ('\0' != tmp) {
+         filename[i_eol - i] = tmp;
+      }
+
       i = i_eol + 1;
    }
 }
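Note (illustrative only, not part of the patch): the "minor hack" above keeps procmap_buf reusable by writing a temporary '\0', calling the callback, then restoring the saved character. The same idea can be sketched in isolation. The names with_temp_terminator, field_cb and copy_field are hypothetical, and unlike the patch this sketch always restores the byte rather than using '\0' as a "nothing to restore" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives a NUL-terminated view of one field. */
typedef void (*field_cb)(const char *field, void *opaque);

/* Present buf[start..end) to `cb` as a C string, then put back the byte
   that was overwritten so that `buf` is unchanged on return.
   Precondition: buf[end] is a valid, writable byte. */
void with_temp_terminator(char *buf, size_t start, size_t end,
                          field_cb cb, void *opaque)
{
    char saved;
    assert(buf != NULL && cb != NULL && start <= end);
    saved = buf[end];
    buf[end] = '\0';
    cb(&buf[start], opaque);
    buf[end] = saved;
}

/* Example callback: copies the field into a caller-supplied 64-byte buffer. */
void copy_field(const char *field, void *opaque)
{
    char *out = opaque;
    strncpy(out, field, 63);
    out[63] = '\0';
}
```

The callback must copy the string if it wants to keep it, which matches the "transiently stored" caveat in the comment on VG_(read_procselfmaps).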
diff --git a/coregrind/vg_replace_malloc.c b/coregrind/vg_replace_malloc.c
new file mode 100644
index 0000000..f70a465
--- /dev/null
+++ b/coregrind/vg_replace_malloc.c
@@ -0,0 +1,416 @@
+
+/*--------------------------------------------------------------------*/
+/*--- Replacements for malloc() et al, which run on the simulated  ---*/
+/*--- CPU.                                     vg_replace_malloc.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of Valgrind, an extensible x86 protected-mode
+   emulator for monitoring program execution on x86-Unixes.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+/* ---------------------------------------------------------------------
+   All the code in this file runs on the SIMULATED CPU.  It provides,
+   for various reasons, drop-in replacements for malloc()
+   and friends.  These functions have global visibility (obviously) and
+   have no prototypes in vg_include.h, since they are not intended to
+   be called from within Valgrind.
+
+   This file can be linked into a skin that wishes to know about
+   calls to malloc().  The skin should define functions SK_(malloc) et al
+   that will be called.
+   ------------------------------------------------------------------ */
+
+#include "vg_include.h"
+
+/* Sidestep the normal check which disallows using valgrind.h directly. */
+#define __VALGRIND_SOMESKIN_H
+#include "valgrind.h"            /* for VG_NON_SIMD_tstCALL[12] */
+
+/*------------------------------------------------------------*/
+/*--- Command line options                                 ---*/
+/*------------------------------------------------------------*/
+
+/* Round malloc sizes upwards to integral number of words? default: NO */
+Bool VG_(clo_sloppy_malloc)  = False;
+
+/* DEBUG: print malloc details?  default: NO */
+Bool VG_(clo_trace_malloc)   = False;
+
+/* Minimum alignment in functions that don't specify alignment explicitly.
+   default: 4, i.e. the machine's default alignment. */
+Int  VG_(clo_alignment) = 4;
+
+
+Bool VG_(replacement_malloc_process_cmd_line_option)(Char* arg)
+{
+   if      (VG_CLO_STREQN(12, arg, "--alignment=")) {
+      VG_(clo_alignment) = (Int)VG_(atoll)(&arg[12]);
+
+      if (VG_(clo_alignment) < 4 
+          || VG_(clo_alignment) > 4096
+          || VG_(log2)( VG_(clo_alignment) ) == -1 /* not a power of 2 */) {
+         VG_(message)(Vg_UserMsg, "");
+         VG_(message)(Vg_UserMsg, 
+            "Invalid --alignment= setting.  "
+            "Should be a power of 2, >= 4, <= 4096.");
+         VG_(bad_option)("--alignment");
+      }
+   }
+
+   else if (VG_CLO_STREQ(arg, "--sloppy-malloc=yes"))
+      VG_(clo_sloppy_malloc) = True;
+   else if (VG_CLO_STREQ(arg, "--sloppy-malloc=no"))
+      VG_(clo_sloppy_malloc) = False;
+
+   else if (VG_CLO_STREQ(arg, "--trace-malloc=yes"))
+      VG_(clo_trace_malloc) = True;
+   else if (VG_CLO_STREQ(arg, "--trace-malloc=no"))
+      VG_(clo_trace_malloc) = False;
+
+   else 
+      return False;
+
+   return True;
+}
+
+void VG_(replacement_malloc_print_usage)(void)
+{
+   VG_(printf)(
+"    --sloppy-malloc=no|yes    round malloc sizes to next word? [no]\n"
+"    --alignment=<number>      set minimum alignment of allocations [4]\n"
+   );
+}
+
+void VG_(replacement_malloc_print_debug_usage)(void)
+{
+   VG_(printf)(
+"    --trace-malloc=no|yes     show client malloc details? [no]\n"
+   );
+}
+
+
+/*------------------------------------------------------------*/
+/*--- Replacing malloc() et al                             ---*/
+/*------------------------------------------------------------*/
+
+/* Below are new versions of malloc, __builtin_new, free, 
+   __builtin_delete, calloc, realloc, memalign, and friends.
+
+   malloc, __builtin_new, free, __builtin_delete, calloc and realloc
+   can be entered either on the real CPU or the simulated one.  If on
+   the real one, this is because the dynamic linker is running the
+   static initialisers for C++, before starting up Valgrind itself.
+   In this case it is safe to route calls through to
+   VG_(arena_malloc)/VG_(arena_free), since they are self-initialising.
+
+   Once Valgrind is initialised, vg_running_on_simd_CPU becomes True.
+   The call needs to be transferred from the simulated CPU back to the
+   real one and routed to VG_(cli_malloc)() or VG_(cli_free)().  To do
+   that, the client-request mechanism (in valgrind.h) is used to convey
+   requests to the scheduler.
+*/
+
+#define MALLOC_TRACE(format, args...)  \
+   if (VG_(clo_trace_malloc))          \
+      VG_(printf)(format, ## args )
+
+#define MAYBE_SLOPPIFY(n)           \
+   if (VG_(clo_sloppy_malloc)) {    \
+      while ((n % 4) > 0) n++;      \
+   }
+
+/* ALL calls to malloc wind up here. */
+void* malloc ( Int n )
+{
+   void* v;
+
+   MALLOC_TRACE("malloc[simd=%d](%d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), n );
+   MAYBE_SLOPPIFY(n);
+
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL1( SK_(malloc), n );
+   } else if (VG_(clo_alignment) != 4) {
+      v = VG_(arena_malloc_aligned)(VG_AR_CLIENT, VG_(clo_alignment), n);
+   } else {
+      v = VG_(arena_malloc)(VG_AR_CLIENT, n);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+void* __builtin_new ( Int n )
+{
+   void* v;
+
+   MALLOC_TRACE("__builtin_new[simd=%d](%d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), n );
+   MAYBE_SLOPPIFY(n);
+
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL1( SK_(__builtin_new), n );
+   } else if (VG_(clo_alignment) != 4) {
+      v = VG_(arena_malloc_aligned)(VG_AR_CLIENT, VG_(clo_alignment), n);
+   } else {
+      v = VG_(arena_malloc)(VG_AR_CLIENT, n);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+/* gcc 3.X.X mangles them differently. */
+void* _Znwj ( Int n )
+{
+  return __builtin_new(n);
+}
+
+void* __builtin_vec_new ( Int n )
+{
+   void* v;
+
+   MALLOC_TRACE("__builtin_vec_new[simd=%d](%d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), n );
+   MAYBE_SLOPPIFY(n);
+
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL1( SK_(__builtin_vec_new), n );
+   } else if (VG_(clo_alignment) != 4) {
+      v = VG_(arena_malloc_aligned)(VG_AR_CLIENT, VG_(clo_alignment), n);
+   } else {
+      v = VG_(arena_malloc)(VG_AR_CLIENT, n);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+/* gcc 3.X.X mangles them differently. */
+void* _Znaj ( Int n )
+{
+  return __builtin_vec_new(n);
+}
+
+void free ( void* p )
+{
+   MALLOC_TRACE("free[simd=%d](%p)\n", 
+                (UInt)VG_(is_running_on_simd_CPU)(), p );
+   if (p == NULL) 
+      return;
+   if (VG_(is_running_on_simd_CPU)()) {
+      (void)VG_NON_SIMD_tstCALL1( SK_(free), p );
+   } else {
+      VG_(arena_free)(VG_AR_CLIENT, p);      
+   }
+}
+
+void __builtin_delete ( void* p )
+{
+   MALLOC_TRACE("__builtin_delete[simd=%d](%p)\n", 
+                (UInt)VG_(is_running_on_simd_CPU)(), p );
+   if (p == NULL) 
+      return;
+   if (VG_(is_running_on_simd_CPU)()) {
+      (void)VG_NON_SIMD_tstCALL1( SK_(__builtin_delete), p );
+   } else {
+      VG_(arena_free)(VG_AR_CLIENT, p);
+   }
+}
+
+/* gcc 3.X.X mangles them differently. */
+void _ZdlPv ( void* p )
+{
+  __builtin_delete(p);
+}
+
+void __builtin_vec_delete ( void* p )
+{
+   MALLOC_TRACE("__builtin_vec_delete[simd=%d](%p)\n", 
+                (UInt)VG_(is_running_on_simd_CPU)(), p );
+   if (p == NULL) 
+      return;
+   if (VG_(is_running_on_simd_CPU)()) {
+      (void)VG_NON_SIMD_tstCALL1( SK_(__builtin_vec_delete), p );
+   } else {
+      VG_(arena_free)(VG_AR_CLIENT, p);
+   }
+}
+
+/* gcc 3.X.X mangles them differently. */
+void _ZdaPv ( void* p )
+{
+  __builtin_vec_delete(p);
+}
+
+void* calloc ( Int nmemb, Int size )
+{
+   void* v;
+
+   MALLOC_TRACE("calloc[simd=%d](%d,%d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), nmemb, size );
+   MAYBE_SLOPPIFY(size);
+
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL2( SK_(calloc), nmemb, size );
+   } else {
+      v = VG_(arena_calloc)(VG_AR_CLIENT, VG_(clo_alignment), nmemb, size);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+
+void* realloc ( void* ptrV, Int new_size )
+{
+   void* v;
+
+   MALLOC_TRACE("realloc[simd=%d](%p,%d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), ptrV, new_size );
+   MAYBE_SLOPPIFY(new_size);
+
+   if (ptrV == NULL)
+      return malloc(new_size);
+   if (new_size <= 0) {
+      free(ptrV);
+      if (VG_(clo_trace_malloc)) 
+         VG_(printf)(" = 0\n" );
+      return NULL;
+   }   
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL2( SK_(realloc), ptrV, new_size );
+   } else {
+      v = VG_(arena_realloc)(VG_AR_CLIENT, ptrV, VG_(clo_alignment), new_size);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+
+void* memalign ( Int alignment, Int n )
+{
+   void* v;
+
+   MALLOC_TRACE("memalign[simd=%d](al %d, size %d)", 
+                (UInt)VG_(is_running_on_simd_CPU)(), alignment, n );
+   MAYBE_SLOPPIFY(n);
+
+   if (VG_(is_running_on_simd_CPU)()) {
+      v = (void*)VG_NON_SIMD_tstCALL2( SK_(memalign), alignment, n );
+   } else {
+      v = VG_(arena_malloc_aligned)(VG_AR_CLIENT, alignment, n);
+   }
+   MALLOC_TRACE(" = %p\n", v );
+   return v;
+}
+
+
+void* valloc ( Int size )
+{
+   return memalign(VKI_BYTES_PER_PAGE, size);
+}
+
+
+/* Various compatibility wrapper functions, for glibc and libstdc++. */
+void cfree ( void* p )
+{
+   free ( p );
+}
+
+
+int mallopt ( int cmd, int value )
+{
+   /* In glibc-2.2.4, 1 denotes a successful return value for mallopt */
+   return 1;
+}
+
+
+int __posix_memalign ( void **memptr, UInt alignment, UInt size )
+{
+    void *mem;
+
+    /* Test whether the ALIGNMENT argument is valid.  It must be a power of
+       two multiple of sizeof (void *).  */
+    if (alignment % sizeof (void *) != 0 || (alignment & (alignment - 1)) != 0)
+       return VKI_EINVAL /*22*/ /*EINVAL*/;
+
+    mem = memalign (alignment, size);
+
+    if (mem != NULL) {
+       *memptr = mem;
+       return 0;
+    }
+
+    return VKI_ENOMEM /*12*/ /*ENOMEM*/;
+}
+
+
+/* Bomb out if we get any of these. */
+/* HACK: We shouldn't call VG_(core_panic) or VG_(message) on the simulated
+   CPU.  Really we should pass the request in the usual way, and
+   Valgrind itself can do the panic.  Too tedious, however.  
+*/
+void pvalloc ( void )
+{ VG_(core_panic)("call to pvalloc\n"); }
+void malloc_stats ( void )
+{ VG_(core_panic)("call to malloc_stats\n"); }
+void malloc_usable_size ( void )
+{ VG_(core_panic)("call to malloc_usable_size\n"); }
+void malloc_trim ( void )
+{ VG_(core_panic)("call to malloc_trim\n"); }
+void malloc_get_state ( void )
+{ VG_(core_panic)("call to malloc_get_state\n"); }
+void malloc_set_state ( void )
+{ VG_(core_panic)("call to malloc_set_state\n"); }
+
+
+/* Yet another ugly hack.  Cannot include <malloc.h> because we
+   implement functions implemented there with different signatures.
+   This struct definition MUST match the system one. */
+
+/* SVID2/XPG mallinfo structure */
+struct mallinfo {
+   int arena;    /* total space allocated from system */
+   int ordblks;  /* number of non-inuse chunks */
+   int smblks;   /* unused -- always zero */
+   int hblks;    /* number of mmapped regions */
+   int hblkhd;   /* total space in mmapped regions */
+   int usmblks;  /* unused -- always zero */
+   int fsmblks;  /* unused -- always zero */
+   int uordblks; /* total allocated space */
+   int fordblks; /* total non-inuse space */
+   int keepcost; /* top-most, releasable (via malloc_trim) space */
+};
+
+struct mallinfo mallinfo ( void )
+{
+   /* Should really try to return something a bit more meaningful */
+   Int             i;
+   struct mallinfo mi;
+   UChar*          pmi = (UChar*)(&mi);
+   for (i = 0; i < sizeof(mi); i++)
+      pmi[i] = 0;
+   return mi;
+}
+
+/*--------------------------------------------------------------------*/
+/*--- end                                      vg_replace_malloc.c ---*/
+/*--------------------------------------------------------------------*/
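Note (illustrative only, not part of the patch): two small rules in vg_replace_malloc.c are easy to check in isolation. The --alignment= option must be a power of 2 between 4 and 4096, and --sloppy-malloc=yes rounds request sizes up to a whole number of 4-byte words. The sketch below restates both. The function names are hypothetical, and the bit trick replaces the patch's VG_(log2) test and its increment loop, so this is an equivalent formulation and not the code used in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if `align` is acceptable for --alignment=: a power of 2 in [4, 4096].
   A positive power of 2 has exactly one bit set, so x & (x - 1) == 0. */
bool alignment_is_valid(int align)
{
    return align >= 4 && align <= 4096 && (align & (align - 1)) == 0;
}

/* Round a non-negative size up to the next multiple of 4, as the
   MAYBE_SLOPPIFY loop does one increment at a time. */
int sloppify(int n)
{
    assert(n >= 0 && n <= INT_MAX - 3);   /* avoid signed overflow */
    return (n + 3) & ~3;
}
```

For example, 4 and 4096 are valid alignments while 2, 12 and 8192 are not, and a 5-byte request becomes 8 bytes under the sloppy rule.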
diff --git a/coregrind/vg_scheduler.c b/coregrind/vg_scheduler.c
index ea8f355..f561977 100644
--- a/coregrind/vg_scheduler.c
+++ b/coregrind/vg_scheduler.c
@@ -3265,57 +3265,89 @@
    /* VG_(printf)("req no = 0x%x\n", req_no); */
    switch (req_no) {
 
+      /* For the CLIENT_{,tst}CALL[0123] ones, have to do some nasty casting
+         to make gcc believe it's a function. */
+      case VG_USERREQ__CLIENT_CALL0: {
+         UInt (*f)(void) = (void*)arg[1];
+         RETURN_WITH(
+            f ( )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_CALL1: {
+         UInt (*f)(UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( arg[2] )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_CALL2: {
+         UInt (*f)(UInt, UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( arg[2], arg[3] )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_CALL3: {
+         UInt (*f)(UInt, UInt, UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( arg[2], arg[3], arg[4] )
+         );
+         break;
+      }
+
+      case VG_USERREQ__CLIENT_tstCALL0: {
+         UInt (*f)(ThreadState*) = (void*)arg[1];
+         RETURN_WITH(
+            f ( tst )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_tstCALL1: {
+         UInt (*f)(ThreadState*, UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( tst, arg[2] )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_tstCALL2: {
+         UInt (*f)(ThreadState*, UInt, UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( tst, arg[2], arg[3] )
+         );
+         break;
+      }
+      case VG_USERREQ__CLIENT_tstCALL3: {
+         UInt (*f)(ThreadState*, UInt, UInt, UInt) = (void*)arg[1];
+         RETURN_WITH(
+            f ( tst, arg[2], arg[3], arg[4] )
+         );
+         break;
+      }
+
+      /* Note:  for skins that replace malloc() et al, we want to call
+         the replacement versions.  For those that don't, we want to call
+         VG_(cli_malloc)() et al.  We do this by calling SK_(malloc)(), which
+         malloc-replacing skins must replace, but have its default definition
+         call VG_(cli_malloc)() et al. */
+
+      /* Note: for MALLOC and FREE, must set the appropriate "lock"... see
+         the comment in vg_defaults.c/SK_(malloc)() for why. */
       case VG_USERREQ__MALLOC:
+         VG_(sk_malloc_called_by_scheduler) = True;
          RETURN_WITH(
-            (UInt)VG_(client_malloc) ( tst, arg[1], Vg_AllocMalloc ) 
+            (UInt)SK_(malloc) ( tst, arg[1] ) 
          );
-         break;
-
-      case VG_USERREQ__BUILTIN_NEW:
-         RETURN_WITH(
-            (UInt)VG_(client_malloc) ( tst, arg[1], Vg_AllocNew )
-         );
-         break;
-
-      case VG_USERREQ__BUILTIN_VEC_NEW:
-         RETURN_WITH(
-            (UInt)VG_(client_malloc) ( tst, arg[1], Vg_AllocNewVec )
-         );
+         VG_(sk_malloc_called_by_scheduler) = False;
          break;
 
       case VG_USERREQ__FREE:
-         VG_(client_free) ( tst, (void*)arg[1], Vg_AllocMalloc );
+         VG_(sk_malloc_called_by_scheduler) = True;
+         SK_(free) ( tst, (void*)arg[1] );
+         VG_(sk_malloc_called_by_scheduler) = False;
 	 RETURN_WITH(0); /* irrelevant */
          break;
 
-      case VG_USERREQ__BUILTIN_DELETE:
-         VG_(client_free) ( tst, (void*)arg[1], Vg_AllocNew );
-	 RETURN_WITH(0); /* irrelevant */
-         break;
-
-      case VG_USERREQ__BUILTIN_VEC_DELETE:
-         VG_(client_free) ( tst, (void*)arg[1], Vg_AllocNewVec );
-	 RETURN_WITH(0); /* irrelevant */
-         break;
-
-      case VG_USERREQ__CALLOC:
-         RETURN_WITH(
-            (UInt)VG_(client_calloc) ( tst, arg[1], arg[2] )
-         );
-         break;
-
-      case VG_USERREQ__REALLOC:
-         RETURN_WITH(
-            (UInt)VG_(client_realloc) ( tst, (void*)arg[1], arg[2] )
-         );
-         break;
-
-      case VG_USERREQ__MEMALIGN:
-         RETURN_WITH(
-            (UInt)VG_(client_memalign) ( tst, arg[1], arg[2] )
-         );
-         break;
-
       case VG_USERREQ__PTHREAD_GET_THREADID:
          RETURN_WITH(tid);
          break;
diff --git a/coregrind/vg_symtab2.c b/coregrind/vg_symtab2.c
index 5f64156..ee0e68a 100644
--- a/coregrind/vg_symtab2.c
+++ b/coregrind/vg_symtab2.c
@@ -1910,7 +1910,8 @@
       return;
 
    VGP_PUSHCC(VgpReadSyms);
-      VG_(read_procselfmaps) ( VG_(read_symtab_callback) );
+      VG_(read_procselfmaps) ( VG_(read_symtab_callback),
+                               /*read_from_file*/True );
    VGP_POPCC(VgpReadSyms);
 }
 
diff --git a/helgrind/Makefile.am b/helgrind/Makefile.am
index 15c9d2f..e16eea7 100644
--- a/helgrind/Makefile.am
+++ b/helgrind/Makefile.am
@@ -12,12 +12,8 @@
 
 vgskin_helgrind_so_SOURCES = hg_main.c
 vgskin_helgrind_so_LDFLAGS = -shared
+vgskin_helgrind_so_LDADD = ../coregrind/vg_replace_malloc.o
 
 hgincludedir = $(includedir)/valgrind
 
-hginclude_HEADERS = \
-	helgrind.h
-
-##vgskin_helgrind.so$(EXEEXT): $(vgskin_helgrind_so_OBJECTS)
-##	$(CC) $(CFLAGS) $(LDFLAGS) -shared -o vgskin_helgrind.so \
-##		$(vgskin_helgrind_so_OBJECTS)
+hginclude_HEADERS = helgrind.h
diff --git a/helgrind/hg_main.c b/helgrind/hg_main.c
index 21bfe74..9aa38b5 100644
--- a/helgrind/hg_main.c
+++ b/helgrind/hg_main.c
@@ -140,6 +140,16 @@
 /*--- Data defns.                                          ---*/
 /*------------------------------------------------------------*/
 
+typedef
+   struct _HG_Chunk {
+      struct _HG_Chunk* next;
+      Addr          data;           /* ptr to actual block              */
+      UInt          size;           /* size requested                   */
+      ExeContext*   where;          /* where it was allocated           */
+      ThreadId      tid;            /* allocating thread                */
+   }
+   HG_Chunk;
+
 typedef enum 
    { Vge_VirginInit, Vge_NonVirginInit, Vge_SegmentInit, Vge_Error } 
    VgeInitStatus;
@@ -1285,42 +1295,6 @@
 }
 
 /*------------------------------------------------------------*/
-/*--- Shadow chunks info                                   ---*/
-/*------------------------------------------------------------*/
-
-#define SHADOW_EXTRA	2
-
-static __inline__
-ExeContext *get_sc_where( ShadowChunk* sc )
-{
-   return (ExeContext*)VG_(get_sc_extra)(sc, 0);
-}
-
-static __inline__
-ThreadId get_sc_tid(ShadowChunk *sc)
-{
-   return (ThreadId)VG_(get_sc_extra)(sc, 1);
-}
-
-static __inline__
-void set_sc_where( ShadowChunk* sc, ExeContext* ec )
-{
-   VG_(set_sc_extra)(sc, 0, (UInt)ec);
-}
-
-static __inline__
-void set_sc_tid( ShadowChunk* sc, ThreadId tid )
-{
-   VG_(set_sc_extra)(sc, 1, (UInt)tid);
-}
-
-void SK_(complete_shadow_chunk) ( ShadowChunk* sc, ThreadState* tst )
-{
-   set_sc_where ( sc, VG_(get_ExeContext)(tst) );
-   set_sc_tid   ( sc, VG_(get_tid_from_ThreadState)(tst) );
-}
-
-/*------------------------------------------------------------*/
 /*--- Implementation of mutex structure.                   ---*/
 /*------------------------------------------------------------*/
 
@@ -1430,40 +1404,6 @@
    }
 }
 
-#define N_FREED_CHUNKS	2
-static Int freechunkptr = 0;
-static ShadowChunk *freechunks[N_FREED_CHUNKS];
-
-/* They're freeing some memory; look to see if it contains any mutexes. */
-void SK_(alt_free) ( ShadowChunk* sc, ThreadState* tst )
-{
-   ThreadId tid = VG_(get_tid_from_ThreadState)(tst);
-   Addr start = VG_(get_sc_data)(sc);
-   Addr end = start + VG_(get_sc_size)(sc);
-
-   Bool deadmx(Mutex *mx) {
-      if (mx->state != MxDead)
-	 set_mutex_state(mx, MxDead, tid, tst);
-
-      return False;
-   }
-
-   set_sc_where(sc, VG_(get_ExeContext)(tst));
-
-   /* maintain a small window so that the error reporting machinery
-      knows about this memory */
-   if (freechunks[freechunkptr] != NULL)
-      VG_(free_ShadowChunk)(freechunks[freechunkptr]);
-   freechunks[freechunkptr] = sc;
-
-   if (++freechunkptr == N_FREED_CHUNKS)
-      freechunkptr = 0;
-
-   /* mark all mutexes in range dead */
-   find_mutex_range(start, end, deadmx);
-}
-
-
 #define MARK_LOOP	(graph_mark+0)
 #define MARK_DONE	(graph_mark+1)
 
@@ -1860,6 +1800,208 @@
 }
 
 
+/*------------------------------------------------------------*/
+/*--- malloc() et al replacements                          ---*/
+/*------------------------------------------------------------*/
+
+VgHashTable hg_malloc_list = NULL;
+
+#define N_FREED_CHUNKS	2
+static Int freechunkptr = 0;
+static HG_Chunk *freechunks[N_FREED_CHUNKS];
+
+/* Use a small redzone (paranoia) */
+UInt VG_(vg_malloc_redzone_szB) = 4;
+
+
+/* Record a client heap block of `size' bytes at address `p'.  Allocate
+   an HG_Chunk describing it (address, size, allocation context and
+   allocating thread) and add it to hg_malloc_list.  The caller
+   allocates the block itself and updates the shadow memory state. */
+
+static void add_HG_Chunk ( ThreadState* tst, Addr p, UInt size )
+{
+   HG_Chunk* hc;
+
+   hc            = VG_(malloc)(sizeof(HG_Chunk));
+   hc->data      = p;
+   hc->size      = size;
+   hc->where     = VG_(get_ExeContext)(tst);
+   hc->tid       = VG_(get_tid_from_ThreadState)(tst);
+
+   VG_(HT_add_node)( hg_malloc_list, (VgHashNode*)hc );
+}
+
+/* Allocate memory and note change in memory available */
+static __inline__
+void* alloc_and_new_mem ( ThreadState* tst, UInt size, UInt alignment,
+                          Bool is_zeroed )
+{
+   Addr p;
+
+   p = (Addr)VG_(cli_malloc)(alignment, size);
+   add_HG_Chunk ( tst, p, size );
+   eraser_new_mem_heap( p, size, is_zeroed );
+
+   return (void*)p;
+}
+
+void* SK_(malloc) ( ThreadState* tst, Int n )
+{
+   return alloc_and_new_mem ( tst, n, VG_(clo_alignment), /*is_zeroed*/False );
+}
+
+void* SK_(__builtin_new) ( ThreadState* tst, Int n )
+{
+   return alloc_and_new_mem ( tst, n, VG_(clo_alignment), /*is_zeroed*/False );
+}
+
+void* SK_(__builtin_vec_new) ( ThreadState* tst, Int n )
+{
+   return alloc_and_new_mem ( tst, n, VG_(clo_alignment), /*is_zeroed*/False );
+}
+
+void* SK_(memalign) ( ThreadState* tst, Int align, Int n )
+{
+   return alloc_and_new_mem ( tst, n, align,              /*is_zeroed*/False );
+}
+
+void* SK_(calloc) ( ThreadState* tst, Int nmemb, Int size1 )
+{
+   void* p;
+   Int  size, i;
+
+   size = nmemb * size1;
+
+   p = alloc_and_new_mem ( tst, size, VG_(clo_alignment), /*is_zeroed*/True );
+   for (i = 0; i < size; i++)    /* calloc() is zeroed */
+      ((UChar*)p)[i] = 0;
+   return p;
+}
+
+static
+void die_and_free_mem ( ThreadState* tst, HG_Chunk* hc,
+                        HG_Chunk** prev_chunks_next_ptr )
+{
+   ThreadId tid   = VG_(get_tid_from_ThreadState)(tst);
+   Addr     start = hc->data;
+   Addr     end   = start + hc->size;
+
+   Bool deadmx(Mutex *mx) {
+      if (mx->state != MxDead)
+         set_mutex_state(mx, MxDead, tid, tst);
+
+      return False;
+   }
+
+   /* Remove hc from the malloclist using prev_chunks_next_ptr to
+      avoid repeating the hash table lookup.  Can't remove until at least
+      after free and free_mismatch errors are done because they use
+      describe_addr() which looks for it in malloclist. */
+   *prev_chunks_next_ptr = hc->next;
+
+   /* Record where freed */
+   hc->where = VG_(get_ExeContext) ( tst );
+
+   /* maintain a small window so that the error reporting machinery
+      knows about this memory */
+   if (freechunks[freechunkptr] != NULL) {
+      /* free HG_Chunk */
+      HG_Chunk* sc1 = freechunks[freechunkptr];
+      VG_(cli_free) ( (void*)(sc1->data) );
+      VG_(free) ( sc1 );
+   }
+
+   freechunks[freechunkptr] = hc;
+
+   if (++freechunkptr == N_FREED_CHUNKS)
+      freechunkptr = 0;
+
+   /* mark all mutexes in range dead */
+   find_mutex_range(start, end, deadmx);
+}
+
+
+static __inline__
+void handle_free ( ThreadState* tst, void* p )
+{
+   HG_Chunk*  hc;
+   HG_Chunk** prev_chunks_next_ptr;
+
+   hc = (HG_Chunk*)VG_(HT_get_node) ( hg_malloc_list, (UInt)p,
+                                      (VgHashNode***)&prev_chunks_next_ptr );
+   if (hc == NULL) {
+      return;
+   }
+   die_and_free_mem ( tst, hc, prev_chunks_next_ptr );
+}
+
+void SK_(free) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p);
+}
+
+void SK_(__builtin_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p);
+}
+
+void SK_(__builtin_vec_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p);
+}
+
+void* SK_(realloc) ( ThreadState* tst, void* p, Int new_size )
+{
+   HG_Chunk  *hc;
+   HG_Chunk **prev_chunks_next_ptr;
+   UInt       i;
+
+   /* First try and find the block. */
+   hc = (HG_Chunk*)VG_(HT_get_node) ( hg_malloc_list, (UInt)p,
+                                       (VgHashNode***)&prev_chunks_next_ptr );
+
+   if (hc == NULL) {
+      return NULL;
+   }
+  
+   if (hc->size == new_size) {
+      /* size unchanged */
+      return p;
+      
+   } else if (hc->size > new_size) {
+      /* new size is smaller */
+      hc->size = new_size;
+      return p;
+
+   } else {
+      /* new size is bigger */
+      Addr p_new;
+
+      /* Get new memory */
+      p_new = (Addr)VG_(cli_malloc)(VG_(clo_alignment), new_size);
+
+      /* First half kept and copied, second half new */
+      copy_address_range_state( (Addr)p, p_new, hc->size );
+      eraser_new_mem_heap ( p_new+hc->size, new_size-hc->size,
+                            /*inited*/False );
+
+      /* Copy from old to new */
+      for (i = 0; i < hc->size; i++)
+         ((UChar*)p_new)[i] = ((UChar*)p)[i];
+
+      /* Free old memory */
+      die_and_free_mem ( tst, hc, prev_chunks_next_ptr );
+
+      /* this has to be after die_and_free_mem, otherwise the
+         former succeeds in shorting out the new block, not the
+         old, in the case when both are on the same list.  */
+      add_HG_Chunk ( tst, p_new, new_size );
+
+      return (void*)p_new;
+   }  
+}
+
 /*--------------------------------------------------------------*/
 /*--- Machinery to support sanity checking                   ---*/
 /*--------------------------------------------------------------*/
@@ -2196,7 +2338,7 @@
 
 static void describe_addr ( Addr a, AddrInfo* ai )
 {
-   ShadowChunk* sc;
+   HG_Chunk* hc;
    Int i;
 
    /* Nested functions, yeah.  Need the lexical scoping of 'a'. */ 
@@ -2207,10 +2349,10 @@
       return (stack_min <= a && a <= stack_max);
    }
    /* Closure for searching malloc'd and free'd lists */
-   Bool addr_is_in_block(ShadowChunk *sh_ch)
+   Bool addr_is_in_block(VgHashNode *node)
    {
-      return VG_(addr_is_in_block) ( a, VG_(get_sc_data)(sh_ch),
-                                        VG_(get_sc_size)(sh_ch) );
+      HG_Chunk* hc2 = (HG_Chunk*)node;
+      return (hc2->data <= a && a < hc2->data + hc2->size);
    }
 
    /* Search for it in segments */
@@ -2247,34 +2389,28 @@
    }
 
    /* Search for a currently malloc'd block which might bracket it. */
-   sc = VG_(first_matching_mallocd_ShadowChunk)(addr_is_in_block);
-   if (NULL != sc) {
+   hc = (HG_Chunk*)VG_(HT_first_match)(hg_malloc_list, addr_is_in_block);
+   if (NULL != hc) {
       ai->akind      = Mallocd;
-      ai->blksize    = VG_(get_sc_size)(sc);
-      ai->rwoffset   = (Int)(a) - (Int)(VG_(get_sc_data)(sc));
-      ai->lastchange = get_sc_where(sc);
-      ai->lasttid    = get_sc_tid(sc);
+      ai->blksize    = hc->size;
+      ai->rwoffset   = (Int)a - (Int)(hc->data);
+      ai->lastchange = hc->where;
+      ai->lasttid    = hc->tid;
       return;
    } 
 
    /* Look in recently freed memory */
    for(i = 0; i < N_FREED_CHUNKS; i++) {
-      Addr sc_data;
-      UInt sc_size;
-      
-      sc = freechunks[i];
-      if (sc == NULL)
+      hc = freechunks[i];
+      if (hc == NULL)
 	 continue;
 
-      sc_data = VG_(get_sc_data)(sc);
-      sc_size = VG_(get_sc_size)(sc);
-
-      if (a >= sc_data && a <  sc_data + sc_size) {
+      if (a >= hc->data && a < hc->data + hc->size) {
 	 ai->akind      = Freed;
-	 ai->blksize    = sc_size;
-	 ai->rwoffset   = a - sc_data;
-	 ai->lastchange = get_sc_where(sc);
-	 ai->lasttid    = get_sc_tid(sc);
+	 ai->blksize    = hc->size;
+	 ai->rwoffset   = a - hc->data;
+	 ai->lastchange = hc->where;
+	 ai->lasttid    = hc->tid;
 	 return;
       } 
    }
@@ -3107,26 +3243,20 @@
    VG_(needs_core_errors)();
    VG_(needs_skin_errors)();
    VG_(needs_data_syms)();
-   VG_(needs_sizeof_shadow_block)(SHADOW_EXTRA);
-   VG_(needs_alternative_free)();
    VG_(needs_client_requests)();
    VG_(needs_command_line_options)();
 
    VG_(track_new_mem_startup)      (& eraser_new_mem_startup);
-   VG_(track_new_mem_heap)         (& eraser_new_mem_heap);
 
    /* stack ones not decided until VG_(post_clo_init)() */
 
    VG_(track_new_mem_brk)          (& make_writable);
    VG_(track_new_mem_mmap)         (& eraser_new_mem_startup);
 
-   VG_(track_copy_mem_heap)        (& copy_address_range_state);
    VG_(track_change_mem_mprotect)  (& eraser_set_perms);
 
-   VG_(track_ban_mem_heap)         (NULL);
    VG_(track_ban_mem_stack)        (NULL);
 
-   VG_(track_die_mem_heap)         (NULL);
    VG_(track_die_mem_stack)        (NULL);
    VG_(track_die_mem_stack_signal) (NULL);
    VG_(track_die_mem_brk)          (NULL);
@@ -3172,6 +3302,7 @@
    }
 
    init_shadow_memory();
+   hg_malloc_list = VG_(HT_construct)();
 }
 
 static Bool match_Bool(Char *arg, Char *argstr, Bool *ret)
@@ -3228,18 +3359,23 @@
    if (match_Bool(arg, "--private-stacks=", &clo_priv_stacks))
       return True;
 
-   return False;
+   return VG_(replacement_malloc_process_cmd_line_option)(arg);
 }
 
-Char *SK_(usage)(void)
+void SK_(print_usage)(void)
 {
-   return ""
+   VG_(printf)(
 "    --private-stacks=yes|no   assume thread stacks are used privately [no]\n"
 "    --show-last-access=no|some|all\n"
 "                           show location of last word access on error [no]\n"
-      ;
+   );
+   VG_(replacement_malloc_print_usage)();
 }
 
+void SK_(print_debug_usage)(void)
+{
+   VG_(replacement_malloc_print_debug_usage)();
+}
 
 void SK_(post_clo_init)(void)
 {
diff --git a/include/valgrind.h b/include/valgrind.h
index b81961b..c9e75af 100644
--- a/include/valgrind.h
+++ b/include/valgrind.h
@@ -151,6 +151,21 @@
 typedef
    enum { VG_USERREQ__RUNNING_ON_VALGRIND = 0x1001,
           VG_USERREQ__DISCARD_TRANSLATIONS,
+
+          /* These allow any function of 0--3 args to be called from the
+             simulated CPU but run on the real CPU */
+          VG_USERREQ__CLIENT_CALL0 = 0x1100,
+          VG_USERREQ__CLIENT_CALL1,
+          VG_USERREQ__CLIENT_CALL2,
+          VG_USERREQ__CLIENT_CALL3,
+
+          /* As above, but a pointer to the current ThreadState is inserted
+             as the first arg. */
+          VG_USERREQ__CLIENT_tstCALL0 = 0x1200,
+          VG_USERREQ__CLIENT_tstCALL1,
+          VG_USERREQ__CLIENT_tstCALL2,
+          VG_USERREQ__CLIENT_tstCALL3,
+
           VG_USERREQ__FINAL_DUMMY_CLIENT_REQUEST
    } Vg_ClientRequest;
 
@@ -178,4 +193,82 @@
    }
 
 
-#endif
+/* These requests allow control to move from the simulated CPU to the
+   real CPU, calling an arbitrary function */
+#define VG_NON_SIMD_CALL0(_qyy_fn)                             \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_CALL0,          \
+                            _qyy_fn,                           \
+                            0, 0, 0);                          \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_CALL1(_qyy_fn, _qyy_arg1)                  \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_CALL1,          \
+                            _qyy_fn,                           \
+                            _qyy_arg1, 0, 0);                  \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_CALL2(_qyy_fn, _qyy_arg1, _qyy_arg2)       \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_CALL2,          \
+                            _qyy_fn,                           \
+                            _qyy_arg1, _qyy_arg2, 0);          \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_CALL3(_qyy_fn, _qyy_arg1, _qyy_arg2, _qyy_arg3)  \
+   ({unsigned int _qyy_res;                                          \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,        \
+                            VG_USERREQ__CLIENT_CALL3,                \
+                            _qyy_fn,                                 \
+                            _qyy_arg1, _qyy_arg2, _qyy_arg3);        \
+    _qyy_res;                                                        \
+   })
+
+
+/* These requests are similar to those above;  they insert the current
+   ThreadState as the first argument to the called function. */
+#define VG_NON_SIMD_tstCALL0(_qyy_fn)                          \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_tstCALL0,       \
+                            _qyy_fn,                           \
+                            0, 0, 0);                          \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_tstCALL1(_qyy_fn, _qyy_arg1)               \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_tstCALL1,       \
+                            _qyy_fn,                           \
+                            _qyy_arg1, 0, 0);                  \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_tstCALL2(_qyy_fn, _qyy_arg1, _qyy_arg2)    \
+   ({unsigned int _qyy_res;                                    \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,  \
+                            VG_USERREQ__CLIENT_tstCALL2,       \
+                            _qyy_fn,                           \
+                            _qyy_arg1, _qyy_arg2, 0);          \
+    _qyy_res;                                                  \
+   })
+
+#define VG_NON_SIMD_tstCALL3(_qyy_fn, _qyy_arg1, _qyy_arg2, _qyy_arg3)  \
+   ({unsigned int _qyy_res;                                             \
+    VALGRIND_MAGIC_SEQUENCE(_qyy_res, 0 /* default return */,           \
+                            VG_USERREQ__CLIENT_tstCALL3,                \
+                            _qyy_fn,                                    \
+                            _qyy_arg1, _qyy_arg2, _qyy_arg3);           \
+    _qyy_res;                                                           \
+   })
+
+
+#endif   /* __VALGRIND_H */
diff --git a/include/vg_skin.h b/include/vg_skin.h
index 723555f..7c97667 100644
--- a/include/vg_skin.h
+++ b/include/vg_skin.h
@@ -291,6 +291,11 @@
 extern ThreadId     VG_(get_tid_from_ThreadState)  ( ThreadState* );
 extern ThreadState* VG_(get_ThreadState)           ( ThreadId tid );
 
+/* Searches through all threads' stacks to see if any match.  Returns
+ * VG_INVALID_THREADID if none match. */
+extern ThreadId VG_(first_matching_thread_stack)
+                        ( Bool (*p) ( Addr stack_min, Addr stack_max ));
+
 
 /*====================================================================*/
 /*=== Valgrind's version of libc                                   ===*/
@@ -1188,8 +1193,7 @@
    if needed.  But it won't be copied if it's NULL.
 
    If no 'a', 's' or 'extra' of interest needs to be recorded, just use
-   NULL for them.
-*/
+   NULL for them.  */
 extern void VG_(maybe_record_error) ( ThreadState* tst, ErrorKind ekind, 
                                       Addr a, Char* s, void* extra );
 
@@ -1201,14 +1205,17 @@
    be suppressed without possibly printing it. */
 extern Bool VG_(unique_error) ( ThreadState* tst, ErrorKind ekind,
                                 Addr a, Char* s, void* extra,
-                                ExeContext* where, Bool print_error );
+                                ExeContext* where, Bool print_error,
+                                Bool allow_GDB_attach );
 
 /* Gets a non-blank, non-comment line of at most nBuf chars from fd.
    Skips leading spaces on the line.  Returns True if EOF was hit instead. 
-   Useful for reading in extra skin-specific suppression lines.
-*/
+   Useful for reading in extra skin-specific suppression lines.  */
 extern Bool VG_(get_line) ( Int fd, Char* buf, Int nBuf );
 
+/* Client request: write a string to the logging sink. */
+#define VG_USERREQ__LOGMESSAGE              0x3103
+
 
 /*====================================================================*/
 /*=== Obtaining debug information                                  ===*/
@@ -1219,8 +1226,7 @@
    copies the info into the buffer/UInt and returns True.  If not, it
    returns False and nothing is copied.  VG_(get_fnname) always
    demangles C++ function names.  VG_(get_fnname_w_offset) is the
-   same, except it appends "+N" to symbol names to indicate offsets.  
-*/
+   same, except it appends "+N" to symbol names to indicate offsets.  */
 extern Bool VG_(get_filename) ( Addr a, Char* filename, Int n_filename );
 extern Bool VG_(get_fnname)   ( Addr a, Char* fnname,   Int n_fnname   );
 extern Bool VG_(get_linenum)  ( Addr a, UInt* linenum );
@@ -1274,63 +1280,136 @@
 
 
 /*====================================================================*/
-/*=== Shadow chunks and block-finding                              ===*/
+/*=== Generic hash table                                           ===*/
 /*====================================================================*/
 
-/* The skin-relevant parts of a ShadowChunk are:
-     size:   size of the block in bytes
-     addr:   addr of the block
-     extra:  anything extra kept by the skin;  size is determined by
-             VG_(needs).sizeof_shadow_chunk
-*/
+/* Generic type for a separately-chained hash table.  Via a kind of dodgy
+   C-as-C++ style inheritance, skins can extend the VgHashNode type, so long
+   as the first two fields match the sizes of these two fields.  Requires
+   a bit of casting by the skin. */
 typedef
-   struct _ShadowChunk
-   ShadowChunk;
+   struct _VgHashNode {
+      struct _VgHashNode * next;
+      UInt               key;
+   }
+   VgHashNode;
 
-extern UInt         VG_(get_sc_size)  ( ShadowChunk* sc );
-extern Addr         VG_(get_sc_data)  ( ShadowChunk* sc );
-/* Gets the ith word of the `extra' field. */
-extern UInt         VG_(get_sc_extra) ( ShadowChunk* sc, UInt i );
-/* Sets the ith word of the `extra' field to `word'. */
-extern void         VG_(set_sc_extra) ( ShadowChunk* sc, UInt i, UInt word );
+typedef
+   VgHashNode**
+   VgHashTable;
 
-/* These two should only be used if the `alternative_free' need is set, once
-   we reach the point where the block would have been free'd. */
-extern ShadowChunk* VG_(get_sc_next)  ( ShadowChunk* sc );
-extern void         VG_(set_sc_next)  ( ShadowChunk* sc, ShadowChunk* next );
+/* Make a new table. */
+extern VgHashTable VG_(HT_construct) ( void );
+
+/* Add a node to the table. */
+extern void VG_(HT_add_node) ( VgHashTable t, VgHashNode* node );
+
+/* Looks up a node in the hash table.  Also returns the address of the 
+   previous node's `next' pointer which allows it to be removed from the
+   list later without having to look it up again.  */
+extern VgHashNode* VG_(HT_get_node) ( VgHashTable t, UInt key,
+                                    /*OUT*/VgHashNode*** next_ptr );
+
+/* Allocates a sorted array of pointers to all the nodes in the table (eg.
+   the shadow chunks of malloc'd blocks); the count goes in *n_shadows. */
+extern VgHashNode** VG_(HT_to_sorted_array) ( VgHashTable t, 
+                                              /*OUT*/ UInt* n_shadows );
+
+/* Returns first node that matches predicate `p', or NULL if none do.
+   Extra arguments can be implicitly passed to `p' using nested functions;
+   see memcheck/mc_errcontext.c for an example. */
+extern VgHashNode* VG_(HT_first_match) ( VgHashTable t,
+                                         Bool (*p)(VgHashNode*) );
+
+/* Applies a function f() once to each node.  Again, nested functions
+   can be very useful. */
+extern void VG_(HT_apply_to_all_nodes)( VgHashTable t, void (*f)(VgHashNode*) );
+
+/* Destroy a table. */
+extern void VG_(HT_destruct) ( VgHashTable t );
 
 
-/* Use this to free blocks if VG_(needs).alternative_free == True. 
-   It frees the ShadowChunk and the malloc'd block it points to. */
-extern void VG_(free_ShadowChunk) ( ShadowChunk* sc );
+/*====================================================================*/
+/*=== General stuff for replacing functions                        ===*/
+/*====================================================================*/
 
-/* Makes an array of pointers to all the shadow chunks of malloc'd blocks */
-extern ShadowChunk** VG_(get_malloc_shadows) ( /*OUT*/ UInt* n_shadows );
+/* Some skins need to replace the standard definitions of some functions. */
 
-/* Determines if address 'a' is within the bounds of the block at start.
-   Allows a little 'slop' round the edges. */
-extern Bool VG_(addr_is_in_block) ( Addr a, Addr start, UInt size );
+/* ------------------------------------------------------------------ */
+/* General stuff, for replacing any functions */
 
-/* Searches through currently malloc'd blocks until a matching one is found.
-   Returns NULL if none match.  Extra arguments can be implicitly passed to
-   p using nested functions; see memcheck/mc_errcontext.c for an example. */
-extern ShadowChunk* VG_(first_matching_mallocd_ShadowChunk) 
-                        ( Bool (*p) ( ShadowChunk* ));
+/* Is the client running on the simulated CPU or the real one? 
 
-/* Searches through all thread's stacks to see if any match.  Returns
- * VG_INVALID_THREADID if none match. */
-extern ThreadId VG_(first_matching_thread_stack)
-                        ( Bool (*p) ( Addr stack_min, Addr stack_max ));
+   Nb: If it is, and you want to call a function to be run on the real CPU,
+   use one of the VG_NON_SIMD_CALL[0123] macros in valgrind.h to call it.
 
-/* Do memory leak detection. */
-extern void VG_(generic_detect_memory_leaks) (
-          Bool is_valid_64k_chunk ( UInt ),
-          Bool is_valid_address ( Addr ),
-          ExeContext* get_where ( ShadowChunk* ),
-          VgRes leak_resolution,
-          Bool  show_reachable,
-          UInt /*CoreErrorKind*/ leakSupp
-       );
+   Nb: don't forget the function parentheses when using this in a 
+   condition... write this:
+
+     if (VG_(is_running_on_simd_CPU)()) { ... }    // calls function
+
+   not this:
+     
+     if (VG_(is_running_on_simd_CPU)) { ... }      // address of function!
+*/
+extern Bool VG_(is_running_on_simd_CPU) ( void ); 
+
+
+/*====================================================================*/
+/*=== Specific stuff for replacing malloc() and friends            ===*/
+/*====================================================================*/
+
+/* ------------------------------------------------------------------ */
+/* Replacing malloc() and friends */
+
+/* If a skin replaces malloc() et al, the easiest way to do so is to link
+   with coregrind/vg_replace_malloc.c, and follow the instructions below.
+   You can do it from scratch, though, if you enjoy that sort of thing. */
+
+/* Redzone size for valgrind's own malloc();  default value is 0, but can
+   be overridden by skin -- but must be done so *statically*, eg:
+  
+     UInt VG_(vg_malloc_redzone_szB) = 4;
+  
+   It can't be done from a function like SK_(pre_clo_init)().  So it can't,
+   for example, be controlled with a command line option, unfortunately. */
+extern UInt VG_(vg_malloc_redzone_szB);
+
+/* If a skin links with vg_replace_malloc.c, the following functions will be
+   called appropriately when malloc() et al are called. */
+extern void* SK_(malloc)               ( ThreadState* tst, Int n );
+extern void* SK_(__builtin_new)        ( ThreadState* tst, Int n );
+extern void* SK_(__builtin_vec_new)    ( ThreadState* tst, Int n );
+extern void* SK_(memalign)             ( ThreadState* tst, Int align, Int n );
+extern void* SK_(calloc)               ( ThreadState* tst, Int nmemb, Int n );
+extern void  SK_(free)                 ( ThreadState* tst, void* p );
+extern void  SK_(__builtin_delete)     ( ThreadState* tst, void* p );
+extern void  SK_(__builtin_vec_delete) ( ThreadState* tst, void* p );
+extern void* SK_(realloc)              ( ThreadState* tst, void* p, Int size );
+
+/* Can be called from SK_(malloc) et al to do the actual alloc/freeing. */
+extern void* VG_(cli_malloc) ( UInt align, Int nbytes ); 
+extern void  VG_(cli_free)   ( void* p );
+
+/* Check if an address is within a range, allowing for redzones at edges */
+extern Bool VG_(addr_is_in_block)( Addr a, Addr start, UInt size );
+
+/* ------------------------------------------------------------------ */
+/* Some options that can be used by a skin if malloc() et al are replaced. 
+   The skin should call the VG_(replacement_malloc_...)() functions below
+   to give control over these aspects of Valgrind's version of malloc(). */
+
+/* Round malloc sizes upwards to integral number of words? default: NO */
+extern Bool VG_(clo_sloppy_malloc);
+/* DEBUG: print malloc details?  default: NO */
+extern Bool VG_(clo_trace_malloc);
+/* Minimum alignment in functions that don't specify alignment explicitly.
+   default: 0, i.e. use default of the machine (== 4) */
+extern Int  VG_(clo_alignment);
+
+extern Bool VG_(replacement_malloc_process_cmd_line_option) ( Char* arg );
+extern void VG_(replacement_malloc_print_usage)             ( void );
+extern void VG_(replacement_malloc_print_debug_usage)       ( void );
 
 
 /*====================================================================*/
@@ -1376,7 +1455,6 @@
 
 /* Want to have errors detected by Valgrind's core reported?  Includes:
    - pthread API errors (many;  eg. unlocking a non-locked mutex)
-   - silly arguments to malloc() et al (eg. negative size)
    - invalid file descriptors to blocking syscalls read() and write()
    - bad signal numbers passed to sigaction()
    - attempt to install signal handler for SIGKILL or SIGSTOP */  
@@ -1412,16 +1490,6 @@
 /* Skin does stuff before and/or after system calls? */
 extern void VG_(needs_syscall_wrapper) ( void );
 
-/* Size, in words, of extra info about malloc'd blocks recorded by
-   skin.  Be careful to get this right or you'll get seg faults! */
-extern void VG_(needs_sizeof_shadow_block) ( Int size );
-
-/* Skin does free()s itself?  Useful if a skin needs to keep track of
-   blocks in some way after they're free'd.  
-   WARNING: don't forget to call VG_(free_ShadowChunk)() for each block 
-   eventually! */
-extern void VG_(needs_alternative_free) ( void );
-
 /* Are skin-state sanity checks performed? */
 extern void VG_(needs_sanity_checks) ( void );
 
@@ -1443,17 +1511,30 @@
    function to the appropriate function.  To ignore an event, don't do
    anything (default is for events to be ignored). */
 
-/* Memory events */
 
+/* Memory events (Nb: to track heap allocation/freeing, a skin must replace
+   malloc() et al.  See above for how to do this.) */
+
+/* These ones occur at startup, upon some signals, and upon some syscalls */
 EV VG_(track_new_mem_startup) ( void (*f)(Addr a, UInt len, 
                                           Bool rr, Bool ww, Bool xx) );
-EV VG_(track_new_mem_heap)    ( void (*f)(Addr a, UInt len, Bool is_inited) );
 EV VG_(track_new_mem_stack_signal)  ( void (*f)(Addr a, UInt len) );
 EV VG_(track_new_mem_brk)     ( void (*f)(Addr a, UInt len) );
 EV VG_(track_new_mem_mmap)    ( void (*f)(Addr a, UInt len,
                                           Bool rr, Bool ww, Bool xx) );
 
-/* The specialised ones are called in preference to the general one, if they
+EV VG_(track_copy_mem_remap)  ( void (*f)(Addr from, Addr to, UInt len) );
+EV VG_(track_change_mem_mprotect) ( void (*f)(Addr a, UInt len,
+                                              Bool rr, Bool ww, Bool xx) );
+EV VG_(track_die_mem_stack_signal)  ( void (*f)(Addr a, UInt len) );
+EV VG_(track_die_mem_brk)     ( void (*f)(Addr a, UInt len) );
+EV VG_(track_die_mem_munmap)  ( void (*f)(Addr a, UInt len) );
+
+
+/* These ones are called when %esp changes.  A skin could track these itself
+   (except for ban_mem_stack) but it's much easier to use the core's help.
+  
+   The specialised ones are called in preference to the general one, if they
    are defined.  These functions are called a lot if they are used, so
    specialising can optimise things significantly.  If any of the
    specialised cases are defined, the general case must be defined too. 
@@ -1466,23 +1547,6 @@
 EV VG_(track_new_mem_stack_32) ( void (*f)(Addr new_ESP) );
 EV VG_(track_new_mem_stack)    ( void (*f)(Addr a, UInt len) );
 
-EV VG_(track_change_mem_stack)    ( void (*f)(Addr new_ESP) );
-
-EV VG_(track_copy_mem_heap)   ( void (*f)(Addr from, Addr to, UInt len) );
-EV VG_(track_copy_mem_remap)  ( void (*f)(Addr from, Addr to, UInt len) );
-EV VG_(track_change_mem_mprotect) ( void (*f)(Addr a, UInt len,
-                                              Bool rr, Bool ww, Bool xx) );
-      
-/* Used on redzones around malloc'd blocks and at end of stack */
-EV VG_(track_ban_mem_heap)    ( void (*f)(Addr a, UInt len) );
-EV VG_(track_ban_mem_stack)   ( void (*f)(Addr a, UInt len) );
-
-EV VG_(track_die_mem_heap)    ( void (*f)(Addr a, UInt len) );
-EV VG_(track_die_mem_stack_signal)  ( void (*f)(Addr a, UInt len) );
-EV VG_(track_die_mem_brk)     ( void (*f)(Addr a, UInt len) );
-EV VG_(track_die_mem_munmap)  ( void (*f)(Addr a, UInt len) );
-
-/* See comments for VG_(track_new_mem_stack_4) et al above */
 EV VG_(track_die_mem_stack_4)  ( void (*f)(Addr die_ESP) );
 EV VG_(track_die_mem_stack_8)  ( void (*f)(Addr die_ESP) );
 EV VG_(track_die_mem_stack_12) ( void (*f)(Addr die_ESP) );
@@ -1490,9 +1554,10 @@
 EV VG_(track_die_mem_stack_32) ( void (*f)(Addr die_ESP) );
 EV VG_(track_die_mem_stack)    ( void (*f)(Addr a, UInt len) );
 
-EV VG_(track_bad_free)        ( void (*f)(ThreadState* tst, Addr a) );
-EV VG_(track_mismatched_free) ( void (*f)(ThreadState* tst, Addr a) );
+/* Used for redzone at end of thread stacks */
+EV VG_(track_ban_mem_stack)   ( void (*f)(Addr a, UInt len) );
 
+/* These ones occur around syscalls, signal handling, etc */
 EV VG_(track_pre_mem_read)    ( void (*f)(CorePart part, ThreadState* tst,
                                           Char* s, Addr a, UInt size) );
 EV VG_(track_pre_mem_read_asciiz) ( void (*f)(CorePart part, ThreadState* tst,
@@ -1666,9 +1731,11 @@
    record the option as well. */
 extern Bool SK_(process_cmd_line_option) ( Char* argv );
 
-/* Print out command line usage for skin options */
-extern Char* SK_(usage)                  ( void );
+/* Print out command line usage for options for normal skin operation. */
+extern void SK_(print_usage)             ( void );
 
+/* Print out command line usage for options for debugging the skin. */
+extern void SK_(print_debug_usage)       ( void );
 
 /* ------------------------------------------------------------------ */
 /* VG_(needs).client_requests */
@@ -1718,22 +1785,6 @@
                                  Bool is_blocking );
 
 
-/* ------------------------------------------------------------------ */
-/* VG_(needs).sizeof_shadow_chunk (if > 0) */
-
-/* Must fill in the `extra' part, using VG_(set_sc_extra)(). */
-extern void SK_(complete_shadow_chunk) ( ShadowChunk* sc, ThreadState* tst );
-
-
-/* ------------------------------------------------------------------ */
-/* VG_(needs).alternative_free */
-
-/* If this need is set, when a dynamic block would normally be free'd, this
-   is called instead.  The block is contained inside the ShadowChunk;  use
-   the VG_(get_sc_*)() functions to access it. */
-extern void SK_(alt_free) ( ShadowChunk* sc, ThreadState* tst );
-
-
 /* ---------------------------------------------------------------------
    VG_(needs).sanity_checks */
 
diff --git a/memcheck/Makefile.am b/memcheck/Makefile.am
index 9c32164..f8b323d 100644
--- a/memcheck/Makefile.am
+++ b/memcheck/Makefile.am
@@ -15,14 +15,17 @@
 
 vgskin_memcheck_so_SOURCES = \
 	mac_leakcheck.c \
+	mac_malloc_wrappers.c \
 	mac_needs.c \
 	mc_main.c \
 	mc_clientreqs.c \
 	mc_errcontext.c \
 	mc_from_ucode.c \
+	mc_replace_strmem.c \
 	mc_translate.c \
 	mc_helpers.S
 vgskin_memcheck_so_LDFLAGS = -shared
+vgskin_memcheck_so_LDADD = ../coregrind/vg_replace_malloc.o
 
 mcincludedir = $(includedir)/valgrind
 
@@ -34,3 +37,5 @@
 	mc_constants.h	\
 	mc_include.h
 
+mc_replace_strmem.o: CFLAGS += -fno-omit-frame-pointer
+
diff --git a/memcheck/mac_leakcheck.c b/memcheck/mac_leakcheck.c
index 83d3ada..9894b86 100644
--- a/memcheck/mac_leakcheck.c
+++ b/memcheck/mac_leakcheck.c
@@ -224,9 +224,9 @@
 #ifdef VG_DEBUG_LEAKCHECK
 /* Used to sanity-check the fast binary-search mechanism. */
 static 
-Int find_shadow_for_OLD ( Addr          ptr, 
-                          ShadowChunk** shadows,
-                          Int           n_shadows )
+Int find_shadow_for_OLD ( Addr        ptr, 
+                          MAC_Chunk** shadows,
+                          Int         n_shadows )
 
 {
    Int  i;
@@ -245,9 +245,9 @@
 
 
 static 
-Int find_shadow_for ( Addr          ptr, 
-                      ShadowChunk** shadows,
-                      Int           n_shadows )
+Int find_shadow_for ( Addr        ptr, 
+                      MAC_Chunk** shadows,
+                      Int         n_shadows )
 {
    Addr a_mid_lo, a_mid_hi;
    Int lo, mid, hi, retVal;
@@ -256,14 +256,12 @@
    lo = 0;
    hi = n_shadows-1;
    while (True) {
-      /* invariant: current unsearched space is from lo to hi,
-         inclusive. */
+      /* invariant: current unsearched space is from lo to hi, inclusive. */
       if (lo > hi) break; /* not found */
 
       mid      = (lo + hi) / 2;
-      a_mid_lo = VG_(get_sc_data)(shadows[mid]);
-      a_mid_hi = VG_(get_sc_data)(shadows[mid]) + 
-                 VG_(get_sc_size)(shadows[mid]) - 1;
+      a_mid_lo = shadows[mid]->data;
+      a_mid_hi = shadows[mid]->data + shadows[mid]->size - 1;
 
       if (ptr < a_mid_lo) {
          hi = mid-1;
@@ -286,11 +284,11 @@
 }
 
 /* Globals, for the following callback used by VG_(detect_memory_leaks). */
-static ShadowChunk**  vglc_shadows;
-static Int            vglc_n_shadows;
-static Reachedness*   vglc_reachedness;
-static Addr           vglc_min_mallocd_addr;
-static Addr           vglc_max_mallocd_addr;
+static MAC_Chunk**  lc_shadows;
+static Int          lc_n_shadows;
+static Reachedness* lc_reachedness;
+static Addr         lc_min_mallocd_addr;
+static Addr         lc_max_mallocd_addr;
 
 static 
 void vg_detect_memory_leaks_notify_addr ( Addr a, UInt word_at_a )
@@ -313,29 +311,28 @@
       where the .bss segment has been put.  If you can, drop me a
       line.  
    */
-   if (VG_(within_stack)(a))                return;
-   if (VG_(within_m_state_static)(a))       return;
-   if (a == (Addr)(&vglc_min_mallocd_addr)) return;
-   if (a == (Addr)(&vglc_max_mallocd_addr)) return;
+   if (VG_(within_stack)(a))              return;
+   if (VG_(within_m_state_static)(a))     return;
+   if (a == (Addr)(&lc_min_mallocd_addr)) return;
+   if (a == (Addr)(&lc_max_mallocd_addr)) return;
 
    /* OK, let's get on and do something Useful for a change. */
 
    ptr = (Addr)word_at_a;
-   if (ptr >= vglc_min_mallocd_addr && ptr <= vglc_max_mallocd_addr) {
+   if (ptr >= lc_min_mallocd_addr && ptr <= lc_max_mallocd_addr) {
       /* Might be legitimate; we'll have to investigate further. */
-      sh_no = find_shadow_for ( ptr, vglc_shadows, vglc_n_shadows );
+      sh_no = find_shadow_for ( ptr, lc_shadows, lc_n_shadows );
       if (sh_no != -1) {
          /* Found a block at/into which ptr points. */
-         sk_assert(sh_no >= 0 && sh_no < vglc_n_shadows);
-         sk_assert(ptr < VG_(get_sc_data)(vglc_shadows[sh_no])
-                       + VG_(get_sc_size)(vglc_shadows[sh_no]));
+         sk_assert(sh_no >= 0 && sh_no < lc_n_shadows);
+         sk_assert(ptr < lc_shadows[sh_no]->data + lc_shadows[sh_no]->size);
          /* Decide whether Proper-ly or Interior-ly reached. */
-         if (ptr == VG_(get_sc_data)(vglc_shadows[sh_no])) {
+         if (ptr == lc_shadows[sh_no]->data) {
             if (0) VG_(printf)("pointer at %p to %p\n", a, word_at_a );
-            vglc_reachedness[sh_no] = Proper;
+            lc_reachedness[sh_no] = Proper;
          } else {
-            if (vglc_reachedness[sh_no] == Unreached)
-               vglc_reachedness[sh_no] = Interior;
+            if (lc_reachedness[sh_no] == Unreached)
+               lc_reachedness[sh_no] = Interior;
          }
       }
    }
@@ -385,25 +382,33 @@
    LossRecord* errlist;
    LossRecord* p;
 
-   /* VG_(get_malloc_shadows) allocates storage for shadows */
-   vglc_shadows = VG_(get_malloc_shadows)( &vglc_n_shadows );
-   if (vglc_n_shadows == 0) {
-      sk_assert(vglc_shadows == NULL);
+   /* VG_(HT_to_sorted_array) allocates storage for shadows */
+   lc_shadows = (MAC_Chunk**)VG_(HT_to_sorted_array)( MAC_(malloc_list),
+                                                        &lc_n_shadows );
+
+   /* Sanity check -- make sure they don't overlap */
+   for (i = 0; i < lc_n_shadows-1; i++) {
+      sk_assert( lc_shadows[i]->data + lc_shadows[i]->size
+                 < lc_shadows[i+1]->data );
+   }
+
+   if (lc_n_shadows == 0) {
+      sk_assert(lc_shadows == NULL);
       VG_(message)(Vg_UserMsg, 
                    "No malloc'd blocks -- no leaks are possible.");
       return;
    }
 
    VG_(message)(Vg_UserMsg, "searching for pointers to %d not-freed blocks.", 
-                vglc_n_shadows );
+                lc_n_shadows );
 
-   vglc_min_mallocd_addr = VG_(get_sc_data)(vglc_shadows[0]);
-   vglc_max_mallocd_addr = VG_(get_sc_data)(vglc_shadows[vglc_n_shadows-1])
-                         + VG_(get_sc_size)(vglc_shadows[vglc_n_shadows-1]) - 1;
+   lc_min_mallocd_addr = lc_shadows[0]->data;
+   lc_max_mallocd_addr = lc_shadows[lc_n_shadows-1]->data
+                         + lc_shadows[lc_n_shadows-1]->size - 1;
 
-   vglc_reachedness = VG_(malloc)( vglc_n_shadows * sizeof(Reachedness) );
-   for (i = 0; i < vglc_n_shadows; i++)
-      vglc_reachedness[i] = Unreached;
+   lc_reachedness = VG_(malloc)( lc_n_shadows * sizeof(Reachedness) );
+   for (i = 0; i < lc_n_shadows; i++)
+      lc_reachedness[i] = Unreached;
 
    /* Do the scan of memory. */
    bytes_notified
@@ -419,12 +424,12 @@
    /* Common up the lost blocks so we can print sensible error messages. */
    n_lossrecords = 0;
    errlist       = NULL;
-   for (i = 0; i < vglc_n_shadows; i++) {
+   for (i = 0; i < lc_n_shadows; i++) {
      
-      ExeContext* where = MAC_(get_where) ( vglc_shadows[i] );
+      ExeContext* where = lc_shadows[i]->where;
       
       for (p = errlist; p != NULL; p = p->next) {
-         if (p->loss_mode == vglc_reachedness[i]
+         if (p->loss_mode == lc_reachedness[i]
              && VG_(eq_ExeContext) ( MAC_(clo_leak_resolution),
                                      p->allocated_at, 
                                      where) ) {
@@ -433,13 +438,13 @@
       }
       if (p != NULL) {
          p->num_blocks  ++;
-         p->total_bytes += VG_(get_sc_size)(vglc_shadows[i]);
+         p->total_bytes += lc_shadows[i]->size;
       } else {
          n_lossrecords ++;
          p = VG_(malloc)(sizeof(LossRecord));
-         p->loss_mode    = vglc_reachedness[i];
+         p->loss_mode    = lc_reachedness[i];
          p->allocated_at = where;
-         p->total_bytes  = VG_(get_sc_size)(vglc_shadows[i]);
+         p->total_bytes  = lc_shadows[i]->size;
          p->num_blocks   = 1;
          p->next         = errlist;
          errlist         = p;
@@ -474,7 +479,8 @@
       is_suppressed = 
          VG_(unique_error) ( /*tst*/NULL, LeakErr, (UInt)i+1,
                              (Char*)n_lossrecords, (void*) p_min,
-                             p_min->allocated_at, print_record );
+                             p_min->allocated_at, print_record,
+                             /*allow_GDB_attach*/False );
 
       if (is_suppressed) {
          blocks_suppressed += p_min->num_blocks;
@@ -516,8 +522,8 @@
    }
    VG_(message)(Vg_UserMsg, "");
 
-   VG_(free) ( vglc_shadows );
-   VG_(free) ( vglc_reachedness );
+   VG_(free) ( lc_shadows );
+   VG_(free) ( lc_reachedness );
 }
 
 /*--------------------------------------------------------------------*/
diff --git a/memcheck/mac_malloc_wrappers.c b/memcheck/mac_malloc_wrappers.c
new file mode 100644
index 0000000..0636477
--- /dev/null
+++ b/memcheck/mac_malloc_wrappers.c
@@ -0,0 +1,415 @@
+
+/*--------------------------------------------------------------------*/
+/*--- malloc/free wrappers for detecting errors and updating bits. ---*/
+/*---                                        mac_malloc_wrappers.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of MemCheck, a heavyweight Valgrind skin for
+   detecting memory errors, and AddrCheck, a lightweight Valgrind skin 
+   for detecting memory errors.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+#include "mac_shared.h"
+
+/*------------------------------------------------------------*/
+/*--- Defns                                                ---*/
+/*------------------------------------------------------------*/
+
+/* Stats ... */
+static UInt cmalloc_n_mallocs  = 0;
+static UInt cmalloc_n_frees    = 0;
+static UInt cmalloc_bs_mallocd = 0;
+
+/* We want a 16B redzone on heap blocks for Addrcheck and Memcheck */
+UInt VG_(vg_malloc_redzone_szB) = 16;
+
+/*------------------------------------------------------------*/
+/*--- Tracking malloc'd and free'd blocks                  ---*/
+/*------------------------------------------------------------*/
+
+/* Record malloc'd blocks.  Nb: Addrcheck and Memcheck construct this
+   separately in their respective initialisation functions. */
+VgHashTable MAC_(malloc_list) = NULL;
+   
+/* Records blocks after freeing. */
+static MAC_Chunk* freed_list_start  = NULL;
+static MAC_Chunk* freed_list_end    = NULL;
+static Int        freed_list_volume = 0;
+
+/* Put a shadow chunk on the freed blocks queue, possibly freeing up
+   some of the oldest blocks in the queue at the same time. */
+static void add_to_freed_queue ( MAC_Chunk* mc )
+{
+   MAC_Chunk* sc1;
+
+   /* Put it at the end of the freed list */
+   if (freed_list_end == NULL) {
+      sk_assert(freed_list_start == NULL);
+      freed_list_end    = freed_list_start = mc;
+      freed_list_volume = mc->size;
+   } else {
+      sk_assert(freed_list_end->next == NULL);
+      freed_list_end->next = mc;
+      freed_list_end       = mc;
+      freed_list_volume += mc->size;
+   }
+   mc->next = NULL;
+
+   /* Release enough of the oldest blocks to bring the freed queue
+      volume down to at most MAC_(clo_freelist_vol). */
+
+   while (freed_list_volume > MAC_(clo_freelist_vol)) {
+      sk_assert(freed_list_start != NULL);
+      sk_assert(freed_list_end != NULL);
+
+      sc1 = freed_list_start;
+      freed_list_volume -= sc1->size;
+      /* VG_(printf)("volume now %d\n", freed_list_volume); */
+      sk_assert(freed_list_volume >= 0);
+
+      if (freed_list_start == freed_list_end) {
+         freed_list_start = freed_list_end = NULL;
+      } else {
+         freed_list_start = sc1->next;
+      }
+      sc1->next = NULL; /* just paranoia */
+
+      /* free MAC_Chunk */
+      VG_(cli_free) ( (void*)(sc1->data) );
+      VG_(free) ( sc1 );
+   }
+}
+
+/* Return the first shadow chunk satisfying the predicate p. */
+MAC_Chunk* MAC_(first_matching_freed_MAC_Chunk) ( Bool (*p)(MAC_Chunk*) )
+{
+   MAC_Chunk* mc;
+
+   /* Walk the queue of recently freed blocks, oldest first, and
+      return the first one that matches. */
+   for (mc = freed_list_start; mc != NULL; mc = mc->next)
+      if (p(mc))
+         return mc;
+
+   return NULL;
+}
+
+/* Allocate a shadow chunk (MAC_Chunk) for the user block of 'size'
+   bytes at p, and record its kind and allocation context.  Put the
+   shadow chunk in the malloc'd-blocks hash table.  The caller must
+   allocate the user block and set all memory protections. */
+
+static void add_MAC_Chunk ( ThreadState* tst,
+                            Addr p, UInt size, MAC_AllocKind kind )
+{
+   MAC_Chunk* mc;
+
+   mc            = VG_(malloc)(sizeof(MAC_Chunk));
+   mc->data      = p;
+   mc->size      = size;
+   mc->allockind = kind;
+   mc->where     = VG_(get_ExeContext)(tst);
+
+   VG_(HT_add_node)( MAC_(malloc_list), (VgHashNode*)mc );
+}
+
+/*------------------------------------------------------------*/
+/*--- client_malloc(), etc                                 ---*/
+/*------------------------------------------------------------*/
+
+/* Function pointers for the two skins to track interesting events. */
+void (*MAC_(new_mem_heap)) ( Addr a, UInt len, Bool is_inited );
+void (*MAC_(ban_mem_heap)) ( Addr a, UInt len );
+void (*MAC_(die_mem_heap)) ( Addr a, UInt len );
+void (*MAC_(copy_mem_heap))( Addr from, Addr to, UInt len );
+
+/* Allocate memory and note change in memory available */
+static __inline__
+void* alloc_and_new_mem ( ThreadState* tst, UInt size, UInt alignment,
+                          Bool is_zeroed, MAC_AllocKind kind )
+{
+   Addr p;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_mallocs ++;
+   cmalloc_bs_mallocd += size;
+
+   p = (Addr)VG_(cli_malloc)(alignment, size);
+
+   add_MAC_Chunk ( tst, p, size, kind );
+
+   MAC_(ban_mem_heap)( p-VG_(vg_malloc_redzone_szB), 
+                         VG_(vg_malloc_redzone_szB) );
+   MAC_(new_mem_heap)( p, size, is_zeroed );
+   MAC_(ban_mem_heap)( p+size, VG_(vg_malloc_redzone_szB) );
+
+   VGP_POPCC(VgpCliMalloc);
+   return (void*)p;
+}
+
+void* SK_(malloc) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to malloc()", n );
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocMalloc );
+   }
+}
+
+void* SK_(__builtin_new) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to __builtin_new()", n);
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocNew );
+   }
+}
+
+void* SK_(__builtin_vec_new) ( ThreadState* tst, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, 
+                   "Warning: silly arg (%d) to __builtin_vec_new()", n );
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, VG_(clo_alignment), 
+                                 /*is_zeroed*/False, MAC_AllocNewVec );
+   }
+}
+
+void* SK_(memalign) ( ThreadState* tst, Int align, Int n )
+{
+   if (n < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to memalign()", n);
+      return NULL;
+   } else {
+      return alloc_and_new_mem ( tst, n, align, /*is_zeroed*/False, 
+                                 MAC_AllocMalloc );
+   }
+}
+
+void* SK_(calloc) ( ThreadState* tst, Int nmemb, Int size1 )
+{
+   void* p;
+   Int   size, i;
+
+   size = nmemb * size1;
+
+   if (nmemb < 0 || size1 < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly args (%d,%d) to calloc()",
+                               nmemb, size1 );
+      return NULL;
+   } else {
+      p = alloc_and_new_mem ( tst, size, VG_(clo_alignment), 
+                              /*is_zeroed*/True, MAC_AllocMalloc );
+      for (i = 0; i < size; i++) 
+         ((UChar*)p)[i] = 0;
+      return p;
+   }
+}
+
+static
+void die_and_free_mem ( ThreadState* tst, MAC_Chunk* mc,
+                        MAC_Chunk** prev_chunks_next_ptr )
+{
+   /* Note: ban redzones again -- just in case user de-banned them
+      with a client request... */
+   MAC_(ban_mem_heap)( mc->data-VG_(vg_malloc_redzone_szB), 
+                                VG_(vg_malloc_redzone_szB) );
+   MAC_(die_mem_heap)( mc->data, mc->size );
+   MAC_(ban_mem_heap)( mc->data+mc->size, VG_(vg_malloc_redzone_szB) );
+
+   /* Remove mc from the malloclist using prev_chunks_next_ptr to
+      avoid repeating the hash table lookup.  Can't remove until at least
+      after free and free_mismatch errors are done because they use
+      describe_addr() which looks for it in malloclist. */
+   *prev_chunks_next_ptr = mc->next;
+
+   /* Record where freed */
+   mc->where = VG_(get_ExeContext) ( tst );
+
+   /* Put it out of harm's way for a while. */
+   add_to_freed_queue ( mc );
+}
+
+
+static __inline__
+void handle_free ( ThreadState* tst, void* p, MAC_AllocKind kind )
+{
+   MAC_Chunk*  mc;
+   MAC_Chunk** prev_chunks_next_ptr;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_frees++;
+
+   mc = (MAC_Chunk*)VG_(HT_get_node) ( MAC_(malloc_list), (UInt)p,
+                                       (VgHashNode***)&prev_chunks_next_ptr );
+
+   if (mc == NULL) {
+      MAC_(record_free_error) ( tst, (Addr)p );
+      VGP_POPCC(VgpCliMalloc);
+      return;
+   }
+
+   /* check if it's a matching free() / delete / delete [] */
+   if (kind != mc->allockind) {
+      MAC_(record_freemismatch_error) ( tst, (Addr)p );
+   }
+
+   die_and_free_mem ( tst, mc, prev_chunks_next_ptr );
+   VGP_POPCC(VgpCliMalloc);
+}
+
+void SK_(free) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocMalloc);
+}
+
+void SK_(__builtin_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocNew);
+}
+
+void SK_(__builtin_vec_delete) ( ThreadState* tst, void* p )
+{
+   handle_free(tst, p, MAC_AllocNewVec);
+}
+
+void* SK_(realloc) ( ThreadState* tst, void* p, Int new_size )
+{
+   MAC_Chunk  *mc;
+   MAC_Chunk **prev_chunks_next_ptr;
+   UInt        i;
+
+   VGP_PUSHCC(VgpCliMalloc);
+
+   cmalloc_n_frees ++;
+   cmalloc_n_mallocs ++;
+   cmalloc_bs_mallocd += new_size;
+
+   if (new_size < 0) {
+      VG_(message)(Vg_UserMsg, "Warning: silly arg (%d) to realloc()", new_size);
+      VGP_POPCC(VgpCliMalloc);
+      return NULL;
+   }
+
+   /* First try and find the block. */
+   mc = (MAC_Chunk*)VG_(HT_get_node) ( MAC_(malloc_list), (UInt)p,
+                                       (VgHashNode***)&prev_chunks_next_ptr );
+
+   if (mc == NULL) {
+      MAC_(record_free_error) ( tst, (Addr)p );
+      /* Perhaps we should return to the program regardless. */
+      VGP_POPCC(VgpCliMalloc);
+      return NULL;
+   }
+  
+   /* check if it's a matching free() / delete / delete [] */
+   if (MAC_AllocMalloc != mc->allockind) {
+      /* can not realloc a range that was allocated with new or new [] */
+      MAC_(record_freemismatch_error) ( tst, (Addr)p );
+      /* but keep going anyway */
+   }
+
+   if (mc->size == new_size) {
+      /* size unchanged */
+      VGP_POPCC(VgpCliMalloc);
+      return p;
+      
+   } else if (mc->size > new_size) {
+      /* new size is smaller */
+      MAC_(die_mem_heap)( mc->data+new_size, mc->size-new_size );
+      mc->size = new_size;
+      VGP_POPCC(VgpCliMalloc);
+      return p;
+
+   } else {
+      /* new size is bigger */
+      Addr p_new;
+
+      /* Get new memory */
+      p_new = (Addr)VG_(cli_malloc)(VG_(clo_alignment), new_size);
+
+      /* First half kept and copied, second half new, 
+         red zones as normal */
+      MAC_(ban_mem_heap) ( p_new-VG_(vg_malloc_redzone_szB), 
+                                 VG_(vg_malloc_redzone_szB) );
+      MAC_(copy_mem_heap)( (Addr)p, p_new, mc->size );
+      MAC_(new_mem_heap) ( p_new+mc->size, new_size-mc->size, /*inited*/False );
+      MAC_(ban_mem_heap) ( p_new+new_size, VG_(vg_malloc_redzone_szB) );
+
+      /* Copy from old to new */
+      for (i = 0; i < mc->size; i++)
+         ((UChar*)p_new)[i] = ((UChar*)p)[i];
+
+      /* Free old memory */
+      die_and_free_mem ( tst, mc, prev_chunks_next_ptr );
+
+      /* this has to be after die_and_free_mem, otherwise the
+         former succeeds in shorting out the new block, not the
+         old, in the case when both are on the same list.  */
+      add_MAC_Chunk ( tst, p_new, new_size, MAC_AllocMalloc );
+
+      VGP_POPCC(VgpCliMalloc);
+      return (void*)p_new;
+   }  
+}
+
+void MAC_(print_malloc_stats) ( void )
+{
+   UInt nblocks = 0, nbytes = 0;
+   
+   /* Mmm... more lexical scoping */
+   void count_one_chunk(VgHashNode* node) {
+      MAC_Chunk* mc = (MAC_Chunk*)node;
+      nblocks ++;
+      nbytes  += mc->size;
+   }
+
+   if (VG_(clo_verbosity) == 0)
+      return;
+
+   /* Count memory still in use. */
+   VG_(HT_apply_to_all_nodes)(MAC_(malloc_list), count_one_chunk);
+
+   VG_(message)(Vg_UserMsg, 
+                "malloc/free: in use at exit: %d bytes in %d blocks.",
+                nbytes, nblocks);
+   VG_(message)(Vg_UserMsg, 
+                "malloc/free: %d allocs, %d frees, %u bytes allocated.",
+                cmalloc_n_mallocs,
+                cmalloc_n_frees, cmalloc_bs_mallocd);
+   if (VG_(clo_verbosity) > 1)
+      VG_(message)(Vg_UserMsg, "");
+}
+
+/*--------------------------------------------------------------------*/
+/*--- end                                    mac_malloc_wrappers.c ---*/
+/*--------------------------------------------------------------------*/
diff --git a/memcheck/mac_needs.c b/memcheck/mac_needs.c
index ad6b2f6..84c778c 100644
--- a/memcheck/mac_needs.c
+++ b/memcheck/mac_needs.c
@@ -87,110 +87,27 @@
       MAC_(clo_workaround_gcc296_bugs) = False;
 
    else
-      return False;
+      return VG_(replacement_malloc_process_cmd_line_option)(arg);
 
    return True;
 }
 
-/*------------------------------------------------------------*/
-/*--- Shadow chunks info                                   ---*/
-/*------------------------------------------------------------*/
-
-void MAC_(set_where)( ShadowChunk* sc, ExeContext* ec )
+void MAC_(print_common_usage)(void)
 {
-   VG_(set_sc_extra)( sc, 0, (UInt)ec );
+   VG_(printf)(
+"    --partial-loads-ok=no|yes too hard to explain here; see manual [yes]\n"
+"    --freelist-vol=<number>   volume of freed blocks queue [1000000]\n"
+"    --leak-check=no|yes       search for memory leaks at exit? [no]\n"
+"    --leak-resolution=low|med|high  how much bt merging in leak check [low]\n"
+"    --show-reachable=no|yes   show reachable blocks in leak check? [no]\n"
+"    --workaround-gcc296-bugs=no|yes  self explanatory [no]\n"
+   );
+   VG_(replacement_malloc_print_usage)();
 }
 
-ExeContext *MAC_(get_where)( ShadowChunk* sc )
+void MAC_(print_common_debug_usage)(void)
 {
-   return (ExeContext*)VG_(get_sc_extra)(sc, 0);
-}
-
-void SK_(complete_shadow_chunk) ( ShadowChunk* sc, ThreadState* tst )
-{
-   VG_(set_sc_extra) ( sc, 0, (UInt)VG_(get_ExeContext)(tst) );
-}
-
-
-/*------------------------------------------------------------*/
-/*--- Postponing free()ing                                 ---*/
-/*------------------------------------------------------------*/
-
-/* Holds blocks after freeing. */
-static ShadowChunk* freed_list_start  = NULL;
-static ShadowChunk* freed_list_end    = NULL;
-static Int          freed_list_volume = 0;
-
-__attribute__ ((unused))
-Int MAC_(count_freelist) ( void )
-{
-   ShadowChunk* sc;
-   Int n = 0;
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      n++;
-   return n;
-}
-
-__attribute__ ((unused))
-void MAC_(freelist_sanity) ( void )
-{
-   ShadowChunk* sc;
-   Int n = 0;
-   /* VG_(printf)("freelist sanity\n"); */
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      n += VG_(get_sc_size)(sc);
-   sk_assert(n == freed_list_volume);
-}
-
-/* Put a shadow chunk on the freed blocks queue, possibly freeing up
-   some of the oldest blocks in the queue at the same time. */
-static void add_to_freed_queue ( ShadowChunk* sc )
-{
-   ShadowChunk* sc1;
-
-   /* Put it at the end of the freed list */
-   if (freed_list_end == NULL) {
-      sk_assert(freed_list_start == NULL);
-      freed_list_end = freed_list_start = sc;
-      freed_list_volume = VG_(get_sc_size)(sc);
-   } else {    
-      sk_assert(VG_(get_sc_next)(freed_list_end) == NULL);
-      VG_(set_sc_next)(freed_list_end, sc);
-      freed_list_end = sc;
-      freed_list_volume += VG_(get_sc_size)(sc);
-   }
-   VG_(set_sc_next)(sc, NULL);
-
-   /* Release enough of the oldest blocks to bring the free queue
-      volume below vg_clo_freelist_vol. */
-   
-   while (freed_list_volume > MAC_(clo_freelist_vol)) {
-      /* freelist_sanity(); */
-      sk_assert(freed_list_start != NULL);
-      sk_assert(freed_list_end != NULL);
-
-      sc1 = freed_list_start;
-      freed_list_volume -= VG_(get_sc_size)(sc1);
-      /* VG_(printf)("volume now %d\n", freed_list_volume); */
-      sk_assert(freed_list_volume >= 0);
-
-      if (freed_list_start == freed_list_end) {
-         freed_list_start = freed_list_end = NULL;
-      } else {
-         freed_list_start = VG_(get_sc_next)(sc1);
-      }
-      VG_(set_sc_next)(sc1, NULL); /* just paranoia */
-      VG_(free_ShadowChunk) ( sc1 );
-   }
-}
-
-void SK_(alt_free) ( ShadowChunk* sc, ThreadState* tst )
-{
-   /* Record where freed */
-   MAC_(set_where)( sc, VG_(get_ExeContext) ( tst ) );
-
-   /* Put it out of harm's way for a while. */
-   add_to_freed_queue ( sc );
+   VG_(replacement_malloc_print_debug_usage)();
 }
 
 /*------------------------------------------------------------*/
@@ -389,39 +306,29 @@
    MemCheck for user blocks, which Addrcheck doesn't support. */
 Bool (*MAC_(describe_addr_supp)) ( Addr a, AddrInfo* ai ) = NULL;
    
-/* Return the first shadow chunk satisfying the predicate p. */
-static ShadowChunk* first_matching_freed_ShadowChunk ( Bool (*p)(ShadowChunk*) )
-{
-   ShadowChunk* sc;
-
-   /* No point looking through freed blocks if we're not keeping
-      them around for a while... */
-   for (sc = freed_list_start; sc != NULL; sc = VG_(get_sc_next)(sc))
-      if (p(sc))
-         return sc;
-
-   return NULL;
-}
-
 /* Describe an address as best you can, for error messages,
    putting the result in ai. */
 static void describe_addr ( Addr a, AddrInfo* ai )
 {
-   ShadowChunk* sc;
-   ThreadId     tid;
+   MAC_Chunk* sc;
+   ThreadId   tid;
 
    /* Nested functions, yeah.  Need the lexical scoping of 'a'. */
-   
+
    /* Closure for searching thread stacks */
    Bool addr_is_in_bounds(Addr stack_min, Addr stack_max)
    {
       return (stack_min <= a && a <= stack_max);
    }
-   /* Closure for searching malloc'd and free'd lists */
-   Bool addr_is_in_block(ShadowChunk *sh_ch)
+   /* Closure for searching free'd list */
+   Bool addr_is_in_MAC_Chunk(MAC_Chunk* mc)
    {
-      return VG_(addr_is_in_block) ( a, VG_(get_sc_data)(sh_ch),
-                                        VG_(get_sc_size)(sh_ch) );
+      return VG_(addr_is_in_block)( a, mc->data, mc->size );
+   }
+   /* Closure for searching malloc'd lists */
+   Bool addr_is_in_HashNode(VgHashNode* sh_ch)
+   {
+      return addr_is_in_MAC_Chunk( (MAC_Chunk*)sh_ch );
    }
 
    /* Perhaps it's a user-def'd block ?  (only check if requested, though) */
@@ -437,21 +344,21 @@
       return;
    }
    /* Search for a recently freed block which might bracket it. */
-   sc = first_matching_freed_ShadowChunk(addr_is_in_block);
-   if (NULL != sc) { 
+   sc = MAC_(first_matching_freed_MAC_Chunk)(addr_is_in_MAC_Chunk);
+   if (NULL != sc) {
       ai->akind      = Freed;
-      ai->blksize    = VG_(get_sc_size)(sc);
-      ai->rwoffset   = (Int)a - (Int)VG_(get_sc_data)(sc);
-      ai->lastchange = MAC_(get_where)(sc);
+      ai->blksize    = sc->size;
+      ai->rwoffset   = (Int)a - (Int)sc->data;
+      ai->lastchange = sc->where;
       return;
    }
    /* Search for a currently malloc'd block which might bracket it. */
-   sc = VG_(first_matching_mallocd_ShadowChunk)(addr_is_in_block);
+   sc = (MAC_Chunk*)VG_(HT_first_match)(MAC_(malloc_list), addr_is_in_HashNode);
    if (NULL != sc) {
       ai->akind      = Mallocd;
-      ai->blksize    = VG_(get_sc_size)(sc);
-      ai->rwoffset   = (Int)a - (Int)VG_(get_sc_data)(sc);
-      ai->lastchange = MAC_(get_where)(sc);
+      ai->blksize    = sc->size;
+      ai->rwoffset   = (Int)(a) - (Int)sc->data;
+      ai->lastchange = sc->where;
       return;
    }
    /* Clueless ... */
@@ -459,7 +366,6 @@
    return;
 }
 
-
 /* Is this address within some small distance below %ESP?  Used only
    for the --workaround-gcc296-bugs kludge. */
 static Bool is_just_below_ESP( Addr esp, Addr aa )
@@ -798,14 +704,14 @@
 
 UInt MAC_(event_ctr)[N_PROF_EVENTS];
 
-void MAC_(init_prof_mem) ( void )
+void init_prof_mem ( void )
 {
    Int i;
    for (i = 0; i < N_PROF_EVENTS; i++)
       MAC_(event_ctr)[i] = 0;
 }
 
-void MAC_(done_prof_mem) ( void )
+void done_prof_mem ( void )
 {
    Int i;
    for (i = 0; i < N_PROF_EVENTS; i++) {
@@ -819,12 +725,39 @@
 
 #else
 
-void MAC_(init_prof_mem) ( void ) { }
-void MAC_(done_prof_mem) ( void ) { }
+void init_prof_mem ( void ) { }
+void done_prof_mem ( void ) { }
 
 #endif
 
 /*------------------------------------------------------------*/
+/*--- Common initialisation + finalisation                 ---*/
+/*------------------------------------------------------------*/
+
+void MAC_(common_pre_clo_init)(void)
+{
+   MAC_(malloc_list) = VG_(HT_construct)();
+   init_prof_mem();
+}
+
+void MAC_(common_fini)(void (*leak_check)(void))
+{
+   MAC_(print_malloc_stats)();
+
+   if (VG_(clo_verbosity) == 1) {
+      if (!MAC_(clo_leak_check))
+         VG_(message)(Vg_UserMsg, 
+             "For a detailed leak analysis,  rerun with: --leak-check=yes");
+
+      VG_(message)(Vg_UserMsg, 
+                   "For counts of detected errors, rerun with: -v");
+   }
+   if (MAC_(clo_leak_check)) leak_check();
+
+   done_prof_mem();
+}
+
+/*------------------------------------------------------------*/
 /*--- Syscall wrappers                                     ---*/
 /*------------------------------------------------------------*/
 
diff --git a/memcheck/mac_shared.h b/memcheck/mac_shared.h
index fc3d86b..147fa3f 100644
--- a/memcheck/mac_shared.h
+++ b/memcheck/mac_shared.h
@@ -120,6 +120,26 @@
    }
    MAC_Error;
 
+/* For malloc()/new/new[] vs. free()/delete/delete[] mismatch checking. */
+typedef
+   enum {
+      MAC_AllocMalloc = 0,
+      MAC_AllocNew    = 1,
+      MAC_AllocNewVec = 2
+   }
+   MAC_AllocKind;
+   
+/* Nb: first two fields must match core's VgHashNode. */
+typedef
+   struct _MAC_Chunk {
+      struct _MAC_Chunk* next;
+      Addr          data;           /* ptr to actual block              */
+      UInt          size : 30;      /* size requested                   */
+      MAC_AllocKind allockind : 2;  /* which wrapper did the allocation */
+      ExeContext*   where;          /* where it was allocated           */
+   }
+   MAC_Chunk;
+
 /*------------------------------------------------------------*/
 /*--- Profiling of skins and memory events                 ---*/
 /*------------------------------------------------------------*/
@@ -225,22 +245,36 @@
  * default: NO*/
 extern Bool MAC_(clo_workaround_gcc296_bugs);
 
-extern Bool MAC_(process_common_cmd_line_option)(Char* arg);
+extern Bool MAC_(process_common_cmd_line_option) ( Char* arg );
+extern void MAC_(print_common_usage)             ( void );
+extern void MAC_(print_common_debug_usage)       ( void );
+
+
+/*------------------------------------------------------------*/
+/*--- Variables                                            ---*/
+/*------------------------------------------------------------*/
+
+/* For tracking malloc'd blocks */
+extern VgHashTable MAC_(malloc_list);
+
+/* Function pointers for the two skins to track interesting events. */
+extern void (*MAC_(new_mem_heap)) ( Addr a, UInt len, Bool is_inited );
+extern void (*MAC_(ban_mem_heap)) ( Addr a, UInt len );
+extern void (*MAC_(die_mem_heap)) ( Addr a, UInt len );
+extern void (*MAC_(copy_mem_heap))( Addr from, Addr to, UInt len );
+
+/* Used in describe_addr() */
+extern Bool (*MAC_(describe_addr_supp))    ( Addr a, AddrInfo* ai );
 
 
 /*------------------------------------------------------------*/
 /*--- Functions                                            ---*/
 /*------------------------------------------------------------*/
 
-extern void        MAC_(set_where) ( ShadowChunk* sc, ExeContext* ec );
-extern ExeContext *MAC_(get_where) ( ShadowChunk* sc );
-
 extern void MAC_(pp_AddrInfo) ( Addr a, AddrInfo* ai );
 
 extern void MAC_(clear_MAC_Error)          ( MAC_Error* err_extra );
 
-extern Bool (*MAC_(describe_addr_supp))    ( Addr a, AddrInfo* ai );
-
 extern Bool MAC_(shared_recognised_suppression) ( Char* name, Supp* su );
 
 extern void MAC_(record_address_error)     ( Addr a, Int size, Bool isWrite );
@@ -254,13 +288,12 @@
 
 extern void MAC_(pp_shared_SkinError)      ( Error* err);
 
-extern void MAC_(init_prof_mem) ( void );
-extern void MAC_(done_prof_mem) ( void );
+extern MAC_Chunk* MAC_(first_matching_freed_MAC_Chunk)( Bool (*p)(MAC_Chunk*) );
 
-extern Int          MAC_(count_freelist)  ( void ) __attribute__ ((unused));
-extern void         MAC_(freelist_sanity) ( void ) __attribute__ ((unused));
-extern ShadowChunk* MAC_(any_matching_freed_ShadowChunks) 
-                            ( Bool (*p)(ShadowChunk*) );
+extern void MAC_(common_pre_clo_init) ( void );
+extern void MAC_(common_fini)         ( void (*leak_check)(void) );
+
+extern void MAC_(print_malloc_stats) ( void );
 
 /* For leak checking */
 extern void MAC_(pp_LeakError)(void* vl, UInt n_this_record, 
@@ -281,8 +314,8 @@
 extern __attribute__((regparm(1))) void MAC_(die_mem_stack_16) ( Addr old_ESP );
 extern __attribute__((regparm(1))) void MAC_(new_mem_stack_32) ( Addr old_ESP );
 extern __attribute__((regparm(1))) void MAC_(die_mem_stack_32) ( Addr old_ESP );
-extern                             void MAC_(die_mem_stack) ( Addr a, UInt len );
-extern                             void MAC_(new_mem_stack) ( Addr a, UInt len );
+extern                             void MAC_(die_mem_stack) ( Addr a, UInt len);
+extern                             void MAC_(new_mem_stack) ( Addr a, UInt len);
 
 
 /*------------------------------------------------------------*/
@@ -290,7 +323,7 @@
 /*------------------------------------------------------------*/
 
 /* Some noble preprocessor abuse, to enable Memcheck and Addrcheck to
-   share this code, but not call the same functions.
+   share this code, but call different functions.
 
    Note that this code is executed very frequently and must be highly
    optimised, which is why I resort to the preprocessor to achieve the
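
Review note on the MAC_Chunk declaration in mac_shared.h above: its "Nb" comment requires the first two fields to match the core's VgHashNode, so that a chunk can be handed to the generic hash table and cast back afterwards. The sketch below illustrates that idea under stated assumptions. It is not the vg_hashtable.c code. All names (Node, Table, Chunk, table_add, table_first_match, find_chunk) and the bucket count are hypothetical. It embeds the generic node as the first struct member, a stricter variant of the field-for-field overlay used in the patch. It also passes the address through an explicit argument rather than a GCC nested function.

```c
#include <assert.h>
#include <stddef.h>

#define N_BUCKETS 16   /* hypothetical; the real table size is not shown here */

/* Generic node: the only fields the table code knows about. */
typedef struct Node {
   struct Node*  next;
   unsigned long key;
} Node;

typedef struct {
   Node* buckets[N_BUCKETS];
} Table;

/* Tool-side record: the generic node comes first, extra fields follow. */
typedef struct {
   Node         hdr;    /* hdr.key holds the block's start address */
   unsigned int size;   /* size requested */
} Chunk;

/* Push a node onto the front of its bucket's chain. */
static void table_add(Table* t, Node* n)
{
   assert(t != NULL && n != NULL);
   Node** head = &t->buckets[n->key % N_BUCKETS];
   n->next = *head;
   *head   = n;
}

/* Return the first node for which p(node, arg) is nonzero, else NULL. */
static Node* table_first_match(Table* t, int (*p)(Node*, void*), void* arg)
{
   size_t i;
   Node*  n;
   for (i = 0; i < N_BUCKETS; i++)
      for (n = t->buckets[i]; n != NULL; n = n->next)
         if (p(n, arg))
            return n;
   return NULL;
}

/* Predicate: does the address in *arg fall inside this chunk? */
static int addr_in_chunk(Node* n, void* arg)
{
   Chunk*        c = (Chunk*)n;   /* valid: hdr is the first member */
   unsigned long a = *(unsigned long*)arg;
   return c->hdr.key <= a && a - c->hdr.key < c->size;
}

/* Find the chunk bracketing address a, or NULL if there is none. */
static Chunk* find_chunk(Table* t, unsigned long a)
{
   return (Chunk*)table_first_match(t, addr_in_chunk, &a);
}
```

The table never sees the size field. Only the predicate, which belongs to the tool, casts back to the richer type. That is the same division of labour as addr_is_in_HashNode and VG_(HT_first_match) in the describe_addr() hunk.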
diff --git a/memcheck/mc_main.c b/memcheck/mc_main.c
index 46ae522..908fce0 100644
--- a/memcheck/mc_main.c
+++ b/memcheck/mc_main.c
@@ -1503,19 +1503,20 @@
    return True;
 }
 
-Char* SK_(usage)(void)
+void SK_(print_usage)(void)
 {  
-   return  
-"    --partial-loads-ok=no|yes too hard to explain here; see manual [yes]\n"
-"    --freelist-vol=<number>   volume of freed blocks queue [1000000]\n"
-"    --leak-check=no|yes       search for memory leaks at exit? [no]\n"
-"    --leak-resolution=low|med|high\n"
-"                              amount of bt merging in leak check [low]\n"
-"    --show-reachable=no|yes   show reachable blocks in leak check? [no]\n"
-"    --workaround-gcc296-bugs=no|yes  self explanatory [no]\n"
-"\n"
+   MAC_(print_common_usage)();
+   VG_(printf)(
+"    --avoid-strlen-errors=no|yes  suppress errs from inlined strlen [yes]\n"
+   );
+}
+
+void SK_(print_debug_usage)(void)
+{  
+   MAC_(print_common_debug_usage)();
+   VG_(printf)(
 "    --cleanup=no|yes          improve after instrumentation? [yes]\n"
-"    --avoid-strlen-errors=no|yes  suppress errs from inlined strlen [yes]\n";
+   );
 }
 
 
@@ -1536,21 +1537,30 @@
    VG_(needs_core_errors)         ();
    VG_(needs_skin_errors)         ();
    VG_(needs_libc_freeres)        ();
-   VG_(needs_sizeof_shadow_block) ( 1 );
    VG_(needs_shadow_regs)         ();
    VG_(needs_command_line_options)();
    VG_(needs_client_requests)     ();
    VG_(needs_extended_UCode)      ();
    VG_(needs_syscall_wrapper)     ();
-   VG_(needs_alternative_free)    ();
    VG_(needs_sanity_checks)       ();
 
+   MAC_( new_mem_heap)             = & mc_new_mem_heap;
+   MAC_( ban_mem_heap)             = & MC_(make_noaccess);
+   MAC_(copy_mem_heap)             = & mc_copy_address_range_state;
+   MAC_( die_mem_heap)             = & MC_(make_noaccess);
+
    VG_(track_new_mem_startup)      ( & mc_new_mem_startup );
-   VG_(track_new_mem_heap)         ( & mc_new_mem_heap );
    VG_(track_new_mem_stack_signal) ( & MC_(make_writable) );
    VG_(track_new_mem_brk)          ( & MC_(make_writable) );
    VG_(track_new_mem_mmap)         ( & mc_set_perms );
    
+   VG_(track_copy_mem_remap)       ( & mc_copy_address_range_state );
+   VG_(track_change_mem_mprotect)  ( & mc_set_perms );
+      
+   VG_(track_die_mem_stack_signal) ( & MC_(make_noaccess) ); 
+   VG_(track_die_mem_brk)          ( & MC_(make_noaccess) );
+   VG_(track_die_mem_munmap)       ( & MC_(make_noaccess) ); 
+
    VG_(track_new_mem_stack_4)      ( & MAC_(new_mem_stack_4)  );
    VG_(track_new_mem_stack_8)      ( & MAC_(new_mem_stack_8)  );
    VG_(track_new_mem_stack_12)     ( & MAC_(new_mem_stack_12) );
@@ -1558,18 +1568,6 @@
    VG_(track_new_mem_stack_32)     ( & MAC_(new_mem_stack_32) );
    VG_(track_new_mem_stack)        ( & MAC_(new_mem_stack)    );
 
-   VG_(track_copy_mem_heap)        ( & mc_copy_address_range_state );
-   VG_(track_copy_mem_remap)       ( & mc_copy_address_range_state );
-   VG_(track_change_mem_mprotect)  ( & mc_set_perms );
-      
-   VG_(track_ban_mem_heap)         ( & MC_(make_noaccess) );
-   VG_(track_ban_mem_stack)        ( & MC_(make_noaccess) );
-
-   VG_(track_die_mem_heap)         ( & MC_(make_noaccess) );
-   VG_(track_die_mem_stack_signal) ( & MC_(make_noaccess) ); 
-   VG_(track_die_mem_brk)          ( & MC_(make_noaccess) );
-   VG_(track_die_mem_munmap)       ( & MC_(make_noaccess) ); 
-
    VG_(track_die_mem_stack_4)      ( & MAC_(die_mem_stack_4)  );
    VG_(track_die_mem_stack_8)      ( & MAC_(die_mem_stack_8)  );
    VG_(track_die_mem_stack_12)     ( & MAC_(die_mem_stack_12) );
@@ -1577,8 +1575,7 @@
    VG_(track_die_mem_stack_32)     ( & MAC_(die_mem_stack_32) );
    VG_(track_die_mem_stack)        ( & MAC_(die_mem_stack)    );
    
-   VG_(track_bad_free)             ( & MAC_(record_free_error) );
-   VG_(track_mismatched_free)      ( & MAC_(record_freemismatch_error) );
+   VG_(track_ban_mem_stack)        ( & MC_(make_noaccess) );
 
    VG_(track_pre_mem_read)         ( & mc_check_is_readable );
    VG_(track_pre_mem_read_asciiz)  ( & mc_check_is_readable_asciiz );
@@ -1609,7 +1606,7 @@
    MAC_(describe_addr_supp) = MC_(client_perm_maybe_describe);
 
    init_shadow_memory();
-   MAC_(init_prof_mem)();
+   MAC_(common_pre_clo_init)();
 }
 
 void SK_(post_clo_init) ( void )
@@ -1618,20 +1615,8 @@
 
 void SK_(fini) ( void )
 {
-   VG_(print_malloc_stats)();
-
-   if (VG_(clo_verbosity) == 1) {
-      if (!MAC_(clo_leak_check))
-         VG_(message)(Vg_UserMsg, 
-             "For a detailed leak analysis,  rerun with: --leak-check=yes");
-
-      VG_(message)(Vg_UserMsg, 
-                   "For counts of detected errors, rerun with: -v");
-   }
-   if (MAC_(clo_leak_check)) MC_(detect_memory_leaks)();
-
-   MAC_(done_prof_mem)();
-
+   MAC_(common_fini)( MC_(detect_memory_leaks) );
+   
    if (0) {
       VG_(message)(Vg_DebugMsg, 
         "------ Valgrind's client block stats follow ---------------" );
diff --git a/memcheck/mc_replace_strmem.c b/memcheck/mc_replace_strmem.c
new file mode 100644
index 0000000..7776439
--- /dev/null
+++ b/memcheck/mc_replace_strmem.c
@@ -0,0 +1,258 @@
+
+/*--------------------------------------------------------------------*/
+/*--- Replacements for strcpy(), memcpy() et al, which run on the  ---*/
+/*--- simulated CPU.                                               ---*/
+/*---                                          mc_replace_strmem.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of MemCheck, a heavyweight Valgrind skin for
+   detecting memory errors, and AddrCheck, a lightweight Valgrind skin 
+   for detecting memory errors.
+
+   Copyright (C) 2000-2002 Julian Seward 
+      jseward@acm.org
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+#include "vg_skin.h"
+
+#define __VALGRIND_SOMESKIN_H
+#include "valgrind.h"
+
+/* For snprintf(), ok because on simd CPU */
+#include <stdio.h>
+
+/* ---------------------------------------------------------------------
+   The normal versions of these functions are hyper-optimised, which fools
+   Memcheck and causes spurious value warnings.  So we replace them with
+   simpler versions.  THEY RUN ON THE SIMD CPU!
+   ------------------------------------------------------------------ */
+
+static __inline__
+Bool is_overlap ( void* dst, const void* src, UInt len )
+{
+   Int diff = src-dst;
+
+   if (diff < 0) 
+      diff = -diff;
+
+   return (diff < len);
+}
+
+static __inline__
+void complain2 ( Char* s, char* dst, const char* src )
+{
+   Char  buf[100];
+   int   res = 0;    /* unused; initialise to shut gcc up */
+
+   snprintf(buf, 100,
+            "Warning: src and dst overlap in %s(%p, %p)", s, dst, src );
+   VALGRIND_MAGIC_SEQUENCE(res, 0, /* irrelevant default */
+                           VG_USERREQ__LOGMESSAGE, buf, 0, 0, 0);
+}
+
+static __inline__
+void complain3 ( Char* s, void* dst, const void* src, int n )
+{
+   Char  buf[100];
+   int   res = 0;    /* unused; initialise to shut gcc up */
+
+   snprintf(buf, 100,
+            "Warning: src and dst overlap in %s(%p, %p, %d)", s, dst, src, n );
+   VALGRIND_MAGIC_SEQUENCE(res, 0, /* irrelevant default */
+                           VG_USERREQ__LOGMESSAGE, buf, 0, 0, 0);
+}
+
+char* strrchr ( const char* s, int c )
+{
+   UChar  ch   = (UChar)((UInt)c);
+   UChar* p    = (UChar*)s;
+   UChar* last = NULL;
+   while (True) {
+      if (*p == ch) last = p;
+      if (*p == 0) return last;
+      p++;
+   }
+}
+
+char* strchr ( const char* s, int c )
+{
+   UChar  ch = (UChar)((UInt)c);
+   UChar* p  = (UChar*)s;
+   while (True) {
+      if (*p == ch) return p;
+      if (*p == 0) return NULL;
+      p++;
+   }
+}
+
+char* strcat ( char* dst, const char* src )
+{
+   Char* dst_orig = dst;
+   while (*dst) dst++;
+   while (*src) *dst++ = *src++;
+   *dst = 0;
+
+   /* This is a bit redundant, I think;  any overlap and the strcat will
+      go forever... or until a seg fault occurs. */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain2("strcat", dst, src);
+
+   return dst_orig;
+}
+
+char* strncat ( char* dst, const char* src, int n )
+{
+   Char* dst_orig = dst;
+   Int   m = 0;
+
+   while (*dst) dst++;
+   while (*src && m++ < n) *dst++ = *src++;  /* concat at most n chars */
+   *dst = 0;                                 /* then add null (always) */
+
+   /* This checks for overlap after copying, unavoidable without
+      pre-counting lengths... should be ok */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain3("strncat", dst, src, n);
+
+   return dst_orig;
+}
+
+unsigned int strlen ( const char* str )
+{
+   UInt i = 0;
+   while (str[i] != 0) i++;
+   return i;
+}
+
+char* strcpy ( char* dst, const char* src )
+{
+   Char* dst_orig = dst;
+
+   while (*src) *dst++ = *src++;
+   *dst = 0;
+
+   /* This checks for overlap after copying, unavoidable without
+      pre-counting length... should be ok */
+   if (is_overlap(dst, src, (Addr)dst-(Addr)dst_orig+1))
+      complain2("strcpy", dst, src);
+
+   return dst_orig;
+}
+
+char* strncpy ( char* dst, const char* src, int n )
+{
+   Char* dst_orig = dst;
+   Int   m = 0;
+
+   if (is_overlap(dst, src, n))
+      complain3("strncpy", dst, src, n);
+
+   while (*src && m++ < n) *dst++ = *src++;
+   while (m++ < n) *dst++ = 0;         /* must pad remainder with nulls */
+
+   return dst_orig;
+}
+
+int strncmp ( const unsigned char* s1, const unsigned char* s2, 
+              unsigned int nmax )
+{
+   unsigned int n = 0;
+   while (True) {
+      if (n >= nmax) return 0;
+      if (*s1 == 0 && *s2 == 0) return 0;
+      if (*s1 == 0) return -1;
+      if (*s2 == 0) return 1;
+
+      if (*(unsigned char*)s1 < *(unsigned char*)s2) return -1;
+      if (*(unsigned char*)s1 > *(unsigned char*)s2) return 1;
+
+      s1++; s2++; n++;
+   }
+}
+
+int strcmp ( const char* s1, const char* s2 )
+{
+   register unsigned char c1;
+   register unsigned char c2;
+   while (True) {
+      c1 = *(unsigned char *)s1;
+      c2 = *(unsigned char *)s2;
+      if (c1 != c2) break;
+      if (c1 == 0) break;
+      s1++; s2++;
+   }
+   if ((unsigned char)c1 < (unsigned char)c2) return -1;
+   if ((unsigned char)c1 > (unsigned char)c2) return 1;
+   return 0;
+}
+
+void* memchr(const void *s, int c, unsigned int n)
+{
+   unsigned int i;
+   UChar c0 = (UChar)c;
+   UChar* p = (UChar*)s;
+   for (i = 0; i < n; i++)
+      if (p[i] == c0) return (void*)(&p[i]);
+   return NULL;
+}
+
+void* memcpy( void *dst, const void *src, unsigned int len )
+{
+   register char *d;
+   register char *s;
+
+   if (is_overlap(dst, src, len))
+      complain3("memcpy", dst, src, len);
+      
+   if ( dst > src ) {
+      d = (char *)dst + len - 1;
+      s = (char *)src + len - 1;
+      while ( len >= 4 ) {
+         *d-- = *s--;
+         *d-- = *s--;
+         *d-- = *s--;
+         *d-- = *s--;
+         len -= 4;
+      }
+      while ( len-- ) {
+         *d-- = *s--;
+      }
+   } else if ( dst < src ) {
+      d = (char *)dst;
+      s = (char *)src;
+      while ( len >= 4 ) {
+         *d++ = *s++;
+         *d++ = *s++;
+         *d++ = *s++;
+         *d++ = *s++;
+         len -= 4;
+      }
+      while ( len-- ) {
+         *d++ = *s++;
+      }
+   }
+   return dst;
+}
+
+
+/*--------------------------------------------------------------------*/
+/*--- end                                      mc_replace_strmem.c ---*/
+/*--------------------------------------------------------------------*/
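
Review note on mc_replace_strmem.c above: the comments in strcpy() and strncat() say the overlap check runs after copying because checking first would need the lengths counted in advance. The sketch below shows what that pre-counting variant could look like. It is an illustration only and is not part of the patch. The helper names ranges_overlap and strcpy_would_overlap are hypothetical, and it uses the ordinary libc strlen() from <string.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+alen) and [b, b+blen) share at least one byte. */
static int ranges_overlap(const void* a, size_t alen,
                          const void* b, size_t blen)
{
   uintptr_t lo_a = (uintptr_t)a;
   uintptr_t lo_b = (uintptr_t)b;

   if (alen == 0 || blen == 0)
      return 0;
   if (lo_a <= lo_b)
      return lo_b - lo_a < alen;   /* b starts inside a's range? */
   return lo_a - lo_b < blen;      /* a starts inside b's range? */
}

/* Hypothetical helper: would strcpy(dst, src) read and write
   overlapping bytes?  Counts the source length (plus terminator)
   before anything is copied. */
static int strcpy_would_overlap(const char* dst, const char* src)
{
   size_t n = strlen(src) + 1;
   assert(dst != NULL);
   return ranges_overlap(dst, n, src, n);
}
```

Measuring first costs an extra pass over the source string, but the warning can then be issued with the original dst and src pointers, before any bytes have been overwritten.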
diff --git a/memcheck/tests/Makefile.am b/memcheck/tests/Makefile.am
index 019caef..816c452 100644
--- a/memcheck/tests/Makefile.am
+++ b/memcheck/tests/Makefile.am
@@ -31,6 +31,7 @@
 	inline.stderr.exp inline.stdout.exp inline.vgtest \
 	malloc1.stderr.exp malloc1.vgtest \
 	malloc2.stderr.exp malloc2.vgtest \
+	malloc3.stderr.exp malloc3.stdout.exp malloc3.vgtest \
 	manuel1.stderr.exp manuel1.stdout.exp manuel1.vgtest \
 	manuel2.stderr.exp manuel2.stdout.exp manuel2.vgtest \
 	manuel3.stderr.exp manuel3.vgtest \
@@ -39,7 +40,9 @@
 	mismatches.stderr.exp mismatches.vgtest \
 	mmaptest.stderr.exp mmaptest.vgtest \
 	nanoleak.stderr.exp nanoleak.vgtest \
+	nanoleak_supp.stderr.exp nanoleak_supp.vgtest nanoleak.supp \
 	new_override.stderr.exp new_override.vgtest \
+	overlap.stderr.exp overlap.stdout.exp overlap.vgtest \
 	pushfpopf.stderr.exp pushfpopf.stdout.exp pushfpopf.vgtest \
 	realloc1.stderr.exp realloc1.vgtest \
 	realloc2.stderr.exp realloc2.vgtest \
@@ -57,8 +60,8 @@
 noinst_PROGRAMS = \
 	badaddrvalue badfree badjump badloop buflen_check clientperm \
 	doublefree errs1 exitprog fprw fwrite inits inline \
-	malloc1 malloc2 manuel1 manuel2 manuel3 \
-	memalign_test memcmptest mmaptest nanoleak pushfpopf \
+	malloc1 malloc2 malloc3 manuel1 manuel2 manuel3 \
+	memalign_test memcmptest mmaptest nanoleak overlap pushfpopf \
 	realloc1 realloc2 sigaltstack signal2 supp1 supp2 suppfree \
 	trivialleak tronical weirdioctl	\
 	mismatches new_override
@@ -83,6 +86,7 @@
 inline_SOURCES 	        = inline.c
 malloc1_SOURCES 	= malloc1.c
 malloc2_SOURCES 	= malloc2.c
+malloc3_SOURCES 	= malloc3.c
 manuel1_SOURCES 	= manuel1.c
 manuel2_SOURCES 	= manuel2.c
 manuel3_SOURCES 	= manuel3.c
@@ -90,6 +94,7 @@
 memalign_test_SOURCES 	= memalign_test.c
 memcmptest_SOURCES 	= memcmptest.c
 nanoleak_SOURCES 	= nanoleak.c
+overlap_SOURCES 	= overlap.c
 pushfpopf_SOURCES 	= pushfpopf_c.c pushfpopf_s.s
 realloc1_SOURCES 	= realloc1.c
 realloc2_SOURCES 	= realloc2.c
diff --git a/memcheck/tests/badaddrvalue.stderr.exp b/memcheck/tests/badaddrvalue.stderr.exp
index 28143a1..8bb0538 100644
--- a/memcheck/tests/badaddrvalue.stderr.exp
+++ b/memcheck/tests/badaddrvalue.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
    Address 0x........ is 1 bytes before a block of size 8 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badaddrvalue.c:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
@@ -14,7 +14,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
    Address 0x........ is 1 bytes before a block of size 8 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badaddrvalue.c:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badaddrvalue)
diff --git a/memcheck/tests/badfree-2trace.stderr.exp b/memcheck/tests/badfree-2trace.stderr.exp
index 741fd25..b019d38 100644
--- a/memcheck/tests/badfree-2trace.stderr.exp
+++ b/memcheck/tests/badfree-2trace.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:12)
    Address 0x........ is not stack'd, malloc'd or free'd
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:15)
    Address 0x........ is on thread 1's stack
 
diff --git a/memcheck/tests/badfree.stderr.exp b/memcheck/tests/badfree.stderr.exp
index 95616fa..1b7f929 100644
--- a/memcheck/tests/badfree.stderr.exp
+++ b/memcheck/tests/badfree.stderr.exp
@@ -1,13 +1,13 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badfree)
    Address 0x........ is not stack'd, malloc'd or free'd
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (badfree.c:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/badfree)
diff --git a/memcheck/tests/doublefree.stderr.exp b/memcheck/tests/doublefree.stderr.exp
index c483120..d8b584a 100644
--- a/memcheck/tests/doublefree.stderr.exp
+++ b/memcheck/tests/doublefree.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (doublefree.c:10)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/doublefree)
    Address 0x........ is 0 bytes inside a block of size 177 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (doublefree.c:10)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/doublefree)
diff --git a/memcheck/tests/errs1.stderr.exp b/memcheck/tests/errs1.stderr.exp
index 2de4b48..bc3db41 100644
--- a/memcheck/tests/errs1.stderr.exp
+++ b/memcheck/tests/errs1.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: aaa (errs1.c:10)
    by 0x........: main (errs1.c:17)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: zzzzzzz (errs1.c:12)
    by 0x........: yyy (errs1.c:13)
    by 0x........: xxx (errs1.c:14)
@@ -16,7 +16,7 @@
    by 0x........: aaa (errs1.c:10)
    by 0x........: main (errs1.c:17)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: zzzzzzz (errs1.c:12)
    by 0x........: yyy (errs1.c:13)
    by 0x........: xxx (errs1.c:14)
diff --git a/memcheck/tests/exitprog.stderr.exp b/memcheck/tests/exitprog.stderr.exp
index 97a58a4..40b39a6 100644
--- a/memcheck/tests/exitprog.stderr.exp
+++ b/memcheck/tests/exitprog.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/exitprog)
    Address 0x........ is 0 bytes after a block of size 1000000 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (exitprog.c:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/exitprog)
diff --git a/memcheck/tests/filter_stderr b/memcheck/tests/filter_stderr
index 511dca0..df2e946 100755
--- a/memcheck/tests/filter_stderr
+++ b/memcheck/tests/filter_stderr
@@ -7,8 +7,8 @@
 # Anonymise addresses
 $dir/../../tests/filter_addresses                       |
 
-# Anonymise line numbers in vg_clientfuncs.c
-sed "s/vg_clientfuncs.c:[0-9]\+/vg_clientfuncs.c:.../"  |
+# Anonymise line numbers in vg_replace_malloc.c
+sed "s/vg_replace_malloc.c:[0-9]\+/vg_replace_malloc.c:.../"  |
 
 $dir/../../tests/filter_test_paths                      |
 
diff --git a/memcheck/tests/fprw.stderr.exp b/memcheck/tests/fprw.stderr.exp
index 53fcbdf..a7f6939 100644
--- a/memcheck/tests/fprw.stderr.exp
+++ b/memcheck/tests/fprw.stderr.exp
@@ -24,7 +24,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 8 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:18)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -34,7 +34,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 8 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:18)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -44,7 +44,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -54,13 +54,13 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:22)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
@@ -71,7 +71,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (fprw.c:23)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fprw)
diff --git a/memcheck/tests/fwrite.stderr.exp b/memcheck/tests/fwrite.stderr.exp
index 11b0eba..ed6aa87 100644
--- a/memcheck/tests/fwrite.stderr.exp
+++ b/memcheck/tests/fwrite.stderr.exp
@@ -4,7 +4,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fwrite)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (fwrite.c:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/fwrite)
diff --git a/memcheck/tests/inline.stderr.exp b/memcheck/tests/inline.stderr.exp
index 9d9f79a..ffdb214 100644
--- a/memcheck/tests/inline.stderr.exp
+++ b/memcheck/tests/inline.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/inline)
    Address 0x........ is 0 bytes after a block of size 40 alloc'd
-   at 0x........: calloc (vg_clientfuncs.c:...)
+   at 0x........: calloc (vg_replace_malloc.c:...)
    by 0x........: main (inline.c:17)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/inline)
diff --git a/memcheck/tests/malloc1.stderr.exp b/memcheck/tests/malloc1.stderr.exp
index 1571222..d450a38 100644
--- a/memcheck/tests/malloc1.stderr.exp
+++ b/memcheck/tests/malloc1.stderr.exp
@@ -5,7 +5,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc1)
    Address 0x........ is 1 bytes inside a block of size 10 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: really (malloc1.c:19)
    by 0x........: main (malloc1.c:9)
    by 0x........: __libc_start_main (...libc...)
@@ -16,7 +16,7 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc1)
    Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: really (malloc1.c:21)
    by 0x........: main (malloc1.c:9)
    by 0x........: __libc_start_main (...libc...)
diff --git a/memcheck/tests/malloc2.stderr.exp b/memcheck/tests/malloc2.stderr.exp
index 141a1ca..5463e17 100644
--- a/memcheck/tests/malloc2.stderr.exp
+++ b/memcheck/tests/malloc2.stderr.exp
@@ -4,18 +4,18 @@
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
    Address 0x........ is 0 bytes inside a block of size 429 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:38)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:43)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
    Address 0x........ is 0 bytes inside a block of size 429 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (malloc2.c:38)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/malloc2)
diff --git a/corecheck/tests/malloc3.c b/memcheck/tests/malloc3.c
similarity index 100%
rename from corecheck/tests/malloc3.c
rename to memcheck/tests/malloc3.c
diff --git a/memcheck/tests/malloc3.stderr.exp b/memcheck/tests/malloc3.stderr.exp
new file mode 100644
index 0000000..9a908f3
--- /dev/null
+++ b/memcheck/tests/malloc3.stderr.exp
@@ -0,0 +1,10 @@
+
+Warning: silly arg (-1) to malloc()
+Warning: silly args (0,-1) to calloc()
+Warning: silly args (-1,-1) to calloc()
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 0 bytes in 0 blocks.
+malloc/free: 2 allocs, 2 frees, 0 bytes allocated.
+For a detailed leak analysis,  rerun with: --leak-check=yes
+For counts of detected errors, rerun with: -v
diff --git a/corecheck/tests/malloc3.stdout.exp b/memcheck/tests/malloc3.stdout.exp
similarity index 100%
rename from corecheck/tests/malloc3.stdout.exp
rename to memcheck/tests/malloc3.stdout.exp
diff --git a/corecheck/tests/malloc3.vgtest b/memcheck/tests/malloc3.vgtest
similarity index 100%
rename from corecheck/tests/malloc3.vgtest
rename to memcheck/tests/malloc3.vgtest
diff --git a/memcheck/tests/memalign_test.stderr.exp b/memcheck/tests/memalign_test.stderr.exp
index a23a969..7c69342 100644
--- a/memcheck/tests/memalign_test.stderr.exp
+++ b/memcheck/tests/memalign_test.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (memalign_test.c:17)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/memalign_test)
    Address 0x........ is 0 bytes inside a block of size 111110 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (memalign_test.c:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/memalign_test)
diff --git a/memcheck/tests/mismatches.stderr.exp b/memcheck/tests/mismatches.stderr.exp
index d216443..93ad6da 100644
--- a/memcheck/tests/mismatches.stderr.exp
+++ b/memcheck/tests/mismatches.stderr.exp
@@ -1,66 +1,66 @@
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:5)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_vec_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:8)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 10 alloc'd
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:7)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:13)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 40 alloc'd
-   at 0x........: __builtin_vec_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:12)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:15)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 40 alloc'd
-   at 0x........: __builtin_vec_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:14)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: __builtin_vec_delete (vg_clientfuncs.c:...)
+   at 0x........: __builtin_vec_delete (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:20)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: __builtin_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:19)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
 
 Mismatched free() / delete / delete []
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:22)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
    Address 0x........ is 0 bytes inside a block of size 4 alloc'd
-   at 0x........: __builtin_new (vg_clientfuncs.c:...)
+   at 0x........: __builtin_new (vg_replace_malloc.c:...)
    by 0x........: main (mismatches.cpp:21)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/mismatches)
diff --git a/memcheck/tests/nanoleak.stderr.exp b/memcheck/tests/nanoleak.stderr.exp
index 96eefd1..3183eee 100644
--- a/memcheck/tests/nanoleak.stderr.exp
+++ b/memcheck/tests/nanoleak.stderr.exp
@@ -8,7 +8,7 @@
 checked ... bytes.
 
 1000 bytes in 1 blocks are definitely lost in loss record 1 of 1
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (nanoleak.c:6)
    by 0x........: __libc_start_main (...libc...)
    by 0x........: (within /.../tests/nanoleak)
diff --git a/memcheck/tests/nanoleak.supp b/memcheck/tests/nanoleak.supp
new file mode 100644
index 0000000..6c87853
--- /dev/null
+++ b/memcheck/tests/nanoleak.supp
@@ -0,0 +1,8 @@
+{
+   this_is_the_nanoleak_suppression_name
+   Addrcheck,Memcheck:Leak
+   fun:malloc
+   fun:main
+   fun:__libc_start_main
+}
+
diff --git a/memcheck/tests/nanoleak_supp.stderr.exp b/memcheck/tests/nanoleak_supp.stderr.exp
new file mode 100644
index 0000000..9fb7bfe
--- /dev/null
+++ b/memcheck/tests/nanoleak_supp.stderr.exp
@@ -0,0 +1,17 @@
+
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 1000 bytes in 1 blocks.
+malloc/free: 1 allocs, 0 frees, 1000 bytes allocated.
+For counts of detected errors, rerun with: -v
+searching for pointers to 1 not-freed blocks.
+checked ... bytes.
+
+LEAK SUMMARY:
+   definitely lost: 0 bytes in 0 blocks.
+   possibly lost:   0 bytes in 0 blocks.
+   still reachable: 0 bytes in 0 blocks.
+        suppressed: 1000 bytes in 1 blocks.
+Reachable blocks (those to which a pointer was found) are not shown.
+To see them, rerun with: --show-reachable=yes
+
diff --git a/memcheck/tests/nanoleak_supp.vgtest b/memcheck/tests/nanoleak_supp.vgtest
new file mode 100644
index 0000000..766099c
--- /dev/null
+++ b/memcheck/tests/nanoleak_supp.vgtest
@@ -0,0 +1,3 @@
+vgopts: --leak-check=yes --suppressions=nanoleak.supp
+prog: nanoleak
+stderr_filter: filter_leak_check_size
diff --git a/memcheck/tests/overlap.c b/memcheck/tests/overlap.c
new file mode 100644
index 0000000..04d2e37
--- /dev/null
+++ b/memcheck/tests/overlap.c
@@ -0,0 +1,115 @@
+#include <string.h>
+#include <stdio.h>
+
+char b[50];
+
+void reset_b(void)
+{
+   int i;
+
+   for (i = 0; i < 50; i++)
+      b[i] = '_';
+   b[49] = '\0';
+}
+
+void reset_b2(void)
+{
+   reset_b();
+   strcpy(b, "ABCDEFG");
+}
+
+int main(void)
+{
+   char x[100];
+   char a[] = "abcdefghijklmnopqrstuvwxyz";
+   int  i;
+
+   /* testing memcpy/strcpy overlap */
+
+   for (i = 0; i < 50; i++) {
+      x[i] = i+1;    // don't put any zeroes in there
+   }
+   for (i = 50; i < 100; i++) {
+      // because of the errors, the strcpy's will overrun, so put some
+      // zeroes in the second half to stop them eventually
+      x[i] = 0;  
+               
+   }
+
+   memcpy(x+20, x, 20);    // ok
+   memcpy(x+20, x, 21);    // overlap
+   memcpy(x, x+20, 20);    // ok
+   memcpy(x, x+20, 21);    // overlap
+
+   strncpy(x+20, x, 20);    // ok
+   strncpy(x+20, x, 21);    // overlap
+   strncpy(x, x+20, 20);    // ok
+   strncpy(x, x+20, 21);    // overlap
+   
+   x[39] = '\0';
+   strcpy(x, x+20);    // ok
+
+   x[39] = 39;
+   x[40] = '\0';
+   strcpy(x, x+20);    // overlap
+
+   x[19] = '\0';
+   strcpy(x+20, x);    // ok
+
+/*
+   x[19] = 19;
+   x[20] = '\0';
+   strcpy(x+20, x);    // overlap, but runs forever (or until it seg faults)
+*/
+
+   /* testing strcpy, strncpy() */
+
+   reset_b();
+   printf("`%s'\n", b);
+
+   strcpy(b, a);
+   printf("`%s'\n", b);
+   
+   reset_b();
+   strncpy(b, a, 25);
+   printf("`%s'\n", b);
+
+   reset_b();
+   strncpy(b, a, 26);
+   printf("`%s'\n", b);
+
+   reset_b();
+   strncpy(b, a, 27);
+   printf("`%s'\n", b);
+
+   printf("\n");
+
+   /* testing strcat(), strncat() */
+
+   reset_b2();
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strcat(b, a);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 25);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 26);
+   printf("`%s'\n", b);
+   
+   reset_b2();
+   strncat(b, a, 27);
+   printf("`%s'\n", b);
+
+   /* Nb: can't actually get strcat warning -- if any overlap occurs, it will
+      always run forever, I think... */
+
+   strncat(a+20, a, 21);
+   strncat(a, a+20, 21);
+
+   return 0;
+}
diff --git a/memcheck/tests/overlap.stderr.exp b/memcheck/tests/overlap.stderr.exp
new file mode 100644
index 0000000..ed33593
--- /dev/null
+++ b/memcheck/tests/overlap.stderr.exp
@@ -0,0 +1,14 @@
+
+Warning: src and dst overlap in memcpy(0x........, 0x........, 21)
+Warning: src and dst overlap in memcpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strncpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strncpy(0x........, 0x........, 21)
+Warning: src and dst overlap in strcpy(0x........, 0x........)
+Warning: src and dst overlap in strncat(0x........, 0x........, 21)
+Warning: src and dst overlap in strncat(0x........, 0x........, 21)
+
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+malloc/free: in use at exit: 0 bytes in 0 blocks.
+malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
+For a detailed leak analysis,  rerun with: --leak-check=yes
+For counts of detected errors, rerun with: -v
diff --git a/memcheck/tests/overlap.stdout.exp b/memcheck/tests/overlap.stdout.exp
new file mode 100644
index 0000000..12cb02e
--- /dev/null
+++ b/memcheck/tests/overlap.stdout.exp
@@ -0,0 +1,11 @@
+`_________________________________________________'
+`abcdefghijklmnopqrstuvwxyz'
+`abcdefghijklmnopqrstuvwxy________________________'
+`abcdefghijklmnopqrstuvwxyz_______________________'
+`abcdefghijklmnopqrstuvwxyz'
+
+`ABCDEFG'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
+`ABCDEFGabcdefghijklmnopqrstuvwxy'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
+`ABCDEFGabcdefghijklmnopqrstuvwxyz'
diff --git a/memcheck/tests/overlap.vgtest b/memcheck/tests/overlap.vgtest
new file mode 100644
index 0000000..7d0d75e
--- /dev/null
+++ b/memcheck/tests/overlap.vgtest
@@ -0,0 +1 @@
+prog: overlap
diff --git a/memcheck/tests/suppfree.stderr.exp b/memcheck/tests/suppfree.stderr.exp
index 149bf84..5f4f4d5 100644
--- a/memcheck/tests/suppfree.stderr.exp
+++ b/memcheck/tests/suppfree.stderr.exp
@@ -1,11 +1,11 @@
 
 Invalid free() / delete / delete[]
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: ddd (suppfree.c:7)
    by 0x........: ccc (suppfree.c:12)
    by 0x........: bbb (suppfree.c:17)
    Address 0x........ is 0 bytes inside a block of size 10 free'd
-   at 0x........: free (vg_clientfuncs.c:...)
+   at 0x........: free (vg_replace_malloc.c:...)
    by 0x........: ddd (suppfree.c:6)
    by 0x........: ccc (suppfree.c:12)
    by 0x........: bbb (suppfree.c:17)
diff --git a/memcheck/tests/trivialleak.stderr.exp b/memcheck/tests/trivialleak.stderr.exp
index 42cd261..77e0a60 100644
--- a/memcheck/tests/trivialleak.stderr.exp
+++ b/memcheck/tests/trivialleak.stderr.exp
@@ -8,7 +8,7 @@
 checked ... bytes.
 
 1000 bytes in 1000 blocks are definitely lost in loss record 1 of 1
-   at 0x........: malloc (vg_clientfuncs.c:...)
+   at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: test (trivialleak.c:8)
    by 0x........: main (trivialleak.c:12)
    by 0x........: __libc_start_main (...libc...)
diff --git a/tests/filter_stderr_basic b/tests/filter_stderr_basic
index 72e8b59..f8d2e44 100755
--- a/tests/filter_stderr_basic
+++ b/tests/filter_stderr_basic
@@ -6,17 +6,11 @@
 # Remove ==pid== and --pid-- and ++pid++ strings 
 sed "s/\(==\|--\|++\)[0-9]\{3,5\}\1 //"                                |
 
-# Remove intro line for 1.0.X branch
-sed "/valgrind-.*, a memory error detector for x86 GNU\/Linux./d"       |
-sed "/cachegrind-.*, an I1.D1.L2 cache profiler for x86 GNU\/Linux./d"  |
-
-# Remove "<name>, a <description> for x86-linux." line 
-# and the following copyright notice line for post-1.0.X branch
+# Remove "<name>, a <description> for x86-linux." line and the following
+# copyright notice line.  Works for skin and core intro lines.
 sed "/^.*, .* for x86-linux\./ , /./ d"                                | 
 
 # Remove other introductory lines
-sed "/Built with valgrind-.*, a program execution monitor./d"          |
-sed "/Copyright (C) 2000-2..., and GNU GPL'd, by Julian Seward\./d"    |
 sed "/Estimated CPU clock rate is [0-9]\+ MHz/d"                       |
 sed "/For more details, rerun with: -v/d"