Substantially update and expand the FAQ, adding workarounds for
nearly all failures that have known workarounds.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1541 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/FAQ.txt b/FAQ.txt
index a6804ac..75ec7ca 100644
--- a/FAQ.txt
+++ b/FAQ.txt
@@ -1,9 +1,11 @@
 
-A mini-FAQ for valgrind, version 1.9.5
+A mini-FAQ for valgrind, version 1.9.6
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Last revised 22 Apr 2003
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
+-----------------------------------------------------------------
+
 Q1. Programs run OK on valgrind, but at exit produce a bunch
     of errors a bit like this
 
@@ -30,16 +32,13 @@
     Problem is that running __libc_freeres() in older glibc versions
     causes this crash.  
 
-    WORKAROUND FOR 1.0.X versions of valgrind: The simple fix is to
-    find in valgrind's sources, the one and only call to
-    __libc_freeres() and comment it out, then rebuild the system.  In
-    the 1.0.3 version, this call is on line 584 of vg_clientfuncs.c.
-    This may mean you get false reports of space leaks in glibc, but
-    it at least avoids the crash.
-
     WORKAROUND FOR 1.1.X and later versions of valgrind: use the
-    --run-libc-freeres=no flag.
+    --run-libc-freeres=no flag.  You may then get space leak
+    reports for glibc-allocations (please _don't_ report these
+    to the glibc people, since they are not real leaks), but at
+    least the program runs.
 
+-----------------------------------------------------------------
 
 Q2. My program dies complaining that syscall 197 is unimplemented.
 
@@ -49,17 +48,17 @@
     specific, glibc is asking your kernel to do a syscall which is
     not listed in /usr/include/asm/unistd.h.
 
-    The fix is simple.  Somewhere near the top of vg_syscall_mem.c,
-    add the following line:
+    The fix is simple.  Somewhere near the top of
+    coregrind/vg_syscalls.c, add the following line:
 
        #define __NR_fstat64            197
 
     Rebuild and try again.  The above line should appear before any
     uses of the __NR_fstat64 symbol in that file.  If you look at the
-    place where __NR_fstat64 is used in vg_syscall_mem.c, it will be
-    obvious why this fix works.  NOTE for valgrind versions 1.1.0
-    and later, the relevant file is actually coregrind/vg_syscalls.c.
+    place where __NR_fstat64 is used in vg_syscalls.c, it will be
+    obvious why this fix works.
 
+-----------------------------------------------------------------
 
 Q3. My (buggy) program dies like this:
       valgrind: vg_malloc2.c:442 (bszW_to_pszW): 
@@ -74,37 +73,16 @@
     is to fix your program so that it doesn't do any illegal memory
     accesses.  The above failure will hopefully go away after that.
 
+-----------------------------------------------------------------
 
 Q4. I'm running Red Hat Advanced Server.  Valgrind always segfaults at
     startup.  
 
-A4. Known issue with RHAS 2.1.  The following kludge works, but 
-    is too gruesome to put in the sources permanently.  Try it.
-    Last verified as working on RHAS 2.1 at 20021008.
+A4. Known issue with RHAS 2.1, due to funny stack permissions at
+    startup.  However, valgrind-1.9.4 and later automatically handle
+    this correctly, and should not segfault.
 
-    Find the following comment in vg_main.c -- in 1.0.4 this is at
-    line 636:
-
-       /* we locate: NEW_AUX_ENT(1, AT_PAGESZ, ELF_EXEC_PAGESIZE) in
-          the elf interpreter table */
-
-    Immediately _before_ this comment add the following:
-
-       /* HACK for R H Advanced server.  Ignore all the above and
-          start the search 18 pages below the "obvious" start point.
-          God knows why.  Seems like we can't go into the highest 18
-          pages of the stack.  This is not good! -- the 18 pages is
-          determined just by looking for the highest proddable
-          address.  It would be nice to see some kernel or libc or
-          something code to justify this.  */
-
-       /* 0xBFFEE000 is 0xC0000000 - 18 pages */
-       sp = 0xBFFEE000; 
-
-       /* end of HACK for R H Advanced server. */
-
-    Obviously the assignment to sp is the only important line.
-
+-----------------------------------------------------------------
 
 Q5. I try running "valgrind my_program", but my_program runs normally,
     and Valgrind doesn't emit any output at all.
@@ -121,6 +99,7 @@
 
     it my_program is statically linked.
 
+-----------------------------------------------------------------
 
 Q6. I try running "valgrind my_program" and get Valgrind's startup message,
     but I don't get any errors and I know my program has errors.
@@ -133,5 +112,197 @@
 
     To trace child processes, use the --trace-children=yes option.
 
+    If you are tracing large trees of processes, it can be less
+    disruptive to have the output sent over the network.  Give
+    valgrind the flag --logsocket=127.0.0.1:12345 (if you want 
+    logging output sent to port 12345 on localhost).  You can
+    use the valgrind-listener program to listen on that port:
+       valgrind-listener 12345
+    Obviously you have to start the listener process first.
+    See the documentation for more details.
+
+-----------------------------------------------------------------
+
+Q7. My threaded server process runs unbelievably slowly on
+    valgrind.  So slowly, in fact, that at first I thought it
+    had completely locked up.
+
+A7. We are not completely sure about this, but one possibility
+    is that laptops with power management fool valgrind's 
+    timekeeping mechanism, which is (somewhat in error) based
+    on the x86 RDTSC instruction.  A "fix" which is claimed to
+    work is to run some other cpu-intensive process at the same
+    time, so that the laptop's power-management clock-slowing
+    does not kick in.  We would be interested in hearing more
+    feedback on this.
+
+-----------------------------------------------------------------
+
+Q8. My program dies (exactly) like this:
+
+      REPE then 0xF
+      valgrind: the `impossible' happened:
+         Unhandled REPE case
+
+A8. We believe this is a P4-specific instruction.  Are you building
+    your app with -march=pentium4 or something like that?  Others
+    have reported that removing the flag works around the problem.
+    This should be fairly easy to fix, and it is on the to-do list
+    for 1.9.6.
+
+    We would be interested to hear whether changing your application
+    build flags gets rid of it.
+
+-----------------------------------------------------------------
+
+Q9. My program dies complaining that __libc_current_sigrtmin
+    is unimplemented.
+
+A9. Try the following.  It is an experiment, but it might work.
+    We would very much appreciate hearing whether or not it works
+    for you.
+
+    In vg_libpthread.c, add the 3 functions below.
+
+    In vg_libpthread_unimp.c, remove the stubs for the same 3
+    functions.
+
+    Quite a lot of valgrind users complain about this problem, but
+    we have never been able to reproduce it, so fixing it isn't
+    easy.  That is why your feedback on this experiment is useful.
+
+       int __libc_current_sigrtmin (void)
+       {
+         return -1;
+       }
+
+       int __libc_current_sigrtmax (void)
+       {
+         return -1;
+       }
+
+       int __libc_allocate_rtsig (int high)
+       {
+         return -1;
+       }
+
+-----------------------------------------------------------------
+
+Q10. I upgraded to Red Hat 9 and threaded programs now act
+     strange / deadlock when they didn't before.
+
+A10. Thread support on glibc 2.3.2+ with NPTL is not as 
+     good as on older LinuxThreads-based systems.  We have
+     this under consideration.  Avoid Red Hat >= 8.1 for
+     the time being, if you can.
+
+-----------------------------------------------------------------
+
+Q11. I really need to use the NVidia libGL.so in my app.
+     Help!
+
+A11. NVidia seems to have noticed the problem too, and the "latest"
+     drivers (version 4349, apparently) come with this text:
+
+        DISABLING CPU SPECIFIC FEATURES
+
+        Setting the environment variable __GL_FORCE_GENERIC_CPU to a
+        non-zero value will inhibit the use of CPU specific features
+        such as MMX, SSE, or 3DNOW!.  Use of this option may result in
+        performance loss.  This option may be useful in conjunction with
+        software such as the Valgrind memory debugger.
+
+     Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work.  This has
+     been confirmed by various people.  Thanks NVidia!
+
+-----------------------------------------------------------------
+
+Q12. My program dies like this (often at exit):
+
+     VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error:
+     (loads of text)
+
+A12. We're not entirely sure about this, and would appreciate
+     someone sending a simple test case for us to look at.
+     One possible cause is that your program modifies its
+     environment variables, possibly including zeroing them
+     all.  Avoid this if you can.
+
+     In any case, you may be able to work around it like this:
+     comment out the call to VG_(core_panic) at
+     coregrind/vg_main.c:1647 and see if that helps.  The text of
+     coregrind/vg_main.c:1647 is as follows:
+
+     VG_(core_panic)("VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH) failed\n");
+
+     and so it's this call you want to comment out.
+
+-----------------------------------------------------------------
+
+Q13.  My program dies like this:
+
+      error: /lib/librt.so.1: symbol __pthread_clock_settime, version
+      GLIBC_PRIVATE not defined in file libpthread.so.0 with link time
+      reference
+
+A13.  This is a total swamp, and not easy to fix.  Nevertheless
+      there is a way out.  The underlying problem is that
+      /lib/librt.so.1 refers to two symbols,
+      __pthread_clock_settime and __pthread_clock_gettime, in
+      /lib/libpthread.so which are not intended to be exported, i.e.
+      they are private.
+
+      Best solution is to ensure your program does not use
+      /lib/librt.so.1.
+
+      However .. since you're probably not using it directly, or even
+      knowingly, that's hard to do.  You might instead be able to fix
+      it by playing around with coregrind/vg_libpthread.vs.  Things to
+      try:
+
+      Remove this
+
+         GLIBC_PRIVATE {
+            __pthread_clock_gettime;
+            __pthread_clock_settime;
+         };
+
+      or maybe remove this
+
+         GLIBC_2.2.3 {
+            __pthread_clock_gettime;
+            __pthread_clock_settime;
+         } GLIBC_2.2;
+
+      or maybe add this
+
+         GLIBC_2.2.4 {
+            __pthread_clock_gettime;
+            __pthread_clock_settime;
+         } GLIBC_2.2;
+
+         GLIBC_2.2.5 {
+            __pthread_clock_gettime;
+            __pthread_clock_settime;
+         } GLIBC_2.2;
+
+      or some combination of the above.  After each change you need to
+      delete coregrind/libpthread.so and do make && make install.
+
+      We don't know whether any of the above will work.  If you
+      find a solution which works, we would be interested to hear it.
+
+      One user reported back:
+
+      I deleted this:
+
+          GLIBC_2.2.3 { 
+             __pthread_clock_gettime; 
+             __pthread_clock_settime; 
+          } GLIBC_2.2; 
+
+      and it worked.
+
+-----------------------------------------------------------------
 
 (this is the end of the FAQ.)