[Sat Mar  2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.

You need:

        . A reproducible bug - it has to happen predictably (sorry)
        . All the kernel tar files from a revision that worked to the
          revision that doesn't

You will then do:

        . Rebuild a revision that you believe works, install, and verify that.
        . Do a binary search over the kernels to figure out which one
          introduced the bug.  I.e., suppose 1.3.28 didn't have the bug, but
          you know that 1.3.69 does.  Pick a kernel in the middle and build
          that, like 1.3.50.  Build & test; if it works, pick the mid point
          between .50 and .69, else the mid point between .28 and .50.
        . You'll narrow it down to the kernel that introduced the bug.  You
          can probably do better than this but it gets tricky.
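
The binary search above can be sketched as a small shell loop.  This is
only an illustration: kernel_works is a hypothetical stand-in for "build
1.3.N, boot it, and try to reproduce the bug", and here it simply pretends
that the bug first appeared in 1.3.63.

```shell
#!/bin/sh
# Hypothetical stand-in for the manual build/boot/test cycle.
# Returns 0 (true) if patch level $1 works.  For this sketch we assume
# the bug was introduced in 1.3.63, so anything below 63 "works".
kernel_works() {
    [ "$1" -lt 63 ]
}

# bisect GOOD BAD
# GOOD is a patch level known to work, BAD one known to show the bug.
# Prints the first patch level that shows the bug.
bisect() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if kernel_works "$mid"; then
            good=$mid       # bug was introduced after mid
        else
            bad=$mid        # bug was introduced at or before mid
        fi
    done
    echo "$bad"
}

bisect 28 69
```

In real use each call to kernel_works is a full rebuild and reboot, so the
point of the loop is only to show which revision to try next; going from
.28 to .69 takes about six builds instead of forty.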

        . Narrow it down to a subdirectory

          - Copy kernel that works into "test".  Let's say that 1.3.62
            works, but 1.3.63 doesn't.  So you diff -r those two kernels
            and come up with a list of directories that changed.  For each
            of those directories:

                Copy the non-working directory next to the working directory
                as "dir.63".
                One directory at a time, try moving the working directory
                to "dir.62" and moving dir.63 into its place:

                        mv dir dir.62
                        mv dir.63 dir
                        find dir -name '*.[oa]' -print | xargs rm -f

                And then rebuild and retest.  Assuming that all related
                changes were contained in the sub directory, this should
                isolate the change to a directory.

                Problems: changes in header files may have occurred; I've
                found in my case that they were self explanatory - you may
                or may not want to give up when that happens.
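
One way to get the list of changed directories without reading the whole
diff -r output is sketched below.  The function name changed_dirs and the
tree names in the example call are made up for illustration; it only looks
at top-level directories and relies on diff's exit status (0 means
identical).

```shell
#!/bin/sh
# changed_dirs OLD_TREE NEW_TREE
# Prints the name of each top-level directory in NEW_TREE whose contents
# differ from the same directory in OLD_TREE (or which is missing there).
# Directories that exist only in OLD_TREE are not reported.
changed_dirs() {
    old_tree=$1
    new_tree=$2
    for d in "$new_tree"/*/; do
        [ -d "$d" ] || continue         # glob matched nothing
        name=$(basename "$d")
        # diff exits non-zero if anything differs or a side is missing
        if ! diff -r "$old_tree/$name" "$new_tree/$name" >/dev/null 2>&1; then
            echo "$name"
        fi
    done
}

# Example (paths are assumptions):
#   changed_dirs linux-1.3.62 linux-1.3.63
```

Each name printed is a candidate for the mv/find/rebuild cycle described
above.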

        . Narrow it down to a file

          - You can apply the same technique to each file in the directory,
            hoping that the changes in that file are self contained.
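
Applied to a single file, the same swap might look like the sketch below.
The function name swap_file and the ".62" suffix for the saved copy are
assumptions chosen to match the directory example; it also assumes the
file is a .c file that exists in both trees.

```shell
#!/bin/sh
# swap_file TEST_TREE BROKEN_TREE PATH
# Replaces PATH (relative, e.g. fs/inode.c) in the working "test" tree
# with the version from the non-working tree, keeping the working copy
# as PATH.62 so it can be moved back afterwards.
swap_file() {
    test_tree=$1
    broken_tree=$2
    path=$3
    mv "$test_tree/$path" "$test_tree/$path.62"     # save working version
    cp "$broken_tree/$path" "$test_tree/$path"      # drop in suspect version
    rm -f "$test_tree/${path%.c}.o"                 # force a recompile
}

# Example (paths are assumptions):
#   swap_file test linux-1.3.63 fs/inode.c
```

Rebuild and retest after each swap; if the bug appears, the change is in
that file, otherwise move the .62 copy back and try the next one.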

        . Narrow it down to a routine

          - You can take the old file and the new file and manually create
            a merged file that has

                #ifdef VER62
                routine()       /* old (working) version */
                {
                        ...
                }
                #else
                routine()       /* new (non-working) version */
                {
                        ...
                }
                #endif

            And then walk through that file, one routine at a time, and
            wrap it with

                #define VER62
                /* both routines here */
                #undef VER62

            Then recompile, retest, move the ifdefs until you find the one
            that makes the difference.

Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.

If you get it down to a routine, you'll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute force approach; it's hardly what a kernel hacker would do.  However,
it does work and it lets non-hackers help fix bugs.  And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor supplied releases.