Date: Wed, 26 Oct 2016 22:23:14 -0200
From: Mauro Carvalho Chehab
To: Jonathan Corbet
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Jani Nikula
Subject: Re: [PATCH 06/11] docs: Get rid of the "bug-hunting" guide
Message-ID: <20161026222314.6a28b97e@vento.lan>
In-Reply-To: <1477523979-5837-7-git-send-email-corbet@lwn.net>
References: <1477523979-5837-1-git-send-email-corbet@lwn.net> <1477523979-5837-7-git-send-email-corbet@lwn.net>
Organization: Samsung
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 26 Oct 2016 17:19:34 -0600,
Jonathan Corbet wrote:

> Larry McVoy's advice on how to manually bisect 1.3.x kernel bugs is of
> historical interest, but that's what the repository is for. It is not
> useful to users now.

In the specific case of this file, I think that the information there
about how to disassemble a file and how to use gdb and objdump to get
the error line associated with an OOPS is very useful.

So, I prefer to keep this one. See the patch I sent you today.

> Signed-off-by: Jonathan Corbet
> ---
>  Documentation/admin-guide/bug-hunting.rst | 249 ------------------------------
>  Documentation/admin-guide/index.rst       |   1 -
>  2 files changed, 250 deletions(-)
>  delete mode 100644 Documentation/admin-guide/bug-hunting.rst
>
> diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
> deleted file mode 100644
> index d35dd9fd1af0..000000000000
> --- a/Documentation/admin-guide/bug-hunting.rst
> +++ /dev/null
> @@ -1,249 +0,0 @@
> -Bug hunting
> -+++++++++++
> -
> -Last updated: 20 December 2005
> -
> -Introduction
> -============
> -
> -Always try the latest kernel from kernel.org and build from source. If you are
> -not confident in doing that please report the bug to your distribution vendor
> -instead of to a kernel developer.
> -
> -Finding bugs is not always easy. Have a go though. If you can't find it don't
> -give up. Report as much as you have found to the relevant maintainer. See
> -MAINTAINERS for who that is for the subsystem you have worked on.
> -
> -Before you submit a bug report read
> -:ref:`Documentation/admin-guide/reporting-bugs.rst `.
> -
> -Devices not appearing
> -=====================
> -
> -Often this is caused by udev. Check that first before blaming it on the
> -kernel.
> -
> -Finding patch that caused a bug
> -===============================
> -
> -
> -
> -Finding using ``git-bisect``
> -----------------------------
> -
> -Using the provided tools with ``git`` makes finding bugs easy provided the bug
> -is reproducible.
> -
> -Steps to do it:
> -
> -- start using git for the kernel source
> -- read the man page for ``git-bisect``
> -- have fun
> -
> -Finding it the old way
> -----------------------
> -
> -[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
> -
> -This is how to track down a bug if you know nothing about kernel hacking.
> -It's a brute force approach but it works pretty well.
> -
> -You need:
> -
> -  - A reproducible bug - it has to happen predictably (sorry)
> -  - All the kernel tar files from a revision that worked to the
> -    revision that doesn't
> -
> -You will then do:
> -
> -  - Rebuild a revision that you believe works, install, and verify that.
> -  - Do a binary search over the kernels to figure out which one
> -    introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
> -    you know that 1.3.69 does. Pick a kernel in the middle and build
> -    that, like 1.3.50. Build & test; if it works, pick the mid point
> -    between .50 and .69, else the mid point between .28 and .50.
> -  - You'll narrow it down to the kernel that introduced the bug. You
> -    can probably do better than this but it gets tricky.
> -
> -  - Narrow it down to a subdirectory
> -
> -    - Copy kernel that works into "test". Let's say that 3.62 works,
> -      but 3.63 doesn't. So you diff -r those two kernels and come
> -      up with a list of directories that changed. For each of those
> -      directories:
> -
> -      Copy the non-working directory next to the working directory
> -      as "dir.63".
> -      One directory at time, try moving the working directory to
> -      "dir.62" and mv dir.63 dir"time, try::
> -
> -        mv dir dir.62
> -        mv dir.63 dir
> -        find dir -name '*.[oa]' -print | xargs rm -f
> -
> -      And then rebuild and retest. Assuming that all related
> -      changes were contained in the sub directory, this should
> -      isolate the change to a directory.
> -
> -      Problems: changes in header files may have occurred; I've
> -      found in my case that they were self explanatory - you may
> -      or may not want to give up when that happens.
> -
> -  - Narrow it down to a file
> -
> -    - You can apply the same technique to each file in the directory,
> -      hoping that the changes in that file are self contained.
> -
> -  - Narrow it down to a routine
> -
> -    - You can take the old file and the new file and manually create
> -      a merged file that has::
> -
> -        #ifdef VER62
> -        routine()
> -        {
> -                ...
> -        }
> -        #else
> -        routine()
> -        {
> -                ...
> -        }
> -        #endif
> -
> -      And then walk through that file, one routine at a time and
> -      prefix it with::
> -
> -        #define VER62
> -        /* both routines here */
> -        #undef VER62
> -
> -      Then recompile, retest, move the ifdefs until you find the one
> -      that makes the difference.
> -
> -Finally, you take all the info that you have, kernel revisions, bug
> -description, the extent to which you have narrowed it down, and pass
> -that off to whomever you believe is the maintainer of that section.
> -A post to linux.dev.kernel isn't such a bad idea if you've done some
> -work to narrow it down.
> -
> -If you get it down to a routine, you'll probably get a fix in 24 hours.
> -
> -My apologies to Linus and the other kernel hackers for describing this
> -brute force approach, it's hardly what a kernel hacker would do. However,
> -it does work and it lets non-hackers help fix bugs. And it is cool
> -because Linux snapshots will let you do this - something that you can't
> -do with vendor supplied releases.
> -
> -Fixing the bug
> -==============
> -
> -Nobody is going to tell you how to fix bugs. Seriously. You need to work it
> -out. But below are some hints on how to use the tools.
> -
> -To debug a kernel, use objdump and look for the hex offset from the crash
> -output to find the valid line of code/assembler. Without debug symbols, you
> -will see the assembler code for the routine shown, but if your kernel has
> -debug symbols the C code will also be available. (Debug symbols can be enabled
> -in the kernel hacking menu of the menu configuration.) For example::
> -
> -     objdump -r -S -l --disassemble net/dccp/ipv4.o
> -
> -.. note::
> -
> -   You need to be at the top level of the kernel tree for this to pick up
> -   your C files.
> -
> -If you don't have access to the code you can also debug on some crash dumps
> -e.g. crash dump output as shown by Dave Miller::
> -
> -     EIP is at ip_queue_xmit+0x14/0x4c0
> -      ...
> -     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
> -     00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
> -     <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
> -
> -     Put the bytes into a "foo.s" file like this:
> -
> -            .text
> -            .globl foo
> -     foo:
> -            .byte .... /* bytes from Code: part of OOPS dump */
> -
> -     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
> -     "objdump --disassemble foo.o".
> -
> -     Output:
> -
> -     ip_queue_xmit:
> -         push %ebp
> -         push %edi
> -         push %esi
> -         push %ebx
> -         sub $0xbc, %esp
> -         mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
> -         mov 0x8(%ebp), %ebx ! %ebx = skb->sk
> -         mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
> -
> -In addition, you can use GDB to figure out the exact file and line
> -number of the OOPS from the ``vmlinux`` file. If you have
> -``CONFIG_DEBUG_INFO`` enabled, you can simply copy the EIP value from the
> -OOPS::
> -
> -     EIP: 0060:[] Not tainted VLI
> -
> -And use GDB to translate that to human-readable form::
> -
> -     gdb vmlinux
> -     (gdb) l *0xc021e50e
> -
> -If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
> -offset from the OOPS::
> -
> -     EIP is at vt_ioctl+0xda8/0x1482
> -
> -And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
> -
> -     make vmlinux
> -     gdb vmlinux
> -     (gdb) p vt_ioctl
> -     (gdb) l *(0x + 0xda8)
> -
> -or, as one command::
> -
> -     (gdb) l *(vt_ioctl + 0xda8)
> -
> -If you have a call trace, such as::
> -
> -     Call Trace:
> -      [] :jbd:log_wait_commit+0xa3/0xf5
> -      [] autoremove_wake_function+0x0/0x2e
> -      [] :jbd:journal_stop+0x1be/0x1ee
> -      ...
> -
> -this shows the problem in the :jbd: module. You can load that module in gdb
> -and list the relevant code::
> -
> -     gdb fs/jbd/jbd.ko
> -     (gdb) p log_wait_commit
> -     (gdb) l *(0x + 0xa3)
> -
> -or::
> -
> -     (gdb) l *(log_wait_commit + 0xa3)
> -
> -
> -Another very useful option of the Kernel Hacking section in menuconfig is
> -Debug memory allocations. This will help you see whether data has been
> -initialised and not set before use etc. To see the values that get assigned
> -with this look at ``mm/slab.c`` and search for ``POISON_INUSE``. When using
> -this an Oops will often show the poisoned data instead of zero which is the
> -default.
> -
> -Once you have worked out a fix please submit it upstream. After all open
> -source is about sharing what you do and don't you want to be recognised for
> -your genius?
> -
> -Please do read
> -ref:`Documentation/process/submitting-patches.rst ` though
> -to help your code get accepted.
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 2872c0c70ea4..2d0a302e8773 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -25,7 +25,6 @@ problems and bugs in particular.
>
>     reporting-bugs
>     security-bugs
> -   bug-hunting
>     oops-tracing
>     ramoops
>     dynamic-debug-howto

Thanks,
Mauro