2009-01-27 17:10:42

by Mike Snitzer

Subject: BISECTED: Re: source line numbers with x86_64 modules?

[I've trimmed the wide cc distribution that was inherited when I
forked a different thread]

On Mon, Jan 12, 2009 at 10:19 PM, Eric W. Biederman
<[email protected]> wrote:
> "Mike Snitzer" <[email protected]> writes:
>>
>> Now if only I could fix line numbers when debugging crashes in x86_64
>> modules with the crash utility! :)
>
> It's a userspace problem...
>
> All of the little usability things are userspace problems.
>
> I won't claim that it is trivial because it is a userspace problem, at the same
> time there is no reason to wait for any kernel features to merge etc. Someone
> just has to scratch an itch and go fix it.

Yes, the crash utility (userspace) is clearly having problems getting
line numbers for symbols in x86_64 modules. But I finally took some
time to bisect the point in the kernel where the crash utility first
started to fail; it appears to be:

commit 7460ed2844ffad7141e30271c0c3da8336e66014
Author: john stultz <[email protected]>
Date:   Fri Feb 16 01:28:21 2007 -0800

    [PATCH] time: x86_64: re-enable vsyscall support for x86_64

    Cleanup and re-enable vsyscall gettimeofday using the generic clocksource
    infrastructure.

    [[email protected]: cleanup]
    Signed-off-by: John Stultz <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Roman Zippel <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

 arch/x86_64/Kconfig              |   4 +
 arch/x86_64/kernel/hpet.c        |   6 ++
 arch/x86_64/kernel/time.c        |   6 --
 arch/x86_64/kernel/tsc.c         |   7 ++
 arch/x86_64/kernel/vmlinux.lds.S |  28 ++++------
 arch/x86_64/kernel/vsyscall.c    | 119 ++++++++++++++++++++++---------------
 include/asm-x86_64/proto.h       |   2 -
 include/asm-x86_64/timex.h       |   1 -
 include/asm-x86_64/vsyscall.h    |  29 ++--------
 9 files changed, 104 insertions(+), 98 deletions(-)

Here is the full bisect log:

git-bisect start
# good: [62d0cfcb27cf755cebdc93ca95dabc83608007cd] Linux 2.6.20
git-bisect good 62d0cfcb27cf755cebdc93ca95dabc83608007cd
# bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
# good: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect good 574009c1a895aeeb85eaab29c235d75852b09eb8
# good: [087d7ecd5273b480d13f4309a159842700afe276] [POWERPC] mpic: set IPIs to be per-CPU
git-bisect good 087d7ecd5273b480d13f4309a159842700afe276
# bad: [920841d8d1d61bc12b43f95a579a5374f6d98f81] Merge branch 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
git-bisect bad 920841d8d1d61bc12b43f95a579a5374f6d98f81
# bad: [892705a1e1b4d0f9f6c5ac57f777b8055525bf68] USB: kernel-doc fixes
git-bisect bad 892705a1e1b4d0f9f6c5ac57f777b8055525bf68
# good: [741673473a5b26497d5390f38d478362e27e22ad] i386 prepare for dyntick
git-bisect good 741673473a5b26497d5390f38d478362e27e22ad
# bad: [ef29498655b18d2bfd69048e20835d19333981ab] Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
git-bisect bad ef29498655b18d2bfd69048e20835d19333981ab
# bad: [bec50c47aaf6f1f9247f1860547ab394a0802a4c] knfsd: nfsd4: acls: avoid unnecessary denies
git-bisect bad bec50c47aaf6f1f9247f1860547ab394a0802a4c
# bad: [7460ed2844ffad7141e30271c0c3da8336e66014] time: x86_64: re-enable vsyscall support for x86_64
git-bisect bad 7460ed2844ffad7141e30271c0c3da8336e66014
# good: [289f480af87e45f7a6de6ba9b4c061c2e259fe98] Add debugging feature /proc/timer_list
git-bisect good 289f480af87e45f7a6de6ba9b4c061c2e259fe98
# good: [2d0c87c3bc49c60ab5bbac401fb1ef37ff10bbe2] time: x86_64: hpet_address cleanup
git-bisect good 2d0c87c3bc49c60ab5bbac401fb1ef37ff10bbe2
# good: [1489939f0ab64b96998e04068c516c39afe29654] time: x86_64: convert x86_64 to use GENERIC_TIME
git-bisect good 1489939f0ab64b96998e04068c516c39afe29654
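The verdicts in a log like this can also be read mechanically. A minimal Python sketch, assuming the bisect ran to completion: every "bad" verdict moves git's bisect/bad ref to that commit, and "good" verdicts only trim commits below it, so the last commit marked bad is the one git reports as the first bad commit. The helper name first_bad_commit is hypothetical, not part of git.

```python
import re
from typing import Optional

# Matches replayable verdict lines such as "git-bisect bad <sha>"
# (older dashed form) or "git bisect good <sha>".
VERDICT_RE = re.compile(r"^git[- ]bisect (good|bad) ([0-9a-f]{7,40})\s*$")


def first_bad_commit(log_text: str) -> Optional[str]:
    """Return the most recently marked bad commit in a bisect log.

    Comment lines ("# good: ...") and "git-bisect start" are ignored.
    The result is the culprit only if the bisect actually converged.
    """
    last_bad = None
    for line in log_text.splitlines():
        m = VERDICT_RE.match(line.strip())
        if m and m.group(1) == "bad":
            last_bad = m.group(2)
    return last_bad
```

Applied to the log above, this picks out 7460ed2844ffad7141e30271c0c3da8336e66014, since the three verdicts after it are all "good".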

I used version 4.0-7.6 of the crash utility to test if each commit was
good or bad. I simply checked if ext3's module had correct line
number info for the ext3_get_blocks_handle symbol with:

    sym ext3_get_blocks_handle
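To make that good/bad check repeatable across many builds, the sym output could be classified automatically. This Python sketch rests on an unverified assumption: that crash appends "<source file>: <line>" to the sym line when it can resolve a line number, and prints no such suffix when it cannot. The helper names and the sample addresses and line numbers in the test are hypothetical.

```python
import re

# ASSUMPTION: a resolved line shows up as a trailing "<path>.c: <line>",
# e.g. ".../fs/ext3/inode.c: 812". Adjust the pattern to the real output
# of your crash version before relying on it.
SOURCE_LINE_RE = re.compile(r"(\S+\.[chS]):\s*(\d+)\s*$")


def has_line_number(sym_output: str) -> bool:
    """True if any line of the sym output ends with a file:line suffix."""
    return any(SOURCE_LINE_RE.search(line) for line in sym_output.splitlines())


def verdict(sym_output: str) -> str:
    """Map the line-number check onto a git bisect verdict."""
    return "good" if has_line_number(sym_output) else "bad"
```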

I tried to revert 7460ed2844ffad7141e30271c0c3da8336e66014 from
v2.6.21, but it had conflicts that I haven't yet had time to
resolve.

That aside, I'd be very interested to know how/where this commit is
impacting the crash utility. Has the alignment of some module metadata
structure been altered, and is that the problem? This isn't my area of
expertise, but I have to believe others may have useful insight.

Thanks,
Mike


2009-01-27 19:38:15

by Eric W. Biederman

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?

Mike Snitzer <[email protected]> writes:

> [I've trimmed the wide cc distribution that was inherited when I
> forked a different thread]
>
> On Mon, Jan 12, 2009 at 10:19 PM, Eric W. Biederman
> <[email protected]> wrote:
>> "Mike Snitzer" <[email protected]> writes:
>>>
>>> Now if only I could fix line numbers when debugging crashes in x86_64
>>> modules with the crash utility! :)
>>
>> It's a userspace problem...
>>
>> All of the little usability things are userspace problems.
>>
>> I won't claim that it is trivial because it is a userspace problem, at the same
>> time there is no reason to wait for any kernel features to merge etc. Someone
>> just has to scratch an itch and go fix it.
>
> Yes, the crash utility (userspace) is clearly having problems getting
> line numbers for symbols in x86_64 modules. But I finally took some
> time to bisect the point in the kernel where the crash utility first
> started to fail; it appears to be:
>
> commit 7460ed2844ffad7141e30271c0c3da8336e66014
> Author: john stultz <[email protected]>
> Date: Fri Feb 16 01:28:21 2007 -0800
>
> I used version 4.0-7.6 of the crash utility to test if each commit was
> good or bad. I simply checked if ext3's module had correct line
> number info for the ext3_get_blocks_handle symbol with: sym
> ext3_get_blocks_handle

Weird. That patch doesn't appear to affect anything in that area.
So my stab in the dark is that there is something in vmlinux that
crash doesn't know how to cope with.

> I tried to revert 7460ed2844ffad7141e30271c0c3da8336e66014 from
> v2.6.21, but it had conflicts that I haven't yet had time to
> resolve.
>
> That aside, I'd be very interested to know how/where this commit is
> impacting the crash utility. Has the alignment of some module metadata
> structure been altered, and is that the problem? This isn't my area of
> expertise, but I have to believe others may have useful insight.

Eric

2009-01-27 21:00:48

by Dave Anderson

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?


----- "Eric W. Biederman" <[email protected]> wrote:

> Mike Snitzer <[email protected]> writes:
>
> > [I've trimmed the wide cc distribution that was inherited when I
> > forked a different thread]
> >
> > On Mon, Jan 12, 2009 at 10:19 PM, Eric W. Biederman
> > <[email protected]> wrote:
> >> "Mike Snitzer" <[email protected]> writes:
> >>>
> >>> Now if only I could fix line numbers when debugging crashes in x86_64
> >>> modules with the crash utility! :)
> >>
> >> It's a userspace problem...
> >>
> >> All of the little usability things are userspace problems.
> >>
> >> I won't claim that it is trivial because it is a userspace problem, at the same
> >> time there is no reason to wait for any kernel features to merge etc. Someone
> >> just has to scratch an itch and go fix it.
> >
> > Yes, the crash utility (userspace) is clearly having problems getting
> > line numbers for symbols in x86_64 modules. But I finally took some
> > time to bisect the point in the kernel where the crash utility first
> > started to fail; it appears to be:
> >
> > commit 7460ed2844ffad7141e30271c0c3da8336e66014
> > Author: john stultz <[email protected]>
> > Date: Fri Feb 16 01:28:21 2007 -0800
> >
> > I used version 4.0-7.6 of the crash utility to test if each commit was
> > good or bad. I simply checked if ext3's module had correct line
> > number info for the ext3_get_blocks_handle symbol with: sym
> > ext3_get_blocks_handle
>
> Weird. That patch doesn't appear to affect anything in that area.
> So my stab in the dark is that there is something in vmlinux that
> crash doesn't know how to cope with.

Actually it's not a problem with the vmlinux file, but rather with kernel
module object files. The crash utility has an embedded gdb module which
is invoked as "gdb vmlinux", and to get line numbers, the crash utility
simply uses the relevant built-in gdb function to get them. And line
numbers work fine with the base kernel code from the vmlinux file.

The debuginfo data of kernel modules can be subsequently added to the
crash session by doing a gdb "add-symbol-file" command for any or all
kernel modules. But getting correct line number information for kernel
modules has been a crap-shoot in the past, depending upon architecture
and/or kernel version. For example, they don't work with 2.6.9-based
RHEL4 x86_64 kernel modules, but work fine with 2.6.18-based RHEL5 x86_64
kernels.
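For reference, gdb's add-symbol-file takes the object file, the load address of .text, and optional "-s SECTION ADDRESS" pairs for the other sections. A minimal Python sketch that builds such a command from a section-to-address map, assuming the addresses come from /sys/module/<name>/sections/ on a live system (normally readable only by root); in a crash session they would instead come from the module metadata in the dump. Both helper names are hypothetical. Relocating only .text is one way module line numbers may come out wrong, but nothing in this thread establishes that as the cause here.

```python
from pathlib import Path
from typing import Dict


def read_module_sections(module: str, sysfs: Path = Path("/sys/module")) -> Dict[str, int]:
    """Read section load addresses from /sys/module/<module>/sections/.

    Each file is named after a section (".text", ".data", ...) and holds
    a hex address such as "0xffffffffa0000000".
    """
    sections = {}
    for entry in (sysfs / module / "sections").iterdir():
        sections[entry.name] = int(entry.read_text().strip(), 16)
    return sections


def add_symbol_file_cmd(ko_path: str, sections: Dict[str, int]) -> str:
    """Build a gdb add-symbol-file command that places every known section.

    .text is passed positionally; every other section is passed as
    "-s NAME ADDR" so its symbols and line tables are relocated too.
    """
    if ".text" not in sections:
        raise ValueError("module has no .text section address")
    parts = ["add-symbol-file", ko_path, hex(sections[".text"])]
    for name in sorted(sections):
        if name != ".text":
            parts += ["-s", name, hex(sections[name])]
    return " ".join(parts)
```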

Looking at Mike's suspect kernel patch list, I don't see anything that
would have any relationship to the issue. Perhaps there was a build tool
change during the same timeframe?

Dave

2009-01-27 21:23:05

by Arjan van de Ven

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?

On Tue, 27 Jan 2009 16:00:19 -0500 (EST)
Dave Anderson <[email protected]> wrote:
>
> Actually it's not a problem with the vmlinux file, but rather with
> kernel module object files. The crash utility has an embedded gdb
> module which is invoked as "gdb vmlinux", and to get line numbers,
> the crash utility simply uses the relevant built-in gdb function to
> get them. And line numbers work fine with the base kernel code from
> the vmlinux file.


If the entire goal is to pinpoint where the crash is, just
using scripts/markup_oops.pl will do the right thing.
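Oops backtraces print code addresses as "symbol+0xOFFSET/0xSIZE", optionally followed by "[module]". Roughly speaking, markup_oops.pl marks the faulting instruction in the function's disassembly; the Python sketch below is not that script's implementation and shows only the generic first step: parsing such a frame and turning it into an absolute address, assuming the symbol's load address is already known. The helper names and the addresses in the test are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# "symbol+0xOFFSET/0xSIZE", optionally followed by " [module]".
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_.$][\w.$]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>[\w-]+)\])?"
)


class Frame(NamedTuple):
    symbol: str
    offset: int
    size: int
    module: Optional[str]


def parse_frame(text: str) -> Optional[Frame]:
    """Extract the first symbol+offset/size frame from a line of oops text."""
    m = FRAME_RE.search(text)
    if m is None:
        return None
    return Frame(m["sym"], int(m["off"], 16), int(m["size"], 16), m["mod"])


def absolute_address(frame: Frame, symbol_base: int) -> int:
    """Faulting address = symbol load address + offset within the symbol."""
    if frame.offset >= frame.size:
        raise ValueError("offset lies outside the symbol")
    return symbol_base + frame.offset
```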

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-01-27 21:55:41

by Eric W. Biederman

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?

Dave Anderson <[email protected]> writes:

> Actually it's not a problem with the vmlinux file, but rather with kernel
> module object files. The crash utility has an embedded gdb module which
> is invoked as "gdb vmlinux", and to get line numbers, the crash utility
> simply uses the relevant built-in gdb function to get them. And line
> numbers work fine with the base kernel code from the vmlinux file.
>
> The debuginfo data of kernel modules can be subsequently added to the
> crash session by doing a gdb "add-symbol-file" command for any or all
> kernel modules. But getting correct line number information for kernel
> modules has been a crap-shoot in the past, depending upon architecture
> and/or kernel version. For example, they don't work with 2.6.9-based
> RHEL4 x86_64 kernel modules, but work fine with 2.6.18-based RHEL5 x86_64
> kernels.
>
> Looking at Mike's suspect kernel patch list, I don't see anything that
> would have any relationship to the issue. Perhaps there was a build tool
> change during the same timeframe?

It looks like Mike just built a series of kernels and had a problem,
which should preclude a tool change.

That said, does this feature of crash work in 2.6.29? If not, is
there enough interest to track this down, and fix it if it is a kernel
bug?

If we are going to be using these tools we need them working on the
latest and greatest kernels, not some weird enterprise branch, for
fuddy duddies.

Eric

2009-01-27 22:21:29

by Dave Anderson

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?


----- "Eric W. Biederman" <[email protected]> wrote:

> Dave Anderson <[email protected]> writes:
>
> > Actually it's not a problem with the vmlinux file, but rather with kernel
> > module object files. The crash utility has an embedded gdb module which
> > is invoked as "gdb vmlinux", and to get line numbers, the crash utility
> > simply uses the relevant built-in gdb function to get them. And line
> > numbers work fine with the base kernel code from the vmlinux file.
> >
> > The debuginfo data of kernel modules can be subsequently added to the
> > crash session by doing a gdb "add-symbol-file" command for any or all
> > kernel modules. But getting correct line number information for kernel
> > modules has been a crap-shoot in the past, depending upon architecture
> > and/or kernel version. For example, they don't work with 2.6.9-based
> > RHEL4 x86_64 kernel modules, but work fine with 2.6.18-based RHEL5 x86_64
> > kernels.
> >
> > Looking at Mike's suspect kernel patch list, I don't see anything that
> > would have any relationship to the issue. Perhaps there was a build tool
> > change during the same timeframe?
>
> It looks like Mike just built a series of kernels and had a problem,
> which should preclude a tool change.
>
> That said, does this feature of crash work in 2.6.29? If not, is
> there enough interest to track this down, and fix it if it is a
> kernel bug?
>
> If we are going to be using these tools we need them working on the
> latest and greatest kernels, not some weird enterprise branch, for
> fuddy duddies.

Personally I don't know -- I am a fuddy duddy.

I have tinkered with at least 2.6.28-era vmlinux/vmcore pairs, but never
with any kernel modules thereof.

Dave

2009-01-27 23:42:18

by Mike Snitzer

Subject: Re: BISECTED: Re: source line numbers with x86_64 modules?

On Tue, Jan 27, 2009 at 4:55 PM, Eric W. Biederman
<[email protected]> wrote:
> Dave Anderson <[email protected]> writes:
>
>> Actually it's not a problem with the vmlinux file, but rather with kernel
>> module object files. The crash utility has an embedded gdb module which
>> is invoked as "gdb vmlinux", and to get line numbers, the crash utility
>> simply uses the relevant built-in gdb function to get them. And line
>> numbers work fine with the base kernel code from the vmlinux file.
>>
>> The debuginfo data of kernel modules can be subsequently added to the
>> crash session by doing a gdb "add-symbol-file" command for any or all
>> kernel modules. But getting correct line number information for kernel
>> modules has been a crap-shoot in the past, depending upon architecture
>> and/or kernel version. For example, they don't work with 2.6.9-based
>> RHEL4 x86_64 kernel modules, but work fine with 2.6.18-based RHEL5 x86_64
>> kernels.
>>
>> Looking at Mike's suspect kernel patch list, I don't see anything that
>> would have any relationship to the issue. Perhaps there was a build tool
>> change during the same timeframe?
>
> It looks like Mike just built a series of kernels and had a problem,
> which should preclude a tool change.

That is correct, I just built/booted/tested a series of kernels one
after the other as part of the standard git bisect procedure. No
tools were changed.
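The standard procedure is a binary search: each build/boot/test cycle halves the remaining range, so N candidate commits need roughly log2(N) tests. A simplified Python sketch over a linear history (real git bisect walks a merge DAG, which this ignores); is_good stands in for the manual crash check and, like the function name, is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bisect_first_bad(commits: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    property flips from good to bad exactly once in between.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):  # one build/boot/test cycle
            lo = mid
        else:
            hi = mid
    return commits[hi]
```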

> That said, does this feature of crash work in 2.6.29? If not, is
> there enough interest to track this down, and fix it if it is a kernel
> bug?
>
> If we are going to be using these tools we need them working on the
> latest and greatest kernels, not some weird enterprise branch, for
> fuddy duddies.

This feature of crash does not work with 2.6.29, nor does it work with
any kernel I've tried >= 2.6.21. Andi Kleen shared with me that he
sees the same problem with a recent crash and 2.6.28.

AFAIK I found the regression point in Linux (commit
7460ed28). The jury is clearly still out on where the actual bug
lives (Linux vs. crash).

Mike