Vivek Goyal <[email protected]> writes:
> On Mon, Apr 30, 2007 at 05:17:07PM +0200, Andi Kleen wrote:
>> On Monday 30 April 2007 17:12:39 Eric W. Biederman wrote:
>> >
>> > Currently, because vmlinux does not reflect that the kernel is relocatable,
>> > we still have to support CONFIG_PHYSICAL_START. So this patch adds a small
>> > C program to do what we cannot do with a linker script: set the ELF header
>> > type to ET_DYN.
>> >
>> > Since last time I have fixed my code so that the type really is ET_DYN (oops),
>> > and verified this works with kexec. I realized while testing that we
>> > don't have any way of identifying a kernel vmlinux as Linux, so we
>> > probably want to add an ELF note, but that will be another patch.
>>
>> The patch is ok for me, but does it pass Vivek's usual testing?
>
> I am facing one issue with this patch. gdb cannot analyze the
> resulting kernel core file. It looks like gdb treats vmlinux differently if
> the ELF header type is "ET_DYN". It reads the symbol values incorrectly.
Weird.
> For example, the symbol value of "panic_timeout" is 0xffffffff808a1fa8 but
> gdb somehow thinks that it is 0xffffffff008aaebf. It looks like it is
> performing some relocation.
>
> I am using GNU gdb Red Hat Linux (6.5-5.fc6rh).
Does it take a kernel core file to reproduce this problem?
Or can you just open up gdb on a vmlinux and look at the symbol
address?
At least without a core file it works for me with gdb 6.4.
Eric
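The patch itself is not quoted in this thread, so the following is only a minimal sketch of what such a small C program could do, assuming a glibc-style <elf.h>. The function names elf_set_type_dyn() and elf_file_set_type_dyn() are hypothetical. It relies on e_type being the 2-byte field that directly follows the 16-byte e_ident array in both Elf32_Ehdr and Elf64_Ehdr, and it stores the new value in the byte order recorded in e_ident[EI_DATA].

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* e_type immediately follows e_ident[EI_NIDENT] in both ELF classes. */
#define E_TYPE_OFFSET EI_NIDENT

/* Hypothetical helper: patch e_type to ET_DYN in an in-memory ELF header.
 * Returns 0 on success, -1 if the buffer does not look like an ELF header. */
int elf_set_type_dyn(unsigned char *hdr, size_t len)
{
    if (len < EI_NIDENT + 2 || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr[EI_CLASS] != ELFCLASS32 && hdr[EI_CLASS] != ELFCLASS64)
        return -1;

    /* e_type is stored in the file's byte order, not the host's. */
    if (hdr[EI_DATA] == ELFDATA2LSB) {
        hdr[E_TYPE_OFFSET] = ET_DYN & 0xff;
        hdr[E_TYPE_OFFSET + 1] = (ET_DYN >> 8) & 0xff;
    } else if (hdr[EI_DATA] == ELFDATA2MSB) {
        hdr[E_TYPE_OFFSET] = (ET_DYN >> 8) & 0xff;
        hdr[E_TYPE_OFFSET + 1] = ET_DYN & 0xff;
    } else {
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the first bytes of the file at path, patch
 * e_type and write them back in place. Returns 0 on success, -1 on error. */
int elf_file_set_type_dyn(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    int ret = -1;
    if (fread(hdr, 1, sizeof(hdr), f) == sizeof(hdr) &&
        elf_set_type_dyn(hdr, sizeof(hdr)) == 0 &&
        fseek(f, 0, SEEK_SET) == 0 &&   /* required between read and write */
        fwrite(hdr, 1, sizeof(hdr), f) == sizeof(hdr))
        ret = 0;
    if (fclose(f) != 0)
        ret = -1;
    return ret;
}
```

A build step could then call elf_file_set_type_dyn("vmlinux") after linking. Only the two e_type bytes change in content; the program headers and symbol table are left untouched.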
On Mon, Apr 30, 2007 at 10:20:53PM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Mon, Apr 30, 2007 at 05:17:07PM +0200, Andi Kleen wrote:
> >> On Monday 30 April 2007 17:12:39 Eric W. Biederman wrote:
> >> >
> >> > Currently, because vmlinux does not reflect that the kernel is relocatable,
> >> > we still have to support CONFIG_PHYSICAL_START. So this patch adds a small
> >> > C program to do what we cannot do with a linker script: set the ELF header
> >> > type to ET_DYN.
> >> >
> >> > Since last time I have fixed my code so that the type really is ET_DYN (oops),
> >> > and verified this works with kexec. I realized while testing that we
> >> > don't have any way of identifying a kernel vmlinux as Linux, so we
> >> > probably want to add an ELF note, but that will be another patch.
> >>
> >> The patch is ok for me, but does it pass Vivek's usual testing?
> >
> > I am facing one issue with this patch. gdb cannot analyze the
> > resulting kernel core file. It looks like gdb treats vmlinux differently if
> > the ELF header type is "ET_DYN". It reads the symbol values incorrectly.
>
> Weird.
>
> > For example, the symbol value of "panic_timeout" is 0xffffffff808a1fa8 but
> > gdb somehow thinks that it is 0xffffffff008aaebf. It looks like it is
> > performing some relocation.
> >
> > I am using GNU gdb Red Hat Linux (6.5-5.fc6rh).
>
> Does it take a kernel core file to reproduce this problem?
> Or can you just open up gdb on a vmlinux and look at the symbol
> address?
It takes a core file to reproduce the problem. Without a core file, gdb
gets the right symbol addresses.
>
> At least without a core file it works for me with gdb 6.4.
>
This seems to be a problem with gdb 6.5. I transferred the dump to a
different machine having GNU gdb 6.4, and it works fine there.
Thanks
Vivek
Vivek Goyal <[email protected]> writes:
>> At least without a core file it works for me with gdb 6.4.
>>
>
> This seems to be a problem with gdb 6.5. I transferred the dump to a
> different machine having GNU gdb 6.4, and it works fine there.
Ok. The difference between those two symbols didn't seem to make
any sense, so a gdb bug makes sense.
Cool. Then the patch is good. :)
Eric
On Mon, Apr 30, 2007 at 11:26:50PM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
>
> >> At least without a core file it works for me with gdb 6.4.
> >>
> >
> > This seems to be a problem with gdb 6.5. I transferred the dump to a
> > different machine having GNU gdb 6.4, and it works fine there.
>
> Ok. The difference between those two symbols didn't seem to make
> any sense, so a gdb bug makes sense.
>
> Cool. Then the patch is good. :)
It would still make any gdb 6.5 users unhappy. If no workaround can be found
I guess we'll need a CONFIG of some sort?
-Andi
Andi Kleen <[email protected]> writes:
> On Mon, Apr 30, 2007 at 11:26:50PM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>>
>> >> At least without a core file it works for me with gdb 6.4.
>> >>
>> >
>> > This seems to be a problem with gdb 6.5. I transferred the dump to a
>> > different machine having GNU gdb 6.4, and it works fine there.
>>
>> Ok. The difference between those two symbols didn't seem to make
>> any sense, so a gdb bug makes sense.
>>
>> Cool. Then the patch is good. :)
>
> It would still make any gdb 6.5 users unhappy. If no workaround can be found
> I guess we'll need a CONFIG of some sort?
It is probably worth reproducing this bug with a PIE executable.
But it looks very much like gdb got it wrong, or else there is some
slight mismatch between our core dump and gdb.
Given that gdb 6.5 handles the vmlinux fine when it isn't used in conjunction
with a core dump, I would not say the problem is in vmlinux.
Rather, there seems to be something messed up when gdb 6.5 tries to match
up the kernel core dump with the kernel. The offset for the symbol
Vivek gave was 0x7fff70e9. ??? Although that is almost 2G...
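The offset is simply the difference between the two panic_timeout addresses Vivek reported. A quick check in C (the macro names are made up for this sketch) confirms it is 0x7fff70e9, which is 0x8f17 bytes short of 0x80000000, i.e. just under 2 GiB.

```c
#include <stdint.h>

/* The two panic_timeout addresses quoted in the thread.
 * The macro names are hypothetical, chosen only for this sketch. */
#define PANIC_TIMEOUT_SYMTAB 0xffffffff808a1fa8ULL /* vmlinux symbol table */
#define PANIC_TIMEOUT_GDB65  0xffffffff008aaebfULL /* printed by gdb 6.5 */

/* Distance between the real address and the address gdb reported. */
uint64_t symbol_skew(uint64_t real, uint64_t reported)
{
    return real - reported;
}
```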
Vivek, could we see the program headers of your core file?
From what I can tell, what is left to figure out is whether we have
a bug in gdb 6.5 or a bug in our core file generation.
Right now I'm inclined to believe that the Fedora Core 6(?) gdb 6.5 got it
wrong. I'm probably just burnt out with binutils problems whenever
I try to do something interesting. But I'm just not inclined to believe that
the bleeding-edge tools are working properly while the kernel
core dump mechanism would break because of a two-byte field change.
It does make sense to root-cause this if we can. If it's a gdb
problem, it should also apply to PIE executables, and should irritate
a few users.
Regardless, last I heard crash was the primary analysis tool, not gdb,
with gdb serving as the double check to make certain that the kernel
core dump was in a reasonably standard format.
Eric
On Mon, Apr 30, 2007 at 11:54:22PM -0600, Eric W. Biederman wrote:
> Andi Kleen <[email protected]> writes:
>
> > On Mon, Apr 30, 2007 at 11:26:50PM -0600, Eric W. Biederman wrote:
> >> Vivek Goyal <[email protected]> writes:
> >>
> >>
> >> >> At least without a core file it works for me with gdb 6.4.
> >> >>
> >> >
> >> > This seems to be a problem with gdb 6.5. I transferred the dump to a
> >> > different machine having GNU gdb 6.4, and it works fine there.
> >>
> >> Ok. The difference between those two symbols didn't seem to make
> >> any sense, so a gdb bug makes sense.
> >>
> >> Cool. Then the patch is good. :)
> >
> > It would still make any gdb 6.5 users unhappy. If no workaround can be found
> > I guess we'll need a CONFIG of some sort?
>
> It is probably worth reproducing this bug with a PIE executable.
> But it looks very much like gdb got it wrong, or else there is some
> slight mismatch between our core dump and gdb.
>
> Given that gdb 6.5 handles the vmlinux fine when it isn't used in conjunction
> with a core dump, I would not say the problem is in vmlinux.
>
> Rather, there seems to be something messed up when gdb 6.5 tries to match
> up the kernel core dump with the kernel. The offset for the symbol
> Vivek gave was 0x7fff70e9. ??? Although that is almost 2G...
>
> Vivek, could we see the program headers of your core file?
>
Hi Eric,
Following are program headers of my core file. They look sane.
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000190 0x0000000000000000 0x0000000000000000
                 0x0000000000000b20 0x0000000000000b20         0
  LOAD           0x0000000000000cb0 0xffffffff80200000 0x0000000000200000
                 0x0000000000742000 0x0000000000742000  RWE    0
  LOAD           0x0000000000742cb0 0xffff810000000000 0x0000000000000000
                 0x00000000000a0000 0x00000000000a0000  RWE    0
  LOAD           0x00000000007e2cb0 0xffff810000100000 0x0000000000100000
                 0x0000000000f00000 0x0000000000f00000  RWE    0
  LOAD           0x00000000016e2cb0 0xffff810009000000 0x0000000009000000
                 0x00000000c6f8dc80 0x00000000c6f8dc80  RWE    0
  LOAD           0x00000000c8670930 0xffff810100000000 0x0000000100000000
                 0x0000000130000000 0x0000000130000000  RWE    0
The first PT_LOAD header maps kernel text/data; the others just map
physical memory in the kernel's linear virtual address range.
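To see why the address gdb 6.5 printed cannot be right, both candidate addresses can be translated through the PT_LOAD segments above. This is an illustration under stated assumptions, not gdb's actual lookup code: it uses the Elf64_Phdr type from <elf.h>, the five LOAD headers are transcribed by hand, and the helper name vaddr_to_file_offset() is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* PT_LOAD headers transcribed from the readelf output above (NOTE omitted). */
static const Elf64_Phdr core_load[] = {
    { .p_type = PT_LOAD, .p_offset = 0x0000000000000cb0ULL,
      .p_vaddr = 0xffffffff80200000ULL, .p_filesz = 0x0000000000742000ULL },
    { .p_type = PT_LOAD, .p_offset = 0x0000000000742cb0ULL,
      .p_vaddr = 0xffff810000000000ULL, .p_filesz = 0x00000000000a0000ULL },
    { .p_type = PT_LOAD, .p_offset = 0x00000000007e2cb0ULL,
      .p_vaddr = 0xffff810000100000ULL, .p_filesz = 0x0000000000f00000ULL },
    { .p_type = PT_LOAD, .p_offset = 0x00000000016e2cb0ULL,
      .p_vaddr = 0xffff810009000000ULL, .p_filesz = 0x00000000c6f8dc80ULL },
    { .p_type = PT_LOAD, .p_offset = 0x00000000c8670930ULL,
      .p_vaddr = 0xffff810100000000ULL, .p_filesz = 0x0000000130000000ULL },
};

/* Hypothetical helper: find the PT_LOAD segment that covers vaddr and store
 * the matching core-file offset in *off. Returns 1 if found, 0 otherwise. */
int vaddr_to_file_offset(const Elf64_Phdr *ph, size_t n,
                         Elf64_Addr vaddr, Elf64_Off *off)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        /* Written as a subtraction so the end of the range cannot overflow. */
        if (vaddr >= ph[i].p_vaddr && vaddr - ph[i].p_vaddr < ph[i].p_filesz) {
            *off = ph[i].p_offset + (vaddr - ph[i].p_vaddr);
            return 1;
        }
    }
    return 0;
}
```

With these headers, 0xffffffff808a1fa8 (panic_timeout from the symbol table) falls in the first segment at file offset 0x6a2c58. In contrast, 0xffffffff008aaebf is not covered by any PT_LOAD segment at all.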
> From what I can tell, what is left to figure out is whether we have
> a bug in gdb 6.5 or a bug in our core file generation.
>
Given that it works well with gdb 6.4 as well as 6.1 (crash uses gdb 6.1
as its backend and crash is working fine), I would think that it is a gdb bug.
> Right now I'm inclined to believe that the Fedora Core 6(?) gdb 6.5 got it
> wrong. I'm probably just burnt out with binutils problems whenever
> I try to do something interesting. But I'm just not inclined to believe that
> the bleeding-edge tools are working properly while the kernel
> core dump mechanism would break because of a two-byte field change.
>
> It does make sense to root-cause this if we can. If it's a gdb
> problem, it should also apply to PIE executables, and should irritate
> a few users.
>
> Regardless, last I heard crash was the primary analysis tool, not gdb,
> with gdb serving as the double check to make certain that the kernel
> core dump was in a reasonably standard format.
I would consider gdb to be equally important, especially because many
a time crash is broken with the latest kernels (as some data
structure or mechanism has changed), and gdb is the only tool that
can open the dump.
Thanks
Vivek
* Vivek Goyal <[email protected]> [2007-05-01 07:06]:
> This seems to be a problem with gdb 6.5. I transferred the dump to a
> different machine having GNU gdb 6.4, and it works fine there.
What's the state of it? Andi, was the GDB breakage the reason why you
didn't merge it? Did someone file a GDB bug?
Thanks,
Bernhard
On Mon, May 28, 2007 at 12:54:42PM +0200, Bernhard Walle wrote:
> * Vivek Goyal <[email protected]> [2007-05-01 07:06]:
> > This seems to be a problem with gdb 6.5. I transferred the dump to a
> > different machine having GNU gdb 6.4, and it works fine there.
>
> What's the state of it? Andi, was the GDB breakage the reason why you
> didn't merge it? Did someone file a GDB bug?
>
I had sent a mail to the gdb mailing list but got no response. I did not
file a bug, though.
Thanks
Vivek
* Vivek Goyal <[email protected]> [2007-05-28 13:09]:
> On Mon, May 28, 2007 at 12:54:42PM +0200, Bernhard Walle wrote:
> > * Vivek Goyal <[email protected]> [2007-05-01 07:06]:
> > > This seems to be a problem with gdb 6.5. I transferred the dump to a
> > > different machine having GNU gdb 6.4, and it works fine there.
> >
> > What's the state of it? Andi, was the GDB breakage the reason why you
> > didn't merge it? Did someone file a GDB bug?
>
> I had sent a mail to the gdb mailing list but got no response. I did not
> file a bug, though.
BTW: Did anyone test with GDB 6.6?
Thanks,
Bernhard
On Monday 28 May 2007 12:54:42 Bernhard Walle wrote:
> * Vivek Goyal <[email protected]> [2007-05-01 07:06]:
> > This seems to be a problem with gdb 6.5. I transferred the dump to a
> > different machine having GNU gdb 6.4, and it works fine there.
>
> What's the state of it? Andi, was the GDB breakage the reason why you
> didn't merge it?
Yes, I was waiting for Vivek to figure that out.
-Andi
* Bernhard Walle <[email protected]> [2007-05-28 16:39]:
> * Vivek Goyal <[email protected]> [2007-05-28 13:09]:
> > On Mon, May 28, 2007 at 12:54:42PM +0200, Bernhard Walle wrote:
> > > * Vivek Goyal <[email protected]> [2007-05-01 07:06]:
> > > > This seems to be a problem with gdb 6.5. I transferred the dump to a
> > > > different machine having GNU gdb 6.4, and it works fine there.
> > >
> > > What's the state of it? Andi, was the GDB breakage the reason why you
> > > didn't merge it? Did someone file a GDB bug?
> >
> > I had sent a mail to the gdb mailing list but got no response. I did not
> > file a bug, though.
>
> BTW: Did anyone test with GDB 6.6?
/me. But I get the same error as with GDB 6.5. I'm looking into whether I can
find the cause of the problem.
Thanks,
Bernhard