2006-05-01 14:25:05

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

Andrew Morton wrote:
> (I did s/[email protected]/[email protected]/)
>
> Martin Bligh <[email protected]> wrote:
>
>>Still crashes in LTP on x86_64:
>>(introduced in previous release)
>>
>>http://test.kernel.org/abat/29674/debug/console.log
>
>
> What a mess. A doublefault inside an NMI watchdog timeout. I think. It's
> hard to see. Some CPUs are stuck on a CPU scheduler lock, others seem to
> be stuck in flush_tlb_others. One of these could be a consequence of the
> other, or both could be a consequence of something else.

OK, well the latest one seems cleaner, on -rc3-mm1.
http://test.kernel.org/abat/30007/debug/console.log

Just has the double fault, with no NMI watchdog timeouts. Not that
it means any more to me, but still ;-) mtest01 seems to be able to
reproduce this every time, but I don't have an appropriate box here
to diagnose it with (this was a 4x Opteron inside IBM), and it's
definitely something in -mm that's not in mainline.

M.

double fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
CPU 0
Modules linked in:
Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
RSP: 0000:0000000000000000 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
ffff8100db12c0d0)
Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
0000000000000000 ffffffff80485520
Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
<ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
<ffffffff8020bba6>{do_double_fault+115}
<ffffffff8020aa91>{double_fault+125}
<ffffffff8047c8b8>{__sched_text_start+1856} <EOE>

Code: e8 4c ba d8 ff 65 48 8b 34 25 00 00 00 00 4c 8b 46 08 f0 41
RIP <ffffffff8047c8b8>{__sched_text_start+1856} RSP <0000000000000000>
-- 0:conmux-control -- time-stamp -- May/01/06 3:54:37 --


2006-05-01 17:09:56

by Andrew Morton

Subject: Re: 2.6.17-rc2-mm1

"Martin J. Bligh" <[email protected]> wrote:
>
> Andrew Morton wrote:
> > (I did s/[email protected]/[email protected]/)
> >
> > Martin Bligh <[email protected]> wrote:
> >
> >>Still crashes in LTP on x86_64:
> >>(introduced in previous release)
> >>
> >>http://test.kernel.org/abat/29674/debug/console.log
> >
> >
> > What a mess. A doublefault inside an NMI watchdog timeout. I think. It's
> > hard to see. Some CPUs are stuck on a CPU scheduler lock, others seem to
> > be stuck in flush_tlb_others. One of these could be a consequence of the
> > other, or both could be a consequence of something else.
>
> OK, well the latest one seems cleaner, on -rc3-mm1.
> http://test.kernel.org/abat/30007/debug/console.log
>
> Just has the double fault, with no NMI watchdog timeouts. Not that
> it means any more to me, but still ;-) mtest01 seems to be able to
> reproduce this every time, but I don't have an appropriate box here
> to diagnose it with (this was a 4x Opteron inside IBM), and it's
> definitely something in -mm that's not in mainline.
>
> M.
>
> double fault: 0000 [1] SMP
> last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
> CPU 0
> Modules linked in:
> Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
> RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
> RSP: 0000:0000000000000000 EFLAGS: 00010082
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
> RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
> RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
> FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
> Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
> ffff8100db12c0d0)
> Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
> 0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
> 0000000000000000 ffffffff80485520
> Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
> <ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
> <ffffffff8020bba6>{do_double_fault+115}
> <ffffffff8020aa91>{double_fault+125}
> <ffffffff8047c8b8>{__sched_text_start+1856} <EOE>
>
> Code: e8 4c ba d8 ff 65 48 8b 34 25 00 00 00 00 4c 8b 46 08 f0 41
> RIP <ffffffff8047c8b8>{__sched_text_start+1856} RSP <0000000000000000>
> -- 0:conmux-control -- time-stamp -- May/01/06 3:54:37 --

I was not able to reproduce this on the 4-way EM64T machine. Am a bit stuck.

2006-05-01 17:14:46

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1


>>double fault: 0000 [1] SMP
>>last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
>>CPU 0
>>Modules linked in:
>>Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
>>RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
>>RSP: 0000:0000000000000000 EFLAGS: 00010082
>>RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
>>RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
>>RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
>>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
>>FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
>>CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
>>CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
>>Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
>>ffff8100db12c0d0)
>>Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
>> 0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
>> 0000000000000000 ffffffff80485520
>>Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
>> <ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
>> <ffffffff8020bba6>{do_double_fault+115}
>><ffffffff8020aa91>{double_fault+125}
>> <ffffffff8047c8b8>{__sched_text_start+1856} <EOE>
>>
>>Code: e8 4c ba d8 ff 65 48 8b 34 25 00 00 00 00 4c 8b 46 08 f0 41
>>RIP <ffffffff8047c8b8>{__sched_text_start+1856} RSP <0000000000000000>
>> -- 0:conmux-control -- time-stamp -- May/01/06 3:54:37 --
>
>
> I was not able to reproduce this on the 4-way EM64T machine. Am a bit stuck.

OK, is there anything we could run this with that'd dump more info?
(eg debug patches or something). There's bugger all of use that I
can see in that stack (and why does __sched_text_start come up anyway,
is that an x86_64-ism ?). I suppose if we're really desperate, we can
play chop search, but that's very boring to try to do remotely ...

It's a couple-of-years-old 4x Newisys box.

M.

2006-05-01 17:18:22

by Badari Pulavarty

Subject: Re: 2.6.17-rc2-mm1

On Mon, 2006-05-01 at 10:07 -0700, Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
> >
> > Andrew Morton wrote:
> > > (I did s/[email protected]/[email protected]/)
> > >
> > > Martin Bligh <[email protected]> wrote:
> > >
> > >>Still crashes in LTP on x86_64:
> > >>(introduced in previous release)
> > >>
> > >>http://test.kernel.org/abat/29674/debug/console.log
> > >
> > >
> > > What a mess. A doublefault inside an NMI watchdog timeout. I think. It's
> > > hard to see. Some CPUs are stuck on a CPU scheduler lock, others seem to
> > > be stuck in flush_tlb_others. One of these could be a consequence of the
> > > other, or both could be a consequence of something else.
> >
> > OK, well the latest one seems cleaner, on -rc3-mm1.
> > http://test.kernel.org/abat/30007/debug/console.log
> >
> > Just has the double fault, with no NMI watchdog timeouts. Not that
> > it means any more to me, but still ;-) mtest01 seems to be able to
> > reproduce this every time, but I don't have an appropriate box here
> > to diagnose it with (this was a 4x Opteron inside IBM), and it's
> > definitely something in -mm that's not in mainline.
> >
> > M.
> >
> > double fault: 0000 [1] SMP
> > last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
> > CPU 0
> > Modules linked in:
> > Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
> > RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
> > RSP: 0000:0000000000000000 EFLAGS: 00010082
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
> > RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
> > RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
> > FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
> > CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> > CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
> > Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
> > ffff8100db12c0d0)
> > Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
> > 0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
> > 0000000000000000 ffffffff80485520
> > Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
> > <ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
> > <ffffffff8020bba6>{do_double_fault+115}
> > <ffffffff8020aa91>{double_fault+125}
> > <ffffffff8047c8b8>{__sched_text_start+1856} <EOE>
> >
> > Code: e8 4c ba d8 ff 65 48 8b 34 25 00 00 00 00 4c 8b 46 08 f0 41
> > RIP <ffffffff8047c8b8>{__sched_text_start+1856} RSP <0000000000000000>
> > -- 0:conmux-control -- time-stamp -- May/01/06 3:54:37 --
>
> I was not able to reproduce this on the 4-way EM64T machine. Am a bit stuck.

I ran mtest01 multiple times with various options on my 4-way AMD64 box.
So far couldn't reproduce the problem (2.6.17-rc3-mm1).

Are there any special config or test options you are testing with ?

Thanks,
Badari

2006-05-01 17:26:47

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

> I ran mtest01 multiple times with various options on my 4-way AMD64 box.
> So far couldn't reproduce the problem (2.6.17-rc3-mm1).
>
> Are there any special config or test options you are testing with ?

Config is here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/amd64

It's just doing "runalltests", I think.

M.

2006-05-01 17:32:28

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

Andrew Morton wrote:
> "Martin J. Bligh" <[email protected]> wrote:
>
>>Andrew Morton wrote:
>>
>>>(I did s/[email protected]/[email protected]/)
>>>
>>>Martin Bligh <[email protected]> wrote:
>>>
>>>
>>>>Still crashes in LTP on x86_64:
>>>>(introduced in previous release)
>>>>
>>>>http://test.kernel.org/abat/29674/debug/console.log
>>>
>>>
>>>What a mess. A doublefault inside an NMI watchdog timeout. I think. It's
>>>hard to see. Some CPUs are stuck on a CPU scheduler lock, others seem to
>>>be stuck in flush_tlb_others. One of these could be a consequence of the
>>>other, or both could be a consequence of something else.
>>
>>OK, well the latest one seems cleaner, on -rc3-mm1.
>>http://test.kernel.org/abat/30007/debug/console.log
>>
>>Just has the double fault, with no NMI watchdog timeouts. Not that
>>it means any more to me, but still ;-) mtest01 seems to be able to
>>reproduce this every time, but I don't have an appropriate box here
>>to diagnose it with (this was a 4x Opteron inside IBM), and it's
>>definitely something in -mm that's not in mainline.

Andy, any chance you could do another run on elm3b6 of ltp with:
2.6.17-rc3 + http://test.kernel.org/patches/2.6.17-rc3-mm1-64

Which is:

x86_64-add-compat_sys_vmsplice-and-use-it-in.patch
i386-x86-64-fix-acpi-disabled-lapic-handling.patch
x86_64-mm-defconfig-update.patch
x86_64-mm-phys-apicid.patch
x86_64-mm-memset-always-inline.patch
x86_64-mm-amd-core-cpuid.patch
x86_64-mm-amd-cpuid4.patch
x86_64-mm-alternatives.patch
x86_64-mm-pci-dma-cleanup.patch
x86_64-mm-ia32-unistd-cleanup.patch
x86_64-mm-large-bzimage.patch
x86_64-mm-topology-comment.patch
x86_64-mm-agp-select.patch
x86_64-mm-iommu_gart_bitmap-search-to-cross-next_bit.patch
x86_64-mm-new-compat-ptrace.patch
x86_64-mm-disable-agp-resource-check.patch
x86_64-mm-avoid-irq0-ioapic-pin-collision.patch
x86_64-mm-gart-direct-call.patch
x86_64-mm-new-northbridge.patch
x86_64-mm-iommu-warning.patch
x86_64-mm-serialize-assign_irq_vector-use-of-static-variables.patch
x86_64-mm-simplify-ioapic_register_intr.patch
x86_64-mm-i386-apic-overwrite.patch
x86_64-mm-i386-up-generic-arch.patch
x86_64-mm-iommu-enodev.patch
x86_64-mm-fix-die_lock-nesting.patch
x86_64-mm-add-nmi_exit-to-die_nmi.patch
x86_64-mm-compat-printk.patch
x86_64-mm-hotadd-reserve-fix-fix-fix.patch
x86_64-mm-compat-printk-fix.patch
x86_64-mm-new-northbridge-fix.patch
x86-64-calgary-iommu-introduce-iommu_detected.patch
x86-64-calgary-iommu-calgary-specific-bits.patch
x86-64-calgary-iommu-hook-it-in.patch
x86-64-check-for-valid-dma-data-direction-in-the-dma-api.patch

Thanks,

M.

2006-05-01 17:54:07

by Badari Pulavarty

Subject: Re: 2.6.17-rc2-mm1

On Mon, 2006-05-01 at 10:26 -0700, Martin Bligh wrote:
> > I ran mtest01 multiple times with various options on my 4-way AMD64 box.
> > So far couldn't reproduce the problem (2.6.17-rc3-mm1).
> >
> > Are there any special config or test options you are testing with ?
>
> Config is here:
>
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/amd64
>
> It's just doing "runalltests", I think.

FWIW, I tried your config file on my 4-way AMD64 (melody) box
and ran latest "mtest01" fine.

I am now trying runalltests. I guess it's time to bisect :(

Thanks,
Badari

2006-05-01 17:58:00

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

Badari Pulavarty wrote:
> On Mon, 2006-05-01 at 10:26 -0700, Martin Bligh wrote:
>
>>>I ran mtest01 multiple times with various options on my 4-way AMD64 box.
>>>So far couldn't reproduce the problem (2.6.17-rc3-mm1).
>>>
>>>Are there any special config or test options you are testing with ?
>>
>>Config is here:
>>
>>http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/amd64
>>
>>It's just doing "runalltests", I think.
>
>
> FWIW, I tried your config file on my 4-way AMD64 (melody) box
> and ran latest "mtest01" fine.
>
> I am now trying runalltests. I guess it's time to bisect :(

There was a panic on PPC64 during LTP too, but it seems to have gone
away with rc3-mm1. Not sure if it was really fixed, or just intermittent.

http://test.kernel.org/abat/29675/debug/console.log

M.

2006-05-01 18:33:25

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Martin Bligh wrote:
> Badari Pulavarty wrote:
>
>> On Mon, 2006-05-01 at 10:26 -0700, Martin Bligh wrote:
>>
>>>> I ran mtest01 multiple times with various options on my 4-way AMD64
>>>> box.
>>>> So far couldn't reproduce the problem (2.6.17-rc3-mm1).
>>>>
>>>> Are there any special config or test options you are testing with ?
>>>
>>>
>>> Config is here:
>>>
>>> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/amd64
>>>
>>> It's just doing "runalltests", I think.
>>
>>
>>
>> FWIW, I tried your config file on my 4-way AMD64 (melody) box and ran
>> latest "mtest01" fine.
>>
>> I am now trying runalltests. I guess it's time to bisect :(
>
>
> There was a panic on PPC64 during LTP too, but it seems to have gone
> away with rc3-mm1. Not sure if it was really fixed, or just intermittent.
>
> http://test.kernel.org/abat/29675/debug/console.log

I think it's more intermittent than gone. I've got another machine which
runs the same tests, and she threw a very similar failure on 2.6.17-rc3-mm1.

-apw

2006-05-01 18:34:34

by Andi Kleen

Subject: Re: 2.6.17-rc2-mm1

On Monday 01 May 2006 16:24, Martin J. Bligh wrote:

> double fault: 0000 [1] SMP
> last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
> CPU 0
> Modules linked in:
> Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
> RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
> RSP: 0000:0000000000000000 EFLAGS: 00010082
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
> RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
> RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
> FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
> Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
> ffff8100db12c0d0)
> Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
> 0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
> 0000000000000000 ffffffff80485520
> Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
> <ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
> <ffffffff8020bba6>{do_double_fault+115}
> <ffffffff8020aa91>{double_fault+125}
> <ffffffff8047c8b8>{__sched_text_start+1856} <EOE>

That's really strange - I wonder why the backtracer can't find the original
stack. Should probably add some printk diagnosis here.

Can you send the output with this patch?

Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
HANDLE_STACK (stack < estack_end);
i += printk(" <EOE>");
stack = (unsigned long *) estack_end[-2];
+ printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
continue;
}
if (irqstack_end) {

2006-05-02 13:20:56

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Andi Kleen wrote:
> On Monday 01 May 2006 16:24, Martin J. Bligh wrote:
>
>
>>double fault: 0000 [1] SMP
>>last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
>>CPU 0
>>Modules linked in:
>>Pid: 20519, comm: mtest01 Not tainted 2.6.17-rc3-mm1-autokern1 #1
>>RIP: 0010:[<ffffffff8047c8b8>] <ffffffff8047c8b8>{__sched_text_start+1856}
>>RSP: 0000:0000000000000000 EFLAGS: 00010082
>>RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff805d9438
>>RDX: ffff8100db12c0d0 RSI: ffffffff805d9438 RDI: ffff8100db12c0d0
>>RBP: ffffffff805d9438 R08: 0000000000000000 R09: 0000000000000000
>>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>R13: ffff8100e39bd440 R14: ffff810008003620 R15: 000002b02751726c
>>FS: 0000000000000000(0000) GS:ffffffff805fa000(0063) knlGS:00000000f7dd0460
>>CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
>>CR2: fffffffffffffff8 CR3: 00000000da399000 CR4: 00000000000006e0
>>Process mtest01 (pid: 20519, threadinfo ffff8100b1bb4000, task
>>ffff8100db12c0d0)
>>Stack: ffffffff80579e20 ffff8100db12c0d0 0000000000000001 ffffffff80579f58
>> 0000000000000000 ffffffff80579e78 ffffffff8020b0b2 ffffffff80579f58
>> 0000000000000000 ffffffff80485520
>>Call Trace: <#DF> <ffffffff8020b0b2>{show_registers+140}
>> <ffffffff8020b357>{__die+159} <ffffffff8020b3cc>{die+50}
>> <ffffffff8020bba6>{do_double_fault+115}
>><ffffffff8020aa91>{double_fault+125}
>> <ffffffff8047c8b8>{__sched_text_start+1856} <EOE>
>
>
> That's really strange - I wonder why the backtracer can't find the original
> stack. Should probably add some printk diagnosis here.
>
> Can you send the output with this patch?

Submitted, they should show up in the matrix forthwith. Will drop you
the output when done.

-apw

2006-05-02 20:00:57

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

> That's really strange - I wonder why the backtracer can't find the original
> stack. Should probably add some printk diagnosis here.
>
> Can you send the output with this patch?
>
> Index: linux/arch/x86_64/kernel/traps.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/traps.c
> +++ linux/arch/x86_64/kernel/traps.c
> @@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
> HANDLE_STACK (stack < estack_end);
> i += printk(" <EOE>");
> stack = (unsigned long *) estack_end[-2];
> + printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
> continue;
> }
> if (irqstack_end) {

Thanks for running this Andy:

http://test.kernel.org/abat/30183/debug/console.log

2006-05-02 20:09:44

by Andi Kleen

Subject: Re: 2.6.17-rc2-mm1

On Tuesday 02 May 2006 22:00, Martin Bligh wrote:

> > Index: linux/arch/x86_64/kernel/traps.c
> > ===================================================================
> > --- linux.orig/arch/x86_64/kernel/traps.c
> > +++ linux/arch/x86_64/kernel/traps.c
> > @@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
> > HANDLE_STACK (stack < estack_end);
> > i += printk(" <EOE>");
> > stack = (unsigned long *) estack_end[-2];
> > + printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
> > continue;
> > }
> > if (irqstack_end) {
>
> Thanks for running this Andy:
>
> http://test.kernel.org/abat/30183/debug/console.log


<EOE>new stack 0 (0 0 0 10082 10)

Hmm weird. There isn't anything resembling an exception frame at the top of the
stack. No idea how this could happen.

-Andi



2006-05-02 20:21:49

by Martin Bligh

Subject: Re: 2.6.17-rc2-mm1

> Andy, any chance you could do another run on elm3b6 of ltp with:
> 2.6.17-rc3 + http://test.kernel.org/patches/2.6.17-rc3-mm1-64
>
> Which is:
>
> x86_64-add-compat_sys_vmsplice-and-use-it-in.patch
> i386-x86-64-fix-acpi-disabled-lapic-handling.patch
> x86_64-mm-defconfig-update.patch
> x86_64-mm-phys-apicid.patch
> x86_64-mm-memset-always-inline.patch
> x86_64-mm-amd-core-cpuid.patch
> x86_64-mm-amd-cpuid4.patch
> x86_64-mm-alternatives.patch
> x86_64-mm-pci-dma-cleanup.patch
> x86_64-mm-ia32-unistd-cleanup.patch
> x86_64-mm-large-bzimage.patch
> x86_64-mm-topology-comment.patch
> x86_64-mm-agp-select.patch
> x86_64-mm-iommu_gart_bitmap-search-to-cross-next_bit.patch
> x86_64-mm-new-compat-ptrace.patch
> x86_64-mm-disable-agp-resource-check.patch
> x86_64-mm-avoid-irq0-ioapic-pin-collision.patch
> x86_64-mm-gart-direct-call.patch
> x86_64-mm-new-northbridge.patch
> x86_64-mm-iommu-warning.patch
> x86_64-mm-serialize-assign_irq_vector-use-of-static-variables.patch
> x86_64-mm-simplify-ioapic_register_intr.patch
> x86_64-mm-i386-apic-overwrite.patch
> x86_64-mm-i386-up-generic-arch.patch
> x86_64-mm-iommu-enodev.patch
> x86_64-mm-fix-die_lock-nesting.patch
> x86_64-mm-add-nmi_exit-to-die_nmi.patch
> x86_64-mm-compat-printk.patch
> x86_64-mm-hotadd-reserve-fix-fix-fix.patch
> x86_64-mm-compat-printk-fix.patch
> x86_64-mm-new-northbridge-fix.patch
> x86-64-calgary-iommu-introduce-iommu_detected.patch
> x86-64-calgary-iommu-calgary-specific-bits.patch
> x86-64-calgary-iommu-hook-it-in.patch
> x86-64-check-for-valid-dma-data-direction-in-the-dma-api.patch

It worked fine with this lot. Hmmm. I guess that makes sense, if it's
the same issue that intermittently occurs on ppc64 (and perhaps it's
easier to diagnose there, since we get better stack tracing).

M.

2006-05-03 06:46:17

by Jan Beulich

Subject: Re: 2.6.17-rc2-mm1

>>> Andi Kleen <[email protected]> 02.05.06 22:09 >>>
>On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
>
>> > Index: linux/arch/x86_64/kernel/traps.c
>> > ===================================================================
>> > --- linux.orig/arch/x86_64/kernel/traps.c
>> > +++ linux/arch/x86_64/kernel/traps.c
>> > @@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
>> > HANDLE_STACK (stack < estack_end);
>> > i += printk(" <EOE>");
>> > stack = (unsigned long *) estack_end[-2];
>> > + printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
>> > continue;
>> > }
>> > if (irqstack_end) {
>>
>> Thanks for running this Andy:
>>
>> http://test.kernel.org/abat/30183/debug/console.log
>
>
><EOE>new stack 0 (0 0 0 10082 10)

Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...

>Hmm weird. There isn't anything resembling an exception frame at the top of the
>stack. No idea how this could happen.

... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
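
For reference, the layout I'm reading off the top of the exception (IST) stack is the hardware-pushed frame;
written out here as a struct purely for illustration (this is not kernel code), with the slots given relative
to estack_end:

struct hw_exception_frame {
	unsigned long rip;	/* estack_end[-5] */
	unsigned long cs;	/* estack_end[-4] */
	unsigned long rflags;	/* estack_end[-3] */
	unsigned long rsp;	/* estack_end[-2] - what show_trace() switches to */
	unsigned long ss;	/* estack_end[-1] */
};

So "0 (0 0 0 10082 10)" decodes as: estack_end[0] is rubbish beyond the stack, then SS=0, RSP=0,
RFLAGS=0x10082, CS=0x10 - and stack = estack_end[-2] picks up the (corrupted) saved RSP.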

This actually points out another weakness of that code: if you pick up a mis-aligned stack pointer then the conditions
in both the exception and interrupt stack invocations of HANDLE_STACK() won't prevent you from accessing an item
crossing a page boundary, and hence potentially faulting. Similarly, obtaining an entirely bad stack pointer anywhere in
that code will result in a fault. I guess the stack reads should really be done using get_user() or some other code
having recovery attached.
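
Something along these lines is what I mean - only a sketch, and the helper name is made up:

#include <asm/uaccess.h>

/*
 * Sketch: read one word of a possibly-bogus stack with recovery attached.
 * __get_user() has an exception-table entry, so a bad pointer comes back
 * as -EFAULT instead of faulting inside the backtracer itself.
 * Returns 0 on success.
 */
static int read_stack_word(const unsigned long *addr, unsigned long *val)
{
	return __get_user(*val, addr);
}

The dumper would then stop (or skip the word) on a non-zero return rather than oopsing.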

Jan

2006-05-03 06:49:55

by Andi Kleen

Subject: Re: 2.6.17-rc2-mm1

On Wednesday 03 May 2006 08:47, Jan Beulich wrote:
> >>> Andi Kleen <[email protected]> 02.05.06 22:09 >>>
> >On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
> >
> >> > Index: linux/arch/x86_64/kernel/traps.c
> >> > ===================================================================
> >> > --- linux.orig/arch/x86_64/kernel/traps.c
> >> > +++ linux/arch/x86_64/kernel/traps.c
> >> > @@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
> >> > HANDLE_STACK (stack < estack_end);
> >> > i += printk(" <EOE>");
> >> > stack = (unsigned long *) estack_end[-2];
> >> > + printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
> >> > continue;
> >> > }
> >> > if (irqstack_end) {
> >>
> >> Thanks for running this Andy:
> >>
> >> http://test.kernel.org/abat/30183/debug/console.log
> >
> >
> ><EOE>new stack 0 (0 0 0 10082 10)
>
> Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...

Hmm, right.

> >Hmm weird. There isn't anything resembling an exception frame at the top of the
> >stack. No idea how this could happen.
>
> ... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
> (or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.

Any can you add that please and check?

Also worst case one could dump last branch pointers. AMD unfortunately only has four,
on Intel with 16 it's easier.

I can provide a patch for that if needed.
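
Roughly like this (untested sketch; the MSR numbers are from memory of the K8 docs, so double-check them -
0x1d9 is DebugCtl with bit 0 the LBR enable, 0x1db-0x1de the last branch / last exception records):

#include <linux/kernel.h>
#include <asm/msr.h>

/* Enable recording, e.g. early during boot or on exception entry. */
static void enable_lbr(void)
{
	wrmsrl(0x1d9, 1UL);		/* DebugCtl.LBR = 1 */
}

/* Dump the four records AMD gives us. */
static void dump_lbr(void)
{
	unsigned long bfrom, bto, efrom, eto;

	rdmsrl(0x1db, bfrom);		/* LastBranchFromIP */
	rdmsrl(0x1dc, bto);		/* LastBranchToIP */
	rdmsrl(0x1dd, efrom);		/* LastExceptionFromIP */
	rdmsrl(0x1de, eto);		/* LastExceptionToIP */
	printk(KERN_ERR "LBR: branch %016lx -> %016lx, exception %016lx -> %016lx\n",
	       bfrom, bto, efrom, eto);
}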

> This actually points out another weakness of that code: if you pick up a mis-aligned stack pointer then the conditions
> in both the exception and interrupt stack invocations of HANDLE_STACK() won't prevent you from accessing an item
> crossing a page boundary, and hence potentially faulting.

Yes it probably should check for that.

> Similarly, obtaining an entirely bad stack pointer anywhere in
> that code will result in a fault. I guess the stack reads should really be done using get_user() or some other code
> having recovery attached.

That can cause recursive exceptions. I'm a bit paranoid with that.

-Andi

2006-05-03 07:07:36

by Jan Beulich

Subject: Re: 2.6.17-rc2-mm1

>> ><EOE>new stack 0 (0 0 0 10082 10)
>>
>> Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>
>Hmm, right.
>
>> >Hmm weird. There isn't anything resembling an exception frame at the top of the
>> >stack. No idea how this could happen.
>>
>> ... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>> (or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>
>Any can you add that please and check?

???

>Also worst case one could dump last branch pointers. AMD unfortunately only has four,
>on Intel with 16 it's easier.

Provided you disable recording early enough. Otherwise only one (last exception from/to) is going to be useful on
both.

>I can provide a patch for that if needed.
>
>> This actually points out another weakness of that code: if you pick up a mis-aligned stack pointer then the
conditions
>> in both the exception and interrupt stack invocations of HANDLE_STACK() won't prevent you from accessing an item
>> crossing a page boundary, and hence potentially faulting.
>
>Yes it probably should check for that.
>
>> Similarly, obtaining an entirely bad stack pointer anywhere in
>> that code will result in a fault. I guess the stack reads should really be done using get_user() or some other code
>> having recovery attached.
>
>That can cause recursive exceptions. I'm a bit paranoid with that.

Without doing so it can also cause recursive exceptions, just that this is going to be deadly then.

Jan

2006-05-03 07:40:29

by Andi Kleen

Subject: Re: 2.6.17-rc2-mm1

On Wednesday 03 May 2006 09:08, Jan Beulich wrote:
> >> ><EOE>new stack 0 (0 0 0 10082 10)
> >>
> >> Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
> >
> >Hmm, right.
> >
> >> >Hmm weird. There isn't anything resembling an exception frame at the top of the
> >> >stack. No idea how this could happen.
> >>
> >> ... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
> >> (or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
> >
> >Any can you add that please and check?
> ???

Sorry I meant to write Andy but left out the d :-( - he did the testing
on the machine that showed the problem.

> >Also worst case one could dump last branch pointers. AMD unfortunately only has four,
> >on Intel with 16 it's easier.
>
> Provided you disable recording early enough. Otherwise only one (last exception from/to) is going to be useful on
> both.

I usually just saved them as the first thing in the exception entry point.

> >That can cause recursive exceptions. I'm a bit paranoid with that.
>
> Without doing so it can also cause recursive exceptions, just that this is going to be deadly then.

Hmm point.

-Andi


2006-05-03 08:13:03

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Andi Kleen wrote:
> On Wednesday 03 May 2006 09:08, Jan Beulich wrote:
>
>>>>><EOE>new stack 0 (0 0 0 10082 10)
>>>>
>>>>Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>>>
>>>Hmm, right.
>>>
>>>
>>>>>Hmm weird. There isn't anything resembling an exception frame at the top of the
>>>>>stack. No idea how this could happen.
>>>>
>>>>... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>>>>(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>>>
>>>Any can you add that please and check?
>>
>>???
>
>
> Sorry I meant to write Andy but left out the d :-( - he did the testing
> on the machine that showed the problem.

Heh, happy to do the testing. Just to make sure I am doing the right
thing, you want an entire stack frame dropped out in the case that
SS/RSP are 0; so we get the RIP.

-apw

2006-05-03 08:24:36

by Jan Beulich

Subject: Re: 2.6.17-rc2-mm1

>Heh, happy to do the testing. Just to make sure I am doing the right
>thing, you want an entire stack frame dropped out in the case that
>SS/RSP are 0; so we get the RIP.

Just decrement the indices in the previous printk() by one each, i.e. ranging from -1 to -5.
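
I.e. the line becomes:

	printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack,
	       estack_end[-1], estack_end[-2], estack_end[-3],
	       estack_end[-4], estack_end[-5]);

so the last value printed should be the RIP.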

Jan

2006-05-03 19:27:30

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Andi Kleen wrote:
> On Wednesday 03 May 2006 08:47, Jan Beulich wrote:
>
>>>>>Andi Kleen <[email protected]> 02.05.06 22:09 >>>
>>>
>>>On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
>>>
>>>
>>>>>Index: linux/arch/x86_64/kernel/traps.c
>>>>>===================================================================
>>>>>--- linux.orig/arch/x86_64/kernel/traps.c
>>>>>+++ linux/arch/x86_64/kernel/traps.c
>>>>>@@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
>>>>> HANDLE_STACK (stack < estack_end);
>>>>> i += printk(" <EOE>");
>>>>> stack = (unsigned long *) estack_end[-2];
>>>>>+ printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
>>>>> continue;
>>>>> }
>>>>> if (irqstack_end) {
>>>>
>>>>Thanks for running this Andy:
>>>>
>>>>http://test.kernel.org/abat/30183/debug/console.log
>>>
>>>
>>><EOE>new stack 0 (0 0 0 10082 10)
>>
>>Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>
>
> Hmm, right.
>
>
>>>Hmm weird. There isn't anything resembling an exception frame at the top of the
>>>stack. No idea how this could happen.
>>
>>... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>>(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>
>
> Any can you add that please and check?

Ok. Just got some results (in full at the end of the message). Seems
that this is indeed a stack frame:

new stack 0 (0 0 10046 10 ffffffff8047c8e8)

And if my reading of the System.map is right, this is _just_ in schedule.

ffffffff8047c17e T sha_init
ffffffff8047c1a8 T __sched_text_start
ffffffff8047c1a8 T schedule
ffffffff8047c8ed T thread_return
ffffffff8047c9be T wait_for_completion
ffffffff8047caa8 T wait_for_completion_timeout

By the looks of it that would make it here, at the call __switch_to?
Which of course makes loads of sense _if_ the loaded stack pointer was
crap, say 0.
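
The offsets seem to back that up (a quick throwaway user-space check, addresses
copied from the System.map above; the 5-byte figure is just the length of a near
call, so treat the conclusion as my reading rather than gospel):

#include <stdio.h>

int main(void)
{
	unsigned long sched_text_start = 0xffffffff8047c1a8UL;
	unsigned long rip              = 0xffffffff8047c8e8UL;	/* faulting RIP */
	unsigned long thread_return    = 0xffffffff8047c8edUL;

	/* 0x740 == 1856, matching "__sched_text_start+1856" in the oops */
	printf("RIP - __sched_text_start = %lu\n", rip - sched_text_start);

	/* 5 == one "call rel32", i.e. RIP sits on the call __switch_to
	 * whose return address would be the thread_return label */
	printf("thread_return - RIP      = %lu\n", thread_return - rip);
	return 0;
}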

#define switch_to(prev,next,last) \
	asm volatile(SAVE_CONTEXT \
		     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
		     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
		     "call __switch_to\n\t" \
		     ".globl thread_return\n" \
		     "thread_return:\n\t"

I'll go shove some debug in there and see what pops out.
-apw

double fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:06.0/resource
CPU 0
Modules linked in:
Pid: 228, comm: kswapd0 Tainted: G M 2.6.17-rc3-mm1-autokern1 #1
RIP: 0010:[<ffffffff8047c8e8>] <ffffffff8047c8e8>{__sched_text_start+1856}
RSP: 0000:0000000000000000 EFLAGS: 00010046
RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff805d9438
RDX: ffff8100e3d4a090 RSI: ffffffff805d9438 RDI: ffff8100e3d4a090
RBP: ffffffff805d9438 R08: 0000000000000001 R09: ffff8101001c9da8
R10: 0000000000000002 R11: 000000000000004d R12: ffffffff805013c0
R13: ffff8100013dc8c0 R14: ffff810008003620 R15: 000002a75ef255cc
FS: 0000000000000000(0000) GS:ffffffff805fa000(0000) knlGS:00000000f7e0b460
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffffffffffff8 CR3: 000000006b004000 CR4: 00000000000006e0
Process kswapd0 (pid: 228, threadinfo ffff8101001c8000, task
ffff8100e3d4a090)
Stack: ffffffff80579e20 ffff8100e3d4a090 0000000000000001 ffffffff80579f58
0000000000000000 ffffffff80579e78 ffffffff8020b0e3 ffffffff80579f58
0000000000000000 ffffffff80485520
Call Trace: <#DF> <ffffffff8020b0e3>{show_registers+140}
<ffffffff8020b388>{__die+159} <ffffffff8020b3fd>{die+50}
<ffffffff8020bbd9>{do_double_fault+115}
<ffffffff8020aa91>{double_fault+125}
<ffffffff8047c8e8>{__sched_text_start+1856} <EOE>new stack 0 (0 0
10046 10 ffffffff8047c8e8)


Code: e8 1c ba d8 ff 65 48 8b 34 25 00 00 00 00 4c 8b 46 08 f0 41
RIP <ffffffff8047c8e8>{__sched_text_start+1856} RSP <0000000000000000>

2006-05-04 07:40:35

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Andy Whitcroft wrote:
> Andi Kleen wrote:
>
>>On Wednesday 03 May 2006 08:47, Jan Beulich wrote:
>>
>>
>>>>>>Andi Kleen <[email protected]> 02.05.06 22:09 >>>
>>>>
>>>>On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
>>>>
>>>>
>>>>
>>>>>>Index: linux/arch/x86_64/kernel/traps.c
>>>>>>===================================================================
>>>>>>--- linux.orig/arch/x86_64/kernel/traps.c
>>>>>>+++ linux/arch/x86_64/kernel/traps.c
>>>>>>@@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
>>>>>> HANDLE_STACK (stack < estack_end);
>>>>>> i += printk(" <EOE>");
>>>>>> stack = (unsigned long *) estack_end[-2];
>>>>>>+ printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
>>>>>> continue;
>>>>>> }
>>>>>> if (irqstack_end) {
>>>>>
>>>>>Thanks for running this Andy:
>>>>>
>>>>>http://test.kernel.org/abat/30183/debug/console.log
>>>>
>>>>
>>>><EOE>new stack 0 (0 0 0 10082 10)
>>>
>>>Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>>
>>
>>Hmm, right.
>>
>>
>>
>>>>Hmm weird. There isn't anything resembling an exception frame at the top of the
>>>>stack. No idea how this could happen.
>>>
>>>... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>>>(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>>
>>
>>Any can you add that please and check?
>
>
> Ok. Just got some results (in full at the end of the message). Seems
> that this is indeed a stack frame:
>
> new stack 0 (0 0 10046 10 ffffffff8047c8e8)
>
> And if my reading of the System.map is right, this is _just_ in schedule.
>
> ffffffff8047c17e T sha_init
> ffffffff8047c1a8 T __sched_text_start
> ffffffff8047c1a8 T schedule
> ffffffff8047c8ed T thread_return
> ffffffff8047c9be T wait_for_completion
> ffffffff8047caa8 T wait_for_completion_timeout
>
> By the looks of it that would make it here, at the call __switch_to?
> Which of course makes loads of sense _if_ the loaded stack pointer was
> crap say 0.

A quick patch to check for a 0 rsp triggers, so I'll try and find out
what the new process is.
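
The check itself is nothing clever - roughly this sort of thing, shown only as a
sketch (on x86_64 the saved kernel stack pointer lives in task->thread.rsp):

#include <linux/kernel.h>
#include <linux/sched.h>

/* Complain before we switch to a task whose saved rsp is 0, and dump
 * enough of the task_struct to identify what it is. */
static inline void check_next_rsp(struct task_struct *next)
{
	if (unlikely(next->thread.rsp == 0))
		printk(KERN_ERR "schedule: next %s/%d has rsp==0 (flags %lx state %ld)\n",
		       next->comm, next->pid, next->flags, next->state);
}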

-apw

2006-05-04 16:29:26

by Andy Whitcroft

Subject: Re: 2.6.17-rc2-mm1

Andy Whitcroft wrote:
> Andi Kleen wrote:
>
>>On Wednesday 03 May 2006 08:47, Jan Beulich wrote:
>>
>>
>>>>>>Andi Kleen <[email protected]> 02.05.06 22:09 >>>
>>>>
>>>>On Tuesday 02 May 2006 22:00, Martin Bligh wrote:
>>>>
>>>>
>>>>
>>>>>>Index: linux/arch/x86_64/kernel/traps.c
>>>>>>===================================================================
>>>>>>--- linux.orig/arch/x86_64/kernel/traps.c
>>>>>>+++ linux/arch/x86_64/kernel/traps.c
>>>>>>@@ -238,6 +238,7 @@ void show_trace(unsigned long *stack)
>>>>>> HANDLE_STACK (stack < estack_end);
>>>>>> i += printk(" <EOE>");
>>>>>> stack = (unsigned long *) estack_end[-2];
>>>>>>+ printk("new stack %lx (%lx %lx %lx %lx %lx)\n", stack, estack_end[0], estack_end[-1], estack_end[-2], estack_end[-3], estack_end[-4]);
>>>>>> continue;
>>>>>> }
>>>>>> if (irqstack_end) {
>>>>>
>>>>>Thanks for running this Andy:
>>>>>
>>>>>http://test.kernel.org/abat/30183/debug/console.log
>>>>
>>>>
>>>><EOE>new stack 0 (0 0 0 10082 10)
>>>
>>>Looks like <rubbish> <SS> <RSP> <RFLAGS> <CS> to me, ...
>>
>>
>>Hmm, right.
>>
>>
>>
>>>>Hmm weird. There isn't anything resembling an exception frame at the top of the
>>>>stack. No idea how this could happen.
>>>
>>>... which is a valid frame where the stack pointer was corrupted before the exception occurred. One more printed item
>>>(or rather, starting items at estack_end[-1]) would allow at least seeing what RIP this came from.
>>
>>
>>Any can you add that please and check?
>
>
> Ok. Just got some results (in full at the end of the message). Seems
> that this is indeed a stack frame:
>
> new stack 0 (0 0 10046 10 ffffffff8047c8e8)
>
> And if my reading of the System.map is right, this is _just_ in schedule.
>
> ffffffff8047c17e T sha_init
> ffffffff8047c1a8 T __sched_text_start
> ffffffff8047c1a8 T schedule
> ffffffff8047c8ed T thread_return
> ffffffff8047c9be T wait_for_completion
> ffffffff8047caa8 T wait_for_completion_timeout
>
> By the looks of it that would make it here, at the call __switch_to?
> Which of course makes loads of sense _if_ the loaded stack pointer was
> crap say 0.
>
> #define switch_to(prev,next,last) \
> 	asm volatile(SAVE_CONTEXT \
> 		     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
> 		     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */ \
> 		     "call __switch_to\n\t" \
> 		     ".globl thread_return\n" \
> 		     "thread_return:\n\t"
>
> I'll go shove some debug in there and see what pops out.


Ok. I've been playing with this some. Basically, when we pick up the
new process to schedule, it has a 0 rsp. Dumping out the comm and flags
reveals 0s throughout. I tried another run poisoning the flags field
when freeing a task, but the flags remain 0.
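
(The poisoning was roughly along these lines - sketch only; the value and exact
placement are illustrative:)

/* Scribble a poison value into tsk->flags as the task is torn down
 * (e.g. from free_task() in kernel/fork.c), so a stale task_struct
 * picked up later would show the poison rather than 0. */
static inline void poison_task_flags(struct task_struct *tsk)
{
	tsk->flags = 0xdeadbeef;
}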

Anyone got any good ideas for patches to blame?

-apw