2002-11-15 05:51:03

by Justin A

Subject: 2.5.47-ac4 panic on boot.

My thinkpad 760e starts to boot but panics while running depmod -a:
(unfortunately the top is cut off... it doesn't fit on the screen :))

...
CPU: 0
EIP: 0060:[<c012b914>] Not tainted
EFLAGS: 00010006
EIP is at reap_timer_fnc+0x104/0x40c
eax: 00000002 ebx: c47ffab4 ecx: c47fe8a0 edx: 00000003
esi: 00000002 edi: c4742414 ebp: c47ffa98 esp: c031df8c
ds: 0068 es: 0068 ss: 0068
Process swapper (pid: 0, threadinfo=c031c000 task=c02da3e0)
Stack: same as call trace...

Call trace:
[<c012b810>] reap_timer_fnc+0x0/0x40c
[<c011ae47>] run_timer_tasklet+0xb7/0xe8
[<c0117f39>] tasklet_hi_action+0x3d/0x60
[<c0117d5a>] do_softirq+0x5a/0xac
[<c010a268>] do_IRQ+0xc8/0xd4
[<c0105000>] stext+0x0/0x1c
[<c0108b03>] common_interrupt+0x43/0x60

Code: 0f 0b fb 07 a8 fe 21 c0 8d 74 26 00 8b 41 04 8b 11 89 42 04

I tried it a few times... the last few lines change, but it's always in
reap_timer_fnc.

I haven't tried any other 2.5 kernels, but 2.4.18/19 work.
--
-Justin


2002-11-15 06:26:18

by Andrew Morton

Subject: Re: 2.5.47-ac4 panic on boot.

Justin A wrote:
>
> My thinkpad 760e starts to boot but panics while running depmod -a:
> (unfortunately the top is cut off...it doesn't fit:))
>
> ...
> CPU: 0
> EIP: 0060:[<c012b914>] Not tainted
> EFLAGS: 00010006
> EIP is at reap_timer_fnc+0x104/0x40c
> eax: 00000002 ebx: c47ffab4 ecx: c47fe8a0 edx: 00000003
> esi: 00000002 edi: c4742414 ebp: c47ffa98 esp: c031df8c
> ds: 0068 es: 0068 ss: 0068
> Process swapper (pid: 0, threadinfo=c031c000 task=c02da3e0)
> Stack: same as call trace...
>
> Call trace:
> [<c012b810>] reap_timer_fnc+0x0/0x40c
> [<c011ae47>] run_timer_tasklet+0xb7/0xe8
> [<c0117f39>] tasklet_hi_action+0x3d/0x60
> [<c0117d5a>] do_softirq+0x5a/0xac
> [<c010a268>] do_IRQ+0xc8/0xd4
> [<c0105000>] stext+0x0/0x1c
> [<c0108b03>] common_interrupt+0x43/x060
>
> Code: 0f 0b fb 07 a8 fe 21 c0 8d 74 26 00 8b 41 04 8b 11 89 42 04
>
> I tried it a few times...the last few change, but its always in
> reap_timer_fnc.
>

This is probably a dodgy device driver doing something bad with
kmalloced memory.

If you have time, please go through and disable various drivers
in the config, and see if you can isolate it to a particular one.

Thanks.

2002-11-15 07:45:38

by Justin A

Subject: Re: 2.5.47-ac4 panic on boot.

I did that...
Disabling PM made it boot:

< CONFIG_PM=y
< CONFIG_APM=m
< # CONFIG_APM_IGNORE_USER_SUSPEND is not set
< CONFIG_APM_DO_ENABLE=y
< CONFIG_APM_CPU_IDLE=y
< CONFIG_APM_DISPLAY_BLANK=y
< # CONFIG_APM_RTC_IS_GMT is not set
< # CONFIG_APM_ALLOW_INTS is not set
< # CONFIG_APM_REAL_MODE_POWER_OFF is not set
---
> # CONFIG_PM is not set


I think I still had swsusp on before I disabled PM... I will have to test more
tomorrow to make sure that's it...
Perhaps it's the CONFIG_APM_CPU_IDLE that did it?

I had tried linux init=/bin/sh, which got to the shell, then 2 seconds later
panicked, so I have a feeling it's the idle thingy :)

On another note, pcibios_read_config_dword seems to be missing, and
pcmcia-core wants it. I'll have to see what's up with that tomorrow... but at
least I got it booting now :)

--
-Justin


On Friday 15 November 2002 01:33 am, Andrew Morton wrote:
> Justin A wrote:
> > My thinkpad 760e starts to boot but panics while running depmod -a:
> > (unfortunately the top is cut off...it doesn't fit:))
> >
> > ...
> > CPU: 0
> > EIP: 0060:[<c012b914>] Not tainted
> > EFLAGS: 00010006
> > EIP is at reap_timer_fnc+0x104/0x40c
> > eax: 00000002 ebx: c47ffab4 ecx: c47fe8a0 edx: 00000003
> > esi: 00000002 edi: c4742414 ebp: c47ffa98 esp: c031df8c
> > ds: 0068 es: 0068 ss: 0068
> > Process swapper (pid: 0, threadinfo=c031c000 task=c02da3e0)
> > Stack: same as call trace...
> >
> > Call trace:
> > [<c012b810>] reap_timer_fnc+0x0/0x40c
> > [<c011ae47>] run_timer_tasklet+0xb7/0xe8
> > [<c0117f39>] tasklet_hi_action+0x3d/0x60
> > [<c0117d5a>] do_softirq+0x5a/0xac
> > [<c010a268>] do_IRQ+0xc8/0xd4
> > [<c0105000>] stext+0x0/0x1c
> > [<c0108b03>] common_interrupt+0x43/x060
> >
> > Code: 0f 0b fb 07 a8 fe 21 c0 8d 74 26 00 8b 41 04 8b 11 89 42 04
> >
> > I tried it a few times...the last few change, but its always in
> > reap_timer_fnc.
>
> This is probably a dodgy device driver doing something bad with
> kmalloced memory.
>
> If you have time, please go through and disable various drivers
> in config, see if you can isolate it to a particular one.
>
> Thanks.

2002-11-15 08:21:29

by Andrew Morton

Subject: Re: 2.5.47-ac4 panic on boot.

Justin A wrote:
>
> I did that...
> Disabling PM made it boot:
>
> < CONFIG_PM=y
> < CONFIG_APM=m
> < # CONFIG_APM_IGNORE_USER_SUSPEND is not set
> < CONFIG_APM_DO_ENABLE=y
> < CONFIG_APM_CPU_IDLE=y
> < CONFIG_APM_DISPLAY_BLANK=y
> < # CONFIG_APM_RTC_IS_GMT is not set
> < # CONFIG_APM_ALLOW_INTS is not set
> < # CONFIG_APM_REAL_MODE_POWER_OFF is not set
> ---
> > # CONFIG_PM is not set
>
> I think I still had swsusp on before I disabled PM...I will have to test more
> tomorrow to make sure thats it...
> Perhaps its the CONFIG_APM_CPU_IDLE that did it?

Could be...

> I had tried linux init=/bin/sh, which got to the shell, then 2 seconds later
> paniced, so I have a feeling its the idle thingy:)

slab runs a timer every couple of seconds to drain caches which could
otherwise be wasted-for-ever memory. So it looks like we don't get
to find out about abuse until much later. Damn.

Oh, it'd be interesting to enable memory debugging under the kernel-hacking
menu - that may trap the bug when it's happening.

2002-11-15 21:56:41

by Samuli Suonpaa

Subject: Re: 2.5.47-ac4 panic on boot.

Justin A writes:
> On another note, pcibios_read_config_dword seems to be missing, and
> pcmcia-core wants it. I'll have to see whats up with that
> tomorrow...but at least I got it booting now:)

Osamu Tomita sent the following fix a few days ago, now rediffed against
2.5.47-ac4. With this, I was able to get pcmcia_core loading. On the
other hand, I seem to be able to crash my Dell Latitude by removing
and then re-inserting a 3c59x CardBus adapter, and also by trying to reboot
or halt, so I really cannot tell whether this patch _really_ works.
But it definitely did something!-)

Suonpää...

--- linux-2.5.47-ac4/drivers/pcmcia/cistpl.c.orig 2002-11-15 23:29:24.000000000 +0200
+++ linux-2.5.47-ac4/drivers/pcmcia/cistpl.c 2002-11-15 23:31:35.000000000 +0200
@@ -430,7 +430,7 @@
#ifdef CONFIG_CARDBUS
if (s->state & SOCKET_CARDBUS) {
u_int ptr;
- pcibios_read_config_dword(s->cap.cb_dev->subordinate->number, 0, 0x28, &ptr);
+ pci_bus_read_config_dword(s->cap.cb_dev->subordinate, 0, 0x28, &ptr);
tuple->CISOffset = ptr & ~7;
SPACE(tuple->Flags) = (ptr & 7);
} else

2002-11-15 22:03:29

by Justin A

Subject: Re: 2.5.47-ac4 panic on boot.

On Friday 15 November 2002 03:28 am, you wrote:
> Justin A wrote:
> > I did that...
> > Disabling PM made it boot:
> >
> > < CONFIG_PM=y
> > < CONFIG_APM=m
> > < # CONFIG_APM_IGNORE_USER_SUSPEND is not set
> > < CONFIG_APM_DO_ENABLE=y
> > < CONFIG_APM_CPU_IDLE=y
> > < CONFIG_APM_DISPLAY_BLANK=y
> > < # CONFIG_APM_RTC_IS_GMT is not set
> > < # CONFIG_APM_ALLOW_INTS is not set
> > < # CONFIG_APM_REAL_MODE_POWER_OFF is not set
> > ---
> >
> > > # CONFIG_PM is not set
> >
> > I think I still had swsusp on before I disabled PM...I will have to test
> > more tomorrow to make sure thats it...
> > Perhaps its the CONFIG_APM_CPU_IDLE that did it?
>
> Could be...
>
> > I had tried linux init=/bin/sh, which got to the shell, then 2 seconds
> > later paniced, so I have a feeling its the idle thingy:)
>
> slab runs a timer every couple of seconds to drain caches which could
> otherwise be wasted-for-ever memory. So it looks like we don't get
> to find out about abuse until much later. Damn.
>
> Oh, it'd be interesting to enable memory debugging under the kernel-hacking
> menu - that may trap the bug when it's happening.

I tried 2.5.47-ac5...
I got the configs confused, so I started from an old one and built up until
it crashed:

(%:/data/src/tp/2.5/linux-2.5.47-ac5)- diff .config ../CONFIG_works
159,167c159
< CONFIG_PNP=y
< # CONFIG_PNP_NAMES is not set
< # CONFIG_PNP_DEBUG is not set
<
< #
< # Protocols
< #
< CONFIG_ISAPNP=y
< CONFIG_PNPBIOS=y
---
> # CONFIG_PNP is not set
213d204
< # CONFIG_BLK_DEV_ISAPNP is not set
362d352
< # CONFIG_NET_SB1000 is not set

That .config crashes: it oopses as soon as pnp starts, then 2 seconds later the
previous oops comes up and it panics.

The only thing I managed to read was that the EIP was in kfree, and most of
the call trace entries had pnp_ in the name.

I can't get the rest until I figure out how to get some kind of serial console
working...I wonder if I can get it working over infrared to my pda :)

Attached is the full .config.
--
-Justin


Attachments:
.config (17.10 kB)

2002-11-15 22:09:42

by Andrew Morton

Subject: Re: 2.5.47-ac4 panic on boot.

Justin A wrote:
>
> .config crashes, it oopses as soon as pnp starts, then 2 seconds later the
> previous oops comes up and it panics.

Irritating when it does that. Here's a little patch which should
stop the machine dead after the first oops and prevent stuff from
scrolling off the screen.

This, with the missing touch_nmi_watchdog(), would be a handy
kernel boot option, perhaps.


--- 25/arch/i386/kernel/traps.c~noscroll Tue Nov 12 13:13:24 2002
+++ 25-akpm/arch/i386/kernel/traps.c Tue Nov 12 13:14:16 2002
@@ -84,7 +84,7 @@ asmlinkage void alignment_check(void);
asmlinkage void spurious_interrupt_bug(void);
asmlinkage void machine_check(void);

-static int kstack_depth_to_print = 24;
+static int kstack_depth_to_print = 10;


/*
@@ -246,6 +246,9 @@ bad:
printk("%02x ", c);
}
}
+ local_irq_disable();
+ for ( ; ; )
+ ;
printk("\n");
}


_

2002-11-15 22:29:22

by Justin A

Subject: Re: 2.5.47-ac4 panic on boot.

Thanks! That lets pcmcia_core load at least... It still doesn't work, though :(
I had the same problem with 2.4.19; I need to use pcmcia-cs for it to
work. pcmcia-cs is 3.2.1 while the kernel's version is 3.1.22, and I haven't
figured out how to get pcmcia-cs to work with 2.5 yet :)

On Friday 15 November 2002 05:02 pm, Samuli Suonpaa wrote:
> Justin A writes:
> > On another note, pcibios_read_config_dword seems to be missing, and
> > pcmcia-core wants it. I'll have to see whats up with that
> > tomorrow...but at least I got it booting now:)
>
> Osamu Tomita sent following fix a few days ago. No rediffed against
> 2.5.47-ac4. With this, I was able to get pcmcia_core loading. On the
> other hand, I seem to be able to crash my Dell Latitude by removing
> and then re-inserting 3c59x -cardbus-adapter. And by trying to reboot
> or halt, so I really cannot tell whether this patch _really_ works.
> But it definitely did something!-)
>
> Suonpää...
>
> --- linux-2.5.47-ac4/drivers/pcmcia/cistpl.c.orig 2002-11-15 23:29:24.000000000 +0200
> +++ linux-2.5.47-ac4/drivers/pcmcia/cistpl.c 2002-11-15 23:31:35.000000000 +0200
> @@ -430,7 +430,7 @@
> #ifdef CONFIG_CARDBUS
> if (s->state & SOCKET_CARDBUS) {
> u_int ptr;
> - pcibios_read_config_dword(s->cap.cb_dev->subordinate->number, 0, 0x28, &ptr);
> + pci_bus_read_config_dword(s->cap.cb_dev->subordinate, 0, 0x28, &ptr);
> tuple->CISOffset = ptr & ~7;
> SPACE(tuple->Flags) = (ptr & 7);
> } else

--
-Justin

2002-11-15 22:50:20

by Justin A

Subject: Re: 2.5.47-ac4 panic on boot.

OK... that worked.

For future reference... what is the important part of the oops, so I don't
have to type the useless parts?

I'm guessing just the Call Trace, which is:

pnpbios_set_resources+0x7c/0x90
pnp_activate_dev+0xe6/0x114
pnp_device_probe+0x33/0xc4
bus_match+0x37/0x5c
driver_attach+0x44/0x74
bus_add_driver+0x60/0x80
driver_register+0x69/0x84
pnp_register_driver+0x29/0x48
init+0x0/0x13c
init+0x1a/0x13c
init+0x0/0x13c
kernel_thread_helper+0x5/0xc
--
-Justin

On Friday 15 November 2002 05:16 pm, Andrew Morton wrote:
> Justin A wrote:
> > .config crashes, it oopses as soon as pnp starts, then 2 seconds later
> > the previous oops comes up and it panics.
>
> Irritating when it does that. Here's a little patch which should
> stop the machine dead after the first ooops, prevent stuff from
> scrolling off the screen.
>
> This, with the missing touch_nmi_watchdog() would be a handy
> kernel boot option, perhaps.
>
>
> --- 25/arch/i386/kernel/traps.c~noscroll Tue Nov 12 13:13:24 2002
> +++ 25-akpm/arch/i386/kernel/traps.c Tue Nov 12 13:14:16 2002
> @@ -84,7 +84,7 @@ asmlinkage void alignment_check(void);
> asmlinkage void spurious_interrupt_bug(void);
> asmlinkage void machine_check(void);
>
> -static int kstack_depth_to_print = 24;
> +static int kstack_depth_to_print = 10;
>
>
> /*
> @@ -246,6 +246,9 @@ bad:
> printk("%02x ", c);
> }
> }
> + local_irq_disable();
> + for ( ; ; )
> + ;
> printk("\n");
> }
>
>
> _