2002-10-21 22:13:15

by Stephen Hemminger

Subject: 2.5.44 crash on reboot

The following happens on a 2-way SMP box every time I reboot using a
serial console. Not sure if it is a socket or inode problem, but it looks
like a close race.
--------------------------------------------------------------------

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c01b1a38
*pde = 00000000
Oops: 0000
ide-cd cdrom soundcore mga agpgart autofs nfs lockd sunrpc eepro100 mii mousede
CPU: 0
EIP: 0060:[<c01b1a38>] Not tainted
EFLAGS: 00010246
EIP is at device_shutdown+0x78/0x9e
eax: ffffffff ebx: 00000000 ecx: c0392650 edx: 00000000
esi: 00000001 edi: ec3ee000 ebp: bffffdb8 esp: ec3efe8c
ds: 0068 es: 0068 ss: 0068
Process reboot (pid: 19350, threadinfo=ec3ee000 task=f01a6cc0)
Stack: c0299ffc 00000077 00000000 01234567 c0130ae2 c03f5bac 00000001 00000000
       c01372f8 c19fd0d0 00000007 f0270d10 00001000 00000002 fffee334 f0270d10
       f0270ce0 ec57b420 c01394f9 f0270ce0 eed08b40 420cdaf0 00000000 fffee334
Call Trace:
[<c0130ae2>] sys_reboot+0x182/0x380
[<c01372f8>] pte_alloc_map+0x128/0x140
[<c01394f9>] handle_mm_fault+0xb9/0x1c0
[<c0156f23>] invalidate_inode_buffers+0x13/0xc0
[<c022e65b>] sock_destroy_inode+0x1b/0xa0
[<c016f41a>] destroy_inode+0x6a/0x70
[<c01704fc>] generic_forget_inode+0x14c/0x180
[<c0276906>] inet_release+0x56/0x70
[<c01705ac>] iput+0x5c/0x80
[<c016cff0>] dput+0x30/0x200
[<c0155958>] __fput+0x108/0x140
[<c0152fd8>] filp_close+0xf8/0x130
[<c0153089>] sys_close+0x79/0x90
[<c0109a0f>] syscall_call+0x7/0xb



2002-10-21 22:30:10

by Patrick Mochel

Subject: Re: 2.5.44 crash on reboot


On Mon, 21 Oct 2002, Patrick Mochel wrote:

>
> On 21 Oct 2002, Stephen Hemminger wrote:
>
> > The following happens on a 2-way SMP box every time I reboot using a
> > serial console. Not sure if it is a socket or inode problem, but it looks
> > like a close race.
> > --------------------------------------------------------------------
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > printing eip:
> > c01b1a38
> > *pde = 00000000
> > Oops: 0000
> > ide-cd cdrom soundcore mga agpgart autofs nfs lockd sunrpc eepro100 mii mousede
> > CPU: 0
> > EIP: 0060:[<c01b1a38>] Not tainted
> > EFLAGS: 00010246
> > EIP is at device_shutdown+0x78/0x9e
>             ^^^^^^^^^^^^^^^
>
> Actually, it appears to be a problem accessing the global device list.
> Could you please send me your .config (private email is fine).

Actually, if anyone else is experiencing problems shutting down, as there
have been quite a few reports in the last few days (thanks to myself and
the SCSI guys), could you please do the following:

- Apply both the patches mentioned in this email:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103522568629074&w=2

- Enable debugging in drivers/base/power.c at the top of the file by
applying the patch below.

- Rebuild and try again.

If you're still experiencing a hang or an Oops, please report it to me
and/or the list, along with your .config.

Thanks,

-pat

===== drivers/base/power.c 1.13 vs edited =====
--- 1.13/drivers/base/power.c Fri Oct 18 17:57:42 2002
+++ edited/drivers/base/power.c Mon Oct 21 15:33:34 2002
@@ -8,7 +8,7 @@
  *
  */
 
-#define DEBUG 0
+#define DEBUG 1
 
 #include <linux/device.h>
 #include <linux/module.h>
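
For readers unfamiliar with this kind of switch, the patch above flips a compile-time flag so that debug messages get built in. The userspace sketch below shows the general pattern only; the `DBG` macro and the `shutdown_all` helper are hypothetical names chosen for illustration and are not taken from drivers/base/power.c, whose actual macros may differ.

```c
#include <stdio.h>

/* Flip to 0 to compile the debug output away entirely. */
#define DEBUG 1

#if DEBUG
#define DBG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DBG(...) do { } while (0)
#endif

/*
 * Hypothetical stand-in for a shutdown walk: logs each name when
 * DEBUG is enabled and returns the number of entries visited.
 */
static int shutdown_all(const char *const *names, int n)
{
    int i;

    for (i = 0; i < n; i++)
        DBG("shutting down %s\n", names[i]);
    return i;
}
```

With DEBUG set to 1, each entry is named on stderr before it is handled, so the last name printed before a hang or Oops points at the likely culprit.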

2002-10-21 22:22:16

by Patrick Mochel

Subject: Re: 2.5.44 crash on reboot


On 21 Oct 2002, Stephen Hemminger wrote:

> The following happens on a 2-way SMP box every time I reboot using a
> serial console. Not sure if it is a socket or inode problem, but it looks
> like a close race.
> --------------------------------------------------------------------
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c01b1a38
> *pde = 00000000
> Oops: 0000
> ide-cd cdrom soundcore mga agpgart autofs nfs lockd sunrpc eepro100 mii mousede
> CPU: 0
> EIP: 0060:[<c01b1a38>] Not tainted
> EFLAGS: 00010246
> EIP is at device_shutdown+0x78/0x9e
            ^^^^^^^^^^^^^^^

Actually, it appears to be a problem accessing the global device list.
Could you please send me your .config (private email is fine).

-pat
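
A note on reading the highlighted line: in `device_shutdown+0x78/0x9e`, 0x78 is the offset of the faulting instruction from the start of the function and 0x9e is the function's total size, so the function start is the EIP minus the offset. The sketch below works through that arithmetic for the values in this Oops; the helper names `parse_sym_offset` and `function_start` are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "symbol+0xOFF/0xSIZE" into its parts.
 * Returns 0 on success, -1 if the string is malformed or name is too small.
 */
static int parse_sym_offset(const char *s, char *name, size_t name_len,
                            unsigned long *off, unsigned long *size)
{
    const char *plus = strchr(s, '+');
    const char *slash = plus ? strchr(plus, '/') : NULL;

    if (!plus || !slash || (size_t)(plus - s) >= name_len)
        return -1;

    memcpy(name, s, (size_t)(plus - s));
    name[plus - s] = '\0';
    *off = strtoul(plus + 1, NULL, 16);   /* base 16 accepts the "0x" prefix */
    *size = strtoul(slash + 1, NULL, 16);
    return 0;
}

/* The function's first instruction sits at EIP minus the reported offset. */
static unsigned long function_start(unsigned long eip, unsigned long off)
{
    return eip - off;
}
```

For this report, EIP c01b1a38 minus 0x78 gives c01b19c0 as the start of device_shutdown, and the fault is 120 bytes into a 158-byte function, which is the address range to disassemble when looking for the NULL dereference.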