2008-12-30 12:54:06

by Tetsuo Handa

[permalink] [raw]
Subject: [2.6.28] NULL pointer dereference at get_stats()

Hello.

I got this on 2.6.28 .

CentOS 5.2 (gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42)) on VMware Workstation 6.5.1.

Config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.28 .
Full log is at http://I-love.SAKURA.ne.jp/tmp/messages6.txt .

----------------------------------------
BIOS EBDA/lowmem at: 0009f800/0009f800
Linux version 2.6.28 (root@tomoyo) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Dec 30 21:11:13 JST 2008
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
(... snipped ...)
INIT: Entering runlevel: 3

Entering non-interactive startup
Applying Intel CPU microcode update: [ OK ]

Starting sysstat: Calling the system activity data collector (sadc): BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c055f2f5>] get_stats+0x1d/0x48
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/firmware/microcode/loading
Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core ac button pcnet32 rtc_lib mii ata_piix i2c_piix4 libata i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 2459, comm: sadc Not tainted (2.6.28 #1) VMware Virtual Platform
EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
EIP is at get_stats+0x1d/0x48
EAX: 00000000 EBX: df94c858 ECX: 00000001 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: 206a4abf ESP: df163f0c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sadc (pid: 2459, ti=df163000 task=df8c56d0 task.ti=df163000)
Stack:
df94c800 dfb2ae40 df94c800 000000c8 c05bcc0f c066ee4c dfb2ae40 c04828b3
00000400 b7f6b000 df1dcf00 dfb2ae60 00000000 00000001 00000000 00000000
00000000 df8e4340 c04826ec fffffffb df1dcf00 c049f08d df163fa0 00000400
Call Trace:
[<c05bcc0f>] dev_seq_show+0x1c/0x77
[<c04828b3>] seq_read+0x1c7/0x2a0
[<c04826ec>] seq_read+0x0/0x2a0
[<c049f08d>] proc_reg_read+0x58/0x6b
[<c049f035>] proc_reg_read+0x0/0x6b
[<c0470444>] vfs_read+0x81/0xf4
[<c0470720>] sys_read+0x3c/0x63
[<c0403841>] sysenter_do_call+0x12/0x21
Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df163f0c
---[ end trace 8be667e49b995a38 ]---
/etc/rc3.d/S03sysstat: line 34: 2459 Segmentation fault /usr/lib/sa/sadc -F -L -

[FAILED]

Starting background readahead: [ OK ]

Bringing up loopback interface: BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c055f2f5>] get_stats+0x1d/0x48
*pde = 00000000
Oops: 0000 [#2] SMP
last sysfs file: /sys/class/firmware/microcode/loading
Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core ac button pcnet32 rtc_lib mii ata_piix i2c_piix4 libata i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 2534, comm: ip Tainted: G D (2.6.28 #1) VMware Virtual Platform
EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
EIP is at get_stats+0x1d/0x48
EAX: 00000000 EBX: df94c858 ECX: 00000001 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: 206a4abf ESP: df0b0c88
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ip (pid: 2534, ti=df0b0000 task=df8c7b60 task.ti=df0b0000)
Stack:
df99a08c df94c964 deda4780 df94c800 c05c571a 0000000b df99a000 dfb2a940
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00004034
df94c800 dfb2a940 00000000 deda4780 c05c5fe7 000009e6 495a15d3 00000000
Call Trace:
[<c05c571a>] rtnl_fill_ifinfo+0x2c9/0x498
[<c05c5fe7>] rtnl_dump_ifinfo+0x40/0x69
[<c05d0cff>] netlink_dump+0x4a/0x163
[<c05d2812>] netlink_dump_start+0xf9/0x11c
[<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
[<c05c5d59>] rtnetlink_rcv_msg+0xad/0x1ac
[<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
[<c05c5cac>] rtnetlink_rcv_msg+0x0/0x1ac
[<c05d1a0c>] netlink_rcv_skb+0x2d/0x71
[<c05c5ca6>] rtnetlink_rcv+0x14/0x1a
[<c05d1836>] netlink_unicast+0x1a2/0x205
[<c05d1f0b>] netlink_sendmsg+0x24a/0x257
[<c05b44c4>] sock_sendmsg+0xc7/0xe1
[<c0434958>] autoremove_wake_function+0x0/0x2d
[<c0450d0c>] sync_page+0x0/0x36
[<c044f17d>] __delayacct_blkio_end+0x56/0x59
[<c0629b6b>] io_schedule+0x65/0x81
[<c0629c9d>] __wait_on_bit_lock+0x4b/0x52
[<c0450ba2>] find_get_page+0x1d/0x7a
[<c04ee9d9>] copy_from_user+0x23/0x4f
[<c05b4e0a>] sys_sendto+0xfc/0x127
[<c045c52d>] __do_fault+0x2fb/0x33d
[<c05b5719>] sys_socketcall+0xfc/0x1a9
[<c0403841>] sysenter_do_call+0x12/0x21
Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df0b0c88
---[ end trace 8be667e49b995a38 ]---
----------------------------------------

After doing "chkconfig microcode_ctl off" and reboot, I got below.

----------------------------------------
(... snipped ...)
INIT: Entering runlevel: 3

Entering non-interactive startup
Starting sysstat: Calling the system activity data collector (sadc): BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c055f2f5>] get_stats+0x1d/0x48
Oops: 0000 [#1] SMP
last sysfs file: /sys/block/hda/removable
Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib ac pcnet32 button mii ata_piix libata i2c_piix4 pcspkr i2c_core mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 2417, comm: sadc Not tainted (2.6.28 #1) VMware Virtual Platform
EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
EIP is at get_stats+0x1d/0x48
EAX: 00000000 EBX: df94b858 ECX: 00000001 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: 206a5abf ESP: df396f0c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sadc (pid: 2417, ti=df396000 task=df87edb0 task.ti=df396000)
Stack:
df94b800 df970f00 df94b800 000000c8 c05bcc0f c066ee4c df970f00 c04828b3
00000400 b7f93000 dfb89380 df970f20 00000000 00000001 00000000 00000000
00000000 df8e1340 c04826ec fffffffb dfb89380 c049f08d df396fa0 00000400
Call Trace:
[<c05bcc0f>] dev_seq_show+0x1c/0x77
[<c04828b3>] seq_read+0x1c7/0x2a0
[<c04826ec>] seq_read+0x0/0x2a0
[<c049f08d>] proc_reg_read+0x58/0x6b
[<c049f035>] proc_reg_read+0x0/0x6b
[<c0470444>] vfs_read+0x81/0xf4
[<c0470720>] sys_read+0x3c/0x63
[<c0403841>] sysenter_do_call+0x12/0x21
Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df396f0c
---[ end trace 51b8087926b0fb03 ]---
/etc/rc3.d/S03sysstat: line 34: 2417 Segmentation fault /usr/lib/sa/sadc -F -L -

[FAILED]

Starting background readahead: [ OK ]

Bringing up loopback interface: BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c055f2f5>] get_stats+0x1d/0x48
*pde = 00000000
Oops: 0000 [#2] SMP
last sysfs file: /sys/block/hda/removable
Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib ac pcnet32 button mii ata_piix libata i2c_piix4 pcspkr i2c_core mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 2492, comm: ip Tainted: G D (2.6.28 #1) VMware Virtual Platform
EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
EIP is at get_stats+0x1d/0x48
EAX: 00000000 EBX: df94b858 ECX: 00000001 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: 206a5abf ESP: deea4c88
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ip (pid: 2492, ti=deea4000 task=df87edb0 task.ti=deea4000)
Stack:
dee5908c df94b964 df1c36c0 df94b800 c05c571a dedc3040 dee59000 df93e916
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00004034
df94b800 df16b9c0 00000000 df1c36c0 c05c5fe7 000009bc 495a1819 00000000
Call Trace:
[<c05c571a>] rtnl_fill_ifinfo+0x2c9/0x498
[<c05c5fe7>] rtnl_dump_ifinfo+0x40/0x69
[<c05d0cff>] netlink_dump+0x4a/0x163
[<c05d2812>] netlink_dump_start+0xf9/0x11c
[<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
[<c05c5d59>] rtnetlink_rcv_msg+0xad/0x1ac
[<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
[<c04dfad4>] __generic_unplug_device+0x1a/0x1c
[<c05c5cac>] rtnetlink_rcv_msg+0x0/0x1ac
[<c05d1a0c>] netlink_rcv_skb+0x2d/0x71
[<c05c5ca6>] rtnetlink_rcv+0x14/0x1a
[<c05d1836>] netlink_unicast+0x1a2/0x205
[<c05d1f0b>] netlink_sendmsg+0x24a/0x257
[<c05b44c4>] sock_sendmsg+0xc7/0xe1
[<c0434958>] autoremove_wake_function+0x0/0x2d
[<c0450d0c>] sync_page+0x0/0x36
[<c044f17d>] __delayacct_blkio_end+0x56/0x59
[<c0629b6b>] io_schedule+0x65/0x81
[<c0629c9d>] __wait_on_bit_lock+0x4b/0x52
[<c0450ba2>] find_get_page+0x1d/0x7a
[<c04ee9d9>] copy_from_user+0x23/0x4f
[<c05b4e0a>] sys_sendto+0xfc/0x127
[<c045c52d>] __do_fault+0x2fb/0x33d
[<c05b5719>] sys_socketcall+0xfc/0x1a9
[<c0403841>] sysenter_do_call+0x12/0x21
Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:deea4c88
---[ end trace 51b8087926b0fb03 ]---
----------------------------------------

This bug resembles the one I reported at http://lkml.org/lkml/2008/11/28/99 .

Regards.


2008-12-30 13:28:54

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hi

CC to netdev.

Can you reproduce this bug on native linux?
your log seems a bit strange.

it doesn't have any network card initialization log, but kernel
crashed in network driver.
in addition, loaded module list indicate your network card is pcnet32,
but pcnet32 doesn't have
get_stats() function. (instead, it has pcnet32_get_stats() )


(intentional fully quoted)
> Hello.
>
> I got this on 2.6.28 .
>
> CentOS 5.2 (gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42)) on VMware Workstation 6.5.1.
>
> Config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.28 .
> Full log is at http://I-love.SAKURA.ne.jp/tmp/messages6.txt .
>
> ----------------------------------------
> BIOS EBDA/lowmem at: 0009f800/0009f800
> Linux version 2.6.28 (root@tomoyo) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Dec 30 21:11:13 JST 2008
> KERNEL supported cpus:
> Intel GenuineIntel
> AMD AuthenticAMD
> NSC Geode by NSC
> Cyrix CyrixInstead
> Centaur CentaurHauls
> Transmeta GenuineTMx86
> Transmeta TransmetaCPU
> UMC UMC UMC UMC
> (... snipped ...)
> INIT: Entering runlevel: 3
>
> Entering non-interactive startup
> Applying Intel CPU microcode update: [ OK ]
>
> Starting sysstat: Calling the system activity data collector (sadc): BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c055f2f5>] get_stats+0x1d/0x48
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/class/firmware/microcode/loading
> Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core ac button pcnet32 rtc_lib mii ata_piix i2c_piix4 libata i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>
> Pid: 2459, comm: sadc Not tainted (2.6.28 #1) VMware Virtual Platform
> EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
> EIP is at get_stats+0x1d/0x48
> EAX: 00000000 EBX: df94c858 ECX: 00000001 EDX: 00000001
> ESI: 00000000 EDI: 00000000 EBP: 206a4abf ESP: df163f0c
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process sadc (pid: 2459, ti=df163000 task=df8c56d0 task.ti=df163000)
> Stack:
> df94c800 dfb2ae40 df94c800 000000c8 c05bcc0f c066ee4c dfb2ae40 c04828b3
> 00000400 b7f6b000 df1dcf00 dfb2ae60 00000000 00000001 00000000 00000000
> 00000000 df8e4340 c04826ec fffffffb df1dcf00 c049f08d df163fa0 00000400
> Call Trace:
> [<c05bcc0f>] dev_seq_show+0x1c/0x77
> [<c04828b3>] seq_read+0x1c7/0x2a0
> [<c04826ec>] seq_read+0x0/0x2a0
> [<c049f08d>] proc_reg_read+0x58/0x6b
> [<c049f035>] proc_reg_read+0x0/0x6b
> [<c0470444>] vfs_read+0x81/0xf4
> [<c0470720>] sys_read+0x3c/0x63
> [<c0403841>] sysenter_do_call+0x12/0x21
> Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
> EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df163f0c
> ---[ end trace 8be667e49b995a38 ]---
> /etc/rc3.d/S03sysstat: line 34: 2459 Segmentation fault /usr/lib/sa/sadc -F -L -
>
> [FAILED]
>
> Starting background readahead: [ OK ]
>
> Bringing up loopback interface: BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c055f2f5>] get_stats+0x1d/0x48
> *pde = 00000000
> Oops: 0000 [#2] SMP
> last sysfs file: /sys/class/firmware/microcode/loading
> Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core ac button pcnet32 rtc_lib mii ata_piix i2c_piix4 libata i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>
> Pid: 2534, comm: ip Tainted: G D (2.6.28 #1) VMware Virtual Platform
> EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
> EIP is at get_stats+0x1d/0x48
> EAX: 00000000 EBX: df94c858 ECX: 00000001 EDX: 00000001
> ESI: 00000000 EDI: 00000000 EBP: 206a4abf ESP: df0b0c88
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process ip (pid: 2534, ti=df0b0000 task=df8c7b60 task.ti=df0b0000)
> Stack:
> df99a08c df94c964 deda4780 df94c800 c05c571a 0000000b df99a000 dfb2a940
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00004034
> df94c800 dfb2a940 00000000 deda4780 c05c5fe7 000009e6 495a15d3 00000000
> Call Trace:
> [<c05c571a>] rtnl_fill_ifinfo+0x2c9/0x498
> [<c05c5fe7>] rtnl_dump_ifinfo+0x40/0x69
> [<c05d0cff>] netlink_dump+0x4a/0x163
> [<c05d2812>] netlink_dump_start+0xf9/0x11c
> [<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
> [<c05c5d59>] rtnetlink_rcv_msg+0xad/0x1ac
> [<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
> [<c05c5cac>] rtnetlink_rcv_msg+0x0/0x1ac
> [<c05d1a0c>] netlink_rcv_skb+0x2d/0x71
> [<c05c5ca6>] rtnetlink_rcv+0x14/0x1a
> [<c05d1836>] netlink_unicast+0x1a2/0x205
> [<c05d1f0b>] netlink_sendmsg+0x24a/0x257
> [<c05b44c4>] sock_sendmsg+0xc7/0xe1
> [<c0434958>] autoremove_wake_function+0x0/0x2d
> [<c0450d0c>] sync_page+0x0/0x36
> [<c044f17d>] __delayacct_blkio_end+0x56/0x59
> [<c0629b6b>] io_schedule+0x65/0x81
> [<c0629c9d>] __wait_on_bit_lock+0x4b/0x52
> [<c0450ba2>] find_get_page+0x1d/0x7a
> [<c04ee9d9>] copy_from_user+0x23/0x4f
> [<c05b4e0a>] sys_sendto+0xfc/0x127
> [<c045c52d>] __do_fault+0x2fb/0x33d
> [<c05b5719>] sys_socketcall+0xfc/0x1a9
> [<c0403841>] sysenter_do_call+0x12/0x21
> Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
> EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df0b0c88
> ---[ end trace 8be667e49b995a38 ]---
> ----------------------------------------
>
> After doing "chkconfig microcode_ctl off" and reboot, I got below.
>
> ----------------------------------------
> (... snipped ...)
> INIT: Entering runlevel: 3
>
> Entering non-interactive startup
> Starting sysstat: Calling the system activity data collector (sadc): BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c055f2f5>] get_stats+0x1d/0x48
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/block/hda/removable
> Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib ac pcnet32 button mii ata_piix libata i2c_piix4 pcspkr i2c_core mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>
> Pid: 2417, comm: sadc Not tainted (2.6.28 #1) VMware Virtual Platform
> EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
> EIP is at get_stats+0x1d/0x48
> EAX: 00000000 EBX: df94b858 ECX: 00000001 EDX: 00000001
> ESI: 00000000 EDI: 00000000 EBP: 206a5abf ESP: df396f0c
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process sadc (pid: 2417, ti=df396000 task=df87edb0 task.ti=df396000)
> Stack:
> df94b800 df970f00 df94b800 000000c8 c05bcc0f c066ee4c df970f00 c04828b3
> 00000400 b7f93000 dfb89380 df970f20 00000000 00000001 00000000 00000000
> 00000000 df8e1340 c04826ec fffffffb dfb89380 c049f08d df396fa0 00000400
> Call Trace:
> [<c05bcc0f>] dev_seq_show+0x1c/0x77
> [<c04828b3>] seq_read+0x1c7/0x2a0
> [<c04826ec>] seq_read+0x0/0x2a0
> [<c049f08d>] proc_reg_read+0x58/0x6b
> [<c049f035>] proc_reg_read+0x0/0x6b
> [<c0470444>] vfs_read+0x81/0xf4
> [<c0470720>] sys_read+0x3c/0x63
> [<c0403841>] sysenter_do_call+0x12/0x21
> Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
> EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df396f0c
> ---[ end trace 51b8087926b0fb03 ]---
> /etc/rc3.d/S03sysstat: line 34: 2417 Segmentation fault /usr/lib/sa/sadc -F -L -
>
> [FAILED]
>
> Starting background readahead: [ OK ]
>
> Bringing up loopback interface: BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c055f2f5>] get_stats+0x1d/0x48
> *pde = 00000000
> Oops: 0000 [#2] SMP
> last sysfs file: /sys/block/hda/removable
> Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery lp sg floppy ide_cd_mod cdrom serio_raw parport_pc parport rtc_cmos rtc_core rtc_lib ac pcnet32 button mii ata_piix libata i2c_piix4 pcspkr i2c_core mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>
> Pid: 2492, comm: ip Tainted: G D (2.6.28 #1) VMware Virtual Platform
> EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
> EIP is at get_stats+0x1d/0x48
> EAX: 00000000 EBX: df94b858 ECX: 00000001 EDX: 00000001
> ESI: 00000000 EDI: 00000000 EBP: 206a5abf ESP: deea4c88
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process ip (pid: 2492, ti=deea4000 task=df87edb0 task.ti=deea4000)
> Stack:
> dee5908c df94b964 df1c36c0 df94b800 c05c571a dedc3040 dee59000 df93e916
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00004034
> df94b800 df16b9c0 00000000 df1c36c0 c05c5fe7 000009bc 495a1819 00000000
> Call Trace:
> [<c05c571a>] rtnl_fill_ifinfo+0x2c9/0x498
> [<c05c5fe7>] rtnl_dump_ifinfo+0x40/0x69
> [<c05d0cff>] netlink_dump+0x4a/0x163
> [<c05d2812>] netlink_dump_start+0xf9/0x11c
> [<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
> [<c05c5d59>] rtnetlink_rcv_msg+0xad/0x1ac
> [<c05c5fa7>] rtnl_dump_ifinfo+0x0/0x69
> [<c04dfad4>] __generic_unplug_device+0x1a/0x1c
> [<c05c5cac>] rtnetlink_rcv_msg+0x0/0x1ac
> [<c05d1a0c>] netlink_rcv_skb+0x2d/0x71
> [<c05c5ca6>] rtnetlink_rcv+0x14/0x1a
> [<c05d1836>] netlink_unicast+0x1a2/0x205
> [<c05d1f0b>] netlink_sendmsg+0x24a/0x257
> [<c05b44c4>] sock_sendmsg+0xc7/0xe1
> [<c0434958>] autoremove_wake_function+0x0/0x2d
> [<c0450d0c>] sync_page+0x0/0x36
> [<c044f17d>] __delayacct_blkio_end+0x56/0x59
> [<c0629b6b>] io_schedule+0x65/0x81
> [<c0629c9d>] __wait_on_bit_lock+0x4b/0x52
> [<c0450ba2>] find_get_page+0x1d/0x7a
> [<c04ee9d9>] copy_from_user+0x23/0x4f
> [<c05b4e0a>] sys_sendto+0xfc/0x127
> [<c045c52d>] __do_fault+0x2fb/0x33d
> [<c05b5719>] sys_socketcall+0xfc/0x1a9
> [<c0403841>] sysenter_do_call+0x12/0x21
> Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
> EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:deea4c88
> ---[ end trace 51b8087926b0fb03 ]---
> ----------------------------------------
>
> This bug resembles the one I reported at http://lkml.org/lkml/2008/11/28/99 .
>
> Regards.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2008-12-30 14:39:14

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hello.

KOSAKI Motohiro wrote:
> it doesn't have any network card initialization log, but kernel
> crashed in network driver.
> in addition, loaded module list indicate your network card is pcnet32,
> but pcnet32 doesn't have
> get_stats() function. (instead, it has pcnet32_get_stats() )

I think pcnet32 is unrelated.
It seems to me that the problem is loopback device.

I added "init=/bin/bash" and tried below commands.
----------------------------------------
# lsmod
Module Size Used by
mptspi 19208 1
mptscsih 33152 1 mptspi
mptbase 74084 2 mptspi,mptscsih
scsi_transport_spi 23296 1 mptspi
sd_mod 27416 2
scsi_mod 138260 4 mptspi,mptscsih,scsi_transport_spi,sd_mod
ext3 109576 1
jbd 43924 1 ext3
uhci_hcd 22800 0
ohci_hcd 23696 0
ehci_hcd 33036 0
# ls -1 /sys/class/net/
lo
# cat /sys/class/net/lo/statistics/rx_packets
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c055f2f5>] get_stats+0x1d/0x48
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/statistics/rx_packets
Modules linked in: mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 597, comm: cat Not tainted (2.6.28 #1) VMware Virtual Platform
EIP: 0060:[<c055f2f5>] EFLAGS: 00010297 CPU: 0
EIP is at get_stats+0x1d/0x48
EAX: 00000000 EBX: df956858 ECX: 00000001 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: 2069abbf ESP: df89af24
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 597, ti=df89a000 task=df8c8db0 task.ti=df89a000)
Stack:
df956800 00000000 dedb9000 c0732a00 c05c82fa c073a0ac fffffffb c05c83dc
c05c83e3 00000000 c0559190 dedaaa00 ffffffed df959718 c04a8da6 00001000
0804f000 dedaaa14 df956bf4 df8a5840 0804f000 c04a8d1d 00001000 c0470444
Call Trace:
[<c05c82fa>] netstat_show+0x48/0x64
[<c05c83dc>] show_rx_packets+0x0/0x9
[<c05c83e3>] show_rx_packets+0x7/0x9
[<c0559190>] dev_attr_show+0x16/0x32
[<c04a8da6>] sysfs_read_file+0x89/0xe8
[<c04a8d1d>] sysfs_read_file+0x0/0xe8
[<c0470444>] vfs_read+0x81/0xf4
[<c0470720>] sys_read+0x3c/0x63
[<c0403841>] sysenter_do_call+0x12/0x21
Code: ff 00 89 d8 e8 28 e6 05 00 31 c0 5b 5e c3 55 83 c9 ff 57 31 ff 56 31 f6 53 8b a8 6c 03 00 00 8d 58 58 eb 0c 89 e8 f7 d0 8b 04 88 <03> 78 04 03 30 89 c8 ba a0 9c 81 c0 e8 66 a1 f8 ff 83 f8 1f 89
EIP: [<c055f2f5>] get_stats+0x1d/0x48 SS:ESP 0068:df89af24
---[ end trace e9672264d489cc49 ]---
----------------------------------------

Regards.

2008-12-30 15:16:31

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hi

>> it doesn't have any network card initialization log, but kernel
>> crashed in network driver.
>> in addition, loaded module list indicate your network card is pcnet32,
>> but pcnet32 doesn't have
>> get_stats() function. (instead, it has pcnet32_get_stats() )
>
> I think pcnet32 is unrelated.
> It seems to me that the problem is loopback device.

my ia32 box can't reproduce your problem although loopback doesn't
depend on any hardware.
I suspect vmware issue.


>
> I added "init=/bin/bash" and tried below commands.
> ----------------------------------------
> # lsmod
> Module Size Used by
> mptspi 19208 1
> mptscsih 33152 1 mptspi
> mptbase 74084 2 mptspi,mptscsih
> scsi_transport_spi 23296 1 mptspi
> sd_mod 27416 2
> scsi_mod 138260 4 mptspi,mptscsih,scsi_transport_spi,sd_mod
> ext3 109576 1
> jbd 43924 1 ext3
> uhci_hcd 22800 0
> ohci_hcd 23696 0
> ehci_hcd 33036 0

2008-12-30 15:52:32

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

[...]
>
> This bug resembles the one I reported at http://lkml.org/lkml/2008/11/28/99 .

Does adding maxcpus=0 to the boot parameters solve the problem? Stephen
suspected a per-cpu variable related problem.

Regards,
Frederik

2008-12-30 17:02:17

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hello.

Frederik Deweerdt wrote:
> Does adding maxcpus=0 to the boot parameters solve the problem? Stephen
> suspected a per-cpu variable related problem.

No, but I got interestring results.

Result | # of virtual CPUs | Kernel command line
-------+-------------------+-----------------------------------------
OK | 2 | ro root=LABEL=/
BUG | 2 | ro root=LABEL=/ noapic nolapic
BUG | 2 | ro root=LABEL=/ noapic nolapic maxcpus=0
BUG | 2 | ro root=LABEL=/ noapic nolapic nosmp
BUG | 2 | ro root=LABEL=/ maxcpus=0
BUG | 2 | ro root=LABEL=/ nosmp
OK | 1 | ro root=LABEL=/
OK | 1 | ro root=LABEL=/ nolapic noapic
OK | 1 | ro root=LABEL=/ nolapic noapic maxcpus=0
OK | 1 | ro root=LABEL=/ nolapic noapic nosmp

"OK" means NULL pointer dereference didn't happen.
"BUG" means NULL pointer dereference happened.

I was adding "noapic" "nolapic" to command line while assigning 2 vcpus.
Thus, I encountered this problem. Workaround is to remove "noapic" "nolapic".

Now, it seems to me that this problem is caused by interaction of
"noapic" "nolapic" "maxcpus=0" "nosmp" options.

Regards.

2008-12-30 21:37:41

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

[Trimmed netdev from cc:]
Hello Tetsuo,

On Wed, Dec 31, 2008 at 02:01:56AM +0900, Tetsuo Handa wrote:
> Hello.
>
> Frederik Deweerdt wrote:
> > Does adding maxcpus=0 to the boot parameters solve the problem? Stephen
> > suspected a per-cpu variable related problem.
>
> No, but I got interestring results.
>
> Result | # of virtual CPUs | Kernel command line
> -------+-------------------+-----------------------------------------
> OK | 2 | ro root=LABEL=/
> BUG | 2 | ro root=LABEL=/ noapic nolapic
> BUG | 2 | ro root=LABEL=/ noapic nolapic maxcpus=0
> BUG | 2 | ro root=LABEL=/ noapic nolapic nosmp
> BUG | 2 | ro root=LABEL=/ maxcpus=0
> BUG | 2 | ro root=LABEL=/ nosmp
> OK | 1 | ro root=LABEL=/
> OK | 1 | ro root=LABEL=/ nolapic noapic
> OK | 1 | ro root=LABEL=/ nolapic noapic maxcpus=0
> OK | 1 | ro root=LABEL=/ nolapic noapic nosmp
>
> "OK" means NULL pointer dereference didn't happen.
> "BUG" means NULL pointer dereference happened.
>
> I was adding "noapic" "nolapic" to command line while assigning 2 vcpus.
> Thus, I encountered this problem. Workaround is to remove "noapic" "nolapic".
>
> Now, it seems to me that this problem is caused by interaction of
> "noapic" "nolapic" "maxcpus=0" "nosmp" options.

Thanks for the thourough testing. I've reviewed the code, and the only
think that looks weird to me is the case where smp_sanity_check() fails.
The attached patch is an attempt to correct this, Tetsuo, could you
test it?

Ingo, could you please have a look at it? It does what commit
deef325086c3897393b8f7d6bccd0340 "x86: disable preemption in
native_smp_prepare_cpus" was looking to fix, but continuing the APIC
setup in case smp_sanity_check fails.


Regards,
Frederik

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f8500c9..888c6d3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1188,16 +1188,13 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
if (smp_sanity_check(max_cpus) < 0) {
printk(KERN_INFO "SMP disabled\n");
disable_smp();
- goto out;
}

- preempt_disable();
if (read_apic_id() != boot_cpu_physical_apicid) {
panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
read_apic_id(), boot_cpu_physical_apicid);
/* Or can we switch back to PIC here? */
}
- preempt_enable();

connect_bsp_APIC();

@@ -1230,7 +1227,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

if (is_uv_system())
uv_system_init();
-out:
+
preempt_enable();
}
/*

2008-12-31 02:25:23

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hello.

Frederik Deweerdt wrote:
> Thanks for the thourough testing. I've reviewed the code, and the only
> think that looks weird to me is the case where smp_sanity_check() fails.
> The attached patch is an attempt to correct this, Tetsuo, could you
> test it?

That patch breaks driver initialization code
----------------------------------------
SCSI subsystem initialized
Loading sd_mod.ko module
Driver 'sd' needs updating - please use bus_type methods
Loading mptbase.ko module
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
Loading mptscsih.ko module
Waiting for driver initialization.
Scanning and configuring dmraid supported devices
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
----------------------------------------
with a popup dialog
"The CPU has been disabled by the guest operating system.
You will need to power off or reset the virtual machine at this point."

The log without that patch looks like
----------------------------------------
SCSI subsystem initialized
Loading sd_mod.ko module
Driver 'sd' needs updating - please use bus_type methods
Loading scsi_transport_spi.ko module
Loading mptbase.ko module
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
Loading mptscsih.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.04.07
mptspi 0000:00:10.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
mptbase: ioc0: Initiating bringup
ioc0: LSI53C1030 B0: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030 B0, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=17
scsi 0:0:0:0: Direct-Access VMware, VMware Virtual S 1.0 PQ: 0 ANSI: 2
target0:0:0: Beginning Domain Validation
target0:0:0: Domain Validation skipping write tests
target0:0:0: Ending Domain Validation
target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
sd 0:0:0:0: [sda] 12582912 512-byte hardware sectors: (6.44 GB/6.00 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Cache data unavailable
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 12582912 512-byte hardware sectors: (6.44 GB/6.00 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Cache data unavailable
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
mptspi 0000:02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
mptbase: ioc1: Initiating bringup
ioc1: LSI53C1030 B0: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030 B0, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=18
scsi 1:0:0:0: Direct-Access VMware, VMware Virtual S 1.0 PQ: 0 ANSI: 2
target1:0:0: Beginning Domain Validation
target1:0:0: Domain Validation skipping write tests
target1:0:0: Ending Domain Validation
target1:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
sd 1:0:0:0: [sdb] 16777216 512-byte hardware sectors: (8.58 GB/8.00 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Cache data unavailable
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] 16777216 512-byte hardware sectors: (8.58 GB/8.00 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Cache data unavailable
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1
sd 1:0:0:0: [sdb] Attached SCSI disk
insmod used greatest stack depth: 1792 bytes left
Waiting for driver initialization.
Scanning and configuring dmraid supported devices
Creating root device.
Mounting root filesystem.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
----------------------------------------

Regards.

2008-12-31 02:44:39

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Tetsuo Handa wrote:
> That patch breaks driver initialization code

Oh, please wait. It may be my operational mistake.
The content of /init in initrd-2.6.28.img differs.

2008-12-31 05:09:30

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [2.6.28] NULL pointer dereference at get_stats()

Hello.

Frederik Deweerdt wrote:
> Thanks for the thourough testing. I've reviewed the code, and the only
> think that looks weird to me is the case where smp_sanity_check() fails.
> The attached patch is an attempt to correct this, Tetsuo, could you
> test it?
That patch didn't solve.

Below are the results (with that patch applied) with more test cases.

Case | Result | # of virtual CPUs | # of CPUs recognized | Kernel command line
-----+--------+-------------------+----------------------+-----------------------------------------
(1) | OK | 2 | 2 | ro root=LABEL=/
(2) | BUG | 2 | 1 | ro root=LABEL=/ noapic nolapic
(3) | BUG | 2 | 1 | ro root=LABEL=/ noapic nolapic maxcpus=0
(4) | BUG | 2 | 1 | ro root=LABEL=/ noapic nolapic nosmp
(5) | BUG | 2 | 1 | ro root=LABEL=/ maxcpus=0
(6) | BUG | 2 | 1 | ro root=LABEL=/ nosmp
(7) | OK | 1 | 1 | ro root=LABEL=/
(8) | OK | 1 | 1 | ro root=LABEL=/ nolapic noapic
(9) | OK | 1 | 1 | ro root=LABEL=/ nolapic noapic maxcpus=0
(10) | OK | 1 | 1 | ro root=LABEL=/ nolapic noapic nosmp
(11) | OK | 2 | 1 | ro root=LABEL=/ maxcpus=1
(12) | OK | 2 | 2 | ro root=LABEL=/ maxcpus=2
(13) | OK | 2 | 2 | ro root=LABEL=/ maxcpus=4
(14) | BUG | 2 | 1 | ro root=LABEL=/ noapic nolapic maxcpus=1
(15) | OK | 2 | 2 | ro root=LABEL=/ noapic
(16) | BUG | 2 | 1 | ro root=LABEL=/ noapic maxcpus=0
(17) | OK | 2 | 1 | ro root=LABEL=/ noapic maxcpus=1
(18) | OK | 2 | 2 | ro root=LABEL=/ noapic maxcpus=2
(19) | BUG | 2 | 1 | ro root=LABEL=/ nolapic
(20) | BUG | 2 | 1 | ro root=LABEL=/ nolapic maxcpus=0
(21) | BUG | 2 | 1 | ro root=LABEL=/ nolapic maxcpus=1
(22) | BUG | 2 | 1 | ro root=LABEL=/ nolapic maxcpus=2

"# of CPUs recognized" means "grep -c ^processor /proc/cpcinfo".

Well, (2) (19) (22) are obviously wrong. "nolapic" option implies "nosmp" option?

Regards.