2009-07-12 12:47:36

by Thomas Meyer

[permalink] [raw]
Subject: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

[ 6190.368242] Restarting tasks ... done.
[ 6191.252241] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 6191.252253] IP: [<ffffffff8134beea>] strchr+0x24/0x45
[ 6191.252253] PGD 182503067 PUD 1771f4067 PMD 0
[ 6191.252253] Oops: 0000 [#1] PREEMPT SMP
[ 6191.252253] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:07:00.1/class
[ 6191.252253] CPU 1
[ 6191.252253] Modules linked in: rndis_wlan floppy olympic forcedeth [last unloaded: scsi_wait_scan]
[ 6191.252253] Pid: 23330, comm: gvfs-gdu-volume Tainted: G W 2.6.31-rc2 #79 MS-7250
[ 6191.252253] RIP: 0010:[<ffffffff8134beea>] [<ffffffff8134beea>] strchr+0x24/0x45
[ 6191.252253] RSP: 0018:ffff8801771b9d98 EFLAGS: 00010246
[ 6191.252253] RAX: 0000000000000040 RBX: 0000000000000001 RCX: 0000000000000008
[ 6191.252253] RDX: 0000000000000000 RSI: 000000000000002c RDI: 0000000000000040
[ 6191.252253] RBP: 0000000000000000 R08: 0000000000008020 R09: ffffffffffffffff
[ 6191.252253] R10: 0000000003e36d36 R11: ffffffff812e3591 R12: ffff880151808780
[ 6191.252253] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6191.252253] FS: 00007f77e395a790(0000) GS:ffff880028050000(0000) knlGS:0000000000000000
[ 6191.252253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6191.252253] CR2: 0000000000000040 CR3: 0000000182416000 CR4: 00000000000006e0
[ 6191.252253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6191.252253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6191.252253] Process gvfs-gdu-volume (pid: 23330, threadinfo ffff8801771b8000, task ffff88019a22a0d0)
[ 6191.252253] Stack:
[ 6191.252253] ffffffff81846b60 0000000003e36d36 0000000000000000 ffffffff812e37d7
[ 6191.252253] <0> ffff8801bd8f3600 ffff880151808780 ffff8801bd9eb2c0 ffff8801bd9eb2c0
[ 6191.252253] <0> ffff880100000001 0000000000000000 0000000000000000 0000000003e36d36
[ 6191.252253] Call Trace:
[ 6191.252253] [<ffffffff812e37d7>] ? selinux_sb_show_options+0x246/0x33c
[ 6191.252253] [<ffffffff811306a2>] ? show_vfsmnt+0xbb/0x133
[ 6191.252253] [<ffffffff81134785>] ? seq_read+0x293/0x376
[ 6191.252253] [<ffffffff81117f49>] ? vfs_read+0xb1/0x123
[ 6191.252253] [<ffffffff811180a3>] ? sys_read+0x55/0x90
[ 6191.252253] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
[ 6191.252253] Code: d2 ff 48 83 c4 18 c3 48 83 ec 18 48 89 f8 65 48 8b 14 25 28 00 00 00 48 89 54 24 08 31 d2 eb 0b 84 d2 75 04 31 c0 eb 0a 48 ff c0 <8a> 10 40 38 f2 75 ee 48 8b 54 24 08 65 48 33 14 25 28 00 00 00
[ 6191.252253] RIP [<ffffffff8134beea>] strchr+0x24/0x45
[ 6191.252253] RSP <ffff8801771b9d98>
[ 6191.252253] CR2: 0000000000000040
[ 6191.258135] ---[ end trace f7a4d612974b9b43 ]---


2009-07-12 17:30:30

by Parag Warudkar

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

Thomas Meyer <thomas <at> m3y3r.de> writes:

>
> [ 6190.368242] Restarting tasks ... done.
> [ 6191.252241] BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
> [ 6191.252253] IP: [<ffffffff8134beea>] strchr+0x24/0x45

[ ... ]

> [ 6191.252253] Call Trace:
> [ 6191.252253] [<ffffffff812e37d7>] ? selinux_sb_show_options+0x246/0x33c
> [ 6191.252253] [<ffffffff811306a2>] ? show_vfsmnt+0xbb/0x133
> [ 6191.252253] [<ffffffff81134785>] ? seq_read+0x293/0x376
> [ 6191.252253] [<ffffffff81117f49>] ? vfs_read+0xb1/0x123
> [ 6191.252253] [<ffffffff811180a3>] ? sys_read+0x55/0x90
> [ 6191.252253] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
> [ 6191.252253] Code: d2 ff 48 83 c4 18 c3 48 83 ec 18 48 89 f8 65 48 8b 14 25
28 00 00 00 48 89 54 24 08 31 d2 eb 0b 84 d2 75 04
> 31 c0 eb 0a 48 ff c0 <8a> 10 40 38 f2 75 ee 48 8b 54 24 08 65 48 33 14 25 28
00 00 00
> [ 6191.252253] RIP [<ffffffff8134beea>] strchr+0x24/0x45
> [ 6191.252253] RSP <ffff8801771b9d98>
> [ 6191.252253] CR2: 0000000000000040
> [ 6191.258135] ---[ end trace f7a4d612974b9b43 ]---

Hmm. It very likely died in selinux_write_opts.
That is the only function that does the strchr in that code path -

static void selinux_write_opts(struct seq_file *m,
1012 struct security_mnt_opts *opts)
1013 {
1014 int i;
1015 char *prefix;
1016
1017 for (i = 0; i < opts->num_mnt_opts; i++) {
1018 char *has_comma;
1019
1020 if (opts->mnt_opts[i])
1021 has_comma = strchr(opts->mnt_opts[i], ',');
^^^^^^^^^^^^^^^^^^^^^^^^^
And that is a NULL pointer dereference - but we just checked for
opts->mnt_opts[i] for not NULL.

Were there other problems before this trace.
The Tainted: G seems to suggest that?

Parag

2009-07-12 20:26:35

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> static void selinux_write_opts(struct seq_file *m,
> 1012 struct security_mnt_opts *opts)
> 1013 {
> 1014 int i;
> 1015 char *prefix;
> 1016
> 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> 1018 char *has_comma;
> 1019
> 1020 if (opts->mnt_opts[i])
> 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> And that is a NULL pointer dereference - but we just checked for
> opts->mnt_opts[i] for not NULL.

Note, that there is not a NULL dereference. It dereferences 0x40 which
came in as %rdi. Looks like somebody assigned garbage in there.

Or a single bit mem error. Is memtest OK with this machine?

What warning tainted the kernel before this oops is still interesting...

2009-07-12 21:57:20

by Eric Paris

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On Sun, 2009-07-12 at 22:26 +0200, Jiri Slaby wrote:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?
>
> What warning tainted the kernel before this oops is still interesting...

I just looked over the selinux code where we build the
security_mnt_opts. We can do a 0 length kmalloc, but that should hurt
aything. I should probably not be doing any allocations and leaving the
opts->mnt_opts and opts->mnt_opts_flags == NULL, but 0x40 !=
ZERO_SIZE_PTR(0x10) nor is the security_mnt_opts structure anywhere near
large enough to hit an offset of 0x40.....

I really think I'd like to see any previous BUG/WARN messages you got
and like Jiri said, see if memtest86+ runs cleanly....

-Eric

2009-07-12 22:03:11

by Parag Warudkar

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On Sun, Jul 12, 2009 at 4:26 PM, Jiri Slaby<[email protected]> wrote:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
>> static void selinux_write_opts(struct seq_file *m,
>> 1012                                struct security_mnt_opts *opts)
>> 1013 {
>> 1014         int i;
>> 1015         char *prefix;
>> 1016
>> 1017         for (i = 0; i < opts->num_mnt_opts; i++) {
>> 1018                 char *has_comma;
>> 1019
>> 1020                 if (opts->mnt_opts[i])
>> 1021                         has_comma = strchr(opts->mnt_opts[i], ',');
>>                                          ^^^^^^^^^^^^^^^^^^^^^^^^^
>> And that is a NULL pointer dereference - but we just checked for
>> opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.

Ah right - anything < PAGE_SIZE is "NULL" dereference to the kernel.

Thanks

2009-07-13 17:00:19

by Thomas Meyer

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?

Will schedule a memtest run.

>
> What warning tainted the kernel before this oops is still interesting...

There seems to be no OOPS before that BUG. Does a WARNING also set the G
flag? Something like that:

Jul 12 16:42:34 localhost kernel: [ 6179.647004] ------------[ cut here ]------------
Jul 12 16:42:34 localhost kernel: [ 6179.647004] WARNING: at lib/kref.c:43 kref_get+0x2f/0x4c()
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Hardware name: MS-7250
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Modules linked in: rndis_wlan floppy olympic forcedeth [last unloaded: scsi_wait_scan]
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Pid: 27483, comm: pm-suspend Tainted: G W 2.6.31-rc2 #79
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Call Trace:
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8106c1c6>] ? warn_slowpath_common+0x84/0xb2
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8134876b>] ? kref_get+0x2f/0x4c
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff813476b2>] ? kobject_get+0x26/0x44
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a344f>] ? cpufreq_cpu_get+0x74/0xca
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a3675>] ? cpufreq_suspend+0x5f/0x16f
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81044a4f>] ? disable_local_APIC+0x2d/0x6d
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81449b54>] ? sysdev_suspend+0xa7/0x28d
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a1229>] ? suspend_devices_and_enter+0x11c/0x1e6
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a13d8>] ? enter_state+0xe5/0x14c
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a0841>] ? state_store+0xc1/0xf6
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81178bb8>] ? sysfs_write_file+0xe6/0x137
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117d03>] ? vfs_write+0xb4/0x126
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117e5d>] ? sys_write+0x55/0x90
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
Jul 12 16:42:34 localhost kernel: [ 6179.647004] ---[ end trace f7a4d612974b9b3e ]---

Btw this seems also to be new in 2.6.31-rc2:
Jul 12 16:42:34 localhost kernel: [ 6187.256748] sd 2:0:0:0: [sdb] Starting disk
Jul 12 16:42:34 localhost kernel: [ 6190.318012]
Jul 12 16:42:34 localhost kernel: [ 6190.318128] floppy driver state
Jul 12 16:42:34 localhost kernel: [ 6190.318244] -------------------
Jul 12 16:42:34 localhost kernel: [ 6190.320684] now=4300857616 last interrupt=4294760495 diff=6097121 last called handler=ffffffffa00263ea
Jul 12 16:42:34 localhost kernel: [ 6190.320896] timeout_message=lock fdc
Jul 12 16:42:34 localhost kernel: [ 6190.321018] last output bytes:
Jul 12 16:42:34 localhost kernel: [ 6190.321135] f 80 4294754488
Jul 12 16:42:34 localhost kernel: [ 6190.321251] 0 90 4294754489
Jul 12 16:42:34 localhost kernel: [ 6190.321367] 1 91 4294754489
Jul 12 16:42:34 localhost kernel: [ 6190.321484] 8 81 4294754493
Jul 12 16:42:34 localhost kernel: [ 6190.321600] 4 80 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321717] 0 90 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321833] f 80 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321949] 0 90 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.322073] 0 91 4294756490
Jul 12 16:42:34 localhost kernel: [ 6190.322189] 8 81 4294756494
Jul 12 16:42:34 localhost kernel: [ 6190.322305] f 80 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322421] 0 90 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322538] 1 91 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322654] 8 81 4294758496
Jul 12 16:42:34 localhost kernel: [ 6190.322770] 4 80 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.322886] 0 90 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323010] f 80 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323126] 0 90 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323243] 0 91 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323359] 8 81 4294760495
Jul 12 16:42:34 localhost kernel: [ 6190.323475] last result at 4294760495
Jul 12 16:42:34 localhost kernel: [ 6190.323591] last redo_fd_request at 4296147775
Jul 12 16:42:34 localhost kernel: [ 6190.323708] 20 0
Jul 12 16:42:34 localhost kernel: [ 6190.323869] status=4e
Jul 12 16:42:34 localhost kernel: [ 6190.323984] fdc_busy=1
Jul 12 16:42:34 localhost kernel: [ 6190.324107] do_floppy=ffffffffa0022263
Jul 12 16:42:34 localhost kernel: [ 6190.324224] cont=ffffffffa002d190
Jul 12 16:42:34 localhost kernel: [ 6190.324340] current_req=(null)
Jul 12 16:42:34 localhost kernel: [ 6190.324456] command_status=-1
Jul 12 16:42:34 localhost kernel: [ 6190.324572]
Jul 12 16:42:34 localhost kernel: [ 6190.324689] floppy0: floppy timeout called
Jul 12 16:42:34 localhost kernel: [ 6190.365498] PM: resume devices took 10.626 seconds

greets
thomas

2009-07-13 17:45:11

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On 07/13/2009 07:00 PM, Thomas Meyer wrote:
>> What warning tainted the kernel before this oops is still interesting...
>
> There seems to be no OOPS before that BUG. Does a WARNING also set the G
> flag?

No, W is set by warning. G means no proprietary modules loaded.

> Something like that:
>
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] ------------[ cut here ]------------
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] WARNING: at lib/kref.c:43 kref_get+0x2f/0x4c()

It tried to get a reference of a structure which was already "freed".
Its reference count already had lowered to 0.

> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Hardware name: MS-7250
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Modules linked in: rndis_wlan floppy olympic forcedeth [last unloaded: scsi_wait_scan]
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Pid: 27483, comm: pm-suspend Tainted: G W 2.6.31-rc2 #79
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Call Trace:
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8106c1c6>] ? warn_slowpath_common+0x84/0xb2
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8134876b>] ? kref_get+0x2f/0x4c
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff813476b2>] ? kobject_get+0x26/0x44
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a344f>] ? cpufreq_cpu_get+0x74/0xca
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a3675>] ? cpufreq_suspend+0x5f/0x16f
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81044a4f>] ? disable_local_APIC+0x2d/0x6d
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81449b54>] ? sysdev_suspend+0xa7/0x28d
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a1229>] ? suspend_devices_and_enter+0x11c/0x1e6
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a13d8>] ? enter_state+0xe5/0x14c
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a0841>] ? state_store+0xc1/0xf6
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81178bb8>] ? sysfs_write_file+0xe6/0x137
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117d03>] ? vfs_write+0xb4/0x126
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117e5d>] ? sys_write+0x55/0x90
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] ---[ end trace f7a4d612974b9b3e ]---
>
> Btw this seems also to be new in 2.6.31-rc2:

This is unrelated I think. I met it many times.

> Jul 12 16:42:34 localhost kernel: [ 6187.256748] sd 2:0:0:0: [sdb] Starting disk
> Jul 12 16:42:34 localhost kernel: [ 6190.318012]
> Jul 12 16:42:34 localhost kernel: [ 6190.318128] floppy driver state
> Jul 12 16:42:34 localhost kernel: [ 6190.318244] -------------------
> Jul 12 16:42:34 localhost kernel: [ 6190.320684] now=4300857616 last interrupt=4294760495 diff=6097121 last called handler=ffffffffa00263ea
> Jul 12 16:42:34 localhost kernel: [ 6190.320896] timeout_message=lock fdc

2009-07-20 17:24:01

by Thomas Meyer

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?
Yes it ran fine for one cycle (about 2:15 hours).

Maybe memory in acpi S3 is not so stable? Is this possible?

>
> What warning tainted the kernel before this oops is still interesting...

2009-07-22 20:28:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On Monday 20 July 2009, Thomas Meyer wrote:
> Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> > On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > > static void selinux_write_opts(struct seq_file *m,
> > > 1012 struct security_mnt_opts *opts)
> > > 1013 {
> > > 1014 int i;
> > > 1015 char *prefix;
> > > 1016
> > > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > > 1018 char *has_comma;
> > > 1019
> > > 1020 if (opts->mnt_opts[i])
> > > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > > And that is a NULL pointer dereference - but we just checked for
> > > opts->mnt_opts[i] for not NULL.
> >
> > Note, that there is not a NULL dereference. It dereferences 0x40 which
> > came in as %rdi. Looks like somebody assigned garbage in there.
> >
> > Or a single bit mem error. Is memtest OK with this machine?
> Yes it ran fine for one cycle (about 2:15 hours).
>
> Maybe memory in acpi S3 is not so stable? Is this possible?

Generally, it is, but I haven't seen it happen yet. In theory, if the memory
chips are not refreshed appropriately while suspended, something like this may
happen.

Best,
Rafael

2009-07-24 11:03:38

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference

On 07/22/2009 10:28 PM, Rafael J. Wysocki wrote:
> On Monday 20 July 2009, Thomas Meyer wrote:
>> Maybe memory in acpi S3 is not so stable? Is this possible?
>
> Generally, it is, but I haven't seen it happen yet. In theory, if the memory
> chips are not refreshed appropriately while suspended, something like this may
> happen.

You can check it by a program from:
http://lkml.org/lkml/2008/4/24/439

Only change 1000000000 to circa 5/6 of your memory size. Suspend the
machine after "alloced" appears on the console.