2008-01-25 13:39:44

by Max Kellermann

[permalink] [raw]
Subject: 2.6.23.1: oops in diskstats_show()

Hi,

my kernel 2.6.23.1 oopsed today in diskstats_show(), leaving the
block_subsys_lock mutex locked. I have an Athlon 64 X2 (dual-core),
architecture x86_64. I have not tried with 2.6.24 yet, but it looks
like there was no relevant change in 2.6.24.

Right before the crash, I was torturing cryptsetup to experiment with
encryption - I created and deleted device mapper objects in a loop.

The NULL dereference happened inside "disk_stat_read(gp,
time_in_queue)", because the percpu pointer gendisk.dkstats was NULL
for the second CPU core (CPU 1). From reading the code, I could not
see why this would happen - maybe the object is freed on one CPU while
diskstats_show() runs on the other CPU? It looks like this condition
is not protected by a mutex/spinlock, but I do not know the code well
enough to estimate if this is required.

Anyway, here is the dmesg output, and my kernel config is attached.
Sadly, the kernel is tainted by the nvidia module.

Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP:
[<ffffffff803c7aa6>] diskstats_show+0x4d/0x40c
PGD 775ec067 PUD 775ed067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: nvidia(P)
Pid: 3509, comm: collectd Tainted: P 2.6.23.1-rabbit #1
RIP: 0010:[<ffffffff803c7aa6>] [<ffffffff803c7aa6>] diskstats_show+0x4d/0x40c
RSP: 0018:ffff810077943d88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000800 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffffffff807fa770 RDI: 0000000000000001
RBP: ffff81006779b600 R08: 0000000000000000 R09: 0000000100349020
R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000021e
R13: 00000000000003d9 R14: 0000000000000027 R15: 00002aaaaad01027
FS: 0000000041802960(0063) GS:ffff81007ffa2100(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000048 CR3: 00000000775eb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process collectd (pid: 3509, threadinfo ffff810077942000, task ffff810079311590)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 000000004799bc12 000000003773d45e
Call Trace:
[<ffffffff8028ec5c>] mntput_no_expire+0x1c/0x78
[<ffffffff80293291>] seq_read+0x1b8/0x28f
[<ffffffff802ae5fb>] proc_reg_read+0x7e/0x98
[<ffffffff8027be73>] vfs_read+0xaa/0x132
[<ffffffff8027c20f>] sys_read+0x45/0x6e
[<ffffffff8020bdee>] system_call+0x7e/0x83


Code: 48 03 58 48 e8 c8 59 00 00 89 c1 83 f9 01 7e d6 48 89 df 31
RIP [<ffffffff803c7aa6>] diskstats_show+0x4d/0x40c
RSP <ffff810077943d88>
CR2: 0000000000000048


Other interesting values:

(gdb) p cpu_possible_map
$2 = {bits = {4294901760}}
(gdb) p block_subsys_lock
$2 = {count = {counter = -1}, wait_lock = {raw_lock = {slock = 1}},
wait_list = {next = 0xffff81004f131df0,
prev = 0xffff81003b1bbe20}}
(gdb) p block_subsys
$1 = {ktype = 0xffffffff80756a80, list = {next = 0xffff810037ea6e90,
prev = 0xffff81007fc9a690}, list_lock = {raw_lock = {
slock = 1}}, kobj = {k_name = 0xffffffff807569e8 "block", name =
"block", '\0' <repeats 14 times>, kref = {refcount = {
counter = 35}}, entry = {next = 0xffffffff80756a00, prev =
0xffffffff80756a00}, parent = 0x0, kset = 0x0, ktype = 0x0,
sd = 0xffff810037aa39a0, poll = {lock = {raw_lock = {slock = 1}},
task_list = {next = 0xffffffff80756a38,
prev = 0xffffffff80756a38}}}, uevent_ops = 0xffffffff80756ac0}


This is the relevant assembly snippet:

0xffffffff803c7a77 <diskstats_show+30>: callq 0xffffffff803c2ddf
<disk_round_stats>
0xffffffff803c7a7c <diskstats_show+35>: mov
$0xffffffff807fa770,%rdi
0xffffffff803c7a83 <diskstats_show+42>: callq 0xffffffff803cd458
<__first_cpu>
0xffffffff803c7a88 <diskstats_show+47>: mov %eax,%ecx
0xffffffff803c7a8a <diskstats_show+49>: jmp 0xffffffff803c7ab1
<diskstats_show+88>
0xffffffff803c7a8c <diskstats_show+51>: mov 0x108(%rbp),%rax
0xffffffff803c7a93 <diskstats_show+58>: movslq %ecx,%rdx
0xffffffff803c7a96 <diskstats_show+61>: mov %ecx,%edi
0xffffffff803c7a98 <diskstats_show+63>: mov
$0xffffffff807fa770,%rsi
0xffffffff803c7a9f <diskstats_show+70>: not %rax
0xffffffff803c7aa2 <diskstats_show+73>: mov (%rax,%rdx,8),%rax
0xffffffff803c7aa6 <diskstats_show+77>: add 0x48(%rax),%rbx
0xffffffff803c7aaa <diskstats_show+81>: callq 0xffffffff803cd477
<__next_cpu>
0xffffffff803c7aaf <diskstats_show+86>: mov %eax,%ecx
0xffffffff803c7ab1 <diskstats_show+88>: cmp $0x1,%ecx
0xffffffff803c7ab4 <diskstats_show+91>: jle 0xffffffff803c7a8c
<diskstats_show+51>


Max


Attachments:
(No filename) (4.60 kB)
config-2.6.23.1-rabbit (33.42 kB)
Download all attachments

2008-01-29 22:23:59

by Rick Lindsley

[permalink] [raw]
Subject: Re: 2.6.23.1: oops in diskstats_show()

my kernel 2.6.23.1 oopsed today in diskstats_show(), leaving the
block_subsys_lock mutex locked. I have an Athlon 64 X2 (dual-core),
architecture x86_64. I have not tried with 2.6.24 yet, but it looks
like there was no relevant change in 2.6.24.

Hmm. Yes, this should not happen. Having grabbed the block_subsys_lock
mutex should have been sufficient to keep the disk list stable. And the
evidence you sent along suggests that yes, the gendisk structure may have
gotten cleared after you got a reference to it but before you could get
the data from it.

Right before the crash, I was torturing cryptsetup to experiment with
encryption - I created and deleted device mapper objects in a loop.

Creating and deleting block devices ... yeah, this is probably key to
the problem. Can you provide the command or, more likely, script you
were using to do this?

Rick

2008-01-29 22:32:45

by Max Kellermann

[permalink] [raw]
Subject: Re: 2.6.23.1: oops in diskstats_show()

On 2008/01/29 23:22, Rick Lindsley <[email protected]> wrote:
> Right before the crash, I was torturing cryptsetup to experiment with
> encryption - I created and deleted device mapper objects in a loop.
>
> Creating and deleting block devices ... yeah, this is probably key to
> the problem. Can you provide the command or, more likely, script you
> were using to do this?

Hi Rick,

it was a very simple script, just the following two commands over and
over, nothing more:

"cryptsetup create foo /dev/sdc6",
"cryptsetup remove foo"

This is my workstation, I have been working at the same time (emacs,
gcc, ...), nothing special.

Max