2008-06-04 15:48:45

by Dave Jones

Subject: 2.6.25 md oops during boot.

Hi Neil,
Here's an odd one.
https://bugzilla.redhat.com/show_bug.cgi?id=442204

Slightly old (.25-rc8-git7), but I don't recall anything changing between
then and .25 final that could explain this.

Dave

BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [<c04bbb65>] sysfs_addrm_start+0x21/0x7d
Oops: 0000 [#1] SMP
Modules linked in: e1000(+) i2c_piix4 i2c_core sg ac button sr_mod cdrom
ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: scsi_wait_scan]

Pid: 742, comm: mdadm Not tainted (2.6.25-0.218.rc8.git7.fc9.i686 #1)
EIP: 0060:[<c04bbb65>] EFLAGS: 00010246 CPU: 0
EIP is at sysfs_addrm_start+0x21/0x7d
EAX: c04bbc23 EBX: 00000000 ECX: 00000000 EDX: 00000062
ESI: f6ecdc84 EDI: f6ecdc94 EBP: f6ecdc78 ESP: f6ecdc6c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mdadm (pid: 742, ti=f6ecd000 task=f6e22e80 task.ti=f6ecd000)
Stack: f6ecdc84 f6e75bd0 fffffff4 f6ecdca0 c04bbfb7 00000000 00000000 00000000
00000000 00000000 f7853dc4 fffffffe f7af6824 f6ecdcb4 c04bc01c f6ecdcac
c04f08f2 f7853dc4 f6ecdcd0 c04f09ec f7af6824 f6ecdcd0 00000000 f7853dc4
Call Trace:
[<c04bbfb7>] ? create_dir+0x3a/0x72
[<c04bc01c>] ? sysfs_create_dir+0x2d/0x41
[<c04f08f2>] ? kobject_get+0x12/0x17
[<c04f09ec>] ? kobject_add_internal+0xa4/0x145
[<c04f0b21>] ? kobject_add_varg+0x35/0x41
[<c04f0b92>] ? kobject_add+0x43/0x49
[<c05948aa>] ? bind_rdev_to_array+0x124/0x1ab
[<c04cf900>] ? task_has_capability+0x47/0x76
[<c04f4f94>] ? copy_from_user+0x39/0x121
[<c05994d1>] ? md_ioctl+0xf75/0x183b
[<c04cfde2>] ? inode_has_perm+0x5b/0x65
[<c0490ce5>] ? d_free+0x3b/0x4d
[<c0492094>] ? dput+0x34/0xee
[<c04efb51>] ? _atomic_dec_and_lock+0x29/0x44
[<c04958b8>] ? mntput_no_expire+0x16/0x69
[<c0488796>] ? path_put+0x20/0x23
[<c048ab1b>] ? __link_path_walk+0xce7/0xcfc
[<c04e831d>] ? blkdev_driver_ioctl+0x49/0x5b
[<c04e8ab0>] ? blkdev_ioctl+0x781/0x79d
[<c04d06a8>] ? selinux_file_free_security+0x14/0x16
[<c04cfde2>] ? inode_has_perm+0x5b/0x65
[<c04d01b9>] ? file_has_perm+0x7c/0x85
[<c04a241b>] ? block_ioctl+0x16/0x1b
[<c04a2405>] ? block_ioctl+0x0/0x1b
[<c048c99e>] ? vfs_ioctl+0x22/0x69
[<c048cc1e>] ? do_vfs_ioctl+0x239/0x24c
[<c04d034f>] ? selinux_file_ioctl+0xa8/0xab
[<c048cc71>] ? sys_ioctl+0x40/0x5b
[<c0405cf6>] ? syscall_call+0x7/0xb
=======================
Code: 89 31 8d 65 f4 5b 5e 5f 5d c3 55 b9 04 00 00 00 89 e5 57 89 c7 56 89 c6 53
31 c0 89 d3 f3 ab b8 34 fc 71 c0 89 16 e8 1f e2 16 00 <8b> 53 20 b9 f0 b7 4b c0
a1 88 f4 83 c0 53 e8 43 71 fd ff 5f 85
EIP: [<c04bbb65>] sysfs_addrm_start+0x21/0x7d SS:ESP 0068:f6ecdc6c
---[ end trace ca143223eefdc828 ]---


--
http://www.codemonkey.org.uk


2008-06-04 23:12:21

by NeilBrown

Subject: Re: 2.6.25 md oops during boot.

On Wednesday June 4, [email protected] wrote:
> Hi Neil,
> Here's an odd one.
> https://bugzilla.redhat.com/show_bug.cgi?id=442204
>
> Slightly old (.25-rc8-git7), but I don't recall anything changing between
> then and .25 final that could explain this.
>
> Dave

Hi Dave.

Yes, odd.

It appears that sysfs_addrm_start is being called with parent_sd == NULL.

That implies that sysfs_create_dir is being given a kobj where
->parent is non-NULL, and ->parent->sd is NULL.

So kobject_add is being given a parent with a NULL ->sd.

So in bind_rdev_to_array, mddev->kobj.sd is NULL.

So in md_probe, either kobject_init_and_add is failing to set up ->sd
properly (which should result in the error message
"md: cannot register md0/md - name in use"), or alloc_disk is failing.


The most likely scenario is that alloc_disk is failing, so the
md_probe call in autorun_devices (line 3804 of md.c) fails.
The following mddev_find creates a new mddev which is not properly
initialised and gets used.

I wouldn't say this is a likely scenario as it requires (I think)
kmalloc failure very early in boot. But I cannot see any other
possible cause.

I'll see about getting the error paths handled better.

NeilBrown

2008-06-04 23:30:09

by NeilBrown

Subject: Re: 2.6.25 md oops during boot.

On Thursday June 5, [email protected] wrote:
>
> I wouldn't say this is a likely scenario as it requires (I think)
> kmalloc failure very early in boot. But I cannot see any other
> possible cause.

On closer inspection, I can see another possible cause. I don't think
it is likely (yet) but it might be possible.

If two threads enter md_probe for the same mddev, the second one to
get disks_mutex could return before the first has called
kobject_init_and_add, and so could make available an mddev whose
kobj.sd is still NULL.

I cannot imagine how two threads could be doing that so early in boot,
but I cannot rule it out.

This (untested) patch should close both these possible problems.

NeilBrown


-----------------
Fix error paths if md_probe fails.

md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it always returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded. This means checking
that mddev->gendisk is non-NULL.

Also there is a possible race: if two threads call md_probe
for the same device, then one could exit (having checked that
->gendisk exists) before the other has called kobject_init_and_add,
thus returning an incomplete kobj which causes problems when
we try to add children to it.

So extend the range of protection of disks_mutex slightly to
avoid this possibility.

Cc: Dave Jones <[email protected]>
Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/md.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2008-06-03 16:35:41.000000000 +1000
+++ ./drivers/md/md.c 2008-06-05 09:19:56.000000000 +1000
@@ -3363,9 +3363,9 @@ static struct kobject *md_probe(dev_t de
disk->queue = mddev->queue;
add_disk(disk);
mddev->gendisk = disk;
- mutex_unlock(&disks_mutex);
error = kobject_init_and_add(&mddev->kobj, &md_ktype, &disk->dev.kobj,
"%s", "md");
+ mutex_unlock(&disks_mutex);
if (error)
printk(KERN_WARNING "md: cannot register %s/md - name in use\n",
disk->disk_name);
@@ -3935,8 +3935,10 @@ static void autorun_devices(int part)

md_probe(dev, NULL, NULL);
mddev = mddev_find(dev);
- if (!mddev) {
- printk(KERN_ERR
+ if (!mddev || !mddev->gendisk) {
+ if (mddev)
+ mddev_put(mddev);
+ printk(KERN_ERR
"md: cannot allocate memory for md drive.\n");
break;
}