2003-09-10 20:48:57

by spam

[permalink] [raw]
Subject: 2.6.0-test5: Oops on mount /dev/md0

While experimenting with RAID-1 on two loopback devices, I got a
kernel Oops. The RAID-1 volume contains an ext2 filesystem, which I
can mount just fine. However, trying to mount it after turning off the
RAID device with raidstop (which should fail of course), results in a
segmentation fault and kernel Oops. Something is stuck after this,
causing "sync" to hang. This is the scenario:

# cat /etc/raidtab
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 4
persistent-superblock 1
device /dev/loop1
raid-disk 0
device /dev/loop2
raid-disk 1
# losetup /dev/loop1 /tmp/img1
# losetup /dev/loop2 /tmp/img2
# raidstart /dev/md0
# raidstop /dev/md0
# mount -t ext2 /dev/md0 /mnt
Segmentation fault

>From /var/log/messages:

Sep 10 21:45:42 zaphod kernel: md: autorun ...
Sep 10 21:45:42 zaphod kernel: md: considering loop2 ...
Sep 10 21:45:42 zaphod kernel: md: adding loop2 ...
Sep 10 21:45:42 zaphod kernel: md: adding loop1 ...
Sep 10 21:45:42 zaphod kernel: md: created md0
Sep 10 21:45:42 zaphod kernel: md: bind<loop1>
Sep 10 21:45:42 zaphod kernel: md: bind<loop2>
Sep 10 21:45:42 zaphod kernel: md: running: <loop2><loop1>
Sep 10 21:45:42 zaphod kernel: raid1: raid set md0 active with 2 out of 2 mirrors
Sep 10 21:45:42 zaphod kernel: md: ... autorun DONE.
Sep 10 21:45:42 zaphod kernel: md: syncing RAID array md0
Sep 10 21:45:42 zaphod kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 10 21:45:42 zaphod kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Sep 10 21:45:42 zaphod kernel: md: using 128k window, over a total of 10176 blocks.
Sep 10 21:45:45 zaphod kernel: md: md0: sync done.
Sep 10 21:45:48 zaphod kernel: md: md0 stopped.
Sep 10 21:45:48 zaphod kernel: md: unbind<loop2>
Sep 10 21:45:48 zaphod kernel: md: export_rdev(loop2)
Sep 10 21:45:48 zaphod kernel: md: unbind<loop1>
Sep 10 21:45:48 zaphod kernel: md: export_rdev(loop1)
Sep 10 21:45:57 zaphod kernel: printing eip:
Sep 10 21:45:57 zaphod kernel: c03a7747
Sep 10 21:45:57 zaphod kernel: Oops: 0000 [#1]
Sep 10 21:45:57 zaphod kernel: CPU: 1
Sep 10 21:45:57 zaphod kernel: EIP: 0060:[make_request+23/752] Not tainted
Sep 10 21:45:57 zaphod kernel: EFLAGS: 00010296
Sep 10 21:45:57 zaphod kernel: EIP is at make_request+0x17/0x2f0
Sep 10 21:45:57 zaphod kernel: eax: dbce76e0 ebx: dbce1600 ecx: dbfd3580 edx: 00000400
Sep 10 21:45:57 zaphod kernel: esi: 00000002 edi: 00000000 ebp: 00000002 esp: d61e9d5c
Sep 10 21:45:57 zaphod kernel: ds: 007b es: 007b ss: 0068
Sep 10 21:45:57 zaphod kernel: Process mount (pid: 1053, threadinfo=d61e8000 task=d750d350)
Sep 10 21:45:57 zaphod kernel: Stack: dbfde4a0 00000200 00000000 00000000 dbce76e0 c0120750 d61e9d94 d61e9d94
Sep 10 21:45:57 zaphod kernel: d6bbcf10 00000000 00000000 00000000 d750d350 dbce1600 00000002 d5b41a20
Sep 10 21:45:57 zaphod kernel: 00000002 c02d190f dbce1600 d5b41a20 00000000 00000000 00000010 c01641d7
Sep 10 21:45:57 zaphod kernel: Call Trace:
Sep 10 21:45:57 zaphod kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Sep 10 21:45:57 zaphod kernel: [generic_make_request+367/496] generic_make_request+0x16f/0x1f0
Sep 10 21:45:57 zaphod kernel: [bio_alloc+215/448] bio_alloc+0xd7/0x1c0
Sep 10 21:45:57 zaphod kernel: [submit_bh+161/512] submit_bh+0xa1/0x200
Sep 10 21:45:57 zaphod kernel: [submit_bio+84/160] submit_bio+0x54/0xa0
Sep 10 21:45:57 zaphod kernel: [__bread_slow+91/176] __bread_slow+0x5b/0xb0
Sep 10 21:45:57 zaphod kernel: [__bread+62/80] __bread+0x3e/0x50
Sep 10 21:45:57 zaphod kernel: [ext2_fill_super+175/2336] ext2_fill_super+0xaf/0x920
Sep 10 21:45:57 zaphod kernel: [disk_name+177/192] disk_name+0xb1/0xc0
Sep 10 21:45:57 zaphod kernel: [sb_set_blocksize+37/96] sb_set_blocksize+0x25/0x60
Sep 10 21:45:57 zaphod kernel: [get_sb_bdev+296/352] get_sb_bdev+0x128/0x160
Sep 10 21:45:57 zaphod kernel: [alloc_vfsmnt+133/192] alloc_vfsmnt+0x85/0xc0
Sep 10 21:45:57 zaphod kernel: [ext2_get_sb+47/64] ext2_get_sb+0x2f/0x40
Sep 10 21:45:57 zaphod kernel: [ext2_fill_super+0/2336] ext2_fill_super+0x0/0x920
Sep 10 21:45:57 zaphod kernel: [do_kern_mount+95/224] do_kern_mount+0x5f/0xe0
Sep 10 21:45:57 zaphod kernel: [do_add_mount+160/432] do_add_mount+0xa0/0x1b0
Sep 10 21:45:57 zaphod kernel: [do_mount+352/432] do_mount+0x160/0x1b0
Sep 10 21:45:57 zaphod kernel: [copy_mount_options+268/272] copy_mount_options+0x10c/0x110
Sep 10 21:45:57 zaphod kernel: [sys_mount+242/368] sys_mount+0xf2/0x170
Sep 10 21:45:57 zaphod kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Sep 10 21:45:57 zaphod kernel:
Sep 10 21:45:57 zaphod kernel: Code: 8b 47 08 89 44 24 0c fa bb 00 e0 ff ff 21 e3 ff 43 14 f0 fe

This is a dual Pentium-III kernel with the following RAID options:

CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_BLK_DEV_DM=y

The mount version is 2.11z.

--
Dick Streefland //// De Bilt
[email protected] (@ @) The Netherlands
------------------------------oOO--(_)--OOo------------------


2003-09-12 05:08:40

by NeilBrown

[permalink] [raw]
Subject: Re: 2.6.0-test5: Oops on mount /dev/md0

On Wednesday September 10, [email protected] wrote:
> While experimenting with RAID-1 on two loopback devices, I got a
> kernel Oops. The RAID-1 volume contains an ext2 filesystem, which I
> can mount just fine. However, trying to mount it after turning off the
> RAID device with raidstop (which should fail of course), results in a
> segmentation fault and kernel Oops. Something is stuck after this,
> causing "sync" to hang. This is the scenario:

Yeh... thanks for the report.
This patch should fix it, and another case where an array that is
really stopped accepts io requests and crashes.

NeilBrown

=========================================
Don't setup make_request_fn for md array until *after* it has been start.

Also revert to md_fail_request before stopping an array.

The ->stop method can never fail, so there is not point checking it.

----------- Diffstat output ------------
./drivers/md/md.c | 15 ++++++---------
1 files changed, 6 insertions(+), 9 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2003-09-12 14:44:39.000000000 +1000
+++ ./drivers/md/md.c 2003-09-12 15:05:18.000000000 +1000
@@ -1607,9 +1607,6 @@ static int do_md_run(mddev_t * mddev)
mddev->pers = pers[pnum];
spin_unlock(&pers_lock);

- blk_queue_make_request(mddev->queue, mddev->pers->make_request);
- mddev->queue->queuedata = mddev;
-
err = mddev->pers->run(mddev);
if (err) {
printk(KERN_ERR "md: pers->run() failed ...\n");
@@ -1627,6 +1624,10 @@ static int do_md_run(mddev_t * mddev)
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
set_capacity(disk, mddev->array_size<<1);
+
+ blk_queue_make_request(mddev->queue, mddev->pers->make_request);
+ mddev->queue->queuedata = mddev;
+
return 0;
}

@@ -1698,12 +1699,8 @@ static int do_md_stop(mddev_t * mddev, i
} else {
if (mddev->ro)
set_disk_ro(disk, 0);
- if (mddev->pers->stop(mddev)) {
- err = -EBUSY;
- if (mddev->ro)
- set_disk_ro(disk, 1);
- goto out;
- }
+ blk_queue_make_request(mddev->queue, md_fail_request);
+ mddev->pers->stop(mddev);
module_put(mddev->pers->owner);
mddev->pers = NULL;
if (mddev->ro)

2003-09-13 09:31:22

by Dick Streefland

[permalink] [raw]
Subject: Re: 2.6.0-test5: Oops on mount /dev/md0

On Friday 2003-09-12 15:08, Neil Brown wrote:
| Yeh... thanks for the report.
| This patch should fix it, and another case where an array that is
| really stopped accepts io requests and crashes.

Yes, the patch works. Thanks.

--
Dick Streefland //// De Bilt
[email protected] (@ @) The Netherlands
------------------------------oOO--(_)--OOo------------------