2003-06-21 21:35:11

by Lou Langholtz

Subject: [PATCH] nbd driver for 2.5+: fix for module removal & new block device layer

diff -urN linux-2.5.72/drivers/block/nbd.c linux-2.5.72-patched/drivers/block/nbd.c
--- linux-2.5.72/drivers/block/nbd.c 2003-06-16 22:19:44.000000000 -0600
+++ linux-2.5.72-patched/drivers/block/nbd.c 2003-06-21 15:30:17.860967573 -0600
@@ -28,6 +28,7 @@
* the transmit lock. <[email protected]>
* 02-10-11 Allow hung xmit to be aborted via SIGKILL & various fixes.
* <[email protected]> <[email protected]>
+ * 03-06-21 Fix memory corruption from module removal. <[email protected]>
*
* possible FIXME: make set_sock / set_blksize / set_size / do_it one syscall
* why not: would need verify_area and friends, would share yet another
@@ -63,7 +64,18 @@

static struct nbd_device nbd_dev[MAX_NBD];

-static spinlock_t nbd_lock = SPIN_LOCK_UNLOCKED;
+/*
+ * Have these per nbd_device as is now required by the new block layer.
+ * This also helps prevent I/O bottlenecks between multiple nbd_devices,
+ * resulting in better overall response times!
+ *
+ * ldl: Keep these out from nbd_device for now till the whole 2.5 blockdev
+ * reworking shakes out. Who knows... maybe struct request_queue's
+ * queue_lock field will someday actually be the spinlock instead of just
+ * being a pointer to it!
+ */
+static struct request_queue nbd_queue[MAX_NBD];
+static spinlock_t nbd_lock[MAX_NBD];

#define DEBUG( s )
/* #define DEBUG( s ) printk( s )
@@ -538,8 +550,6 @@
* (Just smiley confuses emacs :-)
*/

-static struct request_queue nbd_queue;
-
static int __init nbd_init(void)
{
int err = -ENOMEM;
@@ -551,6 +561,11 @@
}

for (i = 0; i < MAX_NBD; i++) {
+ nbd_lock[i] = SPIN_LOCK_UNLOCKED;
+ blk_init_queue(&nbd_queue[i], do_nbd_request, &nbd_lock[i]);
+ }
+
+ for (i = 0; i < MAX_NBD; i++) {
struct gendisk *disk = alloc_disk(1);
if (!disk)
goto out;
@@ -564,7 +579,6 @@
#ifdef MODULE
printk("nbd: registered device at major %d\n", NBD_MAJOR);
#endif
- blk_init_queue(&nbd_queue, do_nbd_request, &nbd_lock);
devfs_mk_dir("nbd");
for (i = 0; i < MAX_NBD; i++) {
struct gendisk *disk = nbd_dev[i].disk;
@@ -582,7 +596,7 @@
disk->first_minor = i;
disk->fops = &nbd_fops;
disk->private_data = &nbd_dev[i];
- disk->queue = &nbd_queue;
+ disk->queue = &nbd_queue[i];
sprintf(disk->disk_name, "nbd%d", i);
sprintf(disk->devfs_name, "nbd/%d", i);
set_capacity(disk, 0x3ffffe);
@@ -602,9 +616,9 @@
for (i = 0; i < MAX_NBD; i++) {
del_gendisk(nbd_dev[i].disk);
put_disk(nbd_dev[i].disk);
+ blk_cleanup_queue(&nbd_queue[i]);
}
devfs_remove("nbd");
- blk_cleanup_queue(&nbd_queue);
unregister_blkdev(NBD_MAJOR, "nbd");
}


Attachments:
nbd-patch1 (2.57 kB)

2003-06-21 22:40:59

by Al Viro

Subject: Re: [PATCH] nbd driver for 2.5+: fix for module removal & new block device layer

On Sat, Jun 21, 2003 at 03:48:56PM -0600, Lou Langholtz wrote:
> This patch prevents memory corruption from "rmmod nbd" with the existing
> 2.5.72 (and earlier) nbd driver. It does this by updating the nbd driver
> to the new block layer requirement that every disk has its own
> request_queue structure. This is the first of a series of patchlets
> designed to break down the essential changes I proposed in my last
> "enormous" patch. Note that another patchlet will make the whole
> allocation of per nbd_device memory be dynamic (rather than statically
> tied to MAX_NBD). Please try out this patch and let me know how nbd is
> working for you before versus after. With any luck, some of these
> smaller patch breakdowns can actually see their way into new kernel
> releases. Thanks.

a) you don't have to have a queue per device. It often does make
sense (for nbd it's almost certainly a win), but it's not required.

b) you definitely don't have to use separate queue locks. The
thing will work fine with spinlock being shared and I doubt that there
will be any noticeable extra contention.

c) please, make allocation of queue dynamic _and_ separate from
any other allocated objects.
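
As a rough sketch only (not part of the posted patch, and untested), (b)
and (c) together might look something like this against the 2.5.72-style
three-argument blk_init_queue() used above; the kmalloc-per-queue layout
and the omitted error unwinding are assumptions here, not a worked patch:

/* Sketch: one queue kmalloc'ed per device, all sharing the one nbd_lock. */
static spinlock_t nbd_lock = SPIN_LOCK_UNLOCKED;

	/* in nbd_init(), once the gendisks have been allocated: */
	for (i = 0; i < MAX_NBD; i++) {
		struct request_queue *q = kmalloc(sizeof(*q), GFP_KERNEL);
		if (!q)
			goto out;	/* would need to unwind earlier queues */
		memset(q, 0, sizeof(*q));
		blk_init_queue(q, do_nbd_request, &nbd_lock);
		nbd_dev[i].disk->queue = q;
	}

	/* and in nbd_cleanup(), per device, before put_disk(): */
	blk_cleanup_queue(nbd_dev[i].disk->queue);
	kfree(nbd_dev[i].disk->queue);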

2003-06-21 23:02:13

by Lou Langholtz

Subject: Re: [PATCH] nbd driver for 2.5+: fix for module removal & new block device layer

[email protected] wrote:

>On Sat, Jun 21, 2003 at 03:48:56PM -0600, Lou Langholtz wrote:
>
>
>>This patch prevents memory corruption from "rmmod nbd" with the existing
>>2.5.72 (and earlier) nbd driver. It does this by updating the nbd driver
>>to the new block layer requirement that every disk has its own
>>request_queue structure. This is the first of a series of patchlets
>>designed to break down the essential changes I proposed in my last
>>"enormous" patch. Note that another patchlet will make the whole
>>allocation of per nbd_device memory be dynamic (rather than statically
>>tied to MAX_NBD). Please try out this patch and let me know how nbd is
>>working for you before versus after. With any luck, some of these
>>smaller patch breakdowns can actually see their way into new kernel
>>releases. Thanks.
>>
>>
> a) you don't have to have a queue per device. It often does make
>sense (for nbd it's almost certainly a win), but it's not required.
>
Unfortunately, the way the sysfs code is implemented for struct gendisk,
it actually is the case that every gendisk has to have its own
request_queue, or else dcache.c BUGs and memory gets corrupted when you
put and release the disks (the pre-patch layout is sketched after the
oops below):

Jun 21 12:59:07 cyprus kernel: nbd: registered device at major 43
Jun 21 12:59:25 cyprus kernel: ------------[ cut here ]------------
Jun 21 12:59:25 cyprus kernel: kernel BUG at include/linux/dcache.h:273!
Jun 21 12:59:25 cyprus kernel: invalid operand: 0000 [#1]
Jun 21 12:59:25 cyprus kernel: CPU: 0
Jun 21 12:59:25 cyprus kernel: EIP: 0060:[<c017849d>] Tainted: GF
Jun 21 12:59:25 cyprus kernel: EFLAGS: 00210246
Jun 21 12:59:25 cyprus kernel: EIP is at sysfs_remove_dir+0x1d/0x13f
Jun 21 12:59:25 cyprus kernel: eax: 00000000 ebx: e0947ac4 ecx: c026b6e0 edx: 00000000
Jun 21 12:59:25 cyprus kernel: esi: 00000078 edi: db10b900 ebp: d1f2fee0 esp: d1f2fec8
Jun 21 12:59:25 cyprus kernel: ds: 007b es: 007b ss: 0068
Jun 21 12:59:25 cyprus kernel: Process rmmod (pid: 3794, threadinfo=d1f2e000 task=cf2bb300)
Jun 21 12:59:25 cyprus kernel: Stack: d4a45d00 dfda21c0 dfda21e8 e0947ac4 00000078 d0c10344 d1f2fef8 c0205873
Jun 21 12:59:25 cyprus kernel:        e0947ac4 c0450280 e0947ac4 e0947ac4 d1f2ff08 c02058b4 e0947ac4 d0c10280
Jun 21 12:59:25 cyprus kernel:        d1f2ff18 c02673ee e0947ac4 d0c10280 d1f2ff2c c026b224 d0c10280 d0c10280
Jun 21 12:59:25 cyprus kernel: Call Trace:
Jun 21 12:59:25 cyprus kernel: [<e0947ac4>] nbd_queue+0x44/0x180 [nbd]
Jun 21 12:59:25 cyprus kernel: [<c0205873>] kobject_del+0x43/0x70
Jun 21 12:59:25 cyprus kernel: [<e0947ac4>] nbd_queue+0x44/0x180 [nbd]
Jun 21 12:59:25 cyprus last message repeated 2 times
Jun 21 12:59:25 cyprus kernel: [<c02058b4>] kobject_unregister+0x14/0x20
Jun 21 12:59:25 cyprus kernel: [<e0947ac4>] nbd_queue+0x44/0x180 [nbd]
Jun 21 12:59:25 cyprus kernel: [<c02673ee>] elv_unregister_queue+0x1e/0x40
Jun 21 12:59:25 cyprus kernel: [<e0947ac4>] nbd_queue+0x44/0x180 [nbd]
Jun 21 12:59:25 cyprus kernel: [<c026b224>] unlink_gendisk+0x14/0x40
Jun 21 12:59:25 cyprus kernel: [<c0177301>] del_gendisk+0x61/0xe0
Jun 21 12:59:25 cyprus kernel: [<e0945a5c>] nbd_dev+0x1dc/0x2200 [nbd]
Jun 21 12:59:25 cyprus kernel: [<e0944c5d>] +0x1d/0x60 [nbd]
Jun 21 12:59:25 cyprus kernel: [<e0947c00>] +0x0/0x140 [nbd]
Jun 21 12:59:25 cyprus kernel: [<c012ce9e>] sys_delete_module+0x12e/0x170
Jun 21 12:59:25 cyprus kernel: [<e0947c00>] +0x0/0x140 [nbd]
Jun 21 12:59:25 cyprus kernel: [<c013e5c7>] sys_munmap+0x57/0x80
Jun 21 12:59:25 cyprus kernel: [<c010ac4d>] sysenter_past_esp+0x52/0x71
Jun 21 12:59:25 cyprus kernel:
Jun 21 12:59:25 cyprus kernel: Code: 0f 0b 11 01 3b fa 3a c0 ff 07 85 ff 0f 84 08 01 00 00 8b 47
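
For reference, here is the pre-patch layout that produces this trace,
reconstructed from the lines the patch above removes, with the teardown
path annotated from the call trace:

/* 2.5.72 nbd.c before the patch: one static queue shared by every gendisk. */
static struct request_queue nbd_queue;

	/* nbd_init(): */
	blk_init_queue(&nbd_queue, do_nbd_request, &nbd_lock);
	...
		disk->queue = &nbd_queue;	/* same queue for all MAX_NBD disks */

	/*
	 * On rmmod, del_gendisk() -> unlink_gendisk() -> elv_unregister_queue()
	 * runs once per disk against that one shared queue, so the second pass
	 * removes sysfs entries that are already gone and trips the
	 * dcache.h:273 BUG above.
	 */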

> b) you definitely don't have to use separate queue locks. The
>thing will work fine with spinlock being shared and I doubt that there
>will be any noticeable extra contention.
>
Probably not noticeable, no.

> c) please, make allocation of queue dynamic _and_ separate from
>any other allocated objects.
>
>
Will do!

2003-06-22 09:53:05

by Jens Axboe

Subject: Re: [PATCH] nbd driver for 2.5+: fix for module removal & new block device layer

On Sat, Jun 21 2003, Lou Langholtz wrote:
> > b) you definitely don't have to use separate queue locks. The
> >thing will work fine with spinlock being shared and I doubt that there
> >will be any noticeable extra contention.
> >
> Probably not noticeable, no.

The approach you took is probably _worse_ than a single lock, since you
don't even cache-align the locks. I'd say just keep the single nbd_lock
and use that in all queues; separate locks are a questionable win but do
take up extra space.
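
Just to make the cache-alignment point concrete (a sketch, not a
suggestion to actually do this), per-device locks would only stand a
chance of helping if each one were padded out to its own cache line,
e.g. something like:

/* Each lock padded to a cache line; costs SMP_CACHE_BYTES per device
 * instead of sizeof(spinlock_t). */
static struct {
	spinlock_t lock;
} ____cacheline_aligned nbd_lock[MAX_NBD];

	blk_init_queue(&nbd_queue[i], do_nbd_request, &nbd_lock[i].lock);

whereas the posted patch packs plain spinlock_t's into one array, so
neighbouring devices' locks share cache lines anyway.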

--
Jens Axboe