2000-11-29 12:35:29

by Martin Schwidefsky

[permalink] [raw]
Subject: plug problem in linux-2.4.0-test11



Hi,
I experienced disk hangs with linux-2.4.0-test11 on S/390 and after
some debugging I found the cause. It the new method of unplugging
block devices that doesn't go along with the S/390 disk driver:

/*
* remove the plug and let it rip..
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
if (!list_empty(&q->queue_head)) {
q->plugged = 0;
q->request_fn(q);
}
}

The story goes like this: at start the request queue was empty but
the disk driver was still working on an older, already dequeued
request. Someone plugged the device (q->plugged = 1 && queue on
tq_disk). Then a new request arrived and the unplugging was
started. But before __generic_unplug_device was reached the
outstanding request finished. The bottom half of the S/390 disk
drivers always checks for queued requests after an interrupt,
starts the first and dequeues some of the requests on the
request queue to put them on its internal queue. You could argue
that it shouldn't dequeue request if q->plugged == 1. On the other
hand why not, before the disk has nothing to do. Anyway the result
was that when the unplug routine was finally reached list_empty
was true. In that case q->plugged will not be cleared! The device
stays plugged. Forever.

The following implementation works:

/*
* remove the plug and let it rip..
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
q->plugged = 0;
if (!list_empty(&q->queue_head))
q->request_fn(q);
}

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Sch?naicherstr. 220, D-71032 B?blingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]



2000-11-29 14:53:55

by Jens Axboe

[permalink] [raw]
Subject: Re: plug problem in linux-2.4.0-test11

On Wed, Nov 29 2000, [email protected] wrote:
> request queue to put them on its internal queue. You could argue
> that it shouldn't dequeue request if q->plugged == 1. On the other
> hand why not, before the disk has nothing to do. Anyway the result

I agree with your reasoning, even if the s390 behaviour is a bit
"non-standard" wrt block devices. Linus, could you apply?

--- drivers/block/ll_rw_blk.c~ Wed Nov 29 15:17:33 2000
+++ drivers/block/ll_rw_blk.c Wed Nov 29 15:18:43 2000
@@ -347,10 +347,9 @@
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
- if (!list_empty(&q->queue_head)) {
- q->plugged = 0;
+ q->plugged = 0;
+ if (!list_empty(&q->queue_head))
q->request_fn(q);
- }
}

static void generic_unplug_device(void *data)

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-11-29 18:23:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: plug problem in linux-2.4.0-test11



On Wed, 29 Nov 2000, Jens Axboe wrote:
>
> I agree with your reasoning, even if the s390 behaviour is a bit
> "non-standard" wrt block devices. Linus, could you apply?
>
> --- drivers/block/ll_rw_blk.c~ Wed Nov 29 15:17:33 2000
> +++ drivers/block/ll_rw_blk.c Wed Nov 29 15:18:43 2000
> @@ -347,10 +347,9 @@
> */
> static inline void __generic_unplug_device(request_queue_t *q)
> {
> - if (!list_empty(&q->queue_head)) {
> - q->plugged = 0;
> + q->plugged = 0;
> + if (!list_empty(&q->queue_head))
> q->request_fn(q);
> - }
> }

I would much rather actually go back to the original setup, which did
nothing at all if the queue wasn't plugged in the first place.

I think that we should strive for a setup that calls "request_fn" only to
start new IO, and that expects the low-level driver to be able to do the
whole request queue until it is empty. Then we re-start it the next time
around.

Linus

2000-11-29 18:43:02

by Jens Axboe

[permalink] [raw]
Subject: Re: plug problem in linux-2.4.0-test11

On Wed, Nov 29 2000, Linus Torvalds wrote:
> I would much rather actually go back to the original setup, which did
> nothing at all if the queue wasn't plugged in the first place.

But that potentially breaks devices that either don't use plugging
or alternatively implement their own, because q->plugged will not
get set and the unplug from __get_request_wait does nothing. It's
clear that the plug/noplug is too coarse. Anyway, I don't think
there's currently a problem with the old setup for any in-kernel
drivers so I'm fine with reverting to that behaviour for now.

> I think that we should strive for a setup that calls "request_fn" only to
> start new IO, and that expects the low-level driver to be able to do the
> whole request queue until it is empty. Then we re-start it the next time
> around.

Yes agreed.

--- drivers/block/ll_rw_blk.c~ Wed Nov 29 15:17:33 2000
+++ drivers/block/ll_rw_blk.c Wed Nov 29 19:04:50 2000
@@ -347,9 +347,10 @@
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
- if (!list_empty(&q->queue_head)) {
+ if (q->plugged) {
q->plugged = 0;
- q->request_fn(q);
+ if (!list_empty(&q->queue_head))
+ q->request_fn(q);
}
}

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-12-27 20:35:10

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: plug problem in linux-2.4.0-test11


On Wed, Nov 29, 2000 at 12:56:44PM +0100, [email protected] wrote:
>
>
> Hi,
> I experienced disk hangs with linux-2.4.0-test11 on S/390 and after
> some debugging I found the cause. It the new method of unplugging
> block devices that doesn't go along with the S/390 disk driver:
>
> /*
> * remove the plug and let it rip..
> */
> static inline void __generic_unplug_device(request_queue_t *q)
> {
> if (!list_empty(&q->queue_head)) {
> q->plugged = 0;
> q->request_fn(q);
> }
> }
>
> The story goes like this: at start the request queue was empty but
> the disk driver was still working on an older, already dequeued
> request. Someone plugged the device (q->plugged = 1 && queue on
> tq_disk). Then a new request arrived and the unplugging was
> started. But before __generic_unplug_device was reached the
> outstanding request finished. The bottom half of the S/390 disk
> drivers always checks for queued requests after an interrupt,
> starts the first and dequeues some of the requests on the
> request queue to put them on its internal queue. You could argue
> that it shouldn't dequeue request if q->plugged == 1. On the other

Yes I argue exactly that ;). That's a bug in the driver. For example if the
driver is an headactive device (q->headactive == 1) then the bug will also
corrupt memory (probably it isn't an headactive device because you said it
moves the request into its private lists). Otherwise it only forbids the
elevator to merge requests and optimze seeks away.

I think right behaviour of the blkdev layer is to BUG() if the driver eats
requests while the device is plugged.

Andrea

2000-12-28 14:17:54

by Jens Axboe

[permalink] [raw]
Subject: Re: plug problem in linux-2.4.0-test11

On Wed, Dec 27 2000, Andrea Arcangeli wrote:
> I think right behaviour of the blkdev layer is to BUG() if the driver eats
> requests while the device is plugged.

The device is supposed to know what it's doing. Sure it defeats the
elevators work a bit, but again the driver should know best. Besides, it
doesn't seem worthwhile to go to any lengths detecting this.

But in general I agree, device request_fn should never touch a plugged
queue.

--
* Jens Axboe <[email protected]>
* SuSE Labs