2008-02-15 22:27:35

by Kiyoshi Ueda

[permalink] [raw]
Subject: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

This patch adds ->complete_io() hook for request stacking.
Request stacking drivers (such as request-based dm) can set
a callback for completion.
(The hook is not called in blk_end_io(), since request-based dm uses
it for clone completion in the following appendix patches.)

For the stacking to work without deadlock between stacking devices,
both the submission function and the completion function must be
called without the queue lock.
So the stacking is not available for __blk_end_request(), which
is called with the queue lock held.
The patch adds a queue flag (QUEUE_FLAG_STACKABLE) and a check to
make sure that the stacking is not enabled in drivers calling
__blk_end_request().

It means that only scsi mid-layer, cciss and i2o are ready for
the stacking.
I believe current dm-multipath users are using scsi, cciss and dasd.
So at least, dasd needs to be changed not to use __blk_end_request()
for request-based dm-multipath.



Below is the detailed explanation why the stacking is not possible
for drivers using __blk_end_request().

If the completion function is called with the queue lock held,
we have 2 deadlock problems:
a). when the stacking driver completes the hooked request
b). when the stacking driver submits other requests in that
completion context

Suppose we have the following stack:
________________________
| |
| stacking device: DEV#2 |
|________________________|
________________________
| |
| real device: DEV#1 |
|________________________|

For example of a):
1. The device driver takes the queue lock for DEV#1
2. The device driver calls __blk_end_request() for the request
and it is hooked by the stacking driver
3. The stacking driver tries to take the queue lock for DEV#1
to complete the hooked request
=> deadlock happens on the queue lock for DEV#1

For example of b): Assume the a) is worked-around by something
1. The device driver takes the queue lock for DEV#1
2. The device driver calls __blk_end_request() for the request
and it is hooked by the stacking driver
3. The stacking driver completes the hooked request
4. The stacking driver dequeues the next request from DEV#2
to dispatch quickly due to some performance reasons
5. The stacking driver tries to take the queue lock for DEV#1
to submit the next request
=> deadlock happens on the queue lock for DEV#1

To prevent such deadlock problems on the stacking, I'd like to say
that drivers which hold the queue lock when completing request are
not ready for the stacking.
So drivers using __blk_end_request() (and end_[queued|dequeued_]request())
need to be modified in this request stacking design, if we need to
use such devices for the stacking.

To prevent request stacking drivers from using such unstackable
devices, a queue flag, QUEUE_FLAG_STACKABLE, is added.
And device drivers can set it to be used by request stacking drivers.
(scsi patch is included just for an example of the setting.)
To detect wrong use of the flag and the hook, some checks are also
put in __blk_end_request().


Signed-off-by: Kiyoshi Ueda <[email protected]>
Signed-off-by: Jun'ichi Nomura <[email protected]>
---
block/blk-core.c | 31 +++++++++++++++++++++++++++++++
drivers/scsi/scsi_lib.c | 7 +++++++
include/linux/blkdev.h | 7 +++++++
3 files changed, 45 insertions(+)

Index: 2.6.25-rc1/block/blk-core.c
===================================================================
--- 2.6.25-rc1.orig/block/blk-core.c
+++ 2.6.25-rc1/block/blk-core.c
@@ -138,6 +138,7 @@ void rq_init(struct request_queue *q, st
rq->data = NULL;
rq->sense = NULL;
rq->end_io = NULL;
+ rq->complete_io = NULL;
rq->end_io_data = NULL;
rq->next_rq = NULL;
}
@@ -1917,6 +1918,9 @@ static int blk_end_io(struct request *rq
**/
int blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
{
+ if (rq->complete_io)
+ return rq->complete_io(rq, error, nr_bytes, 0, NULL);
+
return blk_end_io(rq, error, nr_bytes, 0, NULL);
}
EXPORT_SYMBOL_GPL(blk_end_request);
@@ -1936,6 +1940,27 @@ EXPORT_SYMBOL_GPL(blk_end_request);
**/
int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
{
+ if (unlikely(blk_queue_stackable(rq->q)))
+ printk(KERN_WARNING "dev: %s: Not ready for request stacking, "
+ "but the device driver set it stackable. "
+ "Need to fix the device driver!\n",
+ rq->rq_disk ? rq->rq_disk->disk_name : "?");
+
+ if (unlikely(rq->complete_io)) {
+ /*
+ * If we invoke the ->complete_io here, the request submitter's
+ * handler would get deadlock on the queue lock.
+ *
+ * This happens, when the request submitter didn't check
+ * whether the queue is stackable or the device driver marked
+ * the queue stackable.
+ * Need to fix the request submitter or the device driver.
+ */
+ printk(KERN_ERR "The device driver isn't ready for "
+ "request stacking.\n");
+ BUG();
+ }
+
if (blk_fs_request(rq) || blk_pc_request(rq)) {
if (__end_that_request_first(rq, error, nr_bytes))
return 1;
@@ -1966,6 +1991,9 @@ EXPORT_SYMBOL_GPL(__blk_end_request);
int blk_end_bidi_request(struct request *rq, int error, unsigned int nr_bytes,
unsigned int bidi_bytes)
{
+ if (rq->complete_io)
+ return rq->complete_io(rq, error, nr_bytes, bidi_bytes, NULL);
+
return blk_end_io(rq, error, nr_bytes, bidi_bytes, NULL);
}
EXPORT_SYMBOL_GPL(blk_end_bidi_request);
@@ -1999,6 +2027,9 @@ int blk_end_request_callback(struct requ
unsigned int nr_bytes,
int (drv_callback)(struct request *))
{
+ if (rq->complete_io)
+ return rq->complete_io(rq, error, nr_bytes, 0, drv_callback);
+
return blk_end_io(rq, error, nr_bytes, 0, drv_callback);
}
EXPORT_SYMBOL_GPL(blk_end_request_callback);
Index: 2.6.25-rc1/include/linux/blkdev.h
===================================================================
--- 2.6.25-rc1.orig/include/linux/blkdev.h
+++ 2.6.25-rc1/include/linux/blkdev.h
@@ -42,6 +42,9 @@ void copy_io_context(struct io_context *

struct request;
typedef void (rq_end_io_fn)(struct request *, int);
+typedef int (rq_complete_io_fn)(struct request *rq, int error,
+ unsigned int nr_bytes, unsigned int bidi_bytes,
+ int (drv_callback)(struct request *));

struct request_list {
int count[2];
@@ -228,6 +231,7 @@ struct request {
* completion callback.
*/
rq_end_io_fn *end_io;
+ rq_complete_io_fn *complete_io;
void *end_io_data;

/* for bidi */
@@ -401,6 +405,7 @@ struct request_queue
#define QUEUE_FLAG_PLUGGED 7 /* queue is plugged */
#define QUEUE_FLAG_ELVSWITCH 8 /* don't use elevator, just do FIFO */
#define QUEUE_FLAG_BIDI 9 /* queue supports bidi requests */
+#define QUEUE_FLAG_STACKABLE 10 /* queue supports request stacking */

enum {
/*
@@ -446,6 +451,8 @@ enum {
#define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
#define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
#define blk_queue_flushing(q) ((q)->ordseq)
+#define blk_queue_stackable(q) \
+ test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)

#define blk_fs_request(rq) ((rq)->cmd_type == REQ_TYPE_FS)
#define blk_pc_request(rq) ((rq)->cmd_type == REQ_TYPE_BLOCK_PC)
Index: 2.6.25-rc1/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.25-rc1.orig/drivers/scsi/scsi_lib.c
+++ 2.6.25-rc1/drivers/scsi/scsi_lib.c
@@ -1597,6 +1597,13 @@ struct request_queue *__scsi_alloc_queue
*/
blk_queue_dma_alignment(q, 0x03);

+ /*
+ * SCSI is ready for request stacking, since it doesn't hold
+ * queue lock when completing request.
+ * (It doesn't use __blk_end_request().)
+ */
+ set_bit(QUEUE_FLAG_STACKABLE, &q->queue_flags);
+
return q;
}
EXPORT_SYMBOL(__scsi_alloc_queue);


2008-03-03 16:25:09

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

Hi Kiyoshi,

Kiyoshi Ueda wrote:
> This patch adds ->complete_io() hook for request stacking.
> Request stacking drivers (such as request-based dm) can set
> a callback for completion.
> (The hook is not called in blk_end_io(), since request-based dm uses
> it for clone completion in the following appendix patches.)
>
[ .. ]
I would rather have rq->complete_io() to be pointing to blk_end_io in the
default case, this way rq->complete_io() would always be valid and we
would be saving us the if() clause.

> Index: 2.6.25-rc1/block/blk-core.c
> ===================================================================
> --- 2.6.25-rc1.orig/block/blk-core.c
> +++ 2.6.25-rc1/block/blk-core.c
> @@ -138,6 +138,7 @@ void rq_init(struct request_queue *q, st
> rq->data = NULL;
> rq->sense = NULL;
> rq->end_io = NULL;
> + rq->complete_io = NULL;
rq->complete_io = blk_end_io;

> rq->end_io_data = NULL;
> rq->next_rq = NULL;
> }
> @@ -1917,6 +1918,9 @@ static int blk_end_io(struct request *rq
> **/
> int blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
> {
> + if (rq->complete_io)
> + return rq->complete_io(rq, error, nr_bytes, 0, NULL);
> +
> return blk_end_io(rq, error, nr_bytes, 0, NULL);
> }
So when using my proposal this would just become:

{
BUG_ON(!rq->complete_io);
return rq->complete_io(rq, error, nr_bytes, 0, NULL);
}

> EXPORT_SYMBOL_GPL(blk_end_request);
> @@ -1936,6 +1940,27 @@ EXPORT_SYMBOL_GPL(blk_end_request);
> **/
> int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
> {
> + if (unlikely(blk_queue_stackable(rq->q)))
> + printk(KERN_WARNING "dev: %s: Not ready for request stacking, "
> + "but the device driver set it stackable. "
> + "Need to fix the device driver!\n",
> + rq->rq_disk ? rq->rq_disk->disk_name : "?");
> +
> + if (unlikely(rq->complete_io)) {
> + /*
> + * If we invoke the ->complete_io here, the request submitter's
> + * handler would get deadlock on the queue lock.
> + *
> + * This happens, when the request submitter didn't check
> + * whether the queue is stackable or the device driver marked
> + * the queue stackable.
> + * Need to fix the request submitter or the device driver.
> + */
> + printk(KERN_ERR "The device driver isn't ready for "
> + "request stacking.\n");
> + BUG();
> + }
> +
This could be skipped entirely then.

> if (blk_fs_request(rq) || blk_pc_request(rq)) {
> if (__end_that_request_first(rq, error, nr_bytes))
> return 1;
> @@ -1966,6 +1991,9 @@ EXPORT_SYMBOL_GPL(__blk_end_request);
> int blk_end_bidi_request(struct request *rq, int error, unsigned int nr_bytes,
> unsigned int bidi_bytes)
> {
> + if (rq->complete_io)
> + return rq->complete_io(rq, error, nr_bytes, bidi_bytes, NULL);
> +
> return blk_end_io(rq, error, nr_bytes, bidi_bytes, NULL);
> }
return rq->complete_io(rq, error, nr_bytes, bidi_bytes, NULL);

> EXPORT_SYMBOL_GPL(blk_end_bidi_request);
> @@ -1999,6 +2027,9 @@ int blk_end_request_callback(struct requ
> unsigned int nr_bytes,
> int (drv_callback)(struct request *))
> {
> + if (rq->complete_io)
> + return rq->complete_io(rq, error, nr_bytes, 0, drv_callback);
> +
> return blk_end_io(rq, error, nr_bytes, 0, drv_callback);
return rq->complete_io(rq, error, nr_bytes, bidi_bytes, NULL);

> }
> EXPORT_SYMBOL_GPL(blk_end_request_callback);
> Index: 2.6.25-rc1/include/linux/blkdev.h
> ===================================================================
> --- 2.6.25-rc1.orig/include/linux/blkdev.h
> +++ 2.6.25-rc1/include/linux/blkdev.h
> @@ -42,6 +42,9 @@ void copy_io_context(struct io_context *
>
> struct request;
> typedef void (rq_end_io_fn)(struct request *, int);
> +typedef int (rq_complete_io_fn)(struct request *rq, int error,
> + unsigned int nr_bytes, unsigned int bidi_bytes,
> + int (drv_callback)(struct request *));
>
> struct request_list {
> int count[2];
> @@ -228,6 +231,7 @@ struct request {
> * completion callback.
> */
> rq_end_io_fn *end_io;
> + rq_complete_io_fn *complete_io;
> void *end_io_data;
>
> /* for bidi */
> @@ -401,6 +405,7 @@ struct request_queue
> #define QUEUE_FLAG_PLUGGED 7 /* queue is plugged */
> #define QUEUE_FLAG_ELVSWITCH 8 /* don't use elevator, just do FIFO */
> #define QUEUE_FLAG_BIDI 9 /* queue supports bidi requests */
> +#define QUEUE_FLAG_STACKABLE 10 /* queue supports request stacking */
>
> enum {
> /*
> @@ -446,6 +451,8 @@ enum {
> #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
> #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
> #define blk_queue_flushing(q) ((q)->ordseq)
> +#define blk_queue_stackable(q) \
> + test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
>
> #define blk_fs_request(rq) ((rq)->cmd_type == REQ_TYPE_FS)
> #define blk_pc_request(rq) ((rq)->cmd_type == REQ_TYPE_BLOCK_PC)
> Index: 2.6.25-rc1/drivers/scsi/scsi_lib.c
> ===================================================================
> --- 2.6.25-rc1.orig/drivers/scsi/scsi_lib.c
> +++ 2.6.25-rc1/drivers/scsi/scsi_lib.c
> @@ -1597,6 +1597,13 @@ struct request_queue *__scsi_alloc_queue
> */
> blk_queue_dma_alignment(q, 0x03);
>
> + /*
> + * SCSI is ready for request stacking, since it doesn't hold
> + * queue lock when completing request.
> + * (It doesn't use __blk_end_request().)
> + */
> + set_bit(QUEUE_FLAG_STACKABLE, &q->queue_flags);
> +
> return q;
> }
> EXPORT_SYMBOL(__scsi_alloc_queue);
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Apart from this: Great work!

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Markus Rex, HRB 16746 (AG N?rnberg)

2008-03-03 17:40:32

by Grant Grundler

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

On Mon, Mar 3, 2008 at 8:24 AM, Hannes Reinecke <[email protected]> wrote:

> I would rather have rq->complete_io() to be pointing to blk_end_io in the
> default case, this way rq->complete_io() would always be valid and we
> would be saving us the if() clause.

This is a good idea. But...

...
> So when using my proposal this would just become:
>
> {
> BUG_ON(!rq->complete_io);
>
> return rq->complete_io(rq, error, nr_bytes, 0, NULL);
> }

This "BUG_ON" is also an "if()" clause except
it will panic. The box will panic if the function
pointer is a null pointer and it won't be hard to
sort out why. I suggest omitting the BUG_ON.

thanks,
grant

2008-03-03 17:50:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

On Mon, Mar 03 2008, Hannes Reinecke wrote:
> Hi Kiyoshi,
>
> Kiyoshi Ueda wrote:
> > This patch adds ->complete_io() hook for request stacking.
> > Request stacking drivers (such as request-based dm) can set
> > a callback for completion.
> > (The hook is not called in blk_end_io(), since request-based dm uses
> > it for clone completion in the following appendix patches.)
> >
> [ .. ]
> I would rather have rq->complete_io() to be pointing to blk_end_io in the
> default case, this way rq->complete_io() would always be valid and we
> would be saving us the if() clause.

The whole point of this ->complete_io() thread is that we cannot use the
above approach, before all drivers are converted from using
__blk_end_request() to blk_end_request(). And that requires some work.

--
Jens Axboe

2008-03-03 19:04:56

by Kiyoshi Ueda

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

Hi Hannes,

On Mon, 03 Mar 2008 17:24:50 +0100, Hannes Reinecke wrote:
> Hi Kiyoshi,
>
> Kiyoshi Ueda wrote:
> > This patch adds ->complete_io() hook for request stacking.
> > Request stacking drivers (such as request-based dm) can set
> > a callback for completion.
> > (The hook is not called in blk_end_io(), since request-based dm uses
> > it for clone completion in the following appendix patches.)
> >
> [ .. ]
> I would rather have rq->complete_io() to be pointing to blk_end_io in the
> default case, this way rq->complete_io() would always be valid and we
> would be saving us the if() clause.

Thank you for the comment.

I'm thinking about an idea come from Mike and Jens after the session
at LSF'08:
o stacking driver clones both bio and request and uses
bio->bi_end_io and rq->end_io
o stacking driver uses blk_complete_request() in rq->end_io
so that stacking driver can work without queue lock
With this idea, we don't need to add the new hook to request,
and we can use request stacking on drivers using __blk_end_request().

Currently, I'm trying to convert request-based dm to use the idea above.
I'll post the new version as a RFC in the near future.

Thanks,
Kiyoshi Ueda

2008-03-04 08:10:24

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

Hi Kiyoshi,

Kiyoshi Ueda wrote:
> Hi Hannes,
>
> On Mon, 03 Mar 2008 17:24:50 +0100, Hannes Reinecke wrote:
>> Hi Kiyoshi,
>>
>> Kiyoshi Ueda wrote:
>>> This patch adds ->complete_io() hook for request stacking.
>>> Request stacking drivers (such as request-based dm) can set
>>> a callback for completion.
>>> (The hook is not called in blk_end_io(), since request-based dm uses
>>> it for clone completion in the following appendix patches.)
>>>
>> [ .. ]
>> I would rather have rq->complete_io() to be pointing to blk_end_io in the
>> default case, this way rq->complete_io() would always be valid and we
>> would be saving us the if() clause.
>
> Thank you for the comment.
>
> I'm thinking about an idea come from Mike and Jens after the session
> at LSF'08:
> o stacking driver clones both bio and request and uses
> bio->bi_end_io and rq->end_io
> o stacking driver uses blk_complete_request() in rq->end_io
> so that stacking driver can work without queue lock
> With this idea, we don't need to add the new hook to request,
> and we can use request stacking on drivers using __blk_end_request().
>
Huh? I can't really imagine how this should work. If you start using
bio->bi_end_io() for stacking usage you'll end up with doing weird thing
with the bios ... Can you elaborate a bit more here?

> Currently, I'm trying to convert request-based dm to use the idea above.
> I'll post the new version as a RFC in the near future.
>
Oh, cool. Looking forward to it.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Markus Rex, HRB 16746 (AG N?rnberg)

2008-03-04 20:03:20

by Junichi Nomura

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] block: add rq->complete_io hook for request stacking

Hi Hannes,

Hannes Reinecke wrote:
> Kiyoshi Ueda wrote:
>> I'm thinking about an idea come from Mike and Jens after the session
>> at LSF'08:
>> o stacking driver clones both bio and request and uses
>> bio->bi_end_io and rq->end_io
>> o stacking driver uses blk_complete_request() in rq->end_io
>> so that stacking driver can work without queue lock
>> With this idea, we don't need to add the new hook to request,
>> and we can use request stacking on drivers using __blk_end_request().
>>
> Huh? I can't really imagine how this should work. If you start using
> bio->bi_end_io() for stacking usage you'll end up with doing weird thing
> with the bios ... Can you elaborate a bit more here?

Cloned bio's bi_end_io() can check the error.

For successful I/O, it calls something equivalent to
end_that_request_first() for the original request,
that may eventually complete the original bio and
notify the submitter.

For error, it can decide not to complete the original bio
and record the error information.

When the clone request is finally completed and rq->end_io() is
called, stacking driver can check the recorded error information
and decide whether it should take any action or just complete
the original request by blk_complete_request().

Thanks,
--
Jun'ichi Nomura, NEC Corporation of America