2011-03-22 19:48:02

by Christoph Hellwig

Subject: merging discard request in the block layer

It seems the current block layer will happily try to merge discard
requests that were split because they are at the max that bi_size
can hold together again. At least that's what the

blk: request botched

messages make me believe when testing XFS code that allows multiple
asynchronous discard requests, unlike the current blkdev_issue_discard,
which always waits for one before starting the next.

I tried this little snippet to prevent it:

Index: xfs/block/blk-merge.c
===================================================================
--- xfs.orig/block/blk-merge.c 2011-03-22 13:07:24.733857580 +0100
+++ xfs/block/blk-merge.c 2011-03-22 13:08:17.448856577 +0100
@@ -373,7 +373,7 @@ static int attempt_merge(struct request_
/*
* Don't merge file system requests and discard requests
*/
- if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
+ if ((req->cmd_flags & REQ_DISCARD) || (next->cmd_flags & REQ_DISCARD))
return 0;

/*

but it has no effect. Using the big hammer and bypassing the whole
I/O scheduler logic, on the other hand, works fine:

Index: xfs/block/blk-core.c
===================================================================
--- xfs.orig/block/blk-core.c 2011-03-22 13:07:24.717855861 +0100
+++ xfs/block/blk-core.c 2011-03-22 14:56:13.424856289 +0100
@@ -1218,7 +1218,7 @@ static int __make_request(struct request

spin_lock_irq(q->queue_lock);

- if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
+ if (bio->bi_rw & (REQ_FLUSH | REQ_FUA | REQ_DISCARD)) {
where = ELEVATOR_INSERT_FRONT;
goto get_rq;
}
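
To make the difference between the two checks in the blk-merge.c hunk concrete, here is a minimal standalone sketch. The flag value and function names are made up for illustration and are not the kernel's; only the two boolean expressions are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bit for illustration; not the kernel's real REQ_DISCARD value. */
#define SKETCH_REQ_DISCARD (1u << 7)

/* Original check: refuse the merge only when exactly one side is a discard. */
static inline bool refuse_if_mixed(unsigned int req_flags, unsigned int next_flags)
{
	return (req_flags & SKETCH_REQ_DISCARD) != (next_flags & SKETCH_REQ_DISCARD);
}

/* Patched check: refuse the merge as soon as either side is a discard. */
static inline bool refuse_if_any_discard(unsigned int req_flags, unsigned int next_flags)
{
	return (req_flags & SKETCH_REQ_DISCARD) || (next_flags & SKETCH_REQ_DISCARD);
}
```

With two discard requests the original expression is false, so the merge goes ahead, while the patched expression is true and the merge is refused. For two non-discard requests both expressions are false.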


2011-03-22 19:54:14

by Jens Axboe

Subject: Re: merging discard request in the block layer

On 2011-03-22 20:47, Christoph Hellwig wrote:
> It seems the current block layer will happily try to merge discard
> requests that were split because they are at the max that bi_size
> can hold together again. At least that's what the
>
> blk: request botched

That would seem to indicate a bug in the merging logic instead.

> messages make me believe when testing XFS code that allows multiple
> asynchronous discard requests, unlike the current blkdev_issue_discard,
> which always waits for one before starting the next.
>
> I tried this little snippet to prevent it:
>
> Index: xfs/block/blk-merge.c
> ===================================================================
> --- xfs.orig/block/blk-merge.c 2011-03-22 13:07:24.733857580 +0100
> +++ xfs/block/blk-merge.c 2011-03-22 13:08:17.448856577 +0100
> @@ -373,7 +373,7 @@ static int attempt_merge(struct request_
> /*
> * Don't merge file system requests and discard requests
> */
> - if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
> + if ((req->cmd_flags & REQ_DISCARD) || (next->cmd_flags & REQ_DISCARD))
> return 0;
>
> /*

That's not going to be enough, you want to disable the bio to request
merging of discards as well in elevator.c:elv_rq_merge_ok(). Does
that then fix it?


--
Jens Axboe

2011-03-22 21:00:38

by Christoph Hellwig

Subject: Re: merging discard request in the block layer

On Tue, Mar 22, 2011 at 08:54:06PM +0100, Jens Axboe wrote:
> > Index: xfs/block/blk-merge.c
> > ===================================================================
> > --- xfs.orig/block/blk-merge.c 2011-03-22 13:07:24.733857580 +0100
> > +++ xfs/block/blk-merge.c 2011-03-22 13:08:17.448856577 +0100
> > @@ -373,7 +373,7 @@ static int attempt_merge(struct request_
> > /*
> > * Don't merge file system requests and discard requests
> > */
> > - if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
> > + if ((req->cmd_flags & REQ_DISCARD) || (next->cmd_flags & REQ_DISCARD))
> > return 0;
> >
> > /*
>
> That's not going to be enough, you want to disable the bio to request
> merging of discards as well in elevator.c:elv_rq_merge_ok(). Does
> that then fix it?

Applying the same fix in elv_rq_merge_ok seems to fix the issue; at
least the xfstests test case that usually hits it completes OK.

2011-03-22 21:04:06

by Jens Axboe

Subject: Re: merging discard request in the block layer

On 2011-03-22 20:54, Jens Axboe wrote:
> On 2011-03-22 20:47, Christoph Hellwig wrote:
>> It seems the current block layer will happily try to merge discard
>> requests that were split because they are at the max that bi_size
>> can hold together again. At least that's what the
>>
>> blk: request botched
>
> That would seem to indicate a bug in the merging logic instead.

What kind of max discard size does your device have? If the max discard
size is smaller than the regular request size, this could help.

diff --git a/block/blk-merge.c b/block/blk-merge.c
index cfcc37c..76cdfb7 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -232,8 +232,12 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,

if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
max_sectors = queue_max_hw_sectors(q);
- else
- max_sectors = queue_max_sectors(q);
+ else {
+ if (req->cmd_flags & REQ_DISCARD)
+ max_sectors = q->limits.max_discard_sectors;
+ else
+ max_sectors = queue_max_sectors(q);
+ }

if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
req->cmd_flags |= REQ_NOMERGE;
@@ -256,9 +260,12 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,

if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
max_sectors = queue_max_hw_sectors(q);
- else
- max_sectors = queue_max_sectors(q);
-
+ else {
+ if (req->cmd_flags & REQ_DISCARD)
+ max_sectors = q->limits.max_discard_sectors;
+ else
+ max_sectors = queue_max_sectors(q);
+ }

if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
req->cmd_flags |= REQ_NOMERGE;


--
Jens Axboe

2011-03-23 13:01:39

by Christoph Hellwig

Subject: Re: merging discard request in the block layer

On Tue, Mar 22, 2011 at 10:03:57PM +0100, Jens Axboe wrote:
> > That would seem to indicate a bug in the merging logic instead.
>
> What kind of max discard size does your device have? If the max discard
> size is smaller than the regular request size, this could help.

It's a SCSI device, so the max discard size is a lot larger:

# cat /sys/block/sda/queue/max_sectors_kb
512
# cat /sys/block/sda/queue/max_hw_sectors_kb
32767
# cat /sys/block/sda/queue/discard_max_bytes
4294966784
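
For scale, the three sysfs values above can be put into the same unit. The sketch below assumes the usual 512-byte sector; the helper names are hypothetical and only do the unit conversion.

```c
#include <stdint.h>

/* Assumed sector size: the 512-byte unit used for block layer limits. */
#define SKETCH_SECTOR_BYTES 512u

/* Convert a *_kb sysfs value (KiB) to 512-byte sectors. */
static inline uint64_t kb_to_sectors(uint64_t kb)
{
	return kb * 1024u / SKETCH_SECTOR_BYTES;
}

/* Convert a byte count such as discard_max_bytes to 512-byte sectors. */
static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes / SKETCH_SECTOR_BYTES;
}
```

Under that assumption, max_sectors_kb = 512 is 1024 sectors, max_hw_sectors_kb = 32767 is 65534 sectors, and discard_max_bytes = 4294966784 is 8388607 sectors, so on this device the discard limit is far above the regular request limit rather than below it.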

2011-03-23 15:26:49

by Jens Axboe

Subject: Re: merging discard request in the block layer

On 2011-03-23 14:01, Christoph Hellwig wrote:
> On Tue, Mar 22, 2011 at 10:03:57PM +0100, Jens Axboe wrote:
>>> That would seem to indicate a bug in the merging logic instead.
>>
>> What kind of max discard size does your device have? If the max discard
>> size is smaller than the regular request size, this could help.
>
> It's a SCSI device, so the max discard size is a lot larger:
>
> # cat /sys/block/sda/queue/max_sectors_kb
> 512
> # cat /sys/block/sda/queue/max_hw_sectors_kb
> 32767
> # cat /sys/block/sda/queue/discard_max_bytes
> 4294966784

I'll try and throw a synthetic test at it that produces a slew of
discard merging and see what happens.

--
Jens Axboe

2011-03-30 14:16:28

by Christoph Hellwig

Subject: Re: merging discard request in the block layer

On Tue, Mar 22, 2011 at 10:03:57PM +0100, Jens Axboe wrote:
> On 2011-03-22 20:54, Jens Axboe wrote:
> > On 2011-03-22 20:47, Christoph Hellwig wrote:
> >> It seems the current block layer will happily try to merge discard
> >> requests that were split because they are at the max that bi_size
> >> can hold together again. At least that's what the
> >>
> >> blk: request botched
> >
> > That would seem to indicate a bug in the merging logic instead.
>
> What kind of max discard size does your device have? If the max discard
> size is smaller than the regular request size, this could help.

I've done some heavier testing, and both the extended check for mergeable
requests and your patch with different limits hang the test box hard,
with no way to get a backtrace. Using my original patch to completely
skip the merging logic seems to work fine.

2011-05-03 18:05:28

by Christoph Hellwig

Subject: Re: merging discard request in the block layer

I finally managed to debug this a bit further and have found a cure
for the null pointer dereferences I got on recent kernels.

The problem is that bio_has_data thinks discard requests have a payload
and thus tries to poke into its pages when trying to merge requests.
Moving the REQ_DISCARD check into bio_has_data fixes that. I've also
tried to special-case discard requests in bio_cur_bytes, but that
doesn't fix the botched request messages yet. I suspect the merging
code might need some additions to update the bio_size for discard
requests that it currently skips.

Index: xfs/block/blk-core.c
===================================================================
--- xfs.orig/block/blk-core.c 2011-05-03 19:45:51.980219652 +0200
+++ xfs/block/blk-core.c 2011-05-03 19:47:51.756237436 +0200
@@ -1645,7 +1645,7 @@ void submit_bio(int rw, struct bio *bio)
* If it's a regular read/write or a barrier with data attached,
* go through the normal accounting stuff before submission.
*/
- if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
+ if (bio_has_data(bio)) {
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
Index: xfs/include/linux/bio.h
===================================================================
--- xfs.orig/include/linux/bio.h 2011-05-03 19:43:28.537663414 +0200
+++ xfs/include/linux/bio.h 2011-05-03 19:48:18.632758500 +0200
@@ -69,7 +69,7 @@

static inline unsigned int bio_cur_bytes(struct bio *bio)
{
- if (bio->bi_vcnt)
+ if (bio->bi_vcnt && !(bio->bi_rw & REQ_DISCARD))
return bio_iovec(bio)->bv_len;
else /* dataless requests such as discard */
return bio->bi_size;
@@ -368,7 +368,7 @@ static inline char *__bio_kmap_irq(struc
*/
static inline int bio_has_data(struct bio *bio)
{
- return bio && bio->bi_io_vec != NULL;
+ return bio && bio->bi_io_vec != NULL && !(bio->bi_rw & REQ_DISCARD);
}

/*
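
As a rough standalone illustration of the bio_has_data change, the sketch below uses a cut-down stand-in for struct bio. The struct, the flag value and the function name are hypothetical; only the shape of the predicate follows the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for illustration; not the kernel's real REQ_DISCARD value. */
#define SKETCH_REQ_DISCARD (1u << 7)

/* Cut-down stand-in for struct bio with only the fields the check looks at. */
struct sketch_bio {
	unsigned long bi_rw;	/* request flags */
	unsigned int bi_size;	/* bytes covered by the bio */
	void *bi_io_vec;	/* page vector; may be set even when there is no payload */
};

/* Patched predicate: a discard never counts as carrying data. */
static inline bool sketch_bio_has_data(const struct sketch_bio *bio)
{
	return bio && bio->bi_io_vec != NULL &&
	       !(bio->bi_rw & SKETCH_REQ_DISCARD);
}
```

In this sketch a discard with a non-NULL bi_io_vec reports no data, so a caller that gates page access on the predicate leaves the vector alone, while an ordinary write with the same fields still reports data.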