2004-01-01 17:32:28

by Christophe Saout

Subject: [RFC][PATCH] Move bv_offset/bv_len update after bio_endio in __end_that_request_first

Hi!

Can we please move the update of bio_iovec(bio)->bv_offset and bv_len to
after the bio_endio call in __end_that_request_first (for the case where a
bvec is only partially completed)?

The bi_idx is currently also updated after the bio_endio call.

Currently the bi_end_io function cannot exactly determine whether a bvec
was completed or not.

Think of the following situation:

bv_offset is 0 and bv_len is 4096, now the driver completes 2048 bytes of
that bvec.

At the moment bv_offset and bv_len are set to 2048 first. The bi_end_io
function can't distinguish between this situation and the situation where
bv_offset and bv_len were 2048 before and that bvec was completed (because
bi_idx is incremented afterwards).
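
To make the ambiguity concrete, here is a minimal user-space sketch in C. It
is not kernel code: struct bvec_model, struct endio_view and the view_*
helpers are hypothetical names invented for this illustration, and the model
keeps only the bv_offset and bv_len fields discussed above. It compares what
a bi_end_io callback would observe under the current ordering and under the
proposed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio_vec (page pointer omitted). */
struct bvec_model {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* What a bi_end_io callback would observe for the bvec at bi_idx. */
struct endio_view {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/*
 * Current ordering: a partial completion adjusts bv_offset/bv_len before
 * the callback runs, while bi_idx is advanced only afterwards.
 */
static struct endio_view view_update_first(struct bvec_model bv,
					   unsigned int done)
{
	assert(done <= bv.bv_len);
	if (done < bv.bv_len) {
		/* partial completion: the bvec is modified first */
		bv.bv_offset += done;
		bv.bv_len -= done;
	}
	/* full completion: bvec untouched, bi_idx not yet advanced */
	return (struct endio_view){ bv.bv_len, bv.bv_offset };
}

/* Proposed ordering: the callback sees the bvec before any adjustment. */
static struct endio_view view_update_after(struct bvec_model bv,
					   unsigned int done)
{
	assert(done <= bv.bv_len);
	return (struct endio_view){ bv.bv_len, bv.bv_offset };
}

/* True if both views show the same offset and length. */
static bool same_view(struct endio_view a, struct endio_view b)
{
	return a.bv_len == b.bv_len && a.bv_offset == b.bv_offset;
}
```

With a bvec of (offset 0, len 4096) and a bvec of (offset 2048, len 2048),
each completing 2048 bytes, view_update_first() yields the same view for
both, while view_update_after() keeps them distinguishable.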

This shouldn't break any user since most users are waiting for the whole
bio to complete with if (bio->bi_size > 0) return 1;.

I need this because I want to release buffers as soon as possible. The
incoming bio can get split by my driver due to problems allocating buffers.
If the partial bio returns and can't release its buffers immediately the
whole thing might deadlock.

That's why I need to know exactly how many and which bvecs were completed
in my bi_end_io function.

Or do you think it is safer to count backwards using bi_vcnt and bi_size?
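
For comparison, counting backwards could look roughly like the following
user-space sketch in C. It is only an illustration under stated assumptions:
completed_segments() is a hypothetical helper, seg_len[] stands in for the
bv_len of each bvec, vcnt for bi_vcnt, and remaining for bi_size as seen
inside bi_end_io (that is, already reduced by the bytes just completed).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many leading segments are fully completed, given the
 * per-segment lengths and the number of bytes still outstanding.
 * Hypothetical helper: seg_len[i] models bv_len of bvec i, vcnt models
 * bi_vcnt and remaining models bi_size.
 */
static size_t completed_segments(const unsigned int *seg_len, size_t vcnt,
				 unsigned int remaining)
{
	size_t i = vcnt;

	assert(seg_len != NULL || vcnt == 0);

	/* Walk from the last segment towards the first. */
	while (i > 0 && remaining > 0) {
		unsigned int len = seg_len[i - 1];

		i--;
		if (len >= remaining)
			break;	/* segment i holds the last outstanding bytes */
		remaining -= len;
	}
	return i;	/* segments [0, i) are fully completed */
}
```

A segment that is only partially completed is reported as not completed, so
a caller releasing buffers based on the return value errs on the safe side.
The walk only tells how many bvecs are done; it does not recover the original
bv_offset of a partially completed one.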


diff -Nur linux-2.6.0.orig/drivers/block/ll_rw_blk.c linux-2.6.0/drivers/block/ll_rw_blk.c
--- linux-2.6.0.orig/drivers/block/ll_rw_blk.c 2003-11-24 02:31:11.000000000 +0100
+++ linux-2.6.0/drivers/block/ll_rw_blk.c 2004-01-01 14:27:34.222352384 +0100
@@ -2494,8 +2494,6 @@
 			 * not a complete bvec done
 			 */
 			if (unlikely(nbytes > nr_bytes)) {
-				bio_iovec_idx(bio, idx)->bv_offset += nr_bytes;
-				bio_iovec_idx(bio, idx)->bv_len -= nr_bytes;
 				bio_nbytes += nr_bytes;
 				total_bytes += nr_bytes;
 				break;
@@ -2531,7 +2529,9 @@
 	 */
 	if (bio_nbytes) {
 		bio_endio(bio, bio_nbytes, error);
-		req->bio->bi_idx += next_idx;
+		bio->bi_idx += next_idx;
+		bio_iovec(bio)->bv_offset += nr_bytes;
+		bio_iovec(bio)->bv_len -= nr_bytes;
 	}
 
 	blk_recalc_rq_sectors(req, total_bytes >> 9);


2004-01-02 10:46:45

by Jens Axboe

Subject: Re: [RFC][PATCH] Move bv_offset/bv_len update after bio_endio in __end_that_request_first

On Thu, Jan 01 2004, Christophe Saout wrote:
> Hi!
>
> Can we please move the update of bio_iovec(bio)->bv_offset and bv_len to
> after the bio_endio call in __end_that_request_first (for the case where a
> bvec is only partially completed)?
>
> The bi_idx is currently also updated after the bio_endio call.
>
> Currently the bi_end_io function cannot exactly determine whether a bvec
> was completed or not.
>
> Think of the following situation:
>
> bv_offset is 0 and bv_len is 4096, now the driver completes 2048 bytes of
> that bvec.
>
> At the moment bv_offset and bv_len are set to 2048 first. The bi_end_io
> function can't distinguish between this situation and the situation where
> bv_offset and bv_len were 2048 before and that bvec was completed (because
> bi_idx is incremented afterwards).
>
> This shouldn't break any user since most users are waiting for the whole
> bio to complete with if (bio->bi_size > 0) return 1;.
>
> I need this because I want to release buffers as soon as possible. The
> incoming bio can get split by my driver due to problems allocating buffers.
> If the partial bio returns and can't release its buffers immediately the
> whole thing might deadlock.
>
> That's why I need to know exactly how many and which bvecs were completed
> in my bi_end_io function.
>
> Or do you think it is safer to count backwards using bi_vcnt and bi_size?

I'm inclined to think so, indeed. Those two fields have more
well-established usage, so I think you'll be better off doing that in the
long run.

--
Jens Axboe

2004-01-02 13:00:36

by Christophe Saout

Subject: Re: [RFC][PATCH] Move bv_offset/bv_len update after bio_endio in __end_that_request_first

On Fri, 02.01.2004 at 11:46, Jens Axboe wrote:

> > That's why I need to know exactly how many and which bvecs were completed
> > in my bi_end_io function.
> >
> > Or do you think it is safer to count backwards using bi_vcnt and bi_size?
>
> I'm inclined to think so, indeed. Those two fields have more
> well-established usage, so I think you'll be better off doing that in the
> long run.

Ok, if you say so. This and the IDE multwrite thing are the only two
places in the kernel preventing bi_idx from being usable this way. I just
thought it was nicer.


2004-01-02 13:26:19

by Christophe Saout

Subject: Re: [RFC][PATCH] Move bv_offset/bv_len update after bio_endio in __end_that_request_first

On Fri, 02.01.2004 at 14:00, Christophe Saout wrote:

> > > That's why I need to know exactly how many and which bvecs were completed
> > > in my bi_end_io function.
> > >
> > > Or do you think it is safer to count backwards using bi_vcnt and bi_size?
> >
> > I'm inclined to think so, indeed. Those two fields have more
> > well-established usage, so I think you'll be better off doing that in the
> > long run.
>
> Ok, if you say so. This and the IDE multwrite thing are the only two
> places in the kernel preventing bi_idx from being usable this way. I just
> thought it was nicer.

... but I still need bv_offset and bv_len to be unchanged in the
bio_endio call. Can we please do this?


2004-01-02 13:57:41

by Jens Axboe

Subject: Re: [RFC][PATCH] Move bv_offset/bv_len update after bio_endio in __end_that_request_first

On Fri, Jan 02 2004, Christophe Saout wrote:
> On Fri, 02.01.2004 at 11:46, Jens Axboe wrote:
>
> > > That's why I need to know exactly how many and which bvecs were completed
> > > in my bi_end_io function.
> > >
> > > Or do you think it is safer to count backwards using bi_vcnt and bi_size?
> >
> > I'm inclined to think so, indeed. Those two fields have more
> > well-established usage, so I think you'll be better off doing that in the
> > long run.
>
> Ok, if you say so. This and the IDE multwrite thing are the only two
> places in the kernel preventing bi_idx from being usable this way. I just
> thought it was nicer.

It is nicer, I agree. I'm not averse to including the ll_rw_blk change,
but you'll have to deal with IDE yourself :)

--
Jens Axboe