2012-10-02 06:22:06

by NeilBrown

Subject: Re: [PATCH] block: makes bio_split support bio without data

On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet <[email protected]>
wrote:

> On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> >
> > Hi Jens,
> > this patch has been sitting in my -next tree for a little while and I was
> > hoping for it to go in for the next merge window.
> > It simply allows bio_split() to be used on bios without a payload, such as
> > 'discard'.
>
> Thing is, at some point in the stack a discard bio is going to have data
> - see blk_add_request_payload(), and it used to be the single page was
> added to discard bios above generic_make_request(), in
> blkdev_issue_discard() or whatever it's called.
>
> So while I'm sure your code works, it's just a fragile way of doing it.
>
> There's also other types of bios where bi_size has nothing to do with
> the amount of data in the bi_io_vec - actually I think this is a new
> thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> there were any other instances besides REQ_DISCARD before.
>
> So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> and if bio->bi_rw & that mask is true, just duplicate the bvec or
> whatever.

Hi Kent,
I'm afraid I don't see the relevance of your comments to the patch.

The current bio_split code can successfully split a bio with zero or one
bi_vec entry. If there are more than that, we cannot split.

How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
whatever?

NeilBrown


>
> That way it's much more explicit and less likely to trip someone else
> up later.
>
> (I've actually got a patch in my tree that does just that, but it's
> special cased in bio_advance() which makes things work out really
> nicely).
>
> > Are you happy with it going in through my 'md' tree, or would you rather take
> > it through your 'block' tree?
> >
> > Thanks,
> > NeilBrown
> >
> >
> > From: Shaohua Li <[email protected]>
> > Date: Thu, 20 Sep 2012 09:36:03 +1000
> > Subject: [PATCH] block: makes bio_split support bio without data
> >
> > A discard bio has no data attached, and we hit a BUG_ON with such a bio. This
> > makes bio_split work for such bios.
> >
> > Signed-off-by: Shaohua Li <[email protected]>
> > Signed-off-by: NeilBrown <[email protected]>
> >
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 71072ab..dbb7a6c 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1501,7 +1501,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
> >  	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
> >  				bi->bi_sector + first_sectors);
> >
> > -	BUG_ON(bi->bi_vcnt != 1);
> > +	BUG_ON(bi->bi_vcnt != 1 && bi->bi_vcnt != 0);
> >  	BUG_ON(bi->bi_idx != 0);
> >  	atomic_set(&bp->cnt, 3);
> >  	bp->error = 0;
> > @@ -1511,17 +1511,19 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
> >  	bp->bio2.bi_size -= first_sectors << 9;
> >  	bp->bio1.bi_size = first_sectors << 9;
> >
> > -	bp->bv1 = bi->bi_io_vec[0];
> > -	bp->bv2 = bi->bi_io_vec[0];
> > -	bp->bv2.bv_offset += first_sectors << 9;
> > -	bp->bv2.bv_len -= first_sectors << 9;
> > -	bp->bv1.bv_len = first_sectors << 9;
> > +	if (bi->bi_vcnt != 0) {
> > +		bp->bv1 = bi->bi_io_vec[0];
> > +		bp->bv2 = bi->bi_io_vec[0];
> > +		bp->bv2.bv_offset += first_sectors << 9;
> > +		bp->bv2.bv_len -= first_sectors << 9;
> > +		bp->bv1.bv_len = first_sectors << 9;
> >
> > -	bp->bio1.bi_io_vec = &bp->bv1;
> > -	bp->bio2.bi_io_vec = &bp->bv2;
> > +		bp->bio1.bi_io_vec = &bp->bv1;
> > +		bp->bio2.bi_io_vec = &bp->bv2;
> >
> > -	bp->bio1.bi_max_vecs = 1;
> > -	bp->bio2.bi_max_vecs = 1;
> > +		bp->bio1.bi_max_vecs = 1;
> > +		bp->bio2.bi_max_vecs = 1;
> > +	}
> >
> >  	bp->bio1.bi_end_io = bio_pair_end_1;
> >  	bp->bio2.bi_end_io = bio_pair_end_2;
>
>



2012-10-02 21:09:29

by Kent Overstreet

Subject: Re: [PATCH] block: makes bio_split support bio without data

On Tue, Oct 02, 2012 at 04:22:01PM +1000, NeilBrown wrote:
> On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet <[email protected]>
> wrote:
>
> > On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> > >
> > > Hi Jens,
> > > this patch has been sitting in my -next tree for a little while and I was
> > > hoping for it to go in for the next merge window.
> > > It simply allows bio_split() to be used on bios without a payload, such as
> > > 'discard'.
> >
> > Thing is, at some point in the stack a discard bio is going to have data
> > - see blk_add_request_payload(), and it used to be the single page was
> > added to discard bios above generic_make_request(), in
> > blkdev_issue_discard() or whatever it's called.
> >
> > So while I'm sure your code works, it's just a fragile way of doing it.
> >
> > There's also other types of bios where bi_size has nothing to do with
> > the amount of data in the bi_io_vec - actually I think this is a new
> > thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> > there were any other instances besides REQ_DISCARD before.
> >
> > So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> > and if bio->bi_rw & that mask is true, just duplicate the bvec or
> > whatever.
>
> Hi Kent,
> I'm afraid I don't see the relevance of your comments to the patch.
>
> The current bio_split code can successfully split a bio with zero or one
> bi_vec entry. If there are more than that, we cannot split.
>
> How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
> whatever?

Hrm, I think I didn't explain very well.

After your change, if bio->bi_vcnt != 0, then it splits the bvec.

The trouble is that discard bios do under certain circumstances have
bio->bi_vcnt != 0, in which case splitting the bvec is the wrong thing
to do - first_sectors will quite likely be bigger than the bvec.

In practice this isn't currently a problem for discard bios, because
since Christoph added blk_add_request_payload(), discard bios won't have
that bvec added until they hit the scsi layer which will be after any
splitting. But this is a fairly recent and unrelated change, and IMO not
the kind of behaviour I'd want to rely on.

WRITE_SAME is a problem for the same reason - bio_sectors(bio) may be
large, but the bio will always have a single bvec and splitting the bvec
is always the wrong thing to do for WRITE_SAME.

So, I think it makes more sense to make the splitting conditional on
!(bio->bi_rw & (REQ_DISCARD|REQ_WRITE_SAME)), in addition to
bio->bi_vcnt == 1.

..That make more sense?
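
The condition proposed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct model_bio`, `struct model_bvec`, the `MODEL_REQ_*` flag values and both helper functions are hypothetical stand-ins, not the kernel's definitions. The second helper shows the failure mode, where `first_sectors << 9` exceeds the length of the single bvec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel's REQ_* bits are different. */
#define MODEL_REQ_DISCARD    (1u << 0)
#define MODEL_REQ_WRITE_SAME (1u << 1)

struct model_bvec {
	uint32_t bv_len;    /* bytes described by this segment */
	uint32_t bv_offset; /* byte offset into the page */
};

struct model_bio {
	uint32_t bi_rw;     /* request flags */
	uint32_t bi_size;   /* bytes the request covers on disk */
	uint16_t bi_vcnt;   /* number of bvecs attached */
	struct model_bvec bvec;
};

/* True only when bi_size is backed byte-for-byte by the single bvec. */
static bool should_split_bvec(const struct model_bio *bio)
{
	if (bio->bi_rw & (MODEL_REQ_DISCARD | MODEL_REQ_WRITE_SAME))
		return false;
	return bio->bi_vcnt == 1;
}

/* True when cutting first_sectors off the bvec would exceed its length. */
static bool bvec_split_would_underflow(const struct model_bio *bio,
				       uint32_t first_sectors)
{
	assert(bio->bi_vcnt == 1);
	return ((uint64_t)first_sectors << 9) > bio->bvec.bv_len;
}
```

For a write-same bio covering 1 MiB with one 512-byte bvec, `should_split_bvec()` returns false, and `bvec_split_would_underflow()` confirms that splitting at 8 sectors (4096 bytes) would subtract more than the bvec holds.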

2012-10-03 03:30:47

by NeilBrown

Subject: Re: [PATCH] block: makes bio_split support bio without data

On Tue, 2 Oct 2012 14:09:23 -0700 Kent Overstreet <[email protected]>
wrote:

> On Tue, Oct 02, 2012 at 04:22:01PM +1000, NeilBrown wrote:
> > On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet <[email protected]>
> > wrote:
> >
> > > On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> > > >
> > > > Hi Jens,
> > > > this patch has been sitting in my -next tree for a little while and I was
> > > > hoping for it to go in for the next merge window.
> > > > It simply allows bio_split() to be used on bios without a payload, such as
> > > > 'discard'.
> > >
> > > Thing is, at some point in the stack a discard bio is going to have data
> > > - see blk_add_request_payload(), and it used to be the single page was
> > > added to discard bios above generic_make_request(), in
> > > blkdev_issue_discard() or whatever it's called.
> > >
> > > So while I'm sure your code works, it's just a fragile way of doing it.
> > >
> > > There's also other types of bios where bi_size has nothing to do with
> > > the amount of data in the bi_io_vec - actually I think this is a new
> > > thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> > > there were any other instances besides REQ_DISCARD before.
> > >
> > > So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> > > and if bio->bi_rw & that mask is true, just duplicate the bvec or
> > > whatever.
> >
> > Hi Kent,
> > I'm afraid I don't see the relevance of your comments to the patch.
> >
> > The current bio_split code can successfully split a bio with zero or one
> > bi_vec entry. If there are more than that, we cannot split.
> >
> > How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
> > whatever?
>
> Hrm, I think I didn't explain very well.
>
> After your change, if bio->bi_vcnt != 0, then it splits the bvec.
>
> The trouble is that discard bios do under certain circumstances have
> bio->bi_vcnt != 0, in which case splitting the bvec is the wrong thing
> to do - first_sectors will quite likely be bigger than the bvec.
>
> In practice this isn't currently a problem for discard bios, because
> since Christoph added blk_add_request_payload(), discard bios won't have
> that bvec added until they hit the scsi layer which will be after any
> splitting. But this is a fairly recent and unrelated change, and IMO not
> the kind of behaviour I'd want to rely on.
>
> WRITE_SAME is a problem for the same reason - bio_sectors(bio) may be
> large, but the bio will always have a single bvec and splitting the bvec
> is always the wrong thing to do for WRITE_SAME.
>
> So, I think it makes more sense to make the splitting conditional on
> !(bio->bi_rw & (REQ_DISCARD|REQ_WRITE_SAME)), in addition to
> bio->bi_vcnt == 1.
>
> ..That make more sense?

Yes, that does make some more sense, thanks. However it doesn't convince me
that we need to change the patch.

I guess my position is that once we get to this code, we absolutely have to
split the bio - it maps to two separate devices in a RAID0 or similar so
not-splitting is not an option.

Maybe various md devices need to detect and reject REQ_DISCARD requests that
have a payload and REQ_WRITE_SAME requests? Or would they need to explicitly
set a flag to say they accept them?

So maybe there is something to fix, but I don't think it is in bio_split,
except maybe to add a WARN_ON?

Thanks,
NeilBrown



2012-10-03 03:42:35

by Kent Overstreet

Subject: Re: [PATCH] block: makes bio_split support bio without data

Adding Martin to the cc, so he can chime in on WRITE_SAME if I got it
wrong.

On Wed, Oct 03, 2012 at 01:30:45PM +1000, NeilBrown wrote:
> On Tue, 2 Oct 2012 14:09:23 -0700 Kent Overstreet <[email protected]>
> wrote:
>
> > On Tue, Oct 02, 2012 at 04:22:01PM +1000, NeilBrown wrote:
> > > On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet <[email protected]>
> > > wrote:
> > >
> > > > On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> > > > >
> > > > > Hi Jens,
> > > > > this patch has been sitting in my -next tree for a little while and I was
> > > > > hoping for it to go in for the next merge window.
> > > > > It simply allows bio_split() to be used on bios without a payload, such as
> > > > > 'discard'.
> > > >
> > > > Thing is, at some point in the stack a discard bio is going to have data
> > > > - see blk_add_request_payload(), and it used to be the single page was
> > > > added to discard bios above generic_make_request(), in
> > > > blkdev_issue_discard() or whatever it's called.
> > > >
> > > > So while I'm sure your code works, it's just a fragile way of doing it.
> > > >
> > > > There's also other types of bios where bi_size has nothing to do with
> > > > the amount of data in the bi_io_vec - actually I think this is a new
> > > > thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> > > > there were any other instances besides REQ_DISCARD before.
> > > >
> > > > So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> > > > and if bio->bi_rw & that mask is true, just duplicate the bvec or
> > > > whatever.
> > >
> > > Hi Kent,
> > > I'm afraid I don't see the relevance of your comments to the patch.
> > >
> > > The current bio_split code can successfully split a bio with zero or one
> > > bi_vec entry. If there are more than that, we cannot split.
> > >
> > > How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
> > > whatever?
> >
> > Hrm, I think I didn't explain very well.
> >
> > After your change, if bio->bi_vcnt != 0, then it splits the bvec.
> >
> > The trouble is that discard bios do under certain circumstances have
> > bio->bi_vcnt != 0, in which case splitting the bvec is the wrong thing
> > to do - first_sectors will quite likely be bigger than the bvec.
> >
> > In practice this isn't currently a problem for discard bios, because
> > since Christoph added blk_add_request_payload(), discard bios won't have
> > that bvec added until they hit the scsi layer which will be after any
> > splitting. But this is a fairly recent and unrelated change, and IMO not
> > the kind of behaviour I'd want to rely on.
> >
> > WRITE_SAME is a problem for the same reason - bio_sectors(bio) may be
> > large, but the bio will always have a single bvec and splitting the bvec
> > is always the wrong thing to do for WRITE_SAME.
> >
> > So, I think it makes more sense to make the splitting conditional on
> > !(bio->bi_rw & (REQ_DISCARD|REQ_WRITE_SAME)), in addition to
> > bio->bi_vcnt == 1.
> >
> > ..That make more sense?
>
> Yes, that does make some more sense, thanks. However it doesn't convince me
> that we need to change the patch.
>
> I guess my position is that once we get to this code, we absolutely have to
> split the bio - it maps to two separate devices in a RAID0 or similar so
> not-splitting is not an option.
>
> Maybe various md devices need to detect and reject REQ_DISCARD requests that
> have a payload and REQ_WRITE_SAME requests? Or would they need to explicitly
> set a flag to say they accept them?

I think we should be able to split REQ_DISCARD bios that have a payload
or REQ_WRITE_SAME bios just fine though - for both of those cases, the
payload doesn't correspond to a particular sector, so just copy the
original bvec to the two splits and don't do anything else to it.

This gets so much cleaner with immutable bvecs :p

Actually that might be wrong for REQ_DISCARD bios if they had a payload;
I have no idea what that payload is actually for. But that should never
happen anymore, so we could add a WARN_ON((bio->bi_rw & REQ_DISCARD) &&
bio->bi_vcnt).
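
To make the suggested check concrete, here is a small userspace model of it. The struct, the flag value and the function name are hypothetical; the kernel's WARN_ON also prints a stack trace, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag value; the kernel's REQ_DISCARD bit is different. */
#define MODEL_REQ_DISCARD (1u << 0)

struct model_bio {
	uint32_t bi_rw;   /* request flags */
	uint16_t bi_vcnt; /* number of bvecs attached */
};

/*
 * Report (but do not abort on) a discard bio that carries a bvec.
 * Returns the condition so the caller can still decide what to do.
 */
static bool warn_on_discard_payload(const struct model_bio *bio)
{
	bool hit;

	assert(bio != NULL);
	hit = (bio->bi_rw & MODEL_REQ_DISCARD) && bio->bi_vcnt;
	if (hit)
		fprintf(stderr, "warning: discard bio carries %u bvec(s)\n",
			(unsigned)bio->bi_vcnt);
	return hit;
}
```

Unlike a BUG_ON-style check, the model keeps running after the report, which matches the intent of flagging a case that "should never happen anymore" without taking the system down.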

2012-10-03 16:22:23

by Martin K. Petersen

Subject: Re: [PATCH] block: makes bio_split support bio without data

>>>>> "Kent" == Kent Overstreet <[email protected]> writes:

Kent> I think we should be able to split REQ_DISCARD bios that have a
Kent> payload or REQ_WRITE_SAME bios just fine though - for both of
Kent> those cases, the payload doesn't correspond to a particular
Kent> sector, so just copy the original bvec to the two splits and don't
Kent> do anything else to it.

DISCARD bios come down with a single bvec that is later used in the SCSI
disk driver to describe a memory page that can then be mapped into a
scatter-gather list. The reason for this is that both ATA TRIM and SCSI
UNMAP put the block range descriptors in the payload rather than in the
command itself. By the time MD calls bio_split there will be an empty
bvec in the bio.

For WRITE SAME the parent payload contains a bvec describing a single
logical block of data (i.e. typically 512 bytes). The same bvec is used
for both bios in the pair.

For neither DISCARD nor WRITE SAME do we need to muck with bv_offset
and bv_len. As a result, my patch uses the bio_is_rw() conditional to
wrap the bvec munging code.
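
A minimal userspace sketch of that arrangement, assuming simplified stand-in types. `model_bio_is_rw()` only imitates the role described for bio_is_rw(); its body, the struct layouts, the flag values and `model_bio_split()` are assumptions made for illustration, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel's REQ_* bits are different. */
#define MODEL_REQ_DISCARD    (1u << 0)
#define MODEL_REQ_WRITE_SAME (1u << 1)

struct model_bvec {
	uint32_t bv_len;
	uint32_t bv_offset;
};

struct model_bio {
	uint32_t bi_rw;     /* request flags */
	uint64_t bi_sector; /* first sector on the device */
	uint32_t bi_size;   /* bytes the request covers on disk */
	uint16_t bi_vcnt;   /* number of bvecs attached (0 or 1 here) */
	struct model_bvec bvec;
};

/* Assumed meaning: an ordinary read/write whose bvec mirrors bi_size. */
static bool model_bio_is_rw(const struct model_bio *bio)
{
	if (bio->bi_vcnt == 0)
		return false;
	return !(bio->bi_rw & (MODEL_REQ_DISCARD | MODEL_REQ_WRITE_SAME));
}

/* Split bi at first_sectors into b1 (front) and b2 (back). */
static void model_bio_split(const struct model_bio *bi, uint32_t first_sectors,
			    struct model_bio *b1, struct model_bio *b2)
{
	uint32_t first_bytes = first_sectors << 9;

	assert(bi->bi_vcnt <= 1);          /* mirrors the relaxed BUG_ON */
	assert(first_bytes < bi->bi_size);

	/* Both halves start as copies, so the bvec is shared unchanged. */
	*b1 = *bi;
	*b2 = *bi;
	b1->bi_size = first_bytes;
	b2->bi_sector += first_sectors;
	b2->bi_size -= first_bytes;

	/* Only ordinary reads and writes get their bvec adjusted. */
	if (model_bio_is_rw(bi)) {
		b1->bvec.bv_len = first_bytes;
		b2->bvec.bv_offset += first_bytes;
		b2->bvec.bv_len -= first_bytes;
	}
}
```

With this shape, a write-same bio keeps its one-logical-block bvec intact in both halves while bi_sector and bi_size are still divided, and a payload-free discard bio never touches the bvec fields at all.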

--
Martin K. Petersen Oracle Linux Engineering