2008-12-23 01:08:22

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH next] bio: zero inlined bio_vec

bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
relies upon that: therefore bio_alloc_bioset() must zero the inlined
bio_vec - without that, lots of nastiness occurs in bounce_end_io and
blk_rq_map_sg and other places when booting up my PAE box.

Signed-off-by: Hugh Dickins <[email protected]>
---

fs/bio.c | 2 ++
1 file changed, 2 insertions(+)

--- linux-next/fs/bio.c 2008-12-21 20:49:25.000000000 +0000
+++ fixed/fs/bio.c 2008-12-22 21:10:06.000000000 +0000
@@ -324,6 +324,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_m
if (nr_iovecs <= BIO_INLINE_VECS) {
idx = 0;
bvl = bio->bi_inline_vecs;
+ memset(bvl, 0, BIO_INLINE_VECS *
+ sizeof(struct bio_vec));
nr_iovecs = BIO_INLINE_VECS;
} else {
bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx,


2008-12-23 08:08:37

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, Dec 23 2008, Hugh Dickins wrote:
> bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
> relies upon that: therefore bio_alloc_bioset() must zero the inlined
> bio_vec - without that, lots of nastiness occurs in bounce_end_io and
> blk_rq_map_sg and other places when booting up my PAE box.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
>
> fs/bio.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> --- linux-next/fs/bio.c 2008-12-21 20:49:25.000000000 +0000
> +++ fixed/fs/bio.c 2008-12-22 21:10:06.000000000 +0000
> @@ -324,6 +324,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_m
> if (nr_iovecs <= BIO_INLINE_VECS) {
> idx = 0;
> bvl = bio->bi_inline_vecs;
> + memset(bvl, 0, BIO_INLINE_VECS *
> + sizeof(struct bio_vec));
> nr_iovecs = BIO_INLINE_VECS;
> } else {
> bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx,

Hmm, where does it die? Nobody should look at the bio_vec index beyond
bio->bi_vcnt, and 0...bio->bi_vcnt-1 should always be initialized due to
the way we fill the entries.

--
Jens Axboe

2008-12-23 10:15:20

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, 23 Dec 2008, Jens Axboe wrote:
> On Tue, Dec 23 2008, Hugh Dickins wrote:
> > bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
> > relies upon that: therefore bio_alloc_bioset() must zero the inlined
> > bio_vec - without that, lots of nastiness occurs in bounce_end_io and
> > blk_rq_map_sg and other places when booting up my PAE box.
>
> Hmm, where does it die? Nobody should look at the bio_vec index beyond
> bio->bi_vcnt, and 0...bio->bi_vcnt-1 should always be initialized due to
> the way we fill the entries.

It dies in a great variety of places, different mmotms and different
nexts and different configs on that box preferring to collapse in
different places all over the blk/bounce/scsi/map_sg stack.

bounce_end_io() and blk_rq_map_sg() and nommu_map_sg() stick in my
mind as among the locations seen, though perhaps the actual oopses
and BUGs were sometimes a level below.

__blk_queue_bounce() does one transit of the bio skipping any
segments which don't need bouncing, then a second transit of the
newly allocated bounce bio assuming
if (!to->bv_page) {
means that it needs to copy from orig_bio: so if the new bio was
not zeroed there, it'll skip that segment and leave junk?

Doesn't that explain it? If there are no other places, then you
could fix that instead. But in that case, why have we been memset'
ting the allocated bio_vec all this time? Perhaps there are other
traps lurking.

I've not seen the problem on the newer machine I run (usually x86_64
but occasionally) x86_32 with highmem, only on that P4 Xeon which
tips above 4GB. I was a little surprised to find it happening also
with mem=700M yesterday, had come to think it was likely a problem
with bouncing; but later in the night found Rusty has broken mem=.

Hugh

2008-12-23 10:24:18

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, Dec 23 2008, Hugh Dickins wrote:
> On Tue, 23 Dec 2008, Jens Axboe wrote:
> > On Tue, Dec 23 2008, Hugh Dickins wrote:
> > > bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
> > > relies upon that: therefore bio_alloc_bioset() must zero the inlined
> > > bio_vec - without that, lots of nastiness occurs in bounce_end_io and
> > > blk_rq_map_sg and other places when booting up my PAE box.
> >
> > Hmm, where does it die? Nobody should look at the bio_vec index beyond
> > bio->bi_vcnt, and 0...bio->bi_vcnt-1 should always be initialized due to
> > the way we fill the entries.
>
> It dies in a great variety of places, different mmotms and different
> nexts and different configs on that box preferring to collapse in
> different places all over the blk/bounce/scsi/map_sg stack.
>
> bounce_end_io() and blk_rq_map_sg() and nommu_map_sg() stick in my
> mind as among the locations seen, though perhaps the actual oopses
> and BUGs were sometimes a level below.
>
> __blk_queue_bounce() does one transit of the bio skipping any
> segments which don't need bouncing, then a second transit of the
> newly allocated bounce bio assuming
> if (!to->bv_page) {
> means that it needs to copy from orig_bio: so if the new bio was
> not zeroed there, it'll skip that segment and leave junk?

It's because the code in question does this:

__bio_for_each_segment(from, *bio_orig, i, 0) {
to = bio_iovec_idx(bio, i);
if (!to->bv_page) {

That is, it iterates *bio_orig but indexes bio since it knows it has the
same number of segments. So this code is the odd one out, I'd be
surprised if we had more such cases. And since it would be nice to get
rid of the need to memset in general, can you try with the below patch?

The reason why it also oopses in other places is probably because this
very same code can leave junk in in the bio_vec if it happens to find a
non-NULL page there that isn't valid.

diff --git a/mm/bounce.c b/mm/bounce.c
index 06722c4..f4907d3 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -181,12 +181,22 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
struct page *page;
struct bio *bio = NULL;
int i, rw = bio_data_dir(*bio_orig);
- struct bio_vec *to, *from;
+ struct bio_vec *uninitialized_var(to), *from;

bio_for_each_segment(from, *bio_orig, i) {
page = from->bv_page;

/*
+ * Make sure we initialize the page element even if we
+ * don't need to bounce it, since we'll be checking for
+ * a valid page further down.
+ */
+ if (bio) {
+ to = bio->bi_io_vec + i;
+ to->bv_page = NULL;
+ }
+
+ /*
* is destination page below bounce pfn?
*/
if (page_to_pfn(page) <= q->bounce_pfn)
@@ -195,10 +205,10 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
/*
* irk, bounce it
*/
- if (!bio)
+ if (!bio) {
bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
-
- to = bio->bi_io_vec + i;
+ to = bio->bi_io_vec + i;
+ }

to->bv_page = mempool_alloc(pool, q->bounce_gfp);
to->bv_len = from->bv_len;

--
Jens Axboe

2008-12-23 10:32:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, Dec 23 2008, Jens Axboe wrote:
> On Tue, Dec 23 2008, Hugh Dickins wrote:
> > On Tue, 23 Dec 2008, Jens Axboe wrote:
> > > On Tue, Dec 23 2008, Hugh Dickins wrote:
> > > > bvec_alloc_bs() zeroes its bio_vec, and at least __blk_queue_bounce()
> > > > relies upon that: therefore bio_alloc_bioset() must zero the inlined
> > > > bio_vec - without that, lots of nastiness occurs in bounce_end_io and
> > > > blk_rq_map_sg and other places when booting up my PAE box.
> > >
> > > Hmm, where does it die? Nobody should look at the bio_vec index beyond
> > > bio->bi_vcnt, and 0...bio->bi_vcnt-1 should always be initialized due to
> > > the way we fill the entries.
> >
> > It dies in a great variety of places, different mmotms and different
> > nexts and different configs on that box preferring to collapse in
> > different places all over the blk/bounce/scsi/map_sg stack.
> >
> > bounce_end_io() and blk_rq_map_sg() and nommu_map_sg() stick in my
> > mind as among the locations seen, though perhaps the actual oopses
> > and BUGs were sometimes a level below.
> >
> > __blk_queue_bounce() does one transit of the bio skipping any
> > segments which don't need bouncing, then a second transit of the
> > newly allocated bounce bio assuming
> > if (!to->bv_page) {
> > means that it needs to copy from orig_bio: so if the new bio was
> > not zeroed there, it'll skip that segment and leave junk?
>
> It's because the code in question does this:
>
> __bio_for_each_segment(from, *bio_orig, i, 0) {
> to = bio_iovec_idx(bio, i);
> if (!to->bv_page) {
>
> That is, it iterates *bio_orig but indexes bio since it knows it has the
> same number of segments. So this code is the odd one out, I'd be
> surprised if we had more such cases. And since it would be nice to get
> rid of the need to memset in general, can you try with the below patch?
>
> The reason why it also oopses in other places is probably because this
> very same code can leave junk in in the bio_vec if it happens to find a
> non-NULL page there that isn't valid.
>
> diff --git a/mm/bounce.c b/mm/bounce.c
> index 06722c4..f4907d3 100644
> --- a/mm/bounce.c
> +++ b/mm/bounce.c
> @@ -181,12 +181,22 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> struct page *page;
> struct bio *bio = NULL;
> int i, rw = bio_data_dir(*bio_orig);
> - struct bio_vec *to, *from;
> + struct bio_vec *uninitialized_var(to), *from;
>
> bio_for_each_segment(from, *bio_orig, i) {
> page = from->bv_page;
>
> /*
> + * Make sure we initialize the page element even if we
> + * don't need to bounce it, since we'll be checking for
> + * a valid page further down.
> + */
> + if (bio) {
> + to = bio->bi_io_vec + i;
> + to->bv_page = NULL;
> + }
> +
> + /*
> * is destination page below bounce pfn?
> */
> if (page_to_pfn(page) <= q->bounce_pfn)
> @@ -195,10 +205,10 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
> /*
> * irk, bounce it
> */
> - if (!bio)
> + if (!bio) {
> bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
> -
> - to = bio->bi_io_vec + i;
> + to = bio->bi_io_vec + i;
> + }
>
> to->bv_page = mempool_alloc(pool, q->bounce_gfp);
> to->bv_len = from->bv_len;

Nope, that still wont be enough, since we leave entries 0..i-1
uninitialized. Lets just do this instead.

diff --git a/mm/bounce.c b/mm/bounce.c
index 06722c4..877be42 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -195,11 +195,14 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
/*
* irk, bounce it
*/
- if (!bio)
- bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
+ if (!bio) {
+ unsigned int vcnt = (*bio_orig)->bi_vcnt;

- to = bio->bi_io_vec + i;
+ bio = bio_alloc(GFP_NOIO, vcnt);
+ memset(bio->bi_io_vec, 0,vcnt * sizeof(struct bio_vec));
+ }

+ to = bio->bi_io_vec + i;
to->bv_page = mempool_alloc(pool, q->bounce_gfp);
to->bv_len = from->bv_len;
to->bv_offset = from->bv_offset;

--
Jens Axboe

2008-12-23 11:20:42

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, 23 Dec 2008, Jens Axboe wrote:
> On Tue, Dec 23 2008, Jens Axboe wrote:
> >
> > That is, it iterates *bio_orig but indexes bio since it knows it has the
> > same number of segments. So this code is the odd one out, I'd be
> > surprised if we had more such cases. And since it would be nice to get
> > rid of the need to memset in general, can you try with the below patch?
>
> Nope, that still wont be enough, since we leave entries 0..i-1
> uninitialized. Lets just do this instead.

Yes, that second version worked for me. But if __blk_queue_bounce()
is indeed the odd one out (likely but not certain), then the extension
below makes more sense - I don't like how your bio.c returns a cleared
bio_vec in some cases but not in others. This is running fine for me
on my test machines, but my test coverage is probably very poor (am I
ever using more than the inlined bio_vec?), and I've not really tried
to understand all the ways through bvec_alloc_bs().

Hugh

--- a/fs/bio.c
+++ b/fs/bio.c
@@ -180,7 +180,7 @@ struct bio_vec *bvec_alloc_bs(gfp_t gfp_
* kzalloc() for the exact number of vecs right away.
*/
if (!bs)
- bvl = kzalloc(nr * sizeof(struct bio_vec), gfp_mask);
+ bvl = kmalloc(nr * sizeof(struct bio_vec), gfp_mask);

/*
* see comment near bvec_array define!
@@ -237,9 +237,6 @@ fallback:
}
}

- if (bvl)
- memset(bvl, 0, bvec_nr_vecs(*idx) * sizeof(struct bio_vec));
-
return bvl;
}

--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -198,11 +198,14 @@ static void __blk_queue_bounce(struct re
/*
* irk, bounce it
*/
- if (!bio)
- bio = bio_alloc(GFP_NOIO, (*bio_orig)->bi_vcnt);
+ if (!bio) {
+ unsigned int vcnt = (*bio_orig)->bi_vcnt;

- to = bio->bi_io_vec + i;
+ bio = bio_alloc(GFP_NOIO, vcnt);
+ memset(bio->bi_io_vec, 0,vcnt * sizeof(struct bio_vec));
+ }

+ to = bio->bi_io_vec + i;
to->bv_page = mempool_alloc(pool, q->bounce_gfp);
to->bv_len = from->bv_len;
to->bv_offset = from->bv_offset;

2008-12-23 11:40:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH next] bio: zero inlined bio_vec

On Tue, Dec 23 2008, Hugh Dickins wrote:
> On Tue, 23 Dec 2008, Jens Axboe wrote:
> > On Tue, Dec 23 2008, Jens Axboe wrote:
> > >
> > > That is, it iterates *bio_orig but indexes bio since it knows it has the
> > > same number of segments. So this code is the odd one out, I'd be
> > > surprised if we had more such cases. And since it would be nice to get
> > > rid of the need to memset in general, can you try with the below patch?
> >
> > Nope, that still wont be enough, since we leave entries 0..i-1
> > uninitialized. Lets just do this instead.
>
> Yes, that second version worked for me. But if __blk_queue_bounce()
> is indeed the odd one out (likely but not certain), then the extension
> below makes more sense - I don't like how your bio.c returns a cleared
> bio_vec in some cases but not in others. This is running fine for me
> on my test machines, but my test coverage is probably very poor (am I
> ever using more than the inlined bio_vec?), and I've not really tried
> to understand all the ways through bvec_alloc_bs().
>
> Hugh
>
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -180,7 +180,7 @@ struct bio_vec *bvec_alloc_bs(gfp_t gfp_
> * kzalloc() for the exact number of vecs right away.
> */
> if (!bs)
> - bvl = kzalloc(nr * sizeof(struct bio_vec), gfp_mask);
> + bvl = kmalloc(nr * sizeof(struct bio_vec), gfp_mask);
>
> /*
> * see comment near bvec_array define!
> @@ -237,9 +237,6 @@ fallback:
> }
> }
>
> - if (bvl)
> - memset(bvl, 0, bvec_nr_vecs(*idx) * sizeof(struct bio_vec));
> -
> return bvl;
> }

I agree, those should be killed too. It's why I didn't want to rely on
memset for fixing your issue :-)

I'll merge the bounce.c fix into the original patch.

--
Jens Axboe