2007-02-06 23:27:41

by NeilBrown

[permalink] [raw]
Subject: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Tuesday February 6, [email protected] wrote:
>
> This patch should fix the worst of the offences, but I'd like to
> experiment and think a bit more before I submit it to stable.
> And probably test it too - as yet I have only compile and brain
> tested.

Ok, I've experimented and tested and now I know what was causing the
double-unlock.

The following patch is suitable for 2.6.20.1 and mainline. There is
room for a bit more improvement, but only for performance, not
correctness. I'll look into that later.

Thanks,
NeilBrown


------------------------------------
Fix various bugs with aligned reads in RAID5.

It is possible for raid5 to be sent a bio that is too big
for an underlying device. So if it is a READ that we
pass stright down to a device, it will fail and confuse
RAID5.

So in 'chunk_aligned_read' we check that the bio fits within the
parameters for the target device and if it doesn't fit, fall back
on reading through the stripe cache and making lots of one-page
requests.

Note that this is the earliest time we can check against the device
because earlier we don't have a lock on the device, so it could change
underneath us.

Also, the code for handling a retry through the cache when a read
fails has not been tested and was badly broken. This patch fixes that
code.

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/raid5.c | 42 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 39 insertions(+), 3 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2007-02-02 14:17:55.000000000 +1100
+++ ./drivers/md/raid5.c 2007-02-06 19:19:01.000000000 +1100
@@ -2570,7 +2570,7 @@ static struct bio *remove_bio_from_retry
}
bi = conf->retry_read_aligned_list;
if(bi) {
- conf->retry_read_aligned = bi->bi_next;
+ conf->retry_read_aligned_list = bi->bi_next;
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* biased count of active stripes */
bi->bi_hw_segments = 0; /* count of processed stripes */
@@ -2619,6 +2619,27 @@ static int raid5_align_endio(struct bio
return 0;
}

+static int bio_fits_rdev(struct bio *bi)
+{
+ request_queue_t *q = bdev_get_queue(bi->bi_bdev);
+
+ if ((bi->bi_size>>9) > q->max_sectors)
+ return 0;
+ blk_recount_segments(q, bi);
+ if (bi->bi_phys_segments > q->max_phys_segments ||
+ bi->bi_hw_segments > q->max_hw_segments)
+ return 0;
+
+ if (q->merge_bvec_fn)
+ /* it's too hard to apply the merge_bvec_fn at this stage,
+ * just just give up
+ */
+ return 0;
+
+ return 1;
+}
+
+
static int chunk_aligned_read(request_queue_t *q, struct bio * raid_bio)
{
mddev_t *mddev = q->queuedata;
@@ -2665,6 +2686,13 @@ static int chunk_aligned_read(request_qu
align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
align_bi->bi_sector += rdev->data_offset;

+ if (!bio_fits_rdev(align_bi)) {
+ /* too big in some way */
+ bio_put(align_bi);
+ rdev_dec_pending(rdev, mddev);
+ return 0;
+ }
+
spin_lock_irq(&conf->device_lock);
wait_event_lock_irq(conf->wait_for_stripe,
conf->quiesce == 0,
@@ -3055,7 +3083,9 @@ static int retry_aligned_read(raid5_con
last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);

for (; logical_sector < last_sector;
- logical_sector += STRIPE_SECTORS, scnt++) {
+ logical_sector += STRIPE_SECTORS,
+ sector += STRIPE_SECTORS,
+ scnt++) {

if (scnt < raid_bio->bi_hw_segments)
/* already done this stripe */
@@ -3071,7 +3101,13 @@ static int retry_aligned_read(raid5_con
}

set_bit(R5_ReadError, &sh->dev[dd_idx].flags);
- add_stripe_bio(sh, raid_bio, dd_idx, 0);
+ if (!add_stripe_bio(sh, raid_bio, dd_idx, 0)) {
+ release_stripe(sh);
+ raid_bio->bi_hw_segments = scnt;
+ conf->retry_read_aligned = raid_bio;
+ return handled;
+ }
+
handle_stripe(sh, NULL);
release_stripe(sh);
handled++;


2007-02-07 01:15:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Wed, 7 Feb 2007 10:26:56 +1100
Neil Brown <[email protected]> wrote:

> +static int bio_fits_rdev(struct bio *bi)
> +{
> + request_queue_t *q = bdev_get_queue(bi->bi_bdev);
> +
> + if ((bi->bi_size>>9) > q->max_sectors)
> + return 0;
> + blk_recount_segments(q, bi);
> + if (bi->bi_phys_segments > q->max_phys_segments ||
> + bi->bi_hw_segments > q->max_hw_segments)
> + return 0;
> +
> + if (q->merge_bvec_fn)
> + /* it's too hard to apply the merge_bvec_fn at this stage,
> + * just just give up
> + */
> + return 0;
> +
> + return 1;
> +}

Isn't think going to return 0 rather a lot of the time?

2007-02-07 01:31:19

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Tuesday February 6, [email protected] wrote:
> On Wed, 7 Feb 2007 10:26:56 +1100
> Neil Brown <[email protected]> wrote:
>
> > +static int bio_fits_rdev(struct bio *bi)
> > +{
> > + request_queue_t *q = bdev_get_queue(bi->bi_bdev);
> > +
> > + if ((bi->bi_size>>9) > q->max_sectors)
> > + return 0;
> > + blk_recount_segments(q, bi);
> > + if (bi->bi_phys_segments > q->max_phys_segments ||
> > + bi->bi_hw_segments > q->max_hw_segments)
> > + return 0;
> > +
> > + if (q->merge_bvec_fn)
> > + /* it's too hard to apply the merge_bvec_fn at this stage,
> > + * just just give up
> > + */
> > + return 0;
> > +
> > + return 1;
> > +}
>
> Isn't think going to return 0 rather a lot of the time?

Why do you say that?

merge_bvec_fn is only set for dm, md, pktcdvd.c so what won't cause a
zero in real-world cases (it rarely makes sense to put a raid5 on top
of those things).
So it will only return zero when ->max_sectors or ->max_*_segments are
less than the default, and while there are certainly quite a few of
those I wouldn't have expected them to be a majority (else we would
have had more complaints from -mm testers).

Yes, we should find the minimum of those values for devices currently
in the array and use that information in raid5's merge_bvec_fn to stop
bios growing too large. That would make it return 0 somewhat less
often.
But I think for a lot of modern hardware, it won't return 0 at all....
or am I just not seeing some really obvious typo that you can see ??

NeilBrown

2007-02-07 01:40:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Wed, 7 Feb 2007 12:30:39 +1100
Neil Brown <[email protected]> wrote:

> On Tuesday February 6, [email protected] wrote:
> > On Wed, 7 Feb 2007 10:26:56 +1100
> > Neil Brown <[email protected]> wrote:
> >
> > > +static int bio_fits_rdev(struct bio *bi)
> > > +{
> > > + request_queue_t *q = bdev_get_queue(bi->bi_bdev);
> > > +
> > > + if ((bi->bi_size>>9) > q->max_sectors)
> > > + return 0;
> > > + blk_recount_segments(q, bi);
> > > + if (bi->bi_phys_segments > q->max_phys_segments ||
> > > + bi->bi_hw_segments > q->max_hw_segments)
> > > + return 0;
> > > +
> > > + if (q->merge_bvec_fn)
> > > + /* it's too hard to apply the merge_bvec_fn at this stage,
> > > + * just just give up
> > > + */
> > > + return 0;
> > > +
> > > + return 1;
> > > +}
> >
> > Isn't think going to return 0 rather a lot of the time?
>
> Why do you say that?

Because I was too lazy to go off and work out who's setting merge_bvec_fn.

> merge_bvec_fn is only set for dm, md, pktcdvd.c so what won't cause a
> zero in real-world cases (it rarely makes sense to put a raid5 on top
> of those things).

OK. Hopefully it'll stay that way...

2007-02-07 16:27:03

by Kai

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!


On Wed, 7 Feb 2007 10:26:56 +1100, "Neil Brown" <[email protected]> said:
> On Tuesday February 6, [email protected] wrote:
> >
> > This patch should fix the worst of the offences, but I'd like to
> > experiment and think a bit more before I submit it to stable.
> > And probably test it too - as yet I have only compile and brain
> > tested.
>
> Ok, I've experimented and tested and now I know what was causing the
> double-unlock.
>
> The following patch is suitable for 2.6.20.1 and mainline. There is
> room for a bit more improvement, but only for performance, not
> correctness. I'll look into that later.
>
> Thanks,
> NeilBrown

I figure I should test this on my hardware, but since the RAID array
resynched itself when I rebooted back into an earlier kernel version,
I'm guessing it means this bug introduced some corruption into the
array, when it occurred, so I'd like some pointers on how I can test it
out without compromising my data.

2007-02-07 22:09:39

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Wednesday February 7, [email protected] wrote:
>
> On Wed, 7 Feb 2007 10:26:56 +1100, "Neil Brown" <[email protected]> said:
> > On Tuesday February 6, [email protected] wrote:
> > >
> > > This patch should fix the worst of the offences, but I'd like to
> > > experiment and think a bit more before I submit it to stable.
> > > And probably test it too - as yet I have only compile and brain
> > > tested.
> >
> > Ok, I've experimented and tested and now I know what was causing the
> > double-unlock.
> >
> > The following patch is suitable for 2.6.20.1 and mainline. There is
> > room for a bit more improvement, but only for performance, not
> > correctness. I'll look into that later.
> >
> > Thanks,
> > NeilBrown
>
> I figure I should test this on my hardware, but since the RAID array
> resynched itself when I rebooted back into an earlier kernel version,
> I'm guessing it means this bug introduced some corruption into the
> array, when it occurred, so I'd like some pointers on how I can test it
> out without compromising my data.

This bug should not introduce any data corruption.
It causes some read requests to get a failure from the device, which
will cause raid5 to remove the device from the array (Though the data
will still be intact).
On restart a resync will put everything back as it was.

It is quite possible (this happened to me in my testing) for several
devices to get these errors and for several or even all of these
device to get failed. However even in this case the data is still
intact and "mdadm --assemble --force ..." will put everything back
together.

So there should be no risk of data corruption.

NeilBrown

2007-02-09 17:15:52

by Kai

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!


On Thu, 8 Feb 2007 09:08:58 +1100, "Neil Brown" <[email protected]> said:
> On Wednesday February 7, [email protected] wrote:
> >
> > On Wed, 7 Feb 2007 10:26:56 +1100, "Neil Brown" <[email protected]> said:
> > > On Tuesday February 6, [email protected] wrote:
> > > >
> > > > This patch should fix the worst of the offences, but I'd like to
> > > > experiment and think a bit more before I submit it to stable.
> > > > And probably test it too - as yet I have only compile and brain
> > > > tested.
> > >
> > > Ok, I've experimented and tested and now I know what was causing the
> > > double-unlock.
> > >
> > > The following patch is suitable for 2.6.20.1 and mainline. There is
> > > room for a bit more improvement, but only for performance, not
> > > correctness. I'll look into that later.
> > >
> > > Thanks,
> > > NeilBrown
> >
> > I figure I should test this on my hardware, but since the RAID array
> > resynched itself when I rebooted back into an earlier kernel version,
> > I'm guessing it means this bug introduced some corruption into the
> > array, when it occurred, so I'd like some pointers on how I can test it
> > out without compromising my data.
>
> This bug should not introduce any data corruption.
> It causes some read requests to get a failure from the device, which
> will cause raid5 to remove the device from the array (Though the data
> will still be intact).
> On restart a resync will put everything back as it was.
>
> It is quite possible (this happened to me in my testing) for several
> devices to get these errors and for several or even all of these
> device to get failed. However even in this case the data is still
> intact and "mdadm --assemble --force ..." will put everything back
> together.
>
> So there should be no risk of data corruption.
>
> NeilBrown

I've been running it for the last few hours, and no error output, yet; I
even did some fairly heavy I/O stuff to try and mess with it, but it
hasn't budged. I'm ready to say, works for me.

-Kai

2007-02-12 08:51:31

by J.A. Magallón

[permalink] [raw]
Subject: Re: [PATCH] Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Wed, 7 Feb 2007 10:26:56 +1100, Neil Brown <[email protected]> wrote:

> On Tuesday February 6, [email protected] wrote:
> >
> > This patch should fix the worst of the offences, but I'd like to
> > experiment and think a bit more before I submit it to stable.
> > And probably test it too - as yet I have only compile and brain
> > tested.
>
> Ok, I've experimented and tested and now I know what was causing the
> double-unlock.
>
> The following patch is suitable for 2.6.20.1 and mainline. There is
> room for a bit more improvement, but only for performance, not
> correctness. I'll look into that later.
>

> + blk_recount_segments(q, bi);

I needed to export blk_recount_segments() to build raid as a module:

--- linux/block/ll_rw_blk.c.orig 2007-02-12 09:46:55.000000000 +0100
+++ linux/block/ll_rw_blk.c 2007-02-12 09:45:57.000000000 +0100
@@ -1258,6 +1258,8 @@
bio->bi_flags |= (1 << BIO_SEG_VALID);
}

+EXPORT_SYMBOL(blk_recount_segments);
+
static int blk_phys_contig_segment(request_queue_t *q, struct bio *bio,
struct bio *nxt)
{

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT