2007-02-06 04:08:43

by Kai

Subject: Bio device too big | kernel BUG at mm/filemap.c:537!

I booted up the new kernel version, 2.6.20; I pretty much copied over my
.config from 2.6.19.2, which has worked correctly ever since that version
came out. I looked through menuconfig to see whether any new options had
been added, but I'm pretty sure I didn't change anything. Shortly after
booting I got this error message:

bio too big device hdg1 (184 > 128)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:537!
invalid opcode: 0000 [#1]
Modules linked in: iptable_filter ip_tables x_tables snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss nvidia_agp agpgart
CPU: 0
EIP: 0060:[<c012dc66>] Not tainted VLI
EFLAGS: 00010246 (2.6.20 #1)
EIP is at unlock_page+0xd/0x22
eax: 00000000 ebx: c1714a20 ecx: 00000000 edx: c1714a20
esi: c199c45c edi: 00000001 ebp: 1422bf48 esp: f7c8df18
ds: 007b es: 007b ss: 0068
Process md0_raid5 (pid: 901, ti=f7c8c000 task=f7c50050 task.ti=f7c8c000)
Stack: c1997920 c016363c c1bca460 c1997920 c1bca460 c0315af2 f7c8df4c f7c8df50
       c1bca460 00000000 00000000 1422bf48 0a115f48 00000002 00000000 c1bca460
       00000002 f7faf600 fffffffc c0315b92 f7ee6c20 7fffffff f7c8df8c c031f877
Call Trace:
[<c016363c>] mpage_end_io_read+0x4c/0x5e
[<c0315af2>] retry_aligned_read+0x108/0x13a
[<c0315b92>] raid5d+0x6e/0xcc
[<c031f877>] md_thread+0xdc/0xf2
[<c0122465>] autoremove_wake_function+0x0/0x33
[<c0110a90>] __wake_up_common+0x35/0x4f
[<c0122465>] autoremove_wake_function+0x0/0x33
[<c031f79b>] md_thread+0x0/0xf2
[<c01221b8>] kthread+0x72/0x97
[<c0122146>] kthread+0x0/0x97
[<c0103a7b>] kernel_thread_helper+0x7/0x10
=======================
Code: 73 ff ff ff b9 a4 d6 12 c0 89 fa c7 04 24 02 00 00 00 e8 87 77 27
00 83 c4 44 5b 5e 5f c3 53 89 c3 0f ba 30 00 19 c0 85 c0 75 04 <0f> 0b
eb fe 89 d8 e8 41 ff ff ff 89$
EIP: [<c012dc66>] unlock_page+0xd/0x22 SS:ESP 0068:f7c8df18

The devices it seems to be complaining about are /dev/hdg and /dev/hde,
which are physically attached to the PCI0680 Ultra ATA-133 Host
Controller listed in lspci.txt.
Both drives are 160 GB Western Digital HDDs; I don't remember the
precise model, but I can find out if necessary.
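
In case it helps whoever looks at this: if I understand the message
correctly, the two numbers in "bio too big device hdg1 (184 > 128)" are
512-byte sectors (the size of the submitted bio vs. the queue's
per-request limit), so:

    184 sectors * 512 B = 94208 B = 92 KiB   (bio raid5 tried to submit)
    128 sectors * 512 B = 65536 B = 64 KiB   (largest request hdg will take)

i.e. the read was roughly half again as big as the controller accepts in
one request. That's just my reading of it; please correct me if I have
the units wrong.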

Attached are the output of lspci and my .config; if anyone needs further
info, let me know.
Please CC replies; I'm not subscribed to the list.

Cheers,

-Kai


Attachments:
.config (36.83 kB)
lspci.txt (5.81 kB)

2007-02-06 04:38:08

by Andrew Morton

Subject: Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Mon, 05 Feb 2007 20:08:39 -0800 "Kai" <[email protected]> wrote:

> I booted up the new kernel version, 2.6.20; I pretty much copied over my
> .config that worked in 2.6.19.2, that has worked correctly since that
> version came out... I looked through the menuconfig to see if any new
> options had been added, but I'm pretty sure I didn't change anything,
> and got this error message shortly after booting:
>
> bio too big device hdg1 (184 > 128)
> ------------[ cut here ]------------
> kernel BUG at mm/filemap.c:537!
> invalid opcode: 0000 [#1]
> Modules linked in: iptable_filter ip_tables x_tables snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss nvidia_agp agpgart
> CPU: 0
> EIP: 0060:[<c012dc66>] Not tainted VLI
> EFLAGS: 00010246 (2.6.20 #1)
> EIP is at unlock_page+0xd/0x22
> eax: 00000000 ebx: c1714a20 ecx: 00000000 edx: c1714a20
> esi: c199c45c edi: 00000001 ebp: 1422bf48 esp: f7c8df18
> ds: 007b es: 007b ss: 0068
> Process md0_raid5 (pid: 901, ti=f7c8c000 task=f7c50050 task.ti=f7c8c000)
> Stack: c1997920 c016363c c1bca460 c1997920 c1bca460 c0315af2 f7c8df4c f7c8df50
> c1bca460 00000000 00000000 1422bf48 0a115f48 00000002 00000000 c1bca460
> 00000002 f7faf600 fffffffc c0315b92 f7ee6c20 7fffffff f7c8df8c c031f877
> Call Trace:
> [<c016363c>] mpage_end_io_read+0x4c/0x5e
> [<c0315af2>] retry_aligned_read+0x108/0x13a
> [<c0315b92>] raid5d+0x6e/0xcc
> [<c031f877>] md_thread+0xdc/0xf2
> [<c0122465>] autoremove_wake_function+0x0/0x33
> [<c0110a90>] __wake_up_common+0x35/0x4f
> [<c0122465>] autoremove_wake_function+0x0/0x33
> [<c031f79b>] md_thread+0x0/0xf2
> [<c01221b8>] kthread+0x72/0x97
> [<c0122146>] kthread+0x0/0x97
> [<c0103a7b>] kernel_thread_helper+0x7/0x10
> =======================
> Code: 73 ff ff ff b9 a4 d6 12 c0 89 fa c7 04 24 02 00 00 00 e8 87 77 27
> 00 83 c4 44 5b 5e 5f c3 53 89 c3 0f ba 30 00 19 c0 85 c0 75 04 <0f> 0b
> eb fe 89 d8 e8 41 ff ff ff 89$
> EIP: [<c012dc66>] unlock_page+0xd/0x22 SS:ESP 0068:f7c8df18
>
> The devices it seems to be complaining about are /dev/hdg and /dev/hde,
> which are physically attached to the PCI0680 Ultra ATA-133 Host
> Controller listed in lspci.txt.
> Both drives are 160 GB Western Digital HDDs... don't remember the
> precise model, but I can find out if necessary.
>

You hit two bugs. It seems that raid5 is submitting BIOs which are larger
than the device can accept. In response someone (probably the block layer)
caused a page to come unlocked twice, possibly by running bi_end_io twice
against the same BIO.
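
For anyone wondering where the BUG() itself comes from: in 2.6.20,
unlock_page() does roughly

	if (!TestClearPageLocked(page))
		BUG();

so unlocking a page that is already unlocked is what trips
mm/filemap.c:537. A tiny userspace model of that pattern (the struct and
helper names below are made up for illustration; this is not kernel
code):

	#include <assert.h>
	#include <stdio.h>

	#define PG_locked 0x1			/* stand-in for the real page flag */

	struct page { unsigned long flags; };

	/* mimics TestClearPageLocked(): return old state, clear the bit */
	static int test_clear_page_locked(struct page *page)
	{
		int was_locked = page->flags & PG_locked;
		page->flags &= ~PG_locked;
		return was_locked;
	}

	static void unlock_page(struct page *page)
	{
		if (!test_clear_page_locked(page))
			assert(0 && "BUG(): page already unlocked");
		/* the real unlock_page() then wakes PG_locked waiters */
	}

	int main(void)
	{
		struct page p = { .flags = PG_locked };

		unlock_page(&p);	/* first end_io completion: fine */
		printf("first unlock ok\n");
		unlock_page(&p);	/* second completion on the same page: BUG */
		return 0;
	}

So if bi_end_io really does run twice against the same bio, the second
mpage_end_io_read() -> unlock_page() call lands exactly on that BUG().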

2007-02-06 05:24:37

by NeilBrown

Subject: Re: Bio device too big | kernel BUG at mm/filemap.c:537!

On Monday February 5, [email protected] wrote:
> On Mon, 05 Feb 2007 20:08:39 -0800 "Kai" <[email protected]> wrote:
>
> You hit two bugs. It seems that raid5 is submitting BIOs which are larger
> than the device can accept. In response someone (probably the block layer)
> caused a page to come unlocked twice, possibly by running bi_end_io twice
> against the same BIO.

At least two bugs... there should be a prize for that :-)

Raid5 was definitely submitting a bio that was too big for the device,
and then when it got an error and went to try it the old-fashioned way
(lots of little bios through the stripe-cache) it messed up.
Whether that is what triggered the double-unlock I'm not yet sure.

This patch should fix the worst of the offences, but I'd like to
experiment and think a bit more before I submit it to stable.
And probably test it too - as yet I have only compile- and brain-tested
it.

What is the chunk-size on your raid5? Presumably at least 128k?

NeilBrown



### Diffstat output
./drivers/md/raid5.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2007-02-06 16:16:39.000000000 +1100
+++ ./drivers/md/raid5.c 2007-02-06 16:20:57.000000000 +1100
@@ -2669,6 +2669,27 @@ static int raid5_align_endio(struct bio
 	return 0;
 }
 
+static int bio_fits_rdev(struct bio *bi)
+{
+	request_queue_t *q = bdev_get_queue(bi->bi_bdev);
+
+	if ((bi->bi_size>>9) > q->max_sectors)
+		return 0;
+	blk_recount_segments(q, bi);
+	if (bi->bi_phys_segments > q->max_phys_segments ||
+	    bi->bi_hw_segments > q->max_hw_segments)
+		return 0;
+
+	if (q->merge_bvec_fn)
+		/* it's too hard to apply the merge_bvec_fn at this stage,
+		 * just just give up
+		 */
+		return 0;
+
+	return 1;
+}
+
+
 static int chunk_aligned_read(request_queue_t *q, struct bio * raid_bio)
 {
 	mddev_t *mddev = q->queuedata;
@@ -2715,6 +2736,13 @@ static int chunk_aligned_read(request_qu
 		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
 		align_bi->bi_sector += rdev->data_offset;
 
+		if (!bio_fits_rdev(align_bi)) {
+			/* too big in some way */
+			bio_put(align_bi);
+			rdev_dec_pending(rdev, mddev);
+			return 0;
+		}
+
 		spin_lock_irq(&conf->device_lock);
 		wait_event_lock_irq(conf->wait_for_stripe,
 				    conf->quiesce == 0,
@@ -3107,7 +3135,9 @@ static int retry_aligned_read(raid5_con
 	last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
 
 	for (; logical_sector < last_sector;
-	     logical_sector += STRIPE_SECTORS, scnt++) {
+	     logical_sector += STRIPE_SECTORS,
+	     sector += STRIPE_SECTORS,
+	     scnt++) {
 
 		if (scnt < raid_bio->bi_hw_segments)
 			/* already done this stripe */
@@ -3123,7 +3153,13 @@ static int retry_aligned_read(raid5_con
 		}
 
 		set_bit(R5_ReadError, &sh->dev[dd_idx].flags);
-		add_stripe_bio(sh, raid_bio, dd_idx, 0);
+		if (!add_stripe_bio(sh, raid_bio, dd_idx, 0)) {
+			release_stripe(sh);
+			raid_bio->bi_hw_segments = scnt;
+			conf->retry_read_aligned = raid_bio;
+			return handled;
+		}
+
 		handle_stripe(sh, NULL);
 		release_stripe(sh);
 		handled++;