2002-10-02 16:38:46

by Miquel van Smoorenburg

Subject: 2.5.40: raid0_make_request bug and bad: scheduling while atomic!

I'm trying to test 2.5.40 on an NNTP peering server that pumps
hundreds of gigs per day over the network. Interested to see
if I can increase that with 2.5 ;)

Unfortunately my history databases etc are on RAID0:

raid0_make_request bug: can't convert block across chunks or bigger than 32k 8682080 24
raid0_make_request bug: can't convert block across chunks or bigger than 32k 8679792 24

This appears to be a known problem, but I couldn't find a pointer
to a solution anywhere. Scared, I ran fsck -f /dev/md0,
saw no errors, and booted back to 2.4.19.

BTW, got something else that looks worrying during boot:
bad: scheduling while atomic!

cpu: 0, clocks: 99813, slice: 3024
CPU0<T0:99808,T1:96784,D:0,S:3024,C:99813>
checking TSC synchronization across 2 CPUs: passed.
Starting migration thread for cpu 0
Bringing up 1
cpu: 1, clocks: 99813, slice: 3024
CPU1<T0:99808,T1:93760,D:0,S:3024,C:99813>
CPU 1 IS NOW UP!
Starting migration thread for cpu 1
Debug: sleeping function called from illegal context at sched.c:1166
c1b6bf18 c0116234 c02755c0 c0275442 0000048e c1b6bf78 c011499b c0275442
0000048e 00000000 c1b6a000 c034a4a0 c1b6bf64 c1b6bfa4 c1b70000 c1b6bf64
c034a4a0 00000001 00000001 00000286 c1b6bf78 c01139ab c1b6f040 00000000
Call Trace:
[<c0116234>] __might_sleep+0x54/0x58
[<c011499b>] wait_for_completion+0x1b/0x114
[<c01139ab>] wake_up_process+0xb/0x10
[<c0115e16>] set_cpus_allowed+0x14a/0x16c
[<c0115e88>] migration_thread+0x50/0x33c
[<c0115e38>] migration_thread+0x0/0x33c
[<c0105501>] kernel_thread_helper+0x5/0xc

bad: scheduling while atomic!
c1b6bf00 c01143d1 c0275420 c1b6a000 c1b6bf70 c1b6bf78 c1b6bf18 c0116234
c02755c0 c0275442 0000048e c1b6bf78 c1b6a000 c1b6bf78 c0114a35 00000000
c1b6a000 c034a4a0 c1b6a000 c1b6bfa4 00000000 c1b6e060 c01147dc 00000000
Call Trace:
[<c01143d1>] schedule+0x3d/0x404
[<c0116234>] __might_sleep+0x54/0x58
[<c0114a35>] wait_for_completion+0xb5/0x114
[<c01147dc>] default_wake_function+0x0/0x34
[<c01147dc>] default_wake_function+0x0/0x34
[<c0115e16>] set_cpus_allowed+0x14a/0x16c
[<c0115e88>] migration_thread+0x50/0x33c
[<c0115e38>] migration_thread+0x0/0x33c
[<c0105501>] kernel_thread_helper+0x5/0xc

CPUS done 4294967295



2002-10-06 11:14:43

by Helge Hafting

Subject: Re: 2.5.40: raid0_make_request bug and bad: scheduling while atomic!

Miquel van Smoorenburg wrote:
>
> I'm trying to test 2.5.40 on an NNTP peering server that pumps
> hundreds of gigs per day over the network. Interested to see
> if I can increase that with 2.5 ;)
>
> Unfortunately my history databases etc are on RAID0:
>
> raid0_make_request bug: can't convert block across chunks or bigger than 32k 8682080 24
> raid0_make_request bug: can't convert block across chunks or bigger than 32k 8679792 24
>
> This appears to be a known problem, but I couldn't find a pointer
> to a solution anywhere. Scared, I ran fsck -f /dev/md0,
> saw no errors, and booted back to 2.4.19.

The workaround is to force BIOs to be no bigger than a page.
That limits performance, but it should still be somewhat better
than the pre-bio times.
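
At the queue level that cap would look something like the sketch
below. Illustrative only: it assumes the 2.5 block layer's
blk_queue_max_sectors() with a one-page limit, applied to an md
queue as in the raid0 code:

	/* Sketch: limit every request on this queue to one page,
	 * i.e. PAGE_SIZE >> 9 sectors (8 sectors with 4k pages).
	 */
	blk_queue_max_sectors(&mddev->queue, PAGE_SIZE >> 9);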

This used to be fixable in mpage.c, where a comment
explained what to do in the presence of "braindead drivers",
but mpage.c changed sometime between 2.5.36 and 2.5.39.

Now the new include/linux/bio.h has the following:
#define BIO_MAX_PAGES (256)
I tried substituting (1) for (256), but it didn't help.
So I don't mount raid-0 right now.
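
My guess is that the substitution fails because BIO_MAX_PAGES only
caps how many pages a bio carries, not where the bio starts: even a
single-page bio can straddle a chunk boundary if it begins near the
end of a chunk. A toy user-space version of raid0's check (all
values in KB, numbers made up):

/* Toy illustration, not kernel code: raid0's chunk-boundary check.
 * With 32k chunks, a 4k (one-page) bio starting 2k before a chunk
 * boundary still crosses it, so capping bio size alone can't help.
 */
#include <stdio.h>

int main(void)
{
	unsigned int chunk = 32;   /* chunk size, KB */
	unsigned int block = 30;   /* bio start, KB from start of device */
	unsigned int size  = 4;    /* one page, KB */

	if (chunk < (block & (chunk - 1)) + size)
		printf("can't convert block across chunks\n");
	return 0;
}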

Helge Hafting

2002-10-06 12:20:37

by Anton Blanchard

Subject: Re: 2.5.40: raid0_make_request bug and bad: scheduling while atomic!


> The workaround is to force BIOs to be no bigger than a page.
> That limits performance, but it should still be somewhat better
> than the pre-bio times.
>
> This used to be fixable in mpage.c, where a comment
> explained what to do in the presence of "braindead drivers",
> but mpage.c changed sometime between 2.5.36 and 2.5.39.
>
> Now the new include/linux/bio.h has the following:
> #define BIO_MAX_PAGES (256)
> I tried substituting (1) for (256), but it didn't help.
> So I don't mount raid-0 right now.

Peter Chubb mailed a fix for this to linux-kernel in the last week
and I can confirm it fixes all my raid0 problems. Thanks Peter!

http://marc.theaimsgroup.com/?l=linux-kernel&m=103369952814053&w=2

Anton

2002-10-09 21:42:41

by Miquel van Smoorenburg

Subject: Re: 2.5.40: raid0_make_request bug and bad: scheduling while atomic!

According to Anton Blanchard:
> Peter Chubb mailed a fix for this to linux-kernel in the last week
> and I can confirm it fixes all my raid0 problems. Thanks Peter!
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103369952814053&w=2

Indeed it works. I'm running it on 2.5.41-mm1; here's the
adjusted patch.

linux-2.5.41-mm1-raid0.patch

--- linux-2.5.41-mm1/drivers/md/raid0.c.orig	Tue Oct 8 23:56:14 2002
+++ linux-2.5.41-mm1/drivers/md/raid0.c	Wed Oct 9 00:00:58 2002
@@ -162,6 +162,29 @@
 	return 1;
 }
 
+/**
+ * raid0_mergeable_bvec -- tell the bio layer if two requests can be merged
+ * @q: request queue
+ * @bio: the bio that has been built up so far
+ * @biovec: the bio_vec that could be merged to it
+ *
+ * Return 1 if the merge is not permitted (because the
+ * result would cross a chunk boundary), 0 otherwise.
+ */
+static int raid0_mergeable_bvec(request_queue_t *q, struct bio *bio, struct bio_vec *biovec)
+{
+	mddev_t *mddev = q->queuedata;
+	sector_t block;
+	unsigned int chunk_size;
+	unsigned int bio_sz;
+
+	chunk_size = mddev->chunk_size >> 10;		/* bytes to KB */
+	block = bio->bi_sector >> 1;			/* sectors to KB */
+	bio_sz = (bio->bi_size + biovec->bv_len) >> 10;	/* merged size, KB */
+
+	return chunk_size < ((block & (chunk_size - 1)) + bio_sz);
+}
+
 static int raid0_run (mddev_t *mddev)
 {
 	unsigned cur=0, i=0, nb_zone;
@@ -233,6 +256,8 @@
 		conf->hash_table[i++].zone1 = conf->strip_zone + cur;
 		size -= (conf->smallest->size - zone0_size);
 	}
+	blk_queue_max_sectors(&mddev->queue, mddev->chunk_size >> 9);
+	blk_queue_merge_bvec(&mddev->queue, raid0_mergeable_bvec);
 	return 0;
 
 out_free_zone_conf:
@@ -262,13 +287,6 @@
 	return 0;
 }
 
-/*
- * FIXME - We assume some things here :
- * - requested buffers NEVER bigger than chunk size,
- * - requested buffers NEVER cross stripes limits.
- * Of course, those facts may not be valid anymore (and surely won't...)
- * Hey guys, there's some work out there ;-)
- */
 static int raid0_make_request (request_queue_t *q, struct bio *bio)
 {
 	mddev_t *mddev = q->queuedata;
@@ -291,8 +309,8 @@
 		hash = conf->hash_table + x;
 	}
 
-	/* Sanity check */
-	if (chunk_size < (block & (chunk_size - 1)) + (bio->bi_size >> 10))
+	/* Sanity check -- queue functions should prevent this happening */
+	if (unlikely(chunk_size < (block & (chunk_size - 1)) + (bio->bi_size >> 10)))
 		goto bad_map;
 
 	if (!hash)
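
For anyone following the arithmetic: chunk_size, block and bio_sz
above are all in KB, so with 32k chunks a merge is refused as soon
as the bio's offset within its chunk plus the merged size would pass
32. A toy user-space walk-through of the same test, with a made-up
starting offset:

/* Toy walk-through of raid0_mergeable_bvec: starting at a made-up
 * offset, add 4k pages to a bio until the merge function would
 * refuse the next one (all arithmetic in KB, as in the patch).
 */
#include <stdio.h>

int main(void)
{
	unsigned int chunk = 32;            /* KB, as in the 32k messages */
	unsigned long long block = 1016;    /* made-up bio start, KB */
	unsigned int bio_sz = 0;            /* current bio size, KB */

	while (1) {
		unsigned int merged = bio_sz + 4;   /* bi_size + bv_len */
		if (chunk < (block & (chunk - 1)) + merged)
			break;                      /* merge refused */
		bio_sz = merged;
	}
	/* Here block & 31 == 24, so the bio stops at 8k: it exactly
	 * reaches the chunk boundary and never crosses it.
	 */
	printf("bio capped at %uk of the %uk chunk\n", bio_sz, chunk);
	return 0;
}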