2005-01-14 10:50:49

by Reuben Farrelly

Subject: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
2.6.10-mm3.

NET: Registered protocol family 17
Starting balanced_irq
BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
VFS: Waiting 19sec for root device...
VFS: Waiting 18sec for root device...
VFS: Waiting 17sec for root device...
VFS: Waiting 16sec for root device...
VFS: Waiting 15sec for root device...
VFS: Waiting 14sec for root device...
VFS: Waiting 13sec for root device...
VFS: Waiting 12sec for root device...
VFS: Waiting 11sec for root device...
VFS: Waiting 10sec for root device...
VFS: Waiting 9sec for root device...
VFS: Waiting 8sec for root device...
VFS: Waiting 7sec for root device...
VFS: Waiting 6sec for root device...
VFS: Waiting 5sec for root device...
VFS: Waiting 4sec for root device...
VFS: Waiting 3sec for root device...
VFS: Waiting 2sec for root device...
VFS: Waiting 1sec for root device...
VFS: Cannot open root device "md2" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

The system is running 5 RAID-1 partitions, and md2 is the root as per
grub.conf. Problem seems to be that raid autodetection finds no raid
partitions :(

The two ST380013AS SATA drives are detected earlier in the boot, so I don't
think that's the problem..

Reuben


2005-01-14 11:59:28

by Andrew Morton

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

Reuben Farrelly <[email protected]> wrote:
>
> Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
> 2.6.10-mm3.
>
> NET: Registered protocol family 17
> Starting balanced_irq
> BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> VFS: Waiting 19sec for root device...
> ...
> VFS: Waiting 1sec for root device...
> VFS: Cannot open root device "md2" or unknown-block(0,0)
> Please append a correct "root=" boot option
> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>
> The system is running 5 RAID-1 partitions, and md2 is the root as per
> grub.conf. Problem seems to be that raid autodetection finds no raid
> partitions :(
>
> The two ST380013AS SATA drives are detected earlier in the boot, so I don't
> think that's the problem..

hm, the only raidy thing we have in there is the below. Maybe you could
try reverting that?


--- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
+++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
@@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
 }
 
 static void unplug_slaves(mddev_t *mddev);
+static void raid5_unplug_device(request_queue_t *q);
 
 static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
 					     int pd_idx, int noblock)
@@ -793,7 +794,7 @@ static void compute_parity(struct stripe
  * toread/towrite point to the first in a chain.
  * The bi_next chain must be in order.
  */
-static void add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
 {
 	struct bio **bip;
 	raid5_conf_t *conf = sh->raid_conf;
@@ -810,10 +811,10 @@ static void add_stripe_bio (struct strip
 	else
 		bip = &sh->dev[dd_idx].toread;
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
-		BUG_ON((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector);
+		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
+			return 0; /* cannot add just now due to overlap */
 		bip = & (*bip)->bi_next;
 	}
-/* FIXME do I need to worry about overlapping bion */
 	if (*bip && bi->bi_next && (*bip) != bi->bi_next)
 		BUG();
 	if (*bip)
@@ -840,6 +841,7 @@ static void add_stripe_bio (struct strip
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
 	}
+	return 1;
 }
 
 
@@ -1413,7 +1415,15 @@ static int make_request (request_queue_t
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 
-			add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK));
+			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+				/* add failed due to overlap. Flush everything
+				 * and wait a while
+				 * FIXME - overlapping requests should be handled better
+				 */
+				raid5_unplug_device(mddev->queue);
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(1);
+			}
 
 			raid5_plug_device(conf);
 			handle_stripe(sh);
_

2005-01-14 12:18:53

by Reuben Farrelly

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
>Reuben Farrelly <[email protected]> wrote:
> >
> > Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
> > 2.6.10-mm3.
> >
> > NET: Registered protocol family 17
> > Starting balanced_irq
> > BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
> > md: Autodetecting RAID arrays.
> > md: autorun ...
> > md: ... autorun DONE.
> > VFS: Waiting 19sec for root device...
> > ...
> > VFS: Waiting 1sec for root device...
> > VFS: Cannot open root device "md2" or unknown-block(0,0)
> > Please append a correct "root=" boot option
> > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> >
> > The system is running 5 RAID-1 partitions, and md2 is the root as per
> > grub.conf. Problem seems to be that raid autodetection finds no raid
> > partitions :(
> >
> > The two ST380013AS SATA drives are detected earlier in the boot, so I don't
> > think that's the problem..
>
>hm, the only raidy thing we have in there is the below. Maybe you could
>try reverting that?
>
>
>--- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
>+++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
>@@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
> }
>
> static void unplug_slaves(mddev_t *mddev);
>+static void raid5_unplug_device(request_queue_t *q);
>
> static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
> int pd_idx, int noblock)

It's not that patch... with it reverted as above, I'm still seeing the
same problem. I'll give a generic 2.6.11-rc1 a try and see if the problem
is in there too.

Reuben

2005-01-14 12:45:02

by Reuben Farrelly

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
>Reuben Farrelly <[email protected]> wrote:
> >
> > Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
> > 2.6.10-mm3.
> >
> > NET: Registered protocol family 17
> > Starting balanced_irq
> > BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
> > md: Autodetecting RAID arrays.
> > md: autorun ...
> > md: ... autorun DONE.
> > VFS: Waiting 19sec for root device...
> > ...
> > VFS: Waiting 1sec for root device...
> > VFS: Cannot open root device "md2" or unknown-block(0,0)
> > Please append a correct "root=" boot option
> > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> >
> > The system is running 5 RAID-1 partitions, and md2 is the root as per
> > grub.conf. Problem seems to be that raid autodetection finds no raid
> > partitions :(
> >
> > The two ST380013AS SATA drives are detected earlier in the boot, so I don't
> > think that's the problem..
>
>hm, the only raidy thing we have in there is the below. Maybe you could
>try reverting that?
>
>
>--- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
>+++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
>@@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
> }
>
> static void unplug_slaves(mddev_t *mddev);
>+static void raid5_unplug_device(request_queue_t *q);
>
> static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
> int pd_idx, int noblock)

OK, the breakage occurred somewhere between 2.6.10-mm3 (works) and
2.6.11-rc1 (doesn't work), i.e. it wasn't introduced in the latest -mm
patchset as I first thought.

Are there any other patches that might be worth trying to back out?

reuben

2005-01-14 17:11:16

by Randy.Dunlap

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

Reuben Farrelly wrote:
> At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
>
>> Reuben Farrelly <[email protected]> wrote:
>> >
>> > Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
>> > 2.6.10-mm3.
>> >
>> > NET: Registered protocol family 17
>> > Starting balanced_irq
>> > BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
>> > md: Autodetecting RAID arrays.
>> > md: autorun ...
>> > md: ... autorun DONE.
>> > VFS: Waiting 19sec for root device...
>> > ...
>> > VFS: Waiting 1sec for root device...
>> > VFS: Cannot open root device "md2" or unknown-block(0,0)
>> > Please append a correct "root=" boot option
>> > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>> >
>> > The system is running 5 RAID-1 partitions, and md2 is the root as per
>> > grub.conf. Problem seems to be that raid autodetection finds no raid
>> > partitions :(
>> >
>> > The two ST380013AS SATA drives are detected earlier in the boot, so I don't
>> > think that's the problem..
>>
>> hm, the only raidy thing we have in there is the below. Maybe you could
>> try reverting that?
>>
>>
>> --- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
>> +++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
>> @@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
>> }
>>
>> static void unplug_slaves(mddev_t *mddev);
>> +static void raid5_unplug_device(request_queue_t *q);
>>
>> static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
>> int pd_idx, int noblock)
>
>
> Ok the breakage occurred somewhere between 2.6.10-mm3 (works) and
> 2.6.11-rc1 (doesn't work) ie wasn't introduced into the latest -mm
> patchset as I first thought.
>
> Are there any other patches that might be worth a try backing out?

Someone else reported that they had to back out this one:
waiting-10s-before-mounting-root-filesystem.patch

Can you revert that one and let us know how it goes?

--
~Randy

2005-01-15 12:21:06

by Sander

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

Randy.Dunlap wrote (ao):
> Reuben Farrelly wrote:
> >At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
> >
> >>Reuben Farrelly <[email protected]> wrote:
> >>>
> >>> Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
> >>> 2.6.10-mm3.
> >>>
> >>> NET: Registered protocol family 17
> >>> Starting balanced_irq
> >>> BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
> >>> md: Autodetecting RAID arrays.
> >>> md: autorun ...
> >>> md: ... autorun DONE.
> >>> VFS: Waiting 19sec for root device...

...

> >>> VFS: Waiting 1sec for root device...
> >>> VFS: Cannot open root device "md2" or unknown-block(0,0)
> >>> Please append a correct "root=" boot option
> >>> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> >>>
> >>> The system is running 5 RAID-1 partitions, and md2 is the root as
> >>> per grub.conf. Problem seems to be that raid autodetection finds
> >>> no raid partitions :(
> >>>
> >>> The two ST380013AS SATA drives are detected earlier in the boot, so I don't
> >>> think that's the problem..
> >>
> >>hm, the only raidy thing we have in there is the below. Maybe you could
> >>try reverting that?
> >>
> >>--- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
> >>+++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800

...

> >Ok the breakage occurred somewhere between 2.6.10-mm3 (works) and
> >2.6.11-rc1 (doesn't work) ie wasn't introduced into the latest -mm
> >patchset as I first thought.
> >
> >Are there any other patches that might be worth a try backing out?
>
> Someone else reported that they had to back out this one:
> waiting-10s-before-mounting-root-filesystem.patch
>
> Can you revert that one and let us know how it goes?

It Works For Me(tm). This is an unpatched 2.6.11-rc1-mm1 (no patches
reverted either):

# uname -r
2.6.11-rc1-mm1
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5] [multipath] [raid10]
Event: 2
md1 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
70684416 blocks 128K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
500608 blocks [4/4] [UUUU]

unused devices: <none>
# mount
/dev/md1 on / type reiser3 (rw,sync,data=journal,barrier=flush)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext2 (ro)
tmpfs on /tmp type tmpfs (rw)


So the problem must depend on something else in your setup. This system
is SCSI, and I don't use modules. I'm happy to provide more info if that
would help.

--
Humilis IT Services and Solutions
http://www.humilis.net

2005-01-16 11:02:17

by Reuben Farrelly

Subject: Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]

Hi,

Reuben Farrelly wrote:
> At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
>
>> Reuben Farrelly <[email protected]> wrote:
>> >
>> > Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
>> > 2.6.10-mm3.
>> >
>> > NET: Registered protocol family 17
>> > Starting balanced_irq
>> > BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
>> > md: Autodetecting RAID arrays.
>> > md: autorun ...
>> > md: ... autorun DONE.

<snip>

>> > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>> >
>> > The system is running 5 RAID-1 partitions, and md2 is the root as per
>> > grub.conf. Problem seems to be that raid autodetection finds no raid
>> > partitions :(
>> >
>> > The two ST380013AS SATA drives are detected earlier in the boot, so I don't
>> > think that's the problem..
>>
>> hm, the only raidy thing we have in there is the below. Maybe you could
>> try reverting that?
>>
>>
>> --- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
>> +++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
>> @@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
>> }
>>
>> static void unplug_slaves(mddev_t *mddev);
>> +static void raid5_unplug_device(request_queue_t *q);
>>
>> static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
>> int pd_idx, int noblock)
>
>
> Ok the breakage occurred somewhere between 2.6.10-mm3 (works) and
> 2.6.11-rc1 (doesn't work) ie wasn't introduced into the latest -mm
> patchset as I first thought.
>
> Are there any other patches that might be worth a try backing out?
>
> reuben

I did a full untar of the source and rebuilt my (crusty old) config file
from scratch, and it seems to work now. I can't really explain it,
though... but it obviously wasn't a problem with the -mm release as I
first thought. I'm now running -rc1-mm1 with no problems and no other patches.

Thanks to those who helped on what turned out to be a false alarm.

reuben