2000-11-17 23:12:50

by Jasper Spaans

Subject: [PATCH] raid5 fix after xor.c cleanup

Hi Ingo & lists,

due to the xor.c cleanup in 2.4.0-test11-pre5+, raid5 compiled into the
kernel fails when booting, because the calibrate_xor_block function
hasn't been called while registering a raid5 volume; this leads to a
panic, as no checksumming function has been chosen.

Here's a tiny patch to restore that functionality, can you apply it?

Regards,
--
Jasper Spaans <[email protected]>

diff -Nru linux-2.4.0-test11-pre6-orig/drivers/md/raid5.c linux-2.4.0-test11-pre6/drivers/md/raid5.c
--- linux-2.4.0-test11-pre6-orig/drivers/md/raid5.c Fri Nov 17 23:21:18 2000
+++ linux-2.4.0-test11-pre6/drivers/md/raid5.c Fri Nov 17 23:19:24 2000
@@ -2344,6 +2344,9 @@

int raid5_init (void)
{
+#ifndef MODULE
+ calibrate_xor_block();
+#endif
return register_md_personality (RAID5, &raid5_personality);
}

diff -Nru linux-2.4.0-test11-pre6-orig/drivers/md/xor.c linux-2.4.0-test11-pre6/drivers/md/xor.c
--- linux-2.4.0-test11-pre6-orig/drivers/md/xor.c Fri Nov 17 23:21:18 2000
+++ linux-2.4.0-test11-pre6/drivers/md/xor.c Fri Nov 17 23:31:36 2000
@@ -98,7 +98,7 @@
speed / 1000, speed % 1000);
}

-static int
+int
calibrate_xor_block(void)
{
void *b1, *b2;
@@ -139,5 +139,6 @@
}

MD_EXPORT_SYMBOL(xor_block);
+MD_EXPORT_SYMBOL(calibrate_xor_block);

module_init(calibrate_xor_block);
diff -Nru linux-2.4.0-test11-pre6-orig/include/linux/raid/xor.h linux-2.4.0-test11-pre6/include/linux/raid/xor.h
--- linux-2.4.0-test11-pre6-orig/include/linux/raid/xor.h Fri Nov 17 23:21:48 2000
+++ linux-2.4.0-test11-pre6/include/linux/raid/xor.h Fri Nov 17 23:33:03 2000
@@ -6,6 +6,7 @@
#define MAX_XOR_BLOCKS 5

extern void xor_block(unsigned int count, struct buffer_head **bh_ptr);
+extern int calibrate_xor_block(void);

struct xor_block_template {
struct xor_block_template *next;


2000-11-18 12:12:13

by Jasper Spaans

Subject: Re: [PATCH] raid5 fix after xor.c cleanup

On Fri, Nov 17, 2000 at 11:41:44PM +0100, Jasper Spaans wrote:

> due to the xor.c cleanup in 2.4.0-test11-pre5+, raid5 compiled into the
> kernel fails when booting, because the calibrate_xor_block function
> hasn't been called while registering a raid5 volume; this leads to a
> panic, as no checksumming function has been chosen.
>
> Here's a tiny patch to restore that functionality, can you apply it?

Hmm, next time I'll need to eat my own dogfood -- this patch doesn't work, it
only compiles. Don't use it.

Regards,
--
Jasper Spaans <[email protected]>

2000-11-18 23:46:29

by Jasper Spaans

Subject: Re: [BUG] raid5 link error? (was [PATCH] raid5 fix after xor.c cleanup)

On Sat, Nov 18, 2000 at 12:35:36PM +0100, Jasper Spaans wrote:

> Hmm, next time I'll need to eat my own dogfood -- this patch doesn't work,
> it only compiles. Don't use it.

It seems to me the original code was correct, but the objects aren't linked
in the right order, so the initcalls run in the wrong order (i.e.,
raid5.c::raid5_init() is being called before xor.c::calibrate_xor_block()).
Any linking gurus out there who can help me out on this one?

Feeling a bit of a schizo, but getting used to my brown paper bag,

Regards
--
Q_. Jasper Spaans <mailto:[email protected]> -o)
`~\ Conditional Access/DVB-C/OpenTV/Unix consultant /\\
Mr /\ _\_v
Zap Need an unpleasantly expensive consultant? Mail [email protected]

2000-11-26 02:56:29

by Friedrich Lobenstock

Subject: [BUG] 2.4.0-test11-ac3 breaks raid autodetect (was Re: [BUG] raid5 link error? (was [PATCH] raid5 fix after xor.c cleanup))

Neil Brown wrote:
>
> The following patch changes the link order in the Makefile so that xor
> is initialised before md tries to autostart anything.
> It also takes the theme a bit further and uses module_init/module_exit
> to init and shutdown the raid personalities. This allows us to remove
> the explicit calls to raidXX_init from md.c, which is nice.
>
> I have tested this patch both
> 1/ monolithic kernel and autodetecting an array
> 2/ md and all personalities modules
>
> and it works fine.

Sorry to tell you that I just tried linux-2.4.0-test11-ac3 (which has this
patch) and I couldn't boot because the kernel detects the raid1 devices
but kicks them out shortly after. I had to back out this code.

Here is what the messages look like:

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xa400-0xa407,0xa002 on irq 9
hda: 20044080 sectors (10263 MB) w/418KiB Cache, CHS=19885/16/63, UDMA(33)
hdc: 20044080 sectors (10263 MB) w/418KiB Cache, CHS=19885/16/63, UDMA(33)
hde: 58633344 sectors (30020 MB) w/418KiB Cache, CHS=58168/16/63, UDMA(66)
Partition check:
hda: hda1 hda2
hdc: hdc1 hdc2 hdc3
hde: hde1
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
LVM version 0.8final by Heinz Mauelshagen (15/02/2000)
lvm -- Driver successfully initialized
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 640k freed
Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
SCSI subsystem driver Revision: 1.00
request_module[scsi_hostadapter]: Root fs not mounted
md driver 0.90.0 MAX_MD_DEVS=256, MAX_REAL=12
md.c: sizeof(mdp_super_t) = 4096
autodetecting RAID arrays
(read) hda1's sb offset: 51264 [events: 00000056]
(read) hda2's sb offset: 9970560 [events: 00000051]
(read) hde1's sb offset: 29316544 [events: 00000080]
autorun ...
considering hde1 ...
adding hde1 ...
created md2
bind<hde1,1>
running: <hde1>
now!
hde1's event counter: 00000080
md: device name has changed from [dev 22:01] to hde1 since last import!
md2: former device hde1 is unavailable, removing from array!
request_module[md-personality-3]: Root fs not mounted
do_md_run() returned -22
md2 stopped.
unbind<hde1,0>
export_rdev(hde1)
considering hda2 ...
adding hda2 ...
created md1
bind<hda2,1>
running: <hda2>
now!
hda2's event counter: 00000051
md1: former device hdc2 is unavailable, removing from array!
request_module[md-personality-3]: Root fs not mounted
do_md_run() returned -22
md1 stopped.
unbind<hda2,0>
export_rdev(hda2)
considering hda1 ...
adding hda1 ...
created md0
bind<hda1,1>
running: <hda1>
now!
hda1's event counter: 00000056
md0: former device hdc1 is unavailable, removing from array!
request_module[md-personality-3]: Root fs not mounted
do_md_run() returned -22
md0 stopped.
unbind<hda1,0>
export_rdev(hda1)
... autorun DONE.
raid1 personality registered
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem).
VFS: Mounted root (ext2 filesystem) readonly.
change_root: old root has d_count=3


MfG / Regards
Friedrich Lobenstock

2000-11-27 02:50:31

by NeilBrown

Subject: Re: [BUG] 2.4.0-test11-ac3 breaks raid autodetect (was Re: [BUG] raid5 link error? (was [PATCH] raid5 fix after xor.c cleanup))

On Sunday November 26, [email protected] wrote:
> Neil Brown wrote:
> >
> > The following patch changes the link order in the Makefile so that xor
> > is initialised before md tries to autostart anything.
> > It also takes the theme a bit further and uses module_init/module_exit
> > to init and shutdown the raid personalities. This allows us to remove
> > the explicit calls to raidXX_init from md.c, which is nice.
> >
> > I have tested this patch both
> > 1/ monolithic kernel and autodetecting an array
> > 2/ md and all personalities modules
> >
> > and it works fine.
>
> Sorry to tell you that I just tried linux-2.4.0-test11-ac3 (which has this
> patch) and I couldn't boot because the kernel detects the raid1 devices
> but kicks them out shortly after. I had to back out this code.

Thanks for this....

I have looked more deeply, and discovered the error of my ways.
As the Makefiles now stand, all export-objs (OX_OBJS) get linked
before non-export-objs (O_OBJS) in the same directory, independently
of any ordering imposed within the Makefile.
This caused md.o to get linked before raid?.o.
Due to carelessness on my part I didn't notice this happening when I
was testing.

The following patch fixes it. I hope the change to Rules.make is
acceptable - I have CCed to linux-kbuild in case anyone there has an
issue with it.

Even allowing for that though, some of the boot-time messages look
very strange. Friedrich: could you let me know how the various
partitions were expected to be combined into raid arrays - from the
boot log, it looks like there are three single drive raid1 arrays, and
that doesn't seem to make much sense.

NeilBrown

--- ./drivers/md/Makefile 2000/11/27 02:05:52 1.1
+++ ./drivers/md/Makefile 2000/11/27 02:09:42 1.2
@@ -28,6 +28,9 @@
# Translate to Rules.make lists.
O_OBJS := $(filter-out $(export-objs), $(obj-y))
OX_OBJS := $(filter $(export-objs), $(obj-y))
+# Need to maintain ordering between O_ and OX_ objects, so define ALL_O our selves
+ALL_O := $(obj-y)
+
M_OBJS := $(sort $(filter-out $(export-objs), $(obj-m)))
MX_OBJS := $(sort $(filter $(export-objs), $(obj-m)))

--- ./Rules.make 2000/11/27 02:08:52 1.1
+++ ./Rules.make 2000/11/27 02:09:42 1.2
@@ -85,7 +85,9 @@
# Rule to compile a set of .o files into one .o file
#
ifdef O_TARGET
+ifndef ALL_O
ALL_O = $(OX_OBJS) $(O_OBJS)
+endif # ALL_O
$(O_TARGET): $(ALL_O)
rm -f $@
ifneq "$(strip $(ALL_O))" ""

2000-11-27 08:42:54

by Luca Berra

Subject: Re: [BUG] 2.4.0-test11-ac3 breaks raid autodetect (was Re: [BUG] raid5 link error? (was [PATCH] raid5 fix after xor.c cleanup))

On Mon, Nov 27, 2000 at 01:18:52PM +1100, Neil Brown wrote:
> > Sorry to tell you that I just tried linux-2.4.0-test11-ac3 (which has this
> > patch) and I couldn't boot because the kernel detects the raid1 devices
> > but kicks them out shortly after. I had to back out this code.
>
> Thanks for this....
>
> I have looked more deeply, and discovered the error of my ways.
Also, test11-ac4 contains the following patch, which is broken and should
be reversed! (If xor.o is built as a module, it will export the symbol
calibrate_xor_block_R??????? and require calibrate_xor_block ?!?)
Anyway, the only piece of code that uses calibrate_xor_block is
drivers/md/xor.c itself, so why export it?

Regards,
L.

--- drivers/md/xor.c Sun Nov 26 21:35:14 2000
+++ drivers/md/xor.c.ac4.foo Sun Nov 26 21:43:52 2000
@@ -98,7 +98,7 @@
speed / 1000, speed % 1000);
}

-static int
+int
calibrate_xor_block(void)
{
void *b1, *b2;
@@ -139,5 +139,6 @@
}

MD_EXPORT_SYMBOL(xor_block);
+MD_EXPORT_SYMBOL(calibrate_xor_block);

module_init(calibrate_xor_block);


--
Luca Berra -- [email protected]
Communication Media & Services S.r.l.

2000-11-27 11:22:11

by Christoph Hellwig

Subject: Re: [KBUILD] Re: [BUG] 2.4.0-test11-ac3 breaks raid autodetect (was Re: [BUG] raid5 link error? (was [PATCH] raid5 fix after xor.c cleanup))

On Mon, Nov 27, 2000 at 01:18:52PM +1100, Neil Brown wrote:
> Thanks for this....
>
> I have looked more deeply, and discovered the error of my ways.
> As the Makefiles now stand, all export-objs (OX_OBJS) get linked
> before non-export-objs (O_OBJS) in the same directory, independently
> of any ordering imposed within the Makefile.

Yes.

> This caused md.o to get linked before raid?.o.
> Due to carelessness on my part I didn't notice this happening when I
> was testing.
>
> The following patch fixes it. I hope the change to Rules.make is
> acceptable - I have CCed to linux-kbuild in case anyone there has an
> issue with it.

I don't think so. Look at drivers/usb/Makefile for another (cleaner)
solution to this. I don't think it is a good idea to solve the
same problem with two different hacks...

Christoph

--
Always remember that you are unique. Just like everyone else.