2002-10-03 10:54:36

by Mikael Pettersson

Subject: initrd breakage in 2.5.38-2.5.40

I've been experiencing initrd-related problems since 2.5.38.
It worked like a charm up to 2.5.37.

The initrd itself works (mine allows users to select root
partition, no modules involved), but some time later, the
kernel hangs hard. (No messages; the NMI watchdog and SysRq don't
work.) I can trigger the hangs easily by generating a lot of
FS activity, e.g. by unpacking a kernel tarball just after boot.

When booting without an initrd image the kernel is rock solid.

First I thought the problem was caused by an apparently missing
set_capacity() call in 2.5.38's drivers/block/rd.c:

diff -Nru a/drivers/block/rd.c b/drivers/block/rd.c
--- a/drivers/block/rd.c Sat Sep 21 21:25:46 2002
+++ b/drivers/block/rd.c Sat Sep 21 21:25:46 2002
...
 #ifdef CONFIG_BLK_DEV_INITRD
 	/* We ought to separate initrd operations here */
-	register_disk(NULL, mk_kdev(MAJOR_NR,INITRD_MINOR), 1, &rd_bd_op, rd_size<<1);
+	add_disk(&initrd_disk);
 	devfs_register(devfs_handle, "initrd", DEVFS_FL_DEFAULT, MAJOR_NR,
 			INITRD_MINOR, S_IFBLK | S_IRUSR, &rd_bd_op, NULL);
 #endif

Looking at the other register_disk -> add_disk changes, it looks as
if a "set_capacity(&initrd_disk, rd_size * 2);" call should be made
just before the add_disk call. I tried that and it seemed to fix
initrd in 2.5.38, but unfortunately I still get the hangs in 2.5.39
and 2.5.40 with this patch.

/Mikael


2002-10-03 11:01:50

by Alexander Viro

Subject: Re: initrd breakage in 2.5.38-2.5.40



On Thu, 3 Oct 2002, Mikael Pettersson wrote:

> First I thought the problem was caused by an apparently missing
> set_capacity() call in 2.5.38's drivers/block/rd.c:

I _really_ doubt it - check the loop just above the add_disk() one.
The set_capacity() call is done there; repeating it won't change anything.

2002-10-03 11:47:19

by Mikael Pettersson

Subject: Re: initrd breakage in 2.5.38-2.5.40

Alexander Viro writes:
>
>
> On Thu, 3 Oct 2002, Mikael Pettersson wrote:
>
> > First I thought the problem was caused by an apparently missing
> > set_capacity() call in 2.5.38's drivers/block/rd.c:
>
> I _really_ doubt it - check the loop just above the add_disk() one.
> The set_capacity() call is done there; repeating it won't change anything.

That loop does set_capacity() on each element in rd_disks[], but
set_capacity(disk,size) just sets disk->capacity = size, and
initrd_disk is a different variable so I don't see how initrd_disk
could ever get a capacity assigned to it unless by an explicit
"set_capacity(&initrd_disk, rd_size * 2);".

/Mikael

2002-10-03 12:07:20

by Russell King

Subject: Re: initrd breakage in 2.5.38-2.5.40

On Thu, Oct 03, 2002 at 07:07:21AM -0400, Alexander Viro wrote:
> On Thu, 3 Oct 2002, Mikael Pettersson wrote:
>
> > First I thought the problem was caused by an apparently missing
> > set_capacity() call in 2.5.38's drivers/block/rd.c:
>
> I _really_ doubt it - check the loop just above the add_disk() one.
> The set_capacity() call is done there; repeating it won't change anything.

My mtdblock problems are probably related to this, so I'll follow up here.

mtdblock registers its gendisk structure in its open() method.
Unfortunately, do_open() wants to obtain this structure before
calling the open() method (but doesn't use it).

This patch trivially reorders things so that it works, and it works
for me (with mtdblock).

--- orig/fs/block_dev.c Thu Oct 3 12:46:08 2002
+++ linux/fs/block_dev.c Thu Oct 3 13:08:10 2002
@@ -631,7 +631,7 @@
 		}
 		if (bdev->bd_contains == bdev) {
 			int part;
-			struct gendisk *g = get_gendisk(bdev->bd_dev, &part);
+			struct gendisk *g;
 
 			if (!bdev->bd_queue) {
 				struct blk_dev_struct *p = blk_dev + major(dev);
@@ -645,6 +645,7 @@
 				if (ret)
 					goto out2;
 			}
+			g = get_gendisk(bdev->bd_dev, &part);
 			if (!bdev->bd_openers) {
 				struct backing_dev_info *bdi;
 				sector_t sect = 0;
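The ordering problem can be sketched in user space. Everything here is hypothetical: a single-slot registry stands in for the kernel's gendisk lookup, driver_open() plays the role of mtdblock's open() method, and the two open_* functions mimic do_open() before and after the reordering in the patch. A lookup done before open() finds nothing, because the driver has not registered its disk yet.

```c
#include <assert.h>
#include <stddef.h>

struct gendisk {
	const char *name;
};

/* Hypothetical single-slot registry standing in for the gendisk list. */
static struct gendisk *registered;
static struct gendisk mtd_disk = { "mtdblock" };

struct gendisk *lookup_gendisk(void)
{
	return registered;
}

/* Like mtdblock in the thread: the disk is registered only in open(). */
int driver_open(void)
{
	registered = &mtd_disk;
	return 0;
}

/* Original order: look the disk up first, then call open(). */
struct gendisk *open_lookup_first(void)
{
	struct gendisk *g = lookup_gendisk();	/* still NULL here */

	if (driver_open() != 0)
		return NULL;
	return g;
}

/* Reordered as in the patch: call open() first, then look the disk up. */
struct gendisk *open_then_lookup(void)
{
	if (driver_open() != 0)
		return NULL;
	return lookup_gendisk();
}

/* Forget any registration so each scenario starts from scratch. */
void reset_registry(void)
{
	registered = NULL;
}
```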

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-10-03 14:35:33

by Alexander Viro

Subject: Re: initrd breakage in 2.5.38-2.5.40



On Thu, 3 Oct 2002, Russell King wrote:

> My mtdblock problems are probably related to this, so I'll follow up here.
>
> mtdblock registers its gendisk structure in its open() method.
> Unfortunately, do_open() wants to obtain this structure before
> calling the open() method (but doesn't use it).

That's the wrong thing to do, actually. The correct way to handle it is
the same as for modular IDE, etc.: a separate callback, used by
get_gendisk(), that does the allocations, loads subdrivers, etc.

It will go in right after the complete switchover to dynamic allocation
and the introduction of ->bd_disk.
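A rough sketch of the approach described above, assuming a per-major probe hook that the gendisk lookup falls back to when no disk is registered yet. All names (register_probe(), get_gendisk_sketch(), add_disk_sketch(), lazy_probe()) are hypothetical and only illustrate the idea; the interface that eventually went into the kernel may look quite different.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MAJOR 256

struct gendisk {
	int major;
	int minor;
};

/* Hypothetical probe hook: lets a driver allocate and register its disk
 * (or load a subdriver) at lookup time instead of in open(). */
typedef void (*probe_fn)(int major, int minor);

static struct gendisk *disks[MAX_MAJOR];
static probe_fn probes[MAX_MAJOR];

void register_probe(int major, probe_fn fn)
{
	assert(major >= 0 && major < MAX_MAJOR);
	probes[major] = fn;
}

void add_disk_sketch(struct gendisk *d)
{
	disks[d->major] = d;
}

/* Lookup that runs the probe hook once instead of failing outright. */
struct gendisk *get_gendisk_sketch(int major, int minor)
{
	if (major < 0 || major >= MAX_MAJOR)
		return NULL;
	if (!disks[major] && probes[major])
		probes[major](major, minor);
	return disks[major];
}

/* Example driver whose disk only comes into existence when probed. */
static struct gendisk lazy_disk;
static int probe_calls;

void lazy_probe(int major, int minor)
{
	probe_calls++;
	lazy_disk.major = major;
	lazy_disk.minor = minor;
	add_disk_sketch(&lazy_disk);
}
```

With this shape the open path does not need to know whether a driver registers lazily: the lookup itself triggers the driver's setup, so the disk can still be obtained before open() is called.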

2002-10-03 14:31:59

by Alexander Viro

Subject: Re: initrd breakage in 2.5.38-2.5.40



On Thu, 3 Oct 2002, Mikael Pettersson wrote:

> That loop does set_capacity() on each element in rd_disks[], but
> set_capacity(disk,size) just sets disk->capacity = size, and
> initrd_disk is a different variable so I don't see how initrd_disk
> could ever get a capacity assigned to it unless by an explicit
> "set_capacity(&initrd_disk, rd_size * 2);".

_Oh_.

Yes, you are right - it's my fault. FWIW, it should be
(initrd_end-initrd_start+511)>>9
rather than rd_size * 2. Thanks; it's fixed in my tree and will go to
Linus today.
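Worked through in C: the expression rounds the initrd image size in bytes up to whole 512-byte sectors, whereas rd_size * 2 is the size of a full RAM disk. The sketch assumes rd_size is in 1 KiB units, as the rd_size<<1 sector count in the original register_disk() call suggests; the function names are hypothetical.

```c
#include <assert.h>

/* Round the initrd image length up to whole 512-byte sectors:
 * (initrd_end - initrd_start + 511) >> 9 */
unsigned long initrd_sectors(unsigned long initrd_start,
			     unsigned long initrd_end)
{
	assert(initrd_end >= initrd_start);
	return (initrd_end - initrd_start + 511) >> 9;
}

/* The rejected alternative: assuming rd_size counts 1 KiB blocks,
 * doubling it gives a whole RAM disk in sectors, not the image size. */
unsigned long ramdisk_sectors(unsigned long rd_size_kib)
{
	return rd_size_kib * 2;
}
```

For example, a 300000-byte image gives (300000 + 511) >> 9 = 586 sectors, while rd_size = 4096 gives 8192 sectors no matter how large the image actually is.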

2002-10-07 12:46:23

by Mikael Pettersson

Subject: Re: initrd breakage in 2.5.38-2.5.40

On Thu, 3 Oct 2002 13:00:05 +0200, Mikael Pettersson wrote:
>I've been experiencing initrd-related problems since 2.5.38.
>It worked like a charm up to 2.5.37.
>
>The initrd itself works (mine allows users to select root
>partition, no modules involved), but some time later, the
>kernel hangs hard. (No messages; the NMI watchdog and SysRq don't
>work.) I can trigger the hangs easily by generating a lot of
>FS activity, e.g. by unpacking a kernel tarball just after boot.
>
>When booting without an initrd image the kernel is rock solid.

Additional information: the problem occurs on every machine
I've tested, and it occurs even if I use a trivial initrd whose
/linuxrc is just "int main(void){return 0;}", so it's not caused
by mounting or pivot_root-ing from the initrd.

/Mikael