2003-01-25 22:29:40

by Tom Sightler

Subject: 2.5.59-mm5 hangs on boot

After much effort I finally managed to get my Dell Latitude C810 to boot
a 2.5.x kernel and have been happily running 2.5.59-mm2 on my RedHat 8.0
system for the last few days with very good results. There are a few
small bugs, mostly relating to unplugging things like my USB mouse and
my Aironet wireless adapter, but overall everything works great (I'll
report the other bugs in another mail).

I was interested in testing the new IO scheduler in 2.5.59-mm5 because
it attempts to correct a problem that has always bothered me, but with
this kernel (using the identical config to -mm2) my system hangs almost
immediately after boot. It happens only a few steps into rc.sysinit.
I have been trying to pin down the exact location with some print
statements, but the hang seems to occur at a slightly different point
each time, so I'm not sure any particular command or step is causing it.

My kernel is pretty basic and does not currently have ACPI, APM,
preemption, or local APIC support enabled (these have proven to be
troublesome in the past).

This mail is basically a query to see if there is anything obviously
different between -mm2 and -mm5 that could cause this so that I could
simply back out that single patch. At first glance none of the
additional patches stood out as a likely candidate for this
problem, but I will continue to look in more depth.

Thanks,
Tom



2003-01-25 23:22:32

by Andrew Morton

Subject: Re: 2.5.59-mm5 hangs on boot

Tom Sightler wrote:
>
> I was interested in testing the new IO scheduler in 2.5.59-mm5 because
> it attempts to correct a problem that has always bothered me

Yup.

> but with
> this kernel (using the identical config to -mm2) my system hangs almost
> immediately after boot.

Several people are reporting this. We seem to have lost a disk request or
such.

First up, please see if changing this:

static int antic_expire = HZ / 25;

to

static int antic_expire = 0;

in drivers/block/deadline-iosched.c fixes it up.
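
For scale, the arithmetic behind that one-line change can be sketched as
below. HZ is the kernel tick rate; the values 1000 and 100 used in the
comments are assumptions about common configurations, not something stated
in this thread, and the helper names are hypothetical. With either value
HZ / 25 works out to about 40 ms, while 0 gives a zero-length timeout,
which presumably disables the anticipation wait altogether.

```c
#include <stdio.h>

/* Same expression as in the mail: HZ / 25, using integer division.
 * `hz` is an assumed tick rate (ticks per second), e.g. 1000 or 100. */
unsigned int antic_expire_jiffies(unsigned int hz)
{
    return hz / 25;
}

/* Convert a timeout in jiffies to milliseconds for the given tick rate. */
unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
    return jiffies * 1000u / hz;
}

/* Print the timeout for one assumed tick rate (illustration only). */
void print_antic_expire(unsigned int hz)
{
    unsigned int j = antic_expire_jiffies(hz);

    printf("HZ=%u: antic_expire = %u jiffies (%u ms)\n",
           hz, j, jiffies_to_ms(j, hz));
}
```

Calling print_antic_expire() with 1000 and then with 100 reports 40 jiffies
and 4 jiffies respectively, both about 40 ms.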


If not then please confirm that

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-luuk/2.5.59-luuk.gz

still causes the problem.

If so then I'd really appreciate it if you could work through the individual
patches in

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-luuk/broken-out/

and find out which one is causing the problem. The patch application order is

vmlinux-fix.patch
deadline-np-42.patch
deadline-np-43.patch
reiserfs-readpages.patch
ext3-scheduling-storm.patch
auto-unplug.patch
less-unplugging.patch
kirq.patch
kirq-up-fix.patch
ext3-truncate-ordered-pages.patch
vma-file-merge.patch
quota-lockfix.patch
quota-offsem.patch
dont-wait-on-inode.patch
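
Working through the series can be done one patch at a time or, if each
test boot is slow, by bisection over the application order. The following
is only a sketch of that search, not a procedure prescribed in this mail.
The names hangs_fn and simulated_hangs are hypothetical stand-ins for
"build a kernel with the first k patches applied and see whether it hangs",
and the search assumes the hang is reproducible and that once the offending
patch is applied every longer prefix of the series also hangs.

```c
#include <stddef.h>

/* Patch application order, exactly as listed in the mail. */
const char *const patch_order[] = {
    "vmlinux-fix.patch",
    "deadline-np-42.patch",
    "deadline-np-43.patch",
    "reiserfs-readpages.patch",
    "ext3-scheduling-storm.patch",
    "auto-unplug.patch",
    "less-unplugging.patch",
    "kirq.patch",
    "kirq-up-fix.patch",
    "ext3-truncate-ordered-pages.patch",
    "vma-file-merge.patch",
    "quota-lockfix.patch",
    "quota-offsem.patch",
    "dont-wait-on-inode.patch",
};
const size_t patch_count = sizeof(patch_order) / sizeof(patch_order[0]);

/* Hypothetical test: nonzero if a kernel with the first `applied`
 * patches hangs on boot. In practice this is a manual build and boot. */
typedef int (*hangs_fn)(size_t applied);

/* Stand-in for real boot tests: pretend patch_order[simulated_bad]
 * introduces the hang. Purely illustrative. */
size_t simulated_bad = 5;
int simulated_hangs(size_t applied)
{
    return applied > simulated_bad;
}

/* Return the index of the first patch whose application makes the
 * kernel hang, or n if the full series boots fine.
 * Assumes the unpatched base (0 patches applied) boots. */
size_t first_bad_patch(size_t n, hangs_fn hangs)
{
    size_t lo = 0;  /* a prefix of lo patches is known (or assumed) to boot */
    size_t hi = n;  /* a prefix of hi patches is known to hang */

    if (!hangs(n))
        return n;   /* the whole series boots: nothing to find */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (hangs(mid))
            hi = mid;
        else
            lo = mid;
    }
    return hi - 1;  /* the patch that turns a booting prefix into a hang */
}
```

With 14 patches this takes at most five test boots, counting the initial
check that the full series still hangs. The result indexes patch_order, so
patch_order[first_bad_patch(patch_count, simulated_hangs)] names the
suspect patch.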

Thanks.

2003-01-26 00:40:53

by Tom Sightler

Subject: Re: 2.5.59-mm5 hangs on boot

On Sat, 2003-01-25 at 18:32, Andrew Morton wrote:
> First up, please see if changing this:
>
> static int antic_expire = HZ / 25;
>
> to
>
> static int antic_expire = 0;
>
> in drivers/block/deadline-iosched.c fixes it up.

This worked, but this is obviously not a real fix, right? I guess it
just shows that that's where the problem is.

I'll gladly test other patches.

Thanks,
Tom


2003-01-26 00:50:07

by Andrew Morton

Subject: Re: 2.5.59-mm5 hangs on boot

Tom Sightler wrote:
>
> On Sat, 2003-01-25 at 18:32, Andrew Morton wrote:
> > First up, please see if changing this:
> >
> > static int antic_expire = HZ / 25;
> >
> > to
> >
> > static int antic_expire = 0;
> >
> > in drivers/block/deadline-iosched.c fixes it up.
>
> This worked, but this is obviously not a real fix right? Just to show
> that that's where the problem is I guess.

Yup, thanks. I think others have seen a similar problem without the
anticipatory scheduling patch, so there may be a couple of problems here, or
a strange interaction.

> I'll gladly test other patches.

OK, thanks. Not sure what to suggest at present. Maybe when Luuk has done
the patch iteration and we've fixed whatever is causing his boot failure we
can then move on.

Are you using RAID at all?

2003-01-26 01:09:39

by Tom Sightler

Subject: Re: 2.5.59-mm5 hangs on boot

On Sat, 2003-01-25 at 19:59, Andrew Morton wrote:
> OK, thanks. Not sure what to suggest at present. Maybe when Luuk has done
> the patch iteration and we've fixed whatever is causing his boot failure we
> can then move on.
>
> Are you using RAID at all?

No RAID. Just a basic laptop with a 30GB IDE drive and IDE CDRW/DVDR
using an Intel chipset. I just checked my config and multi-device
support is not even enabled.

Later,
Tom