2010-11-20 02:39:00

by Andries E. Brouwer

Subject: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

In case anybody is still interested in this old IDE stuff:
I wanted to boot a recent kernel on an old machine and failed.
The last kernel that worked was 2.6.27.
What goes wrong is that the disks are no longer detected on 2.6.28.
I see that 2.6.28 had a lot of changes in this area.
Maybe this just needs a new boot option I overlooked.
(Or is old-fashioned IDE considered broken these days?)

Google gives me many people with the same problem
(e.g., http://bugs.gentoo.org/253628) but no remedy.
I have not looked at the code yet.

Andries


2010-11-20 03:00:01

by Justin P. Mattock

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

On 11/19/2010 06:15 PM, Andries E. Brouwer wrote:
> In case anybody is still interested in this old IDE stuff:
> I wanted to boot a recent kernel on an old machine and failed.
> The last kernel that worked was 2.6.27.
> What goes wrong is that the disks are no longer detected on 2.6.28.
> I see that 2.6.28 had a lot of changes in this area.
> Maybe this just needs a new boot option I overlooked.
> (Or is old-fashioned IDE considered broken these days?)
>
> Google gives me many people with the same problem
> (e.g., http://bugs.gentoo.org/253628) but no remedy.
> I have not looked at the code yet.
>
> Andries


Probably best to bisect if you can.

Justin P. Mattock

2010-11-20 03:28:48

by Andries E. Brouwer

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

Answering myself (and providing info that can be Googled):

> I wanted to boot a recent kernel on an old machine and failed.
> The last kernel that worked was 2.6.27.
> What goes wrong is that the disks are no longer detected on 2.6.28.

A typical error would be

Cannot open root device 342 or unknown block (3,66)

Reading the code shows that the default probing is no longer done.
Editing ./drivers/ide/ide-generic.c and changing

-static int probe_mask;
+static int probe_mask = 3;

returns my disks to life, and this old machine boots again.

Andries

2010-11-20 07:45:17

by Borislav Petkov

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

On Sat, Nov 20, 2010 at 04:28:33AM +0100, Andries E. Brouwer wrote:
> Answering myself (and providing info that can be Googled):
>
> > I wanted to boot a recent kernel on an old machine and failed.
> > The last kernel that worked was 2.6.27.
> > What goes wrong is that the disks are no longer detected on 2.6.28.
>
> A typical error would be
>
> Cannot open root device 342 or unknown block (3,66)
>
> Reading the code shows that the default probing is no longer done.

Well, this got changed in 20df429dd6671804999493baf2952f82582869fa since
we had other problems when having ide-generic and a specific PCI IDE
controller driver enabled at the same time, AFAIR.

There are two fixes I can think of - you either enable the specific IDE
controller driver for your chipset or you enforce probing with

ide_generic.probe_mask=0x3f

on the kernel command line.

HTH.

--
Regards/Gruss,
Boris.

2010-11-20 12:11:48

by Andries E. Brouwer

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

On Sat, Nov 20, 2010 at 08:45:08AM +0100, Borislav Petkov wrote:
> On Sat, Nov 20, 2010 at 04:28:33AM +0100, Andries E. Brouwer wrote:
> > Answering myself (and providing info that can be Googled):
> >
> > > I wanted to boot a recent kernel on an old machine and failed.
> > > The last kernel that worked was 2.6.27.
> > > What goes wrong is that the disks are no longer detected on 2.6.28.
> >
> > A typical error would be
> >
> > Cannot open root device 342 or unknown block (3,66)
> >
> > Reading the code shows that the default probing is no longer done.
>
> Well, this got changed in 20df429dd6671804999493baf2952f82582869fa since
> we had other problems when having ide-generic and a specific PCI IDE
> controller driver enabled at the same time, AFAIR.

In the meantime I looked at what happened, and how this regression
was introduced. Mikael Pettersson reported that he lost his NIC
because of commit 343a3451e20314d5959b59b992e33fbaadfe52bf that
caused the IDE code to probe where it did not before.
Because of a resource leak, this caused other hardware
not to be found any longer.

One would hope that this resource leak would be investigated further,
but the reaction was to stop IDE probing, causing a few hundred
people to lose their disks.

A regression.

> There are two fixes I can think of - you either enable the specific IDE
> controller driver for your chipset or you enforce probing with
>
> ide_generic.probe_mask=0x3f
>
> on the kernel command line.

Yes, but my edit was better.

>> Editing ./drivers/ide/ide-generic.c and changing
>>
>> -static int probe_mask;
>> +static int probe_mask = 3;
>>
>> returns my disks to life, and this old machine boots again.


(On the one hand, I have many machines and certainly do not recall
the precise hardware details of all of them. On the other hand, a
kernel that does not boot without extra command-line arguments is
a pain: it requires bookkeeping. The 1-line fix makes it work
without command-line arguments.)

The author of the regression knew that he was breaking some setups
and cleared his conscience by adding a printk
+ printk(KERN_INFO DRV_NAME ": please use \"probe_mask=0x3f\" module "
+ "parameter for probing all legacy ISA IDE ports\n");
at boot time. Of course this scrolls off the screen too quickly to read.
Since the kernel does not boot, there is no dmesg afterwards, so one would
need serious debugging, using a serial console or netconsole, to see it.

I already pointed to a bug report (http://bugs.gentoo.org/253628) where this
is still described as an unsolved problem.


Andries

2010-11-20 13:05:16

by Alan

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

> The author of the regression knew that he was breaking some setups
> and cleared his conscience by adding a printk
> + printk(KERN_INFO DRV_NAME ": please use \"probe_mask=0x3f\" module "
> + "parameter for probing all legacy ISA IDE ports\n");
> at boot time. Of course this scrolls off the screen too quickly to read.

I pointed out at the time that this was totally bogus hackery, but
without result. On a PCI box, however, you should always have a matching
PCI driver, and if not, you want to force ide-generic to bind to it in
almost all cases. So if it's hitting a lot of people, then something else
is wrong in the distro's config choices.

The libata driver tries to be a bit smarter: firstly by not leaking
random resources, but it also knows not to bind against ports mapped to
PCI devices, or to certain special-case devices (not PCI-standard, but
using PCI space).

If you have a PCI ATA driver which is not handled by any of the libata
drivers and is not known by ata_generic then please let me know. I
think we have pretty much everything old, weird and wonderful covered in
libata.

Alan

2010-11-21 01:29:45

by Andries E. Brouwer

Subject: Re: ide boot failure 2.6.36/2.6.28 - 2.6.27 works

Hi Alan,

> On a PCI box however you should always have a matching PCI driver,

True - in this case PIIX4, and selecting it makes things work
(but was not required earlier).

> If you have a PCI ATA driver which is not handled by any of the libata
> drivers and is not known by ata_generic then please let me know.

libata, too, needs PIIX4 support selected in addition,
but then works fine. (Of course, all devices are renamed: hdX becomes sdX.)

Andries


> if it's hitting a lot of people

Maybe a few hundred visible via Google.
One in every hundred thousand users?

---
Post by Thomas Ingerslev 2009-09-02
I have been struggling with this error for quite a while,
and have finally found a solution.
... I ended up with just adding ide_generic.probe_mask=0x03 ...