2003-11-12 02:46:53

by Daniel Craig

[permalink] [raw]
Subject: 2.6.0-test9-bk16 ALi M5229 kernel boot error

G'day all,
I have a Benq laptop (Celeron 2.4GHz) which is using an ALi chipset. I
have it working with kernel 2.4, but would like to migrate it to 2.6.
However, when I use 2.6.0-test9 (I have tried mm2, 'vanilla' and bk17), I
end up with a boot time crash. This is primarily to do with the
initialisation of the IDE controller, as far as I can see. I have included
the kernel message below. As I mentioned before, this works fine on
2.4.22... Does anyone have any idea what might be going wrong...?

Cheers,
Dan Craig.

The boot error is as follows:

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ALI15X3: IDE controller at PCI slot 0000:00:0e.0
ALI15X3: chipset revision 197
ALI15X3: not 100% native mode: will probe irqs later
Unable to handle kernel NULL pointer dereference at virtual address 00000020
printing eip:
c0378a69
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c0378a69>] Not tainted
EFLAGS: 00010046
EIP is at init_chipset_ali15x3+0x13e/0x1c1
eax: 00000000 ebx: c133fc00 ecx: 80007048 edx: cfea1ef3
esi: 00000246 edi: c12e4800 ebp: cfea1f00 esp: cfea1ee0
ds: 007b es: 007b ss:0068
Process swapper (pid: 1, threadinfo=cfea0000 task=128f900)
Stack: c1346980 00000070 00000079 cfea1ef3 c1051f04 00000000 00000001
cfea1f4c
cfea1f34 c0254a6d c133fc00 c0300d1c 00000001 cfea1f24 c020e343
00000001
0000ffff 00000000 c03aaa00 c133fc00 00000000 cfea1f54 c0254ab6
cfea1f4c
Call Trace:
[<c0254a6d>] do_ide_setup_pci_device+0x159/0x178
[<c020e343>] pci_find_subsys+0x81/0xef
[<c0254ab6>] ide_setup_pci_device+0x2a/0x81
[<c0378efa>] alim15x3_init_one+0x45/0x5c
[<c0379a93>] ide_scan_pcidev+0x5c/0x6e
[<c0379ae5>] ide_scan_pcibus+0x40/0xbb
[<c03799c8>] probe_for_hwifs+0x13/0x17
[<c03799d4>] ide_init_builtin_drivers+0x8/0x13
[<c0379a27>] ide_init+0x48/0x58
[<c03686c2>] do_initcalls+0x2b/0x98
[<c012b52c>] init_workqueues+0x12/0x2a
[<c01050ca>] init+0x33/0x139
[<c0105097>] init+0x0/0x139
[<c0108f25>] kernel_thread_helper+0x5/0xb

Code: 8b 50 20 89 54 24 04 8b 40 10 89 04 24 e8 21 32 e9 ff 0f b6
<0>Kernel panic: Attempted to kill init!


2003-11-12 03:04:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error


On Wed, 12 Nov 2003, Daniel Craig wrote:
>
> Does anyone have any idea what might be going wrong...?

Yes. The ALI driver has some really strange code to avoid tweaking non-ALI
southbridges.

But the thing is, it breaks even _with_ ALI southbridges, if we just don't
find the ALI bridge we expect.

Does this patch fix it?

Linus

---
--- 1.15/drivers/ide/pci/alim15x3.c Sun Aug 24 15:33:30 2003
+++ edited/drivers/ide/pci/alim15x3.c Tue Nov 11 19:03:21 2003
@@ -578,7 +578,6 @@
{
unsigned long flags;
u8 tmpbyte;
- struct pci_dev *north = pci_find_slot(0, PCI_DEVFN(0,0));

pci_read_config_byte(dev, PCI_REVISION_ID, &m5229_revision);

@@ -625,11 +624,9 @@

/*
* We should only tune the 1533 enable if we are using an ALi
- * North bridge. We might have no north found on some zany
- * box without a device at 0:0.0. The ALi bridge will be at
- * 0:0.0 so if we didn't find one we know what is cooking.
+ * south bridge.
*/
- if (north && north->vendor != PCI_VENDOR_ID_AL) {
+ if (!isa_dev) {
local_irq_restore(flags);
return 0;
}

2003-11-12 03:50:26

by Jeff Garzik

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

Linus Torvalds wrote:
> On Wed, 12 Nov 2003, Daniel Craig wrote:
>
>>Does anyone have any idea what might be going wrong...?
>
>
> Yes. The ALI driver has some really strange code to avoid tweaking non-ALI
> southbridges.
>
> But the thing is, it breaks even _with_ ALI southbridges, if we just don't
> find the ALI bridge we expect.
>
> Does this patch fix it?
>
> Linus
>
> ---
> --- 1.15/drivers/ide/pci/alim15x3.c Sun Aug 24 15:33:30 2003
> +++ edited/drivers/ide/pci/alim15x3.c Tue Nov 11 19:03:21 2003
> @@ -578,7 +578,6 @@
> {
> unsigned long flags;
> u8 tmpbyte;
> - struct pci_dev *north = pci_find_slot(0, PCI_DEVFN(0,0));
>
> pci_read_config_byte(dev, PCI_REVISION_ID, &m5229_revision);
>
> @@ -625,11 +624,9 @@
>
> /*
> * We should only tune the 1533 enable if we are using an ALi
> - * North bridge. We might have no north found on some zany
> - * box without a device at 0:0.0. The ALi bridge will be at
> - * 0:0.0 so if we didn't find one we know what is cooking.
> + * south bridge.
> */
> - if (north && north->vendor != PCI_VENDOR_ID_AL) {
> + if (!isa_dev) {
> local_irq_restore(flags);
> return 0;


For a little bit of history, this 'if' test succeeds on Alpha. We
return here on Alpha to avoid x86-specific stuff.

Al, any chance you could test Linus's patch? It hinges on whether the
Alpha will match the southbridge (isa_dev), rather than the northbridge.
This seems like a safe bet, but better to test and be sure...

Jeff



2003-11-12 04:21:17

by Daniel Craig

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

Linus,
The patch worked like a charm. 2.6.0-test9-bk16 now boots perfectly on my
BenQ laptop. Thanks a lot!

Cheers,
Dan Craig.

2003-11-12 04:31:36

by Al Viro

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

On Tue, Nov 11, 2003 at 07:04:17PM -0800, Linus Torvalds wrote:

> Yes. The ALI driver has some really strange code to avoid tweaking non-ALI
> southbridges.
>
> But the thing is, it breaks even _with_ ALI southbridges, if we just don't
> find the ALI bridge we expect.
>
> Does this patch fix it?
> - if (north && north->vendor != PCI_VENDOR_ID_AL) {
> + if (!isa_dev) {

Wrong fix, AFAICS. Original condition is bogus, no arguments here.
However, the point is
"tweak our southbridge only if northbridge is known to be OK with that"
and not
"tweak southbridge only if it's ours"

IOW, proper check is || of those two.

2003-11-12 04:36:24

by Al Viro

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

On Tue, Nov 11, 2003 at 10:50:04PM -0500, Jeff Garzik wrote:
> >- if (north && north->vendor != PCI_VENDOR_ID_AL) {
> >+ if (!isa_dev) {
> > local_irq_restore(flags);
> > return 0;
>
>
> For a little bit of history, this 'if' test succeeds on Alpha. We
> return here on Alpha to avoid x86-specific stuff.

... on _some_ alphas.

> Al, any chance you could test Linus's patch? It hinges on whether the
> Alpha will match the southbridge (isa_dev), rather than the northbridge.
> This seems like a safe bet, but better to test and be sure...

DS10 here actually has Ali southbridge and it has nothing on 0.0.0, so
it won't make any difference on this box - neither condition is met.

And UP1000 box (also 1533/5229) has faulty DMA on the southbridge - even
for floppy, so it's useless for such testing.

2003-11-12 05:29:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error


On Tue, 11 Nov 2003, Jeff Garzik wrote:
>
> For a little bit of history, this 'if' test succeeds on Alpha. We
> return here on Alpha to avoid x86-specific stuff.

It's not x86-specific stuff, it's literally specific to the ISA bridge.

And I'm pretty sure this horrid test was there at least partly for a
Transmeta setup that had a non-ALI northbridge together with an ALI
southbridge. The fact that it _also_ triggers on other machines may well
be true, though.

Linus

2003-11-12 06:58:54

by David Miller

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

On Wed, 12 Nov 2003 04:31:34 +0000
[email protected] wrote:

> On Tue, Nov 11, 2003 at 07:04:17PM -0800, Linus Torvalds wrote:
>
> > Does this patch fix it?
> > - if (north && north->vendor != PCI_VENDOR_ID_AL) {
> > + if (!isa_dev) {
>
> Wrong fix, AFAICS. Original condition is bogus, no arguments here.
> However, the point is
> "tweak our southbridge only if northbridge is known to be OK with that"
> and not
> "tweak southbridge only if it's ours"
>
> IOW, proper check is || of those two.

I agree with Al's analysis, and this is the kind of logic needed on
sparc64 boxes as well.

Indeed, blindly deref'ing 'isa_dev' here was pretty bogus :)

2003-11-12 15:07:26

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

On Tue, Nov 11, 2003 at 10:58:45PM -0800, David S. Miller wrote:
> On Wed, 12 Nov 2003 04:31:34 +0000
> [email protected] wrote:
> > Wrong fix, AFAICS. Original condition is bogus, no arguments here.
> > However, the point is
> > "tweak our southbridge only if northbridge is known to be OK with that"
> > and not
> > "tweak southbridge only if it's ours"
> >
> > IOW, proper check is || of those two.
>
> I agree with Al's analysis, and this is the kind of logic needed on
> sparc64 boxes as well.

I'm not sure there was any logic at all, given extremely misleading
comments in the original code. That "south-bridge's enable bit" stands
for "enable input pins for 80-conductor cable detection" according
to my (rather sparse) docs, and I don't understand why the hell it has
anything to do with a northbridge.
Someone with a more complete ALi documentation ought to verify that...

> Indeed, blindly deref'ing 'isa_dev' here was pretty bogus :)

Perhaps the source of this bug was the fact that M5229 controllers
are always part of the southbridge chip and therefore respective
"isa_dev" must exist. However, PCI IDs are re-writable on newer
ALi chips, which was probably the case.

Ivan.

2003-11-12 15:52:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error


On Wed, 12 Nov 2003, Ivan Kokshaysky wrote:
>
> I'm not sure there was any logic at all, given extremely misleading
> comments in the original code. That "south-bridge's enable bit" stands
> for "enable input pins for 80-conductor cable detection" according
> to my (rather sparse) docs, and I don't understand why the hell it has
> anything to do with a northbridge.

The thing is, those "enable input pins" are actually GPIO's, and some
boards don't use them as cable detect enables..

The whole thing should probably be done as a PCI quirk. Anyway, I'll
change my patch to be the absolute minimal one, ie just adding the
!isa_dev test instead of removing the old confused logic. I'm pretty
certain that we shouldn't touch those GPIO's at all, but since some boards
probably _do_ want them enabled, let's go for the minimal "avoid oops"
approach.

Linus

2003-11-12 16:40:31

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: 2.6.0-test9-bk16 ALi M5229 kernel boot error

On Wed, Nov 12, 2003 at 07:52:11AM -0800, Linus Torvalds wrote:
> The whole thing should probably be done as a PCI quirk. Anyway, I'll
> change my patch to be the absolute minimal one, ie just adding the
> !isa_dev test instead of removing the old confused logic. I'm pretty
> certain that we shouldn't touch those GPIO's at all, but since some boards
> probably _do_ want them enabled, let's go for the minimal "avoid oops"
> approach.

Agreed.

Hmm, I've just looked at current 2.4 driver - it does test the isa_dev
and also does nothing for revisions C5 and above, probably for reason.
What about this, cut'n'pasted from 2.4?

Ivan.

--- linux/drivers/ide/pci/alim15x3.c.orig Wed Nov 12 19:05:48 2003
+++ linux/drivers/ide/pci/alim15x3.c Wed Nov 12 19:11:14 2003
@@ -578,6 +578,7 @@ static unsigned int __init init_chipset_
{
unsigned long flags;
u8 tmpbyte;
+ struct pci_dev *north = pci_find_slot(0, PCI_DEVFN(0,0));

pci_read_config_byte(dev, PCI_REVISION_ID, &m5229_revision);

@@ -624,30 +625,35 @@ static unsigned int __init init_chipset_

/*
* We should only tune the 1533 enable if we are using an ALi
+ * North bridge. We might have no north found on some zany
+ * box without a device at 0:0.0. The ALi bridge will be at
+ * 0:0.0 so if we didn't find one we know what is cooking.
* south bridge.
*/
- if (!isa_dev) {
+ if (north && north->vendor != PCI_VENDOR_ID_AL) {
local_irq_restore(flags);
return 0;
}

- /*
- * set south-bridge's enable bit, m1533, 0x79
- */
-
- pci_read_config_byte(isa_dev, 0x79, &tmpbyte);
- if (m5229_revision == 0xC2) {
+ if (m5229_revision < 0xC5 && isa_dev)
+ {
/*
- * 1543C-B0 (m1533, 0x79, bit 2)
+ * set south-bridge's enable bit, m1533, 0x79
*/
- pci_write_config_byte(isa_dev, 0x79, tmpbyte | 0x04);
- } else if (m5229_revision >= 0xC3) {
- /*
- * 1553/1535 (m1533, 0x79, bit 1)
- */
- pci_write_config_byte(isa_dev, 0x79, tmpbyte | 0x02);
- }

+ pci_read_config_byte(isa_dev, 0x79, &tmpbyte);
+ if (m5229_revision == 0xC2) {
+ /*
+ * 1543C-B0 (m1533, 0x79, bit 2)
+ */
+ pci_write_config_byte(isa_dev, 0x79, tmpbyte | 0x04);
+ } else if (m5229_revision >= 0xC3) {
+ /*
+ * 1553/1535 (m1533, 0x79, bit 1)
+ */
+ pci_write_config_byte(isa_dev, 0x79, tmpbyte | 0x02);
+ }
+ }
local_irq_restore(flags);
return 0;
}