2005-09-13 05:22:23

by Keith Owens

[permalink] [raw]
Subject: 2.6.14-rc1 breaks tg3 on ia64

2.6.14-rc1 + kdb on ia64 (SGI Altix).

tg3.c:v3.39 (September 5, 2005)
ACPI: PCI Interrupt 0001:01:04.0[A]: no GSI
BRIDGE ERR_STATUS 0x800
BRIDGE ERR_STATUS 0x800
PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
11: PCI bus device select timeout
PCI Error Address Register: 0x3000000316808
PCI Error Address: 0x316808
PIC Multiple Interrupt Register is 0x800
11: PCI bus device select timeout

Followed by a machine check and reboot :( 2.6.13 worked fine. Any
ideas which patch to back out this time?
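
A note on reading the log above: the status value 0x800 has only bit 11 set, which lines up with the "11: PCI bus device select timeout" line printed for each bit set. Below is a minimal sketch of that kind of per-bit decode in plain C. The helper name set_bits() is invented for illustration and is not taken from the SN bridge error handler.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Collect the indices of the bits set in a status word, lowest first.
 * Hypothetical helper for illustration only.
 * Returns the number of indices written to out (at most max).
 */
static size_t set_bits(uint64_t status, unsigned out[], size_t max)
{
	size_t n = 0;

	for (unsigned bit = 0; bit < 64 && n < max; bit++) {
		if (status & (UINT64_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Calling set_bits(0x800, ...) reports a single index, 11, which is the bit the bridge names as a device select timeout.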


2005-09-13 05:37:59

by David Miller

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

From: Keith Owens <[email protected]>
Date: Tue, 13 Sep 2005 15:22:17 +1000

> 2.6.14-rc1 + kdb on ia64 (SGI Altix).
>
> tg3.c:v3.39 (September 5, 2005)
> ACPI: PCI Interrupt 0001:01:04.0[A]: no GSI
> BRIDGE ERR_STATUS 0x800
> BRIDGE ERR_STATUS 0x800
> PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
> Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
> 11: PCI bus device select timeout
> PCI Error Address Register: 0x3000000316808
> PCI Error Address: 0x316808
> PIC Multiple Interrupt Register is 0x800
> 11: PCI bus device select timeout
>
> Followed by a machine check and reboot :( 2.6.13 worked fine. Any
> ideas which patch to backout this time?

Does copying the 2.6.13 tg3.[ch] driver over into your
2.6.14-rc1 tree make it work?

2005-09-13 06:17:42

by Keith Owens

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Mon, 12 Sep 2005 22:37:55 -0700 (PDT),
"David S. Miller" <[email protected]> wrote:
>From: Keith Owens <[email protected]>
>Date: Tue, 13 Sep 2005 15:22:17 +1000
>
>> 2.6.14-rc1 + kdb on ia64 (SGI Altix).
>>
>> tg3.c:v3.39 (September 5, 2005)
>> ACPI: PCI Interrupt 0001:01:04.0[A]: no GSI
>> BRIDGE ERR_STATUS 0x800
>> BRIDGE ERR_STATUS 0x800
>> PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
>> Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
>> 11: PCI bus device select timeout
>> PCI Error Address Register: 0x3000000316808
>> PCI Error Address: 0x316808
>> PIC Multiple Interrupt Register is 0x800
>> 11: PCI bus device select timeout
>>
>> Followed by a machine check and reboot :( 2.6.13 worked fine. Any
>> ideas which patch to backout this time?
>
>Does copying over the 2.6.13 tg3.[ch] driver over into your
>2.6.14-rc1 tree make it work?

No, the 2.6.13 driver in 2.6.14-rc1 has exactly the same problem.

The last time that tg3 broke like this, it was because of the patch
below, in 2.6.13-rc6. That was backed out in 2.6.13-rc7. Was the PCI
patch (or equivalent) reinstated in 2.6.14-rc1?

From: John W. Linville <[email protected]>
Date: Fri, 5 Aug 2005 01:06:10 +0000 (-0700)
Subject: [PATCH] PCI: restore BAR values after D3hot->D0 for devices that need it
X-Git-Tag: v2.6.13-rc6
X-Git-Url: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fec59a711eef002d4ef9eb8de09dd0a26986eb77

2005-09-13 06:47:55

by David Miller

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

From: Keith Owens <[email protected]>
Date: Tue, 13 Sep 2005 16:17:33 +1000

> No, the 2.6.13 driver in 2.6.14-rc1 has exactly the same problem.
>
> The last time that tg3 broke like this, it was because of the patch
> below, in 2.6.13-rc6. That was backed out in 2.6.13-rc7. Was the PCI
> patch (or equivalent) reinstated in 2.6.14-rc1?

It was reinstated, with a fix for the sparc64 problems it caused.
I wasn't aware of any ia64 regressions introduced by it.

2005-09-13 06:49:01

by Keith Owens

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Tue, 13 Sep 2005 16:17:33 +1000,
Keith Owens <[email protected]> wrote:
>On Mon, 12 Sep 2005 22:37:55 -0700 (PDT),
>"David S. Miller" <[email protected]> wrote:
>>From: Keith Owens <[email protected]>
>>Date: Tue, 13 Sep 2005 15:22:17 +1000
>>
>>> 2.6.14-rc1 + kdb on ia64 (SGI Altix).
>>>
>>> tg3.c:v3.39 (September 5, 2005)
>>> ACPI: PCI Interrupt 0001:01:04.0[A]: no GSI
>>> BRIDGE ERR_STATUS 0x800
>>> BRIDGE ERR_STATUS 0x800
>>> PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
>>> Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
>>> 11: PCI bus device select timeout
>>> PCI Error Address Register: 0x3000000316808
>>> PCI Error Address: 0x316808
>>> PIC Multiple Interrupt Register is 0x800
>>> 11: PCI bus device select timeout
>>>
>>> Followed by a machine check and reboot :( 2.6.13 worked fine. Any
>>> ideas which patch to backout this time?
>>
>>Does copying over the 2.6.13 tg3.[ch] driver over into your
>>2.6.14-rc1 tree make it work?
>
>No, the 2.6.13 driver in 2.6.14-rc1 has exactly the same problem.
>
>The last time that tg3 broke like this, it was because of the patch
>below, in 2.6.13-rc6. That was backed out in 2.6.13-rc7. Was the PCI
>patch (or equivalent) reinstated in 2.6.14-rc1?
>
>From: John W. Linville <[email protected]>
>Date: Fri, 5 Aug 2005 01:06:10 +0000 (-0700)
>Subject: [PATCH] PCI: restore BAR values after D3hot->D0 for devices that need it
>X-Git-Tag: v2.6.13-rc6
>X-Git-Url: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fec59a711eef002d4ef9eb8de09dd0a26986eb77

Another data point. Omit the tg3 from the build and then it breaks on
the SCSI qla2300 driver. It is a PCI/ACPI specific problem on ia64,
not just network.

qla1280: QLA12160 found on PCI bus 1, dev 3
ACPI: PCI Interrupt 0001:01:03.0[A]: no GSI
BRIDGE ERR_STATUS 0x800
BRIDGE ERR_STATUS 0x800
PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
11: PCI bus device select timeout
PCI Error Address Register: 0x3000000302008
PCI Error Address: 0x302008
PIC Multiple Interrupt Register is 0x800
11: PCI bus device select timeout


2005-09-13 07:00:00

by Greg KH

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Tue, Sep 13, 2005 at 04:17:33PM +1000, Keith Owens wrote:
> On Mon, 12 Sep 2005 22:37:55 -0700 (PDT),
> "David S. Miller" <[email protected]> wrote:
> >From: Keith Owens <[email protected]>
> >Date: Tue, 13 Sep 2005 15:22:17 +1000
> >
> >> 2.6.14-rc1 + kdb on ia64 (SGI Altix).
> >>
> >> tg3.c:v3.39 (September 5, 2005)
> >> ACPI: PCI Interrupt 0001:01:04.0[A]: no GSI
> >> BRIDGE ERR_STATUS 0x800
> >> BRIDGE ERR_STATUS 0x800
> >> PCI BRIDGE ERROR: int_status is 0x800 for 011c32:slab0:widget15:bus0
> >> Dumping relevant 011c32:slab0:widget15:bus0 registers for each bit set...
> >> 11: PCI bus device select timeout
> >> PCI Error Address Register: 0x3000000316808
> >> PCI Error Address: 0x316808
> >> PIC Multiple Interrupt Register is 0x800
> >> 11: PCI bus device select timeout
> >>
> >> Followed by a machine check and reboot :( 2.6.13 worked fine. Any
> >> ideas which patch to backout this time?
> >
> >Does copying over the 2.6.13 tg3.[ch] driver over into your
> >2.6.14-rc1 tree make it work?
>
> No, the 2.6.13 driver in 2.6.14-rc1 has exactly the same problem.
>
> The last time that tg3 broke like this, it was because of the patch
> below, in 2.6.13-rc6. That was backed out in 2.6.13-rc7. Was the PCI
> patch (or equivalent) reinstated in 2.6.14-rc1?
>
> From: John W. Linville <[email protected]>
> Date: Fri, 5 Aug 2005 01:06:10 +0000 (-0700)
> Subject: [PATCH] PCI: restore BAR values after D3hot->D0 for devices that need it
> X-Git-Tag: v2.6.13-rc6
> X-Git-Url: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fec59a711eef002d4ef9eb8de09dd0a26986eb77

So does reverting this patch solve the problem?

thanks,

greg k-h

2005-09-13 07:27:43

by Keith Owens

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Mon, 12 Sep 2005 23:59:37 -0700,
Greg KH <[email protected]> wrote:
>On Tue, Sep 13, 2005 at 04:17:33PM +1000, Keith Owens wrote:
>> The last time that tg3 broke like this, it was because of the patch
>> below, in 2.6.13-rc6. That was backed out in 2.6.13-rc7. Was the PCI
>> patch (or equivalent) reinstated in 2.6.14-rc1?
>>
>> From: John W. Linville <[email protected]>
>> Date: Fri, 5 Aug 2005 01:06:10 +0000 (-0700)
>> Subject: [PATCH] PCI: restore BAR values after D3hot->D0 for devices that need it
>> X-Git-Tag: v2.6.13-rc6
>> X-Git-Url: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fec59a711eef002d4ef9eb8de09dd0a26986eb77
>
>So does reverting this patch solve the problem?

I tried reversing
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
which appears to be the latest version of this patch. There was a
patch reject in sparc64, but the common code was reverted. IA64 (SGI
Altix) with that patch reverted now boots 2.6.14-rc1.

2005-09-17 16:48:43

by John W. Linville

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 11:34:34AM -0500, Jack Steiner wrote:

> We are working on an SN-only workaround. No guarantee, but the person
> working on it is optimistic that we can fix the problem in SN code
> w/o making any generic changes. I should know more on Monday.
>
> Long term, we are making SN ACPI compliant - or at least a lot closer.

Well, there you go... :-)

John
--
John W. Linville
[email protected]

2005-09-17 16:47:16

by John W. Linville

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 09:16:17AM -0700, Greg KH wrote:
> On Sat, Sep 17, 2005 at 11:59:14AM -0400, John W. Linville wrote:
> > On Sat, Sep 17, 2005 at 08:47:03AM -0700, Tony Luck wrote:

> > > Anyone know anything more about this problem? I'm not seeing it
> > > on any of my systems ... so perhaps it only affects cards with a
> > > PCI bridge on them, or cards that haven't already been initialized
> > > by EFI.
> >
> > I posted a patch on Wednesday:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
> >
> > The original reporter (Keith Owens <[email protected]>) confirmed this
> > patch to fix the problem.
>
> Yes, and a number of people objected to that patch. Care to respond to
> them?

Care to check your email? :-)

http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2267.html

Basically, the concerns raised are non-issues. The new patch
merely limits the BAR restoration to those situations where it is
truly needed. Anything broken in that situation was broken before
the original patch as well.

The only other point was that this particular ia64 hardware is
broken if you can't rewrite its BARs. That may well be, but we can
accommodate it w/o losing the intent of the original patch.

Tony's post indicates that this is not a generic ia64 problem. If you
want a patch along the lines of what Dave Miller and Ivan Kokshaysky
advocate (i.e. something in the pci access routines for this box),
then Keith or someone else w/ knowledge of (and access to) this box
will need to step forward w/ a patch or at least some information.

John
--
John W. Linville
[email protected]

2005-09-17 16:35:05

by Jack Steiner

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 09:16:17AM -0700, Greg KH wrote:
> On Sat, Sep 17, 2005 at 11:59:14AM -0400, John W. Linville wrote:
> > On Sat, Sep 17, 2005 at 08:47:03AM -0700, Tony Luck wrote:
> > > > >So does reverting this patch solve the problem?
> > > >
> > > > I tried reversing
> > > > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
> > > > which appears to be the latest version of this patch. There was a
> > > > patch reject in sparc64, but the common code was reverted. IA64 (SGI
> > > > Altix) with that patch reverted now boots 2.6.14-rc1.
> > >
> > > Anyone know anything more about this problem? I'm not seeing it
> > > on any of my systems ... so perhaps it only affects cards with a
> > > PCI bridge on them, or cards that haven't already been initialized
> > > by EFI.
> >
> > I posted a patch on Wednesday:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
> >
> > The original reporter (Keith Owens <[email protected]>) confirmed this
> > patch to fix the problem.
>
> Yes, and a number of people objected to that patch. Care to respond to
> them?

We are working on an SN-only workaround. No guarantee, but the person
working on it is optimistic that we can fix the problem in SN code
w/o making any generic changes. I should know more on Monday.

Long term, we are making SN ACPI compliant - or at least a lot closer.


>
> thanks,
>
> greg k-h

--
Thanks

Jack Steiner ([email protected]) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.


2005-09-17 15:47:12

by Tony Luck

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

> >So does reverting this patch solve the problem?
>
> I tried reversing
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
> which appears to be the latest version of this patch. There was a
> patch reject in sparc64, but the common code was reverted. IA64 (SGI
> Altix) with that patch reverted now boots 2.6.14-rc1.

Anyone know anything more about this problem? I'm not seeing it
on any of my systems ... so perhaps it only affects cards with a
PCI bridge on them, or cards that haven't already been initialized
by EFI.

-Tony

2005-09-17 16:16:53

by Greg KH

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 11:59:14AM -0400, John W. Linville wrote:
> On Sat, Sep 17, 2005 at 08:47:03AM -0700, Tony Luck wrote:
> > > >So does reverting this patch solve the problem?
> > >
> > > I tried reversing
> > > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
> > > which appears to be the latest version of this patch. There was a
> > > patch reject in sparc64, but the common code was reverted. IA64 (SGI
> > > Altix) with that patch reverted now boots 2.6.14-rc1.
> >
> > Anyone know anything more about this problem? I'm not seeing it
> > on any of my systems ... so perhaps it only affects cards with a
> > PCI bridge on them, or cards that haven't already been initialized
> > by EFI.
>
> I posted a patch on Wednesday:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
>
> The original reporter (Keith Owens <[email protected]>) confirmed this
> patch to fix the problem.

Yes, and a number of people objected to that patch. Care to respond to
them?

thanks,

greg k-h

2005-09-17 15:59:32

by John W. Linville

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 08:47:03AM -0700, Tony Luck wrote:
> > >So does reverting this patch solve the problem?
> >
> > I tried reversing
> > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
> > which appears to be the latest version of this patch. There was a
> > patch reject in sparc64, but the common code was reverted. IA64 (SGI
> > Altix) with that patch reverted now boots 2.6.14-rc1.
>
> Anyone know anything more about this problem? I'm not seeing it
> on any of my systems ... so perhaps it only affects cards with a
> PCI bridge on them, or cards that haven't already been initialized
> by EFI.

I posted a patch on Wednesday:

http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html

The original reporter (Keith Owens <[email protected]>) confirmed this
patch to fix the problem.

Thanks,

John
--
John W. Linville
[email protected]

2005-09-18 06:23:13

by David Miller

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

From: "John W. Linville" <[email protected]>
Date: Sat, 17 Sep 2005 11:59:14 -0400

> I posted a patch on Wednesday:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
>
> The original reporter (Keith Owens <[email protected]>) confirmed this
> patch to fix the problem.

It fixes the problem, but it's a hack, and I, perhaps like Tony,
personally would like to know why these IA64 systems break for
such a simple operation as writing some base registers with
values we've probed already.

2005-09-18 06:34:01

by Tony Luck

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

> It fixes the problem, but it's a hack, and I, perhaps like Tony,
> personally would like to know why these IA64 systems break for
> such a simple operation as writing some base registers with
> values we've probed already.

It does sound odd[1] that you can't rewrite the BARs to the values that
they should already have. But I'm willing to wait to see what SGI's
solution that fixes this inside arch/ia64/sn/ looks like before making
any judgements.

-Tony

[1] odd == PCI spec violating?

2005-09-18 11:45:13

by Jack Steiner

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

On Sat, Sep 17, 2005 at 11:23:04PM -0700, David S. Miller wrote:
> From: "John W. Linville" <[email protected]>
> Date: Sat, 17 Sep 2005 11:59:14 -0400
>
> > I posted a patch on Wednesday:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
> >
> > The original reporter (Keith Owens <[email protected]>) confirmed this
> > patch to fix the problem.
>
> It fixes the problem, but it's a hack, and I, perhaps like Tony,
> personally would like to know why these IA64 systems break for
> such a simple operation as writing some base registers with
> values we've probed already.

Here is the mail from Mike Habeck (sgi) - he understands the problem
much better than I do:

(from Mike....)
The problem is we (sgi) don't support the pci_window stuff that
is set up via ACPI (see the add_window() code). As a result, when
pci_restore_bars() is called to restore the BARs, instead of the BARs
getting "restored" they get wiped out. pci_restore_bars() calls
pci_update_resource(), which calls pcibios_resource_to_bus(), and it
is that routine that expects the pci_window stuff to have been set up
from the ACPI path... I think. (I don't know much about how the ACPI
stuff works; John or Aaron could probably prove me right or wrong.)
So John (or Aaron), is it ACPI that is supposed to set up this pci_window
stuff?

I still question why this code path is taken... I don't know anything
about the PCI Power Management stuff, but we shouldn't be in any power
state that results in us needing our BARs restored. But I guess that
really isn't the issue since sooner or later something else will end
up using this pci_window stuff and we'll get burned then.

I suppose a quick fix (to work around this power management patch)
could be to set PCI_PM_CTRL_NO_SOFT_RESET in the card's PM capability
down in PROM, thus bypassing this "need_restore" code.

	/* If we're in D3, force entire word to 0.
	 * This doesn't affect PME_Status, disables PME_En, and
	 * sets PowerState to 0.
	 */
	if (dev->current_state >= PCI_D3hot) {
		if (!(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
			need_restore = 1;
		pmcsr = 0;
	} else {
		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
		pmcsr |= state;
	}

Or, in the kernel sgi device fixup code, change the current_state to D0?
It looks like it gets init'd to PCI_UNKNOWN (which is > PCI_D3hot).
I don't know... will investigate more tomorrow.

-mike
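
To make the two suggested workarounds concrete, here is a small standalone model of the branch quoted above. It is only a sketch: the function name wake_needs_restore() is invented, and the numeric values for the power states and PM control bits are assumptions of this example rather than something stated in the thread.

```c
#include <stdint.h>

/* Assumed values, modelled on the kernel's power states and PM control bits. */
enum { PCI_D0 = 0, PCI_D1 = 1, PCI_D2 = 2, PCI_D3hot = 3, PCI_D3cold = 4,
       PCI_UNKNOWN = 5 };
#define PCI_PM_CTRL_STATE_MASK    0x0003
#define PCI_PM_CTRL_NO_SOFT_RESET 0x0008

/*
 * Mirrors the quoted branch: returns 1 when the BARs would be restored,
 * and updates *pmcsr the same way the quoted code does.
 * Hypothetical wrapper; the real logic lives inside pci_set_power_state().
 */
static int wake_needs_restore(int current_state, unsigned state, uint16_t *pmcsr)
{
	int need_restore = 0;

	if (current_state >= PCI_D3hot) {
		if (!(*pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
			need_restore = 1;
		*pmcsr = 0;
	} else {
		*pmcsr = (uint16_t)(*pmcsr & ~PCI_PM_CTRL_STATE_MASK);
		*pmcsr = (uint16_t)(*pmcsr | state);
	}
	return need_restore;
}
```

With current_state left at PCI_UNKNOWN and the NO_SOFT_RESET bit clear, the model asks for a restore. Setting the bit (the PROM idea) or starting from PCI_D0 (the fixup idea) both make it return 0.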

--
Thanks

Jack Steiner ([email protected]) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.


2005-09-19 22:43:49

by Mike Habeck

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

Jack Steiner wrote:
> On Sat, Sep 17, 2005 at 09:16:17AM -0700, Greg KH wrote:
> > On Sat, Sep 17, 2005 at 11:59:14AM -0400, John W. Linville wrote:
> > > On Sat, Sep 17, 2005 at 08:47:03AM -0700, Tony Luck wrote:
> > > > > >So does reverting this patch solve the problem?
> > > > >
> > > > > I tried reversing
> > > > > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=064b53dbcc977dbf2753a67c2b8fc1c061d74f21,
> > > > > which appears to be the latest version of this patch. There was a
> > > > > patch reject in sparc64, but the common code was reverted. IA64 (SGI
> > > > > Altix) with that patch reverted now boots 2.6.14-rc1.
> > > >
> > > > Anyone know anything more about this problem? I'm not seeing it
> > > > on any of my systems ... so perhaps it only affects cards with a
> > > > PCI bridge on them, or cards that haven't already been initialized
> > > > by EFI.
> > >
> > > I posted a patch on Wednesday:
> > >
> > > http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2193.html
> > >
> > > The original reporter (Keith Owens <[email protected]>) confirmed this
> > > patch to fix the problem.
> >
> > Yes, and a number of people objected to that patch. Care to respond to
> > them?
>
> We are working on an SN-only workaround. No guarantee, but the person
> working on it is optimistic that we can fix the problem in SN code
> w/o making any generic changes. I should know more on Monday.
>
> Long term, we are making SN ACPI compliant - or at least a lot closer.


As Jack stated above, the issue is that sgi isn't fully ACPI
compliant yet. On an SN platform the pci_controller->pci_window's
are not initialized, so routines like pcibios_resource_to_bus()
and pcibios_bus_to_resource() will not work. It is for this reason
that the pci_restore_bars() call, added to pci_set_power_state(),
is causing a problem on SN platforms.
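
A rough model of why uninitialised windows matter, assuming a controller that keeps a table of CPU address ranges with an offset each and subtracts the matching offset to get a bus address. The struct and function names are invented for this sketch and only loosely follow the idea of the ia64 code. The error return is an addition of this sketch, there to make the missing-window case visible.

```c
#include <stdint.h>
#include <stddef.h>

struct window {			/* hypothetical stand-in for a pci_window */
	uint64_t start, end;	/* CPU-side address range, inclusive */
	uint64_t offset;	/* CPU address minus bus address */
};

/*
 * Translate a CPU address to a bus address.
 * Returns 0 on success, or -1 when no window covers the address
 * (for example because the table was never set up).
 */
static int cpu_to_bus(const struct window *w, size_t n, uint64_t cpu,
		      uint64_t *bus)
{
	for (size_t i = 0; i < n; i++) {
		if (cpu >= w[i].start && cpu <= w[i].end) {
			*bus = cpu - w[i].offset;
			return 0;
		}
	}
	return -1;
}
```

With an empty table there is no offset to apply, so any value derived from the resource and written back to a BAR cannot be trusted.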

SGI's long-term goal is to become ACPI compliant, but until that
happens sgi needs to somehow prevent things that rely on the
pci_controller->pci_window's from being called. John Linville's
latest patch, the one that "restricts calling pci_restore_bars
unless the current state is PCI_UNKNOWN and the actual state of
the device is PCI_D3hot...", does that. I don't understand why
this is wrong (why is checking the physical/actual state of the
device unattractive? Why restore the BARs if it's unnecessary?).
Anyhow, if this patch gets pulled then sgi will need to do
something quick and dirty in the SN-specific code to prevent
pci_restore_bars() from getting called, such as hacking
sn_pci_fixup_slot() to set the current_state to PCI_D0:

--- arch/ia64/sn/kernel/io_init.c.org 2005-09-19 17:10:40 -05:00
+++ arch/ia64/sn/kernel/io_init.c 2005-09-19 17:11:11 -05:00
@@ -329,6 +329,8 @@
 SN_PCIDEV_INFO(dev)->pdi_sn_irq_info = NULL;
 kfree(sn_irq_info);
 }
+
+ dev->current_state = PCI_D0;

And yes, I understand this is just putting a Band-Aid on the
current problem... until the ACPI support is in place to init
the controller's pci_windows we're just asking for future
problems.

So is hacking the SN-specific code to initialize the device's
state to PCI_D0 more attractive than keeping John Linville's current
patch? As I said, I personally see nothing wrong with John
Linville's current patch to check the actual state of the device.

-mike habeck
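
For comparison, the condition described above for John Linville's patch can be sketched as a predicate. This follows only the quoted description, not the patch text. The function name restore_needed() is invented, the numeric constants are assumptions of this example, and keeping the NO_SOFT_RESET check from the earlier code is also an assumption.

```c
#include <stdint.h>

/* Assumed values, modelled on the kernel's power states and PM control bits. */
enum { PCI_D0 = 0, PCI_D3hot = 3, PCI_UNKNOWN = 5 };
#define PCI_PM_CTRL_STATE_MASK    0x0003
#define PCI_PM_CTRL_NO_SOFT_RESET 0x0008

/*
 * Restore BARs only when the kernel has not yet recorded a state
 * (PCI_UNKNOWN) and the PM control register says the device really is
 * in D3hot and does not keep its configuration across the transition.
 */
static int restore_needed(int current_state, uint16_t pmcsr)
{
	if (current_state != PCI_UNKNOWN)
		return 0;
	if ((pmcsr & PCI_PM_CTRL_STATE_MASK) != PCI_D3hot)
		return 0;
	return !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
}
```

Under this model a device that is physically in D0 at boot (state bits 0) never triggers a restore, so the uninitialised pci_window path is not reached.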

2005-09-20 01:14:41

by David Miller

[permalink] [raw]
Subject: Re: 2.6.14-rc1 breaks tg3 on ia64

From: Mike Habeck <[email protected]>
Date: Mon, 19 Sep 2005 17:43:29 -0500

> So is hacking the SN specific code to initialize the device's
> state to PCI_D0 more attractive than keeping John Linville's current
> patch? As I said, I personally see nothing wrong with John
> Linville's current patch to check the actual state of the device.

I think keeping John's patch in is fine, especially if you guys
do have the longer term strategy of making pcibios_resource_to_bus()
work properly on this platform. Because somewhere down the line
something else will depend upon it being correct, and you'll
need to address this properly.

Actually, some PCI quirk fixups use pcibios_bus_to_resource(), but the
ones that do this are probably not relevant on ia64.