2006-10-02 17:06:19

by Roland Dreier

Subject: The change "PCI: assign ioapic resource at hotplug" breaks my system

The change "PCI: assign ioapic resource at hotplug" (commit
23186279658cea6d42a050400d3e79c56cb459b4 in Linus's tree) makes
networking stop working on my system (SuperMicro H8QC8 with four
dual-core Opteron 885 CPUs). In particular, the on-board NIC stops
working, probably because it gets assigned the wrong IRQ (225 in the
non-working case, 217 in the working case).

With that patch applied, e1000 doesn't work. Reverting just that
patch (shown below) from Linus's latest tree fixes things for me.

Please let me know what other debug information might be useful.

Thanks,
Roland

Here's the patch I reverted. I'm not sure what it's trying to do or
why it breaks my system, but reverting it fixes things for me:

Author: Satoru Takeuchi <[email protected]>
Date: Tue Sep 12 10:21:44 2006 -0700

PCI: assign ioapic resource at hotplug

We need to assign resources to ioapics being hot-added. This patch
changes pbus_assign_resources_sorted() to assign resources if the
ioapic has no assigned resources.

Signed-off-by: Kenji Kaneshige <[email protected]>
Signed-off-by: MUNEDA Takahiro <[email protected]>
Signed-off-by: Satoru Takeuchi <[email protected]>
Signed-off-by: Kristen Carlson Accardi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 47c1071..5440491 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -55,12 +55,19 @@ pbus_assign_resources_sorted(struct pci_
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		u16 class = dev->class >> 8;

-		/* Don't touch classless devices or host bridges or ioapics. */
+		/* Don't touch classless devices or host bridges. */
 		if (class == PCI_CLASS_NOT_DEFINED ||
-		    class == PCI_CLASS_BRIDGE_HOST ||
-		    class == PCI_CLASS_SYSTEM_PIC)
+		    class == PCI_CLASS_BRIDGE_HOST)
 			continue;

+		/* Don't touch ioapics if it has the assigned resources. */
+		if (class == PCI_CLASS_SYSTEM_PIC) {
+			res = &dev->resource[0];
+			if (res[0].start || res[1].start || res[2].start ||
+			    res[3].start || res[4].start || res[5].start)
+				continue;
+		}
+
 		pdev_sort_resources(dev, &head);
 	}


2006-10-02 17:28:37

by Roland Dreier

Subject: Re: The change "PCI: assign ioapic resource at hotplug" breaks my system

One piece of information that might be useful is that lspci shows a
difference in the configuration of the PCI bridge IOAPIC. In the good
(working) case, the IOAPIC memory region 0 is disabled, while in the
bad case it is enabled.

Here are full details: first, the good/working case:

04:01.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12) (prog-if 10 [IO-APIC])
Subsystem: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at <ignored> (64-bit, non-prefetchable)
00: 22 10 59 74 06 00 00 02 12 10 00 08 00 00 00 00
10: 04 e0 af fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 59 74
30: 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 03 00 00 00 04 e0 af fe 00 00 00 00

Then the bad (non-working e1000) case:

04:01.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12) (prog-if 10 [IO-APIC])
Subsystem: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at e2100000 (64-bit, non-prefetchable) [size=4K]
00: 22 10 59 74 06 00 00 02 12 10 00 08 00 00 00 00
10: 04 00 10 e2 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 59 74
30: 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 03 00 00 00 04 00 10 e2 00 00 00 00

I have no idea whether there's any significance to this.

- R.
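
For reference, the "Region 0" values that lspci prints can be recovered
from the raw config-space bytes in the two dumps above. The following is
only a minimal sketch in C, assuming the standard PCI header layout (BAR0
at offset 0x10, stored little-endian, with the low four bits of a memory
BAR used as flags). The helper names le32, bar_is_mem64 and bar_mem_base
are made up for this illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit register from four config-space bytes. */
static uint32_t le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Bit 0 clear means a memory BAR; bits 2:1 == 10b mark it as 64-bit. */
static int bar_is_mem64(uint32_t bar)
{
	return (bar & 0x1) == 0 && ((bar >> 1) & 0x3) == 0x2;
}

/* Mask off the four flag bits and combine with the upper dword. */
static uint64_t bar_mem_base(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | (lo & ~(uint32_t)0xf);
}
```

Fed the bytes at offset 0x10, the good case (04 e0 af fe) decodes to a
64-bit memory BAR at 0xfeafe000, and the bad case (04 00 10 e2) decodes
to 0xe2100000, which matches the "Memory at e2100000" line. In other
words, the BAR was already programmed in the working case and was moved
in the non-working one.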

2006-10-03 12:37:54

by Kenji Kaneshige

Subject: Re: The change "PCI: assign ioapic resource at hotplug" breaks my system

Roland Dreier wrote:
> The change "PCI: assign ioapic resource at hotplug" (commit
> 23186279658cea6d42a050400d3e79c56cb459b4 in Linus's tree) makes
> networking stop working on my system (SuperMicro H8QC8 with four
> dual-core Opteron 885 CPUs). In particular, the on-board NIC stops
> working, probably because it gets assigned the wrong IRQ (225 in the
> non-working case, 217 in the working case)
>
> With that patch applied, e1000 doesn't work. Reverting just that
> patch (shown below) from Linus's latest tree fixes things for me.
>
> Please let me know what other debug information might be useful.
>

The cause of this problem might be a wrong assumption that the 'start'
member of resource structure for ioapic device has non-zero value if the
resources are assigned by firmware. The 'start' member of ioapic device
seems not to be set even though the resources were actually assigned to
ioapic devices by firmware.

I made a patch to fix this problem against 2.6.18-git18. This patch
checks command register instead of checking 'start' member to see if
the ioapic is already enabled by firmware. Unfortunately, I don't have
any system to reproduce this problem. Could you please try it and let
me know whether the problem is fixed? If the patch below fixes the
problem, I'll resend it with description and Signed-off-by.

Thanks,
Kenji Kaneshige

---
drivers/pci/setup-bus.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.18-git18/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.18-git18.orig/drivers/pci/setup-bus.c 2006-10-03 13:26:49.000000000 +0900
+++ linux-2.6.18-git18/drivers/pci/setup-bus.c 2006-10-03 13:35:00.000000000 +0900
@@ -55,16 +55,16 @@
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		u16 class = dev->class >> 8;

-		/* Don't touch classless devices or host bridges. */
+		/* Don't touch classless devices or host bridges or ioapics. */
 		if (class == PCI_CLASS_NOT_DEFINED ||
 		    class == PCI_CLASS_BRIDGE_HOST)
 			continue;

-		/* Don't touch ioapics if it has the assigned resources. */
+		/* Don't touch ioapic devices already enabled by firmware */
 		if (class == PCI_CLASS_SYSTEM_PIC) {
-			res = &dev->resource[0];
-			if (res[0].start || res[1].start || res[2].start ||
-			    res[3].start || res[4].start || res[5].start)
+			u16 command;
+			pci_read_config_word(dev, PCI_COMMAND, &command);
+			if (command & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
 				continue;
 		}
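
The difference between the two checks can be seen in a small userspace
model. This is only a sketch and not kernel code: struct fake_ioapic and
both skip_* helpers are hypothetical names invented for the illustration,
and the model assumes (as suggested above) that the kernel-side resource
'start' fields stay zero even though firmware has enabled the device. The
CMD_* constants use the same bit values as the kernel's PCI_COMMAND_IO
and PCI_COMMAND_MEMORY.

```c
#include <stdint.h>

#define CMD_IO      0x1	/* same bit as PCI_COMMAND_IO */
#define CMD_MEMORY  0x2	/* same bit as PCI_COMMAND_MEMORY */
#define NUM_BARS    6

/* Hypothetical stand-in for the fields of struct pci_dev used here. */
struct fake_ioapic {
	uint64_t res_start[NUM_BARS];	/* models dev->resource[i].start */
	uint16_t command;		/* models the PCI command register */
};

/* Old test: skip the ioapic only if some resource start is non-zero. */
static int skip_by_resource_start(const struct fake_ioapic *d)
{
	int i;

	for (i = 0; i < NUM_BARS; i++)
		if (d->res_start[i])
			return 1;
	return 0;
}

/* New test: skip the ioapic if I/O or memory decoding is enabled. */
static int skip_by_command(const struct fake_ioapic *d)
{
	return (d->command & (CMD_IO | CMD_MEMORY)) != 0;
}
```

With the command word 0x0006 seen in the lspci dumps earlier in the thread
(Mem+ BusMaster+) and all starts at zero, skip_by_resource_start() returns
0, so the device would be reassigned, while skip_by_command() returns 1
and the firmware setup is left alone. A hot-added ioapic with command
0x0000 is not skipped by either test.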

2006-10-03 17:52:25

by Roland Dreier

Subject: Re: The change "PCI: assign ioapic resource at hotplug" breaks my system

Kenji> The cause of this problem might be a wrong assumption that
Kenji> the 'start' member of resource structure for ioapic device
Kenji> has non-zero value if the resources are assigned by
Kenji> firmware. The 'start' member of ioapic device seems not to
Kenji> be set even though the resources were actually assigned to
Kenji> ioapic devices by firmware.

Kenji> I made a patch to fix this problem against
Kenji> 2.6.18-git18. This patch checks command register instead of
Kenji> checking 'start' member to see if the ioapic is already
Kenji> enabled by firmware. Unfortunately, I don't have any system
Kenji> to reproduce this problem. Could you please try it and let
Kenji> me know whether the problem is fixed? If the patch below
Kenji> fixes the problem, I'll resend it with description and
Kenji> Signed-off-by.

Yes, applying this patch makes everything work on the same SuperMicro
motherboard that breaks with Linus's current tree. Assuming this
doesn't break anything else, I think this should go upstream.

Thanks,
Roland

2006-10-03 18:02:39

by Stephen Hemminger

Subject: Re: The change "PCI: assign ioapic resource at hotplug" breaks my system

On Tue, 03 Oct 2006 21:32:54 +0900
Kenji Kaneshige <[email protected]> wrote:

> Roland Dreier wrote:
> > The change "PCI: assign ioapic resource at hotplug" (commit
> > 23186279658cea6d42a050400d3e79c56cb459b4 in Linus's tree) makes
> > networking stop working on my system (SuperMicro H8QC8 with four
> > dual-core Opteron 885 CPUs). In particular, the on-board NIC stops
> > working, probably because it gets assigned the wrong IRQ (225 in the
> > non-working case, 217 in the working case)
> >
> > With that patch applied, e1000 doesn't work. Reverting just that
> > patch (shown below) from Linus's latest tree fixes things for me.
> >
> > Please let me know what other debug information might be useful.
> >
>
> The cause of this problem might be a wrong assumption that the 'start'
> member of resource structure for ioapic device has non-zero value if the
> resources are assigned by firmware. The 'start' member of ioapic device
> seems not to be set even though the resources were actually assigned to
> ioapic devices by firmware.
>
> I made a patch to fix this problem against 2.6.18-git18. This patch
> checks command register instead of checking 'start' member to see if
> the ioapic is already enabled by firmware. Unfortunately, I don't have
> any system to reproduce this problem. Could you please try it and let
> me know whether the problem is fixed? If the patch below fixes the
> problem, I'll resend it with description and Signed-off-by.
>
> Thanks,
> Kenji Kaneshige
>

This also fixes my problems with the built-in tg3 on the dual-CPU Opteron
IBM workstation.

2006-10-04 05:51:18

by Kenji Kaneshige

Subject: Re: The change "PCI: assign ioapic resource at hotplug" breaks my system

Stephen, Roland,

Thank you very much for testing the patch.

Thanks,
Kenji Kaneshige


Stephen Hemminger wrote:
> On Tue, 03 Oct 2006 21:32:54 +0900
> Kenji Kaneshige <[email protected]> wrote:
>
>> Roland Dreier wrote:
>>> The change "PCI: assign ioapic resource at hotplug" (commit
>>> 23186279658cea6d42a050400d3e79c56cb459b4 in Linus's tree) makes
>>> networking stop working on my system (SuperMicro H8QC8 with four
>>> dual-core Opteron 885 CPUs). In particular, the on-board NIC stops
>>> working, probably because it gets assigned the wrong IRQ (225 in the
>>> non-working case, 217 in the working case)
>>>
>>> With that patch applied, e1000 doesn't work. Reverting just that
>>> patch (shown below) from Linus's latest tree fixes things for me.
>>>
>>> Please let me know what other debug information might be useful.
>>>
>> The cause of this problem might be a wrong assumption that the 'start'
>> member of resource structure for ioapic device has non-zero value if the
>> resources are assigned by firmware. The 'start' member of ioapic device
>> seems not to be set even though the resources were actually assigned to
>> ioapic devices by firmware.
>>
>> I made a patch to fix this problem against 2.6.18-git18. This patch
>> checks command register instead of checking 'start' member to see if
>> the ioapic is already enabled by firmware. Unfortunately, I don't have
>> any system to reproduce this problem. Could you please try it and let
>> me know whether the problem is fixed? If the patch below fixes the
>> problem, I'll resend it with description and Signed-off-by.
>>
>> Thanks,
>> Kenji Kaneshige
>>
>
> This also fixes my problems with the built-in tg3 on the dual-CPU Opteron
> IBM workstation.
>
>