2010-07-14 17:02:09

by Ben Greear

Subject: [pci] pci/mrst: Detect loops when reading fixed BAR cap.

The mrst logic introduced in 2.6.33-rc8 in commit
a712ffbc199849364c46e9112b93b66de08e2c26 causes boot
to hang on at least this platform:

Intel E5405 CPU
System Information
Manufacturer: Supermicro
Product Name: X7DBU

The cause of the hang is that the capability header read
into pcie_cap points to itself as the next capability,
putting fixed_bar_cap() into an endless loop.

This patch detects the loop, prints a warning, and
continues on with useful work. This strategy was
suggested by Robert Hancock <[email protected]>

This should be a candidate for 2.6.34.y as well.

Signed-off-by: Ben Greear <[email protected]>
---
:100644 100644 1cdc02c... 9535ba9... M arch/x86/pci/mrst.c
arch/x86/pci/mrst.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 1cdc02c..9535ba9 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -76,6 +76,17 @@ static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
 			return pos;
 		}
 
+		if ((pcie_cap >> 20) == 0)
+			break;
+
+		if ((pcie_cap >> 20) <= pos) {
+			printk(KERN_WARNING "WARNING: mrst: detected loop"
+			       " when searching for fixed BAR cap, previous"
+			       " position: 0x%x new position: 0x%x"
+			       " bus-number: %i devfn: %i\n",
+			       pos, pcie_cap >> 20, bus->number, devfn);
+			break;
+		}
 		pos = pcie_cap >> 20;
 	}
 

--
1.6.2.5
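
For readers who want to try the idea outside the kernel, here is a
minimal userspace sketch of the same check. It is not the kernel code:
the config space is a plain byte array, and cfg_read32(), cfg_write32()
and find_ext_cap() are hypothetical helpers invented for this
illustration. It assumes the usual extended capability header layout
(ID in bits 15:0, next pointer in bits 31:20) and, like the patch,
treats any next pointer that does not move forward as a loop. That is
conservative, since it also rejects a backward pointer that would not
actually loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_SIZE        4096    /* size of the simulated config space */
#define EXT_CAP_START   0x100   /* offset of the first extended capability */
#define EXT_CAP_ID_VNDR 0x000b  /* vendor-specific extended capability ID */

/* Hypothetical helper: read a little-endian dword from the simulated space. */
static inline uint32_t cfg_read32(const uint8_t *cfg, unsigned int pos)
{
	return (uint32_t)cfg[pos] |
	       ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) |
	       ((uint32_t)cfg[pos + 3] << 24);
}

/* Hypothetical helper: write a little-endian dword into the simulated space. */
static inline void cfg_write32(uint8_t *cfg, unsigned int pos, uint32_t val)
{
	cfg[pos] = (uint8_t)(val & 0xff);
	cfg[pos + 1] = (uint8_t)((val >> 8) & 0xff);
	cfg[pos + 2] = (uint8_t)((val >> 16) & 0xff);
	cfg[pos + 3] = (uint8_t)((val >> 24) & 0xff);
}

/*
 * Walk the extended capability list looking for cap_id.
 * Returns the offset of the matching header, or 0 if it is not found,
 * the list ends, or the list does not advance (self-pointer or loop).
 */
static inline int find_ext_cap(const uint8_t *cfg, unsigned int cap_id)
{
	unsigned int pos = EXT_CAP_START;

	assert(cfg != NULL);

	while (pos) {
		uint32_t hdr = cfg_read32(cfg, pos);
		unsigned int next = hdr >> 20;

		if (hdr == 0xffffffff)
			return 0;       /* nothing readable at this offset */

		if ((hdr & 0xffff) == cap_id)
			return (int)pos;

		if (next == 0)
			break;          /* normal end of the list */

		if (next <= pos) {
			/* Same test as the patch: the pointer must move forward. */
			fprintf(stderr, "loop detected: 0x%x -> 0x%x\n",
				pos, next);
			break;
		}

		if (next > CFG_SIZE - 4)
			break;          /* header would run past the array */

		pos = next;
	}

	return 0;
}
```

With a header at 0x100 whose next pointer is also 0x100, the walk
prints the warning and returns 0 instead of spinning. With a chain
0x100 -> 0x140 and a vendor-specific header at 0x140, it returns 0x140.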


2010-07-16 17:43:11

by Bjorn Helgaas

Subject: Re: [pci] pci/mrst: Detect loops when reading fixed BAR cap.

On Wednesday, July 14, 2010 11:02:03 am Ben Greear wrote:
> The mrst logic introduced in 2.6.33-rc8 in commit
> a712ffbc199849364c46e9112b93b66de08e2c26 causes boot
> to hang on at least this platform:
>
> Intel E5405 CPU
> System Information
> Manufacturer: Supermicro
> Product Name: X7DBU
>
> The cause of the hang is that pci_cap points to itself
> as the next capability, putting the fixed_bar_cap into
> an endless loop.
>
> This patch detects the loop, prints a warning, and
> continues on with useful work. This strategy was
> suggested by Robert Hancock <[email protected]>
>
> This should be a candidate for 2.6.34.y as well.
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> :100644 100644 1cdc02c... 9535ba9... M arch/x86/pci/mrst.c
> arch/x86/pci/mrst.c | 11 +++++++++++
> 1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
> index 1cdc02c..9535ba9 100644
> --- a/arch/x86/pci/mrst.c
> +++ b/arch/x86/pci/mrst.c
> @@ -76,6 +76,17 @@ static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
> return pos;
> }
>
> + if ((pcie_cap >> 20) == 0)
> + break;
> +
> + if ((pcie_cap >> 20) <= pos) {
> + printk(KERN_WARNING "WARNING: mrst: detected loop"
> + " when searching for fixed BAR cap, previous"
> + " position: 0x%x new position: 0x%x"
> + " bus-number: %i devfn: %i\n",
> + pos, pcie_cap >> 20, bus->number, devfn);

Can you use dev_warn() here to print the device info in the
standard way?

Bjorn

2010-07-16 17:47:28

by Ben Greear

Subject: Re: [pci] pci/mrst: Detect loops when reading fixed BAR cap.

On 07/16/2010 10:43 AM, Bjorn Helgaas wrote:
> On Wednesday, July 14, 2010 11:02:03 am Ben Greear wrote:
>> The mrst logic introduced in 2.6.33-rc8 in commit
>> a712ffbc199849364c46e9112b93b66de08e2c26 causes boot
>> to hang on at least this platform:
>>
>> Intel E5405 CPU
>> System Information
>> Manufacturer: Supermicro
>> Product Name: X7DBU
>>
>> The cause of the hang is that pci_cap points to itself
>> as the next capability, putting the fixed_bar_cap into
>> an endless loop.
>>
>> This patch detects the loop, prints a warning, and
>> continues on with useful work. This strategy was
>> suggested by Robert Hancock <[email protected]>
>>
>> This should be a candidate for 2.6.34.y as well.
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> :100644 100644 1cdc02c... 9535ba9... M arch/x86/pci/mrst.c
>> arch/x86/pci/mrst.c | 11 +++++++++++
>> 1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
>> index 1cdc02c..9535ba9 100644
>> --- a/arch/x86/pci/mrst.c
>> +++ b/arch/x86/pci/mrst.c
>> @@ -76,6 +76,17 @@ static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
>> return pos;
>> }
>>
>> + if ((pcie_cap >> 20) == 0)
>> + break;
>> +
>> + if ((pcie_cap >> 20) <= pos) {
>> + printk(KERN_WARNING "WARNING: mrst: detected loop"
>> + " when searching for fixed BAR cap, previous"
>> + " position: 0x%x new position: 0x%x"
>> + " bus-number: %i devfn: %i\n",
>> + pos, pcie_cap >> 20, bus->number, devfn);
>
> Can you use dev_warn() here to print the device info in the
> standard way?

There seems to be a different way to do this. Some of the folks who
understand this better are working on a patch (I tested something based
on HPA's suggestions and it fixed my problem at least.)

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com