2009-10-11 10:07:30

by Yinghai Lu

[permalink] [raw]
Subject: Re: Boot-time crash from "PCI/x86: detect host bridge config space size w/o using quirks"

Brad Spengler wrote:
> I'm running an SGI 750 (Itanium 1), and have spent all day git-bisecting
> the patches between 2.6.29 and 2.6.30 (having tried all kernels from
> 2.6.26 to 2.6.31) and found that your patch:
> http://patchwork.kernel.org/patch/19486/
> causes the crash I observed, which I've taken a picture of here:
> http://yfrog.com/3oimg1457uj
>
> Let me know if you need any additional information or need me to test a
> fix you'll submit to LKML.

can you check if this debug patch?

YH

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8105e32..4e6a896 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -903,10 +903,6 @@ int pci_cfg_space_size(struct pci_dev *dev)
u32 status;
u16 class;

- class = dev->class >> 8;
- if (class == PCI_CLASS_BRIDGE_HOST)
- return pci_cfg_space_size_ext(dev);
-
pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
if (!pos) {
pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);


2009-10-11 21:01:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] pci: fix crash about old IA64 about pci_cfg_space_size



Brad reported that his SGI 750 (Itanium 1) will crash from 2.6.26, and bisected
to

| commit: dfadd9edff498d767008edc6b2a6e86a7a19934d
|
|Many host bridges support a 4k config space, so check them directy
|instead of using quirks to add them.
|
|We only need to do this extra check for host bridges at this point,
|because only host bridges are known to have extended address space
|without also having a PCI-X/PCI-E caps. Other devices with this
|property could be done with quirks (if there are any).
|
|As a bonus, we can remove the quirks for AMD host bridges with family
|10h and 11h since they're not needed any more.
|
|With this patch, we can get correct pci cfg size of new Intel CPUs/IOHs
|with host bridges.

it turns out for old IA64 without SAL 3.2 pci_cfg_space_size_ext will cause
problem and those system didn't have PCI-X and PCI-E, so that function was
not called before.

so don't call it in pci_cfg_sapce_size now.
later call it after pci ops code in ia64 could detect and use raw_pci_ops
and raw_pci_ext_ops according to SAL version.

Reported-by: Brad Spengler <[email protected]>
Bisected-by: Brad Spengler <[email protected]>
Tested-by: Brad Spengler <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8105e32..0c80a07 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -901,11 +901,24 @@ int pci_cfg_space_size(struct pci_dev *dev)
{
int pos;
u32 status;
+
+#ifndef CONFIG_IA64
+/*
+ * not use it with IA64 at this point
+ * ia64 SAL 3.2 before doesn't support ext space, so
+ * pci_read_config_dword(dev, 0x100, &status) would cause GP
+ * the problem is not triggered, because system with SAL 3.2 before
+ * doesn't include PCI-X 2.0 or PCI Express, so pci_cfg_spce_size_ext()
+ * is not called with them.
+ * need to extend ia64 to detect sal version, and pci_root_ops
+ * to use raw_pci_ops and raw_pci_ext_ops like x86
+ */
u16 class;

class = dev->class >> 8;
if (class == PCI_CLASS_BRIDGE_HOST)
return pci_cfg_space_size_ext(dev);
+#endif

pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
if (!pos) {

2009-10-11 22:02:35

by Brad Spengler

[permalink] [raw]
Subject: Re: [PATCH] pci: fix crash about old IA64 about pci_cfg_space_size

> Brad reported that his SGI 750 (Itanium 1) will crash from 2.6.26, and
> bisected

2.6.26 to 2.6.29 are fine -- the commit responsible was introduced in 2.6.30,
so only those kernels >= 2.6.30 exhibit the problem. For searching
purposes (so that Debian unstable etc can backport the fix to their
2.6.30 kernel), the error causing the panic at boot was:

General Exception: IA-64 Reserved Register/Field fault (data access) 48

and occurred a few lines after "ACPI: PCI Root Bridge [PCI0] (0000:00)"

-Brad


Attachments:
(No filename) (513.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2009-10-11 22:18:01

by Yinghai Lu

[permalink] [raw]
Subject: [RFC PATCH] ia64/pci: add ext pci config detection for SAL 3.2 less

Yinghai Lu wrote:
>
> Brad reported that his SGI 750 (Itanium 1) will crash from 2.6.26, and bisected
> to
>
> | commit: dfadd9edff498d767008edc6b2a6e86a7a19934d
> |
> |Many host bridges support a 4k config space, so check them directy
> |instead of using quirks to add them.
> |
> |We only need to do this extra check for host bridges at this point,
> |because only host bridges are known to have extended address space
> |without also having a PCI-X/PCI-E caps. Other devices with this
> |property could be done with quirks (if there are any).
> |
> |As a bonus, we can remove the quirks for AMD host bridges with family
> |10h and 11h since they're not needed any more.
> |
> |With this patch, we can get correct pci cfg size of new Intel CPUs/IOHs
> |with host bridges.
>
> it turns out for old IA64 without SAL 3.2 pci_cfg_space_size_ext will cause
> problem and those system didn't have PCI-X and PCI-E, so that function was
> not called before.
>
> so don't call it in pci_cfg_sapce_size now.
> later call it after pci ops code in ia64 could detect and use raw_pci_ops
> and raw_pci_ext_ops according to SAL version.
>

for 2.6.33...

[RFC PATCH] ia64/pci: add ext pci config detection for SAL 3.2 less

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index bf79155..12c6b70 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -44,7 +44,57 @@
#define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg) \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))

-int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+struct pci_raw_ops {
+ int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *val);
+ int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 val);
+};
+
+static int sal_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *value)
+{
+ u64 addr, data = 0;
+ int mode, result;
+
+ if (!value || !seg || (bus > 255) || (devfn > 255) || (reg > 255))
+ return -EINVAL;
+
+ addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
+ mode = 0;
+
+ result = ia64_sal_pci_config_read(addr, mode, len, &data);
+ if (result != 0)
+ return -EINVAL;
+
+ *value = (u32) data;
+ return 0;
+}
+
+static int sal_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 value)
+{
+ u64 addr;
+ int mode, result;
+
+ if (!seg || (bus > 255) || (devfn > 255) || (reg > 255))
+ return -EINVAL;
+
+ addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
+ mode = 0;
+
+ result = ia64_sal_pci_config_write(addr, mode, len, value);
+ if (result != 0)
+ return -EINVAL;
+ return 0;
+}
+
+static struct pci_raw_ops sal_pci_conf = {
+ .read = sal_pci_read,
+ .write = sal_pci_write,
+};
+
+static int sal_pci_ext_read(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *value)
{
u64 addr, data = 0;
@@ -53,13 +103,9 @@ int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
if (!value || (seg > 65535) || (bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;

- if ((seg | reg) <= 255) {
- addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
- mode = 0;
- } else {
- addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
- mode = 1;
- }
+ addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
+ mode = 1;
+
result = ia64_sal_pci_config_read(addr, mode, len, &data);
if (result != 0)
return -EINVAL;
@@ -68,7 +114,7 @@ int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
}

-int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+static int sal_pci_ext_write(unsigned int seg, unsigned int bus, unsigned int devfn,
int reg, int len, u32 value)
{
u64 addr;
@@ -77,19 +123,51 @@ int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
if ((seg > 65535) || (bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;

- if ((seg | reg) <= 255) {
- addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
- mode = 0;
- } else {
- addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
- mode = 1;
- }
+ addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
+ mode = 1;
+
result = ia64_sal_pci_config_write(addr, mode, len, value);
if (result != 0)
return -EINVAL;
return 0;
}

+static struct pci_raw_ops sal_pci_ext_conf = {
+ .read = sal_pci_ext_read,
+ .write = sal_pci_ext_write,
+};
+
+static struct pci_raw_ops *raw_pci_ops = sal_pci_ops;
+static struct pci_raw_ops *raw_pci_ext_opsi = NULL;
+
+static __init int pci_arch_init(void)
+{
+ if (sal_revision >= SAL_VERSION_CODE(3,2))
+ raw_pci_ext_ops = sal_pci_ext_ops;
+}
+
+arch_initcall(pci_arch_init);
+
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 *value)
+{
+ if (seg == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(seg, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(seg, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+ int reg, int len, u32 value)
+{
+ if (seg == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(seg, bus, devfn, reg, len, val);
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(seg, bus, devfn, reg, len, val);
+ return -EINVAL;
+}
+
static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
int size, u32 *value)
{

2009-10-11 23:11:21

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] pci: fix crash about old IA64 about pci_cfg_space_size

On Sun, Oct 11, 2009 at 05:32:45PM -0400, Brad Spengler wrote:
> > Brad reported that his SGI 750 (Itanium 1) will crash from 2.6.26, and
> > bisected
>
> 2.6.26 to 2.6.29 are fine -- the commit responsible was introduced in 2.6.30,
> so only those kernels >= 2.6.30 exhibit the problem. For searching
> purposes (so that Debian unstable etc can backport the fix to their
> 2.6.30 kernel), the error causing the panic at boot was:
>
> General Exception: IA-64 Reserved Register/Field fault (data access) 48
>
> and occurred a few lines after "ACPI: PCI Root Bridge [PCI0] (0000:00)"

The SAL config code _used_ to use SAL 3.2 if the SAL revision were >3.1.
Unfortunately, this broke a few HP machines where HP were reporting
a SAL revision in the 60-80 range (a bug fixed in later firmware, but
some people had prototype machines which couldn't be upgraded to newer
firmware ...)

So how about we go back to adding that check ... this does require that
SGI's 750 machine reports its SAL revision correctly. Could you send
the dmesg?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-12 13:57:12

by Brad Spengler

[permalink] [raw]
Subject: Re: [PATCH] pci: fix crash about old IA64 about pci_cfg_space_size

> So how about we go back to adding that check ... this does require that
> SGI's 750 machine reports its SAL revision correctly. Could you send
> the dmesg?

The dmesg of the working kernel:
http://grsecurity.net/~spender/dmesg.txt
The output of lspci -vvxxxx:
http://grsecurity.net/~spender/lspci1.txt
The output of lspci -tvnn:
http://grsecurity.net/~spender/lspci2.txt

I don't have a serial console hooked up to the machine, and I'm not able
to scroll up/down after the panic, so I only have the following picture
from the crash. I modified the code that spews out registers on a panic
so that the context of the panic could be seen. The stack trace didn't
show anything useful, only repeats of an address similar to IP (for
which no symbols exist).
http://img132.yfrog.com/i/img1457u.jpg/

-Brad


Attachments:
(No filename) (810.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2009-10-12 14:25:12

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

On Mon, Oct 12, 2009 at 09:56:28AM -0400, Brad Spengler wrote:
> > So how about we go back to adding that check ... this does require that
> > SGI's 750 machine reports its SAL revision correctly. Could you send
> > the dmesg?
>
> The dmesg of the working kernel:
> http://grsecurity.net/~spender/dmesg.txt

Thanks, that's reporting SAL 3.0, so it will be correctly caught by
this patch:

---

From: Matthew Wilcox <[email protected]>
Subject: Require SAL 3.2 in order to do extended config space ops

We had assumed that SAL firmware would return an error if it didn't
understand extended config space. Unfortunately, the SAL on the SGI 750
doesn't do that, it panics the machine. So, condition the extended PCI
config space accesses on SAL revision 3.2.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 7de76dd..61363cc 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -56,10 +56,13 @@ int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
if ((seg | reg) <= 255) {
addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
mode = 0;
- } else {
+ } else if (sal_revision >= SAL_VERSION_CODE(3,2))
addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
mode = 1;
+ } else {
+ return -EINVAL;
}
+
result = ia64_sal_pci_config_read(addr, mode, len, &data);
if (result != 0)
return -EINVAL;
@@ -80,9 +83,11 @@ int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
if ((seg | reg) <= 255) {
addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
mode = 0;
- } else {
+ } else if (sal_revision >= SAL_VERSION_CODE(3,2))
addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
mode = 1;
+ } else {
+ return -EINVAL;
}
result = ia64_sal_pci_config_write(addr, mode, len, value);
if (result != 0)

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-13 03:34:27

by Brad Spengler

[permalink] [raw]
Subject: Re: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

> + } else if (sal_revision >= SAL_VERSION_CODE(3,2))

Looks like a missing ending { here ^
> + } else if (sal_revision >= SAL_VERSION_CODE(3,2))
and here ^

I'll test it to confirm it fixes the problem.

-Brad


Attachments:
(No filename) (211.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2009-10-13 04:58:48

by Brad Spengler

[permalink] [raw]
Subject: Re: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

I've confirmed that the below patch (with the syntax fixes already
mentioned) resolves the issue on the SGI 750.

-Brad

> From: Matthew Wilcox <[email protected]>
> Subject: Require SAL 3.2 in order to do extended config space ops
>
> We had assumed that SAL firmware would return an error if it didn't
> understand extended config space. Unfortunately, the SAL on the SGI 750
> doesn't do that, it panics the machine. So, condition the extended PCI
> config space accesses on SAL revision 3.2.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
>
> diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
> index 7de76dd..61363cc 100644
> --- a/arch/ia64/pci/pci.c
> +++ b/arch/ia64/pci/pci.c
> @@ -56,10 +56,13 @@ int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> if ((seg | reg) <= 255) {
> addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
> mode = 0;
> - } else {
> + } else if (sal_revision >= SAL_VERSION_CODE(3,2))
> addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
> mode = 1;
> + } else {
> + return -EINVAL;
> }
> +
> result = ia64_sal_pci_config_read(addr, mode, len, &data);
> if (result != 0)
> return -EINVAL;
> @@ -80,9 +83,11 @@ int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> if ((seg | reg) <= 255) {
> addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
> mode = 0;
> - } else {
> + } else if (sal_revision >= SAL_VERSION_CODE(3,2))
> addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
> mode = 1;
> + } else {
> + return -EINVAL;
> }
> result = ia64_sal_pci_config_write(addr, mode, len, value);
> if (result != 0)
>
> --
> Matthew Wilcox Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours. We can't possibly take such
> a retrograde step."


Attachments:
(No filename) (1.83 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2009-11-04 17:16:09

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

Ah, the SGI 750. Brings back memories.

I've dropped the ball on this one though; maybe Tony already picked it
up? Tony?

Thanks,
Jesse

On Tue, 13 Oct 2009 00:58:08 -0400
[email protected] (Brad Spengler) wrote:

> I've confirmed that the below patch (with the syntax fixes already
> mentioned) resolves the issue on the SGI 750.
>
> -Brad
>
> > From: Matthew Wilcox <[email protected]>
> > Subject: Require SAL 3.2 in order to do extended config space ops
> >
> > We had assumed that SAL firmware would return an error if it didn't
> > understand extended config space. Unfortunately, the SAL on the
> > SGI 750 doesn't do that, it panics the machine. So, condition the
> > extended PCI config space accesses on SAL revision 3.2.
> >
> > Signed-off-by: Matthew Wilcox <[email protected]>
> >
> > diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
> > index 7de76dd..61363cc 100644
> > --- a/arch/ia64/pci/pci.c
> > +++ b/arch/ia64/pci/pci.c
> > @@ -56,10 +56,13 @@ int raw_pci_read(unsigned int seg, unsigned int
> > bus, unsigned int devfn, if ((seg | reg) <= 255) {
> > addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
> > mode = 0;
> > - } else {
> > + } else if (sal_revision >= SAL_VERSION_CODE(3,2))
> > addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
> > mode = 1;
> > + } else {
> > + return -EINVAL;
> > }
> > +
> > result = ia64_sal_pci_config_read(addr, mode, len, &data);
> > if (result != 0)
> > return -EINVAL;
> > @@ -80,9 +83,11 @@ int raw_pci_write(unsigned int seg, unsigned int
> > bus, unsigned int devfn, if ((seg | reg) <= 255) {
> > addr = PCI_SAL_ADDRESS(seg, bus, devfn, reg);
> > mode = 0;
> > - } else {
> > + } else if (sal_revision >= SAL_VERSION_CODE(3,2))
> > addr = PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg);
> > mode = 1;
> > + } else {
> > + return -EINVAL;
> > }
> > result = ia64_sal_pci_config_write(addr, mode, len, value);
> > if (result != 0)
> >
> > --
> > Matthew Wilcox Intel Open Source
> > Technology Centre "Bill, look, we understand that you're interested
> > in selling us this operating system, but compare it to ours. We
> > can't possibly take such a retrograde step."


--
Jesse Barnes, Intel Open Source Technology Center

2009-11-04 17:20:28

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

> I've dropped the ball on this one though; maybe Tony already picked it
> up? Tony?

Yes. It went in to Linus' tree with the git pull I sent on Monday (just
prior to -rc6).

Commit: adcd74034 ...

-Tony

2009-11-04 17:22:18

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] ia64: Don't call SAL < 3.2 for extended config space

On Wed, 4 Nov 2009 09:20:13 -0800
"Luck, Tony" <[email protected]> wrote:

> > I've dropped the ball on this one though; maybe Tony already picked
> > it up? Tony?
>
> Yes. It went in to Linus' tree with the git pull I sent on Monday
> (just prior to -rc6).
>
> Commit: adcd74034 ...

Great, thanks for confirming.

--
Jesse Barnes, Intel Open Source Technology Center