2012-08-08 22:17:38

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

From: Andi Kleen <[email protected]>

According to the Intel PCI experts it's not safe to check any
other field than vendor ID for 0xffff when doing PCI scans
to see if the device exists.

Several of the early PCI scans violated this. I changed
them all to always check the vendor ID first.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/aperture_64.c | 5 +++++
arch/x86/kernel/early-quirks.c | 3 +++
arch/x86/kernel/pci-calgary_64.c | 8 ++++++--
arch/x86/pci/early.c | 3 +++
drivers/firewire/init_ohci1394_dma.c | 3 +++
5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index d5fd66f..e1ca7cd 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -206,6 +206,11 @@ static u32 __init search_agp_bridge(u32 *order, int *valid_agp)
for (func = 0; func < 8; func++) {
u32 class, cap;
u8 type;
+
+ if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID)
+ == 0xffff)
+ continue;
+
class = read_pci_config(bus, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 3755ef4..f76b930 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -250,6 +250,9 @@ static int __init check_dev_quirk(int num, int slot, int func)

vendor = read_pci_config_16(num, slot, func, PCI_VENDOR_ID);

+ if (vendor == 0xffff)
+ return -1;
+
device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);

for (i = 0; early_qrk[i].f != NULL; i++) {
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 299d493..05798a0 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -1324,8 +1324,9 @@ static void __init get_tce_space_from_tar(void)
unsigned short pci_device;
u32 val;

- val = read_pci_config(bus, 0, 0, 0);
- pci_device = (val & 0xFFFF0000) >> 16;
+ if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
+ continue;
+ pci_device = read_pci_config_16(bus, 0, 0, PCI_DEVICE_ID);

if (!is_cal_pci_dev(pci_device))
continue;
@@ -1426,6 +1427,9 @@ int __init detect_calgary(void)
unsigned short pci_device;
u32 val;

+ if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
+ continue;
+
val = read_pci_config(bus, 0, 0, 0);
pci_device = (val & 0xFFFF0000) >> 16;

diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
index d1067d5..4fb6847 100644
--- a/arch/x86/pci/early.c
+++ b/arch/x86/pci/early.c
@@ -91,6 +91,9 @@ void early_dump_pci_devices(void)
u32 class;
u8 type;

+ if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID) == 0xffff)
+ continue;
+
class = read_pci_config(bus, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
diff --git a/drivers/firewire/init_ohci1394_dma.c b/drivers/firewire/init_ohci1394_dma.c
index a9a347a..dd3bd84 100644
--- a/drivers/firewire/init_ohci1394_dma.c
+++ b/drivers/firewire/init_ohci1394_dma.c
@@ -279,6 +279,9 @@ void __init init_ohci1394_dma_on_all_controllers(void)
for (num = 0; num < 32; num++) {
for (slot = 0; slot < 32; slot++) {
for (func = 0; func < 8; func++) {
+ if (read_pci_config_16(num, slot, func, PCI_VENDOR_ID) == 0xffff)
+ continue;
+
class = read_pci_config(num, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
--
1.7.7.6


2012-08-09 22:35:01

by Dall, Elizabeth J

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first


Hi Andi,

On Wed, 2012-08-08 at 15:17 -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> According to the Intel PCI experts it's not safe to check any
> other field than vendor ID for 0xffff when doing PCI scans
> to see if the device exists.
>
> Several of the early PCI scans violated this. I changed
> them all to always check the vendor ID first.
>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kernel/aperture_64.c | 5 +++++
> arch/x86/kernel/early-quirks.c | 3 +++
> arch/x86/kernel/pci-calgary_64.c | 8 ++++++--
> arch/x86/pci/early.c | 3 +++
> drivers/firewire/init_ohci1394_dma.c | 3 +++
> 5 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index d5fd66f..e1ca7cd 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -206,6 +206,11 @@ static u32 __init search_agp_bridge(u32 *order, int *valid_agp)
> for (func = 0; func < 8; func++) {
> u32 class, cap;
> u8 type;
> +
> + if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID)
> + == 0xffff)
> + continue;

I thought this should be a break instead of a continue since the code
does a break if the class is 0xffffffff. If the function does not have a
valid VENDOR_ID, then the remaining function numbers do not have to be
scanned because functions are required to be implemented in order (no
skipping a function number.)

> +
> class = read_pci_config(bus, slot, func,
> PCI_CLASS_REVISION);
> if (class == 0xffffffff)
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 3755ef4..f76b930 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -250,6 +250,9 @@ static int __init check_dev_quirk(int num, int slot, int func)
>
> vendor = read_pci_config_16(num, slot, func, PCI_VENDOR_ID);
>
> + if (vendor == 0xffff)
> + return -1;
> +
> device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
>
> for (i = 0; early_qrk[i].f != NULL; i++) {
> diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
> index 299d493..05798a0 100644
> --- a/arch/x86/kernel/pci-calgary_64.c
> +++ b/arch/x86/kernel/pci-calgary_64.c
> @@ -1324,8 +1324,9 @@ static void __init get_tce_space_from_tar(void)
> unsigned short pci_device;
> u32 val;
>
> - val = read_pci_config(bus, 0, 0, 0);
> - pci_device = (val & 0xFFFF0000) >> 16;
> + if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
> + continue;
> + pci_device = read_pci_config_16(bus, 0, 0, PCI_DEVICE_ID);
>
> if (!is_cal_pci_dev(pci_device))
> continue;
> @@ -1426,6 +1427,9 @@ int __init detect_calgary(void)
> unsigned short pci_device;
> u32 val;
>
> + if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
> + continue;
> +
> val = read_pci_config(bus, 0, 0, 0);
> pci_device = (val & 0xFFFF0000) >> 16;

I liked how you replaced the read_pci_config(bus, 0, 0, 0) with
read_pci_config_16(bus, 0, 0, PCI_DEVICE_ID) in the previous diff for
the function get_tce_space_from_tar(). Could you do that in this
detect_calgary() function too?

>
> diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
> index d1067d5..4fb6847 100644
> --- a/arch/x86/pci/early.c
> +++ b/arch/x86/pci/early.c
> @@ -91,6 +91,9 @@ void early_dump_pci_devices(void)
> u32 class;
> u8 type;
>
> + if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID) == 0xffff)
> + continue;
> +
> class = read_pci_config(bus, slot, func,
> PCI_CLASS_REVISION);
> if (class == 0xffffffff)
> diff --git a/drivers/firewire/init_ohci1394_dma.c b/drivers/firewire/init_ohci1394_dma.c
> index a9a347a..dd3bd84 100644
> --- a/drivers/firewire/init_ohci1394_dma.c
> +++ b/drivers/firewire/init_ohci1394_dma.c
> @@ -279,6 +279,9 @@ void __init init_ohci1394_dma_on_all_controllers(void)
> for (num = 0; num < 32; num++) {
> for (slot = 0; slot < 32; slot++) {
> for (func = 0; func < 8; func++) {
> + if (read_pci_config_16(num, slot, func, PCI_VENDOR_ID) == 0xffff)
> + continue;
> +
> class = read_pci_config(num, slot, func,
> PCI_CLASS_REVISION);
> if (class == 0xffffffff)

It is interesting that these last two functions are doing basically the
same pci discovery as the code in search_agp_bridge(), except that they
uses continue instead of break. It might be beyond the scope of what you
are trying to fix, but those continues could be changed to breaks for
the same reason it is a break in search_agp_bridge().

-Betty

2012-08-09 23:41:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first


Thanks Betty. All fixed. I'll post a v2.

-Andi
--
[email protected] -- Speaking for myself only.

2012-08-10 23:57:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On 08/09/2012 03:34 PM, Betty Dall wrote:
>
> I thought this should be a break instead of a continue since the code
> does a break if the class is 0xffffffff. If the function does not have a
> valid VENDOR_ID, then the remaining function numbers do not have to be
> scanned because functions are required to be implemented in order (no
> skipping a function number.)
>

Is that true? This is certainly not true in PCI in general: there is
required to be a function 0, but there is no guarantee that functions
1-7 don't have gaps.

-hpa

2012-08-11 10:43:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On Fri, Aug 10, 2012 at 04:57:02PM -0700, H. Peter Anvin wrote:
> On 08/09/2012 03:34 PM, Betty Dall wrote:
> >
> > I thought this should be a break instead of a continue since the code
> > does a break if the class is 0xffffffff. If the function does not have a
> > valid VENDOR_ID, then the remaining function numbers do not have to be
> > scanned because functions are required to be implemented in order (no
> > skipping a function number.)
> >
>
> Is that true? This is certainly not true in PCI in general: there is
> required to be a function 0, but there is no guarantee that functions
> 1-7 don't have gaps.

These scans are for special known devices, presumably true for them.

Anwyays if you don't like it please use v1 of the patch.

-Andi
--
[email protected] -- Speaking for myself only

2012-08-13 03:23:07

by Khalid Aziz

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On 08/10/2012 05:57 PM, H. Peter Anvin wrote:
> On 08/09/2012 03:34 PM, Betty Dall wrote:
>> I thought this should be a break instead of a continue since the code
>> does a break if the class is 0xffffffff. If the function does not have a
>> valid VENDOR_ID, then the remaining function numbers do not have to be
>> scanned because functions are required to be implemented in order (no
>> skipping a function number.)
>>
> Is that true? This is certainly not true in PCI in general: there is
> required to be a function 0, but there is no guarantee that functions
> 1-7 don't have gaps.

If that is the case, there is a problem in the original code in
arch/x86/kernel/aperture_64.c.The original code already stops scanning
functions the first time it finds an invalid PCI class:

206 for (func = 0; func < 8; func++) {
207 u32 class, cap;
208 u8 type;
209 class = read_pci_config(bus, slot, func,
210 PCI_CLASS_REVISION);
211 if (class == 0xffffffff)
212 break;

--
====================================================================
Khalid Aziz
[email protected]

2012-08-13 03:24:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On 08/12/2012 08:23 PM, Khalid Aziz wrote:
>
> If that is the case, there is a problem in the original code in
> arch/x86/kernel/aperture_64.c.The original code already stops scanning
> functions the first time it finds an invalid PCI class:
>

Unless we can prove that that is invalid *for that specific case*, then
it is definitely wrong, and yes, I have seen devices in the field with
multiple, discontiguous functions.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-08-13 04:15:35

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

> If that is the case, there is a problem in the original code in
> arch/x86/kernel/aperture_64.c.The original code already stops scanning
> functions the first time it finds an invalid PCI class:

This was only for some AMD northbridges, since it worked there it should
be ok. The code is obsolete now for modern systems I believe.

-Andi

2012-08-13 21:58:24

by Dall, Elizabeth J

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On Sat, 2012-08-11 at 03:43 -0700, Andi Kleen wrote:
> On Fri, Aug 10, 2012 at 04:57:02PM -0700, H. Peter Anvin wrote:
> > On 08/09/2012 03:34 PM, Betty Dall wrote:
> > >
> > > I thought this should be a break instead of a continue since the code
> > > does a break if the class is 0xffffffff. If the function does not have a
> > > valid VENDOR_ID, then the remaining function numbers do not have to be
> > > scanned because functions are required to be implemented in order (no
> > > skipping a function number.)
> > >
> >
> > Is that true? This is certainly not true in PCI in general: there is
> > required to be a function 0, but there is no guarantee that functions
> > 1-7 don't have gaps.
>
> These scans are for special known devices, presumably true for them.
>
> Anwyays if you don't like it please use v1 of the patch.
>
> -Andi

I checked the PCI specification, and Peter is right that function
numbers can be sparse. Please go with version 1 of the patch, as Andi
suggested. I will follow up by looking at why the three scans are not
consistent and send a patch, if appropriate. The scans could be improved
by stopping the function scan if function 0 does not exist because
function 0 is required, and if it is not there then none of the other
functions will be implemented.

-Betty

2012-08-13 22:02:16

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

On 08/13/2012 02:58 PM, Betty Dall wrote:
>
> I checked the PCI specification, and Peter is right that function
> numbers can be sparse. Please go with version 1 of the patch, as Andi
> suggested. I will follow up by looking at why the three scans are not
> consistent and send a patch, if appropriate. The scans could be improved
> by stopping the function scan if function 0 does not exist because
> function 0 is required, and if it is not there then none of the other
> functions will be implemented.
>

Yes, if function 0 doesn't exist we could, and *should* skip functions
1-7; in fact, we should not process functions 1-7 unless the
multifunction bit is set in function 0. This matters on real devices in
the field.

-hpa