2007-12-20 12:28:18

by Tony Camuso

[permalink] [raw]
Subject: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]



-------- Original Message --------
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG
Date: Wed, 19 Dec 2007 19:33:45 -0500
From: Tony Camuso <[email protected]>
Reply-To: [email protected]
To: Greg KH <[email protected]>
References:
<20071219221746.20362.39243.sendpatchset@dhcp83-188.boston.redhat.com>
<[email protected]>

Greg KH wrote:
> On Wed, Dec 19, 2007 at 05:17:46PM -0500, [email protected] wrote:
>> There exist devices that do not respond correctly to PCI
>> MMCONFIG accesses in x86 platforms.
>
> What devices are these? Do you have reports of them somewhere?
>
There are the AMD 8131 and 8132, the Serverworks HT1000 bridge chips
and the 830M/MG graphics. Not all versions of these chips present
this pathology, but there are perhaps tens of thousands of systems
out there that have the broken versions of these chipsets.

RedHat have been maintaining a blacklist of systems having these
devices. Systems in the blacklist are confined to legacy PCI
access.

However, some of these systems, high volume ones at that, also have
PCI express buses in them. By limiting the whole platform to legacy
PCI access, we put PCI express extended config capabilities out of
reach. IIRC, the PCIe spec requires platforms to access extended
PCI config space.

> That sounds like this patchset can cause bad side affects on hardware
> that currently works just fine. That is not a good thing to be adding
> to the kernel, right?
>
No, the patch set tries to obviate this without requiring endusers to
write customized scripts with "pci=nommconf" and without requiring the
RH folks to add another platform (usually belatedly) to the blacklist.

If a device is going to machine check when you touch it with an mmconfig
access, it will happen with or without this patch-set.

However, the patch-set does cover most of the devices that don't respond
well to mmconfig access. Such devices almost alway7s return garbage when
you read from them.

The one device we know about that throws exceptions is the 830M/MG
graphics chip. This chip passes the read-compare test, so the code
merrily advances to bus sizing. When the bus sizing code writes the
BAR at offset 0x18 in this device, the system hangs.

I am thinking about a machine check handler that can catch these things,
or at least the exceptions. Aborts are not recoverable, according to
intel lore. However, the one device that we know croaks HARD seems to
throw an exception, since it happens in the same exact place every
time. I think we might be able to recover from that.

But that's in the future.


2007-12-20 17:25:41

by Greg KH

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 07:28:00AM -0500, Tony Camuso wrote:
>
>
> -------- Original Message --------
> Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG
> Date: Wed, 19 Dec 2007 19:33:45 -0500
> From: Tony Camuso <[email protected]>
> Reply-To: [email protected]
> To: Greg KH <[email protected]>
> References:
> <20071219221746.20362.39243.sendpatchset@dhcp83-188.boston.redhat.com>
> <[email protected]>
>
> Greg KH wrote:
>> On Wed, Dec 19, 2007 at 05:17:46PM -0500, [email protected] wrote:
>>> There exist devices that do not respond correctly to PCI
>>> MMCONFIG accesses in x86 platforms.
>> What devices are these? Do you have reports of them somewhere?
> There are the AMD 8131 and 8132, the Serverworks HT1000 bridge chips
> and the 830M/MG graphics. Not all versions of these chips present
> this pathology, but there are perhaps tens of thousands of systems
> out there that have the broken versions of these chipsets.

Why haven't we gotten reports about this before if this is a common
problem?

And why hasn't the vendor fixed the bios on these to work properly?

> RedHat have been maintaining a blacklist of systems having these
> devices. Systems in the blacklist are confined to legacy PCI
> access.

Do you have a pointer to this blacklist anywhere so that everyone can
benifit from this knowledge?

>> That sounds like this patchset can cause bad side affects on hardware
>> that currently works just fine. That is not a good thing to be adding
>> to the kernel, right?
> No, the patch set tries to obviate this without requiring endusers to
> write customized scripts with "pci=nommconf" and without requiring the
> RH folks to add another platform (usually belatedly) to the blacklist.
>
> If a device is going to machine check when you touch it with an mmconfig
> access, it will happen with or without this patch-set.
>
> However, the patch-set does cover most of the devices that don't respond
> well to mmconfig access. Such devices almost alway7s return garbage when
> you read from them.
>
> The one device we know about that throws exceptions is the 830M/MG
> graphics chip. This chip passes the read-compare test, so the code
> merrily advances to bus sizing. When the bus sizing code writes the
> BAR at offset 0x18 in this device, the system hangs.

So it doesn't work at all, with or without this patch? Does the vendor
know about this?

thanks,

greg k-h

2007-12-20 17:35:39

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 09:22:05AM -0800, Greg KH wrote:
> > The one device we know about that throws exceptions is the 830M/MG
> > graphics chip. This chip passes the read-compare test, so the code
> > merrily advances to bus sizing. When the bus sizing code writes the
> > BAR at offset 0x18 in this device, the system hangs.
>
> So it doesn't work at all, with or without this patch? Does the vendor
> know about this?

I can't imagine there are too many machines with MMCONFIG and an
i830 chipset. The laptop I'm currently typing on is an i830 ... and
its BIOS is from 2002, well predating the specification of MMCONFIG.
If I didn't know better, I'd accuse Tony of lying about the existance
of such a machine ;-)

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 18:04:22

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Matthew Wilcox wrote:
> On Thu, Dec 20, 2007 at 09:22:05AM -0800, Greg KH wrote:
>
> I can't imagine there are too many machines with MMCONFIG and an i830
> chipset. The laptop I'm currently typing on is an i830 ... and its
> BIOS is from 2002, well predating the specification of MMCONFIG. If I
> didn't know better, I'd accuse Tony of lying about the existance of
> such a machine ;-)
>
Sorry, that was the 82Q963/Q965 graphics controller, PCI ID 2992.

I can't remember why I thought PCI ID 2992 maps to 830M/MG. I think
it was in some intel doc about the ICH8 or ICH9.

Nevertheless, I have an hp dc5700 microtower right here on the floor
next to me, and it hangs when the bus sizing code does an mmconfig
write to the BAR at offset 0x18 in this device (id 2992).

It hangs in exactly the same place every time.

I am surmising that the write to that BAR is causing a MCE.

Here is the lspci output of the dc5700. The misbehaving device is
00:02.0

[root@localhost ~]# lspci
00:00.0 Host bridge: Intel Corporation 82Q963/Q965 Memory Controller Hub
(rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q963/Q965
Integrated Graphics Controller (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
#4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
#5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI
#2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio
Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express
Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express
Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
#1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
#2 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI
#1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface
Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port
SATA IDE Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port
SATA IDE Controller (rev 02)
3f:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755
Gigabit Ethernet PCI Express (rev 02)

2007-12-20 18:16:18

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 01:04:09PM -0500, Tony Camuso wrote:
> Sorry, that was the 82Q963/Q965 graphics controller, PCI ID 2992.
>
> I can't remember why I thought PCI ID 2992 maps to 830M/MG. I think
> it was in some intel doc about the ICH8 or ICH9.
>
> Nevertheless, I have an hp dc5700 microtower right here on the floor
> next to me, and it hangs when the bus sizing code does an mmconfig
> write to the BAR at offset 0x18 in this device (id 2992).

Oh, that's the same bug others (including me) have been complaining
about.

http://marc.info/?l=linux-kernel&m=118809338631160&w=2

> It hangs in exactly the same place every time.
>
> I am surmising that the write to that BAR is causing a MCE.

Bad deduction. What's happening is that the write to the BAR is causing
it to overlap the decode for mmconfig space. So the mmconfig write to
set the BAR back never gets through.

I have a different idea to fix this problem. Instead of writing
0xffffffff, we could look for an unused bit of space in the E820 map and
write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
overlap, and we could find its size using MMCONFIG.

Does anyone know how Windows handles these machines? Obviously, if it's
using MMCONFIG, it'd have the same problems. Does it just use type 1
for initial sizing? Or does it use type 1 for all accesses below 256
bytes?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 18:26:19

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Greg KH wrote:

> Why haven't we gotten reports about this before if this is a common
> problem?
>
> And why hasn't the vendor fixed the bios on these to work properly?
>
I can't really answer either of these questions. All I know is the
problem exists, and we need to deal with it.

That's why the unreachable_devices() routine exists in the first place.

But that routine is limited to only the first 16 buses on segment-0.
You would like to think that hardware designers would confine legacy
hardware, or problematic hardware, to that area, but they don't.

As I said before, expanding that routine to cover more buses would
adversely impact mmconfig pci access, since the mmconfig access
code does a lookup of that bitmap for every request. I don't think
we want that bitmap and the accompanying in-line lookup to increase
enough to encompass the pci configuration of some of these larger
systems.

> Do you have a pointer to this blacklist anywhere so that everyone can
> benifit from this knowledge?
>

Appended below is a code snippet embedded in the RHEL version of mmconfig.c,
for both i386 and x86_64. It does not encompass all the systems that have
(or will have) problems with mmconf. Only HP platforms are listed, but I
believe there are others.

The reason the HP platforms got caught is they put these devices beyond
bus 16, where they would have been trapped, and the problem would have
been avoided.

static int __devinit disable_mmconf(struct dmi_system_id *d)
{
pci_probe &= ~PCI_PROBE_MMCONF;
printk(KERN_INFO "%s detected: disabling PCI MMCONFIG\n", d->ident);
return 0;
}


/*
* Systems which cannot use PCI MMCONFIG at this time...
*/
static struct dmi_system_id __devinitdata nommconf_dmi_table[] = {
{
.callback = disable_mmconf,
.ident = "HP xw9300 Workstation",
.matches = {
DMI_MATCH(DMI_PRODUCT_NAME, "HP xw9300 Workstation"),
},
},
{
.callback = disable_mmconf,
.ident = "HP xw9400 Workstation",
.matches = {
DMI_MATCH(DMI_PRODUCT_NAME, "HP xw9400 Workstation"),
},
},
{
.callback = disable_mmconf,
.ident = "ProLiant DL585 G2",
.matches = {
DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant DL585 G2"),
},
},
{
.callback = disable_mmconf,
.ident = "HP Compaq dc5700 Microtower",
.matches = {
DMI_MATCH(DMI_PRODUCT_NAME,
"HP Compaq dc5700 Microtower"),
},
},

{}
};

>> The one device we know about that throws exceptions is the 830M/MG
>> graphics chip. This chip passes the read-compare test, so the code
>> merrily advances to bus sizing. When the bus sizing code writes the
>> BAR at offset 0x18 in this device, the system hangs.
>
> So it doesn't work at all, with or without this patch? Does the vendor
> know about this?
>
> thanks,
>
> greg k-h

I have talked to intel about this, but they haven't gotten back to me.

All I know at this point is that a mmconf write to the BAR at offset 0x18
of this device hangs the system.

2007-12-20 18:30:32

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Matthew Wilcox wrote:

> Bad deduction. What's happening is that the write to the BAR is causing
> it to overlap the decode for mmconfig space. So the mmconfig write to
> set the BAR back never gets through.
>
> I have a different idea to fix this problem. Instead of writing
> 0xffffffff, we could look for an unused bit of space in the E820 map and
> write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> overlap, and we could find its size using MMCONFIG.
>
The BAR claims to be a 64-bit BAR.

> Does anyone know how Windows handles these machines? Obviously, if it's
> using MMCONFIG, it'd have the same problems. Does it just use type 1
> for initial sizing? Or does it use type 1 for all accesses below 256
> bytes?
>
As far as I know, Windows has a blacklist that limits systems with these
devices to legacy PCI config access.

2007-12-20 18:36:52

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 01:30:16PM -0500, Tony Camuso wrote:
> Matthew Wilcox wrote:
>
> >Bad deduction. What's happening is that the write to the BAR is causing
> >it to overlap the decode for mmconfig space. So the mmconfig write to
> >set the BAR back never gets through.
> >
> >I have a different idea to fix this problem. Instead of writing
> >0xffffffff, we could look for an unused bit of space in the E820 map and
> >write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> >overlap, and we could find its size using MMCONFIG.
> >
> The BAR claims to be a 64-bit BAR.

And what's in the upper 32 bits by default? I'm guessing all zeroes.
We size 64-bit BARs by writing to each half individually. So I bet this
one still overlaps the MMCONFIG space at 0xf0000000.

> As far as I know, Windows has a blacklist that limits systems with these
> devices to legacy PCI config access.

I don't think that's true. If the people designing these machines had
noticed Windows failing to boot on them, they'd've fixed the BIOS to put
the MMCONFIG region elsewhere, not hacked Windows until it worked.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 18:49:58

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]


On 12/20/2007 1:16 PM, Matthew Wilcox wrote:
>
> Bad deduction. What's happening is that the write to the BAR is causing
> it to overlap the decode for mmconfig space. So the mmconfig write to
> set the BAR back never gets through.
>
> I have a different idea to fix this problem. Instead of writing
> 0xffffffff, we could look for an unused bit of space in the E820 map and
> write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> overlap, and we could find its size using MMCONFIG.
>
> Does anyone know how Windows handles these machines? Obviously, if it's
> using MMCONFIG, it'd have the same problems. Does it just use type 1
> for initial sizing? Or does it use type 1 for all accesses below 256
> bytes?
>



Always using type 1 for accesses below 256 bytes looks like a very very
attractive solution

I know we had a lot of older kernels over the last two years that we
patched like that (we needed MMCONFIG for our own device development
purposes, but we also needed our machines to boot and discover all
devices reliably). Recent kernels works fine out of the box on all
hardware we have, but all this sometimes tricky and apparently endless
work (in big part because of buggy BIOSes) about MMCONFIG would probably
become relatively easy by limiting the aim to have MMCONFIG work when it
is required (for cfg-space accesses >= 256).


Loic

2007-12-20 19:04:46

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Loic Prylli wrote:

> Always using type 1 for accesses below 256 bytes looks like a very very
> attractive solution
>
> I know we had a lot of older kernels over the last two years that we
> patched like that (we needed MMCONFIG for our own device development
> purposes, but we also needed our machines to boot and discover all
> devices reliably). Recent kernels works fine out of the box on all
> hardware we have, but all this sometimes tricky and apparently endless
> work (in big part because of buggy BIOSes) about MMCONFIG would probably
> become relatively easy by limiting the aim to have MMCONFIG work when it
> is required (for cfg-space accesses >= 256).
>
>
> Loic
>
Hmmm... I think I like this solution.

It may be easier to implement than the solution I posted.

Also, this solution also would allow us to remove the unreachable_devices()
routine and bitmap.

Does anybody see a down side to this?

2007-12-20 19:08:46

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 02:04:31PM -0500, Tony Camuso wrote:
> Also, this solution also would allow us to remove the unreachable_devices()
> routine and bitmap.

Not really ... we probe reading address 0x100 to see if the device
supports extended config space or not. So we need to make that fail
gracefully for the amd7111 case.

> Does anybody see a down side to this?

It'll be slower than it would be if we used mmconfig directly. Now yes,
nobody should be using pci config space in performance critical paths
... but see the tg3 driver.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 19:38:05

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Matthew Wilcox wrote:

> Bad deduction. What's happening is that the write to the BAR is causing
> it to overlap the decode for mmconfig space. So the mmconfig write to
> set the BAR back never gets through.
>
> I have a different idea to fix this problem. Instead of writing
> 0xffffffff, we could look for an unused bit of space in the E820 map and
> write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> overlap, and we could find its size using MMCONFIG.
>
Let me see if I understand this correctly.

Writing this BAR with 0xffffffff causes it to decode all further mmconfig
references based at addr 0xfxxxxxxx as its own?

If I've got that right, then why don't any of the other BARs do that? Is
it because this one's a 64-bit BAR?

AFIK, there are no devices out there that require 32-bits of address
space, so using 0xdfffffff in the low register would certainly work.

However, the PCI spec says that we should be able to write 0xffffffff
to the BAR, buggy BIOS and hardware notwithstanding.

Does anybody see that change as being within the purview of the patch-set
I am proposing? Or is that another patch for another time?

2007-12-20 19:50:03

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 12:08:33PM -0700, Matthew Wilcox wrote:
> On Thu, Dec 20, 2007 at 02:04:31PM -0500, Tony Camuso wrote:
> > Does anybody see a down side to this?
>
> It'll be slower than it would be if we used mmconfig directly. Now yes,
> nobody should be using pci config space in performance critical paths
> ... but see the tg3 driver.

Use type 1 just for the first 64 bytes and tg3 will be happy. All we need
is to avoid touching BARs with mmconfig.

Ivan.

2007-12-20 20:04:28

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 02:37:49PM -0500, Tony Camuso wrote:
> Matthew Wilcox wrote:
>
> >Bad deduction. What's happening is that the write to the BAR is causing
> >it to overlap the decode for mmconfig space. So the mmconfig write to
> >set the BAR back never gets through.
> >
> >I have a different idea to fix this problem. Instead of writing
> >0xffffffff, we could look for an unused bit of space in the E820 map and
> >write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> >overlap, and we could find its size using MMCONFIG.
> >
> Let me see if I understand this correctly.
>
> Writing this BAR with 0xffffffff causes it to decode all further mmconfig
> references based at addr 0xfxxxxxxx as its own?
>
> If I've got that right, then why don't any of the other BARs do that? Is
> it because this one's a 64-bit BAR?

Here's how BARs work ... when you write 0xffffffff to the BAR, it
ignores all the set bits that are less than the size of the BAR. So,
assuming this is a 256MB BAR (like my G33 is), what ends up written to
this BAR is 0xf0000000. Now, because this is graphics, apparently it's
special and embedded in the chipset, even though it looks like it's a
PCI device. So it actually gets priority over MMCONFIG which is also
mapped to 0xf0000000.

For your case of a 64-bit BAR, you could write 0xffffffff to the high
32-bits first, then write to the low 32-bits, then reset the low, then
high bits, and you'd avoid the problem. But the G33 has a 32-bit BAR
with the same problem, so it won't work for that case.

BARs that are behind bridges don't have this problem (they can't decode
memory accesses that aren't forwarded to them). BARs on devices which
have memory IO disabled also don't have theis problem, but disabling
devices has its problems (as does probing BARs for active devices anyway
...).

> AFIK, there are no devices out there that require 32-bits of address
> space, so using 0xdfffffff in the low register would certainly work.

The question is how large can 32-bit BARs get. As we've seen, 256MB
exist, and are causing pain. I can't imagine any PCI device
manufacturer thinks they can allocate 2GB of the low space, but we could
potentially mis-size a large BAR by not using 0xffffffff.

> Does anybody see that change as being within the purview of the patch-set
> I am proposing? Or is that another patch for another time?

I'm really not clear on the purpose of your patchset. Was it all to
address this one problem?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 20:06:41

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On 12/20/2007 2:08 PM, Matthew Wilcox wrote:
> On Thu, Dec 20, 2007 at 02:04:31PM -0500, Tony Camuso wrote:
>
>> Also, this solution also would allow us to remove the unreachable_devices()
>> routine and bitmap.
>>
>
> Not really ... we probe reading address 0x100 to see if the device
> supports extended config space or not. So we need to make that fail
> gracefully for the amd7111 case.
>



pci_cfg_space_size() is only done for PCI-express or PCI-X mode 2
devices, so you still have eliminated the bulk of the problems which are
typically handling the legacy busses on modern machines. I don't know
what is the amd7111.



>
>> Does anybody see a down side to this?
>>
>
> It'll be slower than it would be if we used mmconfig directly. Now yes,
> nobody should be using pci config space in performance critical paths
> ... but see the tg3 driver.
>


I am not familiar with the tg3 driver, just trying to give a 5 minutes
look, it seems the typical cases where the pci-conf-space is used
intensively are with some rev in combination with the 82801
(TG3_FLG2_ICH_WORKAROUND) which I don't think support mmconfig anyway,
as well as some very specific PCI-X combinations
(TG3_FLAG_PCIX_TARGET_HWBUG) which are also very unlikely to support
mmconfig.

Even if I am wrong for the tg3, I don't really think mmconfig vs type1
could make a noticeable performance on any common systems (obscure
systems or hardware where it could potentially have a performance impact
could use a non-default configuration).


Loic



2007-12-20 20:15:27

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Matthew Wilcox wrote:

> Here's how BARs work ... when you write 0xffffffff to the BAR, it
> ignores all the set bits that are less than the size of the BAR. So,
> assuming this is a 256MB BAR (like my G33 is), what ends up written to
> this BAR is 0xf0000000. Now, because this is graphics, apparently it's
> special and embedded in the chipset, even though it looks like it's a
> PCI device. So it actually gets priority over MMCONFIG which is also
> mapped to 0xf0000000.
>
> For your case of a 64-bit BAR, you could write 0xffffffff to the high
> 32-bits first, then write to the low 32-bits, then reset the low, then
> high bits, and you'd avoid the problem. But the G33 has a 32-bit BAR
> with the same problem, so it won't work for that case.
>
> BARs that are behind bridges don't have this problem (they can't decode
> memory accesses that aren't forwarded to them). BARs on devices which
> have memory IO disabled also don't have theis problem, but disabling
> devices has its problems (as does probing BARs for active devices anyway
> ...).
>
Thanks for the detailed explanation.

> The question is how large can 32-bit BARs get. As we've seen, 256MB
> exist, and are causing pain. I can't imagine any PCI device
> manufacturer thinks they can allocate 2GB of the low space, but we could
> potentially mis-size a large BAR by not using 0xffffffff.
>

Point well taken. Graphics devices understandably consume a lot of memory
space, and are likely to consume even more in the not-too-distant future.

> I'm really not clear on the purpose of your patchset. Was it all to
> address this one problem?
>

No. My patch-set does not address this problem at all, but rather the
larger problem of having mmconfig-unfriendly devices on buses that are
out of reach of the unreachable_devices() routine and bitmap.

This problem is one I encountered during my testing and mentioned in
my preamble as not being fixable by my patch-set.

2007-12-20 20:15:43

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 03:05:52PM -0500, Loic Prylli wrote:
> I am not familiar with the tg3 driver, just trying to give a 5 minutes
> look, it seems the typical cases where the pci-conf-space is used
> intensively are with some rev in combination with the 82801
> (TG3_FLG2_ICH_WORKAROUND) which I don't think support mmconfig anyway,
> as well as some very specific PCI-X combinations
> (TG3_FLAG_PCIX_TARGET_HWBUG) which are also very unlikely to support
> mmconfig.

It's not a question of whether the card supports mmconfig or not -- the
card can't tell whether a first-256-byte pci config transaction was
initiated through mmconfig, type1, type2, or even a bios call.

I'm just hacking together an implementation based on Ivan's suggestion
to always use type 1 for the first 64 bytes.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 20:17:44

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Ivan Kokshaysky wrote:
> On Thu, Dec 20, 2007 at 12:08:33PM -0700, Matthew Wilcox wrote:
>> On Thu, Dec 20, 2007 at 02:04:31PM -0500, Tony Camuso wrote:
>>> Does anybody see a down side to this?
>> It'll be slower than it would be if we used mmconfig directly. Now yes,
>> nobody should be using pci config space in performance critical paths
>> ... but see the tg3 driver.
>
> Use type 1 just for the first 64 bytes and tg3 will be happy. All we need
> is to avoid touching BARs with mmconfig.
>
> Ivan.

Sounds good to me.

However, this does add another in-line test for every pci config access.

The existing test is a lookup in the unreachable_devices bitmap. Even though
the bitmap only covers the first 16 buses, the lookup is performed for every
pci config access.

We would be adding another test,

if the offset is less than 64 bytes, use legacy pci config
else use mmconfig

The emitted code would probably only produce one branch, so it shouldn't
present a performance degradation.

Any objections to taking this tact?

2007-12-20 20:27:29

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Ivan Kokshaysky wrote:

> Use type 1 just for the first 64 bytes and tg3 will be happy. All we need
> is to avoid touching BARs with mmconfig.
>
> Ivan.

Re-thinking this ...

The problem I see with this approach is that it does not help devices
that are mmconfig-unfriendly and will not work above the 64-byte offset.

So you will be able to configure the BARS of such a device, but not be
able to do much of anything else.

2007-12-20 20:57:38

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On 12/20/2007 3:15 PM, Matthew Wilcox wrote:
> On Thu, Dec 20, 2007 at 03:05:52PM -0500, Loic Prylli wrote:
>
>> I am not familiar with the tg3 driver, just trying to give a 5 minutes
>> look, it seems the typical cases where the pci-conf-space is used
>> intensively are with some rev in combination with the 82801
>> (TG3_FLG2_ICH_WORKAROUND) which I don't think support mmconfig anyway,
>> as well as some very specific PCI-X combinations
>> (TG3_FLAG_PCIX_TARGET_HWBUG) which are also very unlikely to support
>> mmconfig.
>>
>
> It's not a question of whether the card supports mmconfig or not -- the
> card can't tell whether a first-256-byte pci config transaction was initiated through mmconfig, type1, type2, or even a bios call.
>





I know the final device is not aware on how the config request was
originated. I am just saying platforms built around the Intel 82801
chipset (ICH2) don't support mmconfig at all. I would also not be
surprised if the platforms where tg3 needs TG3_FLAG_PCIX_TARGET_HWBUG
might also not support mmconfig (but for this second case, it's only
speculation based on seeing a couple posts about
TG3_FLAG_PCIX_TARGET_HWBUG where amd hypertransport/PCI-X chipsets where
mentioned). If you know of a platform that support mmconfig, and where
the tg3 does need to use relatively intensively pci-conf-space, I'll be
happy to be corrected.




> I'm just hacking together an implementation based on Ivan's suggestion
> to always use type 1 for the first 64 bytes.
>
>



Whatever you think it's best. If you are only trying to address a
BAR-sizing/mmconfig temporary conflict, and think the "MCFG BIOS" or
other mmconfig problems are under control, that makes sense. If the need
arise for somebody, an option to select between 64 to 256 can be added
later.



Loic

2007-12-20 21:00:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 03:56:29PM -0500, Loic Prylli wrote:
> I know the final device is not aware on how the config request was
> originated. I am just saying platforms built around the Intel 82801
> chipset (ICH2) don't support mmconfig at all. I would also not be
> surprised if the platforms where tg3 needs TG3_FLAG_PCIX_TARGET_HWBUG
> might also not support mmconfig (but for this second case, it's only
> speculation based on seeing a couple posts about
> TG3_FLAG_PCIX_TARGET_HWBUG where amd hypertransport/PCI-X chipsets where
> mentioned). If you know of a platform that support mmconfig, and where
> the tg3 does need to use relatively intensively pci-conf-space, I'll be
> happy to be corrected.

tg3 is available as an add-in pci card. i have one. i can plug it into
a machine that does support mmconfig.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-20 21:55:59

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On 12/20/2007 4:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 20, 2007 at 03:56:29PM -0500, Loic Prylli wrote:
>
>> I know the final device is not aware on how the config request was
>> originated. I am just saying platforms built around the Intel 82801
>> chipset (ICH2) don't support mmconfig at all. I would also not be
>> surprised if the platforms where tg3 needs TG3_FLAG_PCIX_TARGET_HWBUG
>> might also not support mmconfig (but for this second case, it's only
>> speculation based on seeing a couple posts about
>> TG3_FLAG_PCIX_TARGET_HWBUG where amd hypertransport/PCI-X chipsets where
>> mentioned). If you know of a platform that support mmconfig, and where
>> the tg3 does need to use relatively intensively pci-conf-space, I'll be
>> happy to be corrected.
>>
>
> tg3 is available as an add-in pci card. i have one. i can plug it into
> a machine that does support mmconfig.
>



That doesn't tell for sure your NIC has the specific rev that would
cause the hwbug workaround to be used. But let's assume so, your
combination would still works correctly (maybe a slightly non-optimal
network performance). A non-default mmconf=always option can help
maximize again the performance.

My starting point was that more systems could be supported out-of-the
box (vs not working at all) by using "type 1" more widely,. This would
not break anybody. And this would not affect performance except on what
I initially called "obscure hardware or systems".

I already acknowledged in the previous email the possibility of a
performance impact, but it is still not clear to me whether it would be
widespread. The example you mention is a buggy revision of a chip that
was designed to use memory-mapped IO, and has to use config-space
instead because the memory-mapped IO implementation is buggy (in most
cases only when used in combination with certain chipsets). How much
influence should a modest performance impact there influence the
decision-making?


Usually it's better to trade some performance for stability than the
reverse, and you loose the performance anyway when some many people
start using pci=nommconf by default on all their installs.


Loic

2007-12-20 21:59:08

by Greg KH

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 01:25:57PM -0500, Tony Camuso wrote:
> Appended below is a code snippet embedded in the RHEL version of
> mmconfig.c,
> for both i386 and x86_64. It does not encompass all the systems that have
> (or will have) problems with mmconf. Only HP platforms are listed, but I
> believe there are others.
>
> The reason the HP platforms got caught is they put these devices beyond
> bus 16, where they would have been trapped, and the problem would have
> been avoided.

Any reason why these changes were never submitted to the upstream kernel
versions? Or do you all just want to keep patching your newer releases
with this information forever? :)

Please send these kinds of changes upstream...

>>> The one device we know about that throws exceptions is the 830M/MG
>>> graphics chip. This chip passes the read-compare test, so the code
>>> merrily advances to bus sizing. When the bus sizing code writes the
>>> BAR at offset 0x18 in this device, the system hangs.
>> So it doesn't work at all, with or without this patch? Does the vendor
>> know about this?
>> thanks,
>> greg k-h
>
> I have talked to intel about this, but they haven't gotten back to me.

Try poking them harder, they are usually very responsive to Linux kernel
issues.

thanks,

greg k-h

2007-12-20 22:37:00

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Greg KH wrote:
> On Thu, Dec 20, 2007 at 01:25:57PM -0500, Tony Camuso wrote:
> Any reason why these changes were never submitted to the upstream kernel
> versions? Or do you all just want to keep patching your newer releases
> with this information forever? :)
>

I really don't know why these changes were never submitted to the
upstream kernel versions".

I was brought on the scene about six months ago as HP's on-site engineer
at RH, and this was one of the things they wanted me to do.

We wanted a solution that was more generic and could manage this
problem preemptively, rather than using blacklists. Maintenance of
blacklists is a bother and almost always done after a new system
with this problem is discovered.

Furthermore, blacklisting whole platforms to use legacy pci config
penalizes any mmconfig-friendly buses in those platforms, particularly
the pci express buses, and causes such platforms to be non-compliant
with the pci expres spec.

> Please send these kinds of changes upstream...
>
As you wish.
:)

>> I have talked to intel about this, but they haven't gotten back to me.
>
> Try poking them harder, they are usually very responsive to Linux kernel
> issues.
>
> thanks,
>
> greg k-h

Matthew Wilcox explained the problem to me. The hang that cannot be fixed
by my patch isn't the chipset's fault. The graphics device has a 64-bit
BAR asking for 256 MB of IO.

If I understand his explanation correctly, when the bus sizing code
writes the low dword of this BAR with 0xffffffff, the chip is then
programmed to claim every MMIO reference betweeen 0x00000000.f0000000 and
0x00000001.00000000

It just so happens that MMCONFIG space for the dc5700 (awa some others) is
mapped by its BIOS into that very region, so all MMCONFIG references are now
swallowed up by the graphics chip. At that point, no further boot progress can
be made.

2007-12-20 22:44:21

by Greg KH

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 05:36:43PM -0500, Tony Camuso wrote:
> Greg KH wrote:
>> On Thu, Dec 20, 2007 at 01:25:57PM -0500, Tony Camuso wrote:
>> Any reason why these changes were never submitted to the upstream kernel
>> versions? Or do you all just want to keep patching your newer releases
>> with this information forever? :)
>
> I really don't know why these changes were never submitted to the
> upstream kernel versions".
>
> I was brought on the scene about six months ago as HP's on-site engineer
> at RH, and this was one of the things they wanted me to do.
>
> We wanted a solution that was more generic and could manage this
> problem preemptively, rather than using blacklists. Maintenance of
> blacklists is a bother and almost always done after a new system
> with this problem is discovered.
>
> Furthermore, blacklisting whole platforms to use legacy pci config
> penalizes any mmconfig-friendly buses in those platforms, particularly
> the pci express buses, and causes such platforms to be non-compliant
> with the pci expres spec.

Sure, I realize this, but it solves the problem in one way for broken
hardware, such that it at least allows it to work, right? It also
provides a better incentive for the manufacturer to fix their bios,
which as you are on-site at HP, it would seem odd that they would just
not do that instead of trying to work around this in the kernel...

thanks,

greg k-h

2007-12-20 23:21:52

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

Greg KH wrote:
>
> Sure, I realize this, but it solves the problem in one way for broken
> hardware, such that it at least allows it to work, right? It also
> provides a better incentive for the manufacturer to fix their bios,
> which as you are on-site at HP, it would seem odd that they would just
> not do that instead of trying to work around this in the kernel...
>
> thanks,
>
> greg k-h

I don't think that many OEMs have that much control over the BIOS in
their "value lines".
:)

And the MMCONFIG problem with enterprise systems and workstations, where
we do control the BIOS (for the most part), is due to known bugs in
certain versions of certain chipsets, HT1000, AMD8132, among them, not
the BIOS.

Anyway, we are devising better ways to deal with these anomalies
than blacklists and telling customers to use "pci=nommconf"

And we're bringing them to the community for discussion, improvement,
and, we hope, acceptance.

2007-12-21 02:09:56

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]


On 12/20/2007 6:21 PM, Tony Camuso wrote:
>
> And the MMCONFIG problem with enterprise systems and workstations, where
> we do control the BIOS (for the most part), is due to known bugs in
> certain versions of certain chipsets, HT1000, AMD8132, among them, not
> the BIOS.



The lack of MMCONFIG support is indeed because some hypertransport
chipsets lack that support. But there are some BIOSes out there that are
advertising support for all busses in their MCFG acpi attribute (even
the busses managed by some amd8131 in a mixed nvidia-ck804/amd8131
motherboard), and the BIOS seems at least faulty for advertising a
capability that does not exist.


Loic

2007-12-23 20:17:21

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]



On 12/20/2007 1:16 PM, Matthew Wilcox wrote:
> Oh, that's the same bug others (including me) have been complaining
> about.
>
> http://marc.info/?l=linux-kernel&m=118809338631160&w=2
>
>
>> It hangs in exactly the same place every time.
>>
>> I am surmising that the write to that BAR is causing a MCE.
>>
>
> Bad deduction. What's happening is that the write to the BAR is causing
> it to overlap the decode for mmconfig space. So the mmconfig write to
> set the BAR back never gets through.
>
> I have a different idea to fix this problem. Instead of writing
> 0xffffffff, we could look for an unused bit of space in the E820 map and
> write, say, 0xdfffffff to the low 32-bits of a BAR. Then it wouldn't
> overlap, and we could find its size using MMCONFIG.
>
> Does anyone know how Windows handles these machines?



I just realized one thing: the bar sizing code in pci_read_bases() (that
writes 0xffffffff in the bars) does not seem to disable the
PCI_COMMAND_MEM/PCI_COMMAND_IO bits in the cmd register before
manipulating the BARs. And it seems nobody else ensures they are
disabled at this point either (or am I missing something?).


Touching the bars while they are enabled would be buggy behaviour from
our part, and something trivial to fix. And it might well fix that
particular problem (it's fair play from the machine to crash if we
create a decoding conflict, simply disabling the cmd bits in
pci_read_bases() should remove that conflict).

FWIW, to partially answer your last question, Windows does disable
mem-space and/or IO-space when sizing the bars of a device (I have some
traces of configuration-space-access taken on a window machine for one
of the PCI busses).



Loic





2007-12-23 20:56:10

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Sun, Dec 23, 2007 at 03:16:24PM -0500, Loic Prylli wrote:
> I just realized one thing: the bar sizing code in pci_read_bases() (that
> writes 0xffffffff in the bars) does not seem to disable the
> PCI_COMMAND_MEM/PCI_COMMAND_IO bits in the cmd register before
> manipulating the BARs. And it seems nobody else ensures they are
> disabled at this point either (or am I missing something?).

Right, we don't. Ivan and Greg are convinced that doing so can break
some machines.

> FWIW, to partially answer your last question, Windows does disable
> mem-space and/or IO-space when sizing the bars of a device (I have some
> traces of configuration-space-access taken on a window machine for one
> of the PCI busses).

Now that contradicts some information we've been told before; can you
post those traces? That would argue in favour of disabling memspace
when configuring BARs.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-23 22:41:20

by Loic Prylli

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On 12/23/2007 3:55 PM, Matthew Wilcox wrote:
> On Sun, Dec 23, 2007 at 03:16:24PM -0500, Loic Prylli wrote:
>
>> I just realized one thing: the bar sizing code in pci_read_bases() (that
>> writes 0xffffffff in the bars) does not seem to disable the
>> PCI_COMMAND_MEM/PCI_COMMAND_IO bits in the cmd register before
>> manipulating the BARs. And it seems nobody else ensures they are
>> disabled at this point either (or am I missing something?).
>>
>
> Right, we don't. Ivan and Greg are convinced that doing so can break
> some machines.
>



It might indeed be scary to suddenly change the command bits on some
old "Host bridges" and maybe other classes of bridges. I was too
optimistic about the triviality of the fix.


Still an obvious improvement to the pci_read_bases() code would be to
not try sizing non-existings BARs (that way if a device has no memory
bars, we never need to touch the mem-enable bit). In addition, if you
exclude Host bridges and other bridges from temporary mem-disabling
while sizing their BARs, then doing temp disabling for the remaining
devices looks less scary than the alternative (messing live with the
bars decoding range especially for regular devices integrated to the
main chipset).


It seems to me worth making that init code safer independantly of the
mmconfig issues.



> Now that contradicts some information we've been told before; can you
> post those traces? That would argue in favour of disabling memspace
> when configuring BARs.
>



Here is a trace where COMMAND/STATUS and 0x10 and 0x14 registers
accesses were captured. Comments in brackets are my interpretation of
what's happening. That was with windows 2003 64bit.


Loic



[ reset at 607 second]

[BIOS: read-status, COMMAND = 0 ]

626.442718:cfg-read:0x6.w=0x10
626.442805:cfg-write:0x4.w=0x0

[BIOS: sizing BAR0]

626.448325:cfg-read:0x10.l=0x8
626.448365:cfg-write:0x10.l=0xfffffffe
626.448439:cfg-read:0x10.l=0xff000008
626.448493:cfg-write:0x10.l=0x8

[BIOS: sizing non-existent BAR1]

626.448552:cfg-read:0x14.l=0x0
626.448591:cfg-write:0x14.l=0xfffffffe
626.448647:cfg-read:0x14.l=0x0
626.448687:cfg-write:0x14.l=0x0

[BIOS: sizing BAR0]

626.449596:cfg-read:0x10.l=0x8
626.449635:cfg-write:0x10.l=0xfffffffe
626.449710:cfg-read:0x10.l=0xff000008
626.449764:cfg-write:0x10.l=0x8

[BIOS: sizing non-existent BAR1]

626.449832:cfg-read:0x14.l=0x0
626.449872:cfg-write:0x14.l=0xfffffffe
626.449928:cfg-read:0x14.l=0x0
626.449967:cfg-write:0x14.l=0x0

[BIOS: sizing BAR0]


626.451822:cfg-read:0x10.l=0x8
626.451862:cfg-write:0x10.l=0xfffffffe
626.451936:cfg-read:0x10.l=0xff000008
626.451990:cfg-write:0x10.l=0x8

[BIOS: sizing non-existent BAR1]

626.452058:cfg-read:0x14.l=0x0
626.452098:cfg-write:0x14.l=0xfffffffe
626.452154:cfg-read:0x14.l=0x0
626.452194:cfg-write:0x14.l=0x0


[BIOS: sizing and assigning BAR0]

626.454899:cfg-read:0x10.l=0x8
626.454939:cfg-write:0x10.l=0xfffffffe
626.455013:cfg-read:0x10.l=0xff000008

626.455068:cfg-write:0x10.l=0x8
626.455138:cfg-write:0x10.l=0xfa000000

[BIOS: sizing non-existent BAR1]

626.455211:cfg-read:0x14.l=0x0
626.455251:cfg-write:0x14.l=0xfffffffe
626.455307:cfg-read:0x14.l=0x0
626.455346:cfg-write:0x14.l=0x0


[BIOS: sizing and restoring BAR0]


628.024774:cfg-read:0x10.l=0xfa000008
628.024831:cfg-write:0x10.l=0xfffffffe
628.024908:cfg-read:0x10.l=0xff000008
628.024964:cfg-write:0x10.l=0xfa000008

[BIOS: sizing non-existent BAR1]

628.025049:cfg-read:0x14.l=0x0
628.025091:cfg-write:0x14.l=0xfffffffe
628.025149:cfg-read:0x14.l=0x0
628.025190:cfg-write:0x14.l=0x0

[BIOS: clear status ]

628.025714:cfg-write:0x6.w=0xffff

[BIOS: enable SERR/IO+MEMORY+MASTER+SPECIAL+INVALIDATE ]
628.025788:cfg-write:0x4.w=0x11f

[BIOS: enable MEMORY+MASTER (after seeing IO/special/invalidate were
read-only 0) ]
629.028114:cfg-read:0x4.b=0x6
629.028160:cfg-write:0x4.b=0x6

[BIOS: read status, clear PARITY/SERR status bits ]
629.028490:cfg-read:0x7.b=0x0
629.028536:cfg-write:0x7.b=0xc1

[BIOS: enable PARITY detection ]
629.028818:cfg-read:0x4.b=0x6
629.028863:cfg-write:0x4.b=0x46

[BIOS: enable SERR (already enabled anyway) ]
629.028920:cfg-read:0x5.b=0x1
629.028966:cfg-write:0x5.b=0x1
629.032338:cfg-read:0x6.w=0x10


[ WINDOWS starting ]

661.965932:cfg-read:0x4.l=0x100146
661.965986:cfg-read:0x10.l=0xfa000008
661.966040:cfg-read:0x14.l=0x0

662.112971:cfg-read:0x4.l=0x100146
662.113024:cfg-read:0x10.l=0xfa000008
662.113078:cfg-read:0x14.l=0x0

662.155903:cfg-read:0x4.l=0x100146

662.156066:cfg-read:0x4.l=0x100146
662.156120:cfg-read:0x10.l=0xfa000008
662.156174:cfg-read:0x14.l=0x0

662.156337:cfg-read:0x4.l=0x100146
662.156391:cfg-read:0x10.l=0xfa000008
662.156445:cfg-read:0x14.l=0x0

[ WINDOWS: disable MEMORY + MASTER ]
662.156519:cfg-write:0x4.l=0x140

[ WINDOWS: size BAR0 and BAR1]
662.156588:cfg-write:0x10.l=0xffffffff
662.156662:cfg-write:0x14.l=0xffffffff
662.156755:cfg-read:0x4.l=0x100140
662.156809:cfg-read:0x10.l=0xff000008
662.156863:cfg-read:0x14.l=0x0

[ WINDOWS: disable MEMORY + MASTER ]

662.156937:cfg-write:0x4.l=0x140

[ WINDOWS: restore BAR0 and non-existing BAR1]

662.157004:cfg-write:0x10.l=0xfa000008
662.157077:cfg-write:0x14.l=0x0

[ WINDOWS: enable MEMORY + MASTER ]

662.157151:cfg-write:0x4.w=0x146

662.157453:cfg-read:0x4.l=0x100146
662.157506:cfg-read:0x10.l=0xfa000008
662.157560:cfg-read:0x14.l=0x0

662.157690:cfg-read:0x4.w=0x146

[ WINDOWS: disable MEMORY + MASTER ]

662.157738:cfg-write:0x4.w=0x140

[ WINDOWS: enter power-saving mode ]
662.157866:Entered L1

2007-12-24 08:09:49

by Grant Grundler

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Sun, Dec 23, 2007 at 03:16:24PM -0500, Loic Prylli wrote:
...
> I just realized one thing: the bar sizing code in pci_read_bases() (that
> writes 0xffffffff in the bars) does not seem to disable the
> PCI_COMMAND_MEM/PCI_COMMAND_IO bits in the cmd register before
> manipulating the BARs. And it seems nobody else ensures they are
> disabled at this point either (or am I missing something?).

You are missing some history... I posted such a patch in 2002:
http://lkml.org/lkml/2002/12/19/145

> Touching the bars while they are enabled would be buggy behaviour from
> our part, and something trivial to fix. And it might well fix that
> particular problem (it's fair play from the machine to crash if we
> create a decoding conflict, simply disabling the cmd bits in
> pci_read_bases() should remove that conflict).

ISTR willy or Ivan recently posted a patch that was suggested in 2002
as well (don't disable MMIO on bridge devices when sizing BARs)...so the
main objections might be resolved to this "obvious fix". *sigh*

> FWIW, to partially answer your last question, Windows does disable
> mem-space and/or IO-space when sizing the bars of a device (I have some
> traces of configuration-space-access taken on a window machine for one
> of the PCI busses).

Thanks for posting the traces...it's past midnight here and I'll try to
look at those tomorrow. (Sorry - sounds like a lame excuse but I'm likely
to read the trace incorrectly at the moment.)

cheers,
grant

2007-12-24 08:29:20

by Grant Grundler

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, Dec 20, 2007 at 02:40:06PM -0800, Greg KH wrote:
> Sure, I realize this, but it solves the problem in one way for broken
> hardware, such that it at least allows it to work, right? It also
> provides a better incentive for the manufacturer to fix their bios,
> which as you are on-site at HP, it would seem odd that they would just
> not do that instead of trying to work around this in the kernel...

Greg, you know why: Cost.

Asking the BIOS provider to roll a new version costs.
Rolling the BIOS version might also require re-running the OS "Certification".
Besides the cost of an OS re-cert, BIOS changes are unlikely to happen if
it will delay the introduction and/or sale of new products.

And it's not considered to be a "BIOS" problem unless win XP (or maybe
"Vista" these days) fails the same way (which BIOS supplier would have
fixed...).

hth,
grant

2008-01-08 03:20:48

by Tony Camuso

[permalink] [raw]
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG

Greg,

Have you given this patch-set any more consideration?

I've submitted the changes you requested.

Regards,
Tony

2008-01-08 03:41:24

by Tony Camuso

[permalink] [raw]
Subject: Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

On Thu, 2007-12-20 at 22:50 +0300, Ivan Kokshaysky wrote:

> Use type 1 just for the first 64 bytes and tg3 will be happy. All we need
> is to avoid touching BARs with mmconfig.
>
> Ivan.

I've tried Ivan's suggestion, and it works.

The patch is appended below.

My question is, do we want to incorporate this full-time upstream?

It would fix a lot of existing and potential problems, and the cost
is very small.

Regards,
Tony

diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..42e7d4a 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -73,7 +73,8 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
}

base = get_base_addr(seg, bus, devfn);
- if (!base)
+ if ((!addr) || (reg < 0x40))
+
return pci_conf1_read(seg,bus,devfn,reg,len,value);

spin_lock_irqsave(&pci_config_lock, flags);
@@ -106,7 +107,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
return -EINVAL;

base = get_base_addr(seg, bus, devfn);
- if (!base)
+ if ((!addr) || (reg < 0x40))
return pci_conf1_write(seg,bus,devfn,reg,len,value);

spin_lock_irqsave(&pci_config_lock, flags);
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 4095e4d..4ad1fcb 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
}

addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if ((!addr) || (reg < 0x40))
return pci_conf1_read(seg,bus,devfn,reg,len,value);

switch (len) {
@@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
return -EINVAL;

addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if ((!addr) || (reg < 0x40))
return pci_conf1_write(seg,bus,devfn,reg,len,value);

switch (len) {

2008-01-08 04:55:19

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG

On Mon, Jan 07, 2008 at 10:20:23PM -0500, Tony Camuso wrote:
> Greg,
>
> Have you given this patch-set any more consideration?

Which patch-set, there have been a number of them :)

> I've submitted the changes you requested.

Care to respin them all so I'm not confused?

thanks,

greg k-h

2008-01-08 13:14:49

by Tony Camuso

[permalink] [raw]
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG

I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG *
in the subject line, where the ? is replaced by the number of the
patch within the set and the * is replaced by a brief description,
if that's acceptable.

I can have it ready in a few hours.

On Mon, 2008-01-07 at 20:56 -0800, Greg KH wrote:
> On Mon, Jan 07, 2008 at 10:20:23PM -0500, Tony Camuso wrote:
> > Greg,
> >
> > Have you given this patch-set any more consideration?
>
> Which patch-set, there have been a number of them :)
>
> > I've submitted the changes you requested.
>
> Care to respin them all so I'm not confused?
>
> thanks,
>
> greg k-h

2008-01-08 13:30:39

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG

On Tue, Jan 08, 2008 at 08:14:22AM -0500, Tony Camuso wrote:
> I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG *
> in the subject line, where the ? is replaced by the number of the
> patch within the set and the * is replaced by a brief description,
> if that's acceptable.

That sounds great.

> I can have it ready in a few hours.

I'll be on a plane for a few hours, so don't feel the need to rush :)

thanks,

greg k-h

2008-01-08 13:44:46

by Tony Camuso

[permalink] [raw]
Subject: Re: [PATCH 0/5]PCI: x86 MMCONFIG

Thanks, Greg.

Have a safe flight!

On Tue, 2008-01-08 at 05:36 -0800, Greg KH wrote:
> On Tue, Jan 08, 2008 at 08:14:22AM -0500, Tony Camuso wrote:
> > I'll respin the whole kit with [PATCH ?/5][V2]PCI: x86 MMCONFIG *
> > in the subject line, where the ? is replaced by the number of the
> > patch within the set and the * is replaced by a brief description,
> > if that's acceptable.
>
> That sounds great.
>
> > I can have it ready in a few hours.
>
> I'll be on a plane for a few hours, so don't feel the need to rush :)
>
> thanks,
>
> greg k-h