2007-05-02 02:47:59

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Monday, April 30, 2007, Olivier Galibert wrote:
> On Sun, Apr 29, 2007 at 08:14:37PM -0600, Robert Hancock wrote:
> > -Validate that the area is reserved even if we read it from the
> > chipset directly and not from the MCFG table. This catches the case
> > where the BIOS didn't set the location properly in the chipset and
> > has mapped it over other things it shouldn't have. This might be
> > overly pessimistic - we might be able to instead verify that no
> > other reserved resources (like chipset registers) are inside this
> > memory range.
>
> I have a fundamental problem with that: you don't validate a higher
> reliability information against a lower one. The chipset registers
> are high reliability. Modulo unknown hardware errata and bugs in the
> code (and accepting f0000000 is in practice a bug in the code, the
> docs are starting to catch up with it too), the chipset *will* decode
> mmconfig at the looked up address no matter what. On the other side,
> the ACPI data is bios generated, and that is well known to be horribly
> unreliable. Hell, if it was reliable we could just use the MCFG ACPI
> table without questions.

Now that I've read his patch closely I think you're right.

Robert, it looks like you'll trust acpi_table_parse if
pci_mmcfg_check_hostbridge returns a failure. I think it should be
treated with a higher priority. If pci_mmcfg_check_hostbridge returns a
failure, there's no way MCFG space can work, so we should disable it
unconditionally in that case (even if ACPI says "trust me, when have I
ever lied to you?").
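
As a rough sketch of the priority rule argued for here (this is not code from
the patch under discussion): a chipset-specific probe, when one exists, gets
the final say, and the ACPI MCFG table is only consulted when the chipset is
unknown. The struct, enum, and function names below are hypothetical, and
splitting "chipset unknown" from "chipset says no" is an assumption made for
the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the two probes discussed in the thread. */
struct mmcfg_probe {
    bool chipset_known; /* a host-bridge handler exists for this chipset */
    bool chipset_ok;    /* that handler found a usable MMCONFIG window */
    bool acpi_mcfg_ok;  /* the ACPI MCFG table describes a usable window */
};

enum mmcfg_source { MMCFG_NONE, MMCFG_FROM_CHIPSET, MMCFG_FROM_ACPI };

/*
 * Chipset registers outrank the BIOS-generated table: if the chipset is
 * known and reports no usable window, ACPI is not allowed to override it.
 */
static enum mmcfg_source mmcfg_choose_source(const struct mmcfg_probe *p)
{
    assert(p != NULL);

    if (p->chipset_known)
        return p->chipset_ok ? MMCFG_FROM_CHIPSET : MMCFG_NONE;

    return p->acpi_mcfg_ok ? MMCFG_FROM_ACPI : MMCFG_NONE;
}
```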

I'm testing it now on my 965...

Thanks,
Jesse


2007-05-02 02:56:42

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Tuesday, May 01, 2007, Jesse Barnes wrote:
> On Monday, April 30, 2007, Olivier Galibert wrote:
> > On Sun, Apr 29, 2007 at 08:14:37PM -0600, Robert Hancock wrote:
> > > -Validate that the area is reserved even if we read it from the
> > > chipset directly and not from the MCFG table. This catches the case
> > > where the BIOS didn't set the location properly in the chipset and
> > > has mapped it over other things it shouldn't have. This might be
> > > overly pessimistic - we might be able to instead verify that no
> > > other reserved resources (like chipset registers) are inside this
> > > memory range.
> >
> > I have a fundamental problem with that: you don't validate a higher
> > reliability information against a lower one. The chipset registers
> > > are high reliability. Modulo unknown hardware errata and bugs in the
> > code (and accepting f0000000 is in practice a bug in the code, the
> > docs are starting to catch up with it too), the chipset *will* decode
> > mmconfig at the looked up address no matter what. On the other side,
> > the ACPI data is bios generated, and that is well known to be horribly
> > > unreliable. Hell, if it was reliable we could just use the MCFG ACPI
> > table without questions.
>
> Now that I've read his patch closely I think you're right.
>
> Robert, it looks like you'll trust acpi_table_parse if
> pci_mmcfg_check_hostbridge returns a failure. I think it should be
> treated with a higher priority. If pci_mmcfg_check_hostbridge returns a
> failure, there's no way MCFG space can work, so we should disable it
> unconditionally in that case (even if ACPI says "trust me, when have I
> ever lied to you?").
>
> I'm testing it now on my 965...

Bah... nevermind Robert, I see you're doing this already in
pci_mmcfg_reject_broken. I'm about to reboot & test now.

Jesse

2007-05-02 05:27:43

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Tuesday, May 01, 2007, Jesse Barnes wrote:
> > I'm testing it now on my 965...
>
> Bah... nevermind Robert, I see you're doing this already in
> pci_mmcfg_reject_broken. I'm about to reboot & test now.

Ok, I've tested a bit on my 965 (after re-adding my old patch to support
it) and the new checks are more complete, but my BIOS still appears to be
buggy.

The extended config space (as defined by the register) is at 0xf0000000
(full value is 0xf0000003 indicating 128M enabled). The ACPI MCFG table
has this space reserved according to Robert's new code, but the machine
hangs due to the address space aliasing Olivier mentioned awhile back. I
don't have a PCIe card to test with (or any devices that require extended
config space that I know of) so I can't really tell if Windows supports
PCIe on this platform, but if it does I don't see how it would w/o having
a full bridge driver and sophisticated address space allocation builtin.
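
For illustration, here is how a register value of 0xf0000003 breaks down into
"enabled, 128M, base 0xf0000000". The bit layout is an assumption modeled on
the Intel 945/965 PCIEXBAR description (bit 0 = enable, bits 2:1 = window
length, remaining upper bits = base); the struct and function names are made
up for this sketch and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pciexbar_info {
    uint64_t base; /* physical base of the MMCONFIG window */
    uint64_t size; /* window size in bytes */
};

/*
 * Decode the low/high dwords of PCIEXBAR. Returns false when the window is
 * disabled or the length field holds the reserved encoding.
 */
static bool decode_pciexbar(uint32_t lo, uint32_t hi, struct pciexbar_info *out)
{
    uint64_t raw = ((uint64_t)hi << 32) | lo;
    const uint64_t mb = 1024ULL * 1024ULL;

    assert(out != NULL);
    out->base = 0;
    out->size = 0;

    if (!(raw & 1)) /* bit 0: enable */
        return false;

    switch ((raw >> 1) & 3) { /* bits 2:1: length */
    case 0: out->size = 256 * mb; break;
    case 1: out->size = 128 * mb; break;
    case 2: out->size = 64 * mb; break;
    default: return false; /* reserved encoding */
    }

    /* The base is aligned to the window size, so mask off the low bits. */
    out->base = raw & ~(out->size - 1);
    return true;
}
```

With lo = 0xf0000003 and hi = 0, the length field is 01, so the sketch yields
a 128 MB window based at 0xf0000000.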

I'm going to try updating my BIOS, but if that doesn't solve this problem,
I'm not sure what we can do about it. Should pci_mmcfg_insert_resources
check for conflicts? Should we just blacklist certain boards? I can try
pinging our BIOS folks about this board to see what was intended, but I'm
sure this won't be the only board we have problems with, so we'll need to
address it generically somehow.

Thanks,
Jesse

2007-05-02 14:35:31

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Tuesday, May 01, 2007, Jesse Barnes wrote:
>>> I'm testing it now on my 965...
>> Bah... nevermind Robert, I see you're doing this already in
>> pci_mmcfg_reject_broken. I'm about to reboot & test now.
>
> Ok, I've tested a bit on my 965 (after re-adding my old patch to support
> it) and the new checks are more complete, but my BIOS still appears to be
> buggy.
>
> The extended config space (as defined by the register) is at 0xf0000000
> (full value is 0xf0000003 indicating 128M enabled). The ACPI MCFG table
> has this space reserved according to Robert's new code, but the machine
> hangs due to the address space aliasing Olivier mentioned awhile back. I
> don't have a PCIe card to test with (or any devices that require extended
> config space that I know of) so I can't really tell if Windows supports
> PCIe on this platform, but if it does I don't see how it would w/o having
> a full bridge driver and sophisticated address space allocation builtin.

Windows XP doesn't use MMCONFIG or any extended configuration space. I
believe Vista is supposed to, though. Not sure how they are handling
this issue.

>
> I'm going to try updating my BIOS, but if that doesn't solve this problem,
> I'm not sure what we can do about it. Should pci_mmcfg_insert_resources
> check for conflicts? Should we just blacklist certain boards? I can try
> pinging our BIOS folks about this board to see what was intended, but I'm
> sure this won't be the only board we have problems with, so we'll need to
> address it generically somehow.

Can you post what your board has for PNPACPI reserved resources (I
believe they're in /sys/devices/pnp0/*/resources IIRC, don't have a
Linux box handy right now). Full dmesg would also be useful, I think it
dumps out those reservations at boot nowadays..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-02 17:57:38

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 2, 2007 7:34 am Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Tuesday, May 01, 2007, Jesse Barnes wrote:
> >>> I'm testing it now on my 965...
> >>
> >> Bah... nevermind Robert, I see you're doing this already in
> >> pci_mmcfg_reject_broken. I'm about to reboot & test now.
> >
> > Ok, I've tested a bit on my 965 (after re-adding my old patch to
> > support it) and the new checks are more complete, but my BIOS still
> > appears to be buggy.
> >
> > The extended config space (as defined by the register) is at
> > 0xf0000000 (full value is 0xf0000003 indicating 128M enabled). The
> > ACPI MCFG table has this space reserved according to Robert's new
> > code, but the machine hangs due to the address space aliasing
> > Olivier mentioned awhile back. I don't have a PCIe card to test
> > with (or any devices that require extended config space that I know
> > of) so I can't really tell if Windows supports PCIe on this
> > platform, but if it does I don't see how it would w/o having a full
> > bridge driver and sophisticated address space allocation builtin.
>
> Windows XP doesn't use MMCONFIG or any extended configuration space.
> I believe Vista is supposed to, though. Not sure how they are
> handling this issue.

Oh right... Vista will be the first to fully support PCIe & mcfg...

> Can you post what your board has for PNPACPI reserved resources (I
> believe they're in /sys/devices/pnp0/*/resources IIRC, don't have a
> Linux box handy right now). Full dmesg would also be useful, I think
> it dumps out those reservations at boot nowadays..

BIOS update didn't help. Here's the boot log and a dump of the pnp0
resources.

Jesse


Attachments:
(No filename) (1.64 kB)
pnp.out (691.00 B)
boot.out (24.95 kB)

2007-05-02 23:45:40

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Wednesday, May 2, 2007 7:34 am Robert Hancock wrote:
>> Jesse Barnes wrote:
>>> On Tuesday, May 01, 2007, Jesse Barnes wrote:
>>>>> I'm testing it now on my 965...
>>>> Bah... nevermind Robert, I see you're doing this already in
>>>> pci_mmcfg_reject_broken. I'm about to reboot & test now.
>>> Ok, I've tested a bit on my 965 (after re-adding my old patch to
>>> support it) and the new checks are more complete, but my BIOS still
>>> appears to be buggy.
>>>
>>> The extended config space (as defined by the register) is at
>>> 0xf0000000 (full value is 0xf0000003 indicating 128M enabled). The
>>> ACPI MCFG table has this space reserved according to Robert's new
>>> code, but the machine hangs due to the address space aliasing
>>> Olivier mentioned awhile back. I don't have a PCIe card to test
>>> with (or any devices that require extended config space that I know
>>> of) so I can't really tell if Windows supports PCIe on this
>>> platform, but if it does I don't see how it would w/o having a full
>>> bridge driver and sophisticated address space allocation builtin.
>> Windows XP doesn't use MMCONFIG or any extended configuration space.
>> I believe Vista is supposed to, though. Not sure how they are
>> handling this issue.
>
> Oh right... Vista will be the first to fully support PCIe & mcfg...
>
>> Can you post what your board has for PNPACPI reserved resources (I
>> believe they're in /sys/devices/pnp0/*/resources IIRC, don't have a
>> Linux box handy right now). Full dmesg would also be useful, I think
>> it dumps out those reservations at boot nowadays..
>
> BIOS update didn't help. Here's the boot log and a dump of the pnp0
> resources.

Curious.. It looks like the ACPI resources have the correct reservation
for the MMCONFIG window according to what the register says the location
should be. There are no other reservations that overlap with that range
(f0000000-f7ffffff), and according to the 965 datasheet there's nothing
that's hard-coded to occupy that memory range. I can't really see what
this range could be conflicting with.

What happens if you take out the chipset register detection, does the
MCFG table give you the same result? Wonder if they're doing something
funny with start/end bus values or something in their table. There's
some code in my patch that prints out the important data from the MCFG
table, can you tell me what that shows with the chipset detection taken out?

If that doesn't provide any useful information, I think we may need some
assistance from Intel chipset/motherboard people to figure out what is
going on here..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-02 23:54:24

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 2, 2007 4:45 pm Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Wednesday, May 2, 2007 7:34 am Robert Hancock wrote:
> >> Jesse Barnes wrote:
> >>> On Tuesday, May 01, 2007, Jesse Barnes wrote:
> >>>>> I'm testing it now on my 965...
> >>>>
> >>>> Bah... nevermind Robert, I see you're doing this already in
> >>>> pci_mmcfg_reject_broken. I'm about to reboot & test now.
> >>>
> >>> Ok, I've tested a bit on my 965 (after re-adding my old patch to
> >>> support it) and the new checks are more complete, but my BIOS
> >>> still appears to be buggy.
> >>>
> >>> The extended config space (as defined by the register) is at
> >>> 0xf0000000 (full value is 0xf0000003 indicating 128M enabled).
> >>> The ACPI MCFG table has this space reserved according to Robert's
> >>> new code, but the machine hangs due to the address space aliasing
> >>> Olivier mentioned awhile back. I don't have a PCIe card to test
> >>> with (or any devices that require extended config space that I
> >>> know of) so I can't really tell if Windows supports PCIe on this
> >>> platform, but if it does I don't see how it would w/o having a
> >>> full bridge driver and sophisticated address space allocation
> >>> builtin.
> >>
> >> Windows XP doesn't use MMCONFIG or any extended configuration
> >> space. I believe Vista is supposed to, though. Not sure how they
> >> are handling this issue.
> >
> > Oh right... Vista will be the first to fully support PCIe & mcfg...
> >
> >> Can you post what your board has for PNPACPI reserved resources (I
> >> believe they're in /sys/devices/pnp0/*/resources IIRC, don't have
> >> a Linux box handy right now). Full dmesg would also be useful, I
> >> think it dumps out those reservations at boot nowadays..
> >
> > BIOS update didn't help. Here's the boot log and a dump of the
> > pnp0 resources.
>
> Curious.. It looks like the ACPI resources have the correct
> reservation for the MMCONFIG window according to what the register
> says the location should be. There are no other reservations that
> overlap with that range (f0000000-f7ffffff), and according to the 965
> datasheet there's nothing that's hard-coded to occupy that memory
> range. I can't really see what this range could be conflicting with.

Yeah, it's strange. Even /proc/iomem from a working boot looks ok:

d0700000-d07fffff : PCI Bus #04
d0800000-d08fffff : PCI Bus #05
f0000000-f7ffffff : pnp 00:01
fec00000-fec00fff : IOAPIC 0
fed00000-fed003ff : HPET 0
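
The "nothing else overlaps" check can be written down as a small sketch. The
table holds only the /proc/iomem lines quoted above; the type and function
names are hypothetical, and an entry with exactly the window's bounds is
assumed to be the window's own reservation and is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One /proc/iomem line: inclusive start and end addresses. */
struct mem_range {
    uint64_t start;
    uint64_t end;
    const char *name;
};

/* Entries copied from the /proc/iomem excerpt above. */
static const struct mem_range iomem_excerpt[] = {
    { 0xd0700000ULL, 0xd07fffffULL, "PCI Bus #04" },
    { 0xd0800000ULL, 0xd08fffffULL, "PCI Bus #05" },
    { 0xf0000000ULL, 0xf7ffffffULL, "pnp 00:01" },
    { 0xfec00000ULL, 0xfec00fffULL, "IOAPIC 0" },
    { 0xfed00000ULL, 0xfed003ffULL, "HPET 0" },
};

static bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Return the first entry overlapping the window, ignoring an entry whose
 * bounds equal the window's (its own reservation), or NULL if none does.
 */
static const struct mem_range *find_conflict(const struct mem_range *window,
                                             const struct mem_range *map,
                                             size_t count)
{
    size_t i;

    assert(window != NULL && map != NULL);
    for (i = 0; i < count; i++) {
        if (map[i].start == window->start && map[i].end == window->end)
            continue;
        if (ranges_overlap(window, &map[i]))
            return &map[i];
    }
    return NULL;
}
```

For the window f0000000-f7ffffff this finds no conflict in the excerpt, which
matches what is observed here: the listing alone does not explain the hang.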

> What happens if you take out the chipset register detection, does the
> MCFG table give you the same result? Wonder if they're doing
> something funny with start/end bus values or something in their
> table. There's some code in my patch that prints out the important
> data from the MCFG table, can you tell me what that shows with the
> chipset detection taken out?

Yeah, I'll look a little more closely. It could also be that another
register needs tweaking somewhere to actually get the bridge to decode
the space.

> If that doesn't provide any useful information, I think we may need
> some assistance from Intel chipset/motherboard people to figure out
> what is going on here..

I'm talking with them now, hopefully they'll shed some light on it.

Thanks,
Jesse

2007-05-04 21:06:52

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 2, 2007 4:54 pm Jesse Barnes wrote:
> > What happens if you take out the chipset register detection, does
> > the MCFG table give you the same result? Wonder if they're doing
> > something funny with start/end bus values or something in their
> > table. There's some code in my patch that prints out the important
> > data from the MCFG table, can you tell me what that shows with the
> > chipset detection taken out?
>
> Yeah, I'll look a little more closely. It could also be that another
> register needs tweaking somewhere to actually get the bridge to
> decode the space.
>
> > If that doesn't provide any useful information, I think we may need
> > some assistance from Intel chipset/motherboard people to figure out
> > what is going on here..
>
> I'm talking with them now, hopefully they'll shed some light on it.

I did a little more debugging this morning, and found that I can
actually do reads from the space described by ACPI and the device
register, but later when ACPI actually scans the root bridges, it
hangs. Specifically the call to pci_acpi_scan_root in
pci_root.c:acpi_pci_root_add() never seems to return.

I'll walk through that logic when I get back to my test box, but it's
also worth noting that Vista's MCFG on this machine apparently works ok
too.

Jesse

2007-05-05 00:22:32

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Wednesday, May 2, 2007 4:54 pm Jesse Barnes wrote:
>>> What happens if you take out the chipset register detection, does
>>> the MCFG table give you the same result? Wonder if they're doing
>>> something funny with start/end bus values or something in their
>>> table. There's some code in my patch that prints out the important
>>> data from the MCFG table, can you tell me what that shows with the
>>> chipset detection taken out?
>> Yeah, I'll look a little more closely. It could also be that another
>> register needs tweaking somewhere to actually get the bridge to
>> decode the space.
>>
>>> If that doesn't provide any useful information, I think we may need
>>> some assistance from Intel chipset/motherboard people to figure out
>>> what is going on here..
>> I'm talking with them now, hopefully they'll shed some light on it.
>
> I did a little more debugging this morning, and found that I can
> actually do reads from the space described by ACPI and the device
> register, but later when ACPI actually scans the root bridges, it
> hangs. Specifically the call to pci_acpi_scan_root in
> pci_root.c:acpi_pci_root_add() never seems to return.
>
> I'll walk through that logic when I get back to my test box, but it's
> also worth noting that Vista's MCFG on this machine apparently works ok
> too.

I would try sticking some debug in arch/x86_64/pci/mmconfig.c at the
beginning and end of pci_mmcfg_read and pci_mmcfg_write to print the
seg, bus, devfn and reg for each read and write. Hopefully that will
track down the one that is causing the lockup, if it is an actual
MMCONFIG access that's doing it..
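
A userspace sketch of the kind of trace meant here, for illustration only. It
shows how a (bus, devfn, reg) triple maps to an address inside the MMCONFIG
window (1 MB per bus, 4 KB per function) and formats one trace line. The
function names and the line format are assumptions; in the kernel the line
would go through printk rather than snprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Address of a config register inside an MMCONFIG window starting at base. */
static uint64_t mmcfg_address(uint64_t base, unsigned bus, unsigned devfn,
                              unsigned reg)
{
    return base + (((uint64_t)bus << 20) | ((uint64_t)devfn << 12) | reg);
}

/*
 * Format one trace line: "<op>: seg, bus, devfn, reg, len".
 * Returns what snprintf returns (the length the full line needs).
 */
static int format_mmcfg_trace(char *buf, size_t buflen, const char *op,
                              unsigned seg, unsigned bus, unsigned devfn,
                              unsigned reg, unsigned len)
{
    assert(buf != NULL && op != NULL);
    return snprintf(buf, buflen, "%s: %u, %u, 0x%02x, 0x%02x, %u",
                    op, seg, bus, devfn, reg, len);
}
```

Printing once before the access and once after it (with the value) makes the
last line in the log identify the access that never completed.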

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-21 19:11:08

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

> > > What happens if you take out the chipset register detection, does
> > > the MCFG table give you the same result? Wonder if they're doing
> > > something funny with start/end bus values or something in their
> > > table. There's some code in my patch that prints out the important
> > > data from the MCFG table, can you tell me what that shows with the
> > > chipset detection taken out?

I can't see how any MCFG based accesses could work on this box, but I
don't know why. According to the boot log (with our code patched in
but disabled after checking the ACPI reserved status), the space is fine:

...
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
pciexbar lo: 0xf0000003
pciexbar hi: 0x00000000
Enabled MCFG space at 0x00000000f0000000, size 134217728
PCI: Found Intel Corporation G965 Express Memory Controller Hub with MMCONFIG support.
PCI: MCFG configuration 0: base 00000000f0000000 segment 0 buses 0 - 127
PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
PCI: Not using MMCONFIG. <-- due to the 'goto reject' after
if (is_acpi_reserved) { ... }
PM: Adding info for acpi:acpi_system:00
PM: Adding info for acpi:button_power:00
...

Same thing happens if I disable the chipset specific code and just use
the ACPI stuff you added.
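
The size and bus range in that log are at least consistent with each other:
each bus takes 1 MB of MMCONFIG space (32 devices x 8 functions x 4 KB), so
134217728 bytes covers 128 buses. A minimal sketch of that arithmetic, with a
hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Last bus number covered by an MMCONFIG window of the given size that
 * starts at start_bus. Each bus occupies 1 MB (1 << 20 bytes).
 */
static unsigned mmcfg_last_bus(unsigned start_bus, uint64_t size)
{
    uint64_t buses = size >> 20;

    assert(buses >= 1 && start_bus + buses <= 256);
    return start_bus + (unsigned)buses - 1;
}
```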

If I leave it enabled, several config cycles work fine, but the box
eventually hangs after probing 24 devices or so. I don't see anything
else mapped into this space, and the MTRRs seem ok, so either there's
something hidden in this memory range or there's another chipset register
that needs poking to fully enable this space properly.

Sysrq doesn't seem to work, and I don't see any events in my machine log,
so figuring out exactly why it's hanging is a bit difficult.

Any ideas on what to try next? I'll see if I can get some more details
from our BIOS folks and do yet another pass over the documentation to see
if there's something I'm missing.

Thanks,
Jesse

2007-05-21 19:39:58

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
>>>> What happens if you take out the chipset register detection, does
>>>> the MCFG table give you the same result? Wonder if they're doing
>>>> something funny with start/end bus values or something in their
>>>> table. There's some code in my patch that prints out the important
>>>> data from the MCFG table, can you tell me what that shows with the
>>>> chipset detection taken out?
>
> I can't see how any MCFG based accesses could work on this box, but I
> don't know why. According to the boot log (with our code patched in
> but disabled after checking the ACPI reserved status), the space is fine:
>
> ...
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOAPIC for interrupt routing
> pciexbar lo: 0xf0000003
> pciexbar hi: 0x00000000
> Enabled MCFG space at 0x00000000f0000000, size 134217728
> PCI: Found Intel Corporation G965 Express Memory Controller Hub with MMCONFIG support.
> PCI: MCFG configuration 0: base 00000000f0000000 segment 0 buses 0 - 127
> PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
> PCI: Not using MMCONFIG. <-- due to the 'goto reject' after
> if (is_acpi_reserved) { ... }
> PM: Adding info for acpi:acpi_system:00
> PM: Adding info for acpi:button_power:00
> ...
>
> Same thing happens if I disable the chipset specific code and just use
> the ACPI stuff you added.
>
> If I leave it enabled, several config cycles work fine, but the box
> eventually hangs after probing 24 devices or so. I don't see anything
> else mapped into this space, and the MTRRs seem ok, so either there's
> something hidden in this memory range or there's another chipset register
> that needs poking to fully enable this space properly.
>
> Sysrq doesn't seem to work, and I don't see any events in my machine log,
> so figuring out exactly why it's hanging is a bit difficult.
>
> Any ideas on what to try next? I'll see if I can get some more details
> from our BIOS folks and do yet another pass over the documentation to see
> if there's something I'm missing.

Can you find out which config access (bus, device, function, address) is
the one that hangs the box? I assume that either the corresponding
address in the MCFG table is problematic (i.e. has something else mapped
over it), or maybe that device just doesn't like being probed with MCFG
somehow.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-21 20:07:36

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Monday, May 21, 2007, Robert Hancock wrote:
> > If I leave it enabled, several config cycles work fine, but the box
> > eventually hangs after probing 24 devices or so. I don't see anything
> > else mapped into this space, and the MTRRs seem ok, so either there's
> > something hidden in this memory range or there's another chipset
> > register that needs poking to fully enable this space properly.
> >
> > Sysrq doesn't seem to work, and I don't see any events in my machine
> > log, so figuring out exactly why it's hanging is a bit difficult.
> >
> > Any ideas on what to try next? I'll see if I can get some more
> > details from our BIOS folks and do yet another pass over the
> > documentation to see if there's something I'm missing.
>
> Can you find out which config access (bus, device, function, address) is
> the one that hangs the box? I assume that either the corresponding
> address in the MCFG table is problematic (i.e. has something else mapped
> over it), or maybe that device just doesn't like being probed with MCFG
> somehow.

Yeah, I've got that data... just a sec while I make sure it's
reproducible...

Aha, I hadn't decoded the devfn before, looks like it's dying on an access
to the graphics device (bus 0, slot 2, function 0):

...
pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc000000c
pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = <hang>
...

Offset 0x18 into the graphics config space should be the graphics memory
range address, and 0xc000000c is the correct value. But for some reason
it hangs on the second access.

It hangs here every time.
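
For reference, decoding the devfn byte from the trace is simple bit slicing:
the slot is the upper five bits and the function the lower three, the same
split the kernel's PCI_SLOT() and PCI_FUNC() macros perform. The lowercase
helper names below are just for this sketch.

```c
#include <assert.h>

/* Slot (device) number: bits 7:3 of devfn. */
static unsigned pci_slot(unsigned devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* Function number: bits 2:0 of devfn. */
static unsigned pci_func(unsigned devfn)
{
    return devfn & 0x07;
}

/* Pack a slot and function back into a devfn byte. */
static unsigned pci_devfn(unsigned slot, unsigned func)
{
    assert(slot < 32 && func < 8);
    return (slot << 3) | func;
}
```

So devfn 0x10 in the trace is slot 2, function 0.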

Jesse

2007-05-21 20:22:39

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Monday, May 21, 2007, Jesse Barnes wrote:
> Yeah, I've got that data... just a sec while I make sure it's
> reproducible...
>
> Aha, I hadn't decoded the devfn before, looks like it's dying on an
> access to the graphics device (bus 0, slot 2, function 0):
>
> ...
> pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc000000c
> pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = <hang>
> ...
>
> Offset 0x18 into the graphics config space should be the graphics memory
> range address, and 0xc000000c is the correct value. But for some reason
> it hangs on the second access.
>
> It hangs here every time.

That register is in the config space BAR region, so it should be ok to
write 0xffffffff to it and read it back to size the register. However,
it's after writing the 0xffffffff to it and trying to read it back that
the machine hangs. I didn't see any accesses to the command register to
disable decoding (at least not via the mmconfig methods), so maybe that's
broken during MCFG based probing?
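
The sizing sequence described here (save, write all ones, read back, restore)
looks roughly like this against a simulated BAR. Everything in the sketch is a
userspace model: struct fake_bar and its helpers are hypothetical, and the
256 MB size used in the note below is an assumption for illustration, not a
value taken from this machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated 32-bit memory BAR: low 4 bits are read-only flag bits. */
struct fake_bar {
    uint32_t value; /* current register contents */
    uint32_t size;  /* BAR size in bytes, a power of two >= 16 */
};

static uint32_t bar_read(const struct fake_bar *bar)
{
    return bar->value;
}

/* Only address bits at or above the BAR size are writable. */
static void bar_write(struct fake_bar *bar, uint32_t v)
{
    uint32_t flags = bar->value & 0xfu;

    bar->value = (v & ~(bar->size - 1)) | flags;
}

/* Classic sizing: the lowest address bit that sticks gives the size. */
static uint32_t bar_probe_size(struct fake_bar *bar)
{
    uint32_t orig, readback;

    assert(bar != NULL);
    orig = bar_read(bar);
    bar_write(bar, 0xffffffffu);
    readback = bar_read(bar); /* the read that hangs on the real box */
    bar_write(bar, orig);     /* restore the original base */

    return (uint32_t)(~(readback & ~0xfu) + 1u);
}
```

Starting from a value of 0xc000000c and an assumed 256 MB size, the readback
is 0xf000000c and the computed size is 0x10000000; the register ends up back
at 0xc000000c.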

Jesse

2007-05-23 00:31:58

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Monday, May 21, 2007, Jesse Barnes wrote:
>> Yeah, I've got that data... just a sec while I make sure it's
>> reproducible...
>>
>> Aha, I hadn't decoded the devfn before, looks like it's dying on an
>> access to the graphics device (bus 0, slot 2, function 0):
>>
>> ...
>> pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc000000c
>> pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = <hang>
>> ...
>>
>> Offset 0x18 into the graphics config space should be the graphics memory
>> range address, and 0xc000000c is the correct value. But for some reason
>> it hangs on the second access.
>>
>> It hangs here every time.
>
> That register is in the config space BAR region, so it should be ok to
> write 0xffffffff to it and read it back to size the register. However,
> it's after writing the 0xffffffff to it and trying to read it back that
> the machine hangs. I didn't see any accesses to the command register to
> disable decoding (at least not via the mmconfig methods), so maybe that's
> broken during MCFG based probing?

Eww. I don't see where we disable the decode at all while we probe the
BARs on the device. That seems like a bad thing, especially with the way
we probe 64-bit BARs (do the low 32 bits first and then the high 32
bits). This means the base address effectively gets set to 0xfffffff0
momentarily, which might cause some issues.
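
To make the transient concrete, here is a small sketch of the base address a
64-bit BAR decodes between the two sizing writes, when the low dword already
holds the sizing pattern and the high dword still holds its old value. The
helper names are hypothetical, the window bounds are the ones reported in
this thread, and any particular BAR size passed in is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MMCONFIG window reported in the thread (inclusive bounds). */
#define MMCFG_START 0xf0000000ULL
#define MMCFG_END   0xf7ffffffULL

/*
 * Base decoded while the low dword holds the sizing pattern and the high
 * dword still holds hi. bar_size must be a power of two, at least 16.
 */
static uint64_t transient_base(uint32_t hi, uint32_t bar_size)
{
    uint32_t lo;

    assert(bar_size >= 16 && (bar_size & (bar_size - 1)) == 0);
    lo = 0xffffffffu & ~(bar_size - 1); /* only address bits latch the 1s */
    return ((uint64_t)hi << 32) | lo;
}

/* Does [base, base + size) intersect the MMCONFIG window? */
static bool hits_mmcfg_window(uint64_t base, uint64_t size)
{
    return base <= MMCFG_END && base + size - 1 >= MMCFG_START;
}
```

For the smallest possible BAR (16 bytes) the transient base is 0xfffffff0, as
stated above, and it misses the window. For a hypothetical 256 MB BAR the
transient base would be 0xf0000000, which lands on the MMCONFIG window itself.
Whether that is what happens on this board is not established here.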

I'd try adding some code inside pci_setup_device (drivers/pci/probe.c)
to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when
probing devices with the standard header type and then restoring the
previous command bits afterwards, and see what effect that has. It'll be
interesting if it does, since obviously it seems to work as it is with
non-MMCONFIG access methods. Maybe the base address being set like that
interferes with MMCONFIG access itself somehow?
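
The experiment suggested here amounts to the following pattern, shown against
a simulated device rather than as a real change to pci_setup_device. The
PCI_COMMAND_IO and PCI_COMMAND_MEMORY bit values are the standard ones;
struct fake_dev and the two functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_IO     0x1 /* enable response in I/O space */
#define PCI_COMMAND_MEMORY 0x2 /* enable response in memory space */

/* Simulated device: command register plus a counter for the test. */
struct fake_dev {
    uint16_t command;
    int probes_with_decode_on; /* BAR probes done while decoding */
};

/* Stand-in for the BAR sizing pass; records whether decode was active. */
static void probe_bars(struct fake_dev *dev)
{
    if (dev->command & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
        dev->probes_with_decode_on++;
}

/* Turn off I/O and memory decode, probe, then restore the old bits. */
static void probe_with_decode_disabled(struct fake_dev *dev)
{
    uint16_t orig;

    assert(dev != NULL);
    orig = dev->command;
    dev->command = (uint16_t)(orig & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
    probe_bars(dev);
    dev->command = orig;
}
```

Restoring the saved value, rather than unconditionally setting both bits,
keeps devices that had decoding off before the probe in that state.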

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 00:38:34

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Tuesday, May 22, 2007, Robert Hancock wrote:
> Eww. I don't see where we disable the decode at all while we probe the
> BARs on the device. That seems like a bad thing, especially with the way
> we probe 64-bit BARs (do the low 32 bits first and then the high 32
> bits). This means the base address effectively gets set to 0xfffffff0
> momentarily, which might cause some issues.

I'm a bit shocked that things work as well as they do without the
disabling...

> I'd try adding some code inside pci_setup_device (drivers/pci/probe.c)
> to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when
> probing devices with the standard header type and then restoring the
> previous command bits afterwards, and see what effect that has. It'll be
> interesting if it does, since obviously it seems to work as it is with
> non-MMCONFIG access methods. Maybe the base address being set like that
> interferes with MMCONFIG access itself somehow?

I tried that, and it seems to get past probing the graphics device at
least, but it hangs a bit later. It could be that the enable/disable I
added wasn't correct though, I didn't check to see which one I should
disable in the command word, which may be a problem (just disabled them
both every probe). I'll try again with more precise enable/disable
semantics.

Jesse

2007-05-23 00:54:23

by Robert Hancock

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Tuesday, May 22, 2007, Robert Hancock wrote:
>> Eww. I don't see where we disable the decode at all while we probe the
>> BARs on the device. That seems like a bad thing, especially with the way
>> we probe 64-bit BARs (do the low 32 bits first and then the high 32
>> bits). This means the base address effectively gets set to 0xfffffff0
>> momentarily, which might cause some issues.
>
> I'm a bit shocked that things work as well as they do without the
> disabling...
>
>> I'd try adding some code inside pci_setup_device (drivers/pci/probe.c)
>> to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when
>> probing devices with the standard header type and then restoring the
>> previous command bits afterwards, and see what effect that has. It'll be
>> interesting if it does, since obviously it seems to work as it is with
>> non-MMCONFIG access methods. Maybe the base address being set like that
>> interferes with MMCONFIG access itself somehow?
>
> I tried that, and it seems to get past probing the graphics device at
> least, but it hangs a bit later. It could be that the enable/disable I
> added wasn't correct though, I didn't check to see which one I should
> disable in the command word, which may be a problem (just disabled them
> both every probe). I'll try again with more precise enable/disable
> semantics.

It'd be interesting to see at what access it ran into trouble next, at
least if it's consistent. Could be that some device doesn't like having
the decode disabled..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 00:57:01

by Jesse Barnes

Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Tuesday, May 22, 2007, Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Tuesday, May 22, 2007, Robert Hancock wrote:
> >> Eww. I don't see where we disable the decode at all while we probe
> >> the BARs on the device. That seems like a bad thing, especially with
> >> the way we probe 64-bit BARs (do the low 32 bits first and then the
> >> high 32 bits). This means the base address effectively gets set to
> >> 0xfffffff0 momentarily, which might cause some issues.
> >
> > I'm a bit shocked that things work as well as they do without the
> > disabling...
> >
> >> I'd try adding some code inside pci_setup_device
> >> (drivers/pci/probe.c) to disable PCI_COMMAND_IO and
> >> PCI_COMMAND_MEMORY on the device when probing devices with the
> >> standard header type and then restoring the previous command bits
> >> afterwards, and see what effect that has. It'll be interesting if it
> >> does, since obviously it seems to work as it is with non-MMCONFIG
> >> access methods. Maybe the base address being set like that interferes
> >> with MMCONFIG access itself somehow?
> >
> > I tried that, and it seems to get past probing the graphics device at
> > least, but it hangs a bit later. It could be that the enable/disable
> > I added wasn't correct though, I didn't check to see which one I
> > should disable in the command word, which may be a problem (just
> > disabled them both every probe). I'll try again with more precise
> > enable/disable semantics.
>
> It'd be interesting to see at what access it ran into trouble next, at
> least if it's consistent. Could be that some device doesn't like having
> the decode disabled..

I think it actually gets through the probing but hangs elsewhere, but I'll
have to test again to be sure.

Jesse

2007-05-23 01:06:31

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Tuesday, May 22, 2007, Robert Hancock wrote:
>> Eww. I don't see where we disable the decode at all while we probe the
>> BARs on the device. That seems like a bad thing, especially with the way
>> we probe 64-bit BARs (do the low 32 bits first and then the high 32
>> bits). This means the base address effectively gets set to 0xfffffff0
>> momentarily, which might cause some issues.
>
> I'm a bit shocked that things work as well as they do without the
> disabling...
>
>> I'd try adding some code inside pci_setup_device (drivers/pci/probe.c)
>> to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when
>> probing devices with the standard header type and then restoring the
>> previous command bits afterwards, and see what effect that has. It'll be
>> interesting if it does, since obviously it seems to work as it is with
>> non-MMCONFIG access methods. Maybe the base address being set like that
>> interferes with MMCONFIG access itself somehow?
>
> I tried that, and it seems to get past probing the graphics device at
> least, but it hangs a bit later. It could be that the enable/disable I
> added wasn't correct though, I didn't check to see which one I should
> disable in the command word, which may be a problem (just disabled them
> both every probe). I'll try again with more precise enable/disable
> semantics.

There was a big discussion about this back in 2002, in which Linus
wasn't overly enthused about disabling the decode during probing due to
risk of causing problems with some devices:

http://lkml.org/lkml/2002/12/19/145

In this particular case (64-bit BAR) we might be able to avoid the
problem by changing the order in which we probe the two halves of the
address, i.e. change the top half to 0xffffffff before messing with the
bottom half and then change it back last. That way, we end up mapping it
way to the top of 64-bit address space, which hopefully is less likely
to conflict..
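
A minimal sketch of that reordering, as a user-space model rather than kernel code. The `bar64` structure and every helper name below are hypothetical; the model only tracks where a 64-bit memory BAR would decode while it is being sized, assuming address bits below the region size are hardwired to zero:

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit memory BAR (not kernel code). */
struct bar64 {
    uint32_t lo;    /* low dword; bits 3:0 hold read-only type flags */
    uint32_t hi;    /* high dword */
    uint64_t size;  /* region size, a power of two >= 16 */
};

/* Write one half of the BAR; bits below the region size stay zero. */
static void bar64_write(struct bar64 *b, int high, uint32_t val)
{
    uint64_t mask = ~(b->size - 1);

    if (high)
        b->hi = val & (uint32_t)(mask >> 32);
    else
        b->lo = (val & (uint32_t)mask) | (b->lo & 0xfu);
}

/* Base address the BAR currently decodes at. */
static uint64_t bar64_base(const struct bar64 *b)
{
    return ((uint64_t)b->hi << 32) | (b->lo & ~0xfu);
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Current order: size the low dword, restore it, then size the high dword. */
static uint64_t lowest_base_low_first(struct bar64 b)
{
    uint32_t lo = b.lo, hi = b.hi;
    uint64_t lowest;

    bar64_write(&b, 0, 0xffffffffu);
    lowest = bar64_base(&b);
    bar64_write(&b, 0, lo);
    bar64_write(&b, 1, 0xffffffffu);
    lowest = min_u64(lowest, bar64_base(&b));
    bar64_write(&b, 1, hi);
    return lowest;
}

/* Suggested order: raise the high dword first and restore it last. */
static uint64_t lowest_base_high_first(struct bar64 b)
{
    uint32_t lo = b.lo, hi = b.hi;
    uint64_t lowest;

    bar64_write(&b, 1, 0xffffffffu);
    lowest = bar64_base(&b);
    bar64_write(&b, 0, 0xffffffffu);
    lowest = min_u64(lowest, bar64_base(&b));
    bar64_write(&b, 0, lo);
    lowest = min_u64(lowest, bar64_base(&b));
    bar64_write(&b, 1, hi);
    return lowest;
}
```

In this model, a 256 MB BAR that normally sits at 0xd0000000 drops to 0xf0000000 in the low-first order, while in the high-first order it never decodes below 0xffffffff00000000 during sizing.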

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 18:53:21

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Tuesday, May 22, 2007 6:06 pm Robert Hancock wrote:
> There was a big discussion about this back in 2002, in which Linus
> wasn't overly enthused about disabling the decode during probing due
> to risk of causing problems with some devices:
>
> http://lkml.org/lkml/2002/12/19/145
>
> In this particular case (64-bit BAR) we might be able to avoid the
> problem by changing the order in which we probe the two halves of the
> address, i.e. change the top half to 0xffffffff before messing with
> the bottom half and then change it back last. That way, we end up
> mapping it way to the top of 64-bit address space, which hopefully is
> less likely to conflict..

Fixed it (finally). I don't think moving the 64 bit probing around
would make a difference, since we'd restore its original value anyway
before moving on to the 32 bit probe which is where I think the problem
is.

I think what's happening is the probe is writing 0xffffffff to the video
device, which is in the GMCH, and without memory decoding disabled,
it'll start claiming PCI config access cycles causing the problems I
saw. So my code to disable I/O and memory decode was actually working
but I had a bug in the re-enable path so all my devices were staying
disabled. :)
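
To make the conflict concrete, here is a rough user-space sketch, not kernel code. While a 32-bit memory BAR holds all 1s, it decodes the top `size` bytes below 4 GB, so any mmconfig window that reaches into that range collides with it. The `range` type, the helper names, and the example window addresses are assumptions for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Window a 32-bit memory BAR decodes while it holds all 1s during sizing.
 * `size` must be a non-zero power of two.
 */
static struct range bar32_sizing_window(uint32_t size)
{
    struct range r;

    r.start = (uint32_t)~(size - 1);   /* equals 2^32 - size */
    r.end = 0xffffffffull;
    return r;
}

static bool ranges_overlap(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}
```

With a hypothetical 256 MB mmconfig window at 0xe0000000-0xefffffff, a 1 MB BAR stays clear of it during sizing, but a 512 MB BAR temporarily decodes 0xe0000000-0xffffffff and would claim config cycles unless memory decode is off.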

So here's the patch I used (along with your ACPI patch and my 965 MCFG
enable patch of course). The probing code could probably use a bit
more cleanup, but this patch limits itself to implementing PCI_COMMAND
based I/O and memory space decode disabling during size probing. We
might want to do this unconditionally if we're using mmconfig based
configuration access, since I imagine other machines might end up
having similar address space layouts that would cause problems.

Linus, since you were the one concerned about breaking working setups,
what do you think? Should we use this approach, or specifically quirk
out cases where mmconfig space might conflict with BAR probing?

Thanks,
Jesse

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e48fcf0..69dfe0c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -170,6 +170,48 @@ static inline int is_64bit_memory(u32 mask)
 	return 0;
 }

+#define BAR_IS_MEMORY(bar) (((bar) & PCI_BASE_ADDRESS_SPACE) == \
+			    PCI_BASE_ADDRESS_SPACE_MEMORY)
+
+/**
+ * pci_bar_size - get raw PCI BAR size
+ * @dev: PCI device
+ * @reg: BAR to probe
+ *
+ * Use basic PCI probing:
+ * - save original BAR value
+ * - disable MEM or IO decode as appropriate in PCI_COMMAND reg
+ * - write all 1s to the BAR
+ * - read back value
+ * - reenable MEM or IO decode as necessary
+ * - write original value back
+ *
+ * Returns raw BAR size to caller.
+ */
+static u32 pci_bar_size(struct pci_dev *dev, unsigned int reg)
+{
+	u32 orig_reg, sz;
+	u16 orig_cmd;
+
+	pci_read_config_dword(dev, reg, &orig_reg);
+	pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
+
+	if (BAR_IS_MEMORY(orig_reg))
+		pci_write_config_word(dev, PCI_COMMAND,
+				      orig_cmd & ~PCI_COMMAND_MEMORY);
+	else
+		pci_write_config_word(dev, PCI_COMMAND,
+				      orig_cmd & ~PCI_COMMAND_IO);
+
+	pci_write_config_dword(dev, reg, 0xffffffff);
+	pci_read_config_dword(dev, reg, &sz);
+	pci_write_config_dword(dev, reg, orig_reg);
+
+	pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
+
+	return sz;
+}
+
 static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
 {
 	unsigned int pos, reg, next;
@@ -185,17 +227,15 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
 		res = &dev->resource[pos];
 		res->name = pci_name(dev);
 		reg = PCI_BASE_ADDRESS_0 + (pos << 2);
+
 		pci_read_config_dword(dev, reg, &l);
-		pci_write_config_dword(dev, reg, ~0);
-		pci_read_config_dword(dev, reg, &sz);
-		pci_write_config_dword(dev, reg, l);
+		sz = pci_bar_size(dev, reg);
 		if (!sz || sz == 0xffffffff)
 			continue;
 		if (l == 0xffffffff)
 			l = 0;
 		raw_sz = sz;
-		if ((l & PCI_BASE_ADDRESS_SPACE) ==
-				PCI_BASE_ADDRESS_SPACE_MEMORY) {
+		if (BAR_IS_MEMORY(l)) {
 			sz = pci_size(l, sz, (u32)PCI_BASE_ADDRESS_MEM_MASK);
 			/*
 			 * For 64bit prefetchable memory sz could be 0, if the
@@ -219,9 +259,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
 			u32 szhi, lhi;

 			pci_read_config_dword(dev, reg+4, &lhi);
-			pci_write_config_dword(dev, reg+4, ~0);
-			pci_read_config_dword(dev, reg+4, &szhi);
-			pci_write_config_dword(dev, reg+4, lhi);
+			szhi = pci_bar_size(dev, reg+4);
 			sz64 = ((u64)szhi << 32) | raw_sz;
 			l64 = ((u64)lhi << 32) | l;
 			sz64 = pci_size64(l64, sz64, PCI_BASE_ADDRESS_MEM_MASK);
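
For readers following the patch: the value read back after writing all 1s to a BAR is not yet a size. A standalone sketch of the usual decoding step is below. It is not the kernel's `pci_size()`, and the helper name `bar_size_from_probe` is made up for illustration. The idea is to mask off the flag bits and take the lowest address bit the device left writable:

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn the raw read-back from BAR sizing into a size.
 * `addr_mask` strips the low flag bits (0xfffffff0 for memory BARs,
 * 0xfffffffc for I/O BARs). Returns 0 for an unimplemented BAR.
 */
static uint32_t bar_size_from_probe(uint32_t readback, uint32_t addr_mask)
{
    uint32_t bits = readback & addr_mask;

    if (!bits)
        return 0;
    /* The lowest writable address bit equals the region size. */
    return bits & (~bits + 1);
}
```

A memory BAR that reads back 0xf0000008 decodes to 256 MB (0x10000000), and an I/O BAR that reads back 0xffffff01 decodes to 256 bytes.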

2007-05-23 20:22:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:
>
> Fixed it (finally). I don't think moving the 64 bit probing around
> would make a difference, since we'd restore its original value anyway
> before moving on to the 32 bit probe which is where I think the problem
> is.

Well, the thing is, I'm pretty sure there is at least one northbridge that
stops memory accesses from the CPU when you turn off the MEM bit on it.
Oops, you just killed the machine.

Looking at the 925X datasheet (which I happened to have around in my
google search history because of the discussions of the sky2 DMA
problems), it looks like at least that one just hardcodes the MEM bit to
be 1, and thus writing to it is a total no-op.

But I really think that clearing the MEM bit for at least the host bridge
is conceptually quite wrong, even if it might turn out that all chipsets
end up just saying (like Intel) "screw it, the user is insane, we're not
going to actually do what he asks us to do".

Do we really want to be that insane? Turn off memory accesses when probing
the CPU host bridge?

So at a _minimum_ I would say that that thing needs to be more careful
about host bridges. Maybe it's not needed, who knows?

> Linus, since you were the one concerned about breaking working setups,
> what do you think? Should we use this approach, or specifically quirk
> out cases where mmconfig space might conflict with BAR probing?

So see above. I think at a minimum, we should consider the host bridge
special.

I also suspect that we'd be simply better off if we didn't use mmconfig at
all unless we _have_ to. Why use mmconfig for the standard BAR accesses?
Is there really any reason? I can understand using it for extended config
space, since then the old-fashioned approach won't work. But for normal
accesses? What's the point, really?

mmconfig seems to be fundamentally designed to be impossible to bootstrap
off, so there's no way you can have a machine that _only_ supports
mmconfig. So why do people seem to think it's so wonderful? Please fill me
in on this fundamental mystery.

Quite frankly, if we just didn't use mmconfig, the whole issue would go
away. Isn't _that_ the much better solution?

Linus

2007-05-23 20:35:30

by Alan

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

> Well, the thing is, I'm pretty sure there is at least one northbridge that
> stops memory accesses from the CPU when you turn off the MEM bit on it.
> Oops, you just killed the machine.

CS5520. But it doesn't have 64bit or PCI Express.


Alan

2007-05-23 20:46:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Alan Cox wrote:
>
> > Well, the thing is, I'm pretty sure there is at least one northbridge that
> > stops memory accesses from the CPU when you turn off the MEM bit on it.
> > Oops, you just killed the machine.
>
> CS5520. But it doesn't have 64bit or PCI Express.

That patch does it for _all_ PCI probing. So it would turn any machine
using that northbridge into a brick.

Linus

2007-05-23 20:50:28

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 1:20 pm Linus Torvalds wrote:
> On Wed, 23 May 2007, Jesse Barnes wrote:
> > Fixed it (finally). I don't think moving the 64 bit probing around
> > would make a difference, since we'd restore its original value
> > anyway before moving on to the 32 bit probe which is where I think
> > the problem is.
>
> Well, the thing is, I'm pretty sure there is at least one northbridge
> that stops memory accesses from the CPU when you turn off the MEM bit
> on it. Oops, you just killed the machine.

Wow, that sounds like a pretty lame host bridge.

> Looking at the 925X datasheet (which I happened to have around in my
> google search history because of the discussions of the sky2 DMA
> problems), it looks like at least that one just hardcodes the MEM bit
> to be 1, and thus writing to it is a total no-op.
>
> But I really think that clearing the MEM bit for at least the host
> bridge is conceptually quite wrong, even if it might turn out that
> all chipsets end up just saying (like Intel) "screw it, the user is
> insane, we're not going to actually do what he asks us to do".
>
> Do we really want to be that insane? Turn off memory accesses when
> probing the CPU host bridge?
>
> So at a _minimum_ I would say that that thing needs to be more
> careful about host bridges. Maybe it's not needed, who knows?

I'm not sure either, but the PCI spec is pretty clear about how probing
ought to be done, and it seems that other OSes do the disabling (though
I'm not sure about how they handle broken host bridges like the one you
mention).

> I also suspect that we'd be simply better off if we didn't use
> mmconfig at all unless we _have_ to. Why use mmconfig for the
> standard BAR accesses? Is there really any reason? I can understand
> using it for extended config space, since then the old-fashioned
> approach won't work. But for normal accesses? What's the point,
> really?

Yeah, it's mainly needed for extended config space and PCIe (lots of
regular PCIe features are in the extended space and are assumed to be
accessible).
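
As a rough illustration of why the extended space needs mmconfig: each function gets a 4096-byte block at a fixed offset inside the mmconfig window, while type 1 accesses can only name the first 256 bytes. The sketch below is standalone C with made-up helper names, not the kernel's accessors:

```c
#include <stdint.h>

#define LEGACY_CFG_SIZE   256u   /* reachable with type 1 accesses */
#define EXTENDED_CFG_SIZE 4096u  /* per-function space behind mmconfig */

/* Offset of a config register from the start of the mmconfig window. */
static uint64_t mmcfg_offset(unsigned bus, unsigned dev, unsigned fn,
                             unsigned reg)
{
    return ((uint64_t)(bus & 0xffu) << 20) |
           ((uint64_t)(dev & 0x1fu) << 15) |
           ((uint64_t)(fn & 0x7u) << 12) |
           (reg & (EXTENDED_CFG_SIZE - 1));
}

/* Registers at 0x100 and above only exist in the extended space. */
static int needs_mmconfig(unsigned reg)
{
    return reg >= LEGACY_CFG_SIZE;
}
```

For example, register 0x100 of device 2, function 0 on bus 0 (the first extended capability slot) lands at offset 0x10100 in the window.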

> mmconfig seems to be fundamentally designed to be impossible to
> bootstrap off, so there's no way you can have a machine that _only_
> supports mmconfig. So why do people seem to think it's so wonderful?
> Please fill me in on this fundamental mystery.

Well, non-x86 people I think are fairly used to it, for one.

> Quite frankly, if we just didn't use mmconfig, the whole issue would
> go away. Isn't _that_ the much better solution?

Not for systems with PCIe... and the platforms I've been having trouble
with have PCIe slots, so I'd really like mmconfig to be used at least
on machines with PCIe bridges. For other machines, it probably doesn't
matter much. I don't know of any regular PCI devices offhand that
really need extended config space.

Jesse

2007-05-23 20:58:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:

> On Wednesday, May 23, 2007 1:20 pm Linus Torvalds wrote:
> > On Wed, 23 May 2007, Jesse Barnes wrote:
> > > Fixed it (finally). I don't think moving the 64 bit probing around
> > > would make a difference, since we'd restore its original value
> > > anyway before moving on to the 32 bit probe which is where I think
> > > the problem is.
> >
> > Well, the thing is, I'm pretty sure there is at least one northbridge
> > that stops memory accesses from the CPU when you turn off the MEM bit
> > on it. Oops, you just killed the machine.
>
> Wow, that sounds like a pretty lame host bridge.

Umm. Why? Think about it.

You ASKED it to stop forwarding memory.

So who is lamer: the chip that does what it is told, or the software that
tells it to do it?

I'd vote for the software. Any programmer who expects the hardware to
"just do what I mean, not what I say" is not a programmer, but a dreamer.

You told it to not forward memory. Why complain when it does as told?

> > Quite frankly, if we just didn't use mmconfig, the whole issue would
> > go away. Isn't _that_ the much better solution?
>
> Not for systems with PCIe... and the platforms I've been having trouble
> with have PCIe slots, so I'd really like mmconfig to be used at least
> on machines with PCIe bridges. For other machines, it probably doesn't
> matter much. I don't know of any regular PCI devices offhand that
> really need extended config space.

Ehh. Even for PCIe, why not use the normal accesses for the first 256
bytes? Problem solved.

Linus

2007-05-23 21:03:59

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 1:56 pm Linus Torvalds wrote:
> On Wed, 23 May 2007, Jesse Barnes wrote:
> > On Wednesday, May 23, 2007 1:20 pm Linus Torvalds wrote:
> > > On Wed, 23 May 2007, Jesse Barnes wrote:
> > > > Fixed it (finally). I don't think moving the 64 bit probing
> > > > around would make a difference, since we'd restore its original
> > > > value anyway before moving on to the 32 bit probe which is
> > > > where I think the problem is.
> > >
> > > Well, the thing is, I'm pretty sure there is at least one
> > > northbridge that stops memory accesses from the CPU when you turn
> > > off the MEM bit on it. Oops, you just killed the machine.
> >
> > Wow, that sounds like a pretty lame host bridge.
>
> Umm. Why? Think about it.
>
> You ASKED it to stop forwarding memory.
>
> So who is lamer: the chip that does what it is told, or the software
> that tells it to do it?
>
> I'd vote for the software. Any programmer who expects the hardware to
> "just do what I mean, not what I say" is not a programmer, but a
> dreamer.
>
> You told it to not forward memory. Why complain when it does as told?

Well, because that's not actually very useful functionality, and likely
makes software that seems "obviously" correct wrt the PCI spec break.

> > > Quite frankly, if we just didn't use mmconfig, the whole issue
> > > would go away. Isn't _that_ the much better solution?
> >
> > Not for systems with PCIe... and the platforms I've been having
> > trouble with have PCIe slots, so I'd really like mmconfig to be
> > used at least on machines with PCIe bridges. For other machines,
> > it probably doesn't matter much. I don't know of any regular PCI
> > devices offhand that really need extended config space.
>
> Ehh. Even for PCIe, why not use the normal accesses for the first 256
> bytes? Problem solved.

Yeah, that's another option. Would just mean an additional conditional
in the mmconfig code, I'll give it a try...

Apparently Vista will move away from using type 1 config space accesses
though, so if we keep using it, we'll probably run into some lame board
that assumes you're using mmconfig at some point in the near future.
But then again, we're often on that less tested path (e.g. with ACPI),
so maybe that doesn't matter much.

Jesse

2007-05-23 21:09:55

by Jeff Garzik

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> Apparently Vista will move away from using type 1 config space accesses
> though, so if we keep using it, we'll probably run into some lame board

Yep.


> that assumes you're using mmconfig at some point in the near future.
> But then again, we're often on that less tested path (e.g. with ACPI),
> so maybe that doesn't matter much.

One of the reasons why hardware vendors want to move away from
traditional accesses is to be able to use the larger config space in
PCI-Express, rather than being locked into the 256-byte legacy PCI
config space.

Several modern PCI-Express devices utilize the upper config space, but
due to legacy reasons the registers are usually ones that do not require
OS drivers to know about (like BIST stuff or diagnostic registers).

Expect that to change, as MS shakes out the bugs (or maybe we are doing
their job for them?).

Jeff


2007-05-23 21:20:52

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 1:56 pm Linus Torvalds wrote:
> > Not for systems with PCIe... and the platforms I've been having
> > trouble with have PCIe slots, so I'd really like mmconfig to be
> > used at least on machines with PCIe bridges. For other machines,
> > it probably doesn't matter much. I don't know of any regular PCI
> > devices offhand that really need extended config space.
>
> Ehh. Even for PCIe, why not use the normal accesses for the first 256
> bytes? Problem solved.

Ok, this patch also works. We still need to enable mmconfig space for
PCIe and extended config space, but we can continue to use type 1
accesses for legacy PCI config space cycles to avoid decode trouble
with mmconfig based BAR sizing.

Assuming Robert's and my patches to enable mmconfig space go in, we'd
want a similar patch to the i386 mmconfig code.

Jesse

diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 65d8273..5052f80 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
 	}

 	addr = pci_dev_base(seg, bus, devfn);
-	if (!addr)
+	if (!addr || reg < 256) /* Use type 1 for non-extended access */
 		return pci_conf1_read(seg,bus,devfn,reg,len,value);

 	switch (len) {
@@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 		return -EINVAL;

 	addr = pci_dev_base(seg, bus, devfn);
-	if (!addr)
+	if (!addr || reg < 256) /* Use type 1 for non-extended access */
 		return pci_conf1_write(seg,bus,devfn,reg,len,value);

 	switch (len) {

2007-05-23 21:33:21

by Alan

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

> One of the reasons why hardware vendors want to move away from
> traditional accesses is to be able to use the larger config space in
> PCI-Express, rather than being locked into the 256-byte legacy PCI
> config space.

Mostly for treacherous computing extensions where subsets of the config
space can only be accessed by signed machines blessed by your favourite
movie company and video card vendor...

> Expect that to change, as MS shakes out the bugs (or maybe we are doing
> their job for them?).

The longer it takes - the better.

Alan

2007-05-23 21:35:42

by Jeff Garzik

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Alan Cox wrote:
>> One of the reasons why hardware vendors want to move away from
>> traditional accesses is to be able to use the larger config space in
>> PCI-Express, rather than being locked into the 256-byte legacy PCI
>> config space.
>
> Mostly for treacherous computing extensions where subsets of the config
> space can only be accessed by signed machines blessed by your favourite
> movie company and video card vendor...

Um, no, Mr. Paranoia, it's a standard part of the spec.

Jeff



2007-05-23 21:38:44

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 2:35 pm Alan Cox wrote:
> > One of the reasons why hardware vendors want to move away from
> > traditional accesses is to be able to use the larger config space
> > in PCI-Express, rather than being locked into the 256-byte legacy
> > PCI config space.
>
> Mostly for treacherous computing extensions where subsets of the
> config space can only be accessed by signed machines blessed by your
> favourite movie company and video card vendor...

I hate "trusted" platform garbage as much as the next guy
(where "trusted" means the actual user can't trust it, just the
seller), but I think there are legitimate uses of extended space as
well, PCIe AER uses it iirc, so don't dismiss it on those grounds. :)

Jesse

2007-05-23 21:42:31

by Jeff Garzik

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Wednesday, May 23, 2007 2:35 pm Alan Cox wrote:
>>> One of the reasons why hardware vendors want to move away from
>>> traditional accesses is to be able to use the larger config space
>>> in PCI-Express, rather than being locked into the 256-byte legacy
>>> PCI config space.
>> Mostly for treacherous computing extensions where subsets of the
>> config space can only be accessed by signed machines blessed by your
>> favourite movie company and video card vendor...
>
> I hate "trusted" platform garbage as much as the next guy
> (where "trusted" means the actual user can't trust it, just the
> seller), but I think there are legitimate uses of extended space as
> well, PCIe AER uses it iirc, so don't dismiss it on those grounds. :)

Indeed. It's just a register space. Assuming one register space is
"more evil" than another, simply because it is bigger, is.. well.. silly.

Jeff



2007-05-23 21:57:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:
> > You told it to not forward memory. Why complain when it does as told?
>
> Well, because that's not actually very useful functionality, and likely
> makes software that seems "obviously" correct wrt the PCI spec break.

I agree that a chip that doesn't do it isn't broken either, but the fact
is, there is never any reason to disable MEM/IO on a host bridge. Doing so
is senseless - it can never be a valid operation. So I dispute the
"obviously correct" part. It's _not_ obviously correct at all.

To get back to the MMIO example: even if you were to never shut off RAM,
if you turn off just PCI MMIO on the northbridge, what is a mmconfig cycle
supposed to do? It's not going to _work_ if you disable MEM accesses.

So again, the only sane situation is: don't do it then! You claim that
hardware shouldn't do it, but I don't think software is in any different
situation at all! If it's insane to do, then software shouldn't do it.

It's just insane to turn off the MEM bit. There's simply no valid reason
to. And any PCI spec that says you should is *broken*, or written by
somebody who really only meant to talk about normal PCI devices, not
bridges.

> Apparently Vista will move away from using type 1 config space accesses
> though, so if we keep using it, we'll probably run into some lame board
> that assumes you're using mmconfig at some point in the near future.

How are those boards going to set up mmconfig? The whole standard is
broken, since there is no way to set it up.

Trust the firmware? What a piece of crap!

Linus

2007-05-23 22:07:01

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 2:54 pm Linus Torvalds wrote:
> On Wed, 23 May 2007, Jesse Barnes wrote:
> > > You told it to not forward memory. Why complain when it does as
> > > told?
> >
> > Well, because that's not actually very useful functionality, and
> > likely makes software that seems "obviously" correct wrt the PCI
> > spec break.
>
> I agree that a chip that doesn't do it isn't broken either, but the
> fact is, there is never any reason to disable MEM/IO on a host
> bridge. Doing so is senseless - it can never be a valid operation. So
> I dispute the "obviously correct" part. It's _not_ obviously correct
> at all.
>
> To get back to the MMIO example: even if you were to never shut off
> RAM, if you turn off just PCI MMIO on the northbridge, what is a
> mmconfig cycle supposed to do? It's not going to _work_ if you
> disable MEM accesses.
>
> So again, the only sane situation is: don't do it then! You claim
> that hardware shouldn't do it, but I don't think software is in any
> different situation at all! If it's insane to do, then software
> shouldn't do it.
>
> It's just insane to turn off the MEM bit. There's simply no valid
> reason to. And any PCI spec that says you should is *broken*, or
> written by somebody who really only meant to talk about normal PCI
> devices, not bridges.

Well theoretically for just sizing BARs, turning off the MEM bit should
be fine, since your next accesses should only be to config space until
the MEM bit is reenabled. But if RAM accesses really are disabled,
then you'd better be sure all the code you need is already in cache, or
you'll get into trouble.

So yeah, I guess special handling for host bridges is needed, but that
doesn't seem like a big deal.
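
A minimal sketch of what such special handling could look like, written as standalone C rather than a patch. The function name `sizing_command` is hypothetical; the constant values match the kernel's `pci_regs.h` and `pci_ids.h` definitions. It picks the command word to use while sizing one BAR and leaves host bridges alone:

```c
#include <stdint.h>

#define PCI_COMMAND_IO          0x1     /* I/O space decode enable */
#define PCI_COMMAND_MEMORY      0x2     /* memory space decode enable */
#define PCI_CLASS_BRIDGE_HOST   0x0600  /* class/subclass of a host bridge */
#define PCI_BASE_ADDRESS_SPACE  0x01    /* BAR bit 0 set: I/O BAR */

/* Hypothetical helper: command word to program while sizing `bar`. */
static uint16_t sizing_command(uint16_t class_code, uint16_t cmd, uint32_t bar)
{
    if (class_code == PCI_CLASS_BRIDGE_HOST)
        return cmd;                     /* never touch host bridge decode */
    if (bar & PCI_BASE_ADDRESS_SPACE)
        return cmd & ~PCI_COMMAND_IO;   /* I/O BAR: drop I/O decode only */
    return cmd & ~PCI_COMMAND_MEMORY;   /* memory BAR: drop MEM decode only */
}
```

The caller would write the returned value to PCI_COMMAND before probing the BAR and restore the original command word afterwards.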

> > Apparently Vista will move away from using type 1 config space
> > accesses though, so if we keep using it, we'll probably run into
> > some lame board that assumes you're using mmconfig at some point in
> > the near future.
>
> How are those boards going to set up mmconfig? The whole standard is
> broken, since there is no way to set it up.
>
> Trust the firmware? What a piece of crap!

What do you mean? You set it up the normal way, by poking at config
space to see what's there, then size the BARs (disabling mem and I/O
accesses in PCI_COMMAND shouldn't affect config space cycles afaik).
You just have to be careful to disable decoding for I/O and memory
regions, especially if your mmconfig space overlaps with what the
devices end up with in their BARs. Which is why my initial patch works
ok (because fortunately the Intel host bridges hard code the mem decode
bit to 1 too).

Jesse

2007-05-23 22:18:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:
> >
> > How are those boards going to set up mmconfig? The whole standard is
> > broken, since there is no way to set it up.
> >
> > Trust the firmware? What a piece of crap!
>
> What do you mean? You set it up the normal way, by poking at config
> space to see what's there

HOW DO YOU GET TO THE CONFIG SPACE IN THE FIRST PLACE?

The reason mmconfig is *BROKEN*CRAP* is that you cannot bootstrap it.
There's no standard way to even figure out WHERE IT IS!

So we depend on firmware tables that are known to be broken!

That crap should be seen for the crap it is! Dammit, how hard can it be to
just admit that mmconfig isn't that great?

Linus

2007-05-23 22:25:18

by Olivier Galibert

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wed, May 23, 2007 at 02:20:23PM -0700, Jesse Barnes wrote:
> On Wednesday, May 23, 2007 1:56 pm Linus Torvalds wrote:
> > Ehh. Even for PCIe, why not use the normal accesses for the first 256
> > bytes? Problem solved.
>
> Ok, this patch also works. We still need to enable mmconfig space for
> PCIe and extended config space, but we can continue to use type 1
> accesses for legacy PCI config space cycles to avoid decode trouble
> with mmconfig based BAR sizing.

Isn't that a mac-intel instant killer? AFAIK they don't have type1,
period.

OG.

2007-05-23 22:29:19

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 3:16 pm Linus Torvalds wrote:
> On Wed, 23 May 2007, Jesse Barnes wrote:
> > > How are those boards going to set up mmconfig? The whole standard
> > > is broken, since there is no way to set it up.
> > >
> > > Trust the firmware? What a piece of crap!
> >
> > What do you mean? You set it up the normal way, by poking at
> > config space to see what's there
>
> HOW DO YOU GET TO THE CONFIG SPACE IN THE FIRST PLACE?
>
> The reason mmconfig is *BROKEN*CRAP* is that you cannot bootstrap it.
> There's no standard way to even figure out WHERE IT IS!
>
> So we depend on firmware tables that are known to be broken!
>
> That crap should be seen for the crap it is! Dammit, how hard can it
> be to just admit that mmconfig isn't that great?

Ah, yeah, that's platform specific, I thought you were confused about
how the sizing worked. On x86, we either have to look at ACPI tables
(yay) or use type 1 config accesses to get at the mmconfig base
register (which is what the patches Olivier and I posted do). On ia64
there are firmware calls to do config space accesses. Not sure about
other platforms.
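
For reference, a minimal sketch of how a type 1 access is addressed, since
that is the bootstrap path being described. The helper and macro names are
made up for illustration, and the register that holds the mmconfig base is
chipset-specific, so it is deliberately not shown:

```c
#include <assert.h>
#include <stdint.h>

/* I/O ports used by PCI configuration mechanism #1 ("type 1" / conf1). */
#define CONF1_ADDRESS_PORT 0xCF8u
#define CONF1_DATA_PORT    0xCFCu

/*
 * Build the dword that is written to CONF1_ADDRESS_PORT before a config
 * access. Layout: bit 31 enable, bits 23:16 bus, bits 15:11 device,
 * bits 10:8 function, bits 7:2 dword-aligned register offset.
 * (Hypothetical helper; it only composes the value, it does no port I/O.)
 */
static uint32_t conf1_address(unsigned bus, unsigned dev, unsigned fn,
                              unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}
```

The resulting value would be written to port 0xCF8 and the data then
transferred through port 0xCFC. Only 8 bits of register offset fit, which is
why this path cannot reach extended config space.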

I'm not claiming mmconfig is great and we should make everything use it,
but we do need it these days, so we should figure out a good way of
getting at it.

Jesse

2007-05-23 22:32:16

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 3:24 pm Olivier Galibert wrote:
> On Wed, May 23, 2007 at 02:20:23PM -0700, Jesse Barnes wrote:
> > On Wednesday, May 23, 2007 1:56 pm Linus Torvalds wrote:
> > > Ehh. Even for PCIe, why not use the normal accesses for the first
> > > 256 bytes? Problem solved.
> >
> > Ok, this patch also works. We still need to enable mmconfig space
> > for PCIe and extended config space, but we can continue to use type
> > 1 accesses for legacy PCI config space cycles to avoid decode
> > trouble with mmconfig based BAR sizing.
>
> Isn't that a mac-intel instant killer? AFAIK they don't have type1,
> period.

Yuck. I'll have to add a check for type 1 then... but that also means
Macs will probably want the decode disable stuff I posted earlier.

Jesse

2007-05-23 22:50:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Thu, 24 May 2007, Olivier Galibert wrote:
>
> Isn't that a mac-intel instant killer? AFAIK they don't have type1,
> period.

mac-intel are totally standard Intel chipsets. They have all of
conf1/conf2/mmconfig afaik.

I just happily booted my mac-mini with "pci=nommconf", nothing bad
happened, and the kernel says

PCI: Using configuration type 1

and I don't think you even _can_ disable conf1 type accesses: they are
deep in the Intel chipsets.

Of course, in a virtualized environment, anything can happen. Virtual
machines prefer mmconf, because you can use page-level remapping to hide
devices or make pseudo-devices show up by mapping in pages that have
nothing to do with the true hardware.

So no, I don't think Alan was totally smoking crack when he talked about
"trusted" computing. Read the above paragraph a few times. (You can do it
with trapping IO port accesses too, but it's going to cost you a lot, so
if you want to make a fast but untrustworthy setup, MMIO is the better
option).

Linus

2007-05-23 22:55:39

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 3:48 pm Linus Torvalds wrote:
> On Thu, 24 May 2007, Olivier Galibert wrote:
> > Isn't that a mac-intel instant killer? AFAIK they don't have
> > type1, period.
>
> mac-intel are totally standard Intel chipsets. They have all of
> conf1/conf2/mmconfig afaik.
>
> I just happily booted my mac-mini with "pci=nommconf", nothing bad
> happened, and the kernel says
>
> PCI: Using configuration type 1
>
> and I don't think you even _can_ disable conf1 type accesses: they
> are deep in the Intel chipsets.

After I sent my last message I realized the same thing... though I
occasionally hear people talk about removing it (I seriously doubt that
will ever happen). I don't even think there's a way to disable type 1
config access on Intel chipsets...

So the last patch is ok then, as long as we can find mmconfig space in
the first place, but that's a separate problem for another set of
patches (ones that seem to be working fairly well now btw).

Jesse

2007-05-23 23:04:31

by David Miller

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

From: Linus Torvalds <[email protected]>
Date: Wed, 23 May 2007 15:16:23 -0700 (PDT)

> That crap should be seen for the crap it is! Dammit, how hard can it
> be to just admit that mmconfig isn't that great?

I knew mmconfig was broken conceptually the first time I started
seeing write posting "bug fixes" for it that would do a read back from
PCI config space via mmconfig to post the write, which of course has
potential side-effects on the device and is absolutely illegal if the
write just performed put the device into a PM state or whatever.

Truth is stranger than fiction at times.

MMCONFIG is very much an ill-conceived idea.

2007-05-23 23:05:00

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Tuesday, May 22, 2007 6:06 pm Robert Hancock wrote:
>> There was a big discussion about this back in 2002, in which Linus
>> wasn't overly enthused about disabling the decode during probing due
>> to risk of causing problems with some devices:
>>
>> http://lkml.org/lkml/2002/12/19/145
>>
>> In this particular case (64-bit BAR) we might be able to avoid the
>> problem by changing the order in which we probe the two halves of the
>> address, i.e. change the top half to 0xffffffff before messing with
>> the bottom half and then change it back last. That way, we end up
>> mapping it way to the top of 64-bit address space, which hopefully is
>> less likely to conflict..
>
> Fixed it (finally). I don't think moving the 64 bit probing around
> would make a difference, since we'd restore its original value anyway
> before moving on to the 32 bit probe which is where I think the problem
> is.

You couldn't just reorder the code the way it is now, you'd have to
rearrange the way we do things for 64-bit BARs:

-write FFFFFFFF to high part of 64-bit address (we end up moving the BAR
to 0xFFFFFFFFC0000000 for example)
-If any bits stick, we know what the size is now (more than 4GB of
decode), so just change it back, we're done
-If not, we need to check the low part, so write FFFFFFFF to low part of
64-bit address (BAR moves to 0xFFFFFFFFFFFFFFFF)
-Check which bits stick and calculate the address
-Change the low part of the address back (BAR moves to 0xFFFFFFFFC0000000)
-Change the high part of the address back (BAR moves to the original
0xC0000000 address)

This means that at no point do we map the BAR anywhere near the top of
32-bit memory, so we should avoid this issue in this particular case. I
don't think this strategy is too likely to break anything, surely less
likely than disabling command bits. Jesse, you might want to try hacking
up something like this and see what happens.
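
The steps above can be sketched against a small software model of a 64-bit
memory BAR. This is not kernel code: struct bar64 and the helpers are
hypothetical, and the sketch reads "if any bits stick" as "if the high dword
does not read back as all ones", which is an interpretation of the step
rather than something stated outright:

```c
#include <assert.h>
#include <stdint.h>

#define BAR_FLAGS_MASK 0xFull   /* low 4 bits of a memory BAR are type flags */
#define TRACE_MAX 8

/* Hypothetical software model of one 64-bit memory BAR. */
struct bar64 {
    uint64_t value;             /* current contents, flags included */
    uint64_t size;              /* decode size: power of two, >= 16 */
    uint64_t trace[TRACE_MAX];  /* base address after every write */
    int ntrace;
};

/* Address bits below the size are hardwired to zero; flags are read-only. */
static void bar_write(struct bar64 *b, int high, uint32_t v)
{
    uint64_t flags = b->value & BAR_FLAGS_MASK;
    uint64_t next = high ? ((uint64_t)v << 32) | (b->value & 0xFFFFFFFFull)
                         : (b->value & ~0xFFFFFFFFull) | v;

    b->value = (next & ~(b->size - 1)) | flags;
    assert(b->ntrace < TRACE_MAX);
    b->trace[b->ntrace++] = b->value & ~BAR_FLAGS_MASK;
}

static uint32_t bar_read(const struct bar64 *b, int high)
{
    return (uint32_t)(high ? b->value >> 32 : b->value);
}

/* Size the BAR by probing the high half first and restoring it last. */
static uint64_t bar64_size_high_first(struct bar64 *b)
{
    uint32_t orig_lo = bar_read(b, 0), orig_hi = bar_read(b, 1);
    uint64_t mask;

    bar_write(b, 1, 0xFFFFFFFFu);
    mask = (uint64_t)bar_read(b, 1) << 32;
    if (bar_read(b, 1) == 0xFFFFFFFFu) {
        /* Every high bit is writable: size <= 4GB, so probe the low half. */
        bar_write(b, 0, 0xFFFFFFFFu);
        mask |= bar_read(b, 0) & ~(uint32_t)BAR_FLAGS_MASK;
        bar_write(b, 0, orig_lo);
    }
    bar_write(b, 1, orig_hi);
    return ~mask + 1;
}
```

In this model the trace records where the BAR sits after each write, so it
is easy to confirm that every intermediate base has 0xFFFFFFFF in its upper
dword and that the original value is back at the end.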

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 23:05:34

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Linus Torvalds wrote:
>
> On Wed, 23 May 2007, Jesse Barnes wrote:
>> Fixed it (finally). I don't think moving the 64 bit probing around
>> would make a difference, since we'd restore its original value anyway
>> before moving on to the 32 bit probe which is where I think the problem
>> is.
>
> Well, the thing is, I'm pretty sure there is at least one northbridge that
> stops memory accesses from the CPU when you turn off the MEM bit on it.
> Oops, you just killed the machine.

Which is retarded, since the command bits are only supposed to be for
memory ranges that are part of the BARs, it's not supposed to completely
kill the device function. Unless somehow the memory on that system is
accessed through the PCI bus or something. Anyway, it's something we
have to deal with.

>
> Looking at the 925X datasheet (which I happened to have around in my
> google search history because of the discussions of the sky2 DMA
> problems), it looks like at least that one just hardcodes the MEM bit to
> be 1, and thus writing to it is a total no-op.
>
> But I really think that clearing the MEM bit for at least the host bridge
> is conceptually quite wrong, even if it might turn out that all chipsets
> end up just saying (like Intel) "screw it, the user is insane, we're not
> going to actually do what he asks us to do".
>
> Do we really want to be that insane? Turn off memory accesses when probing
> the CPU host bridge?
>
> So at a _minimum_ I would say that that thing needs to be more careful
> about host bridges. Maybe it's not needed, who knows?

I think we should likely avoid disabling the command bits on host
bridges (maybe any bridge) due to this risk of disabling something that
will break things. Ideally we can get around this without doing any
disabling at all, as noted in my last email.

>
>> Linus, since you were the one concerned about breaking working setups,
>> what do you think? Should we use this approach, or specifically quirk
>> out cases where mmconfig space might conflict with BAR probing?
>
> So see above. I think at a minimum, we should consider the host bridge
> special.
>
> I also suspect that we'd be simply better off if we didn't use mmconfig at
> all unless we _have_ to. Why use mmconfig for the standard BAR accesses?
> Is there really any reason? I can understand using it for extended config
> space, since then the old-fashioned approach won't work. But for normal
> accesses? What's the point, really?

Why not? Either you trust that the MMCONFIG is working or you don't. If
you trust it, you might as well use it for everything, and if you don't,
you can't risk using it for anything. If there are problems that show up
only with MMCONFIG, doing what you propose would simply cover them up
until somebody actually tried accessing extended config space.

> mmconfig seems to be fundamentally designed to be impossible to bootstrap
> off, so there's no way you can have a machine that _only_ supports
> mmconfig. So why do people seem to think it's so wonderful? Please fill me
> in on this fundamental mystery.

Sure you can bootstrap off it, you just need to have some way to know
where to find it (either ACPI or some other system-specific mechanism).

>
> Quite frankly, if we just didn't use mmconfig, the whole issue would go
> away. Isn't _that_ the much better solution?

I don't think that is going to be viable in the long run now that
Windows Vista is out and MS is actually encouraging HW developers to
allow using that config space..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 23:06:53

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 4:04 pm Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Tuesday, May 22, 2007 6:06 pm Robert Hancock wrote:
> >> There was a big discussion about this back in 2002, in which Linus
> >> wasn't overly enthused about disabling the decode during probing
> >> due to risk of causing problems with some devices:
> >>
> >> http://lkml.org/lkml/2002/12/19/145
> >>
> >> In this particular case (64-bit BAR) we might be able to avoid the
> >> problem by changing the order in which we probe the two halves of
> >> the address, i.e. change the top half to 0xffffffff before messing
> >> with the bottom half and then change it back last. That way, we
> >> end up mapping it way to the top of 64-bit address space, which
> >> hopefully is less likely to conflict..
> >
> > Fixed it (finally). I don't think moving the 64 bit probing around
> > would make a difference, since we'd restore its original value
> > anyway before moving on to the 32 bit probe which is where I think
> > the problem is.
>
> You couldn't just reorder the code the way it is now, you'd have to
> rearrange the way we do things for 64-bit BARs:
>
> -write FFFFFFFF to high part of 64-bit address (we end up moving the
> BAR to 0xFFFFFFFFC0000000 for example)
> -If any bits stick, we know what the size is now (more than 4GB of
> decode), so just change it back, we're done
> -If not, we need to check the low part, so write FFFFFFFF to low part
> of 64-bit address (BAR moves to 0xFFFFFFFFFFFFFFFF)
> -Check which bits stick and calculate the address
> -Change the low part of the address back (BAR moves to
> 0xFFFFFFFFC0000000)
> -Change the high part of the address back (BAR moves to the original
> 0xC0000000 address)
>
> This means that at no point do we map the BAR anywhere near the top
> of 32-bit memory, so we should avoid this issue in this particular
> case. I don't think this strategy is too likely to break anything,
> surely less likely than disabling command bits. Jesse, you might want
> to try hacking up something like this and see what happens.

Ah yeah, that would probably work in this particular case, but doesn't
seem very general. I think just using type 1 accesses for non-extended
config space is a bit more solid.
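
A rough sketch of that split, assuming nothing about how the actual patch is
structured: registers below 256 go through type 1, and only the extended
range goes through mmconfig. The enum, the function names and the
mmconfig_ok flag are hypothetical; the offset layout is the standard one of
4KB of config space per function:

```c
#include <assert.h>
#include <stdint.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG, CFG_UNREACHABLE };

/*
 * Pick the access method for a config register (hypothetical policy helper).
 * Legacy config space (offsets 0..255) always uses type 1; extended config
 * space (256..4095) is reachable only when mmconfig is usable.
 */
static enum cfg_method pick_cfg_method(unsigned reg, int mmconfig_ok)
{
    assert(reg < 4096);
    if (reg < 256)
        return CFG_CONF1;
    return mmconfig_ok ? CFG_MMCONFIG : CFG_UNREACHABLE;
}

/* Byte offset of a register from the mmconfig base for bus 0. */
static uint64_t mmconfig_offset(unsigned bus, unsigned dev, unsigned fn,
                                unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

With this policy, BAR sizing (which only touches offsets below 256) never
issues an mmconfig cycle, so it cannot collide with a BAR that has been
temporarily moved on top of the mmconfig window.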

Jesse

2007-05-23 23:11:44

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 4:04 pm David Miller wrote:
> From: Linus Torvalds <[email protected]>
> Date: Wed, 23 May 2007 15:16:23 -0700 (PDT)
>
> > That crap should be seen for the crap it is! Dammit, how hard can
> > it be to just admit that mmconfig isn't that great?
>
> I knew mmconfig was broken conceptually the first time I started
> seeing write posting "bug fixes" for it that would do a read back
> from PCI config space via mmconfig to post the write, which of course
> has potential side-effects on the device and is absolutely illegal if
> the write just performed put the device into a PM state or whatever.

I've actually seen that specific form of posted write flushing cause
crashes on some machines, so yes, it sucks.

Unfortunately, I don't think we have any other way of getting at
extended config space on x86, unless EFI provides methods or something,
but I'm not sure that would be an improvement...

Jesse

2007-05-23 23:15:23

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wed, 23 May 2007 17:09:37 -0400
Jeff Garzik <[email protected]> wrote:

> Jesse Barnes wrote:
> > Apparently Vista will move away from using type 1 config space accesses
> > though, so if we keep using it, we'll probably run into some lame board
>
> Yep.
>
>
> > that assumes you're using mmconfig at some point in the near future.
> > But then again, we're often on that less tested path (e.g. with ACPI),
> > so maybe that doesn't matter much.
>
> One of the reasons why hardware vendors want to move away from
> traditional accesses is to be able to use the larger config space in
> PCI-Express, rather than being locked into the 256-byte legacy PCI
> config space.
>
> Several modern PCI-Express devices utilize the upper config space, but
> due to legacy reasons the registers are usually ones that do not require
> OS drivers to know about (like BIST stuff or diagnostic registers).
>
> Expect that to change, as MS shakes out the bugs (or maybe we are doing
> their job for them?).
>

On some PCI-Express boards, if you don't clear the advanced error reporting
registers on boot up, they will cause an IRQ storm. The AER registers
are above the 256-byte boundary. In fact, the AER support in Linux should
depend on MMCONFIG.

--
Stephen Hemminger <[email protected]>

2007-05-23 23:16:37

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Wednesday, May 23, 2007 4:04 pm David Miller wrote:
>> From: Linus Torvalds <[email protected]>
>> Date: Wed, 23 May 2007 15:16:23 -0700 (PDT)
>>
>>> That crap should be seen for the crap it is! Dammit, how hard can
>>> it be to just admit that mmconfig isn't that great?
>> I knew mmconfig was broken conceptually the first time I started
>> seeing write posting "bug fixes" for it that would do a read back
>> from PCI config space via mmconfig to post the write, which of course
>> has potential side-effects on the device and is absolutely illegal if
>> the write just performed put the device into a PM state or whatever.
>
> I've actually seen that specific form of posted write flushing cause
> crashes on some machines, so yes, it sucks.
>
> Unfortunately, I don't think we have any other way of getting at
> extended config space on x86, unless EFI provides methods or something,
> but I'm not sure that would be an improvement...

That "fix" shouldn't be needed at all, the MMCONFIG memory range
shouldn't be covered by PCI ordering rules, so there should be no such
thing as write posting. I suspect that the author of such patch(es) was
doing so out of some misguided sense that it was needed. (And if there
is some chipset where it is actually needed, better just disable
MMCONFIG on that one, as there's no way to use it sanely.)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-23 23:22:19

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 4:15 pm Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Wednesday, May 23, 2007 4:04 pm David Miller wrote:
> >> From: Linus Torvalds <[email protected]>
> >> Date: Wed, 23 May 2007 15:16:23 -0700 (PDT)
> >>
> >>> That crap should be seen for the crap it is! Dammit, how hard can
> >>> it be to just admit that mmconfig isn't that great?
> >>
> >> I knew mmconfig was broken conceptually the first time I started
> >> seeing write posting "bug fixes" for it that would do a read back
> >> from PCI config space via mmconfig to post the write, which of
> >> course has potential side-effects on the device and is absolutely
> >> illegal if the write just performed put the device into a PM state
> >> or whatever.
> >
> > I've actually seen that specific form of posted write flushing
> > cause crashes on some machines, so yes, it sucks.
> >
> > Unfortunately, I don't think we have any other way of getting at
> > extended config space on x86, unless EFI provides methods or
> > something, but I'm not sure that would be an improvement...
>
> That "fix" shouldn't be needed at all, the MMCONFIG memory range
> shouldn't be covered by PCI ordering rules, so there should be no
> such thing as write posting. I suspect that the author of such
> patch(es) was doing so out of some misguided sense that it was
> needed. (And if there is some chipset where it is actually needed,
> better just disable MMCONFIG on that one, as there's no way to use it
> sanely.)

PCI allows write posting, and on systems using mmconfig the posting
generally (unfortunately) extends to that space as well. So drivers
need to deal with it somehow. But many get it wrong.

Jesse

2007-05-24 00:22:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:
>
> After I sent my last message I realized the same thing... though I
> occasionally hear people talk about removing it (I seriously doubt that
> will ever happen). I don't even think there's a way to disable type 1
> config access on Intel chipsets...

Considering that the chipsets still have support for features that
*really* aren't used (and haven't been used in over a decade), I doubt the
conf1 thing is going away any time soon.

Things like: A20 gate, 15-16MB holes, i387 FP exception on irq 13 are
totally pointless in this day and age. Things like the DMA controller are
getting there, along with PS/2 keyboard support.

So there's a lot of things that are likely to be removed before conf1
accesses would. Removing CONF1 accesses would break every single current
OS, they'll do that ten years from now at the earliest.

Linus

2007-05-24 02:59:55

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 5:21:13 Linus Torvalds wrote:
> On Wed, 23 May 2007, Jesse Barnes wrote:
> > After I sent my last message I realized the same thing... though I
> > occasionally hear people talk about removing it (I seriously doubt that
> > will ever happen). I don't even think there's a way to disable type 1
> > config access on Intel chipsets...
>
> Considering that the chipsets still have support for features that
> *really* aren't used (and haven't been used in over a decade), I doubt the
> conf1 thing is going away any time soon.
>
> Things like: A20 gate, 15-16MB holes, i387 FP exception on irq 13 are
> totally pointless in this day and age. Things like the DMA controller are
> getting there, along with PS/2 keyboard support.
>
> So there's a lot of things that are likely to be removed before conf1
> accesses would. Removing CONF1 accesses would break every single current
> OS, they'll do that ten years from now at the earliest.

So what do you think? You ok with enabling mmconfig if it's available as long
as we use type 1 accesses for non-extended stuff? If so, I think the patches
are pretty much ready...

Thanks,
Jesse

2007-05-24 03:20:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Jesse Barnes wrote:
>
> So what do you think? You ok with enabling mmconfig if it's available as long
> as we use type 1 accesses for non-extended stuff? If so, I think the patches
> are pretty much ready...

Sure. I think mmconfig is perfectly sane if it falls back to conf1
accesses for legacy stuff..

And I also actually think that your patch to disable MMIO/PIO when testing
the BAR size is fine - I just think that it should likely only be done for
non-bridge devices (or at least non-host-bridge).
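
As a sketch of that restriction (not the posted patch): compute the command
register value to use while sizing BARs, and leave host bridges alone. The
function and macro names are hypothetical; the bit values are the standard
I/O and memory decode enables in the PCI command register, and 0x0600 is the
host bridge class code. This takes the narrower of the two options and only
exempts host bridges:

```c
#include <stdint.h>

#define CMD_IO            0x0001u  /* I/O space decode enable */
#define CMD_MEMORY        0x0002u  /* memory space decode enable */
#define CLASS_BRIDGE_HOST 0x0600u  /* base class 06h, sub-class 00h */

/*
 * Return the command value to write for the duration of BAR sizing.
 * Host bridges keep their decode bits, since clearing MEM there may cut
 * off memory accesses from the CPU. (Hypothetical helper.)
 */
static uint16_t command_for_bar_sizing(uint16_t cmd, uint16_t class_code)
{
    if (class_code == CLASS_BRIDGE_HOST)
        return cmd;
    return (uint16_t)(cmd & ~(CMD_IO | CMD_MEMORY));
}
```

The caller would save the original command value, write the returned one,
size the BARs, and then write the saved value back.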

Linus

2007-05-24 03:21:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Wed, 23 May 2007, Linus Torvalds wrote:
>
> Sure. I think mmconfig is perfectly sane if it falls back to conf1
> accesses for legacy stuff..

.. but without a regression, it's obviously a post-2.6.22 thing, I guess I
should make that clear, just because I think people send me patches after
-rc1 way too eagerly just because they think it fixes a bug.

Basically if it's not something that has _ever_ worked some way, it's not
a bug, it's a feature ;)

Linus

2007-05-24 03:41:20

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

On Wednesday, May 23, 2007 8:20:14 Linus Torvalds wrote:
> On Wed, 23 May 2007, Linus Torvalds wrote:
> > Sure. I think mmconfig is perfectly sane if it falls back to conf1
> > accesses for legacy stuff..
>
> .. but without a regression, it's obviously a post-2.6.22 thing, I guess I
> should make that clear, just because I think people send me patches after
> -rc1 way too eagerly just because they think it fixes a bug.
>
> Basically if it's not something that has _ever_ worked some way, it's not
> a bug, it's a feature ;)

No, I know better than to send something after your merge window closes. I
have no desire to be flamed even further on this topic. :)

And come to think of it, adding the enable/disable bits might be good even
with the patch to make legacy accesses go through type 1, since PCIe BAR
probing is probably done the same way (I haven't looked) and so we might run
into the same problems there.

Thanks,
Jesse

2007-05-24 05:19:46

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Jesse Barnes wrote:
> On Wednesday, May 23, 2007 8:20:14 Linus Torvalds wrote:
>> On Wed, 23 May 2007, Linus Torvalds wrote:
>>> Sure. I think mmconfig is perfectly sane if it falls back to conf1
>>> accesses for legacy stuff..
>> .. but without a regression, it's obviously a post-2.6.22 thing, I guess I
>> should make that clear, just because I think people send me patches after
>> -rc1 way too eagerly just because they think it fixes a bug.
>>
>> Basically if it's not something that has _ever_ worked some way, it's not
>> a bug, it's a feature ;)
>
> No, I know better than to send something after your merge window closes. I
> have no desire to be flamed even further on this topic. :)
>
> And come to think of it, adding the enable/disable bits might be good even
> with the patch to make legacy accesses go through type 1, since PCIe BAR
> probing is probably done the same way (I haven't looked) and so we might run
> into the same problems there.

I think that disabling decode on non-host-bridge devices during the BAR
sizing is something we should at least try, indeed.

The issue I have with forcing legacy config space accesses to type1 is
that it would make it much less obvious if the MMCONFIG access wasn't
working properly. You'd likely be able to boot up but then wonder why
something that does extended config space accesses didn't work or hung
the box. As I mentioned before, either we trust the MMCONFIG or we
don't, and if we decide that we don't on a particular box, we should
really be shutting it off entirely. Hopefully with the ACPI reservation
checking patch and the disable-decode-during-BAR-sizing patch
we wouldn't need to add that restriction.

But yes, post-2.6.22 for all of this :-)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-24 06:18:38

by Jeff Garzik

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

Linus Torvalds wrote:
> Things like: A20 gate, 15-16MB holes, i387 FP exception on irq 13 are
> totally pointless in this day and age. Things like the DMA controller are
> getting there, along with PS/2 keyboard support.


The latest Intel chipset I have (ICH9) is legacy free: no serial port
and no PS/2 ports. I had to disable the Linux PS2 input drivers
completely, just to get the thing to boot.

Whee, "progress". :)

Jeff, digging out that USB debug cable cuz there's no serial


2007-05-24 15:43:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources



On Thu, 24 May 2007, Jeff Garzik wrote:
>
> The latest Intel chipset I have (ICH9) is legacy free: no serial port and no
> PS/2 ports. I had to disable the Linux PS2 input drivers completely, just to
> get the thing to boot.

Ahh, that would be a bug. Can you help trying to debug where it locks up?

I'm also surprised, since on the mac mini I have, I already have:

i8042.c: No controller found.

and it all works beautifully. Of course, it only did that after the
horrible crud that is "grub" got fixed, because the bootloader used to
wait forever, but I thought the kernel itself was able to handle a missing
PS/2 controller fine.

Linus