2006-11-08 17:26:49

by Adrian Bunk

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

On Wed, Nov 08, 2006 at 08:05:18AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 8 Nov 2006, Matthew Wilcox wrote:
> >
> > On Wed, Nov 08, 2006 at 08:39:44AM +0100, Andi Kleen wrote:
> > > ACPI knows the number of busses.
> >
> > But what if the number of busses increases later, eg by hotplugging
> > a card with a PCI-PCI bridge on it? Or does it know the number of
> > busses which can be supported by this machine's MMCONFIG region?
>
> ACPI will give the maximum number.
>
> However, in this case, the correct thing to do (always _has_ been) is to
> not use ACPI for _anything_, but just read the base and the size of the
> MMCONFIG region from the hardware itself.
>
> Anyway, I do not consider this a regression. MMCONFIG has _never_ worked
> reliably. It has always been a case of "we can make it work on some
> machines by making it break on others".

It is a serious regression:

The problem is that with the default CONFIG_PCI_GOANY, MMCONFIG is the
_first_ method tried.

In practice, this implies that nearly every system possibly affected
will suffer from a MMCONFIG breakage like the one Jeff observed...

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed


2006-11-08 17:52:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)



On Wed, 8 Nov 2006, Adrian Bunk wrote:
> >
> > Anyway, I do not consider this a regression. MMCONFIG has _never_ worked
> > reliably. It has always been a case of "we can make it work on some
> > machines by making it break on others".
>
> It is a serious regression:
>
> The problem is that with the default CONFIG_PCI_GOANY, MMCONFIG is the
> _first_ method tried.

No. That was a bug at some point, but it's not that way now. See

pci_access_init(void)

which checks the pci_direct_probe() first, and only _then_ calls
pci_mmcfg_init(). And pci_mmcfg_init() will refuse to even use MMCONFIG
unless either the direct probe failed _or_ the MMCONFIG area is marked
entirely reserved in the e820 tables. Exactly because MMCONFIG generally
doesn't _work_.

Now, if that is indeed broken, then yes, we need to fix it.

Jeff - when you enable "direct PCI access", what is the printout? You
should get

PCI: Using configuration type 1

and the kernel should never have used MMCONFIG if the area wasn't marked
as reserved in e820..

Linus

2006-11-08 18:11:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)



On Wed, 8 Nov 2006, Linus Torvalds wrote:
>
> And pci_mmcfg_init() will refuse to even use MMCONFIG unless either the
> direct probe failed _or_ the MMCONFIG area is marked entirely reserved
> in the e820 tables. Exactly because MMCONFIG generally doesn't _work_.

Ahh. I see the bug now.

When we check whether MMCONFIG is ok, we do this:

if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {

ie we check whether there is room for MMCONFIG_APER_MIN reserved things.

We then _reserve_ a totally different region in
pci_mmcfg_insert_resources(), because we use a totally different range.

What a piece of crap.

Andi, I'm getting really upset about this kind of thing. You've been very
much not careful about MMCFG in general, and are allowing total crap to go
into the kernel, without any thought. Just "testing" something isn't good
enough, it needs to be thought out.

I'm going to revert that totally bogus commit that added that broken
"pci_mmcfg_insert_resources()" function. It could be done right, but doing
it right would require that the function

- be called _before_ deciding to use MMCFG
- return an error number if it couldn't allocate the things
- check that the region was reserved

and then if some reservation failed, we shouldn't use MMCONFIG in the
first place.

We really should stop using MMCONFIG entirely, until we have a
per-southbridge true knowledge of what the real decoding is. The BIOS
tables for this are simply too damn unreliable.

Linus

2006-11-08 18:17:47

by Aaron Durbin

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

On 11/8/06, Linus Torvalds <[email protected]> wrote:
>
>
> On Wed, 8 Nov 2006, Adrian Bunk wrote:
> > >
> > > Anyway, I do not consider this a regression. MMCONFIG has _never_ worked
> > > reliably. It has always been a case of "we can make it work on some
> > > machines by making it break on others".
> >
> > It is a serious regression:
> >
> > The problem is that with the default CONFIG_PCI_GOANY, MMCONFIG is the
> > _first_ method tried.
>
> No. That was a bug at some point, but it's not that way now. See
>
> pci_access_init(void)
>
> which checks the pci_direct_probe() first, and only _then_ calls
> pci_mmcfg_init(). And pci_mmcfg_init() will refuse to even use MMCONFIG
> unless either the direct probe failed _or_ the MMCONFIG area is marked
> entirely reserved in the e820 tables. Exactly because MMCONFIG generally
> doesn't _work_.
>

It appears in both i386 and x86-64 that the check is only on the first MCFG
entry and it only checks a hard-coded value of 16 buses. This check is only
done if pci access type == 1. The patches I posted yesterday have a few more
checks and warnings concerning the MCFG region, but these checks are only for
the resource allocation. They do not concern actual config access. With those
patches applied we should be at least able to track more buggy BIOS's provided
that people notice messages in their dmesg.

-Aaron

2006-11-08 18:40:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)



On Wed, 8 Nov 2006, Linus Torvalds wrote:
>
> I'm going to revert that totally bogus commit that added that broken
> "pci_mmcfg_insert_resources()" function.

Pushed out. Jeff, can you verify that current git does the right thing.

Andi - I also ported the x86 io-apic cleanup and fixes to x86-64. I'll
push those out too once I've verified them on the machines I have access
to.

I'm still not 100% happy about the io-apic thing (for example, I was
thinking that maybe we should just automatically choose the order of
writes in apic_write_entry() by just checking whether "mask" was set or
not), but the code really is cleaner, and on x86 it was verified to fix
some things.

Keeping in sync is still better than having two different approaches, one
of which was confirmed broken. And the i386 code is definitely more tested
over the years than the x86-64 code could ever have hoped to be, so going
back to the original ordering makes sense.

Linus

2006-11-10 06:25:41

by Jeff Chua

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

On 11/9/06, Linus Torvalds <[email protected]> wrote:

> Pushed out. Jeff, can you verify that current git does the right thing.

Linus,

Can you post those affected patches that I can apply directly to 2.6.19-rc5?

I'm still old-fashioned and have not downloaded the git dist. I'll try
to do that tonight.

Currently, I'm using Aaron Durbin's patch and it works.

Thanks,
Jeff.

2006-11-10 06:47:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

On Fri, 10 Nov 2006 14:25:30 +0800
"Jeff Chua" <[email protected]> wrote:

> On 11/9/06, Linus Torvalds <[email protected]> wrote:
>
> > Pushed out. Jeff, can you verify that current git does the right thing.
>
> Linus,
>
> Can you post those affected patches that I can apply directly to 2.6.19-rc5?

http://userweb.kernel.org/~akpm/origin.patch is Linus's tree as of twenty seconds
ago (against 2.6.19-rc5)

2006-11-10 06:57:24

by Andi Kleen

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)


> What a piece of crap.
>
> Andi, I'm getting really upset about this kind of thing. You've been very
> much not careful about MMCFG in general, and are allowing total crap to go
> into the kernel, without any thought. Just "testing" something isn't good
> enough, it needs to be thought out.

Sorry, probably should have read the patch more carefully.

I think I agreed with the high level idea but didn't double check
the details.

>
> I'm going to revert that totally bogus commit that added that broken
> "pci_mmcfg_insert_resources()" function. It could be done right, but doing
> it right would require that the function

Ok fine by me.

> We really should stop using MMCONFIG entirely, until we have a

Hmm, for .19 at least you mean?

Entirely stopping it would break the x86 macs minis again I think.
But we can make it "only use if type 1 doesn't work"

I'm sure some people will be upset again if we don't use it.
Perhaps there are really users who want to use the PCI-E error handling
for example.

> per-southbridge true knowledge of what the real decoding is. The BIOS
> tables for this are simply too damn unreliable.

My hopes for that are on Vista. Perhaps use a DMI year test again
(>= 2007) and only white lists for older boards.

-Andi

2006-11-10 09:21:38

by Jeff Chua

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

On 11/10/06, Andrew Morton <[email protected]> wrote:
> On Fri, 10 Nov 2006 14:25:30 +0800
> "Jeff Chua" <[email protected]> wrote:
>
> > On 11/9/06, Linus Torvalds <[email protected]> wrote:
> >
> > > Pushed out. Jeff, can you verify that current git does the right thing.

Yes, this does it. It's working with MMCONFIG now.

Thank you!

Jeff.

2006-11-10 10:13:43

by Alan

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

Ar Gwe, 2006-11-10 am 07:56 +0100, ysgrifennodd Andi Kleen:
> I'm sure some people will be upset again if we don't use it.
> Perhaps there are really users who want to use the PCI-E error handling
> for example.

And there is now hardware out there which requires accessing PCI-E
configuration spaces. Disabling it for those cases is not a sensible
option t all.

2006-11-10 13:57:32

by Jeff Garzik

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

Linus Torvalds wrote:
> We really should stop using MMCONFIG entirely, until we have a
> per-southbridge true knowledge of what the real decoding is. The BIOS
> tables for this are simply too damn unreliable.


FWIW: MMCONFIG is required for PCI domain support (or "PCI segments" as
ACPI calls them). Only a few mass market OEM boxes exist that need this
-- and they are all pretty new (Opteron multi-core) -- but more are coming.

I have a patch in -mm that works for this. Without the patch, my
sata_mv card and the machine's built-in MPT-Fusion do not appear at all
in PCI bus scans (nor do the associated hard drives) on this production
HP box. So far these machines are rare, /usually/ with a BIOS switch to
turn off PCI domains.

This says nothing about BIOS table reliability, of course. I agree that
MMCONFIG probing is highly unreliable at present. Whitelisting "<2007"
systems like Andi proposed may be the only option in some cases.

Jeff


2006-11-10 14:02:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

Alan Cox wrote:
> Ar Gwe, 2006-11-10 am 07:56 +0100, ysgrifennodd Andi Kleen:
>> I'm sure some people will be upset again if we don't use it.
>> Perhaps there are really users who want to use the PCI-E error handling
>> for example.
>
> And there is now hardware out there which requires accessing PCI-E
> configuration spaces. Disabling it for those cases is not a sensible
> option t all.

Yeah, mmconfig is required for accessing extended PCI configuration
space, and for accessing ANY devices on a non-zero PCI segment (read:
x86 PCI domains).

If you have boot devices on a PCI domain bus, you aren't booting at all,
unless you're running -mm...

Jeff