Linus,
Please pull 'for-linus' from
git://one.firstfloor.org/home/andi/git/linux-2.6
Andi Kleen:
x86-64: Update defconfig
i386: Update defconfig
i386: Fix PCI BIOS config space access
x86: Terminate the kernel stacks for the unwinder
x86-64: Fix FPU corruption
x86-64: Annotate interrupt frame backlink in interrupt handlers
x86-64: Ignore alignment checks in kernel
i386: Ignore alignment checks in kernel
Ingo Molnar:
i386: fix rwsem build bug on CONFIG_M386=y
Jon Mason:
x86-64: Calgary IOMMU: deobfuscate calgary_init
x86-64: Calgary IOMMU: Fix off by one when calculating register space location
x86-64: Calgary IOMMU: Update Jon's contact info
x86-64: Calgary IOMMU: print PCI bus numbers in hex
Randy Dunlap:
x86-64: Fix compilation without CONFIG_KALLSYMS
MAINTAINERS | 2 +-
arch/i386/defconfig | 41 +++++++++++++++++++++++++++++-------
arch/i386/kernel/process.c | 6 ++++-
arch/i386/kernel/traps.c | 19 ++++++++++++++++-
arch/i386/lib/semaphore.S | 3 +++
arch/i386/pci/direct.c | 2 ++
arch/i386/pci/init.c | 4 ++++
arch/x86_64/defconfig | 43 +++++++++++++++++++++++++++++++-------
arch/x86_64/kernel/entry.S | 8 +++++++
arch/x86_64/kernel/pci-calgary.c | 36 +++++++++++++++++++++-----------
arch/x86_64/kernel/process.c | 7 +++---
arch/x86_64/kernel/setup64.c | 2 +-
arch/x86_64/kernel/traps.c | 24 +++++++++++++++++++--
13 files changed, 159 insertions(+), 38 deletions(-)
Andi Kleen wrote:
> Linus,
>
> Please pull 'for-linus' from
>
> git://one.firstfloor.org/home/andi/git/linux-2.6
>
> Andi Kleen:
> x86-64: Update defconfig
> i386: Update defconfig
> i386: Fix PCI BIOS config space access
Does this fix the following issue:
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.
100% of my x86-64 boxes, AMD or Intel, print this message. And 100% of
them work just fine with MMCONFIG.
I think this rule is far too drastic for real life.
Jeff
On Thu, 5 Oct 2006, Andi Kleen wrote:
>
> Please pull 'for-linus' from
>
> git://one.firstfloor.org/home/andi/git/linux-2.6
Please write that as
Please pull 'for-linus' from
git://one.firstfloor.org/home/andi/git/linux-2.6 for-linus
ie so that I can't miss the branch-name by mistake. I also cut-and-paste
the repo address (trying to re-type it would be just stupid), and to avoid
mistakes, it's _much_ easier if I can just triple-click and cut the whole
line, and then just do
"git pull <paste>"
without having to be careful about cutting at just the right character and
then having to write the branch-name separately.
Also, I think these two are totally bogus:
> Andi Kleen:
> x86-64: Ignore alignment checks in kernel
> i386: Ignore alignment checks in kernel
Have you actually ever seen an alignment check in the kernel? As far as I
know, the AC flag is only effective in user space, and anything else would
be in violation of the whole definition of the AC flag. The i486 manual
explicitly states that AC events are _only_ handled in ring3.
So I think these both are (a) misleading and (b) wrong.
Please don't do this.
The problem with the AC flag was that it leaked through to user space
because task switching didn't save/restore eflags properly. We already
fixed that (same bug as with the NT flag).
Linus
On Thursday 05 October 2006 19:17, Jeff Garzik wrote:
> Does this fix the following issue:
>
> PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
>
> 100% of my x86-64 boxes, AMD or Intel, print this message. And 100% of
> them work just fine with MMCONFIG.
No.
But it isn't really a issue. Basically everything[1] will work fine anyways.
[1] Only thing you're missing AFAIK is PCI Extended Error Reporting.
> I think this rule is far too drastic for real life.
If you have a better proposal please share. I tried a few others, but none
of them could handle all the buggy Intel 9x5 boards that hang on any
mmconfig access (so the "try the first few busses" check already hangs)
Originally I thought
DMI blacklisting would work, but it's on too many systems for that
(and Linus rightfully hated it anyways). ACPI checks also didn't work.
I don't know of any others.
-Andi
> Have you actually ever seen an alignment check in the kernel?
No. It was me extrapolating from Marcus' report, but I apparently
didn't read the specification well enough.
> As far as I
> know, the AC flag is only effective in user space, and anything else would
> be in violation of the whole definition of the AC flag. The i486 manual
> explicitly states that AC events are _only_ handled in ring3.
>
> So I think these both are (a) misleading and (b) wrong.
Ok. Please drop them then.
-Andi
Andi Kleen wrote:
> On Thursday 05 October 2006 19:17, Jeff Garzik wrote:
>
>> Does this fix the following issue:
>>
>> PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
>> PCI: Not using MMCONFIG.
>>
>> 100% of my x86-64 boxes, AMD or Intel, print this message. And 100% of
>> them work just fine with MMCONFIG.
>
> No.
>
> But it isn't really a issue. Basically everything[1] will work fine anyways.
>
> [1] Only thing you're missing AFAIK is PCI Extended Error Reporting.
Not really true, I have some cards which have >256 bytes of config space.
>> I think this rule is far too drastic for real life.
>
> If you have a better proposal please share. I tried a few others, but none
> of them could handle all the buggy Intel 9x5 boards that hang on any
> mmconfig access (so the "try the first few busses" check already hangs)
>
> Originally I thought
> DMI blacklisting would work, but it's on too many systems for that
> (and Linus rightfully hated it anyways). ACPI checks also didn't work.
> I don't know of any others.
It's a bit disappointing, since I keep getting brand new boxes with
brand new BIOSen, but keep hitting this rule.
AFAICS it's a buggy check, since it continues to needlessly blacklist
working boxes.
My proposal is quite simple: "something that works" -- the current
solution obviously does not.
Jeff
On Thursday 05 October 2006 19:43, Jeff Garzik wrote:
> Andi Kleen wrote:
> > On Thursday 05 October 2006 19:17, Jeff Garzik wrote:
> >
> >> Does this fix the following issue:
> >>
> >> PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
> >> PCI: Not using MMCONFIG.
> >>
> >> 100% of my x86-64 boxes, AMD or Intel, print this message. And 100% of
> >> them work just fine with MMCONFIG.
> >
> > No.
> >
> > But it isn't really a issue. Basically everything[1] will work fine anyways.
> >
> > [1] Only thing you're missing AFAIK is PCI Extended Error Reporting.
>
> Not really true, I have some cards which have >256 bytes of config space.
Yes for advanced error handling (which we only support in a few drivers
right now) I'm not aware of any card that uses it for anything else. Do you
have evidence of that?
> >> I think this rule is far too drastic for real life.
> >
> > If you have a better proposal please share. I tried a few others, but none
> > of them could handle all the buggy Intel 9x5 boards that hang on any
> > mmconfig access (so the "try the first few busses" check already hangs)
> >
> > Originally I thought
> > DMI blacklisting would work, but it's on too many systems for that
> > (and Linus rightfully hated it anyways). ACPI checks also didn't work.
> > I don't know of any others.
>
> It's a bit disappointing, since I keep getting brand new boxes with
> brand new BIOSen, but keep hitting this rule.
A lot of new boxes are actually buggy due to a common Intel reference
BIOS bug. There are also a couple of other quirks there.
I suppose it'll only become better once Windows starts using MCFG.
> My proposal is quite simple: "something that works" -- the current
> solution obviously does not.
If you have a patch that works with all known BIOS bugs (including Mac Mini,
a random Intel 975 board and a Asus AMD K8 board with PCI Express) please share it.
-Andi
Andi Kleen wrote:
> On Thursday 05 October 2006 19:43, Jeff Garzik wrote:
>> Andi Kleen wrote:
>>> On Thursday 05 October 2006 19:17, Jeff Garzik wrote:
>>>
>>>> Does this fix the following issue:
>>>>
>>>> PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
>>>> PCI: Not using MMCONFIG.
>>>>
>>>> 100% of my x86-64 boxes, AMD or Intel, print this message. And 100% of
>>>> them work just fine with MMCONFIG.
>>> No.
>>>
>>> But it isn't really a issue. Basically everything[1] will work fine anyways.
>>>
>>> [1] Only thing you're missing AFAIK is PCI Extended Error Reporting.
>> Not really true, I have some cards which have >256 bytes of config space.
>
> Yes for advanced error handling (which we only support in a few drivers
> right now) I'm not aware of any card that uses it for anything else. Do you
> have evidence of that?
I need it to access chip-specific configuration registers on a PCI
Express card. It's under NDA so that's all I can say.
This will become more common as PCI Express becomes more common, as
well. It's largely only luck that we haven't run into more cases like
this. Most PCI-Ex devices, like most PCI devices, just don't need a
whole lot of PCI configuration space.
>>>> I think this rule is far too drastic for real life.
>>> If you have a better proposal please share. I tried a few others, but none
>>> of them could handle all the buggy Intel 9x5 boards that hang on any
>>> mmconfig access (so the "try the first few busses" check already hangs)
>>>
>>> Originally I thought
>>> DMI blacklisting would work, but it's on too many systems for that
>>> (and Linus rightfully hated it anyways). ACPI checks also didn't work.
>>> I don't know of any others.
>> It's a bit disappointing, since I keep getting brand new boxes with
>> brand new BIOSen, but keep hitting this rule.
>
> A lot of new boxes are actually buggy due to a common Intel reference
> BIOS bug. There are also a couple of other quirks there.
>
> I suppose it'll only become better once Windows starts using MCFG.
>
>> My proposal is quite simple: "something that works" -- the current
>> solution obviously does not.
>
> If you have a patch that works with all known BIOS bugs (including Mac Mini,
> a random Intel 975 board and a Asus AMD K8 board with PCI Express) please share it.
Can you then please share the list of known BIOS bugs?
All I have to do on my machines is work around the disable-mmconfig
code, and things start working.
Jeff
tel 975 board and a Asus AMD K8 board with PCI Express) please share it.
>
> Can you then please share the list of known BIOS bugs?
I already did, but here again in detail:
Intel 9x5 hangs when you do any mmconfig access (apparently reference BIOS
fills in bogus address). Check against ACPI resources doesn't catch it
either.
Mac Mini doesn't do any Type1 at all, so a verify pass that checks
mcfg against type1 doesn't work there.
AMD K8 boards with PCI-E bridges have mmconfig only
on some busses, but not on the internal northbridge busses.
MCFG can describe this by listing segments, but the contents
of that on many boards can be described as "fictional" at best.
That is why Linux needs to verify that too.
> All I have to do on my machines is work around the disable-mmconfig
> code, and things start working.
If the choice is between a secret NDA only card with dubious
functionality and booting on lots of modern boards I know what to
choose.
-Andi
Andi Kleen wrote:
> If the choice is between a secret NDA only card with dubious
> functionality and booting on lots of modern boards I know what to
> choose.
That's a strawman argument. There is no need to choose. You can
clearly boot on lots of modern boards with mmconfig just fine. We just
need to narrow down which ones.
Jeff
On Thursday 05 October 2006 22:02, Jeff Garzik wrote:
> Andi Kleen wrote:
> > If the choice is between a secret NDA only card with dubious
> > functionality and booting on lots of modern boards I know what to
> > choose.
>
> That's a strawman argument. There is no need to choose. You can
> clearly boot on lots of modern boards with mmconfig just fine. We just
> need to narrow down which ones.
You want a huge white list or what? And you volunteer to maintain
it?
My current thinking is to just wait for Vista certified machines
and then use DMI year >= 2007 or so to enable it.
Most likely your secret card won't ship before that anyways and
maybe it can be even somehow used without MCFG. And if it can't
it is out of luck right now. Send blame to the BIOS writers
who are unable to get the MCFG tables right.
I don't think white list is the right answer though. If it
was only a small list of machines with broken BIOS it would
be possible to black list, but Intel pretty much sabotated that
by releasing a broken reference BIOS that is used in lots of different
boards.
I considered PCI ID checking for those chipsets instead of
SMBIOS checking, but that has other problems too.
-Andi
Ar Iau, 2006-10-05 am 19:53 +0200, ysgrifennodd Andi Kleen:
> If you have a patch that works with all known BIOS bugs (including Mac Mini,
> a random Intel 975 board and a Asus AMD K8 board with PCI Express) please share it.
Well you currently don't have such a patch so thats a disingenuous
argument. More useful for the short term till this is fixed might be to
provide
pci_requires_mmconfig(dev)
which forces it on for that device regardless and may (internal
implementation detail) print a warning "if your system hangs at this
point.." type message.
That gets us the best of both worlds.
On Friday 06 October 2006 00:00, Alan Cox wrote:
> Ar Iau, 2006-10-05 am 19:53 +0200, ysgrifennodd Andi Kleen:
> > If you have a patch that works with all known BIOS bugs (including Mac Mini,
> > a random Intel 975 board and a Asus AMD K8 board with PCI Express) please share it.
>
> Well you currently don't have such a patch so thats a disingenuous
> argument.
Sure if I had one I wouldn't need one from Jeff
(who is frankly currently deeply in the
"it is much easier to give suggestions if you don't understand the problem"
state)
The current kernel boots everywhere at least and enables mmconfig
if the BIOS marks it in e820 (which at least some
systems do). Getting it to this point wasn't easy and required
several iterations.
> pci_requires_mmconfig(dev)
>
> which forces it on for that device regardless and may (internal
> implementation detail) print a warning "if your system hangs at this
> point.." type message.
and the user will never see that message because it hangs?
I think we had that argument before. IMHO such messages are completely
useless. Hangs are not acceptable no matter what messages are printed
before.
> That gets us the best of both worlds.
Hanging systems?
-Andi
On Thu, 5 Oct 2006, Jeff Garzik wrote:
> Andi Kleen wrote:
> > If the choice is between a secret NDA only card with dubious
> > functionality and booting on lots of modern boards I know what to choose.
>
> That's a strawman argument. There is no need to choose. You can clearly boot
> on lots of modern boards with mmconfig just fine. We just need to narrow down
> which ones.
Jeff, _that_ is the strawman argument.
The thing is, nobody has been able to so far come up with a way to narrow
down which ones.
I think Andi's response was quite on the mark: if you have a patch to
narrow it down, please share. Until then, the fact is, we don't know
_how_, and you're barking up the wrong tree.
It is entirely possible that the only reasonable solution is to
potentially _never_ use MMIO as the main config mechanism (where "never"
is "in the next few years" - at some point we may be able to just say
"screw it, we don't care any more"), and that the drivers that want to use
MMIO for config accesses will have to use special operations for that.
In other words, right now we have
int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
and maybe we will simply have to add a totally new function like
int pci_read_mmio_config_byte(struct pci_dev *dev, int where, u8 *val)
for drivers that literally _require_ the mmio accesses for one reason or
another.
Alternatively, we could just have a "ioremap()" kind of thing, and use
readl/writel on the end result, ie maybe we could do
void __iomem *cfg = mmiocfg_remap(dev);
if (!cfg) {
printk("MMIO PCI configuration cycles not supported\n");
return -EIO;
}
/* Read the extended error register or some-such crud.. */
val = readl(cfg + 0x180);
..
See? The thign is, at least right now the _advantages_ of MMIOCFG are
basically zero for all regular hardware, and as such the disadvantages (in
the form of machines that hang on device discovery) are way way _way_
bigger, and no way will do start using MMIO config cycles by default just
because some unreleased hardware might want to.
But using them explicitly in drivers - that's another thing.
What do you think about this simple solution?
Linus
> In other words, right now we have
>
> int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
>
> and maybe we will simply have to add a totally new function like
>
> int pci_read_mmio_config_byte(struct pci_dev *dev, int where, u8 *val)
>
> for drivers that literally _require_ the mmio accesses for one reason or
> another.
That's easy to decide: if (where >= 256) mmconfig is required.
I'm just afraid it probably won't help if the MCFG is totally broken and
points to some other devices (like on the Intel boards). Then these drivers will
just hang and all of Alan's warning messages won't help with that.
-Andi
On Thu, 5 Oct 2006, Andi Kleen wrote:
>
> Ok. Please drop them then.
Ok, done. Since they weren't the last commits in your branch, I had to do
a certain amount of hand-waving: I merged up to the commit preceding the
AC flag changes, and then I cherry-picked the one later commit separately.
So it's not a normal merge, since I wanted to avoid those two commits.
I've pushed out the result, you should check it out (and drop the top
three commits on your head once you're ok with my result, since they won't
exist in my tree).
Linus
On Friday 06 October 2006 00:59, Linus Torvalds wrote:
>
> On Thu, 5 Oct 2006, Andi Kleen wrote:
> >
> > Ok. Please drop them then.
>
> Ok, done. Since they weren't the last commits in your branch, I had to do
> a certain amount of hand-waving: I merged up to the commit preceding the
> AC flag changes, and then I cherry-picked the one later commit separately.
>
> So it's not a normal merge, since I wanted to avoid those two commits.
>
> I've pushed out the result, you should check it out
Looks good thanks.
> (and drop the top
> three commits on your head once you're ok with my result, since they won't
> exist in my tree).
I only use temporary merge trees from quilt anyways, so it's fine
for me.
-Andi
On Fri, 6 Oct 2006, Andi Kleen wrote:
>
> > In other words, right now we have
> >
> > int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val)
> >
> > and maybe we will simply have to add a totally new function like
> >
> > int pci_read_mmio_config_byte(struct pci_dev *dev, int where, u8 *val)
> >
> > for drivers that literally _require_ the mmio accesses for one reason or
> > another.
>
> That's easy to decide: if (where >= 256) mmconfig is required.
>
> I'm just afraid it probably won't help if the MCFG is totally broken and
> points to some other devices (like on the Intel boards). Then these drivers will
> just hang and all of Alan's warning messages won't help with that.
Sure. I'd actually prefer a separate interface partly for that reason, and
partly also because I think the whole "pci_read_config_xxx()" interface
has always been horrible.
If we had the
void __iomem *cfg = mmiocfg_remap(dev);
interface, we could (fairly easily) blacklist known-bad motherboards if we
needed to, and also, it would allow drivers to check whether mmiocfg is
available. It's possible that some drivers might want it if it exists, but
it wouldn't necessarily be somethign that they _require_, so they could
gracefully handle the case of getting a NULL config space handle back.
For example, for some devices, maybe they'd lose some error handling
capability, but they'd still be able to work otherwise.
We _can_ do the same thing with checking the error return value from
"pci_read_config_xxxx()", and use the "use different access method if the
index is >= 256", but I have to say, that just makes my gag reflex
trigger. Having the same function just silently do two different things
depending on the offset just sounds like a recipe for disaster.
I dunno. I'm not likely to care _that_ deeply about this all, but I do
think that machines that hang on device discovery is just about the worst
possible thing, so I'd much rather have ten machines that can't use their
very rare devices without some explicit kernel command line than have even
_one_ machine that just hangs because MMIOCFG is buggered.
(And we should probably have the "pci=mmiocfg" kernel command line entry
that forces MMIOCFG regardless of any e820 issues, even for normal
accesses).
Linus
> If we had the
>
> void __iomem *cfg = mmiocfg_remap(dev);
>
> interface, we could (fairly easily) blacklist known-bad motherboards if we
> needed to,
With the bad Intel reference BIOS that's far too many.
I've considered using PCI-IDs but even with that it would be quite messy.
> For example, for some devices, maybe they'd lose some error handling
> capability, but they'd still be able to work otherwise.
AFAIK that's always the case right now. Jeff's secret NDA device
is the only known exception so far.
> We _can_ do the same thing with checking the error return value from
> "pci_read_config_xxxx()",
Problem is you get a hang at least on the Intel boards, not a -1
> I dunno. I'm not likely to care _that_ deeply about this all, but I do
> think that machines that hang on device discovery is just about the worst
> possible thing, so I'd much rather have ten machines that can't use their
> very rare devices without some explicit kernel command line than have even
> _one_ machine that just hangs because MMIOCFG is buggered.
Actually they don't hang at discovery, but at the verify step we early use
to decide if mcfg works or not (we go through a few busses and compare
type 1 and mcfg). This is currently separate from the normal discovery
to not burden the generic code with it.
To be fair one issue is that the verify step current ignores the ACPI
information about how many busses there are, but iirc the Intel board hung
pretty early so it likely wouldn't have helped there anyways.
At some point there was hope if checking the MCFG against other
ACPI resources would catch it, but it didn't at least on the board
I tested on.
> (And we should probably have the "pci=mmiocfg" kernel command line entry
> that forces MMIOCFG regardless of any e820 issues, even for normal
> accesses).
Yes that would be a good change.
-Andi
Linus Torvalds wrote:
> If we had the
>
> void __iomem *cfg = mmiocfg_remap(dev);
>
> interface, we could (fairly easily) blacklist known-bad motherboards if we
> needed to, and also, it would allow drivers to check whether mmiocfg is
> available. It's possible that some drivers might want it if it exists, but
> it wouldn't necessarily be somethign that they _require_, so they could
> gracefully handle the case of getting a NULL config space handle back.
>
> For example, for some devices, maybe they'd lose some error handling
> capability, but they'd still be able to work otherwise.
Ugh. Large PCI config space is going to be the norm real soon. That
will just nasty up drivers.
> We _can_ do the same thing with checking the error return value from
> "pci_read_config_xxxx()", and use the "use different access method if the
> index is >= 256", but I have to say, that just makes my gag reflex
> trigger. Having the same function just silently do two different things
> depending on the offset just sounds like a recipe for disaster.
Agreed.
> I dunno. I'm not likely to care _that_ deeply about this all, but I do
> think that machines that hang on device discovery is just about the worst
> possible thing, so I'd much rather have ten machines that can't use their
> very rare devices without some explicit kernel command line than have even
> _one_ machine that just hangs because MMIOCFG is buggered.
>
> (And we should probably have the "pci=mmiocfg" kernel command line entry
> that forces MMIOCFG regardless of any e820 issues, even for normal
> accesses).
Agreed.
Jeff
Signed-off-by: Jeff Garzik <[email protected]>
---
arch/i386/pci/common.c | 4 ++++
arch/i386/pci/mmconfig.c | 3 ++-
arch/x86_64/pci/mmconfig.c | 3 ++-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 68bce19..38d9f4f 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -237,6 +237,10 @@ #ifdef CONFIG_PCI_MMCONFIG
pci_probe &= ~PCI_PROBE_MMCONF;
return NULL;
}
+ else if (!strcmp(str, "mmconf")) {
+ pci_probe |= PCI_PROBE_MMCONF | PCI_NO_CHECKS;
+ return NULL;
+ }
#endif
else if (!strcmp(str, "noacpi")) {
acpi_noirq_set();
diff --git a/arch/i386/pci/mmconfig.c b/arch/i386/pci/mmconfig.c
index d0c3da3..056cb0a 100644
--- a/arch/i386/pci/mmconfig.c
+++ b/arch/i386/pci/mmconfig.c
@@ -237,7 +237,8 @@ void __init pci_mmcfg_init(int type)
/* Only do this check when type 1 works. If it doesn't work
assume we run on a Mac and always use MCFG */
- if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].base_address,
+ if ((type == 1) && (!(pci_probe & PCI_NO_CHECKS)) &&
+ !e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {
printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %x is not E820-reserved\n",
diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 7732f42..d942fc7 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -209,7 +209,8 @@ void __init pci_mmcfg_init(int type)
/* Only do this check when type 1 works. If it doesn't work
assume we run on a Mac and always use MCFG */
- if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].base_address,
+ if ((type == 1) && (!(pci_probe & PCI_NO_CHECKS)) &&
+ !e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {
printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %x is not E820-reserved\n",
On Thu, 5 Oct 2006, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > (And we should probably have the "pci=mmiocfg" kernel command line entry
> > that forces MMIOCFG regardless of any e820 issues, even for normal
> > accesses).
>
> Something like this?
Yes, except I look at the resulting messy conditional:
if ((type == 1) && (!(pci_probe & PCI_NO_CHECKS)) &&
!e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {
and it's totally unreadable, so how about making this a helper inline
function like
static inline int validate_mmcfg(int type, unsigned long address)
{
/*
* If type 1 accesses don't work, assume we run on a
* Mac and always use MCFG
*/
if (type != 1)
return 1;
/*
* If the user asked us to not check the MMIOCFG base
* address, just trust it.
*/
if (pci_probe & PCI_NO_CHECKS)
return 1;
/*
* Otherwise require that it's marked reserved in
* the e820 tables
*/
return e820_all_mapped(address,
address + MMCONFIG_APER_MIN,
E820_RESERVED);
}
and then just saying
if (!validate_mmcfg(type, pci_mmcfg_config[0].base_address)) {
.. complain and exit ..
which just seems a hell of a lot prettier to me.
It also means that _if_ we ever figure out other ways to double-check the
address automatically (or we have some automatic way of saying "that can't
be valid"), we can do so in that "validate" function, without adding more
and more horrible code.
What say you? Let's try to keep the code readable (and preferably make it
more so..)
Linus
Ar Iau, 2006-10-05 am 23:44 +0200, ysgrifennodd Andi Kleen:
> I think we had that argument before. IMHO such messages are completely
> useless. Hangs are not acceptable no matter what messages are printed
> before.
Oh so you plan to fix the iommu/aacraid problem you always said you
wouldn't fix ?
> > That gets us the best of both worlds.
>
> Hanging systems?
Working systems because the only hang case will be if you use hardware
that requires the new goodies on a box that is *actually* broken rather
than one which is misdetected by the paranoid check code.
Alan
On Thu, 2006-10-05 at 15:45 -0700, Linus Torvalds wrote:
>
> On Thu, 5 Oct 2006, Jeff Garzik wrote:
>
> > Andi Kleen wrote:
> > > If the choice is between a secret NDA only card with dubious
> > > functionality and booting on lots of modern boards I know what to choose.
> >
> > That's a strawman argument. There is no need to choose. You can clearly boot
> > on lots of modern boards with mmconfig just fine. We just need to narrow down
> > which ones.
>
> Jeff, _that_ is the strawman argument.
>
> The thing is, nobody has been able to so far come up with a way to narrow
> down which ones.
>
> I think Andi's response was quite on the mark: if you have a patch to
> narrow it down, please share. Until then, the fact is, we don't know
> _how_, and you're barking up the wrong tree.
we can do a tiny bit better than the current code; some chipsets have
the address of the MMIO region stored in their config space; so we can
get to that using the old method and validate the acpi code with that.
I'm (in the background) working on collecting which chipsets have this;
it seems that at least several Intel ones do.
On Fri, 6 Oct 2006, Arjan van de Ven wrote:
>
> we can do a tiny bit better than the current code; some chipsets have
> the address of the MMIO region stored in their config space; so we can
> get to that using the old method and validate the acpi code with that.
Yes. I think trusting ACPI is _always_ a mistake. It's insane. We should
never ask the firmware for any data that we can just figure out ourselves.
And we should tell all hardware companies that firmware tables are stupid,
and that we just want to know what the hell the registers MEAN!
I've certainly tried to tell Intel that. I think they may even have heard
me occasionally.
I can't understand why some people _still_ think ACPI is a good idea..
Linus
On Fri, 2006-10-06 at 09:07 -0700, Linus Torvalds wrote:
>
> On Fri, 6 Oct 2006, Arjan van de Ven wrote:
> >
> > we can do a tiny bit better than the current code; some chipsets have
> > the address of the MMIO region stored in their config space; so we can
> > get to that using the old method and validate the acpi code with that.
>
> Yes. I think trusting ACPI is _always_ a mistake. It's insane. We should
> never ask the firmware for any data that we can just figure out ourselves.
>
> And we should tell all hardware companies that firmware tables are stupid,
> and that we just want to know what the hell the registers MEAN!
the chipset docs do tell us that. I can even buy a case of "if we know
the chipset, just exclusively use the address from that ignore acpi".
I'm trying to find out how uniform the register format is across
chipsets, it seems many are the same, but there might be 2 varieties ;(
>
> I've certainly tried to tell Intel that. I think they may even have heard
> me occasionally.
>
> I can't understand why some people _still_ think ACPI is a good idea..
I'm the wrong person to rant about ACPI against ;)
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
On Friday, 6 October 2006 18:07, Linus Torvalds wrote:
>
> On Fri, 6 Oct 2006, Arjan van de Ven wrote:
> >
> > we can do a tiny bit better than the current code; some chipsets have
> > the address of the MMIO region stored in their config space; so we can
> > get to that using the old method and validate the acpi code with that.
>
> Yes. I think trusting ACPI is _always_ a mistake. It's insane. We should
> never ask the firmware for any data that we can just figure out ourselves.
>
> And we should tell all hardware companies that firmware tables are stupid,
> and that we just want to know what the hell the registers MEAN!
>
> I've certainly tried to tell Intel that. I think they may even have heard
> me occasionally.
>
> I can't understand why some people _still_ think ACPI is a good idea..
I violently agree.
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
OK, lets' take K8 processor performance states (p-states) as an example:
BIOS, which should know 'best' about a given platform, needs to
communicate to the OS what 'voltage' (VID code) is correct for given
'frequency' (FID),
and it can do that via ACPI processor tables (_PSS). Otherwise, OS code
is left with having to manage a HUGE amount 'specifics' (processor
models), and endless driver revisions to account for new parts.
So, one can argue that there's merit on having ACPI, it's just a shame
when BIOS doesn't get it right! (thus the justification for lack of
'trust'... the same can probably be said about other BIOS issues, not
just ACPI)
Leo Duran
-----Original Message-----
From: Rafael J. Wysocki [mailto:[email protected]]
Sent: Friday, October 06, 2006 5:01 PM
To: Linus Torvalds
Cc: Arjan van de Ven; Jeff Garzik; Andi Kleen; [email protected];
[email protected]
Subject: Re: [discuss] Re: Please pull x86-64 bug fixes
On Friday, 6 October 2006 18:07, Linus Torvalds wrote:
>
> On Fri, 6 Oct 2006, Arjan van de Ven wrote:
> >
> > we can do a tiny bit better than the current code; some chipsets
have
> > the address of the MMIO region stored in their config space; so we
can
> > get to that using the old method and validate the acpi code with
that.
>
> Yes. I think trusting ACPI is _always_ a mistake. It's insane. We
should
> never ask the firmware for any data that we can just figure out
ourselves.
>
> And we should tell all hardware companies that firmware tables are
stupid,
> and that we just want to know what the hell the registers MEAN!
>
> I've certainly tried to tell Intel that. I think they may even have
heard
> me occasionally.
>
> I can't understand why some people _still_ think ACPI is a good idea..
I violently agree.
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
Duran, Leo wrote:
> OK, lets' take K8 processor performance states (p-states) as an example:
> BIOS, which should know 'best' about a given platform, needs to
> communicate to the OS what 'voltage' (VID code) is correct for given
> 'frequency' (FID),
> and it can do that via ACPI processor tables (_PSS). Otherwise, OS code
> is left with having to manage a HUGE amount 'specifics' (processor
> models), and endless driver revisions to account for new parts.
>
> So, one can argue that there's merit on having ACPI, it's just a shame
> when BIOS doesn't get it right! (thus the justification for lack of
> 'trust'... the same can probably be said about other BIOS issues, not
> just ACPI)
That's pretty much it in a nutshell... Since most BIOS are largely
tested and qualified only on That Other OS, Linux often gets the short
end of the stick. We have a long history of running into BIOS bugs, and
having to work around them. We've learned the hard way that programming
the "bare metal" is often the only reliable way to get things done.
Jeff
On Fri, 6 Oct 2006, Duran, Leo wrote:
>
> So, one can argue that there's merit on having ACPI
Not really.
The thing is, you have two choices:
- define interfaces in hardware
- not doing so, and then trying to paper it over with idiotic tables.
Sadly, Intel decided that they should do the latter, and invented ACPI.
If instead they had decided to just let the hardware describe itself, we
wouldn't need ACPI.
There are two kinds of interfaces: the simple ones, and the broken ones.
The simple ones are better defined by the hardware people, and they work.
They are of the kind:
"The pointer to the MMIO config area is readable from IO port at offset
cf4h"
The broken ones are the ones where hardware people know what they want to
do, but they think the interface is sucky and complicated, so they make it
_doubly_ sucky by then saying "we'll describe it in the BIOS tables", so
that now there is another (incompetent) group that can _also_ screw things
up. Yeehaa!
The thing is, Intel did really well for _years_ with just defining
hardware interfaces. The PIIX IDE controller interfaces were a great
success, and worked for over a decade. So here's a question for you:
"After having done something successfully for a decade, what do you do?
Do you
(a) Try to emulate a known successful strategy?
(b) Put a committee together to try to come up with a new and more
'generic' solution, since you were only successful for closer to
fifteen years."
Guess which one is ACPI.
Linus
On Fri, 2006-10-06 at 19:06 -0400, Jeff Garzik wrote:
> Duran, Leo wrote:
> > OK, lets' take K8 processor performance states (p-states) as an example:
> > BIOS, which should know 'best' about a given platform, needs to
> > communicate to the OS what 'voltage' (VID code) is correct for given
> > 'frequency' (FID),
> > and it can do that via ACPI processor tables (_PSS). Otherwise, OS code
> > is left with having to manage a HUGE amount 'specifics' (processor
> > models), and endless driver revisions to account for new parts.
> >
> > So, one can argue that there's merit on having ACPI, it's just a shame
> > when BIOS doesn't get it right! (thus the justification for lack of
> > 'trust'... the same can probably be said about other BIOS issues, not
> > just ACPI)
>
> That's pretty much it in a nutshell... Since most BIOS are largely
> tested and qualified only on That Other OS, Linux often gets the short
> end of the stick.
fwiw we're trying to get this changed; Intel released the Linux-ready
firmware developer kit recently which is designed to make it real easy
for bios people to test their bios with linux....
see http://www.linuxfirmwarekit.org
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
On Fri, 6 Oct 2006, Linus Torvalds wrote:
>On Fri, 6 Oct 2006, Duran, Leo wrote:
>> So, one can argue that there's merit on having ACPI
>
>Not really.
>
>The thing is, you have two choices:
> - define interfaces in hardware
> - not doing so, and then trying to paper it over with idiotic tables.
I would have to agree that having HW describe itself makes sense, and
would certainly negate the need for 'static' ACPI tables that attempt
doing that.
But, allow me to cite another example to reinforce my point about the
merit of ACPI: Staying with processor power management interfaces, how
about 'dynamic' interfaces such as _PPC? _PPC describes to the OS the
platform's desired behavior based on some event, like unplugging the
power-cord on a laptop - I find merit on that kind of platform-to-OS
communication mechanism (I don't like the idea of having the platform
making decisions & taking actions behind the OS's back... and even if it
had to, I like the idea of at least providing some kind of notification,
which is possible via ACPI interfaces).
Leo Duran.
On Sun, 8 Oct 2006, Duran, Leo wrote:
>
> But, allow me to cite another example to reinforce my point about the
> merit of ACPI: Staying with processor power management interfaces, how
> about 'dynamic' interfaces such as _PPC? _PPC describes to the OS the
> platform's desired behavior based on some event, like unplugging the
> power-cord on a laptop - I find merit on that kind of platform-to-OS
> communication mechanism (I don't like the idea of having the platform
> making decisions & taking actions behind the OS's back... and even if it
> had to, I like the idea of at least providing some kind of notification,
> which is possible via ACPI interfaces).
The thing is, we'd just have been much better off if Intel had just
specified some perfectly regular interrupt for a "power management event",
coming out of a PCI device for that thing (say, the southbridge?), and
just tried to standardize it.
We'd have far fewer bugs that way. As it is, we need to often know about
how the hardware works _anyway_, just to fix up the problems that not
knowing about it causes.
Linus
On Friday 06 October 2006 13:14, Alan Cox wrote:
> Ar Iau, 2006-10-05 am 23:44 +0200, ysgrifennodd Andi Kleen:
> > I think we had that argument before. IMHO such messages are completely
> > useless. Hangs are not acceptable no matter what messages are printed
> > before.
>
> Oh so you plan to fix the iommu/aacraid problem you always said you
> wouldn't fix ?
They don't cause hangs, just IO errors (or panics if you configure
iommu debugging)
Actually I plan to fix this one, but it will require more work.
Basically the plan is to make the current dma zone variable sized
and get rid of all GFP_DMA allocations. Merge the current soft iommu
with that new dma allocator. Then make sure all allocations
that need such low dma use a mask argument to some allocator.
Then we can have a option to configure the size of the dma low zone.
Users of broken hardware just configure a larger size.
-Andi