2013-10-12 17:15:59

by Ingo Molnar

[permalink] [raw]
Subject: [GIT PULL] x86 fixes

Linus,

Please pull the latest x86-urgent-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

# HEAD: 8412da757776727796e9edd64ba94814cc08d536 x86/reboot: Add reboot quirk for Dell Latitude E5410

A build fix and a reboot quirk.

out-of-topic modifications in x86-urgent-for-linus:
-----------------------------------------------------
drivers/iommu/Kconfig # 0dbc607: x86, build, pci: Fix PCI_MSI buil

Thanks,

Ingo

------------------>
Thomas Petazzoni (1):
x86, build, pci: Fix PCI_MSI build on !SMP

Ville Syrj?l? (1):
x86/reboot: Add reboot quirk for Dell Latitude E5410


arch/x86/Kconfig | 6 +++---
arch/x86/kernel/reboot.c | 8 ++++++++
drivers/iommu/Kconfig | 2 +-
3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d..145d703 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -860,7 +860,7 @@ source "kernel/Kconfig.preempt"

config X86_UP_APIC
bool "Local APIC support on uniprocessors"
- depends on X86_32 && !SMP && !X86_32_NON_STANDARD
+ depends on X86_32 && !SMP && !X86_32_NON_STANDARD && !PCI_MSI
---help---
A local APIC (Advanced Programmable Interrupt Controller) is an
integrated interrupt controller in the CPU. If you have a single-CPU
@@ -885,11 +885,11 @@ config X86_UP_IOAPIC

config X86_LOCAL_APIC
def_bool y
- depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC
+ depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI

config X86_IO_APIC
def_bool y
- depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC
+ depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC || PCI_MSI

config X86_VISWS_APIC
def_bool y
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e643e74..7e920bf 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -326,6 +326,14 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "Latitude E6320"),
},
},
+ { /* Handle problems with rebooting on the Latitude E5410. */
+ .callback = set_pci_reboot,
+ .ident = "Dell Latitude E5410",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude E5410"),
+ },
+ },
{ /* Handle problems with rebooting on the Latitude E5420. */
.callback = set_pci_reboot,
.ident = "Dell Latitude E5420",
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index fe302e3..c880eba 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -52,7 +52,7 @@ config AMD_IOMMU
select PCI_PRI
select PCI_PASID
select IOMMU_API
- depends on X86_64 && PCI && ACPI && X86_IO_APIC
+ depends on X86_64 && PCI && ACPI
---help---
With this option you can enable support for AMD IOMMU hardware in
your system. An IOMMU is a hardware component which provides


2013-10-12 18:05:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes

On Sat, Oct 12, 2013 at 10:15 AM, Ingo Molnar <[email protected]> wrote:
>
> Ville Syrjälä (1):
> x86/reboot: Add reboot quirk for Dell Latitude E5410

So we have a number of these quirks, and Dell seems to be one of the
more common factors.

Now, I'd suggest we just trigger it automatically for Dell, but the
thing is, Dell tends to be the *least* differentiating PC maker out
there, so I'm wondering if it is our default reboot order that is just
plain wrong.

We start off trying to reboot using ACPI. Is that really sane? Is
there any reason to not make the default case be to use the PCI
reboot?

The history behind this seems to be that we _used_ to default to
REBOOT_KBD (since that's the traditional method), and fall back to the
triple fault. Then, it was changed to default to ACPI back in 2008 by
Avi Kivity, because those methods apparently could "assert INIT
instead of RESET", and INIT is apparently blocked in Intel VT. See
commit

But that immediately caused problems, and got reverted by commit
8d00450d296d. But despite the _known_ problems with the ACPI method,
then in 2011 Matthew Garrett defaulted to ACPI again in commit
660e34cebf0a, claiming that that is what Windows does. Which is
clearly NOT TRUE, since I can pretty much guarantee that Windows works
fine on Dell computers.

So Windows clearly does not default to the ACPI reboot model, at least
not unconditionally. There must be something we're missing. I'm
wondering if the way Matthew figured out the Windows defaults was to
run it in a virtual environment? Or uses some other flag to determine
whether the ACPI reboot is reliable or not. Because considering the
known issues with VT, maybe what Windows actually does is to use ACPI
only on VT machines, or something like that.

Because since that whole re-introduction of "default to ACPI", the
majority of patches to the reboot.c file seems to be just "ACPI reboot
doesn't work on this machine, so let's quirk it".

The alternative is that our "acpi_reboot" code is just completely
bogus. If you look at it, it does magic things with
ACPI_ADR_SPACE_PCI_CONFIG: and then calls acpi_reset() if that's not
the case. And the two functions have *inconsistent* validation of the
ACPI registers. acpi_reboot() will happily use a zero address forits
ACPI_ADR_SPACE_PCI_CONFIG poking, for example. And those functions
have gone back and forth on their breakage too (like not checking
ACPI_FADT_RESET_REGISTER which broke things entirely etc etc).

Guys, any ideas/suggestions? Because this constant "let's quirk things
because we're clearly doing something wrong" is wrong.

Linus

2013-10-12 18:19:38

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes

On 10/12/2013 11:05 AM, Linus Torvalds wrote:
> On Sat, Oct 12, 2013 at 10:15 AM, Ingo Molnar <[email protected]> wrote:
>>
>> Ville Syrjälä (1):
>> x86/reboot: Add reboot quirk for Dell Latitude E5410
>
> So we have a number of these quirks, and Dell seems to be one of the
> more common factors.
>
> Now, I'd suggest we just trigger it automatically for Dell, but the
> thing is, Dell tends to be the *least* differentiating PC maker out
> there, so I'm wondering if it is our default reboot order that is just
> plain wrong.
>
> We start off trying to reboot using ACPI. Is that really sane? Is
> there any reason to not make the default case be to use the PCI
> reboot?

We tried it, it broke too many systems. However, I'm seriously
considering it for the case of Dell specificall.

The problem with Dell is that they are using something unusual like the
KBC to reboot, *and* apparently it is broken when VT-d is enabled. This
may becase the versions of Windows they're testing on aren't using it,
or because they "expect" the OS to disable VT-d before shutdown.

> The history behind this seems to be that we _used_ to default to
> REBOOT_KBD (since that's the traditional method), and fall back to the
> triple fault. Then, it was changed to default to ACPI back in 2008 by
> Avi Kivity, because those methods apparently could "assert INIT
> instead of RESET", and INIT is apparently blocked in Intel VT. See
> commit
>
> But that immediately caused problems, and got reverted by commit
> 8d00450d296d. But despite the _known_ problems with the ACPI method,
> then in 2011 Matthew Garrett defaulted to ACPI again in commit
> 660e34cebf0a, claiming that that is what Windows does. Which is
> clearly NOT TRUE, since I can pretty much guarantee that Windows works
> fine on Dell computers.
>
> So Windows clearly does not default to the ACPI reboot model, at least
> not unconditionally. There must be something we're missing.

See above.

> I'm
> wondering if the way Matthew figured out the Windows defaults was to
> run it in a virtual environment? Or uses some other flag to determine
> whether the ACPI reboot is reliable or not. Because considering the
> known issues with VT, maybe what Windows actually does is to use ACPI
> only on VT machines, or something like that.
>
> Because since that whole re-introduction of "default to ACPI", the
> majority of patches to the reboot.c file seems to be just "ACPI reboot
> doesn't work on this machine, so let's quirk it".
>
> The alternative is that our "acpi_reboot" code is just completely
> bogus. If you look at it, it does magic things with
> ACPI_ADR_SPACE_PCI_CONFIG: and then calls acpi_reset() if that's not
> the case. And the two functions have *inconsistent* validation of the
> ACPI registers. acpi_reboot() will happily use a zero address forits
> ACPI_ADR_SPACE_PCI_CONFIG poking, for example. And those functions
> have gone back and forth on their breakage too (like not checking
> ACPI_FADT_RESET_REGISTER which broke things entirely etc etc).
>
> Guys, any ideas/suggestions? Because this constant "let's quirk things
> because we're clearly doing something wrong" is wrong.

Very much so... the question is what to do about it. However, as you
see above we actually have some idea about what is wrong.

-hpa

2013-10-12 18:49:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes


* H. Peter Anvin <[email protected]> wrote:

> On 10/12/2013 11:05 AM, Linus Torvalds wrote:
> > On Sat, Oct 12, 2013 at 10:15 AM, Ingo Molnar <[email protected]> wrote:
> >>
> >> Ville Syrj?l? (1):
> >> x86/reboot: Add reboot quirk for Dell Latitude E5410
> >
> > So we have a number of these quirks, and Dell seems to be one of the
> > more common factors.

I raised this with hpa when the latest round of Dell quirks were added,
but I agree that the quirk frequency sucks (for every quirk there's likely
5 times as many boxes still out there that we haven't applied a quirk for
yet ...) and I agree that this needs to be discussed more widely.

> > Now, I'd suggest we just trigger it automatically for Dell, but the
> > thing is, Dell tends to be the *least* differentiating PC maker out
> > there, so I'm wondering if it is our default reboot order that is just
> > plain wrong.
> >
> > We start off trying to reboot using ACPI. Is that really sane? Is
> > there any reason to not make the default case be to use the PCI
> > reboot?
>
> We tried it, it broke too many systems. However, I'm seriously
> considering it for the case of Dell specificall.
>
> The problem with Dell is that they are using something unusual like the
> KBC to reboot, *and* apparently it is broken when VT-d is enabled. This
> may becase the versions of Windows they're testing on aren't using it,
> or because they "expect" the OS to disable VT-d before shutdown.

If Windows doesn't use and enable VT-d then we probably should not use it
either, or we should _at minimum_ disable VT-d again when we shut down.

I.e. if possible we should leave the hardware roughly in the state we got
it from the firmware, that leads probably to the least amount of
surprises.

So I'd be tempted to just not use IRQ remapping until this is cleared up
and fixed properly. I.e. don't quirk the reboot method, quirk VT-d ...

Thanks,

Ingo

2013-10-12 19:30:51

by Matthew Garrett

[permalink] [raw]
Subject: RE: [GIT PULL] x86 fixes

Windows definitely uses the ACPI reboot vector. The problem with this is that, depending on what the ACPI reboot vector points to, it may also trigger some SMM code. Hitting the PCI reboot vector is likely to skip that in most cases, which results in us breaking a different set of systems that rely on SMM code to reconfigure the hardware to the expectations of the firmware entry point.

The most likely cause of the problem that we're seeing here is that we're leaving the hardware in a state that isn't compatible with assumptions made by Dell's SMM code. We've certainly seen that some previous Dell machines have made incorrect assumptions about VT-d, which probably means that we should be destroying that state before reboot. David Woodhouse had been looking into that - I don't know whether we ever actually merged anything to do so.

2013-10-12 19:41:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes

On Sat, Oct 12, 2013 at 12:28 PM, Matthew Garrett
<[email protected]> wrote:
>
> The most likely cause of the problem that we're seeing here is that we're leaving the hardware in a state that isn't compatible with assumptions made by Dell's SMM code. We've certainly seen that some previous Dell machines have made incorrect assumptions about VT-d, which probably means that we should be destroying that state before reboot. David Woodhouse had been looking into that - I don't know whether we ever actually merged anything to do so.

I have access to one of the affected Dell systems, so I'm willing to
try patches..

Linus

2013-10-12 20:36:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes

On 10/12/2013 12:41 PM, Linus Torvalds wrote:
> On Sat, Oct 12, 2013 at 12:28 PM, Matthew Garrett
> <[email protected]> wrote:
>>
>> The most likely cause of the problem that we're seeing here is that we're leaving the hardware in a state that isn't compatible with assumptions made by Dell's SMM code. We've certainly seen that some previous Dell machines have made incorrect assumptions about VT-d, which probably means that we should be destroying that state before reboot. David Woodhouse had been looking into that - I don't know whether we ever actually merged anything to do so.
>
> I have access to one of the affected Dell systems, so I'm willing to
> try patches..
>

The Dell machines definitely use some kind of SMM to reboot.

-hpa

2013-10-15 07:15:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes


* Ingo Molnar <[email protected]> wrote:

>
> * H. Peter Anvin <[email protected]> wrote:
>
> > On 10/12/2013 11:05 AM, Linus Torvalds wrote:
> > > On Sat, Oct 12, 2013 at 10:15 AM, Ingo Molnar <[email protected]> wrote:
> > >>
> > >> Ville Syrj?l? (1):
> > >> x86/reboot: Add reboot quirk for Dell Latitude E5410
> > >
> > > So we have a number of these quirks, and Dell seems to be one of the
> > > more common factors.
>
> I raised this with hpa when the latest round of Dell quirks were added,
> but I agree that the quirk frequency sucks (for every quirk there's
> likely 5 times as many boxes still out there that we haven't applied a
> quirk for yet ...) and I agree that this needs to be discussed more
> widely.

One problem we have is that we have sporadic end user reports only, with
no systematic testing done by Dell on these systems apparently. Is there
anyone at Dell who cares about these reboot problems that upstream kernels
exhibit with VT-d enabled?

Thanks,

Ingo

2013-10-15 10:57:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86 fixes

On Tue, Oct 15, 2013 at 09:15:08AM +0200, Ingo Molnar wrote:
> One problem we have is that we have sporadic end user reports only,
> with no systematic testing done by Dell on these systems apparently.
> Is there anyone at Dell who cares about these reboot problems that
> upstream kernels exhibit with VT-d enabled?

For that, you probably want to CC a couple of Dell people who touched
arch/x86/ :)

$ git log -p arch/x86/ | grep "dell\.com"

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--