2019-08-29 09:14:10

by Kai-Heng Feng

Subject: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

Some Coffee Lake platforms have a skewed HPET timer once the SoC has
entered PC10, which in turn gets the TSC marked as an unstable clocksource.

Harry Pan identified it as a firmware bug [1].

To prevent creating a circular dependency between HPET and TSC, let's
disable HPET on affected platforms.

[1]: https://lore.kernel.org/lkml/[email protected]/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183

Signed-off-by: Kai-Heng Feng <[email protected]>
---
arch/x86/kernel/hpet.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c6f791bc481e..07e9ec6f85b6 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,7 +7,9 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>

+#include <asm/cpu_device_id.h>
 #include <asm/hpet.h>
+#include <asm/intel-family.h>
 #include <asm/time.h>

 #undef pr_fmt
@@ -806,6 +808,12 @@ static bool __init hpet_counting(void)
 	return false;
 }

+static const struct x86_cpu_id hpet_blacklist[] __initconst = {
+	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
+	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
+	{ }
+};
+
 /**
  * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
  */
@@ -819,6 +827,9 @@ int __init hpet_enable(void)
 	if (!is_hpet_capable())
 		return 0;

+	if (!hpet_force_user && x86_match_cpu(hpet_blacklist))
+		return 0;
+
 	hpet_set_mapping();
 	if (!hpet_virt_address)
 		return 0;
--
2.17.1


2019-08-29 12:15:21

by Thomas Gleixner

Subject: Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

On Thu, 29 Aug 2019, Kai-Heng Feng wrote:

> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> PC10, and marked TSC as unstable clocksource as result.

So here you talk about Coffee Lake and in the patch you use KABYLAKE.

> Harry Pan identified it's a firmware bug [1].
>
> To prevent creating a circular dependency between HPET and TSC, let's
> disable HPET on affected platforms.
>
> [1]: https://lore.kernel.org/lkml/[email protected]/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183

Please use Link: tags for references, not [1] and not Bugzilla:

> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },

So this disables HPET on all Kaby Lake variants not just on the affected
Coffee Lakes. I know that I rejected the initial patch with the random
stepping cutoff...

https://lore.kernel.org/lkml/[email protected]

In the other attempt to 'fix' this I asked for clarification, but silence
from Intel after this:

https://lore.kernel.org/lkml/[email protected]

Can Intel please provide some useful information about this finally?

Thanks,

tglx



2019-08-29 14:17:52

by Kai-Heng Feng

Subject: Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

at 20:13, Thomas Gleixner <[email protected]> wrote:

> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>
>> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
>> PC10, and marked TSC as unstable clocksource as result.
>
> So here you talk about Coffee Lake and in the patch you use KABYLAKE.

Coffeelake has the same model number as Kabylake.

>
>> Harry Pan identified it's a firmware bug [1].
>>
>> To prevent creating a circular dependency between HPET and TSC, let's
>> disable HPET on affected platforms.
>>
>> [1]:
>> https://lore.kernel.org/lkml/[email protected]/
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
>
> Please use Link: tags for references, not [1] and not Bugzilla:

Ok.

>
>> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
>> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
>> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
>
> So this disables HPET on all Kaby Lake variants not just on the affected
> Coffee Lakes. I know that I rejected the initial patch with the random
> stepping cutoff...
>
> https://lore.kernel.org/lkml/[email protected]
>
> In the other attempt to 'fix' this I asked for clarification, but silence
> from Intel after this:
>
> https://lore.kernel.org/lkml/[email protected]
>
> Can Intel please provide some useful information about this finally?

Hopefully Intel can provide more info.

I know we should find the root cause rather than stopping at "it's a
firmware bug", but users are already affected by this issue [1].
Is there any better short-term workaround?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=204537

Kai-Heng

>
> Thanks,
>
> tglx


2019-08-29 19:47:49

by Thomas Gleixner

Subject: Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> at 20:13, Thomas Gleixner <[email protected]> wrote:
> > On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> >
> > > Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> > > PC10, and marked TSC as unstable clocksource as result.
> >
> > So here you talk about Coffee Lake and in the patch you use KABYLAKE.
>
> Coffeelake has the same model number as Kabylake.

Yeah, just a bit more text explaining that would be helpful.

> > > +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> > > + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> > > + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
> >
> > So this disables HPET on all Kaby Lake variants not just on the affected
> > Coffee Lakes. I know that I rejected the initial patch with the random
> > stepping cutoff...
> >
> > https://lore.kernel.org/lkml/[email protected]
> >
> > In the other attempt to 'fix' this I asked for clarification, but silence
> > from Intel after this:
> >
> > https://lore.kernel.org/lkml/[email protected]
> >
> > Can Intel please provide some useful information about this finally?
>
> Hopefully Intel can provide more info.
>
> I know we should find the root cause rather than stopping at "it’s a firmware
> bug”, but users are already affected by this issue [1].
> Is there any better short-term workaround?

Not really. And if Intel stays silent, I'm just going to apply it as is
along with a stable tag.

Thanks,

tglx

2019-08-29 21:39:29

by Thomas Gleixner

Subject: [RFD] x86/tsc: Loosen the requirements for watchdog - (was x86/hpet: Disable HPET on Intel Coffee Lake)

On Thu, 29 Aug 2019, Thomas Gleixner wrote:
> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> > I know we should find the root cause rather than stopping at "it’s a firmware
> > bug”, but users are already affected by this issue [1].
> > Is there any better short-term workaround?
>
> Not really. And if Intel stays silent, I'm just going to apply it as is
> along with a stable tag.

Summary for those who are new on CC:

Coffee Lake machines have an HPET which gets wrecked by the C10 state. That
causes the TSC clocksource watchdog to misbehave, which is not surprising
as that's like trying to monitor an atomic clock with a sundial.
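
To illustrate why a skewed reference takes the TSC down with it, here is a
minimal userspace sketch of the kind of comparison a clocksource watchdog
performs. It is not the kernel's implementation; the function name and the
1/16 second threshold are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold: 1/16 s in nanoseconds (assumption of this sketch). */
#define SKETCH_THRESHOLD_NS (1000000000LL >> 4)

/*
 * Compare the time elapsed according to the monitored clocksource (cs)
 * with the time elapsed according to the reference watchdog (wd) over the
 * same interval. The check cannot tell which of the two is wrong: any
 * large disagreement is blamed on the monitored clocksource.
 */
bool sketch_cs_looks_unstable(int64_t cs_delta_ns, int64_t wd_delta_ns)
{
	int64_t diff = cs_delta_ns - wd_delta_ns;

	assert(cs_delta_ns >= 0 && wd_delta_ns >= 0);
	if (diff < 0)
		diff = -diff;
	return diff > SKETCH_THRESHOLD_NS;
}
```

With equal deltas the check passes. If the reference loses 200 ms over a
500 ms interval, the monitored clocksource is the one reported as unstable,
even though it kept correct time.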

So the intention is to disable HPET on those machines, which also affects
Kaby Lake CPUs as they share the model number and just differ in the
stepping. Unless we get precise information from Intel about which
steppings are affected and that these are the only ones, we won't go down
the stepping road as that is going to be an endless whack-a-mole game.
Tried that before and got burned...
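
The model versus stepping point can be sketched as follows. This is an
illustration only: the helper names are hypothetical, the vendor check is
left out, and the two signatures used in the usage note (0x906e9 and
0x906ea) are assumed examples of two steppings that share model 0x9e. The
model numbers 0x8e and 0x9e are the values behind
INTEL_FAM6_KABYLAKE_MOBILE and INTEL_FAM6_KABYLAKE_DESKTOP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode the CPUID leaf 1 EAX signature into family/model/stepping. */
struct sketch_cpu_sig sketch_decode_sig(uint32_t eax)
{
	struct sketch_cpu_sig sig;

	sig.stepping = eax & 0xf;
	sig.model = (eax >> 4) & 0xf;
	sig.family = (eax >> 8) & 0xf;
	/* The extended model bits apply to family 6 and family 15. */
	if (sig.family == 0x6 || sig.family == 0xf)
		sig.model |= ((eax >> 16) & 0xf) << 4;
	/* The extended family bits apply to family 15 only. */
	if (sig.family == 0xf)
		sig.family += (eax >> 20) & 0xff;
	assert(sig.model <= 0xff);	/* model is at most 8 bits wide */
	return sig;
}

/* Family 6 models matched by this sketch; the stepping is ignored. */
static const unsigned int sketch_models[] = { 0x8e, 0x9e };

bool sketch_hpet_blacklisted(uint32_t eax)
{
	struct sketch_cpu_sig sig = sketch_decode_sig(eax);
	size_t i;

	if (sig.family != 6)
		return false;
	for (i = 0; i < sizeof(sketch_models) / sizeof(sketch_models[0]); i++)
		if (sig.model == sketch_models[i])
			return true;
	return false;
}
```

Both 0x906e9 and 0x906ea decode to family 6, model 0x9e and differ only in
the stepping nibble, so a model-only match catches both of them.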

While disabling HPET sounds trivial, this can have side effects.

If the HPET is not available for whatever reason, the kernel will use
ACPI_PMTIMER as the fallback clocksource for monitoring the TSC, provided
the affected systems actually advertise it. If they do not, that will
effectively disable NOHZ and high resolution timers. Disabling NOHZ is a
pain for power consumption and those machines are mostly laptops, I assume.
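
Whether a machine advertises a fallback can be checked from userspace via
the clocksource sysfs files. The following is a small sketch with
hypothetical helper names; it only reads a line from a file and looks for a
whitespace-separated name in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf. Returns 0 on success, -1 on error. */
int sketch_read_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 1);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

/* Check whether a whitespace-separated list contains exactly this name. */
bool sketch_list_has(const char *list, const char *name)
{
	size_t n = strlen(name);

	assert(n > 0);
	while (*list) {
		size_t tok;

		list += strspn(list, " \t\n");
		tok = strcspn(list, " \t\n");
		if (tok == n && strncmp(list, name, n) == 0)
			return true;
		list += tok;
	}
	return false;
}
```

Reading /sys/devices/system/clocksource/clocksource0/available_clocksource
with sketch_read_line() and passing the result to
sketch_list_has(buf, "acpi_pm") shows whether the ACPI PM timer is
available on a given system.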

Now there is something we can consider doing:

These CPUs finally have a working and usable TSC - knock on wood!

Just for the record: That's 20+ years after we started to ask for it!

The TSC has constant frequency and does not stop in deeper C-states. Aside
from that these CPUs have the TSC_ADJUST MSR which allows us to figure out
when the BIOS/SMM manages to wreck the TSC on a CPU by writing to it for
completely wrong reasons.
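
The detection idea can be modelled in a few lines: a direct write to the
TSC moves TSC_ADJUST by the same delta, so a TSC_ADJUST value which differs
from the one the kernel expects gives the write away. The sketch below is a
pure simulation with hypothetical names; it does not access any MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated per-CPU state; a model for illustration, not real MSR access. */
struct sketch_cpu {
	uint64_t tsc;		/* time stamp counter */
	int64_t tsc_adjust;	/* models the TSC_ADJUST MSR */
};

/* Normal operation: the counter advances, TSC_ADJUST stays untouched. */
void sketch_tick(struct sketch_cpu *cpu, uint64_t cycles)
{
	assert(cpu != NULL);
	cpu->tsc += cycles;
}

/* A direct write to the TSC also moves TSC_ADJUST by the same delta. */
void sketch_write_tsc(struct sketch_cpu *cpu, uint64_t new_tsc)
{
	assert(cpu != NULL);
	cpu->tsc_adjust += (int64_t)(new_tsc - cpu->tsc);
	cpu->tsc = new_tsc;
}

/* Compare TSC_ADJUST against the value the OS last saw or wrote. */
bool sketch_tsc_tampered(const struct sketch_cpu *cpu, int64_t expected)
{
	assert(cpu != NULL);
	return cpu->tsc_adjust != expected;
}
```

Starting from tsc = 1000 and tsc_adjust = 0, ticking 500 cycles leaves
TSC_ADJUST at 0. A subsequent write of 100 to the TSC changes TSC_ADJUST to
-1400, which the check reports.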

So we could finally start to trust TSC at least on single socket systems.

Multi-socket is a different story as the sockets might drift apart for
reasons which I really don't want to discuss in this context for CoC's
sake. So we definitely want a watchdog there as TSC_ADJUST is not able to
catch those issues.

So if we have to disable the HPET on Kaby Lake altogether unless Intel
comes up with a clever fix, i.e. poking at the right registers, then I
think we should also lift the TSC watchdog restrictions on these machines
if they are single socket, which they are as the affected CPUs so far are
mobile and client types.

Also given the fact that we get more and more 'reduced' hardware exposed
via ACPI and we already dealt with quite some fallout with various related
issues due to that, I fear we need to bite this bullet anyway anytime soon.

But TBH, 20+ years exposure to subtly wrecked timer hardware has left quite
a few scars.

I put AMD/HYGON folks on CC as well as they will run into similar problems
sooner rather than later and their CPUs still do not have the TSC_ADJUST
MSR which is paramount to loosen the watchdog restrictions. Hint, hint,
hint...

Thoughts?

Thanks,

tglx

2019-08-30 03:50:20

by Daniel Drake

Subject: Re: [RFD] x86/tsc: Loosen the requirements for watchdog - (was x86/hpet: Disable HPET on Intel Coffee Lake)

Hi Thomas,

On Fri, Aug 30, 2019 at 5:38 AM Thomas Gleixner <[email protected]> wrote:
> So if we have to disable the HPET on Kaby Lake altogether unless Intel
> comes up with a clever fix, i.e. poking at the right registers, then I
> think we should also lift the TSC watchdog restrictions on these machines
> if they are single socket, which they are as the affected CPUs so far are
> mobile and client types.
>
> Also given the fact that we get more and more 'reduced' hardware exposed
> via ACPI and we already dealt with quite some fallout with various related
> issues due to that, I fear we need to bite this bullet anyway anytime soon.

Thanks for the explanation here!

My experience in this area is basically limited to the clock-related
issues that I've sent your way recently, so I don't have deep wisdom
to draw upon, but what you wrote here makes sense to me.

If you can outline a testing procedure, we can test upcoming patches
on Coffee Lake and Kaby Lake consumer laptops.

Thanks,
Daniel

2019-10-01 15:50:36

by Kai-Heng Feng

Subject: Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

Hi Thomas,

> On Aug 30, 2019, at 03:45, Thomas Gleixner <[email protected]> wrote:
>
> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>> at 20:13, Thomas Gleixner <[email protected]> wrote:
>>> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>>>
>>>> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
>>>> PC10, and marked TSC as unstable clocksource as result.
>>>
>>> So here you talk about Coffee Lake and in the patch you use KABYLAKE.
>>
>> Coffeelake has the same model number as Kabylake.
>
> Yeah, just a bit more text explaining that would be helpful.
>
>>>> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
>>>> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
>>>> + { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
>>>
>>> So this disables HPET on all Kaby Lake variants not just on the affected
>>> Coffee Lakes. I know that I rejected the initial patch with the random
>>> stepping cutoff...
>>>
>>> https://lore.kernel.org/lkml/[email protected]
>>>
>>> In the other attempt to 'fix' this I asked for clarification, but silence
>>> from Intel after this:
>>>
>>> https://lore.kernel.org/lkml/[email protected]
>>>
>>> Can Intel please provide some useful information about this finally?
>>
>> Hopefully Intel can provide more info.
>>
>> I know we should find the root cause rather than stopping at "it’s a firmware
>> bug”, but users are already affected by this issue [1].
>> Is there any better short-term workaround?
>
> Not really. And if Intel stays silent, I'm just going to apply it as is
> along with a stable tag.

It seems there are still no updates from Intel. Can we have this patch in v5.4?

Kai-Heng

>
> Thanks,
>
> tglx

2019-10-09 06:02:23

by Feng Tang

Subject: Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake

Hi Kai-Heng,

On Thu, Aug 29, 2019 at 5:14 PM Kai-Heng Feng
<[email protected]> wrote:
>
> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> PC10, and marked TSC as unstable clocksource as result.
>
> Harry Pan identified it's a firmware bug [1].
>
> To prevent creating a circular dependency between HPET and TSC, let's
> disable HPET on affected platforms.

Sorry for chiming in late.

We have disabled the HPET for Baytrail platforms in
commit 62187910b0fc ("x86/intel: Add quirk to disable HPET for the
Baytrail platform")

which added a quirk to the early_qrk[] table:
@@ -567,6 +577,12 @@ static struct chipset early_qrk[] __initdata = {
+	/*
+	 * HPET on current version of Baytrail platform has accuracy
+	 * problems, disable it for now:
+	 */
+	{ PCI_VENDOR_ID_INTEL, 0x0f00,
+		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},

So maybe we can unify the method to disable HPET. (BTW, I have no
information about the health of the HPET on Kaby Lake; I just want to
comment on the disabling method.)
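
As a rough illustration of the quirk-table approach, here is a simplified
userspace sketch. The names are hypothetical and the table matches only on
PCI vendor and device ID, whereas the real entry quoted above also carries
a class and class mask. The Baytrail device ID 0x0f00 is taken from that
entry; 0x8086 is the Intel PCI vendor ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set by the quirk callback; stands in for the kernel's HPET disable flag. */
bool sketch_hpet_disabled;

void sketch_force_disable_hpet(void)
{
	sketch_hpet_disabled = true;
}

struct sketch_quirk {
	uint16_t vendor;
	uint16_t device;
	void (*fn)(void);
};

static const struct sketch_quirk sketch_quirks[] = {
	/* Baytrail host bridge, as in the quoted early quirk. */
	{ 0x8086, 0x0f00, sketch_force_disable_hpet },
};

/* Run every quirk matching the device; returns the number of quirks run. */
int sketch_apply_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;
	int applied = 0;

	for (i = 0; i < sizeof(sketch_quirks) / sizeof(sketch_quirks[0]); i++) {
		const struct sketch_quirk *q = &sketch_quirks[i];

		if (q->vendor != vendor || q->device != device)
			continue;
		assert(q->fn != NULL);
		q->fn();
		applied++;
	}
	return applied;
}
```

Matching on a PCI device ID instead of the CPU model would allow a
narrower selection than family and model, provided the affected platforms
can be told apart by such an ID.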

Thanks,
Feng

>
> [1]: https://lore.kernel.org/lkml/[email protected]/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
>
> Signed-off-by: Kai-Heng Feng <[email protected]>
> ---
> arch/x86/kernel/hpet.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index c6f791bc481e..07e9ec6f85b6 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -7,7 +7,9 @@
>  #include <linux/cpu.h>
>  #include <linux/irq.h>
>
> +#include <asm/cpu_device_id.h>
>  #include <asm/hpet.h>
> +#include <asm/intel-family.h>
>  #include <asm/time.h>
>
>  #undef pr_fmt
> @@ -806,6 +808,12 @@ static bool __init hpet_counting(void)
>  	return false;
>  }
>
> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
> +	{ }
> +};
> +
>  /**
>   * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
>   */
> @@ -819,6 +827,9 @@ int __init hpet_enable(void)
>  	if (!is_hpet_capable())
>  		return 0;
>
> +	if (!hpet_force_user && x86_match_cpu(hpet_blacklist))
> +		return 0;
> +
>  	hpet_set_mapping();
>  	if (!hpet_virt_address)
>  		return 0;
> --
> 2.17.1
>