2019-08-01 06:27:20

by Daniel Drake

[permalink] [raw]
Subject: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Hi,

Working with a new consumer laptop based on AMD R7-3700U, we are
seeing a kernel panic during early boot (before the display
initializes). It's a new product and there is no previous known
working kernel version (tested 5.0, 5.2 and current linus master).

We may have also seen this problem on a MiniPC based on AMD APU 7010
from another vendor, but we don't have it in hands right now to
confirm that it's the exact same crash.

earlycon shows the details: a NULL dereference under
setup_boot_APIC_clock(), which actually happens in
calibrate_APIC_clock():

/* Replace the global interrupt handler */
real_handler = global_clock_event->event_handler;
global_clock_event->event_handler = lapic_cal_handler;

global_clock_event is NULL here. This is a "reduced hardware" ACPI
platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
avoiding the usual codepaths that would set up global_clock_event.

I tried the obvious:
if (!global_clock_event)
return -1;

However I'm probably missing part of the big picture here, as this
only makes boot fail later on. It continues til the next point that
something leads to schedule(), such as a driver calling msleep() or
mark_readonly() calling rcu_barrier(), etc. Then it hangs.

Is something missing in terms of timer setup here? Suggestions appreciated...

Thanks
Daniel


2019-08-01 07:41:05

by Aubrey Li

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, Aug 1, 2019 at 2:26 PM Daniel Drake <[email protected]> wrote:
>
> Hi,
>
> Working with a new consumer laptop based on AMD R7-3700U, we are
> seeing a kernel panic during early boot (before the display
> initializes). It's a new product and there is no previous known
> working kernel version (tested 5.0, 5.2 and current linus master).
>
> We may have also seen this problem on a MiniPC based on AMD APU 7010
> from another vendor, but we don't have it in hands right now to
> confirm that it's the exact same crash.
>
> earlycon shows the details: a NULL dereference under
> setup_boot_APIC_clock(), which actually happens in
> calibrate_APIC_clock():
>
> /* Replace the global interrupt handler */
> real_handler = global_clock_event->event_handler;
> global_clock_event->event_handler = lapic_cal_handler;
>
> global_clock_event is NULL here. This is a "reduced hardware" ACPI
> platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> avoiding the usual codepaths that would set up global_clock_event.
>
IIRC, acpi_generic_reduced_hw_init() avoids initializing PIT, the status of
this legacy device is unknown in ACPI hw-reduced mode.

> I tried the obvious:
> if (!global_clock_event)
> return -1;
>
No, the platform needs a global clock event, can you turn on some other
clock source on your platform, like HPET?

Thanks,
-Aubrey

> However I'm probably missing part of the big picture here, as this
> only makes boot fail later on. It continues til the next point that
> something leads to schedule(), such as a driver calling msleep() or
> mark_readonly() calling rcu_barrier(), etc. Then it hangs.
>
> Is something missing in terms of timer setup here? Suggestions appreciated...
>
> Thanks
> Daniel

2019-08-01 07:48:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, 1 Aug 2019, Aubrey Li wrote:
> On Thu, Aug 1, 2019 at 2:26 PM Daniel Drake <[email protected]> wrote:
> > global_clock_event is NULL here. This is a "reduced hardware" ACPI
> > platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> > avoiding the usual codepaths that would set up global_clock_event.
> >
> IIRC, acpi_generic_reduced_hw_init() avoids initializing PIT, the status of
> this legacy device is unknown in ACPI hw-reduced mode.
>
> > I tried the obvious:
> > if (!global_clock_event)
> > return -1;
> >
> No, the platform needs a global clock event, can you turn on some other

Wrong. The kernel boots perfectly fine without a global clock event. But
for that the TSC and LAPIC frequency must be known.

Thanks,

tglx

2019-08-01 07:59:35

by Aubrey Li

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, Aug 1, 2019 at 3:35 PM Thomas Gleixner <[email protected]> wrote:
>
> On Thu, 1 Aug 2019, Aubrey Li wrote:
> > On Thu, Aug 1, 2019 at 2:26 PM Daniel Drake <[email protected]> wrote:
> > > global_clock_event is NULL here. This is a "reduced hardware" ACPI
> > > platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> > > avoiding the usual codepaths that would set up global_clock_event.
> > >
> > IIRC, acpi_generic_reduced_hw_init() avoids initializing PIT, the status of
> > this legacy device is unknown in ACPI hw-reduced mode.
> >
> > > I tried the obvious:
> > > if (!global_clock_event)
> > > return -1;
> > >
> > No, the platform needs a global clock event, can you turn on some other
>
> Wrong. The kernel boots perfectly fine without a global clock event. But
> for that the TSC and LAPIC frequency must be known.

I think LAPIC fast calibrate is only supported on intel platform, while
Daniel's box is an AMD platform. That's why lapic_init_clockevent() failed
and fall into the code path which needs a global clock event.

Thanks,
-Aubrey

2019-08-01 08:21:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Daniel,

On Thu, 1 Aug 2019, Daniel Drake wrote:
> Working with a new consumer laptop based on AMD R7-3700U, we are
> seeing a kernel panic during early boot (before the display
> initializes). It's a new product and there is no previous known
> working kernel version (tested 5.0, 5.2 and current linus master).
>
> We may have also seen this problem on a MiniPC based on AMD APU 7010
> from another vendor, but we don't have it in hands right now to
> confirm that it's the exact same crash.
>
> earlycon shows the details: a NULL dereference under
> setup_boot_APIC_clock(), which actually happens in
> calibrate_APIC_clock():
>
> /* Replace the global interrupt handler */
> real_handler = global_clock_event->event_handler;
> global_clock_event->event_handler = lapic_cal_handler;
>
> global_clock_event is NULL here. This is a "reduced hardware" ACPI
> platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> avoiding the usual codepaths that would set up global_clock_event.
>
> I tried the obvious:
> if (!global_clock_event)
> return -1;
>
> However I'm probably missing part of the big picture here, as this
> only makes boot fail later on. It continues til the next point that
> something leads to schedule(), such as a driver calling msleep() or
> mark_readonly() calling rcu_barrier(), etc. Then it hangs.
>
> Is something missing in terms of timer setup here? Suggestions
> appreciated...

So that trips over the problem that there is no timer to calibrate against
and the LAPIC freuency is obviously unknown.

How is the kernel supposed to figure that out?

The only possible option in that case is to use RTC, but we have no support
for this at all.

Thanks,

tglx

2019-08-01 08:38:50

by Li, Aubrey

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On 2019/8/1 16:13, Thomas Gleixner wrote:
> On Thu, 1 Aug 2019, Aubrey Li wrote:
>> On Thu, Aug 1, 2019 at 3:35 PM Thomas Gleixner <[email protected]> wrote:
>>>
>>> On Thu, 1 Aug 2019, Aubrey Li wrote:
>>>> On Thu, Aug 1, 2019 at 2:26 PM Daniel Drake <[email protected]> wrote:
>>>>> global_clock_event is NULL here. This is a "reduced hardware" ACPI
>>>>> platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
>>>>> avoiding the usual codepaths that would set up global_clock_event.
>>>>>
>>>> IIRC, acpi_generic_reduced_hw_init() avoids initializing PIT, the status of
>>>> this legacy device is unknown in ACPI hw-reduced mode.
>>>>
>>>>> I tried the obvious:
>>>>> if (!global_clock_event)
>>>>> return -1;
>>>>>
>>>> No, the platform needs a global clock event, can you turn on some other
>>>
>>> Wrong. The kernel boots perfectly fine without a global clock event. But
>>> for that the TSC and LAPIC frequency must be known.
>>
>> I think LAPIC fast calibrate is only supported on intel platform, while
>> Daniel's box is an AMD platform. That's why lapic_init_clockevent() failed
>> and fall into the code path which needs a global clock event.
>
> We know that.
>
> The point is that it does not matter which vendor a CPU comes from. The
> kernel does support legacyless boot when the frequencies are known. Whether
> that's currently possible on that particular CPU is a different question.
>
Yeah, I should specify, Daniel, your platform needs a global clock event, ;-)

Thanks,
-Aubrey

2019-08-01 09:35:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, 1 Aug 2019, Aubrey Li wrote:
> On Thu, Aug 1, 2019 at 3:35 PM Thomas Gleixner <[email protected]> wrote:
> >
> > On Thu, 1 Aug 2019, Aubrey Li wrote:
> > > On Thu, Aug 1, 2019 at 2:26 PM Daniel Drake <[email protected]> wrote:
> > > > global_clock_event is NULL here. This is a "reduced hardware" ACPI
> > > > platform so acpi_generic_reduced_hw_init() has set timer_init to NULL,
> > > > avoiding the usual codepaths that would set up global_clock_event.
> > > >
> > > IIRC, acpi_generic_reduced_hw_init() avoids initializing PIT, the status of
> > > this legacy device is unknown in ACPI hw-reduced mode.
> > >
> > > > I tried the obvious:
> > > > if (!global_clock_event)
> > > > return -1;
> > > >
> > > No, the platform needs a global clock event, can you turn on some other
> >
> > Wrong. The kernel boots perfectly fine without a global clock event. But
> > for that the TSC and LAPIC frequency must be known.
>
> I think LAPIC fast calibrate is only supported on intel platform, while
> Daniel's box is an AMD platform. That's why lapic_init_clockevent() failed
> and fall into the code path which needs a global clock event.

We know that.

The point is that it does not matter which vendor a CPU comes from. The
kernel does support legacyless boot when the frequencies are known. Whether
that's currently possible on that particular CPU is a different question.

Thanks,

tglx

2019-08-01 09:52:15

by Daniel Drake

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, Aug 1, 2019 at 3:16 PM Aubrey Li <[email protected]> wrote:
> No, the platform needs a global clock event, can you turn on some other
> clock source on your platform, like HPET?

Thanks Audrey and Thomas for the quick hints!

I double checked under Windows - it seems to be using a HPET there.
Also there is the HPET ACPI table. So I think this is the right angle
to look at.

Under Linux, hpet_legacy_clockevent_register() is the function where
global_clock_event can be set to HPET.

However, the only way this can be called is from hpet_enable().

hpet_enable() is called from 2 places:
1. From hpet_time_init(). This is the default x86 timer_init that
acpi_generic_reduced_hw_init() took out of action here.
2. From hpet_late_init(). However that function is only called late,
after calibrate_APIC_clock() has already crashed the kernel. Also,
even if moved earlier it would also not call hpet_enable() here
because the ACPI HPET table parsing has already populated
hpet_address.

I tried slotting in a call to hpet_enable() at an earlier point
regardless, but I still end up with the kernel hanging later during
boot, probably because irq0 fails to be setup and this error is hit:
if (setup_irq(0, &irq0))
pr_info("Failed to register legacy timer interrupt\n");

I'll go deeper into that; further hints welcome too.

Thanks
Daniel

2019-08-01 10:15:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, 1 Aug 2019, Li, Aubrey wrote:
> On 2019/8/1 16:13, Thomas Gleixner wrote:
> > The point is that it does not matter which vendor a CPU comes from. The
> > kernel does support legacyless boot when the frequencies are known. Whether
> > that's currently possible on that particular CPU is a different question.
> >
> Yeah, I should specify, Daniel, your platform needs a global clock event, ;-)

Care to look at the manuals before making assumptions?

2.1.9 Timers

Each core includes the following timers. These timers do not vary in
frequency regardless of the current P-state or C-state.

* Core::X86::Msr::TSC; the TSC increments at the rate specified by the
P0 Pstate. See Core::X86::Msr::PStateDef.

* The APIC timer (Core::X86::Apic::TimerInitialCount and
Core::X86::Apic::TimerCurrentCount), which increments at the rate of
2xCLKIN; the APIC timer may increment in units of between 1 and 8.

The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
whether this is subject to overclocking. If so then it should be possible
to figure that out somehow. Tom?

Thanks,

tglx

2019-08-01 11:27:22

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On Thu, 1 Aug 2019, Daniel Drake wrote:
> On Thu, Aug 1, 2019 at 3:16 PM Aubrey Li <[email protected]> wrote:
> However, the only way this can be called is from hpet_enable().
>
> hpet_enable() is called from 2 places:
> 1. From hpet_time_init(). This is the default x86 timer_init that
> acpi_generic_reduced_hw_init() took out of action here.
> 2. From hpet_late_init(). However that function is only called late,
> after calibrate_APIC_clock() has already crashed the kernel. Also,
> even if moved earlier it would also not call hpet_enable() here
> because the ACPI HPET table parsing has already populated
> hpet_address.
>
> I tried slotting in a call to hpet_enable() at an earlier point
> regardless, but I still end up with the kernel hanging later during
> boot, probably because irq0 fails to be setup and this error is hit:
> if (setup_irq(0, &irq0))
> pr_info("Failed to register legacy timer interrupt\n");

Right. The thing also lacks PIT :)

So there are two options:

1) Make sure the HPET is parsed somehow even with the reduced stuff

2) Make the clock frequency detection work.

#1 is a trainwreck

#2 is something we really want to have anyway. See the other reply. I cc'ed
Tom there, he should be able to give us the missing link.

Thanks,

tglx

2019-08-01 21:40:06

by Tom Lendacky

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

On 8/1/19 5:13 AM, Thomas Gleixner wrote:
> On Thu, 1 Aug 2019, Li, Aubrey wrote:
>> On 2019/8/1 16:13, Thomas Gleixner wrote:
>>> The point is that it does not matter which vendor a CPU comes from. The
>>> kernel does support legacyless boot when the frequencies are known. Whether
>>> that's currently possible on that particular CPU is a different question.
>>>
>> Yeah, I should specify, Daniel, your platform needs a global clock event, ;-)
>
> Care to look at the manuals before making assumptions?
>
> 2.1.9 Timers
>
> Each core includes the following timers. These timers do not vary in
> frequency regardless of the current P-state or C-state.
>
> * Core::X86::Msr::TSC; the TSC increments at the rate specified by the
> P0 Pstate. See Core::X86::Msr::PStateDef.
>
> * The APIC timer (Core::X86::Apic::TimerInitialCount and
> Core::X86::Apic::TimerCurrentCount), which increments at the rate of
> 2xCLKIN; the APIC timer may increment in units of between 1 and 8.
>
> The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
> whether this is subject to overclocking. If so then it should be possible
> to figure that out somehow. Tom?

Let me check with the hardware folks and I'll get back to you.

Thanks,
Tom

>
> Thanks,
>
> tglx
>

2019-08-08 20:38:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Tom,

On Thu, 1 Aug 2019, Lendacky, Thomas wrote:
> On 8/1/19 5:13 AM, Thomas Gleixner wrote:
> > 2.1.9 Timers
> >
> > Each core includes the following timers. These timers do not vary in
> > frequency regardless of the current P-state or C-state.
> >
> > * Core::X86::Msr::TSC; the TSC increments at the rate specified by the
> > P0 Pstate. See Core::X86::Msr::PStateDef.
> >
> > * The APIC timer (Core::X86::Apic::TimerInitialCount and
> > Core::X86::Apic::TimerCurrentCount), which increments at the rate of
> > 2xCLKIN; the APIC timer may increment in units of between 1 and 8.
> >
> > The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
> > whether this is subject to overclocking. If so then it should be possible
> > to figure that out somehow. Tom?
>
> Let me check with the hardware folks and I'll get back to you.

any update on this? The problem seems to come in from several sides now.

Thanks,

tglx

2019-08-08 21:04:24

by Tom Lendacky

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Hi Thomas,

On 8/8/19 3:36 PM, Thomas Gleixner wrote:
> Tom,
>
> On Thu, 1 Aug 2019, Lendacky, Thomas wrote:
>> On 8/1/19 5:13 AM, Thomas Gleixner wrote:
>>> 2.1.9 Timers
>>>
>>> Each core includes the following timers. These timers do not vary in
>>> frequency regardless of the current P-state or C-state.
>>>
>>> * Core::X86::Msr::TSC; the TSC increments at the rate specified by the
>>> P0 Pstate. See Core::X86::Msr::PStateDef.
>>>
>>> * The APIC timer (Core::X86::Apic::TimerInitialCount and
>>> Core::X86::Apic::TimerCurrentCount), which increments at the rate of
>>> 2xCLKIN; the APIC timer may increment in units of between 1 and 8.
>>>
>>> The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
>>> whether this is subject to overclocking. If so then it should be possible
>>> to figure that out somehow. Tom?
>>
>> Let me check with the hardware folks and I'll get back to you.
>
> any update on this? The problem seems to come in from several sides now.

Yes, sort of. There are two ways to overclock and it all depends on which
one was used. If the overclocking is done by changing the multipliers,
then that 100MHz clock will still be 100MHz. But if the overclocking is
done by increasing the input clock, then that 100MHz clock will also
increase.

I was trying to get a bit more clarification on this before replying, but
it can be detected in software. The base clock is 100MHz, so read the P0
multiplier and the TSC should be counting at P0 * 100MHz. If you calibrate
the speed of the TSC with the HPET you can see what speed the TSC is
counting at. If you divide the TSC delta from the HPET calibration by the
P0 multiplier you will either get 100MHz if there is no overclocking or if
the multiplier method of overclocking was used, otherwise you'll get a
higher value if the input clock method was used. Either way, that should
give you the APIC clock speed based on a starting assumption of 100MHz.

I'm not all that familiar with this stuff, but I think I translated what
was told to me properly. I'm also not sure if this method can be used at
the point in the code this is all happening. Let me know if it makes sense
or not.

Thanks,
Tom

>
> Thanks,
>
> tglx
>

2019-08-08 21:10:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Tom,

On Thu, 8 Aug 2019, Lendacky, Thomas wrote:
> On 8/8/19 3:36 PM, Thomas Gleixner wrote:
> > On Thu, 1 Aug 2019, Lendacky, Thomas wrote:
> >> On 8/1/19 5:13 AM, Thomas Gleixner wrote:
> >>> 2.1.9 Timers
> >>>
> >>> Each core includes the following timers. These timers do not vary in
> >>> frequency regardless of the current P-state or C-state.
> >>>
> >>> * Core::X86::Msr::TSC; the TSC increments at the rate specified by the
> >>> P0 Pstate. See Core::X86::Msr::PStateDef.
> >>>
> >>> * The APIC timer (Core::X86::Apic::TimerInitialCount and
> >>> Core::X86::Apic::TimerCurrentCount), which increments at the rate of
> >>> 2xCLKIN; the APIC timer may increment in units of between 1 and 8.
> >>>
> >>> The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
> >>> whether this is subject to overclocking. If so then it should be possible
> >>> to figure that out somehow. Tom?
> >>
> >> Let me check with the hardware folks and I'll get back to you.
> >
> > any update on this? The problem seems to come in from several sides now.
>
> Yes, sort of. There are two ways to overclock and it all depends on which
> one was used. If the overclocking is done by changing the multipliers,
> then that 100MHz clock will still be 100MHz. But if the overclocking is
> done by increasing the input clock, then that 100MHz clock will also
> increase.
>
> I was trying to get a bit more clarification on this before replying, but
> it can be detected in software. The base clock is 100MHz, so read the P0
> multiplier and the TSC should be counting at P0 * 100MHz. If you calibrate
> the speed of the TSC with the HPET you can see what speed the TSC is
> counting at. If you divide the TSC delta from the HPET calibration by the
> P0 multiplier you will either get 100MHz if there is no overclocking or if
> the multiplier method of overclocking was used, otherwise you'll get a
> higher value if the input clock method was used. Either way, that should
> give you the APIC clock speed based on a starting assumption of 100MHz.

The problem is that we have no HPET on those machines ....

I think I can get away without having HPET and PIT and do some smart stuff
with the pm timer for that stuff. I'll look at it tomorrow with brain
actually awake.

Thanks,

tglx





2019-08-08 21:37:42

by Tom Lendacky

[permalink] [raw]
Subject: Re: setup_boot_APIC_clock() NULL dereference during early boot on reduced hardware platforms

Hi Thomas,

On 8/8/19 4:08 PM, Thomas Gleixner wrote:
> Tom,
>
> On Thu, 8 Aug 2019, Lendacky, Thomas wrote:
>> On 8/8/19 3:36 PM, Thomas Gleixner wrote:
>>> On Thu, 1 Aug 2019, Lendacky, Thomas wrote:
>>>> On 8/1/19 5:13 AM, Thomas Gleixner wrote:
>>>>> 2.1.9 Timers
>>>>>
>>>>> Each core includes the following timers. These timers do not vary in
>>>>> frequency regardless of the current P-state or C-state.
>>>>>
>>>>> * Core::X86::Msr::TSC; the TSC increments at the rate specified by the
>>>>> P0 Pstate. See Core::X86::Msr::PStateDef.
>>>>>
>>>>> * The APIC timer (Core::X86::Apic::TimerInitialCount and
>>>>> Core::X86::Apic::TimerCurrentCount), which increments at the rate of
>>>>> 2xCLKIN; the APIC timer may increment in units of between 1 and 8.
>>>>>
>>>>> The Ryzens use a 100MHz input clock for the APIC normally, but I'm not sure
>>>>> whether this is subject to overclocking. If so then it should be possible
>>>>> to figure that out somehow. Tom?
>>>>
>>>> Let me check with the hardware folks and I'll get back to you.
>>>
>>> any update on this? The problem seems to come in from several sides now.
>>
>> Yes, sort of. There are two ways to overclock and it all depends on which
>> one was used. If the overclocking is done by changing the multipliers,
>> then that 100MHz clock will still be 100MHz. But if the overclocking is
>> done by increasing the input clock, then that 100MHz clock will also
>> increase.
>>
>> I was trying to get a bit more clarification on this before replying, but
>> it can be detected in software. The base clock is 100MHz, so read the P0
>> multiplier and the TSC should be counting at P0 * 100MHz. If you calibrate
>> the speed of the TSC with the HPET you can see what speed the TSC is
>> counting at. If you divide the TSC delta from the HPET calibration by the
>> P0 multiplier you will either get 100MHz if there is no overclocking or if
>> the multiplier method of overclocking was used, otherwise you'll get a
>> higher value if the input clock method was used. Either way, that should
>> give you the APIC clock speed based on a starting assumption of 100MHz.
>
> The problem is that we have no HPET on those machines ....

Sorry about that... I interpreted the email[1] that said the HPET ACPI
table was present, incorrectly. I get it now, for hardware-reduced ACPI
you can't depend on that table to be present.

Thanks,
Tom

[1] https://lore.kernel.org/lkml/CAD8Lp452GdoL-Bt7rSP=u3RKEZ2H3qm3LvKfe=cCsjP0biG_sQ@mail.gmail.com/

>
> I think I can get away without having HPET and PIT and do some smart stuff
> with the pm timer for that stuff. I'll look at it tomorrow with brain
> actually awake.
>
> Thanks,
>
> tglx
>
>
>
>
>

2019-08-09 12:55:56

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] x86/apic: Handle missing global clockevent gracefully

Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.

Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.

As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.

Switch the calibration loop to a non interrupt based one, which polls
either TSC (frequency known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.

Reported-by: Daniel Drake <[email protected]>
Reported-by: Jiri Slaby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---

Note: Only lightly tested, but of course not on an affected machine.

If that works reliably, then this needs some exhaustive testing
on a wide range of systems so we can risk backports to stable
kernels.

---
arch/x86/kernel/apic/apic.c | 70 +++++++++++++++++++++++++++++++++-----------
include/linux/acpi_pmtmr.h | 10 ++++++
2 files changed, 64 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -851,7 +851,8 @@ bool __init apic_needs_pit(void)
static int __init calibrate_APIC_clock(void)
{
struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
- void (*real_handler)(struct clock_event_device *dev);
+ u64 tsc_perj = 0, tsc_start = 0;
+ unsigned long jif_start;
unsigned long deltaj;
long delta, deltatsc;
int pm_referenced = 0;
@@ -878,29 +879,65 @@ static int __init calibrate_APIC_clock(v
apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n"
"calibrating APIC timer ...\n");

- local_irq_disable();
-
- /* Replace the global interrupt handler */
- real_handler = global_clock_event->event_handler;
- global_clock_event->event_handler = lapic_cal_handler;
+ /*
+ * There are platforms w/o global clockevent devices. Instead of
+ * making the calibration conditional on that, use a polling based
+ * approach everywhere.
+ */

+ local_irq_disable();
/*
* Setup the APIC counter to maximum. There is no way the lapic
* can underflow in the 100ms detection time frame
*/
__setup_APIC_LVTT(0xffffffff, 0, 0);

- /* Let the interrupts run */
- local_irq_enable();
+ /*
+ * Methods to terminate the calibration loop:
+ * 1) Global clockevent if available (jiffies)
+ * 2) TSC if available and frequency is known
+ */
+ jif_start = READ_ONCE(jiffies);
+
+ if (tsc_khz) {
+ tsc_start = rdtsc();
+ tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
+ }
+
+ while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
+ /*
+ * Enable interrupts so the tick can fire, if a global
+ * clockevent device is available
+ */
+ local_irq_enable();

- while (lapic_cal_loops <= LAPIC_CAL_LOOPS)
- cpu_relax();
+ /* Wait for a tick to elapse */
+ while (1) {
+ if (tsc_khz) {
+ u64 tsc_now = rdtsc();
+ if ((tsc_now - tsc_start) >= tsc_perj) {
+ tsc_start += tsc_perj;
+ break;
+ }
+ } else {
+ unsigned long jif_now = READ_ONCE(jiffies);
+
+ if (time_after(jif_now, jif_start)) {
+ jif_start = jif_now;
+ break;
+ }
+ }
+ cpu_relax();
+ }
+
+ /* Invoke the calibration routine */
+ local_irq_disable();
+ lapic_cal_handler(NULL);
+ local_irq_enable();
+ }

local_irq_disable();

- /* Restore the real event handler */
- global_clock_event->event_handler = real_handler;
-
/* Build delta t1-t2 as apic timer counts down */
delta = lapic_cal_t1 - lapic_cal_t2;
apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta);
@@ -943,10 +980,11 @@ static int __init calibrate_APIC_clock(v
levt->features &= ~CLOCK_EVT_FEAT_DUMMY;

/*
- * PM timer calibration failed or not turned on
- * so lets try APIC timer based calibration
+ * PM timer calibration failed or not turned on so lets try APIC
+ * timer based calibration, if a global clockevent device is
+ * available.
*/
- if (!pm_referenced) {
+ if (!pm_referenced && global_clock_event) {
apic_printk(APIC_VERBOSE, "... verify APIC timer\n");

/*
--- a/include/linux/acpi_pmtmr.h
+++ b/include/linux/acpi_pmtmr.h
@@ -18,6 +18,11 @@
extern u32 acpi_pm_read_verified(void);
extern u32 pmtmr_ioport;

+static inline bool acpi_pm_timer_available(void)
+{
+ return pmtmr_ioport != 0;
+}
+
static inline u32 acpi_pm_read_early(void)
{
if (!pmtmr_ioport)
@@ -28,6 +33,11 @@ static inline u32 acpi_pm_read_early(voi

#else

+static inline bool acpi_pm_timer_available(void)
+{
+ return false;
+}
+
static inline u32 acpi_pm_read_early(void)
{
return 0;

2019-08-09 13:00:02

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On 09. 08. 19, 14:54, Thomas Gleixner wrote:
> Some newer machines do not advertise legacy timers. The kernel can handle
> that situation if the TSC and the CPU frequency are enumerated by CPUID or
> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
> TSC deadline timer the local APIC timer frequency has to be known as well.
>
> Some Ryzens machines do not advertize legacy timers, but there is no
> reliable way to determine the bus frequency which feeds the local APIC
> timer when the machine allows overclocking of that frequency.
>
> As there is no legacy timer the local APIC timer calibration crashes due to
> a NULL pointer dereference when accessing the not installed global clock
> event device.
>
> Switch the calibration loop to a non interrupt based one, which polls
> either TSC (frequency known) or jiffies. The latter requires a global
> clockevent. As the machines which do not have a global clockevent installed
> have a known TSC frequency this is a non issue. For older machines where
> TSC frequency is not known, there is no known case where the legacy timers
> do not exist as that would have been reported long ago.
>
> Reported-by: Daniel Drake <[email protected]>
> Reported-by: Jiri Slaby <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
>
> Note: Only lightly tested, but of course not on an affected machine.

Thanks, I will make them test the patch and let you know.

--
js
suse labs

2019-08-09 14:01:51

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On 8/9/19 7:54 AM, Thomas Gleixner wrote:
> Some newer machines do not advertise legacy timers. The kernel can handle
> that situation if the TSC and the CPU frequency are enumerated by CPUID or
> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
> TSC deadline timer the local APIC timer frequency has to be known as well.
>
> Some Ryzens machines do not advertize legacy timers, but there is no
> reliable way to determine the bus frequency which feeds the local APIC
> timer when the machine allows overclocking of that frequency.
>
> As there is no legacy timer the local APIC timer calibration crashes due to
> a NULL pointer dereference when accessing the not installed global clock
> event device.
>
> Switch the calibration loop to a non interrupt based one, which polls
> either TSC (frequency known) or jiffies. The latter requires a global
> clockevent. As the machines which do not have a global clockevent installed
> have a known TSC frequency this is a non issue. For older machines where
> TSC frequency is not known, there is no known case where the legacy timers
> do not exist as that would have been reported long ago.
>
> Reported-by: Daniel Drake <[email protected]>
> Reported-by: Jiri Slaby <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
>
> Note: Only lightly tested, but of course not on an affected machine.
>
> If that works reliably, then this needs some exhaustive testing
> on a wide range of systems so we can risk backports to stable
> kernels.
>
> ---
> arch/x86/kernel/apic/apic.c | 70 +++++++++++++++++++++++++++++++++-----------
> include/linux/acpi_pmtmr.h | 10 ++++++
> 2 files changed, 64 insertions(+), 16 deletions(-)
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -851,7 +851,8 @@ bool __init apic_needs_pit(void)
> static int __init calibrate_APIC_clock(void)
> {
> struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
> - void (*real_handler)(struct clock_event_device *dev);
> + u64 tsc_perj = 0, tsc_start = 0;
> + unsigned long jif_start;
> unsigned long deltaj;
> long delta, deltatsc;
> int pm_referenced = 0;
> @@ -878,29 +879,65 @@ static int __init calibrate_APIC_clock(v
> apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n"
> "calibrating APIC timer ...\n");
>
> - local_irq_disable();
> -
> - /* Replace the global interrupt handler */
> - real_handler = global_clock_event->event_handler;
> - global_clock_event->event_handler = lapic_cal_handler;
> + /*
> + * There are platforms w/o global clockevent devices. Instead of
> + * making the calibration conditional on that, use a polling based
> + * approach everywhere.
> + */
>
> + local_irq_disable();
> /*
> * Setup the APIC counter to maximum. There is no way the lapic
> * can underflow in the 100ms detection time frame
> */
> __setup_APIC_LVTT(0xffffffff, 0, 0);
>
> - /* Let the interrupts run */
> - local_irq_enable();
> + /*
> + * Methods to terminate the calibration loop:
> + * 1) Global clockevent if available (jiffies)
> + * 2) TSC if available and frequency is known
> + */
> + jif_start = READ_ONCE(jiffies);
> +
> + if (tsc_khz) {
> + tsc_start = rdtsc();
> + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> + }
> +
> + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
> + /*
> + * Enable interrupts so the tick can fire, if a global
> + * clockevent device is available
> + */
> + local_irq_enable();

Just a nit, but you end up doing this at the bottom of the loop, so you
could move this invocation to just before the loop and avoid doing two
local_irq_enable() calls in succession after the first iteration.

Thanks,
Tom

>
> - while (lapic_cal_loops <= LAPIC_CAL_LOOPS)
> - cpu_relax();
> + /* Wait for a tick to elapse */
> + while (1) {
> + if (tsc_khz) {
> + u64 tsc_now = rdtsc();
> + if ((tsc_now - tsc_start) >= tsc_perj) {
> + tsc_start += tsc_perj;
> + break;
> + }
> + } else {
> + unsigned long jif_now = READ_ONCE(jiffies);
> +
> + if (time_after(jif_now, jif_start)) {
> + jif_start = jif_now;
> + break;
> + }
> + }
> + cpu_relax();
> + }
> +
> + /* Invoke the calibration routine */
> + local_irq_disable();
> + lapic_cal_handler(NULL);
> + local_irq_enable();
> + }
>
> local_irq_disable();
>
> - /* Restore the real event handler */
> - global_clock_event->event_handler = real_handler;
> -
> /* Build delta t1-t2 as apic timer counts down */
> delta = lapic_cal_t1 - lapic_cal_t2;
> apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta);
> @@ -943,10 +980,11 @@ static int __init calibrate_APIC_clock(v
> levt->features &= ~CLOCK_EVT_FEAT_DUMMY;
>
> /*
> - * PM timer calibration failed or not turned on
> - * so lets try APIC timer based calibration
> + * PM timer calibration failed or not turned on so lets try APIC
> + * timer based calibration, if a global clockevent device is
> + * available.
> */
> - if (!pm_referenced) {
> + if (!pm_referenced && global_clock_event) {
> apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
>
> /*
> --- a/include/linux/acpi_pmtmr.h
> +++ b/include/linux/acpi_pmtmr.h
> @@ -18,6 +18,11 @@
> extern u32 acpi_pm_read_verified(void);
> extern u32 pmtmr_ioport;
>
> +static inline bool acpi_pm_timer_available(void)
> +{
> + return pmtmr_ioport != 0;
> +}
> +
> static inline u32 acpi_pm_read_early(void)
> {
> if (!pmtmr_ioport)
> @@ -28,6 +33,11 @@ static inline u32 acpi_pm_read_early(voi
>
> #else
>
> +static inline bool acpi_pm_timer_available(void)
> +{
> + return false;
> +}
> +
> static inline u32 acpi_pm_read_early(void)
> {
> return 0;
>

2019-08-09 19:41:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Fri, 9 Aug 2019, Lendacky, Thomas wrote:
> On 8/9/19 7:54 AM, Thomas Gleixner wrote:
> > + local_irq_disable();
> > /*
> > * Setup the APIC counter to maximum. There is no way the lapic
> > * can underflow in the 100ms detection time frame
> > */
> > __setup_APIC_LVTT(0xffffffff, 0, 0);
> >
> > - /* Let the interrupts run */
> > - local_irq_enable();
> > + /*
> > + * Methods to terminate the calibration loop:
> > + * 1) Global clockevent if available (jiffies)
> > + * 2) TSC if available and frequency is known
> > + */
> > + jif_start = READ_ONCE(jiffies);
> > +
> > + if (tsc_khz) {
> > + tsc_start = rdtsc();
> > + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> > + }
> > +
> > + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
> > + /*
> > + * Enable interrupts so the tick can fire, if a global
> > + * clockevent device is available
> > + */
> > + local_irq_enable();
>
> Just a nit, but you end up doing this at the bottom of the loop, so you
> could move this invocation to just before the loop and avoid doing two
> local_irq_enable() calls in succession after the first iteration.

Indeed. Lets see how the reports go. That change is a nobrainer.

Thanks,

tglx

2019-08-12 06:18:49

by Daniel Drake

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Fri, Aug 9, 2019 at 8:54 PM Thomas Gleixner <[email protected]> wrote:
> Some newer machines do not advertise legacy timers. The kernel can handle
> that situation if the TSC and the CPU frequency are enumerated by CPUID or
> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
> TSC deadline timer the local APIC timer frequency has to be known as well.
>
> Some Ryzens machines do not advertize legacy timers, but there is no
> reliable way to determine the bus frequency which feeds the local APIC
> timer when the machine allows overclocking of that frequency.
>
> As there is no legacy timer the local APIC timer calibration crashes due to
> a NULL pointer dereference when accessing the not installed global clock
> event device.
>
> Switch the calibration loop to a non interrupt based one, which polls
> either TSC (frequency known) or jiffies. The latter requires a global
> clockevent. As the machines which do not have a global clockevent installed
> have a known TSC frequency this is a non issue. For older machines where
> TSC frequency is not known, there is no known case where the legacy timers
> do not exist as that would have been reported long ago.

This solves the problem I described in the thread:
setup_boot_APIC_clock() NULL dereference during early boot on
reduced hardware platforms

Thanks for your quick support!

> Note: Only lightly tested, but of course not on an affected machine.
>
> If that works reliably, then this needs some exhaustive testing
> on a wide range of systems so we can risk backports to stable
> kernels.

I can do a bit of testing on other platforms too. Are there any
specific tests I should run, other than checking that the system boots
and doesn't have any timer watchdog complaints in the log?

Thanks
Daniel

2019-08-12 07:04:50

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On 12. 08. 19, 8:16, Daniel Drake wrote:
> On Fri, Aug 9, 2019 at 8:54 PM Thomas Gleixner <[email protected]> wrote:
>> Some newer machines do not advertise legacy timers. The kernel can handle
>> that situation if the TSC and the CPU frequency are enumerated by CPUID or
>> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
>> TSC deadline timer the local APIC timer frequency has to be known as well.
>>
>> Some Ryzens machines do not advertize legacy timers, but there is no
>> reliable way to determine the bus frequency which feeds the local APIC
>> timer when the machine allows overclocking of that frequency.
>>
>> As there is no legacy timer the local APIC timer calibration crashes due to
>> a NULL pointer dereference when accessing the not installed global clock
>> event device.
>>
>> Switch the calibration loop to a non interrupt based one, which polls
>> either TSC (frequency known) or jiffies. The latter requires a global
>> clockevent. As the machines which do not have a global clockevent installed
>> have a known TSC frequency this is a non issue. For older machines where
>> TSC frequency is not known, there is no known case where the legacy timers
>> do not exist as that would have been reported long ago.
>
> This solves the problem I described in the thread:
> setup_boot_APIC_clock() NULL dereference during early boot on
> reduced hardware platforms

So it does for the openSUSE user:
http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12

=========
After installing that build of the kernel from your OBS home project,
that did
more than just fix the issue with the APIC timer screwing up. I now have
all 4
cores/8 threads available.

I do see some errors from the ACPI layer that do indicate that there are
some
areas of the BIOS from HP that are buggy, but at this time, the machine
seems
to be working without issue.
=========

dmesg here:
http://bugzilla.opensuse.org/attachment.cgi?id=813577

thanks,
--
js
suse labs

2019-08-12 07:47:56

by Li, Aubrey

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On 2019/8/9 20:54, Thomas Gleixner wrote:
> Some newer machines do not advertise legacy timers. The kernel can handle
> that situation if the TSC and the CPU frequency are enumerated by CPUID or
> MSRs and the CPU supports TSC deadline timer. If the CPU does not support
> TSC deadline timer the local APIC timer frequency has to be known as well.
>
> Some Ryzens machines do not advertize legacy timers, but there is no
> reliable way to determine the bus frequency which feeds the local APIC
> timer when the machine allows overclocking of that frequency.

Are these platforms are all ACPI HW-reduced platform?

>
> As there is no legacy timer the local APIC timer calibration crashes due to
> a NULL pointer dereference when accessing the not installed global clock
> event device.
>
> Switch the calibration loop to a non interrupt based one, which polls
> either TSC (frequency known) or jiffies. The latter requires a global
> clockevent. As the machines which do not have a global clockevent installed
> have a known TSC frequency this is a non issue. For older machines where
> TSC frequency is not known, there is no known case where the legacy timers
> do not exist as that would have been reported long ago.
>
> Reported-by: Daniel Drake <[email protected]>
> Reported-by: Jiri Slaby <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
>
> Note: Only lightly tested, but of course not on an affected machine.
>
> If that works reliably, then this needs some exhaustive testing
> on a wide range of systems so we can risk backports to stable
> kernels.
>
> ---
> arch/x86/kernel/apic/apic.c | 70 +++++++++++++++++++++++++++++++++-----------
> include/linux/acpi_pmtmr.h | 10 ++++++
> 2 files changed, 64 insertions(+), 16 deletions(-)
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -851,7 +851,8 @@ bool __init apic_needs_pit(void)
> static int __init calibrate_APIC_clock(void)
> {
> struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
> - void (*real_handler)(struct clock_event_device *dev);
> + u64 tsc_perj = 0, tsc_start = 0;
> + unsigned long jif_start;
> unsigned long deltaj;
> long delta, deltatsc;
> int pm_referenced = 0;
> @@ -878,29 +879,65 @@ static int __init calibrate_APIC_clock(v
> apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n"
> "calibrating APIC timer ...\n");
>
> - local_irq_disable();
> -
> - /* Replace the global interrupt handler */
> - real_handler = global_clock_event->event_handler;
> - global_clock_event->event_handler = lapic_cal_handler;
> + /*
> + * There are platforms w/o global clockevent devices. Instead of
> + * making the calibration conditional on that, use a polling based
> + * approach everywhere.
> + */
>
> + local_irq_disable();
> /*
> * Setup the APIC counter to maximum. There is no way the lapic
> * can underflow in the 100ms detection time frame
> */
> __setup_APIC_LVTT(0xffffffff, 0, 0);
>
> - /* Let the interrupts run */
> - local_irq_enable();
> + /*
> + * Methods to terminate the calibration loop:
> + * 1) Global clockevent if available (jiffies)
> + * 2) TSC if available and frequency is known
> + */
> + jif_start = READ_ONCE(jiffies);
> +
> + if (tsc_khz) {
> + tsc_start = rdtsc();
> + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> + }
> +
> + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {

Is this loop still meaningful, can we just invoke the handler twice
before and after the tick?

Thanks,
-Aubrey

> + /*
> + * Enable interrupts so the tick can fire, if a global
> + * clockevent device is available
> + */
> + local_irq_enable();
>
> - while (lapic_cal_loops <= LAPIC_CAL_LOOPS)
> - cpu_relax();
> + /* Wait for a tick to elapse */
> + while (1) {
> + if (tsc_khz) {
> + u64 tsc_now = rdtsc();
> + if ((tsc_now - tsc_start) >= tsc_perj) {
> + tsc_start += tsc_perj;
> + break;
> + }
> + } else {
> + unsigned long jif_now = READ_ONCE(jiffies);
> +
> + if (time_after(jif_now, jif_start)) {
> + jif_start = jif_now;
> + break;
> + }
> + }
> + cpu_relax();
> + }
> +
> + /* Invoke the calibration routine */
> + local_irq_disable();
> + lapic_cal_handler(NULL);
> + local_irq_enable();
> + }
>
> local_irq_disable();
>
> - /* Restore the real event handler */
> - global_clock_event->event_handler = real_handler;
> -
> /* Build delta t1-t2 as apic timer counts down */
> delta = lapic_cal_t1 - lapic_cal_t2;
> apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta);
> @@ -943,10 +980,11 @@ static int __init calibrate_APIC_clock(v
> levt->features &= ~CLOCK_EVT_FEAT_DUMMY;
>
> /*
> - * PM timer calibration failed or not turned on
> - * so lets try APIC timer based calibration
> + * PM timer calibration failed or not turned on so lets try APIC
> + * timer based calibration, if a global clockevent device is
> + * available.
> */
> - if (!pm_referenced) {
> + if (!pm_referenced && global_clock_event) {
> apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
>
> /*
> --- a/include/linux/acpi_pmtmr.h
> +++ b/include/linux/acpi_pmtmr.h
> @@ -18,6 +18,11 @@
> extern u32 acpi_pm_read_verified(void);
> extern u32 pmtmr_ioport;
>
> +static inline bool acpi_pm_timer_available(void)
> +{
> + return pmtmr_ioport != 0;
> +}
> +
> static inline u32 acpi_pm_read_early(void)
> {
> if (!pmtmr_ioport)
> @@ -28,6 +33,11 @@ static inline u32 acpi_pm_read_early(voi
>
> #else
>
> +static inline bool acpi_pm_timer_available(void)
> +{
> + return false;
> +}
> +
> static inline u32 acpi_pm_read_early(void)
> {
> return 0;
>

2019-08-12 12:27:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Mon, 12 Aug 2019, Li, Aubrey wrote:
> On 2019/8/9 20:54, Thomas Gleixner wrote:
> > + local_irq_disable();
> > /*
> > * Setup the APIC counter to maximum. There is no way the lapic
> > * can underflow in the 100ms detection time frame
> > */
> > __setup_APIC_LVTT(0xffffffff, 0, 0);
> >
> > - /* Let the interrupts run */
> > - local_irq_enable();
> > + /*
> > + * Methods to terminate the calibration loop:
> > + * 1) Global clockevent if available (jiffies)
> > + * 2) TSC if available and frequency is known
> > + */
> > + jif_start = READ_ONCE(jiffies);
> > +
> > + if (tsc_khz) {
> > + tsc_start = rdtsc();
> > + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> > + }
> > +
> > + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
>
> Is this loop still meaningful, can we just invoke the handler twice
> before and after the tick?

And that solves what?

> Thanks,
> -Aubrey

<Remove tons of useless quote>

Can you please trim your replies?

Thanks,

tglx

2019-08-12 13:00:56

by Aubrey Li

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Mon, Aug 12, 2019 at 8:25 PM Thomas Gleixner <[email protected]> wrote:
>
> On Mon, 12 Aug 2019, Li, Aubrey wrote:
> > On 2019/8/9 20:54, Thomas Gleixner wrote:
> > > + local_irq_disable();
> > > /*
> > > * Setup the APIC counter to maximum. There is no way the lapic
> > > * can underflow in the 100ms detection time frame
> > > */
> > > __setup_APIC_LVTT(0xffffffff, 0, 0);
> > >
> > > - /* Let the interrupts run */
> > > - local_irq_enable();
> > > + /*
> > > + * Methods to terminate the calibration loop:
> > > + * 1) Global clockevent if available (jiffies)
> > > + * 2) TSC if available and frequency is known
> > > + */
> > > + jif_start = READ_ONCE(jiffies);
> > > +
> > > + if (tsc_khz) {
> > > + tsc_start = rdtsc();
> > > + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> > > + }
> > > +
> > > + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
> >
> > Is this loop still meaningful, can we just invoke the handler twice
> > before and after the tick?
>
> And that solves what?
>

I meant, can we do this one time?
- lapic_cal_t1 = read APIC counter
- /* Wait for a tick to elapse */
- lapic_cal_t2 = read APIC counter

I'm not clear why we still need this loop, to use the
existing lapic_cal_handler()?

Thanks,
-Aubrey

2019-08-12 17:12:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Mon, 12 Aug 2019, Aubrey Li wrote:

> On Mon, Aug 12, 2019 at 8:25 PM Thomas Gleixner <[email protected]> wrote:
> >
> > On Mon, 12 Aug 2019, Li, Aubrey wrote:
> > > On 2019/8/9 20:54, Thomas Gleixner wrote:
> > > > + local_irq_disable();
> > > > /*
> > > > * Setup the APIC counter to maximum. There is no way the lapic
> > > > * can underflow in the 100ms detection time frame
> > > > */
> > > > __setup_APIC_LVTT(0xffffffff, 0, 0);
> > > >
> > > > - /* Let the interrupts run */
> > > > - local_irq_enable();
> > > > + /*
> > > > + * Methods to terminate the calibration loop:
> > > > + * 1) Global clockevent if available (jiffies)
> > > > + * 2) TSC if available and frequency is known
> > > > + */
> > > > + jif_start = READ_ONCE(jiffies);
> > > > +
> > > > + if (tsc_khz) {
> > > > + tsc_start = rdtsc();
> > > > + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
> > > > + }
> > > > +
> > > > + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
> > >
> > > Is this loop still meaningful, can we just invoke the handler twice
> > > before and after the tick?
> >
> > And that solves what?
> >
> I meant, can we do this one time?
> - lapic_cal_t1 = read APIC counter
> - /* Wait for a tick to elapse */
> - lapic_cal_t2 = read APIC counter

Sure, but how does this work with randomly broken hardware, e.g. PIT
running at the wrong frequency/

The calibration code is trying to verify against as many and as reliable
references and it served us well so far.

> I'm not clear why we still need this loop, to use the
> existing lapic_cal_handler()?

A single tick is way too small to get a proper calibration. Sure, this can
be optimized by avoiding the loop and have a longer delay, but you
definitely want to use the rest of the calibration code as is.

Aside of that this was the minial fix I came up with which might be
suitable for backporting. These platforms seem to come out of the woods
right now, so we definitely want support for them in LTS kernels as well.

Thanks,

tglx


2019-08-15 05:47:57

by Daniel Drake

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Handle missing global clockevent gracefully

On Mon, Aug 12, 2019 at 2:16 PM Daniel Drake <[email protected]> wrote:
> I can do a bit of testing on other platforms too. Are there any
> specific tests I should run, other than checking that the system boots
> and doesn't have any timer watchdog complaints in the log?

Tested this on 2 AMD platforms that were not affected by the crash here.
In addition to confirming that they boot fine without timer complaints
in the logs, I checked the calibrate_APIC_clock() result before and
after this patch. I repeated each test twice.

Asus E402YA (AMD E2-7015)
Before: 99811, 99811
After: 99812, 99812

Acer Aspire A315-21G (AMD A9-9420e)
Before: 99811, 99811
After: 99807, 99820

Those new numbers seem very close to the previous ones and I didn't
observe any problems.

Thanks
Daniel

Subject: [tip:x86/urgent] x86/apic: Handle missing global clockevent gracefully

Commit-ID: f897e60a12f0b9146357780d317879bce2a877dc
Gitweb: https://git.kernel.org/tip/f897e60a12f0b9146357780d317879bce2a877dc
Author: Thomas Gleixner <[email protected]>
AuthorDate: Fri, 9 Aug 2019 14:54:07 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 19 Aug 2019 12:34:07 +0200

x86/apic: Handle missing global clockevent gracefully

Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.

Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.

As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.

Switch the calibration loop to a non interrupt based one, which polls
either TSC (if frequency is known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.

Reported-by: Daniel Drake <[email protected]>
Reported-by: Jiri Slaby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Daniel Drake <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
---
arch/x86/kernel/apic/apic.c | 68 +++++++++++++++++++++++++++++++++++----------
1 file changed, 53 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f5291362da1a..aa5495d0f478 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -722,7 +722,7 @@ static __initdata unsigned long lapic_cal_pm1, lapic_cal_pm2;
static __initdata unsigned long lapic_cal_j1, lapic_cal_j2;

/*
- * Temporary interrupt handler.
+ * Temporary interrupt handler and polled calibration function.
*/
static void __init lapic_cal_handler(struct clock_event_device *dev)
{
@@ -851,7 +851,8 @@ bool __init apic_needs_pit(void)
static int __init calibrate_APIC_clock(void)
{
struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
- void (*real_handler)(struct clock_event_device *dev);
+ u64 tsc_perj = 0, tsc_start = 0;
+ unsigned long jif_start;
unsigned long deltaj;
long delta, deltatsc;
int pm_referenced = 0;
@@ -878,28 +879,64 @@ static int __init calibrate_APIC_clock(void)
apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n"
"calibrating APIC timer ...\n");

+ /*
+ * There are platforms w/o global clockevent devices. Instead of
+ * making the calibration conditional on that, use a polling based
+ * approach everywhere.
+ */
local_irq_disable();

- /* Replace the global interrupt handler */
- real_handler = global_clock_event->event_handler;
- global_clock_event->event_handler = lapic_cal_handler;
-
/*
* Setup the APIC counter to maximum. There is no way the lapic
* can underflow in the 100ms detection time frame
*/
__setup_APIC_LVTT(0xffffffff, 0, 0);

- /* Let the interrupts run */
+ /*
+ * Methods to terminate the calibration loop:
+ * 1) Global clockevent if available (jiffies)
+ * 2) TSC if available and frequency is known
+ */
+ jif_start = READ_ONCE(jiffies);
+
+ if (tsc_khz) {
+ tsc_start = rdtsc();
+ tsc_perj = div_u64((u64)tsc_khz * 1000, HZ);
+ }
+
+ /*
+ * Enable interrupts so the tick can fire, if a global
+ * clockevent device is available
+ */
local_irq_enable();

- while (lapic_cal_loops <= LAPIC_CAL_LOOPS)
- cpu_relax();
+ while (lapic_cal_loops <= LAPIC_CAL_LOOPS) {
+ /* Wait for a tick to elapse */
+ while (1) {
+ if (tsc_khz) {
+ u64 tsc_now = rdtsc();
+ if ((tsc_now - tsc_start) >= tsc_perj) {
+ tsc_start += tsc_perj;
+ break;
+ }
+ } else {
+ unsigned long jif_now = READ_ONCE(jiffies);

- local_irq_disable();
+ if (time_after(jif_now, jif_start)) {
+ jif_start = jif_now;
+ break;
+ }
+ }
+ cpu_relax();
+ }

- /* Restore the real event handler */
- global_clock_event->event_handler = real_handler;
+ /* Invoke the calibration routine */
+ local_irq_disable();
+ lapic_cal_handler(NULL);
+ local_irq_enable();
+ }
+
+ local_irq_disable();

/* Build delta t1-t2 as apic timer counts down */
delta = lapic_cal_t1 - lapic_cal_t2;
@@ -943,10 +980,11 @@ static int __init calibrate_APIC_clock(void)
levt->features &= ~CLOCK_EVT_FEAT_DUMMY;

/*
- * PM timer calibration failed or not turned on
- * so lets try APIC timer based calibration
+ * PM timer calibration failed or not turned on so lets try APIC
+ * timer based calibration, if a global clockevent device is
+ * available.
*/
- if (!pm_referenced) {
+ if (!pm_referenced && global_clock_event) {
apic_printk(APIC_VERBOSE, "... verify APIC timer\n");

/*