2019-04-03 07:52:01

by Daniel Drake

[permalink] [raw]
Subject: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

Hi,

I already wrote about this problem in the thread "APIC timer checked
before it is set up, boot fails on Connex L1430"
https://lkml.org/lkml/2018/12/28/10
However my initial diagnosis was misguided, and I have some new
findings to share now, so I'm starting over in this new thread.

Also CCing Hans, who also often attracts this class of problem on low
cost hardware!

The problem is that on affected platforms, all Linux distros (and all
known kernel versions) fail to boot, hanging on a black screen. EFI
earlyprintk can be used to see the panic:

APIC: switch to symmetric I/O mode setup
x2apic: IRQ remapping doesn't support X2APIC mode
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...tryign to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 2) ...
....... failed.
...trying to set up timer as Virtual Wire IRQ...
..... failed.
...trying to set up timer as ExtINT IRQ...
do_IRQ: 0.55 No irq handler for vector
..... failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
apic=debug and send a report.

After encountering this on Connex L1430 last time, we have now
encountered another affected product, from a different vendor (SCOPE
SN116PYA). They both have Intel Apollo Lake N3350 and AMI BIOS.

The code in question is making sure that the IRQ0 timer works, by
waiting for an interrupt. In this case there is no interrupt.

The x86 platform code in hpet_time_init() tries to enable the HPET
timer for this, however that is not available on these affected
platforms (no HPET ACPI table). So it then falls back on the 8253/8254
legacy PIT. The i8253.c driver is invoked to program the PIT
accordingly, however in this case it does not result in any IRQ0
interrupts being generated --> panic.

I found a relevant setting in the BIOS: Chipset -> South Cluster
Configuration -> Miscellaneous Configuration -> 8254 Clock Gating
This option is set to Enabled by default. Setting it to Disabled makes
the PIT tick and Linux boot finally works.

It's nice to have a workaround but I would hope we could do better -
especially because it seems like this problem is spreading. In
addition to the two products we found here, searching around finds
several other product manuals and discussions that tell you to go into
the BIOS and change this option if you want Linux to boot, some
examples:
https://blog.csdn.net/qhtsm/article/details/88600316
https://www.manualslib.com/manual/1316475/Ecs-Ed20pa2.html?page=23
https://tools.exone.de/live/shop/img/produkte/fs_112124_2.pdf page 11

As another data point, Windows 10 boots fine in this no-PIT no-HPET
configuation.

Going deeper, I found the clock_gate_8254 option in the coreboot
source code. This pointed me to the ITSSPRC register, which is
documented on page 1694 of
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf

"8254 Static Clock Gating Enable (CGE8254): When set, the 8254 timer
is disabled statically. This bit shall be set by BIOS if the 8254
feature is not needed in the system or before BIOS hands off the
system that supports C11. Normal operation of 8254 requires this bit
to 0."

(what's C11?)

I verified that the BIOS setting controls this specific bit value, and
I also created and verified a workaround that unsets this bit - now
Linux boots fine regardless of the BIOS setting:

#define INTEL_APL_PSR_BASE 0xd0000000
#define INTEL_APL_PID_ITSS 0xd0
#define INTEL_PCR_PORTID_SHIFT 16
#define INTEL_APL_PCR_ITSSPRC 0x3300
static void quirk_intel_apl_8254(void)
{
u32 addr = INTEL_APL_PSR_BASE | \
(INTEL_APL_PID_ITSS << INTEL_PCR_PORTID_SHIFT) | \
INTEL_APL_PCR_ITSSPRC;
u32 value;
void __iomem *itssprc = ioremap_nocache(addr, 4);

if (!itssprc)
return;

value = readl(itssprc);
if (value & 4) {
value &= ~4;
writel(value, itssprc);
}
iounmap(itssprc);
}

I was hoping I could send a workaround patch here, but I'm not sure of
an appropriate way to detect that we are on an Intel Apollo Lake
platform. This timer stuff happens during early boot, the early quirks
in pci/quirks.c run too late for this. Suggestions appreciated.

Poking at other angles, I tried taking the HPET ACPI table from
another (working) Intel N3350 system and putting it in the initrd as
an override. This makes the HPET work fine, at which point Linux boots
OK without having to touch the (BIOS-crippled) PIT.

I also spotted that GRUB was previously affected by this BIOS-level
behaviour change.
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=446794de8da4329ea532cbee4ca877bcafd0e534
Apparently GRUB used to rely on the 8254 PIT too, but it now uses the
pmtimer for TSC calibration instead. I guess the originally-affected
platforms only ran into GRUB freezing here (as opposed to both GRUB
and Linux freezing) because those platforms had a working HPET,
meaning that Linux was unaware/unaffected by the newly-gated PIT.

I'm at the limit of my current knowledge here, but there's an open
question of whether Linux could be made to work without a working PIT
and no HPET, in the same way that grub and Windows seem to manage.
Even though it is currently essential for boot, the PIT (or HPET) is
usually only needed to tick a few times before being replaced with the
APIC timer as a clocksource (when setup_APIC_timer() happens, the
clocksource layer disables the previous timer source). However, Thomas
Gleixner gave some hints at the importance of the PIT/HPET here:

> Well, [avoiding the PIT/HPET ticking requirement] would be trivial if we
> could rely on the APIC timer being functional on all CPUs and if we could
> figure out the APIC timer frequency without calibrating it against the
> PIT/HPET on older CPUs. Plus a gazillion of other issues (e.g. APIC stops
> in C states ....)
> [...]
> Under certain conditions we actually might avoid touching PIT/HPET and
> solely rely on the CPUID/MSR calibration values. Needs quite some thought
> though.

I'm not sure what is the best way forward on this issue, but hopefully
this investigation is useful somehow, and I'd be happy to act on any
suggestions.

Thanks
Daniel


2019-04-03 11:23:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

Daniel,

On Wed, 3 Apr 2019, Daniel Drake wrote:

> After encountering this on Connex L1430 last time, we have now
> encountered another affected product, from a different vendor (SCOPE
> SN116PYA). They both have Intel Apollo Lake N3350 and AMI BIOS.
>
> The code in question is making sure that the IRQ0 timer works, by
> waiting for an interrupt. In this case there is no interrupt.

Right.

> The x86 platform code in hpet_time_init() tries to enable the HPET
> timer for this, however that is not available on these affected
> platforms (no HPET ACPI table). So it then falls back on the 8253/8254
> legacy PIT. The i8253.c driver is invoked to program the PIT
> accordingly, however in this case it does not result in any IRQ0
> interrupts being generated --> panic.

Correct.

> I found a relevant setting in the BIOS: Chipset -> South Cluster
> Configuration -> Miscellaneous Configuration -> 8254 Clock Gating
> This option is set to Enabled by default. Setting it to Disabled makes
> the PIT tick and Linux boot finally works.

Well, your BIOS at least has this switch ...

> As another data point, Windows 10 boots fine in this no-PIT no-HPET
> configuation.

We have support for HPET/PIT less systems already. We just need to figure
out how to switch to that mode automagically at early boot.

ACPI obviously does not switch to it with the ACPI_FADT_HW_REDUCED flag.

> Going deeper, I found the clock_gate_8254 option in the coreboot
> source code. This pointed me to the ITSSPRC register, which is
> documented on page 1694 of
> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/300-series-chipset-pch-datasheet-vol-2.pdf
>
> "8254 Static Clock Gating Enable (CGE8254): When set, the 8254 timer
> is disabled statically. This bit shall be set by BIOS if the 8254
> feature is not needed in the system or before BIOS hands off the
> system that supports C11. Normal operation of 8254 requires this bit
> to 0."
>
> (what's C11?)

Don't know. Some magic new C-State perhaps? Rafael?

Btw, one of those links you provided

https://www.manualslib.com/manual/1316475/Ecs-Ed20pa2.html?page=23

claims that you have to disable MWAIT as well. No idea why. Is MWAIT
disabled on your platform?

> I verified that the BIOS setting controls this specific bit value, and
> I also created and verified a workaround that unsets this bit - now
> Linux boots fine regardless of the BIOS setting:
>
> #define INTEL_APL_PSR_BASE 0xd0000000
> #define INTEL_APL_PID_ITSS 0xd0
> #define INTEL_PCR_PORTID_SHIFT 16
> #define INTEL_APL_PCR_ITSSPRC 0x3300
> static void quirk_intel_apl_8254(void)
> {
> u32 addr = INTEL_APL_PSR_BASE | \
> (INTEL_APL_PID_ITSS << INTEL_PCR_PORTID_SHIFT) | \
> INTEL_APL_PCR_ITSSPRC;
> u32 value;
> void __iomem *itssprc = ioremap_nocache(addr, 4);
>
> if (!itssprc)
> return;
>
> value = readl(itssprc);
> if (value & 4) {
> value &= ~4;
> writel(value, itssprc);
> }
> iounmap(itssprc);
> }
>
> I was hoping I could send a workaround patch here, but I'm not sure of
> an appropriate way to detect that we are on an Intel Apollo Lake
> platform. This timer stuff happens during early boot, the early quirks
> in pci/quirks.c run too late for this. Suggestions appreciated.

We have early-quirks.c in arch/x86/kernel/ for that.

> Poking at other angles, I tried taking the HPET ACPI table from
> another (working) Intel N3350 system and putting it in the initrd as
> an override. This makes the HPET work fine, at which point Linux boots
> OK without having to touch the (BIOS-crippled) PIT.

We already have quirks for force enabling HPET, so that could be added.

> I'm at the limit of my current knowledge here, but there's an open
> question of whether Linux could be made to work without a working PIT
> and no HPET, in the same way that grub and Windows seem to manage.
> Even though it is currently essential for boot, the PIT (or HPET) is
> usually only needed to tick a few times before being replaced with the
> APIC timer as a clocksource (when setup_APIC_timer() happens, the
> clocksource layer disables the previous timer source). However, Thomas
> Gleixner gave some hints at the importance of the PIT/HPET here:
>
> > Well, [avoiding the PIT/HPET ticking requirement] would be trivial if we
> > could rely on the APIC timer being functional on all CPUs and if we could
> > figure out the APIC timer frequency without calibrating it against the
> > PIT/HPET on older CPUs. Plus a gazillion of other issues (e.g. APIC stops
> > in C states ....)
> > [...]
> > Under certain conditions we actually might avoid touching PIT/HPET and
> > solely rely on the CPUID/MSR calibration values. Needs quite some thought
> > though.

For newer CPUs we might assume that:

1) The TSC and APIC timer are actually usable

2) The frequencies can be retrieved from CPUID or MSRs

If #1 and #2 are reliable we can avoid the whole calibration and interrupt
delivery mess.

That means we need the following decision logic:

1) If HPET is available in ACPI, boot normal.

2) If HPET is not available, verify that the PIT actually counts. If it
does, boot normal.

If it does not either:

2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.

But that means that we need to chase PCH ids forever...

2B) Shrug and just avoid the whole PIT/HPET magic all over the place:

- Avoid the interrupt delivery check in the IOAPIC code as it's
uninteresting in that case. Trivial to do.

- Prevent the TSC calibration code from touching PIT/HPET. It
should do that already when the TSC frequency can be retrieved
via CPUID or MSR. Should work, emphasis on should ...

See the mess in: native_calibrate_tsc() and the magic tables in
tsc_msr.c how well that stuff works.

The cpu_khz_from_cpuid() case at seems to not have these
issues. Knock on wood!

- Prevent the APIC calibration code from touching PIT/HPET. That's
only happening right now when the TSC frequency comes from
the MSRs. No idea why the CPUID method does not provide that.

CPUID leaf 0x16 provides the bus frequency, so we can deduce the
APIC timer frequency from there and spare the whole APIC timer
calibration mess:

ECX Bits 15 - 00: Bus (Reference) Frequency (in MHz).

It's usually not required on these newer CPUs because they
support TSC deadline timer, but you can disable that on the
kernel command line and some implementations of that were
broken. With that we are back to square one.

So we need to make sure that these things work under all
circumstances.

Rafael?

Thanks,

tglx


2019-04-03 12:03:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Wed, 3 Apr 2019, Thomas Gleixner wrote:
> For newer CPUs we might assume that:
>
> 1) The TSC and APIC timer are actually usable
>
> 2) The frequencies can be retrieved from CPUID or MSRs
>
> If #1 and #2 are reliable we can avoid the whole calibration and interrupt
> delivery mess.
>
> That means we need the following decision logic:
>
> 1) If HPET is available in ACPI, boot normal.
>
> 2) If HPET is not available, verify that the PIT actually counts. If it
> does, boot normal.
>
> If it does not either:
>
> 2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.
>
> But that means that we need to chase PCH ids forever...
>
> 2B) Shrug and just avoid the whole PIT/HPET magic all over the place:
>
> - Avoid the interrupt delivery check in the IOAPIC code as it's
> uninteresting in that case. Trivial to do.
>
> - Prevent the TSC calibration code from touching PIT/HPET. It
> should do that already when the TSC frequency can be retrieved
> via CPUID or MSR. Should work, emphasis on should ...
>
> See the mess in: native_calibrate_tsc() and the magic tables in
> tsc_msr.c how well that stuff works.
>
> The cpu_khz_from_cpuid() case at seems to not have these
> issues. Knock on wood!
>
> - Prevent the APIC calibration code from touching PIT/HPET. That's
> only happening right now when the TSC frequency comes from
> the MSRs. No idea why the CPUID method does not provide that.
>
> CPUID leaf 0x16 provides the bus frequency, so we can deduce the
> APIC timer frequency from there and spare the whole APIC timer
> calibration mess:
>
> ECX Bits 15 - 00: Bus (Reference) Frequency (in MHz).
>
> It's usually not required on these newer CPUs because they
> support TSC deadline timer, but you can disable that on the
> kernel command line and some implementations of that were
> broken. With that we are back to square one.
>
> So we need to make sure that these things work under all
> circumstances.
>
> Rafael?

And we have to think hard about how we are going to provide that for
backporting in a digestable form. People are supposed to run recent stable
kernels (I'm not talking about dead kernels ...).

Thanks,

tglx

2019-04-09 05:49:05

by Daniel Drake

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner <[email protected]> wrote:
> Btw, one of those links you provided
>
> https://www.manualslib.com/manual/1316475/Ecs-Ed20pa2.html?page=23
>
> claims that you have to disable MWAIT as well. No idea why. Is MWAIT
> disabled on your platform?

I don't have that option in the BIOS. However there's no mention of
"mwait" nor "mwaitx" in /proc/cpuinfo. Checking our more general
database of 202 x86_64 consumer products released over the last few
years, only 19 of them have mwait/mwaitx listed there and they tend to
be older platforms.

> We have early-quirks.c in arch/x86/kernel/ for that.

Nice, we should be able to work around the issue there, but I hope we
can find something better...

> For newer CPUs we might assume that:
>
> 1) The TSC and APIC timer are actually usable
>
> 2) The frequencies can be retrieved from CPUID or MSRs
>
> If #1 and #2 are reliable we can avoid the whole calibration and interrupt
> delivery mess.

Let's take a step back and re-examine the wider sequence of events
here (which I've now done thanks to your pointers).

1. In very early boot, we face the TSC calibration challenge, arriving
at determine_cpu_tsc_frequencies(). This function calculates CPU
frequency and TSC frequency separately.

For the CPU frequency, native_calibrate_cpu_early() tries to do it via
cpu_khz_from_cpuid() with CPUID leaf 0x16, but this is not available
on the platforms in question, which have max cpuid level 0x15.
cpu_khz_from_msr() is then tried, but that doesn't support this
platform either (looks like it only supports older SoC generations).
So now we arrive in quick_pit_calibrate(), which directly programs the
PIT and measures the TSC rate against the PIT ticks.

When the 8254 is ungated in the BIOS, this function fails early because:
if (pit_expect_msb(0xff, &tsc, &d1)) {
/* returned at count=13, d1 is now 32118 */
for (i = 1; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
if (!pit_expect_msb(0xff-i, &delta, &d2))
/* returned at count=13, d2 is now 48595 */
break;

delta -= tsc;
/* delta is now 246741 */

/*
* Extrapolate the error and fail fast if the error will
* never be below 500 ppm.
*/
if (i == 1 &&
d1 + d2 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11)
return 0;
/* this return statement is hit, the calculation is:
32118 + 48595 >= (246741 * 233) >> 11
*/

so the error was too high (I'm not sure why) and
determine_cpu_tsc_frequencies() records the CPU frequency as 0.

Then, comparing to when the 8254 is gated via the BIOS option, the
behaviour is surprising too.
In that case, the quick_calibrate_pit() loop runs through up to i=128,
at which point the error level is low enough to be accepted,
calculating the CPU frequency as 4448MHz (4x higher than reality).
During each loop iteration, pit_expect_msb() returns when the value
changes at count=63 (compared to 13 in the PIT-ungated case). Does
this suggest the PIT is not actually fully gated, it's just ticking a
lot slower than otherwise?

Anyway, back in determine_cpu_tsc_frequencies() with the CPU frequency
calibration done, we now do TSC calibration. This one succeeds in all
cases via native_calibrate_tsc() using CPUID leaf 0x15 to read the
correct value. The TSC is 1094MHz.

Then, in both cases (8254 gated or not), the CPU frequency calculation
is discarded here, because it's wildly different from the TSC rate:
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;

So it seems that this code already behaves along the lines you
describe: it gives more trust to the TSC value read in the modern way,
and does not get upset if the CPU frequency calibration against the
PIT didn't produce a meaningful result.


2. Significantly later during boot, x86_late_time_init() calls
hpet_time_init() which sets up either the PIT or HPET. However, as far
as I can see, there is no checking that the selected clock is actually
ticking. In the case of these affected platforms with the 8254 gated,
we sail right pass this point without a working clock source.

3. x86_late_time_init() then calls apic_intr_mode_init() ->
apic_bsp_setup() -> setup_IO_APIC() and at this point we reach
check_timer(), which attempts to verify/fixup IRQ0 delivery via the
IO-APIC. At this point we check that jiffies increments, and if not,
panic.

4. Some time later, naive_smp_prepare_cpus() calls
setup_boot_APIC_clock() -> setup_APIC_timer() which registers the
local APIC clocksource, replacing the previous PIT/HPET clocksource.
There's no check to make sure that the new clocksource is ticking, as
far as I can see.

5. Some time later, start_secondary() calls
start_secondary_APIC_clock() -> setup_APIC_timer() registering the
APIC clocksource (again? or just for another CPU core?).


Hopefully that analysis helps refine/elaborate the plan a bit more...

> That means we need the following decision logic:
>
> 1) If HPET is available in ACPI, boot normal.
>
> 2) If HPET is not available, verify that the PIT actually counts. If it
> does, boot normal.
>
> If it does not either:
>
> 2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.
>
> But that means that we need to chase PCH ids forever...

(I found the ISST bit in the coreboot source code, which shows that
the register is shared over multiple Intel SoC generations. I then
searched for the register name online and found it documented in the
320/C240 public documentation, which I linked to. However that's not
actually the platform in question. In this case we are working with
Intel Apollo Lake N3350.)

Anyway, I agree that doing it with PCI IDs would be painful.

> 2B) Shrug and just avoid the whole PIT/HPET magic all over the place:
>
> - Avoid the interrupt delivery check in the IOAPIC code as it's
> uninteresting in that case. Trivial to do.

What do you mean by "in that case"? In the case of having an IOAPIC?

From my analysis above, this interrupt delivery check feels misplaced.
Other parts of the clock setup code (e.g. where PIC, HPET and APIC
timer are enabled) do not seem to check that the timers being set up
actually work. If I were to try a kernel with no APIC/LAPIC support
then Linux would boot with a broken PIT as the clock source without
checking it. So why do we check it here specifically in the IOAPIC
code? I see it does some tricks which are presumably needed on
historical platforms, but maybe it could let boot continue even if it
can't find a working IRQ0 setup? Or it could at least skip the check
if IRQ0 was not working before the IOAPIC gets set up?

If there is desire for some "check that the clocksource is actually
ticking" panic logic, maybe this could be done after the local APIC
timer is setup (which is ultimately the clock source selected and
used), maybe it should even be done in arch-independent code?

> - Prevent the TSC calibration code from touching PIT/HPET. It
> should do that already when the TSC frequency can be retrieved
> via CPUID or MSR. Should work, emphasis on should ...

From above, this seems to be working acceptably already. It does touch
the PIT, but ultimately ignores the information that it provided.

> - Prevent the APIC calibration code from touching PIT/HPET. That's
> only happening right now when the TSC frequency comes from
> the MSRs. No idea why the CPUID method does not provide that.

Where's the APIC calibration code?

> CPUID leaf 0x16 provides the bus frequency, so we can deduce the
> APIC timer frequency from there and spare the whole APIC timer
> calibration mess:
>
> ECX Bits 15 - 00: Bus (Reference) Frequency (in MHz).

That's not available on this platform, plus
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
page 1-21 says that the data returned is actually marketing stuff, and
shouldn't be treated as real. I think you mean CPUID leaf 0x15
instead.

Thanks for your input!

Daniel

2019-04-10 16:17:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Tue, 9 Apr 2019, Daniel Drake wrote:
> On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner <[email protected]> wrote:
> > That means we need the following decision logic:
> >
> > 1) If HPET is available in ACPI, boot normal.
> >
> > 2) If HPET is not available, verify that the PIT actually counts. If it
> > does, boot normal.
> >
> > If it does not either:
> >
> > 2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.
> >
> > But that means that we need to chase PCH ids forever...
>
> (I found the ISST bit in the coreboot source code, which shows that
> the register is shared over multiple Intel SoC generations. I then
> searched for the register name online and found it documented in the
> 320/C240 public documentation, which I linked to. However that's not
> actually the platform in question. In this case we are working with
> Intel Apollo Lake N3350.)
>
> Anyway, I agree that doing it with PCI IDs would be painful.
>
> > 2B) Shrug and just avoid the whole PIT/HPET magic all over the place:
> >
> > - Avoid the interrupt delivery check in the IOAPIC code as it's
> > uninteresting in that case. Trivial to do.
>
> What do you mean by "in that case"? In the case of having an IOAPIC?

In the case that neither HPET nor PIT are available or required.

> From my analysis above, this interrupt delivery check feels misplaced.

Not really.

> Other parts of the clock setup code (e.g. where PIC, HPET and APIC
> timer are enabled) do not seem to check that the timers being set up
> actually work. If I were to try a kernel with no APIC/LAPIC support
> then Linux would boot with a broken PIT as the clock source without
> checking it. So why do we check it here specifically in the IOAPIC
> code? I see it does some tricks which are presumably needed on
> historical platforms, but maybe it could let boot continue even if it
> can't find a working IRQ0 setup? Or it could at least skip the check
> if IRQ0 was not working before the IOAPIC gets set up?

It's not only historical. The irq0 'wiring' is still screwed up on newer
systems and we cannot rely on the ACPI/BIOS information.

> If there is desire for some "check that the clocksource is actually
> ticking" panic logic, maybe this could be done after the local APIC
> timer is setup (which is ultimately the clock source selected and
> used), maybe it should even be done in arch-independent code?

Well. Most systems have working timers. Just x86 is an utter trainwreck in
that regard.

PIT/HPET used to work just fine forever (except for the actual interrupt
delivery) so this is new terretory.

> > - Prevent the TSC calibration code from touching PIT/HPET. It
> > should do that already when the TSC frequency can be retrieved
> > via CPUID or MSR. Should work, emphasis on should ...
>
> >From above, this seems to be working acceptably already. It does touch
> the PIT, but ultimately ignores the information that it provided.

Yes, but we might actually be smarter than that.

> > - Prevent the APIC calibration code from touching PIT/HPET. That's
> > only happening right now when the TSC frequency comes from
> > the MSRs. No idea why the CPUID method does not provide that.
>
> Where's the APIC calibration code?

calibrate_APIC_clock()

> > CPUID leaf 0x16 provides the bus frequency, so we can deduce the
> > APIC timer frequency from there and spare the whole APIC timer
> > calibration mess:
> >
> > ECX Bits 15 - 00: Bus (Reference) Frequency (in MHz).
>
> That's not available on this platform, plus
> architecture-instruction-set-extensions-programming-reference.pdf

Please use the SDM as reference, but yes it tells the same.

> page 1-21 says that the data returned is actually marketing stuff, and
> shouldn't be treated as real. I think you mean CPUID leaf 0x15
> instead.

No, it's not marketing. It's the specification and there must be a reason
why Intel decided to actually use it in the kernel:

Author: Len Brown <[email protected]>
Date: Fri Jun 17 01:22:51 2016 -0400

x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID

Skylake CPU base-frequency and TSC frequency may differ
by up to 2%.

Enumerate CPU and TSC frequencies separately, allowing
cpu_khz and tsc_khz to differ.

The existing CPU frequency calibration mechanism is unchanged.
However, CPUID extensions are preferred, when available.

CPUID.0x16 is preferred over MSR and timer calibration
for CPU frequency discovery.

CPUID.0x15 takes precedence over CPU-frequency
for TSC frequency discovery.


Leaf 0x15 does not tell the bus/reference clock which is what we need for
using the local apic timer (not the TSC deadline timer).

Thanks,

tglx

2019-04-16 05:22:37

by Daniel Drake

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Wed, Apr 10, 2019 at 8:54 PM Thomas Gleixner <[email protected]> wrote:
> On Tue, 9 Apr 2019, Daniel Drake wrote:
> > On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner <[email protected]> wrote:
> > > - Prevent the TSC calibration code from touching PIT/HPET. It
> > > should do that already when the TSC frequency can be retrieved
> > > via CPUID or MSR. Should work, emphasis on should ...
> >
> > >From above, this seems to be working acceptably already. It does touch
> > the PIT, but ultimately ignores the information that it provided.
>
> Yes, but we might actually be smarter than that.

Do you have anything specific in mind?

You originally laid out this idea in the context of doing this if the
PIT/HPET is not working. However, I can't immediately see how to judge
that because:
- According to the analysis in my last mail, the PIT is actually
ticking even when it is gated in the BIOS. The BIOS setting just seems
to make it tick 4 times slower and not generate any IRQ0 interrupts.
- TSC calibration code runs really early during boot. To make it
detect this situation we could make it check if IRQ0 is working,
however setup_default_timer_irq() only happens a lot later, so I'm not
sure how this could be checked at such an early stage.

I think I'm now fairly clear on the other suggestions you have made so
I'll see if I can come up with some patches.

Thanks!
Daniel

Subject: [tip:x86/apic] x86/tsc: Set LAPIC timer period to crystal clock frequency

Commit-ID: 2420a0b1798d7a78d1f9b395f09f3c80d92cc588
Gitweb: https://git.kernel.org/tip/2420a0b1798d7a78d1f9b395f09f3c80d92cc588
Author: Daniel Drake <[email protected]>
AuthorDate: Thu, 9 May 2019 13:54:17 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 9 May 2019 11:06:49 +0200

x86/tsc: Set LAPIC timer period to crystal clock frequency

The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.

Set lapic_timer_period based on the crystal clock frequency
accordingly.

APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Daniel Drake <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/tsc.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6e6d933fb99c..8f47c4862c56 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -671,6 +671,16 @@ unsigned long native_calibrate_tsc(void)
if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);

+#ifdef CONFIG_X86_LOCAL_APIC
+ /*
+ * The local APIC appears to be fed by the core crystal clock
+ * (which sounds entirely sensible). We can set the global
+ * lapic_timer_period here to avoid having to calibrate the APIC
+ * timer later.
+ */
+ lapic_timer_period = crystal_khz * 1000 / HZ;
+#endif
+
return crystal_khz * ebx_numerator / eax_denominator;
}

2019-06-27 08:56:58

by Daniel Drake

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

Hi Thomas,

Picking up this issue again after a break!

We made some progress last time on reducing PIT usage in the TSC
calibration code, but we still have the bigger issue to resolve:
IO-APIC code panicing when the PIT isn't ticking.

On Wed, Apr 3, 2019 at 7:21 PM Thomas Gleixner <[email protected]> wrote:
> For newer CPUs we might assume that:
>
> 1) The TSC and APIC timer are actually usable
>
> 2) The frequencies can be retrieved from CPUID or MSRs
>
> If #1 and #2 are reliable we can avoid the whole calibration and interrupt
> delivery mess.
>
> That means we need the following decision logic:
>
> 1) If HPET is available in ACPI, boot normal.
>
> 2) If HPET is not available, verify that the PIT actually counts. If it
> does, boot normal.
>
> If it does not either:
>
> 2A) Verify that this is a PCH 300/C240 and fiddle with that ISST bit.
>
> But that means that we need to chase PCH ids forever...
>
> 2B) Shrug and just avoid the whole PIT/HPET magic all over the place:
>
> - Avoid the interrupt delivery check in the IOAPIC code as it's
> uninteresting in that case. Trivial to do.

I tried to explore this idea here:
https://lore.kernel.org/patchwork/patch/1064972/
https://lore.kernel.org/patchwork/patch/1064971/

But I can't say I really knew what I was doing there, and you
pointed out some problems.

Any new ideas that I can experiment with?

Being more conservative, how about something like this?
---
arch/x86/kernel/apic/io_apic.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53aa234a6803..36b1e7d5b657 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2073,7 +2073,7 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
*
* FIXME: really need to revamp this for all platforms.
*/
-static inline void __init check_timer(void)
+static inline void __init check_timer(int timer_was_working)
{
struct irq_data *irq_data = irq_get_irq_data(0);
struct mp_chip_data *data = irq_data->chip_data;
@@ -2216,8 +2216,15 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO
"Perhaps problem with the pre-enabled x2apic mode\n"
"Try booting with x2apic and interrupt-remapping disabled in the bios.\n");
- panic("IO-APIC + timer doesn't work! Boot with apic=debug and send a "
- "report. Then try booting with the 'noapic' option.\n");
+
+ if (timer_was_working)
+ panic("IO-APIC + timer doesn't work! Boot with apic=debug and send a "
+ "report. Then try booting with the 'noapic' option.\n");
+ else
+ apic_printk(APIC_QUIET, KERN_INFO
+ "Continuing anyway with no 8254 timer, as it was not working even before IO-APIC setup was attempted.\n"
+ "Boot will fail unless another working clocksource is found.\n");
+
out:
local_irq_restore(flags);
}
@@ -2304,12 +2311,20 @@ static void ioapic_destroy_irqdomain(int idx)
void __init setup_IO_APIC(void)
{
int ioapic;
+ int timer_was_working = 0;

if (skip_ioapic_setup || !nr_ioapics)
return;

io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;

+ /*
+ * Record if the timer was in working state before we do any
+ * IO-APIC setup.
+ */
+ if (nr_legacy_irqs())
+ timer_was_working = timer_irq_works();
+
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
for_each_ioapic(ioapic)
BUG_ON(mp_irqdomain_create(ioapic));
@@ -2323,7 +2338,7 @@ void __init setup_IO_APIC(void)
setup_IO_APIC_irqs();
init_IO_APIC_traps();
if (nr_legacy_irqs())
- check_timer();
+ check_timer(timer_was_working);

ioapic_initialized = 1;
}
--
2.20.1

2019-06-27 14:07:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

Daniel,

On Thu, 27 Jun 2019, Daniel Drake wrote:
> Picking up this issue again after a break!
>
> We made some progress last time on reducing PIT usage in the TSC
> calibration code, but we still have the bigger issue to resolve:
> IO-APIC code panicing when the PIT isn't ticking.

Yeah. I was busy with other stuff and simply forgot.

> Being more conservative, how about something like this?
>
> + /*
> + * Record if the timer was in working state before we do any
> + * IO-APIC setup.
> + */
> + if (nr_legacy_irqs())
> + timer_was_working = timer_irq_works();

Nah. That extra timer works thing is just another bandaid.

What I had in mind is something like the below. That's on top of

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/apic

Be warned. It's neither compiled nor tested, so keep a fire extinguisher
handy. If it explodes you own the pieces.

/me goes off to find icecream

Thanks,

tglx

8<-----------------
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -173,6 +173,7 @@ extern void lapic_assign_system_vectors(
extern void lapic_assign_legacy_vector(unsigned int isairq, bool replace);
extern void lapic_online(void);
extern void lapic_offline(void);
+extern bool apic_needs_pit(void);

#else /* !CONFIG_X86_LOCAL_APIC */
static inline void lapic_shutdown(void) { }
@@ -186,6 +187,7 @@ static inline void init_bsp_APIC(void) {
static inline void apic_intr_mode_init(void) { }
static inline void lapic_assign_system_vectors(void) { }
static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
+static inline bool apic_needs_pit(void) { return true; }
#endif /* !CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_X2APIC
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -7,6 +7,7 @@

extern void hpet_time_init(void);
extern void time_init(void);
+extern bool pit_timer_init(void);

extern struct clock_event_device *global_clock_event;

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -820,6 +820,33 @@ static int __init lapic_init_clockevent(
return 0;
}

+bool __init apic_and_tsc_needs_pit(void)
+{
+ /*
+ * If the frequencies are not known, PIT is required for both TSC
+ * and apic timer calibration.
+ */
+ if (!tsc_khz || !cpu_khz)
+ return true;
+
+ /* Is there an APIC at all? */
+ if (!boot_cpu_has(X86_FEATURE_APIC))
+ return true;
+
+ /* Deadline timer is based on TSC so no further PIT action required */
+ if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
+ return false;
+
+ /* APIC timer disabled? */
+ if (disable_apic_timer)
+ return true;
+ /*
+ * The APIC timer frequency is known already, no PIT calibration
+ * required. If unknown, let the PIT be initialized.
+ */
+ return lapic_timer_period == 0;
+}
+
static int __init calibrate_APIC_clock(void)
{
struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -58,6 +58,7 @@
#include <asm/acpi.h>
#include <asm/dma.h>
#include <asm/timer.h>
+#include <asm/time.h>
#include <asm/i8259.h>
#include <asm/setup.h>
#include <asm/irq_remapping.h>
@@ -2083,6 +2084,9 @@ static inline void __init check_timer(vo
unsigned long flags;
int no_pin1 = 0;

+ if (!global_clock_event)
+ return;
+
local_irq_save(flags);

/*
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -8,6 +8,7 @@
#include <linux/timex.h>
#include <linux/i8253.h>

+#include <asm/apic.h>
#include <asm/hpet.h>
#include <asm/time.h>
#include <asm/smp.h>
@@ -18,10 +19,32 @@
*/
struct clock_event_device *global_clock_event;

-void __init setup_pit_timer(void)
+/*
+ * Modern chipsets can disable the PIT clock which makes it unusable. It
+ * would be possible to enable the clock but the registers are chipset
+ * specific and not discoverable. Avoid the whack a mole game.
+ *
+ * These platforms have discoverable TSC/CPU frequencies but this also
+ * requires to know the local APIC timer frequency as it normally is
+ * calibrated against the PIT interrupt.
+ */
+static bool __init use_pit(void)
{
+ if (!IS_ENABLED(CONFIG_X86_TSC) || !boot_cpu_has(X86_FEATURE_TSC))
+ return true;
+
+ /* This also returns true when APIC is disabled */
+ return apic_needs_pit();
+}
+
+bool __init pit_timer_init(void)
+{
+ if (!use_pit())
+ return false;
+
clockevent_i8253_init(true);
global_clock_event = &i8253_clockevent;
+ return true;
}

#ifndef CONFIG_X86_64
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -82,8 +82,11 @@ static void __init setup_default_timer_i
/* Default timer init function */
void __init hpet_time_init(void)
{
- if (!hpet_enable())
- setup_pit_timer();
+ if (!hpet_enable()) {
+ if (!pit_timer_init())
+ return;
+ }
+
setup_default_timer_irq();
}

2019-06-28 03:34:26

by Daniel Drake

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Thu, Jun 27, 2019 at 10:07 PM Thomas Gleixner <[email protected]> wrote:
> Nah. That extra timer works thing is just another bandaid.
>
> What I had in mind is something like the below. That's on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/apic
>
> Be warned. It's neither compiled nor tested, so keep a fire extinguisher
> handy. If it explodes you own the pieces.

Thanks, it works, and makes sense!

I'll add a commit message and send it as a patch, just one quick
function naming question... did you mean apic_and_tsc_needs_pit() or
apic_needs_pit()? That's the only compile fix needed.

Daniel

2019-06-28 05:10:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: No 8254 PIT & no HPET on new Intel N3350 platforms causes kernel panic during early boot

On Fri, 28 Jun 2019, Daniel Drake wrote:
> On Thu, Jun 27, 2019 at 10:07 PM Thomas Gleixner <[email protected]> wrote:
> > Nah. That extra timer works thing is just another bandaid.
> >
> > What I had in mind is something like the below. That's on top of
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/apic
> >
> > Be warned. It's neither compiled nor tested, so keep a fire extinguisher
> > handy. If it explodes you own the pieces.
>
> Thanks, it works, and makes sense!
>
> I'll add a commit message and send it as a patch, just one quick
> function naming question... did you mean apic_and_tsc_needs_pit() or
> apic_needs_pit()? That's the only compile fix needed.

I'd say apic_needs_pit(). Less places to change :)

Thanks,

tglx