My newly acquired AMD Ryzen Threadripper based system seems to have
some TSC quirks, which go away once the system is up.
Given the discussion in https://lkml.org/lkml/2019/1/28/1356
I don't seem to be the only one, and it does not seem to be
Threadripper specific.
The first patch addresses presumably SMI interruptions, that occur
on my system at a 60 Hz frequency. They take too long, making the
existing code fail. The patch makes that code more resilient.
(These interruptions go away at some point during boot; also proven
by the fact, that I don't have any TSC issues, when kexecing into
a new kernel.)
The second patch prevents TSC going unstable when resuming from S3.
This may have a similar source, but is certainly more weird: the
TSC is observed to go backwards on CPU0 by about 200 to 500 cycles
compared to other CPUs every once in a while, as long as its sibling
(CPU16 in my case) hasn't been resumed yet. (I did some experiments
with CPU hotplug before and after suspend, but apart from reproducing
the issue and verifying the "fix", I got nowhere.)
The patches are against v4.20.
Jan H. Schönherr (2):
x86/tsc: Allow quick PIT calibration despite interruptions
cpu/hotplug: Unfreeze sibling CPU first on resume from S3
arch/x86/kernel/tsc.c | 80 +++++++++++++++++++++++++------------------
kernel/cpu.c | 34 ++++++++++++------
2 files changed, 70 insertions(+), 44 deletions(-)
--
2.19.2
At least one system declares the TSC unstable after resume from S3,
because the TSC is observed going backwards up to roughly 500 cycles
every now and then, when bringing secondary CPUs back online.
The system in question is an AMD Ryzen Threadripper 2950X, microcode
0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
This unexplained behavior goes away as soon as the sibling CPU of the
boot CPU is brought back up. Hence, add a hack to restore the sibling
CPU before all others on unfreeze. This keeps the TSC stable.
Signed-off-by: Jan H. Schönherr <[email protected]>
---
kernel/cpu.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 91d5c38eb7e5..7097ee8c1b17 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1193,6 +1193,7 @@ EXPORT_SYMBOL_GPL(cpu_up);
#ifdef CONFIG_PM_SLEEP_SMP
static cpumask_var_t frozen_cpus;
+static int frozen_primary_sibling;
int freeze_secondary_cpus(int primary)
{
@@ -1211,6 +1212,8 @@ int freeze_secondary_cpus(int primary)
for_each_online_cpu(cpu) {
if (cpu == primary)
continue;
+ if (cpumask_test_cpu(cpu, topology_sibling_cpumask(primary)))
+ frozen_primary_sibling = cpu;
trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
trace_suspend_resume(TPS("CPU_OFF"), cpu, false);
@@ -1246,9 +1249,23 @@ void __weak arch_enable_nonboot_cpus_end(void)
{
}
+static void enable_nonboot_cpu(int cpu)
+{
+ int error;
+
+ trace_suspend_resume(TPS("CPU_ON"), cpu, true);
+ error = _cpu_up(cpu, 1, CPUHP_ONLINE);
+ trace_suspend_resume(TPS("CPU_ON"), cpu, false);
+ if (!error) {
+ pr_info("CPU%d is up\n", cpu);
+ return;
+ }
+ pr_warn("Error taking CPU%d up: %d\n", cpu, error);
+}
+
void enable_nonboot_cpus(void)
{
- int cpu, error;
+ int cpu;
/* Allow everyone to use the CPU hotplug again */
cpu_maps_update_begin();
@@ -1260,16 +1277,13 @@ void enable_nonboot_cpus(void)
arch_enable_nonboot_cpus_begin();
- for_each_cpu(cpu, frozen_cpus) {
- trace_suspend_resume(TPS("CPU_ON"), cpu, true);
- error = _cpu_up(cpu, 1, CPUHP_ONLINE);
- trace_suspend_resume(TPS("CPU_ON"), cpu, false);
- if (!error) {
- pr_info("CPU%d is up\n", cpu);
- continue;
- }
- pr_warn("Error taking CPU%d up: %d\n", cpu, error);
+ cpu = frozen_primary_sibling;
+ if (cpumask_test_cpu(cpu, frozen_cpus)) {
+ enable_nonboot_cpu(cpu);
+ cpumask_clear_cpu(cpu, frozen_cpus);
}
+ for_each_cpu(cpu, frozen_cpus)
+ enable_nonboot_cpu(cpu);
arch_enable_nonboot_cpus_end();
--
2.19.2
Some systems experience regular interruptions (60 Hz SMI?), that prevent
the quick PIT calibration from succeeding: individual interruptions can be
so long, that the PIT MSB is observed to decrement by 2 or 3 instead of 1.
The existing code cannot recover from this.
The system in question is an AMD Ryzen Threadripper 2950X, microcode
0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
Change the code to handle (almost) arbitrary interruptions, as long
as they happen only once in a while and they do not take too long.
Specifically, also cover an interruption during the very first reads.
Signed-off-by: Jan H. Schönherr <[email protected]>
---
arch/x86/kernel/tsc.c | 80 +++++++++++++++++++++++++------------------
1 file changed, 46 insertions(+), 34 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index e9f777bfed40..a005e0aa215e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -485,7 +485,7 @@ static inline int pit_verify_msb(unsigned char val)
static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
{
int count;
- u64 tsc = 0, prev_tsc = 0;
+ u64 tsc = get_cycles(), prev_tsc = 0;
for (count = 0; count < 50000; count++) {
if (!pit_verify_msb(val))
@@ -500,7 +500,7 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
* We require _some_ success, but the quality control
* will be based on the error terms on the TSC values.
*/
- return count > 5;
+ return count > 0 && pit_verify_msb(val - 1);
}
/*
@@ -515,7 +515,8 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
static unsigned long quick_pit_calibrate(void)
{
int i;
- u64 tsc, delta;
+ u64 tsc = 0, delta;
+ unsigned char start;
unsigned long d1, d2;
if (!has_legacy_pic())
@@ -547,39 +548,50 @@ static unsigned long quick_pit_calibrate(void)
*/
pit_verify_msb(0);
- if (pit_expect_msb(0xff, &tsc, &d1)) {
- for (i = 1; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
- if (!pit_expect_msb(0xff-i, &delta, &d2))
- break;
-
- delta -= tsc;
-
- /*
- * Extrapolate the error and fail fast if the error will
- * never be below 500 ppm.
- */
- if (i == 1 &&
- d1 + d2 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11)
- return 0;
-
- /*
- * Iterate until the error is less than 500 ppm
- */
- if (d1+d2 >= delta >> 11)
- continue;
-
- /*
- * Check the PIT one more time to verify that
- * all TSC reads were stable wrt the PIT.
- *
- * This also guarantees serialization of the
- * last cycle read ('d2') in pit_expect_msb.
- */
- if (!pit_verify_msb(0xfe - i))
- break;
- goto success;
+ for (i = 0xff; i > 0xf0; i--) {
+ if (!pit_expect_msb(i, &delta, &d2))
+ continue;
+
+ if (!tsc) {
+ /* first success */
+ tsc = delta;
+ d1 = d2;
+ continue;
+ }
+
+ /* second success */
+ if (d2 < d1) {
+ tsc = delta;
+ d1 = d2;
}
+ goto calibrate;
+ }
+
+ pr_info("Fast TSC calibration failed (couldn't even start)\n");
+ return 0;
+
+calibrate:
+ /*
+ * Extrapolate the error and fail fast if the error will
+ * never be below 500 ppm.
+ */
+ if (d1 + d1 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11) {
+ pr_info("Fast TSC calibration failed (wouldn't work)\n");
+ return 0;
+ }
+
+ start = i;
+ for (i = 1; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
+ if (!pit_expect_msb(start - i, &delta, &d2))
+ continue;
+
+ delta -= tsc;
+
+ /* Stop when the error is less than 500 ppm */
+ if (d1 + d2 < delta >> 11)
+ goto success;
}
+
pr_info("Fast TSC calibration failed\n");
return 0;
--
2.19.2
Dear Jan,
Thank you for adding me to the CC list.
Am 29.01.19 um 11:23 schrieb Jan H. Schönherr:
> My newly acquired AMD Ryzen Threadripper based system seems to have
> some TSC quirks, which go away once the system is up.
>
> Given the discussion in https://lkml.org/lkml/2019/1/28/1356
> I don't seem to be the only one, and it does not seem to be
> Threadripper specific.
>
> The first patch addresses presumably SMI interruptions, that occur
> on my system at a 60 Hz frequency. They take too long, making the
> existing code fail. The patch makes that code more resilient.
> (These interruptions go away at some point during boot; also proven
> by the fact, that I don't have any TSC issues, when kexecing into
> a new kernel.)
>
> The second patch prevents TSC going unstable when resuming from S3.
> This may have a similar source, but is certainly more weird: the
> TSC is observed to go backwards on CPU0 by about 200 to 500 cycles
> compared to other CPUs every once in a while, as long as its sibling
> (CPU16 in my case) hasn't been resumed yet. (I did some experiments
> with CPU hotplug before and after suspend, but apart from reproducing
> the issue and verifying the "fix", I got nowhere.)
>
> The patches are against v4.20.
>
> Jan H. Schönherr (2):
> x86/tsc: Allow quick PIT calibration despite interruptions
> cpu/hotplug: Unfreeze sibling CPU first on resume from S3
>
> arch/x86/kernel/tsc.c | 80 +++++++++++++++++++++++++------------------
> kernel/cpu.c | 34 ++++++++++++------
> 2 files changed, 70 insertions(+), 44 deletions(-)
I successfully tested both applied on Linus’ current master branch
(v5.0-rc4-1-g4aa9fc2a435a).
> [ 0.000000] DMI: HP HP EliteDesk 705 G4 MT/83E7, BIOS Q06 Ver. 02.04.01 09/14/2018
> [ 0.000000] tsc: Fast TSC calibration using PIT
> [ 0.000000] tsc: Detected 3616.864 MHz processor
Kind regards,
Paul
Dear Jan,
Am 29.01.19 um 20:33 schrieb Paul Menzel:
> Thank you for adding me to the CC list.
>
>
> Am 29.01.19 um 11:23 schrieb Jan H. Schönherr:
>> My newly acquired AMD Ryzen Threadripper based system seems to have
>> some TSC quirks, which go away once the system is up.
>>
>> Given the discussion in https://lkml.org/lkml/2019/1/28/1356
>> I don't seem to be the only one, and it does not seem to be
>> Threadripper specific.
>>
>> The first patch addresses presumably SMI interruptions, that occur
>> on my system at a 60 Hz frequency. They take too long, making the
>> existing code fail. The patch makes that code more resilient.
>> (These interruptions go away at some point during boot; also proven
>> by the fact, that I don't have any TSC issues, when kexecing into
>> a new kernel.)
>>
>> The second patch prevents TSC going unstable when resuming from S3.
>> This may have a similar source, but is certainly more weird: the
>> TSC is observed to go backwards on CPU0 by about 200 to 500 cycles
>> compared to other CPUs every once in a while, as long as its sibling
>> (CPU16 in my case) hasn't been resumed yet. (I did some experiments
>> with CPU hotplug before and after suspend, but apart from reproducing
>> the issue and verifying the "fix", I got nowhere.)
>>
>> The patches are against v4.20.
>>
>> Jan H. Schönherr (2):
>> x86/tsc: Allow quick PIT calibration despite interruptions
>> cpu/hotplug: Unfreeze sibling CPU first on resume from S3
>>
>> arch/x86/kernel/tsc.c | 80 +++++++++++++++++++++++++------------------
>> kernel/cpu.c | 34 ++++++++++++------
>> 2 files changed, 70 insertions(+), 44 deletions(-)
>
> I successfully tested both applied on Linus’ current master branch
> (v5.0-rc4-1-g4aa9fc2a435a).
>
>> [ 0.000000] DMI: HP HP EliteDesk 705 G4 MT/83E7, BIOS Q06 Ver. 02.04.01 09/14/2018
>> [ 0.000000] tsc: Fast TSC calibration using PIT
>> [ 0.000000] tsc: Detected 3616.864 MHz processor
Please do not forget to tag both patches for the stable releases
by adding the tag.
> CC: [email protected]
Kind regards,
Paul
On Tue, 29 Jan 2019, Jan H. Schönherr wrote:
> Am 29.01.2019 um 11:23 schrieb Jan H. Schönherr:
> > +calibrate:
> > + /*
> > + * Extrapolate the error and fail fast if the error will
> > + * never be below 500 ppm.
> > + */
> > + if (d1 + d1 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11) {
> > + pr_info("Fast TSC calibration failed (wouldn't work)\n");
> > + return 0;
> > + }
>
> I messed this check up. "delta" is not the actual tsc delta between the
> first two successful reads at this point.
>
> (If it were, it might also correspond to more than one iteration; not
> sure if we care about that aspect, though.)
I rather go for correct and if only for the reason that me/you/whoelse
doesn't have to scratch the head for an hour when looking at this half a
year from now.
Thanks,
tglx
Paul,
On Tue, 29 Jan 2019, Paul Menzel wrote:
>
> Please do not forget to tag both patches for the stable releases
> by adding the tag.
>
> > CC: [email protected]
That's not the way it works. The TSC one might go to stable eventually, but
the second one is a hack and surely goes nowhere in the current
form. Routing patches to stable is not a matter of works for me.
Thanks,
tglx
Jan,
On Tue, 29 Jan 2019, Jan H. Schönherr wrote:
> At least one system declares the TSC unstable after resume from S3,
> because the TSC is observed going backwards up to roughly 500 cycles
> every now and then, when bringing secondary CPUs back online.
>
> The system in question is an AMD Ryzen Threadripper 2950X, microcode
> 0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
>
> This unexplained behavior goes away as soon as the sibling CPU of the
> boot CPU is brought back up. Hence, add a hack to restore the sibling
> CPU before all others on unfreeze. This keeps the TSC stable.
Uurgh, no. As you said that's a hack and I'm pretty sure that it just works
by chance. It makes the underlying wreckage not longer observable.
I'm pretty sure this is a BIOS bug and I'm really not going to make a
special case here just to accomodate with that particular broken
firmware. This would just set precedence for random ordering requests based
on DMI strings and other data to make it work on all kind of broken
motherboard/firmware/microcode combinations.
Surely nice detective work, but we really don't want to open this can of
worms.
Too bad that AMD does not have the TSC_ADJUST register. It would tell you
immediately what's wrong and the code we have for that would probably cure
the mess.
Sigh, it's more than 20 years by now that I'm complaining to both Intel and
AMD about the complete trainwreck they made out of TSC and it's still not
fixed. Though I still have the illusion that by the time I retire I get my
hands on a machine with a sane TSC implementation. Hope dies last ....
Oh well, enough ranted and with that I hand off the further proceedings to
Tom Lendacky who surely can give you more technical help than me in that
particular matter.
Thanks,
tglx
On Tue, 29 Jan 2019, Thomas Gleixner wrote:
> On Tue, 29 Jan 2019, Jan H. Schönherr wrote:
>
> > Am 29.01.2019 um 11:23 schrieb Jan H. Schönherr:
> > > +calibrate:
> > > + /*
> > > + * Extrapolate the error and fail fast if the error will
> > > + * never be below 500 ppm.
> > > + */
> > > + if (d1 + d1 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11) {
> > > + pr_info("Fast TSC calibration failed (wouldn't work)\n");
> > > + return 0;
> > > + }
> >
> > I messed this check up. "delta" is not the actual tsc delta between the
> > first two successful reads at this point.
> >
> > (If it were, it might also correspond to more than one iteration; not
> > sure if we care about that aspect, though.)
>
> I rather go for correct and if only for the reason that me/you/whoelse
> doesn't have to scratch the head for an hour when looking at this half a
> year from now.
Any update on this?
Thanks,
tglx
Am 12.02.19 um 12:57 schrieb Thomas Gleixner:
> On Tue, 29 Jan 2019, Thomas Gleixner wrote:
>> On Tue, 29 Jan 2019, Jan H. Schönherr wrote:
>>
>>> Am 29.01.2019 um 11:23 schrieb Jan H. Schönherr:
>>>> +calibrate:
>>>> + /*
>>>> + * Extrapolate the error and fail fast if the error will
>>>> + * never be below 500 ppm.
>>>> + */
>>>> + if (d1 + d1 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11) {
>>>> + pr_info("Fast TSC calibration failed (wouldn't work)\n");
>>>> + return 0;
>>>> + }
>>>
>>> I messed this check up. "delta" is not the actual tsc delta between the
>>> first two successful reads at this point.
>>>
>>> (If it were, it might also correspond to more than one iteration; not
>>> sure if we care about that aspect, though.)
>>
>> I rather go for correct and if only for the reason that me/you/whoelse
>> doesn't have to scratch the head for an hour when looking at this half a
>> year from now.
>
> Any update on this?
Second version is out.
Regards
Jan