2023-06-06 11:59:00

by Feng Tang

Subject: Re: PROBLEM: skew message does not handle negative ns skew

Hi,

Could you share more info about the hardware, like which generation and
how many sockets or NUMA nodes (output of lscpu and 'numactl -H')?

Thanks,
Feng

On Tue, Jun 06, 2023 at 11:33:50AM +0100, Chris Bainbridge wrote:
> Hi,
>
> I noticed this message in the log (booting latest linux master
> v6.4-rc5-2-gf8dba31b0a82):
>
> [ 1.416270] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c4175520f, max_idle_ns: 881590509340 ns
> [ 2.087102] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
> [ 2.087105] clocksource: 'hpet' wd_nsec: 512006134 wd_now: 1c0c752 wd_last: 150ea9e mask: ffffffff
> [ 2.087107] clocksource: 'tsc' cs_nsec: 511127975 cs_now: 65279672b cs_last: 618995074 mask: ffffffffffffffff
> [ 2.087108] clocksource: Clocksource 'tsc' skewed -878159 ns (18446744073708 ms) over watchdog 'hpet' interval of 512006134 ns (512 ms)
> [ 2.087110] clocksource: 'tsc-early' (not 'tsc') is current clocksource.
>
> Note: Clocksource 'tsc' skewed -878159 ns (18446744073708 ms)
>
> It looks like this message was introduced in December 2022, in commit
> dd029269947a
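
The numbers in the log above are consistent with a signed nanosecond skew being divided as an unsigned 64-bit value. The following is a minimal userspace sketch of that arithmetic, not the kernel's code: the function names and the NSEC_PER_MSEC definition here are illustrative assumptions, and only the cs_nsec and wd_nsec values come from the log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_MSEC 1000000LL

/* Skew as reported in the log: clocksource interval minus watchdog interval. */
int64_t skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	return cs_nsec - wd_nsec;
}

/* Unsigned division: a negative skew is first reinterpreted modulo 2^64. */
uint64_t skew_ms_unsigned(int64_t ns)
{
	return (uint64_t)ns / (uint64_t)NSEC_PER_MSEC;
}

/* Signed division: keeps the sign and truncates toward zero. */
int64_t skew_ms_signed(int64_t ns)
{
	return ns / NSEC_PER_MSEC;
}

/* Print both interpretations side by side for comparison. */
void print_skew(int64_t cs_nsec, int64_t wd_nsec)
{
	int64_t ns = skew_ns(cs_nsec, wd_nsec);

	printf("skew %" PRId64 " ns: unsigned %" PRIu64 " ms, signed %" PRId64 " ms\n",
	       ns, skew_ms_unsigned(ns), skew_ms_signed(ns));
}
```

Calling print_skew(511127975, 512006134) with the cs_nsec and wd_nsec values from the log gives a skew of -878159 ns. The unsigned division yields 18446744073708 ms, matching the message, while the signed division yields 0 ms because the skew is smaller than one millisecond.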


2023-06-06 12:42:15

by Chris Bainbridge

Subject: Re: PROBLEM: skew message does not handle negative ns skew

On Tue, 6 Jun 2023 at 12:35, Feng Tang <[email protected]> wrote:
>
> Hi,
>
> Could you share more info about the hardware, like which generation,
> how many sockets or numa nodes (output of lscpu, 'numactl -h') ?
>
> Thanks,
> Feng

The hardware is an HP Pavilion Aero 13 laptop.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 5800U with Radeon Graphics
CPU family: 25
Model: 80
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 35%
CPU max MHz: 4505.0781
CPU min MHz: 1600.0000
BogoMIPS: 3792.93
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
       pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
       pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid
       extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16
       sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm
       cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
       3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
       perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
       ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2
       erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni
       xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
       cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru
       wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale
       vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
       avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes
       vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 15331 MB
node 0 free: 789 MB
node distances:
node 0
0: 10

2023-06-06 13:23:56

by Feng Tang

Subject: Re: PROBLEM: skew message does not handle negative ns skew

On Tue, Jun 06, 2023 at 01:28:50PM +0100, Chris Bainbridge wrote:
> On Tue, 6 Jun 2023 at 12:35, Feng Tang <[email protected]> wrote:
> >
> > Hi,
> >
> > Could you share more info about the hardware, like which generation,
> > how many sockets or numa nodes (output of lscpu, 'numactl -h') ?
> >
> > Thanks,
> > Feng
>
> The hardware is a HP Pavilion Aero 13 laptop.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 48 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Vendor ID: AuthenticAMD
> Model name: AMD Ryzen 7 5800U with Radeon Graphics
> CPU family: 25
> Model: 80
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 1
> Stepping: 0
> Frequency boost: enabled
> CPU(s) scaling MHz: 35%
> CPU max MHz: 4505.0781
> CPU min MHz: 1600.0000
> BogoMIPS: 3792.93
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>        pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>        pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid
>        extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16
>        sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm
>        cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
>        3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
>        perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
>        ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2
>        erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni
>        xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
>        cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru
>        wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale
>        vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
>        avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes
>        vpclmulqdq rdpid overflow_recov succor smca fsrm


There is a commit that lifts the watchdog check on modern qualified
platforms: b50db7095fe0 ("Disable clocksource watchdog for TSC on
qualified platorms"). But the platforms need to have the 'tsc_adjust'
feature (i.e. a TSC_ADJUST MSR), which can't be found in the above
lscpu info.
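
As a rough illustration of checking the flags list for 'tsc_adjust', here is a userspace sketch that looks for a whole-word flag in a space-separated flags string such as the one lscpu prints. The helper name has_cpu_flag is hypothetical, and this is not the check the kernel itself performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'name' appears in 'flags' as a complete space-separated
 * word, so that "tsc" does not match inside "constant_tsc" or "tsc_scale".
 */
bool has_cpu_flag(const char *flags, const char *name)
{
	size_t len = strlen(name);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool starts_word = (p == flags) || (p[-1] == ' ');
		bool ends_word = (p[len] == '\0') || (p[len] == ' ');

		if (starts_word && ends_word)
			return true;
		p += len;	/* keep searching after this partial match */
	}
	return false;
}
```

Given the flags quoted above, has_cpu_flag(flags, "constant_tsc") and has_cpu_flag(flags, "nonstop_tsc") would be true, while has_cpu_flag(flags, "tsc_adjust") would be false.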

And I have no idea whether there is a real hardware/firmware issue
or just a false alarm.

Thanks,
Feng

> Virtualization features:
> Virtualization: AMD-V
> Caches (sum of all):
> L1d: 256 KiB (8 instances)
> L1i: 256 KiB (8 instances)
> L2: 4 MiB (8 instances)
> L3: 16 MiB (1 instance)
> NUMA:
> NUMA node(s): 1
> NUMA node0 CPU(s): 0-15
> Vulnerabilities:
> Itlb multihit: Not affected
> L1tf: Not affected
> Mds: Not affected
> Meltdown: Not affected
> Mmio stale data: Not affected
> Retbleed: Not affected
> Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
> Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
> Srbds: Not affected
> Tsx async abort: Not affected
>
> $ numactl -H
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 15331 MB
> node 0 free: 789 MB
> node distances:
> node 0
> 0: 10