From: Borislav Petkov <[email protected]>
Currently, the kernel uses
[LM]FENCE; RDTSC
in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.
Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.
A microbenchmark on Intel shows that the change is on-par.
On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC and some are slightly better with RDTSCP.
The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.
Co-developed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Lendacky <[email protected]>
Cc: [email protected]
---
arch/x86/include/asm/msr.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..f00f2b61d326 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
*/
static __always_inline unsigned long long rdtsc_ordered(void)
{
+ DECLARE_ARGS(val, low, high);
+
/*
* The RDTSC instruction is not ordered relative to memory
* access. The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* ordering guarantees as reading from a global memory location
* that some other imaginary CPU is updating continuously with a
* time stamp.
+ *
+ * Thus, use the preferred barrier on the respective CPU, aiming for
+ * RDTSCP as the default.
*/
- barrier_nospec();
- return rdtsc();
+ asm volatile(ALTERNATIVE_2("mfence; rdtsc",
+ "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+ "rdtscp", X86_FEATURE_RDTSCP)
+ : EAX_EDX_RET(val, low, high)
+ /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+ :: "ecx");
+
+ return EAX_EDX_VAL(val, low, high);
}
static inline unsigned long long native_read_pmc(int counter)
--
2.19.1
On 11/19/2018 12:45 PM, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Currently, the kernel uses
>
> [LM]FENCE; RDTSC
>
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
>
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
>
> A microbenchmark on Intel shows that the change is on-par.
>
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
>
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.
Thanks for this. I had been wanting to do something similar since, on AMD,
a guest may not see the LFENCE-serializing bit if the hypervisor doesn't
expose it, and have to fall back to MFENCE.
Reviewed-by: Tom Lendacky <[email protected]>
>
> Co-developed-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Thomas Lendacky <[email protected]>
> Cc: [email protected]
> ---
> arch/x86/include/asm/msr.h | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 91e4cf189914..f00f2b61d326 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
> */
> static __always_inline unsigned long long rdtsc_ordered(void)
> {
> + DECLARE_ARGS(val, low, high);
> +
> /*
> * The RDTSC instruction is not ordered relative to memory
> * access. The Intel SDM and the AMD APM are both vague on this
> @@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
> * ordering guarantees as reading from a global memory location
> * that some other imaginary CPU is updating continuously with a
> * time stamp.
> + *
> + * Thus, use the preferred barrier on the respective CPU, aiming for
> + * RDTSCP as the default.
> */
> - barrier_nospec();
> - return rdtsc();
> + asm volatile(ALTERNATIVE_2("mfence; rdtsc",
> + "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
> + "rdtscp", X86_FEATURE_RDTSCP)
> + : EAX_EDX_RET(val, low, high)
> + /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
> + :: "ecx");
> +
> + return EAX_EDX_VAL(val, low, high);
> }
>
> static inline unsigned long long native_read_pmc(int counter)
>
On Mon, Nov 19, 2018 at 10:46 AM Borislav Petkov <[email protected]> wrote:
>
> From: Borislav Petkov <[email protected]>
>
> Currently, the kernel uses
>
> [LM]FENCE; RDTSC
>
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
>
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
>
> A microbenchmark on Intel shows that the change is on-par.
>
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
>
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>
I thought I benchmarked this on Intel at some point and found the
LFENCE;RDTSC variant to be slightly faster. But I believe you, so:
Acked-by: Andy Lutomirski <[email protected]>
On 11/19/18 11:52 AM, Andy Lutomirski wrote:
>
> I thought I benchmarked this on Intel at some point and found the
> LFENCE;RDTSC variant to be slightly faster. But I believe you, so:
>
> Acked-by: Andy Lutomirski <[email protected]>
>
As long as the difference isn't significant, the simplicity would seem to be a
win.
-hpa
On Mon, Nov 19, 2018 at 12:17:35PM -0800, H. Peter Anvin wrote:
> On 11/19/18 11:52 AM, Andy Lutomirski wrote:
> >
> > I thought I benchmarked this on Intel at some point and found the
> > LFENCE;RDTSC variant to be slightly faster. But I believe you, so:
> >
> > Acked-by: Andy Lutomirski <[email protected]>
> >
>
> As long as the difference isn't significant, the simplicity would seem to be a
> win.
Right, I think by simplicity you mean RDTSCP. :)
Also in the sense that you have a single instruction which gives you
that barrier of waiting for all older insns to retire before reading the
TSC vs two where you don't necessarily know what happens on the uarch
level. I mean, nothing probably happens but RDTSCP is still simpler :)
Also, hpa, isn't LFENCE; RDTSC and RDTSCP equivalent on Intel? In the
sense that RDTSCP microcode practically adds an LFENCE before reading
the TSC?
Because SDM says:
"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible."
which sounds pretty much like an LFENCE to me:
"LFENCE does not execute until all prior instructions have completed
locally, and no later instruction begins execution until LFENCE
completes."
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On November 19, 2018 12:40:25 PM PST, Borislav Petkov <[email protected]> wrote:
>On Mon, Nov 19, 2018 at 12:17:35PM -0800, H. Peter Anvin wrote:
>> On 11/19/18 11:52 AM, Andy Lutomirski wrote:
>> >
>> > I thought I benchmarked this on Intel at some point and found the
>> > LFENCE;RDTSC variant to be slightly faster. But I believe you, so:
>> >
>> > Acked-by: Andy Lutomirski <[email protected]>
>> >
>>
>> As long as the difference isn't significant, the simplicity would
>seem to be a
>> win.
>
>Right, I think by simplicity you mean RDTSCP. :)
>
>Also in the sense that you have a single instruction which gives you
>that barrier of waiting for all older insns to retire before reading
>the
>TSC vs two where you don't necessarily know what happens on the uarch
>level. I mean, nothing probably happens but RDTSCP is still simpler :)
>
>Also, hpa, isn't LFENCE; RDTSC and RDTSCP equivalent on Intel? In the
>sense that RDTSCP microcode practically adds an LFENCE before reading
>the TSC?
>
>Because SDM says:
>
>"The RDTSCP instruction is not a serializing instruction, but it does
>wait until all previous instructions have executed and all previous
>loads are globally visible."
>
>which sounds pretty much like an LFENCE to me:
>
>"LFENCE does not execute until all prior instructions have completed
>locally, and no later instruction begins execution until LFENCE
>completes."
I don't know the exact sequence of fencing operations it is equivalent to, but it is probably something quite similar.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Commit-ID: 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
Gitweb: https://git.kernel.org/tip/2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
Author: Borislav Petkov <[email protected]>
AuthorDate: Thu, 12 Apr 2018 13:11:36 +0200
Committer: Borislav Petkov <[email protected]>
CommitDate: Mon, 19 Nov 2018 21:55:32 +0100
x86/TSC: Use RDTSCP
Currently, the kernel uses
[LM]FENCE; RDTSC
in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.
Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.
A microbenchmark on Intel shows that the change is on-par.
On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC and some are slightly better with RDTSCP.
The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.
Co-developed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: John Stultz <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/msr.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..f00f2b61d326 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
*/
static __always_inline unsigned long long rdtsc_ordered(void)
{
+ DECLARE_ARGS(val, low, high);
+
/*
* The RDTSC instruction is not ordered relative to memory
* access. The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* ordering guarantees as reading from a global memory location
* that some other imaginary CPU is updating continuously with a
* time stamp.
+ *
+ * Thus, use the preferred barrier on the respective CPU, aiming for
+ * RDTSCP as the default.
*/
- barrier_nospec();
- return rdtsc();
+ asm volatile(ALTERNATIVE_2("mfence; rdtsc",
+ "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+ "rdtscp", X86_FEATURE_RDTSCP)
+ : EAX_EDX_RET(val, low, high)
+ /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+ :: "ecx");
+
+ return EAX_EDX_VAL(val, low, high);
}
static inline unsigned long long native_read_pmc(int counter)
On November 23, 2018 12:03:07 PM PST, Guenter Roeck <[email protected]> wrote:
>Hi,
>
>On Mon, Nov 19, 2018 at 07:45:56PM +0100, Borislav Petkov wrote:
>> From: Borislav Petkov <[email protected]>
>>
>> Currently, the kernel uses
>>
>> [LM]FENCE; RDTSC
>>
>> in the timekeeping code, to guarantee monotonicity of time where the
>> *FENCE is selected based on vendor.
>>
>> Replace that sequence with RDTSCP which is faster or on-par and gives
>> the same guarantees.
>>
>> A microbenchmark on Intel shows that the change is on-par.
>>
>> On AMD, the change is either on-par with the current LFENCE-prefixed
>> RDTSC and some are slightly better with RDTSCP.
>>
>> The comparison is done with the LFENCE-prefixed RDTSC (and not with
>the
>> MFENCE-prefixed one, as one would normally expect) because all modern
>> AMD families make LFENCE serializing and thus avoid the heavy MFENCE
>by
>> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>>
>> Co-developed-by: Thomas Gleixner <[email protected]>
>> Signed-off-by: Thomas Gleixner <[email protected]>
>> Signed-off-by: Borislav Petkov <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: John Stultz <[email protected]>
>> Cc: Thomas Lendacky <[email protected]>
>> Cc: [email protected]
>> ---
>> arch/x86/include/asm/msr.h | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index 91e4cf189914..f00f2b61d326 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -217,6 +217,8 @@ static __always_inline unsigned long long
>rdtsc(void)
>> */
>> static __always_inline unsigned long long rdtsc_ordered(void)
>> {
>> + DECLARE_ARGS(val, low, high);
>> +
>> /*
>> * The RDTSC instruction is not ordered relative to memory
>> * access. The Intel SDM and the AMD APM are both vague on this
>> @@ -227,9 +229,18 @@ static __always_inline unsigned long long
>rdtsc_ordered(void)
>> * ordering guarantees as reading from a global memory location
>> * that some other imaginary CPU is updating continuously with a
>> * time stamp.
>> + *
>> + * Thus, use the preferred barrier on the respective CPU, aiming
>for
>> + * RDTSCP as the default.
>> */
>> - barrier_nospec();
>> - return rdtsc();
>> + asm volatile(ALTERNATIVE_2("mfence; rdtsc",
>> + "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
>> + "rdtscp", X86_FEATURE_RDTSCP)
>> + : EAX_EDX_RET(val, low, high)
>> + /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
>> + :: "ecx");
>> +
>> + return EAX_EDX_VAL(val, low, high);
>> }
>>
>
>This patch results in a crash with certain qemu emulations.
>
>[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[ 0.762233] invalid opcode: 0000 [#1] PTI
>[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123 #2
>[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[ 0.762832] EIP: read_tsc+0x4/0x10
>
>To reproduce:
>
>make ARCH=i386 defconfig
>echo "CONFIG_HIGHMEM64G=y" >> .config
>make ARCH=i386 olddefconfig
>make -j30 ARCH=i386
>
>qemu-system-i386 \
> -kernel arch/x86/boot/bzImage \
> -M q35 -cpu pentium3 -no-reboot -m 256 \
> -initrd rootfs.cpio \
> --append 'earlycon=uart8250,io,0x3f8,9600n8 rdinit=/sbin/init panic=-1
>console=ttyS0 console=tty' \
> -nographic -monitor none
>
>initrd or root file system doesn't really matter since the code never
>gets there, but it is available from here:
> https://github.com/groeck/linux-build-test/blob/master/rootfs/x86/rootfs.cpio.gz
>The qemu version does not matter either (I tried with 2.5 and 3.0).
>
>Reverting the patch fixes the problem.
>
>Guenter
>
>---
>crash log:
>
>...
>[ 0.756366] HPET: 3 timers in total, 0 timers will be used for
>per-cpu timer
>[ 0.756718] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
>[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[ 0.762233] invalid opcode: 0000 [#1] PTI
>[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123
>#2
>[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[ 0.762832] EIP: read_tsc+0x4/0x10
>[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: cb495ed0
>[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>[ 0.762832] Call Trace:
>[ 0.762832] tk_setup_internals.constprop.10+0x3d/0x260
>[ 0.762832] timekeeping_notify+0x56/0xc0
>[ 0.762832] __clocksource_select+0xf3/0x150
>[ 0.762832] ? boot_override_clock+0x42/0x42
>[ 0.762832] clocksource_done_booting+0x2d/0x3b
>[ 0.762832] do_one_initcall+0x68/0x15e
>[ 0.762832] ? set_debug_rodata+0xf/0xf
>[ 0.762832] kernel_init_freeable+0xec/0x18b
>[ 0.762832] ? rest_init+0x80/0x80
>[ 0.762832] kernel_init+0x8/0xf0
>[ 0.762832] ret_from_fork+0x2e/0x38
>[ 0.762832] Modules linked in:
>[ 0.762832] ---[ end trace ba4ba4587aec46c9 ]---
>[ 0.762832] EIP: read_tsc+0x4/0x10
>[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: c597f47c
>[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>
>---
>bisect log:
>
># bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next
>specific files for 20181123
># good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
>git bisect start 'HEAD' 'v4.20-rc3'
># good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge
>remote-tracking branch 'spi-nor/spi-nor/next'
>git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
># good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge
>remote-tracking branch 'kgdb/kgdb-next'
>git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
># bad: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking
>branch 'soundwire/next'
>git bisect bad d29686ab179c34c5dbaac067a9effbeeb6a8073e
># bad: [7cd63670817c236ccaf21ffe9d7b4921afeab130] Merge remote-tracking
>branch 'tip/auto-latest'
>git bisect bad 7cd63670817c236ccaf21ffe9d7b4921afeab130
># good: [f9c8ebdcb052214954136426990e13a88e45e906] Merge
>remote-tracking branch 'audit/next'
>git bisect good f9c8ebdcb052214954136426990e13a88e45e906
># good: [4142a8a10e7673fa71ff4cb5b4693c4dda299f7c] Merge
>remote-tracking branch 'spi/for-next'
>git bisect good 4142a8a10e7673fa71ff4cb5b4693c4dda299f7c
># good: [fc922a4c7c72131f6c65e4af4e36ff25f3af1485] Merge branch
>'x86/cpu'
>git bisect good fc922a4c7c72131f6c65e4af4e36ff25f3af1485
># good: [906d11c1ff007ad18e2f3477cca04c437ce19c64] Merge branch
>'x86/microcode'
>git bisect good 906d11c1ff007ad18e2f3477cca04c437ce19c64
># good: [d836e1d42739a86738dd92a1b7ab6729b2485cba] Merge branch
>'x86/platform'
>git bisect good d836e1d42739a86738dd92a1b7ab6729b2485cba
># bad: [191bba684b7459842e72b52718ec61f8bdbd8499] Merge branch
>'x86/timers'
>git bisect bad 191bba684b7459842e72b52718ec61f8bdbd8499
># good: [f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee] Merge branch
>'x86/pti'
>git bisect good f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee
># bad: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
>git bisect bad 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
># good: [a786ef152cdcfebc923a67f63c7815806eefcf81] x86/tsc: Make
>calibration refinement more robust
>git bisect good a786ef152cdcfebc923a67f63c7815806eefcf81
># first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC:
>Use RDTSCP
Right, because that cpu predates RDTSCP, so it needs to use a fallback.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,
On Mon, Nov 19, 2018 at 07:45:56PM +0100, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Currently, the kernel uses
>
> [LM]FENCE; RDTSC
>
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
>
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
>
> A microbenchmark on Intel shows that the change is on-par.
>
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
>
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>
> Co-developed-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Thomas Lendacky <[email protected]>
> Cc: [email protected]
> ---
> arch/x86/include/asm/msr.h | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 91e4cf189914..f00f2b61d326 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
> */
> static __always_inline unsigned long long rdtsc_ordered(void)
> {
> + DECLARE_ARGS(val, low, high);
> +
> /*
> * The RDTSC instruction is not ordered relative to memory
> * access. The Intel SDM and the AMD APM are both vague on this
> @@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
> * ordering guarantees as reading from a global memory location
> * that some other imaginary CPU is updating continuously with a
> * time stamp.
> + *
> + * Thus, use the preferred barrier on the respective CPU, aiming for
> + * RDTSCP as the default.
> */
> - barrier_nospec();
> - return rdtsc();
> + asm volatile(ALTERNATIVE_2("mfence; rdtsc",
> + "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
> + "rdtscp", X86_FEATURE_RDTSCP)
> + : EAX_EDX_RET(val, low, high)
> + /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
> + :: "ecx");
> +
> + return EAX_EDX_VAL(val, low, high);
> }
>
This patch results in a crash with certain qemu emulations.
[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.762233] invalid opcode: 0000 [#1] PTI
[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3-next-20181123 #2
[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 0.762832] EIP: read_tsc+0x4/0x10
To reproduce:
make ARCH=i386 defconfig
echo "CONFIG_HIGHMEM64G=y" >> .config
make ARCH=i386 olddefconfig
make -j30 ARCH=i386
qemu-system-i386 \
-kernel arch/x86/boot/bzImage \
-M q35 -cpu pentium3 -no-reboot -m 256 \
-initrd rootfs.cpio \
--append 'earlycon=uart8250,io,0x3f8,9600n8 rdinit=/sbin/init panic=-1 console=ttyS0 console=tty' \
-nographic -monitor none
initrd or root file system doesn't really matter since the code never
gets there, but it is available from here:
https://github.com/groeck/linux-build-test/blob/master/rootfs/x86/rootfs.cpio.gz
The qemu version does not matter either (I tried with 2.5 and 3.0).
Reverting the patch fixes the problem.
Guenter
---
crash log:
...
[ 0.756366] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.756718] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.762233] invalid opcode: 0000 [#1] PTI
[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3-next-20181123
#2
[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 0.762832] EIP: read_tsc+0x4/0x10
[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: cb495ed0
[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00200002
[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
[ 0.762832] Call Trace:
[ 0.762832] tk_setup_internals.constprop.10+0x3d/0x260
[ 0.762832] timekeeping_notify+0x56/0xc0
[ 0.762832] __clocksource_select+0xf3/0x150
[ 0.762832] ? boot_override_clock+0x42/0x42
[ 0.762832] clocksource_done_booting+0x2d/0x3b
[ 0.762832] do_one_initcall+0x68/0x15e
[ 0.762832] ? set_debug_rodata+0xf/0xf
[ 0.762832] kernel_init_freeable+0xec/0x18b
[ 0.762832] ? rest_init+0x80/0x80
[ 0.762832] kernel_init+0x8/0xf0
[ 0.762832] ret_from_fork+0x2e/0x38
[ 0.762832] Modules linked in:
[ 0.762832] ---[ end trace ba4ba4587aec46c9 ]---
[ 0.762832] EIP: read_tsc+0x4/0x10
[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: c597f47c
[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00200002
[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
---
bisect log:
# bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next specific files for 20181123
# good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
git bisect start 'HEAD' 'v4.20-rc3'
# good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge remote-tracking branch 'spi-nor/spi-nor/next'
git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
# good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge remote-tracking branch 'kgdb/kgdb-next'
git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
# bad: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking branch 'soundwire/next'
git bisect bad d29686ab179c34c5dbaac067a9effbeeb6a8073e
# bad: [7cd63670817c236ccaf21ffe9d7b4921afeab130] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad 7cd63670817c236ccaf21ffe9d7b4921afeab130
# good: [f9c8ebdcb052214954136426990e13a88e45e906] Merge remote-tracking branch 'audit/next'
git bisect good f9c8ebdcb052214954136426990e13a88e45e906
# good: [4142a8a10e7673fa71ff4cb5b4693c4dda299f7c] Merge remote-tracking branch 'spi/for-next'
git bisect good 4142a8a10e7673fa71ff4cb5b4693c4dda299f7c
# good: [fc922a4c7c72131f6c65e4af4e36ff25f3af1485] Merge branch 'x86/cpu'
git bisect good fc922a4c7c72131f6c65e4af4e36ff25f3af1485
# good: [906d11c1ff007ad18e2f3477cca04c437ce19c64] Merge branch 'x86/microcode'
git bisect good 906d11c1ff007ad18e2f3477cca04c437ce19c64
# good: [d836e1d42739a86738dd92a1b7ab6729b2485cba] Merge branch 'x86/platform'
git bisect good d836e1d42739a86738dd92a1b7ab6729b2485cba
# bad: [191bba684b7459842e72b52718ec61f8bdbd8499] Merge branch 'x86/timers'
git bisect bad 191bba684b7459842e72b52718ec61f8bdbd8499
# good: [f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee] Merge branch 'x86/pti'
git bisect good f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee
# bad: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
git bisect bad 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
# good: [a786ef152cdcfebc923a67f63c7815806eefcf81] x86/tsc: Make calibration refinement more robust
git bisect good a786ef152cdcfebc923a67f63c7815806eefcf81
# first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
On 11/23/18 12:44 PM, Borislav Petkov wrote:
> On Fri, Nov 23, 2018 at 12:03:07PM -0800, Guenter Roeck wrote:
>> [ 0.762832] EIP: read_tsc+0x4/0x10
>> [ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
>
> Where does that 'b' in f0b come from?
>
Good catch.
It is a cut off screen log. x86 boots change xterm configuration from
wrap to non-wrap, and I did a cut-and-paste instead of copying the log
to a file. Sorry for that.
Guenter
> But ok, I was able to reproduce and decode myself. So if the Code:
> section is correct, qemu chokes on MFENCE.
>
> [ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
> All code
> ========
> 0: 90 nop
> 1: 90 nop
> 2: 90 nop
> 3: 90 nop
> 4: 90 nop
> 5: 90 nop
> 6: 90 nop
> 7: a1 84 37 11 cd mov 0xcd113784,%eax
> c: c3 ret
> d: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> 14: 8d 76 00 lea 0x0(%esi),%esi
> 17: c3 ret
> 18: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> 1f: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> 26: 90 nop
> 27: 55 push %ebp
> 28: 89 e5 mov %esp,%ebp
> 2a:* 0f ae f0 mfence <-- trapping instruction
> 2d: 0f 31 rdtsc
> 2f: 5d pop %ebp
> 30: c3 ret
> 31: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> 37: 55 push %ebp
> 38: 89 e5 mov %esp,%ebp
> 3a: 57 push %edi
> 3b: 31 ff xor %edi,%edi
> 3d: 56 push %esi
> 3e: 53 push %ebx
> 3f: 89 .byte 0x89
>
> Doing this:
>
> asm volatile(ALTERNATIVE_2("mfence", ...
>
> fails too which confirms that P3 can't do MFENCE.
>
> I need to think about how to handle that old cruft.
>
> Thx for the report.
>
On Fri, 23 Nov 2018, [email protected] wrote:
> On November 23, 2018 12:03:07 PM PST, Guenter Roeck <[email protected]> wrote:
> ># first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC:
> >Use RDTSCP
>
> Right, because that cpu predates RDTSCP, so it needs to use a fallback.
Well, that's not the problem.
The default here is MFENCE; RDTSC; But P3 does not have MFENCE either :)
We need alternative_3 with default RDTSC
Thanks,
tglx
On Fri, Nov 23, 2018 at 12:03:07PM -0800, Guenter Roeck wrote:
> [ 0.762832] EIP: read_tsc+0x4/0x10
> [ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
Where does that 'b' in f0b come from?
But ok, I was able to reproduce and decode myself. So if the Code:
section is correct, qemu chokes on MFENCE.
[ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
All code
========
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: a1 84 37 11 cd mov 0xcd113784,%eax
c: c3 ret
d: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
14: 8d 76 00 lea 0x0(%esi),%esi
17: c3 ret
18: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
1f: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
26: 90 nop
27: 55 push %ebp
28: 89 e5 mov %esp,%ebp
2a:* 0f ae f0 mfence <-- trapping instruction
2d: 0f 31 rdtsc
2f: 5d pop %ebp
30: c3 ret
31: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
37: 55 push %ebp
38: 89 e5 mov %esp,%ebp
3a: 57 push %edi
3b: 31 ff xor %edi,%edi
3d: 56 push %esi
3e: 53 push %ebx
3f: 89 .byte 0x89
Doing this:
asm volatile(ALTERNATIVE_2("mfence", ...
fails too which confirms that P3 can't do MFENCE.
I need to think about how to handle that old cruft.
Thx for the report.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Nov 23, 2018 at 01:03:25PM -0800, Guenter Roeck wrote:
> It is a cut off screen log. x86 boots change xterm configuration from
> wrap to non-wrap, and I did a cut-and-paste instead of copying the log
> to a file. Sorry for that.
No worries.
It was a head-scratcher though because look what it decodes to with the
'b':
[ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0b 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
All code
========
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: a1 84 37 11 cd mov 0xcd113784,%eax
c: c3 ret
d: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
14: 8d 76 00 lea 0x0(%esi),%esi
17: c3 ret
18: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
1f: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
26: 90 nop
27: 55 push %ebp
28: 89 e5 mov %esp,%ebp
2a:* 0f ae 0b fxrstor (%ebx) <-- trapping instruction
2d: 0f 31 rdtsc
2f: 5d pop %ebp
30: c3 ret
FXRSTOR?!?!?
:-)
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Ok,
I have something which looks like it works here, even with the pentium3
qemu CPU. I'll be hammering on it in the coming days but if you wanna
give it a try, here's a conglomerate patch:
---
From 0c6adce02d2e7f3b5bbdc4cbe3eb3dae99448def Mon Sep 17 00:00:00 2001
From: Borislav Petkov <[email protected]>
Date: Fri, 7 Dec 2018 18:54:23 +0100
Subject: [PATCH] WIP
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/alternative.h | 26 +++++++++++++++++++++++++-
arch/x86/include/asm/msr.h | 16 ++++++++++++++--
arch/x86/kernel/alternative.c | 4 ++--
3 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index ea9886651c39..db8ebe5dd5be 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -114,6 +114,16 @@ static inline int alternatives_text_reserved(void *start, void *end)
"(" alt_max_short(alt_rlen(num1), alt_rlen(num2)) " - (" alt_slen ")), 0x90\n" \
alt_end_marker ":\n"
+#define OLDINSTR_3(oldinsn, n1, n2, n3) \
+ "# ALT: oldinstr\n" \
+ "661:\n\t" oldinsn "\n662:\n" \
+ "# ALT: padding\n" \
+ ".skip -((" alt_max_short(alt_max_short(alt_rlen(n1), alt_rlen(n2)), alt_rlen(n3)) \
+ " - (" alt_slen ")) > 0) * " \
+ "(" alt_max_short(alt_max_short(alt_rlen(n1), alt_rlen(n2)), alt_rlen(n3)) \
+ " - (" alt_slen ")), 0x90\n" \
+ alt_end_marker ":\n"
+
#define ALTINSTR_ENTRY(feature, num) \
" .long 661b - .\n" /* label */ \
" .long " b_replacement(num)"f - .\n" /* new instruction */ \
@@ -122,7 +132,8 @@ static inline int alternatives_text_reserved(void *start, void *end)
" .byte " alt_rlen(num) "\n" /* replacement len */ \
" .byte " alt_pad_len "\n" /* pad len */
-#define ALTINSTR_REPLACEMENT(newinstr, feature, num) /* replacement */ \
+#define ALTINSTR_REPLACEMENT(newinstr, feature, num) /* replacement */ \
+ "# ALT: replacement " #num "\n" \
b_replacement(num)":\n\t" newinstr "\n" e_replacement(num) ":\n\t"
/* alternative assembly primitive: */
@@ -146,6 +157,19 @@ static inline int alternatives_text_reserved(void *start, void *end)
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \
".popsection\n"
+#define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, feat3) \
+ OLDINSTR_3(oldinsn, 1, 2, 3) \
+ ".pushsection .altinstructions,\"a\"\n" \
+ ALTINSTR_ENTRY(feat1, 1) \
+ ALTINSTR_ENTRY(feat2, 2) \
+ ALTINSTR_ENTRY(feat3, 3) \
+ ".popsection\n" \
+ ".pushsection .altinstr_replacement, \"ax\"\n" \
+ ALTINSTR_REPLACEMENT(newinsn1, feat1, 1) \
+ ALTINSTR_REPLACEMENT(newinsn2, feat2, 2) \
+ ALTINSTR_REPLACEMENT(newinsn3, feat3, 3) \
+ ".popsection\n"
+
/*
* Alternative instructions for different CPU types or capabilities.
*
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..5cc3930cb465 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
*/
static __always_inline unsigned long long rdtsc_ordered(void)
{
+ DECLARE_ARGS(val, low, high);
+
/*
* The RDTSC instruction is not ordered relative to memory
* access. The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,19 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* ordering guarantees as reading from a global memory location
* that some other imaginary CPU is updating continuously with a
* time stamp.
+ *
+ * Thus, use the preferred barrier on the respective CPU, aiming for
+ * RDTSCP as the default.
*/
- barrier_nospec();
- return rdtsc();
+ asm volatile(ALTERNATIVE_3("rdtsc",
+ "mfence; rdtsc", X86_FEATURE_MFENCE_RDTSC,
+ "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+ "rdtscp", X86_FEATURE_RDTSCP)
+ : EAX_EDX_RET(val, low, high)
+ /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+ :: "ecx");
+
+ return EAX_EDX_VAL(val, low, high);
}
static inline unsigned long long native_read_pmc(int counter)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ebeac487a20c..d458c7973c56 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,10 +393,10 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
continue;
}
- DPRINTK("feat: %d*32+%d, old: (%px len: %d), repl: (%px, len: %d), pad: %d",
+ DPRINTK("feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d), pad: %d",
a->cpuid >> 5,
a->cpuid & 0x1f,
- instr, a->instrlen,
+ instr, instr, a->instrlen,
replacement, a->replacementlen, a->padlen);
DUMP_BYTES(instr, a->instrlen, "%px: old_insn: ", instr);
--
2.19.1
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Commit-ID: 093ae8f9a86a974c920b613860f1f7fd5bbd70ab
Gitweb: https://git.kernel.org/tip/093ae8f9a86a974c920b613860f1f7fd5bbd70ab
Author: Borislav Petkov <[email protected]>
AuthorDate: Thu, 12 Apr 2018 13:11:36 +0200
Committer: Borislav Petkov <[email protected]>
CommitDate: Wed, 16 Jan 2019 12:43:08 +0100
x86/TSC: Use RDTSCP
Currently, the kernel uses
[LM]FENCE; RDTSC
in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.
Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.
A microbenchmark on Intel shows that the change is on-par.
On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC or slightly better with RDTSCP.
The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.
Co-developed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: John Stultz <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/msr.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..5cc3930cb465 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
*/
static __always_inline unsigned long long rdtsc_ordered(void)
{
+ DECLARE_ARGS(val, low, high);
+
/*
* The RDTSC instruction is not ordered relative to memory
* access. The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,19 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* ordering guarantees as reading from a global memory location
* that some other imaginary CPU is updating continuously with a
* time stamp.
+ *
+ * Thus, use the preferred barrier on the respective CPU, aiming for
+ * RDTSCP as the default.
*/
- barrier_nospec();
- return rdtsc();
+ asm volatile(ALTERNATIVE_3("rdtsc",
+ "mfence; rdtsc", X86_FEATURE_MFENCE_RDTSC,
+ "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+ "rdtscp", X86_FEATURE_RDTSCP)
+ : EAX_EDX_RET(val, low, high)
+ /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+ :: "ecx");
+
+ return EAX_EDX_VAL(val, low, high);
}
static inline unsigned long long native_read_pmc(int counter)