2022-12-08 18:05:27

by Krister Johansen

[permalink] [raw]
Subject: [PATCH linux-next] x86/xen/time: prefer tsc as clocksource when it is invariant

Kvm elects to use tsc instead of kvm-clock when it can detect that the
TSC is invariant.

(As of commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
invariant TSC is exposed")).

Notable cloud vendors[1] and performance engineers[2] recommend that Xen
users preferentially select tsc over xen-clocksource due the performance
penalty incurred by the latter. These articles are persuasive and
tailored to specific use cases. In order to understand the tradeoffs
around this choice more fully, this author had to reference the
documented[3] complexities around the Xen configuration, as well as the
kernel's clocksource selection algorithm. Many users may not attempt
this to correctly configure the right clock source in their guest.

The approach taken in the kvm-clock module spares users this confusion,
where possible.

Both the Intel SDM[4] and the Xen tsc documentation explain that marking
a tsc as invariant means that it should be considered stable by the OS
and is elibile to be used as a wall clock source. The Xen documentation
further clarifies that this is only reliable on HVM and PVH because PV
cannot intercept a cpuid instruction.

In order to obtain better out-of-the-box performance, and reduce the
need for user tuning, follow kvm's approach and decrease the xen clock
rating so that tsc is preferable, if it is invariant, stable, and the
guest is a HVM or PVH domain.

[1] https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/
[2] https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html
[3] https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html
[4] Intel 64 and IA-32 Architectures Sofware Developer's Manual Volume
3b: System Programming Guide, Part 2, Section 17.17.1, Invariant TSC

Signed-off-by: Krister Johansen <[email protected]>
Code-reviewed-by: David Reaver <[email protected]>
---
arch/x86/xen/time.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 9ef0a5cca96e..ca78581e4221 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -480,9 +480,22 @@ static void __init xen_time_init(void)
int cpu = smp_processor_id();
struct timespec64 tp;

- /* As Dom0 is never moved, no penalty on using TSC there */
- if (xen_initial_domain())
+ /*
+ * As Dom0 is never moved, no penalty on using TSC there.
+ *
+ * If the guest has invariant tsc, then set xen_clocksource rating
+ * below that of the tsc so that the system prefers tsc instead. This
+ * check excludes PV domains, because PV is unable to guarantee that the
+ * guest's cpuid call has been intercepted by the hypervisor.
+ */
+ if (xen_initial_domain()) {
xen_clocksource.rating = 275;
+ } else if ((xen_hvm_domain() || xen_pvh_domain()) &&
+ boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+ !check_tsc_unstable()) {
+ xen_clocksource.rating = 299;
+ }

clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);

--
2.25.1


2022-12-09 20:25:42

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH linux-next] x86/xen/time: prefer tsc as clocksource when it is invariant


On 12/8/22 11:36 AM, Krister Johansen wrote:
> + /*
> + * As Dom0 is never moved, no penalty on using TSC there.
> + *
> + * If the guest has invariant tsc, then set xen_clocksource rating
> + * below that of the tsc so that the system prefers tsc instead. This
> + * check excludes PV domains, because PV is unable to guarantee that the
> + * guest's cpuid call has been intercepted by the hypervisor.
> + */
> + if (xen_initial_domain()) {
> xen_clocksource.rating = 275;
> + } else if ((xen_hvm_domain() || xen_pvh_domain()) &&
> + boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> + !check_tsc_unstable()) {
> + xen_clocksource.rating = 299;
> + }


What if RDTSC is intercepted?


-boris

2022-12-12 16:32:15

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next] x86/xen/time: prefer tsc as clocksource when it is invariant

On Fri, Dec 09, 2022 at 02:32:15PM -0500, Boris Ostrovsky wrote:
>
> On 12/8/22 11:36 AM, Krister Johansen wrote:
> > + /*
> > + * As Dom0 is never moved, no penalty on using TSC there.
> > + *
> > + * If the guest has invariant tsc, then set xen_clocksource rating
> > + * below that of the tsc so that the system prefers tsc instead. This
> > + * check excludes PV domains, because PV is unable to guarantee that the
> > + * guest's cpuid call has been intercepted by the hypervisor.
> > + */
> > + if (xen_initial_domain()) {
> > xen_clocksource.rating = 275;
> > + } else if ((xen_hvm_domain() || xen_pvh_domain()) &&
> > + boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > + !check_tsc_unstable()) {
> > + xen_clocksource.rating = 299;
> > + }
>
>
> What if RDTSC is intercepted?

Right, thanks. I'll send out an updated patch here shortly that
attempts to address this by examining the cpuid information to determine
if the tsc is being emulated.

-K

2022-12-12 16:57:46

by Krister Johansen

[permalink] [raw]
Subject: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

Kvm elects to use tsc instead of kvm-clock when it can detect that the
TSC is invariant.

(As of commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
invariant TSC is exposed")).

Notable cloud vendors[1] and performance engineers[2] recommend that Xen
users preferentially select tsc over xen-clocksource due the performance
penalty incurred by the latter. These articles are persuasive and
tailored to specific use cases. In order to understand the tradeoffs
around this choice more fully, this author had to reference the
documented[3] complexities around the Xen configuration, as well as the
kernel's clocksource selection algorithm. Many users may not attempt
this to correctly configure the right clock source in their guest.

The approach taken in the kvm-clock module spares users this confusion,
where possible.

Both the Intel SDM[4] and the Xen tsc documentation explain that marking
a tsc as invariant means that it should be considered stable by the OS
and is elibile to be used as a wall clock source. The Xen documentation
further clarifies that this is only reliable on HVM and PVH because PV
cannot intercept a cpuid instruction.

In order to obtain better out-of-the-box performance, and reduce the
need for user tuning, follow kvm's approach and decrease the xen clock
rating so that tsc is preferable, if it is invariant, stable, the
guest is a HVM or PVH domain, and the tsc is not emulated.

[1] https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/
[2] https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html
[3] https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html
[4] Intel 64 and IA-32 Architectures Sofware Developer's Manual Volume
3b: System Programming Guide, Part 2, Section 17.17.1, Invariant TSC

Signed-off-by: Krister Johansen <[email protected]>
Code-reviewed-by: David Reaver <[email protected]>
---
v2:
- Use cpuid information to determine if tsc is emulated. Do not use tsc as
clocksource if it is. (feedback from Boris Ostrovsky)
- Move tsc checks into their own helper function
- Add defines for tsc cpuid flags needed by new helper function.
---
arch/x86/include/asm/xen/cpuid.h | 6 +++++
arch/x86/xen/time.c | 43 +++++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
index 6daa9b0c8d11..d9d7432481e9 100644
--- a/arch/x86/include/asm/xen/cpuid.h
+++ b/arch/x86/include/asm/xen/cpuid.h
@@ -88,6 +88,12 @@
* EDX: shift amount for tsc->ns conversion
* Sub-leaf 2: EAX: host tsc frequency in kHz
*/
+#define XEN_CPUID_TSC_EMULATED (1u << 0)
+#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
+#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
+#define XEN_CPUID_TSC_MODE_DEFAULT (0)
+#define XEN_CPUID_TSC_MODE_EMULATE (1u)
+#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)

/*
* Leaf 5 (0x40000x04)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 9ef0a5cca96e..4100b1c3f38d 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -20,6 +20,7 @@
#include <asm/pvclock.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
+#include <asm/xen/cpuid.h>

#include <xen/events.h>
#include <xen/features.h>
@@ -474,15 +475,55 @@ static void xen_setup_vsyscall_time_info(void)
xen_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_PVCLOCK;
}

+/*
+ * Check if it is possible to safely use the tsc as a clocksource. This is only
+ * true if the domain is HVM or PVH, the hypervisor notifies the guest that its
+ * tsc is invariant, and the tsc instruction is not going to be emulated.
+ */
+static int __init xen_tsc_safe_clocksource(void)
+{
+ u32 eax, ebx, ecx, edx;
+
+ if (!(xen_hvm_domain() || xen_pvh_domain()))
+ return 0;
+
+ if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
+ return 0;
+
+ if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
+ return 0;
+
+ if (check_tsc_unstable())
+ return 0;
+
+ cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
+
+ if (eax & XEN_CPUID_TSC_EMULATED)
+ return 0;
+
+ if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
+ return 0;
+
+ return 1;
+}
+
static void __init xen_time_init(void)
{
struct pvclock_vcpu_time_info *pvti;
int cpu = smp_processor_id();
struct timespec64 tp;

- /* As Dom0 is never moved, no penalty on using TSC there */
+ /*
+ * As Dom0 is never moved, no penalty on using TSC there.
+ *
+ * If it is possible for the guest to determine that the tsc is a safe
+ * clocksource, then set xen_clocksource rating below that of the tsc so
+ * that the system prefers tsc instead.
+ */
if (xen_initial_domain())
xen_clocksource.rating = 275;
+ else if (xen_tsc_safe_clocksource())
+ xen_clocksource.rating = 299;

clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);

--
2.25.1

2022-12-12 17:30:57

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On 12.12.2022 17:05, Krister Johansen wrote:
> Kvm elects to use tsc instead of kvm-clock when it can detect that the
> TSC is invariant.
>
> (As of commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> invariant TSC is exposed")).
>
> Notable cloud vendors[1] and performance engineers[2] recommend that Xen
> users preferentially select tsc over xen-clocksource due the performance
> penalty incurred by the latter. These articles are persuasive and
> tailored to specific use cases. In order to understand the tradeoffs
> around this choice more fully, this author had to reference the
> documented[3] complexities around the Xen configuration, as well as the
> kernel's clocksource selection algorithm. Many users may not attempt
> this to correctly configure the right clock source in their guest.
>
> The approach taken in the kvm-clock module spares users this confusion,
> where possible.
>
> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> a tsc as invariant means that it should be considered stable by the OS
> and is elibile to be used as a wall clock source. The Xen documentation
> further clarifies that this is only reliable on HVM and PVH because PV
> cannot intercept a cpuid instruction.

Without meaning to express a view on the argumentation as a whole, this
PV aspect is suspicious. Unless you open-code a use of the CPUID insn
in the kernel, all uses of CPUID are going to be processed by Xen by
virtue of the respective pvops hook. Documentation says what it says
for environments where this might not be the case.

> @@ -474,15 +475,55 @@ static void xen_setup_vsyscall_time_info(void)
> xen_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_PVCLOCK;
> }
>
> +/*
> + * Check if it is possible to safely use the tsc as a clocksource. This is only
> + * true if the domain is HVM or PVH, the hypervisor notifies the guest that its
> + * tsc is invariant, and the tsc instruction is not going to be emulated.
> + */
> +static int __init xen_tsc_safe_clocksource(void)
> +{
> + u32 eax, ebx, ecx, edx;
> +
> + if (!(xen_hvm_domain() || xen_pvh_domain()))
> + return 0;
> +
> + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> + return 0;
> +
> + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> + return 0;
> +
> + if (check_tsc_unstable())
> + return 0;
> +
> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);

Xen leaf 3 has sub-leaves, so I think you need to set ecx to zero before
this call.

Jan

2022-12-12 18:57:52

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant


On 12/12/22 11:05 AM, Krister Johansen wrote:
>
> diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
> index 6daa9b0c8d11..d9d7432481e9 100644
> --- a/arch/x86/include/asm/xen/cpuid.h
> +++ b/arch/x86/include/asm/xen/cpuid.h
> @@ -88,6 +88,12 @@
> * EDX: shift amount for tsc->ns conversion
> * Sub-leaf 2: EAX: host tsc frequency in kHz
> */
> +#define XEN_CPUID_TSC_EMULATED (1u << 0)
> +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
> +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
> +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
> +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
> +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)


This file is a copy of Xen public interface so this change should go to Xen first.


>
> +static int __init xen_tsc_safe_clocksource(void)
> +{
> + u32 eax, ebx, ecx, edx;
> +
> + if (!(xen_hvm_domain() || xen_pvh_domain()))
> + return 0;
> +
> + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> + return 0;
> +
> + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> + return 0;
> +
> + if (check_tsc_unstable())
> + return 0;
> +
> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> +
> + if (eax & XEN_CPUID_TSC_EMULATED)
> + return 0;
> +
> + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
> + return 0;


Why is the last test needed?


-boris

2022-12-12 22:28:16

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Mon, Dec 12, 2022 at 01:48:24PM -0500, Boris Ostrovsky wrote:
>
> On 12/12/22 11:05 AM, Krister Johansen wrote:
> >
> > diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
> > index 6daa9b0c8d11..d9d7432481e9 100644
> > --- a/arch/x86/include/asm/xen/cpuid.h
> > +++ b/arch/x86/include/asm/xen/cpuid.h
> > @@ -88,6 +88,12 @@
> > * EDX: shift amount for tsc->ns conversion
> > * Sub-leaf 2: EAX: host tsc frequency in kHz
> > */
> > +#define XEN_CPUID_TSC_EMULATED (1u << 0)
> > +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
> > +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
> > +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
> > +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
> > +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
>
> This file is a copy of Xen public interface so this change should go to Xen first.

Ok, should I split this into a separate patch on the linux side too?

> > +static int __init xen_tsc_safe_clocksource(void)
> > +{
> > + u32 eax, ebx, ecx, edx;
> > +
> > + if (!(xen_hvm_domain() || xen_pvh_domain()))
> > + return 0;
> > +
> > + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> > + return 0;
> > +
> > + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> > + return 0;
> > +
> > + if (check_tsc_unstable())
> > + return 0;
> > +
> > + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> > +
> > + if (eax & XEN_CPUID_TSC_EMULATED)
> > + return 0;
> > +
> > + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
> > + return 0;
>
> Why is the last test needed?

I was under the impression that if the mode was 0 (default) it would be
possible for the tsc to become emulated in the future, perhaps after a
migration. The presence of the tsc_mode noemulate meant that we could
count on the falseneess of the XEN_CPUID_TSC_EMULATED check remaining
constant.

-K

2022-12-12 22:47:27

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
> On 12.12.2022 17:05, Krister Johansen wrote:
> > Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> > a tsc as invariant means that it should be considered stable by the OS
> > and is elibile to be used as a wall clock source. The Xen documentation
> > further clarifies that this is only reliable on HVM and PVH because PV
> > cannot intercept a cpuid instruction.
>
> Without meaning to express a view on the argumentation as a whole, this
> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
> in the kernel, all uses of CPUID are going to be processed by Xen by
> virtue of the respective pvops hook. Documentation says what it says
> for environments where this might not be the case.

Thanks, appreciate the clarification here. Just restating this for my
own understanding: your advice would be to drop this check below?

> > + if (!(xen_hvm_domain() || xen_pvh_domain()))
> > + return 0;

And then update the commit message to dispense with the distinction
between HVM, PV, and PVH?

> > + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
>
> Xen leaf 3 has sub-leaves, so I think you need to set ecx to zero before
> this call.

The cpuid() inline in arch/x86/include/asm/processor.h assigns zero to
ecx prior to calling __cpuid. In arch/x86/boot/cpuflags.c the macros
are a little different, but it looks like there too, the macro passes 0
as an input argument to cpuid_count which ends up being %ecx. Happy to
fix this up if I'm looking at the wrong cpuid functions, though.

-K

2022-12-13 07:38:48

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On 12.12.2022 23:05, Krister Johansen wrote:
> On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
>> On 12.12.2022 17:05, Krister Johansen wrote:
>>> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
>>> a tsc as invariant means that it should be considered stable by the OS
>>> and is elibile to be used as a wall clock source. The Xen documentation
>>> further clarifies that this is only reliable on HVM and PVH because PV
>>> cannot intercept a cpuid instruction.
>>
>> Without meaning to express a view on the argumentation as a whole, this
>> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
>> in the kernel, all uses of CPUID are going to be processed by Xen by
>> virtue of the respective pvops hook. Documentation says what it says
>> for environments where this might not be the case.
>
> Thanks, appreciate the clarification here. Just restating this for my
> own understanding: your advice would be to drop this check below?

No, I'm unconvinced of if/where this transformation is really appropriate.
My comment was merely to indicate that the justification for ...

>>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
>>> + return 0;

... this isn't really correct.

> And then update the commit message to dispense with the distinction
> between HVM, PV, and PVH?
>
>>> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
>>
>> Xen leaf 3 has sub-leaves, so I think you need to set ecx to zero before
>> this call.
>
> The cpuid() inline in arch/x86/include/asm/processor.h assigns zero to
> ecx prior to calling __cpuid. In arch/x86/boot/cpuflags.c the macros
> are a little different, but it looks like there too, the macro passes 0
> as an input argument to cpuid_count which ends up being %ecx. Happy to
> fix this up if I'm looking at the wrong cpuid functions, though.

Oh, I didn't expect cpuid() to be more than a trivial wrapper around the
the pvops hook, and I merely looked at native_cpuid() and xen_cpuid().
I'm sorry for the noise then. Yet still, with there being sub-leaves, I'd
recommend switching to cpuid_count() just for clarity.

Jan

2022-12-13 20:18:41

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Tue, Dec 13, 2022 at 08:23:29AM +0100, Jan Beulich wrote:
> On 12.12.2022 23:05, Krister Johansen wrote:
> > On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
> >> On 12.12.2022 17:05, Krister Johansen wrote:
> >>> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> >>> a tsc as invariant means that it should be considered stable by the OS
> >>> and is elibile to be used as a wall clock source. The Xen documentation
> >>> further clarifies that this is only reliable on HVM and PVH because PV
> >>> cannot intercept a cpuid instruction.
> >>
> >> Without meaning to express a view on the argumentation as a whole, this
> >> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
> >> in the kernel, all uses of CPUID are going to be processed by Xen by
> >> virtue of the respective pvops hook. Documentation says what it says
> >> for environments where this might not be the case.
> >
> > Thanks, appreciate the clarification here. Just restating this for my
> > own understanding: your advice would be to drop this check below?
>
> No, I'm unconvinced of if/where this transformation is really appropriate.

Ah, I see. You're saying that you're not convinced that this code
should ever lower the priority of xen clocksource in favor of the tsc?
If so, are you willing to say a bit more about what you find to be
unconvincing?

In as much detail as I can muster: the impetus for the patch was that I
had a variety of different systems running as both kvm and xen guests.
Some of these guests had clocksource tunings in place as a result of
consulting the documentation linked in the patch. But others didn't.
On kvm they had somehow done the "right" (?) thing. Some systems had
tuning in place for xen, despite no longer being a xen guests. Other
systems were running on xen and didn't have the recommended tuning
applied. That's all sorted now, but it seemed like it might be nice to
eliminate the need for others to do this in future. With kvm doing
something similar, I thought there might be enough precedent to consider
this for xen guests.

> My comment was merely to indicate that the justification for ...
>
> >>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
> >>> + return 0;
>
> ... this isn't really correct.

The rationale for this bit of code was the justification that turns
out to be incorrect. That sounds to me like I have unnessary code,
unless I was right by mistake?

> > And then update the commit message to dispense with the distinction
> > between HVM, PV, and PVH?
> >
> >>> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> >>
> >> Xen leaf 3 has sub-leaves, so I think you need to set ecx to zero before
> >> this call.
> >
> > The cpuid() inline in arch/x86/include/asm/processor.h assigns zero to
> > ecx prior to calling __cpuid. In arch/x86/boot/cpuflags.c the macros
> > are a little different, but it looks like there too, the macro passes 0
> > as an input argument to cpuid_count which ends up being %ecx. Happy to
> > fix this up if I'm looking at the wrong cpuid functions, though.
>
> Oh, I didn't expect cpuid() to be more than a trivial wrapper around the
> the pvops hook, and I merely looked at native_cpuid() and xen_cpuid().
> I'm sorry for the noise then. Yet still, with there being sub-leaves, I'd
> recommend switching to cpuid_count() just for clarity.

No apology necessary. I'm happy to modify this to use cpuid_count() for
clarity.

-K

2022-12-13 21:45:23

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant


On 12/12/22 5:09 PM, Krister Johansen wrote:
> On Mon, Dec 12, 2022 at 01:48:24PM -0500, Boris Ostrovsky wrote:
>> On 12/12/22 11:05 AM, Krister Johansen wrote:
>>> diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
>>> index 6daa9b0c8d11..d9d7432481e9 100644
>>> --- a/arch/x86/include/asm/xen/cpuid.h
>>> +++ b/arch/x86/include/asm/xen/cpuid.h
>>> @@ -88,6 +88,12 @@
>>> * EDX: shift amount for tsc->ns conversion
>>> * Sub-leaf 2: EAX: host tsc frequency in kHz
>>> */
>>> +#define XEN_CPUID_TSC_EMULATED (1u << 0)
>>> +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
>>> +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
>>> +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
>>> +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
>>> +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
>> This file is a copy of Xen public interface so this change should go to Xen first.
> Ok, should I split this into a separate patch on the linux side too?


Yes. Once the Xen patch has been accepted you will either submit the same patch for Linux or sync Linux file with Xen (if there are more differences).


>
>>> +static int __init xen_tsc_safe_clocksource(void)
>>> +{
>>> + u32 eax, ebx, ecx, edx;
>>> +
>>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
>>> + return 0;
>>> +
>>> + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
>>> + return 0;
>>> +
>>> + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
>>> + return 0;
>>> +
>>> + if (check_tsc_unstable())
>>> + return 0;
>>> +
>>> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
>>> +
>>> + if (eax & XEN_CPUID_TSC_EMULATED)
>>> + return 0;
>>> +
>>> + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
>>> + return 0;
>> Why is the last test needed?
> I was under the impression that if the mode was 0 (default) it would be
> possible for the tsc to become emulated in the future, perhaps after a
> migration. The presence of the tsc_mode noemulate meant that we could
> count on the falseneess of the XEN_CPUID_TSC_EMULATED check remaining
> constant.


This will filter out most modern processors with TSC scaling support where in default mode we don't intercept RDTCS after migration. But I don't think we have proper interface to determine this so we don't have much choice but to indeed make this check.


-boris

2022-12-14 08:28:47

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On 13.12.2022 19:58, Krister Johansen wrote:
> On Tue, Dec 13, 2022 at 08:23:29AM +0100, Jan Beulich wrote:
>> On 12.12.2022 23:05, Krister Johansen wrote:
>>> On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
>>>> On 12.12.2022 17:05, Krister Johansen wrote:
>>>>> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
>>>>> a tsc as invariant means that it should be considered stable by the OS
>>>>> and is elibile to be used as a wall clock source. The Xen documentation
>>>>> further clarifies that this is only reliable on HVM and PVH because PV
>>>>> cannot intercept a cpuid instruction.
>>>>
>>>> Without meaning to express a view on the argumentation as a whole, this
>>>> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
>>>> in the kernel, all uses of CPUID are going to be processed by Xen by
>>>> virtue of the respective pvops hook. Documentation says what it says
>>>> for environments where this might not be the case.
>>>
>>> Thanks, appreciate the clarification here. Just restating this for my
>>> own understanding: your advice would be to drop this check below?
>>
>> No, I'm unconvinced of if/where this transformation is really appropriate.
>
> Ah, I see. You're saying that you're not convinced that this code
> should ever lower the priority of xen clocksource in favor of the tsc?
> If so, are you willing to say a bit more about what you find to be
> unconvincing?

With the not-for-PV justification not really applicable, the main question
is how else you mean to justify that aspect. Once limited back to HVM/PVH,
it may all be okay.

>> My comment was merely to indicate that the justification for ...
>>
>>>>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
>>>>> + return 0;
>>
>> ... this isn't really correct.
>
> The rationale for this bit of code was the justification that turns
> out to be incorrect. That sounds to me like I have unnessary code,
> unless I was right by mistake?

The PV clock interface was specifically introduced because the TSC could
not be reliably used by PV domains. This may have been purely due to
limitations of the TSC at the time, so taking into account more modern
stability guarantees may make it okay to be used by PV as well. But
migration needs to be considered, and validity (for PV) of the deriving
of the two synthetic feature bits you use also needs to be checked (I
find X86_FEATURE_NONSTOP_TSC particularly interesting, because PV domains
don't really have any notion of "C states"). Note that e.g.
early_init_intel() derives the two bits from CPUID[80000007].EDX[8],
which is an opt-in feature for all guest types as per the present CPUID
policy logic in the hypervisor, but then goes on and sets
X86_FEATURE_NONSTOP_TSC_S3 (which you don't use in your patch, so just to
point out a possible pitfall) purely based on family/model information.

Jan

2022-12-14 18:10:31

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Tue, Dec 13, 2022 at 04:25:32PM -0500, Boris Ostrovsky wrote:
>
> On 12/12/22 5:09 PM, Krister Johansen wrote:
> > On Mon, Dec 12, 2022 at 01:48:24PM -0500, Boris Ostrovsky wrote:
> > > On 12/12/22 11:05 AM, Krister Johansen wrote:
> > > > diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
> > > > index 6daa9b0c8d11..d9d7432481e9 100644
> > > > --- a/arch/x86/include/asm/xen/cpuid.h
> > > > +++ b/arch/x86/include/asm/xen/cpuid.h
> > > > @@ -88,6 +88,12 @@
> > > > * EDX: shift amount for tsc->ns conversion
> > > > * Sub-leaf 2: EAX: host tsc frequency in kHz
> > > > */
> > > > +#define XEN_CPUID_TSC_EMULATED (1u << 0)
> > > > +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
> > > > +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
> > > > +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
> > > > +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
> > > > +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
> > > This file is a copy of Xen public interface so this change should go to Xen first.
> > Ok, should I split this into a separate patch on the linux side too?
>
> Yes. Once the Xen patch has been accepted you will either submit the same patch for Linux or sync Linux file with Xen (if there are more differences).

Thanks. Based upon the feedback I received from you and Jan, I may try
to shrink the check in xen_tsc_safe_clocksource() down a bit. In that
case, I may only need to refer to a single field in the leaf that
provides this information. In that case, are you alright with dropping
the change to the header and referring to the value directly, or would
you prefer that I proceed with adding these to the public API?

> > > > +static int __init xen_tsc_safe_clocksource(void)
> > > > +{
> > > > + u32 eax, ebx, ecx, edx;
> > > > +
> > > > + if (!(xen_hvm_domain() || xen_pvh_domain()))
> > > > + return 0;
> > > > +
> > > > + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> > > > + return 0;
> > > > +
> > > > + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> > > > + return 0;
> > > > +
> > > > + if (check_tsc_unstable())
> > > > + return 0;
> > > > +
> > > > + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> > > > +
> > > > + if (eax & XEN_CPUID_TSC_EMULATED)
> > > > + return 0;
> > > > +
> > > > + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
> > > > + return 0;
> > > Why is the last test needed?
> > I was under the impression that if the mode was 0 (default) it would be
> > possible for the tsc to become emulated in the future, perhaps after a
> > migration. The presence of the tsc_mode noemulate meant that we could
> > count on the falseneess of the XEN_CPUID_TSC_EMULATED check remaining
> > constant.
>
> This will filter out most modern processors with TSC scaling support where in default mode we don't intercept RDTCS after migration. But I don't think we have proper interface to determine this so we don't have much choice but to indeed make this check.

Yes, if this remains a single boot-time check, I'm not sure that knowing
whether the processor supports tsc scaling helps us. If tsc_mode is
default, there's always a possibility of the tsc becoming emulated later
on as part of migration, correct?

The other thing that might be possible here is to add a background
timer that periodically checks if the tsc is still not emulated, and if
it suddenly becomes so, change the rating again to prefer the xen
clocksource. I had written this off initially as an impractical
solution, since it seemed like a lot more mechanism and because it meant
the performance characteristics of the system would change without user
intervention. However, if this seems like a good idea, I'm not opposed
to giving it a try.

-K

2022-12-14 18:10:30

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Wed, Dec 14, 2022 at 09:17:24AM +0100, Jan Beulich wrote:
> On 13.12.2022 19:58, Krister Johansen wrote:
> > On Tue, Dec 13, 2022 at 08:23:29AM +0100, Jan Beulich wrote:
> >> On 12.12.2022 23:05, Krister Johansen wrote:
> >>> On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
> >>>> On 12.12.2022 17:05, Krister Johansen wrote:
> >>>>> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> >>>>> a tsc as invariant means that it should be considered stable by the OS
> >>>>> and is elibile to be used as a wall clock source. The Xen documentation
> >>>>> further clarifies that this is only reliable on HVM and PVH because PV
> >>>>> cannot intercept a cpuid instruction.
> >>>>
> >>>> Without meaning to express a view on the argumentation as a whole, this
> >>>> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
> >>>> in the kernel, all uses of CPUID are going to be processed by Xen by
> >>>> virtue of the respective pvops hook. Documentation says what it says
> >>>> for environments where this might not be the case.
> >>>
> >>> Thanks, appreciate the clarification here. Just restating this for my
> >>> own understanding: your advice would be to drop this check below?
> >>
> >> No, I'm unconvinced of if/where this transformation is really appropriate.
> >
> > Ah, I see. You're saying that you're not convinced that this code
> > should ever lower the priority of xen clocksource in favor of the tsc?
> > If so, are you willing to say a bit more about what you find to be
> > unconvincing?
>
> With the not-for-PV justification not really applicable, the main question
> is how else you mean to justify that aspect. Once limited back to HVM/PVH,
> it may all be okay.

Got it. I think I can provide an answer for this.

> >> My comment was merely to indicate that the justification for ...
> >>
> >>>>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
> >>>>> + return 0;
> >>
> >> ... this isn't really correct.
> >
> > The rationale for this bit of code was the justification that turns
> > out to be incorrect. That sounds to me like I have unnessary code,
> > unless I was right by mistake?
>
> The PV clock interface was specifically introduced because the TSC could
> not be reliably used by PV domains. This may have been purely due to
> limitations of the TSC at the time, so taking into account more modern
> stability guarantees may make it okay to be used by PV as well. But
> migration needs to be considered, and validity (for PV) of the deriving
> of the two synthetic feature bits you use also needs to be checked

The current incarnation of the patch confines itself to just cases where
tsc_mode is never emulate. In mode default, a non-virtualized tsc is
only selected if the host tsc is safe (this is also derived from the
CPUID[80000007].EDX[8] -- tsc invariant bit), and the domain is either
hvm with tsc scaling support, or a domain where the guest tsc frequency
matches the host tsc frequency.

For the PV case, or any case I think, the administrator would have had
to take an explicit action to ensure that no virutalize is enabled on
the tsc, and the underlying feature flags would have either been
inherited from the base cpu, or explicitly added by intervention. I
presume that we'd want to honor these settings if they were enabled
manually?

I may once again be misreading the documentation, but my interpretation
of the requirements around migration was that if you set no-emulate on
the tsc it was up to you to ensure that the target system for the
migration either had the same tsc frequency, or supported scaling.

> (I find X86_FEATURE_NONSTOP_TSC particularly interesting, because PV domains
> don't really have any notion of "C states"). Note that e.g.
> early_init_intel() derives the two bits from CPUID[80000007].EDX[8],
> which is an opt-in feature for all guest types as per the present CPUID
> policy logic in the hypervisor, but then goes on and sets
> X86_FEATURE_NONSTOP_TSC_S3 (which you don't use in your patch, so just to
> point out a possible pitfall) purely based on family/model information.

Thanks for pointing this out. It looks like it's specific to Intel Atom
systems that have a tsc that does not stop during suspend. If this is
set, then the clocksource is eligible to be used to for suspend timing.
Since we don't set this, it looks like there's no behavior change. E.g.
the tsc remains ineligible for consideration as a suspend clocksource.

-K


2022-12-14 22:08:13

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant


On 12/14/22 1:01 PM, Krister Johansen wrote:
> On Tue, Dec 13, 2022 at 04:25:32PM -0500, Boris Ostrovsky wrote:
>> On 12/12/22 5:09 PM, Krister Johansen wrote:
>>> On Mon, Dec 12, 2022 at 01:48:24PM -0500, Boris Ostrovsky wrote:
>>>> On 12/12/22 11:05 AM, Krister Johansen wrote:
>>>>> diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
>>>>> index 6daa9b0c8d11..d9d7432481e9 100644
>>>>> --- a/arch/x86/include/asm/xen/cpuid.h
>>>>> +++ b/arch/x86/include/asm/xen/cpuid.h
>>>>> @@ -88,6 +88,12 @@
>>>>> * EDX: shift amount for tsc->ns conversion
>>>>> * Sub-leaf 2: EAX: host tsc frequency in kHz
>>>>> */
>>>>> +#define XEN_CPUID_TSC_EMULATED (1u << 0)
>>>>> +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
>>>>> +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
>>>>> +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
>>>>> +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
>>>>> +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
>>>> This file is a copy of Xen public interface so this change should go to Xen first.
>>> Ok, should I split this into a separate patch on the linux side too?
>> Yes. Once the Xen patch has been accepted you will either submit the same patch for Linux or sync Linux file with Xen (if there are more differences).
> Thanks. Based upon the feedback I received from you and Jan, I may try
> to shrink the check in xen_tsc_safe_clocksource() down a bit. In that
> case, I may only need to refer to a single field in the leaf that
> provides this information. In that case, are you alright with dropping
> the change to the header and referring to the value directly, or would
> you prefer that I proceed with adding these to the public API?


It would certainly be appreciated if you updated the header files but it's up to maintainers to decide whether it's required.


>>>>> +static int __init xen_tsc_safe_clocksource(void)
>>>>> +{
>>>>> + u32 eax, ebx, ecx, edx;
>>>>> +
>>>>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
>>>>> + return 0;
>>>>> +
>>>>> + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
>>>>> + return 0;
>>>>> +
>>>>> + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
>>>>> + return 0;
>>>>> +
>>>>> + if (check_tsc_unstable())
>>>>> + return 0;
>>>>> +
>>>>> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
>>>>> +
>>>>> + if (eax & XEN_CPUID_TSC_EMULATED)
>>>>> + return 0;
>>>>> +
>>>>> + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
>>>>> + return 0;
>>>> Why is the last test needed?
>>> I was under the impression that if the mode was 0 (default) it would be
>>> possible for the tsc to become emulated in the future, perhaps after a
>>> migration. The presence of the tsc_mode noemulate meant that we could
>>> count on the falseneess of the XEN_CPUID_TSC_EMULATED check remaining
>>> constant.
>> This will filter out most modern processors with TSC scaling support where in default mode we don't intercept RDTCS after migration. But I don't think we have proper interface to determine this so we don't have much choice but to indeed make this check.
> Yes, if this remains a single boot-time check, I'm not sure that knowing
> whether the processor supports tsc scaling helps us. If tsc_mode is
> default, there's always a possibility of the tsc becoming emulated later
> on as part of migration, correct?


If the processor supports TSC scaling I don't think it's possible (it can happen in theory) but if it doesn't and you migrate to a CPU running at different frequency then yes, hypervisor will start emulating RDTSC.


>
> The other thing that might be possible here is to add a background
> timer that periodically checks if the tsc is still not emulated, and if
> it suddenly becomes so, change the rating again to prefer the xen
> clocksource. I had written this off initially as an impractical
> solution, since it seemed like a lot more mechanism and because it meant
> the performance characteristics of the system would change without user
> intervention. However, if this seems like a good idea, I'm not opposed
> to giving it a try.


I don't think we should do it. Having the kernel suddenly change clocksource will probably be somewhat of a surprise to users.


-boris

2022-12-16 16:47:50

by Krister Johansen

[permalink] [raw]
Subject: Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

On Wed, Dec 14, 2022 at 04:46:10PM -0500, Boris Ostrovsky wrote:
>
> On 12/14/22 1:01 PM, Krister Johansen wrote:
> > On Tue, Dec 13, 2022 at 04:25:32PM -0500, Boris Ostrovsky wrote:
> > > On 12/12/22 5:09 PM, Krister Johansen wrote:
> > > > On Mon, Dec 12, 2022 at 01:48:24PM -0500, Boris Ostrovsky wrote:
> > > > > On 12/12/22 11:05 AM, Krister Johansen wrote:
> > > > > > diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
> > > > > > index 6daa9b0c8d11..d9d7432481e9 100644
> > > > > > --- a/arch/x86/include/asm/xen/cpuid.h
> > > > > > +++ b/arch/x86/include/asm/xen/cpuid.h
> > > > > > @@ -88,6 +88,12 @@
> > > > > > * EDX: shift amount for tsc->ns conversion
> > > > > > * Sub-leaf 2: EAX: host tsc frequency in kHz
> > > > > > */
> > > > > > +#define XEN_CPUID_TSC_EMULATED (1u << 0)
> > > > > > +#define XEN_CPUID_HOST_TSC_RELIABLE (1u << 1)
> > > > > > +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
> > > > > > +#define XEN_CPUID_TSC_MODE_DEFAULT (0)
> > > > > > +#define XEN_CPUID_TSC_MODE_EMULATE (1u)
> > > > > > +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
> > > > > This file is a copy of Xen public interface so this change should go to Xen first.
> > > > Ok, should I split this into a separate patch on the linux side too?
> > > Yes. Once the Xen patch has been accepted you will either submit the same patch for Linux or sync Linux file with Xen (if there are more differences).
> > Thanks. Based upon the feedback I received from you and Jan, I may try
> > to shrink the check in xen_tsc_safe_clocksource() down a bit. In that
> > case, I may only need to refer to a single field in the leaf that
> > provides this information. In that case, are you alright with dropping
> > the change to the header and referring to the value directly, or would
> > you prefer that I proceed with adding these to the public API?
>
> It would certainly be appreciated if you updated the header files but it's up to maintainers to decide whether it's required.

Sure, I'm just trying to avoid generating extra work for the maintainers
if this patch isn't likely to make it in. I'm cutting a v3 that doesn't
reference the header. If it's acceptable, and this looks otherwise
unobjectionable, then I'll go ahead and put together the pieces for the
public API, if that's still a desireable change.

> > > > > > +static int __init xen_tsc_safe_clocksource(void)
> > > > > > +{
> > > > > > + u32 eax, ebx, ecx, edx;
> > > > > > +
> > > > > > + if (!(xen_hvm_domain() || xen_pvh_domain()))
> > > > > > + return 0;
> > > > > > +
> > > > > > + if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> > > > > > + return 0;
> > > > > > +
> > > > > > + if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> > > > > > + return 0;
> > > > > > +
> > > > > > + if (check_tsc_unstable())
> > > > > > + return 0;
> > > > > > +
> > > > > > + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> > > > > > +
> > > > > > + if (eax & XEN_CPUID_TSC_EMULATED)
> > > > > > + return 0;
> > > > > > +
> > > > > > + if (ebx != XEN_CPUID_TSC_MODE_NOEMULATE)
> > > > > > + return 0;
> > > > > Why is the last test needed?
> > > > I was under the impression that if the mode was 0 (default) it would be
> > > > possible for the tsc to become emulated in the future, perhaps after a
> > > > migration. The presence of the tsc_mode noemulate meant that we could
> > > > count on the falseneess of the XEN_CPUID_TSC_EMULATED check remaining
> > > > constant.
> > > This will filter out most modern processors with TSC scaling support where in default mode we don't intercept RDTCS after migration. But I don't think we have proper interface to determine this so we don't have much choice but to indeed make this check.
> > Yes, if this remains a single boot-time check, I'm not sure that knowing
> > whether the processor supports tsc scaling helps us. If tsc_mode is
> > default, there's always a possibility of the tsc becoming emulated later
> > on as part of migration, correct?
>
> If the processor supports TSC scaling I don't think it's possible (it can happen in theory) but if it doesn't and you migrate to a CPU running at different frequency then yes, hypervisor will start emulating RDTSC.

Yes, I wondered whether it's reasonable to expect that users migrate
between hardware that is pretty similar, or if this case of moving from
a CPU that supports tsc scaling to one that doesn't is likely to happen
in practice.

-K