Changes from v4:
- v4: https://lore.kernel.org/lkml/[email protected]/
- Rename capability from KVM_CAP_X86_APIC_BUS_FREQUENCY to
KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li).
- Include "Testing" section in cover letter.
- Add Rick's Reviewed-by tags.
- Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
Changes from v3:
- v3: https://lore.kernel.org/all/[email protected]/
- Reworked all changelogs.
- Regarding code changes: patches #1 and #2 are unchanged, patch #3 was
reworked according to Sean's guidance, and patch #4 (the test)
needed changes after rebase to kvm-x86/next and the implementation
(patch #3) changes.
- Added Reviewed-by tags to patches #1, #2, and #4, but removed
Maxim's Reviewed-by tag from patch #3 because it changed so much.
- Added a "Suggested-by" to patch #4 to reflect that it represents
Sean's guidance.
- Reworked cover to match customs (in subject line) and reflect feedback
to patches: capability renamed to KVM_CAP_X86_APIC_BUS_FREQUENCY, clarify
distinction between "core crystal clock" and "bus clock", and update
pro/con list.
- Please refer to individual patches for detailed changes.
Changes from v2:
- v2: https://lore.kernel.org/lkml/[email protected]/
- Removed APIC_BUS_FREQUENCY and apic_bus_frequency of struct kvm-arch.
- Update the commit messages.
- Added reviewed-by (Maxim Levitsky)
- Use 1.5GHz instead of 1GHz as frequency for the test case.
Changes from v1:
- v1: https://lore.kernel.org/all/[email protected]/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst
Summary
-------
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.
Description
-----------
Vishal reported [1] that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported in [1] the TDX guest kernel
expects a 25MHz timer frequency but gets timer interrupt more frequently
due to the 1GHz frequency used by KVM.
To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.
There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
Pro: It resembles the existing hardware. The recent Intel CPUs
adopts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
Con: Core crystal clock frequency is also used to calculate TSC frequency.
Con: If it is configured to value different from hardware, it will
break the correctness of INTEL-PT Mini Time Count (MTC) packets
in TDs.
Testing
-------
Tested on non-TDX host using included kselftest. Host running kernel
with this series applied to "next" branch of
https://github.com/kvm-x86/linux.git
Tested on TDX host and TD using a modified version
of x86/apic.c:test_apic_timer_one_shot() available from
https://github.com/intel/kvm-unit-tests-tdx/blob/tdx/x86/apic.c.
Host running TDX KVM development patches and QEMU with corresponding TDX
changes.
The test needed to be modified to (a) stop any lingering timers before the
test starts, and (b) use CPUID 0x15 in TDX to accurately determine the TSC
and APIC frequencies instead of making 1GHz assumption and use similar
check as the kselftest test introduced in this series (while increasing
the amount with which the frequency is allowed to deviate by 1%).
The core changes made to x86/apic.c:test_apic_timer_one_shot() for this
testing are shown below for reference. Work is in progress to upstream
these modifications.
@@ -477,11 +478,29 @@ static void lvtt_handler(isr_regs_t *regs)
static void test_apic_timer_one_shot(void)
{
- uint64_t tsc1, tsc2;
static const uint32_t interval = 0x10000;
+ uint64_t measured_apic_freq, tsc2, tsc1;
+ uint32_t tsc_freq = 0, apic_freq = 0;
+ struct cpuid cpuid_tsc = {};
#define APIC_LVT_TIMER_VECTOR (0xee)
+ /*
+ * CPUID 0x15 is not available in VMX, can use it to obtain
+ * TSC and APIC frequency for accurate testing
+ */
+ if (is_tdx_guest()) {
+ cpuid_tsc = raw_cpuid(0x15, 0);
+ tsc_freq = cpuid_tsc.c * cpuid_tsc.b / cpuid_tsc.a;
+ apic_freq = cpuid_tsc.c;
+ }
+ /*
+ stop already fired local timer
+ the test case can be negative failure if the timer fired
+ after installed lvtt_handler but *before*
+ write to TIMICT again.
+ */
+ apic_write(APIC_TMICT, 0);
handle_irq(APIC_LVT_TIMER_VECTOR, lvtt_handler);
/* One shot mode */
@@ -503,8 +522,16 @@ static void test_apic_timer_one_shot(void)
* cases, the following should satisfy on all modern
* processors.
*/
- report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
- "APIC LVT timer one shot");
+ if (is_tdx_guest()) {
+ measured_apic_freq = interval * (tsc_freq / (tsc2 - tsc1));
+ report((lvtt_counter == 1) &&
+ (measured_apic_freq < apic_freq * 102 / 100) &&
+ (measured_apic_freq > apic_freq * 98 / 100),
+ "APIC LVT timer one shot");
+ } else {
+ report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
+ "APIC LVT timer one shot");
+ }
}
[1] https://lore.kernel.org/lkml/[email protected]/
Isaku Yamahata (4):
KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
KVM: x86: Make nsec per APIC bus cycle a VM variable
KVM: x86: Add a capability to configure bus frequency for APIC timer
KVM: selftests: Add test for configure of x86 APIC bus frequency
Documentation/virt/kvm/api.rst | 17 ++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 +-
arch/x86/kvm/lapic.c | 6 +-
arch/x86/kvm/lapic.h | 3 +-
arch/x86/kvm/x86.c | 28 +++
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 166 ++++++++++++++++++
10 files changed, 228 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
base-commit: 7b076c6a308ec5bce9fc96e2935443ed228b9148
--
2.34.1
From: Isaku Yamahata <[email protected]>
Remove APIC_BUS_FREQUENCY and calculate it based on nanoseconds per APIC
bus cycle. APIC_BUS_FREQUENCY is used only for HV_X64_MSR_APIC_FREQUENCY.
The MSR is not frequently read, calculate it every time.
There are two constants related to the APIC bus frequency:
APIC_BUS_FREQUENCY and APIC_BUS_CYCLE_NS.
Only one value is required because one can be calculated from the other:
APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
Remove APIC_BUS_FREQUENCY and instead calculate it when needed.
This prepares for support of configurable APIC bus frequency by
requiring to change only a single variable.
Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v5:
- Add Rick's Reviewed-by tag.
Changes v4:
- Modify subject to match custom: "KVM: x86/hyperv:" ->
"KVM: x86: hyper-v:" and use standard capitalization for Hyper-V
("hyper-v" -> "Hyper-V").
- Rework changelog to remove pronouns ("we").
- Change logic in commit log: "APIC bus cycles per NS" -> "nanoseconds
per APIC bus cycle".
- Fix typos. (Maxim and Xiaoyao)
- Add Maxim and Xiaoyao's Reviewed-by tags.
- Add Maxim's "Suggested-by".
Changes v3:
- Newly added according to Maxim Levitsky suggestion.
arch/x86/kvm/hyperv.c | 2 +-
arch/x86/kvm/lapic.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab..1030701db967 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1737,7 +1737,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = APIC_BUS_FREQUENCY;
+ data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..a20cb006b6c8 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -17,7 +17,6 @@
#define APIC_DEST_MASK 0x800
#define APIC_BUS_CYCLE_NS 1
-#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
--
2.34.1
From: Isaku Yamahata <[email protected]>
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.
Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
kernel expects a 25MHz timer frequency but gets timer interrupt more
frequently due to the 1GHz frequency used by KVM.
To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.
There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
Pro: It resembles the existing hardware. The recent Intel CPUs
adapts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
Con: Core crystal clock frequency is also used to calculate TSC
frequency.
Con: If it is configured to value different from hardware, it will
break the correctness of INTEL-PT Mini Time Count (MTC) packets
in TDs.
Reported-by: Vishal Annapurve <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v5:
- Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
- Add Rick's Reviewed-by tag.
Changes v4:
- Rework implementation following Sean's guidance in:
https://lore.kernel.org/all/[email protected]/
- Reword con #2 to acknowledge feedback. (Sean)
- Add the "con" information from Xiaoyao during earlier review of v2.
- Rework changelog to address comments related to "bus clock" vs
"core crystal clock" frequency. (Xiaoyao)
- Drop snippet about impact on TSC deadline timer emulation. (Maxim)
- Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
since tag received.
- Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
Changes v3:
- Added reviewed-by Maxim Levitsky.
- Minor update of the commit message.
Changes v2:
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Fix build error for i386.
- Add document to api.rst.
- Typo in the commit message.
Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 45 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..f014cd9b2217 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8063,6 +8063,23 @@ error/annotated fault.
See KVM_EXIT_MEMORY_FAULT for more information.
+7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
+-----------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
+:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
+ frequency or if any vCPUs have been created, -ENXIO if a virtual
+ local APIC has not been created using KVM_CREATE_IRQCHIP.
+
+This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
+virtual APIC when emulating APIC timers. KVM's default value can be retrieved
+by KVM_CHECK_EXTENSION.
+
+Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
+core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
+
8. Other capabilities.
======================
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 10e6315103f4..fa6954c9a9d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MEMORY_FAULT_INFO:
r = 1;
break;
+ case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
+ r = APIC_BUS_CYCLE_NS_DEFAULT;
+ break;
case KVM_CAP_EXIT_HYPERCALL:
r = KVM_EXIT_HYPERCALL_VALID_MASK;
break;
@@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
+ u64 bus_cycle_ns = cap->args[0];
+ u64 unused;
+
+ r = -EINVAL;
+ /*
+ * Guard against overflow in tmict_to_ns(). 128 is the highest
+ * divide value that can be programmed in APIC_TDCR.
+ */
+ if (!bus_cycle_ns ||
+ check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
+ break;
+
+ r = 0;
+ mutex_lock(&kvm->lock);
+ if (!irqchip_in_kernel(kvm))
+ r = -ENXIO;
+ else if (kvm->created_vcpus)
+ r = -EINVAL;
+ else
+ kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+ mutex_unlock(&kvm->lock);
+ break;
+ }
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..6a4d9432ab11 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.34.1
From: Isaku Yamahata <[email protected]>
Test if the APIC bus clock frequency is the expected configured value.
Set APIC timer's initial count to the maximum value and busy wait for 100
msec (any value is okay) with TSC value. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v5:
- Update to new name of capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
Changes v4:
- Rework changelog.
- Add Sean's "Suggested-by" to acknowledge guidance received in
https://lore.kernel.org/all/[email protected]/
- Add copyright.
- Add test description to file header.
- Consistent capitalization for acronyms.
- Rebase to kvm-x86/next.
- Update to v4 change of providing bus clock rate in nanoseconds.
- Add a "TEST_REQUIRE()" for the new capability so that the test can
work on kernels that do not support the new capability.
- Address checkpatch warnings and use tabs instead of spaces in header
file to match existing code.
Changes v3:
- Use 1.5GHz instead of 1GHz as frequency.
Changes v2:
- Newly added.
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 166 ++++++++++++++++++
3 files changed, 174 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 871e2de3eb05..b65c7c88008a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -111,6 +111,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..b0d2fc62e172 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,13 @@
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
#define SET_APIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVT0 0x350
+#define APIC_LVT_TIMER_ONESHOT (0 << 17)
+#define APIC_LVT_TIMER_PERIODIC (1 << 17)
+#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
+#define APIC_TMICT 0x380
+#define APIC_TMCCT 0x390
+#define APIC_TDCR 0x3E0
void apic_disable(void);
void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..5100b28228af
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test configure of APIC bus frequency.
+ *
+ * Copyright (c) 2024 Intel Corporation
+ *
+ * To verify if the APIC bus frequency can be configured this test starts
+ * by setting the TSC frequency in KVM, and then:
+ * For every APIC timer frequency supported:
+ * * In the guest:
+ * * * Start the APIC timer by programming the APIC TMICT (initial count
+ * register) to the largest value possible to guarantee that it will
+ * not expire during the test,
+ * * * Wait for a known duration based on previously set TSC frequency,
+ * * * Stop the timer and read the APIC TMCCT (current count) register to
+ * determine the count at that time (TMCCT is loaded from TMICT when
+ * TMICT is programmed and then starts counting down).
+ * * In the host:
+ * * * Determine if the APIC counts close to configured APIC bus frequency
+ * while taking into account how the APIC timer frequency was modified
+ * using the APIC TDCR (divide configuration register).
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "apic.h"
+#include "test_util.h"
+
+/*
+ * Pick one convenient value, 1.5GHz. No special meaning and different from
+ * the default value, 1GHz.
+ */
+#define TSC_HZ (1500 * 1000 * 1000ULL)
+
+/* Wait for 100 msec, not too long, not too short value. */
+#define LOOP_MSEC 100ULL
+#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
+
+/*
+ * Pick a typical value, 25MHz. Different enough from the default value, 1GHz.
+ */
+#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
+
+static void guest_code(void)
+{
+ /*
+ * Possible TDCR values and its divide count. Used to modify APIC
+ * timer frequency.
+ */
+ struct {
+ u32 tdcr;
+ u32 divide_count;
+ } tdcrs[] = {
+ {0x0, 2},
+ {0x1, 4},
+ {0x2, 8},
+ {0x3, 16},
+ {0x8, 32},
+ {0x9, 64},
+ {0xa, 128},
+ {0xb, 1},
+ };
+
+ u32 tmict, tmcct;
+ u64 tsc0, tsc1;
+ int i;
+
+ asm volatile("cli");
+
+ xapic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt does not fire.
+ */
+ xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ xapic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for LOOP_MSEC */
+ tsc0 = rdtsc();
+ tsc1 = tsc0;
+ while (tsc1 - tsc0 < TSC_WAIT_DELTA)
+ tsc1 = rdtsc();
+
+ /* Read APIC timer and TSC */
+ tmcct = xapic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer */
+ xapic_write_reg(APIC_TMICT, 0);
+
+ /* Report it. */
+ GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
+ tsc1 - tsc0, 0, 0);
+ }
+
+ GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+ bool done = false;
+ struct ucall uc;
+
+ while (!done) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ done = true;
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_SYNC: {
+ u32 divide_counter = uc.args[1];
+ u32 apic_cycles = uc.args[2];
+ u64 tsc_cycles = uc.args[3];
+ u64 freq;
+
+ TEST_ASSERT(tsc_cycles > 0,
+ "TSC cycles must not be zero.");
+
+ /* Allow 1% slack. */
+ freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
+ TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
+ "APIC bus clock frequency is too large");
+ TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
+ "APIC bus clock frequency is too small");
+ break;
+ }
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
+
+ vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
+ vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *)(TSC_HZ / 1000));
+ /*
+ * KVM_CAP_X86_APIC_BUS_CYCLES_NS expects APIC bus clock rate in
+ * nanoseconds and requires that no vCPU is created.
+ */
+ vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ NSEC_PER_SEC / APIC_BUS_CLOCK_FREQ);
+ vcpu = vm_vcpu_add(vm, 0, guest_code);
+
+ virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+ test_apic_bus_clock(vcpu);
+ kvm_vm_free(vm);
+}
--
2.34.1
From: Isaku Yamahata <[email protected]>
Introduce the VM variable "nanoseconds per APIC bus cycle" in
preparation to make the APIC bus frequency configurable.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare
for allowing userspace to tell KVM to use the frequency that TDX mandates
instead of the default 1Ghz. Doing so ensures that the guest doesn't have
a conflicting view of the APIC bus frequency.
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v5:
- Add Rick's Reviewed-by tag.
Changes v4:
- Reword changelog to address comments related to "bus clock" vs
"core crystal clock" frequency. (Xiaoyao)
- Typo in changelog ("APIC APIC" -> "APIC").
- Change logic "APIC bus cycles per nsec" -> "nanoseconds per
APIC bus cycle".
Changes V3:
- Update commit message.
- Dropped apic_bus_frequency according to Maxim Levitsky.
Changes v2:
- No change.
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 ++-
arch/x86/kvm/lapic.c | 6 ++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/x86.c | 1 +
5 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d13e3cd1dc5..f2735582c7e0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1358,6 +1358,7 @@ struct kvm_arch {
u32 default_tsc_khz;
bool user_set_tsc;
+ u64 apic_bus_cycle_ns;
seqcount_raw_spinlock_t pvclock_sc;
bool use_master_clock;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1030701db967..5c31e715d2ad 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1737,7 +1737,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
+ data = div64_u64(1000000000ULL,
+ vcpu->kvm->arch.apic_bus_cycle_ns);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cf37586f0466..3e66a0a95999 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1547,7 +1547,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
remaining = 0;
ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
- return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
+ return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ apic->divide_count));
}
static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1965,7 +1966,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
{
- return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
+ return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ (u64)apic->divide_count;
}
static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a20cb006b6c8..51e09f5a7fc5 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -16,7 +16,7 @@
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800
-#define APIC_BUS_CYCLE_NS 1
+#define APIC_BUS_CYCLE_NS_DEFAULT 1
#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9ef1fa4b90b..10e6315103f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12629,6 +12629,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
+ kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
kvm->arch.guest_can_read_msr_platform_info = true;
kvm->arch.enable_pmu = enable_pmu;
--
2.34.1
On 4/25/2024 3:07 PM, Reinette Chatre wrote:
> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> new file mode 100644
> index 000000000000..5100b28228af
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Test configure of APIC bus frequency.
> + *
> + * Copyright (c) 2024 Intel Corporation
> + *
> + * To verify if the APIC bus frequency can be configured this test starts
Nit: some typos here?
> +int main(int argc, char *argv[])
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> +
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
> +
> + vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
Use vm_create() instead?
Hi Zide,
On 4/26/2024 4:06 PM, Chen, Zide wrote:
>
>
> On 4/25/2024 3:07 PM, Reinette Chatre wrote:
>> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> new file mode 100644
>> index 000000000000..5100b28228af
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> @@ -0,0 +1,166 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Test configure of APIC bus frequency.
>> + *
>> + * Copyright (c) 2024 Intel Corporation
>> + *
>> + * To verify if the APIC bus frequency can be configured this test starts
>
> Nit: some typos here?
Apologies but this is not obvious to me. Could you please help
by pointing out all those typos to me?
>
>> +int main(int argc, char *argv[])
>> +{
>> + struct kvm_vcpu *vcpu;
>> + struct kvm_vm *vm;
>> +
>> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
>> +
>> + vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
>
> Use vm_create() instead?
Sure. This is easier on the eye while generating exactly the same code.
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
index 5100b28228af..6ed253875971 100644
--- a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -149,7 +149,7 @@ int main(int argc, char *argv[])
TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
- vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
+ vm = vm_create(1);
vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *)(TSC_HZ / 1000));
/*
* KVM_CAP_X86_APIC_BUS_CYCLES_NS expects APIC bus clock rate in
Reinette
On 4/26/2024 4:26 PM, Reinette Chatre wrote:
> Hi Zide,
>
> On 4/26/2024 4:06 PM, Chen, Zide wrote:
>>
>>
>> On 4/25/2024 3:07 PM, Reinette Chatre wrote:
>>> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>>> new file mode 100644
>>> index 000000000000..5100b28228af
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>>> @@ -0,0 +1,166 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Test configure of APIC bus frequency.
>>> + *
>>> + * Copyright (c) 2024 Intel Corporation
>>> + *
>>> + * To verify if the APIC bus frequency can be configured this test starts
>>
>> Nit: some typos here?
>
> Apologies but this is not obvious to me. Could you please help
> by pointing out all those typos to me?
Do you think it's more readable to add a ","?
- * To verify if the APIC bus frequency can be configured this test starts
+ * To verify if the APIC bus frequency can be configured, this test starts
* by setting the TSC frequency in KVM, and then:
On Thu, Apr 25, 2024 at 03:07:01PM -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
> "The APIC timer frequency will be the processor’s bus clock or core
> crystal clock frequency (when TSC/core crystal clock ratio is
> enumerated in CPUID leaf 0x15) divided by the value specified in
> the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
>
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
> Pro: It resembles the existing hardware. The recent Intel CPUs
> adapts 25MHz.
> Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
> frequency or not enumerate it.
> Pro: Any APIC bus frequency is allowed.
> Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
> as 1 GHz.
> Pro: This has been the virtual APIC frequency for KVM guests for 13
> years.
> Pro: This requires changing only one hard-coded constant in TDX.
> Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
> Con: Core crystal clock frequency is also used to calculate TSC
> frequency.
> Con: If it is configured to value different from hardware, it will
> break the correctness of INTEL-PT Mini Time Count (MTC) packets
> in TDs.
Reviewed-by: Yuan Yao <[email protected]>
>
> Reported-by: Vishal Annapurve <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Rick Edgecombe <[email protected]>
> Co-developed-by: Reinette Chatre <[email protected]>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
> KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
>
> Changes v4:
> - Rework implementation following Sean's guidance in:
> https://lore.kernel.org/all/[email protected]/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
> "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
> since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
>
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
>
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
>
> Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
> arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 45 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>
> See KVM_EXIT_MEMORY_FAULT for more information.
>
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> + frequency or if any vCPUs have been created, -ENXIO if a virtual
> + local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers. KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_MEMORY_FAULT_INFO:
> r = 1;
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> + r = APIC_BUS_CYCLE_NS_DEFAULT;
> + break;
> case KVM_CAP_EXIT_HYPERCALL:
> r = KVM_EXIT_HYPERCALL_VALID_MASK;
> break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> + u64 bus_cycle_ns = cap->args[0];
> + u64 unused;
> +
> + r = -EINVAL;
> + /*
> + * Guard against overflow in tmict_to_ns(). 128 is the highest
> + * divide value that can be programmed in APIC_TDCR.
> + */
> + if (!bus_cycle_ns ||
> + check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> + break;
> +
> + r = 0;
> + mutex_lock(&kvm->lock);
> + if (!irqchip_in_kernel(kvm))
> + r = -ENXIO;
> + else if (kvm->created_vcpus)
> + r = -EINVAL;
> + else
> + kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> + mutex_unlock(&kvm->lock);
> + break;
> + }
> default:
> r = -EINVAL;
> break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_MEMORY_ATTRIBUTES 233
> #define KVM_CAP_GUEST_MEMFD 234
> #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.34.1
>
>
On 4/26/2024 6:07 AM, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Introduce the VM variable "nanoseconds per APIC bus cycle" in
> preparation to make the APIC bus frequency configurable.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
> "The APIC timer frequency will be the processor’s bus clock or core
> crystal clock frequency (when TSC/core crystal clock ratio is
> enumerated in CPUID leaf 0x15) divided by the value specified in
> the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare
> for allowing userspace to tell KVM to use the frequency that TDX mandates
> instead of the default 1Ghz. Doing so ensures that the guest doesn't have
> a conflicting view of the APIC bus frequency.
Reviewed-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Reviewed-by: Rick Edgecombe <[email protected]>
> [reinette: rework changelog]
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> Changes v5:
> - Add Rick's Reviewed-by tag.
>
> Changes v4:
> - Reword changelog to address comments related to "bus clock" vs
> "core crystal clock" frequency. (Xiaoyao)
> - Typo in changelog ("APIC APIC" -> "APIC").
> - Change logic "APIC bus cycles per nsec" -> "nanoseconds per
> APIC bus cycle".
>
> Changes V3:
> - Update commit message.
> - Dropped apic_bus_frequency according to Maxim Levitsky.
>
> Changes v2:
> - No change.
>
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/hyperv.c | 3 ++-
> arch/x86/kvm/lapic.c | 6 ++++--
> arch/x86/kvm/lapic.h | 2 +-
> arch/x86/kvm/x86.c | 1 +
> 5 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1d13e3cd1dc5..f2735582c7e0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1358,6 +1358,7 @@ struct kvm_arch {
>
> u32 default_tsc_khz;
> bool user_set_tsc;
> + u64 apic_bus_cycle_ns;
>
> seqcount_raw_spinlock_t pvclock_sc;
> bool use_master_clock;
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1030701db967..5c31e715d2ad 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1737,7 +1737,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
> data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
> break;
> case HV_X64_MSR_APIC_FREQUENCY:
> - data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
> + data = div64_u64(1000000000ULL,
> + vcpu->kvm->arch.apic_bus_cycle_ns);
> break;
> default:
> kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index cf37586f0466..3e66a0a95999 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1547,7 +1547,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
> remaining = 0;
>
> ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> - return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
> + return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + apic->divide_count));
> }
>
> static void __report_tpr_access(struct kvm_lapic *apic, bool write)
> @@ -1965,7 +1966,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>
> static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
> {
> - return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
> + return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + (u64)apic->divide_count;
> }
>
> static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index a20cb006b6c8..51e09f5a7fc5 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -16,7 +16,7 @@
> #define APIC_DEST_NOSHORT 0x0
> #define APIC_DEST_MASK 0x800
>
> -#define APIC_BUS_CYCLE_NS 1
> +#define APIC_BUS_CYCLE_NS_DEFAULT 1
>
> #define APIC_BROADCAST 0xFF
> #define X2APIC_BROADCAST 0xFFFFFFFFul
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e9ef1fa4b90b..10e6315103f4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12629,6 +12629,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
> kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> + kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
>
On 4/26/2024 6:07 AM, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
> "The APIC timer frequency will be the processor’s bus clock or core
> crystal clock frequency (when TSC/core crystal clock ratio is
> enumerated in CPUID leaf 0x15) divided by the value specified in
> the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
>
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
> Pro: It resembles the existing hardware. The recent Intel CPUs
> adapts 25MHz.
> Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
> frequency or not enumerate it.
> Pro: Any APIC bus frequency is allowed.
> Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
> as 1 GHz.
> Pro: This has been the virtual APIC frequency for KVM guests for 13
> years.
> Pro: This requires changing only one hard-coded constant in TDX.
> Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
> Con: Core crystal clock frequency is also used to calculate TSC
> frequency.
> Con: If it is configured to value different from hardware, it will
> break the correctness of INTEL-PT Mini Time Count (MTC) packets
> in TDs.
>
> Reported-by: Vishal Annapurve <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Rick Edgecombe <[email protected]>
> Co-developed-by: Reinette Chatre <[email protected]>
> Signed-off-by: Reinette Chatre <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
> KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
>
> Changes v4:
> - Rework implementation following Sean's guidance in:
> https://lore.kernel.org/all/[email protected]/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
> "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
> since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
>
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
>
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
>
> Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
> arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 45 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>
> See KVM_EXIT_MEMORY_FAULT for more information.
>
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> + frequency or if any vCPUs have been created, -ENXIO if a virtual
> + local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers. KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_MEMORY_FAULT_INFO:
> r = 1;
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> + r = APIC_BUS_CYCLE_NS_DEFAULT;
> + break;
> case KVM_CAP_EXIT_HYPERCALL:
> r = KVM_EXIT_HYPERCALL_VALID_MASK;
> break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> + u64 bus_cycle_ns = cap->args[0];
> + u64 unused;
> +
> + r = -EINVAL;
> + /*
> + * Guard against overflow in tmict_to_ns(). 128 is the highest
> + * divide value that can be programmed in APIC_TDCR.
> + */
> + if (!bus_cycle_ns ||
> + check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> + break;
> +
> + r = 0;
> + mutex_lock(&kvm->lock);
> + if (!irqchip_in_kernel(kvm))
> + r = -ENXIO;
> + else if (kvm->created_vcpus)
> + r = -EINVAL;
> + else
> + kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> + mutex_unlock(&kvm->lock);
> + break;
> + }
> default:
> r = -EINVAL;
> break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_MEMORY_ATTRIBUTES 233
> #define KVM_CAP_GUEST_MEMFD 234
> #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
Hi Zide,
On 4/26/2024 10:38 PM, Chen, Zide wrote:
> On 4/26/2024 4:26 PM, Reinette Chatre wrote:
>> On 4/26/2024 4:06 PM, Chen, Zide wrote:
>>> On 4/25/2024 3:07 PM, Reinette Chatre wrote:
>>>> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>>>> new file mode 100644
>>>> index 000000000000..5100b28228af
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>>>> @@ -0,0 +1,166 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * Test configure of APIC bus frequency.
>>>> + *
>>>> + * Copyright (c) 2024 Intel Corporation
>>>> + *
>>>> + * To verify if the APIC bus frequency can be configured this test starts
>>>
>>> Nit: some typos here?
>>
>> Apologies but this is not obvious to me. Could you please help
>> by pointing out all those typos to me?
>
> Do you think it's more readable to add a ","?
>
> - * To verify if the APIC bus frequency can be configured this test starts
> + * To verify if the APIC bus frequency can be configured, this test starts
> * by setting the TSC frequency in KVM, and then:
Sure. I'll do so in the next version.
Thank you.
Reinette
On Thu, Apr 25, 2024, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Test if the APIC bus clock frequency is the expected configured value.
This is one of the cases where explicitly calling out "code" by name is extremely
valuable. E.g.
Test if KVM emulates the APIC bus clock at the expected frequency when
userspace configures the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS.
Set APIC timer's initial count to the maximum value and busy wait for 100
msec (largely arbitrary) using the TSC. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.
> Set APIC timer's initial count to the maximum value and busy wait for 100
> msec (any value is okay) with TSC value. Read the APIC timer's "current
> count" to calculate the actual APIC bus clock frequency based on TSC
> frequency.
>
> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> new file mode 100644
> index 000000000000..5100b28228af
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Test configure of APIC bus frequency.
> + *
> + * Copyright (c) 2024 Intel Corporation
> + *
> + * To verify if the APIC bus frequency can be configured this test starts
> + * by setting the TSC frequency in KVM, and then:
> + * For every APIC timer frequency supported:
> + * * In the guest:
> + * * * Start the APIC timer by programming the APIC TMICT (initial count
> + * register) to the largest value possible to guarantee that it will
> + * not expire during the test,
> + * * * Wait for a known duration based on previously set TSC frequency,
> + * * * Stop the timer and read the APIC TMCCT (current count) register to
> + * determine the count at that time (TMCCT is loaded from TMICT when
> + * TMICT is programmed and then starts counting down).
> + * * In the host:
> + * * * Determine if the APIC counts close to configured APIC bus frequency
> + * while taking into account how the APIC timer frequency was modified
> + * using the APIC TDCR (divide configuration register).
I find the asterisks super hard to parse. And I honestly wouldn't bother breaking
things down by guest vs. host. History has shown that file comments that are *too*
specific eventually become stale, often sooner than later. E.g. it's entirely
feasible to do the checking in the guest, not the host.
How about this?
/*
* Copyright (c) 2024 Intel Corporation
*
* Verify KVM correctly emulates the APIC bus frequency when the VMM configures
* the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
* programming TMICT (timer initial count) to the largest value possible (so
* that the timer will not expire during the test). Then, after an arbitrary
* amount of time has elapsed, verify TMCCT (timer current count) is within 1%
* of the expected value based on the time elapsed, the APIC bus frequency, and
* the programmed TDCR (timer divide configuration register).
*/
> + */
> +#define _GNU_SOURCE /* for program_invocation_short_name */
This can now be dropped.
> +#include "apic.h"
> +#include "test_util.h"
> +
> +/*
> + * Pick one convenient value, 1.5GHz. No special meaning and different from
> + * the default value, 1GHz.
I have no idea where the 1GHz comes from. KVM doesn't force a default TSC, KVM
uses the underlying CPU's frequency. Peeking further ahead, I don't understand
why this test sets KVM_SET_TSC_KHZ. That brings in a whole different set of
behavior, and that behavior is already verified by tsc_scaling_sync.c.
I suspect/assume this test forces a frequency so that it can hardcode the math,
but (a) that's odd and (b) x86 selftests really should provide a udelay() so that
goofy stuff like this doesn't end up in random tests.
> + */
> +#define TSC_HZ (1500 * 1000 * 1000ULL)
Definitely do not call this TSC_HZ. Yeah, it's local to this file, but defining
generic macros like this is just asking for conflicts, and the value itself has
nothing to do with the TSC (it's a raw value). E.g. _if_ we need to keep this,
something like
#define FREQ_1_5_GHZ (1500 * 1000 * 1000ULL)
> +
> +/* Wait for 100 msec, not too long, not too short value. */
> +#define LOOP_MSEC 100ULL
> +#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
These shouldn't exist.
> +
> +/*
> + * Pick a typical value, 25MHz. Different enough from the default value, 1GHz.
> + */
> +#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
Rather than hardcode a single frequency, use 25MHz as the default value but let
the user override it via command line.
> + asm volatile("cli");
Unless I'm misremembering, the timer still counts when the LVT entry is masked
so just mask the IRQ in the LVT. Or rather, keep the entry masked in the LVT.
FWIW, you _could_ simply leave APIC_LVT0 at its default value to verify KVM
correctly emulates that reset value (masked, one-shot). That'd be mildly amusing,
but possibly a net negative from readability, so
> +
> + xapic_enable();
What about x2APIC? Arguably that's _more_ interesting since it's required for
TDX.
> + /*
> + * Setup one-shot timer. The vector does not matter because the
> + * interrupt does not fire.
_should_ not fire.
> + */
> + xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
> +
> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
> + xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
> +
> + /* Set the largest value to not trigger the interrupt. */
> + tmict = ~0;
> + xapic_write_reg(APIC_TMICT, tmict);
> +
> + /* Busy wait for LOOP_MSEC */
> + tsc0 = rdtsc();
> + tsc1 = tsc0;
> + while (tsc1 - tsc0 < TSC_WAIT_DELTA)
> + tsc1 = rdtsc();
> +
> + /* Read APIC timer and TSC */
> + tmcct = xapic_read_reg(APIC_TMCCT);
> + tsc1 = rdtsc();
> +
> + /* Stop timer */
> + xapic_write_reg(APIC_TMICT, 0);
> +
> + /* Report it. */
> + GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
> + tsc1 - tsc0, 0, 0);
Why punt to the host? I don't see any reason why GUEST_ASSERT() wouldn't work
here.
> + }
> +
> + GUEST_DONE();
> +}
> +
> +void test_apic_bus_clock(struct kvm_vcpu *vcpu)
> +{
> + bool done = false;
> + struct ucall uc;
> +
> + while (!done) {
> + vcpu_run(vcpu);
> + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_DONE:
> + done = true;
> + break;
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_SYNC: {
> + u32 divide_counter = uc.args[1];
> + u32 apic_cycles = uc.args[2];
> + u64 tsc_cycles = uc.args[3];
> + u64 freq;
> +
> + TEST_ASSERT(tsc_cycles > 0,
> + "TSC cycles must not be zero.");
> +
> + /* Allow 1% slack. */
> + freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
> + TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
> + "APIC bus clock frequency is too large");
> + TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
> + "APIC bus clock frequency is too small");
> + break;
> + }
> + default:
> + TEST_FAIL("Unknown ucall %lu", uc.cmd);
> + break;
> + }
> + }
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> +
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
> +
> + vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
> + vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *)(TSC_HZ / 1000));
> + /*
> + * KVM_CAP_X86_APIC_BUS_CYCLES_NS expects APIC bus clock rate in
> + * nanoseconds and requires that no vCPU is created.
Meh, I'd drop this comment. It should be quite obvious that the rate is in
nanoseconds. And instead of adding a comment for the vCPU creation, do
__vm_enable_cap() again _after_ creating a vCPU and verify it fails with -EINVAL.
> + */
> + vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
> + NSEC_PER_SEC / APIC_BUS_CLOCK_FREQ);
> + vcpu = vm_vcpu_add(vm, 0, guest_code);
> +
> + virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> + test_apic_bus_clock(vcpu);
> + kvm_vm_free(vm);
> +}
> --
> 2.34.1
>
On Thu, 25 Apr 2024 15:06:58 -0700, Reinette Chatre wrote:
> Changes from v4:
> - v4: https://lore.kernel.org/lkml/[email protected]/
> - Rename capability from KVM_CAP_X86_APIC_BUS_FREQUENCY to
> KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li).
> - Include "Testing" section in cover letter.
> - Add Rick's Reviewed-by tags.
> - Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
>
> [...]
Applied the KVM changes to kvm-x86 misc (I'm feeling lucky). Please prioritize
refreshing the selftests patch, I'd like to get it applied sooner than later
for obvious reasons (I'm not feeling _that_ lucky).
[1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
https://github.com/kvm-x86/linux/commit/41c7b1bb656c
[2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable
https://github.com/kvm-x86/linux/commit/01de6ce03b1e
[3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer
https://github.com/kvm-x86/linux/commit/937296fd3deb
[4/4] KVM: selftests: Add test for configure of x86 APIC bus frequency
(not applied)
--
https://github.com/kvm-x86/linux/tree/next
Hi Sean,
On 6/3/24 10:25 AM, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Reinette Chatre wrote:
>> From: Isaku Yamahata <[email protected]>
>>
>> Test if the APIC bus clock frequency is the expected configured value.
>
> This is one of the cases where explicitly calling out "code" by name is extremely
> valuable. E.g.
>
> Test if KVM emulates the APIC bus clock at the expected frequency when
> userspace configures the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS.
>
> Set APIC timer's initial count to the maximum value and busy wait for 100
> msec (largely arbitrary) using the TSC. Read the APIC timer's "current
> count" to calculate the actual APIC bus clock frequency based on TSC
> frequency.
Thank you very much. (copy&pasted)
>
>> Set APIC timer's initial count to the maximum value and busy wait for 100
>> msec (any value is okay) with TSC value. Read the APIC timer's "current
>> count" to calculate the actual APIC bus clock frequency based on TSC
>> frequency.
>>
>> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> new file mode 100644
>> index 000000000000..5100b28228af
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> @@ -0,0 +1,166 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Test configure of APIC bus frequency.
>> + *
>> + * Copyright (c) 2024 Intel Corporation
>> + *
>> + * To verify if the APIC bus frequency can be configured this test starts
>> + * by setting the TSC frequency in KVM, and then:
>> + * For every APIC timer frequency supported:
>> + * * In the guest:
>> + * * * Start the APIC timer by programming the APIC TMICT (initial count
>> + * register) to the largest value possible to guarantee that it will
>> + * not expire during the test,
>> + * * * Wait for a known duration based on previously set TSC frequency,
>> + * * * Stop the timer and read the APIC TMCCT (current count) register to
>> + * determine the count at that time (TMCCT is loaded from TMICT when
>> + * TMICT is programmed and then starts counting down).
>> + * * In the host:
>> + * * * Determine if the APIC counts close to configured APIC bus frequency
>> + * while taking into account how the APIC timer frequency was modified
>> + * using the APIC TDCR (divide configuration register).
>
> I find the asterisks super hard to parse. And I honestly wouldn't bother breaking
> things down by guest vs. host. History has shown that file comments that are *too*
> specific eventually become stale, often sooner than later. E.g. it's entirely
> feasible to do the checking in the guest, not the host.
>
> How about this?
>
> /*
> * Copyright (c) 2024 Intel Corporation
> *
> * Verify KVM correctly emulates the APIC bus frequency when the VMM configures
> * the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
> * programming TMICT (timer initial count) to the largest value possible (so
> * that the timer will not expire during the test). Then, after an arbitrary
> * amount of time has elapsed, verify TMCCT (timer current count) is within 1%
> * of the expected value based on the time elapsed, the APIC bus frequency, and
> * the programmed TDCR (timer divide configuration register).
> */
Thank you very much. (copy&pasted)
>
>> + */
>> +#define _GNU_SOURCE /* for program_invocation_short_name */
>
> This can now be dropped.
Right.
>
>> +#include "apic.h"
>> +#include "test_util.h"
>> +
>> +/*
>> + * Pick one convenient value, 1.5GHz. No special meaning and different from
>> + * the default value, 1GHz.
>
> I have no idea where the 1GHz comes from. KVM doesn't force a default TSC, KVM
> uses the underlying CPU's frequency. Peeking further ahead, I don't understand
> why this test sets KVM_SET_TSC_KHZ. That brings in a whole different set of
> behavior, and that behavior is already verified by tsc_scaling_sync.c.
>
> I suspect/assume this test forces a frequency so that it can hardcode the math,
> but (a) that's odd and (b) x86 selftests really should provide a udelay() so that
> goofy stuff like this doesn't end up in random tests.
I believe the "default 1GHz" actually refers to the default APIC bus frequency and
the goal was indeed to (a) make the TSC frequency different from APIC bus frequency,
and (b) make math easier.
Yes, there is no need to use KVM_SET_TSC_KHZ. An implementation of udelay() would
require calibration and to make this simple for KVM I think we can just use
KVM_GET_TSC_KHZ. For now I continue to open code this (see later) since I did not
notice similar patterns in existing tests that may need a utility. I'd be happy
to add a utility if the needed usage pattern is clear since the closest candidate
I could find was xapic_ipi_test.c that does not have a nop loop.
>
>> + */
>> +#define TSC_HZ (1500 * 1000 * 1000ULL)
>
> Definitely do not call this TSC_HZ. Yeah, it's local to this file, but defining
> generic macros like this is just asking for conflicts, and the value itself has
> nothing to do with the TSC (it's a raw value). E.g. _if_ we need to keep this,
> something like
Macro is gone. In its place is a new global tsc_hz that is the actual TSC
frequency of the guest.
>
> #define FREQ_1_5_GHZ (1500 * 1000 * 1000ULL)
>
>> +
>> +/* Wait for 100 msec, not too long, not too short value. */
>> +#define LOOP_MSEC 100ULL
>> +#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
>
> These shouldn't exist.
Gone.
>
>
>> +
>> +/*
>> + * Pick a typical value, 25MHz. Different enough from the default value, 1GHz.
>> + */
>> +#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
>
> Rather than hardcode a single frequency, use 25MHz as the default value but let
> the user override it via command line.
Done.
>
>> + asm volatile("cli");
>
> Unless I'm misremembering, the timer still counts when the LVT entry is masked
> so just mask the IRQ in the LVT. Or rather, keep the entry masked in the LVT.
hmmm ... I do not think this is specific to LVT entry but instead an attempt
to ignore all maskable external interrupt that may interfere with the test.
LVT entry is prevented from triggering because if the large configuration value.
>
> FWIW, you _could_ simply leave APIC_LVT0 at its default value to verify KVM
> correctly emulates that reset value (masked, one-shot). That'd be mildly amusing,
> but possibly a net negative from readability, so
>
>> +
>> + xapic_enable();
>
> What about x2APIC? Arguably that's _more_ interesting since it's required for
> TDX.
Added test for x2APIC to test both.
>
>> + /*
>> + * Setup one-shot timer. The vector does not matter because the
>> + * interrupt does not fire.
>
> _should_ not fire.
ack.
>
>> + */
>> + xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
>> + xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
>> +
>> + /* Set the largest value to not trigger the interrupt. */
>> + tmict = ~0;
>> + xapic_write_reg(APIC_TMICT, tmict);
>> +
>> + /* Busy wait for LOOP_MSEC */
>> + tsc0 = rdtsc();
>> + tsc1 = tsc0;
>> + while (tsc1 - tsc0 < TSC_WAIT_DELTA)
>> + tsc1 = rdtsc();
>> +
>> + /* Read APIC timer and TSC */
>> + tmcct = xapic_read_reg(APIC_TMCCT);
>> + tsc1 = rdtsc();
>> +
>> + /* Stop timer */
>> + xapic_write_reg(APIC_TMICT, 0);
>> +
>> + /* Report it. */
>> + GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
>> + tsc1 - tsc0, 0, 0);
>
> Why punt to the host? I don't see any reason why GUEST_ASSERT() wouldn't work
> here.
GUEST_ASSERT works and changed to it.
...
>> +
>> +int main(int argc, char *argv[])
>> +{
>> + struct kvm_vcpu *vcpu;
>> + struct kvm_vm *vm;
>> +
>> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
>> +
>> + vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
>> + vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *)(TSC_HZ / 1000));
>> + /*
>> + * KVM_CAP_X86_APIC_BUS_CYCLES_NS expects APIC bus clock rate in
>> + * nanoseconds and requires that no vCPU is created.
>
> Meh, I'd drop this comment. It should be quite obvious that the rate is in
> nanoseconds. And instead of adding a comment for the vCPU creation, do
> __vm_enable_cap() again _after_ creating a vCPU and verify it fails with -EINVAL.
Done.
>
>> + */
>> + vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
>> + NSEC_PER_SEC / APIC_BUS_CLOCK_FREQ);
>> + vcpu = vm_vcpu_add(vm, 0, guest_code);
>> +
>> + virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
>> +
>> + test_apic_bus_clock(vcpu);
>> + kvm_vm_free(vm);
>> +}
Apart from my uncertainty surrounding CLI I believed that I am able to
address all your feedback with the resulting test looking as below. Is this
what you had in mind?
--->8---
From: Isaku Yamahata <[email protected]>
Subject: [PATCH] KVM: selftests: Add test for configure of x86 APIC bus frequency
Test if KVM emulates the APIC bus clock at the expected frequency when
userspace configures the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS.
Set APIC timer's initial count to the maximum value and busy wait for 100
msec (largely arbitrary) using the TSC. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v7:
- Drop Maxim Levitsky's Reviewed-by because of significant changes.
- Remove redefine of _GNU_SOURCE. (kernel test robot)
- Rewrite changelog and test description. (Sean)
- Do not set guest TSC frequency but instead discover it.
- Enable user space to set APIC bus frequency. (Sean)
- Use GUEST_ASSERT() from guest instead of TEST_ASSERT() from host. (Sean)
- Test xAPIC as well as x2APIC. (Sean)
- Add check that KVM_CAP_X86_APIC_BUS_CYCLES_NS cannot be set
after vCPU created. (Sean)
- Remove unnecessary static functions from single file test.
- Be consistent in types by using uint32_t/uint64_t instead of
u32/u64.
[SNIP older changes]
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 233 ++++++++++++++++++
3 files changed, 241 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ce8ff8e8ce3a..ad8b5d15f2bd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -112,6 +112,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..b0d2fc62e172 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,13 @@
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
#define SET_APIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVT0 0x350
+#define APIC_LVT_TIMER_ONESHOT (0 << 17)
+#define APIC_LVT_TIMER_PERIODIC (1 << 17)
+#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
+#define APIC_TMICT 0x380
+#define APIC_TMCCT 0x390
+#define APIC_TDCR 0x3E0
void apic_disable(void);
void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..f2071c002bf5
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024 Intel Corporation
+ *
+ * Verify KVM correctly emulates the APIC bus frequency when the VMM configures
+ * the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
+ * programming TMICT (timer initial count) to the largest value possible (so
+ * that the timer will not expire during the test). Then, after an arbitrary
+ * amount of time has elapsed, verify TMCCT (timer current count) is within 1%
+ * of the expected value based on the time elapsed, the APIC bus frequency, and
+ * the programmed TDCR (timer divide configuration register).
+ */
+
+#include "apic.h"
+#include "test_util.h"
+
+/* Guest TSC frequency. Used to calibrate delays. */
+unsigned long tsc_hz;
+
+/*
+ * Pick 25MHz for APIC bus frequency. Different enough from the default 1GHz.
+ * User can override via command line.
+ */
+unsigned long apic_hz = 25 * 1000 * 1000;
+
+/*
+ * Possible TDCR values with matching divide count. Used to modify APIC
+ * timer frequency.
+ */
+struct {
+ uint32_t tdcr;
+ uint32_t divide_count;
+} tdcrs[] = {
+ {0x0, 2},
+ {0x1, 4},
+ {0x2, 8},
+ {0x3, 16},
+ {0x8, 32},
+ {0x9, 64},
+ {0xa, 128},
+ {0xb, 1},
+};
+
+void guest_verify(uint64_t tsc_cycles, uint32_t apic_cycles, uint32_t divide_count)
+{
+ uint64_t freq;
+
+ GUEST_ASSERT(tsc_cycles > 0);
+ freq = apic_cycles * divide_count * tsc_hz / tsc_cycles;
+ /* Check if measured frequency is within 1% of configured frequency. */
+ GUEST_ASSERT(freq < apic_hz * 101 / 100);
+ GUEST_ASSERT(freq > apic_hz * 99 / 100);
+}
+
+void x2apic_guest_code(void)
+{
+ uint32_t tmict, tmcct;
+ uint64_t tsc0, tsc1;
+ int i;
+
+ asm volatile("cli");
+
+ x2apic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt should not fire.
+ */
+ x2apic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ x2apic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ x2apic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for 100 msec. */
+ tsc0 = rdtsc();
+ tsc1 = tsc0;
+ while (tsc1 - tsc0 < tsc_hz / 1000 * 100)
+ tsc1 = rdtsc();
+
+ /* Read APIC timer and TSC. */
+ tmcct = x2apic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer. */
+ x2apic_write_reg(APIC_TMICT, 0);
+
+ guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
+ }
+
+ GUEST_DONE();
+}
+
+void xapic_guest_code(void)
+{
+ uint32_t tmict, tmcct;
+ uint64_t tsc0, tsc1;
+ int i;
+
+ asm volatile("cli");
+
+ xapic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt should not fire.
+ */
+ xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ xapic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for 100 msec. */
+ tsc0 = rdtsc();
+ tsc1 = tsc0;
+ while (tsc1 - tsc0 < tsc_hz / 1000 * 100)
+ tsc1 = rdtsc();
+
+ /* Read APIC timer and TSC. */
+ tmcct = xapic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer. */
+ xapic_write_reg(APIC_TMICT, 0);
+
+ guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
+ }
+
+ GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+ bool done = false;
+ struct ucall uc;
+
+ while (!done) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ done = true;
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+ }
+}
+
+void run_apic_bus_clock_test(bool xapic)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ int ret;
+
+ vm = vm_create(1);
+
+ tsc_hz = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL) * 1000;
+ sync_global_to_guest(vm, tsc_hz);
+ sync_global_to_guest(vm, apic_hz);
+
+ vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ NSEC_PER_SEC / apic_hz);
+
+ vcpu = vm_vcpu_add(vm, 0, xapic ? xapic_guest_code : x2apic_guest_code);
+
+ ret = __vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ NSEC_PER_SEC / apic_hz);
+ TEST_ASSERT(ret < 0 && errno == EINVAL,
+ "Setting of APIC bus frequency after vCPU is created should fail.");
+
+ if (xapic)
+ virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+ test_apic_bus_clock(vcpu);
+ kvm_vm_free(vm);
+}
+
+void run_xapic_bus_clock_test(void)
+{
+ run_apic_bus_clock_test(true);
+}
+
+void run_x2apic_bus_clock_test(void)
+{
+ run_apic_bus_clock_test(false);
+}
+
+void help(char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-a APIC bus freq]\n", name);
+ puts("");
+ printf("-a: The APIC bus frequency (in Hz) to be configured for the guest.\n");
+ puts("");
+}
+
+int main(int argc, char *argv[])
+{
+ int opt;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_GET_TSC_KHZ));
+
+ while ((opt = getopt(argc, argv, "a:h")) != -1) {
+ switch (opt) {
+ case 'a':
+ apic_hz = atol(optarg);
+ break;
+ case 'h':
+ help(argv[0]);
+ exit(0);
+ default:
+ help(argv[0]);
+ exit(1);
+ }
+ }
+
+ run_xapic_bus_clock_test();
+ run_x2apic_bus_clock_test();
+}
--
2.34.1
On Tue, Jun 04, 2024, Reinette Chatre wrote:
> > > +/*
> > > + * Pick one convenient value, 1.5GHz. No special meaning and different from
> > > + * the default value, 1GHz.
> >
> > I have no idea where the 1GHz comes from. KVM doesn't force a default TSC, KVM
> > uses the underlying CPU's frequency. Peeking further ahead, I don't understand
> > why this test sets KVM_SET_TSC_KHZ. That brings in a whole different set of
> > behavior, and that behavior is already verified by tsc_scaling_sync.c.
> >
> > I suspect/assume this test forces a frequency so that it can hardcode the math,
> > but (a) that's odd and (b) x86 selftests really should provide a udelay() so that
> > goofy stuff like this doesn't end up in random tests.
>
> I believe the "default 1GHz" actually refers to the default APIC bus frequency and
> the goal was indeed to (a) make the TSC frequency different from APIC bus frequency,
> and (b) make math easier.
>
> Yes, there is no need to use KVM_SET_TSC_KHZ. An implementation of udelay() would
> require calibration and to make this simple for KVM I think we can just use
> KVM_GET_TSC_KHZ. For now I continue to open code this (see later) since I did not
> notice similar patterns in existing tests that may need a utility. I'd be happy
> to add a utility if the needed usage pattern is clear since the closest candidate
> I could find was xapic_ipi_test.c that does not have a nop loop.
Please add a utility. ARM and RISC-V already implement udelay(), and this isn't
the first test that's wanted udelay() functionality. At the very least, it'd be
nice to have for debug, e.g. to be able to insert artificial delay if a test is
failing due to a suspected race suspected. I.e. this is likely an "if you build
it, they will come" situations.
> > Unless I'm misremembering, the timer still counts when the LVT entry is masked
> > so just mask the IRQ in the LVT. Or rather, keep the entry masked in the LVT.
>
> hmmm ... I do not think this is specific to LVT entry but instead an attempt
> to ignore all maskable external interrupt that may interfere with the test.
I doubt it. And if that really is the motiviation, that's a fools errand. This
is _guest_ code. Disabling IRQs in the guest doesn't prevent host IRQs from being
delivered, it only blocks virtual IRQs. And KVM selftests guests should never
receive virtual IRQs unless the test itself explicitly sends them.
> LVT entry is prevented from triggering because if the large configuration value.
Yes and no. A large configuration value _should_ avoid a timer IRQ, but the
entire point of this test is to verify KVM correctly emulates the timer. If this
test does its job and finds a KVM bug that causes the timer to prematurely expire,
then unmasking the LVT entry will generate an unexpected IRQ.
Of course, the test doesn't configure a legal vector so the IRQ will never be
delivered. We could fix that problem, but then a test failure would manifest as
a hard-to-triage unexpected event, compared to an explicit TEST_ASSERT() on the
timer value.
That said, I'm not totally pposed to letting the guest die if KVM unexpectedly
injects a timer IRQ, e.g. if all is well, it's a cheap way to provide a bit of
extra coverage. On the other hand, masking the interrupt is even simpler, and
the odds of false pass are low.
On Tue, Jun 04, 2024, Sean Christopherson wrote:
> On Thu, 25 Apr 2024 15:06:58 -0700, Reinette Chatre wrote:
> > Changes from v4:
> > - v4: https://lore.kernel.org/lkml/[email protected]/
> > - Rename capability from KVM_CAP_X86_APIC_BUS_FREQUENCY to
> > KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li).
> > - Include "Testing" section in cover letter.
> > - Add Rick's Reviewed-by tags.
> > - Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
> >
> > [...]
>
> Applied the KVM changes to kvm-x86 misc (I'm feeling lucky). Please prioritize
> refreshing the selftests patch, I'd like to get it applied sooner than later
> for obvious reasons (I'm not feeling _that_ lucky).
>
> [1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
> https://github.com/kvm-x86/linux/commit/41c7b1bb656c
> [2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable
> https://github.com/kvm-x86/linux/commit/01de6ce03b1e
> [3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer
> https://github.com/kvm-x86/linux/commit/937296fd3deb
> [4/4] KVM: selftests: Add test for configure of x86 APIC bus frequency
> (not applied)
FYI, the hashes changed as I had to drop an unrelated commit.
[1/3] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
https://github.com/kvm-x86/linux/commit/f9e1cbf1805e
[2/3] KVM: x86: Make nanoseconds per APIC bus cycle a VM variable
https://github.com/kvm-x86/linux/commit/b460256b162d
[3/3] KVM: x86: Add a capability to configure bus frequency for APIC timer
https://github.com/kvm-x86/linux/commit/6fef518594bc
Hi Sean,
On 6/5/24 7:22 AM, Sean Christopherson wrote:
> On Tue, Jun 04, 2024, Reinette Chatre wrote:
>>>> +/*
>>>> + * Pick one convenient value, 1.5GHz. No special meaning and different from
>>>> + * the default value, 1GHz.
>>>
>>> I have no idea where the 1GHz comes from. KVM doesn't force a default TSC, KVM
>>> uses the underlying CPU's frequency. Peeking further ahead, I don't understand
>>> why this test sets KVM_SET_TSC_KHZ. That brings in a whole different set of
>>> behavior, and that behavior is already verified by tsc_scaling_sync.c.
>>>
>>> I suspect/assume this test forces a frequency so that it can hardcode the math,
>>> but (a) that's odd and (b) x86 selftests really should provide a udelay() so that
>>> goofy stuff like this doesn't end up in random tests.
>>
>> I believe the "default 1GHz" actually refers to the default APIC bus frequency and
>> the goal was indeed to (a) make the TSC frequency different from APIC bus frequency,
>> and (b) make math easier.
>>
>> Yes, there is no need to use KVM_SET_TSC_KHZ. An implementation of udelay() would
>> require calibration and to make this simple for KVM I think we can just use
>> KVM_GET_TSC_KHZ. For now I continue to open code this (see later) since I did not
>> notice similar patterns in existing tests that may need a utility. I'd be happy
>> to add a utility if the needed usage pattern is clear since the closest candidate
>> I could find was xapic_ipi_test.c that does not have a nop loop.
>
> Please add a utility. ARM and RISC-V already implement udelay(), and this isn't
> the first test that's wanted udelay() functionality. At the very least, it'd be
> nice to have for debug, e.g. to be able to insert artificial delay if a test is
> failing due to a suspected race suspected. I.e. this is likely an "if you build
> it, they will come" situations.
Will do.
>
>>> Unless I'm misremembering, the timer still counts when the LVT entry is masked
>>> so just mask the IRQ in the LVT. Or rather, keep the entry masked in the LVT.
>>
>> hmmm ... I do not think this is specific to LVT entry but instead an attempt
>> to ignore all maskable external interrupt that may interfere with the test.
>
> I doubt it. And if that really is the motiviation, that's a fools errand. This
> is _guest_ code. Disabling IRQs in the guest doesn't prevent host IRQs from being
> delivered, it only blocks virtual IRQs. And KVM selftests guests should never
> receive virtual IRQs unless the test itself explicitly sends them.
Thank you for clarifying.
>
>> LVT entry is prevented from triggering because if the large configuration value.
>
> Yes and no. A large configuration value _should_ avoid a timer IRQ, but the
> entire point of this test is to verify KVM correctly emulates the timer. If this
> test does its job and finds a KVM bug that causes the timer to prematurely expire,
> then unmasking the LVT entry will generate an unexpected IRQ.
>
> Of course, the test doesn't configure a legal vector so the IRQ will never be
> delivered. We could fix that problem, but then a test failure would manifest as
> a hard-to-triage unexpected event, compared to an explicit TEST_ASSERT() on the
> timer value.
>
> That said, I'm not totally pposed to letting the guest die if KVM unexpectedly
> injects a timer IRQ, e.g. if all is well, it's a cheap way to provide a bit of
> extra coverage. On the other hand, masking the interrupt is even simpler, and
> the odds of false pass are low.
Understood. Will mask the interrupt in LVT entry as you suggested initially. While
checking which bits to set I realized that the existing test was enabling oneshot
mode in an entry where those bits are actually reserved. This is now fixed also.
I plan to send the changes as next version of this work, with merged patches
dropped from series.
Reinette