Changes from v7:
- v7: https://lore.kernel.org/lkml/[email protected]/
- Use appropriate ioctl() return type.
- Let new tsc_khz used by KVM selftest be of the same type as same variable
used in kernel.
- Please refer to individual patches for detailed changes.
Changes from v6:
- V6: https://lore.kernel.org/lkml/[email protected]/
- KVM changes were merged into kvm-x86 misc and subsequently dropped from
this submission. This submission only contains the selftest changes.
https://lore.kernel.org/lkml/[email protected]/
- New patch to add udelay() utility to x86_64 KVM selftests. (Sean)
- Many changes to APIC bus frequency selftest. Please refer to the patch
for detailed changes.
- Series applies cleanly to next branch of kvm-x86 with HEAD
af0903ab52ee6d6f0f63af67fa73d5eb00f79b9a.
Changes from v5:
- v5: https://lore.kernel.org/lkml/[email protected]/
- Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
- Added Xiaoyao Li and Yuan Yao's Reviewed-by tags.
- Use the KVM selftest vm_create() wrapper instead of open coding it. (Zide)
- Improve grammar of selftest description. (Zide)
Changes from v4:
- v4: https://lore.kernel.org/lkml/[email protected]/
- Rename capability from KVM_CAP_X86_APIC_BUS_FREQUENCY to
KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li).
- Include "Testing" section in cover letter.
- Add Rick's Reviewed-by tags.
- Rebased on latest of "next" branch of https://github.com/kvm-x86/linux.git
Changes from v3:
- v3: https://lore.kernel.org/all/[email protected]/
- Reworked all changelogs.
- Regarding code changes: patches #1 and #2 are unchanged, patch #3 was
reworked according to Sean's guidance, and patch #4 (the test)
needed changes after rebase to kvm-x86/next and the implementation
(patch #3) changes.
- Added Reviewed-by tags to patches #1, #2, and #4, but removed
Maxim's Reviewed-by tag from patch #3 because it changed so much.
- Added a "Suggested-by" to patch #4 to reflect that it represents
Sean's guidance.
- Reworked cover to match customs (in subject line) and reflect feedback
to patches: capability renamed to KVM_CAP_X86_APIC_BUS_FREQUENCY, clarify
distinction between "core crystal clock" and "bus clock", and update
pro/con list.
- Please refer to individual patches for detailed changes.
Changes from v2:
- v2: https://lore.kernel.org/lkml/[email protected]/
- Removed APIC_BUS_FREQUENCY and apic_bus_frequency of struct kvm-arch.
- Update the commit messages.
- Added reviewed-by (Maxim Levitsky)
- Use 1.5GHz instead of 1GHz as frequency for the test case.
Changes from v1:
- v1: https://lore.kernel.org/all/[email protected]/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst
Summary
-------
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.
Description
-----------
Vishal reported [1] that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported in [1] the TDX guest kernel
expects a 25MHz timer frequency but gets timer interrupt more frequently
due to the 1GHz frequency used by KVM.
To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.
There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
Pro: It resembles the existing hardware. The recent Intel CPUs
adopts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
Con: Core crystal clock frequency is also used to calculate TSC frequency.
Con: If it is configured to value different from hardware, it will
break the correctness of INTEL-PT Mini Time Count (MTC) packets
in TDs.
Testing
-------
Tested on non-TDX host using included kselftest. Host running kernel
with this series applied to "next" branch of
https://github.com/kvm-x86/linux.git
Tested on TDX host and TD using a modified version
of x86/apic.c:test_apic_timer_one_shot() available from
https://github.com/intel/kvm-unit-tests-tdx/blob/tdx/x86/apic.c.
Host running TDX KVM development patches and QEMU with corresponding TDX
changes.
The test needed to be modified to (a) stop any lingering timers before the
test starts, and (b) use CPUID 0x15 in TDX to accurately determine the TSC
and APIC frequencies instead of making 1GHz assumption and use similar
check as the kselftest test introduced in this series (while increasing
the amount with which the frequency is allowed to deviate by 1%).
The core changes made to x86/apic.c:test_apic_timer_one_shot() for this
testing are shown below for reference. Work is in progress to upstream
these modifications.
@@ -477,11 +478,29 @@ static void lvtt_handler(isr_regs_t *regs)
static void test_apic_timer_one_shot(void)
{
- uint64_t tsc1, tsc2;
static const uint32_t interval = 0x10000;
+ uint64_t measured_apic_freq, tsc2, tsc1;
+ uint32_t tsc_freq = 0, apic_freq = 0;
+ struct cpuid cpuid_tsc = {};
#define APIC_LVT_TIMER_VECTOR (0xee)
+ /*
+ * CPUID 0x15 is not available in VMX, can use it to obtain
+ * TSC and APIC frequency for accurate testing
+ */
+ if (is_tdx_guest()) {
+ cpuid_tsc = raw_cpuid(0x15, 0);
+ tsc_freq = cpuid_tsc.c * cpuid_tsc.b / cpuid_tsc.a;
+ apic_freq = cpuid_tsc.c;
+ }
+ /*
+ stop already fired local timer
+ the test case can be negative failure if the timer fired
+ after installed lvtt_handler but *before*
+ write to TIMICT again.
+ */
+ apic_write(APIC_TMICT, 0);
handle_irq(APIC_LVT_TIMER_VECTOR, lvtt_handler);
/* One shot mode */
@@ -503,8 +522,16 @@ static void test_apic_timer_one_shot(void)
* cases, the following should satisfy on all modern
* processors.
*/
- report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
- "APIC LVT timer one shot");
+ if (is_tdx_guest()) {
+ measured_apic_freq = interval * (tsc_freq / (tsc2 - tsc1));
+ report((lvtt_counter == 1) &&
+ (measured_apic_freq < apic_freq * 102 / 100) &&
+ (measured_apic_freq > apic_freq * 98 / 100),
+ "APIC LVT timer one shot");
+ } else {
+ report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
+ "APIC LVT timer one shot");
+ }
}
[1] https://lore.kernel.org/lkml/[email protected]/
Isaku Yamahata (1):
KVM: selftests: Add test for configure of x86 APIC bus frequency
Reinette Chatre (1):
KVM: selftests: Add x86_64 guest udelay() utility
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 8 +
.../selftests/kvm/include/x86_64/processor.h | 15 ++
.../selftests/kvm/lib/x86_64/processor.c | 12 +
.../kvm/x86_64/apic_bus_clock_test.c | 219 ++++++++++++++++++
5 files changed, 255 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
--
2.34.1
Create udelay() utility for x86_64 tests to match
udelay() available to ARM and RISC-V tests.
Calibrate guest frequency using the KVM_GET_TSC_KHZ ioctl()
and share it between host and guest with the new
tsc_khz global variable.
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v8:
- Use appropriate signed type to discover TSC freq from KVM.
- Switch type used to store TSC frequency from unsigned long
to unsigned int to match the type used by the kernel.
Changes v7:
- New patch
---
.../selftests/kvm/include/x86_64/processor.h | 15 +++++++++++++++
.../testing/selftests/kvm/lib/x86_64/processor.c | 12 ++++++++++++
2 files changed, 27 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 8eb57de0b587..b473f210ba6c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -23,6 +23,7 @@
extern bool host_cpu_is_intel;
extern bool host_cpu_is_amd;
+extern unsigned int tsc_khz;
/* Forced emulation prefix, used to invoke the emulator unconditionally. */
#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
@@ -815,6 +816,20 @@ static inline void cpu_relax(void)
asm volatile("rep; nop" ::: "memory");
}
+static inline void udelay(unsigned long usec)
+{
+ unsigned long cycles = tsc_khz / 1000 * usec;
+ uint64_t start, now;
+
+ start = rdtsc();
+ for (;;) {
+ now = rdtsc();
+ if (now - start >= cycles)
+ break;
+ cpu_relax();
+ }
+}
+
#define ud2() \
__asm__ __volatile__( \
"ud2\n" \
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index c664e446136b..ff579674032f 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -25,6 +25,7 @@ vm_vaddr_t exception_handlers;
bool host_cpu_is_amd;
bool host_cpu_is_intel;
bool is_forced_emulation_enabled;
+unsigned int tsc_khz;
static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
{
@@ -616,6 +617,8 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
void kvm_arch_vm_post_create(struct kvm_vm *vm)
{
+ int r;
+
vm_create_irqchip(vm);
vm_init_descriptor_tables(vm);
@@ -628,6 +631,15 @@ void kvm_arch_vm_post_create(struct kvm_vm *vm)
vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
}
+
+ if (kvm_has_cap(KVM_CAP_GET_TSC_KHZ)) {
+ r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
+ if (r < 0)
+ tsc_khz = 0;
+ else
+ tsc_khz = r;
+ sync_global_to_guest(vm, tsc_khz);
+ }
}
void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
--
2.34.1
From: Isaku Yamahata <[email protected]>
Test if KVM emulates the APIC bus clock at the expected frequency when
userspace configures the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS.
Set APIC timer's initial count to the maximum value and busy wait for 100
msec (largely arbitrary) using the TSC. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes v8:
- With TSC frequency switching to unsigned int type the implicit
type promotion during bus frequency computation results in
overflow. Use explicit unsigned long type within computation to
avoid overflow.
Changes v7:
- Drop Maxim Levitsky's Reviewed-by because of significant changes.
- Remove redefine of _GNU_SOURCE. (kernel test robot)
- Rewrite changelog and test description. (Sean)
- Enable user space to set APIC bus frequency. (Sean)
- Use GUEST_ASSERT() from guest instead of TEST_ASSERT() from host. (Sean)
- Test xAPIC as well as x2APIC. (Sean)
- Add check that KVM_CAP_X86_APIC_BUS_CYCLES_NS cannot be set
after vCPU created. (Sean)
- Use new udelay() utility. (Sean)
- Drop CLI. Mask LVT timer independently. (Sean)
- Use correct LVT timer entry (0x350 -> 0x320) to enable oneshot operation.
- Remove unnecessary static functions from single file test.
- Be consistent in types by using uint32_t/uint64_t instead of
u32/u64.
Changes v6:
- Use vm_create() wrapper instead of open coding it. (Zide)
- Improve grammar of test description. (Zide)
Changes v5:
- Update to new name of capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
Changes v4:
- Rework changelog.
- Add Sean's "Suggested-by" to acknowledge guidance received in
https://lore.kernel.org/all/[email protected]/
- Add copyright.
- Add test description to file header.
- Consistent capitalization for acronyms.
- Rebase to kvm-x86/next.
- Update to v4 change of providing bus clock rate in nanoseconds.
- Add a "TEST_REQUIRE()" for the new capability so that the test can
work on kernels that do not support the new capability.
- Address checkpatch warnings and use tabs instead of spaces in header
file to match existing code.
Changes v3:
- Use 1.5GHz instead of 1GHz as frequency.
Changes v2:
- Newly added.
fixup of test
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 8 +
.../kvm/x86_64/apic_bus_clock_test.c | 219 ++++++++++++++++++
3 files changed, 228 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ce8ff8e8ce3a..ad8b5d15f2bd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -112,6 +112,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..0f268b55fa06 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,14 @@
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
#define SET_APIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVTT 0x320
+#define APIC_LVT_TIMER_ONESHOT (0 << 17)
+#define APIC_LVT_TIMER_PERIODIC (1 << 17)
+#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
+#define APIC_LVT_MASKED (1 << 16)
+#define APIC_TMICT 0x380
+#define APIC_TMCCT 0x390
+#define APIC_TDCR 0x3E0
void apic_disable(void);
void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..602cec91d8ee
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024 Intel Corporation
+ *
+ * Verify KVM correctly emulates the APIC bus frequency when the VMM configures
+ * the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
+ * programming TMICT (timer initial count) to the largest value possible (so
+ * that the timer will not expire during the test). Then, after an arbitrary
+ * amount of time has elapsed, verify TMCCT (timer current count) is within 1%
+ * of the expected value based on the time elapsed, the APIC bus frequency, and
+ * the programmed TDCR (timer divide configuration register).
+ */
+
+#include "apic.h"
+#include "test_util.h"
+
+/*
+ * Pick 25MHz for APIC bus frequency. Different enough from the default 1GHz.
+ * User can override via command line.
+ */
+unsigned long apic_hz = 25 * 1000 * 1000;
+
+/*
+ * Possible TDCR values with matching divide count. Used to modify APIC
+ * timer frequency.
+ */
+struct {
+ uint32_t tdcr;
+ uint32_t divide_count;
+} tdcrs[] = {
+ {0x0, 2},
+ {0x1, 4},
+ {0x2, 8},
+ {0x3, 16},
+ {0x8, 32},
+ {0x9, 64},
+ {0xa, 128},
+ {0xb, 1},
+};
+
+void guest_verify(uint64_t tsc_cycles, uint32_t apic_cycles, uint32_t divide_count)
+{
+ unsigned long tsc_hz = tsc_khz * 1000;
+ uint64_t freq;
+
+ GUEST_ASSERT(tsc_cycles > 0);
+ freq = apic_cycles * divide_count * tsc_hz / tsc_cycles;
+ /* Check if measured frequency is within 1% of configured frequency. */
+ GUEST_ASSERT(freq < apic_hz * 101 / 100);
+ GUEST_ASSERT(freq > apic_hz * 99 / 100);
+}
+
+void x2apic_guest_code(void)
+{
+ uint32_t tmict, tmcct;
+ uint64_t tsc0, tsc1;
+ int i;
+
+ x2apic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt should not fire.
+ */
+ x2apic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ x2apic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ x2apic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for 100 msec. */
+ tsc0 = rdtsc();
+ udelay(100000);
+ /* Read APIC timer and TSC. */
+ tmcct = x2apic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer. */
+ x2apic_write_reg(APIC_TMICT, 0);
+
+ guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
+ }
+
+ GUEST_DONE();
+}
+
+void xapic_guest_code(void)
+{
+ uint32_t tmict, tmcct;
+ uint64_t tsc0, tsc1;
+ int i;
+
+ xapic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt should not fire.
+ */
+ xapic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ xapic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for 100 msec. */
+ tsc0 = rdtsc();
+ udelay(100000);
+ /* Read APIC timer and TSC. */
+ tmcct = xapic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer. */
+ xapic_write_reg(APIC_TMICT, 0);
+
+ guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
+ }
+
+ GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+ bool done = false;
+ struct ucall uc;
+
+ while (!done) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ done = true;
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+ }
+}
+
+void run_apic_bus_clock_test(bool xapic)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ int ret;
+
+ vm = vm_create(1);
+
+ sync_global_to_guest(vm, apic_hz);
+
+ vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ NSEC_PER_SEC / apic_hz);
+
+ vcpu = vm_vcpu_add(vm, 0, xapic ? xapic_guest_code : x2apic_guest_code);
+
+ ret = __vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
+ NSEC_PER_SEC / apic_hz);
+ TEST_ASSERT(ret < 0 && errno == EINVAL,
+ "Setting of APIC bus frequency after vCPU is created should fail.");
+
+ if (xapic)
+ virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+ test_apic_bus_clock(vcpu);
+ kvm_vm_free(vm);
+}
+
+void run_xapic_bus_clock_test(void)
+{
+ run_apic_bus_clock_test(true);
+}
+
+void run_x2apic_bus_clock_test(void)
+{
+ run_apic_bus_clock_test(false);
+}
+
+void help(char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-a APIC bus freq]\n", name);
+ puts("");
+ printf("-a: The APIC bus frequency (in Hz) to be configured for the guest.\n");
+ puts("");
+}
+
+int main(int argc, char *argv[])
+{
+ int opt;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_GET_TSC_KHZ));
+
+ while ((opt = getopt(argc, argv, "a:h")) != -1) {
+ switch (opt) {
+ case 'a':
+ apic_hz = atol(optarg);
+ break;
+ case 'h':
+ help(argv[0]);
+ exit(0);
+ default:
+ help(argv[0]);
+ exit(1);
+ }
+ }
+
+ run_xapic_bus_clock_test();
+ run_x2apic_bus_clock_test();
+}
--
2.34.1
On Mon, Jun 10, 2024, Reinette Chatre wrote:
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 8eb57de0b587..b473f210ba6c 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -23,6 +23,7 @@
>
> extern bool host_cpu_is_intel;
> extern bool host_cpu_is_amd;
> +extern unsigned int tsc_khz;
>
> /* Forced emulation prefix, used to invoke the emulator unconditionally. */
> #define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
> @@ -815,6 +816,20 @@ static inline void cpu_relax(void)
> asm volatile("rep; nop" ::: "memory");
> }
>
> +static inline void udelay(unsigned long usec)
uint64_t instead of unsigned long? Practically speaking it doesn't change anything,
but I don't see any reason to mix "unsigned long" and "uint64_t", e.g. the max
delay isn't a property of the address space.
> +{
> + unsigned long cycles = tsc_khz / 1000 * usec;
> + uint64_t start, now;
> +
> + start = rdtsc();
> + for (;;) {
> + now = rdtsc();
> + if (now - start >= cycles)
> + break;
> + cpu_relax();
> + }
> +}
> +
> #define ud2() \
> __asm__ __volatile__( \
> "ud2\n" \
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index c664e446136b..ff579674032f 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -25,6 +25,7 @@ vm_vaddr_t exception_handlers;
> bool host_cpu_is_amd;
> bool host_cpu_is_intel;
> bool is_forced_emulation_enabled;
> +unsigned int tsc_khz;
Slight preference for uint32_t, mostly because KVM stores its version as a u32.
> static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
> {
> @@ -616,6 +617,8 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
>
> void kvm_arch_vm_post_create(struct kvm_vm *vm)
> {
> + int r;
> +
> vm_create_irqchip(vm);
> vm_init_descriptor_tables(vm);
>
> @@ -628,6 +631,15 @@ void kvm_arch_vm_post_create(struct kvm_vm *vm)
>
> vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
> }
> +
> + if (kvm_has_cap(KVM_CAP_GET_TSC_KHZ)) {
I think we should make this a TEST_REQUIRE(), or maybe even a TEST_ASSERT().
Support for KVM_GET_TSC_KHZ predates KVM selftests by 7+ years.
> + r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
> + if (r < 0)
Heh, the docs are stale. KVM hasn't returned an error since commit cc578287e322
("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
again predates selftests by many years (6+ in this case). To make our lives
much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
throw in a GUEST_ASSERT(thz_khz) in udelay()?
E.g. as is, if KVM_GET_TSC_KHZ is allowed to fail, then we risk having to deal
with weird failures due to udelay() unexpectedly doing nothing.
> + tsc_khz = 0;
> + else
> + tsc_khz = r;
> + sync_global_to_guest(vm, tsc_khz);
> + }
> }
>
> void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
> --
> 2.34.1
>
On Mon, Jun 10, 2024, Reinette Chatre wrote:
> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> new file mode 100644
> index 000000000000..602cec91d8ee
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024 Intel Corporation
> + *
> + * Verify KVM correctly emulates the APIC bus frequency when the VMM configures
> + * the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
> + * programming TMICT (timer initial count) to the largest value possible (so
> + * that the timer will not expire during the test). Then, after an arbitrary
> + * amount of time has elapsed, verify TMCCT (timer current count) is within 1%
> + * of the expected value based on the time elapsed, the APIC bus frequency, and
> + * the programmed TDCR (timer divide configuration register).
> + */
> +
> +#include "apic.h"
> +#include "test_util.h"
> +
> +/*
> + * Pick 25MHz for APIC bus frequency. Different enough from the default 1GHz.
> + * User can override via command line.
> + */
> +unsigned long apic_hz = 25 * 1000 * 1000;
static, and maybe a uint64_t to match the other stuff?
> +/*
> + * Possible TDCR values with matching divide count. Used to modify APIC
> + * timer frequency.
> + */
> +struct {
> + uint32_t tdcr;
> + uint32_t divide_count;
> +} tdcrs[] = {
> + {0x0, 2},
> + {0x1, 4},
> + {0x2, 8},
> + {0x3, 16},
> + {0x8, 32},
> + {0x9, 64},
> + {0xa, 128},
> + {0xb, 1},
> +};
> +
> +void guest_verify(uint64_t tsc_cycles, uint32_t apic_cycles, uint32_t divide_count)
uin64_t for apic_cycles? And maybe something like guest_check_apic_count(), to
make it more obvious what is being verified? Actually, it should be quite easy
to have the two flavors share the bulk of the code.
> +{
> + unsigned long tsc_hz = tsc_khz * 1000;
> + uint64_t freq;
> +
> + GUEST_ASSERT(tsc_cycles > 0);
Is this necessary? Won't the "freq < ..." check fail? I love me some paranoia,
but this seems unnecessary.
> + freq = apic_cycles * divide_count * tsc_hz / tsc_cycles;
> + /* Check if measured frequency is within 1% of configured frequency. */
> + GUEST_ASSERT(freq < apic_hz * 101 / 100);
> + GUEST_ASSERT(freq > apic_hz * 99 / 100);
> +}
> +
> +void x2apic_guest_code(void)
> +{
> + uint32_t tmict, tmcct;
> + uint64_t tsc0, tsc1;
> + int i;
> +
> + x2apic_enable();
> +
> + /*
> + * Setup one-shot timer. The vector does not matter because the
> + * interrupt should not fire.
> + */
> + x2apic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
> +
> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
> + x2apic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
> +
> + /* Set the largest value to not trigger the interrupt. */
Nit, the goal isn't to avoid triggering the interrupt, e.g. the above masking
takes care of that. The goal is to prevent the timer from expiring, because if
the timer expires it stops counting and will trigger a false failure because the
TSC doesn't stop counting.
Honestly, I would just delete the comment. Same with the "busy wait for 100 msec"
and "Read APIC timer and TSC" comments. They state the obvious. Loading the max
TMICT is mildly interesting, but that's covered by the file-level comment.
> + tmict = ~0;
This really belongs outside of the loop, e.g.
const uint32_t tmict = ~0u;
> + x2apic_write_reg(APIC_TMICT, tmict);
> +
> + /* Busy wait for 100 msec. */
Hmm, should this be configurable?
> + tsc0 = rdtsc();
> + udelay(100000);
> + /* Read APIC timer and TSC. */
> + tmcct = x2apic_read_reg(APIC_TMCCT);
> + tsc1 = rdtsc();
> +
> + /* Stop timer. */
This comment is a bit more interesting, as readers might not know writing '0'
stops the timer. But that's even more interesting is the ordering, e.g. it's
not at all unreasonable to think that the timer should be stopped _before_ reading
the current count. E.g. something like:
/*
* Stop the timer _after_ reading the current, final count, as
* writing the initial counter also modifies the current count.
*/
> + x2apic_write_reg(APIC_TMICT, 0);
> +
> + guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
> + }
> +
> + GUEST_DONE();
> +}
> +
> +void xapic_guest_code(void)
> +{
> + uint32_t tmict, tmcct;
> + uint64_t tsc0, tsc1;
> + int i;
> +
> + xapic_enable();
> +
> + /*
> + * Setup one-shot timer. The vector does not matter because the
> + * interrupt should not fire.
> + */
> + xapic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
> +
> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
> + xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
> +
> + /* Set the largest value to not trigger the interrupt. */
> + tmict = ~0;
> + xapic_write_reg(APIC_TMICT, tmict);
> +
> + /* Busy wait for 100 msec. */
> + tsc0 = rdtsc();
> + udelay(100000);
> + /* Read APIC timer and TSC. */
> + tmcct = xapic_read_reg(APIC_TMCCT);
> + tsc1 = rdtsc();
> +
> + /* Stop timer. */
> + xapic_write_reg(APIC_TMICT, 0);
> +
> + guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
That's some nice copy+paste :-)
This test isn't writing ICR, so the whole 32-bit vs. 64-bit weirdness with xAPIC
vs X2APIC is irrevelant. Two tiny helpers, a global flag, and you can avoid a
pile of copy+paste, and the need to find a better name than guest_verify().
static bool is_x2apic;
static uint32_t apic_read_reg(unsigned int reg)
{
return is_x2apic ? x2apic_read_reg(reg) : xapic_read_reg(reg);
}
static void apic_read_write(unsigned int reg, uint32_t val)
{
if (is_x2apic)
x2apic_write_reg(reg, val);
else
xapic_write_reg(reg, val);
return is_x2apic ? x2apic_read_reg(reg) : xapic_read_reg(reg);
}
> + }
> +
> + GUEST_DONE();
> +}
> +
> +void test_apic_bus_clock(struct kvm_vcpu *vcpu)
> +{
> + bool done = false;
> + struct ucall uc;
> +
> + while (!done) {
> + vcpu_run(vcpu);
> + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_DONE:
> + done = true;
> + break;
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + default:
> + TEST_FAIL("Unknown ucall %lu", uc.cmd);
> + break;
> + }
> + }
> +}
> +
> +void run_apic_bus_clock_test(bool xapic)
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> + int ret;
> +
> + vm = vm_create(1);
> +
> + sync_global_to_guest(vm, apic_hz);
> +
> + vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
> + NSEC_PER_SEC / apic_hz);
> +
> + vcpu = vm_vcpu_add(vm, 0, xapic ? xapic_guest_code : x2apic_guest_code);
> +
> + ret = __vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS,
> + NSEC_PER_SEC / apic_hz);
> + TEST_ASSERT(ret < 0 && errno == EINVAL,
> + "Setting of APIC bus frequency after vCPU is created should fail.");
> +
> + if (xapic)
> + virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> + test_apic_bus_clock(vcpu);
> + kvm_vm_free(vm);
> +}
> +
> +void run_xapic_bus_clock_test(void)
> +{
> + run_apic_bus_clock_test(true);
> +}
> +
> +void run_x2apic_bus_clock_test(void)
> +{
> + run_apic_bus_clock_test(false);
> +}
> +
> +void help(char *name)
> +{
> + puts("");
> + printf("usage: %s [-h] [-a APIC bus freq]\n", name);
> + puts("");
> + printf("-a: The APIC bus frequency (in Hz) to be configured for the guest.\n");
> + puts("");
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int opt;
> +
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_CYCLES_NS));
> + TEST_REQUIRE(kvm_has_cap(KVM_CAP_GET_TSC_KHZ));
> +
> + while ((opt = getopt(argc, argv, "a:h")) != -1) {
> + switch (opt) {
> + case 'a':
Maybe -f for frequency instead of -a for APIC? And if we make the delay
configurable, -d (delay)?
> + apic_hz = atol(optarg);
> + break;
> + case 'h':
> + help(argv[0]);
> + exit(0);
> + default:
> + help(argv[0]);
> + exit(1);
> + }
> + }
> +
> + run_xapic_bus_clock_test();
> + run_x2apic_bus_clock_test();
> +}
> --
> 2.34.1
>
Hi Sean,
Thank you very much for your detailed feedback.
On 6/10/24 5:21 PM, Sean Christopherson wrote:
> On Mon, Jun 10, 2024, Reinette Chatre wrote:
>> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
>> index 8eb57de0b587..b473f210ba6c 100644
>> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
>> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
>> @@ -23,6 +23,7 @@
>>
>> extern bool host_cpu_is_intel;
>> extern bool host_cpu_is_amd;
>> +extern unsigned int tsc_khz;
>>
>> /* Forced emulation prefix, used to invoke the emulator unconditionally. */
>> #define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
>> @@ -815,6 +816,20 @@ static inline void cpu_relax(void)
>> asm volatile("rep; nop" ::: "memory");
>> }
>>
>> +static inline void udelay(unsigned long usec)
>
> uint64_t instead of unsigned long? Practically speaking it doesn't change anything,
> but I don't see any reason to mix "unsigned long" and "uint64_t", e.g. the max
> delay isn't a property of the address space.
I assume that you refer to "cycles" below. Will do.
>
>> +{
>> + unsigned long cycles = tsc_khz / 1000 * usec;
>> + uint64_t start, now;
>> +
>> + start = rdtsc();
>> + for (;;) {
>> + now = rdtsc();
>> + if (now - start >= cycles)
>> + break;
>> + cpu_relax();
>> + }
>> +}
>> +
>> #define ud2() \
>> __asm__ __volatile__( \
>> "ud2\n" \
>> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
>> index c664e446136b..ff579674032f 100644
>> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
>> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
>> @@ -25,6 +25,7 @@ vm_vaddr_t exception_handlers;
>> bool host_cpu_is_amd;
>> bool host_cpu_is_intel;
>> bool is_forced_emulation_enabled;
>> +unsigned int tsc_khz;
>
> Slight preference for uint32_t, mostly because KVM stores its version as a u32.
Changed it to uint32_t.
>
>> static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
>> {
>> @@ -616,6 +617,8 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
>>
>> void kvm_arch_vm_post_create(struct kvm_vm *vm)
>> {
>> + int r;
>> +
>> vm_create_irqchip(vm);
>> vm_init_descriptor_tables(vm);
>>
>> @@ -628,6 +631,15 @@ void kvm_arch_vm_post_create(struct kvm_vm *vm)
>>
>> vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
>> }
>> +
>> + if (kvm_has_cap(KVM_CAP_GET_TSC_KHZ)) {
>
> I think we should make this a TEST_REQUIRE(), or maybe even a TEST_ASSERT().
> Support for KVM_GET_TSC_KHZ predates KVM selftests by 7+ years.
Changed it to a TEST_ASSERT() right at the beginning of kvm_arch_vm_post_create().
>
>> + r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
>> + if (r < 0)
>
> Heh, the docs are stale. KVM hasn't returned an error since commit cc578287e322
> ("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
> again predates selftests by many years (6+ in this case). To make our lives
> much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
> throw in a GUEST_ASSERT(thz_khz) in udelay()?
I added the GUEST_ASSERT() but I find that it comes with a caveat (more below).
I plan an assert as below that would end up testing the same as what a
GUEST_ASSERT(tsc_khz) would accomplish:
r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
TEST_ASSERT(r > 0, "KVM_GET_TSC_KHZ did not provide a valid TSC freq.");
tsc_khz = r;
Caveat is: the additional GUEST_ASSERT() requires all tests that use udelay() in
the guest to now subtly be required to implement a ucall (UCALL_ABORT) handler.
I did a crude grep check to see and of the 69 x86_64 tests there are 47 that do
indeed have a UCALL_ABORT handler. If any of the other use udelay() then the
GUEST_ASSERT() will of course still trigger, but will be quite cryptic. For
example, "Unhandled exception '0xe' at guest RIP '0x0'" vs. "tsc_khz".
> E.g. as is, if KVM_GET_TSC_KHZ is allowed to fail, then we risk having to deal
> with weird failures due to udelay() unexpectedly doing nothing.
>
>
>> + tsc_khz = 0;
>> + else
>> + tsc_khz = r;
>> + sync_global_to_guest(vm, tsc_khz);
>> + }
>> }
>>
>> void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
>> --
>> 2.34.1
>>
Reinette
Hi Sean,
On 6/10/24 5:51 PM, Sean Christopherson wrote:
> On Mon, Jun 10, 2024, Reinette Chatre wrote:
>> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> new file mode 100644
>> index 000000000000..602cec91d8ee
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>> @@ -0,0 +1,219 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2024 Intel Corporation
>> + *
>> + * Verify KVM correctly emulates the APIC bus frequency when the VMM configures
>> + * the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS. Start the APIC timer by
>> + * programming TMICT (timer initial count) to the largest value possible (so
>> + * that the timer will not expire during the test). Then, after an arbitrary
>> + * amount of time has elapsed, verify TMCCT (timer current count) is within 1%
>> + * of the expected value based on the time elapsed, the APIC bus frequency, and
>> + * the programmed TDCR (timer divide configuration register).
>> + */
>> +
>> +#include "apic.h"
>> +#include "test_util.h"
>> +
>> +/*
>> + * Pick 25MHz for APIC bus frequency. Different enough from the default 1GHz.
>> + * User can override via command line.
>> + */
>> +unsigned long apic_hz = 25 * 1000 * 1000;
>
> static, and maybe a uint64_t to match the other stuff?
Sure. Also moved all other globals and functions back to static.
>
>> +/*
>> + * Possible TDCR values with matching divide count. Used to modify APIC
>> + * timer frequency.
>> + */
>> +struct {
>> + uint32_t tdcr;
>> + uint32_t divide_count;
>> +} tdcrs[] = {
>> + {0x0, 2},
>> + {0x1, 4},
>> + {0x2, 8},
>> + {0x3, 16},
>> + {0x8, 32},
>> + {0x9, 64},
>> + {0xa, 128},
>> + {0xb, 1},
>> +};
>> +
>> +void guest_verify(uint64_t tsc_cycles, uint32_t apic_cycles, uint32_t divide_count)
>
> uin64_t for apic_cycles? And maybe something like guest_check_apic_count(), to
> make it more obvious what is being verified? Actually, it should be quite easy
> to have the two flavors share the bulk of the code.
I now plan to drop this function and instead just open code the
checks in what has now become a shared function between xAPIC and x2APIC.
>
>> +{
>> + unsigned long tsc_hz = tsc_khz * 1000;
>> + uint64_t freq;
>> +
>> + GUEST_ASSERT(tsc_cycles > 0);
>
> Is this necessary? Won't the "freq < ..." check fail? I love me some paranoia,
> but this seems unnecessary.
Sure. After needing to field reports from static checkers not able to determine
that a denominator can never be zero I do tend to add these checks just to
pre-emptively placate them. I did run the code through a static checker after making
all planned changes and it had no complaints so it is now gone.
>
>> + freq = apic_cycles * divide_count * tsc_hz / tsc_cycles;
>> + /* Check if measured frequency is within 1% of configured frequency. */
>> + GUEST_ASSERT(freq < apic_hz * 101 / 100);
>> + GUEST_ASSERT(freq > apic_hz * 99 / 100);
>> +}
>> +
>> +void x2apic_guest_code(void)
>> +{
>> + uint32_t tmict, tmcct;
>> + uint64_t tsc0, tsc1;
>> + int i;
>> +
>> + x2apic_enable();
>> +
>> + /*
>> + * Setup one-shot timer. The vector does not matter because the
>> + * interrupt should not fire.
>> + */
>> + x2apic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
>> + x2apic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
>> +
>> + /* Set the largest value to not trigger the interrupt. */
>
> Nit, the goal isn't to avoid triggering the interrupt, e.g. the above masking
> takes care of that. The goal is to prevent the timer from expiring, because if
> the timer expires it stops counting and will trigger a false failure because the
> TSC doesn't stop counting.
>
> Honestly, I would just delete the comment. Same with the "busy wait for 100 msec"
> and "Read APIC timer and TSC" comments. They state the obvious. Loading the max
> TMICT is mildly interesting, but that's covered by the file-level comment.
>
>> + tmict = ~0;
>
> This really belongs outside of the loop, e.g.
>
> const uint32_t tmict = ~0u;
>
>> + x2apic_write_reg(APIC_TMICT, tmict);
>> +
>> + /* Busy wait for 100 msec. */
>
> Hmm, should this be configurable?
Will do.
>
>> + tsc0 = rdtsc();
>> + udelay(100000);
>> + /* Read APIC timer and TSC. */
>> + tmcct = x2apic_read_reg(APIC_TMCCT);
>> + tsc1 = rdtsc();
>> +
>> + /* Stop timer. */
>
> This comment is a bit more interesting, as readers might not know writing '0'
> stops the timer. But that's even more interesting is the ordering, e.g. it's
> not at all unreasonable to think that the timer should be stopped _before_ reading
> the current count. E.g. something like:
>
> /*
> * Stop the timer _after_ reading the current, final count, as
> * writing the initial counter also modifies the current count.
> */
>
>> + x2apic_write_reg(APIC_TMICT, 0);
>> +
>> + guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
>> + }
>> +
>> + GUEST_DONE();
>> +}
>> +
>> +void xapic_guest_code(void)
>> +{
>> + uint32_t tmict, tmcct;
>> + uint64_t tsc0, tsc1;
>> + int i;
>> +
>> + xapic_enable();
>> +
>> + /*
>> + * Setup one-shot timer. The vector does not matter because the
>> + * interrupt should not fire.
>> + */
>> + xapic_write_reg(APIC_LVTT, APIC_LVT_TIMER_ONESHOT | APIC_LVT_MASKED);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
>> + xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
>> +
>> + /* Set the largest value to not trigger the interrupt. */
>> + tmict = ~0;
>> + xapic_write_reg(APIC_TMICT, tmict);
>> +
>> + /* Busy wait for 100 msec. */
>> + tsc0 = rdtsc();
>> + udelay(100000);
>> + /* Read APIC timer and TSC. */
>> + tmcct = xapic_read_reg(APIC_TMCCT);
>> + tsc1 = rdtsc();
>> +
>> + /* Stop timer. */
>> + xapic_write_reg(APIC_TMICT, 0);
>> +
>> + guest_verify(tsc1 - tsc0, tmict - tmcct, tdcrs[i].divide_count);
>
> That's some nice copy+paste :-)
>
> This test isn't writing ICR, so the whole 32-bit vs. 64-bit weirdness with xAPIC
> vs X2APIC is irrevelant. Two tiny helpers, a global flag, and you can avoid a
> pile of copy+paste, and the need to find a better name than guest_verify().
Will do. Thank you very much for your detailed and valuable feedback.
Reinette
On Tue, Jun 11, 2024, Reinette Chatre wrote:
> > Heh, the docs are stale. KVM hasn't returned an error since commit cc578287e322
> > ("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
> > again predates selftests by many years (6+ in this case). To make our lives
> > much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
> > throw in a GUEST_ASSERT(thz_khz) in udelay()?
>
> I added the GUEST_ASSERT() but I find that it comes with a caveat (more below).
>
> I plan an assert as below that would end up testing the same as what a
> GUEST_ASSERT(tsc_khz) would accomplish:
>
> r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
> TEST_ASSERT(r > 0, "KVM_GET_TSC_KHZ did not provide a valid TSC freq.");
> tsc_khz = r;
>
>
> Caveat is: the additional GUEST_ASSERT() requires all tests that use udelay() in
> the guest to now subtly be required to implement a ucall (UCALL_ABORT) handler.
> I did a crude grep check to see and of the 69 x86_64 tests there are 47 that do
> indeed have a UCALL_ABORT handler. If any of the other use udelay() then the
> GUEST_ASSERT() will of course still trigger, but will be quite cryptic. For
> example, "Unhandled exception '0xe' at guest RIP '0x0'" vs. "tsc_khz".
Yeah, we really need to add a bit more infrastructure, there is way, way, waaaay
too much boilerplate needed just to run a guest and handle the basic ucalls.
Reporting guest asserts should Just Work for 99.9% of tests.
Anyways, is it any less cryptic if ucall_assert() forces a failure? I forget if
the problem with an unhandled GUEST_ASSERT() is that the test re-enters the guest,
or if it's something else.
I don't think we need a perfect solution _now_, as tsc_khz really should never
be 0, just something to not make life completely miserable for future developers.
diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
index 42151e571953..1116bce5cdbf 100644
--- a/tools/testing/selftests/kvm/lib/ucall_common.c
+++ b/tools/testing/selftests/kvm/lib/ucall_common.c
@@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
+ ucall_arch_do_ucall(GUEST_UCALL_FAILED);
+
ucall_free(uc);
}
Hi Sean,
On 6/11/24 3:03 PM, Sean Christopherson wrote:
> On Tue, Jun 11, 2024, Reinette Chatre wrote:
>>> Heh, the docs are stale. KVM hasn't returned an error since commit cc578287e322
>>> ("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
>>> again predates selftests by many years (6+ in this case). To make our lives
>>> much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
>>> throw in a GUEST_ASSERT(thz_khz) in udelay()?
>>
>> I added the GUEST_ASSERT() but I find that it comes with a caveat (more below).
>>
>> I plan an assert as below that would end up testing the same as what a
>> GUEST_ASSERT(tsc_khz) would accomplish:
>>
>> r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
>> TEST_ASSERT(r > 0, "KVM_GET_TSC_KHZ did not provide a valid TSC freq.");
>> tsc_khz = r;
>>
>>
>> Caveat is: the additional GUEST_ASSERT() requires all tests that use udelay() in
>> the guest to now subtly be required to implement a ucall (UCALL_ABORT) handler.
>> I did a crude grep check to see and of the 69 x86_64 tests there are 47 that do
>> indeed have a UCALL_ABORT handler. If any of the other use udelay() then the
>> GUEST_ASSERT() will of course still trigger, but will be quite cryptic. For
>> example, "Unhandled exception '0xe' at guest RIP '0x0'" vs. "tsc_khz".
>
> Yeah, we really need to add a bit more infrastructure, there is way, way, waaaay
> too much boilerplate needed just to run a guest and handle the basic ucalls.
> Reporting guest asserts should Just Work for 99.9% of tests.
>
> Anyways, is it any less cryptic if ucall_assert() forces a failure? I forget if
> the problem with an unhandled GUEST_ASSERT() is that the test re-enters the guest,
> or if it's something else.
>
> I don't think we need a perfect solution _now_, as tsc_khz really should never
> be 0, just something to not make life completely miserable for future developers.
>
> diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
> index 42151e571953..1116bce5cdbf 100644
> --- a/tools/testing/selftests/kvm/lib/ucall_common.c
> +++ b/tools/testing/selftests/kvm/lib/ucall_common.c
> @@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
>
> ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
>
> + ucall_arch_do_ucall(GUEST_UCALL_FAILED);
> +
> ucall_free(uc);
> }
>
Thank you very much.
With your suggestion an example unhandled GUEST_ASSERT() looks as below.
It does not guide on what (beyond vcpu_run()) triggered the assert but it
indeed provides a hint that adding ucall handling may be needed.
[SNIP]
==== Test Assertion Failure ====
lib/ucall_common.c:154: addr != (void *)GUEST_UCALL_FAILED
pid=16002 tid=16002 errno=4 - Interrupted system call
1 0x000000000040da91: get_ucall at ucall_common.c:154
2 0x0000000000410142: assert_on_unhandled_exception at processor.c:614
3 0x0000000000406590: _vcpu_run at kvm_util.c:1718
4 (inlined by) vcpu_run at kvm_util.c:1729
5 0x00000000004026cf: test_apic_bus_clock at apic_bus_clock_test.c:115
6 (inlined by) run_apic_bus_clock_test at apic_bus_clock_test.c:164
7 (inlined by) main at apic_bus_clock_test.c:201
8 0x00007fb1d8429d8f: ?? ??:0
9 0x00007fb1d8429e3f: ?? ??:0
10 0x00000000004027a4: _start at ??:?
Guest failed to allocate ucall struct
[SNIP]
Is this acceptable? I can add a new preparatory patch with your
suggestion that has as its goal to provide slightly better error message
when there is an unhandled ucall.
Reinette
On Tue, Jun 11, 2024, Reinette Chatre wrote:
> > diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
> > index 42151e571953..1116bce5cdbf 100644
> > --- a/tools/testing/selftests/kvm/lib/ucall_common.c
> > +++ b/tools/testing/selftests/kvm/lib/ucall_common.c
> > @@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
> > ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
> > + ucall_arch_do_ucall(GUEST_UCALL_FAILED);
> > +
> > ucall_free(uc);
> > }
> >
>
> Thank you very much.
>
> With your suggestion an example unhandled GUEST_ASSERT() looks as below.
> It does not guide on what (beyond vcpu_run()) triggered the assert but it
> indeed provides a hint that adding ucall handling may be needed.
>
> [SNIP]
> ==== Test Assertion Failure ====
> lib/ucall_common.c:154: addr != (void *)GUEST_UCALL_FAILED
> pid=16002 tid=16002 errno=4 - Interrupted system call
> 1 0x000000000040da91: get_ucall at ucall_common.c:154
> 2 0x0000000000410142: assert_on_unhandled_exception at processor.c:614
> 3 0x0000000000406590: _vcpu_run at kvm_util.c:1718
> 4 (inlined by) vcpu_run at kvm_util.c:1729
> 5 0x00000000004026cf: test_apic_bus_clock at apic_bus_clock_test.c:115
> 6 (inlined by) run_apic_bus_clock_test at apic_bus_clock_test.c:164
> 7 (inlined by) main at apic_bus_clock_test.c:201
> 8 0x00007fb1d8429d8f: ?? ??:0
> 9 0x00007fb1d8429e3f: ?? ??:0
> 10 0x00000000004027a4: _start at ??:?
> Guest failed to allocate ucall struct
/facepalm
No, it won't work, e.g. relies on get_ucall() being invoked. I'm also being
unnecessarily clever, and missing the obvious, simple solution.
The only reason tests manually handle UCALL_ABORT is because back when it was
added, there was no sprintf support in the guest, i.e. the guest could only spit
out raw information, it couldn't format a human-readable error message. And so
tests manually handled UCALL_ABORT with a custom message.
When we added sprintf support, (almost) all tests moved formatting to the guest
and converged on using REPORT_GUEST_ASSERT(), but we never completed the cleanup
by moving REPORT_GUEST_ASSERT() to common code.
Even more ridiculous is that assert_on_unhandled_exception() is still a thing.
That code exists _literally_ to handle this scenario, where common guest library
code needs to signal a failure.
In short, the right way to resolve this is to have _vcpu_run() (or maybe even
__vcpu_run()) handle UCALL_ABORT. The the bajillion REPORT_GUEST_ASSERT() calls
can be removed, as can UCALL_UNHANDLED and assert_on_unhandled_exception() since
they can and should use a normal GUEST_ASSERT() now that guest code can provide
the formating, and library code will ensure the assert is reported.
For this series, just ignore the GUEST_ASSERT() wonkiness. If someone develops
a test that uses udelay(), doesn't handle ucalls, _and_ runs on funky hardware,
then so be it, they can come yell at me :-)
And I'll work on a series to handle UCALL_ABORT in _vcpu_run() (and poke around
a bit more to see if there's other low hanging cleanup fruit).
Hi Sean,
On 6/11/24 6:15 PM, Sean Christopherson wrote:
> On Tue, Jun 11, 2024, Reinette Chatre wrote:
>>> diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
>>> index 42151e571953..1116bce5cdbf 100644
>>> --- a/tools/testing/selftests/kvm/lib/ucall_common.c
>>> +++ b/tools/testing/selftests/kvm/lib/ucall_common.c
>>> @@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
>>> ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
>>> + ucall_arch_do_ucall(GUEST_UCALL_FAILED);
>>> +
>>> ucall_free(uc);
>>> }
>>>
>>
>> Thank you very much.
>>
>> With your suggestion an example unhandled GUEST_ASSERT() looks as below.
>> It does not guide on what (beyond vcpu_run()) triggered the assert but it
>> indeed provides a hint that adding ucall handling may be needed.
>>
>> [SNIP]
>> ==== Test Assertion Failure ====
>> lib/ucall_common.c:154: addr != (void *)GUEST_UCALL_FAILED
>> pid=16002 tid=16002 errno=4 - Interrupted system call
>> 1 0x000000000040da91: get_ucall at ucall_common.c:154
>> 2 0x0000000000410142: assert_on_unhandled_exception at processor.c:614
>> 3 0x0000000000406590: _vcpu_run at kvm_util.c:1718
>> 4 (inlined by) vcpu_run at kvm_util.c:1729
>> 5 0x00000000004026cf: test_apic_bus_clock at apic_bus_clock_test.c:115
>> 6 (inlined by) run_apic_bus_clock_test at apic_bus_clock_test.c:164
>> 7 (inlined by) main at apic_bus_clock_test.c:201
>> 8 0x00007fb1d8429d8f: ?? ??:0
>> 9 0x00007fb1d8429e3f: ?? ??:0
>> 10 0x00000000004027a4: _start at ??:?
>> Guest failed to allocate ucall struct
>
> /facepalm
>
> No, it won't work, e.g. relies on get_ucall() being invoked. I'm also being
> unnecessarily clever, and missing the obvious, simple solution.
>
> The only reason tests manually handle UCALL_ABORT is because back when it was
> added, there was no sprintf support in the guest, i.e. the guest could only spit
> out raw information, it couldn't format a human-readable error message. And so
> tests manually handled UCALL_ABORT with a custom message.
>
> When we added sprintf support, (almost) all tests moved formatting to the guest
> and converged on using REPORT_GUEST_ASSERT(), but we never completed the cleanup
> by moving REPORT_GUEST_ASSERT() to common code.
>
> Even more ridiculous is that assert_on_unhandled_exception() is still a thing.
> That code exists _literally_ to handle this scenario, where common guest library
> code needs to signal a failure.
>
> In short, the right way to resolve this is to have _vcpu_run() (or maybe even
> __vcpu_run()) handle UCALL_ABORT. The the bajillion REPORT_GUEST_ASSERT() calls
> can be removed, as can UCALL_UNHANDLED and assert_on_unhandled_exception() since
> they can and should use a normal GUEST_ASSERT() now that guest code can provide
> the formating, and library code will ensure the assert is reported.
>
> For this series, just ignore the GUEST_ASSERT() wonkiness. If someone develops
> a test that uses udelay(), doesn't handle ucalls, _and_ runs on funky hardware,
> then so be it, they can come yell at me :-)
>
> And I'll work on a series to handle UCALL_ABORT in _vcpu_run() (and poke around
> a bit more to see if there's other low hanging cleanup fruit).
Thank you very much for explaining these details. Next version intends to address
all your feedback and I will send that shortly.
Reinette