2024-03-21 16:46:38

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

(I am helping Isaku for a bit by submitting the next versions of this
series.)

Changes from v3:
- v3: https://lore.kernel.org/all/[email protected]/
- Reworked all changelogs.
- Regarding code changes: patches #1 and #2 are unchanged, patch #3 was
reworked according to Sean's guidance, and patch #4 (the test)
needed changes after rebase to kvm-x86/next and the implementation
(patch #3) changes.
- Added Reviewed-by tags to patches #1, #2, and #4, but removed
Maxim's Reviewed-by tag from patch #3 because it changed so much.
- Added a "Suggested-by" to patch #4 to reflect that it represents
Sean's guidance.
- Reworked cover to match customs (in subject line) and reflect feedback
to patches: capability renamed to KVM_CAP_X86_APIC_BUS_FREQUENCY, clarify
distinction between "core crystal clock" and "bus clock", and update
pro/con list.
- Please refer to individual patches for detailed changes.

Changes from v2:
- v2: https://lore.kernel.org/lkml/[email protected]/
- Removed APIC_BUS_FREQUENCY and apic_bus_frequency of struct kvm-arch.
- Update the commit messages.
- Added reviewed-by (Maxim Levitsky)
- Use 1.5GHz instead of 1GHz as frequency for the test case.

Changes from v1:
- v1: https://lore.kernel.org/all/[email protected]/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst

Summary
-------
Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.

Description
-----------
Vishal reported [1] that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.

The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported in [1] the TDX guest kernel
expects a 25MHz timer frequency but gets timer interrupt more frequently
due to the 1GHz frequency used by KVM.

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.

There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
Pro: It resembles the existing hardware. The recent Intel CPUs
adopts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
Con: Core crystal clock frequency is also used to calculate TSC frequency.
Con: If it is configured to value different from hardware, it will
break the correctness of INTEL-PT Mini Time Count (MTC) packets
in TDs.

[1] https://lore.kernel.org/lkml/[email protected]/

Isaku Yamahata (4):
KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
KVM: x86: Make nsec per APIC bus cycle a VM variable
KVM: x86: Add a capability to configure bus frequency for APIC timer
KVM: selftests: Add test for configure of x86 APIC bus frequency

Documentation/virt/kvm/api.rst | 17 ++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 +-
arch/x86/kvm/lapic.c | 6 +-
arch/x86/kvm/lapic.h | 3 +-
arch/x86/kvm/x86.c | 28 +++
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 166 ++++++++++++++++++
10 files changed, 228 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c


base-commit: 964d0c614c7f71917305a5afdca9178fe8231434
--
2.34.1



2024-03-21 16:47:22

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V4 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer

From: Isaku Yamahata <[email protected]>

Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.

Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.

The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
kernel expects a 25MHz timer frequency but gets timer interrupt more
frequently due to the 1GHz frequency used by KVM.

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.

There are several options to address this:
1. Make the KVM able to configure APIC bus frequency (this series).
Pro: It resembles the existing hardware. The recent Intel CPUs
adapts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviates from TDX architecture.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
Con: Core crystal clock frequency is also used to calculate TSC
frequency.
Con: If it is configured to value different from hardware, it will
break the correctness of INTEL-PT Mini Time Count (MTC) packets
in TDs.

Reported-by: Vishal Annapurve <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---

Changes v4:
- Rework implementation following Sean's guidance in:
https://lore.kernel.org/all/[email protected]/
- Reword con #2 to acknowledge feedback. (Sean)
- Add the "con" information from Xiaoyao during earlier review of v2.
- Rework changelog to address comments related to "bus clock" vs
"core crystal clock" frequency. (Xiaoyao)
- Drop snippet about impact on TSC deadline timer emulation. (Maxim)
- Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
since tag received.
- Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"

Changes v3:
- Added reviewed-by Maxim Levitsky.
- Minor update of the commit message.

Changes v2:
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Fix build error for i386.
- Add document to api.rst.
- Typo in the commit message.

Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 45 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..20080fe4b8ee 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8063,6 +8063,23 @@ error/annotated fault.

See KVM_EXIT_MEMORY_FAULT for more information.

+7.35 KVM_CAP_X86_APIC_BUS_FREQUENCY
+-----------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
+:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
+ frequency or if any vCPUs have been created, -ENXIO if a virtual
+ local APIC has not been created using KVM_CREATE_IRQCHIP.
+
+This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
+virtual APIC when emulating APIC timers. KVM's default value can be retrieved
+by KVM_CHECK_EXTENSION.
+
+Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
+core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
+
8. Other capabilities.
======================

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 23f25799c760..bf52c0ca7a56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4713,6 +4713,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MEMORY_FAULT_INFO:
r = 1;
break;
+ case KVM_CAP_X86_APIC_BUS_FREQUENCY:
+ r = APIC_BUS_CYCLE_NS_DEFAULT;
+ break;
case KVM_CAP_EXIT_HYPERCALL:
r = KVM_EXIT_HYPERCALL_VALID_MASK;
break;
@@ -6726,6 +6729,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_X86_APIC_BUS_FREQUENCY: {
+ u64 bus_cycle_ns = cap->args[0];
+ u64 unused;
+
+ r = -EINVAL;
+ /*
+ * Guard against overflow in tmict_to_ns(). 128 is the highest
+ * divide value that can be programmed in APIC_TDCR.
+ */
+ if (!bus_cycle_ns ||
+ check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
+ break;
+
+ r = 0;
+ mutex_lock(&kvm->lock);
+ if (!irqchip_in_kernel(kvm))
+ r = -ENXIO;
+ else if (kvm->created_vcpus)
+ r = -EINVAL;
+ else
+ kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+ mutex_unlock(&kvm->lock);
+ break;
+ }
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..49da7ec97bd5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_X86_APIC_BUS_FREQUENCY 236

struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.34.1


2024-03-21 16:48:06

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V4 1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V

From: Isaku Yamahata <[email protected]>

Remove APIC_BUS_FREQUENCY and calculate it based on nanoseconds per APIC
bus cycle. APIC_BUS_FREQUENCY is used only for HV_X64_MSR_APIC_FREQUENCY.
The MSR is not frequently read, calculate it every time.

There are two constants related to the APIC bus frequency:
APIC_BUS_FREQUENCY and APIC_BUS_CYCLE_NS.
Only one value is required because one can be calculated from the other:
APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.

Remove APIC_BUS_FREQUENCY and instead calculate it when needed.
This prepares for support of configurable APIC bus frequency by
requiring to change only a single variable.

Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
---

Changes v4:
- Modify subject to match custom: "KVM: x86/hyperv:" ->
"KVM: x86: hyper-v:" and use standard capitalization for Hyper-V
("hyper-v" -> "Hyper-V").
- Rework changelog to remove pronouns ("we").
- Change logic in commit log: "APIC bus cycles per NS" -> "nanoseconds
per APIC bus cycle".
- Fix typos. (Maxim and Xiaoyao)
- Add Maxim and Xiaoyao's Reviewed-by tags.
- Add Maxim's "Suggested-by".

Changes v3:
- Newly added according to Maxim Levitsky suggestion.

arch/x86/kvm/hyperv.c | 2 +-
arch/x86/kvm/lapic.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab..1030701db967 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1737,7 +1737,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = APIC_BUS_FREQUENCY;
+ data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..a20cb006b6c8 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -17,7 +17,6 @@
#define APIC_DEST_MASK 0x800

#define APIC_BUS_CYCLE_NS 1
-#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)

#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
--
2.34.1


2024-03-21 16:48:08

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V4 2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable

From: Isaku Yamahata <[email protected]>

Introduce the VM variable "nanoseconds per APIC bus cycle" in
preparation to make the APIC bus frequency configurable.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.

Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare
for allowing userspace to tell KVM to use the frequency that TDX mandates
instead of the default 1Ghz. Doing so ensures that the guest doesn't have
a conflicting view of the APIC bus frequency.

Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
---

Changes v4:
- Reword changelog to address comments related to "bus clock" vs
"core crystal clock" frequency. (Xiaoyao)
- Typo in changelog ("APIC APIC" -> "APIC").
- Change logic "APIC bus cycles per nsec" -> "nanoseconds per
APIC bus cycle".

Changes V3:
- Update commit message.
- Dropped apic_bus_frequency according to Maxim Levitsky.

Changes v2:
- No change.

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 ++-
arch/x86/kvm/lapic.c | 6 ++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/x86.c | 1 +
5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9e7b1a00e265..91ae7345700c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1359,6 +1359,7 @@ struct kvm_arch {

u32 default_tsc_khz;
bool user_set_tsc;
+ u64 apic_bus_cycle_ns;

seqcount_raw_spinlock_t pvclock_sc;
bool use_master_clock;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 1030701db967..5c31e715d2ad 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1737,7 +1737,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
+ data = div64_u64(1000000000ULL,
+ vcpu->kvm->arch.apic_bus_cycle_ns);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 11b9a9bdc07a..f336a0ca1190 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1547,7 +1547,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
remaining = 0;

ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
- return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
+ return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ apic->divide_count));
}

static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1965,7 +1966,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)

static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
{
- return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
+ return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ (u64)apic->divide_count;
}

static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a20cb006b6c8..51e09f5a7fc5 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -16,7 +16,7 @@
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800

-#define APIC_BUS_CYCLE_NS 1
+#define APIC_BUS_CYCLE_NS_DEFAULT 1

#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66c4381460dc..23f25799c760 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12559,6 +12559,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);

kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
+ kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
kvm->arch.guest_can_read_msr_platform_info = true;
kvm->arch.enable_pmu = enable_pmu;

--
2.34.1


2024-03-21 16:48:38

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V4 4/4] KVM: selftests: Add test for configure of x86 APIC bus frequency

From: Isaku Yamahata <[email protected]>

Test if the APIC bus clock frequency is the expected configured value.

Set APIC timer's initial count to the maximum value and busy wait for 100
msec (any value is okay) with TSC value. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---

Changes v4:
- Rework changelog.
- Add Sean's "Suggested-by" to acknowledge guidance received in
https://lore.kernel.org/all/[email protected]/
- Add copyright.
- Add test description to file header.
- Consistent capitalization for acronyms.
- Rebase to kvm-x86/next.
- Update to v4 change of providing bus clock rate in nanoseconds.
- Add a "TEST_REQUIRE()" for the new capability so that the test can
work on kernels that do not support the new capability.
- Address checkpatch warnings and use tabs instead of spaces in header
file to match existing code.

Changes v3:
- Use 1.5GHz instead of 1GHz as frequency.

Changes v2:
- Newly added.

tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 166 ++++++++++++++++++
3 files changed, 174 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index da20e6bb43ed..05b9a942a878 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -110,6 +110,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..b0d2fc62e172 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,13 @@
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
#define SET_APIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVT0 0x350
+#define APIC_LVT_TIMER_ONESHOT (0 << 17)
+#define APIC_LVT_TIMER_PERIODIC (1 << 17)
+#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
+#define APIC_TMICT 0x380
+#define APIC_TMCCT 0x390
+#define APIC_TDCR 0x3E0

void apic_disable(void);
void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..ee0cae056827
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test configure of APIC bus frequency.
+ *
+ * Copyright (c) 2024 Intel Corporation
+ *
+ * To verify if the APIC bus frequency can be configured this test starts
+ * by setting the TSC frequency in KVM, and then:
+ * For every APIC timer frequency supported:
+ * * In the guest:
+ * * * Start the APIC timer by programming the APIC TMICT (initial count
+ * register) to the largest value possible to guarantee that it will
+ * not expire during the test,
+ * * * Wait for a known duration based on previously set TSC frequency,
+ * * * Stop the timer and read the APIC TMCCT (current count) register to
+ * determine the count at that time (TMCCT is loaded from TMICT when
+ * TMICT is programmed and then starts counting down).
+ * * In the host:
+ * * * Determine if the APIC counts close to configured APIC bus frequency
+ * while taking into account how the APIC timer frequency was modified
+ * using the APIC TDCR (divide configuration register).
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "apic.h"
+#include "test_util.h"
+
+/*
+ * Pick one convenient value, 1.5GHz. No special meaning and different from
+ * the default value, 1GHz.
+ */
+#define TSC_HZ (1500 * 1000 * 1000ULL)
+
+/* Wait for 100 msec, not too long, not too short value. */
+#define LOOP_MSEC 100ULL
+#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
+
+/*
+ * Pick a typical value, 25MHz. Different enough from the default value, 1GHz.
+ */
+#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
+
+static void guest_code(void)
+{
+ /*
+ * Possible TDCR values and its divide count. Used to modify APIC
+ * timer frequency.
+ */
+ struct {
+ u32 tdcr;
+ u32 divide_count;
+ } tdcrs[] = {
+ {0x0, 2},
+ {0x1, 4},
+ {0x2, 8},
+ {0x3, 16},
+ {0x8, 32},
+ {0x9, 64},
+ {0xa, 128},
+ {0xb, 1},
+ };
+
+ u32 tmict, tmcct;
+ u64 tsc0, tsc1;
+ int i;
+
+ asm volatile("cli");
+
+ xapic_enable();
+
+ /*
+ * Setup one-shot timer. The vector does not matter because the
+ * interrupt does not fire.
+ */
+ xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ xapic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for LOOP_MSEC */
+ tsc0 = rdtsc();
+ tsc1 = tsc0;
+ while (tsc1 - tsc0 < TSC_WAIT_DELTA)
+ tsc1 = rdtsc();
+
+ /* Read APIC timer and TSC */
+ tmcct = xapic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer */
+ xapic_write_reg(APIC_TMICT, 0);
+
+ /* Report it. */
+ GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
+ tsc1 - tsc0, 0, 0);
+ }
+
+ GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+ bool done = false;
+ struct ucall uc;
+
+ while (!done) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ done = true;
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_SYNC: {
+ u32 divide_counter = uc.args[1];
+ u32 apic_cycles = uc.args[2];
+ u64 tsc_cycles = uc.args[3];
+ u64 freq;
+
+ TEST_ASSERT(tsc_cycles > 0,
+ "TSC cycles must not be zero.");
+
+ /* Allow 1% slack. */
+ freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
+ TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
+ "APIC bus clock frequency is too large");
+ TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
+ "APIC bus clock frequency is too small");
+ break;
+ }
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_APIC_BUS_FREQUENCY));
+
+ vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
+ vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *)(TSC_HZ / 1000));
+ /*
+ * KVM_CAP_X86_APIC_BUS_FREQUENCY expects APIC bus clock rate in
+ * nanoseconds and requires that no vCPU is created.
+ */
+ vm_enable_cap(vm, KVM_CAP_X86_APIC_BUS_FREQUENCY,
+ NSEC_PER_SEC / APIC_BUS_CLOCK_FREQ);
+ vcpu = vm_vcpu_add(vm, 0, guest_code);
+
+ virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+ test_apic_bus_clock(vcpu);
+ kvm_vm_free(vm);
+}
--
2.34.1


2024-04-16 17:08:19

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V

On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Remove APIC_BUS_FREQUENCY and calculate it based on nanoseconds per APIC
> bus cycle.  APIC_BUS_FREQUENCY is used only for HV_X64_MSR_APIC_FREQUENCY.
> The MSR is not frequently read, calculate it every time.
>
> There are two constants related to the APIC bus frequency:
> APIC_BUS_FREQUENCY and APIC_BUS_CYCLE_NS.
> Only one value is required because one can be calculated from the other:
>    APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
>
> Remove APIC_BUS_FREQUENCY and instead calculate it when needed.
> This prepares for support of configurable APIC bus frequency by
> requiring to change only a single variable.
>
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Reviewed-by: Xiaoyao Li <[email protected]>
> [reinette: rework changelog]
> Signed-off-by: Reinette Chatre <[email protected]>

Reviewed-by: Rick Edgecombe <[email protected]>

2024-04-16 17:08:40

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable

On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Introduce the VM variable "nanoseconds per APIC bus cycle" in
> preparation to make the APIC bus frequency configurable.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
>     "The APIC timer frequency will be the processor’s bus clock or core
>      crystal clock frequency (when TSC/core crystal clock ratio is
>      enumerated in CPUID leaf 0x15) divided by the value specified in
>      the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare
> for allowing userspace to tell KVM to use the frequency that TDX mandates
> instead of the default 1Ghz. Doing so ensures that the guest doesn't have
> a conflicting view of the APIC bus frequency.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> [reinette: rework changelog]
> Signed-off-by: Reinette Chatre <[email protected]>

Reviewed-by: Rick Edgecombe <[email protected]>

2024-04-16 17:09:00

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer

On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
>     "The APIC timer frequency will be the processor’s bus clock or core
>      crystal clock frequency (when TSC/core crystal clock ratio is
>      enumerated in CPUID leaf 0x15) divided by the value specified in
>      the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
>
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
>    Pro: It resembles the existing hardware.  The recent Intel CPUs
>         adapts 25MHz.
>    Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
>    frequency or not enumerate it.
>    Pro: Any APIC bus frequency is allowed.
>    Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>    Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>    as 1 GHz.
>    Pro: This has been the virtual APIC frequency for KVM guests for 13
>         years.
>    Pro: This requires changing only one hard-coded constant in TDX.
>    Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
>    Con: Core crystal clock frequency is also used to calculate TSC
>         frequency.
>    Con: If it is configured to value different from hardware, it will
>         break the correctness of INTEL-PT Mini Time Count (MTC) packets
>         in TDs.
>
> Reported-by: Vishal Annapurve <[email protected]>
> Closes:
> https://lore.kernel.org/lkml/[email protected]/

Is Closes appropriate, given the issue Vishal hit was on non-upstream code?

> Signed-off-by: Isaku Yamahata <[email protected]>
> Co-developed-by: Reinette Chatre <[email protected]>
> Signed-off-by: Reinette Chatre <[email protected]>

Reviewed-by: Rick Edgecombe <[email protected]>

2024-04-16 17:10:01

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
>
> Summary
> -------
> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.

Looks good to me and...
Tested-by: Rick Edgecombe <[email protected]>

The only thing missing is actually integrating it into TDX qemu patches and
testing the resulting TD. I think we are making a fair assumption that the
problem should be resolved based on the analysis, but we have not actually
tested that part. Is that right? What do you think?

2024-04-16 21:38:41

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V4 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer

Hi Rick,

On 4/16/2024 10:08 AM, Edgecombe, Rick P wrote:
> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
>> From: Isaku Yamahata <[email protected]>
>>
>> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
>> bus clock frequency for APIC timer emulation.
>> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
>> frequency in nanoseconds. When using this capability, the user space
>> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>>
>> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
>> frequency but ends up getting interrupts at a significantly higher rate.
>>
>> The TDX architecture hard-codes the core crystal clock frequency to
>> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
>> does not allow the VMM to override the value.
>>
>> In addition, per Intel SDM:
>>     "The APIC timer frequency will be the processor’s bus clock or core
>>      crystal clock frequency (when TSC/core crystal clock ratio is
>>      enumerated in CPUID leaf 0x15) divided by the value specified in
>>      the divide configuration register."
>>
>> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
>> APIC bus frequency of 1GHz.
>>
>> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
>> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
>> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
>> the guest kernel measures the frequency based on other known timers like
>> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
>> kernel expects a 25MHz timer frequency but gets timer interrupt more
>> frequently due to the 1GHz frequency used by KVM.
>>
>> To ensure that the guest doesn't have a conflicting view of the APIC bus
>> frequency, allow the userspace to tell KVM to use the same frequency that
>> TDX mandates instead of the default 1Ghz.
>>
>> There are several options to address this:
>> 1. Make the KVM able to configure APIC bus frequency (this series).
>>    Pro: It resembles the existing hardware.  The recent Intel CPUs
>>         adapts 25MHz.
>>    Con: Require the VMM to emulate the APIC timer at 25MHz.
>> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
>>    frequency or not enumerate it.
>>    Pro: Any APIC bus frequency is allowed.
>>    Con: Deviates from TDX architecture.
>> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>>    Con: The kernel ignores CPUID leaf 0x15.
>> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>>    as 1 GHz.
>>    Pro: This has been the virtual APIC frequency for KVM guests for 13
>>         years.
>>    Pro: This requires changing only one hard-coded constant in TDX.
>>    Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
>>    Con: Core crystal clock frequency is also used to calculate TSC
>>         frequency.
>>    Con: If it is configured to value different from hardware, it will
>>         break the correctness of INTEL-PT Mini Time Count (MTC) packets
>>         in TDs.
>>
>> Reported-by: Vishal Annapurve <[email protected]>
>> Closes:
>> https://lore.kernel.org/lkml/[email protected]/
>
> Is Closes appropriate, given the issue Vishal hit was on non-upstream code?

The issue that Vishal encountered was root-caused to the APIC bus frequency
conflict between KVM and TDX that has not changed since the original report.

It would be great if this submission can get confirmation that this series addresses
the issues encountered originally. From what I can tell there has not been
any formal accept/reject of this solution (apart from agreeing that it is the
right approach [1]) since the original report.

Reinette

[1] https://lore.kernel.org/lkml/CAGtprH8-jiC+wsy2LgmZimrRUT6kuntD6EJso2Mvx5Y42za9Dw@mail.gmail.com/

2024-04-17 03:37:54

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH V4 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer

On 3/22/2024 12:37 AM, Reinette Chatre wrote:

..

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b5a33ee71ee..20080fe4b8ee 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>
> See KVM_EXIT_MEMORY_FAULT for more information.
>
> +7.35 KVM_CAP_X86_APIC_BUS_FREQUENCY

As sean mentioned it previous comment, I would be the one prefers
KVM_CAP_X86_APIC_BUS_CYCLES_NS

Depending on whether people get hung up on nanoseconds not being a
"frequency", either KVM_CAP_X86_APIC_BUS_FREQUENCY or
KVM_CAP_X86_APIC_BUS_CYCLES_NS.

2024-04-24 16:13:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Tue, Apr 16, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> >
> > Summary
> > -------
> > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> > bus clock frequency for APIC timer emulation.
> > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> > frequency in nanoseconds. When using this capability, the user space
> > VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Looks good to me and...
> Tested-by: Rick Edgecombe <[email protected]>
>
> The only thing missing is actually integrating it into TDX qemu patches and
> testing the resulting TD. I think we are making a fair assumption that the
> problem should be resolved based on the analysis, but we have not actually
> tested that part. Is that right?

Please tell me that Rick is wrong, and that this actually has been tested with
a TDX guest. I don't care _who_ tested it, or with what VMM it has been tested,
but _someone_ needs to verify that this actually fixes the TDX issue.

2024-04-24 16:42:34

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote:
> On Tue, Apr 16, 2024, Rick P Edgecombe wrote:
> > On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> > >
> > > Summary
> > > -------
> > > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> > > bus clock frequency for APIC timer emulation.
> > > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> > > frequency in nanoseconds. When using this capability, the user space
> > > VMM should configure CPUID leaf 0x15 to advertise the frequency.
> >
> > Looks good to me and...
> > Tested-by: Rick Edgecombe <[email protected]>
> >
> > The only thing missing is actually integrating it into TDX qemu patches and
> > testing the resulting TD. I think we are making a fair assumption that the
> > problem should be resolved based on the analysis, but we have not actually
> > tested that part. Is that right?
>
> Please tell me that Rick is wrong, and that this actually has been tested with
> a TDX guest.  I don't care _who_ tested it, or with what VMM it has been
> tested,
> but _someone_ needs to verify that this actually fixes the TDX issue.

It is in the process of getting a TDX test developed (or rather updated).
Agreed, it requires verification that it fixes the original TDX issue. That is
why I raised it.

Reinette was working on this internally and some iterations were happening, but
we are trying to work on the public list as much as possible per your earlier
comments. So that is why she posted it.

There was at least some level of TDX integration in the past. I'm not sure what
exactly was tested, but we are going to re-verify it with the latest everything.

2024-04-24 17:25:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Wed, Apr 24, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote:
> > On Tue, Apr 16, 2024, Rick P Edgecombe wrote:
> > > On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> > > >
> > > > Summary
> > > > -------
> > > > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> > > > bus clock frequency for APIC timer emulation.
> > > > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> > > > frequency in nanoseconds. When using this capability, the user space
> > > > VMM should configure CPUID leaf 0x15 to advertise the frequency.
> > >
> > > Looks good to me and...
> > > Tested-by: Rick Edgecombe <[email protected]>
> > >
> > > The only thing missing is actually integrating it into TDX qemu patches and
> > > testing the resulting TD. I think we are making a fair assumption that the
> > > problem should be resolved based on the analysis, but we have not actually
> > > tested that part. Is that right?
> >
> > Please tell me that Rick is wrong, and that this actually has been tested with
> > a TDX guest.  I don't care _who_ tested it, or with what VMM it has been
> > tested, but _someone_ needs to verify that this actually fixes the TDX issue.
>
> It is in the process of getting a TDX test developed (or rather updated).
> Agreed, it requires verification that it fixes the original TDX issue. That is
> why I raised it.
>
> Reinette was working on this internally and some iterations were happening, but
> we are trying to work on the public list as much as possible per your earlier
> comments. So that is why she posted it.

I have no problem posting "early", but Documentation/process/maintainer-kvm-x86.rst
clearly states under Testing that:

If you can't fully test a change, e.g. due to lack of hardware, clearly state
what level of testing you were able to do, e.g. in the cover letter.

I was assuming that this was actually *fully* tested, because nothing suggests
otherwise. And _that_ is a problem, e.g. I was planning on applying this series
for 6.10, which would have made for quite the mess if it turns out that this
doesn't actually solve the TDX problem.

> There was at least some level of TDX integration in the past. I'm not sure what
> exactly was tested, but we are going to re-verify it with the latest everything.

Honest question, is it a big lift to re-test the QEMU+KVM TDX changes, e.g. to
verify this new capability actually does what we hope it does? If testing is a
big lift, what are the pain points? Or perhaps a better question is, is there
anything we (both upstream people, and end users like Google) can do to make
re-testing less awful?

2024-04-24 18:10:39

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

+Vishal

On Wed, 2024-04-24 at 10:23 -0700, Sean Christopherson wrote:
> I have no problem posting "early", but Documentation/process/maintainer-kvm-
> x86.rst
> clearly states under Testing that:
>
>   If you can't fully test a change, e.g. due to lack of hardware, clearly
> state
>   what level of testing you were able to do, e.g. in the cover letter.
>
> I was assuming that this was actually *fully* tested, because nothing suggests
> otherwise.  And _that_ is a problem, e.g. I was planning on applying this
> series
> for 6.10, which would have made for quite the mess if it turns out that this
> doesn't actually solve the TDX problem.

Ok, sorry. Will definitely be clear about this in the future. There is a little
bit of chaos on our end right now as new people spin up and we adjust our
working model to be more upstream focused. Thanks for being clear.

Yes, please don't apply until we have the full testing done. It may not be far
away though, per Reinette.

>
> > There was at least some level of TDX integration in the past. I'm not sure
> > what
> > exactly was tested, but we are going to re-verify it with the latest
> > everything.
>
> Honest question, is it a big lift to re-test the QEMU+KVM TDX changes, e.g. to
> verify this new capability actually does what we hope it does?  If testing is
> a
> big lift, what are the pain points?  Or perhaps a better question is, is there
> anything we (both upstream people, and end users like Google) can do to make
> re-testing less awful?

I wouldn't say a big lift, but probably more than usual. Most of the testing
challenges come from updating and matching specific, often out of tree
components. We need to have specific OVMF, TDX module, QEMU, KVM core patches,
KVM breakout series, and tests versions that all match.

To wrangle it, automated testing is something we are working on internally right
now. I can't think of anything to ask of upstream specifically. But Vishal
might.

As for Google, the TDX selftests are useful. We need to update them ourselves to
keep up with uAPI changes. We could do a little more co-development on those? As
in, not wait until we post new versions to get updates from Google's side. Just
an idea off the top of my head.

As for the TDX kvm unit tests updates [0]. They have not had much review. I
think we maybe have enough TDX patches to focus on already though.

Long term though, I have been wondering about how to prevent TDX regressions
especially on the MMU pieces. It is one thing to have the TDX setups available
for maintainers, but most normal developers will likely not have access to TDX
HW for a bit. Just a problem without a solution.

[0] https://lore.kernel.org/kvm/[email protected]/#t

2024-04-24 19:57:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Wed, Apr 24, 2024, Rick P Edgecombe wrote:
> Long term though, I have been wondering about how to prevent TDX regressions
> especially on the MMU pieces. It is one thing to have the TDX setups available
> for maintainers, but most normal developers will likely not have access to TDX
> HW for a bit. Just a problem without a solution.

I wouldn't worry too much about hardware availability. As you said, it's not
a problem we can really solve, and we already have to be concious of the fact
that not all developers have comparable hardware. E.g. most people don't have
a 4-sock, multi-hundred CPU system with TiBs of RAM. Not being able to test at
all is obviously a little different, but it's not entirely new.

Instead, I would encourage spending time and effort (after things have settled
down patch wise) to build out selftests. I tried to run a "real" SEV-ES VM
and gave up because I needed the "right" OVMF build, blah blah blah. At some
point I'll probably bite the bullet and get a "full" CoCo setup working, but it's
not exactly at the top of my todo list, in no small part because the triage and
debug experience when things go wrong is miles and miles better in selftests.

2024-04-24 21:08:01

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

(+Yao Yuan)

Hi Sean and Rick,

On 4/24/2024 9:38 AM, Edgecombe, Rick P wrote:
> On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote:
>> On Tue, Apr 16, 2024, Rick P Edgecombe wrote:
>>> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
>>>>
>>>> Summary
>>>> -------
>>>> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
>>>> bus clock frequency for APIC timer emulation.
>>>> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
>>>> frequency in nanoseconds. When using this capability, the user space
>>>> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>>>
>>> Looks good to me and...
>>> Tested-by: Rick Edgecombe <[email protected]>
>>>
>>> The only thing missing is actually integrating it into TDX qemu patches and
>>> testing the resulting TD. I think we are making a fair assumption that the
>>> problem should be resolved based on the analysis, but we have not actually
>>> tested that part. Is that right?
>>
>> Please tell me that Rick is wrong, and that this actually has been tested with
>> a TDX guest.  I don't care _who_ tested it, or with what VMM it has been
>> tested,
>> but _someone_ needs to verify that this actually fixes the TDX issue.
>
> It is in the process of getting a TDX test developed (or rather updated).
> Agreed, it requires verification that it fixes the original TDX issue. That is
> why I raised it.
>
> Reinette was working on this internally and some iterations were happening, but
> we are trying to work on the public list as much as possible per your earlier
> comments. So that is why she posted it.
>
> There was at least some level of TDX integration in the past. I'm not sure what
> exactly was tested, but we are going to re-verify it with the latest everything.

Apologies for the delay. I am the one needing to do this testing and it took me a while
to ramp up on all the parts (and I am still learning).

I encountered quite the roadblock (for me) along the way that was caused by a lingering
timer (presumably left by TDVF). Thank you so much to Isaku and Yao Yuan for helping me
to root cause this. I believe that this is unique to the kvm-unit-tests that does
not reset the environment like the OS.

A modified x86/apic.c:test_apic_timer_one_shot() was used to test this feature. Below I
provide the diff of essential parts against
https://github.com/intel/kvm-unit-tests-tdx/blob/tdx/x86/apic.c for your reference. With
these modifications it can be confirmed that the test within a TD fails without the work
in this series, and passes with it. This was tested against a host kernel running a
snapshot of the ongoing KVM TDX work and corresponding QEMU changes (including a QEMU
change that enables the new capability introduced in this series).

Below are the core changes made to the existing APIC test. The two major changes are:
(a) stop any lingering timers before the test starts, (b) use CPUID 0x15 in TDX to
accurately determine the TSC and APIC frequencies instead of making 1GHz assumption
and use similar check as the kselftest test introduced in this series (I did have to
increase the amount with which the frequency is allowed to deviate by 1% in my testing).

Please note that there are some more changes needed to run this test in TDX since all
APIC tests are not appropriate for TDX. This snippet was used in my testing and I
will work with kvm-unit-test folks on the next steps to have it integrated.


@@ -477,11 +478,29 @@ static void lvtt_handler(isr_regs_t *regs)

static void test_apic_timer_one_shot(void)
{
- uint64_t tsc1, tsc2;
static const uint32_t interval = 0x10000;
+ uint64_t measured_apic_freq, tsc2, tsc1;
+ uint32_t tsc_freq = 0, apic_freq = 0;
+ struct cpuid cpuid_tsc = {};

#define APIC_LVT_TIMER_VECTOR (0xee)

+ /*
+ * CPUID 0x15 is not available in VMX, can use it to obtain
+ * TSC and APIC frequency for accurate testing
+ */
+ if (is_tdx_guest()) {
+ cpuid_tsc = raw_cpuid(0x15, 0);
+ tsc_freq = cpuid_tsc.c * cpuid_tsc.b / cpuid_tsc.a;
+ apic_freq = cpuid_tsc.c;
+ }
+ /*
+ stop already fired local timer
+ the test case can be negative failure if the timer fired
+ after installed lvtt_handler but *before*
+ write to TIMICT again.
+ */
+ apic_write(APIC_TMICT, 0);
handle_irq(APIC_LVT_TIMER_VECTOR, lvtt_handler);

/* One shot mode */
@@ -503,8 +522,16 @@ static void test_apic_timer_one_shot(void)
* cases, the following should satisfy on all modern
* processors.
*/
- report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
- "APIC LVT timer one shot");
+ if (is_tdx_guest()) {
+ measured_apic_freq = interval * (tsc_freq / (tsc2 - tsc1));
+ report((lvtt_counter == 1) &&
+ (measured_apic_freq < apic_freq * 102 / 100) &&
+ (measured_apic_freq > apic_freq * 98 / 100),
+ "APIC LVT timer one shot");
+ } else {
+ report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval),
+ "APIC LVT timer one shot");
+ }
}


Reinette

2024-04-24 21:21:14

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

Hi Sean,

On 4/24/2024 10:23 AM, Sean Christopherson wrote:
> On Wed, Apr 24, 2024, Rick P Edgecombe wrote:
>> On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote:
>>> On Tue, Apr 16, 2024, Rick P Edgecombe wrote:
>>>> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
>>>>>
>>>>> Summary
>>>>> -------
>>>>> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
>>>>> bus clock frequency for APIC timer emulation.
>>>>> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
>>>>> frequency in nanoseconds. When using this capability, the user space
>>>>> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>>>>
>>>> Looks good to me and...
>>>> Tested-by: Rick Edgecombe <[email protected]>
>>>>
>>>> The only thing missing is actually integrating it into TDX qemu patches and
>>>> testing the resulting TD. I think we are making a fair assumption that the
>>>> problem should be resolved based on the analysis, but we have not actually
>>>> tested that part. Is that right?
>>>
>>> Please tell me that Rick is wrong, and that this actually has been tested with
>>> a TDX guest.  I don't care _who_ tested it, or with what VMM it has been
>>> tested, but _someone_ needs to verify that this actually fixes the TDX issue.
>>
>> It is in the process of getting a TDX test developed (or rather updated).
>> Agreed, it requires verification that it fixes the original TDX issue. That is
>> why I raised it.
>>
>> Reinette was working on this internally and some iterations were happening, but
>> we are trying to work on the public list as much as possible per your earlier
>> comments. So that is why she posted it.
>
> I have no problem posting "early", but Documentation/process/maintainer-kvm-x86.rst
> clearly states under Testing that:
>
> If you can't fully test a change, e.g. due to lack of hardware, clearly state
> what level of testing you were able to do, e.g. in the cover letter.
>
> I was assuming that this was actually *fully* tested, because nothing suggests
> otherwise. And _that_ is a problem, e.g. I was planning on applying this series
> for 6.10, which would have made for quite the mess if it turns out that this
> doesn't actually solve the TDX problem.

There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1]

I'd be happy to resubmit with the name changed but after reading your statement above it
is not clear to me what name is preferred: KVM_CAP_X86_APIC_BUS_FREQUENCY
as used in this series that seem to meet your approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS.

Please let me know what you prefer.

Thank you.

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

2024-04-25 16:17:48

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

On Wed, Apr 24, 2024, Reinette Chatre wrote:
> There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1]
>
> I'd be happy to resubmit with the name changed but after reading your
> statement above it is not clear to me what name is preferred:
> KVM_CAP_X86_APIC_BUS_FREQUENCY as used in this series that seem to meet your
> approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS.
>
> Please let me know what you prefer.

Both work for me, I don't have a strong preference.

2024-04-25 16:58:42

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V4 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable

Hi Sean,

On 4/25/2024 9:17 AM, Sean Christopherson wrote:
> On Wed, Apr 24, 2024, Reinette Chatre wrote:
>> There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1]
>>
>> I'd be happy to resubmit with the name changed but after reading your
>> statement above it is not clear to me what name is preferred:
>> KVM_CAP_X86_APIC_BUS_FREQUENCY as used in this series that seem to meet your
>> approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS.
>>
>> Please let me know what you prefer.
>
> Both work for me, I don't have a strong preference.

Thank you. I'll resubmit with the capability name changed to
KVM_CAP_X86_APIC_BUS_CYCLES_NS. This is the only planned change.

Reinette