2023-12-19 08:35:40

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v3 0/4] KVM: X86: Make bus clock frequency for vapic timer configurable

Changes from v2:
- Removed APIC_BUS_FREQUENCY and apic_bus_frequency of struct kvm-arch.
- Update the commit messages.
- Added reviewed-by (Maxim Levitsky)
- Use 1.5GHz instead of 1GHz as frequency for the test case.

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation. Allow
KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREUQNCY_CONTROL) to set the
frequency. When using this capability, the user space VMM should configure
CPUID[0x15] to advertise the frequency.

The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
CPUID leaf 0x15. The TDX mandates it to be exposed and doesn't allow the
VMM to override its value. The KVM APIC timer emulation hard-codes the
frequency to 1GHz. The KVM doesn't unconditionally enumerate it to the
guest unless the user space VMM sets the CPUID leaf 0x15 by KVM_SET_CPUID.

If the CPUID leaf 0x15 is enumerated, the guest kernel uses it as the APIC bus
frequency. If not, the guest kernel measures the frequency based on other known
timers like the ACPI timer or the legacy PIT. The TDX guest kernel gets timer
interrupt more times by 1GHz / 25MHz. [1] T o ensure that the guest doesn't have
a conflicting view of the APIC bus frequency, allow the userspace to tell KVM to
use the same frequency that TDX mandates instead of the default 1Ghz.

There are several options to address this.
1. Make the KVM able to configure APIC bus frequency (This patch).
Pro: It resembles the existing hardware. The recent Intel CPUs
adapts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviation from the real hardware.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID.15H under TDX to report the crystal clock frequency as
1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.

[1] https://lore.kernel.org/lkml/[email protected]/

Changes from v1:
https://lore.kernel.org/all/[email protected]/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst

Isaku Yamahata (4):
KVM: x86/hyperv: Calculate APIC bus frequency for hyper-v
KVM: x86: Make the APIC bus cycles per nsec VM variable
KVM: X86: Add a capability to configure bus frequency for APIC timer
KVM: selftests: Add test case for x86 apic_bus_clock_frequency

Documentation/virt/kvm/api.rst | 14 ++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 +-
arch/x86/kvm/lapic.c | 6 +-
arch/x86/kvm/lapic.h | 3 +-
arch/x86/kvm/x86.c | 34 +++++
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 135 ++++++++++++++++++
10 files changed, 200 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c


base-commit: ceb6a6f023fd3e8b07761ed900352ef574010bcb
--
2.25.1



2023-12-19 08:35:54

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v3 1/4] KVM: x86/hyperv: Calculate APIC bus frequency for hyper-v

Remove APIC_BUS_FREUQNCY and calculate it based on APIC bus cycles per NS.
APIC_BUS_FREUQNCY is used only for HV_X64_MSR_APIC_FREQUENCY. The MSR is
not frequently read, calculate it every time.

In order to make APIC bus frequency configurable, we need to make make two
related constants into variables. APIC_BUS_FREUQNCY and APIC_BUS_CYCLE_NS.
One can be calculated from the other.
APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
By removing APIC_BUS_FREQUENCY, we need to track only single variable
instead of two.

Signed-off-by: Isaku Yamahata <[email protected]>
---
Changes v3:
- Newly added according to Maxim Levistsky suggestion.
---
arch/x86/kvm/hyperv.c | 2 +-
arch/x86/kvm/lapic.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 238afd7335e4..a40ca2fef58c 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = APIC_BUS_FREQUENCY;
+ data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..a20cb006b6c8 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -17,7 +17,6 @@
#define APIC_DEST_MASK 0x800

#define APIC_BUS_CYCLE_NS 1
-#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)

#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
--
2.25.1


2023-12-19 08:36:19

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v3 3/4] KVM: X86: Add a capability to configure bus frequency for APIC timer

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation. Allow
KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREQUENCY_CONTROL) to set the
frequency.

The TDX architecture hard-codes the APIC bus frequency to 25MHz. The TDX
mandates it to be enumerated in CPUIID leaf 0x15 and doesn't allow the VMM
to override its value. The KVM APIC timer emulation hard-codes the
frequency to 1GHz. The KVM doesn't enumerate it to the guest unless the
user space VMM sets the CPUID leaf 0x15 by KVM_SET_CPUID. If the CPUID
leaf 0x15 is enumerated, the guest kernel uses it as the APIC bus
frequency. If not, the guest kernel measures the frequency based on other
known timers like the ACPI timer or the legacy PIT. The TDX guest kernel
gets timer interrupt more times by (1GHz as the frequency KVM used) /
(25MHz as TDX CPUID enumerates). To ensure that the guest doesn't have a
conflicting view of the APIC bus frequency, allow the userspace to tell KVM
to use the same frequency that TDX mandates instead of the default 1Ghz.

There are several options to address this.
1. Make the KVM able to configure APIC bus frequency (This patch).
Pro: It resembles the existing hardware. The recent Intel CPUs
adapts 25MHz.
Con: Require the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate CPUID 0x15 to configurable
frequency or not enumerate it.
Pro: Any APIC bus frequency is allowed.
Con: Deviation from the real hardware.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
Con: The kernel ignores CPUID leaf 0x15.
4. Change CPUID.15H under TDX to report the crystal clock frequency as
1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13
years.
Pro: This requires changing only one hard-coded constant in TDX.
Con: It doesn't work with other VMMs as TDX isn't specific to KVM.

This patch doesn't affect the TSC deadline timer emulation. The APIC timer
emulation path calculates the TSC value from the TMICT register value and
uses the TSC deadline timer path. This patch touches only the APIC
timer-specific code.

[1] https://lore.kernel.org/lkml/[email protected]/
Reported-by: Vishal Annapurve <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>

---
Changes v3:
- Added reviewed-by Maxim Levitsky.
- minor update of the commit message.

Changes v2:
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Fix build error for i386.
- Add document to api.rst.
- typo in the commit message.
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++++
arch/x86/kvm/x86.c | 33 +++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 48 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7025b3751027..cc976df2651e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
cause CPU stuck (due to event windows don't open up) and make the CPU
unavailable to host or other VMs.

+7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
+--------------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the value of apic bus clock frequency
+:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
+ frequency, or -ENXIO if virtual local APIC isn't enabled by
+ KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.
+
+This capability sets the APIC bus clock frequency (or core crystal clock
+frequency) for kvm to emulate APIC in the kernel. The default value is 1000000
+(1GHz).
+
8. Other capabilities.
======================

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7d865f7c847..97f81d612366 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ENABLE_CAP:
case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
case KVM_CAP_IRQFD_RESAMPLE:
+ case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
@@ -6616,6 +6617,38 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
+ u64 bus_frequency = cap->args[0];
+ u64 bus_cycle_ns;
+
+ if (!bus_frequency)
+ return -EINVAL;
+ /* CPUID[0x15] only support 32bits. */
+ if (bus_frequency != (u32)bus_frequency)
+ return -EINVAL;
+
+ /* Cast to avoid 64bit division on 32bit platform. */
+ bus_cycle_ns = 1000000000UL / (u32)bus_frequency;
+ if (!bus_cycle_ns)
+ return -EINVAL;
+
+ r = 0;
+ mutex_lock(&kvm->lock);
+ /*
+ * Don't allow to change the frequency dynamically during vcpu
+ * running to avoid potentially bizarre behavior.
+ */
+ if (kvm->created_vcpus)
+ r = -EBUSY;
+ /* This is for in-kernel vAPIC emulation. */
+ else if (!irqchip_in_kernel(kvm))
+ r = -ENXIO;
+
+ if (!r)
+ kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+ mutex_unlock(&kvm->lock);
+ return r;
+ }
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 211b86de35ac..d74a057df173 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1201,6 +1201,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
#define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
#define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
+#define KVM_CAP_X86_BUS_FREQUENCY_CONTROL 231

#ifdef KVM_CAP_IRQ_ROUTING

--
2.25.1


2023-12-19 08:38:00

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v3 2/4] KVM: x86: Make the APIC bus cycles per nsec VM variable

Introduce the VM variable of the APIC cycles per nano second as the
preparation to make the APIC APIC bus frequency configurable.

The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
CPUID leaf 0x15. The TDX mandates it to be exposed and doesn't allow the
VMM to override its value. The KVM APIC timer emulation hard-codes the
frequency to 1GHz. To ensure that the guest doesn't have a conflicting
view of the APIC bus frequency, allow the userspace to tell KVM to use the
same frequency that TDX mandates instead of the default 1Ghz.

Signed-off-by: Isaku Yamahata <[email protected]>
---
Changes v3:
- Update commit message.
- Dropped apic_bus_frequency according to Maxim Levitsky.

Changes v2:
- no change.
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 3 ++-
arch/x86/kvm/lapic.c | 6 ++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/x86.c | 1 +
5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7036982332e..45ee7a1d13f6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1334,6 +1334,7 @@ struct kvm_arch {

u32 default_tsc_khz;
bool user_set_tsc;
+ u64 apic_bus_cycle_ns;

seqcount_raw_spinlock_t pvclock_sc;
bool use_master_clock;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a40ca2fef58c..37ff31c18ad1 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1687,7 +1687,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
- data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
+ data = div64_u64(1000000000ULL,
+ vcpu->kvm->arch.apic_bus_cycle_ns);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 245b20973cae..73956b0ac1f1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1542,7 +1542,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
remaining = 0;

ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
- return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
+ return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ apic->divide_count));
}

static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1960,7 +1961,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)

static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
{
- return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
+ return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+ (u64)apic->divide_count;
}

static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a20cb006b6c8..51e09f5a7fc5 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -16,7 +16,7 @@
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800

-#define APIC_BUS_CYCLE_NS 1
+#define APIC_BUS_CYCLE_NS_DEFAULT 1

#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3aaa7dafae..d7d865f7c847 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12466,6 +12466,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);

kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
+ kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
kvm->arch.guest_can_read_msr_platform_info = true;
kvm->arch.enable_pmu = enable_pmu;

--
2.25.1


2023-12-19 08:38:59

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v3 4/4] KVM: selftests: Add test case for x86 apic_bus_clock_frequency

Test if the apic bus clock frequency is exptected to the configured value.
Set APIC TMICT to the maximum value and busy wait for 100 msec (any value
is okay) with tsc value, and read TMCCT. Calculate apic bus clock frequency
based on TSC frequency.

Signed-off-by: Isaku Yamahata <[email protected]>
---
Changes v3:
- Use 1.5GHz instead of 1GHz as frequency.

Changes v2:
- Newly added.
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/apic.h | 7 +
.../kvm/x86_64/apic_bus_clock_test.c | 135 ++++++++++++++++++
3 files changed, 143 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 963435959a92..e07ec9c1dbd1 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -116,6 +116,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..866a58d5fa11 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,13 @@
#define APIC_VECTOR_MASK 0x000FF
#define APIC_ICR2 0x310
#define SET_APIC_DEST_FIELD(x) ((x) << 24)
+#define APIC_LVT0 0x350
+#define APIC_LVT_TIMER_ONESHOT (0 << 17)
+#define APIC_LVT_TIMER_PERIODIC (1 << 17)
+#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
+#define APIC_TMICT 0x380
+#define APIC_TMCCT 0x390
+#define APIC_TDCR 0x3E0

void apic_disable(void);
void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..e7896d703e7d
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "apic.h"
+#include "test_util.h"
+
+/*
+ * Pick one convenient value, 1.5Ghz. No special meaning and different from
+ * the default value, 1Ghz.
+ */
+#define TSC_HZ (1500 * 1000 * 1000ULL)
+
+/* Wait for 100 msec, not too long, not too short value. */
+#define LOOP_MSEC 100ULL
+#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
+
+/* Pick up typical value. Different enough from the default value, 1GHz. */
+#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
+
+static void guest_code(void)
+{
+ /* Possible tdcr values and its divide count. */
+ struct {
+ u32 tdcr;
+ u32 divide_count;
+ } tdcrs[] = {
+ {0x0, 2},
+ {0x1, 4},
+ {0x2, 8},
+ {0x3, 16},
+ {0x8, 32},
+ {0x9, 64},
+ {0xa, 128},
+ {0xb, 1},
+ };
+
+ u32 tmict, tmcct;
+ u64 tsc0, tsc1;
+ int i;
+
+ asm volatile("cli");
+
+ xapic_enable();
+
+ /*
+ * Setup one-shot timer. Because we don't fire the interrupt, the
+ * vector doesn't matter.
+ */
+ xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+ for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+ xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+ /* Set the largest value to not trigger the interrupt. */
+ tmict = ~0;
+ xapic_write_reg(APIC_TMICT, tmict);
+
+ /* Busy wait for LOOP_MSEC */
+ tsc0 = rdtsc();
+ tsc1 = tsc0;
+ while (tsc1 - tsc0 < TSC_WAIT_DELTA)
+ tsc1 = rdtsc();
+
+ /* Read apic timer and tsc */
+ tmcct = xapic_read_reg(APIC_TMCCT);
+ tsc1 = rdtsc();
+
+ /* Stop timer */
+ xapic_write_reg(APIC_TMICT, 0);
+
+ /* Report it. */
+ GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
+ tsc1 - tsc0, 0, 0);
+ }
+
+ GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+ bool done = false;
+ struct ucall uc;
+
+ while (!done) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ done = true;
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_SYNC: {
+ u32 divide_counter = uc.args[1];
+ u32 apic_cycles = uc.args[2];
+ u64 tsc_cycles = uc.args[3];
+ u64 freq;
+
+ TEST_ASSERT(tsc_cycles > 0,
+ "tsc cycles must not be zero.");
+
+ /* Allow 1% slack. */
+ freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
+ TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
+ "APIC bus clock frequency is too large");
+ TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
+ "APIC bus clock frequency is too small");
+ break;
+ }
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu;
+
+ vm = __vm_create(VM_MODE_DEFAULT, 1, 0);
+ vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *) (TSC_HZ / 1000));
+ /* KVM_CAP_X86_BUS_FREQUENCY_CONTROL requires that no vcpu is created. */
+ vm_enable_cap(vm, KVM_CAP_X86_BUS_FREQUENCY_CONTROL,
+ APIC_BUS_CLOCK_FREQ);
+ vcpu = vm_vcpu_add(vm, 0, guest_code);
+
+ virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+ test_apic_bus_clock(vcpu);
+ kvm_vm_free(vm);
+}
--
2.25.1


2023-12-21 05:27:19

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] KVM: x86/hyperv: Calculate APIC bus frequency for hyper-v

On 12/19/2023 4:34 PM, Isaku Yamahata wrote:
> Remove APIC_BUS_FREUQNCY and calculate it based on APIC bus cycles per NS.
> APIC_BUS_FREUQNCY is used only for HV_X64_MSR_APIC_FREQUENCY. The MSR is
> not frequently read, calculate it every time.
>
> In order to make APIC bus frequency configurable, we need to make make two

two 'make', please drop one.

> related constants into variables. APIC_BUS_FREUQNCY and APIC_BUS_CYCLE_NS.
> One can be calculated from the other.
> APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
> By removing APIC_BUS_FREQUENCY, we need to track only single variable
> instead of two.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Changes v3:
> - Newly added according to Maxim Levistsky suggestion.
> ---
> arch/x86/kvm/hyperv.c | 2 +-
> arch/x86/kvm/lapic.h | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 238afd7335e4..a40ca2fef58c 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
> data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
> break;
> case HV_X64_MSR_APIC_FREQUENCY:
> - data = APIC_BUS_FREQUENCY;
> + data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
> break;
> default:
> kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 0a0ea4b5dd8c..a20cb006b6c8 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -17,7 +17,6 @@
> #define APIC_DEST_MASK 0x800
>
> #define APIC_BUS_CYCLE_NS 1
> -#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
>
> #define APIC_BROADCAST 0xFF
> #define X2APIC_BROADCAST 0xFFFFFFFFul


2023-12-21 05:44:08

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] KVM: x86: Make the APIC bus cycles per nsec VM variable

On 12/19/2023 4:34 PM, Isaku Yamahata wrote:
> Introduce the VM variable of the APIC cycles per nano second as the
> preparation to make the APIC APIC bus frequency configurable.
>
> The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
> CPUID leaf 0x15.

The intend from TDX architecture is not to hard code APIC bus frequency.
It is just a side effect of "TDX architecture uses CPUID leaf 0x15 to
expose TSC frequency for TD guest and choose a hard-coded 25MHz (the
same as the hardware that supports TDX) as the core crystal frequency"

SDM says "The APIC timer frequency will be the processor’s bus clock or
core crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in the
divide configuration register."

> The TDX mandates it to be exposed and doesn't allow the
> VMM to override its value. The KVM APIC timer emulation hard-codes the
> frequency to 1GHz. To ensure that the guest doesn't have a conflicting
> view of the APIC bus frequency, allow the userspace to tell KVM to use the
> same frequency that TDX mandates instead of the default 1Ghz.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Changes v3:
> - Update commit message.
> - Dropped apic_bus_frequency according to Maxim Levitsky.
>
> Changes v2:
> - no change.
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/hyperv.c | 3 ++-
> arch/x86/kvm/lapic.c | 6 ++++--
> arch/x86/kvm/lapic.h | 2 +-
> arch/x86/kvm/x86.c | 1 +
> 5 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d7036982332e..45ee7a1d13f6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1334,6 +1334,7 @@ struct kvm_arch {
>
> u32 default_tsc_khz;
> bool user_set_tsc;
> + u64 apic_bus_cycle_ns;
>
> seqcount_raw_spinlock_t pvclock_sc;
> bool use_master_clock;
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index a40ca2fef58c..37ff31c18ad1 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1687,7 +1687,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
> data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
> break;
> case HV_X64_MSR_APIC_FREQUENCY:
> - data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
> + data = div64_u64(1000000000ULL,
> + vcpu->kvm->arch.apic_bus_cycle_ns);
> break;
> default:
> kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 245b20973cae..73956b0ac1f1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1542,7 +1542,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
> remaining = 0;
>
> ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> - return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
> + return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + apic->divide_count));
> }
>
> static void __report_tpr_access(struct kvm_lapic *apic, bool write)
> @@ -1960,7 +1961,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>
> static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
> {
> - return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
> + return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + (u64)apic->divide_count;
> }
>
> static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index a20cb006b6c8..51e09f5a7fc5 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -16,7 +16,7 @@
> #define APIC_DEST_NOSHORT 0x0
> #define APIC_DEST_MASK 0x800
>
> -#define APIC_BUS_CYCLE_NS 1
> +#define APIC_BUS_CYCLE_NS_DEFAULT 1
>
> #define APIC_BROADCAST 0xFF
> #define X2APIC_BROADCAST 0xFFFFFFFFul
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..d7d865f7c847 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12466,6 +12466,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
> kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> + kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
>


2023-12-21 17:38:54

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] KVM: selftests: Add test case for x86 apic_bus_clock_frequency

On Tue, 2023-12-19 at 00:34 -0800, Isaku Yamahata wrote:
> Test if the apic bus clock frequency is exptected to the configured value.
Typo.
> Set APIC TMICT to the maximum value and busy wait for 100 msec (any value
> is okay) with tsc value, and read TMCCT. Calculate apic bus clock frequency
> based on TSC frequency.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Changes v3:
> - Use 1.5GHz instead of 1GHz as frequency.
>
> Changes v2:
> - Newly added.
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/include/x86_64/apic.h | 7 +
> .../kvm/x86_64/apic_bus_clock_test.c | 135 ++++++++++++++++++
> 3 files changed, 143 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 963435959a92..e07ec9c1dbd1 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -116,6 +116,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
> TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
> TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> +TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
> TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
> TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
> TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
> diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
> index bed316fdecd5..866a58d5fa11 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/apic.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
> @@ -60,6 +60,13 @@
> #define APIC_VECTOR_MASK 0x000FF
> #define APIC_ICR2 0x310
> #define SET_APIC_DEST_FIELD(x) ((x) << 24)
> +#define APIC_LVT0 0x350
> +#define APIC_LVT_TIMER_ONESHOT (0 << 17)
> +#define APIC_LVT_TIMER_PERIODIC (1 << 17)
> +#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
> +#define APIC_TMICT 0x380
> +#define APIC_TMCCT 0x390
> +#define APIC_TDCR 0x3E0
>
> void apic_disable(void);
> void xapic_enable(void);
> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> new file mode 100644
> index 000000000000..e7896d703e7d
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> @@ -0,0 +1,135 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +
> +#include "apic.h"
> +#include "test_util.h"
> +
> +/*
> + * Pick one convenient value, 1.5Ghz. No special meaning and different from
> + * the default value, 1Ghz.
> + */
> +#define TSC_HZ (1500 * 1000 * 1000ULL)
> +
> +/* Wait for 100 msec, not too long, not too short value. */
> +#define LOOP_MSEC 100ULL
> +#define TSC_WAIT_DELTA (TSC_HZ / 1000 * LOOP_MSEC)
> +
> +/* Pick up typical value. Different enough from the default value, 1GHz. */
> +#define APIC_BUS_CLOCK_FREQ (25 * 1000 * 1000ULL)
> +
> +static void guest_code(void)
> +{
> + /* Possible tdcr values and its divide count. */
> + struct {
> + u32 tdcr;
> + u32 divide_count;
> + } tdcrs[] = {
> + {0x0, 2},
> + {0x1, 4},
> + {0x2, 8},
> + {0x3, 16},
> + {0x8, 32},
> + {0x9, 64},
> + {0xa, 128},
> + {0xb, 1},
> + };
> +
> + u32 tmict, tmcct;
> + u64 tsc0, tsc1;
> + int i;
> +
> + asm volatile("cli");
> +
> + xapic_enable();
> +
> + /*
> + * Setup one-shot timer. Because we don't fire the interrupt, the
> + * vector doesn't matter.
> + */
> + xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
> +
> + for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
> + xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
> +
> + /* Set the largest value to not trigger the interrupt. */
> + tmict = ~0;
> + xapic_write_reg(APIC_TMICT, tmict);
> +
> + /* Busy wait for LOOP_MSEC */
> + tsc0 = rdtsc();
> + tsc1 = tsc0;
> + while (tsc1 - tsc0 < TSC_WAIT_DELTA)
> + tsc1 = rdtsc();
> +
> + /* Read apic timer and tsc */
> + tmcct = xapic_read_reg(APIC_TMCCT);
> + tsc1 = rdtsc();
> +
> + /* Stop timer */
> + xapic_write_reg(APIC_TMICT, 0);
> +
> + /* Report it. */
> + GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
> + tsc1 - tsc0, 0, 0);
> + }
> +
> + GUEST_DONE();
> +}
> +
> +void test_apic_bus_clock(struct kvm_vcpu *vcpu)
> +{
> + bool done = false;
> + struct ucall uc;
> +
> + while (!done) {
> + vcpu_run(vcpu);
> + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_DONE:
> + done = true;
> + break;
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_SYNC: {
> + u32 divide_counter = uc.args[1];
> + u32 apic_cycles = uc.args[2];
> + u64 tsc_cycles = uc.args[3];
> + u64 freq;
> +
> + TEST_ASSERT(tsc_cycles > 0,
> + "tsc cycles must not be zero.");
> +
> + /* Allow 1% slack. */
> + freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
> + TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
> + "APIC bus clock frequency is too large");
> + TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
> + "APIC bus clock frequency is too small");
> + break;
> + }
> + default:
> + TEST_FAIL("Unknown ucall %lu", uc.cmd);
> + break;
> + }
> + }
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + struct kvm_vm *vm;
> + struct kvm_vcpu *vcpu;
> +
> + vm = __vm_create(VM_MODE_DEFAULT, 1, 0);
> + vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *) (TSC_HZ / 1000));
> + /* KVM_CAP_X86_BUS_FREQUENCY_CONTROL requires that no vcpu is created. */
> + vm_enable_cap(vm, KVM_CAP_X86_BUS_FREQUENCY_CONTROL,
> + APIC_BUS_CLOCK_FREQ);
> + vcpu = vm_vcpu_add(vm, 0, guest_code);
> +
> + virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> + test_apic_bus_clock(vcpu);
> + kvm_vm_free(vm);
> +}



Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky



2023-12-21 17:39:18

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] KVM: x86/hyperv: Calculate APIC bus frequency for hyper-v

On Tue, 2023-12-19 at 00:34 -0800, Isaku Yamahata wrote:
> Remove APIC_BUS_FREUQNCY

> and calculate it based on APIC bus cycles per NS.
> APIC_BUS_FREUQNCY is used only for HV_X64_MSR_APIC_FREQUENCY. The MSR is
> not frequently read, calculate it every time.
>
> In order to make APIC bus frequency configurable, we need to make make two
> related constants into variables. APIC_BUS_FREUQNCY and APIC_BUS_CYCLE_NS.
> One can be calculated from the other.

APIC_BUS_FREUQNCY is a typo.

> APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
> By removing APIC_BUS_FREQUENCY, we need to track only single variable
> instead of two.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Changes v3:
> - Newly added according to Maxim Levistsky suggestion.
Typo: Maxim Levitsky

> ---
> arch/x86/kvm/hyperv.c | 2 +-
> arch/x86/kvm/lapic.h | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 238afd7335e4..a40ca2fef58c 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
> data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
> break;
> case HV_X64_MSR_APIC_FREQUENCY:
> - data = APIC_BUS_FREQUENCY;
> + data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
> break;
> default:
> kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 0a0ea4b5dd8c..a20cb006b6c8 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -17,7 +17,6 @@
> #define APIC_DEST_MASK 0x800
>
> #define APIC_BUS_CYCLE_NS 1
> -#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
>
> #define APIC_BROADCAST 0xFF
> #define X2APIC_BROADCAST 0xFFFFFFFFul


Other than typos:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2023-12-21 17:41:58

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] KVM: x86: Make the APIC bus cycles per nsec VM variable

On Tue, 2023-12-19 at 00:34 -0800, Isaku Yamahata wrote:
> Introduce the VM variable of the APIC cycles per nano second as the
> preparation to make the APIC APIC bus frequency configurable.
>
> The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
> CPUID leaf 0x15. The TDX mandates it to be exposed and doesn't allow the
> VMM to override its value. The KVM APIC timer emulation hard-codes the
> frequency to 1GHz. To ensure that the guest doesn't have a conflicting
> view of the APIC bus frequency, allow the userspace to tell KVM to use the
> same frequency that TDX mandates instead of the default 1Ghz.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Changes v3:
> - Update commit message.
> - Dropped apic_bus_frequency according to Maxim Levitsky.

Looks good to me, thanks!



>
> Changes v2:
> - no change.
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/hyperv.c | 3 ++-
> arch/x86/kvm/lapic.c | 6 ++++--
> arch/x86/kvm/lapic.h | 2 +-
> arch/x86/kvm/x86.c | 1 +
> 5 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d7036982332e..45ee7a1d13f6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1334,6 +1334,7 @@ struct kvm_arch {
>
> u32 default_tsc_khz;
> bool user_set_tsc;
> + u64 apic_bus_cycle_ns;
>
> seqcount_raw_spinlock_t pvclock_sc;
> bool use_master_clock;
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index a40ca2fef58c..37ff31c18ad1 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1687,7 +1687,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
> data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
> break;
> case HV_X64_MSR_APIC_FREQUENCY:
> - data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
> + data = div64_u64(1000000000ULL,
> + vcpu->kvm->arch.apic_bus_cycle_ns);
> break;
> default:
> kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 245b20973cae..73956b0ac1f1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1542,7 +1542,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
> remaining = 0;
>
> ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> - return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
> + return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + apic->divide_count));
> }
>
> static void __report_tpr_access(struct kvm_lapic *apic, bool write)
> @@ -1960,7 +1961,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>
> static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
> {
> - return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
> + return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> + (u64)apic->divide_count;
> }
>
> static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index a20cb006b6c8..51e09f5a7fc5 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -16,7 +16,7 @@
> #define APIC_DEST_NOSHORT 0x0
> #define APIC_DEST_MASK 0x800
>
> -#define APIC_BUS_CYCLE_NS 1
> +#define APIC_BUS_CYCLE_NS_DEFAULT 1
>
> #define APIC_BROADCAST 0xFF
> #define X2APIC_BROADCAST 0xFFFFFFFFul
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..d7d865f7c847 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12466,6 +12466,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
> kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> + kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
> kvm->arch.guest_can_read_msr_platform_info = true;
> kvm->arch.enable_pmu = enable_pmu;
>


Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky



2023-12-29 04:21:18

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] KVM: x86/hyperv: Calculate APIC bus frequency for hyper-v

On 12/21/2023 1:26 PM, Xiaoyao Li wrote:
> On 12/19/2023 4:34 PM, Isaku Yamahata wrote:
>> Remove APIC_BUS_FREUQNCY and calculate it based on APIC bus cycles per
>> NS.
>> APIC_BUS_FREUQNCY is used only for HV_X64_MSR_APIC_FREQUENCY.  The MSR is
>> not frequently read, calculate it every time.
>>
>> In order to make APIC bus frequency configurable, we need to make make
>> two
>
> two 'make', please drop one.

With this and other typos pointed by Maxim fixed.

Reviewed-by: Xiaoyao Li <[email protected]>

>> related constants into variables.  APIC_BUS_FREUQNCY and
>> APIC_BUS_CYCLE_NS.
>> One can be calculated from the other.
>>     APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
>> By removing APIC_BUS_FREQUENCY, we need to track only single variable
>> instead of two.
>>
>> Signed-off-by: Isaku Yamahata <[email protected]>
>> ---
>> Changes v3:
>> - Newly added according to Maxim Levistsky suggestion.
>> ---
>>   arch/x86/kvm/hyperv.c | 2 +-
>>   arch/x86/kvm/lapic.h  | 1 -
>>   2 files changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 238afd7335e4..a40ca2fef58c 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu,
>> u32 msr, u64 *pdata,
>>           data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
>>           break;
>>       case HV_X64_MSR_APIC_FREQUENCY:
>> -        data = APIC_BUS_FREQUENCY;
>> +        data = div64_u64(1000000000ULL, APIC_BUS_CYCLE_NS);
>>           break;
>>       default:
>>           kvm_pr_unimpl_rdmsr(vcpu, msr);
>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>> index 0a0ea4b5dd8c..a20cb006b6c8 100644
>> --- a/arch/x86/kvm/lapic.h
>> +++ b/arch/x86/kvm/lapic.h
>> @@ -17,7 +17,6 @@
>>   #define APIC_DEST_MASK            0x800
>>   #define APIC_BUS_CYCLE_NS       1
>> -#define APIC_BUS_FREQUENCY      (1000000000ULL / APIC_BUS_CYCLE_NS)
>>   #define APIC_BROADCAST            0xFF
>>   #define X2APIC_BROADCAST        0xFFFFFFFFul
>
>


2023-12-29 04:41:41

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] KVM: X86: Add a capability to configure bus frequency for APIC timer

On 12/19/2023 4:34 PM, Isaku Yamahata wrote:
> +7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
> +--------------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the value of apic bus clock frequency
> +:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
> + frequency, or -ENXIO if virtual local APIC isn't enabled by
> + KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.
> +
> +This capability sets the APIC bus clock frequency (or core crystal clock
> +frequency) for kvm to emulate APIC in the kernel.

Isaku,

you are mixing the `bus clock` and `core crystal clock` frequency. They
are different.

- When CPUID 0x15 doesn't exist, or CPUID 0x15 doesn't enumerate core
crystal clock frequency, the APIC timer frequency is the processor's bus
clock.

- When CPUID 0x15 does enumerate the core crystal clock frequency, the
APIC timer frequency is the core crystal clock frequency.

This patch only enables the user-configurable bus clock frequency, or
specifically APIC timer frequency. It doesn't enable the configuration
of core crystal clock frequency. Userspace can configure core crystal
clock frequency by passing a valid CPUID 0x15 leaf into KVM_SET_CPUID2,
not by this KVM_CAP.

> The default value is 1000000
> +(1GHz).


2024-02-23 19:34:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] KVM: X86: Add a capability to configure bus frequency for APIC timer

On Tue, Dec 19, 2023, Isaku Yamahata wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7025b3751027..cc976df2651e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
> cause CPU stuck (due to event windows don't open up) and make the CPU
> unavailable to host or other VMs.
>
> +7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL

BUS_FREQUENCY_CONTROL is simultaneously too long, yet not descriptive enough.
Depending on whether people get hung up on nanoseconds not being a "frequency",
either KVM_CAP_X86_APIC_BUS_FREQUENCY or KVM_CAP_X86_APIC_BUS_CYCLES_NS.

Also, this series needs to be rebased onto kvm-x86/next.

> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the value of apic bus clock frequency
> +:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
> + frequency, or -ENXIO if virtual local APIC isn't enabled by
> + KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.
> +
> +This capability sets the APIC bus clock frequency (or core crystal clock
> +frequency) for kvm to emulate APIC in the kernel. The default value is 1000000
> +(1GHz).

If we're going to add a capability, might as well make KVM's default value
discoverable.

This also needs to clarify the units. Having to count the number of zeros in the
code to figure that out is ridiculous.

And as Xiaoyao, this is NOT the core crystal clock. Though conversely, this
documentation should make it clear that setting CPUID 0x15 is userspace's problem.
E.g.

7.35 KVM_CAP_X86_APIC_BUS_FREQUENCY
-----------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
frequency, or -ENXIO if virtual local APIC isn't enabled by
KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.

This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
virtual APIC when emulating APIC timers. KVM's default value can be retrieved
by KVM_CHECK_EXTENSION.

Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.

> 8. Other capabilities.
> ======================
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7d865f7c847..97f81d612366 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_ENABLE_CAP:
> case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> case KVM_CAP_IRQFD_RESAMPLE:
> + case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
> r = 1;

And instead of returning '1', return APIC_BUS_CYCLE_NS_DEFAULT (which is amusingly
also '1', but there's no reason to rely on that, it's unnecessarily confusing).

> break;
> case KVM_CAP_EXIT_HYPERCALL:
> @@ -6616,6 +6617,38 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
> + u64 bus_frequency = cap->args[0];
> + u64 bus_cycle_ns;
> +
> + if (!bus_frequency)
> + return -EINVAL;
> + /* CPUID[0x15] only support 32bits. */

So? This capability might exist to play nice with TDX forcing CPUID 0x15, but
that doesn't mean the capability is beholden to CPUID 0x15.

> + if (bus_frequency != (u32)bus_frequency)
> + return -EINVAL;
> +
> + /* Cast to avoid 64bit division on 32bit platform. */
> + bus_cycle_ns = 1000000000UL / (u32)bus_frequency;

Why take the userspace value as a frequency? That will unnecessarily result in
loss of fidelity if 1000000000UL isn't cleanly disibile by bus_frequency, e.g.
reversing the math in the Hyper-V code will yield the "wrong" frequency.

> + if (!bus_cycle_ns)

This needs to guard against overflow in tmict_to_ns(). The max divide count is
14, I think? Whatever this yields:

apic->divide_count = 0x1 << (tmp2 & 0x7);

So from that, we can derive the max allowed bus_cycle_ns.

> + return -EINVAL;

Use break, like literally every other case statement. Burying a return in the
middle of this pile will result in breakage if kvm_vm_ioctl_enable_cap() ever
gains an epilogue.

> +
> + r = 0;
> + mutex_lock(&kvm->lock);
> + /*
> + * Don't allow to change the frequency dynamically during vcpu
> + * running to avoid potentially bizarre behavior.
> + */
> + if (kvm->created_vcpus)
> + r = -EBUSY;

EINVAL, not EBUSY, because userspace can't magically uncreate vCPUs.

> + /* This is for in-kernel vAPIC emulation. */

Meh, just drop the comment. Same for the one above created_vcpus, it's fairly
self-explantory.

> + else if (!irqchip_in_kernel(kvm))
> + r = -ENXIO;

This should go before created_vcpus, e.g. creating a vCPU shouldn't change the
error code.

> +
> + if (!r)

Make this an else...

> + kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;

> + mutex_unlock(&kvm->lock);
> + return r;

break;

Something like:

case KVM_CAP_X86_APIC_BUS_FREQUENCY: {
u64 bus_cycle_ns = cap->args[0];

r = -EINVAL;
if (!bus_frequency || bus_frequency > (whatever cause overflow))
break;

r = 0;
mutex_lock(&kvm->lock);
if (!irqchip_in_kernel(kvm))
r = -ENXIO;
else if (kvm->created_vcpus)
r = -EINVAL;
else
kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
mutex_unlock(&kvm->lock);
break;
}


2024-03-08 01:36:20

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] KVM: X86: Add a capability to configure bus frequency for APIC timer

On Fri, Feb 23, 2024 at 11:33:54AM -0800,
Sean Christopherson <[email protected]> wrote:

> On Tue, Dec 19, 2023, Isaku Yamahata wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 7025b3751027..cc976df2651e 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
> > cause CPU stuck (due to event windows don't open up) and make the CPU
> > unavailable to host or other VMs.
> >
> > +7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
>
> BUS_FREQUENCY_CONTROL is simultaneously too long, yet not descriptive enough.
> Depending on whether people get hung up on nanoseconds not being a "frequency",
> either KVM_CAP_X86_APIC_BUS_FREQUENCY or KVM_CAP_X86_APIC_BUS_CYCLES_NS.
>
> Also, this series needs to be rebased onto kvm-x86/next.

Thanks for the feedback with the concrete change to the patch.
I agree with those for the next respin.
--
Isaku Yamahata <[email protected]>