2019-12-26 14:00:05

by yezengruan

Subject: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support

This patch set aims to support the vcpu_is_preempted() functionality
under KVM/arm64, which allows the guest to check whether a given VCPU
is currently running or not. This will enhance lock performance on
overcommitted hosts (more runnable VCPUs than physical CPUs in the
system), since busy-waiting on a preempted VCPU hurts system
performance far more than yielding early.
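The early-yield idea can be sketched in a few lines of self-contained C
(a hypothetical illustration of the pattern, not code from this series;
the names and the fake_preempted toggle are invented for the sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real paravirt hook. */
static bool fake_preempted;
static bool vcpu_is_preempted(long cpu)
{
	(void)cpu;
	return fake_preempted;
}

/*
 * Spin waiting on a lock owner only while the owner's VCPU is actually
 * running. If the owner has been preempted, spinning burns host CPU
 * time that the owner needs in order to release the lock, so the
 * waiter gives up early instead.
 */
static bool spin_on_owner(long owner_cpu, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (vcpu_is_preempted(owner_cpu))
			return false;	/* stop spinning, yield instead */
		/* cpu_relax() would go here in real kernel code */
	}
	return true;	/* spun for the full budget */
}
```

With the owner running the waiter keeps spinning; as soon as the owner
is marked preempted the loop bails out.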

We have observed some performance improvements in unix benchmark tests.

unix benchmark result:
host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.5.0-rc1, 16 VCPUs

test-case                               | after-patch       | before-patch
----------------------------------------+-------------------+------------------
Dhrystone 2 using register variables    | 334600751.0 lps   | 335319028.3 lps
Double-Precision Whetstone              | 32856.1 MWIPS     | 32849.6 MWIPS
Execl Throughput                        | 3662.1 lps        | 2718.0 lps
File Copy 1024 bufsize 2000 maxblocks   | 432906.4 KBps     | 158011.8 KBps
File Copy 256 bufsize 500 maxblocks     | 116023.0 KBps     | 37664.0 KBps
File Copy 4096 bufsize 8000 maxblocks   | 1432769.8 KBps    | 441108.8 KBps
Pipe Throughput                         | 6405029.6 lps     | 6021457.6 lps
Pipe-based Context Switching            | 185872.7 lps      | 184255.3 lps
Process Creation                        | 4025.7 lps        | 3706.6 lps
Shell Scripts (1 concurrent)            | 6745.6 lpm        | 6436.1 lpm
Shell Scripts (8 concurrent)            | 998.7 lpm         | 931.1 lpm
System Call Overhead                    | 3913363.1 lps     | 3883287.8 lps
----------------------------------------+-------------------+------------------
System Benchmarks Index Score           | 1835.1            | 1327.6

Changes from v1:
https://lore.kernel.org/lkml/[email protected]/
* Guest kernel no longer allocates the PV lock structure; instead it
is allocated by user space to avoid lifetime issues around kexec.
* Provide VCPU attributes for PV lock.
* Update the SMC numbers of the PV lock features.
* Report some basic validation when initializing PV lock.
* Document the preempted field.
* Bunch of typo fixes.

Zengruan Ye (6):
KVM: arm64: Document PV-lock interface
KVM: arm64: Add SMCCC paravirtualised lock calls
KVM: arm64: Support pvlock preempted via shared structure
KVM: arm64: Provide VCPU attributes for PV lock
KVM: arm64: Add interface to support VCPU preempted check
KVM: arm64: Support the VCPU preemption check

Documentation/virt/kvm/arm/pvlock.rst | 63 ++++++++++++
Documentation/virt/kvm/devices/vcpu.txt | 14 +++
arch/arm/include/asm/kvm_host.h | 18 ++++
arch/arm64/include/asm/kvm_host.h | 28 ++++++
arch/arm64/include/asm/paravirt.h | 15 +++
arch/arm64/include/asm/pvlock-abi.h | 16 ++++
arch/arm64/include/asm/spinlock.h | 7 ++
arch/arm64/include/uapi/asm/kvm.h | 2 +
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/paravirt-spinlocks.c | 13 +++
arch/arm64/kernel/paravirt.c | 121 +++++++++++++++++++++++-
arch/arm64/kernel/setup.c | 2 +
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/guest.c | 9 ++
include/linux/arm-smccc.h | 14 +++
include/linux/cpuhotplug.h | 1 +
include/uapi/linux/kvm.h | 2 +
virt/kvm/arm/arm.c | 8 ++
virt/kvm/arm/hypercalls.c | 8 ++
virt/kvm/arm/pvlock.c | 103 ++++++++++++++++++++
20 files changed, 445 insertions(+), 2 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
create mode 100644 arch/arm64/include/asm/pvlock-abi.h
create mode 100644 arch/arm64/kernel/paravirt-spinlocks.c
create mode 100644 virt/kvm/arm/pvlock.c

--
2.19.1



2019-12-26 14:00:08

by yezengruan

Subject: [PATCH v2 3/6] KVM: arm64: Support pvlock preempted via shared structure

Implement the service call for configuring a shared structure between a
VCPU and the hypervisor, through which the hypervisor can tell the guest
whether the VCPU is running or not.

The preempted field is zero if 1) some old KVM does not support this
field, or 2) the VCPU is not preempted. Any other value means the VCPU
has been preempted.
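As a quick sanity check of the ABI this shared structure implies, here
is a hypothetical user-space mirror of pvlock_vcpu_state from patch 2/6
(uint64_t standing in for the kernel's __le64; purely illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Per-VCPU state shared between the hypervisor and the guest,
 * mirroring the pvlock_vcpu_state ABI. The preempted field sits at
 * offset 0 and the structure is padded to 64 bytes so it fits in a
 * single 64-byte-aligned region.
 */
struct pvlock_vcpu_state {
	uint64_t preempted;	/* 0 = running, non-zero = preempted */
	uint8_t padding[56];	/* pad the structure to 64 bytes */
};
```

The layout invariants (offset 0, total size 64) are exactly what the
64-byte alignment requirement in the documentation patch relies on.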

Signed-off-by: Zengruan Ye <[email protected]>
---
arch/arm/include/asm/kvm_host.h | 18 ++++++++++++
arch/arm64/include/asm/kvm_host.h | 19 +++++++++++++
arch/arm64/kvm/Makefile | 1 +
virt/kvm/arm/arm.c | 8 ++++++
virt/kvm/arm/hypercalls.c | 8 ++++++
virt/kvm/arm/pvlock.c | 46 +++++++++++++++++++++++++++++++
6 files changed, 100 insertions(+)
create mode 100644 virt/kvm/arm/pvlock.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 556cd818eccf..dfeaf9204875 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -356,6 +356,24 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
return false;
}

+static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+}
+
+static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
+{
+ return false;
+}
+
+static inline gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
+{
+ return GPA_INVALID;
+}
+
+static inline void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
+{
+}
+
void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);

struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c61260cf63c5..2818a2330f92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -354,6 +354,12 @@ struct kvm_vcpu_arch {
u64 last_steal;
gpa_t base;
} steal;
+
+ /* Guest PV lock state */
+ struct {
+ u64 preempted;
+ gpa_t base;
+ } pv;
};

/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -515,6 +521,19 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
return (vcpu_arch->steal.base != GPA_INVALID);
}

+static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+ vcpu_arch->pv.base = GPA_INVALID;
+}
+
+static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
+{
+ return (vcpu_arch->pv.base != GPA_INVALID);
+}
+
+gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu);
+void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted);
+
void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);

struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5ffbdc39e780..e4591f56d5f1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -15,6 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvlock.o

kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 8de4daf25097..36d57e77d3c4 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -383,6 +383,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)

kvm_arm_pvtime_vcpu_init(&vcpu->arch);

+ kvm_arm_pvlock_preempted_init(&vcpu->arch);
+
return kvm_vgic_vcpu_init(vcpu);
}

@@ -421,6 +423,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu_set_wfx_traps(vcpu);

vcpu_ptrauth_setup_lazy(vcpu);
+
+ if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
+ kvm_update_pvlock_preempted(vcpu, 0);
}

void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -434,6 +439,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->cpu = -1;

kvm_arm_set_running_vcpu(NULL);
+
+ if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
+ kvm_update_pvlock_preempted(vcpu, 1);
}

static void vcpu_power_off(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index 550dfa3e53cd..1c6a11f21bb4 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_HV_PV_TIME_FEATURES:
val = SMCCC_RET_SUCCESS;
break;
+ case ARM_SMCCC_HV_PV_LOCK_FEATURES:
+ val = SMCCC_RET_SUCCESS;
+ break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
@@ -62,6 +65,11 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
if (gpa != GPA_INVALID)
val = gpa;
break;
+ case ARM_SMCCC_HV_PV_LOCK_PREEMPTED:
+ gpa = kvm_init_pvlock(vcpu);
+ if (gpa != GPA_INVALID)
+ val = gpa;
+ break;
default:
return kvm_psci_call(vcpu);
}
diff --git a/virt/kvm/arm/pvlock.c b/virt/kvm/arm/pvlock.c
new file mode 100644
index 000000000000..cdfd30a903b9
--- /dev/null
+++ b/virt/kvm/arm/pvlock.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright(c) 2019 Huawei Technologies Co., Ltd
+ * Author: Zengruan Ye <[email protected]>
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
+
+#include <asm/pvlock-abi.h>
+
+#include <kvm/arm_hypercalls.h>
+
+gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
+{
+ struct pvlock_vcpu_state init_values = {};
+ struct kvm *kvm = vcpu->kvm;
+ u64 base = vcpu->arch.pv.base;
+ int idx;
+
+ if (base == GPA_INVALID)
+ return base;
+
+ idx = srcu_read_lock(&kvm->srcu);
+ kvm_write_guest(kvm, base, &init_values, sizeof(init_values));
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return base;
+}
+
+void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
+{
+ int idx;
+ u64 offset;
+ __le64 preempted_le;
+ struct kvm *kvm = vcpu->kvm;
+ u64 base = vcpu->arch.pv.base;
+
+ vcpu->arch.pv.preempted = preempted;
+ preempted_le = cpu_to_le64(preempted);
+
+ idx = srcu_read_lock(&kvm->srcu);
+ offset = offsetof(struct pvlock_vcpu_state, preempted);
+ kvm_put_guest(kvm, base + offset, preempted_le, u64);
+ srcu_read_unlock(&kvm->srcu, idx);
+}
--
2.19.1


2019-12-26 14:00:09

by yezengruan

Subject: [PATCH v2 6/6] KVM: arm64: Support the VCPU preemption check

Support the vcpu_is_preempted() functionality under KVM/arm64. This will
enhance lock performance on overcommitted hosts (more runnable VCPUs
than physical CPUs in the system), since busy-waiting on a preempted
VCPU hurts system performance far more than yielding early.

unix benchmark result:
host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.5.0-rc1, 16 VCPUs

test-case                               | after-patch       | before-patch
----------------------------------------+-------------------+------------------
Dhrystone 2 using register variables    | 334600751.0 lps   | 335319028.3 lps
Double-Precision Whetstone              | 32856.1 MWIPS     | 32849.6 MWIPS
Execl Throughput                        | 3662.1 lps        | 2718.0 lps
File Copy 1024 bufsize 2000 maxblocks   | 432906.4 KBps     | 158011.8 KBps
File Copy 256 bufsize 500 maxblocks     | 116023.0 KBps     | 37664.0 KBps
File Copy 4096 bufsize 8000 maxblocks   | 1432769.8 KBps    | 441108.8 KBps
Pipe Throughput                         | 6405029.6 lps     | 6021457.6 lps
Pipe-based Context Switching            | 185872.7 lps      | 184255.3 lps
Process Creation                        | 4025.7 lps        | 3706.6 lps
Shell Scripts (1 concurrent)            | 6745.6 lpm        | 6436.1 lpm
Shell Scripts (8 concurrent)            | 998.7 lpm         | 931.1 lpm
System Call Overhead                    | 3913363.1 lps     | 3883287.8 lps
----------------------------------------+-------------------+------------------
System Benchmarks Index Score           | 1835.1            | 1327.6

Signed-off-by: Zengruan Ye <[email protected]>
---
arch/arm64/include/asm/paravirt.h | 3 +
arch/arm64/kernel/paravirt.c | 117 ++++++++++++++++++++++++++++++
arch/arm64/kernel/setup.c | 2 +
include/linux/cpuhotplug.h | 1 +
4 files changed, 123 insertions(+)

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index 7b1c81b544bb..ca3a2c7881f3 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -29,6 +29,8 @@ static inline u64 paravirt_steal_clock(int cpu)

int __init pv_time_init(void);

+int __init pv_lock_init(void);
+
__visible bool __native_vcpu_is_preempted(int cpu);

static inline bool pv_vcpu_is_preempted(int cpu)
@@ -39,6 +41,7 @@ static inline bool pv_vcpu_is_preempted(int cpu)
#else

#define pv_time_init() do {} while (0)
+#define pv_lock_init() do {} while (0)

#endif // CONFIG_PARAVIRT

diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index d8f1ba8c22ce..bd2ad6a17a26 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -22,6 +22,7 @@
#include <asm/paravirt.h>
#include <asm/pvclock-abi.h>
#include <asm/smp_plat.h>
+#include <asm/pvlock-abi.h>

struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;
@@ -35,6 +36,10 @@ struct pv_time_stolen_time_region {
struct pvclock_vcpu_stolen_time *kaddr;
};

+struct pv_lock_state_region {
+ struct pvlock_vcpu_state *kaddr;
+};
+
static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);

static bool steal_acc = true;
@@ -158,3 +163,115 @@ int __init pv_time_init(void)

return 0;
}
+
+static DEFINE_PER_CPU(struct pv_lock_state_region, lock_state_region);
+
+static bool kvm_vcpu_is_preempted(int cpu)
+{
+ struct pv_lock_state_region *reg;
+ u64 preempted;
+
+ reg = per_cpu_ptr(&lock_state_region, cpu);
+ if (!reg->kaddr) {
+ pr_warn_once("PV lock enabled but not configured for cpu %d\n",
+ cpu);
+ return false;
+ }
+
+ preempted = le64_to_cpu(READ_ONCE(reg->kaddr->preempted));
+
+ return !!(preempted & 1);
+}
+
+static int pvlock_vcpu_state_dying_cpu(unsigned int cpu)
+{
+ struct pv_lock_state_region *reg;
+
+ reg = this_cpu_ptr(&lock_state_region);
+ if (!reg->kaddr)
+ return 0;
+
+ memunmap(reg->kaddr);
+ memset(reg, 0, sizeof(*reg));
+
+ return 0;
+}
+
+static int init_pvlock_vcpu_state(unsigned int cpu)
+{
+ struct pv_lock_state_region *reg;
+ struct arm_smccc_res res;
+
+ reg = this_cpu_ptr(&lock_state_region);
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_LOCK_PREEMPTED, &res);
+
+ if (res.a0 == SMCCC_RET_NOT_SUPPORTED) {
+ pr_warn("Failed to init PV lock data structure\n");
+ return -EINVAL;
+ }
+
+ reg->kaddr = memremap(res.a0,
+ sizeof(struct pvlock_vcpu_state),
+ MEMREMAP_WB);
+
+ if (!reg->kaddr) {
+ pr_warn("Failed to map PV lock data structure\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int kvm_arm_init_pvlock(void)
+{
+ int ret;
+
+ ret = cpuhp_setup_state(CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
+ "hypervisor/arm/pvlock:starting",
+ init_pvlock_vcpu_state,
+ pvlock_vcpu_state_dying_cpu);
+ if (ret < 0) {
+ pr_warn("PV-lock init failed\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static bool has_kvm_pvlock(void)
+{
+ struct arm_smccc_res res;
+
+ /* To detect the presence of PV lock support we require SMCCC 1.1+ */
+ if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
+ return false;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+ ARM_SMCCC_HV_PV_LOCK_FEATURES, &res);
+
+ if (res.a0 != SMCCC_RET_SUCCESS)
+ return false;
+
+ return true;
+}
+
+int __init pv_lock_init(void)
+{
+ int ret;
+
+ if (is_hyp_mode_available())
+ return 0;
+
+ if (!has_kvm_pvlock())
+ return 0;
+
+ ret = kvm_arm_init_pvlock();
+ if (ret)
+ return ret;
+
+ pv_ops.lock.vcpu_is_preempted = kvm_vcpu_is_preempted;
+ pr_info("using PV-lock preempted\n");
+
+ return 0;
+}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 56f664561754..aa3a8b9e710f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -341,6 +341,8 @@ void __init setup_arch(char **cmdline_p)
smp_init_cpus();
smp_build_mpidr_hash();

+ pv_lock_init();
+
/* Init percpu seeds for random tags after cpus are set up. */
kasan_init_tags();

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e51ee772b9f5..f72ff95ab63a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -138,6 +138,7 @@ enum cpuhp_state {
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
CPUHP_AP_ARM_KVMPV_STARTING,
+ CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
CPUHP_AP_ARM_CORESIGHT_STARTING,
CPUHP_AP_ARM64_ISNDEP_STARTING,
CPUHP_AP_SMPCFD_DYING,
--
2.19.1


2019-12-26 14:00:25

by yezengruan

Subject: [PATCH v2 5/6] KVM: arm64: Add interface to support VCPU preempted check

This is to fix some lock holder preemption issues. Several lock
implementations do a spin loop before acquiring the lock itself. The
kernel already has an interface, bool vcpu_is_preempted(int cpu), which
takes a CPU number as its parameter and returns true if that CPU is
preempted. The kernel can then break out of such spin loops based on
the return value of vcpu_is_preempted().

Since the kernel already uses this interface, let's support it on arm64.
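The ops-table indirection this patch introduces can be modeled with a
small self-contained sketch (names simplified; the real table lives in
pv_ops and is re-pointed at a KVM-specific implementation at boot):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the pv_lock_ops table added by this patch. */
struct pv_lock_ops {
	bool (*vcpu_is_preempted)(int cpu);
};

/* Bare-metal default: a physical CPU is never "preempted". */
static bool native_vcpu_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

/* The table starts out pointing at the native implementation. */
static struct pv_lock_ops pv_lock = {
	.vcpu_is_preempted = native_vcpu_is_preempted,
};

/* Callers always dispatch through the table, so swapping in a
 * KVM-backed implementation at boot changes behavior everywhere. */
static bool vcpu_is_preempted(int cpu)
{
	return pv_lock.vcpu_is_preempted(cpu);
}
```

Until a hypervisor-specific hook is installed, every query falls
through to the native default and reports "not preempted".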

Signed-off-by: Zengruan Ye <[email protected]>
---
arch/arm64/include/asm/paravirt.h | 12 ++++++++++++
arch/arm64/include/asm/spinlock.h | 7 +++++++
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/paravirt-spinlocks.c | 13 +++++++++++++
arch/arm64/kernel/paravirt.c | 4 +++-
5 files changed, 36 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/kernel/paravirt-spinlocks.c

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index cf3a0fd7c1a7..7b1c81b544bb 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -11,8 +11,13 @@ struct pv_time_ops {
unsigned long long (*steal_clock)(int cpu);
};

+struct pv_lock_ops {
+ bool (*vcpu_is_preempted)(int cpu);
+};
+
struct paravirt_patch_template {
struct pv_time_ops time;
+ struct pv_lock_ops lock;
};

extern struct paravirt_patch_template pv_ops;
@@ -24,6 +29,13 @@ static inline u64 paravirt_steal_clock(int cpu)

int __init pv_time_init(void);

+__visible bool __native_vcpu_is_preempted(int cpu);
+
+static inline bool pv_vcpu_is_preempted(int cpu)
+{
+ return pv_ops.lock.vcpu_is_preempted(cpu);
+}
+
#else

#define pv_time_init() do {} while (0)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index b093b287babf..45ff1b2949a6 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -7,8 +7,15 @@

#include <asm/qrwlock.h>
#include <asm/qspinlock.h>
+#include <asm/paravirt.h>

/* See include/linux/spinlock.h */
#define smp_mb__after_spinlock() smp_mb()

+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(long cpu)
+{
+ return pv_vcpu_is_preempted(cpu);
+}
+
#endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index fc6488660f64..b23cdae433a4 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,7 +50,7 @@ obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o
obj-$(CONFIG_ACPI) += acpi.o
obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL) += acpi_parking_protocol.o
-obj-$(CONFIG_PARAVIRT) += paravirt.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt-spinlocks.o
obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
diff --git a/arch/arm64/kernel/paravirt-spinlocks.c b/arch/arm64/kernel/paravirt-spinlocks.c
new file mode 100644
index 000000000000..718aa773d45c
--- /dev/null
+++ b/arch/arm64/kernel/paravirt-spinlocks.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright(c) 2019 Huawei Technologies Co., Ltd
+ * Author: Zengruan Ye <[email protected]>
+ */
+
+#include <linux/spinlock.h>
+#include <asm/paravirt.h>
+
+__visible bool __native_vcpu_is_preempted(int cpu)
+{
+ return false;
+}
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 1ef702b0be2d..d8f1ba8c22ce 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -26,7 +26,9 @@
struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;

-struct paravirt_patch_template pv_ops;
+struct paravirt_patch_template pv_ops = {
+ .lock.vcpu_is_preempted = __native_vcpu_is_preempted,
+};
EXPORT_SYMBOL_GPL(pv_ops);

struct pv_time_stolen_time_region {
--
2.19.1


2019-12-26 14:00:31

by yezengruan

Subject: [PATCH v2 1/6] KVM: arm64: Document PV-lock interface

Introduce a paravirtualization interface for KVM/arm64 that lets the
guest determine whether a VCPU is currently running or not.

The PV lock structure of the guest is allocated by user space.

A hypercall interface is provided for the guest to interrogate the
hypervisor's support for this interface and the location of the shared
memory structures.

Signed-off-by: Zengruan Ye <[email protected]>
---
Documentation/virt/kvm/arm/pvlock.rst | 63 +++++++++++++++++++++++++
Documentation/virt/kvm/devices/vcpu.txt | 14 ++++++
2 files changed, 77 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/pvlock.rst

diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
new file mode 100644
index 000000000000..58b3b8ee7537
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pvlock.rst
@@ -0,0 +1,63 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Paravirtualized lock support for arm64
+======================================
+
+KVM/arm64 provides hypervisor service calls that a paravirtualized guest
+can use to determine whether a VCPU is currently running or not.
+
+Two new SMCCC compatible hypercalls are defined:
+
+* PV_LOCK_FEATURES: 0xC6000020
+* PV_LOCK_PREEMPTED: 0xC6000021
+
+The existence of the PV_LOCK hypercall should be probed using the SMCCC 1.1
+ARCH_FEATURES mechanism before calling it.
+
+PV_LOCK_FEATURES
+ ============= ======== ==========
+ Function ID: (uint32) 0xC6000020
+ PV_call_id: (uint32) The function to query for support.
+ Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
+ PV-lock feature is supported by the hypervisor.
+ ============= ======== ==========
+
+PV_LOCK_PREEMPTED
+ ============= ======== ==========
+ Function ID: (uint32) 0xC6000021
+ Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the IPA of
+ this VCPU's pv data structure is configured by
+ the hypervisor.
+ ============= ======== ==========
+
+The IPA returned by PV_LOCK_PREEMPTED should be mapped by the guest as normal
+memory with inner and outer write back caching attributes, in the inner
+shareable domain.
+
+PV_LOCK_PREEMPTED returns the structure for the calling VCPU.
+
+PV lock state
+-------------
+
+The structure pointed to by the PV_LOCK_PREEMPTED hypercall is as follows:
+
++-----------+-------------+-------------+---------------------------------+
+| Field     | Byte Length | Byte Offset | Description                     |
++===========+=============+=============+=================================+
+| preempted | 8           | 0           | Indicates whether the VCPU that |
+|           |             |             | owns this structure is running. |
+|           |             |             | Non-zero values mean the VCPU   |
+|           |             |             | has been preempted; zero means  |
+|           |             |             | it is not preempted.            |
++-----------+-------------+-------------+---------------------------------+
+
+The preempted field will be updated to 0 by the hypervisor prior to
+scheduling in a VCPU. When the VCPU is scheduled out, the preempted field
+will be updated to 1 by the hypervisor.
+
+The structure will be present within a reserved region of the normal memory
+given to the guest. The guest should not attempt to write into this memory.
+There is a structure per VCPU of the guest.
+
+For the user space interface see Documentation/virt/kvm/devices/vcpu.txt
+section "4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL".
diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..c10a5945075b 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -74,3 +74,17 @@ Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.txt for more information
including the layout of the stolen time structure.
+
+4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL
+Architectures: ARM64
+
+4.1 ATTRIBUTE: KVM_ARM_VCPU_PVLOCK_IPA
+Parameters: 64-bit base address
+Returns: -ENXIO: PV lock not implemented
+ -EEXIST: Base address already set for this VCPU
+ -EINVAL: Base address not 64 byte aligned
+
+Specifies the base address of the PV lock structure for this VCPU. The
+base address must be 64 byte aligned and exist within a valid guest memory
+region. See Documentation/virt/kvm/arm/pvlock.rst for more information
+including the layout of the pv lock structure.
--
2.19.1


2019-12-26 14:00:38

by yezengruan

Subject: [PATCH v2 2/6] KVM: arm64: Add SMCCC paravirtualised lock calls

Add two new SMCCC compatible hypercalls for PV lock features:
PV_LOCK_FEATURES: 0xC6000020
PV_LOCK_PREEMPTED: 0xC6000021

Also add the header file which defines the ABI for the paravirtualized
lock features we're about to add.
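The function IDs follow the standard SMCCC encoding: fast-call bit,
SMC64 calling-convention bit, owner field, and function number. A
self-contained sketch of the encoding (macro names simplified from
arm-smccc.h) shows how 0xC6000020/0xC6000021 are derived:

```c
#include <assert.h>
#include <stdint.h>

/* SMCCC function-ID encoding, as in include/linux/arm-smccc.h. */
#define SMCCC_FAST_CALL    (1u << 31)	/* fast (atomic) call */
#define SMCCC_SMC_64       (1u << 30)	/* SMC64/HVC64 convention */
#define SMCCC_OWNER_SHIFT  24
#define SMCCC_CALL_VAL(fast, conv, owner, num) \
	((fast) | (conv) | ((uint32_t)(owner) << SMCCC_OWNER_SHIFT) | (num))

/* Owner 6 is the vendor-specific hypervisor service range. */
#define OWNER_VENDOR_HYP   6

#define PV_LOCK_FEATURES \
	SMCCC_CALL_VAL(SMCCC_FAST_CALL, SMCCC_SMC_64, OWNER_VENDOR_HYP, 0x20)
#define PV_LOCK_PREEMPTED \
	SMCCC_CALL_VAL(SMCCC_FAST_CALL, SMCCC_SMC_64, OWNER_VENDOR_HYP, 0x21)
```

0x80000000 | 0x40000000 | (6 << 24) | 0x20 composes to 0xC6000020, and
likewise 0x21 gives 0xC6000021, matching the IDs documented above.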

Signed-off-by: Zengruan Ye <[email protected]>
---
arch/arm64/include/asm/pvlock-abi.h | 16 ++++++++++++++++
include/linux/arm-smccc.h | 14 ++++++++++++++
2 files changed, 30 insertions(+)
create mode 100644 arch/arm64/include/asm/pvlock-abi.h

diff --git a/arch/arm64/include/asm/pvlock-abi.h b/arch/arm64/include/asm/pvlock-abi.h
new file mode 100644
index 000000000000..06e0c3d7710a
--- /dev/null
+++ b/arch/arm64/include/asm/pvlock-abi.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright(c) 2019 Huawei Technologies Co., Ltd
+ * Author: Zengruan Ye <[email protected]>
+ */
+
+#ifndef __ASM_PVLOCK_ABI_H
+#define __ASM_PVLOCK_ABI_H
+
+struct pvlock_vcpu_state {
+ __le64 preempted;
+ /* Structure must be 64 byte aligned, pad to that size */
+ u8 padding[56];
+} __packed;
+
+#endif
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 59494df0f55b..3a5c6b35492f 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -46,6 +46,7 @@
#define ARM_SMCCC_OWNER_OEM 3
#define ARM_SMCCC_OWNER_STANDARD 4
#define ARM_SMCCC_OWNER_STANDARD_HYP 5
+#define ARM_SMCCC_OWNER_VENDOR_HYP 6
#define ARM_SMCCC_OWNER_TRUSTED_APP 48
#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
#define ARM_SMCCC_OWNER_TRUSTED_OS 50
@@ -377,5 +378,18 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
ARM_SMCCC_OWNER_STANDARD_HYP, \
0x21)

+/* Paravirtualised lock calls */
+#define ARM_SMCCC_HV_PV_LOCK_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ 0x20)
+
+#define ARM_SMCCC_HV_PV_LOCK_PREEMPTED \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ 0x21)
+
#endif /*__ASSEMBLY__*/
#endif /*__LINUX_ARM_SMCCC_H*/
--
2.19.1


2019-12-26 14:01:54

by yezengruan

Subject: [PATCH v2 4/6] KVM: arm64: Provide VCPU attributes for PV lock

Allow user space to inform the KVM host where in the physical memory
map the paravirtualized lock structures should be located.

User space can set an attribute on the VCPU providing the IPA base
address of the PV lock structure for that VCPU. This must be
repeated for every VCPU in the VM.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall.
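From the VMM side, setting this attribute uses the standard
KVM_SET_DEVICE_ATTR ioctl on the VCPU file descriptor. A hypothetical
sketch of preparing the attribute (local mirrors of the UAPI constants
and of struct kvm_device_attr, so it builds stand-alone):

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the UAPI values introduced by this patch. */
#define KVM_ARM_VCPU_PVLOCK_CTRL  3
#define KVM_ARM_VCPU_PVLOCK_IPA   0

/* Layout matches struct kvm_device_attr in <linux/kvm.h>. */
struct kvm_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;	/* userspace pointer to the payload */
};

/*
 * Build the attribute a VMM would pass to
 * ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) for each VCPU.
 * *ipa must be a 64-byte-aligned guest physical address inside a
 * valid memslot reserved for that VCPU's pvlock structure.
 */
static struct kvm_device_attr make_pvlock_attr(const uint64_t *ipa)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PVLOCK_CTRL,
		.attr  = KVM_ARM_VCPU_PVLOCK_IPA,
		.addr  = (uint64_t)(uintptr_t)ipa,
	};
	return attr;
}
```

The actual ioctl call is omitted since it needs a live KVM VCPU; the
sketch only shows how the attribute is assembled per VCPU.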

Signed-off-by: Zengruan Ye <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 9 +++++
arch/arm64/include/uapi/asm/kvm.h | 2 ++
arch/arm64/kvm/guest.c | 9 +++++
include/uapi/linux/kvm.h | 2 ++
virt/kvm/arm/pvlock.c | 57 +++++++++++++++++++++++++++++++
5 files changed, 79 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2818a2330f92..63b6e204676b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -521,6 +521,15 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
return (vcpu_arch->steal.base != GPA_INVALID);
}

+int kvm_arm_pvlock_set_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr);
+
+int kvm_arm_pvlock_get_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr);
+
+int kvm_arm_pvlock_has_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr);
+
static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
{
vcpu_arch->pv.base = GPA_INVALID;
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 820e5751ada7..137d966b57c7 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -326,6 +326,8 @@ struct kvm_vcpu_events {
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
#define KVM_ARM_VCPU_PVTIME_CTRL 2
#define KVM_ARM_VCPU_PVTIME_IPA 0
+#define KVM_ARM_VCPU_PVLOCK_CTRL 3
+#define KVM_ARM_VCPU_PVLOCK_IPA 0

/* KVM_IRQ_LINE irq field index values */
#define KVM_ARM_IRQ_VCPU2_SHIFT 28
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2fff06114a8f..6a5c12f3b08b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -875,6 +875,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PVTIME_CTRL:
ret = kvm_arm_pvtime_set_attr(vcpu, attr);
break;
+ case KVM_ARM_VCPU_PVLOCK_CTRL:
+ ret = kvm_arm_pvlock_set_attr(vcpu, attr);
+ break;
default:
ret = -ENXIO;
break;
@@ -898,6 +901,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PVTIME_CTRL:
ret = kvm_arm_pvtime_get_attr(vcpu, attr);
break;
+ case KVM_ARM_VCPU_PVLOCK_CTRL:
+ ret = kvm_arm_pvlock_get_attr(vcpu, attr);
+ break;
default:
ret = -ENXIO;
break;
@@ -921,6 +927,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
case KVM_ARM_VCPU_PVTIME_CTRL:
ret = kvm_arm_pvtime_has_attr(vcpu, attr);
break;
+ case KVM_ARM_VCPU_PVLOCK_CTRL:
+ ret = kvm_arm_pvlock_has_attr(vcpu, attr);
+ break;
default:
ret = -ENXIO;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0a16b4adbbd..bfc628c580d4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1238,6 +1238,8 @@ enum kvm_device_type {
#define KVM_DEV_TYPE_XIVE KVM_DEV_TYPE_XIVE
KVM_DEV_TYPE_ARM_PV_TIME,
#define KVM_DEV_TYPE_ARM_PV_TIME KVM_DEV_TYPE_ARM_PV_TIME
+ KVM_DEV_TYPE_ARM_PV_LOCK,
+#define KVM_DEV_TYPE_ARM_PV_LOCK KVM_DEV_TYPE_ARM_PV_LOCK
KVM_DEV_TYPE_MAX,
};

diff --git a/virt/kvm/arm/pvlock.c b/virt/kvm/arm/pvlock.c
index cdfd30a903b9..cbc562274e5e 100644
--- a/virt/kvm/arm/pvlock.c
+++ b/virt/kvm/arm/pvlock.c
@@ -44,3 +44,60 @@ void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
kvm_put_guest(kvm, base + offset, preempted_le, u64);
srcu_read_unlock(&kvm->srcu, idx);
}
+
+int kvm_arm_pvlock_set_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr)
+{
+ u64 __user *user = (u64 __user *)attr->addr;
+ struct kvm *kvm = vcpu->kvm;
+ u64 ipa;
+ int ret = 0;
+ int idx;
+
+ if (attr->attr != KVM_ARM_VCPU_PVLOCK_IPA)
+ return -ENXIO;
+
+ if (get_user(ipa, user))
+ return -EFAULT;
+ if (!IS_ALIGNED(ipa, 64))
+ return -EINVAL;
+ if (vcpu->arch.pv.base != GPA_INVALID)
+ return -EEXIST;
+
+ /* Check the address is in a valid memslot */
+ idx = srcu_read_lock(&kvm->srcu);
+ if (kvm_is_error_hva(gfn_to_hva(kvm, ipa >> PAGE_SHIFT)))
+ ret = -EINVAL;
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ if (!ret)
+ vcpu->arch.pv.base = ipa;
+
+ return ret;
+}
+
+int kvm_arm_pvlock_get_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr)
+{
+ u64 __user *user = (u64 __user *)attr->addr;
+ u64 ipa;
+
+ if (attr->attr != KVM_ARM_VCPU_PVLOCK_IPA)
+ return -ENXIO;
+
+ ipa = vcpu->arch.pv.base;
+
+ if (put_user(ipa, user))
+ return -EFAULT;
+ return 0;
+}
+
+int kvm_arm_pvlock_has_attr(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr)
+{
+ switch (attr->attr) {
+ case KVM_ARM_VCPU_PVLOCK_IPA:
+ return 0;
+ }
+ return -ENXIO;
+}
--
2.19.1


2019-12-26 18:54:00

by kernel test robot

Subject: Re: [PATCH v2 5/6] KVM: arm64: Add interface to support VCPU preempted check

Hi Zengruan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvmarm/next]
[also build test ERROR on kvm/linux-next linus/master v5.5-rc3 next-20191220]
[cannot apply to arm64/for-next/core]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Zengruan-Ye/KVM-arm64-VCPU-preempted-check-support/20191227-000637
base: https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm64-alldefconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.5.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from include/linux/spinlock.h:89:0,
from include/linux/radix-tree.h:16,
from include/linux/idr.h:15,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/of.h:17,
from include/linux/irqdomain.h:35,
from include/linux/acpi.h:13,
from include/acpi/apei.h:9,
from include/acpi/ghes.h:5,
from include/linux/arm_sdei.h:8,
from arch/arm64/kernel/asm-offsets.c:10:
arch/arm64/include/asm/spinlock.h: In function 'vcpu_is_preempted':
>> arch/arm64/include/asm/spinlock.h:18:9: error: implicit declaration of function 'pv_vcpu_is_preempted'; did you mean 'vcpu_is_preempted'? [-Werror=implicit-function-declaration]
return pv_vcpu_is_preempted(cpu);
^~~~~~~~~~~~~~~~~~~~
vcpu_is_preempted
cc1: some warnings being treated as errors
make[2]: *** [arch/arm64/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [sub-make] Error 2
27 real 5 user 7 sys 48.63% cpu make prepare

vim +18 arch/arm64/include/asm/spinlock.h

14
15 #define vcpu_is_preempted vcpu_is_preempted
16 static inline bool vcpu_is_preempted(long cpu)
17 {
> 18 return pv_vcpu_is_preempted(cpu);
19 }
20

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation


Attachments:
(No filename) (2.89 kB)
.config.gz (9.96 kB)

2019-12-27 06:53:15

by yezengruan

Subject: Re: [PATCH v2 5/6] KVM: arm64: Add interface to support VCPU preempted check

Hi,

On 2019/12/27 2:51, kbuild test robot wrote:
> Hi Zengruan,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on kvmarm/next]
> [also build test ERROR on kvm/linux-next linus/master v5.5-rc3 next-20191220]
> [cannot apply to arm64/for-next/core]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url: https://github.com/0day-ci/linux/commits/Zengruan-Ye/KVM-arm64-VCPU-preempted-check-support/20191227-000637
> base: https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
> config: arm64-alldefconfig (attached as .config)
> compiler: aarch64-linux-gcc (GCC) 7.5.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.5.0 make.cross ARCH=arm64
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <[email protected]>
>
> All errors (new ones prefixed by >>):
>
> In file included from include/linux/spinlock.h:89:0,
> from include/linux/radix-tree.h:16,
> from include/linux/idr.h:15,
> from include/linux/kernfs.h:13,
> from include/linux/sysfs.h:16,
> from include/linux/kobject.h:20,
> from include/linux/of.h:17,
> from include/linux/irqdomain.h:35,
> from include/linux/acpi.h:13,
> from include/acpi/apei.h:9,
> from include/acpi/ghes.h:5,
> from include/linux/arm_sdei.h:8,
> from arch/arm64/kernel/asm-offsets.c:10:
> arch/arm64/include/asm/spinlock.h: In function 'vcpu_is_preempted':
>>> arch/arm64/include/asm/spinlock.h:18:9: error: implicit declaration of function 'pv_vcpu_is_preempted'; did you mean 'vcpu_is_preempted'? [-Werror=implicit-function-declaration]
> return pv_vcpu_is_preempted(cpu);
> ^~~~~~~~~~~~~~~~~~~~
> vcpu_is_preempted
> cc1: some warnings being treated as errors
> make[2]: *** [arch/arm64/kernel/asm-offsets.s] Error 1
> make[2]: Target '__build' not remade because of errors.
> make[1]: *** [prepare0] Error 2
> make[1]: Target 'prepare' not remade because of errors.
> make: *** [sub-make] Error 2
> 27 real 5 user 7 sys 48.63% cpu make prepare
>
> vim +18 arch/arm64/include/asm/spinlock.h
>
> 14
> 15 #define vcpu_is_preempted vcpu_is_preempted
> 16 static inline bool vcpu_is_preempted(long cpu)
> 17 {
> > 18 return pv_vcpu_is_preempted(cpu);
> 19 }
> 20
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
>

Thanks for posting this, I'll update the code to fix this issue.

Thanks,

Zengruan


---
arch/arm64/include/asm/spinlock.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index 45ff1b2949a6..b5d1982414c5 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -12,10 +12,12 @@
/* See include/linux/spinlock.h */
#define smp_mb__after_spinlock() smp_mb()

+#ifdef CONFIG_PARAVIRT
#define vcpu_is_preempted vcpu_is_preempted
static inline bool vcpu_is_preempted(long cpu)
{
return pv_vcpu_is_preempted(cpu);
}
+#endif // CONFIG_PARAVIRT

#endif /* __ASM_SPINLOCK_H */
--
2.19.1




2020-01-09 17:42:59

by Steven Price

Subject: Re: [PATCH v2 1/6] KVM: arm64: Document PV-lock interface

On 26/12/2019 13:58, Zengruan Ye wrote:
> Introduce a paravirtualization interface for KVM/arm64 to obtain the VCPU
> is currently running or not.
>
> The PV lock structure of the guest is allocated by user space.
>
> A hypercall interface is provided for the guest to interrogate the
> hypervisor's support for this interface and the location of the shared
> memory structures.
>
> Signed-off-by: Zengruan Ye <[email protected]>
> ---
> Documentation/virt/kvm/arm/pvlock.rst | 63 +++++++++++++++++++++++++
> Documentation/virt/kvm/devices/vcpu.txt | 14 ++++++
> 2 files changed, 77 insertions(+)
> create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
>
> diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
> new file mode 100644
> index 000000000000..58b3b8ee7537
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvlock.rst
> @@ -0,0 +1,63 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Paravirtualized lock support for arm64
> +======================================
> +
> +KVM/arm64 provides some hypervisor service calls to support a paravirtualized
> +guest obtaining the VCPU is currently running or not.
NIT: ^ whether

> +
> +Two new SMCCC compatible hypercalls are defined:
> +
> +* PV_LOCK_FEATURES: 0xC6000020
> +* PV_LOCK_PREEMPTED: 0xC6000021
> +
> +The existence of the PV_LOCK hypercall should be probed using the SMCCC 1.1
> +ARCH_FEATURES mechanism before calling it.

Since these are within the "vendor specific" SMCCC region, ideally you should also check that you are talking to KVM (other hypervisors could allocate SMCCC IDs differently within this block). Will has a patch on a branch which gives an example of how this could work [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=kvm/hvc&id=464f5a1741e5959c3e4d2be1966ae0093b4dce06

> +
> +PV_LOCK_FEATURES
> + ============= ======== ==========
> + Function ID: (uint32) 0xC6000020
> + PV_call_id: (uint32) The function to query for support.
> + Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> + PV-lock feature is supported by the hypervisor.
> + ============= ======== ==========
> +
> +PV_LOCK_PREEMPTED
> + ============= ======== ==========
> + Function ID: (uint32) 0xC6000021
> + Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the IPA of
> + this VCPU's pv data structure is configured by
> + the hypervisor.
> + ============= ======== ==========

PV_LOCK_PREEMPTED also needs to return the address of this data structure. Either by returning this in another register, or by e.g. treating a positive return as an address and a negative value as an error.

> +
> +The IPA returned by PV_LOCK_PREEMPTED should be mapped by the guest as normal
> +memory with inner and outer write back caching attributes, in the inner
> +shareable domain.
> +
> +PV_LOCK_PREEMPTED returns the structure for the calling VCPU.
> +
> +PV lock state
> +-------------
> +
> +The structure pointed to by the PV_LOCK_PREEMPTED hypercall is as follows:
> +
> ++-----------+-------------+-------------+---------------------------------+
> +| Field | Byte Length | Byte Offset | Description |
> ++===========+=============+=============+=================================+
> +| preempted | 8 | 0 | Indicate the VCPU who owns this |

NIT: s/Indicate/Indicates that/. Also more common English would be "the VCPU *that* owns"

> +| | | | struct is running or not. |
> +| | | | Non-zero values mean the VCPU |
> +| | | | has been preempted. Zero means |
> +| | | | the VCPU is not preempted. |
> ++-----------+-------------+-------------+---------------------------------+
> +
> +The preempted field will be updated to 1 by the hypervisor prior to scheduling
> +a VCPU. When the VCPU is scheduled out, the preempted field will be updated
> +to 0 by the hypervisor.
> +
> +The structure will be present within a reserved region of the normal memory
> +given to the guest. The guest should not attempt to write into this memory.
> +There is a structure per VCPU of the guest.

I think it would be worth mentioning in this document that the structure is guaranteed to be 64-byte aligned.

Steve

> +
> +For the user space interface see Documentation/virt/kvm/devices/vcpu.txt
> +section "4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL".
> diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
> index 6f3bd64a05b0..c10a5945075b 100644
> --- a/Documentation/virt/kvm/devices/vcpu.txt
> +++ b/Documentation/virt/kvm/devices/vcpu.txt
> @@ -74,3 +74,17 @@ Specifies the base address of the stolen time structure for this VCPU. The
> base address must be 64 byte aligned and exist within a valid guest memory
> region. See Documentation/virt/kvm/arm/pvtime.txt for more information
> including the layout of the stolen time structure.
> +
> +4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL
> +Architectures: ARM64
> +
> +4.1 ATTRIBUTE: KVM_ARM_VCPU_PVLOCK_IPA
> +Parameters: 64-bit base address
> +Returns: -ENXIO: PV lock not implemented
> + -EEXIST: Base address already set for this VCPU
> + -EINVAL: Base address not 64 byte aligned
> +
> +Specifies the base address of the PV lock structure for this VCPU. The
> +base address must be 64 byte aligned and exist within a valid guest memory
> +region. See Documentation/virt/kvm/arm/pvlock.rst for more information
> +including the layout of the pv lock structure.
>

2020-01-09 18:35:58

by Steven Price

Subject: Re: [PATCH v2 3/6] KVM: arm64: Support pvlock preempted via shared structure

On 26/12/2019 13:58, Zengruan Ye wrote:
> Implement the service call for configuring a shared structure between a
> VCPU and the hypervisor in which the hypervisor can tell the VCPU is
> running or not.
>
> The preempted field is zero if 1) some old KVM deos not support this filed.

NIT: s/deos/does/

However, I would hope that the service call will fail if it's an old KVM, not simply return zero.

> 2) the VCPU is not preempted. Other values means the VCPU has been preempted.
>
> Signed-off-by: Zengruan Ye <[email protected]>
> ---
> arch/arm/include/asm/kvm_host.h | 18 ++++++++++++
> arch/arm64/include/asm/kvm_host.h | 19 +++++++++++++
> arch/arm64/kvm/Makefile | 1 +
> virt/kvm/arm/arm.c | 8 ++++++
> virt/kvm/arm/hypercalls.c | 8 ++++++
> virt/kvm/arm/pvlock.c | 46 +++++++++++++++++++++++++++++++
> 6 files changed, 100 insertions(+)
> create mode 100644 virt/kvm/arm/pvlock.c
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 556cd818eccf..dfeaf9204875 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -356,6 +356,24 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
> return false;
> }
>
> +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> +}
> +
> +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + return false;
> +}
> +
> +static inline gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
> +{
> + return GPA_INVALID;
> +}
> +
> +static inline void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
> +{
> +}
> +
> void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>
> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index c61260cf63c5..2818a2330f92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -354,6 +354,12 @@ struct kvm_vcpu_arch {
> u64 last_steal;
> gpa_t base;
> } steal;
> +
> + /* Guest PV lock state */
> + struct {
> + u64 preempted;

I'm not sure why the kernel needs to (separately) track this preempted state? It doesn't appear to be used from what I can tell.

Steve

> + gpa_t base;
> + } pv;
> };
>
> /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> @@ -515,6 +521,19 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
> return (vcpu_arch->steal.base != GPA_INVALID);
> }
>
> +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + vcpu_arch->pv.base = GPA_INVALID;
> +}
> +
> +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + return (vcpu_arch->pv.base != GPA_INVALID);
> +}
> +
> +gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu);
> +void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted);
> +
> void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>
> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 5ffbdc39e780..e4591f56d5f1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -15,6 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvlock.o
>
> kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
> kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 8de4daf25097..36d57e77d3c4 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -383,6 +383,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>
> kvm_arm_pvtime_vcpu_init(&vcpu->arch);
>
> + kvm_arm_pvlock_preempted_init(&vcpu->arch);
> +
> return kvm_vgic_vcpu_init(vcpu);
> }
>
> @@ -421,6 +423,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> vcpu_set_wfx_traps(vcpu);
>
> vcpu_ptrauth_setup_lazy(vcpu);
> +
> + if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
> + kvm_update_pvlock_preempted(vcpu, 0);
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -434,6 +439,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> vcpu->cpu = -1;
>
> kvm_arm_set_running_vcpu(NULL);
> +
> + if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
> + kvm_update_pvlock_preempted(vcpu, 1);
> }
>
> static void vcpu_power_off(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> index 550dfa3e53cd..1c6a11f21bb4 100644
> --- a/virt/kvm/arm/hypercalls.c
> +++ b/virt/kvm/arm/hypercalls.c
> @@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> val = SMCCC_RET_SUCCESS;
> break;
> + case ARM_SMCCC_HV_PV_LOCK_FEATURES:
> + val = SMCCC_RET_SUCCESS;
> + break;
> }
> break;
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> @@ -62,6 +65,11 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> if (gpa != GPA_INVALID)
> val = gpa;
> break;
> + case ARM_SMCCC_HV_PV_LOCK_PREEMPTED:
> + gpa = kvm_init_pvlock(vcpu);
> + if (gpa != GPA_INVALID)
> + val = gpa;
> + break;
> default:
> return kvm_psci_call(vcpu);
> }
> diff --git a/virt/kvm/arm/pvlock.c b/virt/kvm/arm/pvlock.c
> new file mode 100644
> index 000000000000..cdfd30a903b9
> --- /dev/null
> +++ b/virt/kvm/arm/pvlock.c
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright(c) 2019 Huawei Technologies Co., Ltd
> + * Author: Zengruan Ye <[email protected]>
> + */
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/pvlock-abi.h>
> +
> +#include <kvm/arm_hypercalls.h>
> +
> +gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
> +{
> + struct pvlock_vcpu_state init_values = {};
> + struct kvm *kvm = vcpu->kvm;
> + u64 base = vcpu->arch.pv.base;
> + int idx;
> +
> + if (base == GPA_INVALID)
> + return base;
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + kvm_write_guest(kvm, base, &init_values, sizeof(init_values));
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> + return base;
> +}
> +
> +void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
> +{
> + int idx;
> + u64 offset;
> + __le64 preempted_le;
> + struct kvm *kvm = vcpu->kvm;
> + u64 base = vcpu->arch.pv.base;
> +
> + vcpu->arch.pv.preempted = preempted;
> + preempted_le = cpu_to_le64(preempted);
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + offset = offsetof(struct pvlock_vcpu_state, preempted);
> + kvm_put_guest(kvm, base + offset, preempted_le, u64);
> + srcu_read_unlock(&kvm->srcu, idx);
> +}
>

2020-01-09 18:53:25

by Steven Price

Subject: Re: [PATCH v2 6/6] KVM: arm64: Support the VCPU preemption check

On 26/12/2019 13:58, Zengruan Ye wrote:
> Support the vcpu_is_preempted() functionality under KVM/arm64. This will
> enhance lock performance on overcommitted hosts (more runnable VCPUs
> than physical CPUs in the system) as doing busy waits for preempted
> VCPUs will hurt system performance far worse than early yielding.
>
> unix benchmark result:
> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
> guest: kernel 5.5.0-rc1, 16 VCPUs
>
> test-case | after-patch | before-patch
> ----------------------------------------+-------------------+------------------
> Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps
> Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS
> Execl Throughput | 3662.1 lps | 2718.0 lps
> File Copy 1024 bufsize 2000 maxblocks | 432906.4 KBps | 158011.8 KBps
> File Copy 256 bufsize 500 maxblocks | 116023.0 KBps | 37664.0 KBps
> File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps | 441108.8 KBps
> Pipe Throughput | 6405029.6 lps | 6021457.6 lps
> Pipe-based Context Switching | 185872.7 lps | 184255.3 lps
> Process Creation | 4025.7 lps | 3706.6 lps
> Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm
> Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm
> System Call Overhead | 3913363.1 lps | 3883287.8 lps
> ----------------------------------------+-------------------+------------------
> System Benchmarks Index Score | 1835.1 | 1327.6
>
> Signed-off-by: Zengruan Ye <[email protected]>
> ---
> arch/arm64/include/asm/paravirt.h | 3 +
> arch/arm64/kernel/paravirt.c | 117 ++++++++++++++++++++++++++++++
> arch/arm64/kernel/setup.c | 2 +
> include/linux/cpuhotplug.h | 1 +
> 4 files changed, 123 insertions(+)
>
> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
> index 7b1c81b544bb..ca3a2c7881f3 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -29,6 +29,8 @@ static inline u64 paravirt_steal_clock(int cpu)
>
> int __init pv_time_init(void);
>
> +int __init pv_lock_init(void);
> +
> __visible bool __native_vcpu_is_preempted(int cpu);
>
> static inline bool pv_vcpu_is_preempted(int cpu)
> @@ -39,6 +41,7 @@ static inline bool pv_vcpu_is_preempted(int cpu)
> #else
>
> #define pv_time_init() do {} while (0)
> +#define pv_lock_init() do {} while (0)
>
> #endif // CONFIG_PARAVIRT
>
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index d8f1ba8c22ce..bd2ad6a17a26 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -22,6 +22,7 @@
> #include <asm/paravirt.h>
> #include <asm/pvclock-abi.h>
> #include <asm/smp_plat.h>
> +#include <asm/pvlock-abi.h>
>
> struct static_key paravirt_steal_enabled;
> struct static_key paravirt_steal_rq_enabled;
> @@ -35,6 +36,10 @@ struct pv_time_stolen_time_region {
> struct pvclock_vcpu_stolen_time *kaddr;
> };
>
> +struct pv_lock_state_region {
> + struct pvlock_vcpu_state *kaddr;
> +};
> +
> static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
>
> static bool steal_acc = true;
> @@ -158,3 +163,115 @@ int __init pv_time_init(void)
>
> return 0;
> }
> +
> +static DEFINE_PER_CPU(struct pv_lock_state_region, lock_state_region);
> +
> +static bool kvm_vcpu_is_preempted(int cpu)
> +{
> + struct pv_lock_state_region *reg;
> + __le64 preempted_le;
> +
> + reg = per_cpu_ptr(&lock_state_region, cpu);
> + if (!reg->kaddr) {
> + pr_warn_once("PV lock enabled but not configured for cpu %d\n",
> + cpu);
> + return false;
> + }
> +
> + preempted_le = le64_to_cpu(READ_ONCE(reg->kaddr->preempted));
> +
> + return !!(preempted_le & 1);

According to the documentation, preempted != 0 means preempted, but here you are checking only the LSB. You need to be consistent about the ABI.

> +}
> +
> +static int pvlock_vcpu_state_dying_cpu(unsigned int cpu)
> +{
> + struct pv_lock_state_region *reg;
> +
> + reg = this_cpu_ptr(&lock_state_region);
> + if (!reg->kaddr)
> + return 0;
> +
> + memunmap(reg->kaddr);
> + memset(reg, 0, sizeof(*reg));
> +
> + return 0;
> +}
> +
> +static int init_pvlock_vcpu_state(unsigned int cpu)
> +{
> + struct pv_lock_state_region *reg;
> + struct arm_smccc_res res;
> +
> + reg = this_cpu_ptr(&lock_state_region);
> +
> + arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_LOCK_PREEMPTED, &res);
> +
> + if (res.a0 == SMCCC_RET_NOT_SUPPORTED) {
> + pr_warn("Failed to init PV lock data structure\n");
> + return -EINVAL;
> + }
> +
> + reg->kaddr = memremap(res.a0,
> + sizeof(struct pvlock_vcpu_state),
> + MEMREMAP_WB);
> +
> + if (!reg->kaddr) {
> + pr_warn("Failed to map PV lock data structure\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static int kvm_arm_init_pvlock(void)
> +{
> + int ret;
> +
> + ret = cpuhp_setup_state(CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
> + "hypervisor/arm/pvlock:starting",
> + init_pvlock_vcpu_state,
> + pvlock_vcpu_state_dying_cpu);
> + if (ret < 0) {
> + pr_warn("PV-lock init failed\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static bool has_kvm_pvlock(void)
> +{
> + struct arm_smccc_res res;
> +
> + /* To detect the presence of PV lock support we require SMCCC 1.1+ */
> + if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
> + return false;
> +
> + arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> + ARM_SMCCC_HV_PV_LOCK_FEATURES, &res);

As mentioned previously we could do with something more robust to check that the hypervisor is actually KVM before assuming that vendor specific IDs are valid.

Steve

> +
> + if (res.a0 != SMCCC_RET_SUCCESS)
> + return false;
> +
> + return true;
> +}
> +
> +int __init pv_lock_init(void)
> +{
> + int ret;
> +
> + if (is_hyp_mode_available())
> + return 0;
> +
> + if (!has_kvm_pvlock())
> + return 0;
> +
> + ret = kvm_arm_init_pvlock();
> + if (ret)
> + return ret;
> +
> + pv_ops.lock.vcpu_is_preempted = kvm_vcpu_is_preempted;
> + pr_info("using PV-lock preempted\n");
> +
> + return 0;
> +}
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 56f664561754..aa3a8b9e710f 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -341,6 +341,8 @@ void __init setup_arch(char **cmdline_p)
> smp_init_cpus();
> smp_build_mpidr_hash();
>
> + pv_lock_init();
> +
> /* Init percpu seeds for random tags after cpus are set up. */
> kasan_init_tags();
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index e51ee772b9f5..f72ff95ab63a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -138,6 +138,7 @@ enum cpuhp_state {
> CPUHP_AP_DUMMY_TIMER_STARTING,
> CPUHP_AP_ARM_XEN_STARTING,
> CPUHP_AP_ARM_KVMPV_STARTING,
> + CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
> CPUHP_AP_ARM_CORESIGHT_STARTING,
> CPUHP_AP_ARM64_ISNDEP_STARTING,
> CPUHP_AP_SMPCFD_DYING,
>

2020-01-11 06:55:14

by yezengruan

Subject: Re: [PATCH v2 1/6] KVM: arm64: Document PV-lock interface

Hi Steve,

On 2020/1/9 22:53, Steven Price wrote:
> On 26/12/2019 13:58, Zengruan Ye wrote:
>> Introduce a paravirtualization interface for KVM/arm64 to obtain the VCPU
>> is currently running or not.
>>
>> The PV lock structure of the guest is allocated by user space.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Zengruan Ye <[email protected]>
>> ---
>>   Documentation/virt/kvm/arm/pvlock.rst   | 63 +++++++++++++++++++++++++
>>   Documentation/virt/kvm/devices/vcpu.txt | 14 ++++++
>>   2 files changed, 77 insertions(+)
>>   create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
>>
>> diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
>> new file mode 100644
>> index 000000000000..58b3b8ee7537
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvlock.rst
>> @@ -0,0 +1,63 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +Paravirtualized lock support for arm64
>> +======================================
>> +
>> +KVM/arm64 provides some hypervisor service calls to support a paravirtualized
>> +guest obtaining the VCPU is currently running or not.
> NIT:              ^ whether

Thanks for posting this.

>
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +* PV_LOCK_FEATURES:   0xC6000020
>> +* PV_LOCK_PREEMPTED:  0xC6000021
>> +
>> +The existence of the PV_LOCK hypercall should be probed using the SMCCC 1.1
>> +ARCH_FEATURES mechanism before calling it.
>
> Since these are within the "vendor specific" SMCCC region, ideally you should also check that you are talking to KVM (other hypervisors could allocate SMCCC IDs differently within this block). Will has a patch on a branch which gives an example of how this could work [1].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=kvm/hvc&id=464f5a1741e5959c3e4d2be1966ae0093b4dce06

OK, I will add the "vendor specific" check in the next version.

>
>> +
>> +PV_LOCK_FEATURES
>> +    ============= ========    ==========
>> +    Function ID:  (uint32)    0xC6000020
>> +    PV_call_id:   (uint32)    The function to query for support.
>> +    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> +                              PV-lock feature is supported by the hypervisor.
>> +    ============= ========    ==========
>> +
>> +PV_LOCK_PREEMPTED
>> +    ============= ========    ==========
>> +    Function ID:  (uint32)    0xC6000021
>> +    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the IPA of
>> +                              this VCPU's pv data structure is configured by
>> +                              the hypervisor.
>> +    ============= ========    ==========
>
> PV_LOCK_PREEMPTED also needs to return the address of this data structure. Either by returning this in another register, or by e.g. treating a positive return as an address and a negative value as an error.

This is somewhat embarrassing. The code does what you say, but the doc doesn't. Thanks for pointing it out to me! I'll update the doc to match.

>
>> +
>> +The IPA returned by PV_LOCK_PREEMPTED should be mapped by the guest as normal
>> +memory with inner and outer write back caching attributes, in the inner
>> +shareable domain.
>> +
>> +PV_LOCK_PREEMPTED returns the structure for the calling VCPU.
>> +
>> +PV lock state
>> +-------------
>> +
>> +The structure pointed to by the PV_LOCK_PREEMPTED hypercall is as follows:
>> +
>> ++-----------+-------------+-------------+---------------------------------+
>> +| Field     | Byte Length | Byte Offset | Description                     |
>> ++===========+=============+=============+=================================+
>> +| preempted |      8      |      0      | Indicate the VCPU who owns this |
>
> NIT: s/Indicate/Indicates that/. Also more common English would be "the VCPU *that* owns"

Will update.

>
>> +|           |             |             | struct is running or not.       |
>> +|           |             |             | Non-zero values mean the VCPU   |
>> +|           |             |             | has been preempted. Zero means  |
>> +|           |             |             | the VCPU is not preempted.      |
>> ++-----------+-------------+-------------+---------------------------------+
>> +
>> +The preempted field will be updated to 1 by the hypervisor prior to scheduling
>> +a VCPU. When the VCPU is scheduled out, the preempted field will be updated
>> +to 0 by the hypervisor.
>> +
>> +The structure will be present within a reserved region of the normal memory
>> +given to the guest. The guest should not attempt to write into this memory.
>> +There is a structure per VCPU of the guest.
>
> I think it would be worth mentioning in this document that the structure is guaranteed to be 64-byte aligned.

Good point, I'll update the doc.

>
> Steve
>
>> +
>> +For the user space interface see Documentation/virt/kvm/devices/vcpu.txt
>> +section "4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL".
>> diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
>> index 6f3bd64a05b0..c10a5945075b 100644
>> --- a/Documentation/virt/kvm/devices/vcpu.txt
>> +++ b/Documentation/virt/kvm/devices/vcpu.txt
>> @@ -74,3 +74,17 @@ Specifies the base address of the stolen time structure for this VCPU. The
>>   base address must be 64 byte aligned and exist within a valid guest memory
>>   region. See Documentation/virt/kvm/arm/pvtime.txt for more information
>>   including the layout of the stolen time structure.
>> +
>> +4. GROUP: KVM_ARM_VCPU_PVLOCK_CTRL
>> +Architectures: ARM64
>> +
>> +4.1 ATTRIBUTE: KVM_ARM_VCPU_PVLOCK_IPA
>> +Parameters: 64-bit base address
>> +Returns: -ENXIO:  PV lock not implemented
>> +         -EEXIST: Base address already set for this VCPU
>> +         -EINVAL: Base address not 64 byte aligned
>> +
>> +Specifies the base address of the PV lock structure for this VCPU. The
>> +base address must be 64 byte aligned and exist within a valid guest memory
>> +region. See Documentation/virt/kvm/arm/pvlock.rst for more information
>> +including the layout of the pv lock structure.
>>
>
>
> .

Thanks,

Zengruan

2020-01-11 07:32:33

by yezengruan

Subject: Re: [PATCH v2 3/6] KVM: arm64: Support pvlock preempted via shared structure

Hi Steve,

On 2020/1/9 23:02, Steven Price wrote:
> On 26/12/2019 13:58, Zengruan Ye wrote:
>> Implement the service call for configuring a shared structure between a
>> VCPU and the hypervisor in which the hypervisor can tell the VCPU is
>> running or not.
>>
>> The preempted field is zero if 1) some old KVM deos not support this filed.
>
> NIT: s/deos/does/

Thanks for posting this.

>
> However, I would hope that the service call will fail if it's an old KVM, not simply return zero.

Sorry, I'm not sure what you mean. The service call does fail if it's an old KVM, and in that case the guest falls back to __native_vcpu_is_preempted.

>
>> 2) the VCPU is not preempted. Other values means the VCPU has been preempted.
>>
>> Signed-off-by: Zengruan Ye <[email protected]>
>> ---
>>   arch/arm/include/asm/kvm_host.h   | 18 ++++++++++++
>>   arch/arm64/include/asm/kvm_host.h | 19 +++++++++++++
>>   arch/arm64/kvm/Makefile           |  1 +
>>   virt/kvm/arm/arm.c                |  8 ++++++
>>   virt/kvm/arm/hypercalls.c         |  8 ++++++
>>   virt/kvm/arm/pvlock.c             | 46 +++++++++++++++++++++++++++++++
>>   6 files changed, 100 insertions(+)
>>   create mode 100644 virt/kvm/arm/pvlock.c
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 556cd818eccf..dfeaf9204875 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -356,6 +356,24 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
>>       return false;
>>   }
>>   +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
>> +{
>> +}
>> +
>> +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
>> +{
>> +    return false;
>> +}
>> +
>> +static inline gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
>> +{
>> +    return GPA_INVALID;
>> +}
>> +
>> +static inline void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
>> +{
>> +}
>> +
>>   void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>     struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index c61260cf63c5..2818a2330f92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -354,6 +354,12 @@ struct kvm_vcpu_arch {
>>           u64 last_steal;
>>           gpa_t base;
>>       } steal;
>> +
>> +    /* Guest PV lock state */
>> +    struct {
>> +        u64 preempted;
>
> I'm not sure why the kernel needs to (separately) track this preempted state? It doesn't appear to be used from what I can tell.

Good point, the preempted state field is not actually used, I'll remove it.

>
> Steve
>
>> +        gpa_t base;
>> +    } pv;
>>   };
>>     /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>> @@ -515,6 +521,19 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
>>       return (vcpu_arch->steal.base != GPA_INVALID);
>>   }
>>   +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
>> +{
>> +    vcpu_arch->pv.base = GPA_INVALID;
>> +}
>> +
>> +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
>> +{
>> +    return (vcpu_arch->pv.base != GPA_INVALID);
>> +}
>> +
>> +gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu);
>> +void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted);
>> +
>>   void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>>     struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 5ffbdc39e780..e4591f56d5f1 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -15,6 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.
>>   kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>>   kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
>>   kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvlock.o
>>     kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
>>   kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 8de4daf25097..36d57e77d3c4 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -383,6 +383,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>         kvm_arm_pvtime_vcpu_init(&vcpu->arch);
>>   +    kvm_arm_pvlock_preempted_init(&vcpu->arch);
>> +
>>       return kvm_vgic_vcpu_init(vcpu);
>>   }
>>   @@ -421,6 +423,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>           vcpu_set_wfx_traps(vcpu);
>>         vcpu_ptrauth_setup_lazy(vcpu);
>> +
>> +    if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
>> +        kvm_update_pvlock_preempted(vcpu, 0);
>>   }
>>     void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -434,6 +439,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>       vcpu->cpu = -1;
>>         kvm_arm_set_running_vcpu(NULL);
>> +
>> +    if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
>> +        kvm_update_pvlock_preempted(vcpu, 1);
>>   }
>>     static void vcpu_power_off(struct kvm_vcpu *vcpu)
>> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
>> index 550dfa3e53cd..1c6a11f21bb4 100644
>> --- a/virt/kvm/arm/hypercalls.c
>> +++ b/virt/kvm/arm/hypercalls.c
>> @@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>           case ARM_SMCCC_HV_PV_TIME_FEATURES:
>>               val = SMCCC_RET_SUCCESS;
>>               break;
>> +        case ARM_SMCCC_HV_PV_LOCK_FEATURES:
>> +            val = SMCCC_RET_SUCCESS;
>> +            break;
>>           }
>>           break;
>>       case ARM_SMCCC_HV_PV_TIME_FEATURES:
>> @@ -62,6 +65,11 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>           if (gpa != GPA_INVALID)
>>               val = gpa;
>>           break;
>> +    case ARM_SMCCC_HV_PV_LOCK_PREEMPTED:
>> +        gpa = kvm_init_pvlock(vcpu);
>> +        if (gpa != GPA_INVALID)
>> +            val = gpa;
>> +        break;
>>       default:
>>           return kvm_psci_call(vcpu);
>>       }
>> diff --git a/virt/kvm/arm/pvlock.c b/virt/kvm/arm/pvlock.c
>> new file mode 100644
>> index 000000000000..cdfd30a903b9
>> --- /dev/null
>> +++ b/virt/kvm/arm/pvlock.c
>> @@ -0,0 +1,46 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright(c) 2019 Huawei Technologies Co., Ltd
>> + * Author: Zengruan Ye <[email protected]>
>> + */
>> +
>> +#include <linux/arm-smccc.h>
>> +#include <linux/kvm_host.h>
>> +
>> +#include <asm/pvlock-abi.h>
>> +
>> +#include <kvm/arm_hypercalls.h>
>> +
>> +gpa_t kvm_init_pvlock(struct kvm_vcpu *vcpu)
>> +{
>> +    struct pvlock_vcpu_state init_values = {};
>> +    struct kvm *kvm = vcpu->kvm;
>> +    u64 base = vcpu->arch.pv.base;
>> +    int idx;
>> +
>> +    if (base == GPA_INVALID)
>> +        return base;
>> +
>> +    idx = srcu_read_lock(&kvm->srcu);
>> +    kvm_write_guest(kvm, base, &init_values, sizeof(init_values));
>> +    srcu_read_unlock(&kvm->srcu, idx);
>> +
>> +    return base;
>> +}
>> +
>> +void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
>> +{
>> +    int idx;
>> +    u64 offset;
>> +    __le64 preempted_le;
>> +    struct kvm *kvm = vcpu->kvm;
>> +    u64 base = vcpu->arch.pv.base;
>> +
>> +    vcpu->arch.pv.preempted = preempted;
>> +    preempted_le = cpu_to_le64(preempted);
>> +
>> +    idx = srcu_read_lock(&kvm->srcu);
>> +    offset = offsetof(struct pvlock_vcpu_state, preempted);
>> +    kvm_put_guest(kvm, base + offset, preempted_le, u64);
>> +    srcu_read_unlock(&kvm->srcu, idx);
>> +}
>>
>
>
> .

Thanks,

Zengruan

2020-01-11 07:34:29

by yezengruan

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] KVM: arm64: Support the VCPU preemption check

Hi Steve,

On 2020/1/9 23:09, Steven Price wrote:
> On 26/12/2019 13:58, Zengruan Ye wrote:
>> Support the vcpu_is_preempted() functionality under KVM/arm64. This will
>> enhance lock performance on overcommitted hosts (more runnable VCPUs
>> than physical CPUs in the system) as doing busy waits for preempted
>> VCPUs will hurt system performance far worse than early yielding.
>>
>> unix benchmark result:
>>    host:  kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>    guest: kernel 5.5.0-rc1, 16 VCPUs
>>
>>                 test-case                |    after-patch    |   before-patch
>> ----------------------------------------+-------------------+------------------
>>   Dhrystone 2 using register variables   | 334600751.0 lps   | 335319028.3 lps
>>   Double-Precision Whetstone             |     32856.1 MWIPS |     32849.6 MWIPS
>>   Execl Throughput                       |      3662.1 lps   |      2718.0 lps
>>   File Copy 1024 bufsize 2000 maxblocks  |    432906.4 KBps  |    158011.8 KBps
>>   File Copy 256 bufsize 500 maxblocks    |    116023.0 KBps  |     37664.0 KBps
>>   File Copy 4096 bufsize 8000 maxblocks  |   1432769.8 KBps  |    441108.8 KBps
>>   Pipe Throughput                        |   6405029.6 lps   |   6021457.6 lps
>>   Pipe-based Context Switching           |    185872.7 lps   |    184255.3 lps
>>   Process Creation                       |      4025.7 lps   |      3706.6 lps
>>   Shell Scripts (1 concurrent)           |      6745.6 lpm   |      6436.1 lpm
>>   Shell Scripts (8 concurrent)           |       998.7 lpm   |       931.1 lpm
>>   System Call Overhead                   |   3913363.1 lps   |   3883287.8 lps
>> ----------------------------------------+-------------------+------------------
>>   System Benchmarks Index Score          |      1835.1       |      1327.6
>>
>> Signed-off-by: Zengruan Ye <[email protected]>
>> ---
>>   arch/arm64/include/asm/paravirt.h |   3 +
>>   arch/arm64/kernel/paravirt.c      | 117 ++++++++++++++++++++++++++++++
>>   arch/arm64/kernel/setup.c         |   2 +
>>   include/linux/cpuhotplug.h        |   1 +
>>   4 files changed, 123 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
>> index 7b1c81b544bb..ca3a2c7881f3 100644
>> --- a/arch/arm64/include/asm/paravirt.h
>> +++ b/arch/arm64/include/asm/paravirt.h
>> @@ -29,6 +29,8 @@ static inline u64 paravirt_steal_clock(int cpu)
>>     int __init pv_time_init(void);
>>   +int __init pv_lock_init(void);
>> +
>>   __visible bool __native_vcpu_is_preempted(int cpu);
>>     static inline bool pv_vcpu_is_preempted(int cpu)
>> @@ -39,6 +41,7 @@ static inline bool pv_vcpu_is_preempted(int cpu)
>>   #else
>>     #define pv_time_init() do {} while (0)
>> +#define pv_lock_init() do {} while (0)
>>     #endif // CONFIG_PARAVIRT
>>   diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
>> index d8f1ba8c22ce..bd2ad6a17a26 100644
>> --- a/arch/arm64/kernel/paravirt.c
>> +++ b/arch/arm64/kernel/paravirt.c
>> @@ -22,6 +22,7 @@
>>   #include <asm/paravirt.h>
>>   #include <asm/pvclock-abi.h>
>>   #include <asm/smp_plat.h>
>> +#include <asm/pvlock-abi.h>
>>     struct static_key paravirt_steal_enabled;
>>   struct static_key paravirt_steal_rq_enabled;
>> @@ -35,6 +36,10 @@ struct pv_time_stolen_time_region {
>>       struct pvclock_vcpu_stolen_time *kaddr;
>>   };
>>   +struct pv_lock_state_region {
>> +    struct pvlock_vcpu_state *kaddr;
>> +};
>> +
>>   static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
>>     static bool steal_acc = true;
>> @@ -158,3 +163,115 @@ int __init pv_time_init(void)
>>         return 0;
>>   }
>> +
>> +static DEFINE_PER_CPU(struct pv_lock_state_region, lock_state_region);
>> +
>> +static bool kvm_vcpu_is_preempted(int cpu)
>> +{
>> +    struct pv_lock_state_region *reg;
>> +    __le64 preempted_le;
>> +
>> +    reg = per_cpu_ptr(&lock_state_region, cpu);
>> +    if (!reg->kaddr) {
>> +        pr_warn_once("PV lock enabled but not configured for cpu %d\n",
>> +                 cpu);
>> +        return false;
>> +    }
>> +
>> +    preempted_le = le64_to_cpu(READ_ONCE(reg->kaddr->preempted));
>> +
>> +    return !!(preempted_le & 1);
>
> According to the documentation preempted != 0 means preempted, but here you are checking the LSB. You need to be consistent about the ABI.

Thanks for posting this. I'll update the code.

>
>> +}
>> +
>> +static int pvlock_vcpu_state_dying_cpu(unsigned int cpu)
>> +{
>> +    struct pv_lock_state_region *reg;
>> +
>> +    reg = this_cpu_ptr(&lock_state_region);
>> +    if (!reg->kaddr)
>> +        return 0;
>> +
>> +    memunmap(reg->kaddr);
>> +    memset(reg, 0, sizeof(*reg));
>> +
>> +    return 0;
>> +}
>> +
>> +static int init_pvlock_vcpu_state(unsigned int cpu)
>> +{
>> +    struct pv_lock_state_region *reg;
>> +    struct arm_smccc_res res;
>> +
>> +    reg = this_cpu_ptr(&lock_state_region);
>> +
>> +    arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_LOCK_PREEMPTED, &res);
>> +
>> +    if (res.a0 == SMCCC_RET_NOT_SUPPORTED) {
>> +        pr_warn("Failed to init PV lock data structure\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    reg->kaddr = memremap(res.a0,
>> +                  sizeof(struct pvlock_vcpu_state),
>> +                  MEMREMAP_WB);
>> +
>> +    if (!reg->kaddr) {
>> +        pr_warn("Failed to map PV lock data structure\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int kvm_arm_init_pvlock(void)
>> +{
>> +    int ret;
>> +
>> +    ret = cpuhp_setup_state(CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
>> +                "hypervisor/arm/pvlock:starting",
>> +                init_pvlock_vcpu_state,
>> +                pvlock_vcpu_state_dying_cpu);
>> +    if (ret < 0) {
>> +        pr_warn("PV-lock init failed\n");
>> +        return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static bool has_kvm_pvlock(void)
>> +{
>> +    struct arm_smccc_res res;
>> +
>> +    /* To detect the presence of PV lock support we require SMCCC 1.1+ */
>> +    if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
>> +        return false;
>> +
>> +    arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
>> +                 ARM_SMCCC_HV_PV_LOCK_FEATURES, &res);
>
> As mentioned previously we could do with something more robust to check that the hypervisor is actually KVM before assuming that vendor specific IDs are valid.

Will update next version.

>
> Steve
>
>> +
>> +    if (res.a0 != SMCCC_RET_SUCCESS)
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>> +int __init pv_lock_init(void)
>> +{
>> +    int ret;
>> +
>> +    if (is_hyp_mode_available())
>> +        return 0;
>> +
>> +    if (!has_kvm_pvlock())
>> +        return 0;
>> +
>> +    ret = kvm_arm_init_pvlock();
>> +    if (ret)
>> +        return ret;
>> +
>> +    pv_ops.lock.vcpu_is_preempted = kvm_vcpu_is_preempted;
>> +    pr_info("using PV-lock preempted\n");
>> +
>> +    return 0;
>> +}
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 56f664561754..aa3a8b9e710f 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -341,6 +341,8 @@ void __init setup_arch(char **cmdline_p)
>>       smp_init_cpus();
>>       smp_build_mpidr_hash();
>>   +    pv_lock_init();
>> +
>>       /* Init percpu seeds for random tags after cpus are set up. */
>>       kasan_init_tags();
>>   diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index e51ee772b9f5..f72ff95ab63a 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -138,6 +138,7 @@ enum cpuhp_state {
>>       CPUHP_AP_DUMMY_TIMER_STARTING,
>>       CPUHP_AP_ARM_XEN_STARTING,
>>       CPUHP_AP_ARM_KVMPV_STARTING,
>> +    CPUHP_AP_ARM_KVM_PVLOCK_STARTING,
>>       CPUHP_AP_ARM_CORESIGHT_STARTING,
>>       CPUHP_AP_ARM64_ISNDEP_STARTING,
>>       CPUHP_AP_SMPCFD_DYING,
>>
>
>
> .

Thanks,

Zengruan

2020-01-13 10:33:36

by Steven Price

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] KVM: arm64: Support pvlock preempted via shared structure

On Sat, Jan 11, 2020 at 07:30:42AM +0000, yezengruan wrote:
> Hi Steve,
>
> On 2020/1/9 23:02, Steven Price wrote:
> > On 26/12/2019 13:58, Zengruan Ye wrote:
> >> Implement the service call for configuring a shared structure between a
> >> VCPU and the hypervisor in which the hypervisor can tell the VCPU is
> >> running or not.
> >>
> >> The preempted field is zero if 1) some old KVM deos not support this filed.
> >
> > NIT: s/deos/does/
>
> Thanks for posting this.
>
> >
> > However, I would hope that the service call will fail if it's an old KVM not simply return zero.
>
> Sorry, I'm not sure what you mean. The service call will fail if it's an old KVM, and the Guest will use __native_vcpu_is_preempted.

You previously said the "field is zero if [...] some old KVM does not
support this field". This seems a bit of an odd statement, because the
field just doesn't exist (it's an old KVM so won't have allocated it),
and if the guest attempts to find the field using the service call then
the call will fail.

So I'm not sure in what situation you are expecting the field to be zero
on an old KVM.

Steve

2020-01-13 12:14:02

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support

[+PeterZ]

On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
> This patch set aims to support the vcpu_is_preempted() functionality
> under KVM/arm64, which allows the guest to check whether its VCPU is
> currently running or not. This will enhance lock performance on
> overcommitted hosts (more runnable VCPUs than physical CPUs in the
> system) as doing busy waits for preempted VCPUs will hurt system
> performance far worse than early yielding.
>
> We have observed some performance improvements in unix benchmark tests.
>
> unix benchmark result:
> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
> guest: kernel 5.5.0-rc1, 16 VCPUs
>
> test-case | after-patch | before-patch
> ----------------------------------------+-------------------+------------------
> Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps
> Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS
> Execl Throughput | 3662.1 lps | 2718.0 lps
> File Copy 1024 bufsize 2000 maxblocks | 432906.4 KBps | 158011.8 KBps
> File Copy 256 bufsize 500 maxblocks | 116023.0 KBps | 37664.0 KBps
> File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps | 441108.8 KBps
> Pipe Throughput | 6405029.6 lps | 6021457.6 lps
> Pipe-based Context Switching | 185872.7 lps | 184255.3 lps
> Process Creation | 4025.7 lps | 3706.6 lps
> Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm
> Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm
> System Call Overhead | 3913363.1 lps | 3883287.8 lps
> ----------------------------------------+-------------------+------------------
> System Benchmarks Index Score | 1835.1 | 1327.6

Interesting, thanks for the numbers.

So it looks like there is a decent improvement to be had from targeted vCPU
wakeup, but I really dislike the explicit PV interface and it's already been
shown to interact badly with the WFE-based polling in smp_cond_load_*().

Rather than expose a divergent interface, I would instead like to explore an
improvement to smp_cond_load_*() and see how that performs before we commit
to something more intrusive. Marc and I looked at this very briefly in the
past, and the basic idea is to register all of the WFE sites with the
hypervisor, indicating which register contains the address being spun on
and which register contains the "bad" value. That way, you don't bother
rescheduling a vCPU if the value at the address is still bad, because you
know it will exit immediately.

Of course, the devil is in the details because when I say "address", that's
a guest virtual address, so you need to play some tricks in the hypervisor
so that you have a separate mapping for the lockword (it's enough to keep
track of the physical address).

Our hacks are here but we basically ran out of time to work on them beyond
an unoptimised and hacky prototype:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy

Marc -- how would you prefer to handle this?

Will

2020-01-15 14:16:01

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support

On 2020-01-13 12:12, Will Deacon wrote:
> [+PeterZ]
>
> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>> This patch set aims to support the vcpu_is_preempted() functionality
>> under KVM/arm64, which allows the guest to check whether its VCPU is
>> currently running or not. This will enhance lock performance on
>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>> system) as doing busy waits for preempted VCPUs will hurt system
>> performance far worse than early yielding.
>>
>> We have observed some performance improvements in unix benchmark tests.
>>
>> unix benchmark result:
>> host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>> guest: kernel 5.5.0-rc1, 16 VCPUs
>>
>> test-case                               |    after-patch    |   before-patch
>> ----------------------------------------+-------------------+------------------
>> Dhrystone 2 using register variables    | 334600751.0 lps   | 335319028.3 lps
>> Double-Precision Whetstone              |     32856.1 MWIPS |     32849.6 MWIPS
>> Execl Throughput                        |      3662.1 lps   |      2718.0 lps
>> File Copy 1024 bufsize 2000 maxblocks   |    432906.4 KBps  |    158011.8 KBps
>> File Copy 256 bufsize 500 maxblocks     |    116023.0 KBps  |     37664.0 KBps
>> File Copy 4096 bufsize 8000 maxblocks   |   1432769.8 KBps  |    441108.8 KBps
>> Pipe Throughput                         |   6405029.6 lps   |   6021457.6 lps
>> Pipe-based Context Switching            |    185872.7 lps   |    184255.3 lps
>> Process Creation                        |      4025.7 lps   |      3706.6 lps
>> Shell Scripts (1 concurrent)            |      6745.6 lpm   |      6436.1 lpm
>> Shell Scripts (8 concurrent)            |       998.7 lpm   |       931.1 lpm
>> System Call Overhead                    |   3913363.1 lps   |   3883287.8 lps
>> ----------------------------------------+-------------------+------------------
>> System Benchmarks Index Score           |      1835.1       |      1327.6
>
> Interesting, thanks for the numbers.
>
> So it looks like there is a decent improvement to be had from targeted vCPU
> wakeup, but I really dislike the explicit PV interface and it's already been
> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>
> Rather than expose a divergent interface, I would instead like to explore an
> improvement to smp_cond_load_*() and see how that performs before we commit
> to something more intrusive. Marc and I looked at this very briefly in the
> past, and the basic idea is to register all of the WFE sites with the
> hypervisor, indicating which register contains the address being spun on
> and which register contains the "bad" value. That way, you don't bother
> rescheduling a vCPU if the value at the address is still bad, because you
> know it will exit immediately.
>
> Of course, the devil is in the details because when I say "address", that's
> a guest virtual address, so you need to play some tricks in the hypervisor
> so that you have a separate mapping for the lockword (it's enough to keep
> track of the physical address).
>
> Our hacks are here but we basically ran out of time to work on them beyond
> an unoptimised and hacky prototype:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>
> Marc -- how would you prefer to handle this?

Let me try and rebase this thing to a modern kernel (I doubt it applies
without conflicts to mainline). We can then have a discussion about its
merit on the list once I post it. It'd be good to have a pointer to the
benchmarks that have been used here.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-12-29 08:52:01

by yezengruan

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support

On 2020/1/15 22:14, Marc Zyngier wrote:
> On 2020-01-13 12:12, Will Deacon wrote:
>> [+PeterZ]
>>
>> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>>> This patch set aims to support the vcpu_is_preempted() functionality
>>> under KVM/arm64, which allows the guest to check whether its VCPU is
>>> currently running or not. This will enhance lock performance on
>>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>>> system) as doing busy waits for preempted VCPUs will hurt system
>>> performance far worse than early yielding.
>>>
>>> We have observed some performance improvements in unix benchmark tests.
>>>
>>> unix benchmark result:
>>>   host:  kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>>   guest: kernel 5.5.0-rc1, 16 VCPUs
>>>
>>>                test-case                |    after-patch    |   before-patch
>>> ----------------------------------------+-------------------+------------------
>>>  Dhrystone 2 using register variables   | 334600751.0 lps   | 335319028.3 lps
>>>  Double-Precision Whetstone             |     32856.1 MWIPS |     32849.6 MWIPS
>>>  Execl Throughput                       |      3662.1 lps   |      2718.0 lps
>>>  File Copy 1024 bufsize 2000 maxblocks  |    432906.4 KBps  |    158011.8 KBps
>>>  File Copy 256 bufsize 500 maxblocks    |    116023.0 KBps  |     37664.0 KBps
>>>  File Copy 4096 bufsize 8000 maxblocks  |   1432769.8 KBps  |    441108.8 KBps
>>>  Pipe Throughput                        |   6405029.6 lps   |   6021457.6 lps
>>>  Pipe-based Context Switching           |    185872.7 lps   |    184255.3 lps
>>>  Process Creation                       |      4025.7 lps   |      3706.6 lps
>>>  Shell Scripts (1 concurrent)           |      6745.6 lpm   |      6436.1 lpm
>>>  Shell Scripts (8 concurrent)           |       998.7 lpm   |       931.1 lpm
>>>  System Call Overhead                   |   3913363.1 lps   |   3883287.8 lps
>>> ----------------------------------------+-------------------+------------------
>>>  System Benchmarks Index Score          |      1835.1       |      1327.6
>>
>> Interesting, thanks for the numbers.
>>
>> So it looks like there is a decent improvement to be had from targeted vCPU
>> wakeup, but I really dislike the explicit PV interface and it's already been
>> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>>
>> Rather than expose a divergent interface, I would instead like to explore an
>> improvement to smp_cond_load_*() and see how that performs before we commit
>> to something more intrusive. Marc and I looked at this very briefly in the
>> past, and the basic idea is to register all of the WFE sites with the
>> hypervisor, indicating which register contains the address being spun on
>> and which register contains the "bad" value. That way, you don't bother
>> rescheduling a vCPU if the value at the address is still bad, because you
>> know it will exit immediately.
>>
>> Of course, the devil is in the details because when I say "address", that's
>> a guest virtual address, so you need to play some tricks in the hypervisor
>> so that you have a separate mapping for the lockword (it's enough to keep
>> track of the physical address).
>>
>> Our hacks are here but we basically ran out of time to work on them beyond
>> an unoptimised and hacky prototype:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>>
>> Marc -- how would you prefer to handle this?
>
> Let me try and rebase this thing to a modern kernel (I doubt it applies without
> conflicts to mainline). We can then have discussion about its merit on the list
> once I post it. It'd be good to have a pointer to the benchmarks that have been
> used here.
>
> Thanks,
>
>         M.


Hi Marc, Will,

My apologies for the slow reply. Just checking what is the latest on this
PV cond yield prototype?

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy

The following are the unixbench test results of the PV cond yield prototype:

unix benchmark result:
host: kernel 5.10.0-rc6, HiSilicon Kunpeng920, 8 CPUs
guest: kernel 5.10.0-rc6, 16 VCPUs
                                       | 5.10.0-rc6 | pv_cond_yield | vcpu_is_preempted
System Benchmarks Index Values         |   INDEX    |     INDEX     |       INDEX
---------------------------------------+------------+---------------+-------------------
Dhrystone 2 using register variables   |   29164.0  |    29156.9    |      29207.2
Double-Precision Whetstone             |    6807.6  |     6789.2    |       6912.1
Execl Throughput                       |     856.7  |     1195.6    |        863.1
File Copy 1024 bufsize 2000 maxblocks  |     189.9  |      923.5    |       1094.2
File Copy 256 bufsize 500 maxblocks    |     121.9  |      578.4    |        588.7
File Copy 4096 bufsize 8000 maxblocks  |     419.9  |     1992.0    |       2733.7
Pipe Throughput                        |    6727.2  |     6670.2    |       6743.2
Pipe-based Context Switching           |     486.9  |      547.0    |        471.9
Process Creation                       |     353.4  |      345.1    |        338.5
Shell Scripts (1 concurrent)           |    3187.2  |     1432.2    |       2798.7
Shell Scripts (8 concurrent)           |    3410.5  |     1360.1    |       2672.9
System Call Overhead                   |    2967.0  |     3273.9    |       3497.9
---------------------------------------+------------+---------------+-------------------
System Benchmarks Index Score          |    1410.0  |     1885.8    |       2128.5


Thanks,

Zengruan