Hello,
Continuing the discussion from [1], this series adds support for
user-space to elect the hypercall services that it wishes to expose
to the guest, rather than the guest discovering them unconditionally.
The idea employed by the series was taken from [1] as suggested by
Marc Z.
In a broad sense, the idea is similar to the current implementation
of the PSCI interface: create a 'firmware pseudo-register' to handle
the firmware revisions. The series extends this idea to all the other
hypercalls such as TRNG (True Random Number Generator), PV_TIME
(Paravirtualized Time), and PTP (Precision Time Protocol).
For better categorization and future scaling, these firmware registers
are categorized based on the service call owners, but unlike the
existing firmware pseudo-registers, they hold the features supported
in the form of a bitmap. During VM (vCPU) initialization, each register
shows the upper limit of the features supported by the corresponding
service call owner. The VMM can simply use GET_ONE_REG to discover the
features. If it's unhappy with any of them, it can simply write back
the desired feature bitmap using SET_ONE_REG.
KVM allows these modifications only until the VM has started. KVM also
assumes that the VMM is unaware of a register if that register remains
unaccessed (neither read nor written), and simply clears all the bits
of such registers so that the guest doesn't accidentally get exposed
to the features. Finally, the union of the bitmaps from all the
registers defines the set of services exposed to the guest.
In order to provide backward compatibility with already existing VMMs,
a new capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced. To enable
the bitmap firmware registers extension, the capability must be
explicitly enabled. If not, the behavior is similar to the previous
setup.
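As a rough sketch of the intended flow (illustration only, not part of
the series; error handling is omitted and vm_fd/vcpu_fd are assumed to
be set up already by the VMM), user-space would do something like:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static void elect_hvc_services(int vm_fd, int vcpu_fd)
    {
            struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_HVC_FW_REG_BMAP };
            uint64_t bmap;
            struct kvm_one_reg reg = {
                    .id = KVM_REG_ARM_STD_HYP_BMAP,
                    .addr = (uintptr_t)&bmap,
            };

            ioctl(vm_fd, KVM_ENABLE_CAP, &cap);        /* opt in to the extension */
            ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);     /* bmap = upper limit of features */
            bmap &= ~KVM_REG_ARM_STD_HYP_BIT_PV_TIME;  /* e.g. hide PV_TIME */
            ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);     /* write back the desired set */
    }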
The patches are based on mainline kernel 5.15, with the selftest
patches from [2] applied.
Patch-1 factors out the non-PSCI related interface from psci.c into
hypercalls.c, as the series extends this list in the upcoming
patches.
Patches 2 and 3 introduce the core KVM functions kvm_vcpu_has_run_once()
and kvm_vm_has_run_once(), to be used in the upcoming patches.
Patch-4 sets up the framework for the bitmap firmware pseudo-registers.
This includes introducing the capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
read/write helpers for the registers, a helper to sanitize the registers
before the VM starts, and another helper to check if a particular
hypercall service is supported for the guest.
It also adds the register KVM_REG_ARM_STD_BMAP to support ARM's
standard secure services.
Patch-5 introduces the firmware register, KVM_REG_ARM_STD_HYP_BMAP,
which holds the standard hypervisor services (such as PV_TIME).
Patch-6 introduces the firmware register, KVM_REG_ARM_VENDOR_HYP_BMAP,
which holds the vendor specific hypercall services.
Patches 7 and 8 add the necessary documentation for the newly added
capability and firmware registers.
Patch-9 imports the SMCCC definitions from linux/arm-smccc.h into tools/
for further use in selftests.
Patch-10 adds the selftest to test the guest (using 'hvc') and VMM
interfaces (SET/GET_ONE_REG).
Patch-11 adds these firmware registers into the get-reg-list selftest.
[1]: https://lore.kernel.org/kvmarm/[email protected]/T/
[2]: https://lore.kernel.org/kvmarm/[email protected]/
Regards,
Raghavendra
v1 -> v2
Addressed comments by Oliver (thanks!):
- Introduced kvm_vcpu_has_run_once() and kvm_vm_has_run_once() in the
core KVM code, rather than relying on the ARM-specific vcpu->arch.has_run_once.
- Writing to KVM_REG_ARM_PSCI_VERSION is done in hypercalls.c itself,
rather than separating out to psci.c.
- Introduced KVM_CAP_ARM_HVC_FW_REG_BMAP to enable the extension.
- Tracked the register accesses from the VMM to decide whether to
sanitize a register, as opposed to sanitizing upon the first 'write'
in v1.
- kvm_hvc_call_supported() is implemented using a direct switch-case
statement, instead of looping over all the registers to pick the
register for the function-id.
- Replaced the register bit definitions with #defines, instead of enums.
- Removed the patch v1-06/08 that imports the firmware register
definitions as it's not needed.
- Separated out the documentation into its own patch, and the renaming
of psci.rst to hypercalls.rst into another patch.
- Added the new firmware registers to the get-reg-list KVM selftest.
v1: https://lore.kernel.org/kvmarm/[email protected]/
Raghavendra Rao Ananta (11):
KVM: arm64: Factor out firmware register handling from psci.c
KVM: Introduce kvm_vcpu_has_run_once
KVM: Introduce kvm_vm_has_run_once
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM: arm64: Add standard hypervisor firmware register
KVM: arm64: Add vendor hypervisor firmware register
Docs: KVM: Add doc for the bitmap firmware registers
Docs: KVM: Rename psci.rst to hypercalls.rst
tools: Import ARM SMCCC definitions
selftests: KVM: aarch64: Introduce hypercall ABI test
selftests: KVM: aarch64: Add the bitmap firmware registers to
get-reg-list
Documentation/virt/kvm/api.rst | 23 +
Documentation/virt/kvm/arm/hypercalls.rst | 132 ++++++
Documentation/virt/kvm/arm/psci.rst | 77 ---
arch/arm64/include/asm/kvm_host.h | 21 +-
arch/arm64/include/uapi/asm/kvm.h | 12 +
arch/arm64/kvm/arm.c | 31 +-
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/kvm/hypercalls.c | 437 +++++++++++++++++-
arch/arm64/kvm/psci.c | 166 -------
arch/arm64/kvm/pvtime.c | 3 +
arch/arm64/kvm/trng.c | 9 +-
arch/arm64/kvm/vgic/vgic-init.c | 2 +-
arch/riscv/include/asm/kvm_host.h | 3 -
arch/riscv/kvm/vcpu.c | 7 +-
include/kvm/arm_hypercalls.h | 20 +
include/kvm/arm_psci.h | 7 -
include/linux/kvm_host.h | 9 +
include/uapi/linux/kvm.h | 1 +
tools/include/linux/arm-smccc.h | 188 ++++++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/get-reg-list.c | 35 ++
.../selftests/kvm/aarch64/hypercalls.c | 367 +++++++++++++++
virt/kvm/kvm_main.c | 18 +
24 files changed, 1291 insertions(+), 281 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst
delete mode 100644 Documentation/virt/kvm/arm/psci.rst
create mode 100644 tools/include/linux/arm-smccc.h
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
--
2.34.0.rc1.387.gb447b232ab-goog
Architectures such as arm64 and riscv use vcpu variables
such as has_run_once and ran_atleast_once, respectively,
to mark if the vCPU has started running. Since this
tracking is architecture agnostic, introduce
kvm_vcpu_has_run_once() as a core kvm functionality and
use it instead of the architecture-defined variables.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 3 ---
arch/arm64/kvm/arm.c | 8 ++++----
arch/arm64/kvm/vgic/vgic-init.c | 2 +-
arch/riscv/include/asm/kvm_host.h | 3 ---
arch/riscv/kvm/vcpu.c | 7 ++-----
include/linux/kvm_host.h | 7 +++++++
virt/kvm/kvm_main.c | 1 +
7 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4be8486042a7..02dffe50a20c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -367,9 +367,6 @@ struct kvm_vcpu_arch {
int target;
DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
- /* Detect first run of a vcpu */
- bool has_run_once;
-
/* Virtual SError ESR to restore when HCR_EL2.VSE is set */
u64 vsesr_el2;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f5490afe1ebf..0cc148211b4e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -344,7 +344,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
+ if (kvm_vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
static_branch_dec(&userspace_irqchip_in_use);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
@@ -582,13 +582,13 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
struct kvm *kvm = vcpu->kvm;
int ret = 0;
- if (likely(vcpu->arch.has_run_once))
+ if (likely(kvm_vcpu_has_run_once(vcpu)))
return 0;
if (!kvm_arm_vcpu_is_finalized(vcpu))
return -EPERM;
- vcpu->arch.has_run_once = true;
+ vcpu->has_run_once = true;
kvm_arm_vcpu_init_debug(vcpu);
@@ -1116,7 +1116,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
* need to invalidate the I-cache though, as FWB does *not*
* imply CTR_EL0.DIC.
*/
- if (vcpu->arch.has_run_once) {
+ if (kvm_vcpu_has_run_once(vcpu)) {
if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
stage2_unmap_vm(vcpu->kvm);
else
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 0a06d0648970..6fb41097880b 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -91,7 +91,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
return ret;
kvm_for_each_vcpu(i, vcpu, kvm) {
- if (vcpu->arch.has_run_once)
+ if (kvm_vcpu_has_run_once(vcpu))
goto out_unlock;
}
ret = 0;
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 25ba21f98504..645e95f61d47 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -147,9 +147,6 @@ struct kvm_vcpu_csr {
};
struct kvm_vcpu_arch {
- /* VCPU ran at least once */
- bool ran_atleast_once;
-
/* ISA feature bits (similar to MISA) */
unsigned long isa;
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index e3d3aed46184..18cbc8b0c03d 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -75,9 +75,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *cntx;
- /* Mark this VCPU never ran */
- vcpu->arch.ran_atleast_once = false;
-
/* Setup ISA features available to VCPU */
vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;
@@ -190,7 +187,7 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
switch (reg_num) {
case KVM_REG_RISCV_CONFIG_REG(isa):
- if (!vcpu->arch.ran_atleast_once) {
+ if (!kvm_vcpu_has_run_once(vcpu)) {
vcpu->arch.isa = reg_val;
vcpu->arch.isa &= riscv_isa_extension_base(NULL);
vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
@@ -682,7 +679,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
struct kvm_run *run = vcpu->run;
/* Mark this VCPU ran at least once */
- vcpu->arch.ran_atleast_once = true;
+ vcpu->has_run_once = true;
vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..b373929c71eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -360,6 +360,8 @@ struct kvm_vcpu {
* it is a valid slot.
*/
int last_used_slot;
+
+ bool has_run_once;
};
/* must be called with irqs disabled */
@@ -1847,4 +1849,9 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
/* Max number of entries allowed for each kvm dirty ring */
#define KVM_DIRTY_RING_MAX_ENTRIES 65536
+static inline bool kvm_vcpu_has_run_once(struct kvm_vcpu *vcpu)
+{
+ return vcpu->has_run_once;
+}
+
#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..1ec8a8e959b2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -433,6 +433,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->ready = false;
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
vcpu->last_used_slot = 0;
+ vcpu->has_run_once = false;
}
void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
--
2.34.0.rc1.387.gb447b232ab-goog
Common hypercall firmware register handling currently lives
in psci.c. Since the upcoming patches add more of these registers,
it's better to move the generic handling to hypercalls.c for a
cleaner presentation.
While we are at it, collect all the firmware registers under
fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
kvm_arm_copy_fw_reg_indices() in a generic way.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/kvm/hypercalls.c | 170 +++++++++++++++++++++++++++++++++++
arch/arm64/kvm/psci.c | 166 ----------------------------------
include/kvm/arm_hypercalls.h | 7 ++
include/kvm/arm_psci.h | 7 --
5 files changed, 178 insertions(+), 174 deletions(-)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5ce26bedf23c..625f97f7b304 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -18,7 +18,7 @@
#include <linux/string.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
-#include <kvm/arm_psci.h>
+#include <kvm/arm_hypercalls.h>
#include <asm/cputype.h>
#include <linux/uaccess.h>
#include <asm/fpsimd.h>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..9e136d91b470 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -146,3 +146,173 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
+
+static const u64 fw_reg_ids[] = {
+ KVM_REG_ARM_PSCI_VERSION,
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
+};
+
+int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
+{
+ return ARRAY_SIZE(fw_reg_ids);
+}
+
+int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
+ if (put_user(fw_reg_ids[i], uindices))
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+#define KVM_REG_FEATURE_LEVEL_WIDTH 4
+#define KVM_REG_FEATURE_LEVEL_MASK (BIT(KVM_REG_FEATURE_LEVEL_WIDTH) - 1)
+
+/*
+ * Convert the workaround level into an easy-to-compare number, where higher
+ * values mean better protection.
+ */
+static int get_kernel_wa_level(u64 regid)
+{
+ switch (regid) {
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ switch (arm64_get_spectre_v2_state()) {
+ case SPECTRE_VULNERABLE:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
+ case SPECTRE_MITIGATED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
+ case SPECTRE_UNAFFECTED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
+ }
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ switch (arm64_get_spectre_v4_state()) {
+ case SPECTRE_MITIGATED:
+ /*
+ * As for the hypercall discovery, we pretend we
+ * don't have any FW mitigation if SSBS is there at
+ * all times.
+ */
+ if (cpus_have_final_cap(ARM64_SSBS))
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ fallthrough;
+ case SPECTRE_UNAFFECTED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
+ case SPECTRE_VULNERABLE:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ }
+ }
+
+ return -EINVAL;
+}
+
+int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+ void __user *uaddr = (void __user *)(long)reg->addr;
+ u64 val;
+
+ switch (reg->id) {
+ case KVM_REG_ARM_PSCI_VERSION:
+ val = kvm_psci_version(vcpu, vcpu->kvm);
+ break;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+ void __user *uaddr = (void __user *)(long)reg->addr;
+ u64 val;
+ int wa_level;
+
+ if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ switch (reg->id) {
+ case KVM_REG_ARM_PSCI_VERSION:
+ {
+ bool wants_02;
+
+ wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
+
+ switch (val) {
+ case KVM_ARM_PSCI_0_1:
+ if (wants_02)
+ return -EINVAL;
+ vcpu->kvm->arch.psci_version = val;
+ return 0;
+ case KVM_ARM_PSCI_0_2:
+ case KVM_ARM_PSCI_1_0:
+ if (!wants_02)
+ return -EINVAL;
+ vcpu->kvm->arch.psci_version = val;
+ return 0;
+ }
+ break;
+ }
+
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
+ return -EINVAL;
+
+ if (get_kernel_wa_level(reg->id) < val)
+ return -EINVAL;
+
+ return 0;
+
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
+ return -EINVAL;
+
+ /* The enabled bit must not be set unless the level is AVAIL. */
+ if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
+ (val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
+ return -EINVAL;
+
+ /*
+ * Map all the possible incoming states to the only two we
+ * really want to deal with.
+ */
+ switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
+ wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ break;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
+ wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
+ * other way around.
+ */
+ if (get_kernel_wa_level(reg->id) < wa_level)
+ return -EINVAL;
+
+ return 0;
+ default:
+ return -ENOENT;
+ }
+
+ return -EINVAL;
+}
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..6c8323ae32f2 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -403,169 +403,3 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
};
}
-
-int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
-{
- return 3; /* PSCI version and two workaround registers */
-}
-
-int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
-{
- if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
- return -EFAULT;
-
- if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
- return -EFAULT;
-
- if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
- return -EFAULT;
-
- return 0;
-}
-
-#define KVM_REG_FEATURE_LEVEL_WIDTH 4
-#define KVM_REG_FEATURE_LEVEL_MASK (BIT(KVM_REG_FEATURE_LEVEL_WIDTH) - 1)
-
-/*
- * Convert the workaround level into an easy-to-compare number, where higher
- * values mean better protection.
- */
-static int get_kernel_wa_level(u64 regid)
-{
- switch (regid) {
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- switch (arm64_get_spectre_v2_state()) {
- case SPECTRE_VULNERABLE:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
- case SPECTRE_MITIGATED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
- case SPECTRE_UNAFFECTED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
- }
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- switch (arm64_get_spectre_v4_state()) {
- case SPECTRE_MITIGATED:
- /*
- * As for the hypercall discovery, we pretend we
- * don't have any FW mitigation if SSBS is there at
- * all times.
- */
- if (cpus_have_final_cap(ARM64_SSBS))
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- fallthrough;
- case SPECTRE_UNAFFECTED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
- case SPECTRE_VULNERABLE:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- }
- }
-
- return -EINVAL;
-}
-
-int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
-{
- void __user *uaddr = (void __user *)(long)reg->addr;
- u64 val;
-
- switch (reg->id) {
- case KVM_REG_ARM_PSCI_VERSION:
- val = kvm_psci_version(vcpu, vcpu->kvm);
- break;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
- break;
- default:
- return -ENOENT;
- }
-
- if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
- return -EFAULT;
-
- return 0;
-}
-
-int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
-{
- void __user *uaddr = (void __user *)(long)reg->addr;
- u64 val;
- int wa_level;
-
- if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
- return -EFAULT;
-
- switch (reg->id) {
- case KVM_REG_ARM_PSCI_VERSION:
- {
- bool wants_02;
-
- wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
-
- switch (val) {
- case KVM_ARM_PSCI_0_1:
- if (wants_02)
- return -EINVAL;
- vcpu->kvm->arch.psci_version = val;
- return 0;
- case KVM_ARM_PSCI_0_2:
- case KVM_ARM_PSCI_1_0:
- if (!wants_02)
- return -EINVAL;
- vcpu->kvm->arch.psci_version = val;
- return 0;
- }
- break;
- }
-
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
- return -EINVAL;
-
- if (get_kernel_wa_level(reg->id) < val)
- return -EINVAL;
-
- return 0;
-
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
- KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
- return -EINVAL;
-
- /* The enabled bit must not be set unless the level is AVAIL. */
- if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
- (val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
- return -EINVAL;
-
- /*
- * Map all the possible incoming states to the only two we
- * really want to deal with.
- */
- switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
- wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- break;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
- wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
- break;
- default:
- return -EINVAL;
- }
-
- /*
- * We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
- * other way around.
- */
- if (get_kernel_wa_level(reg->id) < wa_level)
- return -EINVAL;
-
- return 0;
- default:
- return -ENOENT;
- }
-
- return -EINVAL;
-}
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 0e2509d27910..5d38628a8d04 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -40,4 +40,11 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
vcpu_set_reg(vcpu, 3, a3);
}
+struct kvm_one_reg;
+
+int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+
#endif
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 5b58bd2fe088..080c2d0bd6e7 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -42,11 +42,4 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
int kvm_psci_call(struct kvm_vcpu *vcpu);
-struct kvm_one_reg;
-
-int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
-int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
-int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
-int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
-
#endif /* __KVM_ARM_PSCI_H__ */
--
2.34.0.rc1.387.gb447b232ab-goog
The upcoming patches need a way to detect if the VM, as
a whole, has started. Hence, OR together kvm_vcpu_has_run_once()
across all the vCPUs of the VM to build kvm_vm_has_run_once(),
which provides this functionality.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b373929c71eb..102e00c0e21c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1854,4 +1854,6 @@ static inline bool kvm_vcpu_has_run_once(struct kvm_vcpu *vcpu)
return vcpu->has_run_once;
}
+bool kvm_vm_has_run_once(struct kvm *kvm);
+
#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1ec8a8e959b2..3d8d96e8f61d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4339,6 +4339,23 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
return fd;
}
+bool kvm_vm_has_run_once(struct kvm *kvm)
+{
+ int i, ret = false;
+ struct kvm_vcpu *vcpu;
+
+ mutex_lock(&kvm->lock);
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ ret = kvm_vcpu_has_run_once(vcpu);
+ if (ret)
+ break;
+ }
+
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
static long kvm_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
--
2.34.0.rc1.387.gb447b232ab-goog
KVM regularly introduces new hypercall services to the guests without
any consent from the Virtual Machine Manager (VMM). This means the
guests can observe hypercall services come and go as they migrate
across various host kernel versions. This could be a major problem
if the guest discovered a hypercall, started using it, and after
getting migrated to an older kernel realized that it's no longer
available. Depending on how the guest handles the change, there's
a chance that the guest would just panic.
As a result, there's a need for the VMM to elect the services that
it wishes the guest to discover. The VMM can elect these services based
on the kernels spread across its (migration) fleet. To remedy this,
extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, to cover all the hypercall services available.
These firmware registers are categorized based on the service call
owners, and unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap. During VM (vCPU)
initialization, each register shows the upper limit of the features
supported by the corresponding service call owner. The VMM can simply
use GET_ONE_REG to discover the features. If it's unhappy with any of
them, it can simply write back the desired feature bitmap using
SET_ONE_REG.
KVM allows these modifications only until the VM has started. KVM also
assumes that the VMM is unaware of a register if that register remains
unaccessed (neither read nor written), and simply clears all the bits
of such registers so that the guest doesn't accidentally get exposed
to the features. Finally, the union of the bitmaps from all the
registers defines the set of services exposed to the guest.
In order to provide backward compatibility with already existing VMMs,
a new capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced. To enable
the bitmap firmware registers extension, the capability must be
explicitly enabled. If not, the behavior is similar to the previous
setup.
In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for the ARM True Random Number Generator (TRNG) service, with bit-0 of
the register representing the mandatory features of v1.0. Other services
will be added in the upcoming patches.
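From the guest's point of view (a sketch using the kernel's existing
SMCCC helper; not part of this patch), a gated TRNG probe would look
like:

    struct arm_smccc_res res;

    arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
    /*
     * If user-space enabled KVM_CAP_ARM_HVC_FW_REG_BMAP and cleared
     * KVM_REG_ARM_STD_BIT_TRNG_V1_0 (or simply never accessed the
     * register), res.a0 reads SMCCC_RET_NOT_SUPPORTED instead of
     * the TRNG version.
     */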
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 16 +++
arch/arm64/include/uapi/asm/kvm.h | 4 +
arch/arm64/kvm/arm.c | 23 +++-
arch/arm64/kvm/hypercalls.c | 217 +++++++++++++++++++++++++++++-
arch/arm64/kvm/trng.c | 9 +-
include/kvm/arm_hypercalls.h | 7 +
include/uapi/linux/kvm.h | 1 +
7 files changed, 262 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 02dffe50a20c..1546a2f973ef 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -102,6 +102,19 @@ struct kvm_s2_mmu {
struct kvm_arch_memory_slot {
};
+struct hvc_fw_reg_bmap {
+ bool accessed;
+ u64 reg_id;
+ u64 bmap;
+};
+
+struct hvc_reg_desc {
+ spinlock_t lock;
+ bool fw_reg_bmap_enabled;
+
+ struct hvc_fw_reg_bmap hvc_std_bmap;
+};
+
struct kvm_arch {
struct kvm_s2_mmu mmu;
@@ -137,6 +150,9 @@ struct kvm_arch {
/* Memory Tagging Extension enabled for the guest */
bool mte_enabled;
+
+ /* Hypercall firmware registers' descriptor */
+ struct hvc_reg_desc hvc_desc;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..d6e099ed14ef 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
+#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
+#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0cc148211b4e..f2099e4d1109 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -81,26 +81,32 @@ int kvm_arch_check_processor_compat(void *opaque)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
- int r;
+ int r = 0;
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
if (cap->flags)
return -EINVAL;
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
- r = 0;
kvm->arch.return_nisv_io_abort_to_user = true;
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
- if (!system_supports_mte() || kvm->created_vcpus) {
+ if (!system_supports_mte() || kvm->created_vcpus)
r = -EINVAL;
- } else {
- r = 0;
+ else
kvm->arch.mte_enabled = true;
- }
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_ARM_HVC_FW_REG_BMAP:
+ if (kvm_vm_has_run_once(kvm))
+ return -EBUSY;
+
+ spin_lock(&hvc_desc->lock);
+ hvc_desc->fw_reg_bmap_enabled = true;
+ spin_unlock(&hvc_desc->lock);
+ break;
default:
r = -EINVAL;
break;
@@ -157,6 +163,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
set_default_spectre(kvm);
+ kvm_arm_init_hypercalls(kvm);
+
return ret;
out_free_stage2_pgd:
kvm_free_stage2_pgd(&kvm->arch.mmu);
@@ -215,6 +223,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
+ case KVM_CAP_ARM_HVC_FW_REG_BMAP:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
@@ -622,6 +631,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
if (kvm_vm_is_protected(kvm))
kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
+ kvm_arm_sanitize_fw_regs(kvm);
+
return ret;
}
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 9e136d91b470..f5df7bc61146 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -58,6 +58,41 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
val[3] = lower_32_bits(cycles);
}
+static bool
+kvm_arm_fw_reg_feat_enabled(struct hvc_fw_reg_bmap *reg_bmap, u64 feat_bit)
+{
+ return reg_bmap->bmap & feat_bit;
+}
+
+bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
+{
+ struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
+
+ /*
+ * To ensure backward compatibility, support all the service calls,
+ * including new additions, if the bitmap firmware registers
+ * extension isn't explicitly enabled.
+ */
+ if (!hvc_desc->fw_reg_bmap_enabled)
+ return true;
+
+ switch (func_id) {
+ case ARM_SMCCC_TRNG_VERSION:
+ case ARM_SMCCC_TRNG_FEATURES:
+ case ARM_SMCCC_TRNG_GET_UUID:
+ case ARM_SMCCC_TRNG_RND32:
+ case ARM_SMCCC_TRNG_RND64:
+ return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_bmap,
+ KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+ default:
+ /* By default, allow the services that aren't listed here */
+ return true;
+ }
+
+ /* We shouldn't be reaching here */
+ return true;
+}
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
u32 func_id = smccc_get_function(vcpu);
@@ -65,6 +100,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
u32 feature;
gpa_t gpa;
+ if (!kvm_hvc_call_supported(vcpu, func_id))
+ goto out;
+
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
val[0] = ARM_SMCCC_VERSION_1_1;
@@ -143,6 +181,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
+out:
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
@@ -153,17 +192,178 @@ static const u64 fw_reg_ids[] = {
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
};
+static const u64 fw_reg_bmap_ids[] = {
+ KVM_REG_ARM_STD_BMAP,
+};
+
+static void kvm_arm_fw_reg_init_hvc(struct hvc_reg_desc *hvc_desc,
+ struct hvc_fw_reg_bmap *fw_reg_bmap,
+ u64 reg_id, u64 default_map)
+{
+ fw_reg_bmap->reg_id = reg_id;
+ fw_reg_bmap->bmap = default_map;
+}
+
+void kvm_arm_init_hypercalls(struct kvm *kvm)
+{
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
+
+ spin_lock_init(&hvc_desc->lock);
+
+ kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_bmap,
+ KVM_REG_ARM_STD_BMAP, ARM_SMCCC_STD_FEATURES);
+}
+
+static void kvm_arm_fw_reg_sanitize(struct hvc_fw_reg_bmap *fw_reg_bmap)
+{
+ if (!fw_reg_bmap->accessed)
+ fw_reg_bmap->bmap = 0;
+}
+
+/*
+ * kvm_arm_sanitize_fw_regs: Sanitize the hypercall firmware registers
+ *
+ * Sanitization, in the case of hypercall firmware registers, is basically
+ * clearing out the feature bitmaps so that the guests are not exposed to
+ * the services corresponding to a particular register. The registers that
+ * need sanitization are decided based on two user-space factors:
+ * 1. Enablement of KVM_CAP_ARM_HVC_FW_REG_BMAP:
+ * If the user-space hasn't enabled the capability, it either means
+ * that it's unaware of its existence, or it simply doesn't want to
+ * participate in the arrangement and is okay with the default settings.
+ * The former case is to ensure backward compatibility.
+ *
+ * 2. Has the user-space accessed (read/write) the register?
+ * If yes, it means that the user-space is aware of the register's
+ * existence and can set the bits as it sees fit for the guest. A
+ * read-only access from user-space indicates that the user-space is
+ * happy with the default settings, and doesn't wish to change it.
+ *
+ * The logic for sanitizing a register will then be:
+ * ---------------------------------------------------------------------------
+ * | CAP enabled | Accessed reg | Clear reg | Comments |
+ * ---------------------------------------------------------------------------
+ * | N | N | N | |
+ * | N | Y | N | -ENOENT returned during access |
+ * | Y | N | Y | |
+ * | Y | Y | N | |
+ * ---------------------------------------------------------------------------
+ */
+void kvm_arm_sanitize_fw_regs(struct kvm *kvm)
+{
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
+
+ spin_lock(&hvc_desc->lock);
+
+ if (!hvc_desc->fw_reg_bmap_enabled)
+ goto out;
+
+ kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_bmap);
+
+out:
+ spin_unlock(&hvc_desc->lock);
+}
+
+static int kvm_arm_fw_reg_get_bmap(struct kvm *kvm,
+ struct hvc_fw_reg_bmap *fw_reg_bmap, u64 *val)
+{
+ int ret = 0;
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
+
+ spin_lock(&hvc_desc->lock);
+
+ if (!hvc_desc->fw_reg_bmap_enabled) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ fw_reg_bmap->accessed = true;
+ *val = fw_reg_bmap->bmap;
+out:
+ spin_unlock(&hvc_desc->lock);
+ return ret;
+}
+
+static int kvm_arm_fw_reg_set_bmap(struct kvm *kvm,
+ struct hvc_fw_reg_bmap *fw_reg_bmap, u64 val)
+{
+ int ret = 0;
+ u64 fw_reg_features;
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
+
+ spin_lock(&hvc_desc->lock);
+
+ if (!hvc_desc->fw_reg_bmap_enabled) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ if (fw_reg_bmap->bmap == val)
+ goto out;
+
+ if (kvm_vm_has_run_once(kvm)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ switch (fw_reg_bmap->reg_id) {
+ case KVM_REG_ARM_STD_BMAP:
+ fw_reg_features = ARM_SMCCC_STD_FEATURES;
+ break;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Check for unsupported feature bit */
+ if (val & ~fw_reg_features) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ fw_reg_bmap->accessed = true;
+ fw_reg_bmap->bmap = val;
+out:
+ spin_unlock(&hvc_desc->lock);
+ return ret;
+}
+
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
{
- return ARRAY_SIZE(fw_reg_ids);
+ struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
+ int n_regs = ARRAY_SIZE(fw_reg_ids);
+
+ spin_lock(&hvc_desc->lock);
+
+ if (hvc_desc->fw_reg_bmap_enabled)
+ n_regs += ARRAY_SIZE(fw_reg_bmap_ids);
+
+ spin_unlock(&hvc_desc->lock);
+
+ return n_regs;
}
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
+ struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
int i;
for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
- if (put_user(fw_reg_ids[i], uindices))
+ if (put_user(fw_reg_ids[i], uindices++))
+ return -EFAULT;
+ }
+
+ spin_lock(&hvc_desc->lock);
+
+ if (!hvc_desc->fw_reg_bmap_enabled) {
+ spin_unlock(&hvc_desc->lock);
+ return 0;
+ }
+
+ spin_unlock(&hvc_desc->lock);
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_bmap_ids); i++) {
+ if (put_user(fw_reg_bmap_ids[i], uindices++))
return -EFAULT;
}
@@ -213,8 +413,11 @@ static int get_kernel_wa_level(u64 regid)
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
+ struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
void __user *uaddr = (void __user *)(long)reg->addr;
+ struct kvm *kvm = vcpu->kvm;
u64 val;
+ int ret;
switch (reg->id) {
case KVM_REG_ARM_PSCI_VERSION:
@@ -223,6 +426,12 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
+ break;
+ case KVM_REG_ARM_STD_BMAP:
+ ret = kvm_arm_fw_reg_get_bmap(kvm, &hvc_desc->hvc_std_bmap, &val);
+ if (ret)
+ return ret;
+
break;
default:
return -ENOENT;
@@ -236,6 +445,8 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
+ struct kvm *kvm = vcpu->kvm;
+ struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
void __user *uaddr = (void __user *)(long)reg->addr;
u64 val;
int wa_level;
@@ -310,6 +521,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return -EINVAL;
return 0;
+ case KVM_REG_ARM_STD_BMAP:
+ return kvm_arm_fw_reg_set_bmap(kvm, &hvc_desc->hvc_std_bmap, val);
default:
return -ENOENT;
}
diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
index 99bdd7103c9c..6dff765f5b9b 100644
--- a/arch/arm64/kvm/trng.c
+++ b/arch/arm64/kvm/trng.c
@@ -60,14 +60,9 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
val = ARM_SMCCC_TRNG_VERSION_1_0;
break;
case ARM_SMCCC_TRNG_FEATURES:
- switch (smccc_get_arg1(vcpu)) {
- case ARM_SMCCC_TRNG_VERSION:
- case ARM_SMCCC_TRNG_FEATURES:
- case ARM_SMCCC_TRNG_GET_UUID:
- case ARM_SMCCC_TRNG_RND32:
- case ARM_SMCCC_TRNG_RND64:
+ if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
val = TRNG_SUCCESS;
- }
+
break;
case ARM_SMCCC_TRNG_GET_UUID:
smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 5d38628a8d04..8c6300d1cbaf 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,6 +6,9 @@
#include <asm/kvm_emulate.h>
+#define ARM_SMCCC_STD_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
@@ -42,9 +45,13 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
struct kvm_one_reg;
+void kvm_arm_init_hypercalls(struct kvm *kvm);
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+void kvm_arm_sanitize_fw_regs(struct kvm *kvm);
+
+bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
#endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 78f0719cc2a3..3855b7b33bb3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1130,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_BINARY_STATS_FD 203
#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
#define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_HVC_FW_REG_BMAP 206
#ifdef KVM_CAP_IRQ_ROUTING
--
2.34.0.rc1.387.gb447b232ab-goog
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that'll be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized Time, represented by bit-0.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/uapi/asm/kvm.h | 4 ++++
arch/arm64/kvm/hypercalls.c | 24 ++++++++++++++++++++++++
arch/arm64/kvm/pvtime.c | 3 +++
include/kvm/arm_hypercalls.h | 3 +++
5 files changed, 35 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1546a2f973ef..e8e540bd1fe5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -113,6 +113,7 @@ struct hvc_reg_desc {
bool fw_reg_bmap_enabled;
struct hvc_fw_reg_bmap hvc_std_bmap;
+ struct hvc_fw_reg_bmap hvc_std_hyp_bmap;
};
struct kvm_arch {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d6e099ed14ef..5890cbcd6385 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -285,6 +285,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
+#define KVM_REG_ARM_STD_HYP_BMAP KVM_REG_ARM_FW_REG(4)
+#define KVM_REG_ARM_STD_HYP_BIT_PV_TIME BIT(0)
+#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index f5df7bc61146..b3320adc068c 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -84,6 +84,10 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_TRNG_RND64:
return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_bmap,
KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+ case ARM_SMCCC_HV_PV_TIME_FEATURES:
+ case ARM_SMCCC_HV_PV_TIME_ST:
+ return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_hyp_bmap,
+ KVM_REG_ARM_STD_HYP_BIT_PV_TIME);
default:
/* By default, allow the services that aren't listed here */
return true;
@@ -109,6 +113,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
+ if (!kvm_hvc_call_supported(vcpu, feature))
+ break;
+
switch (feature) {
case ARM_SMCCC_ARCH_WORKAROUND_1:
switch (arm64_get_spectre_v2_state()) {
@@ -194,6 +201,7 @@ static const u64 fw_reg_ids[] = {
static const u64 fw_reg_bmap_ids[] = {
KVM_REG_ARM_STD_BMAP,
+ KVM_REG_ARM_STD_HYP_BMAP,
};
static void kvm_arm_fw_reg_init_hvc(struct hvc_reg_desc *hvc_desc,
@@ -212,6 +220,8 @@ void kvm_arm_init_hypercalls(struct kvm *kvm)
kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_bmap,
KVM_REG_ARM_STD_BMAP, ARM_SMCCC_STD_FEATURES);
+ kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_hyp_bmap,
+ KVM_REG_ARM_STD_HYP_BMAP, ARM_SMCCC_STD_HYP_FEATURES);
}
static void kvm_arm_fw_reg_sanitize(struct hvc_fw_reg_bmap *fw_reg_bmap)
@@ -259,6 +269,7 @@ void kvm_arm_sanitize_fw_regs(struct kvm *kvm)
goto out;
kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_bmap);
+ kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_hyp_bmap);
out:
spin_unlock(&hvc_desc->lock);
@@ -310,6 +321,9 @@ static int kvm_arm_fw_reg_set_bmap(struct kvm *kvm,
case KVM_REG_ARM_STD_BMAP:
fw_reg_features = ARM_SMCCC_STD_FEATURES;
break;
+ case KVM_REG_ARM_STD_HYP_BMAP:
+ fw_reg_features = ARM_SMCCC_STD_HYP_FEATURES;
+ break;
default:
ret = -EINVAL;
goto out;
@@ -432,6 +446,13 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if (ret)
return ret;
+ break;
+ case KVM_REG_ARM_STD_HYP_BMAP:
+ ret = kvm_arm_fw_reg_get_bmap(kvm,
+ &hvc_desc->hvc_std_hyp_bmap, &val);
+ if (ret)
+ return ret;
+
break;
default:
return -ENOENT;
@@ -523,6 +544,9 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
case KVM_REG_ARM_STD_BMAP:
return kvm_arm_fw_reg_set_bmap(kvm, &hvc_desc->hvc_std_bmap, val);
+ case KVM_REG_ARM_STD_HYP_BMAP:
+ return kvm_arm_fw_reg_set_bmap(kvm,
+ &hvc_desc->hvc_std_hyp_bmap, val);
default:
return -ENOENT;
}
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 78a09f7a6637..4fa436dbd0b7 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -37,6 +37,9 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
u32 feature = smccc_get_arg1(vcpu);
long val = SMCCC_RET_NOT_SUPPORTED;
+ if (!kvm_hvc_call_supported(vcpu, feature))
+ return val;
+
switch (feature) {
case ARM_SMCCC_HV_PV_TIME_FEATURES:
case ARM_SMCCC_HV_PV_TIME_ST:
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 8c6300d1cbaf..77c30e335f44 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -9,6 +9,9 @@
#define ARM_SMCCC_STD_FEATURES \
GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+#define ARM_SMCCC_STD_HYP_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
--
2.34.0.rc1.387.gb447b232ab-goog
Introduce the firmware register to hold the vendor-specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that'll be enabled for the
guest, as configured by the user-space. Currently, this
includes support only for Precision Time Protocol (PTP),
represented by bit-0.
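To illustrate the guest-visible encoding (example only, not part of
the patch): guest bit-0 is ARM_SMCCC_KVM_FUNC_FEATURES, so the
user-space bitmap is reported to the guest shifted left by one:

    /*
     * With KVM_REG_ARM_VENDOR_HYP_BIT_PTP (bit-0) left set by
     * user-space, ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID returns:
     *
     *   val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES)        -> guest bit-0
     *          | (KVM_REG_ARM_VENDOR_HYP_BIT_PTP << 1)   -> guest bit-1
     *          = 0x3
     */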
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/uapi/asm/kvm.h | 4 ++++
arch/arm64/kvm/hypercalls.c | 30 +++++++++++++++++++++++++++++-
include/kvm/arm_hypercalls.h | 3 +++
4 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e8e540bd1fe5..ef1d10bdf562 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -114,6 +114,7 @@ struct hvc_reg_desc {
struct hvc_fw_reg_bmap hvc_std_bmap;
struct hvc_fw_reg_bmap hvc_std_hyp_bmap;
+ struct hvc_fw_reg_bmap hvc_vendor_hyp_bmap;
};
struct kvm_arch {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 5890cbcd6385..8468e5d265df 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -289,6 +289,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_STD_HYP_BIT_PV_TIME BIT(0)
#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+#define KVM_REG_ARM_VENDOR_HYP_BMAP KVM_REG_ARM_FW_REG(5)
+#define KVM_REG_ARM_VENDOR_HYP_BIT_PTP BIT(0)
+#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b3320adc068c..e1361029101e 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -88,6 +88,9 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_HV_PV_TIME_ST:
return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_hyp_bmap,
KVM_REG_ARM_STD_HYP_BIT_PV_TIME);
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+ return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_vendor_hyp_bmap,
+ KVM_REG_ARM_VENDOR_HYP_BIT_PTP);
default:
/* By default, allow the services that aren't listed here */
return true;
@@ -99,6 +102,7 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
+ struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
u32 func_id = smccc_get_function(vcpu);
u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
@@ -173,7 +177,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
- val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+
+ /*
+ * The feature bits exposed to user-space don't include
+ * ARM_SMCCC_KVM_FUNC_FEATURES. However, we expose this to
+ * the guest as bit-0. Hence, left-shift the user-space
+ * exposed bitmap by 1 to accommodate this.
+ */
+ val[0] |= hvc_desc->hvc_vendor_hyp_bmap.bmap << 1;
break;
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
kvm_ptp_get_time(vcpu, val);
@@ -202,6 +213,7 @@ static const u64 fw_reg_ids[] = {
static const u64 fw_reg_bmap_ids[] = {
KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP,
+ KVM_REG_ARM_VENDOR_HYP_BMAP,
};
static void kvm_arm_fw_reg_init_hvc(struct hvc_reg_desc *hvc_desc,
@@ -222,6 +234,8 @@ void kvm_arm_init_hypercalls(struct kvm *kvm)
KVM_REG_ARM_STD_BMAP, ARM_SMCCC_STD_FEATURES);
kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_hyp_bmap,
KVM_REG_ARM_STD_HYP_BMAP, ARM_SMCCC_STD_HYP_FEATURES);
+ kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_vendor_hyp_bmap,
+ KVM_REG_ARM_VENDOR_HYP_BMAP, ARM_SMCCC_VENDOR_HYP_FEATURES);
}
static void kvm_arm_fw_reg_sanitize(struct hvc_fw_reg_bmap *fw_reg_bmap)
@@ -270,6 +284,7 @@ void kvm_arm_sanitize_fw_regs(struct kvm *kvm)
kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_bmap);
kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_hyp_bmap);
+ kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_vendor_hyp_bmap);
out:
spin_unlock(&hvc_desc->lock);
@@ -324,6 +339,9 @@ static int kvm_arm_fw_reg_set_bmap(struct kvm *kvm,
case KVM_REG_ARM_STD_HYP_BMAP:
fw_reg_features = ARM_SMCCC_STD_HYP_FEATURES;
break;
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
+ fw_reg_features = ARM_SMCCC_VENDOR_HYP_FEATURES;
+ break;
default:
ret = -EINVAL;
goto out;
@@ -453,6 +471,13 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if (ret)
return ret;
+ break;
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
+ ret = kvm_arm_fw_reg_get_bmap(kvm,
+ &hvc_desc->hvc_vendor_hyp_bmap, &val);
+ if (ret)
+ return ret;
+
break;
default:
return -ENOENT;
@@ -547,6 +572,9 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_STD_HYP_BMAP:
return kvm_arm_fw_reg_set_bmap(kvm,
&hvc_desc->hvc_std_hyp_bmap, val);
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
+ return kvm_arm_fw_reg_set_bmap(kvm,
+ &hvc_desc->hvc_vendor_hyp_bmap, val);
default:
return -ENOENT;
}
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 77c30e335f44..94f56562fea8 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -12,6 +12,9 @@
#define ARM_SMCCC_STD_HYP_FEATURES \
GENMASK_ULL(KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX, 0)
+#define ARM_SMCCC_VENDOR_HYP_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
--
2.34.0.rc1.387.gb447b232ab-goog
- Add documentation for the capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
in KVM's api.rst.
- Add the documentation for the bitmap firmware registers in
psci.rst. This includes the details for KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP
registers.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
Documentation/virt/kvm/api.rst | 23 ++++++++
Documentation/virt/kvm/arm/psci.rst | 89 +++++++++++++++++++++++------
2 files changed, 95 insertions(+), 17 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3b093d6dbe22..7d88567feaa7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6911,6 +6911,29 @@ MAP_SHARED mmap will result in an -EINVAL return.
When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
perform a bulk copy of tags to/from the guest.
+7.29 KVM_CAP_ARM_HVC_FW_REG_BMAP
+--------------------------------
+
+:Architecture: arm64
+:Parameters: none
+
+This capability indicates that KVM for arm64 supports the pseudo-firmware
+register bitmap extension. It must be explicitly enabled. Once enabled,
+KVM allows access to the firmware registers that hold the bitmap of the
+hypercall services that should be exposed to the guest.
+
+By default, the registers are set to the upper limit of the features
+exposed to the guest. User-space can discover them via the GET_ONE_REG
+interface. If unsatisfied with the configuration, it can write back the
+bitmap that it sees fit for the guest via the SET_ONE_REG interface. The
+registers that are never accessed by the user-space (read/write) are
+by default cleared just before the vCPU runs. This is to make sure that
+the features are not accidentally exposed to the guest without the
+consent of user-space.
+
+Note that the capability has to be enabled before running any vCPU. Once
+enabled, it cannot be disabled; the VM has to be restarted for that.
+
8. Other capabilities.
======================
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/psci.rst
index d52c2e83b5b8..f6306b91168d 100644
--- a/Documentation/virt/kvm/arm/psci.rst
+++ b/Documentation/virt/kvm/arm/psci.rst
@@ -1,32 +1,32 @@
.. SPDX-License-Identifier: GPL-2.0
-=========================================
-Power State Coordination Interface (PSCI)
-=========================================
+=======================
+ARM Hypercall Interface
+=======================
-KVM implements the PSCI (Power State Coordination Interface)
-specification in order to provide services such as CPU on/off, reset
-and power-off to the guest.
+KVM handles the hypercall services as requested by the guests. New hypercall
+services are regularly made available by the ARM specification or by KVM (as
+vendor services) if they make sense from a virtualization point of view.
-The PSCI specification is regularly updated to provide new features,
-and KVM implements these updates if they make sense from a virtualization
-point of view.
-
-This means that a guest booted on two different versions of KVM can
-observe two different "firmware" revisions. This could cause issues if
-a given guest is tied to a particular PSCI revision (unlikely), or if
-a migration causes a different PSCI version to be exposed out of the
-blue to an unsuspecting guest.
+This means that a guest booted on two different versions of KVM can observe
+two different "firmware" revisions. This could cause issues if a given guest
+is tied to a particular version of a hypercall service, or if a migration
+causes a different version to be exposed out of the blue to an unsuspecting
+guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
-interface. These registers can be saved/restored by userspace, and set
+interface. These registers can be saved/restored by user-space, and set
to a convenient value if required.
-The following register is defined:
+The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
+ KVM implements the PSCI (Power State Coordination Interface)
+ specification in order to provide services such as CPU on/off, reset
+ and power-off to the guest.
+
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
@@ -74,4 +74,59 @@ The following register is defined:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
+Contrary to the above registers, the following registers expose the hypercall
+services in the form of a feature bitmap. This bitmap is translated to the
+services that are exposed to the guest. A register is defined per service
+call owner and can be accessed via the GET/SET_ONE_REG interface.
+
+A new KVM capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced to let
+user-space know of this extension. It has to explicitly enable the capability
+to get access to these registers. If the capability is enabled, a 'read' of
+these registers will simply expose the upper limit of all features supported
+by the corresponding service call owner in the form of a bitmap. If the
+user-space is unhappy with the arrangement, it can write back the bitmap
+that it wishes to expose.
+
+If a register is not accessed (neither read nor written), KVM will assume that the
+user-space is unaware of its existence. In such a case, KVM would simply
+clear all the bits of that register just before starting the VM. This way
+no new features are accidentally exposed to the guest.
+
+The pseudo-firmware bitmap registers are as follows:
+
+* KVM_REG_ARM_STD_BMAP:
+ Controls the bitmap of the ARM Standard Secure Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_STD_BIT_TRNG_V1_0:
+ The bit represents the services offered under v1.0 of ARM True Random
+ Number Generator (TRNG) specification, ARM DEN0098.
+
+* KVM_REG_ARM_STD_HYP_BMAP:
+ Controls the bitmap of the ARM Standard Hypervisor Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
+ The bit represents the Paravirtualized Time service, as defined by
+ ARM DEN0057A.
+
+* KVM_REG_ARM_VENDOR_HYP_BMAP:
+ Controls the bitmap of the Vendor specific Hypervisor Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
+ The bit represents the Precision Time Protocol KVM service.
+
+Errors:
+
+ ======= =============================================================
+ -ENOENT Register accessed (read/write) without enabling
+ KVM_CAP_ARM_HVC_FW_REG_BMAP.
+ -EBUSY Attempted a 'write' to the register after the VM has started.
+ -EINVAL Invalid bitmap written to the register.
+ ======= =============================================================
+
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
--
2.34.0.rc1.387.gb447b232ab-goog
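For illustration, the intended user-space flow could look roughly like the
sketch below. This is a minimal sketch, not code from the series: the vm/vcpu
fd setup and all error handling are omitted, and only KVM_REG_ARM_STD_BMAP is
exercised.

	/*
	 * Minimal sketch of a VMM driving the bitmap firmware registers.
	 * Assumes 'vm_fd' and 'vcpu_fd' are already set up.
	 */
	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_HVC_FW_REG_BMAP };
	struct kvm_one_reg reg;
	__u64 bmap;

	/* Opt in to the bitmap firmware registers. */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* The first read returns the upper limit of the supported features. */
	reg.id = KVM_REG_ARM_STD_BMAP;
	reg.addr = (__u64)&bmap;
	ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);

	/* Drop the features the guest shouldn't see... */
	bmap &= ~KVM_REG_ARM_STD_BIT_TRNG_V1_0;

	/* ...and write the subset back before the VM starts. */
	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);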
Since the doc now covers general hypercall details, rather
than just PSCI, rename the file to a more appropriate name:
hypercalls.rst.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} (100%)
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/hypercalls.rst
similarity index 100%
rename from Documentation/virt/kvm/arm/psci.rst
rename to Documentation/virt/kvm/arm/hypercalls.rst
--
2.34.0.rc1.387.gb447b232ab-goog
Import the standard SMCCC definitions from include/linux/arm-smccc.h.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/include/linux/arm-smccc.h | 188 ++++++++++++++++++++++++++++++++
1 file changed, 188 insertions(+)
create mode 100644 tools/include/linux/arm-smccc.h
diff --git a/tools/include/linux/arm-smccc.h b/tools/include/linux/arm-smccc.h
new file mode 100644
index 000000000000..a11c0bbabd5b
--- /dev/null
+++ b/tools/include/linux/arm-smccc.h
@@ -0,0 +1,188 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2015, Linaro Limited
+ */
+#ifndef __LINUX_ARM_SMCCC_H
+#define __LINUX_ARM_SMCCC_H
+
+#include <linux/const.h>
+
+/*
+ * This file provides common defines for ARM SMC Calling Convention as
+ * specified in
+ * https://developer.arm.com/docs/den0028/latest
+ *
+ * This code is up-to-date with version DEN 0028 C
+ */
+
+#define ARM_SMCCC_STD_CALL _AC(0,U)
+#define ARM_SMCCC_FAST_CALL _AC(1,U)
+#define ARM_SMCCC_TYPE_SHIFT 31
+
+#define ARM_SMCCC_SMC_32 0
+#define ARM_SMCCC_SMC_64 1
+#define ARM_SMCCC_CALL_CONV_SHIFT 30
+
+#define ARM_SMCCC_OWNER_MASK 0x3F
+#define ARM_SMCCC_OWNER_SHIFT 24
+
+#define ARM_SMCCC_FUNC_MASK 0xFFFF
+
+#define ARM_SMCCC_IS_FAST_CALL(smc_val) \
+ ((smc_val) & (ARM_SMCCC_FAST_CALL << ARM_SMCCC_TYPE_SHIFT))
+#define ARM_SMCCC_IS_64(smc_val) \
+ ((smc_val) & (ARM_SMCCC_SMC_64 << ARM_SMCCC_CALL_CONV_SHIFT))
+#define ARM_SMCCC_FUNC_NUM(smc_val) ((smc_val) & ARM_SMCCC_FUNC_MASK)
+#define ARM_SMCCC_OWNER_NUM(smc_val) \
+ (((smc_val) >> ARM_SMCCC_OWNER_SHIFT) & ARM_SMCCC_OWNER_MASK)
+
+#define ARM_SMCCC_CALL_VAL(type, calling_convention, owner, func_num) \
+ (((type) << ARM_SMCCC_TYPE_SHIFT) | \
+ ((calling_convention) << ARM_SMCCC_CALL_CONV_SHIFT) | \
+ (((owner) & ARM_SMCCC_OWNER_MASK) << ARM_SMCCC_OWNER_SHIFT) | \
+ ((func_num) & ARM_SMCCC_FUNC_MASK))
+
+#define ARM_SMCCC_OWNER_ARCH 0
+#define ARM_SMCCC_OWNER_CPU 1
+#define ARM_SMCCC_OWNER_SIP 2
+#define ARM_SMCCC_OWNER_OEM 3
+#define ARM_SMCCC_OWNER_STANDARD 4
+#define ARM_SMCCC_OWNER_STANDARD_HYP 5
+#define ARM_SMCCC_OWNER_VENDOR_HYP 6
+#define ARM_SMCCC_OWNER_TRUSTED_APP 48
+#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
+#define ARM_SMCCC_OWNER_TRUSTED_OS 50
+#define ARM_SMCCC_OWNER_TRUSTED_OS_END 63
+
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID 0xff01
+
+#define ARM_SMCCC_QUIRK_NONE 0
+#define ARM_SMCCC_QUIRK_QCOM_A6 1 /* Save/restore register a6 */
+
+#define ARM_SMCCC_VERSION_1_0 0x10000
+#define ARM_SMCCC_VERSION_1_1 0x10001
+#define ARM_SMCCC_VERSION_1_2 0x10002
+#define ARM_SMCCC_VERSION_1_3 0x10003
+
+#define ARM_SMCCC_1_3_SVE_HINT 0x10000
+
+#define ARM_SMCCC_VERSION_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0)
+
+#define ARM_SMCCC_ARCH_FEATURES_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 1)
+
+#define ARM_SMCCC_ARCH_SOC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 2)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_1 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0x8000)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0x7fff)
+
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 0xb66fb428U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 0xe911c52eU
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 0x564bcaa9U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3 0x743a004dU
+
+/* KVM "vendor specific" services */
+#define ARM_SMCCC_KVM_FUNC_FEATURES 0
+#define ARM_SMCCC_KVM_FUNC_PTP 1
+#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
+#define ARM_SMCCC_KVM_NUM_FUNCS 128
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_FEATURES)
+
+#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED 1
+
+/*
+ * ptp_kvm is a feature used for time sync between vm and host.
+ * ptp_kvm module in guest kernel will get service from host using
+ * this hypercall ID.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_PTP)
+
+/* ptp_kvm counter type ID */
+#define KVM_PTP_VIRT_COUNTER 0
+#define KVM_PTP_PHYS_COUNTER 1
+
+/* Paravirtualised time calls (defined by ARM DEN0057A) */
+#define ARM_SMCCC_HV_PV_TIME_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD_HYP, \
+ 0x20)
+
+#define ARM_SMCCC_HV_PV_TIME_ST \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD_HYP, \
+ 0x21)
+
+/* TRNG entropy source calls (defined by ARM DEN0098) */
+#define ARM_SMCCC_TRNG_VERSION \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x50)
+
+#define ARM_SMCCC_TRNG_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x51)
+
+#define ARM_SMCCC_TRNG_GET_UUID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x52)
+
+#define ARM_SMCCC_TRNG_RND32 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x53)
+
+#define ARM_SMCCC_TRNG_RND64 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x53)
+
+/*
+ * Return codes defined in ARM DEN 0070A
+ * ARM DEN 0070A is now merged/consolidated into ARM DEN 0028 C
+ */
+#define SMCCC_RET_SUCCESS 0
+#define SMCCC_RET_NOT_SUPPORTED -1
+#define SMCCC_RET_NOT_REQUIRED -2
+#define SMCCC_RET_INVALID_PARAMETER -3
+
+#endif /*__LINUX_ARM_SMCCC_H*/
--
2.34.0.rc1.387.gb447b232ab-goog
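As a quick sanity check of the imported encoding macros, a function ID can be
expanded by hand. The worked example below (not part of the patch) shows how
ARM_SMCCC_TRNG_RND64 is put together:

	/*
	 * ARM_SMCCC_TRNG_RND64 expands, per ARM_SMCCC_CALL_VAL(), to:
	 *
	 *   (1 << 31)	fast call
	 * | (1 << 30)	SMC64 calling convention
	 * | (4 << 24)	owner: standard secure services
	 * |  0x53	function number
	 * = 0xc4000053
	 */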
Introduce a KVM selftest to check the hypercall interface
on arm64 platforms. The test validates the user-space ioctl
interface to read/write the pseudo-firmware registers, as
well as its effects on the guest under various configurations.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/hypercalls.c | 367 ++++++++++++++++++
3 files changed, 369 insertions(+)
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 9d9571d19a29..f3662b10aa7f 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -2,6 +2,7 @@
/aarch64/arch_timer
/aarch64/debug-exceptions
/aarch64/get-reg-list
+/aarch64/hypercalls
/aarch64/psci_test
/aarch64/vgic_init
/s390x/memop
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4ef9fa47597b..4f462d7b2e40 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -91,6 +91,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
+TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
TEST_GEN_PROGS_aarch64 += aarch64/psci_test
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
TEST_GEN_PROGS_aarch64 += demand_paging_test
diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
new file mode 100644
index 000000000000..c89f73109f1d
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
@@ -0,0 +1,367 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <errno.h>
+#include <linux/arm-smccc.h>
+#include <asm/kvm.h>
+#include <kvm_util.h>
+
+#include "processor.h"
+
+#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK_ULL(max_feat_bit, 0))
+
+struct kvm_fw_reg_info {
+ uint64_t reg; /* Register definition */
+ uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
+};
+
+#define FW_REG_INFO(r, bit_max) \
+ { \
+ .reg = r, \
+ .max_feat_bit = bit_max, \
+ }
+
+static const struct kvm_fw_reg_info fw_reg_info[] = {
+ FW_REG_INFO(KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_BMAP_BIT_MAX),
+ FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP, KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX),
+ FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP, KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX),
+};
+
+enum test_stage {
+ TEST_STAGE_REG_IFACE,
+ TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
+ TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
+ TEST_STAGE_END,
+};
+
+static int stage;
+
+struct test_hvc_info {
+ uint32_t func_id;
+ int64_t arg0;
+
+ void (*test_hvc_disabled)(const struct test_hvc_info *hc_info,
+ struct arm_smccc_res *res);
+ void (*test_hvc_enabled)(const struct test_hvc_info *hc_info,
+ struct arm_smccc_res *res);
+};
+
+#define TEST_HVC_INFO(f, a0, test_disabled, test_enabled) \
+ { \
+ .func_id = f, \
+ .arg0 = a0, \
+ .test_hvc_disabled = test_disabled, \
+ .test_hvc_enabled = test_enabled, \
+ }
+
+static void
+test_ptp_feat_hvc_disabled(const struct test_hvc_info *hc_info, struct arm_smccc_res *res)
+{
+ GUEST_ASSERT_3((res->a0 & BIT(ARM_SMCCC_KVM_FUNC_PTP)) == 0,
+ res->a0, hc_info->func_id, hc_info->arg0);
+}
+
+static void
+test_ptp_feat_hvc_enabled(const struct test_hvc_info *hc_info, struct arm_smccc_res *res)
+{
+ GUEST_ASSERT_3((res->a0 & BIT(ARM_SMCCC_KVM_FUNC_PTP)) != 0,
+ res->a0, hc_info->func_id, hc_info->arg0);
+}
+
+static const struct test_hvc_info hvc_info[] = {
+ /* KVM_REG_ARM_STD_BMAP: KVM_REG_ARM_STD_BIT_TRNG_V1_0 */
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0, NULL, NULL),
+
+ /* KVM_REG_ARM_STD_HYP_BMAP: KVM_REG_ARM_STD_HYP_BIT_PV_TIME */
+ TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+ ARM_SMCCC_HV_PV_TIME_FEATURES, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES,
+ ARM_SMCCC_HV_PV_TIME_ST, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0, NULL, NULL),
+
+ /* KVM_REG_ARM_VENDOR_HYP_BMAP: KVM_REG_ARM_VENDOR_HYP_BIT_PTP */
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+ ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ test_ptp_feat_hvc_disabled, test_ptp_feat_hvc_enabled),
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ KVM_PTP_VIRT_COUNTER, NULL, NULL),
+};
+
+static void guest_test_hvc(int stage)
+{
+ unsigned int i;
+ struct arm_smccc_res res;
+
+ for (i = 0; i < ARRAY_SIZE(hvc_info); i++) {
+ const struct test_hvc_info *hc_info = &hvc_info[i];
+
+ memset(&res, 0, sizeof(res));
+ smccc_hvc(hc_info->func_id, hc_info->arg0, 0, 0, 0, 0, 0, 0, &res);
+
+ switch (stage) {
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ if (hc_info->test_hvc_disabled)
+ hc_info->test_hvc_disabled(hc_info, &res);
+ else
+ GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg0);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ if (hc_info->test_hvc_enabled)
+ hc_info->test_hvc_enabled(hc_info, &res);
+ else
+ GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg0);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+ }
+}
+
+static void guest_code(void)
+{
+ while (stage != TEST_STAGE_END) {
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ guest_test_hvc(stage);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+
+ GUEST_SYNC(stage);
+ }
+
+ GUEST_DONE();
+}
+
+static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
+{
+ struct kvm_one_reg reg = {
+ .id = KVM_REG_ARM_FW_REG(id),
+ .addr = (uint64_t)&val,
+ };
+
+	return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
+}
+
+static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
+{
+ struct kvm_one_reg reg = {
+ .id = KVM_REG_ARM_FW_REG(id),
+ .addr = (uint64_t)addr,
+ };
+
+	return vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
+}
+
+struct st_time {
+ uint32_t rev;
+ uint32_t attr;
+ uint64_t st_time;
+};
+
+#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
+#define ST_GPA_BASE (1 << 30)
+
+static void steal_time_init(struct kvm_vm *vm)
+{
+ uint64_t st_ipa = (ulong)ST_GPA_BASE;
+ unsigned int gpages;
+ struct kvm_device_attr dev = {
+ .group = KVM_ARM_VCPU_PVTIME_CTRL,
+ .attr = KVM_ARM_VCPU_PVTIME_IPA,
+ .addr = (uint64_t)&st_ipa,
+ };
+
+ gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
+
+ vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
+}
+
+static void test_fw_regs_first_read(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /* First read should be an upper limit of the features supported */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
+ "Expected all the features to be set for reg: 0x%lx; expected: 0x%llx; read: 0x%lx\n",
+ reg_info->reg, GENMASK_ULL(reg_info->max_feat_bit, 0), val);
+ }
+}
+
+static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /* Test 'write' by disabling all the features of the register map */
+ ret = set_fw_reg(vm, reg_info->reg, 0);
+ TEST_ASSERT(ret == 0,
+ "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ reg_info->reg, errno);
+
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
+
+ /*
+ * Test enabling a feature that's not supported.
+ * Avoid this check if all the bits are occupied.
+ */
+ if (reg_info->max_feat_bit < 63) {
+ ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
+ TEST_ASSERT(ret != 0 && errno == EINVAL,
+ "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+ }
+}
+
+static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /*
+ * Before starting the VM, the test clears all the bits.
+ * Check if that's still the case.
+ */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n",
+ reg_info->reg);
+
+ /*
+ * Test setting the last known value. KVM should allow this
+ * even if VM has started running.
+ */
+ ret = set_fw_reg(vm, reg_info->reg, 0);
+ TEST_ASSERT(ret == 0,
+ "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ reg_info->reg, errno);
+
+ /*
+ * Set all the features for this register again. KVM shouldn't
+ * allow this as the VM is running.
+ */
+ ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
+ TEST_ASSERT(ret != 0 && errno == EBUSY,
+ "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+}
+
+static struct kvm_vm *test_vm_create(void)
+{
+ struct kvm_vm *vm;
+ struct kvm_enable_cap cap = {
+ .cap = KVM_CAP_ARM_HVC_FW_REG_BMAP,
+ };
+
+ vm = vm_create_default(0, 0, guest_code);
+
+ vm_enable_cap(vm, &cap);
+
+ ucall_init(vm, NULL);
+ steal_time_init(vm);
+
+ /* Read all the registers to mark them as 'accessed' */
+ test_fw_regs_first_read(vm);
+
+ return vm;
+}
+
+static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
+{
+ struct kvm_vm *ret_vm = vm;
+
+ pr_debug("Stage: %d\n", stage);
+
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ test_fw_regs_after_vm_start(vm);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ /* Start a new VM so that all the features are now enabled by default */
+ kvm_vm_free(vm);
+ ret_vm = test_vm_create();
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ break;
+ default:
+ TEST_FAIL("Unknown test stage: %d\n", stage);
+ }
+
+ stage++;
+ sync_global_to_guest(vm, stage);
+
+ return ret_vm;
+}
+
+static void test_run(void)
+{
+ struct kvm_vm *vm;
+ struct ucall uc;
+ bool guest_done = false;
+
+ vm = test_vm_create();
+
+ test_fw_regs_before_vm_start(vm);
+
+ while (!guest_done) {
+ vcpu_run(vm, 0);
+
+ switch (get_ucall(vm, 0, &uc)) {
+ case UCALL_SYNC:
+ vm = test_guest_stage(vm);
+ break;
+ case UCALL_DONE:
+ guest_done = true;
+ break;
+ case UCALL_ABORT:
+ TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, %lu; %lu, stage: %u",
+ (const char *)uc.args[0], __FILE__, uc.args[1],
+ uc.args[2], uc.args[3], uc.args[4], stage);
+ break;
+ default:
+ TEST_FAIL("Unexpected guest exit\n");
+ }
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ setbuf(stdout, NULL);
+
+ if (kvm_check_cap(KVM_CAP_ARM_HVC_FW_REG_BMAP) != 1) {
+ print_skip("ARM64 fw registers bitmap not supported\n");
+ exit(KSFT_SKIP);
+ }
+
+ test_run();
+ return 0;
+}
--
2.34.0.rc1.387.gb447b232ab-goog
The bitmap firmware pseudo-registers need special handling, such as
enabling the capability KVM_CAP_ARM_HVC_FW_REG_BMAP. Since there's
no support as of yet in get-reg-list to enable a capability, add a
field 'enable_capability' to 'struct reg_sublist' to incorporate
this. Also, to avoid disturbing the existing configurations, create
a new vcpu_config to hold these bitmap firmware registers.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
.../selftests/kvm/aarch64/get-reg-list.c | 35 +++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index cc898181faab..7479d0ae501e 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -40,6 +40,7 @@ static __u64 *blessed_reg, blessed_n;
struct reg_sublist {
const char *name;
long capability;
+ long enable_capability;
int feature;
bool finalize;
__u64 *regs;
@@ -397,6 +398,19 @@ static void check_supported(struct vcpu_config *c)
}
}
+static void enable_capabilities(struct kvm_vm *vm, struct vcpu_config *c)
+{
+ struct reg_sublist *s;
+
+ for_each_sublist(c, s) {
+ if (s->enable_capability) {
+ struct kvm_enable_cap cap = {.cap = s->enable_capability};
+
+ vm_enable_cap(vm, &cap);
+ }
+ }
+}
+
static bool print_list;
static bool print_filtered;
static bool fixup_core_regs;
@@ -414,6 +428,7 @@ static void run_test(struct vcpu_config *c)
vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
prepare_vcpu_init(c, &init);
aarch64_vcpu_add_default(vm, 0, &init, NULL);
+ enable_capabilities(vm, c);
finalize_vcpu(vm, 0, c);
reg_list = vcpu_get_reg_list(vm, 0);
@@ -1014,6 +1029,12 @@ static __u64 sve_rejects_set[] = {
KVM_REG_ARM64_SVE_VLS,
};
+static __u64 fw_reg_bmap_set[] = {
+ KVM_REG_ARM_FW_REG(3), /* KVM_REG_ARM_STD_BMAP */
+ KVM_REG_ARM_FW_REG(4), /* KVM_REG_ARM_STD_HYP_BMAP */
+ KVM_REG_ARM_FW_REG(5), /* KVM_REG_ARM_VENDOR_HYP_BMAP */
+};
+
#define BASE_SUBLIST \
{ "base", .regs = base_regs, .regs_n = ARRAY_SIZE(base_regs), }
#define VREGS_SUBLIST \
@@ -1025,6 +1046,10 @@ static __u64 sve_rejects_set[] = {
{ "sve", .capability = KVM_CAP_ARM_SVE, .feature = KVM_ARM_VCPU_SVE, .finalize = true, \
.regs = sve_regs, .regs_n = ARRAY_SIZE(sve_regs), \
.rejects_set = sve_rejects_set, .rejects_set_n = ARRAY_SIZE(sve_rejects_set), }
+#define FW_REG_BMAP_SUBLIST \
+ { "fw_reg_bmap", .regs = fw_reg_bmap_set, .regs_n = ARRAY_SIZE(fw_reg_bmap_set), \
+ .capability = KVM_CAP_ARM_HVC_FW_REG_BMAP, \
+ .enable_capability = KVM_CAP_ARM_HVC_FW_REG_BMAP, }
static struct vcpu_config vregs_config = {
.sublists = {
@@ -1057,10 +1082,20 @@ static struct vcpu_config sve_pmu_config = {
},
};
+static struct vcpu_config vregs_fw_regs_bmap_config = {
+ .sublists = {
+ BASE_SUBLIST,
+ VREGS_SUBLIST,
+ FW_REG_BMAP_SUBLIST,
+ {0},
+ },
+};
+
static struct vcpu_config *vcpu_configs[] = {
&vregs_config,
&vregs_pmu_config,
&sve_config,
&sve_pmu_config,
+ &vregs_fw_regs_bmap_config,
};
static int vcpu_configs_n = ARRAY_SIZE(vcpu_configs);
--
2.34.0.rc1.387.gb447b232ab-goog
On Sat, 13 Nov 2021 01:22:25 +0000,
Raghavendra Rao Ananta <[email protected]> wrote:
>
> Architectures such as arm64 and riscv uses vcpu variables
> such as has_run_once and ran_atleast_once, respectively,
> to mark if the vCPU has started running. Since these are
> architecture agnostic variables, introduce
> kvm_vcpu_has_run_once() as a core kvm functionality and
> use this instead of the architecture defined variables.
>
> No functional change intended.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
arm64 is moving away from this, see [1]. You also don't need any new
state, as vcpu->pid gives you exactly what you need.
Happy to queue additional patches on top if you want to deal with
riscv.
Thanks,
M.
[1] https://lore.kernel.org/all/[email protected]/
--
Without deviation from the norm, progress is not possible.
On Sat, 13 Nov 2021 01:22:26 +0000,
Raghavendra Rao Ananta <[email protected]> wrote:
>
> The upcoming patches need a way to detect if the VM, as
> a whole, has started. Hence, unionize kvm_vcpu_has_run_once()
> of all the vcpus of the VM and build kvm_vm_has_run_once()
> to achieve the functionality.
>
> No functional change intended.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> include/linux/kvm_host.h | 2 ++
> virt/kvm/kvm_main.c | 17 +++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b373929c71eb..102e00c0e21c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1854,4 +1854,6 @@ static inline bool kvm_vcpu_has_run_once(struct kvm_vcpu *vcpu)
> return vcpu->has_run_once;
> }
>
> +bool kvm_vm_has_run_once(struct kvm *kvm);
> +
> #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1ec8a8e959b2..3d8d96e8f61d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4339,6 +4339,23 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> return fd;
> }
>
> +bool kvm_vm_has_run_once(struct kvm *kvm)
> +{
> + int i, ret = false;
> + struct kvm_vcpu *vcpu;
> +
> + mutex_lock(&kvm->lock);
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + ret = kvm_vcpu_has_run_once(vcpu);
> + if (ret)
> + break;
> + }
> +
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
This is horribly racy. Nothing prevents a vcpu from running behind
your back. If you want any sort of guarantee, look at what we do in
kvm_vgic_create(). Alexandru has patches that extract it to make it
generally available (at least for arm64).
M.
--
Without deviation from the norm, progress is not possible.
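For reference, the pattern being pointed at looks roughly like the sketch
below, modelled on lock_all_vcpus() as used by kvm_vgic_create() (simplified,
with error paths trimmed; update_fw_reg_state() is a hypothetical placeholder):

	/*
	 * Holding every vcpu->mutex guarantees that no vCPU can enter
	 * KVM_RUN behind our back while we inspect or update VM state.
	 */
	mutex_lock(&kvm->lock);
	if (lock_all_vcpus(kvm)) {
		/* Safe: no vCPU can run until the mutexes are dropped. */
		update_fw_reg_state(kvm);	/* hypothetical */
		unlock_all_vcpus(kvm);
	}
	mutex_unlock(&kvm->lock);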
On Sat, 13 Nov 2021 01:22:27 +0000,
Raghavendra Rao Ananta <[email protected]> wrote:
>
> KVM regularly introduces new hypercall services to the guests without
> any consent from the Virtual Machine Manager (VMM). This means, the
> guests can observe hypercall services in and out as they migrate
> across various host kernel versions. This could be a major problem
> if the guest discovered a hypercall, started using it, and after
> getting migrated to an older kernel realizes that it's no longer
> available. Depending on how the guest handles the change, there's
> a potential chance that the guest would just panic.
>
> As a result, there's a need for the VMM to elect the services that
> it wishes the guest to discover. VMM can elect these services based
> on the kernels spread across its (migration) fleet. To remedy this,
> extend the existing firmware psuedo-registers, such as
> KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
>
> These firmware registers are categorized based on the service call
> owners, and unlike the existing firmware psuedo-registers, they hold
> the features supported in the form of a bitmap. During VM (vCPU)
> initialization, the registers shows an upper-limit of the features
> supported by the corresponding registers. The VMM can simply use
> GET_ONE_REG to discover the features. If it's unhappy with any of
> the features, it can simply write-back the desired feature bitmap
> using SET_ONE_REG.
>
> KVM allows these modification only until a VM has started. KVM also
> assumes that the VMM is unaware of a register if a register remains
> unaccessed (read/write), and would simply clear all the bits of the
> registers such that the guest accidently doesn't get exposed to the
> features. Finally, the set of bitmaps from all the registers are the
> services that are exposed to the guest.
>
> In order to provide backward compatibility with already existing VMMs,
> a new capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced. To enable
> the bitmap firmware registers extension, the capability must be
> explicitly enabled. If not, the behavior is similar to the previous
> setup.
>
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 16 +++
> arch/arm64/include/uapi/asm/kvm.h | 4 +
> arch/arm64/kvm/arm.c | 23 +++-
> arch/arm64/kvm/hypercalls.c | 217 +++++++++++++++++++++++++++++-
> arch/arm64/kvm/trng.c | 9 +-
> include/kvm/arm_hypercalls.h | 7 +
> include/uapi/linux/kvm.h | 1 +
> 7 files changed, 262 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 02dffe50a20c..1546a2f973ef 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -102,6 +102,19 @@ struct kvm_s2_mmu {
> struct kvm_arch_memory_slot {
> };
>
> +struct hvc_fw_reg_bmap {
> + bool accessed;
> + u64 reg_id;
> + u64 bmap;
> +};
> +
> +struct hvc_reg_desc {
> + spinlock_t lock;
> + bool fw_reg_bmap_enabled;
> +
> + struct hvc_fw_reg_bmap hvc_std_bmap;
> +};
Please document what these data structures track. Without any
documentation, it is pretty difficult to build a mental picture of how
this all fits together.
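Something along the lines of the sketch below would do (the wording is only
a suggestion, not text from the series):

	/*
	 * struct hvc_fw_reg_bmap - state of one bitmap firmware register
	 * @accessed:	true once user-space has read or written the register
	 * @reg_id:	the KVM_REG_ARM_FW_REG() encoding of the register
	 * @bmap:	bitmap of the hypercall features exposed to the guest
	 */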
> +
> struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> @@ -137,6 +150,9 @@ struct kvm_arch {
>
> /* Memory Tagging Extension enabled for the guest */
> bool mte_enabled;
> +
> + /* Hypercall firmware registers' descriptor */
> + struct hvc_reg_desc hvc_desc;
> };
>
> struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index b3edde68bc3e..d6e099ed14ef 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
>
> +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> +
> /* SVE registers */
> #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 0cc148211b4e..f2099e4d1109 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -81,26 +81,32 @@ int kvm_arch_check_processor_compat(void *opaque)
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> struct kvm_enable_cap *cap)
> {
> - int r;
> + int r = 0;
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
>
> if (cap->flags)
> return -EINVAL;
>
> switch (cap->cap) {
> case KVM_CAP_ARM_NISV_TO_USER:
> - r = 0;
> kvm->arch.return_nisv_io_abort_to_user = true;
> break;
> case KVM_CAP_ARM_MTE:
> mutex_lock(&kvm->lock);
> - if (!system_supports_mte() || kvm->created_vcpus) {
> + if (!system_supports_mte() || kvm->created_vcpus)
> r = -EINVAL;
> - } else {
> - r = 0;
> + else
> kvm->arch.mte_enabled = true;
> - }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> + if (kvm_vm_has_run_once(kvm))
> + return -EBUSY;
> +
> + spin_lock(&hvc_desc->lock);
Does this really need to be a spin-lock? Are you ever taking it on a
context where you cannot sleep? And why does it need to be a new lock
when we already have a plethora of them?
> + hvc_desc->fw_reg_bmap_enabled = true;
> + spin_unlock(&hvc_desc->lock);
> + break;
> default:
> r = -EINVAL;
> break;
> @@ -157,6 +163,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
> set_default_spectre(kvm);
>
> + kvm_arm_init_hypercalls(kvm);
> +
> return ret;
> out_free_stage2_pgd:
> kvm_free_stage2_pgd(&kvm->arch.mmu);
> @@ -215,6 +223,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_VCPU_ATTRIBUTES:
> case KVM_CAP_PTP_KVM:
> + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> @@ -622,6 +631,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> if (kvm_vm_is_protected(kvm))
> kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
>
> + kvm_arm_sanitize_fw_regs(kvm);
What is the rational for doing this on first run? Surely this could be
done at the point where userspace performs the write, couldn't it?
My mental model is that the VMM reads some data, clears some bits if
it really wants to, and writes it back. There shouldn't be anything to
sanitise after the facts. Or at least that's my gut feeling so far.
/me reads on.
> +
> return ret;
> }
>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 9e136d91b470..f5df7bc61146 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,41 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static bool
> +kvm_arm_fw_reg_feat_enabled(struct hvc_fw_reg_bmap *reg_bmap, u64 feat_bit)
> +{
> + return reg_bmap->bmap & feat_bit;
> +}
> +
> +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> +
> + /*
> + * To ensure backward compatibility, support all the service calls,
> + * including new additions, if the firmware registers holding the
> + * feature bitmaps isn't explicitly enabled.
> + */
> + if (!hvc_desc->fw_reg_bmap_enabled)
> + return true;
> +
> + switch (func_id) {
> + case ARM_SMCCC_TRNG_VERSION:
> + case ARM_SMCCC_TRNG_FEATURES:
> + case ARM_SMCCC_TRNG_GET_UUID:
> + case ARM_SMCCC_TRNG_RND32:
> + case ARM_SMCCC_TRNG_RND64:
> + return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_bmap,
> + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> + default:
> + /* By default, allow the services that aren't listed here */
> + return true;
> + }
> +
> + /* We shouldn't be reaching here */
> + return true;
So why have anything at all?
> +}
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -65,6 +100,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> u32 feature;
> gpa_t gpa;
>
> + if (!kvm_hvc_call_supported(vcpu, func_id))
> + goto out;
> +
> switch (func_id) {
> case ARM_SMCCC_VERSION_FUNC_ID:
> val[0] = ARM_SMCCC_VERSION_1_1;
> @@ -143,6 +181,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> return kvm_psci_call(vcpu);
> }
>
> +out:
> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> return 1;
> }
> @@ -153,17 +192,178 @@ static const u64 fw_reg_ids[] = {
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> };
>
> +static const u64 fw_reg_bmap_ids[] = {
> + KVM_REG_ARM_STD_BMAP,
> +};
> +
> +static void kvm_arm_fw_reg_init_hvc(struct hvc_reg_desc *hvc_desc,
> + struct hvc_fw_reg_bmap *fw_reg_bmap,
> + u64 reg_id, u64 default_map)
> +{
> + fw_reg_bmap->reg_id = reg_id;
> + fw_reg_bmap->bmap = default_map;
> +}
> +
> +void kvm_arm_init_hypercalls(struct kvm *kvm)
> +{
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> +
> + spin_lock_init(&hvc_desc->lock);
> +
> + kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_bmap,
> + KVM_REG_ARM_STD_BMAP, ARM_SMCCC_STD_FEATURES);
> +}
> +
> +static void kvm_arm_fw_reg_sanitize(struct hvc_fw_reg_bmap *fw_reg_bmap)
> +{
> + if (!fw_reg_bmap->accessed)
> + fw_reg_bmap->bmap = 0;
> +}
> +
> +/*
> + * kvm_arm_sanitize_fw_regs: Sanitize the hypercall firmware registers
> + *
> + * Sanitization, in the case of hypercall firmware registers, is basically
> + * clearing out the feature bitmaps so that the guests are not exposed to
> + * the services corresponding to a particular register. The registers that
> + * needs sanitization is decided on two factors on the user-space part:
> + * 1. Enablement of KVM_CAP_ARM_HVC_FW_REG_BMAP:
> + * If the user-space hasn't enabled the capability, it either means
> + * that it's unaware of its existence, or it simply doesn't want to
> + * participate in the arrangement and is okay with the default settings.
> + * The former case is to ensure backward compatibility.
> + *
> + * 2. Has the user-space accessed (read/write) the register? :
> + * If yes, it means that the user-space is aware of the register's
> + * existence and can set the bits as it sees fit for the guest. A
> + * read-only access from user-space indicates that the user-space is
> + * happy with the default settings, and doesn't wish to change it.
> + *
> + * The logic for sanitizing a register will then be:
> + * ---------------------------------------------------------------------------
> + * | CAP enabled | Accessed reg | Clear reg | Comments |
> + * ---------------------------------------------------------------------------
> + * | N | N | N | |
> + * | N | Y | N | -ENOENT returned during access |
> + * | Y | N | Y | |
> + * | Y | Y | N | |
> + * ---------------------------------------------------------------------------
> + */
> +void kvm_arm_sanitize_fw_regs(struct kvm *kvm)
> +{
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> +
> + spin_lock(&hvc_desc->lock);
> +
> + if (!hvc_desc->fw_reg_bmap_enabled)
> + goto out;
> +
> + kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_bmap);
> +
> +out:
> + spin_unlock(&hvc_desc->lock);
I keep being baffled by this. Why should we track the VMM accesses or
the VMM writeback? This logic doesn't seem to bring anything useful as
far as I can tell. All we need to ensure is that what is written to
the pseudo-register is an acceptable subset of the previous value, and
I cannot see why this can't be done at write-time.
If you want to hide this behind a capability, fine (although my guts
feeling is that we don't need that either). But I really want to be
convinced about all this tracking.
> +}
> +
> +static int kvm_arm_fw_reg_get_bmap(struct kvm *kvm,
> + struct hvc_fw_reg_bmap *fw_reg_bmap, u64 *val)
> +{
> + int ret = 0;
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> +
> + spin_lock(&hvc_desc->lock);
> +
> + if (!hvc_desc->fw_reg_bmap_enabled) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + fw_reg_bmap->accessed = true;
> + *val = fw_reg_bmap->bmap;
> +out:
> + spin_unlock(&hvc_desc->lock);
> + return ret;
> +}
> +
> +static int kvm_arm_fw_reg_set_bmap(struct kvm *kvm,
> + struct hvc_fw_reg_bmap *fw_reg_bmap, u64 val)
> +{
> + int ret = 0;
> + u64 fw_reg_features;
> + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> +
> + spin_lock(&hvc_desc->lock);
> +
> + if (!hvc_desc->fw_reg_bmap_enabled) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + if (fw_reg_bmap->bmap == val)
> + goto out;
> +
> + if (kvm_vm_has_run_once(kvm)) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + switch (fw_reg_bmap->reg_id) {
> + case KVM_REG_ARM_STD_BMAP:
> + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> + break;
> + default:
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + /* Check for unsupported feature bit */
> + if (val & ~fw_reg_features) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + fw_reg_bmap->accessed = true;
> + fw_reg_bmap->bmap = val;
> +out:
> + spin_unlock(&hvc_desc->lock);
> + return ret;
> +}
> +
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> {
> - return ARRAY_SIZE(fw_reg_ids);
> + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> + int n_regs = ARRAY_SIZE(fw_reg_ids);
> +
> + spin_lock(&hvc_desc->lock);
> +
> + if (hvc_desc->fw_reg_bmap_enabled)
> + n_regs += ARRAY_SIZE(fw_reg_bmap_ids);
> +
> + spin_unlock(&hvc_desc->lock);
> +
> + return n_regs;
> }
>
> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> {
> + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> int i;
>
> for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> - if (put_user(fw_reg_ids[i], uindices))
> + if (put_user(fw_reg_ids[i], uindices++))
> + return -EFAULT;
> + }
> +
> + spin_lock(&hvc_desc->lock);
> +
> + if (!hvc_desc->fw_reg_bmap_enabled) {
> + spin_unlock(&hvc_desc->lock);
> + return 0;
> + }
> +
> + spin_unlock(&hvc_desc->lock);
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_bmap_ids); i++) {
> + if (put_user(fw_reg_bmap_ids[i], uindices++))
> return -EFAULT;
I really don't get what you are trying to achieve with this locking.
You guard the 'enabled' bit, but you still allow the kernel view to be
copied to userspace while another thread is writing to it?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
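A write-time-only check along the lines suggested above might be shaped like
the sketch below (illustrative only; fw_reg_bmap_set() and its parameters are
hypothetical):

	/*
	 * Validate at SET_ONE_REG time that the new value only contains
	 * feature bits KVM supports, with no 'accessed' tracking and no
	 * sanitization pass at first run.
	 */
	static int fw_reg_bmap_set(struct kvm *kvm, u64 *bmap, u64 val,
				   u64 supported)
	{
		if (val & ~supported)		/* unknown feature bits */
			return -EINVAL;
		if (kvm_vm_has_run_once(kvm))
			return -EBUSY;
		*bmap = val;
		return 0;
	}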
Hello Marc,
On Mon, Nov 22, 2021 at 8:27 AM Marc Zyngier <[email protected]> wrote:
>
> On Sat, 13 Nov 2021 01:22:25 +0000,
> Raghavendra Rao Ananta <[email protected]> wrote:
> >
> > Architectures such as arm64 and riscv uses vcpu variables
> > such as has_run_once and ran_atleast_once, respectively,
> > to mark if the vCPU has started running. Since these are
> > architecture agnostic variables, introduce
> > kvm_vcpu_has_run_once() as a core kvm functionality and
> > use this instead of the architecture defined variables.
> >
> > No functional change intended.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
>
> arm64 is moving away from this, see [1]. You also don't need any new
> state, as vcpu->pid gives you exactly what you need.
Thanks for the pointer. I can directly use this!
>
> Happy to queue additional patches on top if you want to deal with
> riscv.
>
Just to clarify, do you mean taking the riscv KVM changes through
kvmarm-next? If yes, then sure, I can make the changes in riscv to
use vcpu_has_run_once() from your series.
Regards,
Raghavendra
> Thanks,
>
> M.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> --
> Without deviation from the norm, progress is not possible.
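For concreteness, the vcpu->pid based predicate from [1] boils down to
roughly the one-liner below (a sketch; vcpu->pid is assigned on the first
KVM_RUN, so a non-NULL pid implies the vCPU has run at least once):

	#define vcpu_has_run_once(vcpu)	!!rcu_access_pointer((vcpu)->pid)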
On Mon, Nov 22, 2021 at 9:23 AM Marc Zyngier <[email protected]> wrote:
>
> On Sat, 13 Nov 2021 01:22:27 +0000,
> Raghavendra Rao Ananta <[email protected]> wrote:
> >
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the Virtual Machine Manager (VMM). This means, the
> > guests can observe hypercall services in and out as they migrate
> > across various host kernel versions. This could be a major problem
> > if the guest discovered a hypercall, started using it, and after
> > getting migrated to an older kernel realizes that it's no longer
> > available. Depending on how the guest handles the change, there's
> > a potential chance that the guest would just panic.
> >
> > As a result, there's a need for the VMM to elect the services that
> > it wishes the guest to discover. VMM can elect these services based
> > on the kernels spread across its (migration) fleet. To remedy this,
> > extend the existing firmware psuedo-registers, such as
> > KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
> >
> > These firmware registers are categorized based on the service call
> > owners, and unlike the existing firmware psuedo-registers, they hold
> > the features supported in the form of a bitmap. During VM (vCPU)
> > initialization, the registers shows an upper-limit of the features
> > supported by the corresponding registers. The VMM can simply use
> > GET_ONE_REG to discover the features. If it's unhappy with any of
> > the features, it can simply write-back the desired feature bitmap
> > using SET_ONE_REG.
> >
> > KVM allows these modification only until a VM has started. KVM also
> > assumes that the VMM is unaware of a register if a register remains
> > unaccessed (read/write), and would simply clear all the bits of the
> > registers such that the guest accidently doesn't get exposed to the
> > features. Finally, the set of bitmaps from all the registers are the
> > services that are exposed to the guest.
> >
> > In order to provide backward compatibility with already existing VMMs,
> > a new capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced. To enable
> > the bitmap firmware registers extension, the capability must be
> > explicitly enabled. If not, the behavior is similar to the previous
> > setup.
> >
> > In this patch, the framework adds the register only for ARM's standard
> > secure services (owner value 4). Currently, this includes support only
> > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > register representing mandatory features of v1.0. Other services are
> > momentarily added in the upcoming patches.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 16 +++
> > arch/arm64/include/uapi/asm/kvm.h | 4 +
> > arch/arm64/kvm/arm.c | 23 +++-
> > arch/arm64/kvm/hypercalls.c | 217 +++++++++++++++++++++++++++++-
> > arch/arm64/kvm/trng.c | 9 +-
> > include/kvm/arm_hypercalls.h | 7 +
> > include/uapi/linux/kvm.h | 1 +
> > 7 files changed, 262 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 02dffe50a20c..1546a2f973ef 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -102,6 +102,19 @@ struct kvm_s2_mmu {
> > struct kvm_arch_memory_slot {
> > };
> >
> > +struct hvc_fw_reg_bmap {
> > + bool accessed;
> > + u64 reg_id;
> > + u64 bmap;
> > +};
> > +
> > +struct hvc_reg_desc {
> > + spinlock_t lock;
> > + bool fw_reg_bmap_enabled;
> > +
> > + struct hvc_fw_reg_bmap hvc_std_bmap;
> > +};
>
> Please document what these data structures track. Without any
> documentation, it is pretty difficult to build a mental picture of how
> this all fits together.
>
Sure, will do.
> > +
> > struct kvm_arch {
> > struct kvm_s2_mmu mmu;
> >
> > @@ -137,6 +150,9 @@ struct kvm_arch {
> >
> > /* Memory Tagging Extension enabled for the guest */
> > bool mte_enabled;
> > +
> > + /* Hypercall firmware registers' descriptor */
> > + struct hvc_reg_desc hvc_desc;
> > };
> >
> > struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index b3edde68bc3e..d6e099ed14ef 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
> >
> > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> > +
> > /* SVE registers */
> > #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 0cc148211b4e..f2099e4d1109 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -81,26 +81,32 @@ int kvm_arch_check_processor_compat(void *opaque)
> > int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > struct kvm_enable_cap *cap)
> > {
> > - int r;
> > + int r = 0;
> > + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> >
> > if (cap->flags)
> > return -EINVAL;
> >
> > switch (cap->cap) {
> > case KVM_CAP_ARM_NISV_TO_USER:
> > - r = 0;
> > kvm->arch.return_nisv_io_abort_to_user = true;
> > break;
> > case KVM_CAP_ARM_MTE:
> > mutex_lock(&kvm->lock);
> > - if (!system_supports_mte() || kvm->created_vcpus) {
> > + if (!system_supports_mte() || kvm->created_vcpus)
> > r = -EINVAL;
> > - } else {
> > - r = 0;
> > + else
> > kvm->arch.mte_enabled = true;
> > - }
> > mutex_unlock(&kvm->lock);
> > break;
> > + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > + if (kvm_vm_has_run_once(kvm))
> > + return -EBUSY;
> > +
> > + spin_lock(&hvc_desc->lock);
>
> Does this really need to be a spin-lock? Are you ever taking it on a
> context where you cannot sleep? And why does it need to be a new lock
> when we already have a plethora of them?
>
I went with a spinlock on the assumption that the critical sections
are very small and it wouldn't interfere with any other paths. But I
suppose that's not needed; I can go with the kvm->lock mutex instead.
> > + hvc_desc->fw_reg_bmap_enabled = true;
> > + spin_unlock(&hvc_desc->lock);
> > + break;
> > default:
> > r = -EINVAL;
> > break;
> > @@ -157,6 +163,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >
> > set_default_spectre(kvm);
> >
> > + kvm_arm_init_hypercalls(kvm);
> > +
> > return ret;
> > out_free_stage2_pgd:
> > kvm_free_stage2_pgd(&kvm->arch.mmu);
> > @@ -215,6 +223,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_VCPU_ATTRIBUTES:
> > case KVM_CAP_PTP_KVM:
> > + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > r = 1;
> > break;
> > case KVM_CAP_SET_GUEST_DEBUG2:
> > @@ -622,6 +631,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> > if (kvm_vm_is_protected(kvm))
> > kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
> >
> > + kvm_arm_sanitize_fw_regs(kvm);
>
> What is the rational for doing this on first run? Surely this could be
> done at the point where userspace performs the write, couldn't it?
>
> My mental model is that the VMM reads some data, clears some bits if
> it really wants to, and writes it back. There shouldn't be anything to
> sanitise after the facts. Or at least that's my gut feeling so far.
>
I've tried to explain things below.
> /me reads on.
>
> > +
> > return ret;
> > }
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 9e136d91b470..f5df7bc61146 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -58,6 +58,41 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > val[3] = lower_32_bits(cycles);
> > }
> >
> > +static bool
> > +kvm_arm_fw_reg_feat_enabled(struct hvc_fw_reg_bmap *reg_bmap, u64 feat_bit)
> > +{
> > + return reg_bmap->bmap & feat_bit;
> > +}
> > +
> > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > +
> > + /*
> > + * To ensure backward compatibility, support all the service calls,
> > + * including new additions, if the firmware registers holding the
> > + * feature bitmaps isn't explicitly enabled.
> > + */
> > + if (!hvc_desc->fw_reg_bmap_enabled)
> > + return true;
> > +
> > + switch (func_id) {
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND32:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_arm_fw_reg_feat_enabled(&hvc_desc->hvc_std_bmap,
> > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > + default:
> > + /* By default, allow the services that aren't listed here */
> > + return true;
> > + }
> > +
> > + /* We shouldn't be reaching here */
> > + return true;
>
> So why have anything at all?
>
I was expecting the compiler to complain, but I guess not. I'll remove this.
> > +}
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > {
> > u32 func_id = smccc_get_function(vcpu);
> > @@ -65,6 +100,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > u32 feature;
> > gpa_t gpa;
> >
> > + if (!kvm_hvc_call_supported(vcpu, func_id))
> > + goto out;
> > +
> > switch (func_id) {
> > case ARM_SMCCC_VERSION_FUNC_ID:
> > val[0] = ARM_SMCCC_VERSION_1_1;
> > @@ -143,6 +181,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > return kvm_psci_call(vcpu);
> > }
> >
> > +out:
> > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > return 1;
> > }
> > @@ -153,17 +192,178 @@ static const u64 fw_reg_ids[] = {
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > };
> >
> > +static const u64 fw_reg_bmap_ids[] = {
> > + KVM_REG_ARM_STD_BMAP,
> > +};
> > +
> > +static void kvm_arm_fw_reg_init_hvc(struct hvc_reg_desc *hvc_desc,
> > + struct hvc_fw_reg_bmap *fw_reg_bmap,
> > + u64 reg_id, u64 default_map)
> > +{
> > + fw_reg_bmap->reg_id = reg_id;
> > + fw_reg_bmap->bmap = default_map;
> > +}
> > +
> > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > +{
> > + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> > +
> > + spin_lock_init(&hvc_desc->lock);
> > +
> > + kvm_arm_fw_reg_init_hvc(hvc_desc, &hvc_desc->hvc_std_bmap,
> > + KVM_REG_ARM_STD_BMAP, ARM_SMCCC_STD_FEATURES);
> > +}
> > +
> > +static void kvm_arm_fw_reg_sanitize(struct hvc_fw_reg_bmap *fw_reg_bmap)
> > +{
> > + if (!fw_reg_bmap->accessed)
> > + fw_reg_bmap->bmap = 0;
> > +}
> > +
> > +/*
> > + * kvm_arm_sanitize_fw_regs: Sanitize the hypercall firmware registers
> > + *
> > + * Sanitization, in the case of hypercall firmware registers, is basically
> > + * clearing out the feature bitmaps so that the guests are not exposed to
> > + * the services corresponding to a particular register. The registers that
> > + * needs sanitization is decided on two factors on the user-space part:
> > + * 1. Enablement of KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > + * If the user-space hasn't enabled the capability, it either means
> > + * that it's unaware of its existence, or it simply doesn't want to
> > + * participate in the arrangement and is okay with the default settings.
> > + * The former case is to ensure backward compatibility.
> > + *
> > + * 2. Has the user-space accessed (read/write) the register? :
> > + * If yes, it means that the user-space is aware of the register's
> > + * existence and can set the bits as it sees fit for the guest. A
> > + * read-only access from user-space indicates that the user-space is
> > + * happy with the default settings, and doesn't wish to change it.
> > + *
> > + * The logic for sanitizing a register will then be:
> > + * ---------------------------------------------------------------------------
> > + * | CAP enabled | Accessed reg | Clear reg | Comments |
> > + * ---------------------------------------------------------------------------
> > + * | N | N | N | |
> > + * | N | Y | N | -ENOENT returned during access |
> > + * | Y | N | Y | |
> > + * | Y | Y | N | |
> > + * ---------------------------------------------------------------------------
> > + */
> > +void kvm_arm_sanitize_fw_regs(struct kvm *kvm)
> > +{
> > + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> > +
> > + spin_lock(&hvc_desc->lock);
> > +
> > + if (!hvc_desc->fw_reg_bmap_enabled)
> > + goto out;
> > +
> > + kvm_arm_fw_reg_sanitize(&hvc_desc->hvc_std_bmap);
> > +
> > +out:
> > + spin_unlock(&hvc_desc->lock);
>
> I keep being baffled by this. Why should we track the VMM accesses or
> the VMM writeback? This logic doesn't seem to bring anything useful as
> far as I can tell. All we need to ensure is that what is written to
> the pseudo-register is an acceptable subset of the previous value, and
> I cannot see why this can't be done at write-time.
>
> If you want to hide this behind a capability, fine (although my guts
> feeling is that we don't need that either). But I really want to be
> convinced about all this tracking.
>
The tracking of each owner register is necessary here to safeguard
against the possibility that the user-space may not be aware of a
newly introduced register, and hence hasn't accessed it. If it has at
least read the register, even without writing it back, we assume that
the user-space is happy with the configuration. But the fact that the
register has not even been read indicates that the user-space is
unaware of the existence of this new register. In such a case, if we
don't sanitize (clear all the bits of) this register, the features
would get exposed unconditionally to the guest.
The capability is introduced here to make sure that this new
infrastructure is backward compatible with old VMMs. If the VMMs don't
enable this capability, they are probably unaware of it, and things
will work as they always have: new services are exposed to the guest
unconditionally as and when they are introduced.
> > +}
> > +
> > +static int kvm_arm_fw_reg_get_bmap(struct kvm *kvm,
> > + struct hvc_fw_reg_bmap *fw_reg_bmap, u64 *val)
> > +{
> > + int ret = 0;
> > + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> > +
> > + spin_lock(&hvc_desc->lock);
> > +
> > + if (!hvc_desc->fw_reg_bmap_enabled) {
> > + ret = -ENOENT;
> > + goto out;
> > + }
> > +
> > + fw_reg_bmap->accessed = true;
> > + *val = fw_reg_bmap->bmap;
> > +out:
> > + spin_unlock(&hvc_desc->lock);
> > + return ret;
> > +}
> > +
> > +static int kvm_arm_fw_reg_set_bmap(struct kvm *kvm,
> > + struct hvc_fw_reg_bmap *fw_reg_bmap, u64 val)
> > +{
> > + int ret = 0;
> > + u64 fw_reg_features;
> > + struct hvc_reg_desc *hvc_desc = &kvm->arch.hvc_desc;
> > +
> > + spin_lock(&hvc_desc->lock);
> > +
> > + if (!hvc_desc->fw_reg_bmap_enabled) {
> > + ret = -ENOENT;
> > + goto out;
> > + }
> > +
> > + if (fw_reg_bmap->bmap == val)
> > + goto out;
> > +
> > + if (kvm_vm_has_run_once(kvm)) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + switch (fw_reg_bmap->reg_id) {
> > + case KVM_REG_ARM_STD_BMAP:
> > + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> > + break;
> > + default:
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + /* Check for unsupported feature bit */
> > + if (val & ~fw_reg_features) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + fw_reg_bmap->accessed = true;
> > + fw_reg_bmap->bmap = val;
> > +out:
> > + spin_unlock(&hvc_desc->lock);
> > + return ret;
> > +}
> > +
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > {
> > - return ARRAY_SIZE(fw_reg_ids);
> > + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > + int n_regs = ARRAY_SIZE(fw_reg_ids);
> > +
> > + spin_lock(&hvc_desc->lock);
> > +
> > + if (hvc_desc->fw_reg_bmap_enabled)
> > + n_regs += ARRAY_SIZE(fw_reg_bmap_ids);
> > +
> > + spin_unlock(&hvc_desc->lock);
> > +
> > + return n_regs;
> > }
> >
> > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > {
> > + struct hvc_reg_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > int i;
> >
> > for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> > - if (put_user(fw_reg_ids[i], uindices))
> > + if (put_user(fw_reg_ids[i], uindices++))
> > + return -EFAULT;
> > + }
> > +
> > + spin_lock(&hvc_desc->lock);
> > +
> > + if (!hvc_desc->fw_reg_bmap_enabled) {
> > + spin_unlock(&hvc_desc->lock);
> > + return 0;
> > + }
> > +
> > + spin_unlock(&hvc_desc->lock);
> > +
> > + for (i = 0; i < ARRAY_SIZE(fw_reg_bmap_ids); i++) {
> > + if (put_user(fw_reg_bmap_ids[i], uindices++))
> > return -EFAULT;
>
> I really don't get what you are trying to achieve with this locking.
> You guard the 'enabled' bit, but you still allow the kernel view to be
> copied to userspace while another thread is writing to it?
>
That's my mistake. Thanks for pointing it out. I'll get rid of the
spinlock and use the existing ones correctly.
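Something along these lines, perhaps (only a sketch; it reuses
kvm->lock so that the enabled flag and the bitmap are observed
consistently):

static int kvm_arm_fw_reg_get_bmap(struct kvm *kvm,
                                   struct hvc_fw_reg_bmap *fw_reg_bmap,
                                   u64 *val)
{
        int ret = -ENOENT;

        mutex_lock(&kvm->lock);
        if (kvm->arch.hvc_desc.fw_reg_bmap_enabled) {
                /* Record that user-space has read the register */
                fw_reg_bmap->accessed = true;
                *val = fw_reg_bmap->bmap;
                ret = 0;
        }
        mutex_unlock(&kvm->lock);

        return ret;
}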
Regards,
Raghavendra
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Mon, Nov 22, 2021 at 8:31 AM Marc Zyngier <[email protected]> wrote:
>
> On Sat, 13 Nov 2021 01:22:26 +0000,
> Raghavendra Rao Ananta <[email protected]> wrote:
> >
> > The upcoming patches need a way to detect if the VM, as
> > a whole, has started. Hence, combine kvm_vcpu_has_run_once()
> > across all the vcpus of the VM into kvm_vm_has_run_once()
> > to achieve this.
> >
> > No functional change intended.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > include/linux/kvm_host.h | 2 ++
> > virt/kvm/kvm_main.c | 17 +++++++++++++++++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b373929c71eb..102e00c0e21c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1854,4 +1854,6 @@ static inline bool kvm_vcpu_has_run_once(struct kvm_vcpu *vcpu)
> > return vcpu->has_run_once;
> > }
> >
> > +bool kvm_vm_has_run_once(struct kvm *kvm);
> > +
> > #endif
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1ec8a8e959b2..3d8d96e8f61d 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -4339,6 +4339,23 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> > return fd;
> > }
> >
> > +bool kvm_vm_has_run_once(struct kvm *kvm)
> > +{
> > + int i, ret = false;
> > + struct kvm_vcpu *vcpu;
> > +
> > + mutex_lock(&kvm->lock);
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + ret = kvm_vcpu_has_run_once(vcpu);
> > + if (ret)
> > + break;
> > + }
> > +
> > + mutex_unlock(&kvm->lock);
> > + return ret;
> > +}
>
> This is horribly racy. Nothing prevents a vcpu from running behind
> your back. If you want any sort of guarantee, look at what we do in
> kvm_vgic_create(). Alexandru has patches that extract it to make it
> generally available (at least for arm64).
>
Yes, I looked into kvm_lock_all_vcpus(), but the fact that the series
would call the function with the current vcpu lock held caused me to
back off.
Perhaps I can come up with a similar function, say
kvm_lock_all_vcpus_except(vcpu)?
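Roughly along these lines (just a sketch, modeled on lock_all_vcpus()
in vgic-kvm-device.c; 'me' is the vcpu whose mutex the caller already
holds):

static bool kvm_lock_all_vcpus_except(struct kvm *kvm, struct kvm_vcpu *me)
{
        struct kvm_vcpu *tmp;
        int c;

        kvm_for_each_vcpu(c, tmp, kvm) {
                if (tmp == me)
                        continue;
                if (!mutex_trylock(&tmp->mutex))
                        goto out_unlock;
        }

        return true;

out_unlock:
        /* Unwind: drop the mutexes taken so far, skipping 'me' */
        while (--c >= 0) {
                tmp = kvm_get_vcpu(kvm, c);
                if (tmp != me)
                        mutex_unlock(&tmp->mutex);
        }

        return false;
}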
Regards,
Raghavendra
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Sat, Nov 13, 2021 at 01:22:24AM +0000, Raghavendra Rao Ananta wrote:
> Common hypercall firmware register handling currently lives in
> psci.c. Since the upcoming patches add more of these registers,
> it's better to move the generic handling to hypercalls.c for a
> cleaner presentation.
>
> While we are at it, collect all the firmware registers under
> fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
> kvm_arm_copy_fw_reg_indices() in a generic way.
>
> No functional change intended.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/kvm/guest.c | 2 +-
> arch/arm64/kvm/hypercalls.c | 170 +++++++++++++++++++++++++++++++++++
> arch/arm64/kvm/psci.c | 166 ----------------------------------
> include/kvm/arm_hypercalls.h | 7 ++
> include/kvm/arm_psci.h | 7 --
> 5 files changed, 178 insertions(+), 174 deletions(-)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 5ce26bedf23c..625f97f7b304 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -18,7 +18,7 @@
> #include <linux/string.h>
> #include <linux/vmalloc.h>
> #include <linux/fs.h>
> -#include <kvm/arm_psci.h>
> +#include <kvm/arm_hypercalls.h>
> #include <asm/cputype.h>
> #include <linux/uaccess.h>
> #include <asm/fpsimd.h>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..9e136d91b470 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -146,3 +146,173 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> return 1;
> }
> +
> +static const u64 fw_reg_ids[] = {
> + KVM_REG_ARM_PSCI_VERSION,
> + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> +};
> +
> +int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> +{
> + return ARRAY_SIZE(fw_reg_ids);
> +}
> +
> +int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> + if (put_user(fw_reg_ids[i], uindices))
This is missing the ++ on uindices, so it just writes the same offset
three times.
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
I assume the rest of the patch is just a cut+paste move of code.
Thanks,
drew
On Tue, Nov 23, 2021 at 10:34:23AM -0800, Raghavendra Rao Ananta wrote:
> On Mon, Nov 22, 2021 at 9:23 AM Marc Zyngier <[email protected]> wrote:
> > I keep being baffled by this. Why should we track the VMM accesses or
> > the VMM writeback? This logic doesn't seem to bring anything useful as
> > far as I can tell. All we need to ensure is that what is written to
> > the pseudo-register is an acceptable subset of the previous value, and
> > I cannot see why this can't be done at write-time.
> >
> > If you want to hide this behind a capability, fine (although my gut
> > feeling is that we don't need that either). But I really want to be
> > convinced about all this tracking.
> >
> The tracking of each owner register is necessary here to safeguard
> against the possibility that the user-space may not be aware of a
> newly introduced register and hence hasn't accessed it. If it has at
> least read the register, even without writing it back, we assume that
> the user-space is happy with the configuration. But a register that
> has not even been read indicates that the user-space is unaware of
> its existence. In such a case, if we don't sanitize (clear all the
> bits of) this register, the features will be exposed unconditionally
> to the guest.
>
> The capability is introduced here to make sure that this new
> infrastructure is backward compatible with old VMMs. If a VMM doesn't
> enable this capability, it is probably unaware of the feature, and
> things will work as they always have: new services are exposed to the
> guest unconditionally as and when they are introduced.
Hi Raghavendra,
I don't think we need a CAP that has to be enabled or to make any
assumptions or policy decisions in the kernel. I think we just need to
provide a bit more information to the VMM when it checks if KVM has the
CAP. If KVM tells the VMM how many pseudo registers there are, which
can be done with the return value of the CAP, then the VMM code could be
something like this:
r = check_cap(KVM_CAP_ARM_HVC_FW_REG_BMAP);
if (r) {
        num_regs = r;

        for (idx = 0; idx < num_regs; ++idx) {
                reg = hvc_fw_reg(idx);

                if (idx > vmm_last_known_idx) {
                        ...
                } else {
                        ...
                }
        }
}
With this, the VMM is free to decide if it wants to clear all registers
greater than the last index it was aware of or if it wants to let those
registers just get exposed to the guest without knowing what's getting
exposed. Along with documenting that by default everything gets exposed
by KVM, which is the backwards compatible thing to do, the VMM has
been warned and given everything it needs to manage its guests.
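On the KVM side, the capability's return value could be wired up with
something as simple as this in kvm_vm_ioctl_check_extension() (a
sketch; assuming the fw_reg_bmap_ids[] array from this series):

        case KVM_CAP_ARM_HVC_FW_REG_BMAP:
                r = ARRAY_SIZE(fw_reg_bmap_ids);
                break;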
Another thing that might be nice is giving userspace control of how many
pseudo registers show up in get-reg-list. In order to migrate from a host
with a more recent KVM to a host with an older KVM[*] we should only
expose the number of pseudo registers that the older host is aware of.
The VMM would zero these registers out anyway, in order to be compatible
for migration, but that's not enough when they also show up in the list
(at least not with QEMU, which aborts migration when the destination
expects fewer registers than get-reg-list provides)
[*] This isn't a great idea, but it'd be nice if we can make it work,
because users may want to roll back upgrades or, after migrating to a
host with a newer kernel, they may want to migrate back to where they
started.
Thanks,
drew
On Sat, Nov 27, 2021 at 9:27 AM Andrew Jones <[email protected]> wrote:
>
> On Tue, Nov 23, 2021 at 10:34:23AM -0800, Raghavendra Rao Ananta wrote:
> > On Mon, Nov 22, 2021 at 9:23 AM Marc Zyngier <[email protected]> wrote:
> > > I keep being baffled by this. Why should we track the VMM accesses or
> > > the VMM writeback? This logic doesn't seem to bring anything useful as
> > > far as I can tell. All we need to ensure is that what is written to
> > > the pseudo-register is an acceptable subset of the previous value, and
> > > I cannot see why this can't be done at write-time.
> > >
> > > If you want to hide this behind a capability, fine (although my gut
> > > feeling is that we don't need that either). But I really want to be
> > > convinced about all this tracking.
> > >
> > The tracking of each owner register is necessary here to safeguard
> > against the possibility that the user-space may not be aware of a
> > newly introduced register and hence hasn't accessed it. If it has at
> > least read the register, even without writing it back, we assume that
> > the user-space is happy with the configuration. But a register that
> > has not even been read indicates that the user-space is unaware of
> > its existence. In such a case, if we don't sanitize (clear all the
> > bits of) this register, the features will be exposed unconditionally
> > to the guest.
> >
> > The capability is introduced here to make sure that this new
> > infrastructure is backward compatible with old VMMs. If a VMM doesn't
> > enable this capability, it is probably unaware of the feature, and
> > things will work as they always have: new services are exposed to the
> > guest unconditionally as and when they are introduced.
>
> Hi Raghavendra,
>
> I don't think we need a CAP that has to be enabled or to make any
> assumptions or policy decisions in the kernel. I think we just need to
> provide a bit more information to the VMM when it checks if KVM has the
> CAP. If KVM tells the VMM how many pseudo registers there are, which
> can be done with the return value of the CAP, then the VMM code could be
> something like this:
>
> r = check_cap(KVM_CAP_ARM_HVC_FW_REG_BMAP);
> if (r) {
>         num_regs = r;
>
>         for (idx = 0; idx < num_regs; ++idx) {
>                 reg = hvc_fw_reg(idx);
>
>                 if (idx > vmm_last_known_idx) {
>                         ...
>                 } else {
>                         ...
>                 }
>         }
> }
>
> With this, the VMM is free to decide if it wants to clear all registers
> greater than the last index it was aware of or if it wants to let those
> registers just get exposed to the guest without knowing what's getting
> exposed. Along with documenting that by default everything gets exposed
> by KVM, which is the backwards compatible thing to do, the VMM has
> been warned and given everything it needs to manage its guests.
>
Hi Andrew,
Thanks for your comments and suggestions!
I like the idea of sharing info via a read of the CAP, and not having
to explicitly sanitize/clear the registers before the guest begins to
run.
However, the handshake is done purely via API documentation, which is
a little concerning. The user-space must remember to explicitly clear
any new register that it doesn't want to expose to the guest, while
the current approach does this automatically.
Any bug in the VMM's implementation could unintentionally expose
features to the guest. What do you think?
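To make it concrete: every VMM would need to carry something like the
following (a sketch; hvc_fw_reg(), vmm_last_known_idx, num_regs and
vcpu_fd are the hypothetical names from your example):

        /* Zero out every register this VMM doesn't know about */
        for (idx = vmm_last_known_idx + 1; idx < num_regs; ++idx) {
                __u64 val = 0;
                struct kvm_one_reg reg = {
                        .id   = hvc_fw_reg(idx),
                        .addr = (__u64)(unsigned long)&val,
                };

                if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
                        err(1, "KVM_SET_ONE_REG");
        }

Any slip in a loop like this would silently leak a feature to the
guest.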
> Another thing that might be nice is giving userspace control of how many
> pseudo registers show up in get-reg-list. In order to migrate from a host
> with a more recent KVM to a host with an older KVM[*] we should only
> expose the number of pseudo registers that the older host is aware of.
> The VMM would zero these registers out anyway, in order to be compatible
> for migration, but that's not enough when they also show up in the list
> (at least not with QEMU, which aborts migration when the destination
> expects fewer registers than get-reg-list provides)
>
> [*] This isn't a great idea, but it'd be nice if we can make it work,
> because users may want to roll back upgrades or, after migrating to a
> host with a newer kernel, they may want to migrate back to where they
> started.
>
Good point. But IIUC, if the user-space is able to communicate that
it expects a certain get-reg-list, couldn't it handle this at its end
too, rather than relying on the kernel to send a trimmed list back?

My assumption was that the VMM would statically maintain a known set
of registers that it wants to work with, curated by hand rather than
derived from get-reg-list. This could be the common subset of
registers present in all the host kernels (newer or older) of the
migration fleet. This config wouldn't change even when get-reg-list
declares a new register, since the features that register exposes
could still be untested. Migrating to a host with a missing register
shouldn't be possible in this setting, but if the VMM does encounter
that scenario, it should be able to abort the migration (similar to
QEMU).
Please correct me if you think it's a false assumption to proceed with.
Regards,
Raghavendra
> Thanks,
> drew
>
On Sat, Nov 27, 2021 at 5:16 AM Andrew Jones <[email protected]> wrote:
>
> On Sat, Nov 13, 2021 at 01:22:24AM +0000, Raghavendra Rao Ananta wrote:
> > Common hypercall firmware register handling currently lives in
> > psci.c. Since the upcoming patches add more of these registers,
> > it's better to move the generic handling to hypercalls.c for a
> > cleaner presentation.
> >
> > While we are at it, collect all the firmware registers under
> > fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
> > kvm_arm_copy_fw_reg_indices() in a generic way.
> >
> > No functional change intended.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/kvm/guest.c | 2 +-
> > arch/arm64/kvm/hypercalls.c | 170 +++++++++++++++++++++++++++++++++++
> > arch/arm64/kvm/psci.c | 166 ----------------------------------
> > include/kvm/arm_hypercalls.h | 7 ++
> > include/kvm/arm_psci.h | 7 --
> > 5 files changed, 178 insertions(+), 174 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 5ce26bedf23c..625f97f7b304 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -18,7 +18,7 @@
> > #include <linux/string.h>
> > #include <linux/vmalloc.h>
> > #include <linux/fs.h>
> > -#include <kvm/arm_psci.h>
> > +#include <kvm/arm_hypercalls.h>
> > #include <asm/cputype.h>
> > #include <linux/uaccess.h>
> > #include <asm/fpsimd.h>
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 30da78f72b3b..9e136d91b470 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -146,3 +146,173 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > return 1;
> > }
> > +
> > +static const u64 fw_reg_ids[] = {
> > + KVM_REG_ARM_PSCI_VERSION,
> > + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> > + KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > +};
> > +
> > +int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > +{
> > + return ARRAY_SIZE(fw_reg_ids);
> > +}
> > +
> > +int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
> > + if (put_user(fw_reg_ids[i], uindices))
>
> This is missing the ++ on uindices, so it just writes the same offset
> three times.
>
Thanks for catching this! I believe I realized this later and
corrected it in patch-04/11 of the series and missed it here.
I'll fix it here as well.
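That is, the loop needs to advance the pointer, as the version in
patch-04 already does:

        for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
                if (put_user(fw_reg_ids[i], uindices++))
                        return -EFAULT;
        }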
> > + return -EFAULT;
> > + }
> > +
> > + return 0;
> > +}
>
> I assume the rest of the patch is just a cut+paste move of code.
>
That's right.
Regards,
Raghavendra
> Thanks,
> drew
>
On Mon, Nov 29, 2021 at 04:56:19PM -0800, Raghavendra Rao Ananta wrote:
> On Sat, Nov 27, 2021 at 9:27 AM Andrew Jones <[email protected]> wrote:
> >
> > On Tue, Nov 23, 2021 at 10:34:23AM -0800, Raghavendra Rao Ananta wrote:
> > > On Mon, Nov 22, 2021 at 9:23 AM Marc Zyngier <[email protected]> wrote:
> > > > I keep being baffled by this. Why should we track the VMM accesses or
> > > > the VMM writeback? This logic doesn't seem to bring anything useful as
> > > > far as I can tell. All we need to ensure is that what is written to
> > > > the pseudo-register is an acceptable subset of the previous value, and
> > > > I cannot see why this can't be done at write-time.
> > > >
> > > > If you want to hide this behind a capability, fine (although my gut
> > > > feeling is that we don't need that either). But I really want to be
> > > > convinced about all this tracking.
> > > >
> > > The tracking of each owner register is necessary here to safeguard
> > > against the possibility that the user-space may not be aware of a
> > > newly introduced register and hence hasn't accessed it. If it has at
> > > least read the register, even without writing it back, we assume that
> > > the user-space is happy with the configuration. But a register that
> > > has not even been read indicates that the user-space is unaware of
> > > its existence. In such a case, if we don't sanitize (clear all the
> > > bits of) this register, the features will be exposed unconditionally
> > > to the guest.
> > >
> > > The capability is introduced here to make sure that this new
> > > infrastructure is backward compatible with old VMMs. If a VMM doesn't
> > > enable this capability, it is probably unaware of the feature, and
> > > things will work as they always have: new services are exposed to the
> > > guest unconditionally as and when they are introduced.
> >
> > Hi Raghavendra,
> >
> > I don't think we need a CAP that has to be enabled or to make any
> > assumptions or policy decisions in the kernel. I think we just need to
> > provide a bit more information to the VMM when it checks if KVM has the
> > CAP. If KVM tells the VMM how many pseudo registers there are, which
> > can be done with the return value of the CAP, then the VMM code could be
> > something like this:
> >
> > r = check_cap(KVM_CAP_ARM_HVC_FW_REG_BMAP);
> > if (r) {
> >         num_regs = r;
> >
> >         for (idx = 0; idx < num_regs; ++idx) {
> >                 reg = hvc_fw_reg(idx);
> >
> >                 if (idx > vmm_last_known_idx) {
> >                         ...
> >                 } else {
> >                         ...
> >                 }
> >         }
> > }
> >
> > With this, the VMM is free to decide if it wants to clear all registers
> > greater than the last index it was aware of or if it wants to let those
> > registers just get exposed to the guest without knowing what's getting
> > exposed. Along with documenting that by default everything gets exposed
> > by KVM, which is the backwards compatible thing to do, the VMM has
> > been warned and given everything it needs to manage its guests.
> >
> Hi Andrew,
>
> Thanks for your comments and suggestions!
>
> I like the idea of sharing info via a read of the CAP, and not having
> to explicitly sanitize/clear the registers before the guest begins to
> run.
> However, the handshake is done purely via API documentation, which is
> a little concerning. The user-space must remember to explicitly clear
> any new register that it doesn't want to expose to the guest, while
> the current approach does this automatically.
> Any bug in the VMM's implementation could unintentionally expose
> features to the guest. What do you think?
The VMM can mess things up in many ways. While KVM should protect itself
from the VMM, it shouldn't try to protect the VMM from the VMM itself. In
this case, the risk here isn't that we allow the VMM to do something that
can harm KVM, or even the guest. The risk is only that the VMM fails to do
what it wanted to do (assuming it didn't want to expose unknown features
to the guest). I.e. the risk here is only that the VMM has a bug, and it's
an easily detectable bug. I say let the VMM developers manage it.
>
> > Another thing that might be nice is giving userspace control of how many
> > pseudo registers show up in get-reg-list. In order to migrate from a host
> > with a more recent KVM to a host with an older KVM[*] we should only
> > expose the number of pseudo registers that the older host is aware of.
> > The VMM would zero these registers out anyway, in order to be compatible
> > for migration, but that's not enough when they also show up in the list
> > (at least not with QEMU, which aborts migration when the destination
> > expects fewer registers than get-reg-list provides)
> >
> > [*] This isn't a great idea, but it'd be nice if we can make it work,
> > because users may want to roll back upgrades or, after migrating to a
> > host with a newer kernel, they may want to migrate back to where they
> > started.
> >
> Good point. But IIUC, if the user-space is able to communicate that
> it expects a certain get-reg-list, couldn't it handle this at its end
> too, rather than relying on the kernel to send a trimmed list back?
Yes, I think we can manage this in the VMM, and that's probably the
better place to manage it.
>
> My assumption was that the VMM would statically maintain a known set
> of registers that it wants to work with, curated by hand rather than
> derived from get-reg-list. This could be the common subset of
> registers present in all the host kernels (newer or older) of the
> migration fleet. This config wouldn't change even when get-reg-list
> declares a new register, since the features that register exposes
> could still be untested. Migrating to a host with a missing register
> shouldn't be possible in this setting, but if the VMM does encounter
> that scenario, it should be able to abort the migration (similar to
> QEMU).
>
> Please correct me if you think it's a false assumption to proceed with.
Your assumptions align with mine. It seems that, as we move towards
CPU models, get-reg-list's role will likely only be to confirm that a
host supports the minimum required. We should probably change the VMM
to allow migrating from a host with more registers to one with fewer,
as long as the host with fewer includes all the required registers.
Of course we
also need to ensure that any registers we don't want to require are
not exposed to the guest, but I guess that's precisely what we're trying
to do with this series for at least some pseudo registers.
Thanks,
drew