Hello,
Continuing the discussion from [1], this series adds support for
user-space to select the hypercall services that it wishes to
expose to the guest, rather than the guest discovering them
unconditionally. The idea employed by the series was taken from
[1], as suggested by Marc Z.
In a broad sense, the idea is similar to the current implementation
of the PSCI interface: create a 'firmware pseudo-register' to handle
the firmware revisions. The series extends this idea to the other
hypercalls such as TRNG (True Random Number Generator), PV_TIME
(Paravirtualized Time), and PTP (Precision Time Protocol).
For better categorization and future scaling, these firmware registers
are categorized based on the service call owners, but unlike the
existing firmware pseudo-registers, they hold the features supported
in the form of a bitmap.
The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced to announce
this extension; reading it returns the number of pseudo-firmware
registers supported. During the VM initialization, each register holds
the upper limit of the features it supports. It's expected that the
VMMs discover the features provided by each register via GET_ONE_REG,
and write back the desired values using SET_ONE_REG. KVM allows this
modification only until the VM has started.
Older VMMs can simply ignore the capability and the hypercall services
will be exposed unconditionally to the guests, thus ensuring backward
compatibility.
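To make the flow concrete, here is a hedged sketch of the sanitization step a VMM might perform between GET_ONE_REG and SET_ONE_REG; the helper name and masks are illustrative and not part of the series:

```c
#include <stdint.h>

/*
 * Illustrative only: a VMM reads the advertised feature bitmap of a
 * pseudo-firmware register via GET_ONE_REG, masks it down to the
 * features every kernel in its migration fleet supports, and writes
 * the result back via SET_ONE_REG. Only the masking step is shown;
 * the ioctl plumbing is omitted.
 */
uint64_t sanitize_fw_reg_bmap(uint64_t advertised, uint64_t fleet_mask)
{
	/* Never request a feature the kernel didn't advertise. */
	return advertised & fleet_mask;
}
```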
The patches are based off of mainline kernel 5.16-rc8, with the selftest
patches from [2] applied.
Patch-1 tracks whether the VM has started running, which is used in
the upcoming patches.
Patch-2 factors out the non-PSCI related interface from psci.c to
hypercalls.c, as the series would extend the list in the upcoming
patches.
Patch-3 introduces the KVM_CAP_ARM_HVC_FW_REG_BMAP capability
definition.
Patch-4 sets up the framework for the bitmap firmware pseudo-registers.
It includes read/write helpers for the registers, and a helper to check
if a particular hypercall service is supported for the guest.
It also adds the register KVM_REG_ARM_STD_BMAP to support ARM's
standard secure services.
Patch-5 introduces the firmware register, KVM_REG_ARM_STD_HYP_BMAP,
which holds the standard hypervisor services (such as PV_TIME).
Patch-6 introduces the firmware register, KVM_REG_ARM_VENDOR_HYP_BMAP,
which holds the vendor specific hypercall services.
Patches 7 and 8 add the necessary documentation for the newly added
firmware registers, and rename psci.rst to hypercalls.rst.
Patch-9 imports the SMCCC definitions from linux/arm-smccc.h into tools/
for further use in selftests.
Patch-10 adds the selftest to test the guest (using 'hvc') and VMM
interfaces (SET/GET_ONE_REG).
Patch-11 adds these firmware registers into the get-reg-list selftest.
[1]: https://lore.kernel.org/kvmarm/[email protected]/T/
[2]: https://lore.kernel.org/kvmarm/[email protected]/
Regards,
Raghavendra
v2 -> v3
Addressed comments by Marc and Andrew:
- Dropped kvm_vcpu_has_run_once() implementation.
- Redefined kvm_vm_has_run_once() as kvm_vm_has_started() in the core
  KVM code, introducing a new field, 'vm_started', to track this.
- KVM_CAP_ARM_HVC_FW_REG_BMAP returns the number of pseudo-firmware
  bitmap registers upon a 'read'. Support for 'write' removed.
- Removed redundant spinlock, 'fw_reg_bmap_enabled' fields from the
hypercall descriptor structure.
- Removed the separate sub-struct that held the bitmap info; the
  bitmap info is now stored directly in the hypercall descriptor
  structure (struct kvm_hvc_desc).
v1 -> v2
Addressed comments by Oliver (thanks!):
- Introduced kvm_vcpu_has_run_once() and kvm_vm_has_run_once() in the
  core KVM code, rather than relying on the ARM-specific
  vcpu->arch.has_run_once.
- Writing to KVM_REG_ARM_PSCI_VERSION is done in hypercalls.c itself,
rather than separating out to psci.c.
- Introduced KVM_CAP_ARM_HVC_FW_REG_BMAP to enable the extension.
- Tracks the register accesses from VMM to decide whether to sanitize
a register or not, as opposed to sanitizing upon the first 'write'
in v1.
- kvm_hvc_call_supported() is implemented using a direct switch-case
statement, instead of looping over all the registers to pick the
register for the function-id.
- Replaced the register bit definitions with #defines, instead of enums.
- Removed the patch v1-06/08 that imports the firmware register
definitions as it's not needed.
- Separated out the documentation into its own patch, and the renaming
  of psci.rst to hypercalls.rst into another patch.
- Add the new firmware registers to get-reg-list KVM selftest.
v1: https://lore.kernel.org/kvmarm/[email protected]/
v2: https://lore.kernel.org/kvmarm/[email protected]/
Raghavendra Rao Ananta (11):
KVM: Capture VM start
KVM: arm64: Factor out firmware register handling from psci.c
KVM: Introduce KVM_CAP_ARM_HVC_FW_REG_BMAP
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM: arm64: Add standard hypervisor firmware register
KVM: arm64: Add vendor hypervisor firmware register
Docs: KVM: Add doc for the bitmap firmware registers
Docs: KVM: Rename psci.rst to hypercalls.rst
tools: Import ARM SMCCC definitions
selftests: KVM: aarch64: Introduce hypercall ABI test
selftests: KVM: aarch64: Add the bitmap firmware registers to
get-reg-list
Documentation/virt/kvm/api.rst | 21 +
Documentation/virt/kvm/arm/hypercalls.rst | 128 +++++++
Documentation/virt/kvm/arm/psci.rst | 77 ----
arch/arm64/include/asm/kvm_host.h | 16 +
arch/arm64/include/uapi/asm/kvm.h | 12 +
arch/arm64/kvm/arm.c | 4 +
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/kvm/hypercalls.c | 311 ++++++++++++++-
arch/arm64/kvm/psci.c | 166 --------
arch/arm64/kvm/pvtime.c | 3 +
arch/arm64/kvm/trng.c | 8 +-
include/kvm/arm_hypercalls.h | 19 +
include/kvm/arm_psci.h | 7 -
include/linux/kvm_host.h | 3 +
include/uapi/linux/kvm.h | 1 +
tools/include/linux/arm-smccc.h | 188 +++++++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/get-reg-list.c | 3 +
.../selftests/kvm/aarch64/hypercalls.c | 358 ++++++++++++++++++
virt/kvm/kvm_main.c | 9 +
21 files changed, 1078 insertions(+), 260 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst
delete mode 100644 Documentation/virt/kvm/arm/psci.rst
create mode 100644 tools/include/linux/arm-smccc.h
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
--
2.34.1.448.ga2b2bfdf31-goog
Capture the start of the KVM VM, defined as the first vCPU run.
This state of the VM is helpful in the upcoming patches to prevent
user-space from configuring certain VM features after the VM has
started running.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 9 +++++++++
2 files changed, 12 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c310648cc8f1..d0bd8f7a026c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -623,6 +623,7 @@ struct kvm {
struct notifier_block pm_notifier;
#endif
char stats_id[KVM_STATS_NAME_SIZE];
+ bool vm_started;
};
#define kvm_err(fmt, ...) \
@@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
}
}
+#define kvm_vm_has_started(kvm) ((kvm)->vm_started)
+
extern bool kvm_rebooting;
extern unsigned int halt_poll_ns;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 72c4e6b39389..962b91ac2064 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
int r;
struct kvm_fpu *fpu = NULL;
struct kvm_sregs *kvm_sregs = NULL;
+ struct kvm *kvm = vcpu->kvm;
if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
return -EIO;
@@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
if (oldpid)
synchronize_rcu();
put_pid(oldpid);
+
+ /*
+ * Since we land here even on the first vCPU run,
+ * we can mark that the VM has started running.
+ */
+ mutex_lock(&kvm->lock);
+ kvm->vm_started = true;
+ mutex_unlock(&kvm->lock);
}
r = kvm_arch_vcpu_ioctl_run(vcpu);
trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
--
2.34.1.448.ga2b2bfdf31-goog
Common hypercall firmware register handling is currently employed
by psci.c. Since the upcoming patches add more of these registers,
it's better to move the generic handling to hypercalls.c for a
cleaner presentation.
While we are at it, collect all the firmware registers under
fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
kvm_arm_copy_fw_reg_indices() in a generic way.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/kvm/hypercalls.c | 170 +++++++++++++++++++++++++++++++++++
arch/arm64/kvm/psci.c | 166 ----------------------------------
include/kvm/arm_hypercalls.h | 7 ++
include/kvm/arm_psci.h | 7 --
5 files changed, 178 insertions(+), 174 deletions(-)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e116c7767730..8238e52d890d 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -18,7 +18,7 @@
#include <linux/string.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
-#include <kvm/arm_psci.h>
+#include <kvm/arm_hypercalls.h>
#include <asm/cputype.h>
#include <linux/uaccess.h>
#include <asm/fpsimd.h>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..3c2fcf31ad3d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -146,3 +146,173 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
+
+static const u64 kvm_arm_fw_reg_ids[] = {
+ KVM_REG_ARM_PSCI_VERSION,
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
+};
+
+int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
+{
+ return ARRAY_SIZE(kvm_arm_fw_reg_ids);
+}
+
+int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_ids); i++) {
+ if (put_user(kvm_arm_fw_reg_ids[i], uindices++))
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+#define KVM_REG_FEATURE_LEVEL_WIDTH 4
+#define KVM_REG_FEATURE_LEVEL_MASK (BIT(KVM_REG_FEATURE_LEVEL_WIDTH) - 1)
+
+/*
+ * Convert the workaround level into an easy-to-compare number, where higher
+ * values mean better protection.
+ */
+static int get_kernel_wa_level(u64 regid)
+{
+ switch (regid) {
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ switch (arm64_get_spectre_v2_state()) {
+ case SPECTRE_VULNERABLE:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
+ case SPECTRE_MITIGATED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
+ case SPECTRE_UNAFFECTED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
+ }
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ switch (arm64_get_spectre_v4_state()) {
+ case SPECTRE_MITIGATED:
+ /*
+ * As for the hypercall discovery, we pretend we
+ * don't have any FW mitigation if SSBS is there at
+ * all times.
+ */
+ if (cpus_have_final_cap(ARM64_SSBS))
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ fallthrough;
+ case SPECTRE_UNAFFECTED:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
+ case SPECTRE_VULNERABLE:
+ return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ }
+ }
+
+ return -EINVAL;
+}
+
+int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+ void __user *uaddr = (void __user *)(long)reg->addr;
+ u64 val;
+
+ switch (reg->id) {
+ case KVM_REG_ARM_PSCI_VERSION:
+ val = kvm_psci_version(vcpu, vcpu->kvm);
+ break;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+ void __user *uaddr = (void __user *)(long)reg->addr;
+ u64 val;
+ int wa_level;
+
+ if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ switch (reg->id) {
+ case KVM_REG_ARM_PSCI_VERSION:
+ {
+ bool wants_02;
+
+ wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
+
+ switch (val) {
+ case KVM_ARM_PSCI_0_1:
+ if (wants_02)
+ return -EINVAL;
+ vcpu->kvm->arch.psci_version = val;
+ return 0;
+ case KVM_ARM_PSCI_0_2:
+ case KVM_ARM_PSCI_1_0:
+ if (!wants_02)
+ return -EINVAL;
+ vcpu->kvm->arch.psci_version = val;
+ return 0;
+ }
+ break;
+ }
+
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+ if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
+ return -EINVAL;
+
+ if (get_kernel_wa_level(reg->id) < val)
+ return -EINVAL;
+
+ return 0;
+
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+ if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
+ KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
+ return -EINVAL;
+
+ /* The enabled bit must not be set unless the level is AVAIL. */
+ if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
+ (val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
+ return -EINVAL;
+
+ /*
+ * Map all the possible incoming states to the only two we
+ * really want to deal with.
+ */
+ switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
+ wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
+ break;
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
+ case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
+ wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
+ * other way around.
+ */
+ if (get_kernel_wa_level(reg->id) < wa_level)
+ return -EINVAL;
+
+ return 0;
+ default:
+ return -ENOENT;
+ }
+
+ return -EINVAL;
+}
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..6c8323ae32f2 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -403,169 +403,3 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
};
}
-
-int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
-{
- return 3; /* PSCI version and two workaround registers */
-}
-
-int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
-{
- if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
- return -EFAULT;
-
- if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
- return -EFAULT;
-
- if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
- return -EFAULT;
-
- return 0;
-}
-
-#define KVM_REG_FEATURE_LEVEL_WIDTH 4
-#define KVM_REG_FEATURE_LEVEL_MASK (BIT(KVM_REG_FEATURE_LEVEL_WIDTH) - 1)
-
-/*
- * Convert the workaround level into an easy-to-compare number, where higher
- * values mean better protection.
- */
-static int get_kernel_wa_level(u64 regid)
-{
- switch (regid) {
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- switch (arm64_get_spectre_v2_state()) {
- case SPECTRE_VULNERABLE:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
- case SPECTRE_MITIGATED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
- case SPECTRE_UNAFFECTED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
- }
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- switch (arm64_get_spectre_v4_state()) {
- case SPECTRE_MITIGATED:
- /*
- * As for the hypercall discovery, we pretend we
- * don't have any FW mitigation if SSBS is there at
- * all times.
- */
- if (cpus_have_final_cap(ARM64_SSBS))
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- fallthrough;
- case SPECTRE_UNAFFECTED:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
- case SPECTRE_VULNERABLE:
- return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- }
- }
-
- return -EINVAL;
-}
-
-int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
-{
- void __user *uaddr = (void __user *)(long)reg->addr;
- u64 val;
-
- switch (reg->id) {
- case KVM_REG_ARM_PSCI_VERSION:
- val = kvm_psci_version(vcpu, vcpu->kvm);
- break;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
- break;
- default:
- return -ENOENT;
- }
-
- if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
- return -EFAULT;
-
- return 0;
-}
-
-int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
-{
- void __user *uaddr = (void __user *)(long)reg->addr;
- u64 val;
- int wa_level;
-
- if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
- return -EFAULT;
-
- switch (reg->id) {
- case KVM_REG_ARM_PSCI_VERSION:
- {
- bool wants_02;
-
- wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
-
- switch (val) {
- case KVM_ARM_PSCI_0_1:
- if (wants_02)
- return -EINVAL;
- vcpu->kvm->arch.psci_version = val;
- return 0;
- case KVM_ARM_PSCI_0_2:
- case KVM_ARM_PSCI_1_0:
- if (!wants_02)
- return -EINVAL;
- vcpu->kvm->arch.psci_version = val;
- return 0;
- }
- break;
- }
-
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
- if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
- return -EINVAL;
-
- if (get_kernel_wa_level(reg->id) < val)
- return -EINVAL;
-
- return 0;
-
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
- if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
- KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
- return -EINVAL;
-
- /* The enabled bit must not be set unless the level is AVAIL. */
- if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
- (val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
- return -EINVAL;
-
- /*
- * Map all the possible incoming states to the only two we
- * really want to deal with.
- */
- switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
- wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
- break;
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
- case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
- wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
- break;
- default:
- return -EINVAL;
- }
-
- /*
- * We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
- * other way around.
- */
- if (get_kernel_wa_level(reg->id) < wa_level)
- return -EINVAL;
-
- return 0;
- default:
- return -ENOENT;
- }
-
- return -EINVAL;
-}
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 0e2509d27910..5d38628a8d04 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -40,4 +40,11 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
vcpu_set_reg(vcpu, 3, a3);
}
+struct kvm_one_reg;
+
+int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+
#endif
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 5b58bd2fe088..080c2d0bd6e7 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -42,11 +42,4 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
int kvm_psci_call(struct kvm_vcpu *vcpu);
-struct kvm_one_reg;
-
-int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
-int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
-int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
-int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
-
#endif /* __KVM_ARM_PSCI_H__ */
--
2.34.1.448.ga2b2bfdf31-goog
Introduce the KVM arm64 capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
to indicate support for the pseudo-firmware bitmap extension.
Each of these registers holds a feature set exposed to the guest
in the form of a bitmap. If supported, a simple 'read' of the
capability returns the number of pseudo-firmware registers
supported. User-space can utilize this to discover the registers,
and can further explore or modify the features using the classical
GET/SET_ONE_REG interface.
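As a sketch of the accounting, the capability's return value simply counts the bitmap registers, separate from the three legacy firmware registers; the register IDs below are placeholders for illustration, not the uapi values:

```c
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder IDs; the real IDs live in the arm64 uapi header. */
static const uint64_t fw_reg_bmap_ids[] = {
	3, /* e.g. KVM_REG_ARM_STD_BMAP */
};

/* The three legacy registers: PSCI version plus two workaround registers. */
static const uint64_t fw_reg_ids[] = { 0, 1, 2 };

/* What a read of KVM_CAP_ARM_HVC_FW_REG_BMAP reports. */
int num_fw_bmap_regs(void)
{
	return ARRAY_SIZE(fw_reg_bmap_ids);
}

/* Total firmware register count, as seen via KVM_GET_REG_LIST. */
int num_fw_regs(void)
{
	return ARRAY_SIZE(fw_reg_ids) + num_fw_bmap_regs();
}
```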
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
2 files changed, 22 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..646176537f2c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6925,6 +6925,27 @@ indicated by the fd to the VM this is called on.
This is intended to support intra-host migration of VMs between userspace VMMs,
upgrading the VMM process without interrupting the guest.
+7.30 KVM_CAP_ARM_HVC_FW_REG_BMAP
+--------------------------------
+
+:Architectures: arm64
+:Parameters: None
+:Returns: Number of pseudo-firmware registers supported
+
+This capability indicates that KVM for arm64 supports the pseudo-firmware
+register bitmap extension. Each of these registers represents the features
+supported by a particular service type in the form of a bitmap. By default,
+these registers are set with the upper limit of the features supported.
+
+The registers can be accessed via the standard KVM_GET_ONE_REG and
+KVM_SET_ONE_REG interfaces. User-space is expected to discover the number
+of these registers by reading KVM_CAP_ARM_HVC_FW_REG_BMAP, read the current
+bitmap configuration via GET_ONE_REG for each register, and then write back
+the desired bitmap of features that it wishes the guest to see via SET_ONE_REG.
+
+Note that KVM doesn't allow the user-space to modify these registers after
+the VM (any of the vCPUs) has started running.
+
8. Other capabilities.
======================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..209b43dbbc3c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
#define KVM_CAP_ARM_MTE 205
#define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_ARM_HVC_FW_REG_BMAP 207
#ifdef KVM_CAP_IRQ_ROUTING
--
2.34.1.448.ga2b2bfdf31-goog
KVM regularly introduces new hypercall services to the guests without
any consent from the Virtual Machine Manager (VMM). This means the
guests can observe hypercall services come and go as they migrate
across various host kernel versions. This could be a major problem
if the guest discovered a hypercall, started using it, and after
getting migrated to an older kernel realizes that it's no longer
available. Depending on how the guest handles the change, it may
simply panic.
As a result, there's a need for the VMM to select the services that
it wishes the guest to discover. The VMM can select these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, to all the hypercall services available.
These firmware registers are categorized based on the service call
owners, and unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.
The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
this extension; reading it returns the number of pseudo-firmware
registers supported. During the VM initialization, each register
holds the upper limit of the features it supports. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.
Older VMMs can simply ignore the capability and the hypercall services
will be exposed unconditionally to the guests, thus ensuring backward
compatibility.
In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for the ARM True Random Number Generator (TRNG) service, with bit-0 of
the register representing the mandatory features of v1.0. Support for
other services is added in the upcoming patches.
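A minimal user-space model of the write semantics described above; the structure and helper names are illustrative, though the -EINVAL/-EBUSY behavior matches what this patch implements:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model (not kernel code) of the register's write rules: the bitmap
 * must stay within the supported feature set, may be changed freely
 * before the first vCPU run, and once the VM has started only a
 * write of the identical value succeeds.
 */
struct vm_model {
	bool started;
	uint64_t std_bmap;
	uint64_t std_features; /* upper limit, e.g. the TRNG bit */
};

int set_std_bmap(struct vm_model *vm, uint64_t val)
{
	if (val & ~vm->std_features) /* unsupported bit */
		return -EINVAL;
	if (vm->started)
		return vm->std_bmap == val ? 0 : -EBUSY;
	vm->std_bmap = val;
	return 0;
}
```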
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++
arch/arm64/include/uapi/asm/kvm.h | 4 ++
arch/arm64/kvm/arm.c | 4 ++
arch/arm64/kvm/hypercalls.c | 103 +++++++++++++++++++++++++++++-
arch/arm64/kvm/trng.c | 8 +--
include/kvm/arm_hypercalls.h | 6 ++
6 files changed, 129 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a5f7f38006f..a32cded0371b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -102,6 +102,15 @@ struct kvm_s2_mmu {
struct kvm_arch_memory_slot {
};
+/**
+ * struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
+ *
+ * @hvc_std_bmap: Bitmap of standard secure service calls
+ */
+struct kvm_hvc_desc {
+ u64 hvc_std_bmap;
+};
+
struct kvm_arch {
struct kvm_s2_mmu mmu;
@@ -137,6 +146,9 @@ struct kvm_arch {
/* Memory Tagging Extension enabled for the guest */
bool mte_enabled;
+
+	/* Hypercall firmware registers' descriptor */
+ struct kvm_hvc_desc hvc_desc;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..0d6f29c58456 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
+#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
+#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e4727dc771bf..56fe81565235 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
set_default_spectre(kvm);
+ kvm_arm_init_hypercalls(kvm);
return ret;
out_free_stage2_pgd:
@@ -283,6 +284,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
break;
+ case KVM_CAP_ARM_HVC_FW_REG_BMAP:
+ r = kvm_arm_num_fw_bmap_regs();
+ break;
default:
r = 0;
}
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 3c2fcf31ad3d..06243e4670eb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -58,6 +58,29 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
val[3] = lower_32_bits(cycles);
}
+static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
+{
+ return reg_bmap & feat_bit;
+}
+
+bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
+{
+ struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
+
+ switch (func_id) {
+ case ARM_SMCCC_TRNG_VERSION:
+ case ARM_SMCCC_TRNG_FEATURES:
+ case ARM_SMCCC_TRNG_GET_UUID:
+ case ARM_SMCCC_TRNG_RND32:
+ case ARM_SMCCC_TRNG_RND64:
+ return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
+ KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+ default:
+ /* By default, allow the services that aren't listed here */
+ return true;
+ }
+}
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
u32 func_id = smccc_get_function(vcpu);
@@ -65,6 +88,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
u32 feature;
gpa_t gpa;
+ if (!kvm_hvc_call_supported(vcpu, func_id))
+ goto out;
+
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
val[0] = ARM_SMCCC_VERSION_1_1;
@@ -143,6 +169,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
+out:
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
@@ -153,9 +180,25 @@ static const u64 kvm_arm_fw_reg_ids[] = {
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
};
+static const u64 kvm_arm_fw_reg_bmap_ids[] = {
+ KVM_REG_ARM_STD_BMAP,
+};
+
+void kvm_arm_init_hypercalls(struct kvm *kvm)
+{
+ struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
+
+ hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
+}
+
+int kvm_arm_num_fw_bmap_regs(void)
+{
+ return ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids);
+}
+
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
{
- return ARRAY_SIZE(kvm_arm_fw_reg_ids);
+ return ARRAY_SIZE(kvm_arm_fw_reg_ids) + kvm_arm_num_fw_bmap_regs();
}
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
@@ -167,6 +210,11 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
return -EFAULT;
}
+ for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids); i++) {
+ if (put_user(kvm_arm_fw_reg_bmap_ids[i], uindices++))
+ return -EFAULT;
+ }
+
return 0;
}
@@ -211,9 +259,20 @@ static int get_kernel_wa_level(u64 regid)
return -EINVAL;
}
+static void
+kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->lock);
+ *val = fw_reg_bmap;
+ mutex_unlock(&kvm->lock);
+}
+
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
void __user *uaddr = (void __user *)(long)reg->addr;
+ struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
u64 val;
switch (reg->id) {
@@ -224,6 +283,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
break;
+ case KVM_REG_ARM_STD_BMAP:
+ kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
+ break;
default:
return -ENOENT;
}
@@ -234,6 +296,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
}
+static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
+{
+ int ret = 0;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
+ u64 *fw_reg_bmap, fw_reg_features;
+
+ switch (reg_id) {
+ case KVM_REG_ARM_STD_BMAP:
+ fw_reg_bmap = &hvc_desc->hvc_std_bmap;
+ fw_reg_features = ARM_SMCCC_STD_FEATURES;
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ /* Check for unsupported bit */
+ if (val & ~fw_reg_features)
+ return -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ /*
+ * If the VM (any vCPU) has already started running, return success
+ * if there's no change in the value. Else, return -EBUSY.
+ */
+ if (kvm_vm_has_started(kvm)) {
+ ret = *fw_reg_bmap != val ? -EBUSY : 0;
+ goto out;
+ }
+
+ *fw_reg_bmap = val;
+out:
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
void __user *uaddr = (void __user *)(long)reg->addr;
@@ -310,6 +409,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return -EINVAL;
return 0;
+ case KVM_REG_ARM_STD_BMAP:
+ return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
default:
return -ENOENT;
}
diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
index 99bdd7103c9c..23f912514b06 100644
--- a/arch/arm64/kvm/trng.c
+++ b/arch/arm64/kvm/trng.c
@@ -60,14 +60,8 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
val = ARM_SMCCC_TRNG_VERSION_1_0;
break;
case ARM_SMCCC_TRNG_FEATURES:
- switch (smccc_get_arg1(vcpu)) {
- case ARM_SMCCC_TRNG_VERSION:
- case ARM_SMCCC_TRNG_FEATURES:
- case ARM_SMCCC_TRNG_GET_UUID:
- case ARM_SMCCC_TRNG_RND32:
- case ARM_SMCCC_TRNG_RND64:
+ if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
val = TRNG_SUCCESS;
- }
break;
case ARM_SMCCC_TRNG_GET_UUID:
smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 5d38628a8d04..8fe68d8d6d96 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,6 +6,9 @@
#include <asm/kvm_emulate.h>
+#define ARM_SMCCC_STD_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
@@ -42,9 +45,12 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
struct kvm_one_reg;
+void kvm_arm_init_hypercalls(struct kvm *kvm);
+int kvm_arm_num_fw_bmap_regs(void);
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
#endif
--
2.34.1.448.ga2b2bfdf31-goog
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that will be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized Time, represented by bit-0.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/uapi/asm/kvm.h | 4 ++++
arch/arm64/kvm/hypercalls.c | 17 ++++++++++++++++-
arch/arm64/kvm/pvtime.c | 3 +++
include/kvm/arm_hypercalls.h | 3 +++
5 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a32cded0371b..1daf2c0b3b85 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -106,9 +106,11 @@ struct kvm_arch_memory_slot {
* struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
*
* @hvc_std_bmap: Bitmap of standard secure service calls
+ * @hvc_std_hyp_bmap: Bitmap of standard hypervisor service calls
*/
struct kvm_hvc_desc {
u64 hvc_std_bmap;
+ u64 hvc_std_hyp_bmap;
};
struct kvm_arch {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 0d6f29c58456..af59c434ae33 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -285,6 +285,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
+#define KVM_REG_ARM_STD_HYP_BMAP KVM_REG_ARM_FW_REG(4)
+#define KVM_REG_ARM_STD_HYP_BIT_PV_TIME BIT(0)
+#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 06243e4670eb..9df0221834a3 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -75,6 +75,10 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_TRNG_RND64:
return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+ case ARM_SMCCC_HV_PV_TIME_FEATURES:
+ case ARM_SMCCC_HV_PV_TIME_ST:
+ return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_hyp_bmap,
+ KVM_REG_ARM_STD_HYP_BIT_PV_TIME);
default:
/* By default, allow the services that aren't listed here */
return true;
@@ -134,7 +138,8 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
- val[0] = SMCCC_RET_SUCCESS;
+ if (kvm_hvc_call_supported(vcpu, feature))
+ val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
@@ -182,6 +187,7 @@ static const u64 kvm_arm_fw_reg_ids[] = {
static const u64 kvm_arm_fw_reg_bmap_ids[] = {
KVM_REG_ARM_STD_BMAP,
+ KVM_REG_ARM_STD_HYP_BMAP,
};
void kvm_arm_init_hypercalls(struct kvm *kvm)
@@ -189,6 +195,7 @@ void kvm_arm_init_hypercalls(struct kvm *kvm)
struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
+ hvc_desc->hvc_std_hyp_bmap = ARM_SMCCC_STD_HYP_FEATURES;
}
int kvm_arm_num_fw_bmap_regs(void)
@@ -286,6 +293,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_STD_BMAP:
kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
break;
+ case KVM_REG_ARM_STD_HYP_BMAP:
+ kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_hyp_bmap, &val);
+ break;
default:
return -ENOENT;
}
@@ -308,6 +318,10 @@ static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
fw_reg_bmap = &hvc_desc->hvc_std_bmap;
fw_reg_features = ARM_SMCCC_STD_FEATURES;
break;
+ case KVM_REG_ARM_STD_HYP_BMAP:
+ fw_reg_bmap = &hvc_desc->hvc_std_hyp_bmap;
+ fw_reg_features = ARM_SMCCC_STD_HYP_FEATURES;
+ break;
default:
return -ENOENT;
}
@@ -410,6 +424,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
case KVM_REG_ARM_STD_BMAP:
+ case KVM_REG_ARM_STD_HYP_BMAP:
return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
default:
return -ENOENT;
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 78a09f7a6637..4fa436dbd0b7 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -37,6 +37,9 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
u32 feature = smccc_get_arg1(vcpu);
long val = SMCCC_RET_NOT_SUPPORTED;
+ if (!kvm_hvc_call_supported(vcpu, feature))
+ return val;
+
switch (feature) {
case ARM_SMCCC_HV_PV_TIME_FEATURES:
case ARM_SMCCC_HV_PV_TIME_ST:
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 8fe68d8d6d96..1a79b5f89a24 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -9,6 +9,9 @@
#define ARM_SMCCC_STD_FEATURES \
GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+#define ARM_SMCCC_STD_HYP_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
--
2.34.1.448.ga2b2bfdf31-goog
Introduce the firmware register to hold the vendor-specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that will be enabled for the
guest, as configured by the user-space. Currently, this
includes support only for the Precision Time Protocol (PTP),
represented by bit-0.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/uapi/asm/kvm.h | 4 ++++
arch/arm64/kvm/hypercalls.c | 23 ++++++++++++++++++++++-
include/kvm/arm_hypercalls.h | 3 +++
4 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1daf2c0b3b85..877096737ee1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -107,10 +107,12 @@ struct kvm_arch_memory_slot {
*
* @hvc_std_bmap: Bitmap of standard secure service calls
* @hvc_std_hyp_bmap: Bitmap of standard hypervisor service calls
+ * @hvc_vendor_hyp_bmap: Bitmap of vendor specific hypervisor service calls
*/
struct kvm_hvc_desc {
u64 hvc_std_bmap;
u64 hvc_std_hyp_bmap;
+ u64 hvc_vendor_hyp_bmap;
};
struct kvm_arch {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index af59c434ae33..7f076da55f30 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -289,6 +289,10 @@ struct kvm_arm_copy_mte_tags {
#define KVM_REG_ARM_STD_HYP_BIT_PV_TIME BIT(0)
#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+#define KVM_REG_ARM_VENDOR_HYP_BMAP KVM_REG_ARM_FW_REG(5)
+#define KVM_REG_ARM_VENDOR_HYP_BIT_PTP BIT(0)
+#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 0 /* Last valid bit */
+
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 9df0221834a3..2403ad50d759 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -79,6 +79,9 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_HV_PV_TIME_ST:
return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_hyp_bmap,
KVM_REG_ARM_STD_HYP_BIT_PV_TIME);
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+ return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_vendor_hyp_bmap,
+ KVM_REG_ARM_VENDOR_HYP_BIT_PTP);
default:
/* By default, allow the services that aren't listed here */
return true;
@@ -87,6 +90,7 @@ bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
+ struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
u32 func_id = smccc_get_function(vcpu);
u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
@@ -159,7 +163,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
- val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+
+ /*
+	 * The feature bits exposed to user-space don't include
+	 * ARM_SMCCC_KVM_FUNC_FEATURES. However, KVM exposes this to
+	 * the guest as bit-0. Hence, left-shift the user-space
+	 * exposed bitmap by 1 to accommodate it.
+ */
+ val[0] |= hvc_desc->hvc_vendor_hyp_bmap << 1;
break;
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
kvm_ptp_get_time(vcpu, val);
@@ -188,6 +199,7 @@ static const u64 kvm_arm_fw_reg_ids[] = {
static const u64 kvm_arm_fw_reg_bmap_ids[] = {
KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP,
+ KVM_REG_ARM_VENDOR_HYP_BMAP,
};
void kvm_arm_init_hypercalls(struct kvm *kvm)
@@ -196,6 +208,7 @@ void kvm_arm_init_hypercalls(struct kvm *kvm)
hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
hvc_desc->hvc_std_hyp_bmap = ARM_SMCCC_STD_HYP_FEATURES;
+ hvc_desc->hvc_vendor_hyp_bmap = ARM_SMCCC_VENDOR_HYP_FEATURES;
}
int kvm_arm_num_fw_bmap_regs(void)
@@ -296,6 +309,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_STD_HYP_BMAP:
kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_hyp_bmap, &val);
break;
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
+ kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_vendor_hyp_bmap, &val);
+ break;
default:
return -ENOENT;
}
@@ -322,6 +338,10 @@ static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
fw_reg_bmap = &hvc_desc->hvc_std_hyp_bmap;
fw_reg_features = ARM_SMCCC_STD_HYP_FEATURES;
break;
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
+ fw_reg_bmap = &hvc_desc->hvc_vendor_hyp_bmap;
+ fw_reg_features = ARM_SMCCC_VENDOR_HYP_FEATURES;
+ break;
default:
return -ENOENT;
}
@@ -425,6 +445,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
case KVM_REG_ARM_STD_BMAP:
case KVM_REG_ARM_STD_HYP_BMAP:
+ case KVM_REG_ARM_VENDOR_HYP_BMAP:
return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
default:
return -ENOENT;
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 1a79b5f89a24..8f54cacfbf40 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -12,6 +12,9 @@
#define ARM_SMCCC_STD_HYP_FEATURES \
GENMASK_ULL(KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX, 0)
+#define ARM_SMCCC_VENDOR_HYP_FEATURES \
+ GENMASK_ULL(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
--
2.34.1.448.ga2b2bfdf31-goog
Add the documentation for the bitmap firmware registers in
psci.rst. This includes the details for KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP
registers.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
Documentation/virt/kvm/arm/psci.rst | 85 +++++++++++++++++++++++------
1 file changed, 68 insertions(+), 17 deletions(-)
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/psci.rst
index d52c2e83b5b8..edc3caf927ae 100644
--- a/Documentation/virt/kvm/arm/psci.rst
+++ b/Documentation/virt/kvm/arm/psci.rst
@@ -1,32 +1,32 @@
.. SPDX-License-Identifier: GPL-2.0
-=========================================
-Power State Coordination Interface (PSCI)
-=========================================
+=======================
+ARM Hypercall Interface
+=======================
-KVM implements the PSCI (Power State Coordination Interface)
-specification in order to provide services such as CPU on/off, reset
-and power-off to the guest.
+KVM handles the hypercall services as requested by the guests. New hypercall
+services are regularly made available by the ARM specification or by KVM (as
+vendor services) if they make sense from a virtualization point of view.
-The PSCI specification is regularly updated to provide new features,
-and KVM implements these updates if they make sense from a virtualization
-point of view.
-
-This means that a guest booted on two different versions of KVM can
-observe two different "firmware" revisions. This could cause issues if
-a given guest is tied to a particular PSCI revision (unlikely), or if
-a migration causes a different PSCI version to be exposed out of the
-blue to an unsuspecting guest.
+This means that a guest booted on two different versions of KVM can observe
+two different "firmware" revisions. This could cause issues if a given guest
+is tied to a particular version of a hypercall service, or if a migration
+causes a different version to be exposed out of the blue to an unsuspecting
+guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value if required.
-The following register is defined:
+The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
+ KVM implements the PSCI (Power State Coordination Interface)
+ specification in order to provide services such as CPU on/off, reset
+ and power-off to the guest.
+
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
@@ -74,4 +74,55 @@ The following register is defined:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
-.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
+Contrary to the above registers, the following registers expose the hypercall
+services in the form of a feature-bitmap. This bitmap is translated into the
+services that are exposed to the guest. A register is defined per service call
+owner and can be accessed via the GET/SET_ONE_REG interface.
+
+A new KVM capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is introduced to let
+user-space know of this extension. A simple 'read' of the capability returns
+the number of bitmapped registers. The user-space is expected to make a
+note of this value and configure each of the registers.
+
+By default, these registers are set with the upper limit of the features that
+are supported. User-space can discover this configuration via GET_ONE_REG. If
+unsatisfied, the user-space can write back the desired bitmap via
+SET_ONE_REG.
+
+The pseudo-firmware bitmap registers are as follows:
+
+* KVM_REG_ARM_STD_BMAP:
+ Controls the bitmap of the ARM Standard Secure Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_STD_BIT_TRNG_V1_0:
+ The bit represents the services offered under v1.0 of the ARM True
+ Random Number Generator (TRNG) specification, ARM DEN0098.
+
+* KVM_REG_ARM_STD_HYP_BMAP:
+ Controls the bitmap of the ARM Standard Hypervisor Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
+ The bit represents the Paravirtualized Time service as specified in
+ ARM DEN0057A.
+
+* KVM_REG_ARM_VENDOR_HYP_BMAP:
+ Controls the bitmap of the Vendor-specific Hypervisor Service Calls.
+
+ The following bits are accepted:
+
+ KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
+ The bit represents the Precision Time Protocol KVM service.
+
+Errors:
+
+ ======= =============================================================
+ -ENOENT Unknown register accessed.
+ -EBUSY Attempted a 'write' to the register after the VM has started.
+ -EINVAL Invalid bitmap written to the register.
+ ======= =============================================================
+
+.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
\ No newline at end of file
--
2.34.1.448.ga2b2bfdf31-goog
Since the doc now covers general hypercall details rather than
just PSCI, rename the file to a more appropriate name:
hypercalls.rst.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} (100%)
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/hypercalls.rst
similarity index 100%
rename from Documentation/virt/kvm/arm/psci.rst
rename to Documentation/virt/kvm/arm/hypercalls.rst
--
2.34.1.448.ga2b2bfdf31-goog
Import the standard SMCCC definitions from include/linux/arm-smccc.h.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/include/linux/arm-smccc.h | 188 ++++++++++++++++++++++++++++++++
1 file changed, 188 insertions(+)
create mode 100644 tools/include/linux/arm-smccc.h
diff --git a/tools/include/linux/arm-smccc.h b/tools/include/linux/arm-smccc.h
new file mode 100644
index 000000000000..a11c0bbabd5b
--- /dev/null
+++ b/tools/include/linux/arm-smccc.h
@@ -0,0 +1,188 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2015, Linaro Limited
+ */
+#ifndef __LINUX_ARM_SMCCC_H
+#define __LINUX_ARM_SMCCC_H
+
+#include <linux/const.h>
+
+/*
+ * This file provides common defines for ARM SMC Calling Convention as
+ * specified in
+ * https://developer.arm.com/docs/den0028/latest
+ *
+ * This code is up-to-date with version DEN 0028 C
+ */
+
+#define ARM_SMCCC_STD_CALL _AC(0,U)
+#define ARM_SMCCC_FAST_CALL _AC(1,U)
+#define ARM_SMCCC_TYPE_SHIFT 31
+
+#define ARM_SMCCC_SMC_32 0
+#define ARM_SMCCC_SMC_64 1
+#define ARM_SMCCC_CALL_CONV_SHIFT 30
+
+#define ARM_SMCCC_OWNER_MASK 0x3F
+#define ARM_SMCCC_OWNER_SHIFT 24
+
+#define ARM_SMCCC_FUNC_MASK 0xFFFF
+
+#define ARM_SMCCC_IS_FAST_CALL(smc_val) \
+ ((smc_val) & (ARM_SMCCC_FAST_CALL << ARM_SMCCC_TYPE_SHIFT))
+#define ARM_SMCCC_IS_64(smc_val) \
+ ((smc_val) & (ARM_SMCCC_SMC_64 << ARM_SMCCC_CALL_CONV_SHIFT))
+#define ARM_SMCCC_FUNC_NUM(smc_val) ((smc_val) & ARM_SMCCC_FUNC_MASK)
+#define ARM_SMCCC_OWNER_NUM(smc_val) \
+ (((smc_val) >> ARM_SMCCC_OWNER_SHIFT) & ARM_SMCCC_OWNER_MASK)
+
+#define ARM_SMCCC_CALL_VAL(type, calling_convention, owner, func_num) \
+ (((type) << ARM_SMCCC_TYPE_SHIFT) | \
+ ((calling_convention) << ARM_SMCCC_CALL_CONV_SHIFT) | \
+ (((owner) & ARM_SMCCC_OWNER_MASK) << ARM_SMCCC_OWNER_SHIFT) | \
+ ((func_num) & ARM_SMCCC_FUNC_MASK))
+
+#define ARM_SMCCC_OWNER_ARCH 0
+#define ARM_SMCCC_OWNER_CPU 1
+#define ARM_SMCCC_OWNER_SIP 2
+#define ARM_SMCCC_OWNER_OEM 3
+#define ARM_SMCCC_OWNER_STANDARD 4
+#define ARM_SMCCC_OWNER_STANDARD_HYP 5
+#define ARM_SMCCC_OWNER_VENDOR_HYP 6
+#define ARM_SMCCC_OWNER_TRUSTED_APP 48
+#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
+#define ARM_SMCCC_OWNER_TRUSTED_OS 50
+#define ARM_SMCCC_OWNER_TRUSTED_OS_END 63
+
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID 0xff01
+
+#define ARM_SMCCC_QUIRK_NONE 0
+#define ARM_SMCCC_QUIRK_QCOM_A6 1 /* Save/restore register a6 */
+
+#define ARM_SMCCC_VERSION_1_0 0x10000
+#define ARM_SMCCC_VERSION_1_1 0x10001
+#define ARM_SMCCC_VERSION_1_2 0x10002
+#define ARM_SMCCC_VERSION_1_3 0x10003
+
+#define ARM_SMCCC_1_3_SVE_HINT 0x10000
+
+#define ARM_SMCCC_VERSION_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0)
+
+#define ARM_SMCCC_ARCH_FEATURES_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 1)
+
+#define ARM_SMCCC_ARCH_SOC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 2)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_1 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0x8000)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ 0, 0x7fff)
+
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 0xb66fb428U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 0xe911c52eU
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 0x564bcaa9U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3 0x743a004dU
+
+/* KVM "vendor specific" services */
+#define ARM_SMCCC_KVM_FUNC_FEATURES 0
+#define ARM_SMCCC_KVM_FUNC_PTP 1
+#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
+#define ARM_SMCCC_KVM_NUM_FUNCS 128
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_FEATURES)
+
+#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED 1
+
+/*
+ * ptp_kvm is a feature used for time sync between vm and host.
+ * ptp_kvm module in guest kernel will get service from host using
+ * this hypercall ID.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_PTP)
+
+/* ptp_kvm counter type ID */
+#define KVM_PTP_VIRT_COUNTER 0
+#define KVM_PTP_PHYS_COUNTER 1
+
+/* Paravirtualised time calls (defined by ARM DEN0057A) */
+#define ARM_SMCCC_HV_PV_TIME_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD_HYP, \
+ 0x20)
+
+#define ARM_SMCCC_HV_PV_TIME_ST \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD_HYP, \
+ 0x21)
+
+/* TRNG entropy source calls (defined by ARM DEN0098) */
+#define ARM_SMCCC_TRNG_VERSION \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x50)
+
+#define ARM_SMCCC_TRNG_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x51)
+
+#define ARM_SMCCC_TRNG_GET_UUID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x52)
+
+#define ARM_SMCCC_TRNG_RND32 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x53)
+
+#define ARM_SMCCC_TRNG_RND64 \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0x53)
+
+/*
+ * Return codes defined in ARM DEN 0070A
+ * ARM DEN 0070A is now merged/consolidated into ARM DEN 0028 C
+ */
+#define SMCCC_RET_SUCCESS 0
+#define SMCCC_RET_NOT_SUPPORTED -1
+#define SMCCC_RET_NOT_REQUIRED -2
+#define SMCCC_RET_INVALID_PARAMETER -3
+
+#endif /*__LINUX_ARM_SMCCC_H*/
--
2.34.1.448.ga2b2bfdf31-goog
Introduce a KVM selftest to check the hypercall interface
for arm64 platforms. The test validates the user-space
IOCTL interface to read/write the pseudo-firmware registers,
as well as its effects on the guest upon certain configurations.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/hypercalls.c | 358 ++++++++++++++++++
3 files changed, 360 insertions(+)
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 99ed78ddd63f..bdecb27411ad 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -2,6 +2,7 @@
/aarch64/arch_timer
/aarch64/debug-exceptions
/aarch64/get-reg-list
+/aarch64/hypercalls
/aarch64/psci_test
/aarch64/vgic_init
/s390x/memop
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 5be2690168a8..ea96a92493d5 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
+TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
TEST_GEN_PROGS_aarch64 += aarch64/psci_test
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
TEST_GEN_PROGS_aarch64 += demand_paging_test
diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
new file mode 100644
index 000000000000..386f754ca99d
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <errno.h>
+#include <linux/arm-smccc.h>
+#include <asm/kvm.h>
+#include <kvm_util.h>
+
+#include "processor.h"
+
+#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK_ULL(max_feat_bit, 0))
+
+struct kvm_fw_reg_info {
+ uint64_t reg; /* Register definition */
+ uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
+};
+
+#define FW_REG_INFO(r, bit_max) \
+ { \
+ .reg = r, \
+ .max_feat_bit = bit_max, \
+ }
+
+static const struct kvm_fw_reg_info fw_reg_info[] = {
+ FW_REG_INFO(KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_BMAP_BIT_MAX),
+ FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP, KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX),
+ FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP, KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX),
+};
+
+enum test_stage {
+ TEST_STAGE_REG_IFACE,
+ TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
+ TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
+ TEST_STAGE_END,
+};
+
+static int stage;
+
+struct test_hvc_info {
+ uint32_t func_id;
+ int64_t arg0;
+
+ void (*test_hvc_disabled)(const struct test_hvc_info *hc_info,
+ struct arm_smccc_res *res);
+ void (*test_hvc_enabled)(const struct test_hvc_info *hc_info,
+ struct arm_smccc_res *res);
+};
+
+#define TEST_HVC_INFO(f, a0, test_disabled, test_enabled) \
+ { \
+ .func_id = f, \
+ .arg0 = a0, \
+ .test_hvc_disabled = test_disabled, \
+ .test_hvc_enabled = test_enabled, \
+ }
+
+static void
+test_ptp_feat_hvc_disabled(const struct test_hvc_info *hc_info, struct arm_smccc_res *res)
+{
+ GUEST_ASSERT_3((res->a0 & BIT(ARM_SMCCC_KVM_FUNC_PTP)) == 0,
+ res->a0, hc_info->func_id, hc_info->arg0);
+}
+
+static void
+test_ptp_feat_hvc_enabled(const struct test_hvc_info *hc_info, struct arm_smccc_res *res)
+{
+ GUEST_ASSERT_3((res->a0 & BIT(ARM_SMCCC_KVM_FUNC_PTP)) != 0,
+ res->a0, hc_info->func_id, hc_info->arg0);
+}
+
+static const struct test_hvc_info hvc_info[] = {
+ /* KVM_REG_ARM_STD_BMAP: KVM_REG_ARM_STD_BIT_TRNG_V1_0 */
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0, NULL, NULL),
+
+ /* KVM_REG_ARM_STD_HYP_BMAP: KVM_REG_ARM_STD_HYP_BIT_PV_TIME */
+ TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+ ARM_SMCCC_HV_PV_TIME_FEATURES, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES,
+ ARM_SMCCC_HV_PV_TIME_ST, NULL, NULL),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0, NULL, NULL),
+
+ /* KVM_REG_ARM_VENDOR_HYP_BMAP: KVM_REG_ARM_VENDOR_HYP_BIT_PTP */
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+ ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ test_ptp_feat_hvc_disabled, test_ptp_feat_hvc_enabled),
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ KVM_PTP_VIRT_COUNTER, NULL, NULL),
+};
+
+static void guest_test_hvc(int stage)
+{
+ unsigned int i;
+ struct arm_smccc_res res;
+
+ for (i = 0; i < ARRAY_SIZE(hvc_info); i++) {
+ const struct test_hvc_info *hc_info = &hvc_info[i];
+
+ memset(&res, 0, sizeof(res));
+ smccc_hvc(hc_info->func_id, hc_info->arg0, 0, 0, 0, 0, 0, 0, &res);
+
+ switch (stage) {
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ if (hc_info->test_hvc_disabled)
+ hc_info->test_hvc_disabled(hc_info, &res);
+ else
+ GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg0);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ if (hc_info->test_hvc_enabled)
+ hc_info->test_hvc_enabled(hc_info, &res);
+ else
+ GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg0);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+ }
+}
+
+static void guest_code(void)
+{
+ while (stage != TEST_STAGE_END) {
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ guest_test_hvc(stage);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+
+ GUEST_SYNC(stage);
+ }
+
+ GUEST_DONE();
+}
+
+static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
+{
+ struct kvm_one_reg reg = {
+ .id = KVM_REG_ARM_FW_REG(id),
+ .addr = (uint64_t)&val,
+ };
+
+	return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
+}
+
+static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
+{
+ struct kvm_one_reg reg = {
+ .id = KVM_REG_ARM_FW_REG(id),
+ .addr = (uint64_t)addr,
+ };
+
+	vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
+}
+
+struct st_time {
+ uint32_t rev;
+ uint32_t attr;
+ uint64_t st_time;
+};
+
+#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
+#define ST_GPA_BASE (1 << 30)
+
+static void steal_time_init(struct kvm_vm *vm)
+{
+ uint64_t st_ipa = (ulong)ST_GPA_BASE;
+ unsigned int gpages;
+ struct kvm_device_attr dev = {
+ .group = KVM_ARM_VCPU_PVTIME_CTRL,
+ .attr = KVM_ARM_VCPU_PVTIME_IPA,
+ .addr = (uint64_t)&st_ipa,
+ };
+
+ gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
+
+ vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
+}
+
+static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /* First read should be an upper limit of the features supported */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
+ "Expected all the features to be set for reg: 0x%lx; expected: 0x%llx; read: 0x%lx\n",
+ reg_info->reg, GENMASK_ULL(reg_info->max_feat_bit, 0), val);
+
+ /* Test 'write' by disabling all the features of the register map */
+ ret = set_fw_reg(vm, reg_info->reg, 0);
+ TEST_ASSERT(ret == 0,
+ "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ reg_info->reg, errno);
+
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
+
+ /*
+ * Test enabling a feature that's not supported.
+ * Avoid this check if all the bits are occupied.
+ */
+ if (reg_info->max_feat_bit < 63) {
+ ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
+ TEST_ASSERT(ret != 0 && errno == EINVAL,
+ "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+ }
+}
+
+static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /*
+ * Before starting the VM, the test clears all the bits.
+ * Check if that's still the case.
+ */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n",
+ reg_info->reg);
+
+ /*
+ * Test setting the last known value. KVM should allow this
+ * even if VM has started running.
+ */
+ ret = set_fw_reg(vm, reg_info->reg, val);
+ TEST_ASSERT(ret == 0,
+ "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ reg_info->reg, errno);
+
+ /*
+ * Set all the features for this register again. KVM shouldn't
+ * allow this as the VM is running.
+ */
+ ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
+ TEST_ASSERT(ret != 0 && errno == EBUSY,
+ "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+}
+
+static struct kvm_vm *test_vm_create(void)
+{
+ struct kvm_vm *vm;
+
+ vm = vm_create_default(0, 0, guest_code);
+
+ ucall_init(vm, NULL);
+ steal_time_init(vm);
+
+ return vm;
+}
+
+static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
+{
+ struct kvm_vm *ret_vm = vm;
+
+ pr_debug("Stage: %d\n", stage);
+
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ test_fw_regs_after_vm_start(vm);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ /* Start a new VM so that all the features are now enabled by default */
+ kvm_vm_free(vm);
+ ret_vm = test_vm_create();
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ break;
+ default:
+ TEST_FAIL("Unknown test stage: %d\n", stage);
+ }
+
+ stage++;
+ sync_global_to_guest(vm, stage);
+
+ return ret_vm;
+}
+
+static void test_run(void)
+{
+ struct kvm_vm *vm;
+ struct ucall uc;
+ bool guest_done = false;
+
+ vm = test_vm_create();
+
+ test_fw_regs_before_vm_start(vm);
+
+ while (!guest_done) {
+ vcpu_run(vm, 0);
+
+ switch (get_ucall(vm, 0, &uc)) {
+ case UCALL_SYNC:
+ vm = test_guest_stage(vm);
+ break;
+ case UCALL_DONE:
+ guest_done = true;
+ break;
+ case UCALL_ABORT:
+ TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, %lu; %lu, stage: %u",
+ (const char *)uc.args[0], __FILE__, uc.args[1],
+ uc.args[2], uc.args[3], uc.args[4], stage);
+ break;
+ default:
+ TEST_FAIL("Unexpected guest exit\n");
+ }
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ unsigned int num_fw_bmap_regs;
+
+ setbuf(stdout, NULL);
+
+ num_fw_bmap_regs = kvm_check_cap(KVM_CAP_ARM_HVC_FW_REG_BMAP);
+ if (!num_fw_bmap_regs) {
+ print_skip("ARM64 fw registers bitmap not supported\n");
+ exit(KSFT_SKIP);
+ } else if (num_fw_bmap_regs != ARRAY_SIZE(fw_reg_info)) {
+ print_skip("Mismatched fw registers between kernel (%u) and test (%lu)\n",
+ num_fw_bmap_regs, ARRAY_SIZE(fw_reg_info));
+ exit(KSFT_SKIP);
+ }
+
+ pr_info("Number of firmware bitmap registers discovered: %u\n", num_fw_bmap_regs);
+
+ test_run();
+ return 0;
+}
--
2.34.1.448.ga2b2bfdf31-goog
Add the pseudo-firmware registers KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP to
the base_regs[] list.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index cc898181faab..6321f4472fdf 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -686,6 +686,9 @@ static __u64 base_regs[] = {
KVM_REG_ARM_FW_REG(0),
KVM_REG_ARM_FW_REG(1),
KVM_REG_ARM_FW_REG(2),
+ KVM_REG_ARM_FW_REG(3), /* KVM_REG_ARM_STD_BMAP */
+ KVM_REG_ARM_FW_REG(4), /* KVM_REG_ARM_STD_HYP_BMAP */
+ KVM_REG_ARM_FW_REG(5), /* KVM_REG_ARM_VENDOR_HYP_BMAP */
ARM64_SYS_REG(3, 3, 14, 3, 1), /* CNTV_CTL_EL0 */
ARM64_SYS_REG(3, 3, 14, 3, 2), /* CNTV_CVAL_EL0 */
ARM64_SYS_REG(3, 3, 14, 0, 2),
--
2.34.1.448.ga2b2bfdf31-goog
Hi Raghu,
On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> Capture the start of the KVM VM, which is basically the
> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> include/linux/kvm_host.h | 3 +++
> virt/kvm/kvm_main.c | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
> struct notifier_block pm_notifier;
> #endif
> char stats_id[KVM_STATS_NAME_SIZE];
> + bool vm_started;
Since KVM_RUN on any vCPU doesn't necessarily mean that the VM
has started yet, the name might be a bit misleading IMHO. I would
think 'has_run_once' or 'ran_once' might be clearer (?).
> };
>
> #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> }
> }
>
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> +
> extern bool kvm_rebooting;
>
> extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> int r;
> struct kvm_fpu *fpu = NULL;
> struct kvm_sregs *kvm_sregs = NULL;
> + struct kvm *kvm = vcpu->kvm;
>
> if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> if (oldpid)
> synchronize_rcu();
> put_pid(oldpid);
> +
> + /*
> + * Since we land here even on the first vCPU run,
> + * we can mark that the VM has started running.
> + */
It might be nicer to add a comment why the code below gets kvm->lock.
Anyway, the patch generally looks good to me, and thank you
for making this change (it works for my purpose as well).
Reviewed-by: Reiji Watanabe <[email protected]>
Thanks,
Reiji
> + mutex_lock(&kvm->lock);
> + kvm->vm_started = true;
> + mutex_unlock(&kvm->lock);
> }
> r = kvm_arch_vcpu_ioctl_run(vcpu);
> trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
Hi Reiji,
On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
>
> Hi Raghu,
>
> On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > Capture the start of the KVM VM, which is basically the
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > include/linux/kvm_host.h | 3 +++
> > virt/kvm/kvm_main.c | 9 +++++++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c310648cc8f1..d0bd8f7a026c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -623,6 +623,7 @@ struct kvm {
> > struct notifier_block pm_notifier;
> > #endif
> > char stats_id[KVM_STATS_NAME_SIZE];
> > + bool vm_started;
>
> Since KVM_RUN on any vCPU doesn't necessarily mean that the VM
> has started yet, the name might be a bit misleading IMHO. I would
> think 'has_run_once' or 'ran_once' might be clearer (?).
>
I always struggle with the names; but if you feel that 'ran_once'
makes more sense for a reader, I can change it.
>
> > };
> >
> > #define kvm_err(fmt, ...) \
> > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > }
> > }
> >
> > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > +
> > extern bool kvm_rebooting;
> >
> > extern unsigned int halt_poll_ns;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 72c4e6b39389..962b91ac2064 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > int r;
> > struct kvm_fpu *fpu = NULL;
> > struct kvm_sregs *kvm_sregs = NULL;
> > + struct kvm *kvm = vcpu->kvm;
> >
> > if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > return -EIO;
> > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > if (oldpid)
> > synchronize_rcu();
> > put_pid(oldpid);
> > +
> > + /*
> > + * Since we land here even on the first vCPU run,
> > + * we can mark that the VM has started running.
> > + */
>
> It might be nicer to add a comment why the code below gets kvm->lock.
>
I've been going back and forth on this one. Initially I considered
simply going with atomic_t, but kvm_arm_set_fw_reg_bmap() in patch
4/11 (KVM: arm64: Setup a framework for hypercall bitmap firmware
registers) felt like it needed a lock to make the whole 'is the
register busy?' check-and-update atomic. But that's just one of
the applications.
> Anyway, the patch generally looks good to me, and thank you
> for making this change (it works for my purpose as well).
>
> Reviewed-by: Reiji Watanabe <[email protected]>
>
Glad that it's helping you as well and thanks for the review.
Regards,
Raghavendra
> Thanks,
> Reiji
>
>
> > + mutex_lock(&kvm->lock);
> > + kvm->vm_started = true;
> > + mutex_unlock(&kvm->lock);
> > }
> > r = kvm_arch_vcpu_ioctl_run(vcpu);
> > trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > Capture the start of the KVM VM, which is basically the
> > > start of any vCPU run. This state of the VM is helpful
> > > in the upcoming patches to prevent user-space from
> > > configuring certain VM features after the VM has started
> > > running.
What about live migration, where the VM has already technically been
started before the first call to KVM_RUN?
On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> Capture the start of the KVM VM, which is basically the
Please wrap at ~75 chars.
> start of any vCPU run. This state of the VM is helpful
> in the upcoming patches to prevent user-space from
> configuring certain VM features after the VM has started
> running.
Please provide context of how the flag will be used. I glanced at the future
patches, and knowing very little about arm, I was unable to glean useful info
about exactly who is being prevented from doing what.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> include/linux/kvm_host.h | 3 +++
> virt/kvm/kvm_main.c | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1..d0bd8f7a026c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -623,6 +623,7 @@ struct kvm {
> struct notifier_block pm_notifier;
> #endif
> char stats_id[KVM_STATS_NAME_SIZE];
> + bool vm_started;
> };
>
> #define kvm_err(fmt, ...) \
> @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> }
> }
>
> +#define kvm_vm_has_started(kvm) (kvm->vm_started)
Needs parentheses around (kvm), but why bother with a macro? This is the same
header that defines struct kvm.
> +
> extern bool kvm_rebooting;
>
> extern unsigned int halt_poll_ns;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72c4e6b39389..962b91ac2064 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> int r;
> struct kvm_fpu *fpu = NULL;
> struct kvm_sregs *kvm_sregs = NULL;
> + struct kvm *kvm = vcpu->kvm;
If you're going to bother grabbing kvm, replace the instances below that also do
vcpu->kvm.
>
> if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> return -EIO;
> @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> if (oldpid)
> synchronize_rcu();
> put_pid(oldpid);
> +
> + /*
> + * Since we land here even on the first vCPU run,
> + * we can mark that the VM has started running.
Please avoid "we", "us", etc..
"vm_started" is also ambiguous. If we end up with a flag, then I would prefer a
much more literal name, a la created_vcpus, e.g. ran_vcpus or something.
> + */
> + mutex_lock(&kvm->lock);
This adds unnecessary lock contention when running vCPUs. The naive solution
would be:
if (!kvm->vm_started) {
...
}
> + kvm->vm_started = true;
> + mutex_unlock(&kvm->lock);
Lastly, why is this in generic KVM?
> }
> r = kvm_arch_vcpu_ioctl_run(vcpu);
> trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
> _______________________________________________
> kvmarm mailing list
> [email protected]
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Hi Raghu,
On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> Introduce the KVM ARM64 capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
> to indicate support for the pseudo-firmware bitmap extension.
> Each of these registers holds a feature-set exposed to the guest
> in the form of a bitmap. If supported, a simple 'read' of the
> capability should return the number of pseudo-firmware registers
> supported. User-space can utilize this to discover the registers.
> It can further explore or modify the features using the classical
> GET/SET_ONE_REG interface.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 2 files changed, 22 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..646176537f2c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6925,6 +6925,27 @@ indicated by the fd to the VM this is called on.
> This is intended to support intra-host migration of VMs between userspace VMMs,
> upgrading the VMM process without interrupting the guest.
>
> +7.30 KVM_CAP_ARM_HVC_FW_REG_BMAP
IMHO, instead of including its format of the register in the name,
including its purpose/function in the name might be better.
e.g. KVM_CAP_ARM_HVC_FEATURE_REG ?
(Feature fields don't necessarily have to be in a bitmap format
if they don't fit well although I'm not sure if we have such fields.)
> +--------------------------------
> +
> +:Architectures: arm64
> +:Parameters: None
> +:Returns: Number of pseudo-firmware registers supported
Looking at patch-4, the return value of this would be the number of
pseudo-firmware *bitmap* registers supported.
BTW, "4.68 KVM_SET_ONE_REG" in the doc uses the word "arm64 firmware
pseudo-registers". It would be nicer to use the same term.
> +
> +This capability indicates that KVM for arm64 supports the pseudo-firmware
> +register bitmap extension. Each of these registers represents the features
> +supported by a particular type in the form of a bitmap. By default, these
> +registers are set with the upper limit of the features that are supported.
> +
> +The registers can be accessed via the standard KVM_SET_ONE_REG and KVM_GET_ONE_REG
> +interfaces. The user-space is expected to read the number of these registers
> +available by reading KVM_CAP_ARM_HVC_FW_REG_BMAP, read the current bitmap
> +configuration via GET_ONE_REG for each register, and then write back the
> +desired bitmap of features that it wishes the guest to see via SET_ONE_REG.
> +
> +Note that KVM doesn't allow the user-space to modify these registers after
> +the VM (any of the vCPUs) has started running.
Even if KVM_RUN fails, and the VM hasn't actually started yet, the
registers become immutable. So, "after KVM_RUN is called on any of
the vCPUs" might be more clear ?
Thanks,
Reiji
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 1daa45268de2..209b43dbbc3c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> #define KVM_CAP_ARM_MTE 205
> #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> +#define KVM_CAP_ARM_HVC_FW_REG_BMAP 207
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
Hi Raghu,
On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> KVM regularly introduces new hypercall services to the guests without
> any consent from the Virtual Machine Manager (VMM). This means, the
> guests can observe hypercall services in and out as they migrate
> across various host kernel versions. This could be a major problem
> if the guest discovered a hypercall, started using it, and after
> getting migrated to an older kernel realizes that it's no longer
> available. Depending on how the guest handles the change, there's
> a potential chance that the guest would just panic.
>
> As a result, there's a need for the VMM to elect the services that
> it wishes the guest to discover. VMM can elect these services based
> on the kernels spread across its (migration) fleet. To remedy this,
> extend the existing firmware pseudo-registers, such as
> KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
>
> These firmware registers are categorized based on the service call
> owners, and unlike the existing firmware pseudo-registers, they hold
> the features supported in the form of a bitmap.
>
> The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
> this extension, which returns the number of pseudo-firmware
> registers supported. During VM initialization, the registers
> hold an upper limit of the features supported by the corresponding
> registers. It's expected that the VMMs discover the features
> provided by each register via GET_ONE_REG, and writeback the
> desired values using SET_ONE_REG. KVM allows this modification
> only until the VM has started.
>
> Older VMMs can simply ignore the capability and the hypercall services
> will be exposed unconditionally to the guests, thus ensuring backward
> compatibility.
>
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++
> arch/arm64/include/uapi/asm/kvm.h | 4 ++
> arch/arm64/kvm/arm.c | 4 ++
> arch/arm64/kvm/hypercalls.c | 103 +++++++++++++++++++++++++++++-
> arch/arm64/kvm/trng.c | 8 +--
> include/kvm/arm_hypercalls.h | 6 ++
> 6 files changed, 129 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2a5f7f38006f..a32cded0371b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -102,6 +102,15 @@ struct kvm_s2_mmu {
> struct kvm_arch_memory_slot {
> };
>
> +/**
> + * struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
> + *
> + * @hvc_std_bmap: Bitmap of standard secure service calls
> + */
> +struct kvm_hvc_desc {
> + u64 hvc_std_bmap;
> +};
> +
> struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> @@ -137,6 +146,9 @@ struct kvm_arch {
>
> /* Memory Tagging Extension enabled for the guest */
> bool mte_enabled;
> +
> + /* Hypercall firmware registers' descriptor */
> + struct kvm_hvc_desc hvc_desc;
> };
>
> struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index b3edde68bc3e..0d6f29c58456 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
>
> +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> +
> /* SVE registers */
> #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e4727dc771bf..56fe81565235 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
>
> set_default_spectre(kvm);
> + kvm_arm_init_hypercalls(kvm);
>
> return ret;
> out_free_stage2_pgd:
> @@ -283,6 +284,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_ARM_PTRAUTH_GENERIC:
> r = system_has_full_ptr_auth();
> break;
> + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> + r = kvm_arm_num_fw_bmap_regs();
> + break;
Looking at the discussion for the v2 series,
https://lore.kernel.org/kvmarm/[email protected]/
I assume that the number of the pseudo-firmware bitmap registers
will be used to clear pseudo firmware bitmap registers that
userspace doesn't know.
I'm wondering how userspace can identify which pseudo-firmware
registers that KVM_GET_REG_LIST provides are the pseudo-firmware
bitmap registers that it doesn't know.
For instance, suppose pseudo-firmware registers that KVM_GET_REG_LIST
provides are KVM_REG_ARM_FW_REG(0) to KVM_REG_ARM_FW_REG(9), userspace
doesn't know KVM_REG_ARM_FW_REG(6) to KVM_REG_ARM_FW_REG(9), and
KVM_CAP_ARM_HVC_FW_REG_BMAP returns 5, how can userspace identify
remaining two bitmap registers from those 4 (fw-reg #6 to #9)
firmware registers ?
> default:
> r = 0;
> }
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 3c2fcf31ad3d..06243e4670eb 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,29 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> +{
> + return reg_bmap & feat_bit;
> +}
> +
> +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> +
> + switch (func_id) {
> + case ARM_SMCCC_TRNG_VERSION:
> + case ARM_SMCCC_TRNG_FEATURES:
> + case ARM_SMCCC_TRNG_GET_UUID:
> + case ARM_SMCCC_TRNG_RND32:
> + case ARM_SMCCC_TRNG_RND64:
> + return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
> + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> + default:
> + /* By default, allow the services that aren't listed here */
> + return true;
> + }
> +}
kvm_hvc_call_supported() could return true even for @func_id that
kvm_hvc_call_handler() returns -EINVAL for. Is this behavior what
you really want ?
If so, IMHO the function name might be a bit misleading.
"kvm_hvc_call_disabled" (and flip the return value)
might be closer to what it does(?).
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -65,6 +88,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> u32 feature;
> gpa_t gpa;
>
> + if (!kvm_hvc_call_supported(vcpu, func_id))
> + goto out;
> +
> switch (func_id) {
> case ARM_SMCCC_VERSION_FUNC_ID:
> val[0] = ARM_SMCCC_VERSION_1_1;
> @@ -143,6 +169,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> return kvm_psci_call(vcpu);
> }
>
> +out:
> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> return 1;
> }
> @@ -153,9 +180,25 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> };
>
> +static const u64 kvm_arm_fw_reg_bmap_ids[] = {
> + KVM_REG_ARM_STD_BMAP,
> +};
> +
> +void kvm_arm_init_hypercalls(struct kvm *kvm)
> +{
> + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> +
> + hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
> +}
> +
> +int kvm_arm_num_fw_bmap_regs(void)
> +{
> + return ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids);
> +}
> +
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> {
> - return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> + return ARRAY_SIZE(kvm_arm_fw_reg_ids) + kvm_arm_num_fw_bmap_regs();
> }
>
> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> @@ -167,6 +210,11 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> return -EFAULT;
> }
>
> + for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids); i++) {
> + if (put_user(kvm_arm_fw_reg_bmap_ids[i], uindices++))
> + return -EFAULT;
> + }
> +
> return 0;
> }
>
> @@ -211,9 +259,20 @@ static int get_kernel_wa_level(u64 regid)
> return -EINVAL;
> }
>
> +static void
> +kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
> +{
> + struct kvm *kvm = vcpu->kvm;
> +
> + mutex_lock(&kvm->lock);
> + *val = fw_reg_bmap;
> + mutex_unlock(&kvm->lock);
Why does it need to hold the lock ? (Wouldn't READ_ONCE be enough ?)
> +}
> +
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> void __user *uaddr = (void __user *)(long)reg->addr;
> + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> u64 val;
>
> switch (reg->id) {
> @@ -224,6 +283,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
> val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> break;
> + case KVM_REG_ARM_STD_BMAP:
> + kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
> + break;
> default:
> return -ENOENT;
> }
> @@ -234,6 +296,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return 0;
> }
>
> +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> +{
> + int ret = 0;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> + u64 *fw_reg_bmap, fw_reg_features;
> +
> + switch (reg_id) {
> + case KVM_REG_ARM_STD_BMAP:
> + fw_reg_bmap = &hvc_desc->hvc_std_bmap;
> + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> + break;
> + default:
> + return -ENOENT;
> + }
> +
> + /* Check for unsupported bit */
> + if (val & ~fw_reg_features)
> + return -EINVAL;
> +
> + mutex_lock(&kvm->lock);
> +
> + /*
> + * If the VM (any vCPU) has already started running, return success
> + * if there's no change in the value. Else, return -EBUSY.
> + */
> + if (kvm_vm_has_started(kvm)) {
> + ret = *fw_reg_bmap != val ? -EBUSY : 0;
> + goto out;
> + }
> +
> + *fw_reg_bmap = val;
> +out:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> void __user *uaddr = (void __user *)(long)reg->addr;
> @@ -310,6 +409,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return -EINVAL;
>
> return 0;
> + case KVM_REG_ARM_STD_BMAP:
> + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> default:
> return -ENOENT;
> }
> diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> index 99bdd7103c9c..23f912514b06 100644
> --- a/arch/arm64/kvm/trng.c
> +++ b/arch/arm64/kvm/trng.c
> @@ -60,14 +60,8 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
> val = ARM_SMCCC_TRNG_VERSION_1_0;
> break;
> case ARM_SMCCC_TRNG_FEATURES:
> - switch (smccc_get_arg1(vcpu)) {
> - case ARM_SMCCC_TRNG_VERSION:
> - case ARM_SMCCC_TRNG_FEATURES:
> - case ARM_SMCCC_TRNG_GET_UUID:
> - case ARM_SMCCC_TRNG_RND32:
> - case ARM_SMCCC_TRNG_RND64:
> + if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
> val = TRNG_SUCCESS;
kvm_hvc_call_supported() returns true for any values that are
not explicitly listed in kvm_hvc_call_supported() (i.e. it returns
true even for @func_id that are not any of ARM_SMCCC_TRNG_*).
So, I don't think it can simply use the current kvm_hvc_call_supported.
Thanks,
Reiji
> - }
> break;
> case ARM_SMCCC_TRNG_GET_UUID:
> smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> index 5d38628a8d04..8fe68d8d6d96 100644
> --- a/include/kvm/arm_hypercalls.h
> +++ b/include/kvm/arm_hypercalls.h
> @@ -6,6 +6,9 @@
>
> #include <asm/kvm_emulate.h>
>
> +#define ARM_SMCCC_STD_FEATURES \
> + GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>
> static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> @@ -42,9 +45,12 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
>
> struct kvm_one_reg;
>
> +void kvm_arm_init_hypercalls(struct kvm *kvm);
> +int kvm_arm_num_fw_bmap_regs(void);
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
>
> #endif
> --
> 2.34.1.448.ga2b2bfdf31-goog
>
On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <[email protected]> wrote:
>
> On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > Hi Reiji,
> >
> > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > Capture the start of the KVM VM, which is basically the
> > > > start of any vCPU run. This state of the VM is helpful
> > > > in the upcoming patches to prevent user-space from
> > > > configuring certain VM features after the VM has started
> > > > running.
>
> What about live migration, where the VM has already technically been
> started before the first call to KVM_RUN?
My understanding is that a new 'struct kvm' is created on the target
machine and this flag should be reset, which would allow the VMM to
restore the firmware registers. However, we would be running KVM_RUN
for the first time on the target machine, thus setting the flag.
So, you are right; it's more of a resume operation from the guest's
point of view. I guess the name of the variable is what's confusing
here.
Thanks,
Raghavendra
On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <[email protected]> wrote:
>
> On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > Capture the start of the KVM VM, which is basically the
>
> Please wrap at ~75 chars.
>
> > start of any vCPU run. This state of the VM is helpful
> > in the upcoming patches to prevent user-space from
> > configuring certain VM features after the VM has started
> > running.
>
> Please provide context of how the flag will be used. I glanced at the future
> patches, and knowing very little about arm, I was unable to glean useful info
> about exactly who is being prevented from doing what.
>
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > include/linux/kvm_host.h | 3 +++
> > virt/kvm/kvm_main.c | 9 +++++++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c310648cc8f1..d0bd8f7a026c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -623,6 +623,7 @@ struct kvm {
> > struct notifier_block pm_notifier;
> > #endif
> > char stats_id[KVM_STATS_NAME_SIZE];
> > + bool vm_started;
> > };
> >
> > #define kvm_err(fmt, ...) \
> > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > }
> > }
> >
> > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
>
> Needs parentheses around (kvm), but why bother with a macro? This is the same
> header that defines struct kvm.
>
No specific reason for creating a macro as such. I can remove it if it
feels noisy.
> > +
> > extern bool kvm_rebooting;
> >
> > extern unsigned int halt_poll_ns;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 72c4e6b39389..962b91ac2064 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > int r;
> > struct kvm_fpu *fpu = NULL;
> > struct kvm_sregs *kvm_sregs = NULL;
> > + struct kvm *kvm = vcpu->kvm;
>
> If you're going to bother grabbing kvm, replace the instances below that also do
> vcpu->kvm.
>
> >
> > if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > return -EIO;
> > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > if (oldpid)
> > synchronize_rcu();
> > put_pid(oldpid);
> > +
> > + /*
> > + * Since we land here even on the first vCPU run,
> > + * we can mark that the VM has started running.
>
> Please avoid "we", "us", etc..
>
> "vm_started" is also ambiguous. If we end up with a flag, then I would prefer a
> much more literal name, a la created_vcpus, e.g. ran_vcpus or something.
>
> > + */
> > + mutex_lock(&kvm->lock);
>
> This adds unnecessary lock contention when running vCPUs. The naive solution
> would be:
> if (!kvm->vm_started) {
> ...
> }
>
I'm not sure I understood the solution...
> > + kvm->vm_started = true;
> > + mutex_unlock(&kvm->lock);
>
> Lastly, why is this in generic KVM?
>
The v1 of the series originally had it in the arm-specific code.
However, it was suggested that I move it to the generic code, since the
bookkeeping is not arch specific and could be helpful to others too [1].
Thanks for the review. I'll address your other comments as well.
Regards,
Raghavendra
[1]: https://lore.kernel.org/kvmarm/[email protected]/
> > }
> > r = kvm_arch_vcpu_ioctl_run(vcpu);
> > trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > [email protected]
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
On Fri, Jan 7, 2022 at 9:40 PM Reiji Watanabe <[email protected]> wrote:
>
> Hi Raghu,
>
> On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > Introduce the KVM ARM64 capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
> > to indicate the support for psuedo-firmware bitmap extension.
> > Each of these registers holds a feature-set exposed to the guest
> > in the form of a bitmap. If supported, a simple 'read' of the
> > capability should return the number of psuedo-firmware registers
> > supported. User-space can utilize this to discover the registers.
> > It can further explore or modify the features using the classical
> > GET/SET_ONE_REG interface.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++
> > include/uapi/linux/kvm.h | 1 +
> > 2 files changed, 22 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index aeeb071c7688..646176537f2c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6925,6 +6925,27 @@ indicated by the fd to the VM this is called on.
> > This is intended to support intra-host migration of VMs between userspace VMMs,
> > upgrading the VMM process without interrupting the guest.
> >
> > +7.30 KVM_CAP_ARM_HVC_FW_REG_BMAP
>
> IMHO, instead of including its format of the register in the name,
> including its purpose/function in the name might be better.
> e.g. KVM_CAP_ARM_HVC_FEATURE_REG ?
> (Feature fields don't necessarily have to be in a bitmap format
> if they don't fit well although I'm not sure if we have such fields.)
>
Well, we do have registers, KVM_REG_ARM_PSCI_VERSION for instance,
that aren't covered by this CAP. But sure, I can explicitly add
'FEATURES' to the name. I also wanted to explicitly convey that we are
covering the *bitmapped* firmware registers here, but I'm not sure if
appending 'BMAP' might give the impression that the CAP itself is
bitmapped.
Do you think KVM_CAP_ARM_HVC_BMAP_FEAT_REG is better?
> > +
> > +:Architectures: arm64
> > +:Parameters: None
> > +:Returns: Number of psuedo-firmware registers supported
>
> Looking at patch-4, the return value of this would be the number of
> pseudo-firmware *bitmap* registers supported.
> BTW, "4.68 KVM_SET_ONE_REG" in the doc uses the word "arm64 firmware
> pseudo-registers". It would be nicer to use the same term.
>
Nice catch. I'll fix it here in api.rst.
> > +
> > +This capability indicates that KVM for arm64 supports the psuedo-firmware
> > +register bitmap extension. Each of these registers represent the features
> > +supported by a particular type in the form of a bitmap. By default, these
> > +registers are set with the upper limit of the features that are supported.
> > +
> > +The registers can be accessed via the standard SET_ONE_REG and KVM_GET_ONE_REG
> > +interfaces. The user-space is expected to read the number of these registers
> > +available by reading KVM_CAP_ARM_HVC_FW_REG_BMAP, read the current bitmap
> > +configuration via GET_ONE_REG for each register, and then write back the
> > +desired bitmap of features that it wishes the guest to see via SET_ONE_REG.
> > +
> > +Note that KVM doesn't allow the user-space to modify these registers after
> > +the VM (any of the vCPUs) has started running.
>
> Since even if KVM_RUN fails, and the VM hasn't started yet,
> it will get immutable. So, "after any of the vCPUs run KVM_RUN."
> might be more clear ?
>
Sure, that's probably more clear. I'll fix it.
Regards,
Raghavendra
> Thanks,
> Reiji
>
>
>
> > +
> > 8. Other capabilities.
> > ======================
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 1daa45268de2..209b43dbbc3c 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > #define KVM_CAP_ARM_MTE 205
> > #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > +#define KVM_CAP_ARM_HVC_FW_REG_BMAP 207
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <[email protected]> wrote:
> >
> > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > Hi Reiji,
> > >
> > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > > >
> > > > Hi Raghu,
> > > >
> > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > <[email protected]> wrote:
> > > > >
> > > > > Capture the start of the KVM VM, which is basically the
> > > > > start of any vCPU run. This state of the VM is helpful
> > > > > in the upcoming patches to prevent user-space from
> > > > > configuring certain VM features after the VM has started
> > > > > running.
> >
> > What about live migration, where the VM has already technically been
> > started before the first call to KVM_RUN?
>
> My understanding is that a new 'struct kvm' is created on the target
> machine and this flag should be reset, which would allow the VMM to
> restore the firmware registers. However, we would be running KVM_RUN
> for the first time on the target machine, thus setting the flag.
> So, you are right; It's more of a resume operation from the guest's
> point of view. I guess the name of the variable is what's confusing
> here.
I was actually thinking that live migration gives userspace an easy
way to circumvent your restriction. You said, "This state of the VM is
helpful in the upcoming patches to prevent user-space from configuring
certain VM features after the VM has started running." However, if you
don't ensure that these VM features are configured the same way on the
target machine as they were on the source machine, you have not
actually accomplished your stated goal.
> Thanks,
> Raghavendra
On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> Hi Reiji,
>
> On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > Capture the start of the KVM VM, which is basically the
> > > start of any vCPU run. This state of the VM is helpful
> > > in the upcoming patches to prevent user-space from
> > > configuring certain VM features after the VM has started
> > > running.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > ---
> > > include/linux/kvm_host.h | 3 +++
> > > virt/kvm/kvm_main.c | 9 +++++++++
> > > 2 files changed, 12 insertions(+)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index c310648cc8f1..d0bd8f7a026c 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -623,6 +623,7 @@ struct kvm {
> > > struct notifier_block pm_notifier;
> > > #endif
> > > char stats_id[KVM_STATS_NAME_SIZE];
> > > + bool vm_started;
> >
> > Since KVM_RUN on any vCPUs doesn't necessarily mean that the VM
> > started yet, the name might be a bit misleading IMHO. I would
> > think 'has_run_once' or 'ran_once' might be more clear (?).
> >
> I always struggle with the names; but if you feel that 'ran_once'
> makes more sense for a reader, I can change it.
I would prefer 'ran_once'.
> > > };
> > >
> > > #define kvm_err(fmt, ...) \
> > > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > > }
> > > }
> > >
> > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > > +
> > > extern bool kvm_rebooting;
> > >
> > > extern unsigned int halt_poll_ns;
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 72c4e6b39389..962b91ac2064 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > int r;
> > > struct kvm_fpu *fpu = NULL;
> > > struct kvm_sregs *kvm_sregs = NULL;
> > > + struct kvm *kvm = vcpu->kvm;
> > >
> > > if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > > return -EIO;
> > > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > if (oldpid)
> > > synchronize_rcu();
> > > put_pid(oldpid);
> > > +
> > > + /*
> > > + * Since we land here even on the first vCPU run,
> > > + * we can mark that the VM has started running.
> > > + */
> >
> > It might be nicer to add a comment why the code below gets kvm->lock.
> >
> I've been going back and forth on this one. Initially I considered
> simply going with atomic_t, but the patch 4/11 (KVM: arm64: Setup a
> framework for hypercall bitmap firmware registers)
> kvm_arm_set_fw_reg_bmap()'s implementation felt like we need a lock to
> have the whole 'is the register busy?' operation atomic. But, that's
> just one of the applications.
I understand why you need the code to get the lock here with the
current implementation.
But, since the code just sets the one field (vm_started) with the lock,
I thought the intention of getting the lock might not be so obvious.
(But it's maybe clear enough looking at the code in patch-4.)
Thanks,
Reiji
> > Anyway, the patch generally looks good to me, and thank you
> > for making this change (it works for my purpose as well).
> >
> > Reviewed-by: Reiji Watanabe <[email protected]>
> >
> Glad that it's helping you as well and thanks for the review.
>
> Regards,
> Raghavendra
>
> > Thanks,
> > Reiji
> >
> >
> > > + mutex_lock(&kvm->lock);
> > > + kvm->vm_started = true;
> > > + mutex_unlock(&kvm->lock);
> > > }
> > > r = kvm_arch_vcpu_ioctl_run(vcpu);
> > > trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > > --
> > > 2.34.1.448.ga2b2bfdf31-goog
> > >
On Sun, Jan 9, 2022 at 10:29 PM Reiji Watanabe <[email protected]> wrote:
>
> Hi Raghu,
>
> On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the Virtual Machine Manager (VMM). This means, the
> > guests can observe hypercall services in and out as they migrate
> > across various host kernel versions. This could be a major problem
> > if the guest discovered a hypercall, started using it, and after
> > getting migrated to an older kernel realizes that it's no longer
> > available. Depending on how the guest handles the change, there's
> > a potential chance that the guest would just panic.
> >
> > As a result, there's a need for the VMM to elect the services that
> > it wishes the guest to discover. VMM can elect these services based
> > on the kernels spread across its (migration) fleet. To remedy this,
> > extend the existing firmware psuedo-registers, such as
> > KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
> >
> > These firmware registers are categorized based on the service call
> > owners, and unlike the existing firmware psuedo-registers, they hold
> > the features supported in the form of a bitmap.
> >
> > The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
> > this extension, which returns the number of psuedo-firmware
> > registers supported. During the VM initialization, the registers
> > holds an upper-limit of the features supported by the corresponding
> > registers. It's expected that the VMMs discover the features
> > provided by each register via GET_ONE_REG, and writeback the
> > desired values using SET_ONE_REG. KVM allows this modification
> > only until the VM has started.
> >
> > Older VMMs can simply ignore the capability and the hypercall services
> > will be exposed unconditionally to the guests, thus ensuring backward
> > compatibility.
> >
> > In this patch, the framework adds the register only for ARM's standard
> > secure services (owner value 4). Currently, this includes support only
> > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > register representing mandatory features of v1.0. Other services are
> > momentarily added in the upcoming patches.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > arch/arm64/include/uapi/asm/kvm.h | 4 ++
> > arch/arm64/kvm/arm.c | 4 ++
> > arch/arm64/kvm/hypercalls.c | 103 +++++++++++++++++++++++++++++-
> > arch/arm64/kvm/trng.c | 8 +--
> > include/kvm/arm_hypercalls.h | 6 ++
> > 6 files changed, 129 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2a5f7f38006f..a32cded0371b 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -102,6 +102,15 @@ struct kvm_s2_mmu {
> > struct kvm_arch_memory_slot {
> > };
> >
> > +/**
> > + * struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
> > + *
> > + * @hvc_std_bmap: Bitmap of standard secure service calls
> > + */
> > +struct kvm_hvc_desc {
> > + u64 hvc_std_bmap;
> > +};
> > +
> > struct kvm_arch {
> > struct kvm_s2_mmu mmu;
> >
> > @@ -137,6 +146,9 @@ struct kvm_arch {
> >
> > /* Memory Tagging Extension enabled for the guest */
> > bool mte_enabled;
> > +
> > + /* Hypercall firmware register' descriptor */
> > + struct kvm_hvc_desc hvc_desc;
> > };
> >
> > struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index b3edde68bc3e..0d6f29c58456 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
> >
> > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> > +
> > /* SVE registers */
> > #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index e4727dc771bf..56fe81565235 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> >
> > set_default_spectre(kvm);
> > + kvm_arm_init_hypercalls(kvm);
> >
> > return ret;
> > out_free_stage2_pgd:
> > @@ -283,6 +284,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > r = system_has_full_ptr_auth();
> > break;
> > + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > + r = kvm_arm_num_fw_bmap_regs();
> > + break;
>
> Looking at the discussion for the v2 series,
>
> https://lore.kernel.org/kvmarm/[email protected]/
>
> I assume that the number of the pseudo-firmware bitmap registers
> will be used to clear pseudo firmware bitmap registers that
> userspace doesn't know.
> I'm wondering how userspace can identify which pseudo-firmware
> registers that KVM_GET_REG_LIST provides are the pseudo-firmware
> bitmap registers that it doesn't know.
> For instance, suppose pseudo-firmware registers that KVM_GET_REG_LIST
> provides are KVM_REG_ARM_FW_REG(0) to KVM_REG_ARM_FW_REG(9), userspace
> doesn't knows KVM_REG_ARM_FW_REG(6) to KVM_REG_ARM_FW_REG(9), and
> KVM_CAP_ARM_HVC_FW_REG_BMAP returns 5, how can userspace identify
> remaining two bitmap registers from those 4 (fw-reg #6 to #9)
> firmware registers ?
>
In v3, we leave the decision up to the userspace. If the userspace
encounters a register that it's unaware of, it can choose either to clear
it or to let it get exposed to the guest as is (see the code snippet
shared by Andrew in the link).
Trying to understand the question better: are you asking how the
userspace would distinguish between bitmap and regular fw registers with
intermixed sequence numbers?
If yes, do you foresee a reason why these unknown registers would need
to be treated differently?
>
> > default:
> > r = 0;
> > }
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 3c2fcf31ad3d..06243e4670eb 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -58,6 +58,29 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > val[3] = lower_32_bits(cycles);
> > }
> >
> > +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> > +{
> > + return reg_bmap & feat_bit;
> > +}
> > +
> > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > +
> > + switch (func_id) {
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND32:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
> > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > + default:
> > + /* By default, allow the services that aren't listed here */
> > + return true;
> > + }
> > +}
>
> kvm_hvc_call_supported() could return true even for @func_id that
> kvm_hvc_call_handler() returns -EINVAL for. Is this behavior what
> you really want ?
Yes. My idea was to let kvm_hvc_call_supported() check for the
support, while kvm_hvc_call_handler() does the real processing of the
call.
> If so, IMHO the function name might be a bit mis-leading.
> "kvm_hvc_call_disabled" (and flip the return value)
> might be closer to what it does(?).
>
Sorry, I'm unclear on how flipping helps. Wouldn't we return 'false'
if we don't have a case for the func_id, indicating that it's NOT
disabled, while kvm_hvc_call_handler() can still return
SMCCC_RET_NOT_SUPPORTED?
>
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > {
> > u32 func_id = smccc_get_function(vcpu);
> > @@ -65,6 +88,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > u32 feature;
> > gpa_t gpa;
> >
> > + if (!kvm_hvc_call_supported(vcpu, func_id))
> > + goto out;
> > +
> > switch (func_id) {
> > case ARM_SMCCC_VERSION_FUNC_ID:
> > val[0] = ARM_SMCCC_VERSION_1_1;
> > @@ -143,6 +169,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > return kvm_psci_call(vcpu);
> > }
> >
> > +out:
> > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > return 1;
> > }
> > @@ -153,9 +180,25 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > };
> >
> > +static const u64 kvm_arm_fw_reg_bmap_ids[] = {
> > + KVM_REG_ARM_STD_BMAP,
> > +};
> > +
> > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > +{
> > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > +
> > + hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
> > +}
> > +
> > +int kvm_arm_num_fw_bmap_regs(void)
> > +{
> > + return ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids);
> > +}
> > +
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > {
> > - return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > + return ARRAY_SIZE(kvm_arm_fw_reg_ids) + kvm_arm_num_fw_bmap_regs();
> > }
> >
> > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > @@ -167,6 +210,11 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > return -EFAULT;
> > }
> >
> > + for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids); i++) {
> > + if (put_user(kvm_arm_fw_reg_bmap_ids[i], uindices++))
> > + return -EFAULT;
> > + }
> > +
> > return 0;
> > }
> >
> > @@ -211,9 +259,20 @@ static int get_kernel_wa_level(u64 regid)
> > return -EINVAL;
> > }
> >
> > +static void
> > +kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
> > +{
> > + struct kvm *kvm = vcpu->kvm;
> > +
> > + mutex_lock(&kvm->lock);
> > + *val = fw_reg_bmap;
> > + mutex_unlock(&kvm->lock);
>
> Why does it need to hold the lock ? (Wouldn't READ_ONCE be enough ?)
>
I don't have much experience with READ_ONCE at this point, but do you
think this read can be protected against the read/write in
kvm_arm_set_fw_reg_bmap()?
>
> > +}
> > +
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > u64 val;
> >
> > switch (reg->id) {
> > @@ -224,6 +283,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
> > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > break;
> > + case KVM_REG_ARM_STD_BMAP:
> > + kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
> > + break;
> > default:
> > return -ENOENT;
> > }
> > @@ -234,6 +296,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return 0;
> > }
> >
> > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > +{
> > + int ret = 0;
> > + struct kvm *kvm = vcpu->kvm;
> > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > + u64 *fw_reg_bmap, fw_reg_features;
> > +
> > + switch (reg_id) {
> > + case KVM_REG_ARM_STD_BMAP:
> > + fw_reg_bmap = &hvc_desc->hvc_std_bmap;
> > + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> > + break;
> > + default:
> > + return -ENOENT;
> > + }
> > +
> > + /* Check for unsupported bit */
> > + if (val & ~fw_reg_features)
> > + return -EINVAL;
> > +
> > + mutex_lock(&kvm->lock);
> > +
> > + /*
> > + * If the VM (any vCPU) has already started running, return success
> > + * if there's no change in the value. Else, return -EBUSY.
> > + */
> > + if (kvm_vm_has_started(kvm)) {
> > + ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > + goto out;
> > + }
> > +
> > + *fw_reg_bmap = val;
> > +out:
> > + mutex_unlock(&kvm->lock);
> > + return ret;
> > +}
> > +
> > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > @@ -310,6 +409,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return -EINVAL;
> >
> > return 0;
> > + case KVM_REG_ARM_STD_BMAP:
> > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > default:
> > return -ENOENT;
> > }
> > diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> > index 99bdd7103c9c..23f912514b06 100644
> > --- a/arch/arm64/kvm/trng.c
> > +++ b/arch/arm64/kvm/trng.c
> > @@ -60,14 +60,8 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
> > val = ARM_SMCCC_TRNG_VERSION_1_0;
> > break;
> > case ARM_SMCCC_TRNG_FEATURES:
> > - switch (smccc_get_arg1(vcpu)) {
> > - case ARM_SMCCC_TRNG_VERSION:
> > - case ARM_SMCCC_TRNG_FEATURES:
> > - case ARM_SMCCC_TRNG_GET_UUID:
> > - case ARM_SMCCC_TRNG_RND32:
> > - case ARM_SMCCC_TRNG_RND64:
> > + if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
> > val = TRNG_SUCCESS;
>
> kvm_hvc_call_supported() returns true for any values that are
> not explicitly listed in kvm_hvc_call_supported() (i.e. it returns
> true even for @func_id that are not any of ARM_SMCCC_TRNG_*).
> So, I don't think it can simply use the current kvm_hvc_call_supported.
>
You are right. I should probably leave the case statements as is (or
think of some better way).
Thanks for the review and suggestions.
Regards,
Raghavendra
> Thanks,
> Reiji
>
> > - }
> > break;
> > case ARM_SMCCC_TRNG_GET_UUID:
> > smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
> > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > index 5d38628a8d04..8fe68d8d6d96 100644
> > --- a/include/kvm/arm_hypercalls.h
> > +++ b/include/kvm/arm_hypercalls.h
> > @@ -6,6 +6,9 @@
> >
> > #include <asm/kvm_emulate.h>
> >
> > +#define ARM_SMCCC_STD_FEATURES \
> > + GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> >
> > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > @@ -42,9 +45,12 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> >
> > struct kvm_one_reg;
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > +int kvm_arm_num_fw_bmap_regs(void);
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
> >
> > #endif
> > --
> > 2.34.1.448.ga2b2bfdf31-goog
> >
On Mon, Jan 10, 2022 at 3:40 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> On Fri, Jan 7, 2022 at 9:40 PM Reiji Watanabe <[email protected]> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > Introduce the KVM ARM64 capability, KVM_CAP_ARM_HVC_FW_REG_BMAP,
> > > to indicate the support for psuedo-firmware bitmap extension.
> > > Each of these registers holds a feature-set exposed to the guest
> > > in the form of a bitmap. If supported, a simple 'read' of the
> > > capability should return the number of psuedo-firmware registers
> > > supported. User-space can utilize this to discover the registers.
> > > It can further explore or modify the features using the classical
> > > GET/SET_ONE_REG interface.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > ---
> > > Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++
> > > include/uapi/linux/kvm.h | 1 +
> > > 2 files changed, 22 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index aeeb071c7688..646176537f2c 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -6925,6 +6925,27 @@ indicated by the fd to the VM this is called on.
> > > This is intended to support intra-host migration of VMs between userspace VMMs,
> > > upgrading the VMM process without interrupting the guest.
> > >
> > > +7.30 KVM_CAP_ARM_HVC_FW_REG_BMAP
> >
> > IMHO, instead of including its format of the register in the name,
> > including its purpose/function in the name might be better.
> > e.g. KVM_CAP_ARM_HVC_FEATURE_REG ?
> > (Feature fields don't necessarily have to be in a bitmap format
> > if they don't fit well although I'm not sure if we have such fields.)
> >
> Well we do have registers, KVM_REG_ARM_PSCI_VERSION for instance,
> that's not covered by this CAP. But sure, I can explicitly add
> 'FEATURES' to the name. I also wanted to explicitly convey that we are
> covering the *bitmapped* firmware registers here. But not sure if
> appending 'BMAP' might give an impression that the CAP itself is
> bitmapped.
> Do you think KVM_CAP_ARM_HVC_BMAP_FEAT_REG is better?
Thank you for the explanation! That sounds better to me.
Regards,
Reiji
> > > +
> > > +:Architectures: arm64
> > > +:Parameters: None
> > > +:Returns: Number of psuedo-firmware registers supported
> >
> > Looking at patch-4, the return value of this would be the number of
> > pseudo-firmware *bitmap* registers supported.
> > BTW, "4.68 KVM_SET_ONE_REG" in the doc uses the word "arm64 firmware
> > pseudo-registers". It would be nicer to use the same term.
> >
> Nice catch. I'll fix it here in apr.rst.
> > > +
> > > +This capability indicates that KVM for arm64 supports the psuedo-firmware
> > > +register bitmap extension. Each of these registers represent the features
> > > +supported by a particular type in the form of a bitmap. By default, these
> > > +registers are set with the upper limit of the features that are supported.
> > > +
> > > +The registers can be accessed via the standard SET_ONE_REG and KVM_GET_ONE_REG
> > > +interfaces. The user-space is expected to read the number of these registers
> > > +available by reading KVM_CAP_ARM_HVC_FW_REG_BMAP, read the current bitmap
> > > +configuration via GET_ONE_REG for each register, and then write back the
> > > +desired bitmap of features that it wishes the guest to see via SET_ONE_REG.
> > > +
> > > +Note that KVM doesn't allow the user-space to modify these registers after
> > > +the VM (any of the vCPUs) has started running.
> >
> > Since even if KVM_RUN fails, and the VM hasn't started yet,
> > it will get immutable. So, "after any of the vCPUs run KVM_RUN."
> > might be more clear ?
> >
> Sure, that's probably more clear. I'll fix it.
>
> Regards,
> Raghavendra
>
> > Thanks,
> > Reiji
> >
> >
> >
> > > +
> > > 8. Other capabilities.
> > > ======================
> > >
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 1daa45268de2..209b43dbbc3c 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > > #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > > #define KVM_CAP_ARM_MTE 205
> > > #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > > +#define KVM_CAP_ARM_HVC_FW_REG_BMAP 207
> > >
> > > #ifdef KVM_CAP_IRQ_ROUTING
> > >
> > > --
> > > 2.34.1.448.ga2b2bfdf31-goog
> > >
On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> >
> > Needs parantheses around (kvm), but why bother with a macro? This is the same
> > header that defines struct kvm.
> >
> No specific reason for creating a macro as such. I can remove it if it
> feels noisy.
Please do. In the future, don't use a macro unless there's a good reason to do
so. Don't get me wrong, I love abusing macros, but for things like this they are
completely inferior to
static inline bool kvm_vm_has_started(struct kvm *kvm)
{
return kvm->vm_started;
}
because a helper function gives us type safety, doesn't suffer from concatenation
of tokens potentially doing weird things, is easier to extend to a multi-line
implementation, etc...
An example of when it's ok to use a macro is x86's
#define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
which uses a macro instead of a proper function to avoid a circular dependency
due to arch/x86/include/asm/kvm_host.h being included by include/linux/kvm_host.h
and thus x86's implementation of kvm_arch_vcpu_memslots_id() coming before the
definition of struct kvm_vcpu. But that's very much an exception and done only
because the alternatives suck more.
> > > + */
> > > + mutex_lock(&kvm->lock);
> >
> > This adds unnecessary lock contention when running vCPUs. The naive solution
> > would be:
> > if (!kvm->vm_started) {
> > ...
> > }
> >
> Not sure if I understood the solution..
In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
unnecessary contention as it will serialize this bit of code if multiple vCPUs
are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
There's still a possibility that multiple vCPUs will contend for kvm->lock on their
first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
a more elegant solution depending on the use case, e.g. a lockless approach might
work (or it might not).
> > > + kvm->vm_started = true;
> > > + mutex_unlock(&kvm->lock);
> >
> > Lastly, why is this in generic KVM?
> >
> The v1 of the series originally had it in the arm specific code.
> However, it was suggested that I move it to the generic code since the
> bookkeeping is not arch specific and could be helpful to others too [1].
I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
but I'm skeptical in this particular case. The code _is_ arch specific in that
arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
versus a hypothetical x86 use case that might be completely ok with a lockless
implementation. And it's not obvious that there's a plausible, safe use case
outside of arm64, e.g. on x86, there is very, very little that is truly shared
across the entire VM/system, most things are per-thread/core/package in some way,
shape, or form. In other words, I'm wary of providing something like this for
x86 because odds are good that any use will be functionally incorrect.
On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <[email protected]> wrote:
>
> On Mon, Jan 10, 2022, Raghavendra Rao Ananta wrote:
> > On Fri, Jan 7, 2022 at 5:06 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > On Tue, Jan 04, 2022, Raghavendra Rao Ananta wrote:
> > > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > >
> > > Needs parentheses around (kvm), but why bother with a macro? This is the same
> > > header that defines struct kvm.
> > >
> > No specific reason for creating a macro as such. I can remove it if it
> > feels noisy.
>
> Please do. In the future, don't use a macro unless there's a good reason to do
> so. Don't get me wrong, I love abusing macros, but for things like this they are
> completely inferior to
>
> static inline bool kvm_vm_has_started(struct kvm *kvm)
> {
> return kvm->vm_started;
> }
>
> because a helper function gives us type safety, doesn't suffer from concatenation
> of tokens potentially doing weird things, is easier to extend to a multi-line
> implementation, etc...
>
> An example of when it's ok to use a macro is x86's
>
> #define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
>
> which uses a macro instead of a proper function to avoid a circular dependency
> due to arch/x86/include/asm/kvm_host.h being included by include/linux/kvm_host.h
> and thus x86's implementation of kvm_arch_vcpu_memslots_id() coming before the
> definition of struct kvm_vcpu. But that's very much an exception and done only
> because the alternatives suck more.
>
Understood. Thanks for the explanation! Will switch to an inline function.
> > > > + */
> > > > + mutex_lock(&kvm->lock);
> > >
> > > This adds unnecessary lock contention when running vCPUs. The naive solution
> > > would be:
> > > if (!kvm->vm_started) {
> > > ...
> > > }
> > >
> > Not sure if I understood the solution..
>
> In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
> unnecessary contention as it will serialize this bit of code if multiple vCPUs
> are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
> VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
> a more elegant solution depending on the use case, e.g. a lockless approach might
> work (or it might not).
>
But is it safe to read kvm->vm_started without grabbing the lock in
the first place? use atomic_t maybe for this?
> > > > + kvm->vm_started = true;
> > > > + mutex_unlock(&kvm->lock);
> > >
> > > Lastly, why is this in generic KVM?
> > >
> > The v1 of the series originally had it in the arm specific code.
> > However, it was suggested that I move it to the generic code since the
> > bookkeeping is not arch specific and could be helpful to others too [1].
>
> I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
> but I'm skeptical in this particular case. The code _is_ arch specific in that
> arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> versus a hypothetical x86 use case that might be completely ok with a lockless
> implementation. And it's not obvious that there's a plausible, safe use case
> outside of arm64, e.g. on x86, there is very, very little that is truly shared
> across the entire VM/system, most things are per-thread/core/package in some way,
> shape, or form. In other words, I'm wary of providing something like this for
> x86 because odds are good that any use will be functionally incorrect.
I've been going back and forth on this. I've seen a couple of
variables declared in the generic struct and used only in the arch
code. vcpu->valid_wakeup for instance, which is used only by s390
arch. Maybe I'm looking at it the wrong way as to what can and can't
go in the generic kvm code.
Thanks,
Raghavendra
On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <[email protected]> wrote:
>
> On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <[email protected]> wrote:
> > >
> > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > Hi Reiji,
> > > >
> > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > > > >
> > > > > Hi Raghu,
> > > > >
> > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > in the upcoming patches to prevent user-space from
> > > > > > configuring certain VM features after the VM has started
> > > > > > running.
> > >
> > > What about live migration, where the VM has already technically been
> > > started before the first call to KVM_RUN?
> >
> > My understanding is that a new 'struct kvm' is created on the target
> > machine and this flag should be reset, which would allow the VMM to
> > restore the firmware registers. However, we would be running KVM_RUN
> > for the first time on the target machine, thus setting the flag.
> > So, you are right; It's more of a resume operation from the guest's
> > point of view. I guess the name of the variable is what's confusing
> > here.
>
> I was actually thinking that live migration gives userspace an easy
> way to circumvent your restriction. You said, "This state of the VM is
> helpful in the upcoming patches to prevent user-space from configuring
> certain VM features after the VM has started running." However, if you
> don't ensure that these VM features are configured the same way on the
> target machine as they were on the source machine, you have not
> actually accomplished your stated goal.
>
Isn't that up to the VMM to save/restore and validate the registers
across migrations?
Perhaps I have to re-word my intentions for the patch- userspace
should be able to configure the registers before issuing the first
KVM_RUN.
Thanks,
Raghavendra
> > Thanks,
> > Raghavendra
On Mon, Jan 10, 2022 at 4:04 PM Reiji Watanabe <[email protected]> wrote:
>
> On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > Hi Reiji,
> >
> > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > Capture the start of the KVM VM, which is basically the
> > > > start of any vCPU run. This state of the VM is helpful
> > > > in the upcoming patches to prevent user-space from
> > > > configuring certain VM features after the VM has started
> > > > running.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > > ---
> > > > include/linux/kvm_host.h | 3 +++
> > > > virt/kvm/kvm_main.c | 9 +++++++++
> > > > 2 files changed, 12 insertions(+)
> > > >
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index c310648cc8f1..d0bd8f7a026c 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -623,6 +623,7 @@ struct kvm {
> > > > struct notifier_block pm_notifier;
> > > > #endif
> > > > char stats_id[KVM_STATS_NAME_SIZE];
> > > > + bool vm_started;
> > >
> > > Since KVM_RUN on any vCPUs doesn't necessarily mean that the VM
> > > started yet, the name might be a bit misleading IMHO. I would
> > > think 'has_run_once' or 'ran_once' might be more clear (?).
> > >
> > I always struggle with the names; but if you feel that 'ran_once'
> > makes more sense for a reader, I can change it.
>
> I would prefer 'ran_once'.
>
Yes, that makes sense. I think the name created a lot of confusion for
the patch.
Thanks,
Raghavendra
>
> > > > };
> > > >
> > > > #define kvm_err(fmt, ...) \
> > > > @@ -1666,6 +1667,8 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> > > > }
> > > > }
> > > >
> > > > +#define kvm_vm_has_started(kvm) (kvm->vm_started)
> > > > +
> > > > extern bool kvm_rebooting;
> > > >
> > > > extern unsigned int halt_poll_ns;
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 72c4e6b39389..962b91ac2064 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -3686,6 +3686,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > > int r;
> > > > struct kvm_fpu *fpu = NULL;
> > > > struct kvm_sregs *kvm_sregs = NULL;
> > > > + struct kvm *kvm = vcpu->kvm;
> > > >
> > > > if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
> > > > return -EIO;
> > > > @@ -3723,6 +3724,14 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > > > if (oldpid)
> > > > synchronize_rcu();
> > > > put_pid(oldpid);
> > > > +
> > > > + /*
> > > > + * Since we land here even on the first vCPU run,
> > > > + * we can mark that the VM has started running.
> > > > + */
> > >
> > > It might be nicer to add a comment why the code below gets kvm->lock.
> > >
> > I've been going back and forth on this one. Initially I considered
> > simply going with atomic_t, but the patch 4/11 (KVM: arm64: Setup a
> > framework for hypercall bitmap firmware registers)
> > kvm_arm_set_fw_reg_bmap()'s implementation felt like we need a lock to
> > have the whole 'is the register busy?' operation atomic. But, that's
> > just one of the applications.
>
> I understand why you need the code to get the lock here with the
> current implementation.
> But, since the code just set the one field (vm_started) with the lock,
> I thought the intention of getting the lock might not be so obvious.
> (But, maybe clear enough looking at the code in the patch-4)
>
> Thanks,
> Reiji
>
>
> > > Anyway, the patch generally looks good to me, and thank you
> > > for making this change (it works for my purpose as well).
> > >
> > > Reviewed-by: Reiji Watanabe <[email protected]>
> > >
> > Glad that it's helping you as well and thanks for the review.
> >
> > Regards,
> > Raghavendra
> >
> > > Thanks,
> > > Reiji
> > >
> > >
> > > > + mutex_lock(&kvm->lock);
> > > > + kvm->vm_started = true;
> > > > + mutex_unlock(&kvm->lock);
> > > > }
> > > > r = kvm_arch_vcpu_ioctl_run(vcpu);
> > > > trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> > > > --
> > > > 2.34.1.448.ga2b2bfdf31-goog
> > > >
On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <[email protected]> wrote:
> > In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
> > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
> > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
> > a more elegant solution depending on the use case, e.g. a lockless approach might
> > work (or it might not).
> >
> But is it safe to read kvm->vm_started without grabbing the lock in
> the first place?
Don't know, but that's my point. Without a consumer in generic KVM and due to
my lack of arm64 knowledge, without a high-level description of how the flag will
be used by arm64, it's really difficult to determine what's safe and what's not.
For other architectures, it's an impossible question to answer because we don't
know how the flag might be used.
> use atomic_t maybe for this?
No. An atomic_t is generally useful only if there are multiple writers that can
possibly write different values. It's highly unlikely that simply switching to an
atomic addresses the needs of arm64.
> > > > > + kvm->vm_started = true;
> > > > > + mutex_unlock(&kvm->lock);
> > > >
> > > > Lastly, why is this in generic KVM?
> > > >
> > > The v1 of the series originally had it in the arm specific code.
> > However, it was suggested that I move it to the generic code since the
> > bookkeeping is not arch specific and could be helpful to others too [1].
> >
> > I'm definitely in favor of moving/adding things to generic KVM when it makes sense,
> > but I'm skeptical in this particular case. The code _is_ arch specific in that
> > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > versus a hypothetical x86 use case that might be completely ok with a lockless
> > implementation. And it's not obvious that there's a plausible, safe use case
> > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > across the entire VM/system, most things are per-thread/core/package in some way,
> > shape, or form. In other words, I'm wary of providing something like this for
> > x86 because odds are good that any use will be functionally incorrect.
> I've been going back and forth on this. I've seen a couple of
> variables declared in the generic struct and used only in the arch
> code. vcpu->valid_wakeup for instance, which is used only by s390
> arch. Maybe I'm looking at it the wrong way as to what can and can't
> go in the generic kvm code.
Ya, valid_wakeup is an oddball, I don't know why it's in kvm_vcpu instead of
arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().
That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
for how it is used, so it's purely a "this code is rather odd" issue. vm_started
on the other hand is only produced by generic KVM, and so its required semantics are
unclear.
On Tue, Jan 11, 2022 at 10:52 AM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <[email protected]> wrote:
> >
> > On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <[email protected]> wrote:
> > > >
> > > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Reiji,
> > > > >
> > > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > > > > >
> > > > > > Hi Raghu,
> > > > > >
> > > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > > in the upcoming patches to prevent user-space from
> > > > > > > configuring certain VM features after the VM has started
> > > > > > > running.
> > > >
> > > > What about live migration, where the VM has already technically been
> > > > started before the first call to KVM_RUN?
> > >
> > > My understanding is that a new 'struct kvm' is created on the target
> > > machine and this flag should be reset, which would allow the VMM to
> > > restore the firmware registers. However, we would be running KVM_RUN
> > > for the first time on the target machine, thus setting the flag.
> > > So, you are right; It's more of a resume operation from the guest's
> > > point of view. I guess the name of the variable is what's confusing
> > > here.
> >
> > I was actually thinking that live migration gives userspace an easy
> > way to circumvent your restriction. You said, "This state of the VM is
> > helpful in the upcoming patches to prevent user-space from configuring
> > certain VM features after the VM has started running." However, if you
> > don't ensure that these VM features are configured the same way on the
> > target machine as they were on the source machine, you have not
> > actually accomplished your stated goal.
> >
> Isn't that up to the VMM to save/restore and validate the registers
> across migrations?
Yes, just as it is up to userspace not to make bad configuration
changes after the first VMRUN.
> Perhaps I have to re-word my intentions for the patch- userspace
> should be able to configure the registers before issuing the first
> KVM_RUN.
Perhaps it would help if you explained *why* you are doing this. It
sounds like you are either trying to protect against a malicious
userspace, or you are trying to keep userspace from doing something
stupid. In general, kvm only enforces constraints that are necessary
to protect the host. If that's what you're doing, I don't understand
why live migration doesn't provide an end-run around your protections.
On Mon, Jan 10, 2022 at 4:51 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> On Sun, Jan 9, 2022 at 10:29 PM Reiji Watanabe <[email protected]> wrote:
> >
> > Hi Raghu,
> >
> > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > KVM regularly introduces new hypercall services to the guests without
> > > any consent from the Virtual Machine Manager (VMM). This means, the
> > > guests can observe hypercall services in and out as they migrate
> > > across various host kernel versions. This could be a major problem
> > > if the guest discovered a hypercall, started using it, and after
> > > getting migrated to an older kernel realizes that it's no longer
> > > available. Depending on how the guest handles the change, there's
> > > a potential chance that the guest would just panic.
> > >
> > > As a result, there's a need for the VMM to elect the services that
> > > it wishes the guest to discover. VMM can elect these services based
> > > on the kernels spread across its (migration) fleet. To remedy this,
> > > extend the existing firmware pseudo-registers, such as
> > > KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
> > >
> > > These firmware registers are categorized based on the service call
> > > owners, and unlike the existing firmware pseudo-registers, they hold
> > > the features supported in the form of a bitmap.
> > >
> > > The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
> > > this extension, which returns the number of pseudo-firmware
> > > registers supported. During the VM initialization, the registers
> > > holds an upper-limit of the features supported by the corresponding
> > > registers. It's expected that the VMMs discover the features
> > > provided by each register via GET_ONE_REG, and writeback the
> > > desired values using SET_ONE_REG. KVM allows this modification
> > > only until the VM has started.
> > >
> > > Older VMMs can simply ignore the capability and the hypercall services
> > > will be exposed unconditionally to the guests, thus ensuring backward
> > > compatibility.
> > >
> > > In this patch, the framework adds the register only for ARM's standard
> > > secure services (owner value 4). Currently, this includes support only
> > > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > > register representing mandatory features of v1.0. Other services are
> > > added incrementally in the upcoming patches.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > ---
> > > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > > arch/arm64/include/uapi/asm/kvm.h | 4 ++
> > > arch/arm64/kvm/arm.c | 4 ++
> > > arch/arm64/kvm/hypercalls.c | 103 +++++++++++++++++++++++++++++-
> > > arch/arm64/kvm/trng.c | 8 +--
> > > include/kvm/arm_hypercalls.h | 6 ++
> > > 6 files changed, 129 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 2a5f7f38006f..a32cded0371b 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -102,6 +102,15 @@ struct kvm_s2_mmu {
> > > struct kvm_arch_memory_slot {
> > > };
> > >
> > > +/**
> > > + * struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
> > > + *
> > > + * @hvc_std_bmap: Bitmap of standard secure service calls
> > > + */
> > > +struct kvm_hvc_desc {
> > > + u64 hvc_std_bmap;
> > > +};
> > > +
> > > struct kvm_arch {
> > > struct kvm_s2_mmu mmu;
> > >
> > > @@ -137,6 +146,9 @@ struct kvm_arch {
> > >
> > > /* Memory Tagging Extension enabled for the guest */
> > > bool mte_enabled;
> > > +
> > > + /* Hypercall firmware registers' descriptor */
> > > + struct kvm_hvc_desc hvc_desc;
> > > };
> > >
> > > struct kvm_vcpu_fault_info {
> > > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > > index b3edde68bc3e..0d6f29c58456 100644
> > > --- a/arch/arm64/include/uapi/asm/kvm.h
> > > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > > @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> > > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> > > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
> > >
> > > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> > > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> > > +
> > > /* SVE registers */
> > > #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
> > >
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index e4727dc771bf..56fe81565235 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> > >
> > > set_default_spectre(kvm);
> > > + kvm_arm_init_hypercalls(kvm);
> > >
> > > return ret;
> > > out_free_stage2_pgd:
> > > @@ -283,6 +284,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > > r = system_has_full_ptr_auth();
> > > break;
> > > + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > > + r = kvm_arm_num_fw_bmap_regs();
> > > + break;
> >
> > Looking at the discussion for the v2 series,
> >
> > https://lore.kernel.org/kvmarm/[email protected]/
> >
> > I assume that the number of the pseudo-firmware bitmap registers
> > will be used to clear pseudo firmware bitmap registers that
> > userspace doesn't know.
> > I'm wondering how userspace can identify which pseudo-firmware
> > registers that KVM_GET_REG_LIST provides are the pseudo-firmware
> > bitmap registers that it doesn't know.
> > For instance, suppose pseudo-firmware registers that KVM_GET_REG_LIST
> > provides are KVM_REG_ARM_FW_REG(0) to KVM_REG_ARM_FW_REG(9), userspace
> > doesn't knows KVM_REG_ARM_FW_REG(6) to KVM_REG_ARM_FW_REG(9), and
> > KVM_CAP_ARM_HVC_FW_REG_BMAP returns 5, how can userspace identify
> > remaining two bitmap registers from those 4 (fw-reg #6 to #9)
> > firmware registers ?
> >
> In v3, we leave the decision up to the userspace. If the userspace
> encounters a register that it's unaware of, it can choose either to clear
> it or let it get exposed to the guest as is (see the code snippet
> shared by Andrew in the link).
> Trying to understand the question better- are you asking how userspace
> would distinguish between bitmap and regular fw registers with
> intermixed sequence numbers?
Yes, that's my question.
> If yes, do you foresee a reason why the 'unaware' registers need to
> be treated differently?
Since I'm not sure what the specification of 'unaware' (non-bitmap)
registers will be, it would be safer for us to assume that they might
need to be treated differently from the bitmap registers.
Considering there is KVM_REG_ARM_PSCI_VERSION, which KVM doesn't allow
userspace to set to 0, there might be similar registers that userspace
cannot set to 0 in the future.
BTW, If you assume that all those 'unaware' firmware registers are
treated in the same way, I don't think userspace needs the number of
those bitmap registers from KVM_CAP_ARM_HVC_FW_REG_BMAP (Instead,
I would think it can handle the 'unaware' registers with a list of
firmware registers from KVM_GET_REG_LIST).
> >
> > > default:
> > > r = 0;
> > > }
> > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > index 3c2fcf31ad3d..06243e4670eb 100644
> > > --- a/arch/arm64/kvm/hypercalls.c
> > > +++ b/arch/arm64/kvm/hypercalls.c
> > > @@ -58,6 +58,29 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > > val[3] = lower_32_bits(cycles);
> > > }
> > >
> > > +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> > > +{
> > > + return reg_bmap & feat_bit;
> > > +}
> > > +
> > > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> > > +{
> > > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > > +
> > > + switch (func_id) {
> > > + case ARM_SMCCC_TRNG_VERSION:
> > > + case ARM_SMCCC_TRNG_FEATURES:
> > > + case ARM_SMCCC_TRNG_GET_UUID:
> > > + case ARM_SMCCC_TRNG_RND32:
> > > + case ARM_SMCCC_TRNG_RND64:
> > > + return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
> > > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > > + default:
> > > + /* By default, allow the services that aren't listed here */
> > > + return true;
> > > + }
> > > +}
> >
> > kvm_hvc_call_supported() could return true even for @func_id that
> > kvm_hvc_call_handler() returns -EINVAL for. Is this behavior what
> > you really want ?
> Yes. My idea was to let kvm_hvc_call_supported() check for the
> support, while kvm_hvc_call_handler() does the real processing of the
> call.
>
> > If so, IMHO the function name might be a bit mis-leading.
> > "kvm_hvc_call_disabled" (and flip the return value)
> > might be closer to what it does(?).
> >
> Sorry, I'm unclear how flipping is helping. Wouldn't we return 'false'
> if we don't have a case for the func_id, indicating it's NOT disabled,
> but kvm_hvc_call_handler() can still return SMCCC_RET_NOT_SUPPORTED?
Yes, that's fine, too.
Since the interesting state for those services is whether they have been
disabled (they are enabled by default), I just thought checking 'disabled'
might be closer to what the function does than checking 'enabled'.
But, 'enabled' is also fine.
> >
> > > +
> > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > {
> > > u32 func_id = smccc_get_function(vcpu);
> > > @@ -65,6 +88,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > u32 feature;
> > > gpa_t gpa;
> > >
> > > + if (!kvm_hvc_call_supported(vcpu, func_id))
> > > + goto out;
> > > +
> > > switch (func_id) {
> > > case ARM_SMCCC_VERSION_FUNC_ID:
> > > val[0] = ARM_SMCCC_VERSION_1_1;
> > > @@ -143,6 +169,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > return kvm_psci_call(vcpu);
> > > }
> > >
> > > +out:
> > > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > > return 1;
> > > }
> > > @@ -153,9 +180,25 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > > };
> > >
> > > +static const u64 kvm_arm_fw_reg_bmap_ids[] = {
> > > + KVM_REG_ARM_STD_BMAP,
> > > +};
> > > +
> > > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > > +{
> > > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > > +
> > > + hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
> > > +}
> > > +
> > > +int kvm_arm_num_fw_bmap_regs(void)
> > > +{
> > > + return ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids);
> > > +}
> > > +
> > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > > {
> > > - return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > > + return ARRAY_SIZE(kvm_arm_fw_reg_ids) + kvm_arm_num_fw_bmap_regs();
> > > }
> > >
> > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > > @@ -167,6 +210,11 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > > return -EFAULT;
> > > }
> > >
> > > + for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids); i++) {
> > > + if (put_user(kvm_arm_fw_reg_bmap_ids[i], uindices++))
> > > + return -EFAULT;
> > > + }
> > > +
> > > return 0;
> > > }
> > >
> > > @@ -211,9 +259,20 @@ static int get_kernel_wa_level(u64 regid)
> > > return -EINVAL;
> > > }
> > >
> > > +static void
> > > +kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
> > > +{
> > > + struct kvm *kvm = vcpu->kvm;
> > > +
> > > + mutex_lock(&kvm->lock);
> > > + *val = fw_reg_bmap;
> > > + mutex_unlock(&kvm->lock);
> >
> > Why does it need to hold the lock ? (Wouldn't READ_ONCE be enough ?)
> >
> I don't have much experience with READ_ONCE at this point, but do you
> think this read can be protected again the read/write in
> kvm_arm_set_fw_reg_bmap()?
If kvm_arm_set_fw_reg_bmap is changed to use WRITE_ONCE to
update hvc_desc->hvc_*_bmap (kvm_arm_set_fw_reg_bmap still needs
to get the lock to prevent other vCPUs from running KVM_RUN),
I would think using READ_ONCE in kvm_arm_get_fw_reg_bmap() without
getting the lock should work (will see either old or new value).
Thanks,
Reiji
> >
> > > +}
> > > +
> > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > {
> > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > > u64 val;
> > >
> > > switch (reg->id) {
> > > @@ -224,6 +283,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
> > > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > > break;
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
> > > + break;
> > > default:
> > > return -ENOENT;
> > > }
> > > @@ -234,6 +296,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > return 0;
> > > }
> > >
> > > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > > +{
> > > + int ret = 0;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > > + u64 *fw_reg_bmap, fw_reg_features;
> > > +
> > > + switch (reg_id) {
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + fw_reg_bmap = &hvc_desc->hvc_std_bmap;
> > > + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> > > + break;
> > > + default:
> > > + return -ENOENT;
> > > + }
> > > +
> > > + /* Check for unsupported bit */
> > > + if (val & ~fw_reg_features)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&kvm->lock);
> > > +
> > > + /*
> > > + * If the VM (any vCPU) has already started running, return success
> > > + * if there's no change in the value. Else, return -EBUSY.
> > > + */
> > > + if (kvm_vm_has_started(kvm)) {
> > > + ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > > + goto out;
> > > + }
> > > +
> > > + *fw_reg_bmap = val;
> > > +out:
> > > + mutex_unlock(&kvm->lock);
> > > + return ret;
> > > +}
> > > +
> > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > {
> > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > @@ -310,6 +409,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > return -EINVAL;
> > >
> > > return 0;
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > > default:
> > > return -ENOENT;
> > > }
> > > diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> > > index 99bdd7103c9c..23f912514b06 100644
> > > --- a/arch/arm64/kvm/trng.c
> > > +++ b/arch/arm64/kvm/trng.c
> > > @@ -60,14 +60,8 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
> > > val = ARM_SMCCC_TRNG_VERSION_1_0;
> > > break;
> > > case ARM_SMCCC_TRNG_FEATURES:
> > > - switch (smccc_get_arg1(vcpu)) {
> > > - case ARM_SMCCC_TRNG_VERSION:
> > > - case ARM_SMCCC_TRNG_FEATURES:
> > > - case ARM_SMCCC_TRNG_GET_UUID:
> > > - case ARM_SMCCC_TRNG_RND32:
> > > - case ARM_SMCCC_TRNG_RND64:
> > > + if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
> > > val = TRNG_SUCCESS;
> >
> > kvm_hvc_call_supported() returns true for any values that are
> > not explicitly listed in kvm_hvc_call_supported() (i.e. it returns
> > true even for @func_id that are not any of ARM_SMCCC_TRNG_*).
> > So, I don't think it can simply use the current kvm_hvc_call_supported.
> >
> You are right. Probably I should leave the case statements as is (or
> think of some better way).
>
>
> Thanks for the review and suggestions.
>
> Regards,
> Raghavendra
> > Thanks,
> > Reiji
> >
> > > - }
> > > break;
> > > case ARM_SMCCC_TRNG_GET_UUID:
> > > smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
> > > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > > index 5d38628a8d04..8fe68d8d6d96 100644
> > > --- a/include/kvm/arm_hypercalls.h
> > > +++ b/include/kvm/arm_hypercalls.h
> > > @@ -6,6 +6,9 @@
> > >
> > > #include <asm/kvm_emulate.h>
> > >
> > > +#define ARM_SMCCC_STD_FEATURES \
> > > + GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > > +
> > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> > >
> > > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > > @@ -42,9 +45,12 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> > >
> > > struct kvm_one_reg;
> > >
> > > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > > +int kvm_arm_num_fw_bmap_regs(void);
> > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
> > >
> > > #endif
> > > --
> > > 2.34.1.448.ga2b2bfdf31-goog
> > >
On Tue, Jan 11, 2022 at 9:12 PM Reiji Watanabe <[email protected]> wrote:
>
> On Mon, Jan 10, 2022 at 4:51 PM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > On Sun, Jan 9, 2022 at 10:29 PM Reiji Watanabe <[email protected]> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > KVM regularly introduces new hypercall services to the guests without
> > > > any consent from the Virtual Machine Manager (VMM). This means the
> > > > guests can observe hypercall services come and go as they migrate
> > > > across various host kernel versions. This could be a major problem
> > > > if the guest discovered a hypercall, started using it, and after
> > > > being migrated to an older kernel realized that it's no longer
> > > > available. Depending on how the guest handles the change, there's
> > > > a chance that the guest would just panic.
> > > >
> > > > As a result, there's a need for the VMM to elect the services that
> > > > it wishes the guest to discover. The VMM can elect these services
> > > > based on the kernels spread across its (migration) fleet. To remedy
> > > > this, extend the existing firmware pseudo-registers, such as
> > > > KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
> > > >
> > > > These firmware registers are categorized based on the service call
> > > > owners, and unlike the existing firmware pseudo-registers, they hold
> > > > the features supported in the form of a bitmap.
> > > >
> > > > The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
> > > > this extension, which returns the number of pseudo-firmware
> > > > registers supported. During the VM initialization, each register
> > > > holds the upper limit of the features it supports. It's
> > > > expected that the VMMs discover the features
> > > > provided by each register via GET_ONE_REG, and write back the
> > > > desired values using SET_ONE_REG. KVM allows this modification
> > > > only until the VM has started.
> > > >
> > > > Older VMMs can simply ignore the capability and the hypercall services
> > > > will be exposed unconditionally to the guests, thus ensuring backward
> > > > compatibility.
> > > >
> > > > In this patch, the framework adds the register only for ARM's standard
> > > > secure services (owner value 4). Currently, this includes support only
> > > > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > > > register representing mandatory features of v1.0. Other services are
> > > > added incrementally in the upcoming patches.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > > ---
> > > > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > > > arch/arm64/include/uapi/asm/kvm.h | 4 ++
> > > > arch/arm64/kvm/arm.c | 4 ++
> > > > arch/arm64/kvm/hypercalls.c | 103 +++++++++++++++++++++++++++++-
> > > > arch/arm64/kvm/trng.c | 8 +--
> > > > include/kvm/arm_hypercalls.h | 6 ++
> > > > 6 files changed, 129 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index 2a5f7f38006f..a32cded0371b 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -102,6 +102,15 @@ struct kvm_s2_mmu {
> > > > struct kvm_arch_memory_slot {
> > > > };
> > > >
> > > > +/**
> > > > + * struct kvm_hvc_desc: KVM ARM64 hypercall descriptor
> > > > + *
> > > > + * @hvc_std_bmap: Bitmap of standard secure service calls
> > > > + */
> > > > +struct kvm_hvc_desc {
> > > > + u64 hvc_std_bmap;
> > > > +};
> > > > +
> > > > struct kvm_arch {
> > > > struct kvm_s2_mmu mmu;
> > > >
> > > > @@ -137,6 +146,9 @@ struct kvm_arch {
> > > >
> > > > /* Memory Tagging Extension enabled for the guest */
> > > > bool mte_enabled;
> > > > +
> > > > + /* Hypercall firmware registers' descriptor */
> > > > + struct kvm_hvc_desc hvc_desc;
> > > > };
> > > >
> > > > struct kvm_vcpu_fault_info {
> > > > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > > > index b3edde68bc3e..0d6f29c58456 100644
> > > > --- a/arch/arm64/include/uapi/asm/kvm.h
> > > > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > > > @@ -281,6 +281,10 @@ struct kvm_arm_copy_mte_tags {
> > > > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3
> > > > #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4)
> > > >
> > > > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_REG(3)
> > > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 BIT(0)
> > > > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0 /* Last valid bit */
> > > > +
> > > > /* SVE registers */
> > > > #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
> > > >
> > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > index e4727dc771bf..56fe81565235 100644
> > > > --- a/arch/arm64/kvm/arm.c
> > > > +++ b/arch/arm64/kvm/arm.c
> > > > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> > > >
> > > > set_default_spectre(kvm);
> > > > + kvm_arm_init_hypercalls(kvm);
> > > >
> > > > return ret;
> > > > out_free_stage2_pgd:
> > > > @@ -283,6 +284,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > > case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > > > r = system_has_full_ptr_auth();
> > > > break;
> > > > + case KVM_CAP_ARM_HVC_FW_REG_BMAP:
> > > > + r = kvm_arm_num_fw_bmap_regs();
> > > > + break;
> > >
> > > Looking at the discussion for the v2 series,
> > >
> > > https://lore.kernel.org/kvmarm/[email protected]/
> > >
> > > I assume that the number of the pseudo-firmware bitmap registers
> > > will be used to clear pseudo firmware bitmap registers that
> > > userspace doesn't know.
> > > I'm wondering how userspace can identify which pseudo-firmware
> > > registers that KVM_GET_REG_LIST provides are the pseudo-firmware
> > > bitmap registers that it doesn't know.
> > > For instance, suppose pseudo-firmware registers that KVM_GET_REG_LIST
> > > provides are KVM_REG_ARM_FW_REG(0) to KVM_REG_ARM_FW_REG(9), userspace
> > > doesn't knows KVM_REG_ARM_FW_REG(6) to KVM_REG_ARM_FW_REG(9), and
> > > KVM_CAP_ARM_HVC_FW_REG_BMAP returns 5, how can userspace identify
> > > remaining two bitmap registers from those 4 (fw-reg #6 to #9)
> > > firmware registers ?
> > >
> > In v3, we leave the decision up to the userspace. If the userspace
> > encounters a register that it's unaware of, it can choose either to clear
> > it or let it be exposed to the guest as is (see the code snippet
> > shared by Andrew in the link).
> > Trying to understand the question better- are you asking how would
> > userspace distinguish between bitmap and regular fw registers with
> > intermixed sequence numbers?
>
> Yes, that's my question.
>
>
> > If yes, do you foresee a reason why the 'unaware' registers need to
> > be treated differently?
>
> Since I'm not sure what the specification of 'unaware' (non-bitmap)
> registers will be, it would be safer for us to assume that they might
> need to be treated differently from the bitmap registers.
> Considering there is KVM_REG_ARM_PSCI_VERSION, which KVM doesn't allow
> userspace to set to 0, there might be similar registers that userspace
> cannot set to 0 in the future.
>
> BTW, if you assume that all those 'unaware' firmware registers are
> treated in the same way, I don't think userspace needs the number of
> those bitmap registers from KVM_CAP_ARM_HVC_FW_REG_BMAP (Instead,
> I would think it can handle the 'unaware' registers with a list of
> firmware registers from KVM_GET_REG_LIST).
>
You are right; mixing these registers would create an issue for the
VMM. Instead, we can probably dedicate a subset of the KVM_REG_ARM_FW_REG
space to the bitmapped firmware registers, something like:

#define KVM_REG_ARM_FW_BMAP_BASE	KVM_REG_ARM_FW_REG(0xff00)	/* Upper half of the fw reg space */
#define KVM_REG_ARM_FW_BMAP_REG(r)	(KVM_REG_ARM_FW_BMAP_BASE | (r))
#define KVM_REG_ARM_STD_BMAP		KVM_REG_ARM_FW_BMAP_REG(0)

With this, I think the VMM can easily detect a bitmapped fw register.
Also, if it encounters an unknown bitmapped register, it can handle it
separately if it likes.
A minor advantage of the CAP still returning the number of bitmapped
registers is that it gives the VMM an inexpensive shortcut to get a
general idea of how many there are.
> > >
> > > > default:
> > > > r = 0;
> > > > }
> > > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > > index 3c2fcf31ad3d..06243e4670eb 100644
> > > > --- a/arch/arm64/kvm/hypercalls.c
> > > > +++ b/arch/arm64/kvm/hypercalls.c
> > > > @@ -58,6 +58,29 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > > > val[3] = lower_32_bits(cycles);
> > > > }
> > > >
> > > > +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> > > > +{
> > > > + return reg_bmap & feat_bit;
> > > > +}
> > > > +
> > > > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id)
> > > > +{
> > > > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > > > +
> > > > + switch (func_id) {
> > > > + case ARM_SMCCC_TRNG_VERSION:
> > > > + case ARM_SMCCC_TRNG_FEATURES:
> > > > + case ARM_SMCCC_TRNG_GET_UUID:
> > > > + case ARM_SMCCC_TRNG_RND32:
> > > > + case ARM_SMCCC_TRNG_RND64:
> > > > + return kvm_arm_fw_reg_feat_enabled(hvc_desc->hvc_std_bmap,
> > > > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > > > + default:
> > > > + /* By default, allow the services that aren't listed here */
> > > > + return true;
> > > > + }
> > > > +}
> > >
> > > kvm_hvc_call_supported() could return true even for @func_id that
> > > kvm_hvc_call_handler() returns -EINVAL for. Is this behavior what
> > > you really want ?
> > Yes. My idea was to let kvm_hvc_call_supported() check for the
> > support, while kvm_hvc_call_handler() does the real processing of the
> > call.
> >
> > > If so, IMHO the function name might be a bit mis-leading.
> > > "kvm_hvc_call_disabled" (and flip the return value)
> > > might be closer to what it does(?).
> > >
> > Sorry, I'm unclear on how flipping helps. Wouldn't we return 'false'
> > if we don't have a case for the func_id, indicating it's NOT disabled,
> > but kvm_hvc_call_handler() can still return SMCCC_RET_NOT_SUPPORTED?
>
> Yes, that's fine, too.
> Since those services are disabled (because they are enabled by default),
> I just thought checking 'disabled' might be closer to what it does than
> checking 'enabled'. But, 'enabled' is also fine.
>
> > >
> > > > +
> > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > {
> > > > u32 func_id = smccc_get_function(vcpu);
> > > > @@ -65,6 +88,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > u32 feature;
> > > > gpa_t gpa;
> > > >
> > > > + if (!kvm_hvc_call_supported(vcpu, func_id))
> > > > + goto out;
> > > > +
> > > > switch (func_id) {
> > > > case ARM_SMCCC_VERSION_FUNC_ID:
> > > > val[0] = ARM_SMCCC_VERSION_1_1;
> > > > @@ -143,6 +169,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > return kvm_psci_call(vcpu);
> > > > }
> > > >
> > > > +out:
> > > > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > > > return 1;
> > > > }
> > > > @@ -153,9 +180,25 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > > > };
> > > >
> > > > +static const u64 kvm_arm_fw_reg_bmap_ids[] = {
> > > > + KVM_REG_ARM_STD_BMAP,
> > > > +};
> > > > +
> > > > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > > > +{
> > > > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > > > +
> > > > + hvc_desc->hvc_std_bmap = ARM_SMCCC_STD_FEATURES;
> > > > +}
> > > > +
> > > > +int kvm_arm_num_fw_bmap_regs(void)
> > > > +{
> > > > + return ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids);
> > > > +}
> > > > +
> > > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > > > {
> > > > - return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > > > + return ARRAY_SIZE(kvm_arm_fw_reg_ids) + kvm_arm_num_fw_bmap_regs();
> > > > }
> > > >
> > > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > > > @@ -167,6 +210,11 @@ int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
> > > > return -EFAULT;
> > > > }
> > > >
> > > > + for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_bmap_ids); i++) {
> > > > + if (put_user(kvm_arm_fw_reg_bmap_ids[i], uindices++))
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > return 0;
> > > > }
> > > >
> > > > @@ -211,9 +259,20 @@ static int get_kernel_wa_level(u64 regid)
> > > > return -EINVAL;
> > > > }
> > > >
> > > > +static void
> > > > +kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
> > > > +{
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > +
> > > > + mutex_lock(&kvm->lock);
> > > > + *val = fw_reg_bmap;
> > > > + mutex_unlock(&kvm->lock);
> > >
> > > Why does it need to hold the lock ? (Wouldn't READ_ONCE be enough ?)
> > >
> > I don't have much experience with READ_ONCE at this point, but do you
> > think this read can be protected against the read/write in
> > kvm_arm_set_fw_reg_bmap()?
>
> If kvm_arm_set_fw_reg_bmap is changed to use WRITE_ONCE to
> update hvc_desc->hvc_*_bmap (kvm_arm_set_fw_reg_bmap still needs
> to get the lock to prevent other vCPUs from running KVM_RUN),
> I would think using READ_ONCE in kvm_arm_get_fw_reg_bmap() without
> getting the lock should work (will see either old or new value).
>
That makes sense. Thanks for the suggestion and all the reviews.
Regards,
Raghavendra
> Thanks,
> Reiji
>
>
> > >
> > > > +}
> > > > +
> > > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > {
> > > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > > + struct kvm_hvc_desc *hvc_desc = &vcpu->kvm->arch.hvc_desc;
> > > > u64 val;
> > > >
> > > > switch (reg->id) {
> > > > @@ -224,6 +283,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
> > > > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > > > break;
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + kvm_arm_get_fw_reg_bmap(vcpu, hvc_desc->hvc_std_bmap, &val);
> > > > + break;
> > > > default:
> > > > return -ENOENT;
> > > > }
> > > > @@ -234,6 +296,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > return 0;
> > > > }
> > > >
> > > > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > > > +{
> > > > + int ret = 0;
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > + struct kvm_hvc_desc *hvc_desc = &kvm->arch.hvc_desc;
> > > > + u64 *fw_reg_bmap, fw_reg_features;
> > > > +
> > > > + switch (reg_id) {
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + fw_reg_bmap = &hvc_desc->hvc_std_bmap;
> > > > + fw_reg_features = ARM_SMCCC_STD_FEATURES;
> > > > + break;
> > > > + default:
> > > > + return -ENOENT;
> > > > + }
> > > > +
> > > > + /* Check for unsupported bit */
> > > > + if (val & ~fw_reg_features)
> > > > + return -EINVAL;
> > > > +
> > > > + mutex_lock(&kvm->lock);
> > > > +
> > > > + /*
> > > > + * If the VM (any vCPU) has already started running, return success
> > > > + * if there's no change in the value. Else, return -EBUSY.
> > > > + */
> > > > + if (kvm_vm_has_started(kvm)) {
> > > > + ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > > > + goto out;
> > > > + }
> > > > +
> > > > + *fw_reg_bmap = val;
> > > > +out:
> > > > + mutex_unlock(&kvm->lock);
> > > > + return ret;
> > > > +}
> > > > +
> > > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > {
> > > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > > @@ -310,6 +409,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > return -EINVAL;
> > > >
> > > > return 0;
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > > > default:
> > > > return -ENOENT;
> > > > }
> > > > diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> > > > index 99bdd7103c9c..23f912514b06 100644
> > > > --- a/arch/arm64/kvm/trng.c
> > > > +++ b/arch/arm64/kvm/trng.c
> > > > @@ -60,14 +60,8 @@ int kvm_trng_call(struct kvm_vcpu *vcpu)
> > > > val = ARM_SMCCC_TRNG_VERSION_1_0;
> > > > break;
> > > > case ARM_SMCCC_TRNG_FEATURES:
> > > > - switch (smccc_get_arg1(vcpu)) {
> > > > - case ARM_SMCCC_TRNG_VERSION:
> > > > - case ARM_SMCCC_TRNG_FEATURES:
> > > > - case ARM_SMCCC_TRNG_GET_UUID:
> > > > - case ARM_SMCCC_TRNG_RND32:
> > > > - case ARM_SMCCC_TRNG_RND64:
> > > > + if (kvm_hvc_call_supported(vcpu, smccc_get_arg1(vcpu)))
> > > > val = TRNG_SUCCESS;
> > >
> > > kvm_hvc_call_supported() returns true for any values that are
> > > not explicitly listed in kvm_hvc_call_supported() (i.e. it returns
> > > true even for @func_id that are not any of ARM_SMCCC_TRNG_*).
> > > So, I don't think it can simply use the current kvm_hvc_call_supported.
> > >
> > You are right. Probably I should leave the case statements as is (or
> > think of some better way).
> >
> >
> > Thanks for the review and suggestions.
> >
> > Regards,
> > Raghavendra
> > > Thanks,
> > > Reiji
> > >
> > > > - }
> > > > break;
> > > > case ARM_SMCCC_TRNG_GET_UUID:
> > > > smccc_set_retval(vcpu, le32_to_cpu(u[0]), le32_to_cpu(u[1]),
> > > > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > > > index 5d38628a8d04..8fe68d8d6d96 100644
> > > > --- a/include/kvm/arm_hypercalls.h
> > > > +++ b/include/kvm/arm_hypercalls.h
> > > > @@ -6,6 +6,9 @@
> > > >
> > > > #include <asm/kvm_emulate.h>
> > > >
> > > > +#define ARM_SMCCC_STD_FEATURES \
> > > > + GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > > > +
> > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> > > >
> > > > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > > > @@ -42,9 +45,12 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> > > >
> > > > struct kvm_one_reg;
> > > >
> > > > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > > > +int kvm_arm_num_fw_bmap_regs(void);
> > > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > > +bool kvm_hvc_call_supported(struct kvm_vcpu *vcpu, u32 func_id);
> > > >
> > > > #endif
> > > > --
> > > > 2.34.1.448.ga2b2bfdf31-goog
> > > >
On Tue, Jan 11, 2022 at 11:04 AM Sean Christopherson <[email protected]> wrote:
>
> On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <[email protected]> wrote:
> > > In your proposed patch, KVM_RUN will take kvm->lock _every_ time. That introduces
> > > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > > are attempting KVM_RUN. By checking !vm_started, only the "first" KVM_RUN for a
> > > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > > first KVM_RUN, hence the quotes. I called it "naive" because it's possible there's
> > > a more elegant solution depending on the use case, e.g. a lockless approach might
> > > work (or it might not).
> > >
> > But is it safe to read kvm->vm_started without grabbing the lock in
> > the first place?
>
> Don't know, but that's my point. Without a consumer in generic KVM and due to
> my lack of arm64 knowledge, without a high-level description of how the flag will
> be used by arm64, it's really difficult to determine what's safe and what's not.
> For other architectures, it's an impossible question to answer because we don't
> know how the flag might be used.
>
> > use atomic_t maybe for this?
>
> No. An atomic_t is generally useful only if there are multiple writers that can
> possibly write different values. It's highly unlikely that simply switching to an
> atomic addresses the needs of arm64.
>
> > > > > > + kvm->vm_started = true;
> > > > > > + mutex_unlock(&kvm->lock);
> > > > >
> > > > > Lastly, why is this in generic KVM?
> > > > >
> > > > The v1 of the series originally had it in the arm specific code.
> > > > However, I was suggested to move it to the generic code since the book
> > > > keeping is not arch specific and could be helpful to others too [1].
> > >
> > > I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
> > > but I'm skeptical in this particular case. The code _is_ arch specific in that
> > > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > > versus a hypothetical x86 use case that might be completely ok with a lockless
> > > implementation. And it's not obvious that there's a plausible, safe use case
> > > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > > across the entire VM/system, most things are per-thread/core/package in some way,
> > > shape, or form. In other words, I'm wary of providing something like this for
> > > x86 because odds are good that any use will be functionally incorrect.
> > I've been going back and forth on this. I've seen a couple of
> > variables declared in the generic struct and used only in the arch
> > code. vcpu->valid_wakeup for instance, which is used only by s390
> > arch. Maybe I'm looking at it the wrong way as to what can and can't
> > go in the generic kvm code.
>
> Ya, valid_wakeup is an oddball, I don't know why it's in kvm_vcpu instead of
> arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().
>
> That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
> for how it is used, so it's purely a "this code is rather odd" issue. vm_started
> on the other hand is only produced by generic KVM, and so its required semantics are
> unclear.
Understood. I'll move it to arm64 and we can refactor it if there's a
need for any other arch(s).
Thanks,
Raghavendra
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> Understood. I'll move it to arm64 and we can refactor it if there's a
> need for any other arch(s).
Before you do that, can you answer Jim's question on _why_ arm64 needs this?
On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
>
> On Tue, Jan 11, 2022 at 10:52 AM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > On Mon, Jan 10, 2022 at 3:57 PM Jim Mattson <[email protected]> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 3:07 PM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > On Fri, Jan 7, 2022 at 4:05 PM Jim Mattson <[email protected]> wrote:
> > > > >
> > > > > On Fri, Jan 7, 2022 at 3:43 PM Raghavendra Rao Ananta
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Reiji,
> > > > > >
> > > > > > On Thu, Jan 6, 2022 at 10:07 PM Reiji Watanabe <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Raghu,
> > > > > > >
> > > > > > > On Tue, Jan 4, 2022 at 11:49 AM Raghavendra Rao Ananta
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Capture the start of the KVM VM, which is basically the
> > > > > > > > start of any vCPU run. This state of the VM is helpful
> > > > > > > > in the upcoming patches to prevent user-space from
> > > > > > > > configuring certain VM features after the VM has started
> > > > > > > > running.
> > > > >
> > > > > What about live migration, where the VM has already technically been
> > > > > started before the first call to KVM_RUN?
> > > >
> > > > My understanding is that a new 'struct kvm' is created on the target
> > > > machine and this flag should be reset, which would allow the VMM to
> > > > restore the firmware registers. However, we would be running KVM_RUN
> > > > for the first time on the target machine, thus setting the flag.
> > > > So, you are right; it's more of a resume operation from the guest's
> > > > point of view. I guess the name of the variable is what's confusing
> > > > here.
> > >
> > > I was actually thinking that live migration gives userspace an easy
> > > way to circumvent your restriction. You said, "This state of the VM is
> > > helpful in the upcoming patches to prevent user-space from configuring
> > > certain VM features after the VM has started running." However, if you
> > > don't ensure that these VM features are configured the same way on the
> > > target machine as they were on the source machine, you have not
> > > actually accomplished your stated goal.
> > >
> > Isn't that up to the VMM to save/restore and validate the registers
> > across migrations?
>
> Yes, just as it is up to userspace not to make bad configuration
> changes after the first VMRUN.
>
> > Perhaps I have to re-word my intentions for the patch- userspace
> > should be able to configure the registers before issuing the first
> > KVM_RUN.
>
> Perhaps it would help if you explained *why* you are doing this. It
> sounds like you are either trying to protect against a malicious
> userspace, or you are trying to keep userspace from doing something
> stupid. In general, kvm only enforces constraints that are necessary
> to protect the host. If that's what you're doing, I don't understand
> why live migration doesn't provide an end-run around your protections.
It's mainly to safeguard the guests. With respect to migration, KVM
and the userspace are collectively playing a role here. It's up to the
userspace to ensure that the registers are configured the same across
migrations and KVM ensures that the userspace doesn't modify the
registers after KVM_RUN so that they don't see features turned OFF/ON
during execution. I'm not sure if it falls into the definition of
protecting the host. Do you see a value in adding this extra
protection from KVM?
Regards,
Raghavendra
On Wed, Jan 12, 2022, Sean Christopherson wrote:
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > Understood. I'll move it to arm64 and we can refactor it if there's a
> > need for any other arch(s).
>
> Before you do that, can you answer Jim's question on _why_ arm64 needs this?
Gah, sorry, [email protected] once again lost the spam battle with Gmail.
On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
> > Perhaps it would help if you explained *why* you are doing this. It
> > sounds like you are either trying to protect against a malicious
> > userspace, or you are trying to keep userspace from doing something
> > stupid. In general, kvm only enforces constraints that are necessary
> > to protect the host. If that's what you're doing, I don't understand
> > why live migration doesn't provide an end-run around your protections.
> It's mainly to safeguard the guests. With respect to migration, KVM
> and the userspace are collectively playing a role here. It's up to the
> userspace to ensure that the registers are configured the same across
> migrations and KVM ensures that the userspace doesn't modify the
> registers after KVM_RUN so that they don't see features turned OFF/ON
> during execution. I'm not sure if it falls into the definition of
> protecting the host. Do you see a value in adding this extra
> protection from KVM?
Short answer: probably not?
There is precedent for disallowing userspace from doing stupid things, but that's
either for KVM's protection (as Jim pointed out), or because KVM can't honor the
change, e.g. x86 is currently in the process of disallowing most CPUID changes
after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
support updating some of its own internal state (because removing features like
GB hugepage support is nonsensical and would require a large pile of complicated,
messy code).
Restricting CPUID changes does offer some "protection" to the guest, but that's
not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
case, and trying to do so is a fool's errand.
If restricting updates in the arm64 is necessary to ensure KVM provides sane
behavior, then it could be justified. But if it's purely a sanity check on
behalf of the guest, then it's not justified.
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <[email protected]> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?
>
> There is precedent for disallowing userspace from doing stupid things, but that's
> either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> change, e.g. x86 is currently in the process of disallowing most CPUID changes
> after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> support updating some of it's own internal state (because removing features like
> GB hugepage support is nonsensical and would require a large pile of complicated,
> messy code).
>
> Restricing CPUID changes does offer some "protection" to the guest, but that's
> not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates in the arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.
Agreed that KVM doesn't really safeguard the guests, but just curious,
is there really a downside in adding this thin layer of safety check?
On the bright side, the guests would be safe, and it could save the
developers some time in hunting down the bugs in this path, no?
Regards,
Raghavendra
On Thu, Jan 13, 2022, Raghavendra Rao Ananta wrote:
> On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <[email protected]> wrote:
> > If restricting updates in the arm64 is necessary to ensure KVM provides sane
> > behavior, then it could be justified. But if it's purely a sanity check on
> > behalf of the guest, then it's not justified.
> Agreed that KVM doesn't really safeguard the guests, but just curious,
> is there really a downside in adding this thin layer of safety check?
It's more stuff that KVM has to maintain, creates an ABI that KVM must adhere to,
potentially creates inconsistencies in KVM, and prevents using KVM to intentionally
do stupid things to test scenarios that are "impossible". And we also try to avoid
defining arbitrary CPU behavior in KVM (that may not be the case here).
> On the bright side, the guests would be safe, and it could save the
> developers some time in hunting down the bugs in this path, no?
Yes, but that can be said for lots and lots of things. This is both a slippery
slope argument and the inconsistency argument above, e.g. if KVM actively prevents
userspace from doing X, why doesn't KVM prevent userspace from doing Y? Having a
decently defined rule for these types of things, e.g. protect KVM/kernel and adhere
to the architecture but otherwise let userspace do whatever, avoids spending too
much time arguing over what KVM should/shouldn't allow, or wondering why on earth
KVM does XYZ, at least in theory :-)
There are certainly times where KVM could have saved userspace some pain, but
overall I do think KVM is better off staying out of the way when possible.
> > > > > +static void
> > > > > +kvm_arm_get_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 fw_reg_bmap, u64 *val)
> > > > > +{
> > > > > + struct kvm *kvm = vcpu->kvm;
> > > > > +
> > > > > + mutex_lock(&kvm->lock);
> > > > > + *val = fw_reg_bmap;
> > > > > + mutex_unlock(&kvm->lock);
> > > >
I have another comment for kvm_arm_get_fw_reg_bmap.
I just noticed that @fw_reg_bmap is a value of the bitmap register
(not a pointer). I believe what you meant was a pointer to
hvc_desc->hvc_*_bmap. Also, you can remove @val and return the register
value instead (change the type of the return value from void to u64).
I'm not sure if you will keep this function in the next version though.
Thanks,
Reiji
On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <[email protected]> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?
>
> There is precedent for disallowing userspace from doing stupid things, but that's
> either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> change, e.g. x86 is currently in the process of disallowing most CPUID changes
> after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> support updating some of it's own internal state (because removing features like
> GB hugepage support is nonsensical and would require a large pile of complicated,
> messy code).
>
> Restricing CPUID changes does offer some "protection" to the guest, but that's
> not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates in the arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.
The pseudo firmware hvc registers, which this series is adding, are
used by KVM to identify available hvc features for the guest, and not
directly exposed to the guest as registers.
The ways the KVM code in the series consumes the registers' values are
very limited, and no KVM data/state is created based on their values.
But, as the code that consumes the registers grows in the future,
I wouldn't be surprised if KVM consumes them differently than it does
now (e.g. create another data structure based on the register values).
I'm not sure though :)
The restriction, with which KVM doesn't need to worry about the changes
in the registers after KVM_RUN, could potentially protect or be useful
to protect KVM and simplify future changes/maintenance of the KVM code
that consumes the values.
I thought this was one of the reasons for having the restriction.
Thanks,
Reiji
On Fri, Jan 14, 2022 at 1:51 PM Reiji Watanabe <[email protected]> wrote:
>
> On Thu, Jan 13, 2022 at 9:21 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
> > > > Perhaps it would help if you explained *why* you are doing this. It
> > > > sounds like you are either trying to protect against a malicious
> > > > userspace, or you are trying to keep userspace from doing something
> > > > stupid. In general, kvm only enforces constraints that are necessary
> > > > to protect the host. If that's what you're doing, I don't understand
> > > > why live migration doesn't provide an end-run around your protections.
> > > It's mainly to safeguard the guests. With respect to migration, KVM
> > > and the userspace are collectively playing a role here. It's up to the
> > > userspace to ensure that the registers are configured the same across
> > > migrations and KVM ensures that the userspace doesn't modify the
> > > registers after KVM_RUN so that they don't see features turned OFF/ON
> > > during execution. I'm not sure if it falls into the definition of
> > > protecting the host. Do you see a value in adding this extra
> > > protection from KVM?
> >
> > Short answer: probably not?
> >
> > There is precedent for disallowing userspace from doing stupid things, but that's
> > either for KVM's protection (as Jim pointed out), or because KVM can't honor the
> > change, e.g. x86 is currently in the process of disallowing most CPUID changes
> > after KVM_RUN because KVM itself consumes the CPUID information and KVM doesn't
> > support updating some of it's own internal state (because removing features like
> > GB hugepage support is nonsensical and would require a large pile of complicated,
> > messy code).
> >
> > Restricing CPUID changes does offer some "protection" to the guest, but that's
> > not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
> > case, and trying to do so is a fool's errand.
> >
> > If restricting updates in the arm64 is necessary to ensure KVM provides sane
> > behavior, then it could be justified. But if it's purely a sanity check on
> > behalf of the guest, then it's not justified.
>
> The pseudo firmware hvc registers, which this series are adding, are
> used by KVM to identify available hvc features for the guest, and not
> directly exposed to the guest as registers.
> The ways the KVM code in the series consumes the registers' values are
> very limited, and no KVM data/state is created based on their values.
> But, as the code that consumes the registers grows in the future,
> I wouldn't be surprised if KVM consumes them differently than it does
> now (e.g. create another data structure based on the register values).
> I'm not sure though :)
>
> The restriction, with which KVM doesn't need to worry about the changes
> in the registers after KVM_RUN, could potentially protect or be useful
> to protect KVM and simplify future changes/maintenance of the KVM codes
> that consumes the values.
> I thought this was one of the reasons for having the restriction.
>
Well, that wasn't the original intention of the patch, but just to
protect the guests from the userspace's dynamic updates. Having said
that, and based on what Sean mentioned in his last reply, it could be
inconsistent with what KVM has been doing so far and it would be
difficult to cover all the scenarios that userspace can mess things up
for guests.
I'll plan to drop this patch in the next version, and bring it
back to arm64 if we really need it.
Thanks Sean, Jim, and Reiji for the comments and discussion.
Regards,
Raghavendra
> Thanks,
> Reiji
On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> The restriction, with which KVM doesn't need to worry about the changes
> in the registers after KVM_RUN, could potentially protect or be useful
> to protect KVM and simplify future changes/maintenance of the KVM codes
> that consumes the values.
That sort of protection is definitely welcome; the previously mentioned CPUID mess
on x86 would have benefited greatly from KVM being restrictive in the past. That said,
hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
all too easy for KVM to implicitly/indirectly consume state via a different ioctl(),
e.g. if there are side effects that are visible in other registers, then an update
can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
inconsistent.
If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
a relatively common pattern in KVM, and provides a clear boundary to userspace
regarding what is/isn't allowed.
On 2022/1/5 3:49 AM, Raghavendra Rao Ananta wrote:
> KVM regularly introduces new hypercall services to the guests without
> any consent from the Virtual Machine Manager (VMM). This means, the
> guests can observe hypercall services in and out as they migrate
> across various host kernel versions. This could be a major problem
> if the guest discovered a hypercall, started using it, and after
> getting migrated to an older kernel realizes that it's no longer
> available. Depending on how the guest handles the change, there's
> a potential chance that the guest would just panic.
>
> As a result, there's a need for the VMM to elect the services that
> it wishes the guest to discover. VMM can elect these services based
> on the kernels spread across its (migration) fleet. To remedy this,
> extend the existing firmware psuedo-registers, such as
> KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
Haven't gone through the series but I wonder whether it's better to have
a (e)BPF filter for this like seccomp.
Thanks
>
> These firmware registers are categorized based on the service call
> owners, and unlike the existing firmware psuedo-registers, they hold
> the features supported in the form of a bitmap.
>
> The capability, KVM_CAP_ARM_HVC_FW_REG_BMAP, is used to announce
> this extension, which returns the number of psuedo-firmware
> registers supported. During the VM initialization, the registers
> holds an upper-limit of the features supported by the corresponding
> registers. It's expected that the VMMs discover the features
> provided by each register via GET_ONE_REG, and writeback the
> desired values using SET_ONE_REG. KVM allows this modification
> only until the VM has started.
>
> Older VMMs can simply ignore the capability and the hypercall services
> will be exposed unconditionally to the guests, thus ensuring backward
> compatibility.
>
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, with which KVM doesn't need to worry about the changes
> > in the registers after KVM_RUN, could potentially protect or be useful
> > to protect KVM and simplify future changes/maintenance of the KVM codes
> > that consumes the values.
>
> That sort of protection is definitely welcome, the previously mentioned CPUID mess
> on x86 would have benefit greatly by KVM being restrictive in the past. That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicity/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, than an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsitent.
Thank you for your comments !
I think I understand your concern, and that's a great point.
That's not the case for those pseudo registers though at least for now :)
BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
same for the option with "if kvm->created_vcpus > 0" ?)
> If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> a relatively common pattern in KVM, and provides a clear boundary to userpace
> regarding what is/isn't allowed.
Yes, I agree that would be better in general. For (pseudo) registers,
I would think preventing modification if kvm->created_vcpus > 0 might
not be a very good option for KVM/ARM though considering usage of
KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
Thanks,
Reiji
On Wed, 19 Jan 2022 06:42:15 +0000,
Jason Wang <[email protected]> wrote:
>
>
> On 2022/1/5 3:49 AM, Raghavendra Rao Ananta wrote:
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the Virtual Machine Manager (VMM). This means, the
> > guests can observe hypercall services in and out as they migrate
> > across various host kernel versions. This could be a major problem
> > if the guest discovered a hypercall, started using it, and after
> > getting migrated to an older kernel realizes that it's no longer
> > available. Depending on how the guest handles the change, there's
> > a potential chance that the guest would just panic.
> >
> > As a result, there's a need for the VMM to elect the services that
> > it wishes the guest to discover. VMM can elect these services based
> > on the kernels spread across its (migration) fleet. To remedy this,
> > extend the existing firmware psuedo-registers, such as
> > KVM_REG_ARM_PSCI_VERSION, for all the hypercall services available.
>
>
>
> Haven't gone through the series but I wonder whether it's better to
> have a (e)BPF filter for this like seccomp.
No, please. This has to fit in the save/restore model, and should be
under control of the VMM. If you want to filter things using seccomp,
that's fine, but also that's completely orthogonal.
M.
--
Without deviation from the norm, progress is not possible.
On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > The restriction, with which KVM doesn't need to worry about the changes
> > > in the registers after KVM_RUN, could potentially protect or be useful
> > > to protect KVM and simplify future changes/maintenance of the KVM codes
> > > that consumes the values.
> >
> > That sort of protection is definitely welcome, the previously mentioned CPUID mess
> > on x86 would have benefit greatly by KVM being restrictive in the past. That said,
> > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > all too easy for KVM to implicity/indirectly consume state via a different ioctl(),
> > e.g. if there are side effects that are visible in other registers, than an update
> > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > inconsitent.
>
> Thank you for your comments !
> I think I understand your concern, and that's a great point.
> That's not the case for those pseudo registers though at least for now :)
> BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
> same for the option with "if kvm->created_vcpus > 0" ?)
Not really? The goal with created_vcpus is to avoid having inconsistent state in
"struct kvm_vcpu" with respect to the VM as a whole. "struct kvm" obviously can't
be inconsistent with itself, e.g. even if userspace consumes some side effect,
that's simply "the state". Did that make sense? Hard to explain in writing :-)
> > If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> > a relatively common pattern in KVM, and provides a clear boundary to userpace
> > regarding what is/isn't allowed.
>
> Yes, I agree that would be better in general. For (pseudo) registers,
What exactly are these pseudo registers? If it's something that's an immutable
property of the (virtual) system, then it might make sense to use a separate,
non-vCPU mechanism for setting/getting their values. Then you can easily restrict
the <whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.
> I would think preventing modification if kvm->created_vcpus > 0 might
> not be a very good option for KVM/ARM though considering usage of
> KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
On Wed, Jan 19, 2022 at 4:27 PM Sean Christopherson <[email protected]> wrote:
>
> On Tue, Jan 18, 2022, Reiji Watanabe wrote:
> > On Tue, Jan 18, 2022 at 4:07 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > > > The restriction, with which KVM doesn't need to worry about the changes
> > > > in the registers after KVM_RUN, could potentially protect or be useful
> > > > to protect KVM and simplify future changes/maintenance of the KVM codes
> > > > that consumes the values.
> > >
> > > That sort of protection is definitely welcome, the previously mentioned CPUID mess
> > > on x86 would have benefit greatly by KVM being restrictive in the past. That said,
> > > hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> > > Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> > > all too easy for KVM to implicity/indirectly consume state via a different ioctl(),
> > > e.g. if there are side effects that are visible in other registers, than an update
> > > can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> > > modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> > > inconsitent.
> >
> > Thank you for your comments !
> > I think I understand your concern, and that's a great point.
> > That's not the case for those pseudo registers though at least for now :)
> > BTW, is this concern specific to hooking KVM_RUN ? (Wouldn't it be the
> > same for the option with "if kvm->created_vcpus > 0" ?)
>
> Not really? The goal with created_vcpus is to avoid having inconsistent state in
> "struct kvm_vcpu" with respect to the VM as whole. "struct kvm" obvioulsy can't
> be inconsistent with itself, e.g. even if userspace consumes some side effect,
> that's simply "the state". Did that make sense? Hard to explain in writing :-)
>
> > > If possible, preventing modification if kvm->created_vcpus > 0 is ideal as it's
> > > a relatively common pattern in KVM, and provides a clear boundary to userpace
> > > regarding what is/isn't allowed.
> >
> > Yes, I agree that would be better in general. For (pseudo) registers,
>
> What exactly are these pseudo registers? If it's something that's an immutable
> property of the (virtual) system, then it might make sense to use a separate,
> non-vCPU mechanism for setting/getting their values. Then you can easily restrict
> the <whatever> to pre-created_vcpus, e.g. see x86's KVM_SET_IDENTITY_MAP_ADDR.
>
In general, these pseudo-registers are reserved non-architectural
register spaces, currently being used to represent KVM-as-a-firmware's
versioning across guests' migrations [1]. That is, the user-space
configures these registers for the guests to see same 'firmware'
versions before and after migrations. The model is built over the
existing KVM_GET_REG_LIST and KVM_[SET|GET]_ONE_REG APIs. Since this
series' efforts fall into the same realm, the idea was to keep this
consistent with the existing model that VMMs (such as QEMU) are
already used to.
Granted, even though these registers should technically be immutable,
there was no similar protection employed for the existing
pseudo-registers. I was wondering if that could be of any value if we
start providing one; But I guess not, since it may break the
user-space's expectations of KVM (and probably why we didn't have it
earlier).
Regards,
Raghavendra
[1]: https://github.com/torvalds/linux/blob/master/Documentation/virt/kvm/arm/psci.rst
> > I would think preventing modification if kvm->created_vcpus > 0 might
> > not be a very good option for KVM/ARM though considering usage of
> > KVM_GET_REG_LIST and KVM_{G,S}ET_ONE_REG.
It's probably time I jump on this thread
On Thu, 13 Jan 2022 17:21:19 +0000,
Sean Christopherson <[email protected]> wrote:
>
> On Wed, Jan 12, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 11:16 AM Jim Mattson <[email protected]> wrote:
> > > Perhaps it would help if you explained *why* you are doing this. It
> > > sounds like you are either trying to protect against a malicious
> > > userspace, or you are trying to keep userspace from doing something
> > > stupid. In general, kvm only enforces constraints that are necessary
> > > to protect the host. If that's what you're doing, I don't understand
> > > why live migration doesn't provide an end-run around your protections.
> > It's mainly to safeguard the guests. With respect to migration, KVM
> > and the userspace are collectively playing a role here. It's up to the
> > userspace to ensure that the registers are configured the same across
> > migrations and KVM ensures that the userspace doesn't modify the
> > registers after KVM_RUN so that they don't see features turned OFF/ON
> > during execution. I'm not sure if it falls into the definition of
> > protecting the host. Do you see a value in adding this extra
> > protection from KVM?
>
> Short answer: probably not?
Well, I beg to differ.
> There is precedent for disallowing userspace from doing stupid
> things, but that's either for KVM's protection (as Jim pointed out),
> or because KVM can't honor the change, e.g. x86 is currently in the
> process of disallowing most CPUID changes after KVM_RUN because KVM
> itself consumes the CPUID information and KVM doesn't support
> updating some of it's own internal state (because removing features
> like GB hugepage support is nonsensical and would require a large
> pile of complicated, messy code).
We provide quite a lot of CPU emulation based on the feature
registers exposed to the guest. Allowing userspace to change this
stuff once the guest is running is a massive attack vector.
> Restricing CPUID changes does offer some "protection" to the guest, but that's
> not the goal. E.g. KVM won't detect CPUID misconfiguration in the migration
> case, and trying to do so is a fool's errand.
>
> If restricting updates in the arm64 is necessary to ensure KVM provides sane
> behavior, then it could be justified. But if it's purely a sanity check on
> behalf of the guest, then it's not justified.
No. This is about preventing userspace from tripping the kernel into
doing things it really shouldn't by flipping bits of configuration
that should be set in stone.
M.
--
Without deviation from the norm, progress is not possible.
On Wed, 19 Jan 2022 00:07:44 +0000,
Sean Christopherson <[email protected]> wrote:
>
> On Fri, Jan 14, 2022, Reiji Watanabe wrote:
> > The restriction, with which KVM doesn't need to worry about the changes
> > in the registers after KVM_RUN, could potentially protect or be useful
> > to protect KVM and simplify future changes/maintenance of the KVM codes
> > that consumes the values.
>
> That sort of protection is definitely welcome, the previously mentioned CPUID mess
> on x86 would have benefit greatly by KVM being restrictive in the past. That said,
> hooking KVM_RUN is likely the wrong way to go about implementing any restrictions.
> Running a vCPU is where much of the vCPU's state is explicitly consumed, but it's
> all too easy for KVM to implicity/indirectly consume state via a different ioctl(),
> e.g. if there are side effects that are visible in other registers, than an update
> can also be visible to userspace via KVM_{G,S}ET_{S,}REGS, at which point disallowing
> modifying state after KVM_RUN but not after reading/writing regs is arbitrary and
> inconsitent.
>
> If possible, preventing modification if kvm->created_vcpus > 0 is
> ideal as it's a relatively common pattern in KVM, and provides a
> clear boundary to userpace regarding what is/isn't allowed.
No, that's way too late. The configuration is in general per-CPU, and
I really don't want to expand the surface of the userspace API to
allow all sorts of magic tricks depending on the nature of what you
save/restore.
The "first run" crap is already there. We have it on a per-CPU basis,
and we need it at the VM level for other reasons (see the recent
discussion about PMU filtering vs binding to a specific PMU
implementation).
M.
--
Without deviation from the norm, progress is not possible.