Hello,
Continuing the discussion from [1], this series adds support for
userspace to elect the hypercall services that it wishes to expose
to the guest, rather than the guest discovering them unconditionally.
The idea employed by the series was taken from [1], as suggested by
Marc Z.
In a broad sense, the concept is similar to the current implementation
of the PSCI interface: create a 'firmware pseudo-register' to handle the
firmware revisions. The series extends this idea to all the other
hypercalls such as TRNG (True Random Number Generator), PV_TIME
(Paravirtualized Time), and PTP (Precision Time Protocol).
For better categorization and future scaling, these firmware registers
are grouped by service call owner. Also, unlike the existing firmware
pseudo-registers, they hold the supported features in the form of a
bitmap.
During VM initialization, each register holds the upper limit of the
features it supports. Userspace is expected to discover the features
provided by each register via GET_ONE_REG, and write back the desired
values using SET_ONE_REG. KVM allows this modification only until the
VM has started.
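
As a rough illustration of the lifecycle described above, the following
standalone C model mimics the register semantics as seen by userspace:
the register starts at its upper limit, accepts any subset of that limit
before the VM runs, and rejects writes afterwards. This is only a sketch,
not kernel code; struct fw_bmap_reg and the fw_reg_*() helpers are
hypothetical names made up for this example. The error codes and the
order of the checks are assumed to follow kvm_arm_set_fw_reg_bmap() in
patch 2 (-EINVAL for unsupported bits, -EBUSY once the VM has started).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bitmap firmware register (not kernel code). */
struct fw_bmap_reg {
	uint64_t supported;	/* upper limit advertised by KVM */
	uint64_t bmap;		/* features currently elected */
	bool vm_started;	/* set once any vCPU has run */
};

/* Mask with bits [max_bit:0] set, equivalent to GENMASK(max_bit, 0). */
static uint64_t fw_reg_ulimit(unsigned int max_bit)
{
	assert(max_bit < 64);
	return UINT64_MAX >> (63 - max_bit);
}

/* At VM creation the register holds every supported feature. */
static void fw_reg_init(struct fw_bmap_reg *r, unsigned int max_bit)
{
	r->supported = fw_reg_ulimit(max_bit);
	r->bmap = r->supported;
	r->vm_started = false;
}

/* Models a GET_ONE_REG read. */
static uint64_t fw_reg_get(const struct fw_bmap_reg *r)
{
	return r->bmap;
}

/* Models a SET_ONE_REG write; returns 0 or a negative errno value. */
static int fw_reg_set(struct fw_bmap_reg *r, uint64_t val)
{
	if (val & ~r->supported)
		return -EINVAL;	/* unsupported feature bit */
	if (r->vm_started)
		return -EBUSY;	/* too late, the VM has already run */
	r->bmap = val;
	return 0;
}
```

A VMM would typically read the register, mask the value with the set of
features that every host in its migration fleet supports, and write the
result back before running the first vCPU.
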
Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
need not be associated with a feature bit. For such ids, the series
introduces an allowed-list (in kvm_hvc_call_default_allowed()) that holds
all such ids. As a result, any function that is neither elected by
userspace nor part of this allowed-list is denied when the guest
invokes it.
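
The two-level decision described above can be sketched in standalone C
as follows. This is a simplified model under stated assumptions: the
demo_* names and the enum values are placeholders invented for the
example (they are not the real SMCCC function-ids), and the PSCI
validity check that the real default path falls back to is left out.
Only SMCCC_RET_NOT_SUPPORTED (-1) is taken from the SMCCC definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMCCC_RET_NOT_SUPPORTED	(-1L)

/* Placeholder ids for the sketch; NOT the real SMCCC function-id values. */
enum demo_func_id {
	DEMO_SMCCC_VERSION,	/* no feature bit, always allowed */
	DEMO_TRNG_RND64,	/* gated by bit 0 of the standard bitmap */
	DEMO_UNKNOWN,		/* neither elected nor allow-listed */
};

#define DEMO_STD_BIT_TRNG	0

/* Ids that are serviced regardless of the bitmap registers. */
static bool demo_default_allowed(enum demo_func_id id)
{
	switch (id) {
	case DEMO_SMCCC_VERSION:
		return true;
	default:
		return false;
	}
}

/* Bitmap-gated ids are checked first, then the allowed-list. */
static bool demo_call_allowed(uint64_t std_bmap, enum demo_func_id id)
{
	switch (id) {
	case DEMO_TRNG_RND64:
		return std_bmap & (1ULL << DEMO_STD_BIT_TRNG);
	default:
		return demo_default_allowed(id);
	}
}

/* Returns the value the guest would see in its first result register. */
static long demo_hvc(uint64_t std_bmap, enum demo_func_id id)
{
	if (!demo_call_allowed(std_bmap, id))
		return SMCCC_RET_NOT_SUPPORTED;
	return 0;	/* stand-in for "the call was serviced" */
}
```

With the TRNG bit cleared, demo_hvc(0, DEMO_TRNG_RND64) yields
SMCCC_RET_NOT_SUPPORTED, while demo_hvc(0, DEMO_SMCCC_VERSION) is still
serviced because the id is on the allowed-list.
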
Older VMMs can simply ignore this interface and the hypercall services
will be exposed unconditionally to the guests, thus ensuring backward
compatibility.
The patches are based off of mainline kernel 5.18-rc3, with the selftest
patches from [2] applied.
Patch-1 factors out the non-PSCI related interface from psci.c to
hypercalls.c, as the series would extend the list in the upcoming
patches.
Patch-2 sets up the framework for the bitmap firmware pseudo-registers.
It includes read/write support for the registers, and a helper to check
if a particular hypercall service is supported for the guest.
It also adds the register KVM_REG_ARM_STD_BMAP to support ARM's
standard secure services.
Patch-3 introduces the firmware register, KVM_REG_ARM_STD_HYP_BMAP,
which holds the standard hypervisor services (such as PV_TIME).
Patch-4 introduces the firmware register, KVM_REG_ARM_VENDOR_HYP_BMAP,
which holds the vendor specific hypercall services.
Patch-5,6 add the necessary documentation for the newly added firmware
registers.
Patch-7 imports the SMCCC definitions from linux/arm-smccc.h into tools/
for further use in selftests.
Patch-8 adds the selftest to test the guest (using 'hvc') and userspace
interfaces (SET/GET_ONE_REG).
Patch-9 adds these firmware registers into the get-reg-list selftest.
[1]: https://lore.kernel.org/kvmarm/[email protected]/T/
[2]: https://lore.kernel.org/all/[email protected]/
Regards,
Raghavendra
v5 -> v6:
Addressed the comments by Marc and Gavin:
- Bitmaps are represented using 'unsigned long' instead of 'u64' (Marc).
- Replaced the array holding the allowed-list,
hvc_func_default_allowed_list[], which looked up the func_id using a
loop, with a switch-case statement (Marc).
- kvm_arm_set_fw_reg_bmap() now always returns -EBUSY for any 'write' of
the bitmap value after the VM has started running. Documentation is
adjusted accordingly (Marc).
- kvm_psci_func_id_is_valid() is moved from an inline function to
kvm/psci.c (Marc).
- Merged ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID into bit-0 of the vendor
hypervisor firmware register (Gavin).
- Macro optimizations and replace arg0 with arg1 (to comply with KVM
convention) in hypercalls.c selftest (Gavin).
- Dropped the patch v5 10/10 (Add KVM_REG_ARM_FW_REG(3) to get-reg-list)
as it was already uploaded by Andrew.
- Fixed typos
v4 -> v5:
Addressed comments by Oliver (thank you!):
- Rebased the series to accommodate ARM_SMCCC_ARCH_WORKAROUND_3
and PSCI 1.1 changes, and capturing VM's first run.
- Removed the patches related to register scoping (v4 02/13 and
03/13). I plan to re-introduce them in their own series.
- Dropped the patch that captures VM's first run.
- Moved the bitmap feature firmware registers to their own COPROC
space (0x0016).
- Moved the KVM_REG_ARM_*_BIT_MAX definitions from the uapi header
to an internal header (arm_hypercalls.h).
- Renamed the hypercall descriptor to 'struct kvm_smccc_features',
and kvm_hvc_call_supported() to kvm_hvc_call_allowed().
- Introduced an allowed-list to hold the function-ids that aren't
represented by feature-bits.
- Introduced kvm_psci_func_id_is_valid() to check if a given
function-id is a valid PSCI id, which is used in
kvm_hvc_call_allowed().
- Introduced KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT as bit-0 of
KVM_REG_ARM_VENDOR_HYP_BMAP register and
KVM_REG_ARM_VENDOR_HYP_BIT_PTP is moved to bit-1.
- Updated the arm-smccc.h import to include the definition of
ARM_SMCCC_ARCH_WORKAROUND_3.
- Introduced the KVM_REG_ARM_FW_FEAT_BMAP COPROC definition to
get-reg-list selftest.
- Created a new patch to include KVM_REG_ARM_FW_REG(3) in
get-reg-list.
v3 -> v4:
Addressed comments and took suggestions by Reiji, Oliver, Marc,
Sean and Jim:
- Renamed and moved the VM has run once check to arm64.
- Introduced the capability to dynamically modify the register
encodings to include the scope information.
- Replaced mutex_lock with READ_ONCE and WRITE_ONCE when the
bitmaps are accessed.
- The hypercalls selftest re-runs with KVM_CAP_ARM_REG_SCOPE
enabled.
v2 -> v3:
Addressed comments by Marc and Andrew:
- Dropped kvm_vcpu_has_run_once() implementation.
- Redefined kvm_vm_has_run_once() as kvm_vm_has_started() in the core
KVM code that introduces a new field, 'vm_started', to track this.
- KVM_CAP_ARM_HVC_FW_REG_BMAP returns the number of pseudo-firmware
bitmap registers upon a 'read'. Support for 'write' removed.
- Removed redundant spinlock, 'fw_reg_bmap_enabled' fields from the
hypercall descriptor structure.
- A separate sub-struct to hold the bitmap info is removed. The bitmap
info is directly stored in the hypercall descriptor structure
(struct kvm_hvc_desc).
v1 -> v2:
Addressed comments by Oliver (thanks!):
- Introduced kvm_vcpu_has_run_once() and kvm_vm_has_run_once() in the
core kvm code, rather than relying on ARM specific
vcpu->arch.has_run_once.
- Writing to KVM_REG_ARM_PSCI_VERSION is done in hypercalls.c itself,
rather than separating out to psci.c.
- Introduced KVM_CAP_ARM_HVC_FW_REG_BMAP to enable the extension.
- Tracks the register accesses from VMM to decide whether to sanitize
a register or not, as opposed to sanitizing upon the first 'write'
in v1.
- kvm_hvc_call_supported() is implemented using a direct switch-case
statement, instead of looping over all the registers to pick the
register for the function-id.
- Replaced the register bit definitions with #defines, instead of enums.
- Removed the patch v1-06/08 that imports the firmware register
definitions as it's not needed.
- Separated out the documentation into its own patch, and the renaming
of hypercalls.rst to psci.rst into another patch.
- Added the new firmware registers to the get-reg-list KVM selftest.
v1: https://lore.kernel.org/kvmarm/[email protected]/
v2: https://lore.kernel.org/kvmarm/[email protected]/
v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v4: https://lore.kernel.org/lkml/[email protected]/
v5: https://lore.kernel.org/lkml/[email protected]/
Raghavendra Rao Ananta (9):
KVM: arm64: Factor out firmware register handling from psci.c
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM: arm64: Add standard hypervisor firmware register
KVM: arm64: Add vendor hypervisor firmware register
Docs: KVM: Rename psci.rst to hypercalls.rst
Docs: KVM: Add doc for the bitmap firmware registers
tools: Import ARM SMCCC definitions
selftests: KVM: aarch64: Introduce hypercall ABI test
selftests: KVM: aarch64: Add the bitmap firmware registers to
get-reg-list
Documentation/virt/kvm/api.rst | 16 +
Documentation/virt/kvm/arm/hypercalls.rst | 135 +++++++
Documentation/virt/kvm/arm/psci.rst | 77 ----
arch/arm64/include/asm/kvm_host.h | 16 +
arch/arm64/include/uapi/asm/kvm.h | 16 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/guest.c | 10 +-
arch/arm64/kvm/hypercalls.c | 313 +++++++++++++++-
arch/arm64/kvm/psci.c | 186 +---------
include/kvm/arm_hypercalls.h | 17 +
include/kvm/arm_psci.h | 9 +-
tools/include/linux/arm-smccc.h | 193 ++++++++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/get-reg-list.c | 8 +
.../selftests/kvm/aarch64/hypercalls.c | 335 ++++++++++++++++++
16 files changed, 1065 insertions(+), 269 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst
delete mode 100644 Documentation/virt/kvm/arm/psci.rst
create mode 100644 tools/include/linux/arm-smccc.h
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
--
2.36.0.rc2.479.g8af0fa9b8e-goog
Since the doc also covers general hypercalls' details,
rather than just PSCI, and the fact that the bitmap firmware
registers' details will be added to this doc, rename the file
to a more appropriate name: hypercalls.rst.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
Reviewed-by: Oliver Upton <[email protected]>
---
Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} (100%)
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/hypercalls.rst
similarity index 100%
rename from Documentation/virt/kvm/arm/psci.rst
rename to Documentation/virt/kvm/arm/hypercalls.rst
--
2.36.0.rc2.479.g8af0fa9b8e-goog
Introduce a KVM selftest to check the hypercall interface
for arm64 platforms. The test validates the userspace
[GET|SET]_ONE_REG interface to read/write the pseudo-firmware
registers, as well as their effects on the guest under certain
configurations.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/aarch64/hypercalls.c | 335 ++++++++++++++++++
3 files changed, 337 insertions(+)
create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 1bb575dfc42e..b17e464ec661 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -2,6 +2,7 @@
/aarch64/arch_timer
/aarch64/debug-exceptions
/aarch64/get-reg-list
+/aarch64/hypercalls
/aarch64/psci_test
/aarch64/vcpu_width_config
/aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c2cf4d318296..97eef0c03d3b 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
+TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
TEST_GEN_PROGS_aarch64 += aarch64/psci_test
TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
new file mode 100644
index 000000000000..f404343a0ae3
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* hypercalls: Check the ARM64's pseudo-firmware bitmap register interface.
+ *
+ * The test validates the basic hypercall functionalities that are exposed
+ * via the pseudo-firmware bitmap register. This includes the registers'
+ * read/write behavior before and after the VM has started, and if the
+ * hypercalls are properly masked or unmasked to the guest when disabled or
+ * enabled from the KVM userspace, respectively.
+ */
+
+#include <errno.h>
+#include <linux/arm-smccc.h>
+#include <asm/kvm.h>
+#include <kvm_util.h>
+
+#include "processor.h"
+
+#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK(max_feat_bit, 0))
+
+/* Last valid bits of the bitmapped firmware registers */
+#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
+#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0
+#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 1
+
+struct kvm_fw_reg_info {
+ uint64_t reg; /* Register definition */
+ uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
+};
+
+#define FW_REG_INFO(r) \
+ { \
+ .reg = r, \
+ .max_feat_bit = r##_BIT_MAX, \
+ }
+
+static const struct kvm_fw_reg_info fw_reg_info[] = {
+ FW_REG_INFO(KVM_REG_ARM_STD_BMAP),
+ FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP),
+ FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP),
+};
+
+enum test_stage {
+ TEST_STAGE_REG_IFACE,
+ TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
+ TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
+ TEST_STAGE_HVC_IFACE_FALSE_INFO,
+ TEST_STAGE_END,
+};
+
+static int stage = TEST_STAGE_REG_IFACE;
+
+struct test_hvc_info {
+ uint32_t func_id;
+ uint64_t arg1;
+};
+
+#define TEST_HVC_INFO(f, a1) \
+ { \
+ .func_id = f, \
+ .arg1 = a1, \
+ }
+
+static const struct test_hvc_info hvc_info[] = {
+ /* KVM_REG_ARM_STD_BMAP */
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0),
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0),
+
+ /* KVM_REG_ARM_STD_HYP_BMAP */
+ TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_HV_PV_TIME_FEATURES),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_HV_PV_TIME_ST),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0),
+
+ /* KVM_REG_ARM_VENDOR_HYP_BMAP */
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+ ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, 0),
+ TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, KVM_PTP_VIRT_COUNTER),
+};
+
+/* Feed false hypercall info to test the KVM behavior */
+static const struct test_hvc_info false_hvc_info[] = {
+ /* Feature support check against a different family of hypercalls */
+ TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
+ TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_TRNG_RND64),
+ TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_TRNG_RND64),
+};
+
+static void guest_test_hvc(const struct test_hvc_info *hc_info)
+{
+ unsigned int i;
+ struct arm_smccc_res res;
+ unsigned int hvc_info_arr_sz;
+
+ hvc_info_arr_sz =
+ hc_info == hvc_info ? ARRAY_SIZE(hvc_info) : ARRAY_SIZE(false_hvc_info);
+
+ for (i = 0; i < hvc_info_arr_sz; i++, hc_info++) {
+ memset(&res, 0, sizeof(res));
+ smccc_hvc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
+
+ switch (stage) {
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ case TEST_STAGE_HVC_IFACE_FALSE_INFO:
+ GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg1);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
+ res.a0, hc_info->func_id, hc_info->arg1);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+ }
+}
+
+static void guest_code(void)
+{
+ while (stage != TEST_STAGE_END) {
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ guest_test_hvc(hvc_info);
+ break;
+ case TEST_STAGE_HVC_IFACE_FALSE_INFO:
+ guest_test_hvc(false_hvc_info);
+ break;
+ default:
+ GUEST_ASSERT_1(0, stage);
+ }
+
+ GUEST_SYNC(stage);
+ }
+
+ GUEST_DONE();
+}
+
+static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
+{
+ struct kvm_one_reg reg = {
+ .id = id,
+ .addr = (uint64_t)&val,
+ };
+
+ return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
+}
+
+static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
+{
+ struct kvm_one_reg reg = {
+ .id = id,
+ .addr = (uint64_t)addr,
+ };
+
+ vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
+}
+
+struct st_time {
+ uint32_t rev;
+ uint32_t attr;
+ uint64_t st_time;
+};
+
+#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
+#define ST_GPA_BASE (1 << 30)
+
+static void steal_time_init(struct kvm_vm *vm)
+{
+ uint64_t st_ipa = (ulong)ST_GPA_BASE;
+ unsigned int gpages;
+ struct kvm_device_attr dev = {
+ .group = KVM_ARM_VCPU_PVTIME_CTRL,
+ .attr = KVM_ARM_VCPU_PVTIME_IPA,
+ .addr = (uint64_t)&st_ipa,
+ };
+
+ gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
+
+ vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
+}
+
+static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /* First 'read' should be an upper limit of the features supported */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
+ "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
+ reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
+
+ /* Test a 'write' by disabling all the features of the register map */
+ ret = set_fw_reg(vm, reg_info->reg, 0);
+ TEST_ASSERT(ret == 0,
+ "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ reg_info->reg, errno);
+
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
+
+ /*
+ * Test enabling a feature that's not supported.
+ * Avoid this check if all the bits are occupied.
+ */
+ if (reg_info->max_feat_bit < 63) {
+ ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
+ TEST_ASSERT(ret != 0 && errno == EINVAL,
+ "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+ }
+}
+
+static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
+{
+ uint64_t val;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
+ const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
+
+ /*
+ * Before starting the VM, the test clears all the bits.
+ * Check if that's still the case.
+ */
+ get_fw_reg(vm, reg_info->reg, &val);
+ TEST_ASSERT(val == 0,
+ "Expected all the features to be cleared for reg: 0x%lx\n",
+ reg_info->reg);
+
+ /*
+ * Set all the features for this register again. KVM shouldn't
+ * allow this as the VM is running.
+ */
+ ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
+ TEST_ASSERT(ret != 0 && errno == EBUSY,
+ "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
+ errno, reg_info->reg);
+ }
+}
+
+static struct kvm_vm *test_vm_create(void)
+{
+ struct kvm_vm *vm;
+
+ vm = vm_create_default(0, 0, guest_code);
+
+ ucall_init(vm, NULL);
+ steal_time_init(vm);
+
+ return vm;
+}
+
+static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
+{
+ struct kvm_vm *ret_vm = vm;
+
+ pr_debug("Stage: %d\n", stage);
+
+ switch (stage) {
+ case TEST_STAGE_REG_IFACE:
+ test_fw_regs_after_vm_start(vm);
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
+ /* Start a new VM so that all the features are now enabled by default */
+ kvm_vm_free(vm);
+ ret_vm = test_vm_create();
+ break;
+ case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
+ case TEST_STAGE_HVC_IFACE_FALSE_INFO:
+ break;
+ default:
+ TEST_FAIL("Unknown test stage: %d\n", stage);
+ }
+
+ stage++;
+ sync_global_to_guest(ret_vm, stage);
+
+ return ret_vm;
+}
+
+static void test_run(void)
+{
+ struct kvm_vm *vm;
+ struct ucall uc;
+ bool guest_done = false;
+
+ vm = test_vm_create();
+
+ test_fw_regs_before_vm_start(vm);
+
+ while (!guest_done) {
+ vcpu_run(vm, 0);
+
+ switch (get_ucall(vm, 0, &uc)) {
+ case UCALL_SYNC:
+ vm = test_guest_stage(vm);
+ break;
+ case UCALL_DONE:
+ guest_done = true;
+ break;
+ case UCALL_ABORT:
+ TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, 0x%lx; 0x%lx, stage: %u",
+ (const char *)uc.args[0], __FILE__, uc.args[1],
+ uc.args[2], uc.args[3], uc.args[4], stage);
+ break;
+ default:
+ TEST_FAIL("Unexpected guest exit\n");
+ }
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ setbuf(stdout, NULL);
+
+ test_run();
+ return 0;
+}
--
2.36.0.rc2.479.g8af0fa9b8e-goog
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means that guests can see
hypercall services appear and disappear as they migrate across various
host kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, the guest may simply panic.
As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.
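
For illustration, the register ids in the new COPROC space can be
computed as in the standalone sketch below. The DEMO_ prefix marks the
names as local to this example; the constant values are assumed to match
the arm64 KVM uapi header (KVM_REG_ARM64, KVM_REG_SIZE_U64,
KVM_REG_ARM_COPROC_SHIFT) and the KVM_REG_ARM_FW_FEAT_BMAP_REG()
definition added by this patch.

```c
#include <stdint.h>

/* Assumed to mirror the values in the arm64 KVM uapi header. */
#define DEMO_KVM_REG_ARM64		0x6000000000000000ULL
#define DEMO_KVM_REG_SIZE_U64		0x0030000000000000ULL
#define DEMO_KVM_REG_ARM_COPROC_SHIFT	16

/* COPROC space (0x0016) that this patch introduces for the bitmap registers. */
#define DEMO_KVM_REG_ARM_FW_FEAT_BMAP \
	(0x0016ULL << DEMO_KVM_REG_ARM_COPROC_SHIFT)

/* Same composition as KVM_REG_ARM_FW_FEAT_BMAP_REG(r) in the patch. */
static uint64_t demo_fw_feat_bmap_reg(unsigned int r)
{
	return DEMO_KVM_REG_ARM64 | DEMO_KVM_REG_SIZE_U64 |
	       DEMO_KVM_REG_ARM_FW_FEAT_BMAP | (r & 0xffff);
}
```

Under these assumptions, register 0 of the space (KVM_REG_ARM_STD_BMAP
in this patch) encodes to 0x6030000000160000, which is the id a VMM
would pass in struct kvm_one_reg.
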
These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.
During the VM initialization, the registers are set to the upper limit
of the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.
Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
being made available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's neither enabled
via the bitmap nor listed in kvm_hvc_call_default_allowed() will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.
In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for the ARM True Random Number Generator (TRNG) service, with bit-0 of
the register representing the mandatory features of v1.0. Other services
are added in the upcoming patches.
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++
arch/arm64/include/uapi/asm/kvm.h | 9 +++
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/guest.c | 8 ++-
arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
arch/arm64/kvm/psci.c | 13 +++++
include/kvm/arm_hypercalls.h | 6 ++
include/kvm/arm_psci.h | 2 +-
8 files changed, 142 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 94a27a7520f4..df07f4c10197 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,6 +101,15 @@ struct kvm_s2_mmu {
struct kvm_arch_memory_slot {
};
+/**
+ * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests
+ *
+ * @std_bmap: Bitmap of standard secure service calls
+ */
+struct kvm_smccc_features {
+ unsigned long std_bmap;
+};
+
struct kvm_arch {
struct kvm_s2_mmu mmu;
@@ -150,6 +159,9 @@ struct kvm_arch {
u8 pfr0_csv2;
u8 pfr0_csv3;
+
+ /* Hypercall features firmware registers' descriptor */
+ struct kvm_smccc_features smccc_feat;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c1b6ddc02d2f..0b79d2dc6ffd 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
#define KVM_ARM64_SVE_VLS_WORDS \
((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
+/* Bitmap feature firmware registers */
+#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
+ KVM_REG_ARM_FW_FEAT_BMAP | \
+ ((r) & 0xffff))
+
+#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
+#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
+
/* Device Control API: ARM VGIC */
#define KVM_DEV_ARM_VGIC_GRP_ADDR 0
#define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..a37fadbd617e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
set_default_spectre(kvm);
+ kvm_arm_init_hypercalls(kvm);
return ret;
out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0d5cca56cbda..8c607199cad1 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
- case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
+ case KVM_REG_ARM_FW:
+ case KVM_REG_ARM_FW_FEAT_BMAP:
+ return kvm_arm_get_fw_reg(vcpu, reg);
case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
}
@@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
- case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
+ case KVM_REG_ARM_FW:
+ case KVM_REG_ARM_FW_FEAT_BMAP:
+ return kvm_arm_set_fw_reg(vcpu, reg);
case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
}
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index fa6d9378d8e7..df55a04d2fe8 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
val[3] = lower_32_bits(cycles);
}
+static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
+{
+ return test_bit(feat_bit, reg_bmap);
+}
+
+static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
+{
+ switch (func_id) {
+ /*
+ * List of function-ids that are not gated with the bitmapped feature
+ * firmware registers, and are to be allowed for servicing the call by default.
+ */
+ case ARM_SMCCC_VERSION_FUNC_ID:
+ case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
+ case ARM_SMCCC_HV_PV_TIME_FEATURES:
+ case ARM_SMCCC_HV_PV_TIME_ST:
+ case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+ case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+ return true;
+ default:
+ return kvm_psci_func_id_is_valid(vcpu, func_id);
+ }
+}
+
+static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
+{
+ struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
+
+ switch (func_id) {
+ case ARM_SMCCC_TRNG_VERSION:
+ case ARM_SMCCC_TRNG_FEATURES:
+ case ARM_SMCCC_TRNG_GET_UUID:
+ case ARM_SMCCC_TRNG_RND32:
+ case ARM_SMCCC_TRNG_RND64:
+ return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
+ KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+ default:
+ return kvm_hvc_call_default_allowed(vcpu, func_id);
+ }
+}
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
u32 func_id = smccc_get_function(vcpu);
@@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
u32 feature;
gpa_t gpa;
+ if (!kvm_hvc_call_allowed(vcpu, func_id))
+ goto out;
+
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
val[0] = ARM_SMCCC_VERSION_1_1;
@@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
return kvm_psci_call(vcpu);
}
+out:
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
@@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
+ KVM_REG_ARM_STD_BMAP,
};
+void kvm_arm_init_hypercalls(struct kvm *kvm)
+{
+ struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
+
+ smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
+}
+
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
{
return ARRAY_SIZE(kvm_arm_fw_reg_ids);
@@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
+ struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
void __user *uaddr = (void __user *)(long)reg->addr;
u64 val;
@@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
break;
+ case KVM_REG_ARM_STD_BMAP:
+ val = READ_ONCE(smccc_feat->std_bmap);
+ break;
default:
return -ENOENT;
}
@@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return 0;
}
+static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
+{
+ int ret = 0;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
+ unsigned long *fw_reg_bmap, fw_reg_features;
+
+ switch (reg_id) {
+ case KVM_REG_ARM_STD_BMAP:
+ fw_reg_bmap = &smccc_feat->std_bmap;
+ fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
+ break;
+ default:
+ return -ENOENT;
+ }
+
+ /* Check for unsupported bit */
+ if (val & ~fw_reg_features)
+ return -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ /* Return -EBUSY if the VM (any vCPU) has already started running. */
+ if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ WRITE_ONCE(*fw_reg_bmap, val);
+out:
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
void __user *uaddr = (void __user *)(long)reg->addr;
@@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return -EINVAL;
return 0;
+ case KVM_REG_ARM_STD_BMAP:
+ return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
default:
return -ENOENT;
}
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 346535169faa..67d1273e8086 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
}
}
+
+bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
+{
+ /* PSCI 0.1 doesn't comply with the standard SMCCC */
+ if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
+ return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
+
+ if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
+ ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
+ return true;
+
+ return false;
+}
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 5d38628a8d04..499b45b607b6 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,6 +6,11 @@
#include <asm/kvm_emulate.h>
+/* Last valid bits of the bitmapped firmware registers */
+#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
+
+#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
@@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
struct kvm_one_reg;
+void kvm_arm_init_hypercalls(struct kvm *kvm);
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 6e55b9283789..c47be3e26965 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
return KVM_ARM_PSCI_0_1;
}
-
int kvm_psci_call(struct kvm_vcpu *vcpu);
+bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
#endif /* __KVM_ARM_PSCI_H__ */
--
2.36.0.rc2.479.g8af0fa9b8e-goog
Hi Raghu,
On Fri, Apr 22, 2022 at 5:03 PM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> KVM regularly introduces new hypercall services to the guests without
> any consent from the userspace. This means, the guests can observe
> hypercall services in and out as they migrate across various host
> kernel versions. This could be a major problem if the guest
> discovered a hypercall, started using it, and after getting migrated
> to an older kernel realizes that it's no longer available. Depending
> on how the guest handles the change, there's a potential chance that
> the guest would just panic.
>
> As a result, there's a need for the userspace to elect the services
> that it wishes the guest to discover. It can elect these services
> based on the kernels spread across its (migration) fleet. To remedy
> this, extend the existing firmware pseudo-registers, such as
> KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> for all the hypercall services available.
>
> These firmware registers are categorized based on the service call
> owners, but unlike the existing firmware pseudo-registers, they hold
> the features supported in the form of a bitmap.
>
> During the VM initialization, the registers are set to upper-limit of
> the features supported by the corresponding registers. It's expected
> that the VMMs discover the features provided by each register via
> GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> KVM allows this modification only until the VM has started.
>
> Some of the standard features are not mapped to any bits of the
> registers. But since they can recreate the original problem of
> making it available without userspace's consent, they need to
> be explicitly added to the case-list in
> kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
>
> Older userspace code can simply ignore the feature and the
> hypercall services will be exposed unconditionally to the guests,
> thus ensuring backward compatibility.
>
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++
> arch/arm64/include/uapi/asm/kvm.h | 9 +++
> arch/arm64/kvm/arm.c | 1 +
> arch/arm64/kvm/guest.c | 8 ++-
> arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> arch/arm64/kvm/psci.c | 13 +++++
> include/kvm/arm_hypercalls.h | 6 ++
> include/kvm/arm_psci.h | 2 +-
> 8 files changed, 142 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 94a27a7520f4..df07f4c10197 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> struct kvm_arch_memory_slot {
> };
>
> +/**
> + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> + *
> + * @std_bmap: Bitmap of standard secure service calls
> + */
> +struct kvm_smccc_features {
> + unsigned long std_bmap;
> +};
> +
> struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> @@ -150,6 +159,9 @@ struct kvm_arch {
>
> u8 pfr0_csv2;
> u8 pfr0_csv3;
> +
> + /* Hypercall features firmware registers' descriptor */
> + struct kvm_smccc_features smccc_feat;
> };
>
> struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index c1b6ddc02d2f..0b79d2dc6ffd 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> #define KVM_ARM64_SVE_VLS_WORDS \
> ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
>
> +/* Bitmap feature firmware registers */
> +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> + KVM_REG_ARM_FW_FEAT_BMAP | \
> + ((r) & 0xffff))
> +
> +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> +
> /* Device Control API: ARM VGIC */
> #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 523bc934fe2f..a37fadbd617e 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
>
> set_default_spectre(kvm);
> + kvm_arm_init_hypercalls(kvm);
>
> return ret;
> out_free_stage2_pgd:
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 0d5cca56cbda..8c607199cad1 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>
> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> + case KVM_REG_ARM_FW:
> + case KVM_REG_ARM_FW_FEAT_BMAP:
> + return kvm_arm_get_fw_reg(vcpu, reg);
> case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> }
>
> @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>
> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> + case KVM_REG_ARM_FW:
> + case KVM_REG_ARM_FW_FEAT_BMAP:
> + return kvm_arm_set_fw_reg(vcpu, reg);
> case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> }
>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index fa6d9378d8e7..df55a04d2fe8 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> +{
> + return test_bit(feat_bit, reg_bmap);
> +}
> +
> +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + switch (func_id) {
> + /*
> + * List of function-ids that are not gated with the bitmapped feature
> + * firmware registers, and are to be allowed for servicing the call by default.
> + */
> + case ARM_SMCCC_VERSION_FUNC_ID:
> + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> + case ARM_SMCCC_HV_PV_TIME_ST:
> + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> + return true;
> + default:
> + return kvm_psci_func_id_is_valid(vcpu, func_id);
> + }
> +}
> +
> +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> +
> + switch (func_id) {
> + case ARM_SMCCC_TRNG_VERSION:
> + case ARM_SMCCC_TRNG_FEATURES:
> + case ARM_SMCCC_TRNG_GET_UUID:
> + case ARM_SMCCC_TRNG_RND32:
> + case ARM_SMCCC_TRNG_RND64:
> + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> + default:
> + return kvm_hvc_call_default_allowed(vcpu, func_id);
> + }
> +}
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> u32 feature;
> gpa_t gpa;
>
> + if (!kvm_hvc_call_allowed(vcpu, func_id))
> + goto out;
> +
> switch (func_id) {
> case ARM_SMCCC_VERSION_FUNC_ID:
> val[0] = ARM_SMCCC_VERSION_1_1;
> @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> return kvm_psci_call(vcpu);
> }
>
> +out:
> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> return 1;
> }
> @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> + KVM_REG_ARM_STD_BMAP,
> };
>
> +void kvm_arm_init_hypercalls(struct kvm *kvm)
> +{
> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> +
> + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> +}
> +
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> {
> return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
>
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> void __user *uaddr = (void __user *)(long)reg->addr;
> u64 val;
>
> @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> break;
> + case KVM_REG_ARM_STD_BMAP:
> + val = READ_ONCE(smccc_feat->std_bmap);
> + break;
> default:
> return -ENOENT;
> }
> @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return 0;
> }
>
> +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> +{
> + int ret = 0;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> + unsigned long *fw_reg_bmap, fw_reg_features;
> +
> + switch (reg_id) {
> + case KVM_REG_ARM_STD_BMAP:
> + fw_reg_bmap = &smccc_feat->std_bmap;
> + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> + break;
> + default:
> + return -ENOENT;
> + }
> +
> + /* Check for unsupported bit */
> + if (val & ~fw_reg_features)
> + return -EINVAL;
> +
> + mutex_lock(&kvm->lock);
Why not check whether the register value would actually change before
taking the lock? If it would not, there is nothing to do.
That would help reduce unnecessary serialization during live migration
(even without the vm-scoped register capability).
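To illustrate the suggestion, here is a minimal userspace sketch, not
kernel code. The names vm_model, STD_FEATURES and set_std_bmap() are
hypothetical stand-ins for kvm->arch.smccc_feat,
KVM_ARM_SMCCC_STD_FEATURES and kvm_arm_set_fw_reg_bmap(), and a counter
stands in for kvm->lock so the fast path is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM-wide state touched by the patch. */
struct vm_model {
	uint64_t std_bmap;		/* models smccc_feat->std_bmap */
	bool has_ran_once;		/* models KVM_ARCH_FLAG_HAS_RAN_ONCE */
	unsigned int lock_taken;	/* counts how often the "lock" was needed */
};

/* Models KVM_ARM_SMCCC_STD_FEATURES, which is bit 0 only in this patch. */
#define STD_FEATURES 0x1ULL

static int set_std_bmap(struct vm_model *vm, uint64_t val)
{
	int ret = 0;

	assert(vm);

	/* Check for unsupported bits, as in the patch. */
	if (val & ~STD_FEATURES)
		return -EINVAL;

	/* Suggested fast path: the value is unchanged, so skip the lock. */
	if (vm->std_bmap == val)
		return 0;

	vm->lock_taken++;	/* stands in for mutex_lock(&kvm->lock) */
	if (vm->has_ran_once)
		ret = -EBUSY;
	else
		vm->std_bmap = val;
	/* mutex_unlock(&kvm->lock) would go here */

	return ret;
}
```

Note that with this fast path, writing back the current value succeeds
even after the VM has started; only an actual change is rejected with
-EBUSY.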
> +
> + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> + ret = -EBUSY;
> + goto out;
> + }
Just to confirm: are you sure that the existing userspace you know of
never issues KVM_RUN on any vCPU until KVM_SET_ONE_REG has completed
for all vCPUs (including during migration)?
> +
> + WRITE_ONCE(*fw_reg_bmap, val);
> +out:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> void __user *uaddr = (void __user *)(long)reg->addr;
> @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return -EINVAL;
>
> return 0;
> + case KVM_REG_ARM_STD_BMAP:
> + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> default:
> return -ENOENT;
> }
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 346535169faa..67d1273e8086 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> return -EINVAL;
> }
> }
> +
> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> +
> + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> + return true;
For PSCI 0.1, the function checks whether the func_id is valid for
the vCPU (according to the vCPU's PSCI version).
For the other PSCI versions, the function ignores the vCPU's PSCI
version: although the supported functions depend on the PSCI version
and not all ids in the range are defined yet, the code returns true
as long as the function id falls within the reserved PSCI function
id range.
So, the behavior appears to be inconsistent.
Shouldn't it also return the validity of the function id according
to the vCPU's PSCI version in the non-PSCI 0.1 case?
(Otherwise, shouldn't it return true if the function id is valid
for any of the PSCI versions?)
Thanks,
Reiji
> +
> + return false;
> +}
> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> index 5d38628a8d04..499b45b607b6 100644
> --- a/include/kvm/arm_hypercalls.h
> +++ b/include/kvm/arm_hypercalls.h
> @@ -6,6 +6,11 @@
>
> #include <asm/kvm_emulate.h>
>
> +/* Last valid bits of the bitmapped firmware registers */
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> +
> +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>
> static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
>
> struct kvm_one_reg;
>
> +void kvm_arm_init_hypercalls(struct kvm *kvm);
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> index 6e55b9283789..c47be3e26965 100644
> --- a/include/kvm/arm_psci.h
> +++ b/include/kvm/arm_psci.h
> @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> return KVM_ARM_PSCI_0_1;
> }
>
> -
> int kvm_psci_call(struct kvm_vcpu *vcpu);
> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
>
> #endif /* __KVM_ARM_PSCI_H__ */
> --
> 2.36.0.rc2.479.g8af0fa9b8e-goog
>
Hi Reiji,
On Sun, Apr 24, 2022 at 9:52 PM Reiji Watanabe <[email protected]> wrote:
>
> Hi Raghu,
>
> On Fri, Apr 22, 2022 at 5:03 PM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the userspace. This means, the guests can observe
> > hypercall services in and out as they migrate across various host
> > kernel versions. This could be a major problem if the guest
> > discovered a hypercall, started using it, and after getting migrated
> > to an older kernel realizes that it's no longer available. Depending
> > on how the guest handles the change, there's a potential chance that
> > the guest would just panic.
> >
> > As a result, there's a need for the userspace to elect the services
> > that it wishes the guest to discover. It can elect these services
> > based on the kernels spread across its (migration) fleet. To remedy
> > this, extend the existing firmware pseudo-registers, such as
> > KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> > for all the hypercall services available.
> >
> > These firmware registers are categorized based on the service call
> > owners, but unlike the existing firmware pseudo-registers, they hold
> > the features supported in the form of a bitmap.
> >
> > During the VM initialization, the registers are set to upper-limit of
> > the features supported by the corresponding registers. It's expected
> > that the VMMs discover the features provided by each register via
> > GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> > KVM allows this modification only until the VM has started.
> >
> > Some of the standard features are not mapped to any bits of the
> > registers. But since they can recreate the original problem of
> > making it available without userspace's consent, they need to
> > be explicitly added to the case-list in
> > kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> > via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> > be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
> >
> > Older userspace code can simply ignore the feature and the
> > hypercall services will be exposed unconditionally to the guests,
> > thus ensuring backward compatibility.
> >
> > In this patch, the framework adds the register only for ARM's standard
> > secure services (owner value 4). Currently, this includes support only
> > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > register representing mandatory features of v1.0. Other services are
> > momentarily added in the upcoming patches.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > arch/arm64/include/uapi/asm/kvm.h | 9 +++
> > arch/arm64/kvm/arm.c | 1 +
> > arch/arm64/kvm/guest.c | 8 ++-
> > arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> > arch/arm64/kvm/psci.c | 13 +++++
> > include/kvm/arm_hypercalls.h | 6 ++
> > include/kvm/arm_psci.h | 2 +-
> > 8 files changed, 142 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 94a27a7520f4..df07f4c10197 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> > struct kvm_arch_memory_slot {
> > };
> >
> > +/**
> > + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> > + *
> > + * @std_bmap: Bitmap of standard secure service calls
> > + */
> > +struct kvm_smccc_features {
> > + unsigned long std_bmap;
> > +};
> > +
> > struct kvm_arch {
> > struct kvm_s2_mmu mmu;
> >
> > @@ -150,6 +159,9 @@ struct kvm_arch {
> >
> > u8 pfr0_csv2;
> > u8 pfr0_csv3;
> > +
> > + /* Hypercall features firmware registers' descriptor */
> > + struct kvm_smccc_features smccc_feat;
> > };
> >
> > struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index c1b6ddc02d2f..0b79d2dc6ffd 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> > #define KVM_ARM64_SVE_VLS_WORDS \
> > ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
> >
> > +/* Bitmap feature firmware registers */
> > +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> > +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> > + KVM_REG_ARM_FW_FEAT_BMAP | \
> > + ((r) & 0xffff))
> > +
> > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> > +
> > /* Device Control API: ARM VGIC */
> > #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> > #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 523bc934fe2f..a37fadbd617e 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> >
> > set_default_spectre(kvm);
> > + kvm_arm_init_hypercalls(kvm);
> >
> > return ret;
> > out_free_stage2_pgd:
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 0d5cca56cbda..8c607199cad1 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> > - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> > + case KVM_REG_ARM_FW:
> > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > + return kvm_arm_get_fw_reg(vcpu, reg);
> > case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> > }
> >
> > @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> > - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> > + case KVM_REG_ARM_FW:
> > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > + return kvm_arm_set_fw_reg(vcpu, reg);
> > case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> > }
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index fa6d9378d8e7..df55a04d2fe8 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > val[3] = lower_32_bits(cycles);
> > }
> >
> > +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> > +{
> > + return test_bit(feat_bit, reg_bmap);
> > +}
> > +
> > +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + switch (func_id) {
> > + /*
> > + * List of function-ids that are not gated with the bitmapped feature
> > + * firmware registers, and are to be allowed for servicing the call by default.
> > + */
> > + case ARM_SMCCC_VERSION_FUNC_ID:
> > + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> > + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> > + case ARM_SMCCC_HV_PV_TIME_ST:
> > + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> > + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > + return true;
> > + default:
> > + return kvm_psci_func_id_is_valid(vcpu, func_id);
> > + }
> > +}
> > +
> > +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > +
> > + switch (func_id) {
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND32:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > + default:
> > + return kvm_hvc_call_default_allowed(vcpu, func_id);
> > + }
> > +}
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > {
> > u32 func_id = smccc_get_function(vcpu);
> > @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > u32 feature;
> > gpa_t gpa;
> >
> > + if (!kvm_hvc_call_allowed(vcpu, func_id))
> > + goto out;
> > +
> > switch (func_id) {
> > case ARM_SMCCC_VERSION_FUNC_ID:
> > val[0] = ARM_SMCCC_VERSION_1_1;
> > @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > return kvm_psci_call(vcpu);
> > }
> >
> > +out:
> > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > return 1;
> > }
> > @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> > + KVM_REG_ARM_STD_BMAP,
> > };
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > +{
> > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > +
> > + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> > +}
> > +
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > {
> > return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
> >
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > u64 val;
> >
> > @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > break;
> > + case KVM_REG_ARM_STD_BMAP:
> > + val = READ_ONCE(smccc_feat->std_bmap);
> > + break;
> > default:
> > return -ENOENT;
> > }
> > @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return 0;
> > }
> >
> > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > +{
> > + int ret = 0;
> > + struct kvm *kvm = vcpu->kvm;
> > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > + unsigned long *fw_reg_bmap, fw_reg_features;
> > +
> > + switch (reg_id) {
> > + case KVM_REG_ARM_STD_BMAP:
> > + fw_reg_bmap = &smccc_feat->std_bmap;
> > + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> > + break;
> > + default:
> > + return -ENOENT;
> > + }
> > +
> > + /* Check for unsupported bit */
> > + if (val & ~fw_reg_features)
> > + return -EINVAL;
> > +
> > + mutex_lock(&kvm->lock);
>
> Why don't you check if the register value will be modified before
> getting the lock ? (then there is nothing to do)
> It would help reduce unnecessary serialization for live migration
> (even without the vm-scoped register capability).
>
That was the case until v5. Since v6, once the VM has started we
return -EBUSY regardless of the incoming value. See Marc's comments
in [1].
>
>
> > +
> > + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> > + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
>
> I just would like to make sure that you are sure that existing
> userspace you know will not run KVM_RUN for any vCPUs until
> KVM_SET_ONE_REG is complete for all vCPUs (even for migration),
> correct ?
>
Since v6, that is something we are leaving to userspace to
synchronize. See [1].
>
> > +
> > + WRITE_ONCE(*fw_reg_bmap, val);
> > +out:
> > + mutex_unlock(&kvm->lock);
> > + return ret;
> > +}
> > +
> > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return -EINVAL;
> >
> > return 0;
> > + case KVM_REG_ARM_STD_BMAP:
> > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > default:
> > return -ENOENT;
> > }
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 346535169faa..67d1273e8086 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> > return -EINVAL;
> > }
> > }
> > +
> > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> > + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> > + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> > +
> > + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> > + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> > + return true;
>
> For PSCI 0.1, the function checks if the funct_id is valid for
> the vCPU (according to the vCPU's PSCI version).
> For other version of PSCI, the function doesn't care the vCPU's
> PSCI version (although supported functions depend on the PSCI
> version and not all of them are defined yet, the code returns
> true as long as the function id is within the reserved PSCI
> function id range).
> So, the behavior appears to be inconsistent.
> Shouldn't it return the validity of the function id according
> to the vCPU's psci version for non-PSCI 0.1 case as well ?
> (Otherwise, shouldn't it return true if the function id is valid
> for any of the PSCI versions ?)
>
Well, PSCI 0.1 is somewhat of an odd implementation. It doesn't comply
with the SMCCC, and hence needs some special handling. KVM currently
supports only two of its func_ids, and we just check for each. The second
'if' statement covers all PSCI versions >= 0.2. Thankfully, the
specification defines a range of acceptable PSCI func_ids.
If it's confusing, I can add a comment above the second 'if' stating
that it's for all PSCI versions >= 0.2.
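As a rough standalone sketch of the two cases (userspace C, not the
kernel code): the macro values below mirror the kernel's SMCCC and PSCI
0.1 definitions as I understand them, and psci_func_id_is_valid() with a
bool parameter is a hypothetical simplification of
kvm_psci_func_id_is_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the SMCCC function-id layout used by the kernel. */
#define SMCCC_OWNER_SHIFT	24
#define SMCCC_OWNER_MASK	0x3f
#define SMCCC_FUNC_MASK		0xffff
#define SMCCC_OWNER_STANDARD	4

/* Assumed to match KVM's PSCI 0.1 function ids. */
#define PSCI_0_1_FN_BASE	0x95c1ba5eU
#define PSCI_0_1_FN_CPU_OFF	(PSCI_0_1_FN_BASE + 1)
#define PSCI_0_1_FN_CPU_ON	(PSCI_0_1_FN_BASE + 2)

static bool psci_func_id_is_valid(bool is_psci_0_1, uint32_t func_id)
{
	uint32_t owner = (func_id >> SMCCC_OWNER_SHIFT) & SMCCC_OWNER_MASK;
	uint32_t func_num = func_id & SMCCC_FUNC_MASK;

	/* PSCI 0.1 doesn't comply with the standard SMCCC */
	if (is_psci_0_1)
		return func_id == PSCI_0_1_FN_CPU_OFF ||
		       func_id == PSCI_0_1_FN_CPU_ON;

	/* PSCI >= 0.2: standard-owner calls numbered 0x00-0x1f */
	return owner == SMCCC_OWNER_STANDARD && func_num <= 0x1f;
}
```

The sketch drops the '>= 0' comparison, since the masked function
number is unsigned and that test is always true.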
> Thanks,
> Reiji
>
Thank you.
Raghavendra
[1]: https://lore.kernel.org/lkml/[email protected]/
>
>
> > +
> > + return false;
> > +}
> > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > index 5d38628a8d04..499b45b607b6 100644
> > --- a/include/kvm/arm_hypercalls.h
> > +++ b/include/kvm/arm_hypercalls.h
> > @@ -6,6 +6,11 @@
> >
> > #include <asm/kvm_emulate.h>
> >
> > +/* Last valid bits of the bitmapped firmware registers */
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> > +
> > +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> >
> > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> >
> > struct kvm_one_reg;
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > index 6e55b9283789..c47be3e26965 100644
> > --- a/include/kvm/arm_psci.h
> > +++ b/include/kvm/arm_psci.h
> > @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> > return KVM_ARM_PSCI_0_1;
> > }
> >
> > -
> > int kvm_psci_call(struct kvm_vcpu *vcpu);
> > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
> >
> > #endif /* __KVM_ARM_PSCI_H__ */
> > --
> > 2.36.0.rc2.479.g8af0fa9b8e-goog
> >
Hi Raghavendra,
On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
> KVM regularly introduces new hypercall services to the guests without
> any consent from the userspace. This means, the guests can observe
> hypercall services in and out as they migrate across various host
> kernel versions. This could be a major problem if the guest
> discovered a hypercall, started using it, and after getting migrated
> to an older kernel realizes that it's no longer available. Depending
> on how the guest handles the change, there's a potential chance that
> the guest would just panic.
>
> As a result, there's a need for the userspace to elect the services
> that it wishes the guest to discover. It can elect these services
> based on the kernels spread across its (migration) fleet. To remedy
> this, extend the existing firmware pseudo-registers, such as
> KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> for all the hypercall services available.
>
> These firmware registers are categorized based on the service call
> owners, but unlike the existing firmware pseudo-registers, they hold
> the features supported in the form of a bitmap.
>
> During the VM initialization, the registers are set to upper-limit of
> the features supported by the corresponding registers. It's expected
> that the VMMs discover the features provided by each register via
> GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> KVM allows this modification only until the VM has started.
>
> Some of the standard features are not mapped to any bits of the
> registers. But since they can recreate the original problem of
> making it available without userspace's consent, they need to
> be explicitly added to the case-list in
> kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
>
> Older userspace code can simply ignore the feature and the
> hypercall services will be exposed unconditionally to the guests,
> thus ensuring backward compatibility.
>
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++
> arch/arm64/include/uapi/asm/kvm.h | 9 +++
> arch/arm64/kvm/arm.c | 1 +
> arch/arm64/kvm/guest.c | 8 ++-
> arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> arch/arm64/kvm/psci.c | 13 +++++
> include/kvm/arm_hypercalls.h | 6 ++
> include/kvm/arm_psci.h | 2 +-
> 8 files changed, 142 insertions(+), 3 deletions(-)
>
Some nits below; please consider addressing them if you need another
respin.
Reviewed-by: Gavin Shan <[email protected]>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 94a27a7520f4..df07f4c10197 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> struct kvm_arch_memory_slot {
> };
>
> +/**
> + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> + *
> + * @std_bmap: Bitmap of standard secure service calls
> + */
> +struct kvm_smccc_features {
> + unsigned long std_bmap;
> +};
> +
s/Descriptor/Descriptor of
> struct kvm_arch {
> struct kvm_s2_mmu mmu;
>
> @@ -150,6 +159,9 @@ struct kvm_arch {
>
> u8 pfr0_csv2;
> u8 pfr0_csv3;
> +
> + /* Hypercall features firmware registers' descriptor */
> + struct kvm_smccc_features smccc_feat;
> };
>
> struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index c1b6ddc02d2f..0b79d2dc6ffd 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> #define KVM_ARM64_SVE_VLS_WORDS \
> ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
>
> +/* Bitmap feature firmware registers */
> +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> + KVM_REG_ARM_FW_FEAT_BMAP | \
> + ((r) & 0xffff))
> +
> +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> +
> /* Device Control API: ARM VGIC */
> #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 523bc934fe2f..a37fadbd617e 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
>
> set_default_spectre(kvm);
> + kvm_arm_init_hypercalls(kvm);
>
> return ret;
> out_free_stage2_pgd:
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 0d5cca56cbda..8c607199cad1 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>
> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> + case KVM_REG_ARM_FW:
> + case KVM_REG_ARM_FW_FEAT_BMAP:
> + return kvm_arm_get_fw_reg(vcpu, reg);
> case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> }
>
> @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>
> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> + case KVM_REG_ARM_FW:
> + case KVM_REG_ARM_FW_FEAT_BMAP:
> + return kvm_arm_set_fw_reg(vcpu, reg);
> case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> }
>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index fa6d9378d8e7..df55a04d2fe8 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> +{
> + return test_bit(feat_bit, reg_bmap);
> +}
> +
It might be worth marking this 'inline', since the function would be
called frequently.
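For illustration only, a minimal userspace sketch of what an explicitly
inlined helper could look like. The shift-and-mask body is a stand-in for
the kernel's test_bit(), the range check is an addition for the sketch,
and the name fw_reg_feat_enabled() is hypothetical; none of this is the
code from the patch.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the helper under review. It handles a
 * single-word bitmap only and does not use the kernel's test_bit().
 */
static inline bool fw_reg_feat_enabled(const unsigned long *reg_bmap,
				       unsigned long feat_bit)
{
	/* Bits beyond the word are treated as never enabled. */
	if (feat_bit >= sizeof(*reg_bmap) * CHAR_BIT)
		return false;

	return (*reg_bmap >> feat_bit) & 1UL;
}
```

With a function this small, a compiler would likely inline it at -O2 with
or without the keyword, so the annotation mostly documents intent.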
> +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + switch (func_id) {
> + /*
> + * List of function-ids that are not gated with the bitmapped feature
> + * firmware registers, and are to be allowed for servicing the call by default.
> + */
> + case ARM_SMCCC_VERSION_FUNC_ID:
> + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> + case ARM_SMCCC_HV_PV_TIME_ST:
> + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> + return true;
> + default:
> + return kvm_psci_func_id_is_valid(vcpu, func_id);
> + }
> +}
> +
> +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> +
> + switch (func_id) {
> + case ARM_SMCCC_TRNG_VERSION:
> + case ARM_SMCCC_TRNG_FEATURES:
> + case ARM_SMCCC_TRNG_GET_UUID:
> + case ARM_SMCCC_TRNG_RND32:
> + case ARM_SMCCC_TRNG_RND64:
> + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> + default:
> + return kvm_hvc_call_default_allowed(vcpu, func_id);
> + }
> +}
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> u32 feature;
> gpa_t gpa;
>
> + if (!kvm_hvc_call_allowed(vcpu, func_id))
> + goto out;
> +
> switch (func_id) {
> case ARM_SMCCC_VERSION_FUNC_ID:
> val[0] = ARM_SMCCC_VERSION_1_1;
> @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> return kvm_psci_call(vcpu);
> }
>
> +out:
> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> return 1;
> }
> @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> + KVM_REG_ARM_STD_BMAP,
> };
>
> +void kvm_arm_init_hypercalls(struct kvm *kvm)
> +{
> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> +
> + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> +}
> +
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> {
> return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
>
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> void __user *uaddr = (void __user *)(long)reg->addr;
> u64 val;
>
> @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> break;
> + case KVM_REG_ARM_STD_BMAP:
> + val = READ_ONCE(smccc_feat->std_bmap);
> + break;
> default:
> return -ENOENT;
> }
> @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return 0;
> }
>
> +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> +{
> + int ret = 0;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> + unsigned long *fw_reg_bmap, fw_reg_features;
> +
> + switch (reg_id) {
> + case KVM_REG_ARM_STD_BMAP:
> + fw_reg_bmap = &smccc_feat->std_bmap;
> + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> + break;
> + default:
> + return -ENOENT;
> + }
> +
> + /* Check for unsupported bit */
> + if (val & ~fw_reg_features)
> + return -EINVAL;
> +
> + mutex_lock(&kvm->lock);
> +
> + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + WRITE_ONCE(*fw_reg_bmap, val);
> +out:
> + mutex_unlock(&kvm->lock);
> + return ret;
> +}
> +
> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> void __user *uaddr = (void __user *)(long)reg->addr;
> @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> return -EINVAL;
>
> return 0;
> + case KVM_REG_ARM_STD_BMAP:
> + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> default:
> return -ENOENT;
> }
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 346535169faa..67d1273e8086 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> return -EINVAL;
> }
> }
> +
> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> +
> + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> + return true;
> +
> + return false;
> +}
> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> index 5d38628a8d04..499b45b607b6 100644
> --- a/include/kvm/arm_hypercalls.h
> +++ b/include/kvm/arm_hypercalls.h
> @@ -6,6 +6,11 @@
>
> #include <asm/kvm_emulate.h>
>
> +/* Last valid bits of the bitmapped firmware registers */
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> +
> +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> +
s/bits of/bit of
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>
> static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
>
> struct kvm_one_reg;
>
> +void kvm_arm_init_hypercalls(struct kvm *kvm);
> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> index 6e55b9283789..c47be3e26965 100644
> --- a/include/kvm/arm_psci.h
> +++ b/include/kvm/arm_psci.h
> @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> return KVM_ARM_PSCI_0_1;
> }
>
> -
> int kvm_psci_call(struct kvm_vcpu *vcpu);
> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
>
> #endif /* __KVM_ARM_PSCI_H__ */
>
Thanks,
Gavin
Hi Raghavendra,
On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
> Introduce a KVM selftest to check the hypercall interface
> for arm64 platforms. The test validates the user-space'
> [GET|SET]_ONE_REG interface to read/write the psuedo-firmware
> registers as well as its effects on the guest upon certain
> configurations.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> tools/testing/selftests/kvm/.gitignore | 1 +
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/aarch64/hypercalls.c | 335 ++++++++++++++++++
> 3 files changed, 337 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
>
There are comments about @false_hvc_info[] and some nits, as below.
Please evaluate and improve if it makes sense to you. Otherwise, it
looks good to me:
Reviewed-by: Gavin Shan <[email protected]>
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 1bb575dfc42e..b17e464ec661 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -2,6 +2,7 @@
> /aarch64/arch_timer
> /aarch64/debug-exceptions
> /aarch64/get-reg-list
> +/aarch64/hypercalls
> /aarch64/psci_test
> /aarch64/vcpu_width_config
> /aarch64/vgic_init
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index c2cf4d318296..97eef0c03d3b 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
> TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
> TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
> TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> +TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
> TEST_GEN_PROGS_aarch64 += aarch64/psci_test
> TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
> TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
> diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
> new file mode 100644
> index 000000000000..f404343a0ae3
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
> @@ -0,0 +1,335 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/* hypercalls: Check the ARM64's psuedo-firmware bitmap register interface.
> + *
> + * The test validates the basic hypercall functionalities that are exposed
> + * via the psuedo-firmware bitmap register. This includes the registers'
> + * read/write behavior before and after the VM has started, and if the
> + * hypercalls are properly masked or unmasked to the guest when disabled or
> + * enabled from the KVM userspace, respectively.
> + */
> +
> +#include <errno.h>
> +#include <linux/arm-smccc.h>
> +#include <asm/kvm.h>
> +#include <kvm_util.h>
> +
> +#include "processor.h"
> +
> +#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK(max_feat_bit, 0))
> +
> +/* Last valid bits of the bitmapped firmware registers */
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> +#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0
> +#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 1
> +
> +struct kvm_fw_reg_info {
> + uint64_t reg; /* Register definition */
> + uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
> +};
> +
> +#define FW_REG_INFO(r) \
> + { \
> + .reg = r, \
> + .max_feat_bit = r##_BIT_MAX, \
> + }
> +
> +static const struct kvm_fw_reg_info fw_reg_info[] = {
> + FW_REG_INFO(KVM_REG_ARM_STD_BMAP),
> + FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP),
> + FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP),
> +};
> +
> +enum test_stage {
> + TEST_STAGE_REG_IFACE,
> + TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
> + TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
> + TEST_STAGE_HVC_IFACE_FALSE_INFO,
> + TEST_STAGE_END,
> +};
> +
> +static int stage = TEST_STAGE_REG_IFACE;
> +
> +struct test_hvc_info {
> + uint32_t func_id;
> + uint64_t arg1;
> +};
> +
> +#define TEST_HVC_INFO(f, a1) \
> + { \
> + .func_id = f, \
> + .arg1 = a1, \
> + }
> +
> +static const struct test_hvc_info hvc_info[] = {
> + /* KVM_REG_ARM_STD_BMAP */
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0),
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64),
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0),
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0),
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0),
> +
> + /* KVM_REG_ARM_STD_HYP_BMAP */
> + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_HV_PV_TIME_FEATURES),
> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_HV_PV_TIME_ST),
> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0),
> +
> + /* KVM_REG_ARM_VENDOR_HYP_BMAP */
> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
> + ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, 0),
> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, KVM_PTP_VIRT_COUNTER),
> +};
> +
> +/* Feed false hypercall info to test the KVM behavior */
> +static const struct test_hvc_info false_hvc_info[] = {
> + /* Feature support check against a different family of hypercalls */
> + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
> + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_TRNG_RND64),
> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_TRNG_RND64),
> +};
> +
I don't see much benefit in @false_hvc_info[], because its test case
always gets NOT_SUPPORTED back. I think the array and its test case
can be removed, if you agree. I'm not sure whether it was suggested
by somebody else.
> +static void guest_test_hvc(const struct test_hvc_info *hc_info)
> +{
> + unsigned int i;
> + struct arm_smccc_res res;
> + unsigned int hvc_info_arr_sz;
> +
> + hvc_info_arr_sz =
> + hc_info == hvc_info ? ARRAY_SIZE(hvc_info) : ARRAY_SIZE(false_hvc_info);
> +
> + for (i = 0; i < hvc_info_arr_sz; i++, hc_info++) {
> + memset(&res, 0, sizeof(res));
> + smccc_hvc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
> +
> + switch (stage) {
> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> + GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
> + res.a0, hc_info->func_id, hc_info->arg1);
> + break;
> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> + GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
> + res.a0, hc_info->func_id, hc_info->arg1);
> + break;
> + default:
> + GUEST_ASSERT_1(0, stage);
> + }
> + }
> +}
> +
> +static void guest_code(void)
> +{
> + while (stage != TEST_STAGE_END) {
> + switch (stage) {
> + case TEST_STAGE_REG_IFACE:
> + break;
> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> + guest_test_hvc(hvc_info);
> + break;
> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> + guest_test_hvc(false_hvc_info);
> + break;
> + default:
> + GUEST_ASSERT_1(0, stage);
> + }
> +
> + GUEST_SYNC(stage);
> + }
> +
> + GUEST_DONE();
> +}
> +
> +static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
> +{
> + struct kvm_one_reg reg = {
> + .id = id,
> + .addr = (uint64_t)&val,
> + };
> +
> + return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
> +}
> +
> +static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
> +{
> + struct kvm_one_reg reg = {
> + .id = id,
> + .addr = (uint64_t)addr,
> + };
> +
> + vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
> +}
> +
> +struct st_time {
> + uint32_t rev;
> + uint32_t attr;
> + uint64_t st_time;
> +};
> +
> +#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
> +#define ST_GPA_BASE (1 << 30)
> +
> +static void steal_time_init(struct kvm_vm *vm)
> +{
> + uint64_t st_ipa = (ulong)ST_GPA_BASE;
> + unsigned int gpages;
> + struct kvm_device_attr dev = {
> + .group = KVM_ARM_VCPU_PVTIME_CTRL,
> + .attr = KVM_ARM_VCPU_PVTIME_IPA,
> + .addr = (uint64_t)&st_ipa,
> + };
> +
> + gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
> +
> + vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
> +}
> +
> +static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
> +{
> + uint64_t val;
> + unsigned int i;
> + int ret;
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
> + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
> +
> + /* First 'read' should be an upper limit of the features supported */
> + get_fw_reg(vm, reg_info->reg, &val);
> + TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
> + "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
> + reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
> +
> + /* Test a 'write' by disabling all the features of the register map */
> + ret = set_fw_reg(vm, reg_info->reg, 0);
> + TEST_ASSERT(ret == 0,
> + "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
> + reg_info->reg, errno);
> +
> + get_fw_reg(vm, reg_info->reg, &val);
> + TEST_ASSERT(val == 0,
> + "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
> +
> + /*
> + * Test enabling a feature that's not supported.
> + * Avoid this check if all the bits are occupied.
> + */
> + if (reg_info->max_feat_bit < 63) {
> + ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
> + TEST_ASSERT(ret != 0 && errno == EINVAL,
> + "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
> + errno, reg_info->reg);
> + }
> + }
> +}
Just in case, the test could try every unsupported bit at once rather
than only the next one :)
ret = set_fw_reg(vm, reg_info->reg, GENMASK(63, reg_info->max_feat_bit + 1));
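To make the difference concrete, here is a hedged userspace sketch of
the masks involved. FW_GENMASK64() is a local macro written for this
example (assumed to behave like the kernel's GENMASK for 64-bit values),
and the fw_reg_*() helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local 64-bit mask helper for the sketch; requires 0 <= l <= h <= 63. */
#define FW_GENMASK64(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Bits the register supports: 0..max_feat_bit. */
static inline uint64_t fw_reg_supported_mask(unsigned int max_feat_bit)
{
	return FW_GENMASK64(max_feat_bit, 0);
}

/* Every bit above max_feat_bit; only meaningful when max_feat_bit < 63. */
static inline uint64_t fw_reg_unsupported_mask(unsigned int max_feat_bit)
{
	return FW_GENMASK64(63, max_feat_bit + 1);
}

/* Models the "val & ~fw_reg_features" rejection in the quoted patch. */
static inline bool fw_reg_write_is_valid(uint64_t val, unsigned int max_feat_bit)
{
	return (val & ~fw_reg_supported_mask(max_feat_bit)) == 0;
}
```

For max_feat_bit == 0, BIT(1) probes a single unsupported bit, while the
full mask covers bits 1 through 63, so a check that wrongly ignored the
higher bits would also be caught.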
> +
> +static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
> +{
> + uint64_t val;
> + unsigned int i;
> + int ret;
> +
> + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
> + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
> +
> + /*
> + * Before starting the VM, the test clears all the bits.
> + * Check if that's still the case.
> + */
> + get_fw_reg(vm, reg_info->reg, &val);
> + TEST_ASSERT(val == 0,
> + "Expected all the features to be cleared for reg: 0x%lx\n",
> + reg_info->reg);
> +
> + /*
> + * Set all the features for this register again. KVM shouldn't
> + * allow this as the VM is running.
> + */
> + ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
> + TEST_ASSERT(ret != 0 && errno == EBUSY,
> + "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
> + errno, reg_info->reg);
> + }
> +}
> +
I guess you want to check that -EBUSY is returned. In that case,
the comment here could be clearer, something like the one below
that emphasizes '-EBUSY'.
/*
* Once the VM has run, -EBUSY should be returned on any attempt
* to set features. Check that the correct errno is returned.
*/
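The behavior being tested can be summarized with a small userspace
model. This is only a sketch: struct fw_reg_model and fw_reg_model_set()
are hypothetical names, the has_run field stands in for the
KVM_ARCH_FLAG_HAS_RAN_ONCE check in the quoted patch, and locking is
left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bitmap firmware register. */
struct fw_reg_model {
	uint64_t supported;	/* upper limit of features */
	uint64_t bmap;		/* features currently exposed to the guest */
	bool has_run;		/* set once any vCPU has run */
};

static inline int fw_reg_model_set(struct fw_reg_model *m, uint64_t val)
{
	/* Unsupported bits are rejected first, as in the quoted patch. */
	if (val & ~m->supported)
		return -EINVAL;

	/* After the VM has started, the register becomes read-only. */
	if (m->has_run)
		return -EBUSY;

	m->bmap = val;
	return 0;
}
```

In this model a write of a supported value succeeds before the first
run and yields -EBUSY afterwards, which is the errno the test asserts.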
> +static struct kvm_vm *test_vm_create(void)
> +{
> + struct kvm_vm *vm;
> +
> + vm = vm_create_default(0, 0, guest_code);
> +
> + ucall_init(vm, NULL);
> + steal_time_init(vm);
> +
> + return vm;
> +}
> +
> +static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
> +{
> + struct kvm_vm *ret_vm = vm;
> +
> + pr_debug("Stage: %d\n", stage);
> +
> + switch (stage) {
> + case TEST_STAGE_REG_IFACE:
> + test_fw_regs_after_vm_start(vm);
> + break;
> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> + /* Start a new VM so that all the features are now enabled by default */
> + kvm_vm_free(vm);
> + ret_vm = test_vm_create();
> + break;
> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> + break;
> + default:
> + TEST_FAIL("Unknown test stage: %d\n", stage);
> + }
> +
> + stage++;
> + sync_global_to_guest(vm, stage);
> +
> + return ret_vm;
> +}
> +
> +static void test_run(void)
> +{
> + struct kvm_vm *vm;
> + struct ucall uc;
> + bool guest_done = false;
> +
> + vm = test_vm_create();
> +
> + test_fw_regs_before_vm_start(vm);
> +
> + while (!guest_done) {
> + vcpu_run(vm, 0);
> +
> + switch (get_ucall(vm, 0, &uc)) {
> + case UCALL_SYNC:
> + vm = test_guest_stage(vm);
> + break;
> + case UCALL_DONE:
> + guest_done = true;
> + break;
> + case UCALL_ABORT:
> + TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, 0x%lx; 0x%lx, stage: %u",
> + (const char *)uc.args[0], __FILE__, uc.args[1],
> + uc.args[2], uc.args[3], uc.args[4], stage);
> + break;
> + default:
> + TEST_FAIL("Unexpected guest exit\n");
> + }
> + }
> +
> + kvm_vm_free(vm);
> +}
> +
> +int main(void)
> +{
> + setbuf(stdout, NULL);
> +
> + test_run();
> + return 0;
> +}
>
Thanks,
Gavin
Hi Raghavendra,
On 4/27/22 12:44 AM, Raghavendra Rao Ananta wrote:
> On Mon, Apr 25, 2022 at 11:34 PM Gavin Shan <[email protected]> wrote:
>> On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
>>> KVM regularly introduces new hypercall services to the guests without
>>> any consent from the userspace. This means, the guests can observe
>>> hypercall services in and out as they migrate across various host
>>> kernel versions. This could be a major problem if the guest
>>> discovered a hypercall, started using it, and after getting migrated
>>> to an older kernel realizes that it's no longer available. Depending
>>> on how the guest handles the change, there's a potential chance that
>>> the guest would just panic.
>>>
>>> As a result, there's a need for the userspace to elect the services
>>> that it wishes the guest to discover. It can elect these services
>>> based on the kernels spread across its (migration) fleet. To remedy
>>> this, extend the existing firmware pseudo-registers, such as
>>> KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
>>> for all the hypercall services available.
>>>
>>> These firmware registers are categorized based on the service call
>>> owners, but unlike the existing firmware pseudo-registers, they hold
>>> the features supported in the form of a bitmap.
>>>
>>> During the VM initialization, the registers are set to upper-limit of
>>> the features supported by the corresponding registers. It's expected
>>> that the VMMs discover the features provided by each register via
>>> GET_ONE_REG, and write back the desired values using SET_ONE_REG.
>>> KVM allows this modification only until the VM has started.
>>>
>>> Some of the standard features are not mapped to any bits of the
>>> registers. But since they can recreate the original problem of
>>> making it available without userspace's consent, they need to
>>> be explicitly added to the case-list in
>>> kvm_hvc_call_default_allowed(). Any function-id that's not enabled
>>> via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
>>> be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
>>>
>>> Older userspace code can simply ignore the feature and the
>>> hypercall services will be exposed unconditionally to the guests,
>>> thus ensuring backward compatibility.
>>>
>>> In this patch, the framework adds the register only for ARM's standard
>>> secure services (owner value 4). Currently, this includes support only
>>> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
>>> register representing mandatory features of v1.0. Other services are
>>> momentarily added in the upcoming patches.
>>>
>>> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
>>> ---
>>> arch/arm64/include/asm/kvm_host.h | 12 ++++
>>> arch/arm64/include/uapi/asm/kvm.h | 9 +++
>>> arch/arm64/kvm/arm.c | 1 +
>>> arch/arm64/kvm/guest.c | 8 ++-
>>> arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
>>> arch/arm64/kvm/psci.c | 13 +++++
>>> include/kvm/arm_hypercalls.h | 6 ++
>>> include/kvm/arm_psci.h | 2 +-
>>> 8 files changed, 142 insertions(+), 3 deletions(-)
>>>
>>
>> Some nits as below, please consider to improve if you need another
>> respin.
>>
>> Reviewed-by: Gavin Shan <[email protected]>
>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 94a27a7520f4..df07f4c10197 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
>>> struct kvm_arch_memory_slot {
>>> };
>>>
>>> +/**
>>> + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
>>> + *
>>> + * @std_bmap: Bitmap of standard secure service calls
>>> + */
>>> +struct kvm_smccc_features {
>>> + unsigned long std_bmap;
>>> +};
>>> +
>>
>> s/Descriptor/Descriptor of
>>
> Nice catch!
>
>>> struct kvm_arch {
>>> struct kvm_s2_mmu mmu;
>>>
>>> @@ -150,6 +159,9 @@ struct kvm_arch {
>>>
>>> u8 pfr0_csv2;
>>> u8 pfr0_csv3;
>>> +
>>> + /* Hypercall features firmware registers' descriptor */
>>> + struct kvm_smccc_features smccc_feat;
>>> };
>>>
>>> struct kvm_vcpu_fault_info {
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>> index c1b6ddc02d2f..0b79d2dc6ffd 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
>>> #define KVM_ARM64_SVE_VLS_WORDS \
>>> ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
>>>
>>> +/* Bitmap feature firmware registers */
>>> +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
>>> +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>> + KVM_REG_ARM_FW_FEAT_BMAP | \
>>> + ((r) & 0xffff))
>>> +
>>> +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
>>> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
>>> +
>>> /* Device Control API: ARM VGIC */
>>> #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
>>> #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>> index 523bc934fe2f..a37fadbd617e 100644
>>> --- a/arch/arm64/kvm/arm.c
>>> +++ b/arch/arm64/kvm/arm.c
>>> @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>> kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
>>>
>>> set_default_spectre(kvm);
>>> + kvm_arm_init_hypercalls(kvm);
>>>
>>> return ret;
>>> out_free_stage2_pgd:
>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>> index 0d5cca56cbda..8c607199cad1 100644
>>> --- a/arch/arm64/kvm/guest.c
>>> +++ b/arch/arm64/kvm/guest.c
>>> @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>>
>>> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
>>> case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
>>> - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
>>> + case KVM_REG_ARM_FW:
>>> + case KVM_REG_ARM_FW_FEAT_BMAP:
>>> + return kvm_arm_get_fw_reg(vcpu, reg);
>>> case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
>>> }
>>>
>>> @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>>
>>> switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
>>> case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
>>> - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
>>> + case KVM_REG_ARM_FW:
>>> + case KVM_REG_ARM_FW_FEAT_BMAP:
>>> + return kvm_arm_set_fw_reg(vcpu, reg);
>>> case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
>>> }
>>>
>>> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
>>> index fa6d9378d8e7..df55a04d2fe8 100644
>>> --- a/arch/arm64/kvm/hypercalls.c
>>> +++ b/arch/arm64/kvm/hypercalls.c
>>> @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
>>> val[3] = lower_32_bits(cycles);
>>> }
>>>
>>> +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
>>> +{
>>> + return test_bit(feat_bit, reg_bmap);
>>> +}
>>> +
>>
>> Might be worhty to be 'inline'. This function would be called
>> frequently.
>>
> I was hoping the compiler would optimize it for us as needed.
>
Yeah, GCC is smart enough. It can inline the function even when that
isn't requested explicitly. I guess __always_inline could be used here,
but it doesn't seem to be used broadly within kvm/arm64. Anyway, it's
not a big deal in this specific case :)
>>> +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
>>> +{
>>> + switch (func_id) {
>>> + /*
>>> + * List of function-ids that are not gated with the bitmapped feature
>>> + * firmware registers, and are to be allowed for servicing the call by default.
>>> + */
>>> + case ARM_SMCCC_VERSION_FUNC_ID:
>>> + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
>>> + case ARM_SMCCC_HV_PV_TIME_FEATURES:
>>> + case ARM_SMCCC_HV_PV_TIME_ST:
>>> + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
>>> + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>>> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
>>> + return true;
>>> + default:
>>> + return kvm_psci_func_id_is_valid(vcpu, func_id);
>>> + }
>>> +}
>>> +
>>> +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
>>> +{
>>> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
>>> +
>>> + switch (func_id) {
>>> + case ARM_SMCCC_TRNG_VERSION:
>>> + case ARM_SMCCC_TRNG_FEATURES:
>>> + case ARM_SMCCC_TRNG_GET_UUID:
>>> + case ARM_SMCCC_TRNG_RND32:
>>> + case ARM_SMCCC_TRNG_RND64:
>>> + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
>>> + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
>>> + default:
>>> + return kvm_hvc_call_default_allowed(vcpu, func_id);
>>> + }
>>> +}
>>> +
>>> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>> {
>>> u32 func_id = smccc_get_function(vcpu);
>>> @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>> u32 feature;
>>> gpa_t gpa;
>>>
>>> + if (!kvm_hvc_call_allowed(vcpu, func_id))
>>> + goto out;
>>> +
>>> switch (func_id) {
>>> case ARM_SMCCC_VERSION_FUNC_ID:
>>> val[0] = ARM_SMCCC_VERSION_1_1;
>>> @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>> return kvm_psci_call(vcpu);
>>> }
>>>
>>> +out:
>>> smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
>>> return 1;
>>> }
>>> @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
>>> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
>>> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
>>> KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
>>> + KVM_REG_ARM_STD_BMAP,
>>> };
>>>
>>> +void kvm_arm_init_hypercalls(struct kvm *kvm)
>>> +{
>>> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
>>> +
>>> + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
>>> +}
>>> +
>>> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
>>> {
>>> return ARRAY_SIZE(kvm_arm_fw_reg_ids);
>>> @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
>>>
>>> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>> {
>>> + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
>>> void __user *uaddr = (void __user *)(long)reg->addr;
>>> u64 val;
>>>
>>> @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>> case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
>>> val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
>>> break;
>>> + case KVM_REG_ARM_STD_BMAP:
>>> + val = READ_ONCE(smccc_feat->std_bmap);
>>> + break;
>>> default:
>>> return -ENOENT;
>>> }
>>> @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>> return 0;
>>> }
>>>
>>> +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
>>> +{
>>> + int ret = 0;
>>> + struct kvm *kvm = vcpu->kvm;
>>> + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
>>> + unsigned long *fw_reg_bmap, fw_reg_features;
>>> +
>>> + switch (reg_id) {
>>> + case KVM_REG_ARM_STD_BMAP:
>>> + fw_reg_bmap = &smccc_feat->std_bmap;
>>> + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
>>> + break;
>>> + default:
>>> + return -ENOENT;
>>> + }
>>> +
>>> + /* Check for unsupported bit */
>>> + if (val & ~fw_reg_features)
>>> + return -EINVAL;
>>> +
>>> + mutex_lock(&kvm->lock);
>>> +
>>> + /* Return -EBUSY if the VM (any vCPU) has already started running. */
>>> + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
>>> + ret = -EBUSY;
>>> + goto out;
>>> + }
>>> +
>>> + WRITE_ONCE(*fw_reg_bmap, val);
>>> +out:
>>> + mutex_unlock(&kvm->lock);
>>> + return ret;
>>> +}
>>> +
>>> int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>> {
>>> void __user *uaddr = (void __user *)(long)reg->addr;
>>> @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>> return -EINVAL;
>>>
>>> return 0;
>>> + case KVM_REG_ARM_STD_BMAP:
>>> + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
>>> default:
>>> return -ENOENT;
>>> }
>>> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
>>> index 346535169faa..67d1273e8086 100644
>>> --- a/arch/arm64/kvm/psci.c
>>> +++ b/arch/arm64/kvm/psci.c
>>> @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
>>> return -EINVAL;
>>> }
>>> }
>>> +
>>> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
>>> +{
>>> + /* PSCI 0.1 doesn't comply with the standard SMCCC */
>>> + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
>>> + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
>>> +
>>> + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
>>> + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
>>> + return true;
>>> +
>>> + return false;
>>> +}
>>> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
>>> index 5d38628a8d04..499b45b607b6 100644
>>> --- a/include/kvm/arm_hypercalls.h
>>> +++ b/include/kvm/arm_hypercalls.h
>>> @@ -6,6 +6,11 @@
>>>
>>> #include <asm/kvm_emulate.h>
>>>
>>> +/* Last valid bits of the bitmapped firmware registers */
>>> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
>>> +
>>> +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
>>> +
>>
>> s/bits of/bit of
>>
> Great catch again!
>
>>> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>>>
>>> static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
>>> @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
>>>
>>> struct kvm_one_reg;
>>>
>>> +void kvm_arm_init_hypercalls(struct kvm *kvm);
>>> int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
>>> int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
>>> int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
>>> index 6e55b9283789..c47be3e26965 100644
>>> --- a/include/kvm/arm_psci.h
>>> +++ b/include/kvm/arm_psci.h
>>> @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
>>> return KVM_ARM_PSCI_0_1;
>>> }
>>>
>>> -
>>> int kvm_psci_call(struct kvm_vcpu *vcpu);
>>> +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
>>>
>>> #endif /* __KVM_ARM_PSCI_H__ */
>>>
>>
>> Thanks,
>> Gavin
>>
>
> Thanks for the review, Gavin.
>
No worries :)
Thanks,
Gavin
Hi Raghu,
On Mon, Apr 25, 2022 at 9:46 AM Raghavendra Rao Ananta
<[email protected]> wrote:
>
> Hi Reiji,
>
> On Sun, Apr 24, 2022 at 9:52 PM Reiji Watanabe <[email protected]> wrote:
> >
> > Hi Raghu,
> >
> > On Fri, Apr 22, 2022 at 5:03 PM Raghavendra Rao Ananta
> > <[email protected]> wrote:
> > >
> > > KVM regularly introduces new hypercall services to the guests without
> > > any consent from the userspace. This means, the guests can observe
> > > hypercall services in and out as they migrate across various host
> > > kernel versions. This could be a major problem if the guest
> > > discovered a hypercall, started using it, and after getting migrated
> > > to an older kernel realizes that it's no longer available. Depending
> > > on how the guest handles the change, there's a potential chance that
> > > the guest would just panic.
> > >
> > > As a result, there's a need for the userspace to elect the services
> > > that it wishes the guest to discover. It can elect these services
> > > based on the kernels spread across its (migration) fleet. To remedy
> > > this, extend the existing firmware pseudo-registers, such as
> > > KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> > > for all the hypercall services available.
> > >
> > > These firmware registers are categorized based on the service call
> > > owners, but unlike the existing firmware pseudo-registers, they hold
> > > the features supported in the form of a bitmap.
> > >
> > > During the VM initialization, the registers are set to upper-limit of
> > > the features supported by the corresponding registers. It's expected
> > > that the VMMs discover the features provided by each register via
> > > GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> > > KVM allows this modification only until the VM has started.
> > >
> > > Some of the standard features are not mapped to any bits of the
> > > registers. But since they can recreate the original problem of
> > > making it available without userspace's consent, they need to
> > > be explicitly added to the case-list in
> > > kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> > > via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> > > be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
> > >
> > > Older userspace code can simply ignore the feature and the
> > > hypercall services will be exposed unconditionally to the guests,
> > > thus ensuring backward compatibility.
> > >
> > > In this patch, the framework adds the register only for ARM's standard
> > > secure services (owner value 4). Currently, this includes support only
> > > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > > register representing mandatory features of v1.0. Other services are
> > > momentarily added in the upcoming patches.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > ---
> > > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > > arch/arm64/include/uapi/asm/kvm.h | 9 +++
> > > arch/arm64/kvm/arm.c | 1 +
> > > arch/arm64/kvm/guest.c | 8 ++-
> > > arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> > > arch/arm64/kvm/psci.c | 13 +++++
> > > include/kvm/arm_hypercalls.h | 6 ++
> > > include/kvm/arm_psci.h | 2 +-
> > > 8 files changed, 142 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 94a27a7520f4..df07f4c10197 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> > > struct kvm_arch_memory_slot {
> > > };
> > >
> > > +/**
> > > + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> > > + *
> > > + * @std_bmap: Bitmap of standard secure service calls
> > > + */
> > > +struct kvm_smccc_features {
> > > + unsigned long std_bmap;
> > > +};
> > > +
> > > struct kvm_arch {
> > > struct kvm_s2_mmu mmu;
> > >
> > > @@ -150,6 +159,9 @@ struct kvm_arch {
> > >
> > > u8 pfr0_csv2;
> > > u8 pfr0_csv3;
> > > +
> > > + /* Hypercall features firmware registers' descriptor */
> > > + struct kvm_smccc_features smccc_feat;
> > > };
> > >
> > > struct kvm_vcpu_fault_info {
> > > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > > index c1b6ddc02d2f..0b79d2dc6ffd 100644
> > > --- a/arch/arm64/include/uapi/asm/kvm.h
> > > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > > @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> > > #define KVM_ARM64_SVE_VLS_WORDS \
> > > ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
> > >
> > > +/* Bitmap feature firmware registers */
> > > +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> > > +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> > > + KVM_REG_ARM_FW_FEAT_BMAP | \
> > > + ((r) & 0xffff))
> > > +
> > > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> > > +
> > > /* Device Control API: ARM VGIC */
> > > #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> > > #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 523bc934fe2f..a37fadbd617e 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> > >
> > > set_default_spectre(kvm);
> > > + kvm_arm_init_hypercalls(kvm);
> > >
> > > return ret;
> > > out_free_stage2_pgd:
> > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > index 0d5cca56cbda..8c607199cad1 100644
> > > --- a/arch/arm64/kvm/guest.c
> > > +++ b/arch/arm64/kvm/guest.c
> > > @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > >
> > > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > > case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> > > - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> > > + case KVM_REG_ARM_FW:
> > > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > > + return kvm_arm_get_fw_reg(vcpu, reg);
> > > case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> > > }
> > >
> > > @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > >
> > > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > > case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> > > - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> > > + case KVM_REG_ARM_FW:
> > > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > > + return kvm_arm_set_fw_reg(vcpu, reg);
> > > case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> > > }
> > >
> > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > index fa6d9378d8e7..df55a04d2fe8 100644
> > > --- a/arch/arm64/kvm/hypercalls.c
> > > +++ b/arch/arm64/kvm/hypercalls.c
> > > @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > > val[3] = lower_32_bits(cycles);
> > > }
> > >
> > > +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> > > +{
> > > + return test_bit(feat_bit, reg_bmap);
> > > +}
> > > +
> > > +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > > +{
> > > + switch (func_id) {
> > > + /*
> > > + * List of function-ids that are not gated with the bitmapped feature
> > > + * firmware registers, and are to be allowed for servicing the call by default.
> > > + */
> > > + case ARM_SMCCC_VERSION_FUNC_ID:
> > > + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> > > + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> > > + case ARM_SMCCC_HV_PV_TIME_ST:
> > > + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> > > + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > > + return true;
> > > + default:
> > > + return kvm_psci_func_id_is_valid(vcpu, func_id);
> > > + }
> > > +}
> > > +
> > > +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > > +{
> > > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > > +
> > > + switch (func_id) {
> > > + case ARM_SMCCC_TRNG_VERSION:
> > > + case ARM_SMCCC_TRNG_FEATURES:
> > > + case ARM_SMCCC_TRNG_GET_UUID:
> > > + case ARM_SMCCC_TRNG_RND32:
> > > + case ARM_SMCCC_TRNG_RND64:
> > > + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> > > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > > + default:
> > > + return kvm_hvc_call_default_allowed(vcpu, func_id);
> > > + }
> > > +}
> > > +
> > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > {
> > > u32 func_id = smccc_get_function(vcpu);
> > > @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > u32 feature;
> > > gpa_t gpa;
> > >
> > > + if (!kvm_hvc_call_allowed(vcpu, func_id))
> > > + goto out;
> > > +
> > > switch (func_id) {
> > > case ARM_SMCCC_VERSION_FUNC_ID:
> > > val[0] = ARM_SMCCC_VERSION_1_1;
> > > @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > return kvm_psci_call(vcpu);
> > > }
> > >
> > > +out:
> > > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > > return 1;
> > > }
> > > @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> > > + KVM_REG_ARM_STD_BMAP,
> > > };
> > >
> > > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > > +{
> > > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > > +
> > > + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> > > +}
> > > +
> > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > > {
> > > return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > > @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
> > >
> > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > {
> > > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > u64 val;
> > >
> > > @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> > > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > > break;
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + val = READ_ONCE(smccc_feat->std_bmap);
> > > + break;
> > > default:
> > > return -ENOENT;
> > > }
> > > @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > return 0;
> > > }
> > >
> > > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > > +{
> > > + int ret = 0;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > > + unsigned long *fw_reg_bmap, fw_reg_features;
> > > +
> > > + switch (reg_id) {
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + fw_reg_bmap = &smccc_feat->std_bmap;
> > > + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> > > + break;
> > > + default:
> > > + return -ENOENT;
> > > + }
> > > +
> > > + /* Check for unsupported bit */
> > > + if (val & ~fw_reg_features)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&kvm->lock);
> >
> > Why don't you check if the register value will be modified before
> > getting the lock ? (then there is nothing to do)
> > It would help reduce unnecessary serialization for live migration
> > (even without the vm-scoped register capability).
> >
> That was the case until v5. Since v6, we return -EBUSY unconditionally
> regardless of the incoming value. See Marc's comments in [1].
Even with that, the function could do the following to avoid
the unnecessary serialization.
(I would expect the function to mostly return before taking the lock.)

	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags))
		return -EBUSY;

	if (val == *fw_reg_bmap)
		return 0;

	mutex_lock(&kvm->lock);
	<...>
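
To spell out what I have in mind, here is a rough userspace model of the
idea. It is not kernel code: struct vm_model, vm_model_init() and
set_std_bmap() are made-up names, a pthread mutex stands in for kvm->lock,
and C11 atomics stand in for the flag test and READ_ONCE/WRITE_ONCE. The
point is only the shape: reject or short-circuit before taking the lock,
then re-check the "has run" state once the lock is held, since a vCPU may
have started in between.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM-wide state the patch touches. */
struct vm_model {
	pthread_mutex_t lock;      /* plays the role of kvm->lock */
	atomic_bool has_ran_once;  /* plays the role of KVM_ARCH_FLAG_HAS_RAN_ONCE */
	_Atomic uint64_t std_bmap; /* plays the role of smccc_feat->std_bmap */
};

/* Only bit 0 (TRNG v1.0) is defined in this patch. */
#define STD_FEATURES UINT64_C(0x1)

static void vm_model_init(struct vm_model *vm)
{
	pthread_mutex_init(&vm->lock, NULL);
	atomic_init(&vm->has_ran_once, false);
	/* Registers start at the upper limit of supported features. */
	atomic_init(&vm->std_bmap, STD_FEATURES);
}

static int set_std_bmap(struct vm_model *vm, uint64_t val)
{
	int ret = 0;

	/* Reject unsupported bits; no shared state is needed for this. */
	if (val & ~STD_FEATURES)
		return -EINVAL;

	/* Lock-free fast paths: VM already running, or nothing to change. */
	if (atomic_load(&vm->has_ran_once))
		return -EBUSY;
	if (val == atomic_load(&vm->std_bmap))
		return 0;

	pthread_mutex_lock(&vm->lock);
	/* Re-check under the lock: the VM may have started since the fast path. */
	if (atomic_load(&vm->has_ran_once))
		ret = -EBUSY;
	else
		atomic_store(&vm->std_bmap, val);
	pthread_mutex_unlock(&vm->lock);

	return ret;
}
```

With this shape, a migration that restores the same value on every vCPU
only takes the lock on the first write that actually changes the bitmap.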
> >
> >
> > > +
> > > + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> > > + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > > + ret = -EBUSY;
> > > + goto out;
> > > + }
> >
> > I would just like to confirm that the existing userspace you know of
> > will not issue KVM_RUN for any vCPU until KVM_SET_ONE_REG has
> > completed for all vCPUs (even for migration), correct ?
> >
> Since v6, that is something that we are leaving to userspace to
> synchronize. See [1].
Understood.
> > > +
> > > + WRITE_ONCE(*fw_reg_bmap, val);
> > > +out:
> > > + mutex_unlock(&kvm->lock);
> > > + return ret;
> > > +}
> > > +
> > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > {
> > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > return -EINVAL;
> > >
> > > return 0;
> > > + case KVM_REG_ARM_STD_BMAP:
> > > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > > default:
> > > return -ENOENT;
> > > }
> > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > index 346535169faa..67d1273e8086 100644
> > > --- a/arch/arm64/kvm/psci.c
> > > +++ b/arch/arm64/kvm/psci.c
> > > @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> > > return -EINVAL;
> > > }
> > > }
> > > +
> > > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> > > +{
> > > + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> > > + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> > > + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> > > +
> > > + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> > > + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> > > + return true;
> >
> > For PSCI 0.1, the function checks if the func_id is valid for
> > the vCPU (according to the vCPU's PSCI version).
> > For other versions of PSCI, the function doesn't take the vCPU's
> > PSCI version into account (although supported functions depend on the PSCI
> > version and not all of them are defined yet, the code returns
> > true as long as the function id is within the reserved PSCI
> > function id range).
> > So, the behavior appears to be inconsistent.
> > Shouldn't it return the validity of the function id according
> > to the vCPU's psci version for non-PSCI 0.1 case as well ?
> > (Otherwise, shouldn't it return true if the function id is valid
> > for any of the PSCI versions ?)
> >
> Well, PSCI 0.1 is somewhat of an odd implementation. It doesn't comply
> with the SMCCC, hence needed some special handling. Only two func_ids
> are currently supported by KVM, and we just check for each. The second
> 'if' statement is for all the PSCI versions >= 0.2. Thankfully, the
> specification defines a range of acceptable PSCI func_ids.
I understand PSCI 0.1 is different from PSCI 0.2 or newer versions.
But my question is: what would you consider a "valid" PSCI function id ?
It seems that the function checks whether or not the func_id is valid
on the vCPU for PSCI 0.1, and checks whether or not the func_id is a
PSCI function id for a vCPU with PSCI 0.2 or newer.
I understand either one works for your purpose, but I would think
the behavior should be consistent.
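
To make the difference concrete, below is a small userspace sketch of the
two checks as I read them. It is not the kernel code: the MODEL_* names and
the helper names are made up, and the constant values are what I recall
from the kernel's SMCCC and KVM PSCI headers (owner in bits 29:24, function
number in bits 15:0, standard-service owner 4, KVM's PSCI 0.1 base
0x95c1ba5e), so please double-check them. As a side note, the function
number is unsigned, so the ">= 0" half of the range test in the patch is
always true and only the upper bound matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's SMCCC function-id layout. */
#define MODEL_SMCCC_OWNER_SHIFT    24
#define MODEL_SMCCC_OWNER_MASK     0x3Fu
#define MODEL_SMCCC_FUNC_MASK      0xFFFFu
#define MODEL_SMCCC_OWNER_STANDARD 4u

/* Assumed to mirror KVM's legacy PSCI 0.1 function ids. */
#define MODEL_KVM_PSCI_FN_BASE     UINT32_C(0x95c1ba5e)
#define MODEL_KVM_PSCI_FN_CPU_OFF  (MODEL_KVM_PSCI_FN_BASE + 1)
#define MODEL_KVM_PSCI_FN_CPU_ON   (MODEL_KVM_PSCI_FN_BASE + 2)

enum model_psci_version {
	MODEL_PSCI_0_1,
	MODEL_PSCI_0_2_OR_NEWER,
};

/* PSCI >= 0.2: accept anything in the reserved PSCI range 0x00..0x1f. */
static bool model_psci_id_in_range(uint32_t func_id)
{
	uint32_t owner = (func_id >> MODEL_SMCCC_OWNER_SHIFT) & MODEL_SMCCC_OWNER_MASK;
	uint32_t func = func_id & MODEL_SMCCC_FUNC_MASK;

	return owner == MODEL_SMCCC_OWNER_STANDARD && func <= 0x1f;
}

static bool model_psci_func_id_is_valid(enum model_psci_version ver, uint32_t func_id)
{
	/* PSCI 0.1: exact match against the two ids the vCPU really supports. */
	if (ver == MODEL_PSCI_0_1)
		return func_id == MODEL_KVM_PSCI_FN_CPU_OFF ||
		       func_id == MODEL_KVM_PSCI_FN_CPU_ON;

	/* PSCI >= 0.2: range membership only, independent of the exact version. */
	return model_psci_id_in_range(func_id);
}
```

So for a PSCI 0.1 vCPU the answer is "does this vCPU implement the call",
while for 0.2 and newer the answer is "is this id in the PSCI range",
whatever the vCPU's actual version defines.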
Thanks,
Reiji
>
> If it's confusing, I can add a comment above the second 'if' that it's
> for all PSCI versions >= 0.2.
> > Thanks,
> > Reiji
> >
> Thank you.
> Raghavendra
>
> [1]: https://lore.kernel.org/lkml/[email protected]/
> >
> >
> > > +
> > > + return false;
> > > +}
> > > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > > index 5d38628a8d04..499b45b607b6 100644
> > > --- a/include/kvm/arm_hypercalls.h
> > > +++ b/include/kvm/arm_hypercalls.h
> > > @@ -6,6 +6,11 @@
> > >
> > > #include <asm/kvm_emulate.h>
> > >
> > > +/* Last valid bits of the bitmapped firmware registers */
> > > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> > > +
> > > +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > > +
> > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> > >
> > > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > > @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> > >
> > > struct kvm_one_reg;
> > >
> > > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > > index 6e55b9283789..c47be3e26965 100644
> > > --- a/include/kvm/arm_psci.h
> > > +++ b/include/kvm/arm_psci.h
> > > @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> > > return KVM_ARM_PSCI_0_1;
> > > }
> > >
> > > -
> > > int kvm_psci_call(struct kvm_vcpu *vcpu);
> > > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
> > >
> > > #endif /* __KVM_ARM_PSCI_H__ */
> > > --
> > > 2.36.0.rc2.479.g8af0fa9b8e-goog
> > >
Hi Gavin,
On Mon, Apr 25, 2022 at 11:34 PM Gavin Shan <[email protected]> wrote:
>
> Hi Raghavendra,
>
> On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the userspace. This means, the guests can observe
> > hypercall services in and out as they migrate across various host
> > kernel versions. This could be a major problem if the guest
> > discovered a hypercall, started using it, and after getting migrated
> > to an older kernel realizes that it's no longer available. Depending
> > on how the guest handles the change, there's a potential chance that
> > the guest would just panic.
> >
> > As a result, there's a need for the userspace to elect the services
> > that it wishes the guest to discover. It can elect these services
> > based on the kernels spread across its (migration) fleet. To remedy
> > this, extend the existing firmware pseudo-registers, such as
> > KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> > for all the hypercall services available.
> >
> > These firmware registers are categorized based on the service call
> > owners, but unlike the existing firmware pseudo-registers, they hold
> > the features supported in the form of a bitmap.
> >
> > During the VM initialization, the registers are set to upper-limit of
> > the features supported by the corresponding registers. It's expected
> > that the VMMs discover the features provided by each register via
> > GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> > KVM allows this modification only until the VM has started.
> >
> > Some of the standard features are not mapped to any bits of the
> > registers. But since they can recreate the original problem of
> > making it available without userspace's consent, they need to
> > be explicitly added to the case-list in
> > kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> > via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> > be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
> >
> > Older userspace code can simply ignore the feature and the
> > hypercall services will be exposed unconditionally to the guests,
> > thus ensuring backward compatibility.
> >
> > In this patch, the framework adds the register only for ARM's standard
> > secure services (owner value 4). Currently, this includes support only
> > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > register representing mandatory features of v1.0. Other services are
> > momentarily added in the upcoming patches.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > arch/arm64/include/uapi/asm/kvm.h | 9 +++
> > arch/arm64/kvm/arm.c | 1 +
> > arch/arm64/kvm/guest.c | 8 ++-
> > arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> > arch/arm64/kvm/psci.c | 13 +++++
> > include/kvm/arm_hypercalls.h | 6 ++
> > include/kvm/arm_psci.h | 2 +-
> > 8 files changed, 142 insertions(+), 3 deletions(-)
> >
>
> Some nits below; please consider addressing them if you need another
> respin.
>
> Reviewed-by: Gavin Shan <[email protected]>
>
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 94a27a7520f4..df07f4c10197 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> > struct kvm_arch_memory_slot {
> > };
> >
> > +/**
> > + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> > + *
> > + * @std_bmap: Bitmap of standard secure service calls
> > + */
> > +struct kvm_smccc_features {
> > + unsigned long std_bmap;
> > +};
> > +
>
> s/Descriptor/Descriptor of
>
Nice catch!
> > struct kvm_arch {
> > struct kvm_s2_mmu mmu;
> >
> > @@ -150,6 +159,9 @@ struct kvm_arch {
> >
> > u8 pfr0_csv2;
> > u8 pfr0_csv3;
> > +
> > + /* Hypercall features firmware registers' descriptor */
> > + struct kvm_smccc_features smccc_feat;
> > };
> >
> > struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index c1b6ddc02d2f..0b79d2dc6ffd 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> > #define KVM_ARM64_SVE_VLS_WORDS \
> > ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
> >
> > +/* Bitmap feature firmware registers */
> > +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> > +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> > + KVM_REG_ARM_FW_FEAT_BMAP | \
> > + ((r) & 0xffff))
> > +
> > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> > +
> > /* Device Control API: ARM VGIC */
> > #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> > #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 523bc934fe2f..a37fadbd617e 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> >
> > set_default_spectre(kvm);
> > + kvm_arm_init_hypercalls(kvm);
> >
> > return ret;
> > out_free_stage2_pgd:
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 0d5cca56cbda..8c607199cad1 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> > - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> > + case KVM_REG_ARM_FW:
> > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > + return kvm_arm_get_fw_reg(vcpu, reg);
> > case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> > }
> >
> > @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> > - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> > + case KVM_REG_ARM_FW:
> > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > + return kvm_arm_set_fw_reg(vcpu, reg);
> > case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> > }
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index fa6d9378d8e7..df55a04d2fe8 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > val[3] = lower_32_bits(cycles);
> > }
> >
> > +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> > +{
> > + return test_bit(feat_bit, reg_bmap);
> > +}
> > +
>
> Might be worth making this 'inline'. This function would be called
> frequently.
>
I was hoping the compiler would optimize it for us as needed.
> > +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + switch (func_id) {
> > + /*
> > + * List of function-ids that are not gated with the bitmapped feature
> > + * firmware registers, and are to be allowed for servicing the call by default.
> > + */
> > + case ARM_SMCCC_VERSION_FUNC_ID:
> > + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> > + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> > + case ARM_SMCCC_HV_PV_TIME_ST:
> > + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> > + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > + return true;
> > + default:
> > + return kvm_psci_func_id_is_valid(vcpu, func_id);
> > + }
> > +}
> > +
> > +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > +
> > + switch (func_id) {
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND32:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > + default:
> > + return kvm_hvc_call_default_allowed(vcpu, func_id);
> > + }
> > +}
> > +
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > {
> > u32 func_id = smccc_get_function(vcpu);
> > @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > u32 feature;
> > gpa_t gpa;
> >
> > + if (!kvm_hvc_call_allowed(vcpu, func_id))
> > + goto out;
> > +
> > switch (func_id) {
> > case ARM_SMCCC_VERSION_FUNC_ID:
> > val[0] = ARM_SMCCC_VERSION_1_1;
> > @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > return kvm_psci_call(vcpu);
> > }
> >
> > +out:
> > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > return 1;
> > }
> > @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> > + KVM_REG_ARM_STD_BMAP,
> > };
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > +{
> > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > +
> > + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> > +}
> > +
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > {
> > return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
> >
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > u64 val;
> >
> > @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > break;
> > + case KVM_REG_ARM_STD_BMAP:
> > + val = READ_ONCE(smccc_feat->std_bmap);
> > + break;
> > default:
> > return -ENOENT;
> > }
> > @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return 0;
> > }
> >
> > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > +{
> > + int ret = 0;
> > + struct kvm *kvm = vcpu->kvm;
> > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > + unsigned long *fw_reg_bmap, fw_reg_features;
> > +
> > + switch (reg_id) {
> > + case KVM_REG_ARM_STD_BMAP:
> > + fw_reg_bmap = &smccc_feat->std_bmap;
> > + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> > + break;
> > + default:
> > + return -ENOENT;
> > + }
> > +
> > + /* Check for unsupported bit */
> > + if (val & ~fw_reg_features)
> > + return -EINVAL;
> > +
> > + mutex_lock(&kvm->lock);
> > +
> > + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> > + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + WRITE_ONCE(*fw_reg_bmap, val);
> > +out:
> > + mutex_unlock(&kvm->lock);
> > + return ret;
> > +}
> > +
> > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > {
> > void __user *uaddr = (void __user *)(long)reg->addr;
> > @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > return -EINVAL;
> >
> > return 0;
> > + case KVM_REG_ARM_STD_BMAP:
> > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > default:
> > return -ENOENT;
> > }
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 346535169faa..67d1273e8086 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> > return -EINVAL;
> > }
> > }
> > +
> > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> > + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> > + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> > +
> > + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> > + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> > + return true;
> > +
> > + return false;
> > +}
> > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > index 5d38628a8d04..499b45b607b6 100644
> > --- a/include/kvm/arm_hypercalls.h
> > +++ b/include/kvm/arm_hypercalls.h
> > @@ -6,6 +6,11 @@
> >
> > #include <asm/kvm_emulate.h>
> >
> > +/* Last valid bits of the bitmapped firmware registers */
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> > +
> > +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > +
>
> s/bits of/bit of
>
Great catch again!
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> >
> > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> >
> > struct kvm_one_reg;
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > index 6e55b9283789..c47be3e26965 100644
> > --- a/include/kvm/arm_psci.h
> > +++ b/include/kvm/arm_psci.h
> > @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> > return KVM_ARM_PSCI_0_1;
> > }
> >
> > -
> > int kvm_psci_call(struct kvm_vcpu *vcpu);
> > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
> >
> > #endif /* __KVM_ARM_PSCI_H__ */
> >
>
> Thanks,
> Gavin
>
Thanks for the review, Gavin.
- Raghavendra
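For readers following the thread, the write path of the bitmap register in the quoted kvm_arm_set_fw_reg_bmap() can be approximated in plain userspace C. This is only a sketch under stated assumptions: struct vm_model, vm_model_init() and vm_model_set_bmap() are hypothetical names invented for illustration, the kvm->lock locking is left out, and only the ordering of the two checks (-EINVAL for unsupported bits first, then -EBUSY once the VM has run) is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state; not the kernel's struct. */
struct vm_model {
	uint64_t std_bmap;      /* features currently exposed to the guest */
	uint64_t std_supported; /* upper limit advertised at VM creation */
	bool has_run_once;      /* set once any vCPU has entered the guest */
};

/* At creation the register holds the upper limit of supported features. */
static void vm_model_init(struct vm_model *vm, uint64_t supported)
{
	vm->std_supported = supported;
	vm->std_bmap = supported;
	vm->has_run_once = false;
}

/* Returns 0 on success or a negative errno, in the order the patch checks. */
static int vm_model_set_bmap(struct vm_model *vm, uint64_t val)
{
	/* Any bit outside the supported mask is rejected outright. */
	if (val & ~vm->std_supported)
		return -EINVAL;

	/* Once the VM has started, the register becomes read-only. */
	if (vm->has_run_once)
		return -EBUSY;

	vm->std_bmap = val;
	return 0;
}
```

A VMM would be expected to read the register, clear the bits it does not want, and write the result back before the first vCPU run; after that point the model above, like the patch, refuses further changes.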
Hi Raghavendra,
On 4/27/22 12:59 AM, Raghavendra Rao Ananta wrote:
> On Tue, Apr 26, 2022 at 12:50 AM Gavin Shan <[email protected]> wrote:
>> On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
>>> Introduce a KVM selftest to check the hypercall interface
>>> for arm64 platforms. The test validates the userspace's
>>> [GET|SET]_ONE_REG interface to read/write the pseudo-firmware
>>> registers as well as its effects on the guest upon certain
>>> configurations.
>>>
>>> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
>>> ---
>>> tools/testing/selftests/kvm/.gitignore | 1 +
>>> tools/testing/selftests/kvm/Makefile | 1 +
>>> .../selftests/kvm/aarch64/hypercalls.c | 335 ++++++++++++++++++
>>> 3 files changed, 337 insertions(+)
>>> create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
>>>
>>
>> There are comments about @false_hvc_info[] and some nits, as below.
>> Please evaluate and improve if it makes sense to you. Otherwise, it
>> looks good to me:
>>
>> Reviewed-by: Gavin Shan <[email protected]>
>>
>>> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
>>> index 1bb575dfc42e..b17e464ec661 100644
>>> --- a/tools/testing/selftests/kvm/.gitignore
>>> +++ b/tools/testing/selftests/kvm/.gitignore
>>> @@ -2,6 +2,7 @@
>>> /aarch64/arch_timer
>>> /aarch64/debug-exceptions
>>> /aarch64/get-reg-list
>>> +/aarch64/hypercalls
>>> /aarch64/psci_test
>>> /aarch64/vcpu_width_config
>>> /aarch64/vgic_init
>>> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
>>> index c2cf4d318296..97eef0c03d3b 100644
>>> --- a/tools/testing/selftests/kvm/Makefile
>>> +++ b/tools/testing/selftests/kvm/Makefile
>>> @@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
>>> TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
>>> TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
>>> TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>>> +TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
>>> TEST_GEN_PROGS_aarch64 += aarch64/psci_test
>>> TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
>>> TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
>>> diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
>>> new file mode 100644
>>> index 000000000000..f404343a0ae3
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
>>> @@ -0,0 +1,335 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +/* hypercalls: Check the ARM64's pseudo-firmware bitmap register interface.
>>> + *
>>> + * The test validates the basic hypercall functionalities that are exposed
>>> + * via the pseudo-firmware bitmap register. This includes the registers'
>>> + * read/write behavior before and after the VM has started, and if the
>>> + * hypercalls are properly masked or unmasked to the guest when disabled or
>>> + * enabled from the KVM userspace, respectively.
>>> + */
>>> +
>>> +#include <errno.h>
>>> +#include <linux/arm-smccc.h>
>>> +#include <asm/kvm.h>
>>> +#include <kvm_util.h>
>>> +
>>> +#include "processor.h"
>>> +
>>> +#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK(max_feat_bit, 0))
>>> +
>>> +/* Last valid bits of the bitmapped firmware registers */
>>> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
>>> +#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0
>>> +#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 1
>>> +
>>> +struct kvm_fw_reg_info {
>>> + uint64_t reg; /* Register definition */
>>> + uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
>>> +};
>>> +
>>> +#define FW_REG_INFO(r) \
>>> + { \
>>> + .reg = r, \
>>> + .max_feat_bit = r##_BIT_MAX, \
>>> + }
>>> +
>>> +static const struct kvm_fw_reg_info fw_reg_info[] = {
>>> + FW_REG_INFO(KVM_REG_ARM_STD_BMAP),
>>> + FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP),
>>> + FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP),
>>> +};
>>> +
>>> +enum test_stage {
>>> + TEST_STAGE_REG_IFACE,
>>> + TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
>>> + TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
>>> + TEST_STAGE_HVC_IFACE_FALSE_INFO,
>>> + TEST_STAGE_END,
>>> +};
>>> +
>>> +static int stage = TEST_STAGE_REG_IFACE;
>>> +
>>> +struct test_hvc_info {
>>> + uint32_t func_id;
>>> + uint64_t arg1;
>>> +};
>>> +
>>> +#define TEST_HVC_INFO(f, a1) \
>>> + { \
>>> + .func_id = f, \
>>> + .arg1 = a1, \
>>> + }
>>> +
>>> +static const struct test_hvc_info hvc_info[] = {
>>> + /* KVM_REG_ARM_STD_BMAP */
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0),
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64),
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0),
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0),
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0),
>>> +
>>> + /* KVM_REG_ARM_STD_HYP_BMAP */
>>> + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_HV_PV_TIME_FEATURES),
>>> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_HV_PV_TIME_ST),
>>> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0),
>>> +
>>> + /* KVM_REG_ARM_VENDOR_HYP_BMAP */
>>> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
>>> + ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
>>> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, 0),
>>> + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, KVM_PTP_VIRT_COUNTER),
>>> +};
>>> +
>>> +/* Feed false hypercall info to test the KVM behavior */
>>> +static const struct test_hvc_info false_hvc_info[] = {
>>> + /* Feature support check against a different family of hypercalls */
>>> + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
>>> + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_TRNG_RND64),
>>> + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_TRNG_RND64),
>>> +};
>>> +
>>
>> I don't see much benefit in @false_hvc_info[] because
>> NOT_SUPPORTED is always returned from its test case. I think
>> it and its test case can be removed if you agree. I'm not
>> sure if it was suggested by somebody else.
>>
> While this is not exactly testing the bitmap firmware registers, the
> idea behind introducing false_hvc_info[] was to introduce some
> negative tests and see if KVM handles them well. Especially with
> *_FEATURES func_ids, we can accidentally introduce functional bugs in
> KVM, and these would act as our safety net. I was also planning to
> test with some reserved hypercall numbers, just to check that the
> kernel doesn't panic for some reason.
>
Ok, thanks for the explanation. It makes sense to me.
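The kind of negative test discussed above can be illustrated with a toy model. This is a sketch, not KVM code: the function IDs are made-up placeholders rather than real SMCCC encodings, features_query() is a hypothetical helper, and the only value borrowed from the real interface is -1 for "not supported". It shows a *_FEATURES call being asked about a function from a different family, which is the mistake false_hvc_info[] feeds to the hypervisor.

```c
#include <stdint.h>

#define RET_SUCCESS       0L
#define RET_NOT_SUPPORTED (-1L) /* same numeric value as SMCCC_RET_NOT_SUPPORTED */

/* Placeholder IDs for two unrelated families; not real SMCCC function IDs. */
enum {
	FAM_A_FEATURES = 0x100,
	FAM_A_CALL     = 0x101,
	FAM_B_FEATURES = 0x200,
	FAM_B_CALL     = 0x201,
};

/*
 * Toy *_FEATURES handler: in this made-up encoding a family is identified
 * by the bits above the low byte, so a query about an ID from another
 * family must come back as "not supported".
 */
static long features_query(uint32_t features_id, uint32_t queried_id)
{
	uint32_t family = features_id & ~0xffu;

	if ((queried_id & ~0xffu) != family)
		return RET_NOT_SUPPORTED;

	return RET_SUCCESS;
}
```

A handler that forgot the family check would return success for the cross-family query, which is exactly the class of functional bug the negative stage is meant to catch.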
>>> +static void guest_test_hvc(const struct test_hvc_info *hc_info)
>>> +{
>>> + unsigned int i;
>>> + struct arm_smccc_res res;
>>> + unsigned int hvc_info_arr_sz;
>>> +
>>> + hvc_info_arr_sz =
>>> + hc_info == hvc_info ? ARRAY_SIZE(hvc_info) : ARRAY_SIZE(false_hvc_info);
>>> +
>>> + for (i = 0; i < hvc_info_arr_sz; i++, hc_info++) {
>>> + memset(&res, 0, sizeof(res));
>>> + smccc_hvc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
>>> +
>>> + switch (stage) {
>>> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
>>> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
>>> + GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
>>> + res.a0, hc_info->func_id, hc_info->arg1);
>>> + break;
>>> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
>>> + GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
>>> + res.a0, hc_info->func_id, hc_info->arg1);
>>> + break;
>>> + default:
>>> + GUEST_ASSERT_1(0, stage);
>>> + }
>>> + }
>>> +}
>>> +
>>> +static void guest_code(void)
>>> +{
>>> + while (stage != TEST_STAGE_END) {
>>> + switch (stage) {
>>> + case TEST_STAGE_REG_IFACE:
>>> + break;
>>> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
>>> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
>>> + guest_test_hvc(hvc_info);
>>> + break;
>>> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
>>> + guest_test_hvc(false_hvc_info);
>>> + break;
>>> + default:
>>> + GUEST_ASSERT_1(0, stage);
>>> + }
>>> +
>>> + GUEST_SYNC(stage);
>>> + }
>>> +
>>> + GUEST_DONE();
>>> +}
>>> +
>>> +static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
>>> +{
>>> + struct kvm_one_reg reg = {
>>> + .id = id,
>>> + .addr = (uint64_t)&val,
>>> + };
>>> +
>>> + return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
>>> +}
>>> +
>>> +static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
>>> +{
>>> + struct kvm_one_reg reg = {
>>> + .id = id,
>>> + .addr = (uint64_t)addr,
>>> + };
>>> +
>>> + vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
>>> +}
>>> +
>>> +struct st_time {
>>> + uint32_t rev;
>>> + uint32_t attr;
>>> + uint64_t st_time;
>>> +};
>>> +
>>> +#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
>>> +#define ST_GPA_BASE (1 << 30)
>>> +
>>> +static void steal_time_init(struct kvm_vm *vm)
>>> +{
>>> + uint64_t st_ipa = (ulong)ST_GPA_BASE;
>>> + unsigned int gpages;
>>> + struct kvm_device_attr dev = {
>>> + .group = KVM_ARM_VCPU_PVTIME_CTRL,
>>> + .attr = KVM_ARM_VCPU_PVTIME_IPA,
>>> + .addr = (uint64_t)&st_ipa,
>>> + };
>>> +
>>> + gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
>>> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
>>> +
>>> + vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
>>> +}
>>> +
>>> +static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
>>> +{
>>> + uint64_t val;
>>> + unsigned int i;
>>> + int ret;
>>> +
>>> + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
>>> + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
>>> +
>>> + /* First 'read' should be an upper limit of the features supported */
>>> + get_fw_reg(vm, reg_info->reg, &val);
>>> + TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
>>> + "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
>>> + reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
>>> +
>>> + /* Test a 'write' by disabling all the features of the register map */
>>> + ret = set_fw_reg(vm, reg_info->reg, 0);
>>> + TEST_ASSERT(ret == 0,
>>> + "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
>>> + reg_info->reg, errno);
>>> +
>>> + get_fw_reg(vm, reg_info->reg, &val);
>>> + TEST_ASSERT(val == 0,
>>> + "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
>>> +
>>> + /*
>>> + * Test enabling a feature that's not supported.
>>> + * Avoid this check if all the bits are occupied.
>>> + */
>>> + if (reg_info->max_feat_bit < 63) {
>>> + ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
>>> + TEST_ASSERT(ret != 0 && errno == EINVAL,
>>> + "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
>>> + errno, reg_info->reg);
>>> + }
>>> + }
>>> +}
>>
>> Just in case :)
>>
>> ret = set_fw_reg(vm, reg_info->reg, GENMASK(63, reg_info->max_feat_bit + 1));
>>
> It may be better to cover the entire range, but testing only
> (max_feat_bit + 1) gives us the advantage of checking if there's any
> discrepancy between the kernel and the test, now that the *_BIT_MAX
> values are not a part of the UAPI headers.
>
> Perhaps also include your test along with the existing one?
Thanks for your explanation again. Let's keep it as it is then.
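The trade-off between the two probes can be made concrete with a small sketch. It assumes the only relevant kernel behaviour is the "val & ~supported" rejection quoted earlier; BIT64(), GENMASK64() and write_rejected() are hypothetical userspace helpers written for this illustration, not kernel macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalents of BIT() and GENMASK() for 64-bit values. */
#define BIT64(n)        (UINT64_C(1) << (n))
#define GENMASK64(h, l) ((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* True when a write of val would be refused for carrying unsupported bits. */
static bool write_rejected(uint64_t kernel_supported, uint64_t val)
{
	return (val & ~kernel_supported) != 0;
}
```

Suppose the test still believes max_feat_bit is 0 while the kernel has grown a second feature, so its supported mask is GENMASK64(1, 0). Writing BIT64(1) is then accepted, the test's expectation of EINVAL fails, and the stale *_BIT_MAX is noticed. Writing GENMASK64(63, 1) is still rejected because of bits 2..63, so the same mismatch would go unseen.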
>>
>>> +
>>> +static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
>>> +{
>>> + uint64_t val;
>>> + unsigned int i;
>>> + int ret;
>>> +
>>> + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
>>> + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
>>> +
>>> + /*
>>> + * Before starting the VM, the test clears all the bits.
>>> + * Check if that's still the case.
>>> + */
>>> + get_fw_reg(vm, reg_info->reg, &val);
>>> + TEST_ASSERT(val == 0,
>>> + "Expected all the features to be cleared for reg: 0x%lx\n",
>>> + reg_info->reg);
>>> +
>>> + /*
>>> + * Set all the features for this register again. KVM shouldn't
>>> + * allow this as the VM is running.
>>> + */
>>> + ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
>>> + TEST_ASSERT(ret != 0 && errno == EBUSY,
>>> + "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
>>> + errno, reg_info->reg);
>>> + }
>>> +}
>>> +
>>
>> I guess you want to check -EBUSY is returned. In that case,
>> the comments here could be clearer, something like below
>> to emphasize '-EBUSY'.
>>
>> /*
>> * After VM runs for once, -EBUSY should be returned on attempt
>> * to set features. Check if the correct errno is returned.
>> */
>>
> Sounds good.
>
>>> +static struct kvm_vm *test_vm_create(void)
>>> +{
>>> + struct kvm_vm *vm;
>>> +
>>> + vm = vm_create_default(0, 0, guest_code);
>>> +
>>> + ucall_init(vm, NULL);
>>> + steal_time_init(vm);
>>> +
>>> + return vm;
>>> +}
>>> +
>>> +static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
>>> +{
>>> + struct kvm_vm *ret_vm = vm;
>>> +
>>> + pr_debug("Stage: %d\n", stage);
>>> +
>>> + switch (stage) {
>>> + case TEST_STAGE_REG_IFACE:
>>> + test_fw_regs_after_vm_start(vm);
>>> + break;
>>> + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
>>> + /* Start a new VM so that all the features are now enabled by default */
>>> + kvm_vm_free(vm);
>>> + ret_vm = test_vm_create();
>>> + break;
>>> + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
>>> + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
>>> + break;
>>> + default:
>>> + TEST_FAIL("Unknown test stage: %d\n", stage);
>>> + }
>>> +
>>> + stage++;
>>> + sync_global_to_guest(vm, stage);
>>> +
>>> + return ret_vm;
>>> +}
>>> +
>>> +static void test_run(void)
>>> +{
>>> + struct kvm_vm *vm;
>>> + struct ucall uc;
>>> + bool guest_done = false;
>>> +
>>> + vm = test_vm_create();
>>> +
>>> + test_fw_regs_before_vm_start(vm);
>>> +
>>> + while (!guest_done) {
>>> + vcpu_run(vm, 0);
>>> +
>>> + switch (get_ucall(vm, 0, &uc)) {
>>> + case UCALL_SYNC:
>>> + vm = test_guest_stage(vm);
>>> + break;
>>> + case UCALL_DONE:
>>> + guest_done = true;
>>> + break;
>>> + case UCALL_ABORT:
>>> + TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, 0x%lx; 0x%lx, stage: %u",
>>> + (const char *)uc.args[0], __FILE__, uc.args[1],
>>> + uc.args[2], uc.args[3], uc.args[4], stage);
>>> + break;
>>> + default:
>>> + TEST_FAIL("Unexpected guest exit\n");
>>> + }
>>> + }
>>> +
>>> + kvm_vm_free(vm);
>>> +}
>>> +
>>> +int main(void)
>>> +{
>>> + setbuf(stdout, NULL);
>>> +
>>> + test_run();
>>> + return 0;
>>> +}
>>>
[...]
Thanks,
Gavin
Hi Gavin,
On Tue, Apr 26, 2022 at 12:50 AM Gavin Shan <[email protected]> wrote:
>
> Hi Raghavendra,
>
> On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
> > Introduce a KVM selftest to check the hypercall interface
> > for arm64 platforms. The test validates the userspace's
> > [GET|SET]_ONE_REG interface to read/write the pseudo-firmware
> > registers as well as its effects on the guest upon certain
> > configurations.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > tools/testing/selftests/kvm/.gitignore | 1 +
> > tools/testing/selftests/kvm/Makefile | 1 +
> > .../selftests/kvm/aarch64/hypercalls.c | 335 ++++++++++++++++++
> > 3 files changed, 337 insertions(+)
> > create mode 100644 tools/testing/selftests/kvm/aarch64/hypercalls.c
> >
>
> There are comments about @false_hvc_info[] and some nits, as below.
> Please evaluate and improve if it makes sense to you. Otherwise, it
> looks good to me:
>
> Reviewed-by: Gavin Shan <[email protected]>
>
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index 1bb575dfc42e..b17e464ec661 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -2,6 +2,7 @@
> > /aarch64/arch_timer
> > /aarch64/debug-exceptions
> > /aarch64/get-reg-list
> > +/aarch64/hypercalls
> > /aarch64/psci_test
> > /aarch64/vcpu_width_config
> > /aarch64/vgic_init
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index c2cf4d318296..97eef0c03d3b 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
> > TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
> > TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
> > TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> > +TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
> > TEST_GEN_PROGS_aarch64 += aarch64/psci_test
> > TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
> > TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
> > diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
> > new file mode 100644
> > index 000000000000..f404343a0ae3
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
> > @@ -0,0 +1,335 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +/* hypercalls: Check the ARM64's pseudo-firmware bitmap register interface.
> > + *
> > + * The test validates the basic hypercall functionalities that are exposed
> > + * via the pseudo-firmware bitmap register. This includes the registers'
> > + * read/write behavior before and after the VM has started, and if the
> > + * hypercalls are properly masked or unmasked to the guest when disabled or
> > + * enabled from the KVM userspace, respectively.
> > + */
> > +
> > +#include <errno.h>
> > +#include <linux/arm-smccc.h>
> > +#include <asm/kvm.h>
> > +#include <kvm_util.h>
> > +
> > +#include "processor.h"
> > +
> > +#define FW_REG_ULIMIT_VAL(max_feat_bit) (GENMASK(max_feat_bit, 0))
> > +
> > +/* Last valid bits of the bitmapped firmware registers */
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> > +#define KVM_REG_ARM_STD_HYP_BMAP_BIT_MAX 0
> > +#define KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_MAX 1
> > +
> > +struct kvm_fw_reg_info {
> > + uint64_t reg; /* Register definition */
> > + uint64_t max_feat_bit; /* Bit that represents the upper limit of the feature-map */
> > +};
> > +
> > +#define FW_REG_INFO(r) \
> > + { \
> > + .reg = r, \
> > + .max_feat_bit = r##_BIT_MAX, \
> > + }
> > +
> > +static const struct kvm_fw_reg_info fw_reg_info[] = {
> > + FW_REG_INFO(KVM_REG_ARM_STD_BMAP),
> > + FW_REG_INFO(KVM_REG_ARM_STD_HYP_BMAP),
> > + FW_REG_INFO(KVM_REG_ARM_VENDOR_HYP_BMAP),
> > +};
> > +
> > +enum test_stage {
> > + TEST_STAGE_REG_IFACE,
> > + TEST_STAGE_HVC_IFACE_FEAT_DISABLED,
> > + TEST_STAGE_HVC_IFACE_FEAT_ENABLED,
> > + TEST_STAGE_HVC_IFACE_FALSE_INFO,
> > + TEST_STAGE_END,
> > +};
> > +
> > +static int stage = TEST_STAGE_REG_IFACE;
> > +
> > +struct test_hvc_info {
> > + uint32_t func_id;
> > + uint64_t arg1;
> > +};
> > +
> > +#define TEST_HVC_INFO(f, a1) \
> > + { \
> > + .func_id = f, \
> > + .arg1 = a1, \
> > + }
> > +
> > +static const struct test_hvc_info hvc_info[] = {
> > + /* KVM_REG_ARM_STD_BMAP */
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_VERSION, 0),
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_TRNG_RND64),
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_GET_UUID, 0),
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND32, 0),
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_RND64, 0),
> > +
> > + /* KVM_REG_ARM_STD_HYP_BMAP */
> > + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_HV_PV_TIME_FEATURES),
> > + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_HV_PV_TIME_ST),
> > + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_ST, 0),
> > +
> > + /* KVM_REG_ARM_VENDOR_HYP_BMAP */
> > + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
> > + ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
> > + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, 0),
> > + TEST_HVC_INFO(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, KVM_PTP_VIRT_COUNTER),
> > +};
> > +
> > +/* Feed false hypercall info to test the KVM behavior */
> > +static const struct test_hvc_info false_hvc_info[] = {
> > + /* Feature support check against a different family of hypercalls */
> > + TEST_HVC_INFO(ARM_SMCCC_TRNG_FEATURES, ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID),
> > + TEST_HVC_INFO(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_TRNG_RND64),
> > + TEST_HVC_INFO(ARM_SMCCC_HV_PV_TIME_FEATURES, ARM_SMCCC_TRNG_RND64),
> > +};
> > +
>
> I don't see much benefit in @false_hvc_info[] because
> NOT_SUPPORTED is always returned from its test case. I think
> it and its test case can be removed if you agree. I'm not
> sure if it was suggested by somebody else.
>
While this is not exactly testing the bitmap firmware registers, the
idea behind introducing false_hvc_info[] was to introduce some
negative tests and see if KVM handles them well. Especially with
*_FEATURES func_ids, we can accidentally introduce functional bugs in
KVM, and these would act as our safety net. I was also planning to
test with some reserved hypercall numbers, just to check that the
kernel doesn't panic for some reason.
> > +static void guest_test_hvc(const struct test_hvc_info *hc_info)
> > +{
> > + unsigned int i;
> > + struct arm_smccc_res res;
> > + unsigned int hvc_info_arr_sz;
> > +
> > + hvc_info_arr_sz =
> > + hc_info == hvc_info ? ARRAY_SIZE(hvc_info) : ARRAY_SIZE(false_hvc_info);
> > +
> > + for (i = 0; i < hvc_info_arr_sz; i++, hc_info++) {
> > + memset(&res, 0, sizeof(res));
> > + smccc_hvc(hc_info->func_id, hc_info->arg1, 0, 0, 0, 0, 0, 0, &res);
> > +
> > + switch (stage) {
> > + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> > + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> > + GUEST_ASSERT_3(res.a0 == SMCCC_RET_NOT_SUPPORTED,
> > + res.a0, hc_info->func_id, hc_info->arg1);
> > + break;
> > + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> > + GUEST_ASSERT_3(res.a0 != SMCCC_RET_NOT_SUPPORTED,
> > + res.a0, hc_info->func_id, hc_info->arg1);
> > + break;
> > + default:
> > + GUEST_ASSERT_1(0, stage);
> > + }
> > + }
> > +}
> > +
> > +static void guest_code(void)
> > +{
> > + while (stage != TEST_STAGE_END) {
> > + switch (stage) {
> > + case TEST_STAGE_REG_IFACE:
> > + break;
> > + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> > + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> > + guest_test_hvc(hvc_info);
> > + break;
> > + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> > + guest_test_hvc(false_hvc_info);
> > + break;
> > + default:
> > + GUEST_ASSERT_1(0, stage);
> > + }
> > +
> > + GUEST_SYNC(stage);
> > + }
> > +
> > + GUEST_DONE();
> > +}
> > +
> > +static int set_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t val)
> > +{
> > + struct kvm_one_reg reg = {
> > + .id = id,
> > + .addr = (uint64_t)&val,
> > + };
> > +
> > + return _vcpu_ioctl(vm, 0, KVM_SET_ONE_REG, &reg);
> > +}
> > +
> > +static void get_fw_reg(struct kvm_vm *vm, uint64_t id, uint64_t *addr)
> > +{
> > + struct kvm_one_reg reg = {
> > + .id = id,
> > + .addr = (uint64_t)addr,
> > + };
> > +
> > + vcpu_ioctl(vm, 0, KVM_GET_ONE_REG, &reg);
> > +}
> > +
> > +struct st_time {
> > + uint32_t rev;
> > + uint32_t attr;
> > + uint64_t st_time;
> > +};
> > +
> > +#define STEAL_TIME_SIZE ((sizeof(struct st_time) + 63) & ~63)
> > +#define ST_GPA_BASE (1 << 30)
> > +
> > +static void steal_time_init(struct kvm_vm *vm)
> > +{
> > + uint64_t st_ipa = (ulong)ST_GPA_BASE;
> > + unsigned int gpages;
> > + struct kvm_device_attr dev = {
> > + .group = KVM_ARM_VCPU_PVTIME_CTRL,
> > + .attr = KVM_ARM_VCPU_PVTIME_IPA,
> > + .addr = (uint64_t)&st_ipa,
> > + };
> > +
> > + gpages = vm_calc_num_guest_pages(VM_MODE_DEFAULT, STEAL_TIME_SIZE);
> > + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0);
> > +
> > + vcpu_ioctl(vm, 0, KVM_SET_DEVICE_ATTR, &dev);
> > +}
> > +
> > +static void test_fw_regs_before_vm_start(struct kvm_vm *vm)
> > +{
> > + uint64_t val;
> > + unsigned int i;
> > + int ret;
> > +
> > + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
> > + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
> > +
> > + /* First 'read' should be an upper limit of the features supported */
> > + get_fw_reg(vm, reg_info->reg, &val);
> > + TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
> > + "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
> > + reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
> > +
> > + /* Test a 'write' by disabling all the features of the register map */
> > + ret = set_fw_reg(vm, reg_info->reg, 0);
> > + TEST_ASSERT(ret == 0,
> > + "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
> > + reg_info->reg, errno);
> > +
> > + get_fw_reg(vm, reg_info->reg, &val);
> > + TEST_ASSERT(val == 0,
> > + "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
> > +
> > + /*
> > + * Test enabling a feature that's not supported.
> > + * Avoid this check if all the bits are occupied.
> > + */
> > + if (reg_info->max_feat_bit < 63) {
> > + ret = set_fw_reg(vm, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
> > + TEST_ASSERT(ret != 0 && errno == EINVAL,
> > + "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
> > + errno, reg_info->reg);
> > + }
> > + }
> > +}
>
> Just in case :)
>
> ret = set_fw_reg(vm, reg_info->reg, GENMASK(63, reg_info->max_feat_bit + 1));
>
It may be better to cover the entire range, but testing only
(max_feat_bit + 1) gives us the advantage of checking if there's any
discrepancy between the kernel and the test, now that the *_BIT_MAX
values are not a part of the UAPI headers.
Perhaps also include your test along with the existing one?
>
> > +
> > +static void test_fw_regs_after_vm_start(struct kvm_vm *vm)
> > +{
> > + uint64_t val;
> > + unsigned int i;
> > + int ret;
> > +
> > + for (i = 0; i < ARRAY_SIZE(fw_reg_info); i++) {
> > + const struct kvm_fw_reg_info *reg_info = &fw_reg_info[i];
> > +
> > + /*
> > + * Before starting the VM, the test clears all the bits.
> > + * Check if that's still the case.
> > + */
> > + get_fw_reg(vm, reg_info->reg, &val);
> > + TEST_ASSERT(val == 0,
> > + "Expected all the features to be cleared for reg: 0x%lx\n",
> > + reg_info->reg);
> > +
> > + /*
> > + * Set all the features for this register again. KVM shouldn't
> > + * allow this as the VM is running.
> > + */
> > + ret = set_fw_reg(vm, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
> > + TEST_ASSERT(ret != 0 && errno == EBUSY,
> > + "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
> > + errno, reg_info->reg);
> > + }
> > +}
> > +
>
> I guess you want to check -EBUSY is returned. In that case,
> the comments here could be clearer, something like below
> to emphasize '-EBUSY'.
>
> /*
> * After VM runs for once, -EBUSY should be returned on attempt
> * to set features. Check if the correct errno is returned.
> */
>
Sounds good.
> > +static struct kvm_vm *test_vm_create(void)
> > +{
> > + struct kvm_vm *vm;
> > +
> > + vm = vm_create_default(0, 0, guest_code);
> > +
> > + ucall_init(vm, NULL);
> > + steal_time_init(vm);
> > +
> > + return vm;
> > +}
> > +
> > +static struct kvm_vm *test_guest_stage(struct kvm_vm *vm)
> > +{
> > + struct kvm_vm *ret_vm = vm;
> > +
> > + pr_debug("Stage: %d\n", stage);
> > +
> > + switch (stage) {
> > + case TEST_STAGE_REG_IFACE:
> > + test_fw_regs_after_vm_start(vm);
> > + break;
> > + case TEST_STAGE_HVC_IFACE_FEAT_DISABLED:
> > + /* Start a new VM so that all the features are now enabled by default */
> > + kvm_vm_free(vm);
> > + ret_vm = test_vm_create();
> > + break;
> > + case TEST_STAGE_HVC_IFACE_FEAT_ENABLED:
> > + case TEST_STAGE_HVC_IFACE_FALSE_INFO:
> > + break;
> > + default:
> > + TEST_FAIL("Unknown test stage: %d\n", stage);
> > + }
> > +
> > + stage++;
> > + sync_global_to_guest(vm, stage);
> > +
> > + return ret_vm;
> > +}
> > +
> > +static void test_run(void)
> > +{
> > + struct kvm_vm *vm;
> > + struct ucall uc;
> > + bool guest_done = false;
> > +
> > + vm = test_vm_create();
> > +
> > + test_fw_regs_before_vm_start(vm);
> > +
> > + while (!guest_done) {
> > + vcpu_run(vm, 0);
> > +
> > + switch (get_ucall(vm, 0, &uc)) {
> > + case UCALL_SYNC:
> > + vm = test_guest_stage(vm);
> > + break;
> > + case UCALL_DONE:
> > + guest_done = true;
> > + break;
> > + case UCALL_ABORT:
> > + TEST_FAIL("%s at %s:%ld\n\tvalues: 0x%lx, 0x%lx; 0x%lx, stage: %u",
> > + (const char *)uc.args[0], __FILE__, uc.args[1],
> > + uc.args[2], uc.args[3], uc.args[4], stage);
> > + break;
> > + default:
> > + TEST_FAIL("Unexpected guest exit\n");
> > + }
> > + }
> > +
> > + kvm_vm_free(vm);
> > +}
> > +
> > +int main(void)
> > +{
> > + setbuf(stdout, NULL);
> > +
> > + test_run();
> > + return 0;
> > +}
> >
>
> Thanks,
> Gavin
>
Thanks for the reviews on all the patches, Gavin.
Regards,
Raghavendra
On 4/23/22 8:03 AM, Raghavendra Rao Ananta wrote:
> Since the doc also covers general hypercalls' details,
> rather than just PSCI, and the fact that the bitmap firmware
> registers' details will be added to this doc, rename the file
> to a more appropriate name- hypercalls.rst.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> Reviewed-by: Oliver Upton <[email protected]>
> ---
> Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} | 0
> 1 file changed, 0 insertions(+), 0 deletions(-)
> rename Documentation/virt/kvm/arm/{psci.rst => hypercalls.rst} (100%)
>
Reviewed-by: Gavin Shan <[email protected]>
> diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/hypercalls.rst
> similarity index 100%
> rename from Documentation/virt/kvm/arm/psci.rst
> rename to Documentation/virt/kvm/arm/hypercalls.rst
>
On Tue, Apr 26, 2022 at 6:46 PM Reiji Watanabe <[email protected]> wrote:
>
> Hi Raghu,
>
> On Mon, Apr 25, 2022 at 9:46 AM Raghavendra Rao Ananta
> <[email protected]> wrote:
> >
> > Hi Reiji,
> >
> > On Sun, Apr 24, 2022 at 9:52 PM Reiji Watanabe <[email protected]> wrote:
> > >
> > > Hi Raghu,
> > >
> > > On Fri, Apr 22, 2022 at 5:03 PM Raghavendra Rao Ananta
> > > <[email protected]> wrote:
> > > >
> > > > KVM regularly introduces new hypercall services to the guests without
> > > > > any consent from the userspace. This means the guests can see
> > > > > hypercall services appear and disappear as they migrate across various host
> > > > kernel versions. This could be a major problem if the guest
> > > > discovered a hypercall, started using it, and after getting migrated
> > > > to an older kernel realizes that it's no longer available. Depending
> > > > on how the guest handles the change, there's a potential chance that
> > > > the guest would just panic.
> > > >
> > > > As a result, there's a need for the userspace to elect the services
> > > > that it wishes the guest to discover. It can elect these services
> > > > based on the kernels spread across its (migration) fleet. To remedy
> > > > this, extend the existing firmware pseudo-registers, such as
> > > > KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> > > > for all the hypercall services available.
> > > >
> > > > These firmware registers are categorized based on the service call
> > > > owners, but unlike the existing firmware pseudo-registers, they hold
> > > > the features supported in the form of a bitmap.
> > > >
> > > > > During the VM initialization, the registers are set to the upper-limit of
> > > > the features supported by the corresponding registers. It's expected
> > > > that the VMMs discover the features provided by each register via
> > > > GET_ONE_REG, and write back the desired values using SET_ONE_REG.
> > > > KVM allows this modification only until the VM has started.
> > > >
> > > > Some of the standard features are not mapped to any bits of the
> > > > registers. But since they can recreate the original problem of
> > > > making it available without userspace's consent, they need to
> > > > be explicitly added to the case-list in
> > > > kvm_hvc_call_default_allowed(). Any function-id that's not enabled
> > > > via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
> > > > be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
> > > >
> > > > Older userspace code can simply ignore the feature and the
> > > > hypercall services will be exposed unconditionally to the guests,
> > > > thus ensuring backward compatibility.
> > > >
> > > > In this patch, the framework adds the register only for ARM's standard
> > > > secure services (owner value 4). Currently, this includes support only
> > > > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > > > > register representing mandatory features of v1.0. Other services are
> > > > > added in the upcoming patches.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > > > ---
> > > > arch/arm64/include/asm/kvm_host.h | 12 ++++
> > > > arch/arm64/include/uapi/asm/kvm.h | 9 +++
> > > > arch/arm64/kvm/arm.c | 1 +
> > > > arch/arm64/kvm/guest.c | 8 ++-
> > > > arch/arm64/kvm/hypercalls.c | 94 +++++++++++++++++++++++++++++++
> > > > arch/arm64/kvm/psci.c | 13 +++++
> > > > include/kvm/arm_hypercalls.h | 6 ++
> > > > include/kvm/arm_psci.h | 2 +-
> > > > 8 files changed, 142 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index 94a27a7520f4..df07f4c10197 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> > > > struct kvm_arch_memory_slot {
> > > > };
> > > >
> > > > +/**
> > > > > + * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests
> > > > + *
> > > > + * @std_bmap: Bitmap of standard secure service calls
> > > > + */
> > > > +struct kvm_smccc_features {
> > > > + unsigned long std_bmap;
> > > > +};
> > > > +
> > > > struct kvm_arch {
> > > > struct kvm_s2_mmu mmu;
> > > >
> > > > @@ -150,6 +159,9 @@ struct kvm_arch {
> > > >
> > > > u8 pfr0_csv2;
> > > > u8 pfr0_csv3;
> > > > +
> > > > + /* Hypercall features firmware registers' descriptor */
> > > > + struct kvm_smccc_features smccc_feat;
> > > > };
> > > >
> > > > struct kvm_vcpu_fault_info {
> > > > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > > > index c1b6ddc02d2f..0b79d2dc6ffd 100644
> > > > --- a/arch/arm64/include/uapi/asm/kvm.h
> > > > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > > > @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> > > > #define KVM_ARM64_SVE_VLS_WORDS \
> > > > ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
> > > >
> > > > +/* Bitmap feature firmware registers */
> > > > +#define KVM_REG_ARM_FW_FEAT_BMAP (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> > > > +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> > > > + KVM_REG_ARM_FW_FEAT_BMAP | \
> > > > + ((r) & 0xffff))
> > > > +
> > > > +#define KVM_REG_ARM_STD_BMAP KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> > > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0 0
> > > > +
> > > > /* Device Control API: ARM VGIC */
> > > > #define KVM_DEV_ARM_VGIC_GRP_ADDR 0
> > > > #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
> > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > index 523bc934fe2f..a37fadbd617e 100644
> > > > --- a/arch/arm64/kvm/arm.c
> > > > +++ b/arch/arm64/kvm/arm.c
> > > > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > > kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> > > >
> > > > set_default_spectre(kvm);
> > > > + kvm_arm_init_hypercalls(kvm);
> > > >
> > > > return ret;
> > > > out_free_stage2_pgd:
> > > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > > index 0d5cca56cbda..8c607199cad1 100644
> > > > --- a/arch/arm64/kvm/guest.c
> > > > +++ b/arch/arm64/kvm/guest.c
> > > > @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > >
> > > > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > > > case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
> > > > - case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
> > > > + case KVM_REG_ARM_FW:
> > > > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > > > + return kvm_arm_get_fw_reg(vcpu, reg);
> > > > case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> > > > }
> > > >
> > > > @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > >
> > > > switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> > > > case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
> > > > - case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
> > > > + case KVM_REG_ARM_FW:
> > > > + case KVM_REG_ARM_FW_FEAT_BMAP:
> > > > + return kvm_arm_set_fw_reg(vcpu, reg);
> > > > case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> > > > }
> > > >
> > > > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > > > index fa6d9378d8e7..df55a04d2fe8 100644
> > > > --- a/arch/arm64/kvm/hypercalls.c
> > > > +++ b/arch/arm64/kvm/hypercalls.c
> > > > @@ -58,6 +58,48 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> > > > val[3] = lower_32_bits(cycles);
> > > > }
> > > >
> > > > +static bool kvm_arm_fw_reg_feat_enabled(unsigned long *reg_bmap, unsigned long feat_bit)
> > > > +{
> > > > + return test_bit(feat_bit, reg_bmap);
> > > > +}
> > > > +
> > > > +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > > > +{
> > > > + switch (func_id) {
> > > > + /*
> > > > + * List of function-ids that are not gated with the bitmapped feature
> > > > + * firmware registers, and are to be allowed for servicing the call by default.
> > > > + */
> > > > + case ARM_SMCCC_VERSION_FUNC_ID:
> > > > + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> > > > + case ARM_SMCCC_HV_PV_TIME_FEATURES:
> > > > + case ARM_SMCCC_HV_PV_TIME_ST:
> > > > + case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
> > > > + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > > > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > > > + return true;
> > > > + default:
> > > > + return kvm_psci_func_id_is_valid(vcpu, func_id);
> > > > + }
> > > > +}
> > > > +
> > > > +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > > > +{
> > > > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > > > +
> > > > + switch (func_id) {
> > > > + case ARM_SMCCC_TRNG_VERSION:
> > > > + case ARM_SMCCC_TRNG_FEATURES:
> > > > + case ARM_SMCCC_TRNG_GET_UUID:
> > > > + case ARM_SMCCC_TRNG_RND32:
> > > > + case ARM_SMCCC_TRNG_RND64:
> > > > + return kvm_arm_fw_reg_feat_enabled(&smccc_feat->std_bmap,
> > > > + KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > > > + default:
> > > > + return kvm_hvc_call_default_allowed(vcpu, func_id);
> > > > + }
> > > > +}
> > > > +
> > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > {
> > > > u32 func_id = smccc_get_function(vcpu);
> > > > @@ -65,6 +107,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > u32 feature;
> > > > gpa_t gpa;
> > > >
> > > > + if (!kvm_hvc_call_allowed(vcpu, func_id))
> > > > + goto out;
> > > > +
> > > > switch (func_id) {
> > > > case ARM_SMCCC_VERSION_FUNC_ID:
> > > > val[0] = ARM_SMCCC_VERSION_1_1;
> > > > @@ -155,6 +200,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > > > return kvm_psci_call(vcpu);
> > > > }
> > > >
> > > > +out:
> > > > smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> > > > return 1;
> > > > }
> > > > @@ -164,8 +210,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> > > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> > > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> > > > KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> > > > + KVM_REG_ARM_STD_BMAP,
> > > > };
> > > >
> > > > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > > > +{
> > > > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > > > +
> > > > + smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> > > > +}
> > > > +
> > > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> > > > {
> > > > return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > > > @@ -237,6 +291,7 @@ static int get_kernel_wa_level(u64 regid)
> > > >
> > > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > {
> > > > + struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > > u64 val;
> > > >
> > > > @@ -249,6 +304,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> > > > val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> > > > break;
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + val = READ_ONCE(smccc_feat->std_bmap);
> > > > + break;
> > > > default:
> > > > return -ENOENT;
> > > > }
> > > > @@ -259,6 +317,40 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > return 0;
> > > > }
> > > >
> > > > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > > > +{
> > > > + int ret = 0;
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > + struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > > > + unsigned long *fw_reg_bmap, fw_reg_features;
> > > > +
> > > > + switch (reg_id) {
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + fw_reg_bmap = &smccc_feat->std_bmap;
> > > > + fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> > > > + break;
> > > > + default:
> > > > + return -ENOENT;
> > > > + }
> > > > +
> > > > + /* Check for unsupported bit */
> > > > + if (val & ~fw_reg_features)
> > > > + return -EINVAL;
> > > > +
> > > > + mutex_lock(&kvm->lock);
> > >
> > > > Why not check whether the register value would actually change
> > > > before taking the lock? (If it would not, there is nothing to do.)
> > > > It would help reduce unnecessary serialization for live migration
> > > > (even without the vm-scoped register capability).
> > >
> > That was the case until v5. Since v6, we return -EBUSY unconditionally
> > regardless of the incoming value. See Marc's comments in [1].
>
> Even with that, the function could do below to avoid
> the unnecessary serialization.
> (I would expect mostly the function returns before getting the lock)
>
> if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags))
> return -EBUSY;
>
> if (val == *fw_reg_bmap)
> return 0;
>
> mutex_lock(&kvm->lock);
>
> <...>
>
Great idea! I can try this out. Thanks for the suggestion.
> > >
> > >
> > > > +
> > > > + /* Return -EBUSY if the VM (any vCPU) has already started running. */
> > > > + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > > > + ret = -EBUSY;
> > > > + goto out;
> > > > + }
> > >
> > > > I would just like to confirm that the existing userspace you
> > > > know of will not issue KVM_RUN for any vCPU until
> > > > KVM_SET_ONE_REG is complete for all vCPUs (even for migration),
> > > > correct?
> > >
> > > Since v6, that is something that we are leaving to userspace to
> > > synchronize. See [1].
>
> Understood.
>
>
> > > > > +
> > > > + WRITE_ONCE(*fw_reg_bmap, val);
> > > > +out:
> > > > + mutex_unlock(&kvm->lock);
> > > > + return ret;
> > > > +}
> > > > +
> > > > int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > {
> > > > void __user *uaddr = (void __user *)(long)reg->addr;
> > > > @@ -337,6 +429,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > return -EINVAL;
> > > >
> > > > return 0;
> > > > + case KVM_REG_ARM_STD_BMAP:
> > > > + return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> > > > default:
> > > > return -ENOENT;
> > > > }
> > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > index 346535169faa..67d1273e8086 100644
> > > > --- a/arch/arm64/kvm/psci.c
> > > > +++ b/arch/arm64/kvm/psci.c
> > > > @@ -436,3 +436,16 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> > > > return -EINVAL;
> > > > }
> > > > }
> > > > +
> > > > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> > > > +{
> > > > + /* PSCI 0.1 doesn't comply with the standard SMCCC */
> > > > + if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> > > > + return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> > > > +
> > > > + if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> > > > + ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> > > > + return true;
> > >
> > > > For PSCI 0.1, the function checks if the func_id is valid for
> > > > the vCPU (according to the vCPU's PSCI version).
> > > > For other versions of PSCI, the function doesn't care about the vCPU's
> > > > PSCI version (although supported functions depend on the PSCI
> > > version and not all of them are defined yet, the code returns
> > > true as long as the function id is within the reserved PSCI
> > > function id range).
> > > So, the behavior appears to be inconsistent.
> > > Shouldn't it return the validity of the function id according
> > > to the vCPU's psci version for non-PSCI 0.1 case as well ?
> > > (Otherwise, shouldn't it return true if the function id is valid
> > > for any of the PSCI versions ?)
> > >
> > > Well, PSCI 0.1 is somewhat of an odd implementation. It doesn't comply
> > > with the SMCCC, hence needed some special handling. Only two func_ids
> > > are currently supported by KVM, and we just check for each. The second
> > 'if' statement is for all the PSCI versions >= 0.2. Thankfully, the
> > specification defines a range of acceptable PSCI func_ids.
>
> I understand PSCI 0.1 is different from PSCI 0.2 or newer versions.
> But, my question is: What would you consider "valid" psci function id ?
> It seems that the function checks whether or not the func_id is valid
> on the vCPU for PSCI 0.1, and checks whether or not the func_id is a
> PSCI function id for vCPU with PSCI 0.2 or newer.
>
> I understand either one works for your purpose, but I would think
> the behavior should be consistent.
>
I guess checking for the version caused the confusion here, but that
was done since there isn't a standard way to check the 0.1's range of
func_ids. Alternatively, instead of version, since the base of the
0.1's range is different as well, I can just check for that to avoid
the confusion (no functional change though).
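Roughly, something like the sketch below. It is only one possible reading
of the alternative, not the final code: the SKETCH_* macros are local
copies of what I believe the header values to be (KVM_PSCI_FN_BASE, the
SMCCC owner and function-number fields), and sketch_psci_func_id_is_valid()
is a made-up name. It no longer consults the vCPU's PSCI version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of kernel header values, for illustration only. */
#define SKETCH_PSCI_0_1_FN_BASE  0x95c1ba5eU  /* KVM_PSCI_FN_BASE */
#define SKETCH_PSCI_0_1_CPU_OFF  (SKETCH_PSCI_0_1_FN_BASE + 1)
#define SKETCH_PSCI_0_1_CPU_ON   (SKETCH_PSCI_0_1_FN_BASE + 2)
#define SKETCH_OWNER_STANDARD    4U
#define SKETCH_OWNER_NUM(id)     (((id) >> 24) & 0x3fU)
#define SKETCH_FUNC_NUM(id)      ((id) & 0xffffU)

/* The PSCI >= 0.2 ids live in the standard secure service owner space. */
static_assert(SKETCH_OWNER_NUM(0x84000000U) == SKETCH_OWNER_STANDARD,
	      "PSCI 0.2 base must decode to the standard owner");

static bool sketch_psci_func_id_is_valid(uint32_t func_id)
{
	/* PSCI 0.1: the two ids KVM handles, which sit outside the SMCCC ranges. */
	if (func_id == SKETCH_PSCI_0_1_CPU_OFF ||
	    func_id == SKETCH_PSCI_0_1_CPU_ON)
		return true;

	/* PSCI >= 0.2: standard owner, function numbers 0x00 to 0x1f. */
	return SKETCH_OWNER_NUM(func_id) == SKETCH_OWNER_STANDARD &&
	       SKETCH_FUNC_NUM(func_id) <= 0x1f;
}
```

For example, 0x84000002 (the PSCI 0.2 CPU_OFF id) passes, while 0x84000050
(TRNG version) does not, since its function number is above 0x1f.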
Thank you.
Raghavendra
> Thanks,
> Reiji
>
>
> >
> > If it's confusing, I can add a comment above the second 'if' that it's
> > for all PSCI versions >= 0.2.
> > > Thanks,
> > > Reiji
> > >
> > Thank you.
> > Raghavendra
> >
> > [1]: https://lore.kernel.org/lkml/[email protected]/
> > >
> > >
> > > > +
> > > > + return false;
> > > > +}
> > > > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > > > index 5d38628a8d04..499b45b607b6 100644
> > > > --- a/include/kvm/arm_hypercalls.h
> > > > +++ b/include/kvm/arm_hypercalls.h
> > > > @@ -6,6 +6,11 @@
> > > >
> > > > #include <asm/kvm_emulate.h>
> > > >
> > > > +/* Last valid bits of the bitmapped firmware registers */
> > > > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX 0
> > > > +
> > > > +#define KVM_ARM_SMCCC_STD_FEATURES GENMASK(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > > > +
> > > > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> > > >
> > > > static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > > > @@ -42,6 +47,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> > > >
> > > > struct kvm_one_reg;
> > > >
> > > > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> > > > int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> > > > int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > > > int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > > > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > > > index 6e55b9283789..c47be3e26965 100644
> > > > --- a/include/kvm/arm_psci.h
> > > > +++ b/include/kvm/arm_psci.h
> > > > @@ -36,7 +36,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> > > > return KVM_ARM_PSCI_0_1;
> > > > }
> > > >
> > > > -
> > > > int kvm_psci_call(struct kvm_vcpu *vcpu);
> > > > +bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id);
> > > >
> > > > #endif /* __KVM_ARM_PSCI_H__ */
> > > > --
> > > > 2.36.0.rc2.479.g8af0fa9b8e-goog
> > > >