2022-11-17 22:33:31

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: [PATCH v3 0/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

User space can use the MEM_OP ioctl to make storage key checked reads
and writes to the guest, however, it has no way of performing atomic,
key checked, accesses to the guest.
Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
mode. For now, support this mode for absolute accesses only.

This mode can be use, for example, to set the device-state-change
indicator and the adapter-local-summary indicator atomically.

Also contains some fixes/changes for the memop selftest independent of
the cmpxchg changes.

v2 -> v3
* rebase onto the wip/cmpxchg_user_key branch in the s390 kernel repo
* use __uint128_t instead of unsigned __int128
* put moving of testlist into main into separate patch
* pick up R-b's (thanks Nico)

v1 -> v2
* get rid of xrk instruction for cmpxchg byte and short implementation
* pass old parameter via pointer instead of in mem_op struct
* indicate failure of cmpxchg due to wrong old value by special return
code
* picked up R-b's (thanks Thomas)

Janis Schoetterl-Glausch (9):
KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
KVM: s390: selftest: memop: Pass mop_desc via pointer
KVM: s390: selftest: memop: Replace macros by functions
KVM: s390: selftest: memop: Move testlist into main
KVM: s390: selftest: memop: Add cmpxchg tests
KVM: s390: selftest: memop: Add bad address test
KVM: s390: selftest: memop: Fix typo
KVM: s390: selftest: memop: Fix wrong address being used in test

Documentation/virt/kvm/api.rst | 21 +-
include/uapi/linux/kvm.h | 5 +
arch/s390/kvm/gaccess.h | 3 +
arch/s390/kvm/gaccess.c | 101 ++++
arch/s390/kvm/kvm-s390.c | 35 +-
tools/testing/selftests/kvm/s390x/memop.c | 670 +++++++++++++++++-----
6 files changed, 683 insertions(+), 152 deletions(-)

Range-diff against v2:
1: c6731b0063ab ! 1: 94820a73e9b0 KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
@@ arch/s390/kvm/gaccess.h: int access_guest_with_key(struct kvm_vcpu *vcpu, unsign
void *data, unsigned long len, enum gacc_mode mode);

+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
-+ unsigned __int128 *old,
-+ unsigned __int128 new, u8 access_key);
++ __uint128_t *old, __uint128_t new, u8 access_key);
+
/**
* write_guest_with_key - copy data from kernel space to guest space
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ * * a program interruption code indicating the reason cmpxchg could
+ * not be attempted
+ * * -EINVAL: address misaligned or len not power of two
++ * * -EAGAIN: transient failure (len 1 or 2)
+ */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
-+ unsigned __int128 *old_p, unsigned __int128 new,
++ __uint128_t *old_p, __uint128_t new,
+ u8 access_key)
+{
+ gfn_t gfn = gpa >> PAGE_SHIFT;
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ return -EOPNOTSUPP;
+
+ hva += offset_in_page(gpa);
-+ ret = cmpxchg_user_key_size(len, (void __user *)hva, old_p, new, access_key);
++ switch (len) {
++ case 1: {
++ u8 old;
++
++ ret = cmpxchg_user_key((u8 *)hva, &old, *old_p, new, access_key);
++ ret = ret < 0 ? ret : old != *old_p;
++ *old_p = old;
++ break;
++ }
++ case 2: {
++ u16 old;
++
++ ret = cmpxchg_user_key((u16 *)hva, &old, *old_p, new, access_key);
++ ret = ret < 0 ? ret : old != *old_p;
++ *old_p = old;
++ break;
++ }
++ case 4: {
++ u32 old;
++
++ ret = cmpxchg_user_key((u32 *)hva, &old, *old_p, new, access_key);
++ ret = ret < 0 ? ret : old != *old_p;
++ *old_p = old;
++ break;
++ }
++ case 8: {
++ u64 old;
++
++ ret = cmpxchg_user_key((u64 *)hva, &old, *old_p, new, access_key);
++ ret = ret < 0 ? ret : old != *old_p;
++ *old_p = old;
++ break;
++ }
++ case 16: {
++ __uint128_t old;
++
++ ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_p, new, access_key);
++ ret = ret < 0 ? ret : old != *old_p;
++ *old_p = old;
++ break;
++ }
++ default:
++ return -EINVAL;
++ }
+ mark_page_dirty_in_slot(kvm, slot, gfn);
+ /*
+ * Assume that the fault is caused by protection, either key protection
@@ arch/s390/kvm/kvm-s390.c: static bool access_key_invalid(u8 access_key)
void __user *uaddr = (void __user *)mop->buf;
+ void __user *old_p = (void __user *)mop->old_p;
+ union {
-+ unsigned __int128 quad;
-+ char raw[sizeof(unsigned __int128)];
++ __uint128_t quad;
++ char raw[sizeof(__uint128_t)];
+ } old = { .quad = 0}, new = { .quad = 0 };
-+ unsigned int off_in_quad = sizeof(unsigned __int128) - mop->size;
++ unsigned int off_in_quad = sizeof(__uint128_t) - mop->size;
u64 supported_flags;
void *tmpbuf = NULL;
int r, srcu_idx;
2: 6cb32b244899 = 2: 28637a03f63e Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
3: 5f1217ad9d31 = 3: 31f45e4377c5 KVM: s390: selftest: memop: Pass mop_desc via pointer
4: 86a15b53846a = 4: dba0a8ba3ba2 KVM: s390: selftest: memop: Replace macros by functions
-: ------------ > 5: ba8cdd064c02 KVM: s390: selftest: memop: Move testlist into main
5: 49e67d7559de ! 6: 3cf7f710e5db KVM: s390: selftest: memop: Add cmpxchg tests
@@ tools/testing/selftests/kvm/s390x/memop.c: struct mop_desc {
uint8_t ar;
uint8_t key;
};
-
-+typedef unsigned __int128 uint128;
-+
- const uint8_t NO_KEY = 0xff;
-
- static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc)
@@ tools/testing/selftests/kvm/s390x/memop.c: static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc)
ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION;
ksmo.key = desc->key;
@@ tools/testing/selftests/kvm/s390x/memop.c: enum stage {
vcpu_run(__vcpu); \
get_ucall(__vcpu, &uc); \
+ if (uc.cmd == UCALL_ABORT) { \
-+ TEST_FAIL("line %lu: %s, hints: %lu, %lu", \
-+ uc.args[1], (const char *)uc.args[0], \
-+ uc.args[2], uc.args[3]); \
++ REPORT_GUEST_ASSERT_2(uc, "hints: %lu, %lu"); \
+ } \
ASSERT_EQ(uc.cmd, UCALL_SYNC); \
ASSERT_EQ(uc.args[1], __stage); \
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ kvm_vm_free(t.kvm_vm);
+}
+
-+static uint128 cut_to_size(int size, uint128 val)
++static __uint128_t cut_to_size(int size, __uint128_t val)
+{
+ switch (size) {
+ case 1:
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ return 0;
+}
+
-+static bool popcount_eq(uint128 a, uint128 b)
++static bool popcount_eq(__uint128_t a, __uint128_t b)
+{
+ unsigned int count_a, count_b;
+
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ return count_a == count_b;
+}
+
-+static uint128 rotate(int size, uint128 val, int amount)
++static __uint128_t rotate(int size, __uint128_t val, int amount)
+{
+ unsigned int bits = size * 8;
+
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ }
+}
+
-+static uint128 permutate_bits(bool guest, int i, int size, uint128 old)
++static __uint128_t permutate_bits(bool guest, int i, int size, __uint128_t old)
+{
+ unsigned int rand;
+ bool swap;
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ swap = rand % 2 == 0;
+ if (swap) {
+ int i, j;
-+ uint128 new;
++ __uint128_t new;
+ uint8_t byte0, byte1;
+
+ rand = rand * 3 + 1;
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ }
+}
+
-+static bool _cmpxchg(int size, void *target, uint128 *old_p, uint128 new)
++static bool _cmpxchg(int size, void *target, __uint128_t *old_p, __uint128_t new)
+{
+ bool ret;
+
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ return ret;
+ }
+ case 16: {
-+ uint128 old = *old_p;
++ __uint128_t old = *old_p;
+
+ asm volatile ("cdsg %[old],%[new],%[address]"
+ : [old] "+d" (old),
-+ [address] "+Q" (*(uint128 *)(target))
++ [address] "+Q" (*(__uint128_t *)(target))
+ : [new] "d" (new)
+ : "cc"
+ );
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+static void guest_cmpxchg_key(void)
+{
+ int size, offset;
-+ uint128 old, new;
++ __uint128_t old, new;
+
+ set_storage_key_range(mem1, max_block, 0x10);
+ set_storage_key_range(mem2, max_block, 0x10);
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ return NULL;
+}
+
-+static char *quad_to_char(uint128 *quad, int size)
++static char *quad_to_char(__uint128_t *quad, int size)
+{
+ return ((char *)quad) + (sizeof(*quad) - size);
+}
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+{
+ struct test_default t = test_default_init(guest_cmpxchg_key);
+ int size, offset;
-+ uint128 old, new;
++ __uint128_t old, new;
+ bool success;
+ pthread_t thread;
+
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ pthread_join(thread, NULL);
+
+ MOP(t.vcpu, LOGICAL, READ, mem2, max_block, GADDR_V(mem2));
-+ TEST_ASSERT(popcount_eq(*(uint128 *)mem1, *(uint128 *)mem2),
++ TEST_ASSERT(popcount_eq(*(__uint128_t *)mem1, *(__uint128_t *)mem2),
+ "Must retain number of set bits");
+
+ kvm_vm_free(t.kvm_vm);
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors_key(void)
+ HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
+
+ for (i = 1; i <= 16; i *= 2) {
-+ uint128 old = 0;
++ __uint128_t old = 0;
+
+ CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem2, i, GADDR_V(mem2),
+ CMPXCHG_OLD(&old), KEY(2));
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void)
kvm_vm_free(t.kvm_vm);
}

--struct testdef {
-- const char *name;
-- void (*test)(void);
-- int extension;
--} testlist[] = {
-- {
-- .name = "simple copy",
-- .test = test_copy,
-- },
-- {
-- .name = "generic error checks",
-- .test = test_errors,
-- },
-- {
-- .name = "copy with storage keys",
-- .test = test_copy_key,
-- .extension = 1,
-- },
-- {
-- .name = "copy with key storage protection override",
-- .test = test_copy_key_storage_prot_override,
-- .extension = 1,
-- },
-- {
-- .name = "copy with key fetch protection",
-- .test = test_copy_key_fetch_prot,
-- .extension = 1,
-- },
-- {
-- .name = "copy with key fetch protection override",
-- .test = test_copy_key_fetch_prot_override,
-- .extension = 1,
-- },
-- {
-- .name = "error checks with key",
-- .test = test_errors_key,
-- .extension = 1,
-- },
-- {
-- .name = "termination",
-- .test = test_termination,
-- .extension = 1,
-- },
-- {
-- .name = "error checks with key storage protection override",
-- .test = test_errors_key_storage_prot_override,
-- .extension = 1,
-- },
-- {
-- .name = "error checks without key fetch prot override",
-- .test = test_errors_key_fetch_prot_override_not_enabled,
-- .extension = 1,
-- },
-- {
-- .name = "error checks with key fetch prot override",
-- .test = test_errors_key_fetch_prot_override_enabled,
-- .extension = 1,
-- },
--};
+static void test_errors_cmpxchg(void)
+{
+ struct test_default t = test_default_init(guest_idle);
-+ uint128 old;
++ __uint128_t old;
+ int rv, i, power = 1;
+
+ HOST_SYNC(t.vcpu, STAGE_INITED);
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void)

int main(int argc, char *argv[])
{
- int extension_cap, idx;
-
-+ setbuf(stdout, NULL); /* Tell stdout not to buffer its content */
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_MEM_OP));
-+ extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION);
-
-- setbuf(stdout, NULL); /* Tell stdout not to buffer its content */
-+ struct testdef {
-+ const char *name;
-+ void (*test)(void);
-+ bool requirements_met;
-+ } testlist[] = {
-+ {
-+ .name = "simple copy",
-+ .test = test_copy,
-+ .requirements_met = true,
-+ },
-+ {
-+ .name = "generic error checks",
-+ .test = test_errors,
-+ .requirements_met = true,
-+ },
-+ {
-+ .name = "copy with storage keys",
-+ .test = test_copy_key,
-+ .requirements_met = extension_cap > 0,
-+ },
+@@ tools/testing/selftests/kvm/s390x/memop.c: int main(int argc, char *argv[])
+ .test = test_copy_key,
+ .requirements_met = extension_cap > 0,
+ },
+ {
+ .name = "cmpxchg with storage keys",
+ .test = test_cmpxchg_key,
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void)
+ .test = test_cmpxchg_key_concurrent,
+ .requirements_met = extension_cap & 0x2,
+ },
-+ {
-+ .name = "copy with key storage protection override",
-+ .test = test_copy_key_storage_prot_override,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "copy with key fetch protection",
-+ .test = test_copy_key_fetch_prot,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "copy with key fetch protection override",
-+ .test = test_copy_key_fetch_prot_override,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "error checks with key",
-+ .test = test_errors_key,
-+ .requirements_met = extension_cap > 0,
-+ },
+ {
+ .name = "copy with key storage protection override",
+ .test = test_copy_key_storage_prot_override,
+@@ tools/testing/selftests/kvm/s390x/memop.c: int main(int argc, char *argv[])
+ .test = test_errors_key,
+ .requirements_met = extension_cap > 0,
+ },
+ {
+ .name = "error checks for cmpxchg with key",
+ .test = test_errors_cmpxchg_key,
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void)
+ .test = test_errors_cmpxchg,
+ .requirements_met = extension_cap & 0x2,
+ },
-+ {
-+ .name = "termination",
-+ .test = test_termination,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "error checks with key storage protection override",
-+ .test = test_errors_key_storage_prot_override,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "error checks without key fetch prot override",
-+ .test = test_errors_key_fetch_prot_override_not_enabled,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ {
-+ .name = "error checks with key fetch prot override",
-+ .test = test_errors_key_fetch_prot_override_enabled,
-+ .requirements_met = extension_cap > 0,
-+ },
-+ };
-
- ksft_print_header();
--
- ksft_set_plan(ARRAY_SIZE(testlist));
-
-- extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION);
- for (idx = 0; idx < ARRAY_SIZE(testlist); idx++) {
-- if (extension_cap >= testlist[idx].extension) {
-+ if (testlist[idx].requirements_met) {
- testlist[idx].test();
- ksft_test_result_pass("%s\n", testlist[idx].name);
- } else {
-- ksft_test_result_skip("%s - extension level %d not supported\n",
-- testlist[idx].name,
-- testlist[idx].extension);
-+ ksft_test_result_skip("%s - requirements not met (kernel has extension cap %#x\n)",
-+ testlist[idx].name, extension_cap);
- }
- }
-
+ {
+ .name = "termination",
+ .test = test_termination,
6: faad9cf03ea6 ! 7: d81d5697ba4b KVM: s390: selftest: memop: Add bad address test
@@ Commit message
Add test that tries to access, instead of CHECK_ONLY.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
+ Reviewed-by: Nico Boehr <[email protected]>

## tools/testing/selftests/kvm/s390x/memop.c ##
@@ tools/testing/selftests/kvm/s390x/memop.c: static void _test_errors_common(struct test_info info, enum mop_target target, i
7: 8070036aa89a ! 8: 22c9cd9589ba KVM: s390: selftest: memop: Fix typo
@@ Commit message

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
+ Reviewed-by: Nico Boehr <[email protected]>

## tools/testing/selftests/kvm/s390x/memop.c ##
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors_key_fetch_prot_override_enabled(void)
8: 18c423e4e3ad ! 9: 4647be0790c8 KVM: s390: selftest: memop: Fix wrong address being used in test
@@ Commit message
protection exception the test codes needs to address mem1.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
+ Reviewed-by: Nico Boehr <[email protected]>

## tools/testing/selftests/kvm/s390x/memop.c ##
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors_key(void)

base-commit: b5b2a05e6de7b88bcfca9d4bbc8ac74e7db80c52
--
2.34.1



2022-11-17 22:39:16

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for
absolute vm write memops which allows user space to perform (storage key
checked) cmpxchg operations on guest memory.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
---
Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..204d128f23e0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3753,7 +3753,8 @@ The fields in each entry are defined as follows:
:Parameters: struct kvm_s390_mem_op (in)
:Returns: = 0 on success,
< 0 on generic error (e.g. -EFAULT or -ENOMEM),
- > 0 if an exception occurred while walking the page tables
+ 16 bit program exception code if the access causes such an exception
+ other code > maximum 16 bit value with special meaning

Read or write data from/to the VM's memory.
The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is
@@ -3771,6 +3772,8 @@ Parameters are specified via the following structure::
struct {
__u8 ar; /* the access register number */
__u8 key; /* access key, ignored if flag unset */
+ __u8 pad1[6]; /* ignored */
+ __u64 old_p; /* ignored if flag unset */
};
__u32 sida_offset; /* offset into the sida */
__u8 reserved[32]; /* ignored */
@@ -3853,8 +3856,22 @@ Absolute accesses are permitted for non-protected guests only.
Supported flags:
* ``KVM_S390_MEMOP_F_CHECK_ONLY``
* ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
+ * ``KVM_S390_MEMOP_F_CMPXCHG``
+
+The semantics of the flags common with logical acesses are as for logical
+accesses.
+
+For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
+In this case, instead of doing an unconditional write, the access occurs only
+if the target location contains the "size" byte long value pointed to by
+"old_p". This is performed as an atomic cmpxchg. "size" must be a power of two
+up to and including 16.
+The value at the target location is written to the location "old_p" points to.
+If the exchange did not take place because the target value doesn't match the
+old value KVM_S390_MEMOP_R_NO_XCHG is returned.
+The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
+has bit 1 (i.e. bit with value 2) set.

-The semantics of the flags are as for logical accesses.

SIDA read/write:
^^^^^^^^^^^^^^^^
--
2.34.1


2022-11-17 22:51:19

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

User space can use the MEM_OP ioctl to make storage key checked reads
and writes to the guest, however, it has no way of performing atomic,
key checked, accesses to the guest.
Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
mode. For now, support this mode for absolute accesses only.

This mode can be use, for example, to set the device-state-change
indicator and the adapter-local-summary indicator atomically.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
---
include/uapi/linux/kvm.h | 5 ++
arch/s390/kvm/gaccess.h | 3 ++
arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
4 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..1f36be5493e6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
struct {
__u8 ar; /* the access register number */
__u8 key; /* access key, ignored if flag unset */
+ __u8 pad1[6]; /* ignored */
+ __u64 old_p; /* ignored if flag unset */
};
__u32 sida_offset; /* offset into the sida */
__u8 reserved[32]; /* ignored */
@@ -604,6 +606,9 @@ struct kvm_s390_mem_op {
#define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0)
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1)
#define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2)
+#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
+/* Non program exception return codes (pgm codes are 16 bit) */
+#define KVM_S390_MEMOP_R_NO_XCHG ((1 << 16) + 0)

/* for KVM_INTERRUPT */
struct kvm_interrupt {
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 9408d6cc8e2c..92a3b9fb31ec 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
void *data, unsigned long len, enum gacc_mode mode);

+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
+ __uint128_t *old, __uint128_t new, u8 access_key);
+
/**
* write_guest_with_key - copy data from kernel space to guest space
* @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0243b6e38d36..be042865d8a1 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1161,6 +1161,107 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
return rc;
}

+/**
+ * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
+ * @kvm: Virtual machine instance.
+ * @gpa: Absolute guest address of the location to be changed.
+ * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
+ * non power of two will result in failure.
+ * @old_p: Pointer to old value. If the location at @gpa contains this value, the
+ * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
+ * contains the value at @gpa before the attempt to exchange the value.
+ * @new: The value to place at @gpa.
+ * @access_key: The access key to use for the guest access.
+ *
+ * Atomically exchange the value at @gpa by @new, if it contains *@old.
+ * Honors storage keys.
+ *
+ * Return: * 0: successful exchange
+ * * 1: exchange unsuccessful
+ * * a program interruption code indicating the reason cmpxchg could
+ * not be attempted
+ * * -EINVAL: address misaligned or len not power of two
+ * * -EAGAIN: transient failure (len 1 or 2)
+ */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
+ __uint128_t *old_p, __uint128_t new,
+ u8 access_key)
+{
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+ bool writable;
+ hva_t hva;
+ int ret;
+
+ if (!IS_ALIGNED(gpa, len))
+ return -EINVAL;
+
+ hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
+ if (kvm_is_error_hva(hva))
+ return PGM_ADDRESSING;
+ /*
+ * Check if it's a read-only memslot, even though that cannot occur
+ * since those are unsupported.
+ * Don't try to actually handle that case.
+ */
+ if (!writable)
+ return -EOPNOTSUPP;
+
+ hva += offset_in_page(gpa);
+ switch (len) {
+ case 1: {
+ u8 old;
+
+ ret = cmpxchg_user_key((u8 *)hva, &old, *old_p, new, access_key);
+ ret = ret < 0 ? ret : old != *old_p;
+ *old_p = old;
+ break;
+ }
+ case 2: {
+ u16 old;
+
+ ret = cmpxchg_user_key((u16 *)hva, &old, *old_p, new, access_key);
+ ret = ret < 0 ? ret : old != *old_p;
+ *old_p = old;
+ break;
+ }
+ case 4: {
+ u32 old;
+
+ ret = cmpxchg_user_key((u32 *)hva, &old, *old_p, new, access_key);
+ ret = ret < 0 ? ret : old != *old_p;
+ *old_p = old;
+ break;
+ }
+ case 8: {
+ u64 old;
+
+ ret = cmpxchg_user_key((u64 *)hva, &old, *old_p, new, access_key);
+ ret = ret < 0 ? ret : old != *old_p;
+ *old_p = old;
+ break;
+ }
+ case 16: {
+ __uint128_t old;
+
+ ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_p, new, access_key);
+ ret = ret < 0 ? ret : old != *old_p;
+ *old_p = old;
+ break;
+ }
+ default:
+ return -EINVAL;
+ }
+ mark_page_dirty_in_slot(kvm, slot, gfn);
+ /*
+ * Assume that the fault is caused by protection, either key protection
+ * or user page write protection.
+ */
+ if (ret == -EFAULT)
+ ret = PGM_PROTECTION;
+ return ret;
+}
+
/**
* guest_translate_address_with_key - translate guest logical into guest absolute address
* @vcpu: virtual cpu
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 45d4b8182b07..2410b4044283 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -576,7 +576,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_VCPU_RESETS:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_S390_DIAG318:
- case KVM_CAP_S390_MEM_OP_EXTENSION:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
@@ -590,6 +589,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_MEM_OP:
r = MEM_OP_MAX_SIZE;
break;
+ case KVM_CAP_S390_MEM_OP_EXTENSION:
+ /*
+ * Flag bits indicating which extensions are supported.
+ * The first extension doesn't use a flag, but pretend it does,
+ * this way that can be changed in the future.
+ */
+ r = 0x3;
+ break;
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
@@ -2714,12 +2721,19 @@ static bool access_key_invalid(u8 access_key)
static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
{
void __user *uaddr = (void __user *)mop->buf;
+ void __user *old_p = (void __user *)mop->old_p;
+ union {
+ __uint128_t quad;
+ char raw[sizeof(__uint128_t)];
+ } old = { .quad = 0}, new = { .quad = 0 };
+ unsigned int off_in_quad = sizeof(__uint128_t) - mop->size;
u64 supported_flags;
void *tmpbuf = NULL;
int r, srcu_idx;

supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
- | KVM_S390_MEMOP_F_CHECK_ONLY;
+ | KVM_S390_MEMOP_F_CHECK_ONLY
+ | KVM_S390_MEMOP_F_CMPXCHG;
if (mop->flags & ~supported_flags || !mop->size)
return -EINVAL;
if (mop->size > MEM_OP_MAX_SIZE)
@@ -2741,6 +2755,15 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
} else {
mop->key = 0;
}
+ if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
+ if (mop->size > sizeof(new))
+ return -EINVAL;
+ /* off_in_quad has been validated */
+ if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
+ return -EFAULT;
+ if (copy_from_user(&old.raw[off_in_quad], old_p, mop->size))
+ return -EFAULT;
+ }
if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
tmpbuf = vmalloc(mop->size);
if (!tmpbuf)
@@ -2771,6 +2794,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
case KVM_S390_MEMOP_ABSOLUTE_WRITE: {
if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
+ } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
+ r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
+ &old.quad, new.quad, mop->key);
+ if (r == 1) {
+ r = KVM_S390_MEMOP_R_NO_XCHG;
+ if (copy_to_user(old_p, &old.raw[off_in_quad], mop->size))
+ r = -EFAULT;
+ }
} else {
if (copy_from_user(tmpbuf, uaddr, mop->size)) {
r = -EFAULT;
--
2.34.1


2022-11-18 02:08:54

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

On 11/18/22 05:17, Janis Schoetterl-Glausch wrote:
> +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
> +In this case, instead of doing an unconditional write, the access occurs only
> +if the target location contains the "size" byte long value pointed to by
> +"old_p". This is performed as an atomic cmpxchg. "size" must be a power of two
> +up to and including 16.
> +The value at the target location is written to the location "old_p" points to.
> +If the exchange did not take place because the target value doesn't match the
> +old value KVM_S390_MEMOP_R_NO_XCHG is returned.
> +The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
> +has bit 1 (i.e. bit with value 2) set.
>

Is KVM_S390_MEMOP_F_CMPXCHG supported with conditions (as you implied)?

--
An old man doll... just what I always wanted! - Clara


2022-11-18 10:38:17

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Thu, Nov 17, 2022 at 11:17:50PM +0100, Janis Schoetterl-Glausch wrote:
> User space can use the MEM_OP ioctl to make storage key checked reads
> and writes to the guest, however, it has no way of performing atomic,
> key checked, accesses to the guest.
> Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
> mode. For now, support this mode for absolute accesses only.
>
> This mode can be use, for example, to set the device-state-change
> indicator and the adapter-local-summary indicator atomically.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> include/uapi/linux/kvm.h | 5 ++
> arch/s390/kvm/gaccess.h | 3 ++
> arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
> arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
> 4 files changed, 142 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0d5d4419139a..1f36be5493e6 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
> struct {
> __u8 ar; /* the access register number */
> __u8 key; /* access key, ignored if flag unset */
> + __u8 pad1[6]; /* ignored */
> + __u64 old_p; /* ignored if flag unset */

Just one comment: the suffix "_p" for pointer is quite unusual within
the kernel. This also would be the first of its kind within kvm.h.
Usually there is either no suffix or "_addr".
So for consistency reasons I would suggest to change this to one of
the common variants.

The code itself looks good from my point of view, even though for the
sake of simplicity I would have put the complete sign/zero extended
128 bit old value into the structure, instead of having a pointer to
the value. Imho that would simplify the interface. Also alignment, as
pointed out previously, really doesn't matter for this use case.

But you had already something like that previously and changed it, so
no reason to go back and forth. Not really important.

2022-11-18 14:53:14

by Thomas Huth

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On 18/11/2022 11.12, Heiko Carstens wrote:
> On Thu, Nov 17, 2022 at 11:17:50PM +0100, Janis Schoetterl-Glausch wrote:
>> User space can use the MEM_OP ioctl to make storage key checked reads
>> and writes to the guest, however, it has no way of performing atomic,
>> key checked, accesses to the guest.
>> Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
>> mode. For now, support this mode for absolute accesses only.
>>
>> This mode can be use, for example, to set the device-state-change
>> indicator and the adapter-local-summary indicator atomically.
>>
>> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
>> ---
>> include/uapi/linux/kvm.h | 5 ++
>> arch/s390/kvm/gaccess.h | 3 ++
>> arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
>> arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
>> 4 files changed, 142 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 0d5d4419139a..1f36be5493e6 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
>> struct {
>> __u8 ar; /* the access register number */
>> __u8 key; /* access key, ignored if flag unset */
>> + __u8 pad1[6]; /* ignored */
>> + __u64 old_p; /* ignored if flag unset */
>
> Just one comment: the suffix "_p" for pointer is quite unusual within
> the kernel. This also would be the first of its kind within kvm.h.
> Usually there is either no suffix or "_addr".
> So for consistency reasons I would suggest to change this to one of
> the common variants.
>
> The code itself looks good from my point of view, even though for the
> sake of simplicity I would have put the complete sign/zero extended
> 128 bit old value into the structure, instead of having a pointer to
> the value.

See
https://lore.kernel.org/kvm/[email protected]/
... it would break the "IOW" definition of the ioctl. It can be done, but
that confuses tools like valgrind, as far as I know. So I think the idea
with the pointer is better in this case.

Thomas


2022-11-18 16:21:47

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Fri, Nov 18, 2022 at 03:37:26PM +0100, Thomas Huth wrote:
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 0d5d4419139a..1f36be5493e6 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
> > > struct {
> > > __u8 ar; /* the access register number */
> > > __u8 key; /* access key, ignored if flag unset */
> > > + __u8 pad1[6]; /* ignored */
> > > + __u64 old_p; /* ignored if flag unset */
> >
> > Just one comment: the suffix "_p" for pointer is quite unusual within
> > the kernel. This also would be the first of its kind within kvm.h.
> > Usually there is either no suffix or "_addr".
> > So for consistency reasons I would suggest to change this to one of
> > the common variants.
> >
> > The code itself looks good from my point of view, even though for the
> > sake of simplicity I would have put the complete sign/zero extended
> > 128 bit old value into the structure, instead of having a pointer to
> > the value.
>
> See
> https://lore.kernel.org/kvm/[email protected]/
> ... it would break the "IOW" definition of the ioctl. It can be done, but
> that confuses tools like valgrind, as far as I know. So I think the idea
> with the pointer is better in this case.

Ah right, I forgot about that. Then let's do it this way.

2022-11-21 17:53:23

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

On Fri, 2022-11-18 at 08:50 +0700, Bagas Sanjaya wrote:
> On 11/18/22 05:17, Janis Schoetterl-Glausch wrote:
> > +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
> > +In this case, instead of doing an unconditional write, the access occurs only
> > +if the target location contains the "size" byte long value pointed to by
> > +"old_p". This is performed as an atomic cmpxchg. "size" must be a power of two
> > +up to and including 16.
> > +The value at the target location is written to the location "old_p" points to.
> > +If the exchange did not take place because the target value doesn't match the
> > +old value KVM_S390_MEMOP_R_NO_XCHG is returned.
> > +The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
> > +has bit 1 (i.e. bit with value 2) set.
> >
>
> Is KVM_S390_MEMOP_F_CMPXCHG supported with conditions (as you implied)?
>
I'm afraid I don't quite understand the question.
It is supported if the capability says it is, i.e. if bit 1 is set.

2022-11-21 18:14:30

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Fri, 2022-11-18 at 11:12 +0100, Heiko Carstens wrote:
> On Thu, Nov 17, 2022 at 11:17:50PM +0100, Janis Schoetterl-Glausch wrote:
> > User space can use the MEM_OP ioctl to make storage key checked reads
> > and writes to the guest, however, it has no way of performing atomic,
> > key checked, accesses to the guest.
> > Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
> > mode. For now, support this mode for absolute accesses only.
> >
> > This mode can be use, for example, to set the device-state-change
> > indicator and the adapter-local-summary indicator atomically.
> >
> > Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> > ---
> > include/uapi/linux/kvm.h | 5 ++
> > arch/s390/kvm/gaccess.h | 3 ++
> > arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
> > arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
> > 4 files changed, 142 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 0d5d4419139a..1f36be5493e6 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
> > struct {
> > __u8 ar; /* the access register number */
> > __u8 key; /* access key, ignored if flag unset */
> > + __u8 pad1[6]; /* ignored */
> > + __u64 old_p; /* ignored if flag unset */
>
> Just one comment: the suffix "_p" for pointer is quite unusual within
> the kernel. This also would be the first of its kind within kvm.h.
> Usually there is either no suffix or "_addr".
> So for consistency reasons I would suggest to change this to one of
> the common variants.
>
Thanks, good point.

[...]

2022-11-22 07:56:41

by Thomas Huth

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

On 17/11/2022 23.17, Janis Schoetterl-Glausch wrote:
> Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for
> absolute vm write memops which allows user space to perform (storage key
> checked) cmpxchg operations on guest memory.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
...
> Supported flags:
> * ``KVM_S390_MEMOP_F_CHECK_ONLY``
> * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
> + * ``KVM_S390_MEMOP_F_CMPXCHG``
> +
> +The semantics of the flags common with logical acesses are as for logical
> +accesses.
> +
> +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.

I'd maybe merge this with the last sentence:

For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag is supported if
KVM_CAP_S390_MEM_OP_EXTENSION has bit 1 (i.e. bit with value 2) set.

... and speaking of that, I wonder whether it's maybe a good idea to
introduce some #defines for bit 1 / value 2, to avoid the confusion ?

> +In this case, instead of doing an unconditional write, the access occurs only
> +if the target location contains the "size" byte long value pointed to by
> +"old_p". This is performed as an atomic cmpxchg.

I had to read the first sentence twice to understand it ... maybe it's
easier to understand if you move the "size" part to the second sentence:

In this case, instead of doing an unconditional write, the access occurs
only if the target location contains value pointed to by "old_p". This is
performed as an atomic cmpxchg with the length specified by the "size"
parameter.

?

> "size" must be a power of two
> +up to and including 16.
> +The value at the target location is written to the location "old_p" points to.

IMHO something like this would be better:

The value at the target location is replaced with the value from the
location that "old_p" points to.

> +If the exchange did not take place because the target value doesn't match the
> +old value KVM_S390_MEMOP_R_NO_XCHG is returned.
> +The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
> +has bit 1 (i.e. bit with value 2) set.

Thomas

PS: Please take my suggestions with a grain of salt ... I'm not a native
speaker either.

2022-11-22 13:42:48

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

On Tue, 2022-11-22 at 08:47 +0100, Thomas Huth wrote:
> On 17/11/2022 23.17, Janis Schoetterl-Glausch wrote:
> > Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for
> > absolute vm write memops which allows user space to perform (storage key
> > checked) cmpxchg operations on guest memory.
> >
> > Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> > ---
> ...
> > Supported flags:
> > * ``KVM_S390_MEMOP_F_CHECK_ONLY``
> > * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
> > + * ``KVM_S390_MEMOP_F_CMPXCHG``
> > +
> > +The semantics of the flags common with logical acesses are as for logical
> > +accesses.
> > +
> > +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
>
> I'd maybe merge this with the last sentence:
>
> For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag is supported if
> KVM_CAP_S390_MEM_OP_EXTENSION has bit 1 (i.e. bit with value 2) set.

Ok.
>
> ... and speaking of that, I wonder whether it's maybe a good idea to
> introduce some #defines for bit 1 / value 2, to avoid the confusion ?

Not sure, I don't feel it's too complicated. Where would you define it?
Next to the mem_op struct? KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG?
>
> > +In this case, instead of doing an unconditional write, the access occurs only
> > +if the target location contains the "size" byte long value pointed to by
> > +"old_p". This is performed as an atomic cmpxchg.
>
> I had to read the first sentence twice to understand it ... maybe it's
> easier to understand if you move the "size" part to the second sentence:
>
> In this case, instead of doing an unconditional write, the access occurs
> only if the target location contains value pointed to by "old_p". This is
> performed as an atomic cmpxchg with the length specified by the "size"
> parameter.
>
> ?

Ok.
>
> > "size" must be a power of two
> > +up to and including 16.
> > +The value at the target location is written to the location "old_p" points to.
>
> IMHO something like this would be better:
>
> The value at the target location is replaced with the value from the
> location that "old_p" points to.

I'm trying to say the opposite :).
I went with this:

If the exchange did not take place because the target value doesn't match the
old value, KVM_S390_MEMOP_R_NO_XCHG is returned.
In this case the value "old_addr" points to is replaced by the target value.
>
> > +If the exchange did not take place because the target value doesn't match the
> > +old value KVM_S390_MEMOP_R_NO_XCHG is returned.
> > +The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
> > +has bit 1 (i.e. bit with value 2) set.
>
> Thomas
>
> PS: Please take my suggestions with a grain of salt ... I'm not a native
> speaker either.
>

2022-11-25 09:12:36

by Nico Boehr

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

Quoting Janis Schoetterl-Glausch (2022-11-22 14:10:41)
> On Tue, 2022-11-22 at 08:47 +0100, Thomas Huth wrote:
> > On 17/11/2022 23.17, Janis Schoetterl-Glausch wrote:
[...]
> > > Supported flags:
> > > * ``KVM_S390_MEMOP_F_CHECK_ONLY``
> > > * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
> > > + * ``KVM_S390_MEMOP_F_CMPXCHG``
> > > +
> > > +The semantics of the flags common with logical acesses are as for logical
> > > +accesses.
> > > +
> > > +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
> >
> > I'd maybe merge this with the last sentence:
> >
> > For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag is supported if
> > KVM_CAP_S390_MEM_OP_EXTENSION has bit 1 (i.e. bit with value 2) set.
>
> Ok.
> >
> > ... and speaking of that, I wonder whether it's maybe a good idea to
> > introduce some #defines for bit 1 / value 2, to avoid the confusion ?
>
> Not sure, I don't feel it's too complicated. Where would you define it?
> Next to the mem_op struct? KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG?

I think the define would be a good idea. Location and name sound good to me.

You could also replace the hard-coded 0x3 in kvm_vm_ioctl_check_extension() when you have the define.

2022-12-01 17:30:46

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG

On Thu, 17 Nov 2022 23:17:51 +0100
Janis Schoetterl-Glausch <[email protected]> wrote:

> Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for
> absolute vm write memops which allows user space to perform (storage key
> checked) cmpxchg operations on guest memory.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..204d128f23e0 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3753,7 +3753,8 @@ The fields in each entry are defined as follows:
> :Parameters: struct kvm_s390_mem_op (in)
> :Returns: = 0 on success,
> < 0 on generic error (e.g. -EFAULT or -ENOMEM),
> - > 0 if an exception occurred while walking the page tables
> + 16 bit program exception code if the access causes such an exception
> + other code > maximum 16 bit value with special meaning

I would write the number explicitly ( > 65535 or > 0xffff )

>
> Read or write data from/to the VM's memory.
> The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is
> @@ -3771,6 +3772,8 @@ Parameters are specified via the following structure::
> struct {
> __u8 ar; /* the access register number */
> __u8 key; /* access key, ignored if flag unset */
> + __u8 pad1[6]; /* ignored */
> + __u64 old_p; /* ignored if flag unset */
> };
> __u32 sida_offset; /* offset into the sida */
> __u8 reserved[32]; /* ignored */
> @@ -3853,8 +3856,22 @@ Absolute accesses are permitted for non-protected guests only.
> Supported flags:
> * ``KVM_S390_MEMOP_F_CHECK_ONLY``
> * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
> + * ``KVM_S390_MEMOP_F_CMPXCHG``
> +
> +The semantics of the flags common with logical acesses are as for logical
> +accesses.
> +
> +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG might be supported.
> +In this case, instead of doing an unconditional write, the access occurs only
> +if the target location contains the "size" byte long value pointed to by
> +"old_p". This is performed as an atomic cmpxchg. "size" must be a power of two
> +up to and including 16.
> +The value at the target location is written to the location "old_p" points to.
> +If the exchange did not take place because the target value doesn't match the
> +old value KVM_S390_MEMOP_R_NO_XCHG is returned.
> +The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
> +has bit 1 (i.e. bit with value 2) set.
>
> -The semantics of the flags are as for logical accesses.
>
> SIDA read/write:
> ^^^^^^^^^^^^^^^^

2022-12-01 17:44:50

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Thu, 17 Nov 2022 23:17:50 +0100
Janis Schoetterl-Glausch <[email protected]> wrote:

> User space can use the MEM_OP ioctl to make storage key checked reads
> and writes to the guest, however, it has no way of performing atomic,
> key checked, accesses to the guest.
> Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
> mode. For now, support this mode for absolute accesses only.
>
> This mode can be use, for example, to set the device-state-change
> indicator and the adapter-local-summary indicator atomically.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> include/uapi/linux/kvm.h | 5 ++
> arch/s390/kvm/gaccess.h | 3 ++
> arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
> arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
> 4 files changed, 142 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0d5d4419139a..1f36be5493e6 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
> struct {
> __u8 ar; /* the access register number */
> __u8 key; /* access key, ignored if flag unset */
> + __u8 pad1[6]; /* ignored */
> + __u64 old_p; /* ignored if flag unset */
> };
> __u32 sida_offset; /* offset into the sida */
> __u8 reserved[32]; /* ignored */
> @@ -604,6 +606,9 @@ struct kvm_s390_mem_op {
> #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0)
> #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1)
> #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2)
> +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
> +/* Non program exception return codes (pgm codes are 16 bit) */
> +#define KVM_S390_MEMOP_R_NO_XCHG ((1 << 16) + 0)

are you planning to have further *_R_* macros in the near future?
if not, remove the + 0
if yes, move the (1 << 16) to a macro, so it becomes something like
(KVM_S390_MEMOP_R_BASE + 0)

(maybe you can find a better/shorter name)

>
> /* for KVM_INTERRUPT */
> struct kvm_interrupt {
> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> index 9408d6cc8e2c..92a3b9fb31ec 100644
> --- a/arch/s390/kvm/gaccess.h
> +++ b/arch/s390/kvm/gaccess.h
> @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
> void *data, unsigned long len, enum gacc_mode mode);
>
> +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
> + __uint128_t *old, __uint128_t new, u8 access_key);
> +
> /**
> * write_guest_with_key - copy data from kernel space to guest space
> * @vcpu: virtual cpu
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 0243b6e38d36..be042865d8a1 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -1161,6 +1161,107 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
> return rc;
> }
>
> +/**
> + * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
> + * @kvm: Virtual machine instance.
> + * @gpa: Absolute guest address of the location to be changed.
> + * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
> + * non power of two will result in failure.
> + * @old_p: Pointer to old value. If the location at @gpa contains this value, the
> + * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
> + * contains the value at @gpa before the attempt to exchange the value.
> + * @new: The value to place at @gpa.
> + * @access_key: The access key to use for the guest access.
> + *
> + * Atomically exchange the value at @gpa by @new, if it contains *@old.
> + * Honors storage keys.
> + *
> + * Return: * 0: successful exchange
> + * * 1: exchange unsuccessful
> + * * a program interruption code indicating the reason cmpxchg could
> + * not be attempted
> + * * -EINVAL: address misaligned or len not power of two
> + * * -EAGAIN: transient failure (len 1 or 2)

please also document -EOPNOTSUPP

> + */
> +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
> + __uint128_t *old_p, __uint128_t new,
> + u8 access_key)
> +{
> + gfn_t gfn = gpa >> PAGE_SHIFT;
> + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

exchange the above two lines (reverse christmas tree)

> + bool writable;
> + hva_t hva;
> + int ret;
> +
> + if (!IS_ALIGNED(gpa, len))
> + return -EINVAL;
> +
> + hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
> + if (kvm_is_error_hva(hva))
> + return PGM_ADDRESSING;
> + /*
> + * Check if it's a read-only memslot, even though that cannot occur
> + * since those are unsupported.
> + * Don't try to actually handle that case.
> + */
> + if (!writable)
> + return -EOPNOTSUPP;

either you document this, or you return something else (like -EINVAL)

> +
> + hva += offset_in_page(gpa);
> + switch (len) {
> + case 1: {
> + u8 old;
> +
> + ret = cmpxchg_user_key((u8 *)hva, &old, *old_p, new, access_key);
> + ret = ret < 0 ? ret : old != *old_p;
> + *old_p = old;
> + break;
> + }
> + case 2: {
> + u16 old;
> +
> + ret = cmpxchg_user_key((u16 *)hva, &old, *old_p, new, access_key);
> + ret = ret < 0 ? ret : old != *old_p;
> + *old_p = old;
> + break;
> + }
> + case 4: {
> + u32 old;
> +
> + ret = cmpxchg_user_key((u32 *)hva, &old, *old_p, new, access_key);
> + ret = ret < 0 ? ret : old != *old_p;
> + *old_p = old;
> + break;
> + }
> + case 8: {
> + u64 old;
> +
> + ret = cmpxchg_user_key((u64 *)hva, &old, *old_p, new, access_key);
> + ret = ret < 0 ? ret : old != *old_p;
> + *old_p = old;
> + break;
> + }
> + case 16: {
> + __uint128_t old;
> +
> + ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_p, new, access_key);
> + ret = ret < 0 ? ret : old != *old_p;
> + *old_p = old;
> + break;

I really dislike repeating the same code 5 times, but I guess there was
no other way?

> + }
> + default:
> + return -EINVAL;
> + }
> + mark_page_dirty_in_slot(kvm, slot, gfn);
> + /*
> + * Assume that the fault is caused by protection, either key protection
> + * or user page write protection.
> + */
> + if (ret == -EFAULT)
> + ret = PGM_PROTECTION;
> + return ret;
> +}
> +
> /**
> * guest_translate_address_with_key - translate guest logical into guest absolute address
> * @vcpu: virtual cpu
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 45d4b8182b07..2410b4044283 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -576,7 +576,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_S390_VCPU_RESETS:
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_S390_DIAG318:
> - case KVM_CAP_S390_MEM_OP_EXTENSION:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> @@ -590,6 +589,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_S390_MEM_OP:
> r = MEM_OP_MAX_SIZE;
> break;
> + case KVM_CAP_S390_MEM_OP_EXTENSION:
> + /*
> + * Flag bits indicating which extensions are supported.
> + * The first extension doesn't use a flag, but pretend it does,
> + * this way that can be changed in the future.
> + */
> + r = 0x3;
> + break;
> case KVM_CAP_NR_VCPUS:
> case KVM_CAP_MAX_VCPUS:
> case KVM_CAP_MAX_VCPU_ID:
> @@ -2714,12 +2721,19 @@ static bool access_key_invalid(u8 access_key)
> static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> {
> void __user *uaddr = (void __user *)mop->buf;
> + void __user *old_p = (void __user *)mop->old_p;
> + union {
> + __uint128_t quad;
> + char raw[sizeof(__uint128_t)];
> + } old = { .quad = 0}, new = { .quad = 0 };
> + unsigned int off_in_quad = sizeof(__uint128_t) - mop->size;
> u64 supported_flags;
> void *tmpbuf = NULL;
> int r, srcu_idx;
>
> supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
> - | KVM_S390_MEMOP_F_CHECK_ONLY;
> + | KVM_S390_MEMOP_F_CHECK_ONLY
> + | KVM_S390_MEMOP_F_CMPXCHG;
> if (mop->flags & ~supported_flags || !mop->size)
> return -EINVAL;
> if (mop->size > MEM_OP_MAX_SIZE)
> @@ -2741,6 +2755,15 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> } else {
> mop->key = 0;
> }
> + if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {

add a quick comment here to explain that this check validates
off_in_quad, and that it does not do a full validation of mop->size,
which will happen in cmpxchg_guest_abs_with_key.

> + if (mop->size > sizeof(new))
> + return -EINVAL;
> + /* off_in_quad has been validated */
> + if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
> + return -EFAULT;
> + if (copy_from_user(&old.raw[off_in_quad], old_p, mop->size))
> + return -EFAULT;
> + }
> if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
> tmpbuf = vmalloc(mop->size);
> if (!tmpbuf)
> @@ -2771,6 +2794,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> case KVM_S390_MEMOP_ABSOLUTE_WRITE: {
> if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
> r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
> + } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
> + r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
> + &old.quad, new.quad, mop->key);
> + if (r == 1) {
> + r = KVM_S390_MEMOP_R_NO_XCHG;

I wonder if you could not simplify things by returning directly
KVM_S390_MEMOP_R_NO_XCHG instead of 1

> + if (copy_to_user(old_p, &old.raw[off_in_quad], mop->size))
> + r = -EFAULT;
> + }
> } else {
> if (copy_from_user(tmpbuf, uaddr, mop->size)) {
> r = -EFAULT;

2022-12-01 18:10:06

by Janis Schoetterl-Glausch

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Thu, 2022-12-01 at 17:15 +0100, Claudio Imbrenda wrote:
> On Thu, 17 Nov 2022 23:17:50 +0100
> Janis Schoetterl-Glausch <[email protected]> wrote:
>
> > User space can use the MEM_OP ioctl to make storage key checked reads
> > and writes to the guest, however, it has no way of performing atomic,
> > key checked, accesses to the guest.
> > Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
> > mode. For now, support this mode for absolute accesses only.
> >
> > This mode can be use, for example, to set the device-state-change
> > indicator and the adapter-local-summary indicator atomically.
> >
> > Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> > ---
> > include/uapi/linux/kvm.h | 5 ++
> > arch/s390/kvm/gaccess.h | 3 ++
> > arch/s390/kvm/gaccess.c | 101 +++++++++++++++++++++++++++++++++++++++
> > arch/s390/kvm/kvm-s390.c | 35 +++++++++++++-
> > 4 files changed, 142 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 0d5d4419139a..1f36be5493e6 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -588,6 +588,8 @@ struct kvm_s390_mem_op {
> > struct {
> > __u8 ar; /* the access register number */
> > __u8 key; /* access key, ignored if flag unset */
> > + __u8 pad1[6]; /* ignored */
> > + __u64 old_p; /* ignored if flag unset */
> > };
> > __u32 sida_offset; /* offset into the sida */
> > __u8 reserved[32]; /* ignored */
> > @@ -604,6 +606,9 @@ struct kvm_s390_mem_op {
> > #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0)
> > #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1)
> > #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2)
> > +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
> > +/* Non program exception return codes (pgm codes are 16 bit) */
> > +#define KVM_S390_MEMOP_R_NO_XCHG ((1 << 16) + 0)
>
> are you planning to have further *_R_* macros in the near future?
> if not, remove the + 0

No, we can indeed just add it back if there ever are additional ones.

> if yes, move the (1 << 16) to a macro, so it becomes something like
> (KVM_S390_MEMOP_R_BASE + 0)
>
> (maybe you can find a better/shorter name)
>
> >
> > /* for KVM_INTERRUPT */
> > struct kvm_interrupt {
> > diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> > index 9408d6cc8e2c..92a3b9fb31ec 100644
> > --- a/arch/s390/kvm/gaccess.h
> > +++ b/arch/s390/kvm/gaccess.h
> > @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> > int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
> > void *data, unsigned long len, enum gacc_mode mode);
> >
> > +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
> > + __uint128_t *old, __uint128_t new, u8 access_key);
> > +
> > /**
> > * write_guest_with_key - copy data from kernel space to guest space
> > * @vcpu: virtual cpu
> > diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> > index 0243b6e38d36..be042865d8a1 100644
> > --- a/arch/s390/kvm/gaccess.c
> > +++ b/arch/s390/kvm/gaccess.c
> > @@ -1161,6 +1161,107 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
> > return rc;
> > }
> >
> > +/**
> > + * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
> > + * @kvm: Virtual machine instance.
> > + * @gpa: Absolute guest address of the location to be changed.
> > + * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
> > + * non power of two will result in failure.
> > + * @old_p: Pointer to old value. If the location at @gpa contains this value, the
> > + * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
> > + * contains the value at @gpa before the attempt to exchange the value.
> > + * @new: The value to place at @gpa.
> > + * @access_key: The access key to use for the guest access.
> > + *
> > + * Atomically exchange the value at @gpa by @new, if it contains *@old.
> > + * Honors storage keys.
> > + *
> > + * Return: * 0: successful exchange
> > + * * 1: exchange unsuccessful
> > + * * a program interruption code indicating the reason cmpxchg could
> > + * not be attempted
> > + * * -EINVAL: address misaligned or len not power of two
> > + * * -EAGAIN: transient failure (len 1 or 2)
>
> please also document -EOPNOTSUPP

I'd add "* -EOPNOTSUPP: should never occur", then, that ok with you?
>
> > + */
> > +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
> > + __uint128_t *old_p, __uint128_t new,
> > + u8 access_key)
> > +{
> > + gfn_t gfn = gpa >> PAGE_SHIFT;
> > + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>
> exchange the above two lines (reverse christmas tree)

Is this a hard requirement? Since there is a dependency.
If I do the initialization further down, the order wouldn't actually change.
>
> > + bool writable;
> > + hva_t hva;
> > + int ret;
> > +
> > + if (!IS_ALIGNED(gpa, len))
> > + return -EINVAL;
> > +
> > + hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
> > + if (kvm_is_error_hva(hva))
> > + return PGM_ADDRESSING;
> > + /*
> > + * Check if it's a read-only memslot, even though that cannot occur
> > + * since those are unsupported.
> > + * Don't try to actually handle that case.
> > + */
> > + if (!writable)
> > + return -EOPNOTSUPP;
>
> either you document this, or you return something else (like -EINVAL)
>
> > +
> > + hva += offset_in_page(gpa);
> > + switch (len) {
> > + case 1: {
> > + u8 old;
> > +
> > + ret = cmpxchg_user_key((u8 *)hva, &old, *old_p, new, access_key);
> > + ret = ret < 0 ? ret : old != *old_p;
> > + *old_p = old;
> > + break;
> > + }
> > + case 2: {
> > + u16 old;
> > +
> > + ret = cmpxchg_user_key((u16 *)hva, &old, *old_p, new, access_key);
> > + ret = ret < 0 ? ret : old != *old_p;
> > + *old_p = old;
> > + break;
> > + }
> > + case 4: {
> > + u32 old;
> > +
> > + ret = cmpxchg_user_key((u32 *)hva, &old, *old_p, new, access_key);
> > + ret = ret < 0 ? ret : old != *old_p;
> > + *old_p = old;
> > + break;
> > + }
> > + case 8: {
> > + u64 old;
> > +
> > + ret = cmpxchg_user_key((u64 *)hva, &old, *old_p, new, access_key);
> > + ret = ret < 0 ? ret : old != *old_p;
> > + *old_p = old;
> > + break;
> > + }
> > + case 16: {
> > + __uint128_t old;
> > +
> > + ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_p, new, access_key);
> > + ret = ret < 0 ? ret : old != *old_p;
> > + *old_p = old;
> > + break;
>
> I really dislike repeating the same code 5 times, but I guess there was
> no other way?

I could use the function called by cmpxchg_user_key directly, but Heiko won't agree to that.
A macro would work too, of course, not sure if I prefer that tho.
>
> > + }
> > + default:
> > + return -EINVAL;
> > + }
> > + mark_page_dirty_in_slot(kvm, slot, gfn);
> > + /*
> > + * Assume that the fault is caused by protection, either key protection
> > + * or user page write protection.
> > + */
> > + if (ret == -EFAULT)
> > + ret = PGM_PROTECTION;
> > + return ret;
> > +}
> > +
> > /**
> > * guest_translate_address_with_key - translate guest logical into guest absolute address
> > * @vcpu: virtual cpu
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 45d4b8182b07..2410b4044283 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -576,7 +576,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_S390_VCPU_RESETS:
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_S390_DIAG318:
> > - case KVM_CAP_S390_MEM_OP_EXTENSION:
> > r = 1;
> > break;
> > case KVM_CAP_SET_GUEST_DEBUG2:
> > @@ -590,6 +589,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_S390_MEM_OP:
> > r = MEM_OP_MAX_SIZE;
> > break;
> > + case KVM_CAP_S390_MEM_OP_EXTENSION:
> > + /*
> > + * Flag bits indicating which extensions are supported.
> > + * The first extension doesn't use a flag, but pretend it does,
> > + * this way that can be changed in the future.
> > + */
> > + r = 0x3;
> > + break;
> > case KVM_CAP_NR_VCPUS:
> > case KVM_CAP_MAX_VCPUS:
> > case KVM_CAP_MAX_VCPU_ID:
> > @@ -2714,12 +2721,19 @@ static bool access_key_invalid(u8 access_key)
> > static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> > {
> > void __user *uaddr = (void __user *)mop->buf;
> > + void __user *old_p = (void __user *)mop->old_p;
> > + union {
> > + __uint128_t quad;
> > + char raw[sizeof(__uint128_t)];
> > + } old = { .quad = 0}, new = { .quad = 0 };
> > + unsigned int off_in_quad = sizeof(__uint128_t) - mop->size;
> > u64 supported_flags;
> > void *tmpbuf = NULL;
> > int r, srcu_idx;
> >
> > supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
> > - | KVM_S390_MEMOP_F_CHECK_ONLY;
> > + | KVM_S390_MEMOP_F_CHECK_ONLY
> > + | KVM_S390_MEMOP_F_CMPXCHG;
> > if (mop->flags & ~supported_flags || !mop->size)
> > return -EINVAL;
> > if (mop->size > MEM_OP_MAX_SIZE)
> > @@ -2741,6 +2755,15 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> > } else {
> > mop->key = 0;
> > }
> > + if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
>
> add a quick comment here to explain that this check validates
> off_in_quad, and that it does not do a full validation of mop->size,
> which will happen in cmpxchg_guest_abs_with_key.

Will do.
>
> > + if (mop->size > sizeof(new))
> > + return -EINVAL;
> > + /* off_in_quad has been validated */
> > + if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
> > + return -EFAULT;
> > + if (copy_from_user(&old.raw[off_in_quad], old_p, mop->size))
> > + return -EFAULT;
> > + }
> > if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
> > tmpbuf = vmalloc(mop->size);
> > if (!tmpbuf)
> > @@ -2771,6 +2794,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
> > case KVM_S390_MEMOP_ABSOLUTE_WRITE: {
> > if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
> > r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
> > + } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
> > + r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
> > + &old.quad, new.quad, mop->key);
> > + if (r == 1) {
> > + r = KVM_S390_MEMOP_R_NO_XCHG;
>
> I wonder if you could not simplify things by returning directly
> KVM_S390_MEMOP_R_NO_XCHG instead of 1

To me it feels like KVM_S390_MEMOP_R_NO_XCHG is api surface and should be referenced here.
cmpxchg_guest_abs_with_key isn't mem op specific
(of course that's the only thing it is currently used for).
>
> > + if (copy_to_user(old_p, &old.raw[off_in_quad], mop->size))
> > + r = -EFAULT;
> > + }
> > } else {
> > if (copy_from_user(tmpbuf, uaddr, mop->size)) {
> > r = -EFAULT;
>

2022-12-02 09:24:04

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg

On Thu, 01 Dec 2022 18:44:56 +0100
Janis Schoetterl-Glausch <[email protected]> wrote:

> >
> > please also document -EOPNOTSUPP
>
> I'd add "* -EOPNOTSUPP: should never occur", then, that ok with you?

no, also explain in which conditions it is returned

something like:
* -EOPNOTSUPP: if the memslot is not writable (should never occour)

> >
> > > + */
> > > +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
> > > + __uint128_t *old_p, __uint128_t new,
> > > + u8 access_key)
> > > +{
> > > + gfn_t gfn = gpa >> PAGE_SHIFT;
> > > + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> >
> > exchange the above two lines (reverse christmas tree)
>
> Is this a hard requirement? Since there is a dependency.
> If I do the initialization further down, the order wouldn't actually change.

ahhhhh right, I had missed that

keep it as it is, of course

[...]

> > I really dislike repeating the same code 5 times, but I guess there was
> > no other way?
>
> I could use the function called by cmpxchg_user_key directly, but Heiko won't agree to that.
> A macro would work too, of course, not sure if I prefer that tho.

ok so there is no other way, let's keep it as it is

[...]

> To me it feels like KVM_S390_MEMOP_R_NO_XCHG is api surface and should be referenced here.
> cmpxchg_guest_abs_with_key isn't mem op specific
> (of course that's the only thing it is currently used for).

fair enough

> >
> > > + if (copy_to_user(old_p, &old.raw[off_in_quad], mop->size))
> > > + r = -EFAULT;
> > > + }
> > > } else {
> > > if (copy_from_user(tmpbuf, uaddr, mop->size)) {
> > > r = -EFAULT;
> >
>