2022-04-25 16:03:17

by Janis Schoetterl-Glausch

Subject: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

If a memop fails due to key checked protection, after already having
written to the guest, don't indicate suppression to the guest, as that
would imply that memory wasn't modified.

This could be considered a fix to the code introducing storage key
support; however, this is a bug in KVM only if we emulate an
instruction writing to an operand spanning multiple pages, which I
don't believe we do.

v1 -> v2
* Reword commit message of patch 1

Janis Schoetterl-Glausch (2):
KVM: s390: Don't indicate suppression on dirtying, failing memop
KVM: s390: selftest: Test suppression indication on key prot exception

arch/s390/kvm/gaccess.c | 47 ++++++++++++++---------
tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++-
2 files changed, 70 insertions(+), 20 deletions(-)


base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
--
2.32.0


2022-04-25 17:15:45

by Janis Schoetterl-Glausch

Subject: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

If user space uses a memop to emulate an instruction and that
memop fails, the execution of the instruction ends.
Instruction execution can end in different ways, one of which is
suppression, which requires that the instruction execute like a no-op.
A writing memop that spans multiple pages and fails due to key
protection may have modified guest memory; as a result, the likely
correct ending is termination. Therefore, do not indicate a
suppressing instruction ending in this case.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
---
arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d53a183c2005..3b1fbef82288 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -491,8 +491,8 @@ enum prot_type {
PROT_TYPE_IEP = 4,
};

-static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
- u8 ar, enum gacc_mode mode, enum prot_type prot)
+static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+ enum gacc_mode mode, enum prot_type prot, bool suppress)
{
struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
struct trans_exc_code_bits *tec;
@@ -503,22 +503,24 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,

switch (code) {
case PGM_PROTECTION:
- switch (prot) {
- case PROT_TYPE_IEP:
- tec->b61 = 1;
- fallthrough;
- case PROT_TYPE_LA:
- tec->b56 = 1;
- break;
- case PROT_TYPE_KEYC:
- tec->b60 = 1;
- break;
- case PROT_TYPE_ALC:
- tec->b60 = 1;
- fallthrough;
- case PROT_TYPE_DAT:
- tec->b61 = 1;
- break;
+ if (suppress) {
+ switch (prot) {
+ case PROT_TYPE_IEP:
+ tec->b61 = 1;
+ fallthrough;
+ case PROT_TYPE_LA:
+ tec->b56 = 1;
+ break;
+ case PROT_TYPE_KEYC:
+ tec->b60 = 1;
+ break;
+ case PROT_TYPE_ALC:
+ tec->b60 = 1;
+ fallthrough;
+ case PROT_TYPE_DAT:
+ tec->b61 = 1;
+ break;
+ }
}
fallthrough;
case PGM_ASCE_TYPE:
@@ -552,6 +554,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
return code;
}

+static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+ enum gacc_mode mode, enum prot_type prot)
+{
+ return trans_exc_ending(vcpu, code, gva, ar, mode, prot, true);
+}
+
static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
unsigned long ga, u8 ar, enum gacc_mode mode)
{
@@ -1110,7 +1118,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
}
if (rc > 0)
- rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
+ rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot,
+ (mode != GACC_STORE) || (idx == 0));
out_unlock:
if (need_ipte_lock)
ipte_unlock(vcpu);
--
2.32.0

2022-04-25 18:46:59

by Christian Borntraeger

Subject: Re: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection can modified guest memory, as a result, the likely
> correct ending is termination. Therefore do not indicate a
> suppressing instruction ending in this case.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
> 1 file changed, 28 insertions(+), 19 deletions(-)



Reviewed-by: Christian Borntraeger <[email protected]>

2022-04-25 19:53:55

by Janis Schoetterl-Glausch

Subject: [PATCH v2 2/2] KVM: s390: selftest: Test suppression indication on key prot exception

Check that suppression is not indicated on injection of a key checked
protection exception caused by a memop after it already modified guest
memory, as that violates the definition of suppression.

Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
---
tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index b04c2c1b3c30..ce176ad9f216 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -194,6 +194,7 @@ static int err_memop_ioctl(struct test_vcpu vcpu, struct kvm_s390_mem_op *ksmo)
#define SIDA_OFFSET(o) ._sida_offset = 1, .sida_offset = (o)
#define AR(a) ._ar = 1, .ar = (a)
#define KEY(a) .f_key = 1, .key = (a)
+#define INJECT .f_inject = 1

#define CHECK_N_DO(f, ...) ({ f(__VA_ARGS__, CHECK_ONLY); f(__VA_ARGS__); })

@@ -430,9 +431,18 @@ static void test_copy_key_fetch_prot(void)
TEST_ASSERT(rv == 4, "Should result in protection exception"); \
})

+static void guest_error_key(void)
+{
+ GUEST_SYNC(STAGE_INITED);
+ set_storage_key_range(mem1, PAGE_SIZE, 0x18);
+ set_storage_key_range(mem1 + PAGE_SIZE, sizeof(mem1) - PAGE_SIZE, 0x98);
+ GUEST_SYNC(STAGE_SKEYS_SET);
+ GUEST_SYNC(STAGE_IDLED);
+}
+
static void test_errors_key(void)
{
- struct test_default t = test_default_init(guest_copy_key_fetch_prot);
+ struct test_default t = test_default_init(guest_error_key);

HOST_SYNC(t.vcpu, STAGE_INITED);
HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
@@ -446,6 +456,36 @@ static void test_errors_key(void)
kvm_vm_free(t.kvm_vm);
}

+static void test_termination(void)
+{
+ struct test_default t = test_default_init(guest_error_key);
+ uint64_t prefix;
+ uint64_t teid;
+ uint64_t psw[2];
+
+ HOST_SYNC(t.vcpu, STAGE_INITED);
+ HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
+
+ /* vcpu, mismatching keys after first page */
+ ERR_PROT_MOP(t.vcpu, LOGICAL, WRITE, mem1, t.size, GADDR_V(mem1), KEY(1), INJECT);
+ /*
+ * The memop injected a program exception and the test needs to check the
+ * Translation-Exception Identification (TEID). It is necessary to run
+ * the guest in order to be able to read the TEID from guest memory.
+ * Set the guest program new PSW, so the guest state is not clobbered.
+ */
+ prefix = t.run->s.regs.prefix;
+ psw[0] = t.run->psw_mask;
+ psw[1] = t.run->psw_addr;
+ MOP(t.vm, ABSOLUTE, WRITE, psw, sizeof(psw), GADDR(prefix + 464));
+ HOST_SYNC(t.vcpu, STAGE_IDLED);
+ MOP(t.vm, ABSOLUTE, READ, &teid, sizeof(teid), GADDR(prefix + 168));
+ /* Bits 56, 60, 61 form a code, 0 being the only one allowing for termination */
+ ASSERT_EQ(teid & 0x8c, 0);
+
+ kvm_vm_free(t.kvm_vm);
+}
+
static void test_errors_key_storage_prot_override(void)
{
struct test_default t = test_default_init(guest_copy_key_fetch_prot);
@@ -668,6 +708,7 @@ int main(int argc, char *argv[])
test_copy_key_fetch_prot();
test_copy_key_fetch_prot_override();
test_errors_key();
+ test_termination();
test_errors_key_storage_prot_override();
test_errors_key_fetch_prot_override_not_enabled();
test_errors_key_fetch_prot_override_enabled();
--
2.32.0
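
A minimal sketch of how a mask for TEID bits 56, 60 and 61 can be derived, assuming the architecture's convention that bit 0 is the most significant bit of the 64-bit field. TEID_BIT, TEID_PROT_CODE_MASK and teid_allows_termination are illustrative names and are not part of the selftest. Under that assumption the three bits sit at positions 7, 3 and 2 counted from the least significant end, so any literal mask is worth checking against this computation.

```c
#include <stdint.h>

/*
 * Assumption: bits are numbered from the most significant end, so bit 0 is
 * the MSB of the 64-bit TEID and bit 63 is the LSB.
 */
#define TEID_BIT(n) (UINT64_C(1) << (63 - (n)))

/* Bits 56, 60 and 61 together form the protection code (evaluates to 0x8c). */
#define TEID_PROT_CODE_MASK (TEID_BIT(56) | TEID_BIT(60) | TEID_BIT(61))

/* Returns 1 if the protection code is 000, the only value that allows termination. */
static inline int teid_allows_termination(uint64_t teid)
{
	return (teid & TEID_PROT_CODE_MASK) == 0;
}
```

Deriving the mask from the bit numbers keeps the comment and the constant from drifting apart.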

2022-04-25 22:30:17

by Christian Borntraeger

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
> If a memop fails due to key checked protection, after already having
> written to the guest, don't indicate suppression to the guest, as that
> would imply that memory wasn't modified.
>
> This could be considered a fix to the code introducing storage key
> support, however this is a bug in KVM only if we emulate an
> instructions writing to an operand spanning multiple pages, which I
> don't believe we do.
>

Thanks applied. I think it makes sense for 5.18 nevertheless.

> v1 -> v2
> * Reword commit message of patch 1
>
> Janis Schoetterl-Glausch (2):
> KVM: s390: Don't indicate suppression on dirtying, failing memop
> KVM: s390: selftest: Test suppression indication on key prot exception
>
> arch/s390/kvm/gaccess.c | 47 ++++++++++++++---------
> tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++-
> 2 files changed, 70 insertions(+), 20 deletions(-)
>
>
> base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b

2022-04-26 00:19:05

by Claudio Imbrenda

Subject: Re: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

On Mon, 25 Apr 2022 12:01:46 +0200
Janis Schoetterl-Glausch <[email protected]> wrote:

> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection can modified guest memory, as a result, the likely
> correct ending is termination. Therefore do not indicate a
> suppressing instruction ending in this case.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>

Reviewed-by: Claudio Imbrenda <[email protected]>

> ---
> arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
> 1 file changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..3b1fbef82288 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
> PROT_TYPE_IEP = 4,
> };
>
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> - u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> + enum gacc_mode mode, enum prot_type prot, bool suppress)
> {
> struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
> struct trans_exc_code_bits *tec;
> @@ -503,22 +503,24 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>
> switch (code) {
> case PGM_PROTECTION:
> - switch (prot) {
> - case PROT_TYPE_IEP:
> - tec->b61 = 1;
> - fallthrough;
> - case PROT_TYPE_LA:
> - tec->b56 = 1;
> - break;
> - case PROT_TYPE_KEYC:
> - tec->b60 = 1;
> - break;
> - case PROT_TYPE_ALC:
> - tec->b60 = 1;
> - fallthrough;
> - case PROT_TYPE_DAT:
> - tec->b61 = 1;
> - break;
> + if (suppress) {
> + switch (prot) {
> + case PROT_TYPE_IEP:
> + tec->b61 = 1;
> + fallthrough;
> + case PROT_TYPE_LA:
> + tec->b56 = 1;
> + break;
> + case PROT_TYPE_KEYC:
> + tec->b60 = 1;
> + break;
> + case PROT_TYPE_ALC:
> + tec->b60 = 1;
> + fallthrough;
> + case PROT_TYPE_DAT:
> + tec->b61 = 1;
> + break;
> + }
> }
> fallthrough;
> case PGM_ASCE_TYPE:
> @@ -552,6 +554,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> return code;
> }
>
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> + enum gacc_mode mode, enum prot_type prot)
> +{
> + return trans_exc_ending(vcpu, code, gva, ar, mode, prot, true);
> +}
> +
> static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
> unsigned long ga, u8 ar, enum gacc_mode mode)
> {
> @@ -1110,7 +1118,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
> }
> if (rc > 0)
> - rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> + rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot,
> + (mode != GACC_STORE) || (idx == 0));
> out_unlock:
> if (need_ipte_lock)
> ipte_unlock(vcpu);

2022-04-26 03:52:14

by Janis Schoetterl-Glausch

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

On 4/25/22 18:30, Christian Borntraeger wrote:
> On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
>> If a memop fails due to key checked protection, after already having
>> written to the guest, don't indicate suppression to the guest, as that
>> would imply that memory wasn't modified.
>>
>> This could be considered a fix to the code introducing storage key
>> support, however this is a bug in KVM only if we emulate an
>> instructions writing to an operand spanning multiple pages, which I
>> don't believe we do.
>>
>
> Thanks applied. I think it makes sense for 5.18 nevertheless.

Janosch had some concerns because the protection code being 000 implies
that the effective address in the TEID is unpredictable.
Let's see if he chimes in.

>
>> v1 -> v2
>>   * Reword commit message of patch 1
>>
>> Janis Schoetterl-Glausch (2):
>>    KVM: s390: Don't indicate suppression on dirtying, failing memop
>>    KVM: s390: selftest: Test suppression indication on key prot exception
>>
>>   arch/s390/kvm/gaccess.c                   | 47 ++++++++++++++---------
>>   tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++-
>>   2 files changed, 70 insertions(+), 20 deletions(-)
>>
>>
>> base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
>

2022-04-26 08:33:57

by Christian Borntraeger

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression



On 25.04.22 at 19:29, Janis Schoetterl-Glausch wrote:
> On 4/25/22 18:30, Christian Borntraeger wrote:
>> On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
>>> If a memop fails due to key checked protection, after already having
>>> written to the guest, don't indicate suppression to the guest, as that
>>> would imply that memory wasn't modified.
>>>
>>> This could be considered a fix to the code introducing storage key
>>> support, however this is a bug in KVM only if we emulate an
>>> instructions writing to an operand spanning multiple pages, which I
>>> don't believe we do.
>>>
>>
>> Thanks applied. I think it makes sense for 5.18 nevertheless.
>
> Janosch had some concerns because the protection code being 000 implies
> that the effective address in the TEID is unpredictable.
> Let's see if he chimes in.

z/VM does exactly the same on key protection crossing a page boundary. The
architecture was written in a way to allow all zeros exactly for this case
(hypervisor emulation of key protection crossing pages).
This is even true for ESOP-2. See Figure 3-5 or Figure 3-8 (the first line),
which allow the TEID to NOT have a valid address for key-controlled
protection.

The only question is, do we need to change the suppression parameter in
access_guest_with_key

(mode != GACC_STORE) || (idx == 0)

to also check for prot != PROT_TYPE_KEYC
? I think we do not need this as we have checked other reasons before.
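
The suppression parameter quoted above can be read as a small predicate. The following is a sketch only, assuming that idx counts the page fragments that completed before the failing one; enum gacc_mode_sketch and may_indicate_suppression are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's enum gacc_mode; values are illustrative. */
enum gacc_mode_sketch {
	SKETCH_FETCH,
	SKETCH_STORE,
	SKETCH_IFETCH,
};

/*
 * Suppression may only be indicated if guest memory cannot have been
 * modified: either the access was not a store, or the very first
 * fragment (idx == 0) failed, so nothing was written yet.
 */
static bool may_indicate_suppression(enum gacc_mode_sketch mode, unsigned int idx)
{
	return mode != SKETCH_STORE || idx == 0;
}
```

A store that fails on its second or later fragment is the only combination for which the predicate is false.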

2022-04-26 09:10:46

by Janosch Frank

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

On 4/26/22 08:19, Christian Borntraeger wrote:
>
>
> On 25.04.22 at 19:29, Janis Schoetterl-Glausch wrote:
>> On 4/25/22 18:30, Christian Borntraeger wrote:
>>> On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
>>>> If a memop fails due to key checked protection, after already having
>>>> written to the guest, don't indicate suppression to the guest, as that
>>>> would imply that memory wasn't modified.
>>>>
>>>> This could be considered a fix to the code introducing storage key
>>>> support, however this is a bug in KVM only if we emulate an
>>>> instructions writing to an operand spanning multiple pages, which I
>>>> don't believe we do.
>>>>
>>>
>>> Thanks applied. I think it makes sense for 5.18 nevertheless.
>>
>> Janosch had some concerns because the protection code being 000 implies
>> that the effective address in the TEID is unpredictable.
>> Let's see if he chimes in.
>
> z/VM does exactly the same on key protection crossing a page boundary. The
> architecture was written in a way to allow all zeros exactly for this case.
> (hypervisor emulation of key protection crossing pages).
> This is even true for ESOP-2. See Figure 3-5 or figure 3-8 (the first line)
> which allows to NOT have a valid address in the TEID for key controlled
> protection.
>
> The only question is, do we need to change the suppression parameter in
> access_guest_with_key
>
> (mode != GACC_STORE) || (idx == 0)
>
> to also check for prot != PROT_TYPE_KEYC
> ? I think we do not need this as we have checked other reasons before.

To me this measure looks like a last resort option and the POP doesn't
state 100% what is to be done. Some instructions can mandate
suppression instead of termination according to the architects.

My intuition tells me that if we are in a situation where this would
happen then we would be much better off just doing it by hand (i.e. in
the instruction emulation code) and not letting this function decide.

So I'm not entirely sure if we're replacing something that is not
correct with something that also won't be correct for all cases.

But to summarize this: I'm not entirely sure even after reading the POP
for more than an hour and consulting an architect.

2022-04-26 09:24:47

by Janosch Frank

Subject: Re: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

On 4/25/22 12:01, Janis Schoetterl-Glausch wrote:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.



> A writing memop that spans multiple pages and fails due to key
> protection can modified guest memory, as a result, the likely
> correct ending is termination. Therefore do not indicate a
> suppressing instruction ending in this case.

Check grammar.

>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
> 1 file changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..3b1fbef82288 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
> PROT_TYPE_IEP = 4,
> };
>
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> - u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> + enum gacc_mode mode, enum prot_type prot, bool suppress)
> {
> struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
> struct trans_exc_code_bits *tec;
> @@ -503,22 +503,24 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>
> switch (code) {
> case PGM_PROTECTION:
> - switch (prot) {
> - case PROT_TYPE_IEP:
> - tec->b61 = 1;
> - fallthrough;
> - case PROT_TYPE_LA:
> - tec->b56 = 1;
> - break;
> - case PROT_TYPE_KEYC:
> - tec->b60 = 1;
> - break;
> - case PROT_TYPE_ALC:
> - tec->b60 = 1;
> - fallthrough;
> - case PROT_TYPE_DAT:
> - tec->b61 = 1;
> - break;
> + if (suppress) {
> + switch (prot) {
> + case PROT_TYPE_IEP:
> + tec->b61 = 1;
> + fallthrough;
> + case PROT_TYPE_LA:
> + tec->b56 = 1;
> + break;
> + case PROT_TYPE_KEYC:
> + tec->b60 = 1;
> + break;
> + case PROT_TYPE_ALC:
> + tec->b60 = 1;
> + fallthrough;
> + case PROT_TYPE_DAT:
> + tec->b61 = 1;
> + break;
> + }
> }

How about switching this around and masking those bits on termination.

> fallthrough;
> case PGM_ASCE_TYPE:
> @@ -552,6 +554,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> return code;
> }
>
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> + enum gacc_mode mode, enum prot_type prot)
> +{
> + return trans_exc_ending(vcpu, code, gva, ar, mode, prot, true);
> +}
> +
> static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
> unsigned long ga, u8 ar, enum gacc_mode mode)
> {
> @@ -1110,7 +1118,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
> }
> if (rc > 0)
> - rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> + rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot,
> + (mode != GACC_STORE) || (idx == 0));

Add a boolean variable named terminating, calculate the value before
passing the boolean on.

> out_unlock:
> if (need_ipte_lock)
> ipte_unlock(vcpu);
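
One way to read the two review suggestions above (set the protection code as before, then mask the bits when terminating) is sketched below. This is not the code of any later revision; struct tec_bits_sketch, enum prot_type_sketch and set_prot_code are hypothetical stand-ins that model only the three TEID bits touched by the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; only bits 56, 60, 61 are modelled. */
enum prot_type_sketch { P_LA, P_KEYC, P_ALC, P_DAT, P_IEP };

struct tec_bits_sketch {
	unsigned int b56 : 1;
	unsigned int b60 : 1;
	unsigned int b61 : 1;
};

static void set_prot_code(struct tec_bits_sketch *tec,
			  enum prot_type_sketch prot, bool terminate)
{
	/* Set the protection code exactly as the original switch does. */
	switch (prot) {
	case P_IEP:
		tec->b61 = 1;
		/* fall through */
	case P_LA:
		tec->b56 = 1;
		break;
	case P_KEYC:
		tec->b60 = 1;
		break;
	case P_ALC:
		tec->b60 = 1;
		/* fall through */
	case P_DAT:
		tec->b61 = 1;
		break;
	}
	/* On termination, reset the code to 000 so suppression is not indicated. */
	if (terminate) {
		tec->b56 = 0;
		tec->b60 = 0;
		tec->b61 = 0;
	}
}
```

Compared with wrapping the switch in a condition, this keeps the switch at its original indentation and makes the terminating case explicit at the cost of writing the bits twice.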


2022-04-26 19:03:33

by Janis Schoetterl-Glausch

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

On 4/26/22 09:25, Janosch Frank wrote:
> On 4/26/22 08:19, Christian Borntraeger wrote:
>>
>>
>> On 25.04.22 at 19:29, Janis Schoetterl-Glausch wrote:
>>> On 4/25/22 18:30, Christian Borntraeger wrote:
>>>> On 25.04.22 at 12:01, Janis Schoetterl-Glausch wrote:
>>>>> If a memop fails due to key checked protection, after already having
>>>>> written to the guest, don't indicate suppression to the guest, as that
>>>>> would imply that memory wasn't modified.
>>>>>
>>>>> This could be considered a fix to the code introducing storage key
>>>>> support, however this is a bug in KVM only if we emulate an
>>>>> instructions writing to an operand spanning multiple pages, which I
>>>>> don't believe we do.
>>>>>
>>>>
>>>> Thanks applied. I think it makes sense for 5.18 nevertheless.
>>>
>>> Janosch had some concerns because the protection code being 000 implies
>>> that the effective address in the TEID is unpredictable.
>>> Let's see if he chimes in.
>>
>> z/VM does exactly the same on key protection crossing a page boundary. The
>> architecture was written in a way to allow all zeros exactly for this case.
>> (hypervisor emulation of key protection crossing pages).
>> This is even true for ESOP-2. See Figure 3-5 or figure 3-8 (the first line)
>> which allows to NOT have a valid address in the TEID for key controlled
>> protection.

The question is whether this is the best way to do it. Janosch brought up
interruptible instructions; for those you would want to consider only
the current unit of operation to be suppressed. This is not actually
relevant now: I guess PFMF is the only interruptible instruction we'd emulate,
and that ignores keys when clearing memory.
But maybe there are other edge cases.

>>
>> The only question is, do we need to change the suppression parameter in
>> access_guest_with_key
>>
>>    (mode != GACC_STORE) || (idx == 0)
>>
>> to also check for prot != PROT_TYPE_KEYC
>> ? I think we do not need this as we have checked other reasons before.

Yes, it is not necessary; the control flow is such that a protection exception
implies that it is due to keys.
>
> To me this measure looks like a last resort option and the POP doesn't state a 100% what is to be done. Some instructions can mandate suppression instead of termination according to the architects.
>
> My intuition tells me that if we are in a situation where this would happen then we would be much better off just doing it by hand (i.e. in the instruction emulation code) and not letting this function decide.

For the instructions we currently need to emulate in KVM we should be fine.
So the question is what's best for the future and for instructions emulated by user space.
Upward in the call stack (including user space), we don't know the failing address,
which complicates handling it in the emulation code.
You could chop up the memop in page chunks to find out, but that might have other issues.

Since this behavior is very implicit and easy to overlook maybe we should document it
in the description of the memop ioctl?
>
> So I'm not entirely sure if we're replacing something that is not correct with something that also won't be correct for all cases.

That may be the case, but which option is more correct/less incorrect?
It's hard to say because one would have to consider all instructions/possibilities,
but not indicating suppression when we've already written to memory seems to make
sense more often than indicating it.
>
> But to summarize this: I'm not entirely sure even after reading the POP for more than an hour and consulting an architect
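
The idea of chopping a memop into page chunks to learn the failing address can be sketched as follows. This is an illustration under assumptions, not a proposal for the ioctl: a 4 KiB page size is assumed, and first_fragment_len, for_each_page_fragment and fake_memop are hypothetical names, with fake_memop standing in for issuing one memop per fragment.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

/* Callback performing one access; returns 0 on success, nonzero on failure. */
typedef int (*fragment_op)(uint64_t addr, size_t len, void *ctx);

/* Length of the leading part of [addr, addr + len) that stays within one page. */
static size_t first_fragment_len(uint64_t addr, size_t len)
{
	size_t to_page_end = SKETCH_PAGE_SIZE - (size_t)(addr % SKETCH_PAGE_SIZE);

	return len < to_page_end ? len : to_page_end;
}

/*
 * Apply op to each page-bounded fragment in order. On the first failure,
 * store the start of the failing fragment in *fail_addr and return op's result.
 */
static int for_each_page_fragment(uint64_t addr, size_t len, fragment_op op,
				  void *ctx, uint64_t *fail_addr)
{
	while (len > 0) {
		size_t frag = first_fragment_len(addr, len);
		int rc = op(addr, frag, ctx);

		if (rc) {
			*fail_addr = addr;
			return rc;
		}
		addr += frag;
		len -= frag;
	}
	return 0;
}

/* Stand-in for a memop: fails with 4 (protection) at or above the limit in ctx. */
static int fake_memop(uint64_t addr, size_t len, void *ctx)
{
	(void)len;
	return addr >= *(uint64_t *)ctx ? 4 : 0;
}
```

Splitting the access this way reveals the failing page, but it also turns one operation into several, which is one of the "other issues" hinted at above.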

2022-04-26 20:04:34

by Janosch Frank

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

[...]
>>>
>>> The only question is, do we need to change the suppression parameter in
>>> access_guest_with_key
>>>
>>>    (mode != GACC_STORE) || (idx == 0)
>>>
>>> to also check for prot != PROT_TYPE_KEYC
>>> ? I think we do not need this as we have checked other reasons before.
>
> Yes, it is not necessary, the control flow is such that a protection exception
> implies that is due to keys.
>>
>> To me this measure looks like a last resort option and the POP doesn't state a 100% what is to be done. Some instructions can mandate suppression instead of termination according to the architects.
>>
>> My intuition tells me that if we are in a situation where this would happen then we would be much better off just doing it by hand (i.e. in the instruction emulation code) and not letting this function decide.
>
> For the instructions we currently need to emulate in KVM we should be fine.
> So the question is what's best for the future and for instructions emulated by user space.
> Upward in the call stack (including user space), we don't know the failing address,
> which complicates handling it in the emulation code.
> You could chop up the memop in page chunks to find out, but that might have other issues.
>
> Since this behavior is very implicit and easy to overlook maybe we should document it
> in the description of the memop ioctl?

Yeah, properly documenting this is the least we can do.

2022-04-27 00:23:14

by Janosch Frank

Subject: Re: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

On 4/26/22 15:25, Janis Schoetterl-Glausch wrote:
> On 4/26/22 09:18, Janosch Frank wrote:
>> On 4/25/22 12:01, Janis Schoetterl-Glausch wrote:
>>> If user space uses a memop to emulate an instruction and that
>>> memop fails, the execution of the instruction ends.
>>> Instruction execution can end in different ways, one of which is
>>> suppression, which requires that the instruction execute like a no-op.
>>
>>
>>
>>> A writing memop that spans multiple pages and fails due to key
>>> protection can modified guest memory, as a result, the likely
>>> correct ending is termination. Therefore do not indicate a
>>> suppressing instruction ending in this case.
>>
>> Check grammar.
>>
>>>
>>> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
>>> ---
>>>   arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
>>>   1 file changed, 28 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
>>> index d53a183c2005..3b1fbef82288 100644
>>> --- a/arch/s390/kvm/gaccess.c
>>> +++ b/arch/s390/kvm/gaccess.c
>>> @@ -491,8 +491,8 @@ enum prot_type {
>>>       PROT_TYPE_IEP  = 4,
>>>   };
>>>   -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>>> -             u8 ar, enum gacc_mode mode, enum prot_type prot)
>>> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
>>> +                enum gacc_mode mode, enum prot_type prot, bool suppress)
>>>   {
>>>       struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>>>       struct trans_exc_code_bits *tec;
>>> @@ -503,22 +503,24 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>>>         switch (code) {
>>>       case PGM_PROTECTION:
>>> -        switch (prot) {
>>> -        case PROT_TYPE_IEP:
>>> -            tec->b61 = 1;
>>> -            fallthrough;
>>> -        case PROT_TYPE_LA:
>>> -            tec->b56 = 1;
>>> -            break;
>>> -        case PROT_TYPE_KEYC:
>>> -            tec->b60 = 1;
>>> -            break;
>>> -        case PROT_TYPE_ALC:
>>> -            tec->b60 = 1;
>>> -            fallthrough;
>>> -        case PROT_TYPE_DAT:
>>> -            tec->b61 = 1;
>>> -            break;
>>> +        if (suppress) {
>>> +            switch (prot) {
>>> +            case PROT_TYPE_IEP:
>>> +                tec->b61 = 1;
>>> +                fallthrough;
>>> +            case PROT_TYPE_LA:
>>> +                tec->b56 = 1;
>>> +                break;
>>> +            case PROT_TYPE_KEYC:
>>> +                tec->b60 = 1;
>>> +                break;
>>> +            case PROT_TYPE_ALC:
>>> +                tec->b60 = 1;
>>> +                fallthrough;
>>> +            case PROT_TYPE_DAT:
>>> +                tec->b61 = 1;
>>> +                break;
>>> +            }
>>>           }
>>
>> How about switching this around and masking those bits on termination.
>
> I did initially have if (!terminate) { ... }, but it seemed more straight forward
> to me without the negation. Or are you suggesting explicitly resetting the
> bits to zero when terminating?

Yes

>>
>>>           fallthrough;
>>>       case PGM_ASCE_TYPE:
>>> @@ -552,6 +554,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>>>       return code;
>>>   }
>>>   +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
>>> +             enum gacc_mode mode, enum prot_type prot)
>>> +{
>>> +    return trans_exc_ending(vcpu, code, gva, ar, mode, prot, true);
>>> +}
>>> +
>>>   static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>>>                unsigned long ga, u8 ar, enum gacc_mode mode)
>>>   {
>>> @@ -1110,7 +1118,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>>>           ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>>>       }
>>>       if (rc > 0)
>>> -        rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
>>> +        rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot,
>>> +                      (mode != GACC_STORE) || (idx == 0));
>>
>> Add a boolean variable named terminating, calculate the value before passing the boolean on.
>
> Ok. I'll scope it to the body of the if.
>>
>>>   out_unlock:
>>>       if (need_ipte_lock)
>>>           ipte_unlock(vcpu);
>>
>>
>

2022-04-27 10:45:13

by Janis Schoetterl-Glausch

Subject: Re: [PATCH v2 1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

On 4/26/22 09:18, Janosch Frank wrote:
> On 4/25/22 12:01, Janis Schoetterl-Glausch wrote:
>> If user space uses a memop to emulate an instruction and that
>> memop fails, the execution of the instruction ends.
>> Instruction execution can end in different ways, one of which is
>> suppression, which requires that the instruction execute like a no-op.
>
>
>
>> A writing memop that spans multiple pages and fails due to key
>> protection can modified guest memory, as a result, the likely
>> correct ending is termination. Therefore do not indicate a
>> suppressing instruction ending in this case.
>
> Check grammar.
>
>>
>> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
>> ---
>>   arch/s390/kvm/gaccess.c | 47 ++++++++++++++++++++++++-----------------
>>   1 file changed, 28 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
>> index d53a183c2005..3b1fbef82288 100644
>> --- a/arch/s390/kvm/gaccess.c
>> +++ b/arch/s390/kvm/gaccess.c
>> @@ -491,8 +491,8 @@ enum prot_type {
>>       PROT_TYPE_IEP  = 4,
>>   };
>>   -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>> -             u8 ar, enum gacc_mode mode, enum prot_type prot)
>> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
>> +                enum gacc_mode mode, enum prot_type prot, bool suppress)
>>   {
>>       struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>>       struct trans_exc_code_bits *tec;
>> @@ -503,22 +503,24 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>>         switch (code) {
>>       case PGM_PROTECTION:
>> -        switch (prot) {
>> -        case PROT_TYPE_IEP:
>> -            tec->b61 = 1;
>> -            fallthrough;
>> -        case PROT_TYPE_LA:
>> -            tec->b56 = 1;
>> -            break;
>> -        case PROT_TYPE_KEYC:
>> -            tec->b60 = 1;
>> -            break;
>> -        case PROT_TYPE_ALC:
>> -            tec->b60 = 1;
>> -            fallthrough;
>> -        case PROT_TYPE_DAT:
>> -            tec->b61 = 1;
>> -            break;
>> +        if (suppress) {
>> +            switch (prot) {
>> +            case PROT_TYPE_IEP:
>> +                tec->b61 = 1;
>> +                fallthrough;
>> +            case PROT_TYPE_LA:
>> +                tec->b56 = 1;
>> +                break;
>> +            case PROT_TYPE_KEYC:
>> +                tec->b60 = 1;
>> +                break;
>> +            case PROT_TYPE_ALC:
>> +                tec->b60 = 1;
>> +                fallthrough;
>> +            case PROT_TYPE_DAT:
>> +                tec->b61 = 1;
>> +                break;
>> +            }
>>           }
>
> How about switching this around and masking those bits on termination.

I did initially have if (!terminate) { ... }, but it seemed more straightforward
to me without the negation. Or are you suggesting explicitly resetting the
bits to zero when terminating?
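
For illustration only, the second reading (set the bits as before, then clear
them again when terminating) could look roughly like the sketch below. It is a
standalone approximation, not the kernel code: struct tec_bits, prot_code()
and the enum values other than PROT_TYPE_IEP = 4 are stand-ins assumed for
this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in enum; only PROT_TYPE_IEP = 4 is taken from the quoted diff. */
enum prot_type {
	PROT_TYPE_LA   = 0,
	PROT_TYPE_KEYC = 1,
	PROT_TYPE_ALC  = 2,
	PROT_TYPE_DAT  = 3,
	PROT_TYPE_IEP  = 4,
};

/* Hypothetical reduced view of the three TEID bits the patch touches. */
struct tec_bits {
	unsigned int b56 : 1;
	unsigned int b60 : 1;
	unsigned int b61 : 1;
};

/* Set the protection code exactly as the original switch does, then mask it. */
static struct tec_bits prot_code(enum prot_type prot, bool terminate)
{
	struct tec_bits tec = { 0, 0, 0 };

	switch (prot) {
	case PROT_TYPE_IEP:
		tec.b61 = 1;
		/* fall through */
	case PROT_TYPE_LA:
		tec.b56 = 1;
		break;
	case PROT_TYPE_KEYC:
		tec.b60 = 1;
		break;
	case PROT_TYPE_ALC:
		tec.b60 = 1;
		/* fall through */
	case PROT_TYPE_DAT:
		tec.b61 = 1;
		break;
	}

	/* On termination, do not indicate any suppression code. */
	if (terminate) {
		tec.b56 = 0;
		tec.b60 = 0;
		tec.b61 = 0;
	}
	return tec;
}
```

Both variants leave the three bits at zero when terminating; the difference is
only whether the switch is skipped or its result is cleared afterwards.
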
>
>>           fallthrough;
>>       case PGM_ASCE_TYPE:
>> @@ -552,6 +554,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>>       return code;
>>   }
>>   +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
>> +             enum gacc_mode mode, enum prot_type prot)
>> +{
>> +    return trans_exc_ending(vcpu, code, gva, ar, mode, prot, true);
>> +}
>> +
>>   static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>>                unsigned long ga, u8 ar, enum gacc_mode mode)
>>   {
>> @@ -1110,7 +1118,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>>           ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>>       }
>>       if (rc > 0)
>> -        rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
>> +        rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot,
>> +                      (mode != GACC_STORE) || (idx == 0));
>
> Add a boolean variable named terminating, calculate the value before passing the boolean on.

Ok. I'll scope it to the body of the if.
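
A minimal sketch of that change, assuming the flag is named terminate and is
simply the negation of the expression currently passed as suppress. The enum
is a stand-in so the snippet compiles outside the kernel, and
memop_terminates() and patch_suppress_expr() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's enum gacc_mode; the values are assumed. */
enum gacc_mode {
	GACC_FETCH,
	GACC_STORE,
	GACC_IFETCH,
};

/* The expression the v2 patch passes as the "suppress" argument. */
static bool patch_suppress_expr(enum gacc_mode mode, int idx)
{
	return (mode != GACC_STORE) || (idx == 0);
}

/*
 * Named form of the negation: only a store that fails after at least
 * one fragment has been processed (idx != 0) may have dirtied guest
 * memory, so only that case ends in termination.
 */
static bool memop_terminates(enum gacc_mode mode, int idx)
{
	return mode == GACC_STORE && idx != 0;
}
```

Inside the if (rc > 0) body this would become something like
"bool terminate = mode == GACC_STORE && idx != 0;", with the callee then
taking terminate rather than suppress.
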
>
>>   out_unlock:
>>       if (need_ipte_lock)
>>           ipte_unlock(vcpu);
>
>

2022-04-29 10:46:25

by Janis Schoetterl-Glausch

Subject: Re: [PATCH v2 2/2] KVM: s390: selftest: Test suppression indication on key prot exception

On 4/25/22 12:01, Janis Schoetterl-Glausch wrote:
> Check that suppression is not indicated on injection of a key checked
> protection exception caused by a memop after it already modified guest
> memory, as that violates the definition of suppression.
>
> Signed-off-by: Janis Schoetterl-Glausch <[email protected]>
> ---
> tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++++-
> 1 file changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
> index b04c2c1b3c30..ce176ad9f216 100644
> --- a/tools/testing/selftests/kvm/s390x/memop.c
> +++ b/tools/testing/selftests/kvm/s390x/memop.c

[...]

> +static void test_termination(void)
> +{
> + struct test_default t = test_default_init(guest_error_key);
> + uint64_t prefix;
> + uint64_t teid;
> + uint64_t psw[2];
> +
> + HOST_SYNC(t.vcpu, STAGE_INITED);
> + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
> +
> + /* vcpu, mismatching keys after first page */
> + ERR_PROT_MOP(t.vcpu, LOGICAL, WRITE, mem1, t.size, GADDR_V(mem1), KEY(1), INJECT);
> + /*
> + * The memop injected a program exception and the test needs to check the
> + * Translation-Exception Identification (TEID). It is necessary to run
> + * the guest in order to be able to read the TEID from guest memory.
> + * Set the guest program new PSW, so the guest state is not clobbered.
> + */
> + prefix = t.run->s.regs.prefix;
> + psw[0] = t.run->psw_mask;
> + psw[1] = t.run->psw_addr;
> + MOP(t.vm, ABSOLUTE, WRITE, psw, sizeof(psw), GADDR(prefix + 464));
> + HOST_SYNC(t.vcpu, STAGE_IDLED);
> + MOP(t.vm, ABSOLUTE, READ, &teid, sizeof(teid), GADDR(prefix + 168));
> + /* Bits 56, 60, 61 form a code, 0 being the only one allowing for termination */
> + ASSERT_EQ(teid & 0x4c, 0);

The constant is wrong; it should be 0x8c instead, or better, a more straightforward
expression that evaluates to it.
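
One way to write such an expression, as a hedged sketch: s390 numbers bits
from the most significant end, so bit n of a 64-bit value is 1 << (63 - n).
TEID_BIT, TEID_PROT_CODE_MASK and teid_allows_termination() are names made up
for this example, not existing selftest helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 is the most significant bit of the 64-bit TEID. */
#define TEID_BIT(n) (UINT64_C(1) << (63 - (n)))

/* Bits 56, 60 and 61 together form the protection code. */
#define TEID_PROT_CODE_MASK (TEID_BIT(56) | TEID_BIT(60) | TEID_BIT(61))

/* Compile-time check that the expression evaluates to the intended constant. */
_Static_assert(TEID_PROT_CODE_MASK == 0x8c, "protection code mask must be 0x8c");

/* Returns nonzero if the code is 0, the only value allowing for termination. */
static inline int teid_allows_termination(uint64_t teid)
{
	return (teid & TEID_PROT_CODE_MASK) == 0;
}
```

With this, the check would read ASSERT_EQ(teid & TEID_PROT_CODE_MASK, 0). For
comparison, 0x4c selects bits 57, 60 and 61, so it misses bit 56.
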

[...]

2022-05-02 13:17:48

by Christian Borntraeger

Subject: Re: [PATCH v2 0/2] Dirtying, failing memop: don't indicate suppression

Am 26.04.22 um 09:25 schrieb Janosch Frank:
>
> To me this measure looks like a last resort option and the POP doesn't state a 100% what is to be done. Some instructions can mandate suppression instead of termination according to the architects.
>
> My intuition tells me that if we are in a situation where this would happen then we would be much better off just doing it by hand (i.e. in the instruction emulation code) and not letting this function decide.
>
> So I'm not entirely sure if we're replacing something that is not correct with something that also won't be correct for all cases.
>
> But to summarize this: I'm not entirely sure even after reading the POP for more than an hour and consulting an architect

According to Damian, the definition in the POP is worded exactly the way it is to cover z/VM's way of handling key protection for long operations in a terminating fashion since the 1970s or 1980s.
As it is fine for z/VM (and then also for z/OS and z/VSE under z/VM), I guess we can (and should) mimic that behaviour.