2021-11-22 13:14:13

by Pierre Morel

Subject: [PATCH v5 0/1] s390x: KVM: CPU Topology

Hi all,

This new series adds the interpretation of the PTF instruction.

The series provides:
1- interception of the STSI instruction forwarding the CPU topology
2- interpretation of the PTF instruction
3- a KVM capability for the userland hypervisor to ask KVM to
set up PTF interpretation.


0- Foreword

The S390 CPU topology is reported using two instructions:
- PTF, to find out whether the CPU topology has changed since the last
PTF instruction or a subsystem reset.
- STSI, to get the topology information, consisting of the topology
of the CPUs inside the sockets, of the sockets inside the books etc.

The PTF(2) instruction reports a change if the STSI(15.1.2) instruction
would report a difference from the last STSI(15.1.2) instruction*.
With the SIE interpretation, the PTF(2) instruction will report a
change to the guest if the host sets the SCA.MTCR bit.
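
The handshake between the host and PTF(2) can be pictured with a small
user-space model. This is only a sketch: struct sca_model and both
helper names are hypothetical, and only the ESCA_UTILITY_MTCR value is
taken from the patch. It assumes the architected behaviour that PTF(2)
returns condition code 1 when a report is pending and 0 otherwise.

```c
#include <stdint.h>

/* MTCR bit in the 16-bit SCA utility field, value as defined in the patch. */
#define ESCA_UTILITY_MTCR 0x8000

/* Hypothetical stand-in for the guest's SCA; only the utility field is modeled. */
struct sca_model {
	uint16_t utility;
};

/* Host side: flag a pending topology-change report. */
static void host_set_mtcr(struct sca_model *sca)
{
	sca->utility |= ESCA_UTILITY_MTCR;
}

/*
 * Guest side: model of an interpreted PTF with function code 2.
 * Returns the condition code (1 = report was pending, 0 = not pending)
 * and clears the pending report as a side effect.
 */
static int guest_ptf_check(struct sca_model *sca)
{
	int pending = (sca->utility & ESCA_UTILITY_MTCR) != 0;

	sca->utility &= (uint16_t)~ESCA_UTILITY_MTCR;
	return pending;
}
```

In this model a second PTF(2) issued right after the first one returns
0 again, because the first one consumed the report.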

*The STSI(15.1.2) instruction reports:
- The core addresses within a socket
- The polarization of the cores
- The CPU type of the cores
- Whether the cores are dedicated or not

We decided to implement the CPU topology for S390 in several steps:
- as a first step, we provide a correct topology only for dedicated
CPUs and vCPUs. We provide the basic framework and report a topology
change only when a new vCPU is plugged in a monotonic scheme.

In future development we will provide:
- NUMA handling, allowing holes inside the cores bitmap reported by
STSI(15.1.2)
- dedicated versus shared CPUs handling
- vCPU migration to a different CPU

We will ignore the following changes inside a STSI(15.1.2):
- polarization: only horizontal polarization is currently used in Linux.
- CPU Type: only the IFL type is supported in Linux.


1- Interception of STSI

To provide topology information to the guest through the STSI
instruction, we forward STSI with Function Code 15 to the
userland hypervisor, which takes care of providing the right
information to the guest.
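
The function-code filter this leads to in handle_stsi() can be sketched
as a pure function. The enum and the function name are hypothetical,
and the model is deliberately simplified: it ignores the selector
checks and the privileged-state check that the real handler also
performs.

```c
#include <stdbool.h>

/* Hypothetical outcome of the function-code check. */
enum stsi_action {
	STSI_CC3,		/* unsupported: guest gets condition code 3 */
	STSI_IN_KERNEL,		/* fc 0..3: KVM continues handling it itself */
	STSI_TO_USERSPACE,	/* fc 15: forwarded to the userland hypervisor */
};

/*
 * Mirrors the condition added by the patch: fc 15 is accepted only
 * when facility 11 (CPU topology) is available to the guest.
 */
static enum stsi_action stsi_dispatch(unsigned int fc, bool has_facility_11)
{
	if ((fc > 3 && fc != 15) || (fc == 15 && !has_facility_11))
		return STSI_CC3;
	if (fc == 15)
		return STSI_TO_USERSPACE;
	return STSI_IN_KERNEL;
}
```

A guest that was not given facility 11 therefore still sees condition
code 3 for STSI fc 15, as it did before this series.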

To let the guest use both the PTF instruction, to check if a topology
change occurred, and the STSI_15.x.x instruction, we add a new KVM
capability to enable the topology facility.
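
The rules for enabling the capability can be modeled outside the
kernel as follows. struct vm_model and enable_cpu_topology() are
hypothetical names used for illustration only; the return codes follow
the KVM_CAP_S390_CPU_TOPOLOGY case added to kvm_vm_ioctl_enable_cap()
in this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, minimal view of the VM state the check depends on. */
struct vm_model {
	unsigned int created_vcpus;	/* vCPUs already created */
	bool host_has_facility_11;	/* host offers the topology facility */
	bool guest_has_facility_11;	/* facility 11 in the guest model */
};

/*
 * The capability must be enabled before the first vCPU exists, and
 * only makes sense when the host itself has facility 11.
 */
static int enable_cpu_topology(struct vm_model *vm)
{
	if (vm->created_vcpus)
		return -EBUSY;
	if (!vm->host_has_facility_11)
		return -EINVAL;
	vm->guest_has_facility_11 = true;
	return 0;
}
```

Userland is thus expected to enable the capability right after
creating the VM, before any vCPU is created.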

2- Interpretation of PTF with FC(2)

The PTF instruction reports a topology change if anything changed
relative to the previous STSI(15.1.2) SYSIB.
Changes inside a STSI(15.1.2) SYSIB occur if CPU bits are set or cleared
inside the CPU Topology List Entry CPU mask field, which happens with
changes in CPU polarization, dedication, CPU types and adding or
removing CPUs in a socket.

The reporting to the guest is done using the Multiprocessor
Topology-Change-Report (MTCR) bit of the utility entry of the guest's
SCA which will be cleared during the interpretation of PTF.
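
As a cross-check of where the utility entry lives, the two SCA layouts
from the patch can be rebuilt with fixed-width types. The struct names
below are hypothetical, union ipte_control is assumed to be 8 bytes
wide, and the trailing per-CPU entry arrays are left out because they
do not affect the offsets of interest.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the head of struct bsca_block after the patch. */
struct bsca_model {
	uint64_t ipte_control;	/* assumed 8 bytes */
	uint64_t reserved[5];
	uint64_t mcn;
	uint16_t utility;	/* carved out of the old reserved2 */
	uint8_t  reserved2[6];
};

/* Model of the head of struct esca_block after the patch. */
struct esca_model {
	uint64_t ipte_control;	/* assumed 8 bytes */
	uint64_t reserved1[6];
	uint16_t utility;	/* carved out of the old reserved1[6] */
	uint8_t  reserved2[6];
	uint64_t mcn[4];
	uint64_t reserved3[20];
};

/* Under these assumptions the utility entry sits at the same offset. */
_Static_assert(offsetof(struct bsca_model, utility) == 56, "bsca utility");
_Static_assert(offsetof(struct esca_model, utility) == 56, "esca utility");
/* Splitting the reserved areas must not move the fields that follow. */
_Static_assert(offsetof(struct esca_model, mcn) == 64, "esca mcn");
_Static_assert(sizeof(struct bsca_model) == 64, "bsca header size");
_Static_assert(sizeof(struct esca_model) == 256, "esca header size");
```

If the assumption about ipte_control holds, the utility entry is at
offset 56 in both formats, so code setting the MTCR bit would not need
to distinguish between the basic and the extended SCA.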

To check if the topology has been modified we use a new field of the
arch vCPU, prev_cpu, to save the previous real CPU ID at the end of a
schedule and verify on the next schedule that the CPU used is in the
same socket. This field is initialized to -1 on vCPU creation.
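
The detection scheme can be exercised with a small stand-alone model.
Everything here is a sketch: struct vcpu_model, the helper names and
the fixed CPUS_PER_SOCKET mapping are hypothetical, and socket_of()
merely stands in for topology_physical_package_id().

```c
#include <stdbool.h>

#define CPUS_PER_SOCKET 4	/* hypothetical host layout */

/* Stand-in for topology_physical_package_id(). */
static int socket_of(int cpu)
{
	return cpu / CPUS_PER_SOCKET;
}

struct vcpu_model {
	int cpu;	/* real CPU currently backing the vCPU, -1 if none */
	int prev_cpu;	/* real CPU used on the previous schedule, -1 if new */
};

static void vcpu_init(struct vcpu_model *v)
{
	v->cpu = -1;
	v->prev_cpu = -1;	/* impossible value marks a new vCPU */
}

/* Returns true when a topology change has to be reported (MTCR set). */
static bool vcpu_load(struct vcpu_model *v, int cpu)
{
	v->cpu = cpu;
	return v->prev_cpu == -1 || socket_of(cpu) != socket_of(v->prev_cpu);
}

/* Remember which real CPU was backing the vCPU. */
static void vcpu_put(struct vcpu_model *v)
{
	v->prev_cpu = v->cpu;
	v->cpu = -1;
}
```

With four CPUs per socket, moving a vCPU from CPU 0 to CPU 1 reports
nothing, while moving it from CPU 1 to CPU 5 reports a change.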


Regards,
Pierre

Pierre Morel (1):
s390x: KVM: accept STSI for CPU topology information

Documentation/virt/kvm/api.rst | 16 ++++++++++
arch/s390/include/asm/kvm_host.h | 14 ++++++---
arch/s390/kvm/kvm-s390.c | 52 +++++++++++++++++++++++++++++++-
arch/s390/kvm/priv.c | 7 ++++-
arch/s390/kvm/vsie.c | 3 ++
include/uapi/linux/kvm.h | 1 +
6 files changed, 87 insertions(+), 6 deletions(-)

--
2.27.0

Changelog:

from v4 to v5

- modify the way KVM_CAP is tested to be OK with vsie
(David)

from v3 to v4

- squash both patches
(David)

- Added Documentation
(David)

- Modified the detection for new vCPUs
(Pierre)

from v2 to v3

- use PTF interpretation
(Christian)

- optimize arch_update_cpu_topology using PTF
(Pierre)

from v1 to v2:

- Add a KVM capability to let QEMU know we support PTF and STSI 15
(David)

- check KVM facility 11 before accepting STSI fc 15
(David)

- handle all we can in userland
(David)

- add tracing to STSI fc 15
(Connie)



2021-11-22 13:14:16

by Pierre Morel

Subject: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

We let the userland hypervisor know if the machine supports the CPU
topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.

The PTF instruction reports a topology change if anything changed
relative to the previous STSI_15_1_2 SYSIB.
Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or cleared
inside the CPU Topology List Entry CPU mask field, which happens with
changes in CPU polarization, dedication, CPU types and adding or
removing CPUs in a socket.

The reporting to the guest is done using the Multiprocessor
Topology-Change-Report (MTCR) bit of the utility entry of the guest's
SCA which will be cleared during the interpretation of PTF.

To check if the topology has been modified we use a new field of the
arch vCPU to save the previous real CPU ID at the end of a schedule
and verify on the next schedule that the CPU used is in the same socket.

We assume in this patch:
- no polarization change: only horizontal polarization is currently
used in Linux.
- no CPU Type change: only the IFL type is supported in Linux.
- Dedication: with this patch, only a completely dedicated CPU stack
can benefit from the CPU Topology.

STSI(15.1.x) gives information on the CPU configuration topology.
Let's accept the interception of STSI with the function code 15 and
let the userland part of the hypervisor handle it when userland
supports the CPU Topology facility.

Signed-off-by: Pierre Morel <[email protected]>
---
Documentation/virt/kvm/api.rst | 16 ++++++++++
arch/s390/include/asm/kvm_host.h | 14 ++++++---
arch/s390/kvm/kvm-s390.c | 52 +++++++++++++++++++++++++++++++-
arch/s390/kvm/priv.c | 7 ++++-
arch/s390/kvm/vsie.c | 3 ++
include/uapi/linux/kvm.h | 1 +
6 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..e5c9da0782a6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
the hypercalls whose corresponding bit is in the argument, and return
ENOSYS for the others.
+
+8.17 KVM_CAP_S390_CPU_TOPOLOGY
+------------------------------
+
+:Capability: KVM_CAP_S390_CPU_TOPOLOGY
+:Architectures: s390
+:Type: vm
+
+This capability indicates that kvm will provide the S390 CPU Topology facility
+which consist of the interpretation of the PTF instruction for the Function
+Code 2 along with interception and forwarding of both the PTF instruction
+with function Codes 0 or 1 and the STSI(15,1,x) instruction to the userland
+hypervisor.
+
+The stfle facility 11, CPU Topology facility, should not be provided to the
+guest without this capability.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a604d51acfc8..cccc09a8fdab 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -95,15 +95,19 @@ struct bsca_block {
union ipte_control ipte_control;
__u64 reserved[5];
__u64 mcn;
- __u64 reserved2;
+#define ESCA_UTILITY_MTCR 0x8000
+ __u16 utility;
+ __u8 reserved2[6];
struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
};

struct esca_block {
union ipte_control ipte_control;
- __u64 reserved1[7];
+ __u64 reserved1[6];
+ __u16 utility;
+ __u8 reserved2[6];
__u64 mcn[4];
- __u64 reserved2[20];
+ __u64 reserved3[20];
struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
};

@@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
__u8 icptcode; /* 0x0050 */
__u8 icptstatus; /* 0x0051 */
__u16 ihcpu; /* 0x0052 */
- __u8 reserved54; /* 0x0054 */
+ __u8 mtcr; /* 0x0054 */
#define IICTL_CODE_NONE 0x00
#define IICTL_CODE_MCHK 0x01
#define IICTL_CODE_EXT 0x02
@@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
#define ECB_SPECI 0x08
#define ECB_SRSI 0x04
#define ECB_HOSTPROTINT 0x02
+#define ECB_PTF 0x01
__u8 ecb; /* 0x0061 */
#define ECB2_CMMA 0x80
#define ECB2_IEP 0x20
@@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
bool skey_enabled;
struct kvm_s390_pv_vcpu pv;
union diag318_info diag318_info;
+ int prev_cpu;
};

struct kvm_vm_stat {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 14a18ba5ff2c..b40d2a20bce0 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -606,6 +606,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_PROTECTED:
r = is_prot_virt_host();
break;
+ case KVM_CAP_S390_CPU_TOPOLOGY:
+ r = test_facility(11);
+ break;
default:
r = 0;
}
@@ -817,6 +820,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
icpt_operexc_on_all_vcpus(kvm);
r = 0;
break;
+ case KVM_CAP_S390_CPU_TOPOLOGY:
+ r = -EINVAL;
+ mutex_lock(&kvm->lock);
+ if (kvm->created_vcpus) {
+ r = -EBUSY;
+ } else if (test_facility(11)) {
+ set_kvm_facility(kvm->arch.model.fac_mask, 11);
+ set_kvm_facility(kvm->arch.model.fac_list, 11);
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
+ r ? "(not available)" : "(success)");
+ break;
default:
r = -EINVAL;
break;
@@ -3089,18 +3106,44 @@ __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu)
return value;
}

-void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)
{
+ struct esca_block *esca = vcpu->kvm->arch.sca;

+ if (vcpu->arch.sie_block->ecb & ECB_PTF) {
+ ipte_lock(vcpu);
+ WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
+ ipte_unlock(vcpu);
+ }
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
gmap_enable(vcpu->arch.enabled_gmap);
kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
__start_cpu_timer_accounting(vcpu);
vcpu->cpu = cpu;
+
+ /*
+ * With PTF interpretation the guest will be aware of topology
+ * change when the Multiprocessor Topology-Change-Report is pending.
+ * We check for events modifying the result of STSI_15_2:
+ * - A new vCPU has been hotplugged (prev_cpu == -1)
+ * - The real CPU backing up the vCPU moved to another socket
+ */
+ if (vcpu->arch.sie_block->ecb & ECB_PTF) {
+ if (vcpu->arch.prev_cpu == -1 ||
+ (topology_physical_package_id(cpu) !=
+ topology_physical_package_id(vcpu->arch.prev_cpu)))
+ kvm_s390_set_mtcr(vcpu);
+ }
}

void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ /* Remember which CPU was backing the vCPU */
+ vcpu->arch.prev_cpu = vcpu->cpu;
vcpu->cpu = -1;
if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
__stop_cpu_timer_accounting(vcpu);
@@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
if (test_kvm_facility(vcpu->kvm, 9))
vcpu->arch.sie_block->ecb |= ECB_SRSI;
+
+ /* PTF needs guest facilities to enable interpretation */
+ if (test_kvm_facility(vcpu->kvm, 11))
+ vcpu->arch.sie_block->ecb |= ECB_PTF;
+ /* Set the prev_cpu value to an impossible value to detect a new vcpu */
+ vcpu->arch.prev_cpu = -1;
+
if (test_kvm_facility(vcpu->kvm, 73))
vcpu->arch.sie_block->ecb |= ECB_TE;
if (!kvm_is_ucontrol(vcpu->kvm))
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 417154b314a6..26d165733496 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);

- if (fc > 3) {
+ if ((fc > 3 && fc != 15) ||
+ (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
kvm_s390_set_psw_cc(vcpu, 3);
return 0;
}
@@ -898,6 +899,10 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
goto out_no_data;
handle_stsi_3_2_2(vcpu, (void *) mem);
break;
+ case 15:
+ trace_kvm_s390_handle_stsi(vcpu, fc, sel1, sel2, operand2);
+ insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);
+ return -EREMOTE;
}
if (kvm_s390_pv_cpu_is_protected(vcpu)) {
memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index acda4b6fc851..da0397cf2cc7 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -503,6 +503,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
/* Host-protection-interruption introduced with ESOP */
if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_ESOP))
scb_s->ecb |= scb_o->ecb & ECB_HOSTPROTINT;
+ /* CPU Topology */
+ if (test_kvm_facility(vcpu->kvm, 11))
+ scb_s->ecb |= scb_o->ecb & ECB_PTF;
/* transactional execution */
if (test_kvm_facility(vcpu->kvm, 73) && wants_tx) {
/* remap the prefix is tx is toggled on */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..273c62dfbe9a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
#define KVM_CAP_ARM_MTE 205
#define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_S390_CPU_TOPOLOGY 207

#ifdef KVM_CAP_IRQ_ROUTING

--
2.27.0


2021-12-07 09:43:10

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

gentle ping


--
Pierre Morel
IBM Lab Boeblingen

2021-12-09 12:36:30

by Claudio Imbrenda

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

On Mon, 22 Nov 2021 14:14:43 +0100
Pierre Morel <[email protected]> wrote:

> We let the userland hypervisor know if the machine support the CPU
> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>
> The PTF instruction will report a topology change if there is any change
> with a previous STSI_15_1_2 SYSIB.
> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
> inside the CPU Topology List Entry CPU mask field, which happens with
> changes in CPU polarization, dedication, CPU types and adding or
> removing CPUs in a socket.
>
> The reporting to the guest is done using the Multiprocessor
> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
> SCA which will be cleared during the interpretation of PTF.
>
> To check if the topology has been modified we use a new field of the
> arch vCPU to save the previous real CPU ID at the end of a schedule
> and verify on next schedule that the CPU used is in the same socket.
>
> We assume in this patch:
> - no polarization change: only horizontal polarization is currently
> used in linux.
> - no CPU Type change: only IFL Type are supported in Linux
> - Dedication: with this patch, only a complete dedicated CPU stack can
> take benefit of the CPU Topology.
>
> STSI(15.1.x) gives information on the CPU configuration topology.
> Let's accept the interception of STSI with the function code 15 and
> let the userland part of the hypervisor handle it when userland
> support the CPU Topology facility.
>
> Signed-off-by: Pierre Morel <[email protected]>

Reviewed-by: Claudio Imbrenda <[email protected]>

However, the title of the patch is not quite accurate: you are doing
more than just accepting STSI.

Maybe call it "guest PTF support", or "guest support for topology
function"?



2021-12-09 13:30:57

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information



On 12/9/21 13:36, Claudio Imbrenda wrote:
> On Mon, 22 Nov 2021 14:14:43 +0100
> Pierre Morel <[email protected]> wrote:
>
>> We let the userland hypervisor know if the machine support the CPU
>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>
>> The PTF instruction will report a topology change if there is any change
>> with a previous STSI_15_1_2 SYSIB.
>> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
>> inside the CPU Topology List Entry CPU mask field, which happens with
>> changes in CPU polarization, dedication, CPU types and adding or
>> removing CPUs in a socket.
>>
>> The reporting to the guest is done using the Multiprocessor
>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>> SCA which will be cleared during the interpretation of PTF.
>>
>> To check if the topology has been modified we use a new field of the
>> arch vCPU to save the previous real CPU ID at the end of a schedule
>> and verify on next schedule that the CPU used is in the same socket.
>>
>> We assume in this patch:
>> - no polarization change: only horizontal polarization is currently
>> used in linux.
>> - no CPU Type change: only IFL Type are supported in Linux
>> - Dedication: with this patch, only a complete dedicated CPU stack can
>> take benefit of the CPU Topology.
>>
>> STSI(15.1.x) gives information on the CPU configuration topology.
>> Let's accept the interception of STSI with the function code 15 and
>> let the userland part of the hypervisor handle it when userland
>> support the CPU Topology facility.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>
> Reviewed-by: Claudio Imbrenda <[email protected]>
>
> Although the title of the patch is not very correct, you are doing more
> than just accepting STSI.
>
> Maybe call it "guest PTF support", or "guest support for topology
> function"?

Right, thanks, will do.

Thanks for the review.
Pierre


>
>> ---
>> Documentation/virt/kvm/api.rst | 16 ++++++++++
>> arch/s390/include/asm/kvm_host.h | 14 ++++++---
>> arch/s390/kvm/kvm-s390.c | 52 +++++++++++++++++++++++++++++++-
>> arch/s390/kvm/priv.c | 7 ++++-
>> arch/s390/kvm/vsie.c | 3 ++
>> include/uapi/linux/kvm.h | 1 +
>> 6 files changed, 87 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index aeeb071c7688..e5c9da0782a6 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
>> of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
>> the hypercalls whose corresponding bit is in the argument, and return
>> ENOSYS for the others.
>> +
>> +8.17 KVM_CAP_S390_CPU_TOPOLOGY
>> +------------------------------
>> +
>> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
>> +:Architectures: s390
>> +:Type: vm
>> +
>> +This capability indicates that kvm will provide the S390 CPU Topology facility
>> +which consist of the interpretation of the PTF instruction for the Function
>> +Code 2 along with interception and forwarding of both the PTF instruction
>> +with function Codes 0 or 1 and the STSI(15,1,x) instruction to the userland
>> +hypervisor.
>> +
>> +The stfle facility 11, CPU Topology facility, should not be provided to the
>> +guest without this capability.
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index a604d51acfc8..cccc09a8fdab 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -95,15 +95,19 @@ struct bsca_block {
>> union ipte_control ipte_control;
>> __u64 reserved[5];
>> __u64 mcn;
>> - __u64 reserved2;
>> +#define ESCA_UTILITY_MTCR 0x8000
>> + __u16 utility;
>> + __u8 reserved2[6];
>> struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
>> };
>>
>> struct esca_block {
>> union ipte_control ipte_control;
>> - __u64 reserved1[7];
>> + __u64 reserved1[6];
>> + __u16 utility;
>> + __u8 reserved2[6];
>> __u64 mcn[4];
>> - __u64 reserved2[20];
>> + __u64 reserved3[20];
>> struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
>> };
>>
>> @@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
>> __u8 icptcode; /* 0x0050 */
>> __u8 icptstatus; /* 0x0051 */
>> __u16 ihcpu; /* 0x0052 */
>> - __u8 reserved54; /* 0x0054 */
>> + __u8 mtcr; /* 0x0054 */
>> #define IICTL_CODE_NONE 0x00
>> #define IICTL_CODE_MCHK 0x01
>> #define IICTL_CODE_EXT 0x02
>> @@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
>> #define ECB_SPECI 0x08
>> #define ECB_SRSI 0x04
>> #define ECB_HOSTPROTINT 0x02
>> +#define ECB_PTF 0x01
>> __u8 ecb; /* 0x0061 */
>> #define ECB2_CMMA 0x80
>> #define ECB2_IEP 0x20
>> @@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
>> bool skey_enabled;
>> struct kvm_s390_pv_vcpu pv;
>> union diag318_info diag318_info;
>> + int prev_cpu;
>> };
>>
>> struct kvm_vm_stat {
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 14a18ba5ff2c..b40d2a20bce0 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -606,6 +606,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_S390_PROTECTED:
>> r = is_prot_virt_host();
>> break;
>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>> + r = test_facility(11);
>> + break;
>> default:
>> r = 0;
>> }
>> @@ -817,6 +820,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> icpt_operexc_on_all_vcpus(kvm);
>> r = 0;
>> break;
>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>> + r = -EINVAL;
>> + mutex_lock(&kvm->lock);
>> + if (kvm->created_vcpus) {
>> + r = -EBUSY;
>> + } else if (test_facility(11)) {
>> + set_kvm_facility(kvm->arch.model.fac_mask, 11);
>> + set_kvm_facility(kvm->arch.model.fac_list, 11);
>> + r = 0;
>> + }
>> + mutex_unlock(&kvm->lock);
>> + VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
>> + r ? "(not available)" : "(success)");
>> + break;
>> default:
>> r = -EINVAL;
>> break;
>> @@ -3089,18 +3106,44 @@ __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu)
>> return value;
>> }
>>
>> -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)
>> {
>> + struct esca_block *esca = vcpu->kvm->arch.sca;
>>
>> + if (vcpu->arch.sie_block->ecb & ECB_PTF) {
>> + ipte_lock(vcpu);
>> + WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
>> + ipte_unlock(vcpu);
>> + }
>> +}
>> +
>> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +{
>> gmap_enable(vcpu->arch.enabled_gmap);
>> kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
>> if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>> __start_cpu_timer_accounting(vcpu);
>> vcpu->cpu = cpu;
>> +
>> + /*
>> + * With PTF interpretation the guest will be aware of topology
>> + * change when the Multiprocessor Topology-Change-Report is pending.
>> + * We check for events modifying the result of STSI_15_2:
>> + * - A new vCPU has been hotplugged (prev_cpu == -1)
>> + * - The real CPU backing up the vCPU moved to another socket
>> + */
>> + if (vcpu->arch.sie_block->ecb & ECB_PTF) {
>> + if (vcpu->arch.prev_cpu == -1 ||
>> + (topology_physical_package_id(cpu) !=
>> + topology_physical_package_id(vcpu->arch.prev_cpu)))
>> + kvm_s390_set_mtcr(vcpu);
>> + }
>> }
>>
>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> {
>> + /* Remember which CPU was backing the vCPU */
>> + vcpu->arch.prev_cpu = vcpu->cpu;
>> vcpu->cpu = -1;
>> if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>> __stop_cpu_timer_accounting(vcpu);
>> @@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
>> vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
>> if (test_kvm_facility(vcpu->kvm, 9))
>> vcpu->arch.sie_block->ecb |= ECB_SRSI;
>> +
>> + /* PTF needs guest facilities to enable interpretation */
>> + if (test_kvm_facility(vcpu->kvm, 11))
>> + vcpu->arch.sie_block->ecb |= ECB_PTF;
>> + /* Set the prev_cpu value to an impossible value to detect a new vcpu */
>> + vcpu->arch.prev_cpu = -1;
>> +
>> if (test_kvm_facility(vcpu->kvm, 73))
>> vcpu->arch.sie_block->ecb |= ECB_TE;
>> if (!kvm_is_ucontrol(vcpu->kvm))
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 417154b314a6..26d165733496 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>> return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>
>> - if (fc > 3) {
>> + if ((fc > 3 && fc != 15) ||
>> + (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
>> kvm_s390_set_psw_cc(vcpu, 3);
>> return 0;
>> }
>> @@ -898,6 +899,10 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>> goto out_no_data;
>> handle_stsi_3_2_2(vcpu, (void *) mem);
>> break;
>> + case 15:
>> + trace_kvm_s390_handle_stsi(vcpu, fc, sel1, sel2, operand2);
>> + insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);
>> + return -EREMOTE;
>> }
>> if (kvm_s390_pv_cpu_is_protected(vcpu)) {
>> memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,
>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>> index acda4b6fc851..da0397cf2cc7 100644
>> --- a/arch/s390/kvm/vsie.c
>> +++ b/arch/s390/kvm/vsie.c
>> @@ -503,6 +503,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>> /* Host-protection-interruption introduced with ESOP */
>> if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_ESOP))
>> scb_s->ecb |= scb_o->ecb & ECB_HOSTPROTINT;
>> + /* CPU Topology */
>> + if (test_kvm_facility(vcpu->kvm, 11))
>> + scb_s->ecb |= scb_o->ecb & ECB_PTF;
>> /* transactional execution */
>> if (test_kvm_facility(vcpu->kvm, 73) && wants_tx) {
>> /* remap the prefix is tx is toggled on */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 1daa45268de2..273c62dfbe9a 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
>> #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>> #define KVM_CAP_ARM_MTE 205
>> #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>> +#define KVM_CAP_S390_CPU_TOPOLOGY 207
>>
>> #ifdef KVM_CAP_IRQ_ROUTING
>>
>

--
Pierre Morel
IBM Lab Boeblingen

2021-12-09 15:54:31

by Heiko Carstens

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

On Thu, Dec 09, 2021 at 01:36:16PM +0100, Claudio Imbrenda wrote:
> On Mon, 22 Nov 2021 14:14:43 +0100
> Pierre Morel <[email protected]> wrote:
>
> > We let the userland hypervisor know if the machine support the CPU
> > topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
> >
> > The PTF instruction will report a topology change if there is any change
> > with a previous STSI_15_1_2 SYSIB.
> > Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
> > inside the CPU Topology List Entry CPU mask field, which happens with
> > changes in CPU polarization, dedication, CPU types and adding or
> > removing CPUs in a socket.
> >
> > The reporting to the guest is done using the Multiprocessor
> > Topology-Change-Report (MTCR) bit of the utility entry of the guest's
> > SCA which will be cleared during the interpretation of PTF.
> >
> > To check if the topology has been modified we use a new field of the
> > arch vCPU to save the previous real CPU ID at the end of a schedule
> > and verify on next schedule that the CPU used is in the same socket.
> >
> > We assume in this patch:
> > - no polarization change: only horizontal polarization is currently
> > used in linux.

Why is this assumption necessary? The statement that Linux runs only
with horizontal polarization is not true.

2021-12-09 16:09:08

by Janosch Frank

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

On 11/22/21 14:14, Pierre Morel wrote:
> We let the userland hypervisor know if the machine support the CPU
> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>
> The PTF instruction will report a topology change if there is any change
> with a previous STSI_15_1_2 SYSIB.
> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
> inside the CPU Topology List Entry CPU mask field, which happens with
> changes in CPU polarization, dedication, CPU types and adding or
> removing CPUs in a socket.
>
> The reporting to the guest is done using the Multiprocessor
> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
> SCA which will be cleared during the interpretation of PTF.
>
> To check if the topology has been modified we use a new field of the
> arch vCPU to save the previous real CPU ID at the end of a schedule
> and verify on next schedule that the CPU used is in the same socket.
>
> We assume in this patch:
> - no polarization change: only horizontal polarization is currently
> used in linux.
> - no CPU Type change: only IFL Type are supported in Linux
> - Dedication: with this patch, only a complete dedicated CPU stack can
> take benefit of the CPU Topology.
>
> STSI(15.1.x) gives information on the CPU configuration topology.
> Let's accept the interception of STSI with the function code 15 and
> let the userland part of the hypervisor handle it when userland
> support the CPU Topology facility.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 16 ++++++++++
> arch/s390/include/asm/kvm_host.h | 14 ++++++---
> arch/s390/kvm/kvm-s390.c | 52 +++++++++++++++++++++++++++++++-
> arch/s390/kvm/priv.c | 7 ++++-
> arch/s390/kvm/vsie.c | 3 ++
> include/uapi/linux/kvm.h | 1 +
> 6 files changed, 87 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..e5c9da0782a6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
> of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
> the hypercalls whose corresponding bit is in the argument, and return
> ENOSYS for the others.
> +
> +8.17 KVM_CAP_S390_CPU_TOPOLOGY
> +------------------------------
> +
> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
> +:Architectures: s390
> +:Type: vm
> +
> +This capability indicates that kvm will provide the S390 CPU Topology facility
> +which consist of the interpretation of the PTF instruction for the Function
> +Code 2 along with interception and forwarding of both the PTF instruction
> +with function Codes 0 or 1 and the STSI(15,1,x) instruction to the userland

The capitalization of "Function code" is inconsistent.

> +hypervisor.
> +
> +The stfle facility 11, CPU Topology facility, should not be provided to the
> +guest without this capability.
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index a604d51acfc8..cccc09a8fdab 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -95,15 +95,19 @@ struct bsca_block {
> union ipte_control ipte_control;
> __u64 reserved[5];
> __u64 mcn;
> - __u64 reserved2;
> +#define ESCA_UTILITY_MTCR 0x8000
> + __u16 utility;
> + __u8 reserved2[6];
> struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
> };
>
> struct esca_block {
> union ipte_control ipte_control;
> - __u64 reserved1[7];
> + __u64 reserved1[6];
> + __u16 utility;
> + __u8 reserved2[6];
> __u64 mcn[4];
> - __u64 reserved2[20];
> + __u64 reserved3[20];

Note to self: Prime example for a move to reserved member names based on
offsets.
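
A minimal sketch of what offset-based reserved member names could look like for the ESCA header touched above. The struct name, the member names `reserved0008`, `reserved003a` and `reserved0060`, and the use of a plain 64-bit integer in place of `union ipte_control` are all assumptions made for illustration; this is not the kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first 0x100 bytes of struct esca_block.
 * Each reserved member is named after its byte offset, so inserting a
 * new field (here: utility) never forces a renumbering of the others.
 */
struct esca_header_sketch {
	uint64_t ipte_control;     /* 0x0000, stand-in for union ipte_control */
	uint64_t reserved0008[6];  /* 0x0008 */
	uint16_t utility;          /* 0x0038 */
	uint8_t  reserved003a[6];  /* 0x003a */
	uint64_t mcn[4];           /* 0x0040 */
	uint64_t reserved0060[20]; /* 0x0060 */
};

/* Compile-time checks that the names really match the layout. */
static_assert(offsetof(struct esca_header_sketch, utility) == 0x38,
	      "utility must sit at offset 0x38");
static_assert(offsetof(struct esca_header_sketch, mcn) == 0x40,
	      "mcn must sit at offset 0x40");
static_assert(sizeof(struct esca_header_sketch) == 0x100,
	      "header must end where the cpu entries start");
```

With this naming, the diff for adding `utility` would only split one reserved member instead of renaming `reserved2` to `reserved3`.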

> struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
> };
>
> @@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
> __u8 icptcode; /* 0x0050 */
> __u8 icptstatus; /* 0x0051 */
> __u16 ihcpu; /* 0x0052 */
> - __u8 reserved54; /* 0x0054 */
> + __u8 mtcr; /* 0x0054 */
> #define IICTL_CODE_NONE 0x00
> #define IICTL_CODE_MCHK 0x01
> #define IICTL_CODE_EXT 0x02
> @@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
> #define ECB_SPECI 0x08
> #define ECB_SRSI 0x04
> #define ECB_HOSTPROTINT 0x02
> +#define ECB_PTF 0x01
> __u8 ecb; /* 0x0061 */
> #define ECB2_CMMA 0x80
> #define ECB2_IEP 0x20
> @@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
> bool skey_enabled;
> struct kvm_s390_pv_vcpu pv;
> union diag318_info diag318_info;
> + int prev_cpu;
> };
>
> struct kvm_vm_stat {

[..]

> }
>
> -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)

We change a vcpu-related data structure, so there should be "vcpu" in the
function name to indicate that.

> {
> + struct esca_block *esca = vcpu->kvm->arch.sca;
>
> + if (vcpu->arch.sie_block->ecb & ECB_PTF) {

I'm wondering if we should replace these checks with the
test_kvm_facility() ones. ECB_PTF is never changed after vcpu setup, right?

> + ipte_lock(vcpu);
> + WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
> + ipte_unlock(vcpu);
> + }
> +}
> +
> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +{
> gmap_enable(vcpu->arch.enabled_gmap);
> kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
> if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
> __start_cpu_timer_accounting(vcpu);
> vcpu->cpu = cpu;
> +
> + /*
> + * With PTF interpretation the guest will be aware of topology
> + * change when the Multiprocessor Topology-Change-Report is pending.
> + * We check for events modifying the result of STSI_15_2:
> + * - A new vCPU has been hotplugged (prev_cpu == -1)
> + * - The real CPU backing up the vCPU moved to another socket
> + */
> + if (vcpu->arch.sie_block->ecb & ECB_PTF) {
> + if (vcpu->arch.prev_cpu == -1 ||
> + (topology_physical_package_id(cpu) !=
> + topology_physical_package_id(vcpu->arch.prev_cpu)))

This is barely readable, might be good to put this check in a separate
function in kvm-s390.h.
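
A minimal sketch of how such a helper might read, assuming a stand-in for topology_physical_package_id() that places four CPUs in each socket. The names `sketch_package_id` and `sketch_vcpu_topology_changed` are hypothetical and only illustrate the shape of the check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for topology_physical_package_id():
 * assume four real CPUs per socket.
 */
static int sketch_package_id(int cpu)
{
	return cpu / 4;
}

/*
 * Returns true when the guest should see a topology change report:
 * either the vCPU has never run before (prev_cpu == -1), or the real
 * CPU now backing it sits in a different socket than the previous one.
 */
static bool sketch_vcpu_topology_changed(int prev_cpu, int cpu)
{
	if (prev_cpu == -1)
		return true;

	return sketch_package_id(cpu) != sketch_package_id(prev_cpu);
}
```

The caller in kvm_arch_vcpu_load() would then shrink to a single condition around the MTCR update.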

> + kvm_s390_set_mtcr(vcpu);
> + }
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + /* Remember which CPU was backing the vCPU */
> + vcpu->arch.prev_cpu = vcpu->cpu;
> vcpu->cpu = -1;
> if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
> __stop_cpu_timer_accounting(vcpu);
> @@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
> vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
> if (test_kvm_facility(vcpu->kvm, 9))
> vcpu->arch.sie_block->ecb |= ECB_SRSI;
> +
> + /* PTF needs guest facilities to enable interpretation */

Please explain.
How is this different from any other facility a few lines above in this
function?

> + if (test_kvm_facility(vcpu->kvm, 11))
> + vcpu->arch.sie_block->ecb |= ECB_PTF;
> + /* Set the prev_cpu value to an impossible value to detect a new vcpu */

We can either change this to:
"A prev_cpu of -1 indicates this is a new vcpu"

Or we define a constant which will also make the check in
kvm_arch_vcpu_load() easier to understand.
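
A minimal sketch of the constant variant, using a reduced stand-in for the vCPU state. The constant name `SKETCH_PREV_CPU_NONE`, the struct and the helper functions are hypothetical; they only mirror the setup, load and put steps of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name for the "never scheduled yet" marker. */
#define SKETCH_PREV_CPU_NONE (-1)

/* Reduced stand-in for the two fields the patch works with. */
struct sketch_vcpu {
	int cpu;      /* real CPU currently backing the vCPU, -1 if none */
	int prev_cpu; /* real CPU that backed the vCPU before the last put */
};

/* Mirrors the setup step: a fresh vCPU has no previous CPU. */
static void sketch_vcpu_setup(struct sketch_vcpu *v)
{
	v->cpu = -1;
	v->prev_cpu = SKETCH_PREV_CPU_NONE;
}

/* The check in the load path becomes self-explanatory. */
static bool sketch_vcpu_is_new(const struct sketch_vcpu *v)
{
	return v->prev_cpu == SKETCH_PREV_CPU_NONE;
}

/* Mirrors the load step: record the backing CPU. */
static void sketch_vcpu_load(struct sketch_vcpu *v, int cpu)
{
	v->cpu = cpu;
}

/* Mirrors the put step: remember which CPU was backing the vCPU. */
static void sketch_vcpu_put(struct sketch_vcpu *v)
{
	v->prev_cpu = v->cpu;
	v->cpu = -1;
}
```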

> + vcpu->arch.prev_cpu = -1;
> +
> if (test_kvm_facility(vcpu->kvm, 73))
> vcpu->arch.sie_block->ecb |= ECB_TE;
> if (!kvm_is_ucontrol(vcpu->kvm))
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 417154b314a6..26d165733496 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>
> - if (fc > 3) {
> + if ((fc > 3 && fc != 15) ||
> + (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
> kvm_s390_set_psw_cc(vcpu, 3);
> return 0;
> }

How about:

if (fc > 3 && fc != 15)
goto out_no_data;

/* fc 15 is provided with PTF/CPU topology support */
if (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))
goto out_no_data;
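
As a sanity check, the combined condition from the patch and the split form suggested above reject exactly the same function codes. A minimal sketch, assuming fc is a 4-bit value (0 to 15) and modelling test_kvm_facility(vcpu->kvm, 11) as a plain boolean; all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch. */
static bool sketch_reject_combined(int fc, bool has_fac11)
{
	return (fc > 3 && fc != 15) || (fc == 15 && !has_fac11);
}

/* Condition in the split form, one early exit per reason. */
static bool sketch_reject_split(int fc, bool has_fac11)
{
	if (fc > 3 && fc != 15)
		return true;

	/* fc 15 is provided with PTF/CPU topology support */
	if (fc == 15 && !has_fac11)
		return true;

	return false;
}

/* Walks every 4-bit fc value with and without facility 11. */
static bool sketch_forms_agree(void)
{
	for (int fc = 0; fc <= 15; fc++) {
		if (sketch_reject_combined(fc, false) !=
		    sketch_reject_split(fc, false))
			return false;
		if (sketch_reject_combined(fc, true) !=
		    sketch_reject_split(fc, true))
			return false;
	}
	return true;
}
```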

> @@ -898,6 +899,10 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
> goto out_no_data;
> handle_stsi_3_2_2(vcpu, (void *) mem);
> break;
> + case 15:
> + trace_kvm_s390_handle_stsi(vcpu, fc, sel1, sel2, operand2);
> + insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);
> + return -EREMOTE;
> }
> if (kvm_s390_pv_cpu_is_protected(vcpu)) {
> memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index acda4b6fc851..da0397cf2cc7 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -503,6 +503,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
> /* Host-protection-interruption introduced with ESOP */
> if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_ESOP))
> scb_s->ecb |= scb_o->ecb & ECB_HOSTPROTINT;
> + /* CPU Topology */
> + if (test_kvm_facility(vcpu->kvm, 11))
> + scb_s->ecb |= scb_o->ecb & ECB_PTF;
> /* transactional execution */
> if (test_kvm_facility(vcpu->kvm, 73) && wants_tx) {
> /* remap the prefix is tx is toggled on */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 1daa45268de2..273c62dfbe9a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> #define KVM_CAP_ARM_MTE 205
> #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> +#define KVM_CAP_S390_CPU_TOPOLOGY 207
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
>


2021-12-13 10:13:29

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information



On 12/9/21 16:54, Heiko Carstens wrote:
> On Thu, Dec 09, 2021 at 01:36:16PM +0100, Claudio Imbrenda wrote:
>> On Mon, 22 Nov 2021 14:14:43 +0100
>> Pierre Morel <[email protected]> wrote:
>>
>>> We let the userland hypervisor know if the machine support the CPU
>>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>>
>>> The PTF instruction will report a topology change if there is any change
>>> with a previous STSI_15_1_2 SYSIB.
>>> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
>>> inside the CPU Topology List Entry CPU mask field, which happens with
>>> changes in CPU polarization, dedication, CPU types and adding or
>>> removing CPUs in a socket.
>>>
>>> The reporting to the guest is done using the Multiprocessor
>>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>>> SCA which will be cleared during the interpretation of PTF.
>>>
>>> To check if the topology has been modified we use a new field of the
>>> arch vCPU to save the previous real CPU ID at the end of a schedule
>>> and verify on next schedule that the CPU used is in the same socket.
>>>
>>> We assume in this patch:
>>> - no polarization change: only horizontal polarization is currently
>>> used in linux.
>
> Why is this assumption necessary? The statement that Linux runs only
> with horizontal polarization is not true.

Oh OK, I will change this and take a look at the implications.

Thanks,
Pierre

>

--
Pierre Morel
IBM Lab Boeblingen

2021-12-13 10:20:26

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information



On 12/9/21 17:08, Janosch Frank wrote:
> On 11/22/21 14:14, Pierre Morel wrote:
>> We let the userland hypervisor know if the machine support the CPU
>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>
>> The PTF instruction will report a topology change if there is any change
>> with a previous STSI_15_1_2 SYSIB.
>> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
>> inside the CPU Topology List Entry CPU mask field, which happens with
>> changes in CPU polarization, dedication, CPU types and adding or
>> removing CPUs in a socket.
>>
>> The reporting to the guest is done using the Multiprocessor
>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>> SCA which will be cleared during the interpretation of PTF.
>>
>> To check if the topology has been modified we use a new field of the
>> arch vCPU to save the previous real CPU ID at the end of a schedule
>> and verify on next schedule that the CPU used is in the same socket.
>>
>> We assume in this patch:
>> - no polarization change: only horizontal polarization is currently
>>    used in linux.
>> - no CPU Type change: only IFL Type are supported in Linux
>> - Dedication: with this patch, only a complete dedicated CPU stack can
>>    take benefit of the CPU Topology.
>>
>> STSI(15.1.x) gives information on the CPU configuration topology.
>> Let's accept the interception of STSI with the function code 15 and
>> let the userland part of the hypervisor handle it when userland
>> support the CPU Topology facility.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>>   Documentation/virt/kvm/api.rst   | 16 ++++++++++
>>   arch/s390/include/asm/kvm_host.h | 14 ++++++---
>>   arch/s390/kvm/kvm-s390.c         | 52 +++++++++++++++++++++++++++++++-
>>   arch/s390/kvm/priv.c             |  7 ++++-
>>   arch/s390/kvm/vsie.c             |  3 ++
>>   include/uapi/linux/kvm.h         |  1 +
>>   6 files changed, 87 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst
>> b/Documentation/virt/kvm/api.rst
>> index aeeb071c7688..e5c9da0782a6 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a
>> bitmask, and must be a subset
>>   of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
>>   the hypercalls whose corresponding bit is in the argument, and return
>>   ENOSYS for the others.
>> +
>> +8.17 KVM_CAP_S390_CPU_TOPOLOGY
>> +------------------------------
>> +
>> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
>> +:Architectures: s390
>> +:Type: vm
>> +
>> +This capability indicates that kvm will provide the S390 CPU Topology
>> facility
>> +which consist of the interpretation of the PTF instruction for the
>> Function
>> +Code 2 along with interception and forwarding of both the PTF
>> instruction
>> +with function Codes 0 or 1 and the STSI(15,1,x) instruction to the
>> userland
>
> The capitalization of "Function code" is inconsistent.

ok

>
>> +hypervisor.
>> +
>> +The stfle facility 11, CPU Topology facility, should not be provided
>> to the
>> +guest without this capability.
>> diff --git a/arch/s390/include/asm/kvm_host.h
>> b/arch/s390/include/asm/kvm_host.h
>> index a604d51acfc8..cccc09a8fdab 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -95,15 +95,19 @@ struct bsca_block {
>>       union ipte_control ipte_control;
>>       __u64    reserved[5];
>>       __u64    mcn;
>> -    __u64    reserved2;
>> +#define ESCA_UTILITY_MTCR    0x8000
>> +    __u16    utility;
>> +    __u8    reserved2[6];
>>       struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
>>   };
>>   struct esca_block {
>>       union ipte_control ipte_control;
>> -    __u64   reserved1[7];
>> +    __u64   reserved1[6];
>> +    __u16    utility;
>> +    __u8    reserved2[6];
>>       __u64   mcn[4];
>> -    __u64   reserved2[20];
>> +    __u64   reserved3[20];
>
> Note to self: Prime example for a move to reserved member names based on
> offsets.

yes

>
>>       struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
>>   };
>> @@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
>>       __u8    icptcode;        /* 0x0050 */
>>       __u8    icptstatus;        /* 0x0051 */
>>       __u16    ihcpu;            /* 0x0052 */
>> -    __u8    reserved54;        /* 0x0054 */
>> +    __u8    mtcr;            /* 0x0054 */
>>   #define IICTL_CODE_NONE         0x00
>>   #define IICTL_CODE_MCHK         0x01
>>   #define IICTL_CODE_EXT         0x02
>> @@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
>>   #define ECB_SPECI    0x08
>>   #define ECB_SRSI    0x04
>>   #define ECB_HOSTPROTINT    0x02
>> +#define ECB_PTF        0x01
>>       __u8    ecb;            /* 0x0061 */
>>   #define ECB2_CMMA    0x80
>>   #define ECB2_IEP    0x20
>> @@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
>>       bool skey_enabled;
>>       struct kvm_s390_pv_vcpu pv;
>>       union diag318_info diag318_info;
>> +    int prev_cpu;
>>   };
>>   struct kvm_vm_stat {
>
> [..]
>
>>   }
>> -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)
>
> We change a vcpu related data structure, there should be "vcpu" in the
> function name to indicate that.

ok

>
>>   {
>> +    struct esca_block *esca = vcpu->kvm->arch.sca;
>> +    if (vcpu->arch.sie_block->ecb & ECB_PTF) {
>
> I'm wondering if we should replace these checks with the
> test_kvm_facility() ones. ECB_PTF is never changed after vcpu setup, right?

Sure, it is left over from the first draft, when the patch supported both
interpretation and interception.

>
>> +        ipte_lock(vcpu);
>> +        WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
>> +        ipte_unlock(vcpu);
>> +    }
>> +}
>> +
>> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +{
>>       gmap_enable(vcpu->arch.enabled_gmap);
>>       kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
>>       if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>>           __start_cpu_timer_accounting(vcpu);
>>       vcpu->cpu = cpu;
>> +
>> +    /*
>> +     * With PTF interpretation the guest will be aware of topology
>> +     * change when the Multiprocessor Topology-Change-Report is pending.
>> +     * We check for events modifying the result of STSI_15_2:
>> +     * - A new vCPU has been hotplugged (prev_cpu == -1)
>> +     * - The real CPU backing up the vCPU moved to another socket
>> +     */
>> +    if (vcpu->arch.sie_block->ecb & ECB_PTF) {
>> +        if (vcpu->arch.prev_cpu == -1 ||
>> +            (topology_physical_package_id(cpu) !=
>> +             topology_physical_package_id(vcpu->arch.prev_cpu)))
>
> This is barely readable, might be good to put this check in a separate
> function in kvm-s390.h.

ok

>
>> +            kvm_s390_set_mtcr(vcpu);
>> +    }
>>   }
>>   void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>   {
>> +    /* Remember which CPU was backing the vCPU */
>> +    vcpu->arch.prev_cpu = vcpu->cpu;
>>       vcpu->cpu = -1;
>>       if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>>           __stop_cpu_timer_accounting(vcpu);
>> @@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu
>> *vcpu)
>>           vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
>>       if (test_kvm_facility(vcpu->kvm, 9))
>>           vcpu->arch.sie_block->ecb |= ECB_SRSI;
>> +
>> +    /* PTF needs guest facilities to enable interpretation */
>
> Please explain.
> How is this different from any other facility a few lines above in this
> function?

It is not. I will remove the comment; it is again left over from the time
the patch supported interception.

>
>> +    if (test_kvm_facility(vcpu->kvm, 11))
>> +        vcpu->arch.sie_block->ecb |= ECB_PTF;
>> +    /* Set the prev_cpu value to an impossible value to detect a new
>> vcpu */
>
> We can either change this to:
> "A prev_cpu of -1 indicates this is a new vcpu"
>
> Or we define a constant which will also make the check in
> kvm_arch_vcpu_load() easier to understand.

ok, the constant would be clearer.

>
>> +    vcpu->arch.prev_cpu = -1;
>> +
>>       if (test_kvm_facility(vcpu->kvm, 73))
>>           vcpu->arch.sie_block->ecb |= ECB_TE;
>>       if (!kvm_is_ucontrol(vcpu->kvm))
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 417154b314a6..26d165733496 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>>       if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>           return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> -    if (fc > 3) {
>> +    if ((fc > 3 && fc != 15) ||
>> +        (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
>>           kvm_s390_set_psw_cc(vcpu, 3);
>>           return 0;
>>       }
>
> How about:
>
> if (fc > 3 && fc != 15)
>     goto out_no_data;
>
> /* fc 15 is provided with PTF/CPU topology support */
> if (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))
>     goto out_no_data;

ok, clearer
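
As a standalone sketch of the suggested control flow: the function name
stsi_check_fc(), its parameters and the two CC macros below are
assumptions made for illustration. Unlike handle_stsi(), which sets the
condition code in the guest PSW and returns 0, this sketch simply
returns the condition code the check would select, and a boolean
replaces test_kvm_facility(vcpu->kvm, 11).

```c
#include <stdbool.h>

#define STSI_CC_OK      0
#define STSI_CC_NO_DATA 3

/* Return the condition code selected by the function-code check. */
int stsi_check_fc(int fc, bool has_topology_facility)
{
    if (fc > 3 && fc != 15)
        goto out_no_data;

    /* fc 15 is provided with PTF/CPU topology support */
    if (fc == 15 && !has_topology_facility)
        goto out_no_data;

    /* The real handler would continue processing the request here. */
    return STSI_CC_OK;

out_no_data:
    return STSI_CC_NO_DATA;
}
```

Splitting the condition keeps each rejection reason on its own line and
gives the facility test a place for its comment.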


Thanks for the review,
Pierre

--
Pierre Morel
IBM Lab Boeblingen

2021-12-13 14:26:06

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information



On 12/9/21 16:54, Heiko Carstens wrote:
> On Thu, Dec 09, 2021 at 01:36:16PM +0100, Claudio Imbrenda wrote:
>> On Mon, 22 Nov 2021 14:14:43 +0100
>> Pierre Morel <[email protected]> wrote:
>>
>>> We let the userland hypervisor know if the machine support the CPU
>>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>>
>>> The PTF instruction will report a topology change if there is any change
>>> with a previous STSI_15_1_2 SYSIB.
>>> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or clear
>>> inside the CPU Topology List Entry CPU mask field, which happens with
>>> changes in CPU polarization, dedication, CPU types and adding or
>>> removing CPUs in a socket.
>>>
>>> The reporting to the guest is done using the Multiprocessor
>>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>>> SCA which will be cleared during the interpretation of PTF.
>>>
>>> To check if the topology has been modified we use a new field of the
>>> arch vCPU to save the previous real CPU ID at the end of a schedule
>>> and verify on next schedule that the CPU used is in the same socket.
>>>
>>> We assume in this patch:
>>> - no polarization change: only horizontal polarization is currently
>>> used in linux.
>
> Why is this assumption necessary? The statement that Linux runs only
> with horizontal polarization is not true.
>

Right, I will rephrase this as:

"Polarization change is not taken into account, QEMU intercepts queries
for polarization change (PTF) and only provides horizontal polarization
indication to Guest's Linux."

@Heiko, I did not find any usage of the polarization in the kernel other
than an indication in the sysfs. Is there currently other use of the
polarization that I did not see?



--
Pierre Morel
IBM Lab Boeblingen

2021-12-13 15:21:59

by Heiko Carstens

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

On Mon, Dec 13, 2021 at 03:26:58PM +0100, Pierre Morel wrote:
> > Why is this assumption necessary? The statement that Linux runs only
> > with horizontal polarization is not true.
> >
>
> Right, I will rephrase this as:
>
> "Polarization change is not taken into account, QEMU intercepts queries for
> polarization change (PTF) and only provides horizontal polarization
> indication to Guest's Linux."
>
> @Heiko, I did not find any usage of the polarization in the kernel other
> than an indication in the sysfs. Is there currently other use of the
> polarization that I did not see?

You can change polarization by writing to /sys/devices/system/cpu/dispatching.

Or alternatively use the chcpu tool to change polarization. There is
however no real support for vertical polarization implemented in the
kernel. Therefore changing to vertical polarization is _not_
recommended, since it will most likely have negative performance
impacts on your Linux system.
However the interface is still there for experimental purposes.
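
A minimal sketch of writing to that sysfs attribute from C follows. The
function name set_polarization() and the enum are hypothetical, and the
sketch assumes that writing "0" selects horizontal and "1" selects
vertical polarization; please check the s390 documentation before
relying on those values. The path is passed in as a parameter so that
the code can be tried against an ordinary file.

```c
#include <stdio.h>

/* Assumption: 0 selects horizontal, 1 selects vertical polarization. */
enum cpu_polarization {
    POLARIZATION_HORIZONTAL = 0,
    POLARIZATION_VERTICAL = 1,
};

/*
 * Write the requested mode to the given file. On an s390 host the path
 * would be "/sys/devices/system/cpu/dispatching" (root required).
 * Returns 0 on success and -1 on failure.
 */
int set_polarization(const char *path, enum cpu_polarization mode)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", (int)mode) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Given the performance warning above, a caller would normally only use
POLARIZATION_HORIZONTAL outside of experiments.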

2021-12-13 15:45:13

by Pierre Morel

Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information



On 12/13/21 16:21, Heiko Carstens wrote:
> On Mon, Dec 13, 2021 at 03:26:58PM +0100, Pierre Morel wrote:
>>> Why is this assumption necessary? The statement that Linux runs only
>>> with horizontal polarization is not true.
>>>
>>
>> Right, I will rephrase this as:
>>
>> "Polarization change is not taken into account, QEMU intercepts queries for
>> polarization change (PTF) and only provides horizontal polarization
>> indication to Guest's Linux."
>>
>> @Heiko, I did not find any usage of the polarization in the kernel other
>> than an indication in the sysfs. Is there currently other use of the
>> polarization that I did not see?
>
> You can change polarization by writing to /sys/devices/system/cpu/dispatching.
>
> Or alternatively use the chcpu tool to change polarization. There is
> however no real support for vertical polarization implemented in the
> kernel. Therefore changing to vertical polarization is _not_
> recommended, since it will most likely have negative performance
> impacts on your Linux system.
> However the interface is still there for experimental purposes.
>

Thanks, so I guess that not reflecting polarization changes in the guest
topology will be OK for the moment.
Of course, I will fix the incorrect comment.

--
Pierre Morel
IBM Lab Boeblingen

2021-12-14 09:01:11

by Alexandra Winter

Subject: Re: [PATCH v5 0/1] s390x: KVM: CPU Topology



On 22.11.21 14:14, Pierre Morel wrote:
> Hi all,
>
> This new series add the implementation of interpretation for
> the PTF instruction.
>
> The series provides:
> 1- interception of the STSI instruction forwarding the CPU topology
> 2- interpretation of the PTF instruction
> 3- a KVM capability for the userland hypervisor to ask KVM to
> setup PTF interpretation.
>
>
> 0- Foreword
>
[...]
> We will ignore the following changes inside a STSI(15.1.2):
> - polarization: only horizontal polarization is currently used in Linux.
> - CPU Type: only IFL Type are supported in Linux
I thought Linux can also run on General Purpose CPUs ??

2021-12-16 15:15:08

by Pierre Morel

Subject: Re: [PATCH v5 0/1] s390x: KVM: CPU Topology



On 12/14/21 10:01, Alexandra Winter wrote:
>
>
> On 22.11.21 14:14, Pierre Morel wrote:
>> Hi all,
>>
>> This new series add the implementation of interpretation for
>> the PTF instruction.
>>
>> The series provides:
>> 1- interception of the STSI instruction forwarding the CPU topology
>> 2- interpretation of the PTF instruction
>> 3- a KVM capability for the userland hypervisor to ask KVM to
>> setup PTF interpretation.
>>
>>
>> 0- Foreword
>>
> [...]
>> We will ignore the following changes inside a STSI(15.1.2):
>> - polarization: only horizontal polarization is currently used in Linux.
>> - CPU Type: only IFL Type are supported in Linux
> I thought Linux can also run on General Purpose CPUs ??
>

You are right, I will change these comments.
Thanks

--
Pierre Morel
IBM Lab Boeblingen