Hi all,
This new spin suppresses the check for real CPU migration and
modifies the validity check of the function code inside the
interception of the STSI instruction.
The series provides:
0- Modification of the ipte lock handling to use KVM instead of the
   vcpu as an argument, because the IPTE lock works on the SCA, which
   is unique per KVM structure and common to all vCPUs.
1- interception of the STSI instruction forwarding the CPU topology
2- interpretation of the PTF instruction
3- a KVM capability for the userland hypervisor to ask KVM to
   set up PTF interpretation.
4- KVM ioctl to get and set the MTCR bit of the SCA in order to
migrate this bit during a migration.
0- Foreword
The S390 CPU topology is reported using two instructions:
- PTF, to check whether the CPU topology has changed since the last
  PTF instruction or a subsystem reset.
- STSI, to get the topology information, consisting of the topology
of the CPU inside the sockets, of the sockets inside the books etc.
The PTF(2) instruction reports a change if the STSI(15.1.2) instruction
would report a difference from the last STSI(15.1.2) instruction*.
With the SIE interpretation, the PTF(2) instruction will report a
change to the guest if the host sets the SCA.MTCR bit.
*The STSI(15.1.2) instruction reports:
- The core addresses within a socket
- The polarization of the cores
- The CPU type of the cores
- If the cores are dedicated or not
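The PTF(2)/MTCR interaction described above can be modeled in a few
lines of C. This is a hypothetical sketch of the guest-visible
semantics, not the SIE firmware implementation; the function name
ptf_fc2 and the SCA_UTILITY_MTCR value 0x8000 are assumptions taken
from the patch context:

```c
#include <assert.h>
#include <stdint.h>

#define SCA_UTILITY_MTCR 0x8000	/* MTCR bit of the SCA utility field (assumed) */

/*
 * Model of the PTF(2) semantics: the instruction reports a pending
 * topology change iff the MTCR bit is set in the SCA utility field,
 * and clears the bit as a side effect, so a second PTF(2) issued
 * without an intervening topology change reports "no change".
 */
static int ptf_fc2(uint16_t *sca_utility)
{
	int changed = (*sca_utility & SCA_UTILITY_MTCR) != 0;

	*sca_utility &= ~SCA_UTILITY_MTCR;
	return changed;
}
```

A guest polling with PTF(2) therefore only re-reads the STSI(15.1.2)
SYSIB when the host has set MTCR since the last poll.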
We decided to implement the CPU topology for S390 in several steps:
- first we report CPU hotplug
In future development we will provide:
- modification of the CPU mask inside sockets
- handling of shared CPUs
- reporting of the CPU Type
- reporting of the polarization
1- Interception of STSI
To provide topology information to the guest through the STSI
instruction, we forward STSI with Function Code 15 to the
userland hypervisor, which takes care of providing the right
information to the guest.
To let the guest use both the PTF instruction to check whether a
topology change occurred and the STSI 15.x.x instruction, we add a new
KVM capability to enable the topology facility.
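From the userland hypervisor's side, enabling the capability is a
plain KVM_ENABLE_CAP on the VM fd. The sketch below only builds the
ioctl argument; the struct layout is a simplified local stand-in for
the <linux/kvm.h> definition, and the capability number 217 is taken
from the include/uapi/linux/kvm.h hunk later in this series:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the uapi definitions (assumed values). */
#define KVM_CAP_S390_CPU_TOPOLOGY 217

struct kvm_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/*
 * Prepare the KVM_ENABLE_CAP argument a userland hypervisor would
 * pass via ioctl(vm_fd, KVM_ENABLE_CAP, &cap) to turn on the topology
 * facility; KVM rejects this with -EBUSY once vCPUs exist.
 */
static void prepare_topology_cap(struct kvm_enable_cap *cap)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_S390_CPU_TOPOLOGY;
}
```

The actual ioctl call is omitted since it needs a live /dev/kvm VM fd.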
2- Interpretation of PTF with FC(2)
The PTF instruction reports a topology change if there is any change
with respect to the previous STSI(15.1.2) SYSIB.
Changes inside a STSI(15.1.2) SYSIB occur if CPU bits are set or
cleared inside the CPU Topology List Entry CPU mask field, which
happens with changes in CPU polarization, dedication, CPU types, and
the addition or removal of CPUs in a socket.
Considering that KVM guests currently only support:
- horizontal polarization
- type 3 (Linux) CPUs
and that we decided to support only:
- dedicated CPUs on the host
- pinned vCPUs on the guest
vCPU creation is the only trigger that sets the MTCR bit for
a guest.
The reporting to the guest is done using the Multiprocessor
Topology-Change-Report (MTCR) bit of the utility entry of the guest's
SCA which will be cleared during the interpretation of PTF.
Regards,
Pierre
Pierre Morel (3):
KVM: s390: ipte lock for SCA access should be contained in KVM
KVM: s390: guest support for topology function
KVM: s390: resetting the Topology-Change-Report
Documentation/virt/kvm/api.rst | 31 ++++++++
arch/s390/include/asm/kvm_host.h | 11 ++-
arch/s390/include/uapi/asm/kvm.h | 10 +++
arch/s390/kvm/gaccess.c | 96 ++++++++++++------------
arch/s390/kvm/gaccess.h | 6 +-
arch/s390/kvm/kvm-s390.c | 123 ++++++++++++++++++++++++++++++-
arch/s390/kvm/priv.c | 21 ++++--
arch/s390/kvm/vsie.c | 3 +
include/uapi/linux/kvm.h | 1 +
9 files changed, 240 insertions(+), 62 deletions(-)
--
2.31.1
Changelog:
from v9 to v10
- Suppression of the check on real CPU migration
(Christian)
- Changed the check on fc in handle_stsi
(David)
from v8 to v9
- bug correction in kvm_s390_topology_changed
(Heiko)
- simplification for ipte_lock/unlock to use kvm
as arg instead of vcpu and test on sclp.has_siif
instead of the SIE ECA_SII.
(David)
- use of a single value for reporting if the
topology changed instead of a structure
(David)
from v7 to v8
- implement reset handling
(Janosch)
- change the way to check if the topology changed
(Nico, Heiko)
from v6 to v7
- rebase
from v5 to v6
- make the subject more accurate
(Claudio)
- Change the kvm_s390_set_mtcr() function to have vcpu in the name
(Janosch)
- Replace the checks on ECB_PTF with the check of facility 11
(Janosch)
- modify kvm_arch_vcpu_load, move the check in a function in
the header file
(Janosch)
- No magic numbers: replace the "new cpu value" of -1 with a define
(Janosch)
- Make the checks for STSI validity clearer
(Janosch)
from v4 to v5
- modify the way KVM_CAP is tested to be OK with vsie
(David)
from v3 to v4
- squash both patches
(David)
- Added Documentation
(David)
- Modified the detection for new vCPUs
(Pierre)
from v2 to v3
- use PTF interpretation
(Christian)
- optimize arch_update_cpu_topology using PTF
(Pierre)
from v1 to v2:
- Add a KVM capability to let QEMU know we support PTF and STSI 15
(David)
- check KVM facility 11 before accepting STSI fc 15
(David)
- handle all we can in userland
(David)
- add tracing to STSI fc 15
(Connie)
We can check if SIIF is enabled by testing the sclp_info struct
instead of testing the sie control block eca variable.
sclp.has_siif is the only requirement to set ECA_SII anyway
so we can go straight to the source for that.
Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
---
arch/s390/kvm/gaccess.c | 96 ++++++++++++++++++++---------------------
arch/s390/kvm/gaccess.h | 6 +--
arch/s390/kvm/priv.c | 6 +--
3 files changed, 54 insertions(+), 54 deletions(-)
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 227ed0009354..082ec5f2c3a5 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -262,77 +262,77 @@ struct aste {
/* .. more fields there */
};
-int ipte_lock_held(struct kvm_vcpu *vcpu)
+int ipte_lock_held(struct kvm *kvm)
{
- if (vcpu->arch.sie_block->eca & ECA_SII) {
+ if (sclp.has_siif) {
int rc;
- read_lock(&vcpu->kvm->arch.sca_lock);
- rc = kvm_s390_get_ipte_control(vcpu->kvm)->kh != 0;
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_lock(&kvm->arch.sca_lock);
+ rc = kvm_s390_get_ipte_control(kvm)->kh != 0;
+ read_unlock(&kvm->arch.sca_lock);
return rc;
}
- return vcpu->kvm->arch.ipte_lock_count != 0;
+ return kvm->arch.ipte_lock_count != 0;
}
-static void ipte_lock_simple(struct kvm_vcpu *vcpu)
+static void ipte_lock_simple(struct kvm *kvm)
{
union ipte_control old, new, *ic;
- mutex_lock(&vcpu->kvm->arch.ipte_mutex);
- vcpu->kvm->arch.ipte_lock_count++;
- if (vcpu->kvm->arch.ipte_lock_count > 1)
+ mutex_lock(&kvm->arch.ipte_mutex);
+ kvm->arch.ipte_lock_count++;
+ if (kvm->arch.ipte_lock_count > 1)
goto out;
retry:
- read_lock(&vcpu->kvm->arch.sca_lock);
- ic = kvm_s390_get_ipte_control(vcpu->kvm);
+ read_lock(&kvm->arch.sca_lock);
+ ic = kvm_s390_get_ipte_control(kvm);
do {
old = READ_ONCE(*ic);
if (old.k) {
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_unlock(&kvm->arch.sca_lock);
cond_resched();
goto retry;
}
new = old;
new.k = 1;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_unlock(&kvm->arch.sca_lock);
out:
- mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
+ mutex_unlock(&kvm->arch.ipte_mutex);
}
-static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
+static void ipte_unlock_simple(struct kvm *kvm)
{
union ipte_control old, new, *ic;
- mutex_lock(&vcpu->kvm->arch.ipte_mutex);
- vcpu->kvm->arch.ipte_lock_count--;
- if (vcpu->kvm->arch.ipte_lock_count)
+ mutex_lock(&kvm->arch.ipte_mutex);
+ kvm->arch.ipte_lock_count--;
+ if (kvm->arch.ipte_lock_count)
goto out;
- read_lock(&vcpu->kvm->arch.sca_lock);
- ic = kvm_s390_get_ipte_control(vcpu->kvm);
+ read_lock(&kvm->arch.sca_lock);
+ ic = kvm_s390_get_ipte_control(kvm);
do {
old = READ_ONCE(*ic);
new = old;
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
- read_unlock(&vcpu->kvm->arch.sca_lock);
- wake_up(&vcpu->kvm->arch.ipte_wq);
+ read_unlock(&kvm->arch.sca_lock);
+ wake_up(&kvm->arch.ipte_wq);
out:
- mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
+ mutex_unlock(&kvm->arch.ipte_mutex);
}
-static void ipte_lock_siif(struct kvm_vcpu *vcpu)
+static void ipte_lock_siif(struct kvm *kvm)
{
union ipte_control old, new, *ic;
retry:
- read_lock(&vcpu->kvm->arch.sca_lock);
- ic = kvm_s390_get_ipte_control(vcpu->kvm);
+ read_lock(&kvm->arch.sca_lock);
+ ic = kvm_s390_get_ipte_control(kvm);
do {
old = READ_ONCE(*ic);
if (old.kg) {
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_unlock(&kvm->arch.sca_lock);
cond_resched();
goto retry;
}
@@ -340,15 +340,15 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
new.k = 1;
new.kh++;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_unlock(&kvm->arch.sca_lock);
}
-static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
+static void ipte_unlock_siif(struct kvm *kvm)
{
union ipte_control old, new, *ic;
- read_lock(&vcpu->kvm->arch.sca_lock);
- ic = kvm_s390_get_ipte_control(vcpu->kvm);
+ read_lock(&kvm->arch.sca_lock);
+ ic = kvm_s390_get_ipte_control(kvm);
do {
old = READ_ONCE(*ic);
new = old;
@@ -356,25 +356,25 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
if (!new.kh)
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
- read_unlock(&vcpu->kvm->arch.sca_lock);
+ read_unlock(&kvm->arch.sca_lock);
if (!new.kh)
- wake_up(&vcpu->kvm->arch.ipte_wq);
+ wake_up(&kvm->arch.ipte_wq);
}
-void ipte_lock(struct kvm_vcpu *vcpu)
+void ipte_lock(struct kvm *kvm)
{
- if (vcpu->arch.sie_block->eca & ECA_SII)
- ipte_lock_siif(vcpu);
+ if (sclp.has_siif)
+ ipte_lock_siif(kvm);
else
- ipte_lock_simple(vcpu);
+ ipte_lock_simple(kvm);
}
-void ipte_unlock(struct kvm_vcpu *vcpu)
+void ipte_unlock(struct kvm *kvm)
{
- if (vcpu->arch.sie_block->eca & ECA_SII)
- ipte_unlock_siif(vcpu);
+ if (sclp.has_siif)
+ ipte_unlock_siif(kvm);
else
- ipte_unlock_simple(vcpu);
+ ipte_unlock_simple(kvm);
}
static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar,
@@ -1086,7 +1086,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
try_storage_prot_override = storage_prot_override_applicable(vcpu);
need_ipte_lock = psw_bits(*psw).dat && !asce.r;
if (need_ipte_lock)
- ipte_lock(vcpu);
+ ipte_lock(vcpu->kvm);
/*
* Since we do the access further down ultimately via a move instruction
* that does key checking and returns an error in case of a protection
@@ -1127,7 +1127,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
}
out_unlock:
if (need_ipte_lock)
- ipte_unlock(vcpu);
+ ipte_unlock(vcpu->kvm);
if (nr_pages > ARRAY_SIZE(gpa_array))
vfree(gpas);
return rc;
@@ -1199,10 +1199,10 @@ int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
rc = get_vcpu_asce(vcpu, &asce, gva, ar, mode);
if (rc)
return rc;
- ipte_lock(vcpu);
+ ipte_lock(vcpu->kvm);
rc = guest_range_to_gpas(vcpu, gva, ar, NULL, length, asce, mode,
access_key);
- ipte_unlock(vcpu);
+ ipte_unlock(vcpu->kvm);
return rc;
}
@@ -1465,7 +1465,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
* tables/pointers we read stay valid - unshadowing is however
* always possible - only guest_table_lock protects us.
*/
- ipte_lock(vcpu);
+ ipte_lock(vcpu->kvm);
rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
if (rc)
@@ -1499,7 +1499,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
pte.p |= dat_protection;
if (!rc)
rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
- ipte_unlock(vcpu);
+ ipte_unlock(vcpu->kvm);
mmap_read_unlock(sg->mm);
return rc;
}
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 1124ff282012..9408d6cc8e2c 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -440,9 +440,9 @@ int read_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
return access_guest_real(vcpu, gra, data, len, 0);
}
-void ipte_lock(struct kvm_vcpu *vcpu);
-void ipte_unlock(struct kvm_vcpu *vcpu);
-int ipte_lock_held(struct kvm_vcpu *vcpu);
+void ipte_lock(struct kvm *kvm);
+void ipte_unlock(struct kvm *kvm);
+int ipte_lock_held(struct kvm *kvm);
int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
/* MVPG PEI indication bits */
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 83bb5cf97282..12c464c7cddf 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -442,7 +442,7 @@ static int handle_ipte_interlock(struct kvm_vcpu *vcpu)
vcpu->stat.instruction_ipte_interlock++;
if (psw_bits(vcpu->arch.sie_block->gpsw).pstate)
return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
- wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu));
+ wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu->kvm));
kvm_s390_retry_instr(vcpu);
VCPU_EVENT(vcpu, 4, "%s", "retrying ipte interlock operation");
return 0;
@@ -1471,7 +1471,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
access_key = (operand2 & 0xf0) >> 4;
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
- ipte_lock(vcpu);
+ ipte_lock(vcpu->kvm);
ret = guest_translate_address_with_key(vcpu, address, ar, &gpa,
GACC_STORE, access_key);
@@ -1508,7 +1508,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
}
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
- ipte_unlock(vcpu);
+ ipte_unlock(vcpu->kvm);
return ret;
}
--
2.31.1
During a subsystem reset the Topology-Change-Report is cleared.
Let's give userland the possibility to clear the MTCR in the case
of a subsystem reset.
To migrate the MTCR, we give userland the possibility to
query the MTCR state.
We indicate KVM support for the CPU topology facility with a new
KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
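For illustration, this is how userland would build the device
attribute for setting or clearing the MTCR. The sketch only fills the
kvm_device_attr passed to KVM_SET_DEVICE_ATTR on the VM fd; the struct
definitions are simplified local stand-ins, with the numeric values
mirroring the arch/s390/include/uapi/asm/kvm.h hunk below:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the uapi definitions added by this patch. */
#define KVM_S390_VM_CPU_TOPOLOGY	5
#define KVM_S390_VM_CPU_TOPO_MTCR_CLEAR	0
#define KVM_S390_VM_CPU_TOPO_MTCR_SET	1

struct kvm_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;
};

/*
 * Build the attribute userland would pass via
 * ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &da) to set the MTCR (e.g. after
 * restoring a migrated guest) or clear it (e.g. on subsystem reset).
 */
static void prepare_mtcr_attr(struct kvm_device_attr *da, int set)
{
	memset(da, 0, sizeof(*da));
	da->group = KVM_S390_VM_CPU_TOPOLOGY;
	da->attr = set ? KVM_S390_VM_CPU_TOPO_MTCR_SET
		       : KVM_S390_VM_CPU_TOPO_MTCR_CLEAR;
}
```

Querying the MTCR state goes the other way: KVM_GET_DEVICE_ATTR with
addr pointing at a struct kvm_cpu_topology, as described in the api.rst
hunk below.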
Signed-off-by: Pierre Morel <[email protected]>
---
Documentation/virt/kvm/api.rst | 31 +++++++++++
arch/s390/include/uapi/asm/kvm.h | 10 ++++
arch/s390/kvm/kvm-s390.c | 96 ++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
4 files changed, 138 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 11e00a46c610..326f8b7e7671 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7956,6 +7956,37 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
+8.37 KVM_CAP_S390_CPU_TOPOLOGY
+------------------------------
+
+:Capability: KVM_CAP_S390_CPU_TOPOLOGY
+:Architectures: s390
+:Type: vm
+
+This capability indicates that KVM will provide the S390 CPU Topology
+facility which consist of the interpretation of the PTF instruction for
+the Function Code 2 along with interception and forwarding of both the
+PTF instruction with Function Codes 0 or 1 and the STSI(15,1,x)
+instruction to the userland hypervisor.
+
+The stfle facility 11, CPU Topology facility, should not be provided
+to the guest without this capability.
+
+When this capability is present, KVM provides a new attribute group
+on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
+This new attribute allows to get, set or clear the Modified Change
+Topology Report (MTCR) bit of the SCA through the kvm_device_attr
+structure.
+
+Getting the MTCR bit is realized by using a kvm_device_attr attr
+entry value of KVM_GET_DEVICE_ATTR and with kvm_device_attr addr
+entry pointing to the address of a struct kvm_cpu_topology.
+The value of the MTCR is return by the bit mtcr of the structure.
+
+When using KVM_SET_DEVICE_ATTR the MTCR is set by using the
+attr->attr value KVM_S390_VM_CPU_TOPO_MTCR_SET and cleared by
+using KVM_S390_VM_CPU_TOPO_MTCR_CLEAR.
+
9. Known KVM API problems
=========================
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 7a6b14874d65..df5e8279ffd0 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
#define KVM_S390_VM_CRYPTO 2
#define KVM_S390_VM_CPU_MODEL 3
#define KVM_S390_VM_MIGRATION 4
+#define KVM_S390_VM_CPU_TOPOLOGY 5
/* kvm attributes for mem_ctrl */
#define KVM_S390_VM_MEM_ENABLE_CMMA 0
@@ -171,6 +172,15 @@ struct kvm_s390_vm_cpu_subfunc {
#define KVM_S390_VM_MIGRATION_START 1
#define KVM_S390_VM_MIGRATION_STATUS 2
+/* kvm attributes for cpu topology */
+#define KVM_S390_VM_CPU_TOPO_MTCR_CLEAR 0
+#define KVM_S390_VM_CPU_TOPO_MTCR_SET 1
+
+struct kvm_cpu_topology {
+ __u16 mtcr : 1;
+ __u16 reserved : 15;
+};
+
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
/* general purpose regs for s390 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 95b96019ca8e..ae39041bb149 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -606,6 +606,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_PROTECTED:
r = is_prot_virt_host();
break;
+ case KVM_CAP_S390_CPU_TOPOLOGY:
+ r = test_facility(11);
+ break;
default:
r = 0;
}
@@ -817,6 +820,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
icpt_operexc_on_all_vcpus(kvm);
r = 0;
break;
+ case KVM_CAP_S390_CPU_TOPOLOGY:
+ r = -EINVAL;
+ mutex_lock(&kvm->lock);
+ if (kvm->created_vcpus) {
+ r = -EBUSY;
+ } else if (test_facility(11)) {
+ set_kvm_facility(kvm->arch.model.fac_mask, 11);
+ set_kvm_facility(kvm->arch.model.fac_list, 11);
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
+ r ? "(not available)" : "(success)");
+ break;
default:
r = -EINVAL;
break;
@@ -1710,6 +1727,76 @@ static void kvm_s390_sca_set_mtcr(struct kvm *kvm)
ipte_unlock(kvm);
}
+/**
+ * kvm_s390_sca_clear_mtcr
+ * @kvm: guest KVM description
+ *
+ * Is only relevant if the topology facility is present,
+ * the caller should check KVM facility 11
+ *
+ * Updates the Multiprocessor Topology-Change-Report to signal
+ * the guest with a topology change.
+ */
+static void kvm_s390_sca_clear_mtcr(struct kvm *kvm)
+{
+ struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't matter */
+
+ ipte_lock(kvm);
+ sca->utility &= ~SCA_UTILITY_MTCR;
+ ipte_unlock(kvm);
+}
+
+static int kvm_s390_set_topology(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ if (!test_kvm_facility(kvm, 11))
+ return -ENXIO;
+
+ switch (attr->attr) {
+ case KVM_S390_VM_CPU_TOPO_MTCR_SET:
+ kvm_s390_sca_set_mtcr(kvm);
+ break;
+ case KVM_S390_VM_CPU_TOPO_MTCR_CLEAR:
+ kvm_s390_sca_clear_mtcr(kvm);
+ break;
+ }
+ return 0;
+}
+
+/**
+ * kvm_s390_sca_get_mtcr
+ * @kvm: guest KVM description
+ *
+ * Is only relevant if the topology facility is present,
+ * the caller should check KVM facility 11
+ *
+ * reports to QEMU the Multiprocessor Topology-Change-Report.
+ */
+static int kvm_s390_sca_get_mtcr(struct kvm *kvm)
+{
+ struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't matter */
+ int val;
+
+ ipte_lock(kvm);
+ val = sca->utility & SCA_UTILITY_MTCR;
+ ipte_unlock(kvm);
+
+ return val;
+}
+
+static int kvm_s390_get_topology(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ struct kvm_cpu_topology topo = {};
+
+ if (!test_kvm_facility(kvm, 11))
+ return -ENXIO;
+
+ topo.mtcr = kvm_s390_sca_get_mtcr(kvm) ? 1 : 0;
+ if (copy_to_user((void __user *)attr->addr, &topo, sizeof(topo)))
+ return -EFAULT;
+
+ return 0;
+}
+
static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
{
int ret;
@@ -1730,6 +1817,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
case KVM_S390_VM_MIGRATION:
ret = kvm_s390_vm_set_migration(kvm, attr);
break;
+ case KVM_S390_VM_CPU_TOPOLOGY:
+ ret = kvm_s390_set_topology(kvm, attr);
+ break;
default:
ret = -ENXIO;
break;
@@ -1755,6 +1845,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
case KVM_S390_VM_MIGRATION:
ret = kvm_s390_vm_get_migration(kvm, attr);
break;
+ case KVM_S390_VM_CPU_TOPOLOGY:
+ ret = kvm_s390_get_topology(kvm, attr);
+ break;
default:
ret = -ENXIO;
break;
@@ -1828,6 +1921,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
case KVM_S390_VM_MIGRATION:
ret = 0;
break;
+ case KVM_S390_VM_CPU_TOPOLOGY:
+ ret = test_kvm_facility(kvm, 11) ? 0 : -ENXIO;
+ break;
default:
ret = -ENXIO;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5088bd9f1922..33317d820032 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1157,6 +1157,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_VM_TSC_CONTROL 214
#define KVM_CAP_SYSTEM_EVENT_DATA 215
#define KVM_CAP_ARM_SYSTEM_SUSPEND 216
+#define KVM_CAP_S390_CPU_TOPOLOGY 217
#ifdef KVM_CAP_IRQ_ROUTING
--
2.31.1
On 6/20/22 14:54, Pierre Morel wrote:
> We can check if SIIF is enabled by testing the sclp_info struct
> instead of testing the sie control block eca variable.
> sclp.has_ssif is the only requirement to set ECA_SII anyway
> so we can go straight to the source for that.
The subject and commit description don't fit together.
You're doing two things in this patch and only describe one of them.
I'd suggest something like this:
KVM: s390: Cleanup ipte lock access and SIIF facility checks
We can check if SIIF is enabled by testing the sclp_info struct instead
of testing the sie control block eca variable as that facility is always
enabled if available.
Also let's cleanup all the ipte related struct member accesses which
currently happen by referencing the KVM struct via the VCPU struct.
Making the KVM struct the parameter to the ipte_* functions removes one
level of indirection which makes the code more readable.
Other than that I'm happy with this patch.
>
> Signed-off-by: Pierre Morel <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> ---
> arch/s390/kvm/gaccess.c | 96 ++++++++++++++++++++---------------------
> arch/s390/kvm/gaccess.h | 6 +--
> arch/s390/kvm/priv.c | 6 +--
> 3 files changed, 54 insertions(+), 54 deletions(-)
>
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 227ed0009354..082ec5f2c3a5 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -262,77 +262,77 @@ struct aste {
> /* .. more fields there */
> };
>
> -int ipte_lock_held(struct kvm_vcpu *vcpu)
> +int ipte_lock_held(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII) {
> + if (sclp.has_siif) {
> int rc;
>
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - rc = kvm_s390_get_ipte_control(vcpu->kvm)->kh != 0;
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_lock(&kvm->arch.sca_lock);
> + rc = kvm_s390_get_ipte_control(kvm)->kh != 0;
> + read_unlock(&kvm->arch.sca_lock);
> return rc;
> }
> - return vcpu->kvm->arch.ipte_lock_count != 0;
> + return kvm->arch.ipte_lock_count != 0;
> }
>
> -static void ipte_lock_simple(struct kvm_vcpu *vcpu)
> +static void ipte_lock_simple(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - mutex_lock(&vcpu->kvm->arch.ipte_mutex);
> - vcpu->kvm->arch.ipte_lock_count++;
> - if (vcpu->kvm->arch.ipte_lock_count > 1)
> + mutex_lock(&kvm->arch.ipte_mutex);
> + kvm->arch.ipte_lock_count++;
> + if (kvm->arch.ipte_lock_count > 1)
> goto out;
> retry:
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> if (old.k) {
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> cond_resched();
> goto retry;
> }
> new = old;
> new.k = 1;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> out:
> - mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
> + mutex_unlock(&kvm->arch.ipte_mutex);
> }
>
> -static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
> +static void ipte_unlock_simple(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - mutex_lock(&vcpu->kvm->arch.ipte_mutex);
> - vcpu->kvm->arch.ipte_lock_count--;
> - if (vcpu->kvm->arch.ipte_lock_count)
> + mutex_lock(&kvm->arch.ipte_mutex);
> + kvm->arch.ipte_lock_count--;
> + if (kvm->arch.ipte_lock_count)
> goto out;
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> new = old;
> new.k = 0;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> - wake_up(&vcpu->kvm->arch.ipte_wq);
> + read_unlock(&kvm->arch.sca_lock);
> + wake_up(&kvm->arch.ipte_wq);
> out:
> - mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
> + mutex_unlock(&kvm->arch.ipte_mutex);
> }
>
> -static void ipte_lock_siif(struct kvm_vcpu *vcpu)
> +static void ipte_lock_siif(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> retry:
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> if (old.kg) {
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> cond_resched();
> goto retry;
> }
> @@ -340,15 +340,15 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
> new.k = 1;
> new.kh++;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> }
>
> -static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
> +static void ipte_unlock_siif(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> new = old;
> @@ -356,25 +356,25 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
> if (!new.kh)
> new.k = 0;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> if (!new.kh)
> - wake_up(&vcpu->kvm->arch.ipte_wq);
> + wake_up(&kvm->arch.ipte_wq);
> }
>
> -void ipte_lock(struct kvm_vcpu *vcpu)
> +void ipte_lock(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII)
> - ipte_lock_siif(vcpu);
> + if (sclp.has_siif)
> + ipte_lock_siif(kvm);
> else
> - ipte_lock_simple(vcpu);
> + ipte_lock_simple(kvm);
> }
>
> -void ipte_unlock(struct kvm_vcpu *vcpu)
> +void ipte_unlock(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII)
> - ipte_unlock_siif(vcpu);
> + if (sclp.has_siif)
> + ipte_unlock_siif(kvm);
> else
> - ipte_unlock_simple(vcpu);
> + ipte_unlock_simple(kvm);
> }
>
> static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar,
> @@ -1086,7 +1086,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> try_storage_prot_override = storage_prot_override_applicable(vcpu);
> need_ipte_lock = psw_bits(*psw).dat && !asce.r;
> if (need_ipte_lock)
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
> /*
> * Since we do the access further down ultimately via a move instruction
> * that does key checking and returns an error in case of a protection
> @@ -1127,7 +1127,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> }
> out_unlock:
> if (need_ipte_lock)
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> if (nr_pages > ARRAY_SIZE(gpa_array))
> vfree(gpas);
> return rc;
> @@ -1199,10 +1199,10 @@ int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
> rc = get_vcpu_asce(vcpu, &asce, gva, ar, mode);
> if (rc)
> return rc;
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
> rc = guest_range_to_gpas(vcpu, gva, ar, NULL, length, asce, mode,
> access_key);
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
>
> return rc;
> }
> @@ -1465,7 +1465,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> * tables/pointers we read stay valid - unshadowing is however
> * always possible - only guest_table_lock protects us.
> */
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
>
> rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
> if (rc)
> @@ -1499,7 +1499,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> pte.p |= dat_protection;
> if (!rc)
> rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> mmap_read_unlock(sg->mm);
> return rc;
> }
> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> index 1124ff282012..9408d6cc8e2c 100644
> --- a/arch/s390/kvm/gaccess.h
> +++ b/arch/s390/kvm/gaccess.h
> @@ -440,9 +440,9 @@ int read_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
> return access_guest_real(vcpu, gra, data, len, 0);
> }
>
> -void ipte_lock(struct kvm_vcpu *vcpu);
> -void ipte_unlock(struct kvm_vcpu *vcpu);
> -int ipte_lock_held(struct kvm_vcpu *vcpu);
> +void ipte_lock(struct kvm *kvm);
> +void ipte_unlock(struct kvm *kvm);
> +int ipte_lock_held(struct kvm *kvm);
> int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
>
> /* MVPG PEI indication bits */
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 83bb5cf97282..12c464c7cddf 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -442,7 +442,7 @@ static int handle_ipte_interlock(struct kvm_vcpu *vcpu)
> vcpu->stat.instruction_ipte_interlock++;
> if (psw_bits(vcpu->arch.sie_block->gpsw).pstate)
> return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> - wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu));
> + wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu->kvm));
> kvm_s390_retry_instr(vcpu);
> VCPU_EVENT(vcpu, 4, "%s", "retrying ipte interlock operation");
> return 0;
> @@ -1471,7 +1471,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
> access_key = (operand2 & 0xf0) >> 4;
>
> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
>
> ret = guest_translate_address_with_key(vcpu, address, ar, &gpa,
> GACC_STORE, access_key);
> @@ -1508,7 +1508,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
> }
>
> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> return ret;
> }
>
Quoting Pierre Morel (2022-06-20 14:54:35)
> We can check if SIIF is enabled by testing the sclp_info struct
> instead of testing the sie control block eca variable.
> sclp.has_ssif is the only requirement to set ECA_SII anyway
> so we can go straight to the source for that.
>
> Signed-off-by: Pierre Morel <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Nico Boehr <[email protected]>
On 6/20/22 14:54, Pierre Morel wrote:
> During a subsystem reset the Topology-Change-Report is cleared.
> Let's give userland the possibility to clear the MTCR in the case
> of a subsystem reset.
>
> To migrate the MTCR, we give userland the possibility to
> query the MTCR state.
>
> We indicate KVM support for the CPU topology facility with a new
> KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 31 +++++++++++
> arch/s390/include/uapi/asm/kvm.h | 10 ++++
> arch/s390/kvm/kvm-s390.c | 96 ++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 4 files changed, 138 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 11e00a46c610..326f8b7e7671 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7956,6 +7956,37 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
> When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
> type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
>
> +8.37 KVM_CAP_S390_CPU_TOPOLOGY
> +------------------------------
> +
> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
> +:Architectures: s390
> +:Type: vm
> +
> +This capability indicates that KVM will provide the S390 CPU Topology
> +facility which consist of the interpretation of the PTF instruction for
> +the Function Code 2 along with interception and forwarding of both the
Making function code capital surprises me when reading.
> +PTF instruction with Function Codes 0 or 1 and the STSI(15,1,x)
> +instruction to the userland hypervisor.
> +
> +The stfle facility 11, CPU Topology facility, should not be provided
s/provided/indicated
> +to the guest without this capability.
> +
> +When this capability is present, KVM provides a new attribute group
> +on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
> +This new attribute allows to get, set or clear the Modified Change
> +Topology Report (MTCR) bit of the SCA through the kvm_device_attr
> +structure.
> +
> +Getting the MTCR bit is realized by using a kvm_device_attr attr
> +entry value of KVM_GET_DEVICE_ATTR and with kvm_device_attr addr
> +entry pointing to the address of a struct kvm_cpu_topology.
> +The value of the MTCR is return by the bit mtcr of the structure.
> +
> +When using KVM_SET_DEVICE_ATTR the MTCR is set by using the
> +attr->attr value KVM_S390_VM_CPU_TOPO_MTCR_SET and cleared by
> +using KVM_S390_VM_CPU_TOPO_MTCR_CLEAR.
I have the feeling that we can drop the two blocks above and we won't
lose information.
> +/**
> + * kvm_s390_sca_clear_mtcr
> + * @kvm: guest KVM description
> + *
> + * Is only relevant if the topology facility is present,
> + * the caller should check KVM facility 11
> + *
> + * Updates the Multiprocessor Topology-Change-Report to signal
> + * the guest with a topology change.
> + */
> +static void kvm_s390_sca_clear_mtcr(struct kvm *kvm)
This is a set operation with the value 0, and that's clearly visible from
the copied code. If you make the utility entry a bitfield you can easily
set 0/1 via one function without doing the bit manipulation by hand.
I.e. please only use one set function.
> +{
> + struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't matter */
> +
> + ipte_lock(kvm);
> + sca->utility &= ~SCA_UTILITY_MTCR;
> + ipte_unlock(kvm);
> +}
> +
> +static int kvm_s390_set_topology(struct kvm *kvm, struct kvm_device_attr *attr)
> +{
> + if (!test_kvm_facility(kvm, 11))
> + return -ENXIO;
> +
> + switch (attr->attr) {
> + case KVM_S390_VM_CPU_TOPO_MTCR_SET:
> + kvm_s390_sca_set_mtcr(kvm);
> + break;
> + case KVM_S390_VM_CPU_TOPO_MTCR_CLEAR:
> + kvm_s390_sca_clear_mtcr(kvm);
> + break;
> + }
By having two endpoints here we trade an easy check for having to
access process memory to grab the value we want to set.
I'm still torn about this.
> + return 0;
> +}
> +
> +/**
> + * kvm_s390_sca_get_mtcr
> + * @kvm: guest KVM description
> + *
> + * Is only relevant if the topology facility is present,
> + * the caller should check KVM facility 11
> + *
> + * reports to QEMU the Multiprocessor Topology-Change-Report.
> + */
> +static int kvm_s390_sca_get_mtcr(struct kvm *kvm)
> +{
> + struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't matter */
Same comments as with the set_mtcr()
> + int val;
> +
> + ipte_lock(kvm);
> + val = sca->utility & SCA_UTILITY_MTCR;
> + ipte_unlock(kvm);
> +
> + return val;
> +}
> +
> +static int kvm_s390_get_topology(struct kvm *kvm, struct kvm_device_attr *attr)
> +{
> + struct kvm_cpu_topology topo = {};
> +
> + if (!test_kvm_facility(kvm, 11))
> + return -ENXIO;
> +
> + topo.mtcr = kvm_s390_sca_get_mtcr(kvm) ? 1 : 0;
> + if (copy_to_user((void __user *)attr->addr, &topo, sizeof(topo)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
> {
> int ret;
> @@ -1730,6 +1817,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
> case KVM_S390_VM_MIGRATION:
> ret = kvm_s390_vm_set_migration(kvm, attr);
> break;
> + case KVM_S390_VM_CPU_TOPOLOGY:
> + ret = kvm_s390_set_topology(kvm, attr);
> + break;
> default:
> ret = -ENXIO;
> break;
> @@ -1755,6 +1845,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
> case KVM_S390_VM_MIGRATION:
> ret = kvm_s390_vm_get_migration(kvm, attr);
> break;
> + case KVM_S390_VM_CPU_TOPOLOGY:
> + ret = kvm_s390_get_topology(kvm, attr);
> + break;
> default:
> ret = -ENXIO;
> break;
> @@ -1828,6 +1921,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
> case KVM_S390_VM_MIGRATION:
> ret = 0;
> break;
> + case KVM_S390_VM_CPU_TOPOLOGY:
> + ret = test_kvm_facility(kvm, 11) ? 0 : -ENXIO;
> + break;
> default:
> ret = -ENXIO;
> break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5088bd9f1922..33317d820032 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1157,6 +1157,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_VM_TSC_CONTROL 214
> #define KVM_CAP_SYSTEM_EVENT_DATA 215
> #define KVM_CAP_ARM_SYSTEM_SUSPEND 216
> +#define KVM_CAP_S390_CPU_TOPOLOGY 217
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
On Mon, 20 Jun 2022 14:54:35 +0200
Pierre Morel <[email protected]> wrote:
> We can check if SIIF is enabled by testing the sclp_info struct
> instead of testing the sie control block eca variable.
> sclp.has_siif is the only requirement to set ECA_SII anyway
> so we can go straight to the source for that.
>
> Signed-off-by: Pierre Morel <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
> ---
> arch/s390/kvm/gaccess.c | 96 ++++++++++++++++++++---------------------
> arch/s390/kvm/gaccess.h | 6 +--
> arch/s390/kvm/priv.c | 6 +--
> 3 files changed, 54 insertions(+), 54 deletions(-)
>
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 227ed0009354..082ec5f2c3a5 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -262,77 +262,77 @@ struct aste {
> /* .. more fields there */
> };
>
> -int ipte_lock_held(struct kvm_vcpu *vcpu)
> +int ipte_lock_held(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII) {
> + if (sclp.has_siif) {
> int rc;
>
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - rc = kvm_s390_get_ipte_control(vcpu->kvm)->kh != 0;
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_lock(&kvm->arch.sca_lock);
> + rc = kvm_s390_get_ipte_control(kvm)->kh != 0;
> + read_unlock(&kvm->arch.sca_lock);
> return rc;
> }
> - return vcpu->kvm->arch.ipte_lock_count != 0;
> + return kvm->arch.ipte_lock_count != 0;
> }
>
> -static void ipte_lock_simple(struct kvm_vcpu *vcpu)
> +static void ipte_lock_simple(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - mutex_lock(&vcpu->kvm->arch.ipte_mutex);
> - vcpu->kvm->arch.ipte_lock_count++;
> - if (vcpu->kvm->arch.ipte_lock_count > 1)
> + mutex_lock(&kvm->arch.ipte_mutex);
> + kvm->arch.ipte_lock_count++;
> + if (kvm->arch.ipte_lock_count > 1)
> goto out;
> retry:
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> if (old.k) {
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> cond_resched();
> goto retry;
> }
> new = old;
> new.k = 1;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> out:
> - mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
> + mutex_unlock(&kvm->arch.ipte_mutex);
> }
>
> -static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
> +static void ipte_unlock_simple(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - mutex_lock(&vcpu->kvm->arch.ipte_mutex);
> - vcpu->kvm->arch.ipte_lock_count--;
> - if (vcpu->kvm->arch.ipte_lock_count)
> + mutex_lock(&kvm->arch.ipte_mutex);
> + kvm->arch.ipte_lock_count--;
> + if (kvm->arch.ipte_lock_count)
> goto out;
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> new = old;
> new.k = 0;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> - wake_up(&vcpu->kvm->arch.ipte_wq);
> + read_unlock(&kvm->arch.sca_lock);
> + wake_up(&kvm->arch.ipte_wq);
> out:
> - mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
> + mutex_unlock(&kvm->arch.ipte_mutex);
> }
>
> -static void ipte_lock_siif(struct kvm_vcpu *vcpu)
> +static void ipte_lock_siif(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> retry:
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> if (old.kg) {
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> cond_resched();
> goto retry;
> }
> @@ -340,15 +340,15 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
> new.k = 1;
> new.kh++;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> }
>
> -static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
> +static void ipte_unlock_siif(struct kvm *kvm)
> {
> union ipte_control old, new, *ic;
>
> - read_lock(&vcpu->kvm->arch.sca_lock);
> - ic = kvm_s390_get_ipte_control(vcpu->kvm);
> + read_lock(&kvm->arch.sca_lock);
> + ic = kvm_s390_get_ipte_control(kvm);
> do {
> old = READ_ONCE(*ic);
> new = old;
> @@ -356,25 +356,25 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
> if (!new.kh)
> new.k = 0;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> - read_unlock(&vcpu->kvm->arch.sca_lock);
> + read_unlock(&kvm->arch.sca_lock);
> if (!new.kh)
> - wake_up(&vcpu->kvm->arch.ipte_wq);
> + wake_up(&kvm->arch.ipte_wq);
> }
>
> -void ipte_lock(struct kvm_vcpu *vcpu)
> +void ipte_lock(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII)
> - ipte_lock_siif(vcpu);
> + if (sclp.has_siif)
> + ipte_lock_siif(kvm);
> else
> - ipte_lock_simple(vcpu);
> + ipte_lock_simple(kvm);
> }
>
> -void ipte_unlock(struct kvm_vcpu *vcpu)
> +void ipte_unlock(struct kvm *kvm)
> {
> - if (vcpu->arch.sie_block->eca & ECA_SII)
> - ipte_unlock_siif(vcpu);
> + if (sclp.has_siif)
> + ipte_unlock_siif(kvm);
> else
> - ipte_unlock_simple(vcpu);
> + ipte_unlock_simple(kvm);
> }
>
> static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar,
> @@ -1086,7 +1086,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> try_storage_prot_override = storage_prot_override_applicable(vcpu);
> need_ipte_lock = psw_bits(*psw).dat && !asce.r;
> if (need_ipte_lock)
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
> /*
> * Since we do the access further down ultimately via a move instruction
> * that does key checking and returns an error in case of a protection
> @@ -1127,7 +1127,7 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
> }
> out_unlock:
> if (need_ipte_lock)
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> if (nr_pages > ARRAY_SIZE(gpa_array))
> vfree(gpas);
> return rc;
> @@ -1199,10 +1199,10 @@ int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
> rc = get_vcpu_asce(vcpu, &asce, gva, ar, mode);
> if (rc)
> return rc;
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
> rc = guest_range_to_gpas(vcpu, gva, ar, NULL, length, asce, mode,
> access_key);
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
>
> return rc;
> }
> @@ -1465,7 +1465,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> * tables/pointers we read stay valid - unshadowing is however
> * always possible - only guest_table_lock protects us.
> */
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
>
> rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
> if (rc)
> @@ -1499,7 +1499,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> pte.p |= dat_protection;
> if (!rc)
> rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> mmap_read_unlock(sg->mm);
> return rc;
> }
> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> index 1124ff282012..9408d6cc8e2c 100644
> --- a/arch/s390/kvm/gaccess.h
> +++ b/arch/s390/kvm/gaccess.h
> @@ -440,9 +440,9 @@ int read_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
> return access_guest_real(vcpu, gra, data, len, 0);
> }
>
> -void ipte_lock(struct kvm_vcpu *vcpu);
> -void ipte_unlock(struct kvm_vcpu *vcpu);
> -int ipte_lock_held(struct kvm_vcpu *vcpu);
> +void ipte_lock(struct kvm *kvm);
> +void ipte_unlock(struct kvm *kvm);
> +int ipte_lock_held(struct kvm *kvm);
> int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
>
> /* MVPG PEI indication bits */
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 83bb5cf97282..12c464c7cddf 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -442,7 +442,7 @@ static int handle_ipte_interlock(struct kvm_vcpu *vcpu)
> vcpu->stat.instruction_ipte_interlock++;
> if (psw_bits(vcpu->arch.sie_block->gpsw).pstate)
> return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> - wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu));
> + wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu->kvm));
> kvm_s390_retry_instr(vcpu);
> VCPU_EVENT(vcpu, 4, "%s", "retrying ipte interlock operation");
> return 0;
> @@ -1471,7 +1471,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
> access_key = (operand2 & 0xf0) >> 4;
>
> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
> - ipte_lock(vcpu);
> + ipte_lock(vcpu->kvm);
>
> ret = guest_translate_address_with_key(vcpu, address, ar, &gpa,
> GACC_STORE, access_key);
> @@ -1508,7 +1508,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
> }
>
> if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
> - ipte_unlock(vcpu);
> + ipte_unlock(vcpu->kvm);
> return ret;
> }
>
On 6/24/22 08:57, Nico Boehr wrote:
> Quoting Pierre Morel (2022-06-20 14:54:35)
>> We can check if SIIF is enabled by testing the sclp_info struct
>> instead of testing the sie control block eca variable.
>> sclp.has_siif is the only requirement to set ECA_SII anyway
>> so we can go straight to the source for that.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> Reviewed-by: Janosch Frank <[email protected]>
>> Reviewed-by: David Hildenbrand <[email protected]>
>
> Reviewed-by: Nico Boehr <[email protected]>
>
Thanks,
Pierre
--
Pierre Morel
IBM Lab Boeblingen
On 6/24/22 08:50, Janosch Frank wrote:
> On 6/20/22 14:54, Pierre Morel wrote:
>> During a subsystem reset the Topology-Change-Report is cleared.
>> Let's give userland the possibility to clear the MTCR in the case
>> of a subsystem reset.
>>
>> To migrate the MTCR, we give userland the possibility to
>> query the MTCR state.
>>
>> We indicate KVM support for the CPU topology facility with a new
>> KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>> Documentation/virt/kvm/api.rst | 31 +++++++++++
>> arch/s390/include/uapi/asm/kvm.h | 10 ++++
>> arch/s390/kvm/kvm-s390.c | 96 ++++++++++++++++++++++++++++++++
>> include/uapi/linux/kvm.h | 1 +
>> 4 files changed, 138 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst
>> b/Documentation/virt/kvm/api.rst
>> index 11e00a46c610..326f8b7e7671 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7956,6 +7956,37 @@ should adjust CPUID leaf 0xA to reflect that
>> the PMU is disabled.
>> When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
>> type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
>> +8.37 KVM_CAP_S390_CPU_TOPOLOGY
>> +------------------------------
>> +
>> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
>> +:Architectures: s390
>> +:Type: vm
>> +
>> +This capability indicates that KVM will provide the S390 CPU Topology
>> +facility which consist of the interpretation of the PTF instruction for
>> +the Function Code 2 along with interception and forwarding of both the
>
> Making function code capital surprises me when reading.
I wanted to highlight the FC.
I'll remove it.
>
>> +PTF instruction with Function Codes 0 or 1 and the STSI(15,1,x)
>> +instruction to the userland hypervisor.
>> +
>> +The stfle facility 11, CPU Topology facility, should not be provided
>
> s/provided/indicated
>
OK
>> +to the guest without this capability.
>> +
>> +When this capability is present, KVM provides a new attribute group
>> +on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
>> +This new attribute allows to get, set or clear the Modified Change
>> +Topology Report (MTCR) bit of the SCA through the kvm_device_attr
>> +structure.
>> +
>> +Getting the MTCR bit is realized by using a kvm_device_attr attr
>> +entry value of KVM_GET_DEVICE_ATTR and with kvm_device_attr addr
>> +entry pointing to the address of a struct kvm_cpu_topology.
>> +The value of the MTCR is return by the bit mtcr of the structure.
>> +
>> +When using KVM_SET_DEVICE_ATTR the MTCR is set by using the
>> +attr->attr value KVM_S390_VM_CPU_TOPO_MTCR_SET and cleared by
>> +using KVM_S390_VM_CPU_TOPO_MTCR_CLEAR.
>
> I have the feeling that we can drop the two blocks above and we won't
> lose information.
>
>> +/**
>> + * kvm_s390_sca_clear_mtcr
>> + * @kvm: guest KVM description
>> + *
>> + * Is only relevant if the topology facility is present,
>> + * the caller should check KVM facility 11
>> + *
>> + * Updates the Multiprocessor Topology-Change-Report to signal
>> + * the guest with a topology change.
>> + */
>> +static void kvm_s390_sca_clear_mtcr(struct kvm *kvm)
>
> This is a set operation with the value 0, and that's clearly visible from
> the copied code. If you make the utility entry a bitfield you can easily
> set 0/1 via one function without doing the bit manipulation by hand.
OK
>
> I.e. please only use one set function.
>
>> +{
>> + struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't
>> matter */
>> +
>> + ipte_lock(kvm);
>> + sca->utility &= ~SCA_UTILITY_MTCR;
>> + ipte_unlock(kvm);
>> +}
>> +
>> +static int kvm_s390_set_topology(struct kvm *kvm, struct
>> kvm_device_attr *attr)
>> +{
>> + if (!test_kvm_facility(kvm, 11))
>> + return -ENXIO;
>> +
>> + switch (attr->attr) {
>> + case KVM_S390_VM_CPU_TOPO_MTCR_SET:
>> + kvm_s390_sca_set_mtcr(kvm);
>> + break;
>> + case KVM_S390_VM_CPU_TOPO_MTCR_CLEAR:
>> + kvm_s390_sca_clear_mtcr(kvm);
>> + break;
>> + }
>
> By having two endpoints here we trade an easy check for having to
> access process memory to grab the value we want to set.
>
> I'm still torn about this.
>
>> + return 0;
>> +}
>> +
>> +/**
>> + * kvm_s390_sca_get_mtcr
>> + * @kvm: guest KVM description
>> + *
>> + * Is only relevant if the topology facility is present,
>> + * the caller should check KVM facility 11
>> + *
>> + * reports to QEMU the Multiprocessor Topology-Change-Report.
>> + */
>> +static int kvm_s390_sca_get_mtcr(struct kvm *kvm)
>> +{
>> + struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't
>> matter */
>
> Same comments as with the set_mtcr()
OK
>
>> + int val;
>> +
>> + ipte_lock(kvm);
>> + val = sca->utility & SCA_UTILITY_MTCR;
>> + ipte_unlock(kvm);
>> +
>> + return val;
>> +}
>> +
>> +static int kvm_s390_get_topology(struct kvm *kvm, struct
>> kvm_device_attr *attr)
>> +{
>> + struct kvm_cpu_topology topo = {};
>> +
>> + if (!test_kvm_facility(kvm, 11))
>> + return -ENXIO;
>> +
>> + topo.mtcr = kvm_s390_sca_get_mtcr(kvm) ? 1 : 0;
>> + if (copy_to_user((void __user *)attr->addr, &topo, sizeof(topo)))
>> + return -EFAULT;
>> +
>> + return 0;
>> +}
>> +
>> static int kvm_s390_vm_set_attr(struct kvm *kvm, struct
>> kvm_device_attr *attr)
>> {
>> int ret;
>> @@ -1730,6 +1817,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm,
>> struct kvm_device_attr *attr)
>> case KVM_S390_VM_MIGRATION:
>> ret = kvm_s390_vm_set_migration(kvm, attr);
>> break;
>> + case KVM_S390_VM_CPU_TOPOLOGY:
>> + ret = kvm_s390_set_topology(kvm, attr);
>> + break;
>> default:
>> ret = -ENXIO;
>> break;
>> @@ -1755,6 +1845,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm,
>> struct kvm_device_attr *attr)
>> case KVM_S390_VM_MIGRATION:
>> ret = kvm_s390_vm_get_migration(kvm, attr);
>> break;
>> + case KVM_S390_VM_CPU_TOPOLOGY:
>> + ret = kvm_s390_get_topology(kvm, attr);
>> + break;
>> default:
>> ret = -ENXIO;
>> break;
>> @@ -1828,6 +1921,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm,
>> struct kvm_device_attr *attr)
>> case KVM_S390_VM_MIGRATION:
>> ret = 0;
>> break;
>> + case KVM_S390_VM_CPU_TOPOLOGY:
>> + ret = test_kvm_facility(kvm, 11) ? 0 : -ENXIO;
>> + break;
>> default:
>> ret = -ENXIO;
>> break;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 5088bd9f1922..33317d820032 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1157,6 +1157,7 @@ struct kvm_ppc_resize_hpt {
>> #define KVM_CAP_VM_TSC_CONTROL 214
>> #define KVM_CAP_SYSTEM_EVENT_DATA 215
>> #define KVM_CAP_ARM_SYSTEM_SUSPEND 216
>> +#define KVM_CAP_S390_CPU_TOPOLOGY 217
>> #ifdef KVM_CAP_IRQ_ROUTING
>
--
Pierre Morel
IBM Lab Boeblingen
On 6/24/22 07:47, Janosch Frank wrote:
> On 6/20/22 14:54, Pierre Morel wrote:
>> We can check if SIIF is enabled by testing the sclp_info struct
>> instead of testing the sie control block eca variable.
>> sclp.has_siif is the only requirement to set ECA_SII anyway
>> so we can go straight to the source for that.
>
>
> The subject and commit description don't fit together.
> You're doing two things in this patch and only describe one of them.
>
> I'd suggest something like this:
>
> KVM: s390: Cleanup ipte lock access and SIIF facility checks
>
> We can check if SIIF is enabled by testing the sclp_info struct instead
> of testing the sie control block eca variable as that facility is always
> enabled if available.
>
> Also let's cleanup all the ipte related struct member accesses which
> currently happen by referencing the KVM struct via the VCPU struct.
> Making the KVM struct the parameter to the ipte_* functions removes one
> level of indirection which makes the code more readable.
>
OK done.
>
> Other than that I'm happy with this patch.
Thanks,
Pierre
--
Pierre Morel
IBM Lab Boeblingen
On 6/20/22 14:54, Pierre Morel wrote:
> During a subsystem reset the Topology-Change-Report is cleared.
> Let's give userland the possibility to clear the MTCR in the case
> of a subsystem reset.
>
> To migrate the MTCR, we give userland the possibility to
> query the MTCR state.
>
> We indicate KVM support for the CPU topology facility with a new
> KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>
> Signed-off-by: Pierre Morel <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 31 +++++++++++
> arch/s390/include/uapi/asm/kvm.h | 10 ++++
> arch/s390/kvm/kvm-s390.c | 96 ++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 4 files changed, 138 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 11e00a46c610..326f8b7e7671 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7956,6 +7956,37 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
> When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
> type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
>
> +8.37 KVM_CAP_S390_CPU_TOPOLOGY
> +------------------------------
> +
> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
> +:Architectures: s390
> +:Type: vm
> +
> +This capability indicates that KVM will provide the S390 CPU Topology
> +facility which consist of the interpretation of the PTF instruction for
> +the Function Code 2 along with interception and forwarding of both the
> +PTF instruction with Function Codes 0 or 1 and the STSI(15,1,x)
> +instruction to the userland hypervisor.
The way the code is written, STSI 15.x.x is forwarded to user space; it
might actually make sense to future-proof the code by restricting that
to 15.1.2-6 in priv.c.
> +
> +The stfle facility 11, CPU Topology facility, should not be provided
> +to the guest without this capability.
> +
> +When this capability is present, KVM provides a new attribute group
> +on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
> +This new attribute allows to get, set or clear the Modified Change
> +Topology Report (MTCR) bit of the SCA through the kvm_device_attr
> +structure.
> +
> +Getting the MTCR bit is realized by using a kvm_device_attr attr
> +entry value of KVM_GET_DEVICE_ATTR and with kvm_device_attr addr
> +entry pointing to the address of a struct kvm_cpu_topology.
> +The value of the MTCR is return by the bit mtcr of the structure.
> +
> +When using KVM_SET_DEVICE_ATTR the MTCR is set by using the
> +attr->attr value KVM_S390_VM_CPU_TOPO_MTCR_SET and cleared by
> +using KVM_S390_VM_CPU_TOPO_MTCR_CLEAR.
> +
> 9. Known KVM API problems
> =========================
>
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 7a6b14874d65..df5e8279ffd0 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
> #define KVM_S390_VM_CRYPTO 2
> #define KVM_S390_VM_CPU_MODEL 3
> #define KVM_S390_VM_MIGRATION 4
> +#define KVM_S390_VM_CPU_TOPOLOGY 5
>
> /* kvm attributes for mem_ctrl */
> #define KVM_S390_VM_MEM_ENABLE_CMMA 0
> @@ -171,6 +172,15 @@ struct kvm_s390_vm_cpu_subfunc {
> #define KVM_S390_VM_MIGRATION_START 1
> #define KVM_S390_VM_MIGRATION_STATUS 2
>
> +/* kvm attributes for cpu topology */
> +#define KVM_S390_VM_CPU_TOPO_MTCR_CLEAR 0
> +#define KVM_S390_VM_CPU_TOPO_MTCR_SET 1
Are you going to transition to a set-value-provided-by-user API with the next series?
I don't particularly like that MTCR is user visible, it's kind of an implementation detail.
> +
> +struct kvm_cpu_topology {
> + __u16 mtcr : 1;
So I'd give this a more descriptive name, report_topology_change/topo_change_report_pending ?
> + __u16 reserved : 15;
Are these bits for future proofing? If so a few more would do no harm IMO.
> +};
The use of a bit field in uapi surprised me, but I guess it's fine and kvm_sync_regs has them too.
> +
> /* for KVM_GET_REGS and KVM_SET_REGS */
> struct kvm_regs {
> /* general purpose regs for s390 */
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 95b96019ca8e..ae39041bb149 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -606,6 +606,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_S390_PROTECTED:
> r = is_prot_virt_host();
> break;
> + case KVM_CAP_S390_CPU_TOPOLOGY:
> + r = test_facility(11);
> + break;
> default:
> r = 0;
> }
> @@ -817,6 +820,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> icpt_operexc_on_all_vcpus(kvm);
> r = 0;
> break;
> + case KVM_CAP_S390_CPU_TOPOLOGY:
> + r = -EINVAL;
> + mutex_lock(&kvm->lock);
> + if (kvm->created_vcpus) {
> + r = -EBUSY;
> + } else if (test_facility(11)) {
> + set_kvm_facility(kvm->arch.model.fac_mask, 11);
> + set_kvm_facility(kvm->arch.model.fac_list, 11);
> + r = 0;
> + }
> + mutex_unlock(&kvm->lock);
> + VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
Most of the other cases spell out the cap, so it'd be "ENABLE: CAP_S390_CPU_TOPOLOGY %s".
> + r ? "(not available)" : "(success)");
> + break;
> default:
> r = -EINVAL;
> break;
> @@ -1710,6 +1727,76 @@ static void kvm_s390_sca_set_mtcr(struct kvm *kvm)
> ipte_unlock(kvm);
> }
>
Some brainstorming function names:
kvm_s390_get_topo_change_report
kvm_s390_(un|re)set_topo_change_report
kvm_s390_(publish|revoke|unpublish)_topo_change_report
kvm_s390_(report|signal|revoke)_topology_change
[...]
On 6/28/22 18:41, Janis Schoetterl-Glausch wrote:
> On 6/20/22 14:54, Pierre Morel wrote:
>> During a subsystem reset the Topology-Change-Report is cleared.
>> Let's give userland the possibility to clear the MTCR in the case
>> of a subsystem reset.
>>
>> To migrate the MTCR, we give userland the possibility to
>> query the MTCR state.
>>
>> We indicate KVM support for the CPU topology facility with a new
>> KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>
>> Signed-off-by: Pierre Morel <[email protected]>
>> ---
>> Documentation/virt/kvm/api.rst | 31 +++++++++++
>> arch/s390/include/uapi/asm/kvm.h | 10 ++++
>> arch/s390/kvm/kvm-s390.c | 96 ++++++++++++++++++++++++++++++++
>> include/uapi/linux/kvm.h | 1 +
>> 4 files changed, 138 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 11e00a46c610..326f8b7e7671 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7956,6 +7956,37 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
>> When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
>> type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
>>
>> +8.37 KVM_CAP_S390_CPU_TOPOLOGY
>> +------------------------------
>> +
>> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
>> +:Architectures: s390
>> +:Type: vm
>> +
>> +This capability indicates that KVM will provide the S390 CPU Topology
>> +facility which consist of the interpretation of the PTF instruction for
>> +the Function Code 2 along with interception and forwarding of both the
>> +PTF instruction with Function Codes 0 or 1 and the STSI(15,1,x)
>> +instruction to the userland hypervisor.
>
> The way the code is written, STSI 15.x.x is forwarded to user space; it
> might actually make sense to future-proof the code by restricting that
> to 15.1.2-6 in priv.c.
>> +
>> +The stfle facility 11, CPU Topology facility, should not be provided
>> +to the guest without this capability.
>> +
>> +When this capability is present, KVM provides a new attribute group
>> +on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
>> +This new attribute allows to get, set or clear the Modified Change
>> +Topology Report (MTCR) bit of the SCA through the kvm_device_attr
>> +structure.
>> +
>> +Getting the MTCR bit is realized by using a kvm_device_attr attr
>> +entry value of KVM_GET_DEVICE_ATTR and with kvm_device_attr addr
>> +entry pointing to the address of a struct kvm_cpu_topology.
>> +The value of the MTCR is return by the bit mtcr of the structure.
>> +
>> +When using KVM_SET_DEVICE_ATTR the MTCR is set by using the
>> +attr->attr value KVM_S390_VM_CPU_TOPO_MTCR_SET and cleared by
>> +using KVM_S390_VM_CPU_TOPO_MTCR_CLEAR.
>> +
>> 9. Known KVM API problems
>> =========================
>>
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 7a6b14874d65..df5e8279ffd0 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>> #define KVM_S390_VM_CRYPTO 2
>> #define KVM_S390_VM_CPU_MODEL 3
>> #define KVM_S390_VM_MIGRATION 4
>> +#define KVM_S390_VM_CPU_TOPOLOGY 5
>>
>> /* kvm attributes for mem_ctrl */
>> #define KVM_S390_VM_MEM_ENABLE_CMMA 0
>> @@ -171,6 +172,15 @@ struct kvm_s390_vm_cpu_subfunc {
>> #define KVM_S390_VM_MIGRATION_START 1
>> #define KVM_S390_VM_MIGRATION_STATUS 2
>>
>> +/* kvm attributes for cpu topology */
>> +#define KVM_S390_VM_CPU_TOPO_MTCR_CLEAR 0
>> +#define KVM_S390_VM_CPU_TOPO_MTCR_SET 1
>
> Are you going to transition to a set-value-provided-by-user API with the next series?
> I don't particularly like that MTCR is user visible, it's kind of an implementation detail.
It is not the same structure as the hardware structure,
even if it looks like it.
I am OK with using something else, like a u8;
in that case I need to tell userland that the size of the data returned
by getting KVM_S390_VM_CPU_TOPOLOGY is a u8.
I find this is a gap in the definition of kvm_device_attr; it
should have a size entry.
All other users of kvm_device_attr have structures, and it is easy for
userland to get the size using sizeof(struct ...). One can say that
userland knows that the parameter for topology is a u8, but that hurts
me somehow.
Maybe it is stupid; for the other calls the user has to know the name
of the structure anyway.
Then we can say the value of u8 bit 1 is the value of the mtcr.
OK for me.
What do you think?
>
>> +
>> +struct kvm_cpu_topology {
>> + __u16 mtcr : 1;
>
> So I'd give this a more descriptive name, report_topology_change/topo_change_report_pending ?
>
>> + __u16 reserved : 15;
>
> Are these bits for future proofing? If so a few more would do no harm IMO.
>> +};
>
> The use of a bit field in uapi surprised me, but I guess it's fine and kvm_sync_regs has them too.
>> +
>> /* for KVM_GET_REGS and KVM_SET_REGS */
>> struct kvm_regs {
>> /* general purpose regs for s390 */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 95b96019ca8e..ae39041bb149 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -606,6 +606,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_S390_PROTECTED:
>> r = is_prot_virt_host();
>> break;
>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>> + r = test_facility(11);
>> + break;
>> default:
>> r = 0;
>> }
>> @@ -817,6 +820,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> icpt_operexc_on_all_vcpus(kvm);
>> r = 0;
>> break;
>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>> + r = -EINVAL;
>> + mutex_lock(&kvm->lock);
>> + if (kvm->created_vcpus) {
>> + r = -EBUSY;
>> + } else if (test_facility(11)) {
>> + set_kvm_facility(kvm->arch.model.fac_mask, 11);
>> + set_kvm_facility(kvm->arch.model.fac_list, 11);
>> + r = 0;
>> + }
>> + mutex_unlock(&kvm->lock);
>> + VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
>
> Most of the other cases spell out the cap, so it'd be "ENABLE: CAP_S390_CPU_TOPOLOGY %s".
OK
>
>> + r ? "(not available)" : "(success)");
>> + break;
>> default:
>> r = -EINVAL;
>> break;
>> @@ -1710,6 +1727,76 @@ static void kvm_s390_sca_set_mtcr(struct kvm *kvm)
>> ipte_unlock(kvm);
>> }
>>
>
> Some brainstorming function names:
>
> kvm_s390_get_topo_change_report
> kvm_s390_(un|re)set_topo_change_report
> kvm_s390_(publish|revoke|unpublish)_topo_change_report
> kvm_s390_(report|signal|revoke)_topology_change
kvm_s390_update_topology_change_report ?
>
> [...]
>
--
Pierre Morel
IBM Lab Boeblingen