M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
bits a KVM vCPU sees inconsistent when migrating.
It also makes QEMU fail restoring the vCPU registers because QEMU saves
and restores all of the registers including CCSIDRs, and if the vCPU
migrated among physical CPUs between saving and restoring, it tries to
restore CCSIDR values that mismatch with the current physical CPU, which
causes EFAULT.
Trap CCSIDRs if there are CCSIDR value mismatches, and override the
associativity bits when handling the trap.
Akihiko Odaki (3):
KVM: arm64: Make CCSIDRs consistent
arm64: errata: Check for mismatched cache associativity
KVM: arm64: Handle CCSIDR associativity mismatches
arch/arm64/include/asm/cache.h | 3 ++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 8 +++++
arch/arm64/include/asm/kvm_emulate.h | 10 ++++--
arch/arm64/include/asm/sysreg.h | 7 ++++
arch/arm64/kernel/cacheinfo.c | 4 +--
arch/arm64/kernel/cpu_errata.c | 52 ++++++++++++++++++++++++++++
arch/arm64/kernel/cpufeature.c | 4 +++
arch/arm64/kernel/cpuinfo.c | 29 +++++++++
arch/arm64/kvm/sys_regs.c | 50 ++++++++++++++------------
arch/arm64/tools/cpucaps | 1 +
11 files changed, 143 insertions(+), 26 deletions(-)
--
2.38.1
A vCPU sees masked CCSIDRs when the physical CPUs have mismatched
cache types or the vCPU has 32-bit EL1. Perform the same masking for
ioctls too so that they show values consistent with the values the
vCPU actually sees.
Signed-off-by: Akihiko Odaki <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 9 ++++--
arch/arm64/kvm/sys_regs.c | 45 ++++++++++++++--------------
2 files changed, 30 insertions(+), 24 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..b45cf8903190 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -61,6 +61,12 @@ static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
}
#endif
+static inline bool vcpu_cache_overridden(struct kvm_vcpu *vcpu)
+{
+ return cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
+ vcpu_el1_is_32bit(vcpu);
+}
+
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
@@ -88,8 +94,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
if (vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 &= ~HCR_RW;
- if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
- vcpu_el1_is_32bit(vcpu))
+ if (vcpu_cache_overridden(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID2;
if (kvm_has_mte(vcpu->kvm))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5abcbca..273ed1aaa6b3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -88,7 +88,7 @@ static u32 cache_levels;
#define CSSELR_MAX 14
/* Which cache CCSIDR represents depends on CSSELR value. */
-static u32 get_ccsidr(u32 csselr)
+static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
{
u32 ccsidr;
@@ -99,6 +99,21 @@ static u32 get_ccsidr(u32 csselr)
ccsidr = read_sysreg(ccsidr_el1);
local_irq_enable();
+ /*
+ * Guests should not be doing cache operations by set/way at all, and
+ * for this reason, we trap them and attempt to infer the intent, so
+ * that we can flush the entire guest's address space at the appropriate
+ * time.
+ * To prevent this trapping from causing performance problems, let's
+ * expose the geometry of all data and unified caches (which are
+ * guaranteed to be PIPT and thus non-aliasing) as 1 set and 1 way.
+ * [If guests should attempt to infer aliasing properties from the
+ * geometry (which is not permitted by the architecture), they would
+ * only do so for virtually indexed caches.]
+ */
+ if (vcpu_cache_overridden(vcpu) && !(csselr & 1)) // data or unified cache
+ ccsidr &= ~GENMASK(27, 3);
+
return ccsidr;
}
@@ -1300,22 +1315,8 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return write_to_read_only(vcpu, p, r);
csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
- p->regval = get_ccsidr(csselr);
+ p->regval = get_ccsidr(vcpu, csselr);
- /*
- * Guests should not be doing cache operations by set/way at all, and
- * for this reason, we trap them and attempt to infer the intent, so
- * that we can flush the entire guest's address space at the appropriate
- * time.
- * To prevent this trapping from causing performance problems, let's
- * expose the geometry of all data and unified caches (which are
- * guaranteed to be PIPT and thus non-aliasing) as 1 set and 1 way.
- * [If guests should attempt to infer aliasing properties from the
- * geometry (which is not permitted by the architecture), they would
- * only do so for virtually indexed caches.]
- */
- if (!(csselr & 1)) // data or unified cache
- p->regval &= ~GENMASK(27, 3);
return true;
}
@@ -2686,7 +2687,7 @@ static bool is_valid_cache(u32 val)
}
}
-static int demux_c15_get(u64 id, void __user *uaddr)
+static int demux_c15_get(struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
{
u32 val;
u32 __user *uval = uaddr;
@@ -2705,13 +2706,13 @@ static int demux_c15_get(u64 id, void __user *uaddr)
if (!is_valid_cache(val))
return -ENOENT;
- return put_user(get_ccsidr(val), uval);
+ return put_user(get_ccsidr(vcpu, val), uval);
default:
return -ENOENT;
}
}
-static int demux_c15_set(u64 id, void __user *uaddr)
+static int demux_c15_set(struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
{
u32 val, newval;
u32 __user *uval = uaddr;
@@ -2734,7 +2735,7 @@ static int demux_c15_set(u64 id, void __user *uaddr)
return -EFAULT;
/* This is also invariant: you can't change it. */
- if (newval != get_ccsidr(val))
+ if (newval != get_ccsidr(vcpu, val))
return -EINVAL;
return 0;
default:
@@ -2773,7 +2774,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
int err;
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
- return demux_c15_get(reg->id, uaddr);
+ return demux_c15_get(vcpu, reg->id, uaddr);
err = get_invariant_sys_reg(reg->id, uaddr);
if (err != -ENOENT)
@@ -2817,7 +2818,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
int err;
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
- return demux_c15_set(reg->id, uaddr);
+ return demux_c15_set(vcpu, reg->id, uaddr);
err = set_invariant_sys_reg(reg->id, uaddr);
if (err != -ENOENT)
--
2.38.1
CCSIDR associativity mismatches among the physical CPUs cause a vCPU
to see inconsistent values when it is migrated among physical CPUs.
It also makes QEMU fail restoring the vCPU registers because QEMU saves
and restores all of the registers including CCSIDRs, and if the vCPU
migrated among physical CPUs between saving and restoring, it tries to
restore CCSIDR values that mismatch with the current physical CPU, which
causes EFAULT.
Trap CCSIDRs if there are CCSIDR value mismatches, and override the
associativity bits when handling the trap.
Signed-off-by: Akihiko Odaki <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 1 +
arch/arm64/kvm/sys_regs.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index b45cf8903190..df2bab867e12 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -64,6 +64,7 @@ static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
static inline bool vcpu_cache_overridden(struct kvm_vcpu *vcpu)
{
return cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
+ cpus_have_const_cap(ARM64_MISMATCHED_CACHE_ASSOCIATIVITY) ||
vcpu_el1_is_32bit(vcpu);
}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1f0cb015e81c..181a5b215a0e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -110,8 +110,13 @@ static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
* [If guests should attempt to infer aliasing properties from the
* geometry (which is not permitted by the architecture), they would
* only do so for virtually indexed caches.]
+ *
+ * This also makes sure the associativity bits of the CCSIDRs, including
+ * those for instruction caches, are overridden when the physical CPUs
+ * have a heterogeneous configuration so that a vCPU sees consistent
+ * values if it is migrated among physical CPUs.
*/
- if (vcpu_cache_overridden(vcpu) && !(csselr & CSSELR_IN)) // data or unified cache
+ if (vcpu_cache_overridden(vcpu))
ccsidr &= ~CCSIDR_ASSOCIATIVITY_BITS_MASK;
return ccsidr;
--
2.38.1
On Thu, 01 Dec 2022 10:49:11 +0000,
Akihiko Odaki <[email protected]> wrote:
Thanks for looking into this.
> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
> bits a KVM vCPU sees inconsistent when migrating.
Can you describe the actual discrepancy? Is that an issue between the
two core types? In which case, nothing says that these two clusters
should have the same cache topology.
> It also makes QEMU fail restoring the vCPU registers because QEMU saves
> and restores all of the registers including CCSIDRs, and if the vCPU
> migrated among physical CPUs between saving and restoring, it tries to
> restore CCSIDR values that mismatch with the current physical CPU, which
> causes EFAULT.
Well, QEMU will have plenty of other problems, starting with MIDRs,
which always reflect the physical one. In general, KVM isn't well
geared for VMs spanning multiple CPU types. It is improving, but there
is a long way to go.
> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
> associativity bits when handling the trap.
TBH, I'd rather we stop reporting this stuff altogether.
There is nothing a correctly written arm64 guest should do with any of
this (this is only useful for set/way CMOs, which non-secure SW should
never issue). It would be a lot better to expose a virtual topology
(one set, one way, one level). It would also save us from the CCSIDRX
silliness.
The only complexity would be to still accept different topologies from
userspace so that we can restore a VM saved before this virtual
topology.
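As a sketch of what such a virtual topology could look like, assuming the
architectural (non-FEAT_CCIDX) CCSIDR layout; virtual_ccsidr() is an
illustrative name, not code from any posted series. Only the LineSize
field needs real hardware input, taken from the sanitised CTR_EL0 so that
CMO strides stay safe:

/*
 * Sketch only: a fully virtual CCSIDR_EL1 advertising one set and one
 * way. NumSets (bits [27:13]) and Associativity (bits [12:3]) stay 0;
 * only LineSize (bits [2:0]) comes from the hardware.
 */
static u32 virtual_ccsidr(bool icache)
{
	u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
	/* CTR_EL0.{I,D}minLine encode log2(words); CCSIDR.LineSize
	 * encodes log2(bytes) - 4, i.e. log2(words) - 2. */
	u8 line = icache ? (ctr & 0xf) : ((ctr >> 16) & 0xf);

	return line - 2;
}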
Do you mind having a look at this?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
bits a KVM vCPU sees inconsistent when migrating. Record such mismatches
so that KVM can use the information later to avoid the problem.
Signed-off-by: Akihiko Odaki <[email protected]>
---
arch/arm64/include/asm/cache.h | 3 ++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 8 +++++
arch/arm64/include/asm/sysreg.h | 7 ++++
arch/arm64/kernel/cacheinfo.c | 4 +--
arch/arm64/kernel/cpu_errata.c | 52 +++++++++++++++++++++++++++++
arch/arm64/kernel/cpufeature.c | 4 +++
arch/arm64/kernel/cpuinfo.c | 29 ++++++++++++++++
arch/arm64/kvm/sys_regs.c | 4 +--
arch/arm64/tools/cpucaps | 1 +
10 files changed, 109 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index c0b178d1bb4f..eeab2b8c7e71 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -72,6 +72,8 @@ static inline u32 cache_type_cwg(void)
#define __read_mostly __section(".data..read_mostly")
+#define MAX_CACHE_LEVEL 7 /* Max 7 level supported */
+
static inline int cache_line_size_of_cpu(void)
{
u32 cwg = cache_type_cwg();
@@ -80,6 +82,7 @@ static inline int cache_line_size_of_cpu(void)
}
int cache_line_size(void);
+enum cache_type get_cache_type(int level);
/*
* Read the effective value of CTR_EL0.
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index fd7a92219eea..b8d4f31ed59b 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -41,6 +41,7 @@ struct cpuinfo_arm64 {
struct cpu cpu;
struct kobject kobj;
u64 reg_ctr;
+ struct ccsidr reg_ccsidr[MAX_CACHE_LEVEL + 1];
u64 reg_cntfrq;
u64 reg_dczid;
u64 reg_midr;
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f73f11b55042..104483151362 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -7,6 +7,7 @@
#define __ASM_CPUFEATURE_H
#include <asm/alternative-macros.h>
+#include <asm/cache.h>
#include <asm/cpucaps.h>
#include <asm/cputype.h>
#include <asm/hwcap.h>
@@ -917,6 +918,13 @@ extern struct arm64_ftr_override id_aa64isar2_override;
u32 get_kvm_ipa_limit(void);
void dump_cpu_features(void);
+struct ccsidr {
+ u64 data;
+ u64 inst;
+};
+
+extern struct ccsidr ccsidr[MAX_CACHE_LEVEL + 1];
+
#endif /* __ASSEMBLY__ */
#endif
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..e796f14fdc2a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -941,6 +941,13 @@
#define HFGxTR_EL2_nSMPRI_EL1_SHIFT 54
#define HFGxTR_EL2_nSMPRI_EL1_MASK BIT_MASK(HFGxTR_EL2_nSMPRI_EL1_SHIFT)
+/* CCSIDR_EL1 bit definitions */
+#define CCSIDR_ASSOCIATIVITY_BITS_MASK GENMASK(27, 3)
+
+/* CSSELR_EL1 */
+#define CSSELR_IN 1
+#define CSSELR_LEVEL_SHIFT 1
+
#define ARM64_FEATURE_FIELD_BITS 4
/* Create a mask for the feature bits of the specified feature. */
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 97c42be71338..2e808ccc15bf 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -10,7 +10,6 @@
#include <linux/cacheinfo.h>
#include <linux/of.h>
-#define MAX_CACHE_LEVEL 7 /* Max 7 level supported */
/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
@@ -26,7 +25,7 @@ int cache_line_size(void)
}
EXPORT_SYMBOL_GPL(cache_line_size);
-static inline enum cache_type get_cache_type(int level)
+enum cache_type get_cache_type(int level)
{
u64 clidr;
@@ -35,6 +34,7 @@ static inline enum cache_type get_cache_type(int level)
clidr = read_sysreg(clidr_el1);
return CLIDR_CTYPE(clidr, level);
}
+EXPORT_SYMBOL_GPL(get_cache_type);
static void ci_leaf_init(struct cacheinfo *this_leaf,
enum cache_type type, unsigned int level)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 89ac00084f38..5caccf602fc0 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -8,6 +8,8 @@
#include <linux/arm-smccc.h>
#include <linux/types.h>
#include <linux/cpu.h>
+#include <linux/cacheinfo.h>
+#include <asm/cache.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
#include <asm/cpufeature.h>
@@ -87,6 +89,50 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+static bool
+has_mismatched_cache_associativity(const struct arm64_cpu_capabilities *entry,
+ int scope)
+{
+ u64 mask = CCSIDR_ASSOCIATIVITY_BITS_MASK;
+ u64 real;
+ bool mismatched = false;
+ enum cache_type cache_type;
+ unsigned int i;
+
+ WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+
+ local_irq_disable();
+
+ for (i = 0; i <= MAX_CACHE_LEVEL; i++) {
+ cache_type = get_cache_type(i);
+
+ if ((cache_type & (CACHE_TYPE_DATA | CACHE_TYPE_UNIFIED))) {
+ write_sysreg(i << CSSELR_LEVEL_SHIFT, csselr_el1);
+ isb();
+ real = read_sysreg(ccsidr_el1);
+ if ((ccsidr[i].data & mask) != (real & mask)) {
+ mismatched = true;
+ break;
+ }
+ }
+
+ if ((cache_type & CACHE_TYPE_INST)) {
+ write_sysreg((i << CSSELR_LEVEL_SHIFT) | CSSELR_IN,
+ csselr_el1);
+ isb();
+ real = read_sysreg(ccsidr_el1);
+ if ((ccsidr[i].inst & mask) != (real & mask)) {
+ mismatched = true;
+ break;
+ }
+ }
+ }
+
+ local_irq_enable();
+
+ return mismatched;
+}
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -499,6 +545,12 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(cavium_erratum_30115_cpus),
},
#endif
+ {
+ .desc = "Mismatched cache associativity",
+ .capability = ARM64_MISMATCHED_CACHE_ASSOCIATIVITY,
+ .matches = has_mismatched_cache_associativity,
+ .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+ },
{
.desc = "Mismatched cache type (CTR_EL0)",
.capability = ARM64_MISMATCHED_CACHE_TYPE,
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b3f37e2209ad..ef259396aa4c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -930,6 +930,8 @@ static void init_cpu_ftr_reg(u32 sys_reg, u64 new)
reg->user_mask = user_mask;
}
+struct ccsidr ccsidr[MAX_CACHE_LEVEL + 1];
+
extern const struct arm64_cpu_capabilities arm64_errata[];
static const struct arm64_cpu_capabilities arm64_features[];
@@ -1039,6 +1041,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
* after we have initialised the CPU feature infrastructure.
*/
setup_boot_cpu_capabilities();
+
+ memcpy(ccsidr, info->reg_ccsidr, sizeof(ccsidr));
}
static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 28d4f442b0bc..b1ea276b9d10 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -13,6 +13,7 @@
#include <linux/bitops.h>
#include <linux/bug.h>
+#include <linux/cacheinfo.h>
#include <linux/compat.h>
#include <linux/elf.h>
#include <linux/init.h>
@@ -47,6 +48,33 @@ static inline const char *icache_policy_str(int l1ip)
}
}
+static void read_ccsidr(struct ccsidr *ccsidr)
+{
+ enum cache_type cache_type;
+ unsigned int i;
+
+ local_irq_disable();
+
+ for (i = 0; i <= MAX_CACHE_LEVEL; i++) {
+ cache_type = get_cache_type(i);
+
+ if ((cache_type & (CACHE_TYPE_DATA | CACHE_TYPE_UNIFIED))) {
+ write_sysreg(i << CSSELR_LEVEL_SHIFT, csselr_el1);
+ isb();
+ ccsidr[i].data = read_sysreg(ccsidr_el1);
+ }
+
+ if ((cache_type & CACHE_TYPE_INST)) {
+ write_sysreg((i << CSSELR_LEVEL_SHIFT) | CSSELR_IN,
+ csselr_el1);
+ isb();
+ ccsidr[i].inst = read_sysreg(ccsidr_el1);
+ }
+ }
+
+ local_irq_enable();
+}
+
unsigned long __icache_flags;
static const char *const hwcap_str[] = {
@@ -440,6 +469,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
__cpuinfo_store_cpu_32bit(&info->aarch32);
+ read_ccsidr(info->reg_ccsidr);
cpuinfo_detect_icache_policy(info);
}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 273ed1aaa6b3..1f0cb015e81c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -111,8 +111,8 @@ static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
* geometry (which is not permitted by the architecture), they would
* only do so for virtually indexed caches.]
*/
- if (vcpu_cache_overridden(vcpu) && !(csselr & 1)) // data or unified cache
- ccsidr &= ~GENMASK(27, 3);
+ if (vcpu_cache_overridden(vcpu) && !(csselr & CSSELR_IN)) // data or unified cache
+ ccsidr &= ~CCSIDR_ASSOCIATIVITY_BITS_MASK;
return ccsidr;
}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index f1c0347ec31a..061c93319295 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -44,6 +44,7 @@ HAS_VIRT_HOST_EXTN
HAS_WFXT
HW_DBM
KVM_PROTECTED_MODE
+MISMATCHED_CACHE_ASSOCIATIVITY
MISMATCHED_CACHE_TYPE
MTE
MTE_ASYMM
--
2.38.1
On 2022/12/01 20:06, Marc Zyngier wrote:
> On Thu, 01 Dec 2022 10:49:11 +0000,
> Akihiko Odaki <[email protected]> wrote:
>
> Thanks for looking into this.
>
>> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
>> bits a KVM vCPU sees inconsistent when migrating.
>
> Can you describe the actual discrepancy? Is that an issue between the
> two core types? In which case, nothing says that these two clusters
> should have the same cache topology.
Yes, the processor has a big.LITTLE configuration.
On the processor, the valid CSSELR values are 0 (L1D), 1 (L1I), 3 (L2D).
For each CSSELR value, each cluster has:
- 0x700FE03A, 0x203FE01A, 0x70FFE07B
- 0x701FE03A, 0x203FE02A, 0x73FFE07B
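Decoded with the architectural (non-FEAT_CCIDX) CCSIDR_EL1 layout, every
differing bit falls inside GENMASK(27, 3), the exact range this series
overrides. A small stand-alone sketch (the cluster labels are inferred
from the cache sizes, not reported by the hardware):

#include <stdint.h>
#include <stdio.h>

/* LineSize is bits [2:0] (log2(bytes) - 4); Associativity bits [12:3]
 * and NumSets bits [27:13] both encode (value - 1). */
static void decode_ccsidr(uint32_t ccsidr)
{
	unsigned int line = 1u << ((ccsidr & 0x7) + 4);
	unsigned int ways = ((ccsidr >> 3) & 0x3ff) + 1;
	unsigned int sets = ((ccsidr >> 13) & 0x7fff) + 1;

	printf("0x%08X: %5u sets, %2u ways, %3uB lines -> %u KiB\n",
	       ccsidr, sets, ways, line, sets * ways * line / 1024);
}

int main(void)
{
	decode_ccsidr(0x700FE03A); /* L1D, first cluster:   128 sets,  8 ways,  64 KiB */
	decode_ccsidr(0x203FE01A); /* L1I, first cluster:   512 sets,  4 ways, 128 KiB */
	decode_ccsidr(0x70FFE07B); /* L2D, first cluster:  2048 sets, 16 ways,   4 MiB */
	decode_ccsidr(0x701FE03A); /* L1D, second cluster:  256 sets,  8 ways, 128 KiB */
	decode_ccsidr(0x203FE02A); /* L1I, second cluster:  512 sets,  6 ways, 192 KiB */
	decode_ccsidr(0x73FFE07B); /* L2D, second cluster: 8192 sets, 16 ways,  16 MiB */
	return 0;
}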
>
>> It also makes QEMU fail restoring the vCPU registers because QEMU saves
>> and restores all of the registers including CCSIDRs, and if the vCPU
>> migrated among physical CPUs between saving and restoring, it tries to
>> restore CCSIDR values that mismatch with the current physical CPU, which
>> causes EFAULT.
>
> Well, QEMU will have plenty of other problems, starting with MIDRs,
> which always reflect the physical one. In general, KVM isn't well
> geared for VMs spanning multiple CPU types. It is improving, but there
> is a long way to go.
On M2 MacBook Air, I have seen no other difference in standard ID
registers and CCSIDRs are exceptions. Perhaps Apple designed this way so
that macOS's Hypervisor can freely migrate vCPU, but I can't assure that
without more analysis. This is still enough to migrate vCPU running
Linux at least.
>
>> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
>> associativity bits when handling the trap.
>
> TBH, I'd rather we stop reporting this stuff altogether.
>
> There is nothing a correctly written arm64 guest should do with any of
> this (this is only useful for set/way CMOs, which non-secure SW should
> never issue). It would be a lot better to expose a virtual topology
> (one set, one way, one level). It would also save us from the CCSIDRX
> silliness.
>
> The only complexity would be to still accept different topologies from
> userspace so that we can restore a VM saved before this virtual
> topology.
Another (minor) concern is that trapping relevant registers may cost too
much. Currently KVM traps CSSELR and CCSIDR accesses with HCR_TID2, but
HCR_TID2 also affects CTR_EL0. Although I'm not sure if the register is
referred to frequently, Arm introduced FEAT_EVT to trap CSSELR and CCSIDR
but not CTR_EL0, so there may be some cases where trapping CTR_EL0 is not
tolerated. Perhaps Arm worried that a userspace application may read
CTR_EL0 frequently.
If you think the concern on VM restoration you mentioned and the
trapping overhead are tolerable, I'll write a new, much smaller patch
accordingly.
Regards,
Akihiko Odaki
>
> Do you mind having a look at this?
>
> Thanks,
>
> M.
>
On Thu, Dec 01, 2022 at 11:06:50AM +0000, Marc Zyngier wrote:
[...]
> It would be a lot better to expose a virtual topology
> (one set, one way, one level). It would also save us from the CCSIDRX
> silliness.
>
> The only complexity would be to still accept different topologies from
> userspace so that we can restore a VM saved before this virtual
> topology.
I generally agree that the reported topology is meaningless to
non-secure software.
However, with the cloud vendor hat on, I'm worried that inevitably some
customer will inspect the cache topology of the VM we've provided them
and complain.
Could we extend your suggestion about accepting different topologies to
effectively tolerate _any_ topology provided by userspace? KVM can
default to the virtual topology, but a well-informed userspace could
still provide different values to its guest. No point in trying to
babyproof the UAPI further, IMO.
--
Thanks,
Oliver
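A sketch of what tolerating an arbitrary userspace-supplied topology
could look like: record the value per vCPU rather than checking it
against the host. The set_ccsidr() helper and the vcpu->arch.ccsidr[]
array are hypothetical here; is_valid_cache() is the existing helper in
sys_regs.c:

static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
{
	if (!is_valid_cache(csselr))
		return -ENOENT;

	/* No invariance check against the host value: whatever userspace
	 * restores is kept, and is later returned both to the guest (via
	 * the CCSIDR trap) and to KVM_GET_ONE_REG. */
	vcpu->arch.ccsidr[csselr] = val;
	return 0;
}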
On Thu, 01 Dec 2022 18:29:51 +0000,
Oliver Upton <[email protected]> wrote:
>
> On Thu, Dec 01, 2022 at 11:06:50AM +0000, Marc Zyngier wrote:
>
> [...]
>
> > It would be a lot better to expose a virtual topology
> > (one set, one way, one level). It would also save us from the CCSIDRX
> > silliness.
> >
> > The only complexity would be to still accept different topologies from
> > userspace so that we can restore a VM saved before this virtual
> > topology.
>
> I generally agree that the reported topology is meaningless to
> non-secure software.
>
> However, with the cloud vendor hat on, I'm worried that inevitably some
> customer will inspect the cache topology of the VM we've provided them
> and complain.
That's their prerogative. It is idiotic, but I guess paying customers
get this privilege ;-).
> Could we extend your suggestion about accepting different topologies to
> effectively tolerate _any_ topology provided by userspace? KVM can
> default to the virtual topology, but a well-informed userspace could
> still provide different values to its guest. No point in trying to
> babyproof the UAPI further, IMO.
I think this is *exactly* what I suggested. Any valid topology should
be able to be restored, as we currently present the VM with any
topology the host HW may have. This must be preserved.
Eventually, we may even have to expose CCSIDRX, but let's cross that
bridge when we get to it.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Thu, 01 Dec 2022 17:26:08 +0000,
Akihiko Odaki <[email protected]> wrote:
>
> On 2022/12/01 20:06, Marc Zyngier wrote:
> > On Thu, 01 Dec 2022 10:49:11 +0000,
> > Akihiko Odaki <[email protected]> wrote:
> >
> > Thanks for looking into this.
> >
> >> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
> >> bits a KVM vCPU sees inconsistent when migrating.
> >
> > Can you describe the actual discrepancy? Is that an issue between the
> > two core types? In which case, nothing says that these two clusters
> > should have the same cache topology.
>
> Yes, the processor has a big.LITTLE configuration.
>
> On the processor, the valid CSSELR values are 0 (L1D), 1 (L1I), 3
> (L2D). For each CSSELR value, each cluster has:
> - 0x700FE03A, 0x203FE01A, 0x70FFE07B
> - 0x701FE03A, 0x203FE02A, 0x73FFE07B
This is a perfectly valid configuration. The architecture doesn't
place any limitation on how different or identical the cache
hierarchies are from the PoV of each CPU. Actually, most big-little
systems show similar differences across their clusters.
> >> It also makes QEMU fail restoring the vCPU registers because QEMU saves
> >> and restores all of the registers including CCSIDRs, and if the vCPU
> >> migrated among physical CPUs between saving and restoring, it tries to
> >> restore CCSIDR values that mismatch with the current physical CPU, which
> >> causes EFAULT.
> >
> > Well, QEMU will have plenty of other problems, starting with MIDRs,
> > which always reflect the physical one. In general, KVM isn't well
> > geared for VMs spanning multiple CPU types. It is improving, but there
> > is a long way to go.
>
> On M2 MacBook Air, I have seen no other difference in standard ID
> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> that without more analysis. This is still enough to migrate vCPU
> running Linux at least.
I guess that MacOS hides more of the underlying HW than KVM does. And
KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
different between the two clusters.
> >> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
> >> associativity bits when handling the trap.
> >
> > TBH, I'd rather we stop reporting this stuff altogether.
> >
> > There is nothing a correctly written arm64 guest should do with any of
> > this (this is only useful for set/way CMOs, which non-secure SW should
> > never issue). It would be a lot better to expose a virtual topology
> > (one set, one way, one level). It would also save us from the CCSIDRX
> > silliness.
> >
> > The only complexity would be to still accept different topologies from
> > userspace so that we can restore a VM saved before this virtual
> > topology.
>
> Another (minor) concern is that trapping relevant registers may cost
> too much. Currently KVM traps CSSELR and CCSIDR accesses with
> HCR_TID2, but HCR_TID2 also affects CTR_EL0.
It will have an additional impact (JITs, for example, will take a hit
if they don't cache that value), but this is pretty easy to mitigate
if it proves to have too much of an impact. We already have a bunch of
fast-paths for things that we want to emulate more efficiently, and
CTR_EL0 could be one of them.
> Although I'm not sure if the register is referred to frequently, Arm
> introduced FEAT_EVT to trap CSSELR and CCSIDR but not CTR_EL0, so
> there may be some cases where trapping CTR_EL0 is not
> tolerated. Perhaps Arm worried that a userspace application may read
> CTR_EL0 frequently.
FEAT_EVT is one of these "let's add random traps" extensions,
culminating in FEAT_FGT. Having FEAT_EVT would make it more efficient,
but we need to support this for all revisions of the architecture.
So let's first build on top of HCR_EL2.TID2, and only then once we
have an idea of the overhead add support for HCR_EL2.TID4 for the
systems that have FEAT_EVT.
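A sketch of how that staging could look, assuming a hypothetical
ARM64_HAS_EVT capability and an HCR_TID4 definition (bit 49, per
FEAT_EVT), neither of which exists in the tree at this point:

#define HCR_TID4	(UL(1) << 49)	/* assumed definition, see above */

static inline void vcpu_set_cache_traps(struct kvm_vcpu *vcpu)
{
	if (!vcpu_cache_overridden(vcpu))
		return;

	if (cpus_have_const_cap(ARM64_HAS_EVT))
		/* FEAT_EVT: traps CCSIDR/CSSELR/CLIDR but not CTR_EL0. */
		vcpu->arch.hcr_el2 |= HCR_TID4;
	else
		/* TID2 traps CTR_EL0 as well, hence the overhead concern. */
		vcpu->arch.hcr_el2 |= HCR_TID2;
}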
> If you think the concern on VM restoration you mentioned and the
> trapping overhead are tolerable, I'll write a new, much smaller patch
> accordingly.
That would be great, thanks. There are a number of gotchas around that
(like the 32bit stuff that already plays the emulation game), but this
is the right time to start and have something in 6.3 if you keep to it!
M.
--
Without deviation from the norm, progress is not possible.
On 2022/12/02 8:13, Marc Zyngier wrote:
> On Thu, 01 Dec 2022 17:26:08 +0000,
> Akihiko Odaki <[email protected]> wrote:
>>
>> On 2022/12/01 20:06, Marc Zyngier wrote:
>>> On Thu, 01 Dec 2022 10:49:11 +0000,
>>> Akihiko Odaki <[email protected]> wrote:
>>>
>>> Thanks for looking into this.
>>>
>>>> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
>>>> bits a KVM vCPU sees inconsistent when migrating.
>>>
>>> Can you describe the actual discrepancy? Is that an issue between the
>>> two core types? In which case, nothing says that these two clusters
>>> should have the same cache topology.
>>
>> Yes, the processor has a big.LITTLE configuration.
>>
>> On the processor, the valid CSSELR values are 0 (L1D), 1 (L1I), 3
>> (L2D). For each CSSELR value, each cluster has:
>> - 0x700FE03A, 0x203FE01A, 0x70FFE07B
>> - 0x701FE03A, 0x203FE02A, 0x73FFE07B
>
> This is a perfectly valid configuration. The architecture doesn't
> place any limitation on how different or identical the cache
> hierarchies are from the PoV of each CPU. Actually, most big-little
> systems show similar differences across their clusters.
>
>>>> It also makes QEMU fail restoring the vCPU registers because QEMU saves
>>>> and restores all of the registers including CCSIDRs, and if the vCPU
>>>> migrated among physical CPUs between saving and restoring, it tries to
>>>> restore CCSIDR values that mismatch with the current physical CPU, which
>>>> causes EFAULT.
>>>
>>> Well, QEMU will have plenty of other problems, starting with MIDRs,
>>> which always reflect the physical one. In general, KVM isn't well
>>> geared for VMs spanning multiple CPU types. It is improving, but there
>>> is a long way to go.
>>
>> On M2 MacBook Air, I have seen no other difference in standard ID
>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
>> that without more analysis. This is still enough to migrate vCPU
>> running Linux at least.
>
> I guess that MacOS hides more of the underlying HW than KVM does. And
> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> different between the two clusters.
It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
value for ioctls while it exposes the MIDR value each physical CPU owns
to vCPU.
This may be a problem worth fixing. My understanding is that while there
is no serious application which requires vCPU migration among physical
clusters, crosvm uses KVM on big.LITTLE processors by pinning vCPUs to
physical CPUs, and it is a real-world application which needs to be
supported.
For an application like crosvm, you would expect the vCPU thread gets
the MIDR value of the physical CPU which the thread is pinned to when it
calls ioctl, but it can get one of another arbitrary CPU in reality.
Fixing this problem will pose two design questions:
1. Should it expose a value consistent among clusters?
For example, we can change the KVM initialization code so that it
initializes VPIDR with the value stored as "invariant". This would help
migrating vCPU among clusters, but if you pin each vCPU thread to a
distinct physical CPU, you may rather want the vCPU to see the MIDR
value specific to each physical CPU and to apply quirks or tuning
parameters according to the value.
2. Should it be invariant or variable?
Fortunately making it variable is easy. Arm provides VPIDR_EL1 register
to specify the value exposed as MPIDR_EL0 so there is no trapping cost.
...or we may just say the value of MPIDR_EL0 (and possibly other
"invariant" registers) exposed via ioctl are useless and deprecated.
>
>>>> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
>>>> associativity bits when handling the trap.
>>>
>>> TBH, I'd rather we stop reporting this stuff altogether.
>>>
>>> There is nothing a correctly written arm64 guest should do with any of
>>> this (this is only useful for set/way CMOs, which non-secure SW should
>>> never issue). It would be a lot better to expose a virtual topology
>>> (one set, one way, one level). It would also save us from the CCSIDRX
>>> silliness.
>>>
>>> The only complexity would be to still accept different topologies from
>>> userspace so that we can restore a VM saved before this virtual
>>> topology.
>>
>> Another (minor) concern is that trapping relevant registers may cost
>> too much. Currently KVM traps CSSELR and CCSIDR accesses with
>> HCR_TID2, but HCR_TID2 also affects CTR_EL0.
>
> It will have an additional impact (JITs, for example, will take a hit
> if they don't cache that value), but this is pretty easy to mitigate
> if it proves to have too much of an impact. We already have a bunch of
> fast-paths for things that we want to emulate more efficiently, and
> CTR_EL0 could be one of them.
>
>> Although I'm not sure if the register is referred to frequently, Arm
>> introduced FEAT_EVT to trap CSSELR and CCSIDR but not CTR_EL0, so
>> there may be some cases where trapping CTR_EL0 is not
>> tolerated. Perhaps Arm worried that a userspace application may read
>> CTR_EL0 frequently.
>
> FEAT_EVT is one of these "let's add random traps" extensions,
> culminating in FEAT_FGT. Having FEAT_EVT would make it more efficient,
> but we need to support this for all revisions of the architecture.
>
> So let's first build on top of HCR_EL2.TID2, and only then once we
> have an idea of the overhead add support for HCR_EL2.TID4 for the
> systems that have FEAT_EVT.
That sounds good, I'll write a new series according to this idea.
Regards,
Akihiko Odaki
>
>> If you think the concern on VM restoration you mentioned and the
>> trapping overhead are tolerable, I'll write a new, much smaller patch
>> accordingly.
>
> That would be great, thanks. There are a number of gotchas around that
> (like the 32bit stuff that already plays the emulation game), but this
> is the right time to start and have something in 6.3 if you keep to it!
>
> M.
>
On Fri, 02 Dec 2022 05:17:12 +0000,
Akihiko Odaki <[email protected]> wrote:
>
> >> On M2 MacBook Air, I have seen no other difference in standard ID
> >> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >> that without more analysis. This is still enough to migrate vCPU
> >> running Linux at least.
> >
> > I guess that MacOS hides more of the underlying HW than KVM does. And
> > KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> > different between the two clusters.
>
> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
> value for ioctls while it exposes the MIDR value each physical CPU
> owns to vCPU.
This only affects the VMM though, and not the guest which sees the
MIDR of the CPU it runs on. The problem is that, short of pinning
the vcpus, you don't know where they will run. So any value is fair
game.
> This may be a problem worth fixing. My understanding is that while
> there is no serious application which requires vCPU migration among
> physical clusters,
Hey, I do that all the time with kvmtool! It's just that my guests do
not care about being run on one CPU or another.
> crosvm uses KVM on big.LITTLE processors by pinning
> vCPUs to physical CPUs, and it is a real-world application which needs
> to be supported.
>
> For an application like crosvm, you would expect the vCPU thread gets
> the MIDR value of the physical CPU which the thread is pinned to when
> it calls ioctl, but it can get one of another arbitrary CPU in
> reality.
No. It will get the MIDR of the CPU it runs on. Check again. What you are
describing above is solely for userspace.
>
> Fixing this problem will pose two design questions:
>
> 1. Should it expose a value consistent among clusters?
>
> For example, we can change the KVM initialization code so that it
> initializes VPIDR with the value stored as "invariant". This would
> help migrating vCPU among clusters, but if you pin each vCPU thread to
> a distinct physical CPU, you may rather want the vCPU to see the MIDR
> value specific to each physical CPU and to apply quirks or tuning
> parameters according to the value.
Which is what happens. Not at the cluster level, but at the CPU
level. The architecture doesn't describe what a *cluster* is.
> 2. Should it be invariant or variable?
>
> Fortunately making it variable is easy. Arm provides VPIDR_EL1
> register to specify the value exposed as MPIDR_EL0 so there is no
> trapping cost.
And if you do that you make it impossible for the guest to mitigate
errata, as most of the errata handling is based on the MIDR values.
> ...or we may just say the value of MPIDR_EL0 (and possibly other
I assume you meant MIDR_EL1 here, as MPIDR_EL1 is something else (and
it has no _EL0 equivalent).
> "invariant" registers) exposed via ioctl are useless and deprecated.
Useless? Not really. They are all meaningful to the guest, and a change
there will cause issues.
CTR_EL0 must, for example, be an invariant. Otherwise, you need to
trap all the CMOs when the {I,D}minLine values that are restored from
userspace are bigger than the ones the HW has. Even worse, when the
DIC/IDC bits are set from userspace while the HW has them cleared: you
cannot mitigate that one, and you'll end up with memory corruption.
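To make that constraint concrete, here is a sketch of the validation a
writable CTR_EL0 would need; ctr_el0_safe() is an illustrative name,
with masks following the architectural CTR_EL0 layout:

#include <linux/bitfield.h>

#define CTR_IMINLINE	GENMASK(3, 0)	/* log2(I-line size in words) */
#define CTR_DMINLINE	GENMASK(19, 16)	/* log2(D-line size in words) */
#define CTR_IDC		BIT(28)
#define CTR_DIC		BIT(29)

/* A user-supplied CTR_EL0 is only safe if it promises no more than the
 * hardware delivers: larger claimed line sizes make guest CMOs stride
 * too far and skip lines, and IDC/DIC set without hardware support lets
 * the guest legally omit maintenance it actually needs. */
static bool ctr_el0_safe(u64 user, u64 hw)
{
	if (FIELD_GET(CTR_IMINLINE, user) > FIELD_GET(CTR_IMINLINE, hw))
		return false;
	if (FIELD_GET(CTR_DMINLINE, user) > FIELD_GET(CTR_DMINLINE, hw))
		return false;
	if ((user & CTR_IDC) && !(hw & CTR_IDC))
		return false;
	if ((user & CTR_DIC) && !(hw & CTR_DIC))
		return false;
	return true;
}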
I've been toying with the idea of exposing to guests the list of
MIDR/REVIDR the guest is allowed to run on, as a PV service. This
would allow that guest to enable all the mitigations it wants in one
go.
Not sure I have time for this at the moment, but that'd be something
to explore.
[...]
> > So let's first build on top of HCR_EL2.TID2, and only then once we
> > have an idea of the overhead add support for HCR_EL2.TID4 for the
> > systems that have FEAT_EVT.
>
> That sounds good, I'll write a new series according to this idea.
Thanks!
M.
--
Without deviation from the norm, progress is not possible.
On 2022/12/02 18:40, Marc Zyngier wrote:
> On Fri, 02 Dec 2022 05:17:12 +0000,
> Akihiko Odaki <[email protected]> wrote:
>>
>>>> On M2 MacBook Air, I have seen no other difference in standard ID
>>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
>>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
>>>> that without more analysis. This is still enough to migrate vCPU
>>>> running Linux at least.
>>>
>>> I guess that MacOS hides more of the underlying HW than KVM does. And
>>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
>>> different between the two clusters.
>>
>> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
>> value for ioctls while it exposes the MIDR value each physical CPU
>> owns to vCPU.
>
> This only affects the VMM though, and not the guest which sees the
> MIDR of the CPU it runs on. The problem is that, short of pinning
> the vcpus, you don't know where they will run. So any value is fair
> game.
Yes, my concern is that the VMM can be confused if it sees something
different from what the guest on the vCPU sees.
>> crosvm uses KVM on big.LITTLE processors by pinning
>> vCPUs to physical CPUs, and it is a real-world application which needs
>> to be supported.
>>
>> For an application like crosvm, you would expect the vCPU thread gets
>> the MIDR value of the physical CPU which the thread is pinned to when
>> it calls ioctl, but it can get one of another arbitrary CPU in
>> reality.
>
> No. It will get the MIDR of the CPU it runs on. Check again. What you are
> describing above is solely for userspace.
By "ioctl", I meant the value the VMM gets for the vCPU thread. The
problem is that the guest on the vCPU and the VMM issuing ioctl on the
same thread can see different values, and it doesn't seem quite right.
>> ...or we may just say the value of MPIDR_EL0 (and possibly other
>
> I assume you meant MIDR_EL1 here, as MPIDR_EL1 is something else (and
> it has no _EL0 equivalent).
Yes, I meant MIDR_EL1.
>
>> "invariant" registers) exposed via ioctl are useless and deprecated.
>
> Useless? Not really. They are all meaningful to the guest, and a change
> there will cause issues.
>
> CTR_EL0 must, for example, be an invariant. Otherwise, you need to
> trap all the CMOs when the {I,D}minLine values that are restored from
> userspace are bigger than the ones the HW has. Even worse, when the
> DIC/IDC bits are set from userspace while the HW has them cleared: you
> cannot mitigate that one, and you'll end up with memory corruption.
>
> I've been toying with the idea of exposing to guests the list of
> MIDR/REVIDR the guest is allowed to run on, as a PV service. This
> would allow that guest to enable all the mitigations it wants in one
> go.
>
> Not sure I have time for this at the moment, but that'd be something
> to explore.
>
> [...]
I meant that the values exposed to the VMM via ioctls do not seem
useful. They still need to be exposed to the guest, as you say.
Regards,
Akihiko Odaki
On Thu, Dec 01, 2022 at 11:14:43PM +0000, Marc Zyngier wrote:
> On Thu, 01 Dec 2022 18:29:51 +0000,
> Oliver Upton <[email protected]> wrote:
> > Could we extend your suggestion about accepting different topologies to
> > effectively tolerate _any_ topology provided by userspace? KVM can
> > default to the virtual topology, but a well-informed userspace could
> > still provide different values to its guest. No point in trying to
> > babyproof the UAPI further, IMO.
>
> I think this is *exactly* what I suggested. Any valid topology should
> be able to be restored, as we currently present the VM with any
> topology the host HW may have. This must be preserved.
Ah, I was narrowly reading into the conversation as it relates to the M2
implementation, my bad. SGTM :)
--
Thanks,
Oliver
On Fri, 02 Dec 2022 09:55:24 +0000,
Akihiko Odaki <[email protected]> wrote:
>
> On 2022/12/02 18:40, Marc Zyngier wrote:
> > On Fri, 02 Dec 2022 05:17:12 +0000,
> > Akihiko Odaki <[email protected]> wrote:
> >>
> >>>> On M2 MacBook Air, I have seen no other difference in standard ID
> >>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >>>> that without more analysis. This is still enough to migrate vCPU
> >>>> running Linux at least.
> >>>
> >>> I guess that MacOS hides more of the underlying HW than KVM does. And
> >>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> >>> different between the two clusters.
> >>
> >> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
> >> value for ioctls while it exposes the MIDR value each physical CPU
> >> owns to vCPU.
> >
> > This only affects the VMM though, and not the guest which sees the
> > MIDR of the CPU it runs on. The problem is that, short of pinning
> > the vcpus, you don't know where they will run. So any value is fair
> > game.
>
> Yes, my concern is that the VMM can be confused if it sees something
> different from what the guest on the vCPU sees.
Well, this has been part of the ABI for about 10 years, since Rusty
introduced this notion of invariant, so userspace is already working
around it if that's an actual issue.
This would be easily addressed though, and shouldn't result in any
issue. The following should do the trick (only lightly tested on an
M1).
Thanks,
M.
From f1caacb89eb8ae40dc38669160a2f081f87f4b15 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <[email protected]>
Date: Sun, 4 Dec 2022 14:22:22 +0000
Subject: [PATCH] KVM: arm64: Return MIDR_EL1 to userspace as seen on the vcpu
thread
When booting, KVM samples the MIDR of the CPU it initialises on,
and keeps this as the value that will forever be exposed to userspace.
However, this has nothing to do with the value that the guest will
see. On an asymmetric system, this can result in userspace observing
weird things, especially if it has pinned the vcpus on a *different*
set of CPUs.
Instead, return the MIDR value for the vcpu we're currently on and
that the vcpu will observe if it has been pinned onto that CPU.
For symmetric systems, this changes nothing. For asymmetric machines,
they will observe the correct MIDR value at the point of the call.
Reported-by: Akihiko Odaki <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5abcbca..f6bcf8ba9b2e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1246,6 +1246,22 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
return 0;
}
+static int get_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 *val)
+{
+ *val = read_sysreg(midr_el1);
+ return 0;
+}
+
+static int set_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 val)
+{
+ if (val != read_sysreg(midr_el1))
+ return -EINVAL;
+
+ return 0;
+}
+
static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
u64 *val)
{
@@ -1432,6 +1448,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
+ { SYS_DESC(SYS_MIDR_EL1), .get_user = get_midr, .set_user = set_midr },
{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
/*
@@ -2609,7 +2626,6 @@ id_to_sys_reg_desc(struct kvm_vcpu *vcpu, u64 id,
((struct sys_reg_desc *)r)->val = read_sysreg(reg); \
}
-FUNCTION_INVARIANT(midr_el1)
FUNCTION_INVARIANT(revidr_el1)
FUNCTION_INVARIANT(clidr_el1)
FUNCTION_INVARIANT(aidr_el1)
@@ -2621,7 +2637,6 @@ static void get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
/* ->val is filled in by kvm_sys_reg_table_init() */
static struct sys_reg_desc invariant_sys_regs[] = {
- { SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
{ SYS_DESC(SYS_CLIDR_EL1), NULL, get_clidr_el1 },
{ SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
--
2.34.1
--
Without deviation from the norm, progress is not possible.
On 2022/12/04 23:57, Marc Zyngier wrote:
> On Fri, 02 Dec 2022 09:55:24 +0000,
> Akihiko Odaki <[email protected]> wrote:
>>
>> On 2022/12/02 18:40, Marc Zyngier wrote:
>>> On Fri, 02 Dec 2022 05:17:12 +0000,
>>> Akihiko Odaki <[email protected]> wrote:
>>>>
>>>>>> On M2 MacBook Air, I have seen no other difference in standard ID
>>>>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
>>>>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
>>>>>> that without more analysis. This is still enough to migrate vCPU
>>>>>> running Linux at least.
>>>>>
>>>>> I guess that MacOS hides more of the underlying HW than KVM does. And
>>>>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
>>>>> different between the two clusters.
>>>>
>>>> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
>>>> value for ioctls while it exposes the MIDR value each physical CPU
>>>> owns to vCPU.
>>>
>>> This only affects the VMM though, and not the guest which sees the
>>> MIDR of the CPU it runs on. The problem is that, short of pinning
>>> the vcpus, you don't know where they will run. So any value is fair
>>> game.
>>
>> Yes, my concern is that the VMM can be confused if it sees something
>> different from what the guest on the vCPU sees.
>
> Well, this has been part of the ABI for about 10 years, since Rusty
> introduced this notion of invariant, so userspace is already working
> around it if that's an actual issue.
In that case, I think it is better to document that the interface is not
working properly and is deprecated.
>
> This would be easily addressed though, and shouldn't result in any
> issue. The following should do the trick (only lightly tested on an
> M1).
This can be problematic when restoring vcpu state saved with an old
kernel. A possible solution is to allow userspace to overwrite
MIDR_EL1 as proposed for CCSIDR_EL1.
Regards,
Akihiko Odaki
>
> Thanks,
>
> M.
>
> From f1caacb89eb8ae40dc38669160a2f081f87f4b15 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <[email protected]>
> Date: Sun, 4 Dec 2022 14:22:22 +0000
> Subject: [PATCH] KVM: arm64: Return MIDR_EL1 to userspace as seen on the vcpu
> thread
>
> When booting, KVM samples the MIDR of the CPU it initialises on,
> and keeps this as the value that will forever be exposed to userspace.
>
> However, this has nothing to do with the value that the guest will
> see. On an asymmetric system, this can result in userspace observing
> weird things, especially if it has pinned the vcpus on a *different*
> set of CPUs.
>
> Instead, return the MIDR value for the vcpu we're currently on and
> that the vcpu will observe if it has been pinned onto that CPU.
>
> For symmetric systems, this changes nothing. For asymmetric machines,
> they will observe the correct MIDR value at the point of the call.
>
> Reported-by: Akihiko Odaki <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f4a7c5abcbca..f6bcf8ba9b2e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1246,6 +1246,22 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> return 0;
> }
>
> +static int get_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + u64 *val)
> +{
> + *val = read_sysreg(midr_el1);
> + return 0;
> +}
> +
> +static int set_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + u64 val)
> +{
> + if (val != read_sysreg(midr_el1))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> u64 *val)
> {
> @@ -1432,6 +1448,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>
> { SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
>
> + { SYS_DESC(SYS_MIDR_EL1), .get_user = get_midr, .set_user = set_midr },
> { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
>
> /*
> @@ -2609,7 +2626,6 @@ id_to_sys_reg_desc(struct kvm_vcpu *vcpu, u64 id,
> ((struct sys_reg_desc *)r)->val = read_sysreg(reg); \
> }
>
> -FUNCTION_INVARIANT(midr_el1)
> FUNCTION_INVARIANT(revidr_el1)
> FUNCTION_INVARIANT(clidr_el1)
> FUNCTION_INVARIANT(aidr_el1)
> @@ -2621,7 +2637,6 @@ static void get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
>
> /* ->val is filled in by kvm_sys_reg_table_init() */
> static struct sys_reg_desc invariant_sys_regs[] = {
> - { SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
> { SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
> { SYS_DESC(SYS_CLIDR_EL1), NULL, get_clidr_el1 },
> { SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
On Sun, 11 Dec 2022 05:25:31 +0000,
Akihiko Odaki <[email protected]> wrote:
>
> On 2022/12/04 23:57, Marc Zyngier wrote:
> > On Fri, 02 Dec 2022 09:55:24 +0000,
> > Akihiko Odaki <[email protected]> wrote:
> >>
> >> On 2022/12/02 18:40, Marc Zyngier wrote:
> >>> On Fri, 02 Dec 2022 05:17:12 +0000,
> >>> Akihiko Odaki <[email protected]> wrote:
> >>>>
> >>>>>> On M2 MacBook Air, I have seen no other difference in standard ID
> >>>>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >>>>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >>>>>> that without more analysis. This is still enough to migrate vCPU
> >>>>>> running Linux at least.
> >>>>>
> >>>>> I guess that MacOS hides more of the underlying HW than KVM does. And
> >>>>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> >>>>> different between the two clusters.
> >>>>
> >>>> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
> >>>> value for ioctls while it exposes the MIDR value each physical CPU
> >>>> owns to vCPU.
> >>>
> >>> This only affects the VMM though, and not the guest which sees the
> >>> MIDR of the CPU it runs on. The problem is that, short of pinning
> >>> the vcpus, you don't know where they will run. So any value is fair
> >>> game.
> >>
> >> Yes, my concern is that the VMM can be confused if it sees something
> >> different from what the guest on the vCPU sees.
> >
> > Well, this has been part of the ABI for about 10 years, since Rusty
> > introduced this notion of invariant, so userspace is already working
> > around it if that's an actual issue.
>
> In that case, I think it is better to document that the interface is
> not working properly and is deprecated.
This means nothing. Deprecating an API doesn't mean we don't support
it and doesn't solve any issue for existing userspace.
I'd rather not change anything, TBH. Existing userspace already knows
how to deal with this.
>
> >
> > This would be easily addressed though, and shouldn't result in any
> > issue. The following should do the trick (only lightly tested on an
> > M1).
>
> This can be problematic when restoring vcpu state saved with an old
> kernel. A possible solution is to allow userspace to overwrite
> MIDR_EL1 as proposed for CCSIDR_EL1.
That would break most guests for obvious reasons. At best what can be
done is to make the MIDR WI.
M.
--
Without deviation from the norm, progress is not possible.
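A sketch of that write-ignore variant, reusing the get_midr()/set_midr()
shape from the patch earlier in the thread; only the set_user side
changes:

static int set_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
		    u64 val)
{
	/*
	 * Write-ignore: accept and discard whatever userspace restores,
	 * so saves taken on another physical CPU (or under an older
	 * kernel) load cleanly, while the guest keeps seeing the MIDR
	 * of the CPU it actually runs on.
	 */
	return 0;
}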
On 2022/12/11 19:21, Marc Zyngier wrote:
> On Sun, 11 Dec 2022 05:25:31 +0000,
> Akihiko Odaki <[email protected]> wrote:
>>
>> On 2022/12/04 23:57, Marc Zyngier wrote:
>>> On Fri, 02 Dec 2022 09:55:24 +0000,
>>> Akihiko Odaki <[email protected]> wrote:
>>>>
>>>> On 2022/12/02 18:40, Marc Zyngier wrote:
>>>>> On Fri, 02 Dec 2022 05:17:12 +0000,
>>>>> Akihiko Odaki <[email protected]> wrote:
>>>>>>
>>>>>>>> On M2 MacBook Air, I have seen no other difference in standard ID
>>>>>>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
>>>>>>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
>>>>>>>> that without more analysis. This is still enough to migrate vCPU
>>>>>>>> running Linux at least.
>>>>>>>
>>>>>>> I guess that MacOS hides more of the underlying HW than KVM does. And
>>>>>>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
>>>>>>> different between the two clusters.
>>>>>>
>>>>>> It seems KVM stores a MIDR value of a CPU and reuses it as an "invariant"
>>>>>> value for ioctls while it exposes the MIDR value each physical CPU
>>>>>> owns to vCPU.
>>>>>
>>>>> This only affects the VMM though, and not the guest which sees the
>>>>> MIDR of the CPU it runs on. The problem is that, short of pinning
>>>>> the vcpus, you don't know where they will run. So any value is fair
>>>>> game.
>>>>
>>>> Yes, my concern is that the VMM can be confused if it sees something
>>>> different from what the guest on the vCPU sees.
>>>
>>> Well, this has been part of the ABI for about 10 years, since Rusty
>>> introduced this notion of invariant, so userspace is already working
>>> around it if that's an actual issue.
>>
>> In that case, I think it is better to document that the interface is
>> not working properly and is deprecated.
>
> This means nothing. Deprecating an API doesn't mean we don't support
> it and doesn't solve any issue for existing userspace.
>
> I'd rather not change anything, TBH. Existing userspace already knows
> how to deal with this.
>
>>
>>>
>>> This would be easily addressed though, and shouldn't result in any
>>> issue. The following should do the trick (only lightly tested on an
>>> M1).
>>
>> This can be problematic when restoring vcpu state saved with an old
>> kernel. A possible solution is to allow userspace to overwrite
>> MIDR_EL1 as proposed for CCSIDR_EL1.
>
> That would break most guests for obvious reasons. At best what can be
> done is to make the MIDR WI.
Making MIDR WI sounds good to me. Whether we keep the current behavior
or make it WI, the behavior should be documented, I think. The
documentation obviously does not help existing userspace code, but it
will be helpful when writing new userspace code or understanding how
existing userspace code works.
Regards,
Akihiko Odaki
>
> M.
>