2024-04-26 10:51:22

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 0/6] KVM: arm64: emulation for CTR_EL0

Hej folks,

I'm looking into supporting migration between 2 Ampere Altra (Max)
machines (using Neoverse-N1). They are almost identical regarding
their feature id register state except for CTR_EL0.DIC which is set
on one machine but not the other.

CTR_EL0 is currently marked as invariant and migrating a VM between
those 2 machines using qemu fails.

Changes RFC [0] -> V1 [1]:
* store the emulated value per VM and not per VCPU
* allow to change more values than just the DIC bit
* only trap guest access to that reg when needed
* make sure to not present the guest with an inconsistent register set
Changes V1 -> V2:
* implemented Marc's suggestion for keeping registers consistent while
not breaking userspace ABI / expectations (I hope correctly this time)
* keep the shadowed value valid at all time
* unify the code to setup traps

Note:
* in order to switch off CTR_EL0.IDC userspace first has to set up
CLIDR_EL1 accordingly
* reading CCSIDR before and after changing the cache size in CTR_EL0
could result in different values (but only if CCSIDR was not changed
from the generated value)
* I'll prepare a testcase for the next revision

Thanks,
Sebastian

[0]: https://lore.kernel.org/all/[email protected]/T/
[1]: https://lore.kernel.org/lkml/[email protected]/T/

Sebastian Ott (6):
KVM: arm64: change return value in arm64_check_features()
KVM: arm64: unify trap setup code
KVM: arm64: maintain per VM value for CTR_EL0
KVM: arm64: add emulation for CTR_EL0 register
KVM: arm64: show writable masks for feature registers
KVM: arm64: rename functions for invariant sys regs

arch/arm64/include/asm/kvm_emulate.h | 37 -----
arch/arm64/include/asm/kvm_host.h | 4 +-
arch/arm64/kvm/arm.c | 3 +-
arch/arm64/kvm/sys_regs.c | 225 ++++++++++++++++++++-------
4 files changed, 173 insertions(+), 96 deletions(-)

--
2.42.0



2024-04-26 10:51:28

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 1/6] KVM: arm64: change return value in arm64_check_features()

arm64_check_features() returns -E2BIG to indicate the register's
feature set is a superset of the maximally-allowed register value.

The only caller of that function changes this to -EINVAL since
that's what userspace expects for invalid register writes.

In preparation of adding another caller for arm64_check_features()
that would need to do the same conversion just return -EINVAL
directly.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9f4f387155f..131f5b0ca2b9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1445,7 +1445,7 @@ static s64 kvm_arm64_ftr_safe_value(u32 id, const struct arm64_ftr_bits *ftrp,
* considered the safe value regardless For register fields that are not in
* writable, only the value in limit is considered the safe value.
*
- * Return: 0 if all the fields are safe. Otherwise, return negative errno.
+ * Return: 0 if all the fields are safe. Otherwise, return -EINVAL.
*/
static int arm64_check_features(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
@@ -1464,7 +1464,7 @@ static int arm64_check_features(struct kvm_vcpu *vcpu,
* only safe value is 0.
*/
if (sysreg_visible_as_raz(vcpu, rd))
- return val ? -E2BIG : 0;
+ return val ? -EINVAL : 0;

ftr_reg = get_arm64_ftr_reg(id);
if (!ftr_reg)
@@ -1490,12 +1490,12 @@ static int arm64_check_features(struct kvm_vcpu *vcpu,
safe_val = kvm_arm64_ftr_safe_value(id, ftrp, f_val, f_lim);

if (safe_val != f_val)
- return -E2BIG;
+ return -EINVAL;
}

/* For fields that are not writable, values in limit are the safe values. */
if ((val & ~mask) != (limit & ~mask))
- return -E2BIG;
+ return -EINVAL;

return 0;
}
@@ -1840,16 +1840,6 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
IDREG(vcpu->kvm, id) = val;

mutex_unlock(&vcpu->kvm->arch.config_lock);
-
- /*
- * arm64_check_features() returns -E2BIG to indicate the register's
- * feature set is a superset of the maximally-allowed register value.
- * While it would be nice to precisely describe this to userspace, the
- * existing UAPI for KVM_SET_ONE_REG has it that invalid register
- * writes return -EINVAL.
- */
- if (ret == -E2BIG)
- ret = -EINVAL;
return ret;
}

--
2.42.0


2024-04-26 10:51:42

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 2/6] KVM: arm64: unify trap setup code

There are 2 functions to set up traps via HCR_EL2:
* kvm_init_sysreg() called via KVM_RUN (before the 1st run or when
the pid changes)
* vcpu_reset_hcr() called via KVM_ARM_VCPU_INIT

To unify these 2 and to support traps that are dependent on the
ID register configuration, move vcpu_reset_hcr() to sys_regs.c
and call it via kvm_init_sysreg().

While at it rename kvm_init_sysreg() to kvm_setup_traps() to
better reflect what it's doing.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 37 -----------------------
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/kvm/arm.c | 3 +-
arch/arm64/kvm/sys_regs.c | 44 ++++++++++++++++++++++++++--
4 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 975af30af31f..9e71fcbb033d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -67,43 +67,6 @@ static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
}
#endif

-static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
-{
- vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
- if (has_vhe() || has_hvhe())
- vcpu->arch.hcr_el2 |= HCR_E2H;
- if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) {
- /* route synchronous external abort exceptions to EL2 */
- vcpu->arch.hcr_el2 |= HCR_TEA;
- /* trap error record accesses */
- vcpu->arch.hcr_el2 |= HCR_TERR;
- }
-
- if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB)) {
- vcpu->arch.hcr_el2 |= HCR_FWB;
- } else {
- /*
- * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
- * get set in SCTLR_EL1 such that we can detect when the guest
- * MMU gets turned on and do the necessary cache maintenance
- * then.
- */
- vcpu->arch.hcr_el2 |= HCR_TVM;
- }
-
- if (cpus_have_final_cap(ARM64_HAS_EVT) &&
- !cpus_have_final_cap(ARM64_MISMATCHED_CACHE_TYPE))
- vcpu->arch.hcr_el2 |= HCR_TID4;
- else
- vcpu->arch.hcr_el2 |= HCR_TID2;
-
- if (vcpu_el1_is_32bit(vcpu))
- vcpu->arch.hcr_el2 &= ~HCR_RW;
-
- if (kvm_has_mte(vcpu->kvm))
- vcpu->arch.hcr_el2 |= HCR_ATA;
-}
-
static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
{
return (unsigned long *)&vcpu->arch.hcr_el2;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..696acba883c1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1115,7 +1115,7 @@ int __init populate_nv_trap_config(void);
bool lock_all_vcpus(struct kvm *kvm);
void unlock_all_vcpus(struct kvm *kvm);

-void kvm_init_sysreg(struct kvm_vcpu *);
+void kvm_setup_traps(struct kvm_vcpu *);

/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c4a0a35e02c7..d6c27d8a8f2f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -683,7 +683,7 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
* This needs to happen after NV has imposed its own restrictions on
* the feature set
*/
- kvm_init_sysreg(vcpu);
+ kvm_setup_traps(vcpu);

ret = kvm_timer_enable(vcpu);
if (ret)
@@ -1438,7 +1438,6 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
icache_inval_all_pou();
}

- vcpu_reset_hcr(vcpu);
vcpu->arch.cptr_el2 = kvm_get_reset_cptr_el2(vcpu);

/*
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 131f5b0ca2b9..ac366d0b614a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -4020,11 +4020,43 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
return 0;
}

-void kvm_init_sysreg(struct kvm_vcpu *vcpu)
+static void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;

- mutex_lock(&kvm->arch.config_lock);
+ vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
+ if (has_vhe() || has_hvhe())
+ vcpu->arch.hcr_el2 |= HCR_E2H;
+ if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) {
+ /* route synchronous external abort exceptions to EL2 */
+ vcpu->arch.hcr_el2 |= HCR_TEA;
+ /* trap error record accesses */
+ vcpu->arch.hcr_el2 |= HCR_TERR;
+ }
+
+ if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB)) {
+ vcpu->arch.hcr_el2 |= HCR_FWB;
+ } else {
+ /*
+ * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
+ * get set in SCTLR_EL1 such that we can detect when the guest
+ * MMU gets turned on and do the necessary cache maintenance
+ * then.
+ */
+ vcpu->arch.hcr_el2 |= HCR_TVM;
+ }
+
+ if (cpus_have_final_cap(ARM64_HAS_EVT) &&
+ !cpus_have_final_cap(ARM64_MISMATCHED_CACHE_TYPE))
+ vcpu->arch.hcr_el2 |= HCR_TID4;
+ else
+ vcpu->arch.hcr_el2 |= HCR_TID2;
+
+ if (vcpu_el1_is_32bit(vcpu))
+ vcpu->arch.hcr_el2 &= ~HCR_RW;
+
+ if (kvm_has_mte(vcpu->kvm))
+ vcpu->arch.hcr_el2 |= HCR_ATA;

/*
* In the absence of FGT, we cannot independently trap TLBI
@@ -4033,6 +4065,14 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
*/
if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
vcpu->arch.hcr_el2 |= HCR_TTLBOS;
+}
+
+void kvm_setup_traps(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->arch.config_lock);
+ vcpu_reset_hcr(vcpu);

if (cpus_have_final_cap(ARM64_HAS_HCX)) {
vcpu->arch.hcrx_el2 = HCRX_GUEST_FLAGS;
--
2.42.0


2024-04-26 10:51:46

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 3/6] KVM: arm64: maintain per VM value for CTR_EL0

In preparation for CTR_EL0 emulation maintain a per VM for this
register and use it where appropriate.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/sys_regs.c | 15 ++++++++-------
2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 696acba883c1..0c84cdb11c97 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -340,6 +340,8 @@ struct kvm_arch {
#define KVM_ARM_ID_REG_NUM (IDREG_IDX(sys_reg(3, 0, 0, 7, 7)) + 1)
u64 id_regs[KVM_ARM_ID_REG_NUM];

+ u64 ctr_el0;
+
/* Masks for VNCR-baked sysregs */
struct kvm_sysreg_masks *sysreg_masks;

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ac366d0b614a..1488b93050d4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -219,9 +219,9 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
* Returns the minimum line size for the selected cache, expressed as
* Log2(bytes).
*/
-static u8 get_min_cache_line_size(bool icache)
+static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
{
- u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 ctr = kvm->arch.ctr_el0;
u8 field;

if (icache)
@@ -248,7 +248,7 @@ static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
if (vcpu->arch.ccsidr)
return vcpu->arch.ccsidr[csselr];

- line_size = get_min_cache_line_size(csselr & CSSELR_EL1_InD);
+ line_size = get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD);

/*
* Fabricate a CCSIDR value as the overriding value does not exist.
@@ -283,7 +283,7 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
u32 i;

if ((val & CCSIDR_EL1_RES0) ||
- line_size < get_min_cache_line_size(csselr & CSSELR_EL1_InD))
+ line_size < get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD))
return -EINVAL;

if (!ccsidr) {
@@ -1862,7 +1862,7 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write)
return write_to_read_only(vcpu, p, r);

- p->regval = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ p->regval = vcpu->kvm->arch.ctr_el0;
return true;
}

@@ -1882,7 +1882,7 @@ static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
*/
static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
- u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 ctr_el0 = vcpu->kvm->arch.ctr_el0;
u64 clidr;
u8 loc;

@@ -1935,8 +1935,8 @@ static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
u64 val)
{
- u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
u64 idc = !CLIDR_LOC(val) || (!CLIDR_LOUIS(val) && !CLIDR_LOUU(val));
+ u64 ctr_el0 = vcpu->kvm->arch.ctr_el0;

if ((val & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
return -EINVAL;
@@ -3509,6 +3509,7 @@ static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
return;

lockdep_assert_held(&kvm->arch.config_lock);
+ kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);

/* Initialize all idregs */
while (is_id_reg(id)) {
--
2.42.0


2024-04-26 10:52:06

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

CTR_EL0 is currently handled as an invariant register, thus
guests will be presented with the host value of that register.

Add emulation for CTR_EL0 based on a per VM value. Userspace can
switch off DIC and IDC bits and reduce DminLine and IminLine sizes.

When CTR_EL0 is changed validate that against CLIDR_EL1 and CCSIDR_EL1
to make sure we present the guest with consistent register values.
Changes that affect the generated cache topology values are allowed if
they don't clash with previous register writes.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 123 +++++++++++++++++++++++++++++++++-----
1 file changed, 107 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1488b93050d4..2fe3492ba3c4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -215,13 +215,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
#define CSSELR_MAX 14

-/*
- * Returns the minimum line size for the selected cache, expressed as
- * Log2(bytes).
- */
-static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
+static u8 __get_min_cache_line_size(u64 ctr, bool icache)
{
- u64 ctr = kvm->arch.ctr_el0;
u8 field;

if (icache)
@@ -240,6 +235,15 @@ static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
return field + 2;
}

+/*
+ * Returns the minimum line size for the selected cache, expressed as
+ * Log2(bytes).
+ */
+static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
+{
+ return __get_min_cache_line_size(kvm->arch.ctr_el0, icache);
+}
+
/* Which cache CCSIDR represents depends on CSSELR value. */
static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
{
@@ -1856,6 +1860,45 @@ static int set_wi_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
return 0;
}

+static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
+
+static int validate_clidr_el1(u64 clidr_el1, u64 ctr_el0)
+{
+ u64 idc = !CLIDR_LOC(clidr_el1) ||
+ (!CLIDR_LOUIS(clidr_el1) && !CLIDR_LOUU(clidr_el1));
+
+ if ((clidr_el1 & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int validate_cache_top(struct kvm_vcpu *vcpu, u64 ctr_el0)
+{
+ const struct sys_reg_desc *clidr_el1;
+ unsigned int i;
+ int ret;
+
+ clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
+ if (!clidr_el1)
+ return -ENOENT;
+
+ ret = validate_clidr_el1(__vcpu_sys_reg(vcpu, clidr_el1->reg), ctr_el0);
+ if (ret)
+ return ret;
+
+ if (!vcpu->arch.ccsidr)
+ return 0;
+
+ for (i = 0; i < CSSELR_MAX; i++) {
+ if ((FIELD_GET(CCSIDR_EL1_LineSize, get_ccsidr(vcpu, i)) + 4)
+ < __get_min_cache_line_size(ctr_el0, i & CSSELR_EL1_InD))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1866,6 +1909,48 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}

+static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
+{
+ vcpu->kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ return vcpu->kvm->arch.ctr_el0;
+}
+
+static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 *val)
+{
+ *val = vcpu->kvm->arch.ctr_el0;
+ return 0;
+}
+
+static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 val)
+{
+ int ret;
+
+ if (val == vcpu->kvm->arch.ctr_el0)
+ return 0;
+
+ if (kvm_vm_has_ran_once(vcpu->kvm))
+ return -EBUSY;
+
+ mutex_lock(&vcpu->kvm->arch.config_lock);
+ ret = arm64_check_features(vcpu, rd, val);
+ if (ret) {
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+ return ret;
+ }
+ ret = validate_cache_top(vcpu, val);
+ if (ret) {
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+ return ret;
+ }
+
+ vcpu->kvm->arch.ctr_el0 = val;
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+
+ return 0;
+}
+
static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1935,10 +2020,9 @@ static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
u64 val)
{
- u64 idc = !CLIDR_LOC(val) || (!CLIDR_LOUIS(val) && !CLIDR_LOUU(val));
u64 ctr_el0 = vcpu->kvm->arch.ctr_el0;

- if ((val & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
+ if (validate_clidr_el1(val, ctr_el0))
return -EINVAL;

__vcpu_sys_reg(vcpu, rd->reg) = val;
@@ -2452,7 +2536,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
- { SYS_DESC(SYS_CTR_EL0), access_ctr },
+ { SYS_DESC(SYS_CTR_EL0), access_ctr, .reset = reset_ctr,
+ .get_user = get_ctr, .set_user = set_ctr, .val = (CTR_EL0_DIC_MASK |
+ CTR_EL0_IDC_MASK |
+ CTR_EL0_DminLine_MASK |
+ CTR_EL0_IminLine_MASK)},
{ SYS_DESC(SYS_SVCR), undef_access },

{ PMU_SYS_REG(PMCR_EL0), .access = access_pmcr, .reset = reset_pmcr,
@@ -3616,6 +3704,13 @@ static bool index_to_params(u64 id, struct sys_reg_params *params)
}
}

+static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding)
+{
+ struct sys_reg_params params = encoding_to_params(encoding);
+
+ return find_reg(&params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
+}
+
const struct sys_reg_desc *get_reg_by_id(u64 id,
const struct sys_reg_desc table[],
unsigned int num)
@@ -3669,18 +3764,11 @@ FUNCTION_INVARIANT(midr_el1)
FUNCTION_INVARIANT(revidr_el1)
FUNCTION_INVARIANT(aidr_el1)

-static u64 get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
-{
- ((struct sys_reg_desc *)r)->val = read_sanitised_ftr_reg(SYS_CTR_EL0);
- return ((struct sys_reg_desc *)r)->val;
-}
-
/* ->val is filled in by kvm_sys_reg_table_init() */
static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
{ SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
{ SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
- { SYS_DESC(SYS_CTR_EL0), NULL, get_ctr_el0 },
};

static int get_invariant_sys_reg(u64 id, u64 __user *uaddr)
@@ -4066,6 +4154,9 @@ static void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
*/
if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
vcpu->arch.hcr_el2 |= HCR_TTLBOS;
+
+ if (kvm->arch.ctr_el0 != read_sanitised_ftr_reg(SYS_CTR_EL0))
+ vcpu->arch.hcr_el2 |= HCR_TID2;
}

void kvm_setup_traps(struct kvm_vcpu *vcpu)
--
2.42.0


2024-04-26 10:52:18

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 5/6] KVM: arm64: show writable masks for feature registers

Instead of using ~0UL provide the actual writable mask for
non-id feature registers in the output of the
KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.

Explicitely set the mask for CLIDR_EL1 to make sure we present
the same value to userspace than before.

This changes the mask for the CTR_EL0 register in the output
of the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2fe3492ba3c4..7d5d55e5c16a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2532,7 +2532,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {

{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
{ SYS_DESC(SYS_CLIDR_EL1), access_clidr, reset_clidr, CLIDR_EL1,
- .set_user = set_clidr },
+ .set_user = set_clidr, .val = ~0UL },
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
@@ -4087,20 +4087,11 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
if (!is_feature_id_reg(encoding) || !reg->set_user)
continue;

- /*
- * For ID registers, we return the writable mask. Other feature
- * registers return a full 64bit mask. That's not necessary
- * compliant with a given revision of the architecture, but the
- * RES0/RES1 definitions allow us to do that.
- */
- if (is_id_reg(encoding)) {
- if (!reg->val ||
- (is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0()))
- continue;
- val = reg->val;
- } else {
- val = ~0UL;
+ if (!reg->val ||
+ (is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0())) {
+ continue;
}
+ val = reg->val;

if (put_user(val, (masks + KVM_ARM_FEATURE_ID_RANGE_INDEX(encoding))))
return -EFAULT;
--
2.42.0


2024-04-26 10:52:19

by Sebastian Ott

[permalink] [raw]
Subject: [PATCH v2 6/6] KVM: arm64: rename functions for invariant sys regs

Invariant system id registers are populated with host values
at initialization time using their .reset function cb.

These are currently called get_* which is usually used by
the functions implementing the .get_user callback.

Change their function names to reset_* to reflect what they
are used for.

Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7d5d55e5c16a..88f6cdf9b8d6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3753,8 +3753,8 @@ id_to_sys_reg_desc(struct kvm_vcpu *vcpu, u64 id,
*/

#define FUNCTION_INVARIANT(reg) \
- static u64 get_##reg(struct kvm_vcpu *v, \
- const struct sys_reg_desc *r) \
+ static u64 reset_##reg(struct kvm_vcpu *v, \
+ const struct sys_reg_desc *r) \
{ \
((struct sys_reg_desc *)r)->val = read_sysreg(reg); \
return ((struct sys_reg_desc *)r)->val; \
@@ -3766,9 +3766,9 @@ FUNCTION_INVARIANT(aidr_el1)

/* ->val is filled in by kvm_sys_reg_table_init() */
static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
- { SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
- { SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
- { SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
+ { SYS_DESC(SYS_MIDR_EL1), NULL, reset_midr_el1 },
+ { SYS_DESC(SYS_REVIDR_EL1), NULL, reset_revidr_el1 },
+ { SYS_DESC(SYS_AIDR_EL1), NULL, reset_aidr_el1 },
};

static int get_invariant_sys_reg(u64 id, u64 __user *uaddr)
--
2.42.0


2024-05-01 06:51:53

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: arm64: unify trap setup code

On Fri, Apr 26, 2024 at 12:49:46PM +0200, Sebastian Ott wrote:
> There are 2 functions to set up traps via HCR_EL2:

nitpick: these functions *calculate* the trap values, but do not
actually set them up. HCR_EL2 doesn't get written to until further down
the line on KVM_RUN.

> + if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB)) {
> + vcpu->arch.hcr_el2 |= HCR_FWB;
> + } else {
> + /*
> + * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> + * get set in SCTLR_EL1 such that we can detect when the guest
> + * MMU gets turned on and do the necessary cache maintenance
> + * then.
> + */
> + vcpu->arch.hcr_el2 |= HCR_TVM;
> + }

It seems to me like calling this once for the lifetime of a vCPU will
break non-FWB behavior.

Like the comment suggests, these traps are needed to catch the moment
the S1 MMU is turned on and do cache maintenance to make sure D$ agrees
with what the guest was doing before enabling the MMU.

KVM_ARM_VCPU_INIT resets SCTLR_EL1, but it seems we'd miss setting
HCR_TVM in that case.

--
Thanks,
Oliver

2024-05-01 07:31:51

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] KVM: arm64: show writable masks for feature registers

On Fri, Apr 26, 2024 at 12:49:49PM +0200, Sebastian Ott wrote:
> Instead of using ~0UL provide the actual writable mask for
> non-id feature registers in the output of the
> KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.

Urgh, yeah in retrospect I think we should've constrained this to the
registers KVM considers "id regs" (the space occupied by known registers
or otherwise RAZ).

> Explicitely set the mask for CLIDR_EL1 to make sure we present
> the same value to userspace than before.

typo: explicitly

Also, we know the set of mutable bits for CLIDR_EL1 as it is handled in
set_clidr(). The mask really should match that, since the UAPI is
documented as "allowing userspace to know what fields can be changed for
the system register"

--
Thanks,
Oliver

2024-05-01 08:15:26

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

On Fri, Apr 26, 2024 at 12:49:48PM +0200, Sebastian Ott wrote:
> CTR_EL0 is currently handled as an invariant register, thus
> guests will be presented with the host value of that register.
>
> Add emulation for CTR_EL0 based on a per VM value. Userspace can
> switch off DIC and IDC bits and reduce DminLine and IminLine sizes.
>
> When CTR_EL0 is changed validate that against CLIDR_EL1 and CCSIDR_EL1
> to make sure we present the guest with consistent register values.
> Changes that affect the generated cache topology values are allowed if
> they don't clash with previous register writes.

Sorry I didn't speak up earlier, but I'm not sold on the need to
cross-validate userspace values for the cache type registers.

KVM should only be concerned about whether or not the selected feature
set matches what hardware is capable of and what KVM can virtualize. So
in the context of the CTR and the cache topology, I feel that they
should be _separately_ evaluated against the host's CTR_EL0.

Inconsistencies between fields in userspace values should be out of
scope; userspace shares the responsibility of presenting something
architectural, especially if it starts modifying ID registers. Otherwise
I'm quite worried about the amount of glue required to plumb exhaustive
consitency checks for registers, especially considering the lack of
ordering.

Marc, I know this goes against what you had suggested earlier, is there
something in particular that you think warrants the consistency checks?

> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
> +{
> + vcpu->kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + return vcpu->kvm->arch.ctr_el0;
> +}
> +

We definitely do not want this value to change across a vCPU reset, it
should be handled like the other ID registers where they only get reset
once for the VM lifetime.

--
Thanks,
Oliver

2024-05-01 08:17:50

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] KVM: arm64: emulation for CTR_EL0

Hi Sebastian,

On Fri, Apr 26, 2024 at 12:49:44PM +0200, Sebastian Ott wrote:
> Hej folks,
>
> I'm looking into supporting migration between 2 Ampere Altra (Max)
> machines (using Neoverse-N1). They are almost identical regarding
> their feature id register state except for CTR_EL0.DIC which is set
> on one machine but not the other.
>
> CTR_EL0 is currently marked as invariant and migrating a VM between
> those 2 machines using qemu fails.

I left some feedback on the series, but in addition to that would it be
possible to augment the set_id_regs selftest to exercise the CTR_EL0
mutability?

--
Thanks,
Oliver

2024-05-03 11:02:57

by Sebastian Ott

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] KVM: arm64: emulation for CTR_EL0

Hi Oliver,

On Wed, 1 May 2024, Oliver Upton wrote:
> On Fri, Apr 26, 2024 at 12:49:44PM +0200, Sebastian Ott wrote:
>> Hej folks,
>>
>> I'm looking into supporting migration between 2 Ampere Altra (Max)
>> machines (using Neoverse-N1). They are almost identical regarding
>> their feature id register state except for CTR_EL0.DIC which is set
>> on one machine but not the other.
>>
>> CTR_EL0 is currently marked as invariant and migrating a VM between
>> those 2 machines using qemu fails.
>
> I left some feedback on the series, but in addition to that would it be
> possible to augment the set_id_regs selftest to exercise the CTR_EL0
> mutability?

Yes, sure!

Thanks,
Sebastian


2024-05-03 11:03:41

by Sebastian Ott

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] KVM: arm64: show writable masks for feature registers

On Wed, 1 May 2024, Oliver Upton wrote:

> On Fri, Apr 26, 2024 at 12:49:49PM +0200, Sebastian Ott wrote:
>> Instead of using ~0UL provide the actual writable mask for
>> non-id feature registers in the output of the
>> KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.
>
> Urgh, yeah in retrospect I think we should've constrained this to the
> registers KVM considers "id regs" (the space occupied by known registers
> or otherwise RAZ).
>
>> Explicitely set the mask for CLIDR_EL1 to make sure we present
>> the same value to userspace than before.
>
> typo: explicitly
>
> Also, we know the set of mutable bits for CLIDR_EL1 as it is handled in
> set_clidr(). The mask really should match that, since the UAPI is
> documented as "allowing userspace to know what fields can be changed for
> the system register"

Done for the next revision.

Thanks,
Sebastian


2024-05-03 15:07:12

by Sebastian Ott

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] KVM: arm64: unify trap setup code

On Wed, 1 May 2024, Oliver Upton wrote:
> On Fri, Apr 26, 2024 at 12:49:46PM +0200, Sebastian Ott wrote:
>> There are 2 functions to set up traps via HCR_EL2:
>
> nitpick: these functions *calculate* the trap values, but do not
> actually set them up. HCR_EL2 doesn't get written to until further down
> the line on KVM_RUN.
>
>> + if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB)) {
>> + vcpu->arch.hcr_el2 |= HCR_FWB;
>> + } else {
>> + /*
>> + * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
>> + * get set in SCTLR_EL1 such that we can detect when the guest
>> + * MMU gets turned on and do the necessary cache maintenance
>> + * then.
>> + */
>> + vcpu->arch.hcr_el2 |= HCR_TVM;
>> + }
>
> It seems to me like calling this once for the lifetime of a vCPU will
> break non-FWB behavior.
>
> Like the comment suggests, these traps are needed to catch the moment
> the S1 MMU is turned on and do cache maintenance to make sure D$ agrees
> with what the guest was doing before enabling the MMU.
>
> KVM_ARM_VCPU_INIT resets SCTLR_EL1, but it seems we'd miss setting
> HCR_TVM in that case.

Ugh, I didn't think about KVM_ARM_VCPU_INIT being called more than once.

But in that case don't we loose the changes done to hcr_el2 in the current
code? E.g.:

void kvm_init_sysreg(struct kvm_vcpu *vcpu)
{
..
if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
vcpu->arch.hcr_el2 |= HCR_TTLBOS;
..
}

static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
..
}

Thanks,
Sebastian


2024-05-03 15:50:12

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

On Wed, 01 May 2024 09:15:09 +0100,
Oliver Upton <[email protected]> wrote:
>
> On Fri, Apr 26, 2024 at 12:49:48PM +0200, Sebastian Ott wrote:
> > CTR_EL0 is currently handled as an invariant register, thus
> > guests will be presented with the host value of that register.
> >
> > Add emulation for CTR_EL0 based on a per VM value. Userspace can
> > switch off DIC and IDC bits and reduce DminLine and IminLine sizes.
> >
> > When CTR_EL0 is changed validate that against CLIDR_EL1 and CCSIDR_EL1
> > to make sure we present the guest with consistent register values.
> > Changes that affect the generated cache topology values are allowed if
> > they don't clash with previous register writes.
>
> Sorry I didn't speak up earlier, but I'm not sold on the need to
> cross-validate userspace values for the cache type registers.
>
> KVM should only be concerned about whether or not the selected feature
> set matches what hardware is capable of and what KVM can virtualize. So
> in the context of the CTR and the cache topology, I feel that they
> should be _separately_ evaluated against the host's CTR_EL0.
>
> Inconsistencies between fields in userspace values should be out of
> scope; userspace shares the responsibility of presenting something
> architectural, especially if it starts modifying ID registers. Otherwise
> I'm quite worried about the amount of glue required to plumb exhaustive
> consitency checks for registers, especially considering the lack of
> ordering.
>
> Marc, I know this goes against what you had suggested earlier, is there
> something in particular that you think warrants the consistency
> checks?

The problem is that we have a dependency chain: individual cache
levels are validated against CLIDR/CCSIDR, which are themselves
validated against CTR_EL0.

Change one, and everything becomes inconsistent. I absolutely don't
trust userspace to do a good job on that, and not validating this will
result in extremely hard to debug issues in the guest. Which is why
CTR_EL0 was an invariant the first place, and everything derived from
it.

Take for example CLIDR_EL1.Lo{UU,UIS,C}. Their values depend on
CTR_EL0.{IDC,DIC}. SW is free to check one or the other. If you don't
have this dependency, you're in for some serious trouble.

The alternative is to *regenerate* the whole cache hierarchy when
CTR_EL0 is written, and too bad if it changes behind the guest's
back. Yes, the latter is a problem on its own...

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2024-05-03 17:28:12

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

On Fri, May 03, 2024 at 04:50:02PM +0100, Marc Zyngier wrote:
> > Marc, I know this goes against what you had suggested earlier, is there
> > something in particular that you think warrants the consistency
> > checks?
>
> The problem is that we have a dependency chain: individual cache
> levels are validated against CLIDR/CCSIDR, which are themselves
> validated against CTR_EL0.
>
> Change one, and everything becomes inconsistent. I absolutely don't
> trust userspace to do a good job on that

Violent agreement on this point, heh.

> and not validating this will result in extremely hard to debug issues
> in the guest. Which is why CTR_EL0 was an invariant the first place,
> and everything derived from it.

Sure, but userspace can completely hose the guest in tons of spectacular
ways, I don't see why feature ID registers require thorough
cross-checking of relationships between CPU features.

We already fail at this. Just looking at ID_AA64ISAR0_EL1, we do not
enforce any of the "FEAT_X implies FEAT_Y" relationships between all of
the crypto extensions. Userspace can also setup ID_AA64MMFR0_EL1 to
advertise that no translation granule is supported by the MMU.

I agree that KVM needs to sanitize feature ID registers against the
capabilities of hardware + KVM itself. Beyond that cross-checking
userspace against itself is difficult to get right, and I'm worried
about what the tangled mess will look like when we finish up the
plumbing for the whole feature ID space.

> Take for example CLIDR_EL1.Lo{UU,UIS,C}. Their values depend on
> CTR_EL0.{IDC,DIC}. SW is free to check one or the other. If you don't
> have this dependency, you're in for some serious trouble.

Right, we absolutely need to sanitize these against *hardware*, and
using CTR_EL0 definitely the way to go. Userspace cannot promise a
stricter cache coherency model than what's offered in hardware.

Making sure userspace's values for CLIDR_EL1 and CTR_EL0 agree with each
other shouldn't matter if we've determined hardware coherency is at least
as strict as the model described through these registers.

Without the cross-check, it would be possible for userspace to setup the
vCPU as:

- CTR_EL0.{IDC,DIC} = {1, 1}
- CLIDR_EL1.Lo{UU,UIS,C} = {1, 1, 1}

But we would only allow this if hardware was {IDC,DIC} = {1,1}. So while
the values presented to the guest aren't consistent with one another, it
seems in the worst case the guest will do I$ maintenance where it isn't
actually necessary.

--
Thanks,
Oliver

2024-05-08 15:25:53

by Sebastian Ott

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

Hej Oliver,

On Wed, 1 May 2024, Oliver Upton wrote:
>> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
>> +{
>> + vcpu->kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
>> + return vcpu->kvm->arch.ctr_el0;
>> +}
>> +
>
> We definitely do not want this value to change across a vCPU reset, it
> should be handled like the other ID registers where they only get reset
> once for the VM lifetime.

Hm, maybe I'm misreading the code here but I don't think this is true
for existing regs e.g. CLIDR_EL1 or the stuff defined via ID_WRITABLE().

Sebastian


2024-05-08 17:18:58

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] KVM: arm64: add emulation for CTR_EL0 register

On Wed, May 08, 2024 at 05:17:25PM +0200, Sebastian Ott wrote:
> Hej Oliver,
>
> On Wed, 1 May 2024, Oliver Upton wrote:
> > > +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
> > > +{
> > > + vcpu->kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> > > + return vcpu->kvm->arch.ctr_el0;
> > > +}
> > > +
> >
> > We definitely do not want this value to change across a vCPU reset, it
> > should be handled like the other ID registers where they only get reset
> > once for the VM lifetime.
>
> Hm, maybe I'm misreading the code here but I don't think this is true
> for existing regs e.g. CLIDR_EL1 or the stuff defined via ID_WRITABLE().

This works for the feature ID registers we maintain per-VM, but not for
the feature ID registers local to a vCPU. Sent some fixes out for this
but forgot to Cc you on it, apologies.

https://lore.kernel.org/kvmarm/[email protected]/

--
Thanks,
Oliver