Hi folks,
I'm looking into supporting migration between two Ampere Altra (Max)
machines (based on Neoverse-N1). They are almost identical with respect
to their feature ID register state, except for CTR_EL0.DIC, which is set
on one machine but not the other.
CTR_EL0 is currently marked as invariant, so migrating a VM between
those two machines using qemu fails.
I've sent this out as an RFC before [0]. What I've changed here,
based on the feedback I got from Marc:
* store the emulated value per VM and not per VCPU
* allow changing more values than just the DIC bit
* only trap guest access to that reg when needed
* make sure to not present the guest with an inconsistent register set
Patch 2 adds a ctr_el0 member to struct kvm_arch - would it be preferable
to add it to kvm->arch.id_regs[] instead (this would mean growing that
array 4x)?
Patch 3 resets CLIDR_EL1 after a write to CTR_EL0, potentially changing
the value of CLIDR_EL1 - would that be OK for userspace?
Thanks,
Sebastian
[0]: https://lore.kernel.org/all/[email protected]/T/
Sebastian Ott (4):
KVM: arm64: change return value in arm64_check_features()
KVM: arm64: maintain per VM value for CTR_EL0
KVM: arm64: add emulation for CTR_EL0 register
KVM: arm64: show writable masks for feature registers
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/sys_regs.c | 131 ++++++++++++++++++++----------
2 files changed, 89 insertions(+), 43 deletions(-)
--
2.42.0
arm64_check_features() returns -E2BIG to indicate the register's
feature set is a superset of the maximally-allowed register value.
The only caller of that function changes this to -EINVAL since
that's what userspace expects for invalid register writes.
In preparation for adding another caller of arm64_check_features()
that would need to do the same conversion, just return -EINVAL
directly.
Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9f4f387155f..131f5b0ca2b9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1445,7 +1445,7 @@ static s64 kvm_arm64_ftr_safe_value(u32 id, const struct arm64_ftr_bits *ftrp,
* considered the safe value regardless. For register fields that are not in
* writable, only the value in limit is considered the safe value.
*
- * Return: 0 if all the fields are safe. Otherwise, return negative errno.
+ * Return: 0 if all the fields are safe. Otherwise, return -EINVAL.
*/
static int arm64_check_features(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
@@ -1464,7 +1464,7 @@ static int arm64_check_features(struct kvm_vcpu *vcpu,
* only safe value is 0.
*/
if (sysreg_visible_as_raz(vcpu, rd))
- return val ? -E2BIG : 0;
+ return val ? -EINVAL : 0;
ftr_reg = get_arm64_ftr_reg(id);
if (!ftr_reg)
@@ -1490,12 +1490,12 @@ static int arm64_check_features(struct kvm_vcpu *vcpu,
safe_val = kvm_arm64_ftr_safe_value(id, ftrp, f_val, f_lim);
if (safe_val != f_val)
- return -E2BIG;
+ return -EINVAL;
}
/* For fields that are not writable, values in limit are the safe values. */
if ((val & ~mask) != (limit & ~mask))
- return -E2BIG;
+ return -EINVAL;
return 0;
}
@@ -1840,16 +1840,6 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
IDREG(vcpu->kvm, id) = val;
mutex_unlock(&vcpu->kvm->arch.config_lock);
-
- /*
- * arm64_check_features() returns -E2BIG to indicate the register's
- * feature set is a superset of the maximally-allowed register value.
- * While it would be nice to precisely describe this to userspace, the
- * existing UAPI for KVM_SET_ONE_REG has it that invalid register
- * writes return -EINVAL.
- */
- if (ret == -E2BIG)
- ret = -EINVAL;
return ret;
}
--
2.42.0
In preparation for CTR_EL0 emulation, maintain a per-VM value for this
register and use it where appropriate.
Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/sys_regs.c | 22 +++++++++++++++-------
2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..481216febb46 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_arch {
/* PMCR_EL0.N value for the guest */
u8 pmcr_n;
+ u64 ctr_el0;
/* Iterator for idreg debugfs */
u8 idreg_debugfs_iter;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 131f5b0ca2b9..4d29b1a0842d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -215,13 +215,21 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
#define CSSELR_MAX 14
+static u64 kvm_get_ctr_el0(struct kvm *kvm)
+{
+ if (kvm->arch.ctr_el0)
+ return kvm->arch.ctr_el0;
+
+ return read_sanitised_ftr_reg(SYS_CTR_EL0);
+}
+
/*
* Returns the minimum line size for the selected cache, expressed as
* Log2(bytes).
*/
-static u8 get_min_cache_line_size(bool icache)
+static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
{
- u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 ctr = kvm_get_ctr_el0(kvm);
u8 field;
if (icache)
@@ -248,7 +256,7 @@ static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
if (vcpu->arch.ccsidr)
return vcpu->arch.ccsidr[csselr];
- line_size = get_min_cache_line_size(csselr & CSSELR_EL1_InD);
+ line_size = get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD);
/*
* Fabricate a CCSIDR value as the overriding value does not exist.
@@ -283,7 +291,7 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
u32 i;
if ((val & CCSIDR_EL1_RES0) ||
- line_size < get_min_cache_line_size(csselr & CSSELR_EL1_InD))
+ line_size < get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD))
return -EINVAL;
if (!ccsidr) {
@@ -1862,7 +1870,7 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write)
return write_to_read_only(vcpu, p, r);
- p->regval = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ p->regval = kvm_get_ctr_el0(vcpu->kvm);
return true;
}
@@ -1882,7 +1890,7 @@ static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
*/
static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
- u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 ctr_el0 = kvm_get_ctr_el0(vcpu->kvm);
u64 clidr;
u8 loc;
@@ -1935,7 +1943,7 @@ static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
u64 val)
{
- u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 ctr_el0 = kvm_get_ctr_el0(vcpu->kvm);
u64 idc = !CLIDR_LOC(val) || (!CLIDR_LOUIS(val) && !CLIDR_LOUU(val));
if ((val & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
--
2.42.0
Instead of using ~0UL, provide the actual writable mask for
non-ID feature registers in the output of the
KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.
Explicitly set the mask for CLIDR_EL1 to make sure we present
the same value to userspace as before.
This changes the mask for the CTR_EL0 register in the output
of the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.
Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b0ba292259f9..947aa9c0784f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2505,7 +2505,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
{ SYS_DESC(SYS_CLIDR_EL1), access_clidr, reset_clidr, CLIDR_EL1,
- .set_user = set_clidr },
+ .set_user = set_clidr, .val = ~0UL },
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
@@ -4059,20 +4059,11 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
if (!is_feature_id_reg(encoding) || !reg->set_user)
continue;
- /*
- * For ID registers, we return the writable mask. Other feature
- * registers return a full 64bit mask. That's not necessary
- * compliant with a given revision of the architecture, but the
- * RES0/RES1 definitions allow us to do that.
- */
- if (is_id_reg(encoding)) {
- if (!reg->val ||
- (is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0()))
- continue;
- val = reg->val;
- } else {
- val = ~0UL;
+ if (!reg->val ||
+ (is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0())) {
+ continue;
}
+ val = reg->val;
if (put_user(val, (masks + KVM_ARM_FEATURE_ID_RANGE_INDEX(encoding))))
return -EFAULT;
--
2.42.0
CTR_EL0 is currently handled as an invariant register, thus
guests will be presented with the host value of that register.
Add emulation for CTR_EL0 based on a per VM value.
When CTR_EL0 is changed the reset function for CLIDR_EL1 is
called to make sure we present the guest with consistent
register values.
Signed-off-by: Sebastian Ott <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 72 ++++++++++++++++++++++++++++++++++-----
1 file changed, 64 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4d29b1a0842d..b0ba292259f9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1874,6 +1874,55 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}
+static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
+{
+ vcpu->kvm->arch.ctr_el0 = 0;
+ return kvm_get_ctr_el0(vcpu->kvm);
+}
+
+static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 *val)
+{
+ *val = kvm_get_ctr_el0(vcpu->kvm);
+ return 0;
+}
+
+static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
+
+static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 val)
+{
+ u64 host_val = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 old_val = kvm_get_ctr_el0(vcpu->kvm);
+ const struct sys_reg_desc *clidr_el1;
+ int ret;
+
+ if (val == old_val)
+ return 0;
+
+ if (kvm_vm_has_ran_once(vcpu->kvm))
+ return -EBUSY;
+
+ mutex_lock(&vcpu->kvm->arch.config_lock);
+ ret = arm64_check_features(vcpu, rd, val);
+ if (ret) {
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+ return ret;
+ }
+ if (val != host_val)
+ vcpu->kvm->arch.ctr_el0 = val;
+ else
+ vcpu->kvm->arch.ctr_el0 = 0;
+
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+
+ clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
+ if (clidr_el1)
+ clidr_el1->reset(vcpu, clidr_el1);
+
+ return 0;
+}
+
static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -2460,7 +2509,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
- { SYS_DESC(SYS_CTR_EL0), access_ctr },
+ { SYS_DESC(SYS_CTR_EL0), access_ctr, .reset = reset_ctr,
+ .get_user = get_ctr, .set_user = set_ctr, .val = (CTR_EL0_DIC_MASK |
+ CTR_EL0_IDC_MASK |
+ CTR_EL0_DminLine_MASK |
+ CTR_EL0_IminLine_MASK)},
{ SYS_DESC(SYS_SVCR), undef_access },
{ PMU_SYS_REG(PMCR_EL0), .access = access_pmcr, .reset = reset_pmcr,
@@ -3623,6 +3676,13 @@ static bool index_to_params(u64 id, struct sys_reg_params *params)
}
}
+static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding)
+{
+ struct sys_reg_params params = encoding_to_params(encoding);
+
+ return find_reg(&params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
+}
+
const struct sys_reg_desc *get_reg_by_id(u64 id,
const struct sys_reg_desc table[],
unsigned int num)
@@ -3676,18 +3736,11 @@ FUNCTION_INVARIANT(midr_el1)
FUNCTION_INVARIANT(revidr_el1)
FUNCTION_INVARIANT(aidr_el1)
-static u64 get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
-{
- ((struct sys_reg_desc *)r)->val = read_sanitised_ftr_reg(SYS_CTR_EL0);
- return ((struct sys_reg_desc *)r)->val;
-}
-
/* ->val is filled in by kvm_sys_reg_table_init() */
static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
{ SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
{ SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
- { SYS_DESC(SYS_CTR_EL0), NULL, get_ctr_el0 },
};
static int get_invariant_sys_reg(u64 id, u64 __user *uaddr)
@@ -4049,6 +4102,9 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
}
+ if (vcpu->kvm->arch.ctr_el0)
+ vcpu->arch.hcr_el2 |= HCR_TID2;
+
if (test_bit(KVM_ARCH_FLAG_FGU_INITIALIZED, &kvm->arch.flags))
goto out;
--
2.42.0
On Fri, 05 Apr 2024 13:01:06 +0100,
Sebastian Ott <[email protected]> wrote:
>
> In preparation for CTR_EL0 emulation, maintain a per-VM value for this
> register and use it where appropriate.
>
> Signed-off-by: Sebastian Ott <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/sys_regs.c | 22 +++++++++++++++-------
> 2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 9e8a496fb284..481216febb46 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -318,6 +318,7 @@ struct kvm_arch {
>
> /* PMCR_EL0.N value for the guest */
> u8 pmcr_n;
> + u64 ctr_el0;
>
> /* Iterator for idreg debugfs */
> u8 idreg_debugfs_iter;
Please consider the alignment of the fields. This leaves a 7 byte hole
that could be avoided (yes, I'm on a mission to reduce the size of the
various structures, because they are absolute pigs).
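For illustration, a minimal sketch of one placement that avoids the hole by
keeping the u64 on an 8-byte boundary and grouping the u8 fields together.
Only the members visible in the hunk above are shown; the real neighbouring
fields may differ, so treat this as an assumption about the layout:

struct kvm_arch {
	/* ... */

	/* Guest's view of CTR_EL0 */
	u64 ctr_el0;

	/* PMCR_EL0.N value for the guest */
	u8 pmcr_n;

	/* Iterator for idreg debugfs */
	u8 idreg_debugfs_iter;

	/* ... */
};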
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 131f5b0ca2b9..4d29b1a0842d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -215,13 +215,21 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
> /* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
> #define CSSELR_MAX 14
>
> +static u64 kvm_get_ctr_el0(struct kvm *kvm)
> +{
> + if (kvm->arch.ctr_el0)
> + return kvm->arch.ctr_el0;
Is this relying on some bits not being 0?
> +
> + return read_sanitised_ftr_reg(SYS_CTR_EL0);
Why isn't the shadow value always populated?
> +}
> +
> /*
> * Returns the minimum line size for the selected cache, expressed as
> * Log2(bytes).
> */
> -static u8 get_min_cache_line_size(bool icache)
> +static u8 get_min_cache_line_size(struct kvm *kvm, bool icache)
> {
> - u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + u64 ctr = kvm_get_ctr_el0(kvm);
> u8 field;
>
> if (icache)
> @@ -248,7 +256,7 @@ static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> if (vcpu->arch.ccsidr)
> return vcpu->arch.ccsidr[csselr];
>
> - line_size = get_min_cache_line_size(csselr & CSSELR_EL1_InD);
> + line_size = get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD);
>
> /*
> * Fabricate a CCSIDR value as the overriding value does not exist.
> @@ -283,7 +291,7 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> u32 i;
>
> if ((val & CCSIDR_EL1_RES0) ||
> - line_size < get_min_cache_line_size(csselr & CSSELR_EL1_InD))
> + line_size < get_min_cache_line_size(vcpu->kvm, csselr & CSSELR_EL1_InD))
> return -EINVAL;
>
> if (!ccsidr) {
> @@ -1862,7 +1870,7 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> if (p->is_write)
> return write_to_read_only(vcpu, p, r);
>
> - p->regval = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + p->regval = kvm_get_ctr_el0(vcpu->kvm);
> return true;
> }
>
> @@ -1882,7 +1890,7 @@ static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> */
> static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> {
> - u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + u64 ctr_el0 = kvm_get_ctr_el0(vcpu->kvm);
> u64 clidr;
> u8 loc;
>
> @@ -1935,7 +1943,7 @@ static u64 reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> u64 val)
> {
> - u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + u64 ctr_el0 = kvm_get_ctr_el0(vcpu->kvm);
> u64 idc = !CLIDR_LOC(val) || (!CLIDR_LOUIS(val) && !CLIDR_LOUU(val));
>
> if ((val & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Fri, 05 Apr 2024 13:01:07 +0100,
Sebastian Ott <[email protected]> wrote:
>
> CTR_EL0 is currently handled as an invariant register, thus
> guests will be presented with the host value of that register.
> Add emulation for CTR_EL0 based on a per VM value.
>
> When CTR_EL0 is changed the reset function for CLIDR_EL1 is
> called to make sure we present the guest with consistent
> register values.
Isn't that a change in the userspace ABI? You are now creating an
explicit ordering between the write to CTR_EL0 and the rest of the
cache hierarchy registers. It has the obvious capacity to lead to the
wrong result in a silent way...
>
> Signed-off-by: Sebastian Ott <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 72 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4d29b1a0842d..b0ba292259f9 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1874,6 +1874,55 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> return true;
> }
>
> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
> +{
> + vcpu->kvm->arch.ctr_el0 = 0;
> + return kvm_get_ctr_el0(vcpu->kvm);
I'd expect the cached value to be reset instead of being set to
0. What are you achieving by this?
> +}
> +
> +static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + u64 *val)
> +{
> + *val = kvm_get_ctr_el0(vcpu->kvm);
> + return 0;
> +}
> +
> +static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
> +
> +static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + u64 val)
> +{
> + u64 host_val = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + u64 old_val = kvm_get_ctr_el0(vcpu->kvm);
> + const struct sys_reg_desc *clidr_el1;
> + int ret;
> +
> + if (val == old_val)
> + return 0;
> +
> + if (kvm_vm_has_ran_once(vcpu->kvm))
> + return -EBUSY;
> +
> + mutex_lock(&vcpu->kvm->arch.config_lock);
> + ret = arm64_check_features(vcpu, rd, val);
> + if (ret) {
> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> + return ret;
> + }
> + if (val != host_val)
> + vcpu->kvm->arch.ctr_el0 = val;
> + else
> + vcpu->kvm->arch.ctr_el0 = 0;
> +
> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> +
> + clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
> + if (clidr_el1)
> + clidr_el1->reset(vcpu, clidr_el1);
> +
> + return 0;
No check against what can be changed, and in what direction? You seem
to be allowing a guest to migrate from a host with IDC==1 to one where
IDC==0 (same for DIC). How can that work? Same for the cache lines,
which can be larger on the target... How will the guest survive that?
> +}
> +
> static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> @@ -2460,7 +2509,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
> { SYS_DESC(SYS_SMIDR_EL1), undef_access },
> { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
> - { SYS_DESC(SYS_CTR_EL0), access_ctr },
> + { SYS_DESC(SYS_CTR_EL0), access_ctr, .reset = reset_ctr,
> + .get_user = get_ctr, .set_user = set_ctr, .val = (CTR_EL0_DIC_MASK |
> + CTR_EL0_IDC_MASK |
> + CTR_EL0_DminLine_MASK |
> + CTR_EL0_IminLine_MASK)},
> { SYS_DESC(SYS_SVCR), undef_access },
>
> { PMU_SYS_REG(PMCR_EL0), .access = access_pmcr, .reset = reset_pmcr,
> @@ -3623,6 +3676,13 @@ static bool index_to_params(u64 id, struct sys_reg_params *params)
> }
> }
>
> +static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding)
> +{
> + struct sys_reg_params params = encoding_to_params(encoding);
> +
> >> + return find_reg(&params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
> +}
> +
> const struct sys_reg_desc *get_reg_by_id(u64 id,
> const struct sys_reg_desc table[],
> unsigned int num)
> @@ -3676,18 +3736,11 @@ FUNCTION_INVARIANT(midr_el1)
> FUNCTION_INVARIANT(revidr_el1)
> FUNCTION_INVARIANT(aidr_el1)
>
> -static u64 get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
> -{
> - ((struct sys_reg_desc *)r)->val = read_sanitised_ftr_reg(SYS_CTR_EL0);
> - return ((struct sys_reg_desc *)r)->val;
> -}
> -
> /* ->val is filled in by kvm_sys_reg_table_init() */
> static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
> { SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
> { SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
> { SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
> - { SYS_DESC(SYS_CTR_EL0), NULL, get_ctr_el0 },
> };
>
> static int get_invariant_sys_reg(u64 id, u64 __user *uaddr)
> @@ -4049,6 +4102,9 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
> vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
> }
>
> + if (vcpu->kvm->arch.ctr_el0)
> + vcpu->arch.hcr_el2 |= HCR_TID2;
Why trap CTR_EL0 if the values are the same as the host? I really
dislike the use of the value 0 as such an indication. Why isn't this
grouped with the traps in vcpu_reset_hcr()?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Sat, 13 Apr 2024, Marc Zyngier wrote:
> On Fri, 05 Apr 2024 13:01:06 +0100,
> Sebastian Ott <[email protected]> wrote:
>>
>> In preparation for CTR_EL0 emulation, maintain a per-VM value for this
>> register and use it where appropriate.
>>
>> Signed-off-by: Sebastian Ott <[email protected]>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 1 +
>> arch/arm64/kvm/sys_regs.c | 22 +++++++++++++++-------
>> 2 files changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 9e8a496fb284..481216febb46 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -318,6 +318,7 @@ struct kvm_arch {
>>
>> /* PMCR_EL0.N value for the guest */
>> u8 pmcr_n;
>> + u64 ctr_el0;
>>
>> /* Iterator for idreg debugfs */
>> u8 idreg_debugfs_iter;
>
> Please consider the alignment of the fields. This leaves a 7 byte hole
> that could be avoided (yes, I'm on a mission to reduce the size of the
> various structures, because they are absolute pigs).
OK.
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 131f5b0ca2b9..4d29b1a0842d 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -215,13 +215,21 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>> /* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
>> #define CSSELR_MAX 14
>>
>> +static u64 kvm_get_ctr_el0(struct kvm *kvm)
>> +{
>> + if (kvm->arch.ctr_el0)
>> + return kvm->arch.ctr_el0;
>
> Is this relying on some bits not being 0?
>
>> +
>> + return read_sanitised_ftr_reg(SYS_CTR_EL0);
>
> Why isn't the shadow value always populated?
The idea was for kvm->arch.ctr_el0 to be non-zero only if userspace
set it up to differ from the host value. That way it can be used to decide
whether we need to set up a trap for the reg access (without comparing it
to the host value again).
Thanks,
Sebastian
On Sat, 13 Apr 2024, Marc Zyngier wrote:
> On Fri, 05 Apr 2024 13:01:07 +0100,
> Sebastian Ott <[email protected]> wrote:
>>
>> CTR_EL0 is currently handled as an invariant register, thus
>> guests will be presented with the host value of that register.
>> Add emulation for CTR_EL0 based on a per VM value.
>>
>> When CTR_EL0 is changed the reset function for CLIDR_EL1 is
>> called to make sure we present the guest with consistent
>> register values.
>
> Isn't that a change in the userspace ABI? You are now creating an
> explicit ordering between the write to CTR_EL0 and the rest of the
> cache hierarchy registers. It has the obvious capacity to lead to the
> wrong result in a silent way...
Yeah, that's why I asked in the cover letter whether userspace would be
OK with that. I thought this was what you suggested in your reply
to the RFC.
(https://lore.kernel.org/linux-arm-kernel/[email protected]/T/#m0aea5b744774f123bd9a4a4b7be6878f6c737f63)
But I guess I've got that wrong.
Do we have other means to handle the dependencies between registers?
Allow inconsistent values and do a sanity check before the first
vcpu_run()?
>>
>> Signed-off-by: Sebastian Ott <[email protected]>
>> ---
>> arch/arm64/kvm/sys_regs.c | 72 ++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 64 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 4d29b1a0842d..b0ba292259f9 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1874,6 +1874,55 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>> return true;
>> }
>>
>> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
>> +{
>> + vcpu->kvm->arch.ctr_el0 = 0;
>> + return kvm_get_ctr_el0(vcpu->kvm);
>
> I'd expect the cached value to be reset instead of being set to
> 0. What are you achieving by this?
The idea was that kvm->arch.ctr_el0 == 0 means we use the host value and
don't set up a trap.
>> +}
>> +
>> +static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
>> + u64 *val)
>> +{
>> + *val = kvm_get_ctr_el0(vcpu->kvm);
>> + return 0;
>> +}
>> +
>> +static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
>> +
>> +static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
>> + u64 val)
>> +{
>> + u64 host_val = read_sanitised_ftr_reg(SYS_CTR_EL0);
>> + u64 old_val = kvm_get_ctr_el0(vcpu->kvm);
>> + const struct sys_reg_desc *clidr_el1;
>> + int ret;
>> +
>> + if (val == old_val)
>> + return 0;
>> +
>> + if (kvm_vm_has_ran_once(vcpu->kvm))
>> + return -EBUSY;
>> +
>> + mutex_lock(&vcpu->kvm->arch.config_lock);
>> + ret = arm64_check_features(vcpu, rd, val);
>> + if (ret) {
>> + mutex_unlock(&vcpu->kvm->arch.config_lock);
>> + return ret;
>> + }
>> + if (val != host_val)
>> + vcpu->kvm->arch.ctr_el0 = val;
>> + else
>> + vcpu->kvm->arch.ctr_el0 = 0;
>> +
>> + mutex_unlock(&vcpu->kvm->arch.config_lock);
>> +
>> + clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
>> + if (clidr_el1)
>> + clidr_el1->reset(vcpu, clidr_el1);
>> +
>> + return 0;
>
> No check against what can be changed, and in what direction? You seem
> to be allowing a guest to migrate from a host with IDC==1 to one where
> IDC==0 (same for DIC). How can that work? Same for the cache lines,
> which can be larger on the target... How will the guest survive that?
Shouldn't this all be handled by arm64_check_features() using the safe
value definitions from ftr_ctr? (I'll double check that..)
>> @@ -4049,6 +4102,9 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
>> vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
>> }
>>
>> + if (vcpu->kvm->arch.ctr_el0)
>> + vcpu->arch.hcr_el2 |= HCR_TID2;
>
> Why trap CTR_EL0 if the values are the same as the host?
For values same as host vcpu->kvm->arch.ctr_el0 would be zero and
reg access would not be trapped.
> I really dislike the use of the value 0 as such an indication.
OK.
> Why isn't this grouped with the traps in vcpu_reset_hcr()?
I was under the impression that vcpu_reset_hcr() is called too early
to decide if we need to set up a trap or not (but lemme double check
that).
Thanks,
Sebastian
On Sat, 13 Apr 2024 14:50:42 +0100,
Sebastian Ott <[email protected]> wrote:
>
> On Sat, 13 Apr 2024, Marc Zyngier wrote:
>
> > On Fri, 05 Apr 2024 13:01:07 +0100,
> > Sebastian Ott <[email protected]> wrote:
> >>
> >> CTR_EL0 is currently handled as an invariant register, thus
> >> guests will be presented with the host value of that register.
> >> Add emulation for CTR_EL0 based on a per VM value.
> >>
> >> When CTR_EL0 is changed the reset function for CLIDR_EL1 is
> >> called to make sure we present the guest with consistent
> >> register values.
> >
> > Isn't that a change in the userspace ABI? You are now creating an
> > explicit ordering between the write to CTR_EL0 and the rest of the
> > cache hierarchy registers. It has the obvious capacity to lead to the
> > wrong result in a silent way...
>
> Yea, that's why I've asked in the cover letter if userspace would be
> ok with that. I thought that this is what you suggested in your reply
> to the
> RFC. (https://lore.kernel.org/linux-arm-kernel/[email protected]/T/#m0aea5b744774f123bd9a4a4b7be6878f6c737f63)
>
> But I guess I've got that wrong.
Not wrong, just incomplete. I think it is fine to recompute the cache
topology if there is no restored cache state at the point where
CTR_EL0 is written. However, if a topology has been restored and it is
incompatible with the write to CTR_EL0, the write must fail. The
ugly part is that the CCSIDR array is per vcpu and not per VM.
> Do we have other means to handle the dependencies between registers?
> Allow inconsistent values and do a sanity check before the first
> vcpu_run()?
Failing on vcpu_run() is the worst kind of failure, because you can't
easily find the failure cause, other than by looking at each single
register, trying to spot the inconsistency.
In the case at hand, I think validating CTR_EL0 against the currently
visible topology is the right thing to do.
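As an illustration only (not a tested patch), a rough sketch of such a
check, reusing definitions already visible in this series (CSSELR_MAX,
vcpu->arch.ccsidr, CSSELR_EL1_InD, the CTR_EL0 line-size convention from
get_min_cache_line_size()) and assuming FIELD_GET/SYS_FIELD_GET and
CCSIDR_EL1_LineSize from the existing headers:

static int kvm_check_ctr_vs_restored_topology(struct kvm *kvm, u64 new_ctr)
{
	struct kvm_vcpu *vcpu;
	unsigned long i;
	u32 csselr;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		/* only vcpus with a userspace-restored CCSIDR array matter */
		if (!vcpu->arch.ccsidr)
			continue;

		for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
			bool icache = csselr & CSSELR_EL1_InD;
			/* Log2(bytes), as in get_min_cache_line_size() */
			u8 min_line = (icache ?
				       SYS_FIELD_GET(CTR_EL0, IminLine, new_ctr) :
				       SYS_FIELD_GET(CTR_EL0, DminLine, new_ctr)) + 2;
			u8 line = FIELD_GET(CCSIDR_EL1_LineSize,
					    vcpu->arch.ccsidr[csselr]) + 4;

			/* restored cache level reports a smaller line than
			 * the new CTR_EL0 minimum -> inconsistent topology */
			if (line < min_line)
				return -EINVAL;
		}
	}
	return 0;
}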
>
> >>
> >> Signed-off-by: Sebastian Ott <[email protected]>
> >> ---
> >> arch/arm64/kvm/sys_regs.c | 72 ++++++++++++++++++++++++++++++++++-----
> >> 1 file changed, 64 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 4d29b1a0842d..b0ba292259f9 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -1874,6 +1874,55 @@ static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> >> return true;
> >> }
> >>
> >> +static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
> >> +{
> >> + vcpu->kvm->arch.ctr_el0 = 0;
> >> + return kvm_get_ctr_el0(vcpu->kvm);
> >
> > I'd expect the cached value to be reset instead of being set to
> > 0. What are you achieving by this?
>
> The idea was that kvm->arch.ctr_el0 == 0 means we use the host value and
> don't set up a trap.
I'd rather you keep the shadow register at a valid value at all times,
and simply compare it to the HW-provided version to decide whether you
need to trap. The main reason is that we don't know how the
architecture will evolve, and CTR_EL0==0 may become a legal value in
the future (unlikely, but that's outside of our control).
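For illustration, a minimal sketch of that scheme (one possible shape, not
the code from this series): always populate the shadow value at reset and
derive the trap decision by comparing against the host value:

static u64 reset_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd)
{
	/* keep a valid shadow value at all times instead of a 0 sentinel */
	vcpu->kvm->arch.ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
	return vcpu->kvm->arch.ctr_el0;
}

static bool kvm_ctr_differs_from_host(struct kvm *kvm)
{
	return kvm->arch.ctr_el0 != read_sanitised_ftr_reg(SYS_CTR_EL0);
}

/* and wherever the traps end up being set up: */
	if (kvm_ctr_differs_from_host(vcpu->kvm))
		vcpu->arch.hcr_el2 |= HCR_TID2;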
>
> >> +}
> >> +
> >> +static int get_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> >> + u64 *val)
> >> +{
> >> + *val = kvm_get_ctr_el0(vcpu->kvm);
> >> + return 0;
> >> +}
> >> +
> >> +static const struct sys_reg_desc *get_sys_reg_desc(u32 encoding);
> >> +
> >> +static int set_ctr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> >> + u64 val)
> >> +{
> >> + u64 host_val = read_sanitised_ftr_reg(SYS_CTR_EL0);
> >> + u64 old_val = kvm_get_ctr_el0(vcpu->kvm);
> >> + const struct sys_reg_desc *clidr_el1;
> >> + int ret;
> >> +
> >> + if (val == old_val)
> >> + return 0;
> >> +
> >> + if (kvm_vm_has_ran_once(vcpu->kvm))
> >> + return -EBUSY;
> >> +
> >> + mutex_lock(&vcpu->kvm->arch.config_lock);
> >> + ret = arm64_check_features(vcpu, rd, val);
> >> + if (ret) {
> >> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> >> + return ret;
> >> + }
> >> + if (val != host_val)
> >> + vcpu->kvm->arch.ctr_el0 = val;
> >> + else
> >> + vcpu->kvm->arch.ctr_el0 = 0;
> >> +
> >> + mutex_unlock(&vcpu->kvm->arch.config_lock);
> >> +
> >> + clidr_el1 = get_sys_reg_desc(SYS_CLIDR_EL1);
> >> + if (clidr_el1)
> >> + clidr_el1->reset(vcpu, clidr_el1);
> >> +
> >> + return 0;
> >
> > No check against what can be changed, and in what direction? You seem
> > to be allowing a guest to migrate from a host with IDC==1 to one where
> > IDC==0 (same for DIC). How can that work? Same for the cache lines,
> > which can be larger on the target... How will the guest survive that?
>
> Shouldn't this all be handled by arm64_check_features() using the safe
> value definitions from ftr_ctr? (I'll double check that..)
I think I may have read the code the wrong way. IDC/DIC should be OK
due to the feature check. I'm not sure about the line-size fields
though, and we should make sure that only a *smaller* line size is
allowed.
Then, there is the case of all the other fields. TminLine should get
the same treatment as the other cache line size fields, with the
additional constraint that it should be RES0 if the guest isn't MTE
aware. CWG and ERG are other interesting cases, and I don't think they
should be writable (your patch looks correct in that respect).
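A sketch of the kind of explicit check described above (assuming FIELD_GET
and the CTR_EL0_*_MASK definitions; the DminLine/IminLine part may already
be covered by arm64_check_features() via the ftr_ctr safe-value rules):

static int check_ctr_line_sizes(u64 val, u64 host_val)
{
	/* only an equal or smaller line size than the host's is safe */
	if (FIELD_GET(CTR_EL0_DminLine_MASK, val) >
	    FIELD_GET(CTR_EL0_DminLine_MASK, host_val))
		return -EINVAL;

	if (FIELD_GET(CTR_EL0_IminLine_MASK, val) >
	    FIELD_GET(CTR_EL0_IminLine_MASK, host_val))
		return -EINVAL;

	/* TminLine would need the same treatment, plus RES0 when !MTE */
	return 0;
}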
>
> >> @@ -4049,6 +4102,9 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
> >> vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
> >> }
> >>
> >> + if (vcpu->kvm->arch.ctr_el0)
> >> + vcpu->arch.hcr_el2 |= HCR_TID2;
> >
> > Why trap CTR_EL0 if the values are the same as the host?
>
> For values same as host vcpu->kvm->arch.ctr_el0 would be zero and
> reg access would not be trapped.
>
> > I really dislike the use of the value 0 as such an indication.
>
> OK.
>
> > Why isn't this grouped with the traps in vcpu_reset_hcr()?
>
> I was under the impression that vcpu_reset_hcr() is called too early
> to decide if we need to set up a trap or not (but lemme double check
> that).
Could well be (it is probably decided at vcpu init time), but in that
case it could be worth moving all the TID2/TID4 trapping together.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.