Changes from v3:
- Main change is in patch #4, where the VMID is now set to an
invalid one on vCPU schedule out. Introduced an
INVALID_ACTIVE_VMID, which is basically VMID 0 with generation 1.
Since the basic allocator algorithm reserves vmid #0, it is never
used as an active VMID. This (hopefully) fixes the issue of
unnecessarily reserving VMID space with active_vmids when those
VMs are no longer active[0], and at the same time addresses the
problem noted in v3 wherein everything ends up in the slow path[1].
(A small sketch of the encoding is included below.)
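For reference, here is a minimal user-space sketch of how the
generation/VMID packing and the invalid value fit together. The 16-bit
width and the macro values below are assumptions made for illustration
only; in the patches the width comes from kvm_get_vmid_bits() and the
invalid value is the VMID_ACTIVE_INVALID define introduced in patch #4.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: assumes a 16-bit VMID space. */
#define VMID_BITS		16
#define VMID_MASK		(~((1UL << VMID_BITS) - 1))	/* generation bits */
#define VMID_FIRST_VERSION	(1UL << VMID_BITS)		/* generation 1, vmid 0 */

/*
 * vmid #0 is never handed out (the allocator reserves index 0 and its
 * bitmap searches start at index 1), so this value can never collide
 * with a real active VMID.
 */
#define INVALID_ACTIVE_VMID	VMID_FIRST_VERSION

int main(void)
{
	uint64_t id = VMID_FIRST_VERSION | 5;	/* generation 1, vmid 5 */

	printf("generation %llu, vmid %llu\n",
	       (unsigned long long)(id >> VMID_BITS),
	       (unsigned long long)(id & ~VMID_MASK));
	assert((INVALID_ACTIVE_VMID & ~VMID_MASK) == 0);	/* vmid part is 0 */
	return 0;
}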
Testing:
-Ran with the VMID bits set to 4 and maxcpus set to 8 on D06. The test
involves concurrently running 50 guests with 4 vCPUs each. Each
guest then executes hackbench 5 times before exiting.
No crash was observed during a 4-day continuous run.
The latest branch is here,
https://github.com/hisilicon/kernel-dev/tree/private-v5.16-rc1-vmid-v4
-TLA+ model. Modified the asidalloc model to incorporate the new
VMID algo. The main differences are,
-flush_tlb_all() instead of local_tlb_flush_all() on rollover.
-Introduced INVALID_VMID and vCPU Sched Out logic.
-No CnP (Removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
-Removed the UniqueVMIDPerCPU invariant for now, as it looks like
the speculative fetching combined with flush_tlb_all() opens a
small window where the invariant violation gets triggered. If I
change the logic back to local_flush_tlb_all(), UniqueVMIDPerCPU
seems to be fine. With my limited knowledge of TLA+ modelling, it
is not clear to me whether this is a problem with the above logic
or with the VMID model implementation. I would really appreciate
any help with the model.
The initial VMID TLA+ model is here,
https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
Please take a look and let me know.
Thanks,
Shameer
[0] https://lore.kernel.org/kvmarm/20210721160614.GC11003@willie-the-truck/
[1] https://lore.kernel.org/kvmarm/20210803114034.GB30853@willie-the-truck/
History:
--------
v2 --> v3
-Dropped adding a new static key and cpufeature for retrieving
supported VMID bits. Instead, we now make use of the
kvm_arm_vmid_bits variable (patch #2).
-Since we expect less frequent rollover in the case of VMIDs,
the TLB invalidation is now broadcast on rollover instead
of keeping per-CPU flush_pending info and issuing a local
context flush.
-Clear active_vmids on vCPU schedule out to avoid unnecessarily
reserving the VMID space (patch #3).
-I have kept the struct kvm_vmid as it is for now (instead of a
typedef as suggested), as we may soon add another variable to
it when we introduce Pinned KVM VMID support.
RFCv1 --> v2
-Dropped "pinned VMID" support for now.
-Dropped RFC tag.
RFCv1
https://lore.kernel.org/kvmarm/[email protected]/
Julien Grall (1):
KVM: arm64: Align the VMID allocation with the arm64 ASID
Shameer Kolothum (3):
KVM: arm64: Introduce a new VMID allocator for KVM
KVM: arm64: Make VMID bits accessible outside of allocator
KVM: arm64: Make active_vmids invalid on vCPU schedule out
arch/arm64/include/asm/kvm_host.h | 10 +-
arch/arm64/include/asm/kvm_mmu.h | 4 +-
arch/arm64/kernel/image-vars.h | 3 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 106 +++-----------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 3 +-
arch/arm64/kvm/mmu.c | 1 -
arch/arm64/kvm/vmid.c | 196 ++++++++++++++++++++++++++
8 files changed, 228 insertions(+), 97 deletions(-)
create mode 100644 arch/arm64/kvm/vmid.c
--
2.17.1
A new VMID allocator for arm64 KVM use. This is based on
arm64 ASID allocator algorithm.
One major deviation from the ASID allocator is the way we
flush the context. Unlike ASID allocator, we expect less
frequent rollover in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending and issuing a local context
invalidation on the next context switch, we broadcast TLB
flush + I-cache invalidation over the inner shareable domain
on rollover.
Signed-off-by: Shameer Kolothum <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/kvm/vmid.c | 177 ++++++++++++++++++++++++++++++
2 files changed, 181 insertions(+)
create mode 100644 arch/arm64/kvm/vmid.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a5f7f38006f..f4a86a79ea4a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -690,6 +690,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
+int kvm_arm_vmid_alloc_init(void);
+void kvm_arm_vmid_alloc_free(void);
+void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
{
vcpu_arch->steal.base = GPA_INVALID;
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
new file mode 100644
index 000000000000..aa01c97f7df0
--- /dev/null
+++ b/arch/arm64/kvm/vmid.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VMID allocator.
+ *
+ * Based on Arm64 ASID allocator algorithm.
+ * Please refer arch/arm64/mm/context.c for detailed
+ * comments on algorithm.
+ *
+ * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
+
+static unsigned int kvm_arm_vmid_bits;
+static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
+
+static atomic64_t vmid_generation;
+static unsigned long *vmid_map;
+
+static DEFINE_PER_CPU(atomic64_t, active_vmids);
+static DEFINE_PER_CPU(u64, reserved_vmids);
+
+#define VMID_MASK (~GENMASK(kvm_arm_vmid_bits - 1, 0))
+#define VMID_FIRST_VERSION (1UL << kvm_arm_vmid_bits)
+
+#define NUM_USER_VMIDS VMID_FIRST_VERSION
+#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
+#define idx2vmid(idx) vmid2idx(idx)
+
+#define vmid_gen_match(vmid) \
+ (!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits))
+
+static void flush_context(void)
+{
+ int cpu;
+ u64 vmid;
+
+ bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
+
+ for_each_possible_cpu(cpu) {
+ vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
+
+ /* Preserve reserved VMID */
+ if (vmid == 0)
+ vmid = per_cpu(reserved_vmids, cpu);
+ __set_bit(vmid2idx(vmid), vmid_map);
+ per_cpu(reserved_vmids, cpu) = vmid;
+ }
+
+ /*
+ * Unlike ASID allocator, we expect less frequent rollover in
+ * case of VMIDs. Hence, instead of marking the CPU as
+ * flush_pending and issuing a local context invalidation on
+ * the next context-switch, we broadcast TLB flush + I-cache
+ * invalidation over the inner shareable domain on rollover.
+ */
+ kvm_call_hyp(__kvm_flush_vm_context);
+}
+
+static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
+{
+ int cpu;
+ bool hit = false;
+
+ /*
+ * Iterate over the set of reserved VMIDs looking for a match
+ * and update to use newvmid (i.e. the same VMID in the current
+ * generation).
+ */
+ for_each_possible_cpu(cpu) {
+ if (per_cpu(reserved_vmids, cpu) == vmid) {
+ hit = true;
+ per_cpu(reserved_vmids, cpu) = newvmid;
+ }
+ }
+
+ return hit;
+}
+
+static u64 new_vmid(struct kvm_vmid *kvm_vmid)
+{
+ static u32 cur_idx = 1;
+ u64 vmid = atomic64_read(&kvm_vmid->id);
+ u64 generation = atomic64_read(&vmid_generation);
+
+ if (vmid != 0) {
+ u64 newvmid = generation | (vmid & ~VMID_MASK);
+
+ if (check_update_reserved_vmid(vmid, newvmid)) {
+ atomic64_set(&kvm_vmid->id, newvmid);
+ return newvmid;
+ }
+
+ if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) {
+ atomic64_set(&kvm_vmid->id, newvmid);
+ return newvmid;
+ }
+ }
+
+ vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, cur_idx);
+ if (vmid != NUM_USER_VMIDS)
+ goto set_vmid;
+
+ /* We're out of VMIDs, so increment the global generation count */
+ generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION,
+ &vmid_generation);
+ flush_context();
+
+ /* We have more VMIDs than CPUs, so this will always succeed */
+ vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1);
+
+set_vmid:
+ __set_bit(vmid, vmid_map);
+ cur_idx = vmid;
+ vmid = idx2vmid(vmid) | generation;
+ atomic64_set(&kvm_vmid->id, vmid);
+ return vmid;
+}
+
+void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
+{
+ unsigned long flags;
+ u64 vmid, old_active_vmid;
+
+ vmid = atomic64_read(&kvm_vmid->id);
+
+ /*
+ * Please refer comments in check_and_switch_context() in
+ * arch/arm64/mm/context.c.
+ */
+ old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
+ if (old_active_vmid && vmid_gen_match(vmid) &&
+ atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
+ old_active_vmid, vmid))
+ return;
+
+ raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+
+ /* Check that our VMID belongs to the current generation. */
+ vmid = atomic64_read(&kvm_vmid->id);
+ if (!vmid_gen_match(vmid))
+ vmid = new_vmid(kvm_vmid);
+
+ atomic64_set(this_cpu_ptr(&active_vmids), vmid);
+ raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+}
+
+/*
+ * Initialize the VMID allocator
+ */
+int kvm_arm_vmid_alloc_init(void)
+{
+ kvm_arm_vmid_bits = kvm_get_vmid_bits();
+
+ /*
+ * Expect allocation after rollover to fail if we don't have
+ * at least one more VMID than CPUs. VMID #0 is always reserved.
+ */
+ WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
+ atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
+ vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
+ sizeof(*vmid_map), GFP_KERNEL);
+ if (!vmid_map)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void kvm_arm_vmid_alloc_free(void)
+{
+ kfree(vmid_map);
+}
--
2.17.1
Since we already set the kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside as well so that it can
be used in the subsequent patch.
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Shameer Kolothum <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kernel/image-vars.h | 3 +++
arch/arm64/kvm/vmid.c | 2 +-
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f4a86a79ea4a..51af17e16115 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -690,6 +690,7 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
+extern unsigned int kvm_arm_vmid_bits;
int kvm_arm_vmid_alloc_init(void);
void kvm_arm_vmid_alloc_free(void);
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index c96a9a0043bf..c12963c3a055 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -79,6 +79,9 @@ KVM_NVHE_ALIAS(__hyp_stub_vectors);
/* Kernel symbol used by icache_is_vpipt(). */
KVM_NVHE_ALIAS(__icache_flags);
+/* VMID bits set by the KVM VMID allocator */
+KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
+
/* Kernel symbols needed for cpus_have_final/const_caps checks. */
KVM_NVHE_ALIAS(arm64_const_caps_ready);
KVM_NVHE_ALIAS(cpu_hwcap_keys);
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index aa01c97f7df0..9aff692b6b7d 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -16,7 +16,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
-static unsigned int kvm_arm_vmid_bits;
+unsigned int kvm_arm_vmid_bits;
static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
static atomic64_t vmid_generation;
--
2.17.1
From: Julien Grall <[email protected]>
At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-Cache invalidation.
This patch uses the new VMID allocator. The benefits are:
- Aligns with arm64 ASID algorithm.
- CPUs are not forced to exit at roll-over. Instead,
the VMID will be marked reserved and context invalidation
is broadcast. This will reduce IPI traffic.
- More flexible to add support for pinned KVM VMIDs in
the future.
With the new algo, the code is now adapted:
- The call to update_vmid() will be done with preemption
disabled as the new algo requires storing information
per-CPU.
Signed-off-by: Julien Grall <[email protected]>
Signed-off-by: Shameer Kolothum <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 4 +-
arch/arm64/include/asm/kvm_mmu.h | 4 +-
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 105 ++++----------------------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 3 +-
arch/arm64/kvm/mmu.c | 1 -
6 files changed, 22 insertions(+), 97 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 51af17e16115..752d4408e3d0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -72,9 +72,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
struct kvm_vmid {
- /* The VMID generation used for the virt. memory system */
- u64 vmid_gen;
- u32 vmid;
+ atomic64_t id;
};
struct kvm_s2_mmu {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 02d378887743..d8e6266b8d46 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -115,6 +115,7 @@ alternative_cb_end
#include <asm/cache.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
+#include <asm/kvm_host.h>
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -264,7 +265,8 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
baddr = mmu->pgd_phys;
- vmid_field = (u64)READ_ONCE(vmid->vmid) << VTTBR_VMID_SHIFT;
+ vmid_field = atomic64_read(&vmid->id) << VTTBR_VMID_SHIFT;
+ vmid_field &= VTTBR_VMID_MASK(kvm_arm_vmid_bits);
return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
}
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..4a607d52356c 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
inject_fault.o va_layout.o handle_exit.o \
guest.o debug.o reset.o sys_regs.o \
vgic-sys-reg-v3.o fpsimd.o pmu.o \
- arch_timer.o trng.o\
+ arch_timer.o trng.o vmid.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2f03cbfefe67..0544011b0fc6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -53,11 +53,6 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
-/* The VMID used in the VTTBR */
-static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
-static u32 kvm_next_vmid;
-static DEFINE_SPINLOCK(kvm_vmid_lock);
-
static bool vgic_present;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
@@ -496,87 +491,6 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
return vcpu_mode_priv(vcpu);
}
-/* Just ensure a guest exit from a particular CPU */
-static void exit_vm_noop(void *info)
-{
-}
-
-void force_vm_exit(const cpumask_t *mask)
-{
- preempt_disable();
- smp_call_function_many(mask, exit_vm_noop, NULL, true);
- preempt_enable();
-}
-
-/**
- * need_new_vmid_gen - check that the VMID is still valid
- * @vmid: The VMID to check
- *
- * return true if there is a new generation of VMIDs being used
- *
- * The hardware supports a limited set of values with the value zero reserved
- * for the host, so we check if an assigned value belongs to a previous
- * generation, which requires us to assign a new value. If we're the first to
- * use a VMID for the new generation, we must flush necessary caches and TLBs
- * on all CPUs.
- */
-static bool need_new_vmid_gen(struct kvm_vmid *vmid)
-{
- u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
- smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
- return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen);
-}
-
-/**
- * update_vmid - Update the vmid with a valid VMID for the current generation
- * @vmid: The stage-2 VMID information struct
- */
-static void update_vmid(struct kvm_vmid *vmid)
-{
- if (!need_new_vmid_gen(vmid))
- return;
-
- spin_lock(&kvm_vmid_lock);
-
- /*
- * We need to re-check the vmid_gen here to ensure that if another vcpu
- * already allocated a valid vmid for this vm, then this vcpu should
- * use the same vmid.
- */
- if (!need_new_vmid_gen(vmid)) {
- spin_unlock(&kvm_vmid_lock);
- return;
- }
-
- /* First user of a new VMID generation? */
- if (unlikely(kvm_next_vmid == 0)) {
- atomic64_inc(&kvm_vmid_gen);
- kvm_next_vmid = 1;
-
- /*
- * On SMP we know no other CPUs can use this CPU's or each
- * other's VMID after force_vm_exit returns since the
- * kvm_vmid_lock blocks them from reentry to the guest.
- */
- force_vm_exit(cpu_all_mask);
- /*
- * Now broadcast TLB + ICACHE invalidation over the inner
- * shareable domain to make sure all data structures are
- * clean.
- */
- kvm_call_hyp(__kvm_flush_vm_context);
- }
-
- WRITE_ONCE(vmid->vmid, kvm_next_vmid);
- kvm_next_vmid++;
- kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1;
-
- smp_wmb();
- WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen));
-
- spin_unlock(&kvm_vmid_lock);
-}
-
static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
@@ -753,7 +667,6 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
}
return kvm_request_pending(vcpu) ||
- need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
xfer_to_guest_mode_work_pending();
}
@@ -804,8 +717,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (!ret)
ret = 1;
- update_vmid(&vcpu->arch.hw_mmu->vmid);
-
check_vcpu_requests(vcpu);
/*
@@ -815,6 +726,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
*/
preempt_disable();
+ /*
+ * The VMID allocator only tracks active VMIDs per
+ * physical CPU, and therefore the VMID allocated may not be
+ * preserved on VMID roll-over if the task was preempted,
+ * making a thread's VMID inactive. So we need to call
+ * kvm_arm_vmid_update() in non-premptible context.
+ * kvm_arm_vmid_update() in non-preemptible context.
+ kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
+
kvm_pmu_flush_hwstate(vcpu);
local_irq_disable();
@@ -2111,6 +2031,12 @@ int kvm_arch_init(void *opaque)
if (err)
return err;
+ err = kvm_arm_vmid_alloc_init();
+ if (err) {
+ kvm_err("Failed to initialize VMID allocator.\n");
+ return err;
+ }
+
if (!in_hyp_mode) {
err = init_hyp_mode();
if (err)
@@ -2150,6 +2076,7 @@ int kvm_arch_init(void *opaque)
if (!in_hyp_mode)
teardown_hyp_mode();
out_err:
+ kvm_arm_vmid_alloc_free();
return err;
}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c1a90dd022b8..136141f7d524 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -117,8 +117,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
mmu->arch = &host_kvm.arch;
mmu->pgt = &host_kvm.pgt;
- WRITE_ONCE(mmu->vmid.vmid_gen, 0);
- WRITE_ONCE(mmu->vmid.vmid, 0);
+ atomic64_set(&mmu->vmid.id, 0);
return 0;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..f1f4bbe5adeb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -532,7 +532,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
mmu->arch = &kvm->arch;
mmu->pgt = pgt;
mmu->pgd_phys = __pa(pgt->pgd);
- WRITE_ONCE(mmu->vmid.vmid_gen, 0);
return 0;
out_destroy_pgtable:
--
2.17.1
Like the ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that
every CPU will have a vCPU as its current task, so we may
end up unnecessarily reserving the VMID space.
Hence, set active_vmids to an invalid one when scheduling
out a vCPU.
Signed-off-by: Shameer Kolothum <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/vmid.c | 25 ++++++++++++++++++++++---
3 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 752d4408e3d0..22f952effd03 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -692,6 +692,7 @@ extern unsigned int kvm_arm_vmid_bits;
int kvm_arm_vmid_alloc_init(void);
void kvm_arm_vmid_alloc_free(void);
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
{
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0544011b0fc6..bfe926805240 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -431,6 +431,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
+ kvm_arm_vmid_clear_active();
vcpu->cpu = -1;
}
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 9aff692b6b7d..966ebb2d12e5 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -32,6 +32,13 @@ static DEFINE_PER_CPU(u64, reserved_vmids);
#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
#define idx2vmid(idx) vmid2idx(idx)
+/*
+ * As vmid #0 is always reserved, we will never allocate one
+ * as below and can be treated as invalid. This is used to
+ * set the active_vmids on vCPU schedule out.
+ */
+#define VMID_ACTIVE_INVALID VMID_FIRST_VERSION
+
#define vmid_gen_match(vmid) \
(!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits))
@@ -122,6 +129,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
return vmid;
}
+/* Called from vCPU sched out with preemption disabled */
+void kvm_arm_vmid_clear_active(void)
+{
+ atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID);
+}
+
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
unsigned long flags;
@@ -132,11 +145,17 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
/*
* Please refer comments in check_and_switch_context() in
* arch/arm64/mm/context.c.
+ *
+ * Unlike ASID allocator, we set the active_vmids to
+ * VMID_ACTIVE_INVALID on vCPU schedule out to avoid
+ * reserving the VMID space needlessly on rollover.
+ * Hence explicitly check here for a "!= 0" to
+ * handle the sync with a concurrent rollover.
*/
old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
- if (old_active_vmid && vmid_gen_match(vmid) &&
- atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
- old_active_vmid, vmid))
+ if (old_active_vmid != 0 && vmid_gen_match(vmid) &&
+ 0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
+ old_active_vmid, vmid))
return;
raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
--
2.17.1
Hi,
A gentle ping on this series. Please take a look and let me know whether the new
approach taken in this revision is good enough or not.
Appreciate your feedback.
Thanks,
Shameer
> -----Original Message-----
> From: linux-arm-kernel [mailto:[email protected]]
> On Behalf Of Shameer Kolothum
> Sent: 22 November 2021 12:19
> To: [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Jonathan Cameron
> <[email protected]>; Linuxarm <[email protected]>
> Subject: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
> invalid one on vCPU schedule out. Introduced an
> INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
> Since the basic allocator algorithm reserves vmid #0, it is never
> used as an active VMID. This (hopefully) will fix the issue of
> unnecessarily reserving VMID space with active_vmids when those
> VMs are no longer active[0] and at the same time address the
> problem noted in v3 wherein everything ends up in slow-path[1].
>
> Testing:
> -Run with VMID bit set to 4 and maxcpus to 8 on D06. The test
> involves running concurrently 50 guests with 4 vCPUs. Each
> guest will then execute hackbench 5 times before exiting.
> No crash was observed for a 4-day continuous run.
> The latest branch is here,
> https://github.com/hisilicon/kernel-dev/tree/private-v5.16-rc1-vmid-v4
>
> -TLA+ model. Modified the asidalloc model to incorporate the new
> VMID algo. The main differences are,
> -flush_tlb_all() instead of local_tlb_flush_all() on rollover.
> -Introduced INVALID_VMID and vCPU Sched Out logic.
> -No CnP (Removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
> -Removed UniqueVMIDPerCPU invariant for now as it looks like
> because of the speculative fetching with flush_tlb_all() there
> is a small window where this gets triggered. If I change the
> logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
> be fine. With my limited knowledge on TLA+ model, it is not
> clear to me whether this is a problem with the above logic
> or the VMID model implementation. Really appreciate any help
> with the model.
> The initial VMID TLA+ model is here,
> https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
>
> Please take a look and let me know.
>
> Thanks,
> Shameer
>
> [0]
> https://lore.kernel.org/kvmarm/20210721160614.GC11003@willie-the-truck/
> [1]
> https://lore.kernel.org/kvmarm/20210803114034.GB30853@willie-the-truck/
>
> History:
> --------
> v2 --> v3
> -Dropped adding a new static key and cpufeature for retrieving
> supported VMID bits. Instead, we now make use of the
> kvm_arm_vmid_bits variable (patch #2).
>
> -Since we expect less frequent rollover in the case of VMIDs,
> the TLB invalidation is now broadcasted on rollover instead
> of keeping per CPU flush_pending info and issuing a local
> context flush.
>
> -Clear active_vmids on vCPU schedule out to avoid unnecessarily
> reserving the VMID space(patch #3).
>
> -I have kept the struct kvm_vmid as it is for now(instead of a
> typedef as suggested), as we may soon add another variable to
> it when we introduce Pinned KVM VMID support.
>
> RFCv1 --> v2
> -Dropped "pinned VMID" support for now.
> -Dropped RFC tag.
> RFCv1
>
> https://lore.kernel.org/kvmarm/20210506165232.1969-1-shameerali.kolothu
> [email protected]/
>
> Julien Grall (1):
> KVM: arm64: Align the VMID allocation with the arm64 ASID
>
> Shameer Kolothum (3):
> KVM: arm64: Introduce a new VMID allocator for KVM
> KVM: arm64: Make VMID bits accessible outside of allocator
> KVM: arm64: Make active_vmids invalid on vCPU schedule out
>
> arch/arm64/include/asm/kvm_host.h | 10 +-
> arch/arm64/include/asm/kvm_mmu.h | 4 +-
> arch/arm64/kernel/image-vars.h | 3 +
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/arm.c | 106 +++-----------
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 3 +-
> arch/arm64/kvm/mmu.c | 1 -
> arch/arm64/kvm/vmid.c | 196
> ++++++++++++++++++++++++++
> 8 files changed, 228 insertions(+), 97 deletions(-) create mode 100644
> arch/arm64/kvm/vmid.c
>
> --
> 2.17.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi Shameer,
On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> -TLA+ model. Modified the asidalloc model to incorporate the new
> VMID algo. The main differences are,
> -flush_tlb_all() instead of local_tlb_flush_all() on rollover.
> -Introduced INVALID_VMID and vCPU Sched Out logic.
> -No CnP (Removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
> -Removed UniqueVMIDPerCPU invariant for now as it looks like
> because of the speculative fetching with flush_tlb_all() there
> is a small window where this gets triggered. If I change the
> logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
> be fine. With my limited knowledge on TLA+ model, it is not
> clear to me whether this is a problem with the above logic
> or the VMID model implementation. Really appreciate any help
> with the model.
> The initial VMID TLA+ model is here,
> https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
I only had a brief look at the TLA+ model and I don't understand why you
have a separate 'shed_out' process. It would run in parallel with the
'sched' but AFAICT you can't really schedule a guest out while you are
in the middle of scheduling it in. I'd rather use the same 'sched'
process and either schedule in an inactive task or schedule out an
active one for a given CPU.
Also active_vmids[] for example is defined on the CPUS domain but you
call vcpu_sched_out() from a process that's not in the CPUS domain but in
the SCHED_OUT one.
Regarding UniqueVMIDPerCPU, I think we need to figure out why it
happens. The fact that flush_tlb_all() was made to simulate the
speculative TLB loads is not relevant. In a different spec I have,
arm64kpti.tla, I just used another process that invokes an update_tlbs()
macro so that it can happen at any time. I didn't bother to update the
ASID spec in a similar way but it may be useful. The corresponding
UniqueASIDPerCPU meant that for any two TLB entries on a single CPU, if
they correspond to different tasks (pgd), they should have different
ASIDs. That's a strong requirement, otherwise we end up with the wrong
translation.
Why did you remove the CnP? Do we have this disabled for KVM guests?
--
Catalin
On Wed, Jan 19, 2022 at 09:23:31AM +0000, Shameerali Kolothum Thodi wrote:
> > On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> > > -TLA+ model. Modified the asidalloc model to incorporate the new
> > > VMID algo. The main differences are,
> > > -flush_tlb_all() instead of local_tlb_flush_all() on rollover.
> > > -Introduced INVALID_VMID and vCPU Sched Out logic.
> > > -No CnP (Removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
> > > -Removed UniqueVMIDPerCPU invariant for now as it looks like
> > > because of the speculative fetching with flush_tlb_all() there
> > > is a small window where this gets triggered. If I change the
> > > logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
> > > be fine. With my limited knowledge on TLA+ model, it is not
> > > clear to me whether this is a problem with the above logic
> > > or the VMID model implementation. Really appreciate any help
> > > with the model.
> > > The initial VMID TLA+ model is here,
> > > https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
> >
> > I only had a brief look at the TLA+ model and I don't understand why you
> > have a separate 'sched_out' process. It would run in parallel with the
> > 'sched' but AFAICT you can't really schedule a guest out while you are
> > in the middle of scheduling it in. I'd rather use the same 'sched'
> > process and either schedule in an inactive task or schedule out an
> > active one for a given CPU.
> >
> > Also active_vmids[] for example is defined on the CPUS domain but you
> > call vcpu_sched_out() from a process that's not in the CPUS domain but
> > the SCHED_OUT one.
>
> Many thanks for taking a look. My bad! The 'sched_out' would indeed run in parallel
> and defeat the purpose. I must say I was really confused by the TLA+ syntax and
> am still not very confident about it.
Yeah, it can be confusing. If you have time, you could give CBMC a try
and the 'spec' would be pretty close to your C version. Each CPU would
be modelled as a thread with an extra thread that simulates the
speculative TLB look-ups for all CPUS together with the asserts for the
invariants. The spinlocks would be pthread_mutexes.
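Roughly the kind of skeleton I have in mind, just to illustrate the
structure. The names, the sizes and the trivial stub that stands in for
the real allocator are all made up for this sketch; the assert is the
VMID analogue of UniqueASIDPerCPU:

#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS	2
#define NVMS	3

struct tlb_entry { int vm; uint64_t vmid; int valid; };

static pthread_mutex_t vmid_lock = PTHREAD_MUTEX_INITIALIZER;	/* models cpu_vmid_lock */
static uint64_t vm_vmid[NVMS];			/* models kvm_vmid.id per guest */
static uint64_t next_vmid = 1;			/* trivial stub for new_vmid() */
static struct tlb_entry tlb[NCPUS][NVMS];	/* modelled TLB entries per CPU */

static void vmid_update(int cpu, int vm)
{
	pthread_mutex_lock(&vmid_lock);
	if (!vm_vmid[vm])
		vm_vmid[vm] = next_vmid++;
	/* Running the guest would install a TLB entry tagged with this VMID. */
	tlb[cpu][vm] = (struct tlb_entry){ .vm = vm, .vmid = vm_vmid[vm], .valid = 1 };
	pthread_mutex_unlock(&vmid_lock);
}

static void *cpu_thread(void *arg)		/* one of these per CPU */
{
	int cpu = (int)(intptr_t)arg;
	int vm;

	for (vm = 0; vm < NVMS; vm++)
		vmid_update(cpu, vm);
	return NULL;
}

/*
 * Plays the role of the extra "speculative TLB look-up" thread: at any
 * point, two valid entries on one CPU for different guests must not
 * share a VMID.
 */
static void *checker_thread(void *arg)
{
	int cpu, i, j;

	(void)arg;
	pthread_mutex_lock(&vmid_lock);
	for (cpu = 0; cpu < NCPUS; cpu++)
		for (i = 0; i < NVMS; i++)
			for (j = i + 1; j < NVMS; j++)
				if (tlb[cpu][i].valid && tlb[cpu][j].valid)
					assert(tlb[cpu][i].vmid != tlb[cpu][j].vmid);
	pthread_mutex_unlock(&vmid_lock);
	return NULL;
}

int main(void)
{
	pthread_t t[NCPUS], chk;
	int i;

	for (i = 0; i < NCPUS; i++)
		pthread_create(&t[i], NULL, cpu_thread, (void *)(intptr_t)i);
	pthread_create(&chk, NULL, checker_thread, NULL);
	for (i = 0; i < NCPUS; i++)
		pthread_join(t[i], NULL);
	pthread_join(chk, NULL);
	return 0;
}

In a real CBMC harness the checker would not take the lock (so the tool
can explore the racy speculative look-ups) and the stub would be replaced
by the actual allocator with rollover; the above is only meant to show
the thread structure.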
> Based on the above suggestion, I have modified it as below,
>
> \* vCPU is scheduled out by KVM
> macro vcpu_sched_out() {
> active_kvm[self].task := 0;
> active_vmids[self] := INVALID_VMID;
> }
Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
this directly below and avoid another macro. Well, whatever you find
clearer.
What confuses me is that your INVALID_VMID looks like a valid one: vmid
0, generation 1. Do you ensure that you never allocate VMID 0?
> \* About to run a Guest VM
> process (sched \in CPUS)
> {
> sched_loop:
> while (TRUE) {
> with (t \in TASKS) {
> if (t # ActiveTask(self))
> call kvm_arm_vmid_update(t);
> else
> vcpu_sched_out();
> }
> }
> }
Yes, that's what I meant.
> > The corresponding
> > UniqueASIDPerCPU meant that for any two TLB entries on a single CPU, if
> > they correspond to different tasks (pgd), they should have different
> > ASIDs. That's a strong requirement, otherwise we end up with the wrong
> > translation.
>
> Yes, I understand that it is a strong requirement. Also, I thought this is something
> that would trigger easily with the test setup I had on the real hardware. But the test
> ran for days without hitting it, so I was not sure whether it is a problem with the
> TLA+ implementation or not.
Well, you'd have to check the TLA+ state trace and see how it got
there, whether the last state would be a valid one. It's either
something missing in the spec that the hardware enforces or the
algorithm is wrong and just hard to hit in practice. If this condition
is violated in hardware for a very brief period, e.g. due to some TLBI,
you'd not notice an issue under normal circumstances. But it's still
incorrect.
> > Why did you remove the CnP? Do we have this disabled for KVM guests?
>
> I removed the CnP-related invariants to simplify things for the first version. Also, I am
> not sure what specific changes we need to make for CnP here. Do we still need the switch
> to the global pg_dir to prevent any speculative reading? I didn't see that being done
> anywhere in KVM at the moment. Maybe I am missing something.
It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
sets VTTBR_EL2.CnP to 0 (I haven't checked).
> On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now fails with
> "Error: Invariant TLBEmptyInvalPTE is violated.". Please check.
This was added later as part of try-to-unmap and I only checked this
with CnP = FALSE. I'll need to look into it; it's possible that
flush_tlb_all() doesn't take into account that the pte is unmapped (as
cpu_switch_mm() does). I'll add a separate thread for speculative TLB
loads, it's easier to have them in one place. Thanks.
--
Catalin
On Mon, Nov 22, 2021 at 4:19 AM Shameer Kolothum
<[email protected]> wrote:
>
> A new VMID allocator for arm64 KVM use. This is based on
> arm64 ASID allocator algorithm.
>
> One major deviation from the ASID allocator is the way we
> flush the context. Unlike ASID allocator, we expect less
> frequent rollover in the case of VMIDs. Hence, instead of
> marking the CPU as flush_pending and issuing a local context
> invalidation on the next context switch, we broadcast TLB
> flush + I-cache invalidation over the inner shareable domain
> on rollover.
>
> Signed-off-by: Shameer Kolothum <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 4 +
> arch/arm64/kvm/vmid.c | 177 ++++++++++++++++++++++++++++++
> 2 files changed, 181 insertions(+)
> create mode 100644 arch/arm64/kvm/vmid.c
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2a5f7f38006f..f4a86a79ea4a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -690,6 +690,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
> int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
>
> +int kvm_arm_vmid_alloc_init(void);
> +void kvm_arm_vmid_alloc_free(void);
> +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> +
> static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
> {
> vcpu_arch->steal.base = GPA_INVALID;
> diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> new file mode 100644
> index 000000000000..aa01c97f7df0
> --- /dev/null
> +++ b/arch/arm64/kvm/vmid.c
> @@ -0,0 +1,177 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * VMID allocator.
> + *
> + * Based on Arm64 ASID allocator algorithm.
> + * Please refer arch/arm64/mm/context.c for detailed
> + * comments on algorithm.
> + *
> + * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_mmu.h>
> +
> +static unsigned int kvm_arm_vmid_bits;
> +static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
> +
> +static atomic64_t vmid_generation;
> +static unsigned long *vmid_map;
> +
> +static DEFINE_PER_CPU(atomic64_t, active_vmids);
> +static DEFINE_PER_CPU(u64, reserved_vmids);
> +
> +#define VMID_MASK (~GENMASK(kvm_arm_vmid_bits - 1, 0))
> +#define VMID_FIRST_VERSION (1UL << kvm_arm_vmid_bits)
> +
> +#define NUM_USER_VMIDS VMID_FIRST_VERSION
> +#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
> +#define idx2vmid(idx) vmid2idx(idx)
> +
> +#define vmid_gen_match(vmid) \
> + (!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits))
> +
> +static void flush_context(void)
> +{
> + int cpu;
> + u64 vmid;
> +
> + bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
> +
> + for_each_possible_cpu(cpu) {
> + vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
> +
> + /* Preserve reserved VMID */
> + if (vmid == 0)
> + vmid = per_cpu(reserved_vmids, cpu);
> + __set_bit(vmid2idx(vmid), vmid_map);
> + per_cpu(reserved_vmids, cpu) = vmid;
> + }
> +
> + /*
> + * Unlike ASID allocator, we expect less frequent rollover in
> + * case of VMIDs. Hence, instead of marking the CPU as
> + * flush_pending and issuing a local context invalidation on
> + * the next context-switch, we broadcast TLB flush + I-cache
> + * invalidation over the inner shareable domain on rollover.
> + */
> + kvm_call_hyp(__kvm_flush_vm_context);
> +}
> +
> +static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
> +{
> + int cpu;
> + bool hit = false;
> +
> + /*
> + * Iterate over the set of reserved VMIDs looking for a match
> + * and update to use newvmid (i.e. the same VMID in the current
> + * generation).
> + */
> + for_each_possible_cpu(cpu) {
> + if (per_cpu(reserved_vmids, cpu) == vmid) {
> + hit = true;
> + per_cpu(reserved_vmids, cpu) = newvmid;
> + }
> + }
Once updating reserved_vmids has been done for all the CPUs, it appears
that the function doesn't need to iterate over the set of reserved
VMIDs (correct?). So, I'm wondering if KVM can track the number of
CPUs for which reserved_vmids needs to get updated so that the function
can skip the loop when the number is zero. I'm not sure how much
that would help though.
(Since every vmid allocation for a non-new guest needs to iterate over
reserved_vmids holding cpu_vmid_lock, I'm a bit concerned about the
performance impact on systems with a large number of CPUs.)
Thanks,
Reiji
> +
> + return hit;
> +}
> +
> +static u64 new_vmid(struct kvm_vmid *kvm_vmid)
> +{
> + static u32 cur_idx = 1;
> + u64 vmid = atomic64_read(&kvm_vmid->id);
> + u64 generation = atomic64_read(&vmid_generation);
> +
> + if (vmid != 0) {
> + u64 newvmid = generation | (vmid & ~VMID_MASK);
> +
> + if (check_update_reserved_vmid(vmid, newvmid)) {
> + atomic64_set(&kvm_vmid->id, newvmid);
> + return newvmid;
> + }
> +
> + if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) {
> + atomic64_set(&kvm_vmid->id, newvmid);
> + return newvmid;
> + }
> + }
> +
> + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, cur_idx);
> + if (vmid != NUM_USER_VMIDS)
> + goto set_vmid;
> +
> + /* We're out of VMIDs, so increment the global generation count */
> + generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION,
> + &vmid_generation);
> + flush_context();
> +
> + /* We have more VMIDs than CPUs, so this will always succeed */
> + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1);
> +
> +set_vmid:
> + __set_bit(vmid, vmid_map);
> + cur_idx = vmid;
> + vmid = idx2vmid(vmid) | generation;
> + atomic64_set(&kvm_vmid->id, vmid);
> + return vmid;
> +}
> +
> +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> +{
> + unsigned long flags;
> + u64 vmid, old_active_vmid;
> +
> + vmid = atomic64_read(&kvm_vmid->id);
> +
> + /*
> + * Please refer comments in check_and_switch_context() in
> + * arch/arm64/mm/context.c.
> + */
> + old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> + if (old_active_vmid && vmid_gen_match(vmid) &&
> + atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> + old_active_vmid, vmid))
> + return;
> +
> + raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +
> + /* Check that our VMID belongs to the current generation. */
> + vmid = atomic64_read(&kvm_vmid->id);
> + if (!vmid_gen_match(vmid))
> + vmid = new_vmid(kvm_vmid);
> +
> + atomic64_set(this_cpu_ptr(&active_vmids), vmid);
> + raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> +}
> +
> +/*
> + * Initialize the VMID allocator
> + */
> +int kvm_arm_vmid_alloc_init(void)
> +{
> + kvm_arm_vmid_bits = kvm_get_vmid_bits();
> +
> + /*
> + * Expect allocation after rollover to fail if we don't have
> + * at least one more VMID than CPUs. VMID #0 is always reserved.
> + */
> + WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
> + atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
> + vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
> + sizeof(*vmid_map), GFP_KERNEL);
> + if (!vmid_map)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +void kvm_arm_vmid_alloc_free(void)
> +{
> + kfree(vmid_map);
> +}
> --
> 2.17.1
>
> _______________________________________________
> kvmarm mailing list
> [email protected]
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> -----Original Message-----
> From: Reiji Watanabe [mailto:[email protected]]
> Sent: 21 January 2022 07:36
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: Linux ARM <[email protected]>;
> [email protected]; [email protected];
> [email protected]; Marc Zyngier <[email protected]>; Linuxarm
> <[email protected]>; Jonathan Cameron
> <[email protected]>; Catalin Marinas
> <[email protected]>; Will Deacon <[email protected]>
> Subject: Re: [PATCH v4 1/4] KVM: arm64: Introduce a new VMID allocator for
> KVM
>
> On Mon, Nov 22, 2021 at 4:19 AM Shameer Kolothum
> <[email protected]> wrote:
> >
> > A new VMID allocator for arm64 KVM use. This is based on
> > arm64 ASID allocator algorithm.
> >
> > One major deviation from the ASID allocator is the way we
> > flush the context. Unlike ASID allocator, we expect less
> > frequent rollover in the case of VMIDs. Hence, instead of
> > marking the CPU as flush_pending and issuing a local context
> > invalidation on the next context switch, we broadcast TLB
> > flush + I-cache invalidation over the inner shareable domain
> > on rollover.
> >
> > Signed-off-by: Shameer Kolothum
> <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 4 +
> > arch/arm64/kvm/vmid.c | 177
> ++++++++++++++++++++++++++++++
> > 2 files changed, 181 insertions(+)
> > create mode 100644 arch/arm64/kvm/vmid.c
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> > index 2a5f7f38006f..f4a86a79ea4a 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -690,6 +690,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu
> *vcpu,
> > int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
> > struct kvm_device_attr *attr);
> >
> > +int kvm_arm_vmid_alloc_init(void);
> > +void kvm_arm_vmid_alloc_free(void);
> > +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> > +
> > static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch
> *vcpu_arch)
> > {
> > vcpu_arch->steal.base = GPA_INVALID;
> > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > new file mode 100644
> > index 000000000000..aa01c97f7df0
> > --- /dev/null
> > +++ b/arch/arm64/kvm/vmid.c
> > @@ -0,0 +1,177 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * VMID allocator.
> > + *
> > + * Based on Arm64 ASID allocator algorithm.
> > + * Please refer arch/arm64/mm/context.c for detailed
> > + * comments on algorithm.
> > + *
> > + * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
> > + * Copyright (C) 2012 ARM Ltd.
> > + */
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +
> > +#include <asm/kvm_asm.h>
> > +#include <asm/kvm_mmu.h>
> > +
> > +static unsigned int kvm_arm_vmid_bits;
> > +static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
> > +
> > +static atomic64_t vmid_generation;
> > +static unsigned long *vmid_map;
> > +
> > +static DEFINE_PER_CPU(atomic64_t, active_vmids);
> > +static DEFINE_PER_CPU(u64, reserved_vmids);
> > +
> > +#define VMID_MASK (~GENMASK(kvm_arm_vmid_bits - 1,
> 0))
> > +#define VMID_FIRST_VERSION (1UL << kvm_arm_vmid_bits)
> > +
> > +#define NUM_USER_VMIDS VMID_FIRST_VERSION
> > +#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
> > +#define idx2vmid(idx) vmid2idx(idx)
> > +
> > +#define vmid_gen_match(vmid) \
> > + (!(((vmid) ^ atomic64_read(&vmid_generation)) >>
> kvm_arm_vmid_bits))
> > +
> > +static void flush_context(void)
> > +{
> > + int cpu;
> > + u64 vmid;
> > +
> > + bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
> > +
> > + for_each_possible_cpu(cpu) {
> > + vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids,
> cpu), 0);
> > +
> > + /* Preserve reserved VMID */
> > + if (vmid == 0)
> > + vmid = per_cpu(reserved_vmids, cpu);
> > + __set_bit(vmid2idx(vmid), vmid_map);
> > + per_cpu(reserved_vmids, cpu) = vmid;
> > + }
> > +
> > + /*
> > + * Unlike ASID allocator, we expect less frequent rollover in
> > + * case of VMIDs. Hence, instead of marking the CPU as
> > + * flush_pending and issuing a local context invalidation on
> > + * the next context-switch, we broadcast TLB flush + I-cache
> > + * invalidation over the inner shareable domain on rollover.
> > + */
> > + kvm_call_hyp(__kvm_flush_vm_context);
> > +}
> > +
> > +static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
> > +{
> > + int cpu;
> > + bool hit = false;
> > +
> > + /*
> > + * Iterate over the set of reserved VMIDs looking for a match
> > + * and update to use newvmid (i.e. the same VMID in the current
> > + * generation).
> > + */
> > + for_each_possible_cpu(cpu) {
> > + if (per_cpu(reserved_vmids, cpu) == vmid) {
> > + hit = true;
> > + per_cpu(reserved_vmids, cpu) = newvmid;
> > + }
> > + }
>
> Once updating reserved_vmids gets done for the all CPUs, it appears
> that the function doesn't need to iterate over the set of reserved
> VMIDs (correct ?). So, I'm wondering if KVM can manage the number of
> CPUs for which reserved_vmids need to get updated so that the function
> can skip the loop when the number is zero. I'm not sure how likely
> that would help though.
Ok, I think that is possible to do. In flush_context() we can update the
number of CPUs with a valid reserved_vmid and add a check here. Not sure
about the probability of that being zero though.
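Roughly what I have in mind (an untested sketch on top of the functions
in this patch; reserved_vmid_count is just a name picked here for
illustration):

/* Hypothetical: number of CPUs holding a reserved VMID after rollover */
static unsigned int reserved_vmid_count;

static void flush_context(void)
{
	int cpu;
	u64 vmid;

	bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
	reserved_vmid_count = 0;

	for_each_possible_cpu(cpu) {
		vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);

		/* Preserve reserved VMID */
		if (vmid == 0)
			vmid = per_cpu(reserved_vmids, cpu);
		__set_bit(vmid2idx(vmid), vmid_map);
		per_cpu(reserved_vmids, cpu) = vmid;

		/* Track how many CPUs actually carry a reserved VMID */
		if (vmid)
			reserved_vmid_count++;
	}

	kvm_call_hyp(__kvm_flush_vm_context);
}

static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
{
	int cpu;
	bool hit = false;

	/* Nothing was reserved at the last rollover, skip the scan */
	if (!reserved_vmid_count)
		return false;

	for_each_possible_cpu(cpu) {
		if (per_cpu(reserved_vmids, cpu) == vmid) {
			hit = true;
			per_cpu(reserved_vmids, cpu) = newvmid;
		}
	}

	return hit;
}

This keeps the fast path untouched and only skips the scan when no CPU
holds a reserved VMID (e.g. before the first rollover).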
> (Since every vmid allocation for non-new guest needs to iterate over
> reserved_vmids holding cpu_vmid_lock, I'm a bit concerned about the
> performance impact on systems with a large number of CPUs.)
But VMID allocation for a non-new guest will normally go through the fast path
unless there is a rollover. And with a 16-bit VMID space, rollovers should be
very rare, I guess.
Thanks,
Shameer
>
> Thanks,
> Reiji
>
> > +
> > + return hit;
> > +}
> > +
> > +static u64 new_vmid(struct kvm_vmid *kvm_vmid)
> > +{
> > + static u32 cur_idx = 1;
> > + u64 vmid = atomic64_read(&kvm_vmid->id);
> > + u64 generation = atomic64_read(&vmid_generation);
> > +
> > + if (vmid != 0) {
> > + u64 newvmid = generation | (vmid & ~VMID_MASK);
> > +
> > + if (check_update_reserved_vmid(vmid, newvmid)) {
> > + atomic64_set(&kvm_vmid->id, newvmid);
> > + return newvmid;
> > + }
> > +
> > + if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) {
> > + atomic64_set(&kvm_vmid->id, newvmid);
> > + return newvmid;
> > + }
> > + }
> > +
> > + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS,
> cur_idx);
> > + if (vmid != NUM_USER_VMIDS)
> > + goto set_vmid;
> > +
> > + /* We're out of VMIDs, so increment the global generation count */
> > + generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION,
> > +
> &vmid_generation);
> > + flush_context();
> > +
> > + /* We have more VMIDs than CPUs, so this will always succeed */
> > + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1);
> > +
> > +set_vmid:
> > + __set_bit(vmid, vmid_map);
> > + cur_idx = vmid;
> > + vmid = idx2vmid(vmid) | generation;
> > + atomic64_set(&kvm_vmid->id, vmid);
> > + return vmid;
> > +}
> > +
> > +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> > +{
> > + unsigned long flags;
> > + u64 vmid, old_active_vmid;
> > +
> > + vmid = atomic64_read(&kvm_vmid->id);
> > +
> > + /*
> > + * Please refer comments in check_and_switch_context() in
> > + * arch/arm64/mm/context.c.
> > + */
> > + old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> > + if (old_active_vmid && vmid_gen_match(vmid) &&
> > + atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> > + old_active_vmid, vmid))
> > + return;
> > +
> > + raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > +
> > + /* Check that our VMID belongs to the current generation. */
> > + vmid = atomic64_read(&kvm_vmid->id);
> > + if (!vmid_gen_match(vmid))
> > + vmid = new_vmid(kvm_vmid);
> > +
> > + atomic64_set(this_cpu_ptr(&active_vmids), vmid);
> > + raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
> > +}
> > +
> > +/*
> > + * Initialize the VMID allocator
> > + */
> > +int kvm_arm_vmid_alloc_init(void)
> > +{
> > + kvm_arm_vmid_bits = kvm_get_vmid_bits();
> > +
> > + /*
> > + * Expect allocation after rollover to fail if we don't have
> > + * at least one more VMID than CPUs. VMID #0 is always reserved.
> > + */
> > + WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
> > + atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
> > + vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
> > + sizeof(*vmid_map), GFP_KERNEL);
> > + if (!vmid_map)
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +void kvm_arm_vmid_alloc_free(void)
> > +{
> > + kfree(vmid_map);
> > +}
> > --
> > 2.17.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > [email protected]
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
On Mon, 22 Nov 2021 12:18:40 +0000, Shameer Kolothum wrote:
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
> invalid one on vCPU schedule out. Introduced an
> INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
> Since the basic allocator algorithm reserves vmid #0, it is never
> used as an active VMID. This (hopefully) will fix the issue of
> unnecessarily reserving VMID space with active_vmids when those
> VMs are no longer active[0] and at the same time address the
> problem noted in v3 wherein everything ends up in slow-path[1].
>
> [...]
Applied to next, thanks!
[1/4] KVM: arm64: Introduce a new VMID allocator for KVM
commit: 417838392f2e657ee25cc30e373ff4c35a0faa90
[2/4] KVM: arm64: Make VMID bits accessible outside of allocator
commit: f8051e960922a9de8e42159103d5d9c697ef17ec
[3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID
commit: 3248136b3637e1671e4fa46e32e2122f9ec4bc3d
[4/4] KVM: arm64: Make active_vmids invalid on vCPU schedule out
commit: 100b4f092f878dc379f1fcef9ce567c25dee3473
Cheers,
M.
--
Without deviation from the norm, progress is not possible.