2019-07-24 17:04:36

by Julien Grall

Subject: [PATCH v3 00/15] kvm/arm: Align the VMID allocation with the arm64 ASID one

Hi all,

This patch series moves the ASID allocator out into a separate file so
that it can be re-used for the VMID allocator. The benefits are:
- CPUs are not forced to exit on a roll-over.
- Context invalidation is now per-CPU rather than broadcast.

There is no performance regression on the fast path for ASID allocation.
In fact, on the hackbench measurement (300 hackbench runs) it was 0.7% faster.

The measurement was made on a Seattle-based SoC (8 CPUs), with the
number of VMID bits limited to 4. The test involves running 40 guests
with 2 vCPUs concurrently. Each guest executes hackbench 5 times
before exiting.

The performance difference (on 5.1-rc1) between the current algorithm and
the new one is:
- 2.5% fewer exits from the guest
- 22.4% more flushes, although they are now local rather than broadcast
- 0.11% faster (just for the record)

The rework of the ASID allocator to make it generic has been divided
into multiple patches to make the review easier.
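
For reference, the generic interface that the series converges on (see
patch 11) boils down to the sketch below. This is illustrative only;
the my_* names are not part of the series:

  #include <asm/lib_asid.h>

  static struct asid_info my_asid_info;
  static DEFINE_PER_CPU(atomic64_t, my_active);
  static DEFINE_PER_CPU(u64, my_reserved);

  /* The callback defines what a local "context flush" means here. */
  static void my_flush_cpu_ctxt(void)
  {
          /* user-specific local invalidation, e.g. local_flush_tlb_all() */
  }

  static int __init my_allocator_init(void)
  {
          /* ID bits, IDs per context (power of two), flush callback */
          int ret = asid_allocator_init(&my_asid_info, 16, 1,
                                        my_flush_cpu_ctxt);
          if (ret)
                  return ret;

          my_asid_info.active = &my_active;
          my_asid_info.reserved = &my_reserved;
          return 0;
  }

  /* Fast path on every switch; drops to the slow path on rollover. */
  static void my_check(atomic64_t *pasid, unsigned int cpu)
  {
          asid_check_context(&my_asid_info, pasid, cpu);
  }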

A branch with the patches, based on 5.3-rc1, can be found at:

http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/vmid-rework/v3

For the full list of changes, see each individual patch.

Best regards,

Cc: Russell King <[email protected]>

Julien Grall (15):
arm64/mm: Introduce asid_info structure and move
asid_generation/asid_map to it
arm64/mm: Move active_asids and reserved_asids to asid_info
arm64/mm: Move bits to asid_info
arm64/mm: Move the variable lock and tlb_flush_pending to asid_info
arm64/mm: Remove dependency on MM in new_context
arm64/mm: Store the number of asid allocated per context
arm64/mm: Introduce NUM_ASIDS
arm64/mm: Split asid_inits in 2 parts
arm64/mm: Split the function check_and_switch_context in 3 parts
arm64/mm: Introduce a callback to flush the local context
arm64: Move the ASID allocator code in a separate file
arm64/lib: Add a helper to free memory allocated by the ASID
allocator
arm/kvm: Introduce a new VMID allocator
arch/arm64: Introduce a capability to tell whether 16-bit VMID is
available
kvm/arm: Align the VMID allocation with the arm64 ASID one

arch/arm/include/asm/kvm_asm.h | 2 +-
arch/arm/include/asm/kvm_host.h | 5 +-
arch/arm/include/asm/kvm_hyp.h | 1 +
arch/arm/include/asm/kvm_mmu.h | 3 +-
arch/arm/include/asm/lib_asid.h | 79 +++++++++++++++
arch/arm/kvm/Makefile | 1 +
arch/arm/kvm/hyp/tlb.c | 8 +-
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/kvm_asid.h | 8 ++
arch/arm64/include/asm/kvm_asm.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 5 +-
arch/arm64/include/asm/kvm_mmu.h | 7 +-
arch/arm64/include/asm/lib_asid.h | 79 +++++++++++++++
arch/arm64/kernel/cpufeature.c | 9 ++
arch/arm64/kvm/hyp/tlb.c | 10 +-
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/asid.c | 190 ++++++++++++++++++++++++++++++++++++
arch/arm64/mm/context.c | 200 +++++---------------------------------
virt/kvm/arm/arm.c | 125 +++++++++---------------
19 files changed, 458 insertions(+), 281 deletions(-)
create mode 100644 arch/arm/include/asm/lib_asid.h
create mode 100644 arch/arm64/include/asm/kvm_asid.h
create mode 100644 arch/arm64/include/asm/lib_asid.h
create mode 100644 arch/arm64/lib/asid.c

--
2.11.0


2019-07-24 17:04:38

by Julien Grall

Subject: [PATCH v3 02/15] arm64/mm: Move active_asids and reserved_asids to asid_info

The variables active_asids and reserved_asids hold information for a
given ASID allocator, so move them into the asid_info structure.

At the same time, introduce wrappers to access the active and reserved
ASIDs to make the code clearer.

Signed-off-by: Julien Grall <[email protected]>
---
arch/arm64/mm/context.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index b0789f30d03b..3de028803284 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -23,10 +23,16 @@ static struct asid_info
{
atomic64_t generation;
unsigned long *map;
+ atomic64_t __percpu *active;
+ u64 __percpu *reserved;
} asid_info;

+#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
+#define reserved_asid(info, cpu) *per_cpu_ptr((info)->reserved, cpu)
+
static DEFINE_PER_CPU(atomic64_t, active_asids);
static DEFINE_PER_CPU(u64, reserved_asids);
+
static cpumask_t tlb_flush_pending;

#define ASID_MASK (~GENMASK(asid_bits - 1, 0))
@@ -89,7 +95,7 @@ static void flush_context(struct asid_info *info)
bitmap_clear(info->map, 0, NUM_USER_ASIDS);

for_each_possible_cpu(i) {
- asid = atomic64_xchg_relaxed(&per_cpu(active_asids, i), 0);
+ asid = atomic64_xchg_relaxed(&active_asid(info, i), 0);
/*
* If this CPU has already been through a
* rollover, but hasn't run another task in
@@ -98,9 +104,9 @@ static void flush_context(struct asid_info *info)
* the process it is still running.
*/
if (asid == 0)
- asid = per_cpu(reserved_asids, i);
+ asid = reserved_asid(info, i);
__set_bit(asid2idx(asid), info->map);
- per_cpu(reserved_asids, i) = asid;
+ reserved_asid(info, i) = asid;
}

/*
@@ -110,7 +116,8 @@ static void flush_context(struct asid_info *info)
cpumask_setall(&tlb_flush_pending);
}

-static bool check_update_reserved_asid(u64 asid, u64 newasid)
+static bool check_update_reserved_asid(struct asid_info *info, u64 asid,
+ u64 newasid)
{
int cpu;
bool hit = false;
@@ -125,9 +132,9 @@ static bool check_update_reserved_asid(u64 asid, u64 newasid)
* generation.
*/
for_each_possible_cpu(cpu) {
- if (per_cpu(reserved_asids, cpu) == asid) {
+ if (reserved_asid(info, cpu) == asid) {
hit = true;
- per_cpu(reserved_asids, cpu) = newasid;
+ reserved_asid(info, cpu) = newasid;
}
}

@@ -147,7 +154,7 @@ static u64 new_context(struct asid_info *info, struct mm_struct *mm)
* If our current ASID was active during a rollover, we
* can continue to use it and this was just a false alarm.
*/
- if (check_update_reserved_asid(asid, newasid))
+ if (check_update_reserved_asid(info, asid, newasid))
return newasid;

/*
@@ -196,8 +203,8 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)

/*
* The memory ordering here is subtle.
- * If our active_asids is non-zero and the ASID matches the current
- * generation, then we update the active_asids entry with a relaxed
+ * If our active_asid is non-zero and the ASID matches the current
+ * generation, then we update the active_asid entry with a relaxed
* cmpxchg. Racing with a concurrent rollover means that either:
*
* - We get a zero back from the cmpxchg and end up waiting on the
@@ -208,10 +215,10 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
* relaxed xchg in flush_context will treat us as reserved
* because atomic RmWs are totally ordered for a given location.
*/
- old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
+ old_active_asid = atomic64_read(&active_asid(info, cpu));
if (old_active_asid &&
!((asid ^ atomic64_read(&info->generation)) >> asid_bits) &&
- atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
+ atomic64_cmpxchg_relaxed(&active_asid(info, cpu),
old_active_asid, asid))
goto switch_mm_fastpath;

@@ -226,7 +233,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
local_flush_tlb_all();

- atomic64_set(&per_cpu(active_asids, cpu), asid);
+ atomic64_set(&active_asid(info, cpu), asid);
raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);

switch_mm_fastpath:
@@ -267,6 +274,9 @@ static int asids_init(void)
panic("Failed to allocate bitmap for %lu ASIDs\n",
NUM_USER_ASIDS);

+ info->active = &active_asids;
+ info->reserved = &reserved_asids;
+
pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
return 0;
}
--
2.11.0

2019-07-24 17:05:21

by Julien Grall

Subject: [PATCH v3 06/15] arm64/mm: Store the number of asid allocated per context

Currently the number of ASIDs allocated per context is determined at
compile time. As the algorithm is becoming generic, a user may want to
instantiate the ASID allocator multiple times with a different number
of ASIDs.

Add a field in asid_info to track the number of ASIDs allocated per
context. This is stored as a shift amount to avoid divisions in the code.

This means the number of ASIDs allocated per context must be a power of
two.
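
As an illustration (a sketch only, not part of the patch), with 16 ASID
bits and two ASIDs per context (the CONFIG_UNMAP_KERNEL_AT_EL0 case) the
arithmetic works out as follows, ignoring the generation bits that
asid2idx masks off first:

  #define ASID_BITS		16
  #define ASID_PER_CONTEXT	2	/* must be a power of two */
  #define CTXT_SHIFT		1	/* ilog2(ASID_PER_CONTEXT) */

  #define NUM_ASIDS		(1UL << ASID_BITS)	  /* 65536 ASIDs    */
  #define NUM_CTXT_ASIDS	(NUM_ASIDS >> CTXT_SHIFT) /* 32768 contexts */

  /*
   * Context index 5 owns the even/odd ASID pair {10, 11}:
   *   idx2asid(5)  == 5 << CTXT_SHIFT  == 10
   *   asid2idx(10) == 10 >> CTXT_SHIFT == 5
   *   asid2idx(11) == 11 >> CTXT_SHIFT == 5
   */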

At the same time, rename NUM_USER_ASIDS to NUM_CTXT_ASIDS to make the
name more generic.

Signed-off-by: Julien Grall <[email protected]>
---
arch/arm64/mm/context.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index dfb0da35a541..2e1e495cd1d8 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -26,6 +26,8 @@ static struct asid_info
raw_spinlock_t lock;
/* Which CPU requires context flush on next call */
cpumask_t flush_pending;
+ /* Number of ASID allocated by context (shift value) */
+ unsigned int ctxt_shift;
} asid_info;

#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -38,15 +40,15 @@ static DEFINE_PER_CPU(u64, reserved_asids);
#define ASID_FIRST_VERSION(info) (1UL << ((info)->bits))

#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-#define NUM_USER_ASIDS(info) (ASID_FIRST_VERSION(info) >> 1)
-#define asid2idx(info, asid) (((asid) & ~ASID_MASK(info)) >> 1)
-#define idx2asid(info, idx) (((idx) << 1) & ~ASID_MASK(info))
+#define ASID_PER_CONTEXT 2
#else
-#define NUM_USER_ASIDS(info) (ASID_FIRST_VERSION(info))
-#define asid2idx(info, asid) ((asid) & ~ASID_MASK(info))
-#define idx2asid(info, idx) asid2idx(info, idx)
+#define ASID_PER_CONTEXT 1
#endif

+#define NUM_CTXT_ASIDS(info) (ASID_FIRST_VERSION(info) >> (info)->ctxt_shift)
+#define asid2idx(info, asid) (((asid) & ~ASID_MASK(info)) >> (info)->ctxt_shift)
+#define idx2asid(info, idx) (((idx) << (info)->ctxt_shift) & ~ASID_MASK(info))
+
/* Get the ASIDBits supported by the current CPU */
static u32 get_cpu_asid_bits(void)
{
@@ -91,7 +93,7 @@ static void flush_context(struct asid_info *info)
u64 asid;

/* Update the list of reserved ASIDs and the ASID bitmap. */
- bitmap_clear(info->map, 0, NUM_USER_ASIDS(info));
+ bitmap_clear(info->map, 0, NUM_CTXT_ASIDS(info));

for_each_possible_cpu(i) {
asid = atomic64_xchg_relaxed(&active_asid(info, i), 0);
@@ -171,8 +173,8 @@ static u64 new_context(struct asid_info *info, atomic64_t *pasid)
* a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
* pairs.
*/
- asid = find_next_zero_bit(info->map, NUM_USER_ASIDS(info), cur_idx);
- if (asid != NUM_USER_ASIDS(info))
+ asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), cur_idx);
+ if (asid != NUM_CTXT_ASIDS(info))
goto set_asid;

/* We're out of ASIDs, so increment the global generation count */
@@ -181,7 +183,7 @@ static u64 new_context(struct asid_info *info, atomic64_t *pasid)
flush_context(info);

/* We have more ASIDs than CPUs, so this will always succeed */
- asid = find_next_zero_bit(info->map, NUM_USER_ASIDS(info), 1);
+ asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), 1);

set_asid:
__set_bit(asid, info->map);
@@ -261,17 +263,18 @@ static int asids_init(void)
struct asid_info *info = &asid_info;

info->bits = get_cpu_asid_bits();
+ info->ctxt_shift = ilog2(ASID_PER_CONTEXT);
/*
* Expect allocation after rollover to fail if we don't have at least
* one more ASID than CPUs. ASID #0 is reserved for init_mm.
*/
- WARN_ON(NUM_USER_ASIDS(info) - 1 <= num_possible_cpus());
+ WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
atomic64_set(&info->generation, ASID_FIRST_VERSION(info));
- info->map = kcalloc(BITS_TO_LONGS(NUM_USER_ASIDS(info)),
+ info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
sizeof(*info->map), GFP_KERNEL);
if (!info->map)
panic("Failed to allocate bitmap for %lu ASIDs\n",
- NUM_USER_ASIDS(info));
+ NUM_CTXT_ASIDS(info));

info->active = &active_asids;
info->reserved = &reserved_asids;
@@ -279,7 +282,7 @@ static int asids_init(void)
raw_spin_lock_init(&info->lock);

pr_info("ASID allocator initialised with %lu entries\n",
- NUM_USER_ASIDS(info));
+ NUM_CTXT_ASIDS(info));
return 0;
}
early_initcall(asids_init);
--
2.11.0

2019-07-24 17:37:54

by Julien Grall

Subject: [PATCH v3 10/15] arm64/mm: Introduce a callback to flush the local context

How the local context is flushed will vary depending on the actual user
of the ASID allocator. Introduce a new callback to flush the local
context and move the local TLB flush call into it.
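
As a sketch of the intent (the vmid_* name below is illustrative and not
part of this patch), a later user such as a VMID allocator would install
its own local invalidation in place of the TLB flush used here:

  /* This patch: the arm64 mm user keeps its local TLB flush. */
  static void asid_flush_cpu_ctxt(void)
  {
          local_flush_tlb_all();
  }

  /* Sketch: a VMID user would instead invalidate its own state. */
  static void vmid_flush_cpu_ctxt(void)
  {
          /* local guest TLB invalidation for this CPU */
  }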

Signed-off-by: Julien Grall <[email protected]>
---
arch/arm64/mm/context.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 5e8b381ab67f..ac10893b403c 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -28,6 +28,8 @@ static struct asid_info
cpumask_t flush_pending;
/* Number of ASID allocated by context (shift value) */
unsigned int ctxt_shift;
+ /* Callback to locally flush the context. */
+ void (*flush_cpu_ctxt_cb)(void);
} asid_info;

#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
@@ -255,7 +257,7 @@ static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
}

if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
- local_flush_tlb_all();
+ info->flush_cpu_ctxt_cb();

atomic64_set(&active_asid(info, cpu), asid);
raw_spin_unlock_irqrestore(&info->lock, flags);
@@ -287,6 +289,11 @@ asmlinkage void post_ttbr_update_workaround(void)
CONFIG_CAVIUM_ERRATUM_27456));
}

+static void asid_flush_cpu_ctxt(void)
+{
+ local_flush_tlb_all();
+}
+
/*
* Initialize the ASID allocator
*
@@ -297,10 +304,12 @@ asmlinkage void post_ttbr_update_workaround(void)
* 2.
*/
static int asid_allocator_init(struct asid_info *info,
- u32 bits, unsigned int asid_per_ctxt)
+ u32 bits, unsigned int asid_per_ctxt,
+ void (*flush_cpu_ctxt_cb)(void))
{
info->bits = bits;
info->ctxt_shift = ilog2(asid_per_ctxt);
+ info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
/*
* Expect allocation after rollover to fail if we don't have at least
* one more ASID than CPUs. ASID #0 is always reserved.
@@ -321,7 +330,8 @@ static int asids_init(void)
{
u32 bits = get_cpu_asid_bits();

- if (asid_allocator_init(&asid_info, bits, ASID_PER_CONTEXT))
+ if (asid_allocator_init(&asid_info, bits, ASID_PER_CONTEXT,
+ asid_flush_cpu_ctxt))
panic("Unable to initialize ASID allocator for %lu ASIDs\n",
1UL << bits);

--
2.11.0

2019-07-24 17:38:11

by Julien Grall

Subject: [PATCH v3 11/15] arm64: Move the ASID allocator code in a separate file

We will want to re-use the ASID allocator in a separate context (e.g.
allocating VMIDs). So move the code into a new file.

The function asid_check_context has been moved into the header as a
static inline function because we want to avoid adding a branch when
checking whether the ASID is still valid.

Signed-off-by: Julien Grall <[email protected]>

---

This code will be used in the virt code for allocating VMIDs. I am not
entirely sure where to place it. lib/ could potentially be a good place,
but I am not entirely convinced the algorithm as it is could be used by
other architectures.

Looking at x86, it seems that it will not be possible to re-use it there
because the number of PCIDs (aka ASIDs) could be smaller than the number
of CPUs. See commit 10af6235e0d327d42e1bad974385197817923dc1 ("x86/mm:
Implement PCID based optimization: try to preserve old TLB entries using
PCID").

Changes in v3:
- Correctly move ASID_FIRST_VERSION to the new file

Changes in v2:
- Rename the header from asid.h to lib_asid.h
---
arch/arm64/include/asm/lib_asid.h | 77 +++++++++++++
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/asid.c | 185 ++++++++++++++++++++++++++++++
arch/arm64/mm/context.c | 235 +-------------------------------------
4 files changed, 267 insertions(+), 232 deletions(-)
create mode 100644 arch/arm64/include/asm/lib_asid.h
create mode 100644 arch/arm64/lib/asid.c

diff --git a/arch/arm64/include/asm/lib_asid.h b/arch/arm64/include/asm/lib_asid.h
new file mode 100644
index 000000000000..c18e9eca500e
--- /dev/null
+++ b/arch/arm64/include/asm/lib_asid.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ASM_LIB_ASID_H
+#define __ASM_ASM_LIB_ASID_H
+
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <linux/cpumask.h>
+#include <linux/percpu.h>
+#include <linux/spinlock.h>
+
+struct asid_info
+{
+ atomic64_t generation;
+ unsigned long *map;
+ atomic64_t __percpu *active;
+ u64 __percpu *reserved;
+ u32 bits;
+ /* Lock protecting the structure */
+ raw_spinlock_t lock;
+ /* Which CPU requires context flush on next call */
+ cpumask_t flush_pending;
+ /* Number of ASID allocated by context (shift value) */
+ unsigned int ctxt_shift;
+ /* Callback to locally flush the context. */
+ void (*flush_cpu_ctxt_cb)(void);
+};
+
+#define NUM_ASIDS(info) (1UL << ((info)->bits))
+#define NUM_CTXT_ASIDS(info) (NUM_ASIDS(info) >> (info)->ctxt_shift)
+
+#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
+
+void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu);
+
+/*
+ * Check the ASID is still valid for the context. If not generate a new ASID.
+ *
+ * @pasid: Pointer to the current ASID batch
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static inline void asid_check_context(struct asid_info *info,
+ atomic64_t *pasid, unsigned int cpu)
+{
+ u64 asid, old_active_asid;
+
+ asid = atomic64_read(pasid);
+
+ /*
+ * The memory ordering here is subtle.
+ * If our active_asid is non-zero and the ASID matches the current
+ * generation, then we update the active_asid entry with a relaxed
+ * cmpxchg. Racing with a concurrent rollover means that either:
+ *
+ * - We get a zero back from the cmpxchg and end up waiting on the
+ * lock. Taking the lock synchronises with the rollover and so
+ * we are forced to see the updated generation.
+ *
+ * - We get a valid ASID back from the cmpxchg, which means the
+ * relaxed xchg in flush_context will treat us as reserved
+ * because atomic RmWs are totally ordered for a given location.
+ */
+ old_active_asid = atomic64_read(&active_asid(info, cpu));
+ if (old_active_asid &&
+ !((asid ^ atomic64_read(&info->generation)) >> info->bits) &&
+ atomic64_cmpxchg_relaxed(&active_asid(info, cpu),
+ old_active_asid, asid))
+ return;
+
+ asid_new_context(info, pasid, cpu);
+}
+
+int asid_allocator_init(struct asid_info *info,
+ u32 bits, unsigned int asid_per_ctxt,
+ void (*flush_cpu_ctxt_cb)(void));
+
+#endif
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..37169d541ab5 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -5,6 +5,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
strchr.o strrchr.o tishift.o

+lib-y += asid.o
+
ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only
diff --git a/arch/arm64/lib/asid.c b/arch/arm64/lib/asid.c
new file mode 100644
index 000000000000..0b3a99c4aed4
--- /dev/null
+++ b/arch/arm64/lib/asid.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic ASID allocator.
+ *
+ * Based on arch/arm/mm/context.c
+ *
+ * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/slab.h>
+
+#include <asm/lib_asid.h>
+
+#define reserved_asid(info, cpu) *per_cpu_ptr((info)->reserved, cpu)
+
+#define ASID_MASK(info) (~GENMASK((info)->bits - 1, 0))
+#define ASID_FIRST_VERSION(info) NUM_ASIDS(info)
+
+#define asid2idx(info, asid) (((asid) & ~ASID_MASK(info)) >> (info)->ctxt_shift)
+#define idx2asid(info, idx) (((idx) << (info)->ctxt_shift) & ~ASID_MASK(info))
+
+static void flush_context(struct asid_info *info)
+{
+ int i;
+ u64 asid;
+
+ /* Update the list of reserved ASIDs and the ASID bitmap. */
+ bitmap_clear(info->map, 0, NUM_CTXT_ASIDS(info));
+
+ for_each_possible_cpu(i) {
+ asid = atomic64_xchg_relaxed(&active_asid(info, i), 0);
+ /*
+ * If this CPU has already been through a
+ * rollover, but hasn't run another task in
+ * the meantime, we must preserve its reserved
+ * ASID, as this is the only trace we have of
+ * the process it is still running.
+ */
+ if (asid == 0)
+ asid = reserved_asid(info, i);
+ __set_bit(asid2idx(info, asid), info->map);
+ reserved_asid(info, i) = asid;
+ }
+
+ /*
+ * Queue a TLB invalidation for each CPU to perform on next
+ * context-switch
+ */
+ cpumask_setall(&info->flush_pending);
+}
+
+static bool check_update_reserved_asid(struct asid_info *info, u64 asid,
+ u64 newasid)
+{
+ int cpu;
+ bool hit = false;
+
+ /*
+ * Iterate over the set of reserved ASIDs looking for a match.
+ * If we find one, then we can update our mm to use newasid
+ * (i.e. the same ASID in the current generation) but we can't
+ * exit the loop early, since we need to ensure that all copies
+ * of the old ASID are updated to reflect the mm. Failure to do
+ * so could result in us missing the reserved ASID in a future
+ * generation.
+ */
+ for_each_possible_cpu(cpu) {
+ if (reserved_asid(info, cpu) == asid) {
+ hit = true;
+ reserved_asid(info, cpu) = newasid;
+ }
+ }
+
+ return hit;
+}
+
+static u64 new_context(struct asid_info *info, atomic64_t *pasid)
+{
+ static u32 cur_idx = 1;
+ u64 asid = atomic64_read(pasid);
+ u64 generation = atomic64_read(&info->generation);
+
+ if (asid != 0) {
+ u64 newasid = generation | (asid & ~ASID_MASK(info));
+
+ /*
+ * If our current ASID was active during a rollover, we
+ * can continue to use it and this was just a false alarm.
+ */
+ if (check_update_reserved_asid(info, asid, newasid))
+ return newasid;
+
+ /*
+ * We had a valid ASID in a previous life, so try to re-use
+ * it if possible.
+ */
+ if (!__test_and_set_bit(asid2idx(info, asid), info->map))
+ return newasid;
+ }
+
+ /*
+ * Allocate a free ASID. If we can't find one, take a note of the
+ * currently active ASIDs and mark the TLBs as requiring flushes. We
+ * always count from ASID #2 (index 1), as we use ASID #0 when setting
+ * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
+ * pairs.
+ */
+ asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), cur_idx);
+ if (asid != NUM_CTXT_ASIDS(info))
+ goto set_asid;
+
+ /* We're out of ASIDs, so increment the global generation count */
+ generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION(info),
+ &info->generation);
+ flush_context(info);
+
+ /* We have more ASIDs than CPUs, so this will always succeed */
+ asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), 1);
+
+set_asid:
+ __set_bit(asid, info->map);
+ cur_idx = asid;
+ return idx2asid(info, asid) | generation;
+}
+
+/*
+ * Generate a new ASID for the context.
+ *
+ * @pasid: Pointer to the current ASID batch allocated. It will be updated
+ * with the new ASID batch.
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu)
+{
+ unsigned long flags;
+ u64 asid;
+
+ raw_spin_lock_irqsave(&info->lock, flags);
+ /* Check that our ASID belongs to the current generation. */
+ asid = atomic64_read(pasid);
+ if ((asid ^ atomic64_read(&info->generation)) >> info->bits) {
+ asid = new_context(info, pasid);
+ atomic64_set(pasid, asid);
+ }
+
+ if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
+ info->flush_cpu_ctxt_cb();
+
+ atomic64_set(&active_asid(info, cpu), asid);
+ raw_spin_unlock_irqrestore(&info->lock, flags);
+}
+
+/*
+ * Initialize the ASID allocator
+ *
+ * @info: Pointer to the asid allocator structure
+ * @bits: Number of ASIDs available
+ * @asid_per_ctxt: Number of ASIDs to allocate per-context. ASIDs are
+ * allocated contiguously for a given context. This value should be a power of
+ * 2.
+ */
+int asid_allocator_init(struct asid_info *info,
+ u32 bits, unsigned int asid_per_ctxt,
+ void (*flush_cpu_ctxt_cb)(void))
+{
+ info->bits = bits;
+ info->ctxt_shift = ilog2(asid_per_ctxt);
+ info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
+ /*
+ * Expect allocation after rollover to fail if we don't have at least
+ * one more ASID than CPUs. ASID #0 is always reserved.
+ */
+ WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
+ atomic64_set(&info->generation, ASID_FIRST_VERSION(info));
+ info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
+ sizeof(*info->map), GFP_KERNEL);
+ if (!info->map)
+ return -ENOMEM;
+
+ raw_spin_lock_init(&info->lock);
+
+ return 0;
+}
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index ac10893b403c..9b352a072fbb 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -12,46 +12,21 @@
#include <linux/mm.h>

#include <asm/cpufeature.h>
+#include <asm/lib_asid.h>
#include <asm/mmu_context.h>
#include <asm/smp.h>
#include <asm/tlbflush.h>

-static struct asid_info
-{
- atomic64_t generation;
- unsigned long *map;
- atomic64_t __percpu *active;
- u64 __percpu *reserved;
- u32 bits;
- raw_spinlock_t lock;
- /* Which CPU requires context flush on next call */
- cpumask_t flush_pending;
- /* Number of ASID allocated by context (shift value) */
- unsigned int ctxt_shift;
- /* Callback to locally flush the context. */
- void (*flush_cpu_ctxt_cb)(void);
-} asid_info;
-
-#define active_asid(info, cpu) *per_cpu_ptr((info)->active, cpu)
-#define reserved_asid(info, cpu) *per_cpu_ptr((info)->reserved, cpu)
-
static DEFINE_PER_CPU(atomic64_t, active_asids);
static DEFINE_PER_CPU(u64, reserved_asids);

-#define ASID_MASK(info) (~GENMASK((info)->bits - 1, 0))
-#define NUM_ASIDS(info) (1UL << ((info)->bits))
-
-#define ASID_FIRST_VERSION(info) NUM_ASIDS(info)
-
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
#define ASID_PER_CONTEXT 2
#else
#define ASID_PER_CONTEXT 1
#endif

-#define NUM_CTXT_ASIDS(info) (NUM_ASIDS(info) >> (info)->ctxt_shift)
-#define asid2idx(info, asid) (((asid) & ~ASID_MASK(info)) >> (info)->ctxt_shift)
-#define idx2asid(info, idx) (((idx) << (info)->ctxt_shift) & ~ASID_MASK(info))
+static struct asid_info asid_info;

/* Get the ASIDBits supported by the current CPU */
static u32 get_cpu_asid_bits(void)
@@ -91,178 +66,6 @@ void verify_cpu_asid_bits(void)
}
}

-static void flush_context(struct asid_info *info)
-{
- int i;
- u64 asid;
-
- /* Update the list of reserved ASIDs and the ASID bitmap. */
- bitmap_clear(info->map, 0, NUM_CTXT_ASIDS(info));
-
- for_each_possible_cpu(i) {
- asid = atomic64_xchg_relaxed(&active_asid(info, i), 0);
- /*
- * If this CPU has already been through a
- * rollover, but hasn't run another task in
- * the meantime, we must preserve its reserved
- * ASID, as this is the only trace we have of
- * the process it is still running.
- */
- if (asid == 0)
- asid = reserved_asid(info, i);
- __set_bit(asid2idx(info, asid), info->map);
- reserved_asid(info, i) = asid;
- }
-
- /*
- * Queue a TLB invalidation for each CPU to perform on next
- * context-switch
- */
- cpumask_setall(&info->flush_pending);
-}
-
-static bool check_update_reserved_asid(struct asid_info *info, u64 asid,
- u64 newasid)
-{
- int cpu;
- bool hit = false;
-
- /*
- * Iterate over the set of reserved ASIDs looking for a match.
- * If we find one, then we can update our mm to use newasid
- * (i.e. the same ASID in the current generation) but we can't
- * exit the loop early, since we need to ensure that all copies
- * of the old ASID are updated to reflect the mm. Failure to do
- * so could result in us missing the reserved ASID in a future
- * generation.
- */
- for_each_possible_cpu(cpu) {
- if (reserved_asid(info, cpu) == asid) {
- hit = true;
- reserved_asid(info, cpu) = newasid;
- }
- }
-
- return hit;
-}
-
-static u64 new_context(struct asid_info *info, atomic64_t *pasid)
-{
- static u32 cur_idx = 1;
- u64 asid = atomic64_read(pasid);
- u64 generation = atomic64_read(&info->generation);
-
- if (asid != 0) {
- u64 newasid = generation | (asid & ~ASID_MASK(info));
-
- /*
- * If our current ASID was active during a rollover, we
- * can continue to use it and this was just a false alarm.
- */
- if (check_update_reserved_asid(info, asid, newasid))
- return newasid;
-
- /*
- * We had a valid ASID in a previous life, so try to re-use
- * it if possible.
- */
- if (!__test_and_set_bit(asid2idx(info, asid), info->map))
- return newasid;
- }
-
- /*
- * Allocate a free ASID. If we can't find one, take a note of the
- * currently active ASIDs and mark the TLBs as requiring flushes. We
- * always count from ASID #2 (index 1), as we use ASID #0 when setting
- * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
- * pairs.
- */
- asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), cur_idx);
- if (asid != NUM_CTXT_ASIDS(info))
- goto set_asid;
-
- /* We're out of ASIDs, so increment the global generation count */
- generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION(info),
- &info->generation);
- flush_context(info);
-
- /* We have more ASIDs than CPUs, so this will always succeed */
- asid = find_next_zero_bit(info->map, NUM_CTXT_ASIDS(info), 1);
-
-set_asid:
- __set_bit(asid, info->map);
- cur_idx = asid;
- return idx2asid(info, asid) | generation;
-}
-
-static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
- unsigned int cpu);
-
-/*
- * Check the ASID is still valid for the context. If not generate a new ASID.
- *
- * @pasid: Pointer to the current ASID batch
- * @cpu: current CPU ID. Must have been acquired through get_cpu()
- */
-static void asid_check_context(struct asid_info *info,
- atomic64_t *pasid, unsigned int cpu)
-{
- u64 asid, old_active_asid;
-
- asid = atomic64_read(pasid);
-
- /*
- * The memory ordering here is subtle.
- * If our active_asid is non-zero and the ASID matches the current
- * generation, then we update the active_asid entry with a relaxed
- * cmpxchg. Racing with a concurrent rollover means that either:
- *
- * - We get a zero back from the cmpxchg and end up waiting on the
- * lock. Taking the lock synchronises with the rollover and so
- * we are forced to see the updated generation.
- *
- * - We get a valid ASID back from the cmpxchg, which means the
- * relaxed xchg in flush_context will treat us as reserved
- * because atomic RmWs are totally ordered for a given location.
- */
- old_active_asid = atomic64_read(&active_asid(info, cpu));
- if (old_active_asid &&
- !((asid ^ atomic64_read(&info->generation)) >> info->bits) &&
- atomic64_cmpxchg_relaxed(&active_asid(info, cpu),
- old_active_asid, asid))
- return;
-
- asid_new_context(info, pasid, cpu);
-}
-
-/*
- * Generate a new ASID for the context.
- *
- * @pasid: Pointer to the current ASID batch allocated. It will be updated
- * with the new ASID batch.
- * @cpu: current CPU ID. Must have been acquired through get_cpu()
- */
-static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
- unsigned int cpu)
-{
- unsigned long flags;
- u64 asid;
-
- raw_spin_lock_irqsave(&info->lock, flags);
- /* Check that our ASID belongs to the current generation. */
- asid = atomic64_read(pasid);
- if ((asid ^ atomic64_read(&info->generation)) >> info->bits) {
- asid = new_context(info, pasid);
- atomic64_set(pasid, asid);
- }
-
- if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
- info->flush_cpu_ctxt_cb();
-
- atomic64_set(&active_asid(info, cpu), asid);
- raw_spin_unlock_irqrestore(&info->lock, flags);
-}
-
void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
{
if (system_supports_cnp())
@@ -294,38 +97,6 @@ static void asid_flush_cpu_ctxt(void)
local_flush_tlb_all();
}

-/*
- * Initialize the ASID allocator
- *
- * @info: Pointer to the asid allocator structure
- * @bits: Number of ASIDs available
- * @asid_per_ctxt: Number of ASIDs to allocate per-context. ASIDs are
- * allocated contiguously for a given context. This value should be a power of
- * 2.
- */
-static int asid_allocator_init(struct asid_info *info,
- u32 bits, unsigned int asid_per_ctxt,
- void (*flush_cpu_ctxt_cb)(void))
-{
- info->bits = bits;
- info->ctxt_shift = ilog2(asid_per_ctxt);
- info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
- /*
- * Expect allocation after rollover to fail if we don't have at least
- * one more ASID than CPUs. ASID #0 is always reserved.
- */
- WARN_ON(NUM_CTXT_ASIDS(info) - 1 <= num_possible_cpus());
- atomic64_set(&info->generation, ASID_FIRST_VERSION(info));
- info->map = kcalloc(BITS_TO_LONGS(NUM_CTXT_ASIDS(info)),
- sizeof(*info->map), GFP_KERNEL);
- if (!info->map)
- return -ENOMEM;
-
- raw_spin_lock_init(&info->lock);
-
- return 0;
-}
-
static int asids_init(void)
{
u32 bits = get_cpu_asid_bits();
@@ -333,7 +104,7 @@ static int asids_init(void)
if (asid_allocator_init(&asid_info, bits, ASID_PER_CONTEXT,
asid_flush_cpu_ctxt))
panic("Unable to initialize ASID allocator for %lu ASIDs\n",
- 1UL << bits);
+ NUM_ASIDS(&asid_info));

asid_info.active = &active_asids;
asid_info.reserved = &reserved_asids;
--
2.11.0

2019-07-24 18:46:14

by Julien Grall

Subject: [PATCH v3 07/15] arm64/mm: Introduce NUM_ASIDS

At the moment ASID_FIRST_VERSION is used to know the number of ASIDs
supported. As we are going to move the ASID allocator into a separate
file, it would be better to use a different name for external users.

This patch adds NUM_ASIDS and implements ASID_FIRST_VERSION using it.

Signed-off-by: Julien Grall <[email protected]>
---
arch/arm64/mm/context.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 2e1e495cd1d8..3b40ac4a2541 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,7 +37,9 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
static DEFINE_PER_CPU(u64, reserved_asids);

#define ASID_MASK(info) (~GENMASK((info)->bits - 1, 0))
-#define ASID_FIRST_VERSION(info) (1UL << ((info)->bits))
+#define NUM_ASIDS(info) (1UL << ((info)->bits))
+
+#define ASID_FIRST_VERSION(info) NUM_ASIDS(info)

#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
#define ASID_PER_CONTEXT 2
@@ -45,7 +47,7 @@ static DEFINE_PER_CPU(u64, reserved_asids);
#define ASID_PER_CONTEXT 1
#endif

-#define NUM_CTXT_ASIDS(info) (ASID_FIRST_VERSION(info) >> (info)->ctxt_shift)
+#define NUM_CTXT_ASIDS(info) (NUM_ASIDS(info) >> (info)->ctxt_shift)
#define asid2idx(info, asid) (((asid) & ~ASID_MASK(info)) >> (info)->ctxt_shift)
#define idx2asid(info, idx) (((idx) << (info)->ctxt_shift) & ~ASID_MASK(info))

--
2.11.0

2019-07-24 18:46:15

by Julien Grall

[permalink] [raw]
Subject: [PATCH v3 09/15] arm64/mm: Split the function check_and_switch_context in 3 parts

The function check_and_switch_context is used to:
1) Check whether the ASID is still valid
2) Generate a new one if it is not valid
3) Switch the context

While the latter is specific to the MM subsystem, the rest could be part
of the generic ASID allocator.

After this patch, the function is split into 3 parts, corresponding to
the following functions:
1) asid_check_context: Check if the ASID is still valid
2) asid_new_context: Generate a new ASID for the context
3) check_and_switch_context: Call 1) and 2) and switch the context

1) and 2) have not been merged into a single function because we want
to avoid adding a branch when the ASID is still valid. This will matter
when the code is moved into a separate file later on, as 1) will reside
in the header as a static inline function.

Signed-off-by: Julien Grall <[email protected]>

---

Will wants to avoid adding a branch when the ASID is still valid, so
1) and 2) are in separate functions. The former will move to a new
header and become a static inline function.
---
arch/arm64/mm/context.c | 51 +++++++++++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 27e328fffdb1..5e8b381ab67f 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -193,16 +193,21 @@ static u64 new_context(struct asid_info *info, atomic64_t *pasid)
return idx2asid(info, asid) | generation;
}

-void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
+static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu);
+
+/*
+ * Check the ASID is still valid for the context. If not generate a new ASID.
+ *
+ * @pasid: Pointer to the current ASID batch
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static void asid_check_context(struct asid_info *info,
+ atomic64_t *pasid, unsigned int cpu)
{
- unsigned long flags;
u64 asid, old_active_asid;
- struct asid_info *info = &asid_info;

- if (system_supports_cnp())
- cpu_set_reserved_ttbr0();
-
- asid = atomic64_read(&mm->context.id);
+ asid = atomic64_read(pasid);

/*
* The memory ordering here is subtle.
@@ -223,14 +228,30 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
!((asid ^ atomic64_read(&info->generation)) >> info->bits) &&
atomic64_cmpxchg_relaxed(&active_asid(info, cpu),
old_active_asid, asid))
- goto switch_mm_fastpath;
+ return;
+
+ asid_new_context(info, pasid, cpu);
+}
+
+/*
+ * Generate a new ASID for the context.
+ *
+ * @pasid: Pointer to the current ASID batch allocated. It will be updated
+ * with the new ASID batch.
+ * @cpu: current CPU ID. Must have been acquired through get_cpu()
+ */
+static void asid_new_context(struct asid_info *info, atomic64_t *pasid,
+ unsigned int cpu)
+{
+ unsigned long flags;
+ u64 asid;

raw_spin_lock_irqsave(&info->lock, flags);
/* Check that our ASID belongs to the current generation. */
- asid = atomic64_read(&mm->context.id);
+ asid = atomic64_read(pasid);
if ((asid ^ atomic64_read(&info->generation)) >> info->bits) {
- asid = new_context(info, &mm->context.id);
- atomic64_set(&mm->context.id, asid);
+ asid = new_context(info, pasid);
+ atomic64_set(pasid, asid);
}

if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
@@ -238,8 +259,14 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)

atomic64_set(&active_asid(info, cpu), asid);
raw_spin_unlock_irqrestore(&info->lock, flags);
+}
+
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
+{
+ if (system_supports_cnp())
+ cpu_set_reserved_ttbr0();

-switch_mm_fastpath:
+ asid_check_context(&asid_info, &mm->context.id, cpu);

arm64_apply_bp_hardening();

--
2.11.0

2019-07-25 18:09:06

by Catalin Marinas

Subject: Re: [PATCH v3 09/15] arm64/mm: Split the function check_and_switch_context in 3 parts

On Wed, Jul 24, 2019 at 05:25:28PM +0100, Julien Grall wrote:
> The function check_and_switch_context is used to:
> 1) Check whether the ASID is still valid
> 2) Generate a new one if it is not valid
> 3) Switch the context
>
> While the latter is specific to the MM subsystem, the rest could be part
> of the generic ASID allocator.
>
> After this patch, the function is split into 3 parts, corresponding to
> the following functions:
> 1) asid_check_context: Check if the ASID is still valid
> 2) asid_new_context: Generate a new ASID for the context
> 3) check_and_switch_context: Call 1) and 2) and switch the context
>
> 1) and 2) have not been merged into a single function because we want
> to avoid adding a branch when the ASID is still valid. This will matter
> when the code is moved into a separate file later on, as 1) will reside
> in the header as a static inline function.
>
> Signed-off-by: Julien Grall <[email protected]>
>
> ---
> Will wants to avoid adding a branch when the ASID is still valid, so
> 1) and 2) are in separate functions. The former will move to a new
> header and become a static inline function.

Was this discussion logged somewhere, just to get the context?

I presume by "branch" you meant the function call to
asid_check_context(). Personally, I don't like the duplication of this
function in patch 13. This is part of the ASID allocation algorithm and
I prefer to keep them together (we even had a bug in here with the xchg
use).

Do you have any numbers to show how non-inlining this function affects
performance (hackbench -P would do)?

--
Catalin