2023-12-11 09:44:54

by Andrea Parri

[permalink] [raw]
Subject: [PATCH v2 0/4] membarrier: riscv: Core serializing command

Changes since v1 ([1]):

Add patch 1/4 to resolve the "sync core" scenario reported by Mathieu
in [1] and meet the requirement of the membarrier "private expedited"
command.

Miscellaneous improvements/additions to documentation and MAINTAINERS
files. Follow up on Mathieu's suggestion in [1] to introduce a Kconfig
/feature for the latter command (patch 4/4).

N.B. Patch 4/4 is in RFC-mode, the plan being to submit this patch to
the various archs/doc lists (more likely, as a stand-alone patch) once
I've got some ack/positive feedback from the MEMBARRIER maintainers.

Andrea

[1] https://lore.kernel.org/all/[email protected]/

Andrea Parri (4):
membarrier: riscv: Add full memory barrier in switch_mm()
locking: Introduce prepare_sync_core_cmd()
membarrier: riscv: Provide core serializing command
membarrier: Introduce Kconfig ARCH_HAS_MEMBARRIER

.../membarrier-sync-core/arch-support.txt | 18 ++++++-
.../sched/membarrier/arch-support.txt | 50 +++++++++++++++++++
MAINTAINERS | 4 +-
arch/alpha/Kconfig | 1 +
arch/arc/Kconfig | 1 +
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/hexagon/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/riscv/Kconfig | 5 ++
arch/riscv/include/asm/membarrier.h | 48 ++++++++++++++++++
arch/riscv/include/asm/sync_core.h | 29 +++++++++++
arch/riscv/mm/context.c | 2 +
arch/s390/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
include/linux/sync_core.h | 16 +++++-
init/Kconfig | 6 +++
kernel/sched/core.c | 9 ++--
kernel/sched/membarrier.c | 16 +++++-
21 files changed, 204 insertions(+), 9 deletions(-)
create mode 100644 Documentation/features/sched/membarrier/arch-support.txt
create mode 100644 arch/riscv/include/asm/membarrier.h
create mode 100644 arch/riscv/include/asm/sync_core.h

--
2.34.1


2023-12-11 09:44:56

by Andrea Parri

[permalink] [raw]
Subject: [PATCH v2 2/4] locking: Introduce prepare_sync_core_cmd()

Introduce an architecture function that architectures can use to set
up ("prepare") SYNC_CORE commands.

The function will be used by RISC-V to update its "deferred icache-
flush" data structures (icache_stale_mask).

Architectures defining prepare_sync_core_cmd() static inline need to
select ARCH_HAS_PREPARE_SYNC_CORE_CMD.

Suggested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
---
include/linux/sync_core.h | 16 +++++++++++++++-
init/Kconfig | 3 +++
kernel/sched/membarrier.c | 1 +
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
index 013da4b8b3272..67bb9794b8758 100644
--- a/include/linux/sync_core.h
+++ b/include/linux/sync_core.h
@@ -17,5 +17,19 @@ static inline void sync_core_before_usermode(void)
}
#endif

-#endif /* _LINUX_SYNC_CORE_H */
+#ifdef CONFIG_ARCH_HAS_PREPARE_SYNC_CORE_CMD
+#include <asm/sync_core.h>
+#else
+/*
+ * This is a dummy prepare_sync_core_cmd() implementation that can be used on
+ * all architectures which provide unconditional core serializing instructions
+ * in switch_mm().
+ * If your architecture doesn't provide such core serializing instructions in
+ * switch_mm(), you may need to write your own functions.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif

+#endif /* _LINUX_SYNC_CORE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 9ffb103fc927b..87daf50838f02 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1972,6 +1972,9 @@ source "kernel/Kconfig.locks"
config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
bool

+config ARCH_HAS_PREPARE_SYNC_CORE_CMD
+ bool
+
config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
bool

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 2ad881d07752c..58f801e013988 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -320,6 +320,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
ipi_func = ipi_sync_core;
+ prepare_sync_core_cmd(mm);
} else if (flags == MEMBARRIER_FLAG_RSEQ) {
if (!IS_ENABLED(CONFIG_RSEQ))
return -EINVAL;
--
2.34.1

2023-12-11 09:45:02

by Andrea Parri

[permalink] [raw]
Subject: [PATCH v2 3/4] membarrier: riscv: Provide core serializing command

RISC-V uses xRET instructions on return from interrupt and to go back
to user-space; the xRET instruction is not core serializing.

Use FENCE.I for providing core serialization as follows:

- by calling sync_core_before_usermode() on return from interrupt (cf.
ipi_sync_core()),

- via switch_mm() and sync_core_before_usermode() (respectively, for
uthread->uthread and kthread->uthread transitions) to go back to
user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().

Suggested-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
---
.../membarrier-sync-core/arch-support.txt | 18 +++++++++++-
MAINTAINERS | 1 +
arch/riscv/Kconfig | 3 ++
arch/riscv/include/asm/membarrier.h | 19 ++++++++++++
arch/riscv/include/asm/sync_core.h | 29 +++++++++++++++++++
5 files changed, 69 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/sync_core.h

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index d96b778b87ed8..a163170fc0f48 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -10,6 +10,22 @@
# Rely on implicit context synchronization as a result of exception return
# when returning from IPI handler, and when returning to user-space.
#
+# * riscv
+#
+# riscv uses xRET as return from interrupt and to return to user-space.
+#
+# Given that xRET is not core serializing, we rely on FENCE.I for providing
+# core serialization:
+#
+# - by calling sync_core_before_usermode() on return from interrupt (cf.
+# ipi_sync_core()),
+#
+# - via switch_mm() and sync_core_before_usermode() (respectively, for
+# uthread->uthread and kthread->uthread transitions) to go back to
+# user-space.
+#
+# The serialization in switch_mm() is activated by prepare_sync_core_cmd().
+#
# * x86
#
# x86-32 uses IRET as return from interrupt, which takes care of the IPI.
@@ -43,7 +59,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
diff --git a/MAINTAINERS b/MAINTAINERS
index a9166d82ffced..f6f1fdc76cf46 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13808,6 +13808,7 @@ M: "Paul E. McKenney" <[email protected]>
L: [email protected]
S: Supported
F: arch/*/include/asm/membarrier.h
+F: arch/*/include/asm/sync_core.h
F: include/uapi/linux/membarrier.h
F: kernel/sched/membarrier.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f7db95097caf1..db7b1acd943e4 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,14 +28,17 @@ config RISCV
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_CALLBACKS
+ select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
+ select ARCH_HAS_PREPARE_SYNC_CORE_CMD
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP if MMU
select ARCH_HAS_SET_MEMORY if MMU
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+ select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h
index 4be218fa03b14..a1071039c20ed 100644
--- a/arch/riscv/include/asm/membarrier.h
+++ b/arch/riscv/include/asm/membarrier.h
@@ -22,6 +22,25 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
/*
* The membarrier system call requires a full memory barrier
* after storing to rq->curr, before going back to user-space.
+ *
+ * The barrier is also needed for the SYNC_CORE command when
+ * switching between processes; in particular, on a transition
+ * from a thread belonging to another mm to a thread belonging
+ * to the mm for which a membarrier SYNC_CORE is done on CPU0:
+ *
+ * - [CPU0] sets all bits in the mm icache_stale_mask.
+ *
+ * - [CPU1] store to rq->curr (by the scheduler).
+ *
+ * - [CPU0] loads rq->curr within membarrier and observes
+ * cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
+ * CPU1; this means membarrier relies on switch_mm() to
+ * issue the sync-core.
+ *
+ * - [CPU1] switch_mm() loads icache_stale_mask; if the bit
+ * is zero, switch_mm() may incorrectly skip the sync-core.
+ *
+ * Matches the full barrier in membarrier_private_expedited().
*/
smp_mb();
}
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
new file mode 100644
index 0000000000000..9153016da8f14
--- /dev/null
+++ b/arch/riscv/include/asm/sync_core.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SYNC_CORE_H
+#define _ASM_RISCV_SYNC_CORE_H
+
+/*
+ * RISC-V implements return to user-space through an xRET instruction,
+ * which is not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+ asm volatile ("fence.i" ::: "memory");
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Ensure the next switch_mm() on every CPU issues a core serializing
+ * instruction for the given @mm.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+ cpumask_setall(&mm->context.icache_stale_mask);
+}
+#else
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SMP */
+
+#endif /* _ASM_RISCV_SYNC_CORE_H */
--
2.34.1

2023-12-11 09:46:25

by Andrea Parri

[permalink] [raw]
Subject: [RFC PATCH 4/4] membarrier: Introduce Kconfig ARCH_HAS_MEMBARRIER

Architectures supporting the "private expedited" membarrier command must
select the Kconfig to use the command. Document status and requirements
for each architecture in a single file under Documentation/features.

Suggested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
---
The TODOs in the arch-support table below should really be interpreted
as "I'm not sure/haven't checked" (the respective arch maintainers will
be able to verify and amend such information).

Based on the following inline comment in __schedule():

/*
* The membarrier system call requires each architecture
* to have a full memory barrier after updating
* rq->curr, before returning to user-space.
*
* Here are the schemes providing that barrier on the
* various architectures:
* - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
* RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
* on PowerPC and on RISC-V.
* - finish_lock_switch() for weakly-ordered
* architectures where spin_unlock is a full barrier,
* - switch_to() for arm64 (weakly-ordered, spin_unlock
* is a RELEASE barrier),
*/

.../sched/membarrier/arch-support.txt | 50 +++++++++++++++++++
MAINTAINERS | 1 +
arch/alpha/Kconfig | 1 +
arch/arc/Kconfig | 1 +
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/hexagon/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/riscv/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
init/Kconfig | 3 ++
kernel/sched/core.c | 4 +-
kernel/sched/membarrier.c | 15 +++++-
16 files changed, 80 insertions(+), 4 deletions(-)
create mode 100644 Documentation/features/sched/membarrier/arch-support.txt

diff --git a/Documentation/features/sched/membarrier/arch-support.txt b/Documentation/features/sched/membarrier/arch-support.txt
new file mode 100644
index 0000000000000..95e65195e47c2
--- /dev/null
+++ b/Documentation/features/sched/membarrier/arch-support.txt
@@ -0,0 +1,50 @@
+#
+# Feature name: membarrier
+# Kconfig: ARCH_HAS_MEMBARRIER
+# description: arch supports MEMBARRIER_CMD_PRIVATE_EXPEDITED
+#
+# Architecture requirements
+#
+# The membarrier() system call requires each architecture to have a full memory
+# barrier after updating rq->curr, before returning to user-space.
+#
+# Here are the schemes providing that barrier on the various architectures:
+#
+# * alpha/arc/arm/hexagon/mips
+#
+# We rely on the full barrier implied by spin_unlock() in finish_lock_switch().
+#
+# * arm64
+#
+# We rely on the full barrier implied by switch_to().
+#
+# * powerpc/riscv/s390/sparc/x86
+#
+# We rely on the full barrier implied by switch_mm(), if mm isn't NULL; we rely
+# on the full barrier implied by mmdrop(), otherwise.
+#
+ -----------------------
+ | arch |status|
+ -----------------------
+ | alpha: | ok |
+ | arc: | ok |
+ | arm: | ok |
+ | arm64: | ok |
+ | csky: | TODO |
+ | hexagon: | ok |
+ | loongarch: | TODO |
+ | m68k: | TODO |
+ | microblaze: | TODO |
+ | mips: | ok |
+ | nios2: | TODO |
+ | openrisc: | TODO |
+ | parisc: | TODO |
+ | powerpc: | ok |
+ | riscv: | ok |
+ | s390: | ok |
+ | sh: | TODO |
+ | sparc: | ok |
+ | um: | TODO |
+ | x86: | ok |
+ | xtensa: | TODO |
+ -----------------------
diff --git a/MAINTAINERS b/MAINTAINERS
index f6f1fdc76cf46..c5a053605cbc4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13807,6 +13807,7 @@ M: Mathieu Desnoyers <[email protected]>
M: "Paul E. McKenney" <[email protected]>
L: [email protected]
S: Supported
+F: Documentation/features/sched/membarrier*/
F: arch/*/include/asm/membarrier.h
F: arch/*/include/asm/sync_core.h
F: include/uapi/linux/membarrier.h
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index d6968d090d49a..f98d6cba0bd9a 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -24,6 +24,7 @@ config ALPHA
select GENERIC_IRQ_SHOW
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+ select ARCH_HAS_MEMBARRIER
select AUDIT_ARCH
select GENERIC_CPU_VULNERABILITIES
select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 3162db540ee96..1d8a6ba98ae33 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -9,6 +9,7 @@ config ARC
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SYNC_DMA_FOR_CPU
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f8567e95f98be..700d1d9ff2f8b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b071a00425d2..d21788e6920f6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -31,6 +31,7 @@ config ARM64
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_KEEPINITRD
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index a880ee067d2ec..c2b2713c01bbd 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -5,6 +5,7 @@ comment "Linux Kernel Configuration for Hexagon"
config HEXAGON
def_bool y
select ARCH_32BIT_OFF_T
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_NO_PREEMPT
select DMA_GLOBAL_POOL
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 797ae590ebdba..4b65e73e34c16 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -16,6 +16,7 @@ config MIPS
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAS_GCOV_PROFILE_ALL
+ select ARCH_HAS_MEMBARRIER
select ARCH_KEEP_MEMBLOCK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf5..c13980eac3585 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_HUGEPD if HUGETLB_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index db7b1acd943e4..fd4c6a74ebd61 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3bec98d20283b..2e044d424fd4a 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -72,6 +72,7 @@ config S390
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 49849790e66dc..40eb179c2416a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -30,6 +30,7 @@ config SPARC
select RTC_SYSTOHC
select HAVE_ARCH_JUMP_LABEL if SPARC64
select GENERIC_IRQ_SHOW
+ select ARCH_HAS_MEMBARRIER
select ARCH_WANT_IPC_PARSE_VERSION
select GENERIC_PCI_IOMAP
select HAS_IOPORT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb0929..83f63e00312ee 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -82,6 +82,7 @@ config X86
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOV if X86_64
select ARCH_HAS_MEM_ENCRYPT
+ select ARCH_HAS_MEMBARRIER
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
diff --git a/init/Kconfig b/init/Kconfig
index 87daf50838f02..8114404b52b91 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1742,6 +1742,9 @@ config KALLSYMS_BASE_RELATIVE

# syscall, maps, verifier

+config ARCH_HAS_MEMBARRIER
+ bool
+
config ARCH_HAS_MEMBARRIER_CALLBACKS
bool

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 711dc753f7216..dff1c6df337f9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6673,8 +6673,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
* - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
* RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
* on PowerPC and on RISC-V.
- * - finish_lock_switch() for weakly-ordered
- * architectures where spin_unlock is a full barrier,
+ * - finish_lock_switch() for alpha, arc, arm, hexagon, mips
+ * where spin_unlock is a full barrier,
* - switch_to() for arm64 (weakly-ordered, spin_unlock
* is a RELEASE barrier),
*/
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 58f801e013988..248a38c9b261c 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -137,6 +137,14 @@
* Bitmask made from a "or" of all commands within enum membarrier_cmd,
* except MEMBARRIER_CMD_QUERY.
*/
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER
+#define MEMBARRIER_PRIVATE_EXPEDITED_BITMASK \
+ (MEMBARRIER_CMD_PRIVATE_EXPEDITED \
+ | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
+#else
+#define MEMBARRIER_PRIVATE_EXPEDITED_BITMASK 0
+#endif
+
#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \
(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE \
@@ -156,8 +164,7 @@
#define MEMBARRIER_CMD_BITMASK \
(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
- | MEMBARRIER_CMD_PRIVATE_EXPEDITED \
- | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED \
+ | MEMBARRIER_PRIVATE_EXPEDITED_BITMASK \
| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \
| MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK \
| MEMBARRIER_CMD_GET_REGISTRATIONS)
@@ -329,6 +336,8 @@ static int membarrier_private_expedited(int flags, int cpu_id)
return -EPERM;
ipi_func = ipi_rseq;
} else {
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER))
+ return -EINVAL;
WARN_ON_ONCE(flags);
if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
@@ -519,6 +528,8 @@ static int membarrier_register_private_expedited(int flags)
ready_state =
MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY;
} else {
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER))
+ return -EINVAL;
WARN_ON_ONCE(flags);
}

--
2.34.1

2023-12-11 13:34:40

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] membarrier: Introduce Kconfig ARCH_HAS_MEMBARRIER

On 2023-12-11 04:44, Andrea Parri wrote:
> Architectures supporting the "private expedited" membarrier command must
> select the Kconfig to use the command. Document status and requirements
> for each architecture in a single file under Documentation/features.

Sorry for being vague in my suggestion: I did not intend to make it
optional (not ARCH_HAS_MEMBARRIER), but rather just to have a central
place where this would be documented.

I'm not sure where in Documentation this should land though ?

Thanks,

Mathieu


>
> Suggested-by: Mathieu Desnoyers <[email protected]>
> Signed-off-by: Andrea Parri <[email protected]>
> ---
> The TODOs in the arch-support table below should really be interpreted
> as "I'm not sure/haven't checked" (the respective arch maintainers will
> be able to verify and amend such information).
>
> Based on the following inline comment in __schedule():
>
> /*
> * The membarrier system call requires each architecture
> * to have a full memory barrier after updating
> * rq->curr, before returning to user-space.
> *
> * Here are the schemes providing that barrier on the
> * various architectures:
> * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
> * RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
> * on PowerPC and on RISC-V.
> * - finish_lock_switch() for weakly-ordered
> * architectures where spin_unlock is a full barrier,
> * - switch_to() for arm64 (weakly-ordered, spin_unlock
> * is a RELEASE barrier),
> */
>
> .../sched/membarrier/arch-support.txt | 50 +++++++++++++++++++
> MAINTAINERS | 1 +
> arch/alpha/Kconfig | 1 +
> arch/arc/Kconfig | 1 +
> arch/arm/Kconfig | 1 +
> arch/arm64/Kconfig | 1 +
> arch/hexagon/Kconfig | 1 +
> arch/mips/Kconfig | 1 +
> arch/powerpc/Kconfig | 1 +
> arch/riscv/Kconfig | 1 +
> arch/s390/Kconfig | 1 +
> arch/sparc/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> init/Kconfig | 3 ++
> kernel/sched/core.c | 4 +-
> kernel/sched/membarrier.c | 15 +++++-
> 16 files changed, 80 insertions(+), 4 deletions(-)
> create mode 100644 Documentation/features/sched/membarrier/arch-support.txt
>
> diff --git a/Documentation/features/sched/membarrier/arch-support.txt b/Documentation/features/sched/membarrier/arch-support.txt
> new file mode 100644
> index 0000000000000..95e65195e47c2
> --- /dev/null
> +++ b/Documentation/features/sched/membarrier/arch-support.txt
> @@ -0,0 +1,50 @@
> +#
> +# Feature name: membarrier
> +# Kconfig: ARCH_HAS_MEMBARRIER
> +# description: arch supports MEMBARRIER_CMD_PRIVATE_EXPEDITED
> +#
> +# Architecture requirements
> +#
> +# The membarrier() system call requires each architecture to have a full memory
> +# barrier after updating rq->curr, before returning to user-space.
> +#
> +# Here are the schemes providing that barrier on the various architectures:
> +#
> +# * alpha/arc/arm/hexagon/mips
> +#
> +# We rely on the full barrier implied by spin_unlock() in finish_lock_switch().
> +#
> +# * arm64
> +#
> +# We rely on the full barrier implied by switch_to().
> +#
> +# * powerpc/riscv/s390/sparc/x86
> +#
> +# We rely on the full barrier implied by switch_mm(), if mm isn't NULL; we rely
> +# on the full barrier implied by mmdrop(), otherwise.
> +#
> + -----------------------
> + | arch |status|
> + -----------------------
> + | alpha: | ok |
> + | arc: | ok |
> + | arm: | ok |
> + | arm64: | ok |
> + | csky: | TODO |
> + | hexagon: | ok |
> + | loongarch: | TODO |
> + | m68k: | TODO |
> + | microblaze: | TODO |
> + | mips: | ok |
> + | nios2: | TODO |
> + | openrisc: | TODO |
> + | parisc: | TODO |
> + | powerpc: | ok |
> + | riscv: | ok |
> + | s390: | ok |
> + | sh: | TODO |
> + | sparc: | ok |
> + | um: | TODO |
> + | x86: | ok |
> + | xtensa: | TODO |
> + -----------------------
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f6f1fdc76cf46..c5a053605cbc4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13807,6 +13807,7 @@ M: Mathieu Desnoyers <[email protected]>
> M: "Paul E. McKenney" <[email protected]>
> L: [email protected]
> S: Supported
> +F: Documentation/features/sched/membarrier*/
> F: arch/*/include/asm/membarrier.h
> F: arch/*/include/asm/sync_core.h
> F: include/uapi/linux/membarrier.h
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index d6968d090d49a..f98d6cba0bd9a 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -24,6 +24,7 @@ config ALPHA
> select GENERIC_IRQ_SHOW
> select ARCH_WANT_IPC_PARSE_VERSION
> select ARCH_HAVE_NMI_SAFE_CMPXCHG
> + select ARCH_HAS_MEMBARRIER
> select AUDIT_ARCH
> select GENERIC_CPU_VULNERABILITIES
> select GENERIC_SMP_IDLE_THREAD
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> index 3162db540ee96..1d8a6ba98ae33 100644
> --- a/arch/arc/Kconfig
> +++ b/arch/arc/Kconfig
> @@ -9,6 +9,7 @@ config ARC
> select ARCH_HAS_CACHE_LINE_SIZE
> select ARCH_HAS_DEBUG_VM_PGTABLE
> select ARCH_HAS_DMA_PREP_COHERENT
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_PTE_SPECIAL
> select ARCH_HAS_SETUP_DMA_OPS
> select ARCH_HAS_SYNC_DMA_FOR_CPU
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index f8567e95f98be..700d1d9ff2f8b 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -14,6 +14,7 @@ config ARM
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_KEEPINITRD
> select ARCH_HAS_KCOV
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7b071a00425d2..d21788e6920f6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -31,6 +31,7 @@ config ARM64
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> select ARCH_HAS_KEEPINITRD
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
> index a880ee067d2ec..c2b2713c01bbd 100644
> --- a/arch/hexagon/Kconfig
> +++ b/arch/hexagon/Kconfig
> @@ -5,6 +5,7 @@ comment "Linux Kernel Configuration for Hexagon"
> config HEXAGON
> def_bool y
> select ARCH_32BIT_OFF_T
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> select ARCH_NO_PREEMPT
> select DMA_GLOBAL_POOL
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 797ae590ebdba..4b65e73e34c16 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -16,6 +16,7 @@ config MIPS
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_HAS_UBSAN_SANITIZE_ALL
> select ARCH_HAS_GCOV_PROFILE_ALL
> + select ARCH_HAS_MEMBARRIER
> select ARCH_KEEP_MEMBLOCK
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 6f105ee4f3cf5..c13980eac3585 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -137,6 +137,7 @@ config PPC
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_HUGEPD if HUGETLB_PAGE
> select ARCH_HAS_KCOV
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_CALLBACKS
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index db7b1acd943e4..fd4c6a74ebd61 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -27,6 +27,7 @@ config RISCV
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_CALLBACKS
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_MMIOWB
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 3bec98d20283b..2e044d424fd4a 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -72,6 +72,7 @@ config S390
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_MEM_ENCRYPT
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index 49849790e66dc..40eb179c2416a 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -30,6 +30,7 @@ config SPARC
> select RTC_SYSTOHC
> select HAVE_ARCH_JUMP_LABEL if SPARC64
> select GENERIC_IRQ_SHOW
> + select ARCH_HAS_MEMBARRIER
> select ARCH_WANT_IPC_PARSE_VERSION
> select GENERIC_PCI_IOMAP
> select HAS_IOPORT
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3762f41bb0929..83f63e00312ee 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -82,6 +82,7 @@ config X86
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_KCOV if X86_64
> select ARCH_HAS_MEM_ENCRYPT
> + select ARCH_HAS_MEMBARRIER
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> diff --git a/init/Kconfig b/init/Kconfig
> index 87daf50838f02..8114404b52b91 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1742,6 +1742,9 @@ config KALLSYMS_BASE_RELATIVE
>
> # syscall, maps, verifier
>
> +config ARCH_HAS_MEMBARRIER
> + bool
> +
> config ARCH_HAS_MEMBARRIER_CALLBACKS
> bool
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 711dc753f7216..dff1c6df337f9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6673,8 +6673,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
> * RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
> * on PowerPC and on RISC-V.
> - * - finish_lock_switch() for weakly-ordered
> - * architectures where spin_unlock is a full barrier,
> + * - finish_lock_switch() for alpha, arc, arm, hexagon, mips
> + * where spin_unlock is a full barrier,
> * - switch_to() for arm64 (weakly-ordered, spin_unlock
> * is a RELEASE barrier),
> */
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 58f801e013988..248a38c9b261c 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -137,6 +137,14 @@
> * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> * except MEMBARRIER_CMD_QUERY.
> */
> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER
> +#define MEMBARRIER_PRIVATE_EXPEDITED_BITMASK \
> + (MEMBARRIER_CMD_PRIVATE_EXPEDITED \
> + | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
> +#else
> +#define MEMBARRIER_PRIVATE_EXPEDITED_BITMASK 0
> +#endif
> +
> #ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
> #define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \
> (MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE \
> @@ -156,8 +164,7 @@
> #define MEMBARRIER_CMD_BITMASK \
> (MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
> | MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
> - | MEMBARRIER_CMD_PRIVATE_EXPEDITED \
> - | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED \
> + | MEMBARRIER_PRIVATE_EXPEDITED_BITMASK \
> | MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \
> | MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK \
> | MEMBARRIER_CMD_GET_REGISTRATIONS)
> @@ -329,6 +336,8 @@ static int membarrier_private_expedited(int flags, int cpu_id)
> return -EPERM;
> ipi_func = ipi_rseq;
> } else {
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER))
> + return -EINVAL;
> WARN_ON_ONCE(flags);
> if (!(atomic_read(&mm->membarrier_state) &
> MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
> @@ -519,6 +528,8 @@ static int membarrier_register_private_expedited(int flags)
> ready_state =
> MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY;
> } else {
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER))
> + return -EINVAL;
> WARN_ON_ONCE(flags);
> }
>

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

2023-12-11 14:08:59

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] membarrier: riscv: Provide core serializing command

On 2023-12-11 04:44, Andrea Parri wrote:
> RISC-V uses xRET instructions on return from interrupt and to go back
> to user-space; the xRET instruction is not core serializing.
>
> Use FENCE.I for providing core serialization as follows:
>
> - by calling sync_core_before_usermode() on return from interrupt (cf.
> ipi_sync_core()),
>
> - via switch_mm() and sync_core_before_usermode() (respectively, for
> uthread->uthread and kthread->uthread transitions) to go back to
> user-space.
>
> On RISC-V, the serialization in switch_mm() is activated by resetting
> the icache_stale_mask of the mm at prepare_sync_core_cmd().
>
> Suggested-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Andrea Parri <[email protected]>
> ---
> .../membarrier-sync-core/arch-support.txt | 18 +++++++++++-
> MAINTAINERS | 1 +
> arch/riscv/Kconfig | 3 ++
> arch/riscv/include/asm/membarrier.h | 19 ++++++++++++
> arch/riscv/include/asm/sync_core.h | 29 +++++++++++++++++++
> 5 files changed, 69 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/include/asm/sync_core.h
>
> diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> index d96b778b87ed8..a163170fc0f48 100644
> --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> @@ -10,6 +10,22 @@
> # Rely on implicit context synchronization as a result of exception return
> # when returning from IPI handler, and when returning to user-space.
> #
> +# * riscv
> +#
> +# riscv uses xRET as return from interrupt and to return to user-space.
> +#
> +# Given that xRET is not core serializing, we rely on FENCE.I for providing
> +# core serialization:
> +#
> +# - by calling sync_core_before_usermode() on return from interrupt (cf.
> +# ipi_sync_core()),
> +#
> +# - via switch_mm() and sync_core_before_usermode() (respectively, for
> +# uthread->uthread and kthread->uthread transitions) to go back to
> +# user-space.
> +#
> +# The serialization in switch_mm() is activated by prepare_sync_core_cmd().
> +#
> # * x86
> #
> # x86-32 uses IRET as return from interrupt, which takes care of the IPI.
> @@ -43,7 +59,7 @@
> | openrisc: | TODO |
> | parisc: | TODO |
> | powerpc: | ok |
> - | riscv: | TODO |
> + | riscv: | ok |
> | s390: | ok |
> | sh: | TODO |
> | sparc: | TODO |
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a9166d82ffced..f6f1fdc76cf46 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13808,6 +13808,7 @@ M: "Paul E. McKenney" <[email protected]>
> L: [email protected]
> S: Supported
> F: arch/*/include/asm/membarrier.h
> +F: arch/*/include/asm/sync_core.h
> F: include/uapi/linux/membarrier.h
> F: kernel/sched/membarrier.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index f7db95097caf1..db7b1acd943e4 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -28,14 +28,17 @@ config RISCV
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> select ARCH_HAS_MEMBARRIER_CALLBACKS
> + select ARCH_HAS_MEMBARRIER_SYNC_CORE
> select ARCH_HAS_MMIOWB
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PMEM_API
> + select ARCH_HAS_PREPARE_SYNC_CORE_CMD
> select ARCH_HAS_PTE_SPECIAL
> select ARCH_HAS_SET_DIRECT_MAP if MMU
> select ARCH_HAS_SET_MEMORY if MMU
> select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
> select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
> + select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
> select ARCH_HAS_SYSCALL_WRAPPER
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_HAS_UBSAN_SANITIZE_ALL
> diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h
> index 4be218fa03b14..a1071039c20ed 100644
> --- a/arch/riscv/include/asm/membarrier.h
> +++ b/arch/riscv/include/asm/membarrier.h
> @@ -22,6 +22,25 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> /*
> * The membarrier system call requires a full memory barrier
> * after storing to rq->curr, before going back to user-space.
> + *
> + * The barrier is also needed for the SYNC_CORE command when
> + * switching between processes; in particular, on a transition
> + * from a thread belonging to another mm to a thread belonging
> + * to the mm for which a membarrier SYNC_CORE is done on CPU0:
> + *
> + * - [CPU0] sets all bits in the mm icache_stale_mask.
> + *
> + * - [CPU1] store to rq->curr (by the scheduler).
> + *
> + * - [CPU0] loads rq->curr within membarrier and observes
> + * cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
> + * CPU1; this means membarrier relies on switch_mm() to
> + * issue the sync-core.
> + *
> + * - [CPU1] switch_mm() loads icache_stale_mask; if the bit
> + * is zero, switch_mm() may incorrectly skip the sync-core.
> + *
> + * Matches the full barrier in membarrier_private_expedited().

There are two full barriers in membarrier_private_expedited(). We
should clearly state which one it matches, and update the associated
barrier to state that it matches this barrier as well.

The barrier it matches is near the membarrier system call entry:

/*
* Matches memory barriers around rq->curr modification in
* scheduler.
*/
smp_mb(); /* system call entry is not a mb. */

Note that membarrier _also_ requires a full barrier between entering
the kernel and storing to rq->curr. This is provided by __schedule():

* Also, the membarrier system call requires a full memory barrier
* after coming from user-space, before storing to rq->curr.
*/
rq_lock(rq, &rf);
smp_mb__after_spinlock();

This other barrier is matched by the barrier near the membarrier system call
exit:

/*
* Memory barrier on the caller thread _after_ we finished
* waiting for the last IPI. Matches memory barriers around
* rq->curr modification in scheduler.
*/
smp_mb(); /* exit from system call is not a mb */

I don't think this second barrier pairing matters for the cpumask update
though.

Thanks,

Mathieu

> */
> smp_mb();
> }
> diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
> new file mode 100644
> index 0000000000000..9153016da8f14
> --- /dev/null
> +++ b/arch/riscv/include/asm/sync_core.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_RISCV_SYNC_CORE_H
> +#define _ASM_RISCV_SYNC_CORE_H
> +
> +/*
> + * RISC-V implements return to user-space through an xRET instruction,
> + * which is not core serializing.
> + */
> +static inline void sync_core_before_usermode(void)
> +{
> + asm volatile ("fence.i" ::: "memory");
> +}
> +
> +#ifdef CONFIG_SMP
> +/*
> + * Ensure the next switch_mm() on every CPU issues a core serializing
> + * instruction for the given @mm.
> + */
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> + cpumask_setall(&mm->context.icache_stale_mask);
> +}
> +#else
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _ASM_RISCV_SYNC_CORE_H */

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

2023-12-11 16:46:36

by Andrea Parri

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] membarrier: riscv: Provide core serializing command

> > /*
> > * The membarrier system call requires a full memory barrier
> > * after storing to rq->curr, before going back to user-space.
> > + *
> > + * The barrier is also needed for the SYNC_CORE command when
> > + * switching between processes; in particular, on a transition
> > + * from a thread belonging to another mm to a thread belonging
> > + * to the mm for which a membarrier SYNC_CORE is done on CPU0:
> > + *
> > + * - [CPU0] sets all bits in the mm icache_stale_mask.
> > + *
> > + * - [CPU1] store to rq->curr (by the scheduler).
> > + *
> > + * - [CPU0] loads rq->curr within membarrier and observes
> > + * cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
> > + * CPU1; this means membarrier relies on switch_mm() to
> > + * issue the sync-core.
> > + *
> > + * - [CPU1] switch_mm() loads icache_stale_mask; if the bit
> > + * is zero, switch_mm() may incorrectly skip the sync-core.
> > + *
> > + * Matches the full barrier in membarrier_private_expedited().
>
> There are two full barriers in membarrier_private_expedited(). We
> should clearly state which one it matches, and update the associated
> barrier to state that it matches this barrier as well.

Agreed, will do.

Andrea

2023-12-11 16:53:39

by Andrea Parri

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] membarrier: Introduce Kconfig ARCH_HAS_MEMBARRIER

On Mon, Dec 11, 2023 at 08:34:21AM -0500, Mathieu Desnoyers wrote:
> On 2023-12-11 04:44, Andrea Parri wrote:
> > Architectures supporting the "private expedited" membarrier command must
> > select the Kconfig to use the command. Document status and requirements
> > for each architecture in a single file under Documentation/features.
>
> Sorry for being vague in my suggestion: I did not intend to make it
> optional (not ARCH_HAS_MEMBARRIER), but rather just to have a central
> place where this would be documented.

I see.


> I'm not sure where in Documentation this should land though ?

Not sure, perhaps a section in (a new) Documentation/scheduler/membarrier.rst?

Andrea

2023-12-11 16:54:35

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] membarrier: Introduce Kconfig ARCH_HAS_MEMBARRIER

On 2023-12-11 11:53, Andrea Parri wrote:
> On Mon, Dec 11, 2023 at 08:34:21AM -0500, Mathieu Desnoyers wrote:
>> On 2023-12-11 04:44, Andrea Parri wrote:
>>> Architectures supporting the "private expedited" membarrier command must
>>> select the Kconfig to use the command. Document status and requirements
>>> for each architecture in a single file under Documentation/features.
>>
>> Sorry for being vague in my suggestion: I did not intend to make it
>> optional (not ARCH_HAS_MEMBARRIER), but rather just to have a central
>> place where this would be documented.
>
> I see.
>
>
>> I'm not sure where in Documentation this should land though ?
>
> Not sure, perhaps a section in (a new) Documentation/scheduler/membarrier.rst?

Yes, it makes sense.

Thanks!

Mathieu

>
> Andrea

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com