2024-04-07 09:03:39

by Lai Jiangshan

Subject: [PATCH V2 00/11] rcu/x86: Use per-cpu rcu preempt count

From: Lai Jiangshan <[email protected]>


Changed from v1:
Merge thunk_64.S and thunk_32.S into thunk.S
Add missing #ifdef in arch/x86/kernel/cpu/common.c

X86 can access percpu data in a single instruction.

Use a per-cpu rcu preempt count and make rcu_read_[un]lock() able to be inlined.
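
For illustration (a rough sketch of the generated code, not taken from
an actual build; field offsets are symbolic):

	/* per-task counter: load "current", then touch its field */
	movq %gs:pcpu_hot+current_task, %rax
	incl rcu_read_lock_nesting(%rax)

	/* per-cpu counter: a single GS-relative instruction */
	incl %gs:pcpu_hot+rcu_preempt_count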

patch 1-8: prepare
patch 9-11: implement PCPU_RCU_PREEMPT_COUNT

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>


Lai Jiangshan (11):
lib: Use rcu_preempt_depth() to replace current->rcu_read_lock_nesting
rcu: Move rcu_preempt_depth_set() to rcupdate.h
rcu: Reorder tree_exp.h after tree_plugin.h
rcu: Add macros set_rcu_preempt_special() and
clear_rcu_preempt_special()
rcu: Make rcu_read_unlock_special() global
rcu: Rename macro __LINUX_RCU_H to __KERNEL_RCU_H
sched/core: Add rcu_preempt_switch()
x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S
rcu: Implement PCPU_RCU_PREEMPT_COUNT framework
x86/rcu: Add rcu_preempt_count
x86/rcu: Add THUNK rcu_read_unlock_special_thunk

arch/x86/Kconfig | 1 +
arch/x86/entry/Makefile | 2 +-
arch/x86/entry/{thunk_64.S => thunk.S} | 5 ++
arch/x86/entry/thunk_32.S | 18 ----
arch/x86/include/asm/current.h | 3 +
arch/x86/include/asm/rcu_preempt.h | 109 +++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 4 +
include/linux/rcupdate.h | 36 ++++++++
kernel/rcu/Kconfig | 8 ++
kernel/rcu/rcu.h | 15 +++-
kernel/rcu/tree.c | 2 +-
kernel/rcu/tree_exp.h | 2 +-
kernel/rcu/tree_plugin.h | 41 ++++++----
kernel/sched/core.c | 2 +
lib/locking-selftest.c | 6 +-
15 files changed, 212 insertions(+), 42 deletions(-)
rename arch/x86/entry/{thunk_64.S => thunk.S} (72%)
delete mode 100644 arch/x86/entry/thunk_32.S
create mode 100644 arch/x86/include/asm/rcu_preempt.h


base-commit: f2f80ac809875855ac843f9e5e7480604b5cbff5
--
2.19.1.6.gb485710b



2024-04-07 09:03:59

by Lai Jiangshan

Subject: [PATCH V2 01/11] lib: Use rcu_preempt_depth() to replace current->rcu_read_lock_nesting

From: Lai Jiangshan <[email protected]>

Use the general wrapper rcu_preempt_depth() instead of the open-coded
access.

Prepare for enabling the per-cpu rcu_preempt_count, in which case
current->rcu_read_lock_nesting might not be synced with the real depth.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
lib/locking-selftest.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 6f6a5fc85b42..9bb41fb18088 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1434,7 +1434,7 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
#ifdef CONFIG_SMP
int saved_mgd_count = current->migration_disabled;
#endif
- int saved_rcu_count = current->rcu_read_lock_nesting;
+ int saved_rcu_count = rcu_preempt_depth();
#endif

WARN_ON(irqs_disabled());
@@ -1476,9 +1476,9 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
migrate_enable();
#endif

- while (current->rcu_read_lock_nesting > saved_rcu_count)
+ while (rcu_preempt_depth() > saved_rcu_count)
rcu_read_unlock();
- WARN_ON_ONCE(current->rcu_read_lock_nesting < saved_rcu_count);
+ WARN_ON_ONCE(rcu_preempt_depth() < saved_rcu_count);
#endif

#ifdef CONFIG_TRACE_IRQFLAGS
--
2.19.1.6.gb485710b


2024-04-07 09:04:29

by Lai Jiangshan

Subject: [PATCH V2 02/11] rcu: Move rcu_preempt_depth_set() to rcupdate.h

From: Lai Jiangshan <[email protected]>

Prepare for an arch-defined rcu_preempt_depth_set().

No functional change intended, but it has to be defined as a macro
because rcupdate.h is a very low-level header included from areas that
don't even know about the task struct "current".
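
For example (a sketch of the constraint, not proposed code), an
equivalent inline function would need the full task_struct definition
at this point:

	/* would need <linux/sched.h>, which rcupdate.h cannot pull in */
	static inline void rcu_preempt_depth_set(int val)
	{
		WRITE_ONCE(current->rcu_read_lock_nesting, val);
	}

The macro instead expands at each call site, where "current" is
already usable.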

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/rcupdate.h | 1 +
kernel/rcu/tree_plugin.h | 5 -----
2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 17d7ed5f3ae6..ebe9bfc700bb 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -78,6 +78,7 @@ void __rcu_read_unlock(void);
* types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
*/
#define rcu_preempt_depth() READ_ONCE(current->rcu_read_lock_nesting)
+#define rcu_preempt_depth_set(val) WRITE_ONCE(current->rcu_read_lock_nesting, (val))

#else /* #ifdef CONFIG_PREEMPT_RCU */

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 36a8b5dbf5b5..b1264096d03a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -387,11 +387,6 @@ static int rcu_preempt_read_exit(void)
return ret;
}

-static void rcu_preempt_depth_set(int val)
-{
- WRITE_ONCE(current->rcu_read_lock_nesting, val);
-}
-
/*
* Preemptible RCU implementation for rcu_read_lock().
* Just increment ->rcu_read_lock_nesting, shared state will be updated
--
2.19.1.6.gb485710b


2024-04-07 09:04:46

by Lai Jiangshan

Subject: [PATCH V2 03/11] rcu: Reorder tree_exp.h after tree_plugin.h

From: Lai Jiangshan <[email protected]>

Enable tree_exp.h to use some rcu-preempt macros introduced in the
next patch. The new macros touch core rcu-preempt fields and are
better implemented in tree_plugin.h.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/rcu/tree.c | 2 +-
kernel/rcu/tree_plugin.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d9642dd06c25..57d1ae26861f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5245,6 +5245,6 @@ void __init rcu_init(void)
}

#include "tree_stall.h"
-#include "tree_exp.h"
#include "tree_nocb.h"
#include "tree_plugin.h"
+#include "tree_exp.h"
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b1264096d03a..d899b4afc21c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -103,6 +103,7 @@ static void __init rcu_bootup_announce_oddness(void)
#ifdef CONFIG_PREEMPT_RCU

static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
+static bool sync_rcu_exp_done(struct rcu_node *rnp);
static void rcu_read_unlock_special(struct task_struct *t);

/*
--
2.19.1.6.gb485710b


2024-04-07 09:05:01

by Lai Jiangshan

Subject: [PATCH V2 04/11] rcu: Add macros set_rcu_preempt_special() and clear_rcu_preempt_special()

From: Lai Jiangshan <[email protected]>

Add wrappers for accessing t->rcu_read_unlock_special.b so that the
wrappers can be adapted to sync the value with the later-introduced
per-cpu rcu_preempt_count.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/rcu/tree_exp.h | 2 +-
kernel/rcu/tree_plugin.h | 20 +++++++++++++-------
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6b83537480b1..4e91414552e6 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -725,7 +725,7 @@ static void rcu_exp_handler(void *unused)
raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rnp->expmask & rdp->grpmask) {
WRITE_ONCE(rdp->cpu_no_qs.b.exp, true);
- t->rcu_read_unlock_special.b.exp_hint = true;
+ set_rcu_preempt_special(exp_hint);
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d899b4afc21c..3dbd5609185e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -106,6 +106,14 @@ static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
static bool sync_rcu_exp_done(struct rcu_node *rnp);
static void rcu_read_unlock_special(struct task_struct *t);

+#define set_rcu_preempt_special(reason) do { \
+ WRITE_ONCE(current->rcu_read_unlock_special.b.reason, true); \
+ } while (0)
+
+#define clear_rcu_preempt_special(reason) do { \
+ WRITE_ONCE(current->rcu_read_unlock_special.b.reason, false); \
+ } while (0)
+
/*
* Tell them what RCU they are running.
*/
@@ -293,7 +301,7 @@ static void rcu_qs(void)
TPS("cpuqs"));
__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
barrier(); /* Coordinate with rcu_flavor_sched_clock_irq(). */
- WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, false);
+ clear_rcu_preempt_special(need_qs);
}
}

@@ -325,7 +333,7 @@ void rcu_note_context_switch(bool preempt)
/* Possibly blocking in an RCU read-side critical section. */
rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp);
- t->rcu_read_unlock_special.b.blocked = true;
+ set_rcu_preempt_special(blocked);
t->rcu_blocked_node = rnp;

/*
@@ -399,7 +407,7 @@ void __rcu_read_lock(void)
if (IS_ENABLED(CONFIG_PROVE_LOCKING))
WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX);
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && rcu_state.gp_kthread)
- WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+ set_rcu_preempt_special(need_qs);
barrier(); /* critical section after entry code. */
}
EXPORT_SYMBOL_GPL(__rcu_read_lock);
@@ -738,7 +746,7 @@ static void rcu_flavor_sched_clock_irq(int user)
__this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
!t->rcu_read_unlock_special.b.need_qs &&
time_after(jiffies, rcu_state.gp_start + HZ))
- t->rcu_read_unlock_special.b.need_qs = true;
+ set_rcu_preempt_special(need_qs);
}

/*
@@ -751,12 +759,10 @@ static void rcu_flavor_sched_clock_irq(int user)
*/
void exit_rcu(void)
{
- struct task_struct *t = current;
-
if (unlikely(!list_empty(&current->rcu_node_entry))) {
rcu_preempt_depth_set(1);
barrier();
- WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true);
+ set_rcu_preempt_special(blocked);
} else if (unlikely(rcu_preempt_depth())) {
rcu_preempt_depth_set(1);
} else {
--
2.19.1.6.gb485710b


2024-04-07 09:05:17

by Lai Jiangshan

Subject: [PATCH V2 05/11] rcu: Make rcu_read_unlock_special() global

From: Lai Jiangshan <[email protected]>

Make it global so that it can be used by the future inlined rcu_read_unlock().

Make it exported so that the inlined rcu_read_unlock() can be used in modules.

Make it take no arguments so that the caller saves the instructions for
obtaining the "current" task, and so that it can be called via a THUNK later.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/rcupdate.h | 2 ++
kernel/rcu/tree_plugin.h | 7 ++++---
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ebe9bfc700bb..328667ae8086 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -68,6 +68,8 @@ static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned

#ifdef CONFIG_PREEMPT_RCU

+void rcu_read_unlock_special(void);
+
void __rcu_read_lock(void);
void __rcu_read_unlock(void);

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3dbd5609185e..880b3fef1158 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -104,7 +104,6 @@ static void __init rcu_bootup_announce_oddness(void)

static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
static bool sync_rcu_exp_done(struct rcu_node *rnp);
-static void rcu_read_unlock_special(struct task_struct *t);

#define set_rcu_preempt_special(reason) do { \
WRITE_ONCE(current->rcu_read_unlock_special.b.reason, true); \
@@ -427,7 +426,7 @@ void __rcu_read_unlock(void)
if (rcu_preempt_read_exit() == 0) {
barrier(); // critical-section exit before .s check.
if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
- rcu_read_unlock_special(t);
+ rcu_read_unlock_special();
}
if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
int rrln = rcu_preempt_depth();
@@ -627,8 +626,9 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
* notify RCU core processing or task having blocked during the RCU
* read-side critical section.
*/
-static void rcu_read_unlock_special(struct task_struct *t)
+void rcu_read_unlock_special(void)
{
+ struct task_struct *t = current;
unsigned long flags;
bool irqs_were_disabled;
bool preempt_bh_were_disabled =
@@ -684,6 +684,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
}
rcu_preempt_deferred_qs_irqrestore(t, flags);
}
+EXPORT_SYMBOL_GPL(rcu_read_unlock_special);

/*
* Check that the list of blocked tasks for the newly completed grace
--
2.19.1.6.gb485710b


2024-04-07 09:05:59

by Lai Jiangshan

Subject: [PATCH V2 07/11] sched/core: Add rcu_preempt_switch()

From: Lai Jiangshan <[email protected]>

When the per-cpu rcu_preempt_count is used, it also has to be switched
on context switch. The instructions needed to switch the per-cpu
rcu_preempt_count are few, so the switch is inlined to avoid overhead
in the scheduler.
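
With PCPU_RCU_PREEMPT_COUNT enabled, a later patch of this series
fills the hook in as roughly:

	static inline void
	rcu_preempt_switch(struct task_struct *prev, struct task_struct *next)
	{
	#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
		/* stash the outgoing depth, install the incoming state */
		prev->rcu_read_lock_nesting = rcu_preempt_depth();
		pcpu_rcu_preempt_switch(next->rcu_read_lock_nesting,
					next->rcu_read_unlock_special.s);
	#endif	/* CONFIG_PCPU_RCU_PREEMPT_COUNT */
	}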

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/rcu/rcu.h | 5 +++++
kernel/sched/core.c | 2 ++
2 files changed, 7 insertions(+)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index b17b2ed657fc..ea5ae957c687 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -668,4 +668,9 @@ int rcu_stall_notifier_call_chain(unsigned long val, void *v);
static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; }
#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)

+static inline void
+rcu_preempt_switch(struct task_struct *prev, struct task_struct *next)
+{
+}
+
#endif /* __KERNEL_RCU_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7019a40457a6..1d9e3c51c913 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -95,6 +95,7 @@
#include "../workqueue_internal.h"
#include "../../io_uring/io-wq.h"
#include "../smpboot.h"
+#include "../rcu/rcu.h"

EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu);
EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask);
@@ -6737,6 +6738,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
*/
++*switch_count;

+ rcu_preempt_switch(prev, next);
migrate_disable_switch(rq, prev);
psi_sched_switch(prev, next, !task_on_rq_queued(prev));

--
2.19.1.6.gb485710b


2024-04-07 09:06:14

by Lai Jiangshan

Subject: [PATCH V2 08/11] x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S

From: Lai Jiangshan <[email protected]>

The code in thunk_64.S and thunk_32.S is exactly the same except for
the comments. Merge them into thunk.S.

And since thunk_32.S originated from thunk_64.S, the new merged
thunk.S is actually renamed from thunk_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/Makefile | 2 +-
arch/x86/entry/{thunk_64.S => thunk.S} | 0
arch/x86/entry/thunk_32.S | 18 ------------------
3 files changed, 1 insertion(+), 19 deletions(-)
rename arch/x86/entry/{thunk_64.S => thunk.S} (100%)
delete mode 100644 arch/x86/entry/thunk_32.S

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index c93e7f5c2a06..ce1cc1622385 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -17,7 +17,7 @@ obj-y += common.o
obj-y += vdso/
obj-y += vsyscall/

-obj-$(CONFIG_PREEMPTION) += thunk_$(BITS).o
+obj-$(CONFIG_PREEMPTION) += thunk.o
CFLAGS_entry_fred.o += -fno-stack-protector
CFLAGS_REMOVE_entry_fred.o += -pg $(CC_FLAGS_FTRACE)
obj-$(CONFIG_X86_FRED) += entry_64_fred.o entry_fred.o
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk.S
similarity index 100%
rename from arch/x86/entry/thunk_64.S
rename to arch/x86/entry/thunk.S
diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
deleted file mode 100644
index da37f42f4549..000000000000
--- a/arch/x86/entry/thunk_32.S
+++ /dev/null
@@ -1,18 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Trampoline to trace irqs off. (otherwise CALLER_ADDR1 might crash)
- * Copyright 2008 by Steven Rostedt, Red Hat, Inc
- * (inspired by Andi Kleen's thunk_64.S)
- */
-
-#include <linux/export.h>
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-#include "calling.h"
-
-THUNK preempt_schedule_thunk, preempt_schedule
-THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
-EXPORT_SYMBOL(preempt_schedule_thunk)
-EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
-
--
2.19.1.6.gb485710b


2024-04-07 09:06:25

by Lai Jiangshan

Subject: [PATCH V2 09/11] rcu: Implement PCPU_RCU_PREEMPT_COUNT framework

From: Lai Jiangshan <[email protected]>

When the arch code provides HAVE_PCPU_RCU_PREEMPT_COUNT and the
corresponding functions, rcu core uses the functions to implement
rcu_preempt_depth(), special bits, switching and so on.
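
Concretely, an arch selecting HAVE_PCPU_RCU_PREEMPT_COUNT is expected
to provide roughly the following interface in <asm/rcu_preempt.h>
(names as consumed by this patch; the x86 implementation comes in a
later patch of this series):

	/* read/write the depth portion of the per-cpu count */
	static inline int pcpu_rcu_preempt_count(void);
	static inline void pcpu_rcu_preempt_count_set(int count);
	static __always_inline void pcpu_rcu_preempt_count_add(int val);
	/* true when the depth hits 0 and unlock-special handling is needed */
	static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void);
	/* mirror current->rcu_read_unlock_special.s into the per-cpu word */
	static inline void pcpu_rcu_preempt_special_set(void);
	static inline void pcpu_rcu_preempt_special_clear(void);
	/* install the incoming task's state on context switch */
	static inline void pcpu_rcu_preempt_switch(int count, bool special);
	/* slow-path hook called by the inlined __rcu_read_unlock() */
	pcpu_rcu_read_unlock_special();	/* a macro, not a function */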

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
kernel/rcu/Kconfig | 8 ++++++++
kernel/rcu/rcu.h | 4 ++++
kernel/rcu/tree_plugin.h | 8 ++++++++
4 files changed, 53 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 328667ae8086..e3e5ce44c7dc 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -70,6 +70,8 @@ static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned

void rcu_read_unlock_special(void);

+#ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT
+
void __rcu_read_lock(void);
void __rcu_read_unlock(void);

@@ -81,6 +83,37 @@ void __rcu_read_unlock(void);
*/
#define rcu_preempt_depth() READ_ONCE(current->rcu_read_lock_nesting)
#define rcu_preempt_depth_set(val) WRITE_ONCE(current->rcu_read_lock_nesting, (val))
+#define pcpu_rcu_preempt_special_set() do { } while (0)
+#define pcpu_rcu_preempt_special_clear() do { } while (0)
+
+#else /* #ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT */
+
+#include <asm/rcu_preempt.h>
+
+static __always_inline void __rcu_read_lock(void)
+{
+ pcpu_rcu_preempt_count_add(1);
+ barrier();
+}
+
+static __always_inline void __rcu_read_unlock(void)
+{
+ barrier();
+ if (unlikely(pcpu_rcu_preempt_count_dec_and_test()))
+ pcpu_rcu_read_unlock_special();
+}
+
+static inline int rcu_preempt_depth(void)
+{
+ return pcpu_rcu_preempt_count();
+}
+
+static inline void rcu_preempt_depth_set(int val)
+{
+ pcpu_rcu_preempt_count_set(val);
+}
+
+#endif /* #else #ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT */

#else /* #ifdef CONFIG_PREEMPT_RCU */

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index e7d2dd267593..5d91147bc9a3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -345,4 +345,12 @@ config RCU_DOUBLE_CHECK_CB_TIME
Say Y here if you need tighter callback-limit enforcement.
Say N here if you are unsure.

+config HAVE_PCPU_RCU_PREEMPT_COUNT
+ bool
+
+config PCPU_RCU_PREEMPT_COUNT
+ def_bool y
+ depends on PREEMPT_RCU && HAVE_PCPU_RCU_PREEMPT_COUNT
+ depends on !PROVE_LOCKING && !RCU_STRICT_GRACE_PERIOD
+
endmenu # "RCU Subsystem"
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index ea5ae957c687..2322b040c5cd 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -671,6 +671,10 @@ static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { re
static inline void
rcu_preempt_switch(struct task_struct *prev, struct task_struct *next)
{
+#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+ prev->rcu_read_lock_nesting = rcu_preempt_depth();
+ pcpu_rcu_preempt_switch(next->rcu_read_lock_nesting, next->rcu_read_unlock_special.s);
+#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
}

#endif /* __KERNEL_RCU_H */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 880b3fef1158..db68d0c1c1f2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -107,10 +107,13 @@ static bool sync_rcu_exp_done(struct rcu_node *rnp);

#define set_rcu_preempt_special(reason) do { \
WRITE_ONCE(current->rcu_read_unlock_special.b.reason, true); \
+ pcpu_rcu_preempt_special_set(); \
} while (0)

#define clear_rcu_preempt_special(reason) do { \
WRITE_ONCE(current->rcu_read_unlock_special.b.reason, false); \
+ if (!current->rcu_read_unlock_special.s) \
+ pcpu_rcu_preempt_special_clear(); \
} while (0)

/*
@@ -379,6 +382,8 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
return READ_ONCE(rnp->gp_tasks) != NULL;
}

+#ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT
+
/* limit value for ->rcu_read_lock_nesting. */
#define RCU_NEST_PMAX (INT_MAX / 2)

@@ -436,6 +441,8 @@ void __rcu_read_unlock(void)
}
EXPORT_SYMBOL_GPL(__rcu_read_unlock);

+#endif /* #ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT */
+
/*
* Advance a ->blkd_tasks-list pointer to the next entry, instead
* returning NULL if at the end of the list.
@@ -489,6 +496,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
return;
}
t->rcu_read_unlock_special.s = 0;
+ pcpu_rcu_preempt_special_clear();
if (special.b.need_qs) {
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
rdp->cpu_no_qs.b.norm = false;
--
2.19.1.6.gb485710b


2024-04-07 09:06:39

by Lai Jiangshan

Subject: [PATCH V2 10/11] x86/rcu: Add rcu_preempt_count

From: Lai Jiangshan <[email protected]>

Implement PCPU_RCU_PREEMPT_COUNT for x86.
Mainly copied from asm/preempt.h

Make rcu_read_[un]lock() inlined for rcu-preempt.
Make rcu_read_lock() only one instruction.
Make rcu_read_unlock() only two instructions in the fast path.
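
With a GS-based percpu base, the fast paths are then expected to
compile down to roughly the following (illustrative, not disassembly
of an actual build):

	/* rcu_read_lock(): pcpu_rcu_preempt_count_add(1) */
	incl %gs:pcpu_hot+rcu_preempt_count

	/* rcu_read_unlock(): pcpu_rcu_preempt_count_dec_and_test() */
	decl %gs:pcpu_hot+rcu_preempt_count
	je   .Lunlock_special	/* rarely-taken slow path */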

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/current.h | 3 +
arch/x86/include/asm/rcu_preempt.h | 107 +++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 4 ++
4 files changed, 115 insertions(+)
create mode 100644 arch/x86/include/asm/rcu_preempt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4fff6ed46e90..e805cac3763d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -256,6 +256,7 @@ config X86
select HAVE_OBJTOOL if X86_64
select HAVE_OPTPROBES
select HAVE_PAGE_SIZE_4KB
+ select HAVE_PCPU_RCU_PREEMPT_COUNT
select HAVE_PCSPKR_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI
diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index bf5953883ec3..dcc2ef784120 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -24,6 +24,9 @@ struct pcpu_hot {
unsigned long top_of_stack;
void *hardirq_stack_ptr;
u16 softirq_pending;
+#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+ int rcu_preempt_count;
+#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
#ifdef CONFIG_X86_64
bool hardirq_stack_inuse;
#else
diff --git a/arch/x86/include/asm/rcu_preempt.h b/arch/x86/include/asm/rcu_preempt.h
new file mode 100644
index 000000000000..cb25ebe038a5
--- /dev/null
+++ b/arch/x86/include/asm/rcu_preempt.h
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_RCU_PREEMPT_H
+#define __ASM_RCU_PREEMPT_H
+
+#include <asm/rmwcc.h>
+#include <asm/percpu.h>
+#include <asm/current.h>
+
+#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+
+/* We use the MSB mostly because its available */
+#define RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED 0x80000000
+
+/*
+ * We use the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit as an inverted
+ * current->rcu_read_unlock_special.s such that a decrement hitting 0
+ * means we can and should call rcu_read_unlock_special().
+ */
+#define RCU_PREEMPT_INIT (0 + RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED)
+
+/*
+ * We mask the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit so as not to
+ * confuse all current users that think a non-zero value indicates we
+ * are in a critical section.
+ */
+static inline int pcpu_rcu_preempt_count(void)
+{
+ return raw_cpu_read_4(pcpu_hot.rcu_preempt_count) & ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED;
+}
+
+static inline void pcpu_rcu_preempt_count_set(int count)
+{
+ int old, new;
+
+ old = raw_cpu_read_4(pcpu_hot.rcu_preempt_count);
+ do {
+ new = (old & RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED) |
+ (count & ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
+ } while (!raw_cpu_try_cmpxchg_4(pcpu_hot.rcu_preempt_count, &old, new));
+}
+
+/*
+ * We fold the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit into the RCU
+ * preempt count such that rcu_read_unlock() can decrement and test for
+ * the need of unlock-special handling with a single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know
+ * we both reach a quiescent state (no rcu preempt count) and need to
+ * handle unlock-special (the bit is cleared), normally to report the
+ * quiescent state immediately.
+ */
+
+static inline void pcpu_rcu_preempt_special_set(void)
+{
+ raw_cpu_and_4(pcpu_hot.rcu_preempt_count, ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
+}
+
+static inline void pcpu_rcu_preempt_special_clear(void)
+{
+ raw_cpu_or_4(pcpu_hot.rcu_preempt_count, RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
+}
+
+static inline bool pcpu_rcu_preempt_special_test(void)
+{
+ return !(raw_cpu_read_4(pcpu_hot.rcu_preempt_count) & RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
+}
+
+static inline void pcpu_rcu_preempt_switch(int count, bool special)
+{
+ if (likely(!special))
+ raw_cpu_write(pcpu_hot.rcu_preempt_count, count | RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
+ else
+ raw_cpu_write(pcpu_hot.rcu_preempt_count, count);
+}
+
+/*
+ * The various rcu_preempt_count add/sub methods
+ */
+
+static __always_inline void pcpu_rcu_preempt_count_add(int val)
+{
+ raw_cpu_add_4(pcpu_hot.rcu_preempt_count, val);
+}
+
+static __always_inline void pcpu_rcu_preempt_count_sub(int val)
+{
+ raw_cpu_add_4(pcpu_hot.rcu_preempt_count, -val);
+}
+
+/*
+ * Because we keep RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED set when we do
+ * _not_ need to handle unlock-special for a fast-path decrement.
+ */
+static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void)
+{
+ return GEN_UNARY_RMWcc("decl", __my_cpu_var(pcpu_hot.rcu_preempt_count), e,
+ __percpu_arg([var]));
+}
+
+#define pcpu_rcu_read_unlock_special() \
+do { \
+ rcu_read_unlock_special(); \
+} while (0)
+
+#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+
+#endif /* __ASM_RCU_PREEMPT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5c1e6d6be267..918b1f5cb75d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1995,6 +1995,10 @@ DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
.current_task = &init_task,
.preempt_count = INIT_PREEMPT_COUNT,
.top_of_stack = TOP_OF_INIT_STACK,
+
+#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+ .rcu_preempt_count = RCU_PREEMPT_INIT,
+#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
};
EXPORT_PER_CPU_SYMBOL(pcpu_hot);
EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
--
2.19.1.6.gb485710b


2024-04-07 09:07:06

by Lai Jiangshan

Subject: [PATCH V2 11/11] x86/rcu: Add THUNK rcu_read_unlock_special_thunk

From: Lai Jiangshan <[email protected]>

Add rcu_read_unlock_special_thunk(), so that the inlined rcu_read_unlock()
doesn't need any code to save the caller-saved registers.

Make rcu_read_unlock() only two instructions in the slow path at the
caller site.
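
The THUNK macro (see arch/x86/entry/calling.h) makes the callee save
and restore the caller-saved GPRs, along the lines of (a sketch, not
the exact macro expansion):

	SYM_FUNC_START(rcu_read_unlock_special_thunk)
		/* save the caller-saved GPRs the C callee may clobber */
		pushq %rdi; pushq %rsi; pushq %rdx; pushq %rcx; pushq %rax
		pushq %r8;  pushq %r9;  pushq %r10; pushq %r11
		call  rcu_read_unlock_special
		popq  %r11; popq %r10; popq %r9;  popq %r8;  popq %rax
		popq  %rcx; popq %rdx; popq %rsi; popq %rdi
		RET
	SYM_FUNC_END(rcu_read_unlock_special_thunk)

so the caller-site slow path shrinks to the conditional branch plus a
bare "call".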

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/thunk.S | 5 +++++
arch/x86/include/asm/rcu_preempt.h | 4 +++-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/thunk.S b/arch/x86/entry/thunk.S
index 119ebdc3d362..10c60369a67c 100644
--- a/arch/x86/entry/thunk.S
+++ b/arch/x86/entry/thunk.S
@@ -13,3 +13,8 @@ THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
EXPORT_SYMBOL(preempt_schedule_thunk)
EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
+
+#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
+THUNK rcu_read_unlock_special_thunk, rcu_read_unlock_special
+EXPORT_SYMBOL_GPL(rcu_read_unlock_special_thunk)
+#endif /* #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT */
diff --git a/arch/x86/include/asm/rcu_preempt.h b/arch/x86/include/asm/rcu_preempt.h
index cb25ebe038a5..acdd73b74c05 100644
--- a/arch/x86/include/asm/rcu_preempt.h
+++ b/arch/x86/include/asm/rcu_preempt.h
@@ -97,9 +97,11 @@ static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void)
__percpu_arg([var]));
}

+extern asmlinkage void rcu_read_unlock_special_thunk(void);
+
#define pcpu_rcu_read_unlock_special() \
do { \
- rcu_read_unlock_special(); \
+ asm volatile ("call rcu_read_unlock_special_thunk" : ASM_CALL_CONSTRAINT);\
} while (0)

#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
--
2.19.1.6.gb485710b


2024-04-07 09:07:25

by Lai Jiangshan

Subject: [PATCH V2 06/11] rcu: Rename macro __LINUX_RCU_H to __KERNEL_RCU_H

From: Lai Jiangshan <[email protected]>

to reflect the path of kernel/rcu/rcu.h.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/rcu/rcu.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 86fce206560e..b17b2ed657fc 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -7,8 +7,8 @@
* Author: Paul E. McKenney <[email protected]>
*/

-#ifndef __LINUX_RCU_H
-#define __LINUX_RCU_H
+#ifndef __KERNEL_RCU_H
+#define __KERNEL_RCU_H

#include <linux/slab.h>
#include <trace/events/rcu.h>
@@ -668,4 +668,4 @@ int rcu_stall_notifier_call_chain(unsigned long val, void *v);
static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; }
#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)

-#endif /* __LINUX_RCU_H */
+#endif /* __KERNEL_RCU_H */
--
2.19.1.6.gb485710b


Subject: [tip: x86/entry] x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S

The following commit has been merged into the x86/entry branch of tip:

Commit-ID: b767fe5de0b4a5057b070d8cdefdcf6740733d6e
Gitweb: https://git.kernel.org/tip/b767fe5de0b4a5057b070d8cdefdcf6740733d6e
Author: Lai Jiangshan <[email protected]>
AuthorDate: Sun, 07 Apr 2024 17:05:55 +08:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 09 Apr 2024 09:57:56 +02:00

x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S

The code in thunk_64.S and thunk_32.S is exactly the same except for
the comments. Merge them into thunk.S.

And since thunk_32.S originated from thunk_64.S, the new merged
thunk.S is actually renamed from thunk_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/entry/Makefile | 2 +-
arch/x86/entry/thunk.S | 15 +++++++++++++++
arch/x86/entry/thunk_32.S | 18 ------------------
arch/x86/entry/thunk_64.S | 15 ---------------
4 files changed, 16 insertions(+), 34 deletions(-)
create mode 100644 arch/x86/entry/thunk.S
delete mode 100644 arch/x86/entry/thunk_32.S
delete mode 100644 arch/x86/entry/thunk_64.S

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index c93e7f5..ce1cc16 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -17,7 +17,7 @@ obj-y += common.o
obj-y += vdso/
obj-y += vsyscall/

-obj-$(CONFIG_PREEMPTION) += thunk_$(BITS).o
+obj-$(CONFIG_PREEMPTION) += thunk.o
CFLAGS_entry_fred.o += -fno-stack-protector
CFLAGS_REMOVE_entry_fred.o += -pg $(CC_FLAGS_FTRACE)
obj-$(CONFIG_X86_FRED) += entry_64_fred.o entry_fred.o
diff --git a/arch/x86/entry/thunk.S b/arch/x86/entry/thunk.S
new file mode 100644
index 0000000..119ebdc
--- /dev/null
+++ b/arch/x86/entry/thunk.S
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Save registers before calling assembly functions. This avoids
+ * disturbance of register allocation in some inline assembly constructs.
+ * Copyright 2001,2002 by Andi Kleen, SuSE Labs.
+ */
+#include <linux/export.h>
+#include <linux/linkage.h>
+#include "calling.h"
+#include <asm/asm.h>
+
+THUNK preempt_schedule_thunk, preempt_schedule
+THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
+EXPORT_SYMBOL(preempt_schedule_thunk)
+EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
deleted file mode 100644
index da37f42..0000000
--- a/arch/x86/entry/thunk_32.S
+++ /dev/null
@@ -1,18 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Trampoline to trace irqs off. (otherwise CALLER_ADDR1 might crash)
- * Copyright 2008 by Steven Rostedt, Red Hat, Inc
- * (inspired by Andi Kleen's thunk_64.S)
- */
-
-#include <linux/export.h>
-#include <linux/linkage.h>
-#include <asm/asm.h>
-
-#include "calling.h"
-
-THUNK preempt_schedule_thunk, preempt_schedule
-THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
-EXPORT_SYMBOL(preempt_schedule_thunk)
-EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
-
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
deleted file mode 100644
index 119ebdc..0000000
--- a/arch/x86/entry/thunk_64.S
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Save registers before calling assembly functions. This avoids
- * disturbance of register allocation in some inline assembly constructs.
- * Copyright 2001,2002 by Andi Kleen, SuSE Labs.
- */
-#include <linux/export.h>
-#include <linux/linkage.h>
-#include "calling.h"
-#include <asm/asm.h>
-
-THUNK preempt_schedule_thunk, preempt_schedule
-THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
-EXPORT_SYMBOL(preempt_schedule_thunk)
-EXPORT_SYMBOL(preempt_schedule_notrace_thunk)

2024-04-23 17:26:29

by Joel Fernandes

Subject: Re: [PATCH V2 11/11] x86/rcu: Add THUNK rcu_read_unlock_special_thunk

Hello Lai,

On Sun, Apr 7, 2024 at 5:07 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> Add rcu_read_unlock_special_thunk(), so that the inlined rcu_read_unlock()
> doesn't need any code to save the caller-saved registers.
>
> Make rcu_read_unlock() only two instructions in the slow path at the
> caller site.
>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/entry/thunk.S | 5 +++++
> arch/x86/include/asm/rcu_preempt.h | 4 +++-
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/thunk.S b/arch/x86/entry/thunk.S
> index 119ebdc3d362..10c60369a67c 100644
> --- a/arch/x86/entry/thunk.S
> +++ b/arch/x86/entry/thunk.S
> @@ -13,3 +13,8 @@ THUNK preempt_schedule_thunk, preempt_schedule
> THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
> EXPORT_SYMBOL(preempt_schedule_thunk)
> EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
> +
> +#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> +THUNK rcu_read_unlock_special_thunk, rcu_read_unlock_special
> +EXPORT_SYMBOL_GPL(rcu_read_unlock_special_thunk)
> +#endif /* #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT */
> diff --git a/arch/x86/include/asm/rcu_preempt.h b/arch/x86/include/asm/rcu_preempt.h
> index cb25ebe038a5..acdd73b74c05 100644
> --- a/arch/x86/include/asm/rcu_preempt.h
> +++ b/arch/x86/include/asm/rcu_preempt.h
> @@ -97,9 +97,11 @@ static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void)
> __percpu_arg([var]));
> }
>
> +extern asmlinkage void rcu_read_unlock_special_thunk(void);
> +
> #define pcpu_rcu_read_unlock_special() \
> do { \
> - rcu_read_unlock_special();

Instead, can you not use the __no_caller_saved_registers attribute for
the definition of rcu_read_unlock_special(), or does that not work for
what you're trying to do here?

Thanks,

- Joel

2024-04-23 17:58:26

by Joel Fernandes

Subject: Re: [PATCH V2 03/11] rcu: Reorder tree_exp.h after tree_plugin.h

On Sun, Apr 7, 2024 at 5:03 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> > Enable tree_exp.h to use some rcu-preempt macros introduced in the
> > next patch. The new macros touch core rcu-preempt fields and are
> > better implemented in tree_plugin.h.
>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/rcu/tree_plugin.h | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d9642dd06c25..57d1ae26861f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -5245,6 +5245,6 @@ void __init rcu_init(void)
> }
>
> #include "tree_stall.h"
> -#include "tree_exp.h"
> #include "tree_nocb.h"
> #include "tree_plugin.h"
> +#include "tree_exp.h"
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index b1264096d03a..d899b4afc21c 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -103,6 +103,7 @@ static void __init rcu_bootup_announce_oddness(void)
> #ifdef CONFIG_PREEMPT_RCU
>
> static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
> +static bool sync_rcu_exp_done(struct rcu_node *rnp);
> static void rcu_read_unlock_special(struct task_struct *t);

OK with me, but not sure if the reordering of header inclusion is
needed? You could get the same results by just adding declarations of
the new helpers to tree_exp.h.

Not sure if tree_plugin.h needs to be included last; for some reason I
thought it needed to be, but looks like not. I found a thread that
sheds some light on the header-file-including-C-code thing as well,
which may or may not help:
https://lore.kernel.org/all/8ab3ca72-e20c-4b18-803f-bf6937c2cd70@paulmck-laptop/#t

Thanks.

2024-04-23 18:09:25

by Joel Fernandes

Subject: Re: [PATCH V2 10/11] x86/rcu: Add rcu_preempt_count

On Sun, Apr 7, 2024 at 5:06 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> Implement PCPU_RCU_PREEMPT_COUNT for x86.
> Mainly copied from asm/preempt.h
>
> Make rcu_read_[un]lock() inlined for rcu-preempt.

The changelog is wrong. You inlined rcu_read_[un]lock() in the previous
patch, not this one?

- Joel

> Make rcu_read_lock() only one instruction.
> Make rcu_read_unlock() only two instructions in the fast path.
>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/current.h | 3 +
> arch/x86/include/asm/rcu_preempt.h | 107 +++++++++++++++++++++++++++++
> arch/x86/kernel/cpu/common.c | 4 ++
> 4 files changed, 115 insertions(+)
> create mode 100644 arch/x86/include/asm/rcu_preempt.h
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4fff6ed46e90..e805cac3763d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -256,6 +256,7 @@ config X86
> select HAVE_OBJTOOL if X86_64
> select HAVE_OPTPROBES
> select HAVE_PAGE_SIZE_4KB
> + select HAVE_PCPU_RCU_PREEMPT_COUNT
> select HAVE_PCSPKR_PLATFORM
> select HAVE_PERF_EVENTS
> select HAVE_PERF_EVENTS_NMI
> diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
> index bf5953883ec3..dcc2ef784120 100644
> --- a/arch/x86/include/asm/current.h
> +++ b/arch/x86/include/asm/current.h
> @@ -24,6 +24,9 @@ struct pcpu_hot {
> unsigned long top_of_stack;
> void *hardirq_stack_ptr;
> u16 softirq_pending;
> +#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> + int rcu_preempt_count;
> +#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> #ifdef CONFIG_X86_64
> bool hardirq_stack_inuse;
> #else
> diff --git a/arch/x86/include/asm/rcu_preempt.h b/arch/x86/include/asm/rcu_preempt.h
> new file mode 100644
> index 000000000000..cb25ebe038a5
> --- /dev/null
> +++ b/arch/x86/include/asm/rcu_preempt.h
> @@ -0,0 +1,107 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_RCU_PREEMPT_H
> +#define __ASM_RCU_PREEMPT_H
> +
> +#include <asm/rmwcc.h>
> +#include <asm/percpu.h>
> +#include <asm/current.h>
> +
> +#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> +
> +/* We use the MSB mostly because it's available */
> +#define RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED 0x80000000
> +
> +/*
> + * We use the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit as an inverted
> + * current->rcu_read_unlock_special.s such that a decrement hitting 0
> + * means we can and should call rcu_read_unlock_special().
> + */
> +#define RCU_PREEMPT_INIT (0 + RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED)
> +
> +/*
> + * We mask the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit so as not to
> + * confuse all current users that think a non-zero value indicates we
> + * are in a critical section.
> + */
> +static inline int pcpu_rcu_preempt_count(void)
> +{
> + return raw_cpu_read_4(pcpu_hot.rcu_preempt_count) & ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED;
> +}
> +
> +static inline void pcpu_rcu_preempt_count_set(int count)
> +{
> + int old, new;
> +
> + old = raw_cpu_read_4(pcpu_hot.rcu_preempt_count);
> + do {
> + new = (old & RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED) |
> + (count & ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
> + } while (!raw_cpu_try_cmpxchg_4(pcpu_hot.rcu_preempt_count, &old, new));
> +}
> +
> +/*
> + * We fold the RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED bit into the RCU
> + * preempt count such that rcu_read_unlock() can decrement and test for
> + * the need of unlock-special handling with a single instruction.
> + *
> + * We invert the actual bit, so that when the decrement hits 0 we know
> + * we both reach a quiescent state (no rcu preempt count) and need to
> + * handle unlock-special (the bit is cleared), normally to report the
> + * quiescent state immediately.
> + */
> +
> +static inline void pcpu_rcu_preempt_special_set(void)
> +{
> + raw_cpu_and_4(pcpu_hot.rcu_preempt_count, ~RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
> +}
> +
> +static inline void pcpu_rcu_preempt_special_clear(void)
> +{
> + raw_cpu_or_4(pcpu_hot.rcu_preempt_count, RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
> +}
> +
> +static inline bool pcpu_rcu_preempt_special_test(void)
> +{
> + return !(raw_cpu_read_4(pcpu_hot.rcu_preempt_count) & RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
> +}
> +
> +static inline void pcpu_rcu_preempt_switch(int count, bool special)
> +{
> + if (likely(!special))
> + raw_cpu_write(pcpu_hot.rcu_preempt_count, count | RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED);
> + else
> + raw_cpu_write(pcpu_hot.rcu_preempt_count, count);
> +}
> +
> +/*
> + * The various rcu_preempt_count add/sub methods
> + */
> +
> +static __always_inline void pcpu_rcu_preempt_count_add(int val)
> +{
> + raw_cpu_add_4(pcpu_hot.rcu_preempt_count, val);
> +}
> +
> +static __always_inline void pcpu_rcu_preempt_count_sub(int val)
> +{
> + raw_cpu_add_4(pcpu_hot.rcu_preempt_count, -val);
> +}
> +
> +/*
> + * Because we keep RCU_PREEMPT_UNLOCK_SPECIAL_INVERTED set when we do
> + * _not_ need to handle unlock-special for a fast-path decrement.
> + */
> +static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void)
> +{
> + return GEN_UNARY_RMWcc("decl", __my_cpu_var(pcpu_hot.rcu_preempt_count), e,
> + __percpu_arg([var]));
> +}
> +
> +#define pcpu_rcu_read_unlock_special() \
> +do { \
> + rcu_read_unlock_special(); \
> +} while (0)
> +
> +#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> +
> +#endif /* __ASM_RCU_PREEMPT_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 5c1e6d6be267..918b1f5cb75d 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1995,6 +1995,10 @@ DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
> .current_task = &init_task,
> .preempt_count = INIT_PREEMPT_COUNT,
> .top_of_stack = TOP_OF_INIT_STACK,
> +
> +#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> + .rcu_preempt_count = RCU_PREEMPT_INIT,
> +#endif // #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> };
> EXPORT_PER_CPU_SYMBOL(pcpu_hot);
> EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
> --
> 2.19.1.6.gb485710b
>
>

2024-04-23 18:19:56

by Joel Fernandes

Subject: Re: [PATCH V2 09/11] rcu: Implement PCPU_RCU_PREEMPT_COUNT framework

On Sun, Apr 7, 2024 at 5:04 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> When the arch code provides HAVE_PCPU_RCU_PREEMPT_COUNT and the
> corresponding functions, rcu core uses the functions to implement
> rcu_preempt_depth(), special bits, switching and so on.
>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
> kernel/rcu/Kconfig | 8 ++++++++
> kernel/rcu/rcu.h | 4 ++++
> kernel/rcu/tree_plugin.h | 8 ++++++++
> 4 files changed, 53 insertions(+)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 328667ae8086..e3e5ce44c7dc 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -70,6 +70,8 @@ static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned
>
> void rcu_read_unlock_special(void);
>
> +#ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT
> +
> void __rcu_read_lock(void);
> void __rcu_read_unlock(void);
>
> @@ -81,6 +83,37 @@ void __rcu_read_unlock(void);
> */
> #define rcu_preempt_depth() READ_ONCE(current->rcu_read_lock_nesting)
> #define rcu_preempt_depth_set(val) WRITE_ONCE(current->rcu_read_lock_nesting, (val))
> +#define pcpu_rcu_preempt_special_set() do { } while (0)
> +#define pcpu_rcu_preempt_special_clear() do { } while (0)
> +
> +#else /* #ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT */
> +
> +#include <asm/rcu_preempt.h>
> +
> +static __always_inline void __rcu_read_lock(void)
> +{
> + pcpu_rcu_preempt_count_add(1);
> + barrier();
> +}
> +
> +static __always_inline void __rcu_read_unlock(void)
> +{
> + barrier();
> + if (unlikely(pcpu_rcu_preempt_count_dec_and_test()))
> + pcpu_rcu_read_unlock_special();
> +}

The previous code had comments about the barrier(); can you add those
comments back?

Also, wasn't there a compiler barrier in the body of the if() as well?

For reference:

void __rcu_read_unlock(void)
{
struct task_struct *t = current;

barrier(); // critical section before exit code.
if (rcu_preempt_read_exit() == 0) {
barrier(); // critical-section exit before .s check.
if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
rcu_read_unlock_special(t);
}

2024-04-24 02:43:28

by Lai Jiangshan

Subject: Re: [PATCH V2 11/11] x86/rcu: Add THUNK rcu_read_unlock_special_thunk

On Wed, Apr 24, 2024 at 1:26 AM Joel Fernandes <[email protected]> wrote:
>
> Hello Lai,
>
> On Sun, Apr 7, 2024 at 5:07 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > Add rcu_read_unlock_special_thunk(), so that the inlined rcu_read_unlock()
> > doesn't need any code to save the caller-saved registers.
> >
> > Make rcu_read_unlock() only two instructions in the slow path at the
> > caller site.
> >
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
> > arch/x86/entry/thunk.S | 5 +++++
> > arch/x86/include/asm/rcu_preempt.h | 4 +++-
> > 2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/entry/thunk.S b/arch/x86/entry/thunk.S
> > index 119ebdc3d362..10c60369a67c 100644
> > --- a/arch/x86/entry/thunk.S
> > +++ b/arch/x86/entry/thunk.S
> > @@ -13,3 +13,8 @@ THUNK preempt_schedule_thunk, preempt_schedule
> > THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
> > EXPORT_SYMBOL(preempt_schedule_thunk)
> > EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
> > +
> > +#ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT
> > +THUNK rcu_read_unlock_special_thunk, rcu_read_unlock_special
> > +EXPORT_SYMBOL_GPL(rcu_read_unlock_special_thunk)
> > +#endif /* #ifdef CONFIG_PCPU_RCU_PREEMPT_COUNT */
> > diff --git a/arch/x86/include/asm/rcu_preempt.h b/arch/x86/include/asm/rcu_preempt.h
> > index cb25ebe038a5..acdd73b74c05 100644
> > --- a/arch/x86/include/asm/rcu_preempt.h
> > +++ b/arch/x86/include/asm/rcu_preempt.h
> > @@ -97,9 +97,11 @@ static __always_inline bool pcpu_rcu_preempt_count_dec_and_test(void)
> > __percpu_arg([var]));
> > }
> >
> > +extern asmlinkage void rcu_read_unlock_special_thunk(void);
> > +
> > #define pcpu_rcu_read_unlock_special() \
> > do { \
> > - rcu_read_unlock_special();
>
> Instead, can you not use the __no_caller_saved_registers attribute for
> the definition of rcu_read_unlock_special(), or does that not work for
> what you're trying to do here?
>

I think it is paramount to make it the same as preempt_schedule[_thunk]()
when it comes to avoiding the register-saving procedures in the unlock
path.

I didn't know about 'no_caller_saved_registers' before, so I haven't tried it.

And there are limits to 'no_caller_saved_registers' as well:

https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers:

Functions specified with the ‘no_caller_saved_registers’ attribute
should only call other functions with the ‘no_caller_saved_registers’
attribute, or should be compiled with the ‘-mgeneral-regs-only’ flag
to avoid saving unused non-GPR registers.

https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86:

Since GCC doesn’t preserve SSE, MMX nor x87 states, the GCC option
-mgeneral-regs-only should be used to compile functions with
no_caller_saved_registers attribute.

And I don't think ‘-mgeneral-regs-only’ is being used in the kernel for x86.

Thanks
Lai

2024-04-24 02:52:50

by Lai Jiangshan

Subject: Re: [PATCH V2 03/11] rcu: Reorder tree_exp.h after tree_plugin.h

On Wed, Apr 24, 2024 at 1:58 AM Joel Fernandes <[email protected]> wrote:
>
> On Sun, Apr 7, 2024 at 5:03 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > > Enable tree_exp.h to use some rcu-preempt macros introduced in the
> > > next patch. The new macros touch core rcu-preempt fields and are
> > > better implemented in tree_plugin.h.
> >
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
> > kernel/rcu/tree.c | 2 +-
> > kernel/rcu/tree_plugin.h | 1 +
> > 2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index d9642dd06c25..57d1ae26861f 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -5245,6 +5245,6 @@ void __init rcu_init(void)
> > }
> >
> > #include "tree_stall.h"
> > -#include "tree_exp.h"
> > #include "tree_nocb.h"
> > #include "tree_plugin.h"
> > +#include "tree_exp.h"
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index b1264096d03a..d899b4afc21c 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -103,6 +103,7 @@ static void __init rcu_bootup_announce_oddness(void)
> > #ifdef CONFIG_PREEMPT_RCU
> >
> > static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
> > +static bool sync_rcu_exp_done(struct rcu_node *rnp);
> > static void rcu_read_unlock_special(struct task_struct *t);
>
> OK with me, but not sure if the reordering of header inclusion is
> needed? You could get the same results by just adding declarations of
> the new helpers to tree_exp.h.

The new macros (added in a later patch) touch core rcu-preempt fields
and are better implemented in tree_plugin.h.

>
> Not sure if tree_plugin.h needs to be included last, I for some reason
> thought it needed to be - but looks like not. I found a thread that
> shed some light into the header file including C code thing as well,
> which may or may not help:
> https://lore.kernel.org/all/8ab3ca72-e20c-4b18-803f-bf6937c2cd70@paulmck-laptop/#t
>
> Thanks.

2024-04-24 02:53:52

by Lai Jiangshan

Subject: Re: [PATCH V2 10/11] x86/rcu: Add rcu_preempt_count

On Wed, Apr 24, 2024 at 2:09 AM Joel Fernandes <[email protected]> wrote:
>
> On Sun, Apr 7, 2024 at 5:06 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > Implement PCPU_RCU_PREEMPT_COUNT for x86.
> > Mainly copied from asm/preempt.h
> >
> > Make rcu_read_[un]lock() inlined for rcu-preempt.
>
> The changelog is wrong. You inlined rcu_read_[un]lock() in the previous
> patch, not this one?

The previous patch just adds the non-arch framework code. The inlining
only happens when CONFIG_PCPU_RCU_PREEMPT_COUNT=y. This patch implements
PCPU_RCU_PREEMPT_COUNT for x86, so rcu_read_[un]lock() was not inlined
for rcu-preempt on x86 until this patch.

Thanks
Lai

>
> - Joel
>
> > Make rcu_read_lock() only one instruction.
> > Make rcu_read_unlock() only two instructions in the fast path.
> >
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
> > arch/x86/Kconfig | 1 +
> > arch/x86/include/asm/current.h | 3 +
> > arch/x86/include/asm/rcu_preempt.h | 107 +++++++++++++++++++++++++++++
> > arch/x86/kernel/cpu/common.c | 4 ++
> > 4 files changed, 115 insertions(+)
> > create mode 100644 arch/x86/include/asm/rcu_preempt.h
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 4fff6ed46e90..e805cac3763d 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -256,6 +256,7 @@ config X86
> > select HAVE_OBJTOOL if X86_64
> > select HAVE_OPTPROBES
> > select HAVE_PAGE_SIZE_4KB
> > + select HAVE_PCPU_RCU_PREEMPT_COUNT
> > select HAVE_PCSPKR_PLATFORM
> > select HAVE_PERF_EVENTS
> > select HAVE_PERF_EVENTS_NMI

2024-04-24 03:03:11

by Lai Jiangshan

Subject: Re: [PATCH V2 09/11] rcu: Implement PCPU_RCU_PREEMPT_COUNT framework

On Wed, Apr 24, 2024 at 2:19 AM Joel Fernandes <[email protected]> wrote:
>
> On Sun, Apr 7, 2024 at 5:04 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > When the arch code provides HAVE_PCPU_RCU_PREEMPT_COUNT and the
> > corresponding functions, rcu core uses the functions to implement
> > rcu_preempt_depth(), special bits, switching and so on.
> >
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
> > include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
> > kernel/rcu/Kconfig | 8 ++++++++
> > kernel/rcu/rcu.h | 4 ++++
> > kernel/rcu/tree_plugin.h | 8 ++++++++
> > 4 files changed, 53 insertions(+)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 328667ae8086..e3e5ce44c7dc 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -70,6 +70,8 @@ static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned
> >
> > void rcu_read_unlock_special(void);
> >
> > +#ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT
> > +
> > void __rcu_read_lock(void);
> > void __rcu_read_unlock(void);
> >
> > @@ -81,6 +83,37 @@ void __rcu_read_unlock(void);
> > */
> > #define rcu_preempt_depth() READ_ONCE(current->rcu_read_lock_nesting)
> > #define rcu_preempt_depth_set(val) WRITE_ONCE(current->rcu_read_lock_nesting, (val))
> > +#define pcpu_rcu_preempt_special_set() do { } while (0)
> > +#define pcpu_rcu_preempt_special_clear() do { } while (0)
> > +
> > +#else /* #ifndef CONFIG_PCPU_RCU_PREEMPT_COUNT */
> > +
> > +#include <asm/rcu_preempt.h>
> > +
> > +static __always_inline void __rcu_read_lock(void)
> > +{
> > + pcpu_rcu_preempt_count_add(1);
> > + barrier();
> > +}
> > +
> > +static __always_inline void __rcu_read_unlock(void)
> > +{
> > + barrier();
> > + if (unlikely(pcpu_rcu_preempt_count_dec_and_test()))
> > + pcpu_rcu_read_unlock_special();
> > +}
>
> The previous code had comments about the barrier(); can you add those
> comments back?
>
> Also, wasn't there a compiler barrier in the body of the if() as well?
>

The two "if"s in the referenced __rcu_read_unlock() are condensed into
a single "if" ("if (unlikely(pcpu_rcu_preempt_count_dec_and_test()))"),
so there is no extra barrier() needed in the body of the "if" which
is analogue to the body of the second "if" of the referenced
__rcu_read_unlock().

The special bit and the rcu_depth_count are condensed, so the code
mostly follows the way how preempt_enable() works.
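
For reference, the preemptible-kernel preempt_enable() has the same
shape (roughly, from include/linux/preempt.h):

#define preempt_enable() \
do { \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) \
		__preempt_schedule(); \
} while (0)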

Thanks
Lai

> For reference:
>
> void __rcu_read_unlock(void)
> {
> struct task_struct *t = current;
>
> barrier(); // critical section before exit code.
> if (rcu_preempt_read_exit() == 0) {
> barrier(); // critical-section exit before .s check.
> if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> rcu_read_unlock_special(t);
> }

2024-05-20 20:03:25

by Paul E. McKenney

Subject: Re: [PATCH V2 00/11] rcu/x86: Use per-cpu rcu preempt count

On Sun, Apr 07, 2024 at 05:05:47PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
>
> Changed from v1:
> Merge thunk_64.S and thunk_32.S into thunk.S
> Add missing #ifdef in arch/x86/kernel/cpu/common.c
>
> X86 can access percpu data in a single instruction.
>
> Use a per-cpu rcu preempt count and make rcu_read_[un]lock() able to be inlined.
>
> patch 1-8: prepare
> patch 9-11: implement PCPU_RCU_PREEMPT_COUNT
>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>

Hello, Lai!

This is really cool stuff, thank you!!!

Two big questions remain: (1) What system-level net performance benefit
is there, taking the increased context-switch overhead into account?
And (2) are the scheduler maintainers on board with these changes?

On #1, I do well recall your ca. 2019 points about the improved code
generated, but I have seen cases where improved code actually ran
more slowly. My guess is that you have the best chance of seeing
system-level benefits on low-end x86 platforms, perhaps the old Atom
or Celeron systems. The rcuref module provides a good way of doing
microbenchmarks, which would be a good start. Other metrics that
might help include overall kernel code size.

On #2, good data for #1 would help greatly.

Thoughts?

Thanx, Paul

> Lai Jiangshan (11):
> lib: Use rcu_preempt_depth() to replace current->rcu_read_lock_nesting
> rcu: Move rcu_preempt_depth_set() to rcupdate.h
> rcu: Reorder tree_exp.h after tree_plugin.h
> rcu: Add macros set_rcu_preempt_special() and
> clear_rcu_preempt_special()
> rcu: Make rcu_read_unlock_special() global
> rcu: Rename macro __LINUX_RCU_H to __KERNEL_RCU_H
> sched/core: Add rcu_preempt_switch()
> x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S
> rcu: Implement PCPU_RCU_PREEMPT_COUNT framework
> x86/rcu: Add rcu_preempt_count
> x86/rcu: Add THUNK rcu_read_unlock_special_thunk
>
> arch/x86/Kconfig | 1 +
> arch/x86/entry/Makefile | 2 +-
> arch/x86/entry/{thunk_64.S => thunk.S} | 5 ++
> arch/x86/entry/thunk_32.S | 18 ----
> arch/x86/include/asm/current.h | 3 +
> arch/x86/include/asm/rcu_preempt.h | 109 +++++++++++++++++++++++++
> arch/x86/kernel/cpu/common.c | 4 +
> include/linux/rcupdate.h | 36 ++++++++
> kernel/rcu/Kconfig | 8 ++
> kernel/rcu/rcu.h | 15 +++-
> kernel/rcu/tree.c | 2 +-
> kernel/rcu/tree_exp.h | 2 +-
> kernel/rcu/tree_plugin.h | 41 ++++++----
> kernel/sched/core.c | 2 +
> lib/locking-selftest.c | 6 +-
> 15 files changed, 212 insertions(+), 42 deletions(-)
> rename arch/x86/entry/{thunk_64.S => thunk.S} (72%)
> delete mode 100644 arch/x86/entry/thunk_32.S
> create mode 100644 arch/x86/include/asm/rcu_preempt.h
>
>
> base-commit: f2f80ac809875855ac843f9e5e7480604b5cbff5
> --
> 2.19.1.6.gb485710b
>