2022-06-20 23:09:54

by Paul E. McKenney

Subject: [PATCH rcu 0/32] RCU Tasks updates for v5.20

Hello!

This series provides updates for the RCU Tasks family, perhaps most
notably reducing the CPU overhead of RCU Tasks Trace grace periods:

1. rcu-tasks: Check for abandoned callbacks.

2. rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread().

3. rcu-tasks: Move synchronize_rcu_tasks_generic() down.

4. rcu-tasks: Drive synchronous grace periods from calling task.

5. rcu-tasks: Merge state into .b.need_qs and atomically update.

6. rcu-tasks: Remove rcu_tasks_trace_postgp() wait for counter.

7. rcu-tasks: Make trc_read_check_handler() fetch
->trc_reader_nesting only once.

8. rcu-tasks: Idle tasks on offline CPUs are in quiescent states.

9. rcu-tasks: Handle idle tasks for recently offlined CPUs.

10. rcu-tasks: RCU Tasks Trace grace-period kthread has implicit QS.

11. rcu-tasks: Make rcu_note_context_switch() unconditionally call
rcu_tasks_qs().

12. rcu-tasks: Simplify trc_inspect_reader() QS logic.

13. rcu-tasks: Add slow-IPI indicator to RCU Tasks Trace stall
warnings.

14. rcu-tasks: Flag offline CPUs in RCU Tasks Trace stall warnings.

15. rcu-tasks: Make RCU Tasks Trace stall warnings print full
.b.need_qs field.

16. rcu-tasks: Make RCU Tasks Trace stall warning handle idle
offline tasks.

17. rcu-tasks: Add data structures for lightweight grace periods.

18. rcu-tasks: Track blocked RCU Tasks Trace readers.

19. rcu-tasks: Untrack blocked RCU Tasks Trace at reader end.

20. rcu-tasks: Add blocked-task indicator to RCU Tasks Trace stall
warnings.

21. rcu-tasks: Move rcu_tasks_trace_pertask() before
rcu_tasks_trace_pregp_step().

22. rcu-tasks: Avoid rcu_tasks_trace_pertask() duplicate list
additions.

23. rcu-tasks: Scan running tasks for RCU Tasks Trace readers.

24. rcu-tasks: Pull in tasks blocked within RCU Tasks Trace readers.

25. rcu-tasks: Stop RCU Tasks Trace from scanning idle tasks.

26. rcu-tasks: Stop RCU Tasks Trace from scanning full tasks list.

27. rcu-tasks: Maintain a count of tasks blocking RCU Tasks Trace
grace period.

28. rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs.

29. rcu-tasks: Disable and enable CPU hotplug in same function.

30. rcu-tasks: Update comments.

31. rcu-tasks: Be more patient for RCU Tasks boot-time testing.

32. rcu-tasks: Use delayed_work to delay
rcu_tasks_verify_self_tests(), courtesy of Waiman Long.

Thanx, Paul

------------------------------------------------------------------------

b/include/linux/rcupdate.h | 18 -
b/include/linux/rcupdate_trace.h | 2
b/include/linux/sched.h | 1
b/init/init_task.c | 1
b/kernel/fork.c | 1
b/kernel/rcu/tasks.h | 5
b/kernel/rcu/tree_plugin.h | 2
b/kernel/sched/core.c | 32 +
include/linux/rcupdate.h | 11
include/linux/sched.h | 3
kernel/rcu/tasks.h | 678 +++++++++++++++++++++++----------------
11 files changed, 466 insertions(+), 288 deletions(-)


2022-06-20 23:10:25

by Paul E. McKenney

Subject: [PATCH rcu 09/32] rcu-tasks: Handle idle tasks for recently offlined CPUs

This commit identifies idle tasks for recently offlined CPUs as residing
in a quiescent state. This is safe only because CPU-hotplug operations
are excluded during these checks.
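The decision that the patched trc_inspect_reader() makes can be modeled in userspace C. This is a simplified sketch that assumes CONFIG_TASKS_TRACE_RCU_READ_MB=n and that the caller excludes CPU hotplug. struct model_task and model_inspect_reader() are hypothetical names, not kernel APIs. A task that appears to be running on an offline CPU must be that CPU's idle task, so its nesting count can be read directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state that trc_inspect_reader() consults. */
struct model_task {
	bool on_cpu;        /* models task_curr(t) */
	bool is_idle;       /* models is_idle_task(t) */
	int reader_nesting; /* models t->trc_reader_nesting */
};

/*
 * Simplified post-patch decision, assuming CONFIG_TASKS_TRACE_RCU_READ_MB=n
 * and that CPU hotplug is excluded by the caller. Returns -EINVAL when the
 * caller must try another method. Otherwise it returns the nesting count
 * to examine, where 0 means the task is in a quiescent state.
 */
static int model_inspect_reader(const struct model_task *t, bool cpu_offline)
{
	if (t->on_cpu && !cpu_offline)
		return -EINVAL; /* Running on an online CPU: state not safely readable. */

	/* Only an idle task can still look "running" on an offline CPU. */
	assert(!(cpu_offline && t->on_cpu && !t->is_idle));
	return t->reader_nesting;
}
```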

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index ec68bfe98c958..414861d651964 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1318,27 +1318,26 @@ static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
int nesting;
bool ofl = cpu_is_offline(cpu);

- if (task_curr(t)) {
- WARN_ON_ONCE(ofl && !is_idle_task(t));
-
+ if (task_curr(t) && !ofl) {
// If no chance of heavyweight readers, do it the hard way.
- if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
+ if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
return -EINVAL;

// If heavyweight readers are enabled on the remote task,
// we can inspect its state despite its currently running.
// However, we cannot safely change its state.
n_heavy_reader_attempts++;
- if (!ofl && // Check for "running" idle tasks on offline CPUs.
- !rcu_dynticks_zero_in_eqs(cpu, &t->trc_reader_nesting))
+ // Check for "running" idle tasks on offline CPUs.
+ if (!rcu_dynticks_zero_in_eqs(cpu, &t->trc_reader_nesting))
return -EINVAL; // No quiescent state, do it the hard way.
n_heavy_reader_updates++;
- if (ofl)
- n_heavy_reader_ofl_updates++;
nesting = 0;
} else {
// The task is not running, so C-language access is safe.
nesting = t->trc_reader_nesting;
+ WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));
+ if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && ofl)
+ n_heavy_reader_ofl_updates++;
}

// If not exiting a read-side critical section, mark as checked
--
2.31.1.189.g2e36527f23

2022-06-20 23:10:36

by Paul E. McKenney

Subject: [PATCH rcu 32/32] rcu-tasks: Use delayed_work to delay rcu_tasks_verify_self_tests()

From: Waiman Long <[email protected]>

Commit 2585014188d5 ("rcu-tasks: Be more patient for RCU Tasks
boot-time testing") fixed false-positive rcu_tasks verification check
failures by repeating the test once every second until timeout using
schedule_timeout_uninterruptible().

Since rcu_tasks_verify_self_tests() is called from do_initcalls()
as a late_initcall, this has the undesirable side effect of delaying
the other late_initcalls queued after it by a second or more. Fix this by
instead using delayed_work to repeat the verification check.
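The tri-state return convention (0 passed, 1 pending, -1 timed out) and the re-queue-until-done loop can be modeled in userspace C. In this sketch simulated ticks replace jiffies, and each loop iteration stands for one execution of the delayed work item, not for real concurrency. model_verify() and model_run_checks() are hypothetical names. The first check runs immediately, just as rcu_tasks_verify_schedule_work() calls the work function directly.

```c
#include <assert.h>
#include <stdio.h>

/* Tri-state result, mirroring the patch's return convention. */
enum { TEST_PASSED = 0, TEST_PENDING = 1, TEST_TIMED_OUT = -1 };

/*
 * Hypothetical check: passes once `now` reaches `pass_at`, and reports a
 * timeout once `now` exceeds `deadline`. Time is a simulated tick count.
 */
static int model_verify(unsigned long now, unsigned long pass_at,
			unsigned long deadline)
{
	if (now >= pass_at)
		return TEST_PASSED;
	if (now > deadline)
		return TEST_TIMED_OUT;
	return TEST_PENDING;
}

/*
 * Models rcu_tasks_verify_work_fn() plus schedule_delayed_work(): rather
 * than sleeping inside the initcall, each pending result re-queues the
 * check one period later. Returns the final result and stores the number
 * of check invocations in *runs.
 */
static int model_run_checks(unsigned long pass_at, unsigned long deadline,
			    unsigned long period, int *runs)
{
	unsigned long now = 0;
	int ret;

	*runs = 0;
	for (;;) {
		ret = model_verify(now, pass_at, deadline);
		(*runs)++;
		if (ret <= 0)
			break;          /* Passed or timed out: stop re-queuing. */
		now += period;          /* Models schedule_delayed_work(..., HZ). */
	}
	if (ret < 0)
		fprintf(stderr, "self-test timed out\n"); /* Models WARN_ON(ret < 0). */
	return ret;
}
```

With pass_at = 3 and a period of one tick, the check runs four times (ticks 0 through 3) and then stops without rescheduling.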

Fixes: 2585014188d5 ("rcu-tasks: Be more patient for RCU Tasks boot-time testing")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tasks.h | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index fcbd0ec33c866..83c7e6620d403 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1832,6 +1832,11 @@ static void rcu_tasks_initiate_self_tests(void)
#endif
}

+/*
+ * Return: 0 - test passed
+ * 1 - test failed, but have not timed out yet
+ * -1 - test failed and timed out
+ */
static int rcu_tasks_verify_self_tests(void)
{
int ret = 0;
@@ -1847,16 +1852,38 @@ static int rcu_tasks_verify_self_tests(void)
ret = -1;
break;
}
- schedule_timeout_uninterruptible(1);
+ ret = 1;
+ break;
}
}
-
- if (ret)
- WARN_ON(1);
+ WARN_ON(ret < 0);

return ret;
}
-late_initcall(rcu_tasks_verify_self_tests);
+
+/*
+ * Repeat the rcu_tasks_verify_self_tests() call once every second until the
+ * test passes or has timed out.
+ */
+static struct delayed_work rcu_tasks_verify_work;
+static void rcu_tasks_verify_work_fn(struct work_struct *work __maybe_unused)
+{
+ int ret = rcu_tasks_verify_self_tests();
+
+ if (ret <= 0)
+ return;
+
+ /* Test fails but not timed out yet, reschedule another check */
+ schedule_delayed_work(&rcu_tasks_verify_work, HZ);
+}
+
+static int rcu_tasks_verify_schedule_work(void)
+{
+ INIT_DELAYED_WORK(&rcu_tasks_verify_work, rcu_tasks_verify_work_fn);
+ rcu_tasks_verify_work_fn(NULL);
+ return 0;
+}
+late_initcall(rcu_tasks_verify_schedule_work);
#else /* #ifdef CONFIG_PROVE_RCU */
static void rcu_tasks_initiate_self_tests(void) { }
#endif /* #else #ifdef CONFIG_PROVE_RCU */
--
2.31.1.189.g2e36527f23

2022-06-20 23:11:07

by Paul E. McKenney

Subject: [PATCH rcu 28/32] rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs

Currently, the RCU Tasks Trace grace-period kthread IPIs each online CPU
using smp_call_function_single() in order to track any tasks currently in
RCU Tasks Trace read-side critical sections during which the corresponding
task has neither blocked nor been preempted. These IPIs are annoying
and are also not strictly necessary because any task that blocks or is
preempted within its current RCU Tasks Trace read-side critical section
will be tracked on one of the per-CPU rcu_tasks_percpu structure's
->rtp_blkd_tasks list. So the only time that this is a problem is if
one of the CPUs runs through a long-duration RCU Tasks Trace read-side
critical section without a context switch.

Note that the task_call_func() function cannot help here because there is
no safe way to identify the target task. Of course, the task_call_func()
function will be very useful later, when processing the list of tasks,
but it needs to know the task.

This commit therefore creates a cpu_curr_snapshot() function that returns
a pointer to the task_struct structure of some task that happened to be
running on the specified CPU more or less during the time that the
cpu_curr_snapshot() function was executing. If there was no context
switch during this time, this function will return a pointer to the
task_struct structure of the task that was running throughout. If there
was a context switch, then the outgoing task will be taken care of by
RCU's context-switch hook, and the incoming task was either already taken
care of during some previous context switch, or it is not currently within
an RCU Tasks Trace read-side critical section. In this latter case, the
grace period has already started, so there is no need to wait on this task.

This new cpu_curr_snapshot() function is invoked on each CPU early in
the RCU Tasks Trace grace-period processing, and the resulting tasks
are queued for later quiescent-state inspection.
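The snapshot-then-queue pattern can be approximated in userspace C11. This is a sketch, not the kernel code: model_cpu_curr[] is a hypothetical array standing in for cpu_curr(), and seq_cst fences stand in for the two smp_mb() calls. The model omits the rcu_read_lock() that the kernel caller holds to keep the task_struct from being freed. model_collect_running() plays the role of the loop in rcu_tasks_trace_pregp_step() and skips CPUs with no task yet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_task { int pid; };

/* Hypothetical per-CPU "current task" pointers, updated at context switch. */
static _Atomic(struct model_task *) model_cpu_curr[MODEL_NR_CPUS];

/*
 * Userspace analogue of cpu_curr_snapshot(): full fences before and after
 * the load play the role of the kernel's smp_mb() calls, ordering this
 * fetch against the caller's other shared-variable accesses.
 */
static struct model_task *model_cpu_curr_snapshot(int cpu)
{
	struct model_task *t;

	atomic_thread_fence(memory_order_seq_cst);
	t = atomic_load_explicit(&model_cpu_curr[cpu], memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return t;
}

/* Collect one snapshot per online CPU into out[], skipping NULL entries. */
static int model_collect_running(const int *online, int nr_online,
				 struct model_task **out)
{
	int n = 0;

	for (int i = 0; i < nr_online; i++) {
		struct model_task *t = model_cpu_curr_snapshot(online[i]);

		if (t)
			out[n++] = t;
	}
	return n;
}
```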

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
include/linux/sched.h | 1 +
kernel/rcu/tasks.h | 24 +++++++-----------------
kernel/sched/core.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b88caf54e1686..72242bc73d850 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2224,6 +2224,7 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)

extern bool sched_task_on_rq(struct task_struct *p);
extern unsigned long get_wchan(struct task_struct *p);
+extern struct task_struct *cpu_curr_snapshot(int cpu);

/*
* In order to reduce various lock holder preemption latencies provide an
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 9d7d6fd4b8a79..c2aae2643a0b2 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1479,21 +1479,6 @@ static void rcu_tasks_trace_pertask(struct task_struct *t, struct list_head *hop
trc_wait_for_one_reader(t, hop);
}

-/*
- * Get the current CPU's current task on the holdout list.
- * Calls to this function must be serialized.
- */
-static void rcu_tasks_trace_pertask_handler(void *hop_in)
-{
- struct list_head *hop = hop_in;
- struct task_struct *t = current;
-
- // Pull in the currently running task, but only if it is currently
- // in an RCU tasks trace read-side critical section.
- if (rcu_tasks_trace_pertask_prep(t, false))
- trc_add_holdout(t, hop);
-}
-
/* Initialize for a new RCU-tasks-trace grace period. */
static void rcu_tasks_trace_pregp_step(struct list_head *hop)
{
@@ -1513,8 +1498,13 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)

// These smp_call_function_single() calls are serialized to
// allow safe access to the hop list.
- for_each_online_cpu(cpu)
- smp_call_function_single(cpu, rcu_tasks_trace_pertask_handler, hop, 1);
+ for_each_online_cpu(cpu) {
+ rcu_read_lock();
+ t = cpu_curr_snapshot(cpu);
+ if (rcu_tasks_trace_pertask_prep(t, true))
+ trc_add_holdout(t, hop);
+ rcu_read_unlock();
+ }

// Only after all running tasks have been accounted for is it
// safe to take care of the tasks that have blocked within their
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecdc..9568019be124c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4263,6 +4263,38 @@ int task_call_func(struct task_struct *p, task_call_f func, void *arg)
return ret;
}

+/**
+ * cpu_curr_snapshot - Return a snapshot of the currently running task
+ * @cpu: The CPU on which to snapshot the task.
+ *
+ * Returns the task_struct pointer of the task "currently" running on
+ * the specified CPU. If the same task is running on that CPU throughout,
+ * the return value will be a pointer to that task's task_struct structure.
+ * If the CPU did any context switches even vaguely concurrently with the
+ * execution of this function, the return value will be a pointer to the
+ * task_struct structure of a randomly chosen task that was running on
+ * that CPU somewhere around the time that this function was executing.
+ *
+ * If the specified CPU was offline, the return value is whatever it
+ * is, perhaps a pointer to the task_struct structure of that CPU's idle
+ * task, but there is no guarantee. Callers wishing a useful return
+ * value must take some action to ensure that the specified CPU remains
+ * online throughout.
+ *
+ * This function executes full memory barriers before and after fetching
+ * the pointer, which permits the caller to confine this function's fetch
+ * with respect to the caller's accesses to other shared variables.
+ */
+struct task_struct *cpu_curr_snapshot(int cpu)
+{
+ struct task_struct *t;
+
+ smp_mb(); /* Pairing determined by caller's synchronization design. */
+ t = rcu_dereference(cpu_curr(cpu));
+ smp_mb(); /* Pairing determined by caller's synchronization design. */
+ return t;
+}
+
/**
* wake_up_process - Wake up a specific process
* @p: The process to be woken up.
--
2.31.1.189.g2e36527f23

2022-06-20 23:11:12

by Paul E. McKenney

Subject: [PATCH rcu 11/32] rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()

This commit makes rcu_note_context_switch() unconditionally invoke the
rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed
to RCU Tasks Trace) urgently needs a grace period to end.
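The diff below only moves rcu_tasks_qs() from before the out: label to after it. The hypothetical model that follows shows why this makes the call unconditional: both the early-exit goto path and the urgent-grace-period path now reach the hook. model_note_context_switch() and its urgent parameter are simplifications, not the kernel's signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts calls to the modeled rcu_tasks_qs() hook. */
static int model_tasks_qs_calls;

static void model_rcu_tasks_qs(void)
{
	model_tasks_qs_calls++;
}

/*
 * Simplified shape of rcu_note_context_switch() after the patch: the
 * early-exit path jumps to `out`, and because the hook now sits after the
 * label, it runs on both paths. `urgent` models rcu_data.rcu_urgent_qs.
 */
static void model_note_context_switch(bool urgent)
{
	if (!urgent)
		goto out;
	/* ... urgent-grace-period handling would go here ... */
out:
	model_rcu_tasks_qs();
}
```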

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tree_plugin.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c8ba0fe17267c..c966d680b789e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -899,8 +899,8 @@ void rcu_note_context_switch(bool preempt)
this_cpu_write(rcu_data.rcu_urgent_qs, false);
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs)))
rcu_momentary_dyntick_idle();
- rcu_tasks_qs(current, preempt);
out:
+ rcu_tasks_qs(current, preempt);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
--
2.31.1.189.g2e36527f23

2022-06-20 23:12:03

by Paul E. McKenney

Subject: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

This commit causes synchronous grace periods to be driven from the task
invoking synchronize_rcu_*(), allowing these functions to be invoked from
the mid-boot dead zone extending from when the scheduler is initialized
to the point that the various RCU tasks grace-period kthreads are spawned.
This change will allow the self-tests to run in a consistent manner.
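The dispatch logic added to synchronize_rcu_tasks_generic() can be sketched in userspace C. The model uses a hypothetical struct model_rcu_tasks and omits the ->tasks_gp_mutex. If the grace-period kthread pointer is set, the caller waits on the kthread as before. Otherwise the caller drives a grace period itself, forcing needgpcb to 0x2 because there is no kthread to wait for callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct rcu_tasks. */
struct model_rcu_tasks {
	void *kthread_ptr;   /* NULL until the grace-period kthread is spawned */
	int gps_by_kthread;  /* grace periods waited for via the kthread */
	int gps_run;         /* grace periods run by model_one_gp() */
};

/*
 * Models rcu_tasks_one_gp(): mid-boot callers skip the wait for callbacks
 * and unconditionally request a grace period (bit 0x2). `pending` stands
 * in for the value that rcu_tasks_need_gpcb() would have returned.
 */
static int model_one_gp(struct model_rcu_tasks *rtp, bool midboot, int pending)
{
	int needgpcb = midboot ? 0x2 : pending;

	if (needgpcb & 0x2)
		rtp->gps_run++;
	return needgpcb;
}

/* Models synchronize_rcu_tasks_generic() after the patch. */
static void model_synchronize(struct model_rcu_tasks *rtp)
{
	if (rtp->kthread_ptr) {          /* Kthread is up: let it do the work. */
		rtp->gps_by_kthread++;   /* Models wait_rcu_gp(rtp->call_func). */
		return;
	}
	model_one_gp(rtp, true, 0);      /* Mid-boot dead zone: drive the GP here. */
}
```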

Reported-by: Matthew Wilcox <[email protected]>
Reported-by: Zhouyi Zhou <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tasks.h | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index ad993c4ed924f..bd9f2e24f5c73 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -495,17 +495,21 @@ static void rcu_tasks_invoke_cbs_wq(struct work_struct *wp)
}

// Wait for one grace period.
-static void rcu_tasks_one_gp(struct rcu_tasks *rtp)
+static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
{
int needgpcb;

mutex_lock(&rtp->tasks_gp_mutex);
- set_tasks_gp_state(rtp, RTGS_WAIT_CBS);

// If there were none, wait a bit and start over.
- rcuwait_wait_event(&rtp->cbs_wait,
- (needgpcb = rcu_tasks_need_gpcb(rtp)),
- TASK_IDLE);
+ if (unlikely(midboot)) {
+ needgpcb = 0x2;
+ } else {
+ set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
+ rcuwait_wait_event(&rtp->cbs_wait,
+ (needgpcb = rcu_tasks_need_gpcb(rtp)),
+ TASK_IDLE);
+ }

if (needgpcb & 0x2) {
// Wait for one grace period.
@@ -540,7 +544,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
for (;;) {
// Wait for one grace period and invoke any callbacks
// that are ready.
- rcu_tasks_one_gp(rtp);
+ rcu_tasks_one_gp(rtp, false);

// Paranoid sleep to keep this from entering a tight loop.
schedule_timeout_idle(rtp->gp_sleep);
@@ -554,8 +558,12 @@ static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
"synchronize_rcu_tasks called too soon");

- /* Wait for the grace period. */
- wait_rcu_gp(rtp->call_func);
+ // If the grace-period kthread is running, use it.
+ if (READ_ONCE(rtp->kthread_ptr)) {
+ wait_rcu_gp(rtp->call_func);
+ return;
+ }
+ rcu_tasks_one_gp(rtp, true);
}

/* Spawn RCU-tasks grace-period kthread. */
--
2.31.1.189.g2e36527f23

2022-06-20 23:12:42

by Paul E. McKenney

Subject: [PATCH rcu 10/32] rcu-tasks: RCU Tasks Trace grace-period kthread has implicit QS

Because the task driving the grace-period kthread is in a quiescent state
throughout, this commit excludes it from the list of tasks from which
a quiescent state is needed.

This does mean that attaching a sleepable BPF program to a function in
kernel/rcu/tasks.h is a bad idea, by the way.
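The early-return test that this patch extends can be sketched as a predicate. model_needs_qs_check() is a hypothetical name, and `self` stands in for the kernel's `current`. A NULL task (no idle task yet for a not-yet-booted CPU) and the grace-period kthread itself are both skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task { int pid; };

/*
 * Models the condition at the top of rcu_tasks_trace_pertask():
 * only a non-NULL task other than the caller needs its quiescent
 * state tracked.
 */
static bool model_needs_qs_check(const struct model_task *t,
				 const struct model_task *self)
{
	return t != NULL && t != self;
}
```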

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 414861d651964..554b2e59a1d5a 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1433,8 +1433,9 @@ static void rcu_tasks_trace_pertask(struct task_struct *t,
struct list_head *hop)
{
// During early boot when there is only the one boot CPU, there
- // is no idle task for the other CPUs. Just return.
- if (unlikely(t == NULL))
+ // is no idle task for the other CPUs. Also, the grace-period
+ // kthread is always in a quiescent state. Either way, just return.
+ if (unlikely(t == NULL) || t == current)
return;

rcu_st_need_qs(t, 0);
--
2.31.1.189.g2e36527f23

2022-06-20 23:15:12

by Paul E. McKenney

Subject: [PATCH rcu 30/32] rcu-tasks: Update comments

This commit updates comments to reflect the changes in the series
of commits that eliminated the full task-list scan.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 71 +++++++++++++++++++++-------------------------
1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index bf9cc5bc4ae52..df6b2cb2f205d 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1138,11 +1138,10 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
// 3. Avoids expensive read-side instructions, having overhead similar
// to that of Preemptible RCU.
//
-// There are of course downsides. The grace-period code can send IPIs to
-// CPUs, even when those CPUs are in the idle loop or in nohz_full userspace.
-// It is necessary to scan the full tasklist, much as for Tasks RCU. There
-// is a single callback queue guarded by a single lock, again, much as for
-// Tasks RCU. If needed, these downsides can be at least partially remedied.
+// There are of course downsides. For example, the grace-period code
+// can send IPIs to CPUs, even when those CPUs are in the idle loop or
+// in nohz_full userspace. If needed, these downsides can be at least
+// partially remedied.
//
// Perhaps most important, this variant of RCU does not affect the vanilla
// flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace
@@ -1155,38 +1154,30 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
// invokes these functions in this order:
//
// rcu_tasks_trace_pregp_step():
-// Initialize the count of readers and block CPU-hotplug operations.
-// rcu_tasks_trace_pertask(), invoked on every non-idle task:
-// Initialize per-task state and attempt to identify an immediate
-// quiescent state for that task, or, failing that, attempt to
-// set that task's .need_qs flag so that task's next outermost
-// rcu_read_unlock_trace() will report the quiescent state (in which
-// case the count of readers is incremented). If both attempts fail,
-// the task is added to a "holdout" list. Note that IPIs are used
-// to invoke trc_read_check_handler() in the context of running tasks
-// in order to avoid ordering overhead on common-case shared-variable
-// accessses.
+// Disables CPU hotplug, adds all currently executing tasks to the
+// holdout list, then checks the state of all tasks that blocked
+// or were preempted within their current RCU Tasks Trace read-side
+// critical section, adding them to the holdout list if appropriate.
+// Finally, this function re-enables CPU hotplug.
+// The ->pertask_func() pointer is NULL, so there is no per-task processing.
// rcu_tasks_trace_postscan():
-// Initialize state and attempt to identify an immediate quiescent
-// state as above (but only for idle tasks), unblock CPU-hotplug
-// operations, and wait for an RCU grace period to avoid races with
-// tasks that are in the process of exiting.
+// Invokes synchronize_rcu() to wait for late-stage exiting tasks
+// to finish exiting.
// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty:
// Scans the holdout list, attempting to identify a quiescent state
// for each task on the list. If there is a quiescent state, the
-// corresponding task is removed from the holdout list.
+// corresponding task is removed from the holdout list. Once this
+// list is empty, the grace period has completed.
// rcu_tasks_trace_postgp():
-// Wait for the count of readers do drop to zero, reporting any stalls.
-// Also execute full memory barriers to maintain ordering with code
-// executing after the grace period.
+// Provides the needed full memory barrier and does debug checks.
//
// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks.
//
-// Pre-grace-period update-side code is ordered before the grace
-// period via the ->cbs_lock and barriers in rcu_tasks_kthread().
-// Pre-grace-period read-side code is ordered before the grace period by
-// atomic_dec_and_test() of the count of readers (for IPIed readers) and by
-// scheduler context-switch ordering (for locked-down non-running readers).
+// Pre-grace-period update-side code is ordered before the grace period
+// via the ->cbs_lock and barriers in rcu_tasks_kthread(). Pre-grace-period
+// read-side code is ordered before the grace period by atomic operations
+// on .b.need_qs flag of each task involved in this process, or by scheduler
+// context-switch ordering (for locked-down non-running readers).

// The lockdep state must be outside of #ifdef to be useful.
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -1245,7 +1236,10 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
}
EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs);

-/* If we are the last reader, wake up the grace-period kthread. */
+/*
+ * If we are the last reader, signal the grace-period kthread.
+ * Also remove from the per-CPU list of blocked tasks.
+ */
void rcu_read_unlock_trace_special(struct task_struct *t)
{
unsigned long flags;
@@ -1336,9 +1330,9 @@ static void trc_read_check_handler(void *t_in)
if (unlikely(nesting < 0))
goto reset_ipi;

- // Get here if the task is in a read-side critical section. Set
- // its state so that it will awaken the grace-period kthread upon
- // exit from that critical section.
+ // Get here if the task is in a read-side critical section.
+ // Set its state so that it will update state for the grace-period
+ // kthread upon exit from that critical section.
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED);

reset_ipi:
@@ -1387,7 +1381,7 @@ static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
return 0; // In QS, so done.
}
if (nesting < 0)
- return -EINVAL; // QS transitioning, try again later.
+ return -EINVAL; // Reader transitioning, try again later.

// The task is in a read-side critical section, so set up its
// state so that it will update state upon exit from that critical
@@ -1492,11 +1486,12 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
for_each_possible_cpu(cpu)
WARN_ON_ONCE(per_cpu(trc_ipi_to_cpu, cpu));

- // Disable CPU hotplug across the CPU scan.
- // This also waits for all readers in CPU-hotplug code paths.
+ // Disable CPU hotplug across the CPU scan for the benefit of
+ // any IPIs that might be needed. This also waits for all readers
+ // in CPU-hotplug code paths.
cpus_read_lock();

- // These smp_call_function_single() calls are serialized to
+ // These rcu_tasks_trace_pertask_prep() calls are serialized to
// allow safe access to the hop list.
for_each_online_cpu(cpu) {
rcu_read_lock();
@@ -1608,7 +1603,7 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
{
struct task_struct *g, *t;

- // Disable CPU hotplug across the holdout list scan.
+ // Disable CPU hotplug across the holdout list scan for IPIs.
cpus_read_lock();

list_for_each_entry_safe(t, g, hop, trc_holdout_list) {
--
2.31.1.189.g2e36527f23

2022-06-20 23:31:48

by Paul E. McKenney

Subject: [PATCH rcu 24/32] rcu-tasks: Pull in tasks blocked within RCU Tasks Trace readers

This commit scans each CPU's ->rtp_blkd_tasks list, adding the tasks on
it to the list of holdout tasks. This causes the current RCU Tasks Trace
grace period to wait until these tasks exit their RCU Tasks Trace
read-side critical sections. This commit will enable later work that
omits the scan of the full task list.
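The list-handling pattern in the diff below can be shown in userspace C with a pthread mutex standing in for the rcu_node-style raw spinlock. Under the lock, the shared list is spliced onto a private list. Each entry is moved back to the shared list before the lock is dropped to process it, so the entry stays visible to others throughout. The list helpers, scan_blocked() and count_visit() are hypothetical, and the model omits the rcu_read_lock() that keeps each task_struct alive while the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct node { struct node *prev, *next; int id; };

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_del_init_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_add_head(struct node *n, struct node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Move every entry of src onto dst (assumed empty), leaving src empty. */
static void list_splice_all(struct node *src, struct node *dst)
{
	if (list_is_empty(src))
		return;
	dst->next = src->next;
	dst->prev = src->prev;
	src->next->prev = dst;
	src->prev->next = dst;
	list_init(src);
}

/* Hypothetical per-task callback: tallies what it was handed. */
static int visit_count, visit_sum;
static void count_visit(struct node *n) { visit_count++; visit_sum += n->id; }

/* Splice, then for each entry: put it back, unlock, process, relock. */
static void scan_blocked(pthread_mutex_t *lock, struct node *shared,
			 void (*visit)(struct node *))
{
	struct node local;
	struct node *n;

	list_init(&local);
	pthread_mutex_lock(lock);
	list_splice_all(shared, &local);
	while (!list_is_empty(&local)) {
		n = local.next;
		list_del_init_node(n);
		list_add_head(n, shared);    /* Entry remains on the shared list. */
		pthread_mutex_unlock(lock);
		visit(n);                    /* Processed without the lock held. */
		pthread_mutex_lock(lock);
	}
	pthread_mutex_unlock(lock);
}
```

Because each entry is re-added at the head, the shared list ends up in reverse order. As in the kernel code, order does not matter here.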

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index a8f95864c921a..d318cdfd2309c 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1492,7 +1492,11 @@ static void rcu_tasks_trace_pertask_handler(void *hop_in)
/* Initialize for a new RCU-tasks-trace grace period. */
static void rcu_tasks_trace_pregp_step(struct list_head *hop)
{
+ LIST_HEAD(blkd_tasks);
int cpu;
+ unsigned long flags;
+ struct rcu_tasks_percpu *rtpcp;
+ struct task_struct *t;

// There shouldn't be any old IPIs, but...
for_each_possible_cpu(cpu)
@@ -1506,6 +1510,26 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
// allow safe access to the hop list.
for_each_online_cpu(cpu)
smp_call_function_single(cpu, rcu_tasks_trace_pertask_handler, hop, 1);
+
+ // Only after all running tasks have been accounted for is it
+ // safe to take care of the tasks that have blocked within their
+ // current RCU tasks trace read-side critical section.
+ for_each_possible_cpu(cpu) {
+ rtpcp = per_cpu_ptr(rcu_tasks_trace.rtpcpu, cpu);
+ raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+ list_splice_init(&rtpcp->rtp_blkd_tasks, &blkd_tasks);
+ while (!list_empty(&blkd_tasks)) {
+ rcu_read_lock();
+ t = list_first_entry(&blkd_tasks, struct task_struct, trc_blkd_node);
+ list_del_init(&t->trc_blkd_node);
+ list_add(&t->trc_blkd_node, &rtpcp->rtp_blkd_tasks);
+ raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+ rcu_tasks_trace_pertask(t, hop);
+ rcu_read_unlock();
+ raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+ }
+ raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+ }
}

/*
--
2.31.1.189.g2e36527f23

2022-06-20 23:32:03

by Paul E. McKenney

Subject: [PATCH rcu 08/32] rcu-tasks: Idle tasks on offline CPUs are in quiescent states

Any idle task corresponding to an offline CPU is in an RCU Tasks Trace
quiescent state. This commit causes rcu_tasks_trace_postscan() to ignore
idle tasks for offline CPUs, which it can do safely due to CPU-hotplug
operations being disabled.
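The switch from for_each_possible_cpu() to for_each_online_cpu() amounts to filtering idle tasks by the online mask. In the hypothetical model below, 64-bit masks stand in for the kernel's cpumasks. The filter is only meaningful while the online mask cannot change, which the kernel ensures by holding cpus_read_lock() across the scan.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical CPU masks: bit N set means CPU N is possible/online.
 * Returns how many idle tasks still need checking, skipping offline
 * CPUs whose idle tasks are already in a quiescent state.
 */
static int count_idle_tasks_to_check(uint64_t possible, uint64_t online)
{
	int n = 0;

	for (int cpu = 0; cpu < 64; cpu++)
		if (((online & possible) >> cpu) & 1)
			n++;
	return n;
}
```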

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 8fe78a7fecafd..ec68bfe98c958 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1451,7 +1451,7 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
{
int cpu;

- for_each_possible_cpu(cpu)
+ for_each_online_cpu(cpu)
rcu_tasks_trace_pertask(idle_task(cpu), hop);

// Re-enable CPU hotplug now that the tasklist scan has completed.
--
2.31.1.189.g2e36527f23

2022-06-20 23:33:10

by Paul E. McKenney

Subject: [PATCH rcu 23/32] rcu-tasks: Scan running tasks for RCU Tasks Trace readers

A running task might be within an RCU Tasks Trace read-side critical
section for any length of time, but will not be placed on any of the
per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks lists. Therefore,
any RCU Tasks Trace grace-period processing that does not scan the full
task list must interact with the running tasks.

This commit therefore causes the rcu_tasks_trace_pregp_step() function
to IPI each CPU in order to place the corresponding task on the holdout
list and to record whether or not it was in an RCU Tasks Trace read-side
critical section. Yes, it is possible to avoid adding it to that list
if it is not a reader, but that would prevent the system from remembering
that this task was in a quiescent state. This is why the running tasks
are unconditionally added to the holdout list.
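A simplified userspace model illustrates unconditionally adding each running task to the holdout list while recording its reader state. The per-task on_holdout flag, modeling the !list_empty(&t->trc_holdout_list) check, lets a later pass recognize a task that has already been handled. All names here (struct holdouts, add_running(), prune()) are hypothetical. prune() only loosely models the repeated holdout scan.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HOLDOUTS 8

struct model_task {
	int nesting;      /* models t->trc_reader_nesting */
	bool on_holdout;  /* models !list_empty(&t->trc_holdout_list) */
	bool was_reader;  /* reader state recorded when first examined */
};

struct holdouts { struct model_task *t[MAX_HOLDOUTS]; int n; };

/* Add a running task unconditionally, recording its reader state. */
static void add_running(struct holdouts *h, struct model_task *t)
{
	if (t->on_holdout)        /* Already queued: avoid duplicates. */
		return;
	assert(h->n < MAX_HOLDOUTS);
	t->was_reader = t->nesting > 0;
	t->on_holdout = true;
	h->t[h->n++] = t;
}

/* Remove tasks no longer in a reader; return how many remain. */
static int prune(struct holdouts *h)
{
	int j = 0;

	for (int i = 0; i < h->n; i++) {
		if (h->t[i]->nesting > 0)
			h->t[j++] = h->t[i];
		else
			h->t[i]->on_holdout = false;
	}
	h->n = j;
	return j;
}
```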

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 51 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1aa6a24a9bc2b..a8f95864c921a 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -14,7 +14,7 @@

struct rcu_tasks;
typedef void (*rcu_tasks_gp_func_t)(struct rcu_tasks *rtp);
-typedef void (*pregp_func_t)(void);
+typedef void (*pregp_func_t)(struct list_head *hop);
typedef void (*pertask_func_t)(struct task_struct *t, struct list_head *hop);
typedef void (*postscan_func_t)(struct list_head *hop);
typedef void (*holdouts_func_t)(struct list_head *hop, bool ndrpt, bool *frptp);
@@ -661,7 +661,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
struct task_struct *t;

set_tasks_gp_state(rtp, RTGS_PRE_WAIT_GP);
- rtp->pregp_func();
+ rtp->pregp_func(&holdouts);

/*
* There were callbacks, so we need to wait for an RCU-tasks
@@ -791,7 +791,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
// disabling.

/* Pre-grace-period preparation. */
-static void rcu_tasks_pregp_step(void)
+static void rcu_tasks_pregp_step(struct list_head *hop)
{
/*
* Wait for all pre-existing t->on_rq and t->nvcsw transitions
@@ -1449,24 +1449,48 @@ static void trc_wait_for_one_reader(struct task_struct *t,
}
}

-/* Do first-round processing for the specified task. */
-static void rcu_tasks_trace_pertask(struct task_struct *t,
- struct list_head *hop)
+/*
+ * Initialize for first-round processing for the specified task.
+ * Return false if task is NULL or already taken care of, true otherwise.
+ */
+static bool rcu_tasks_trace_pertask_prep(struct task_struct *t, bool notself)
{
// During early boot when there is only the one boot CPU, there
// is no idle task for the other CPUs. Also, the grace-period
// kthread is always in a quiescent state. In addition, just return
// if this task is already on the list.
- if (unlikely(t == NULL) || t == current || !list_empty(&t->trc_holdout_list))
- return;
+ if (unlikely(t == NULL) || (t == current && notself) || !list_empty(&t->trc_holdout_list))
+ return false;

rcu_st_need_qs(t, 0);
t->trc_ipi_to_cpu = -1;
- trc_wait_for_one_reader(t, hop);
+ return true;
+}
+
+/* Do first-round processing for the specified task. */
+static void rcu_tasks_trace_pertask(struct task_struct *t, struct list_head *hop)
+{
+ if (rcu_tasks_trace_pertask_prep(t, true))
+ trc_wait_for_one_reader(t, hop);
+}
+
+/*
+ * Get the current CPU's current task on the holdout list.
+ * Calls to this function must be serialized.
+ */
+static void rcu_tasks_trace_pertask_handler(void *hop_in)
+{
+ struct list_head *hop = hop_in;
+ struct task_struct *t = current;
+
+ // Pull in the currently running task unconditionally, so that its
+ // quiescent state is recorded even if it is not a reader.
+ if (rcu_tasks_trace_pertask_prep(t, false))
+ trc_add_holdout(t, hop);
}

/* Initialize for a new RCU-tasks-trace grace period. */
-static void rcu_tasks_trace_pregp_step(void)
+static void rcu_tasks_trace_pregp_step(struct list_head *hop)
{
int cpu;

@@ -1474,9 +1498,14 @@ static void rcu_tasks_trace_pregp_step(void)
for_each_possible_cpu(cpu)
WARN_ON_ONCE(per_cpu(trc_ipi_to_cpu, cpu));

- // Disable CPU hotplug across the tasklist scan.
+ // Disable CPU hotplug across the CPU scan.
// This also waits for all readers in CPU-hotplug code paths.
cpus_read_lock();
+
+ // These smp_call_function_single() calls are serialized to
+ // allow safe access to the hop list.
+ for_each_online_cpu(cpu)
+ smp_call_function_single(cpu, rcu_tasks_trace_pertask_handler, hop, 1);
}

/*
--
2.31.1.189.g2e36527f23

2022-06-20 23:35:58

by Paul E. McKenney

Subject: [PATCH rcu 17/32] rcu-tasks: Add data structures for lightweight grace periods

This commit adds fields to task_struct and to rcu_tasks_percpu that will
be used to avoid the task-list scan for RCU Tasks Trace grace periods,
and also initializes these fields.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
include/linux/sched.h | 2 ++
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/rcu/tasks.h | 4 ++++
4 files changed, 8 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e6eb5871593e9..b88caf54e1686 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -844,6 +844,8 @@ struct task_struct {
int trc_ipi_to_cpu;
union rcu_special trc_reader_special;
struct list_head trc_holdout_list;
+ struct list_head trc_blkd_node;
+ int trc_blkd_cpu;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */

struct sched_info sched_info;
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a3..ff6c4b9bfe6b1 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -157,6 +157,7 @@ struct task_struct init_task
.trc_reader_nesting = 0,
.trc_reader_special.s = 0,
.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
+ .trc_blkd_node = LIST_HEAD_INIT(init_task.trc_blkd_node),
#endif
#ifdef CONFIG_CPUSETS
.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c696..1950eb8702441 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1814,6 +1814,7 @@ static inline void rcu_copy_process(struct task_struct *p)
p->trc_reader_nesting = 0;
p->trc_reader_special.s = 0;
INIT_LIST_HEAD(&p->trc_holdout_list);
+ INIT_LIST_HEAD(&p->trc_blkd_node);
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 64eb4d7b142e3..fd4508af055e6 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -29,6 +29,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
* @rtp_work: Work queue for invoking callbacks.
* @rtp_irq_work: IRQ work queue for deferred wakeups.
* @barrier_q_head: RCU callback for barrier operation.
+ * @rtp_blkd_tasks: List of tasks blocked as readers.
* @cpu: CPU number corresponding to this entry.
* @rtpp: Pointer to the rcu_tasks structure.
*/
@@ -40,6 +41,7 @@ struct rcu_tasks_percpu {
struct work_struct rtp_work;
struct irq_work rtp_irq_work;
struct rcu_head barrier_q_head;
+ struct list_head rtp_blkd_tasks;
int cpu;
struct rcu_tasks *rtpp;
};
@@ -256,6 +258,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
rtpcp->cpu = cpu;
rtpcp->rtpp = rtp;
+ if (!rtpcp->rtp_blkd_tasks.next)
+ INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
}
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
--
2.31.1.189.g2e36527f23

2022-06-20 23:46:03

by Paul E. McKenney

Subject: [PATCH rcu 29/32] rcu-tasks: Disable and enable CPU hotplug in same function

The rcu_tasks_trace_pregp_step() function invokes cpus_read_lock() to
disable CPU hotplug, and a later call to the rcu_tasks_trace_postscan()
function invokes cpus_read_unlock() to re-enable it. This was absolutely
necessary in the past in order to protect the intervening scan of the full
tasks list, but there is no longer such a scan. This commit therefore
improves readability by moving the cpus_read_unlock() call to the end
of the rcu_tasks_trace_pregp_step() function. This commit is a pure
code-motion commit without any (intended) change in functionality.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index c2aae2643a0b2..bf9cc5bc4ae52 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1525,6 +1525,9 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
}
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
}
+
+ // Re-enable CPU hotplug now that the holdout list is populated.
+ cpus_read_unlock();
}

/*
@@ -1532,9 +1535,6 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
*/
static void rcu_tasks_trace_postscan(struct list_head *hop)
{
- // Re-enable CPU hotplug now that the tasklist scan has completed.
- cpus_read_unlock();
-
// Wait for late-stage exiting tasks to finish exiting.
// These might have passed the call to exit_tasks_rcu_finish().
synchronize_rcu();
--
2.31.1.189.g2e36527f23

2022-06-20 23:50:08

by Paul E. McKenney

Subject: [PATCH rcu 14/32] rcu-tasks: Flag offline CPUs in RCU Tasks Trace stall warnings

This commit tags offline CPUs with "(offline)" in RCU Tasks Trace CPU
stall warnings. This is a debugging aid.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 1cfbebf2b5977..93096188d3631 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1502,14 +1502,14 @@ static void show_stalled_task_trace(struct task_struct *t, bool *firstreport)
".I"[t->trc_ipi_to_cpu >= 0],
".i"[is_idle_tsk]);
else
- pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d\n",
+ pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d%s\n",
t->pid,
".I"[trc_rdr.ipi_to_cpu >= 0],
".i"[is_idle_tsk],
".N"[cpu >= 0 && tick_nohz_full_cpu(cpu)],
trc_rdr.nesting,
" N"[!!trc_rdr.needqs],
- cpu);
+ cpu, cpu_online(cpu) ? "" : "(offline)");
sched_show_task(t);
}

--
2.31.1.189.g2e36527f23

2022-06-20 23:57:56

by Paul E. McKenney

Subject: [PATCH rcu 13/32] rcu-tasks: Add slow-IPI indicator to RCU Tasks Trace stall warnings

This commit adds an "I" indicator to the RCU Tasks Trace CPU stall
warning when an IPI directed to a task has thus far failed to arrive.
This serves as a debugging aid.
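A minimal sketch of the indexing idiom behind this indicator, assuming a
hypothetical fmt_slow_task() helper that formats the same fields as the
short form of the message into a buffer instead of calling pr_alert():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the short form of the stall message,
 * pr_alert("P%d: %c%c\n", ...). A two-character string literal is indexed
 * by a 0/1 condition, so ".I"[c] is 'I' when c holds and '.' otherwise.
 */
static int fmt_slow_task(char *buf, size_t len, int pid, int ipi_to_cpu,
			 bool is_idle)
{
	return snprintf(buf, len, "P%d: %c%c",
			pid,
			".I"[ipi_to_cpu >= 0],  /* IPI sent but not yet handled */
			".i"[is_idle]);         /* idle task */
}
```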

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 6b44c69eeca88..1cfbebf2b5977 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1497,8 +1497,9 @@ static void show_stalled_task_trace(struct task_struct *t, bool *firstreport)
}
cpu = task_cpu(t);
if (!task_call_func(t, trc_check_slow_task, &trc_rdr))
- pr_alert("P%d: %c\n",
+ pr_alert("P%d: %c%c\n",
t->pid,
+ ".I"[t->trc_ipi_to_cpu >= 0],
".i"[is_idle_tsk]);
else
pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d\n",
--
2.31.1.189.g2e36527f23

2022-06-21 00:00:28

by Paul E. McKenney

Subject: [PATCH rcu 25/32] rcu-tasks: Stop RCU Tasks Trace from scanning idle tasks

Now that RCU scans both running tasks and tasks that have blocked within
their current RCU Tasks Trace read-side critical section, there is no
need for it to scan the idle tasks. After all, an idle loop should not
remain within an RCU Tasks Trace read-side critical section across
exit from idle, and from a BPF viewpoint, functions invoked from the
idle loop should not sleep. So only running idle tasks can be within
RCU Tasks Trace read-side critical sections.

This commit therefore removes the scan of the idle tasks from the
rcu_tasks_trace_postscan() function.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index d318cdfd2309c..272c905995e56 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1533,16 +1533,10 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
}

/*
- * Do intermediate processing between task and holdout scans and
- * pick up the idle tasks.
+ * Do intermediate processing between task and holdout scans.
*/
static void rcu_tasks_trace_postscan(struct list_head *hop)
{
- int cpu;
-
- for_each_online_cpu(cpu)
- rcu_tasks_trace_pertask(idle_task(cpu), hop);
-
// Re-enable CPU hotplug now that the tasklist scan has completed.
cpus_read_unlock();

--
2.31.1.189.g2e36527f23

2022-06-21 00:03:59

by Paul E. McKenney

Subject: [PATCH rcu 15/32] rcu-tasks: Make RCU Tasks Trace stall warnings print full .b.need_qs field

Currently, the RCU Tasks Trace CPU stall warning simply indicates
whether or not the .b.need_qs field is zero. This commit shows the
three permitted values and flags other values with either "!" or "?".
This is a debugging aid.
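The decoding can be sketched as follows. The TRC_NEED_QS and
TRC_NEED_QS_CHECKED values match those that the earlier .b.need_qs patch
defines in rcupdate.h; the need_qs_char() and need_qs_bad_bits() helpers
are hypothetical. Values 0, TRC_NEED_QS_CHECKED, and
TRC_NEED_QS_CHECKED | TRC_NEED_QS print as ' ', 'C', and 'N', while
TRC_NEED_QS alone prints as '!' and any higher bit adds a '?':

```c
#include <assert.h>

#define TRC_NEED_QS         0x1  /* Task needs a quiescent state. */
#define TRC_NEED_QS_CHECKED 0x2  /* Task has been checked for needing a QS. */

/* First character: decode the low two bits, as " !CN"[needqs & 0x3] does. */
static char need_qs_char(unsigned char v)
{
	return " !CN"[v & 0x3];
}

/* Second character: '?' if any bit above the low two is set. */
static char need_qs_bad_bits(unsigned char v)
{
	return " ?"[v > 0x3];
}
```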

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 93096188d3631..5eefbab7f2edb 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1502,13 +1502,14 @@ static void show_stalled_task_trace(struct task_struct *t, bool *firstreport)
".I"[t->trc_ipi_to_cpu >= 0],
".i"[is_idle_tsk]);
else
- pr_alert("P%d: %c%c%c nesting: %d%c cpu: %d%s\n",
+ pr_alert("P%d: %c%c%c nesting: %d%c%c cpu: %d%s\n",
t->pid,
".I"[trc_rdr.ipi_to_cpu >= 0],
".i"[is_idle_tsk],
".N"[cpu >= 0 && tick_nohz_full_cpu(cpu)],
trc_rdr.nesting,
- " N"[!!trc_rdr.needqs],
+ " !CN"[trc_rdr.needqs & 0x3],
+ " ?"[trc_rdr.needqs > 0x3],
cpu, cpu_online(cpu) ? "" : "(offline)");
sched_show_task(t);
}
--
2.31.1.189.g2e36527f23

2022-06-21 00:07:05

by Paul E. McKenney

Subject: [PATCH rcu 18/32] rcu-tasks: Track blocked RCU Tasks Trace readers

This commit places any task that has ever blocked within its current
RCU Tasks Trace read-side critical section on a per-CPU list within the
rcu_tasks_percpu structure. Tasks are removed from this list by the
exit_tasks_rcu_finish_trace() function when they exit. The purpose of this
commit is to provide the information needed to eliminate the current
scan of the full task list.

This commit offsets the INT_MIN value for ->trc_reader_nesting by the
new nesting level in order to avoid queueing tasks that are exiting
their read-side critical sections.
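The resulting context-switch decision in rcu_tasks_trace_qs() can be
sketched as a pure function. This is a hypothetical user-space model
(classify() and unlocking_value() are not kernel functions). As modeled
here, only the value written by an outermost unlock (INT_MIN + 0) is
excluded, while a task caught in a nested unlock is still inside an outer
reader and can still be queued:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical outcomes mirroring the branches of rcu_tasks_trace_qs(). */
enum qs_action {
	QS_REPORT,   /* not a reader and no QS pending: mark as checked */
	QS_BLOCKED,  /* reader not yet on a blocked list: enqueue it */
	QS_NOTHING,  /* anything else */
};

static enum qs_action classify(int nesting, unsigned char need_qs, bool blocked)
{
	if (!need_qs && !nesting)
		return QS_REPORT;
	if (nesting && nesting != INT_MIN && !blocked)
		return QS_BLOCKED;
	return QS_NOTHING;
}

/* Value that rcu_read_unlock_trace() now stores while an unlock is in
 * progress, where new_nesting is the nesting level after this unlock. */
static int unlocking_value(int new_nesting)
{
	return INT_MIN + new_nesting;
}
```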

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply feedback from [email protected] ]

Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: syzbot <[email protected]>
Tested-by: "Zhang, Qiang1" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
include/linux/rcupdate.h | 11 +++++++++--
include/linux/rcupdate_trace.h | 2 +-
kernel/rcu/tasks.h | 22 +++++++++++++++++++++-
3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1e728d544fc1e..ebdfeead44e51 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -174,12 +174,19 @@ void synchronize_rcu_tasks(void);
#define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state.

u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new);
+void rcu_tasks_trace_qs_blkd(struct task_struct *t);

# define rcu_tasks_trace_qs(t) \
do { \
+ int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting); \
+ \
if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
- likely(!READ_ONCE((t)->trc_reader_nesting))) \
+ likely(!___rttq_nesting)) { \
rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
+ } else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \
+ !READ_ONCE((t)->trc_reader_special.b.blocked)) { \
+ rcu_tasks_trace_qs_blkd(t); \
+ } \
} while (0)
# else
# define rcu_tasks_trace_qs(t) do { } while (0)
@@ -188,7 +195,7 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new);
#define rcu_tasks_qs(t, preempt) \
do { \
rcu_tasks_classic_qs((t), (preempt)); \
- rcu_tasks_trace_qs((t)); \
+ rcu_tasks_trace_qs(t); \
} while (0)

# ifdef CONFIG_TASKS_RUDE_RCU
diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index 6f9c358173989..9bc8cbb33340b 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void)
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
barrier(); // Critical section before disabling.
// Disable IPI-based setting of .need_qs.
- WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
+ WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting);
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index fd4508af055e6..bab75ec26bdbb 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1261,6 +1261,24 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);

+/* Add a newly blocked reader task to its CPU's list. */
+void rcu_tasks_trace_qs_blkd(struct task_struct *t)
+{
+ unsigned long flags;
+ struct rcu_tasks_percpu *rtpcp;
+
+ local_irq_save(flags);
+ rtpcp = this_cpu_ptr(rcu_tasks_trace.rtpcpu);
+ raw_spin_lock_rcu_node(rtpcp); // irqs already disabled
+ t->trc_blkd_cpu = smp_processor_id();
+ if (!rtpcp->rtp_blkd_tasks.next)
+ INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
+ list_add(&t->trc_blkd_node, &rtpcp->rtp_blkd_tasks);
+ t->trc_reader_special.b.blocked = true;
+ raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+}
+EXPORT_SYMBOL_GPL(rcu_tasks_trace_qs_blkd);
+
/* Add a task to the holdout list, if it is not already on the list. */
static void trc_add_holdout(struct task_struct *t, struct list_head *bhp)
{
@@ -1586,9 +1604,11 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
/* Report any needed quiescent state for this exiting task. */
static void exit_tasks_rcu_finish_trace(struct task_struct *t)
{
+ union rcu_special trs = READ_ONCE(t->trc_reader_special);
+
rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
- if (WARN_ON_ONCE(rcu_ld_need_qs(t) & TRC_NEED_QS))
+ if (WARN_ON_ONCE(rcu_ld_need_qs(t) & TRC_NEED_QS) || trs.b.blocked)
rcu_read_unlock_trace_special(t);
else
WRITE_ONCE(t->trc_reader_nesting, 0);
--
2.31.1.189.g2e36527f23

2022-06-21 00:12:46

by Paul E. McKenney

Subject: [PATCH rcu 02/32] rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread()

This commit abstracts most of the rcu_tasks_kthread() function's loop
body into a new rcu_tasks_one_gp() function. It also introduces
a new ->tasks_gp_mutex to synchronize concurrent calls to this new
rcu_tasks_one_gp() function. This commit is preparation for allowing
RCU tasks grace periods to be driven by the calling task during the
mid-boot dead zone.
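A user-space analogue of this split, assuming POSIX threads in place of
the kernel's struct mutex and kthreads: sim_one_gp() plays the role of
rcu_tasks_one_gp(), and the mutex lets a bounded "kthread" loop and the
calling thread both drive grace periods without overlapping. All sim_*
names are hypothetical, and the real function also waits for callbacks
and invokes them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SIM_GPS_PER_DRIVER 1000

/* Hypothetical, simplified flavor: just the mutex and some GP bookkeeping. */
struct sim_rcu_tasks {
	pthread_mutex_t gp_mutex;  /* stand-in for ->tasks_gp_mutex */
	int gp_in_progress;        /* only touched with gp_mutex held */
	unsigned long n_gps;       /* completed grace periods */
};

/* Stand-in for rcu_tasks_one_gp(): one grace period, fully serialized. */
static void sim_one_gp(struct sim_rcu_tasks *rtp)
{
	pthread_mutex_lock(&rtp->gp_mutex);
	assert(!rtp->gp_in_progress);  /* two drivers must never overlap */
	rtp->gp_in_progress = 1;
	rtp->n_gps++;                  /* stand-in for ->gp_func() and callbacks */
	rtp->gp_in_progress = 0;
	pthread_mutex_unlock(&rtp->gp_mutex);
}

/* Stand-in for the rcu_tasks_kthread() loop, bounded so that it terminates. */
static void *sim_kthread(void *arg)
{
	struct sim_rcu_tasks *rtp = arg;

	for (int i = 0; i < SIM_GPS_PER_DRIVER; i++)
		sim_one_gp(rtp);
	return NULL;
}

/* Drive grace periods from both the "kthread" and the calling thread at
 * once; returns the number completed (0 if the thread cannot be created). */
static unsigned long sim_run_two_drivers(void)
{
	struct sim_rcu_tasks rt = { .gp_mutex = PTHREAD_MUTEX_INITIALIZER };
	pthread_t kt;

	if (pthread_create(&kt, NULL, sim_kthread, &rt) != 0)
		return 0;
	for (int i = 0; i < SIM_GPS_PER_DRIVER; i++)
		sim_one_gp(&rt);  /* caller drives GPs itself */
	pthread_join(kt, NULL);
	return rt.n_gps;
}
```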

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tasks.h | 58 ++++++++++++++++++++++++++++------------------
1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index b8690a412c5bf..d7b12f524e81c 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -48,6 +48,7 @@ struct rcu_tasks_percpu {
* struct rcu_tasks - Definition for a Tasks-RCU-like mechanism.
* @cbs_wait: RCU wait allowing a new callback to get kthread's attention.
* @cbs_gbl_lock: Lock protecting callback list.
+ * @tasks_gp_mutex: Mutex protecting grace period, needed during mid-boot dead zone.
* @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
* @gp_func: This flavor's grace-period-wait function.
* @gp_state: Grace period's most recent state transition (debugging).
@@ -79,6 +80,7 @@ struct rcu_tasks_percpu {
struct rcu_tasks {
struct rcuwait cbs_wait;
raw_spinlock_t cbs_gbl_lock;
+ struct mutex tasks_gp_mutex;
int gp_state;
int gp_sleep;
int init_fract;
@@ -119,6 +121,7 @@ static struct rcu_tasks rt_name = \
{ \
.cbs_wait = __RCUWAIT_INITIALIZER(rt_name.wait), \
.cbs_gbl_lock = __RAW_SPIN_LOCK_UNLOCKED(rt_name.cbs_gbl_lock), \
+ .tasks_gp_mutex = __MUTEX_INITIALIZER(rt_name.tasks_gp_mutex), \
.gp_func = gp, \
.call_func = call, \
.rtpcpu = &rt_name ## __percpu, \
@@ -502,10 +505,37 @@ static void rcu_tasks_invoke_cbs_wq(struct work_struct *wp)
rcu_tasks_invoke_cbs(rtp, rtpcp);
}

-/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
-static int __noreturn rcu_tasks_kthread(void *arg)
+// Wait for one grace period.
+static void rcu_tasks_one_gp(struct rcu_tasks *rtp)
{
int needgpcb;
+
+ mutex_lock(&rtp->tasks_gp_mutex);
+ set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
+
+ // If there were none, wait a bit and start over.
+ rcuwait_wait_event(&rtp->cbs_wait,
+ (needgpcb = rcu_tasks_need_gpcb(rtp)),
+ TASK_IDLE);
+
+ if (needgpcb & 0x2) {
+ // Wait for one grace period.
+ set_tasks_gp_state(rtp, RTGS_WAIT_GP);
+ rtp->gp_start = jiffies;
+ rcu_seq_start(&rtp->tasks_gp_seq);
+ rtp->gp_func(rtp);
+ rcu_seq_end(&rtp->tasks_gp_seq);
+ }
+
+ // Invoke callbacks.
+ set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
+ rcu_tasks_invoke_cbs(rtp, per_cpu_ptr(rtp->rtpcpu, 0));
+ mutex_unlock(&rtp->tasks_gp_mutex);
+}
+
+// RCU-tasks kthread that detects grace periods and invokes callbacks.
+static int __noreturn rcu_tasks_kthread(void *arg)
+{
struct rcu_tasks *rtp = arg;

/* Run on housekeeping CPUs by default. Sysadm can move if desired. */
@@ -519,27 +549,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
* This loop is terminated by the system going down. ;-)
*/
for (;;) {
- set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
-
- /* If there were none, wait a bit and start over. */
- rcuwait_wait_event(&rtp->cbs_wait,
- (needgpcb = rcu_tasks_need_gpcb(rtp)),
- TASK_IDLE);
-
- if (needgpcb & 0x2) {
- // Wait for one grace period.
- set_tasks_gp_state(rtp, RTGS_WAIT_GP);
- rtp->gp_start = jiffies;
- rcu_seq_start(&rtp->tasks_gp_seq);
- rtp->gp_func(rtp);
- rcu_seq_end(&rtp->tasks_gp_seq);
- }
-
- /* Invoke callbacks. */
- set_tasks_gp_state(rtp, RTGS_INVOKE_CBS);
- rcu_tasks_invoke_cbs(rtp, per_cpu_ptr(rtp->rtpcpu, 0));
+ // Wait for one grace period and invoke any callbacks
+ // that are ready.
+ rcu_tasks_one_gp(rtp);

- /* Paranoid sleep to keep this from entering a tight loop */
+ // Paranoid sleep to keep this from entering a tight loop.
schedule_timeout_idle(rtp->gp_sleep);
}
}
--
2.31.1.189.g2e36527f23

2022-06-21 00:15:13

by Paul E. McKenney

Subject: [PATCH rcu 27/32] rcu-tasks: Maintain a count of tasks blocking RCU Tasks Trace grace period

This commit maintains a new n_trc_holdouts counter that tracks the number
of tasks blocking the RCU Tasks Trace grace period. This counter is useful
for debugging, and its value has been added to a diagnostic message.
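A minimal sketch of keeping such a counter in step with list membership,
assuming a hypothetical user-space list that mimics the kernel's
list_head (an unlinked node points to itself, as after INIT_LIST_HEAD()
or list_del_init()). The counter changes only when a node is actually
added or removed, as in trc_add_holdout() and trc_del_holdout();
reference counting via get_task_struct()/put_task_struct() is omitted:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void node_init(struct list_node *n) { n->next = n->prev = n; }
static bool node_unlinked(const struct list_node *n) { return n->next == n; }

/* Insert n right after head, like list_add(). */
static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and make it self-pointing again, like list_del_init(). */
static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical counter, analogous to n_trc_holdouts. */
static unsigned long n_holdouts;

static void holdout_add(struct list_node *n, struct list_node *head)
{
	if (node_unlinked(n)) {  /* count only real additions */
		node_add(n, head);
		n_holdouts++;
	}
}

static void holdout_del(struct list_node *n)
{
	if (!node_unlinked(n)) {  /* count only real removals */
		node_del_init(n);
		n_holdouts--;
	}
}
```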

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index fe0552086ccfc..9d7d6fd4b8a79 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1206,6 +1206,7 @@ static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
static unsigned long n_heavy_reader_attempts;
static unsigned long n_heavy_reader_updates;
static unsigned long n_heavy_reader_ofl_updates;
+static unsigned long n_trc_holdouts;

void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
DEFINE_RCU_TASKS(rcu_tasks_trace, rcu_tasks_wait_gp, call_rcu_tasks_trace,
@@ -1299,6 +1300,7 @@ static void trc_add_holdout(struct task_struct *t, struct list_head *bhp)
if (list_empty(&t->trc_holdout_list)) {
get_task_struct(t);
list_add(&t->trc_holdout_list, bhp);
+ n_trc_holdouts++;
}
}

@@ -1308,6 +1310,7 @@ static void trc_del_holdout(struct task_struct *t)
if (!list_empty(&t->trc_holdout_list)) {
list_del_init(&t->trc_holdout_list);
put_task_struct(t);
+ n_trc_holdouts--;
}
}

@@ -1760,7 +1763,8 @@ void show_rcu_tasks_trace_gp_kthread(void)
{
char buf[64];

- sprintf(buf, "h:%lu/%lu/%lu",
+ sprintf(buf, "N%lu h:%lu/%lu/%lu",
+ data_race(n_trc_holdouts),
data_race(n_heavy_reader_ofl_updates),
data_race(n_heavy_reader_updates),
data_race(n_heavy_reader_attempts));
--
2.31.1.189.g2e36527f23

2022-06-21 00:15:38

by Paul E. McKenney

Subject: [PATCH rcu 22/32] rcu-tasks: Avoid rcu_tasks_trace_pertask() duplicate list additions

This commit adds checks within rcu_tasks_trace_pertask() to avoid
duplicate (and destructive) additions to the holdouts list. These checks
will be required later due to the possibility of a given task having
blocked while in an RCU Tasks Trace read-side critical section, but now
running on a CPU.
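The guard can be sketched as follows, with a hypothetical sim_task whose
on_holdouts flag stands in for !list_empty(&t->trc_holdout_list) and a
hypothetical self pointer standing in for current. Without the guard, a
second call would reset the state of a task that is already being
tracked, and in the kernel would list_add() a node that is already
linked, corrupting the list:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task state for this sketch. */
struct sim_task {
	bool on_holdouts;       /* stand-in for !list_empty(&t->trc_holdout_list) */
	unsigned char need_qs;  /* stand-in for ->trc_reader_special.b.need_qs */
	int ipi_to_cpu;         /* stand-in for ->trc_ipi_to_cpu */
};

/* Mirrors the guard added by this patch: skip NULL tasks, the caller
 * itself (the grace-period kthread), and tasks already on the list. */
static bool sim_pertask(struct sim_task *t, const struct sim_task *self)
{
	if (t == NULL || t == self || t->on_holdouts)
		return false;
	t->need_qs = 0;
	t->ipi_to_cpu = -1;
	t->on_holdouts = true;  /* stand-in for adding t to the holdout list */
	return true;
}
```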

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 66d8473f1bda1..1aa6a24a9bc2b 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1454,9 +1454,10 @@ static void rcu_tasks_trace_pertask(struct task_struct *t,
struct list_head *hop)
{
// During early boot when there is only the one boot CPU, there
- // is no idle task for the other CPUs. Also, the grace-period
- // kthread is always in a quiescent state. Either way, just return.
- if (unlikely(t == NULL) || t == current)
+ // is no idle task for the other CPUs. Also, the grace-period
+ // kthread is always in a quiescent state. In addition, just return
+ // if this task is already on the list.
+ if (unlikely(t == NULL) || t == current || !list_empty(&t->trc_holdout_list))
return;

rcu_st_need_qs(t, 0);
--
2.31.1.189.g2e36527f23

2022-06-21 00:16:15

by Paul E. McKenney

Subject: [PATCH rcu 01/32] rcu-tasks: Check for abandoned callbacks

This commit adds a debugging scan for callbacks that got lost during a
callback-queueing transition.
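The scan can be modeled as follows, assuming a hypothetical array of
per-CPU callback counts in place of rcu_segcblist_n_cbs(). Once dequeuing
is limited to CPUs below percpu_dequeue_lim, any callbacks still queued
at or above that limit would presumably never be invoked, which is what
the added WARN_ON_ONCE() is meant to catch:

```c
#include <assert.h>

/* Return how many CPUs at or beyond dequeue_lim still hold callbacks
 * (the kernel patch warns once instead of counting). */
static int count_abandoned(const long n_cbs[], int nr_cpu_ids, int dequeue_lim)
{
	int bad = 0;

	for (int cpu = dequeue_lim; cpu < nr_cpu_ids; cpu++)
		if (n_cbs[cpu])
			bad++;
	return bad;
}
```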

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tasks.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 3925e32159b5a..b8690a412c5bf 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -439,6 +439,11 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
WRITE_ONCE(rtp->percpu_dequeue_lim, 1);
pr_info("Completing switch %s to CPU-0 callback queuing.\n", rtp->name);
}
+ for (cpu = rtp->percpu_dequeue_lim; cpu < nr_cpu_ids; cpu++) {
+ struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+
+ WARN_ON_ONCE(rcu_segcblist_n_cbs(&rtpcp->cblist));
+ }
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
}

--
2.31.1.189.g2e36527f23

2022-06-21 00:16:49

by Paul E. McKenney

Subject: [PATCH rcu 03/32] rcu-tasks: Move synchronize_rcu_tasks_generic() down

This is strictly a code-motion commit that moves the
synchronize_rcu_tasks_generic() function down to where it can invoke
rcu_tasks_one_gp() without the need for a forward declaration.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tasks.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index d7b12f524e81c..ad993c4ed924f 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -326,17 +326,6 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
irq_work_queue(&rtpcp->rtp_irq_work);
}

-// Wait for a grace period for the specified flavor of Tasks RCU.
-static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
-{
- /* Complain if the scheduler has not started. */
- RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
- "synchronize_rcu_tasks called too soon");
-
- /* Wait for the grace period. */
- wait_rcu_gp(rtp->call_func);
-}
-
// RCU callback function for rcu_barrier_tasks_generic().
static void rcu_barrier_tasks_generic_cb(struct rcu_head *rhp)
{
@@ -558,6 +547,17 @@ static int __noreturn rcu_tasks_kthread(void *arg)
}
}

+// Wait for a grace period for the specified flavor of Tasks RCU.
+static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
+{
+ /* Complain if the scheduler has not started. */
+ RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
+ "synchronize_rcu_tasks called too soon");
+
+ /* Wait for the grace period. */
+ wait_rcu_gp(rtp->call_func);
+}
+
/* Spawn RCU-tasks grace-period kthread. */
static void __init rcu_spawn_tasks_kthread_generic(struct rcu_tasks *rtp)
{
--
2.31.1.189.g2e36527f23

2022-06-21 00:17:08

by Paul E. McKenney

Subject: [PATCH rcu 05/32] rcu-tasks: Merge state into .b.need_qs and atomically update

This commit gets rid of the task_struct structure's ->trc_reader_checked
field, making it instead be a bit within the task_struct structure's
existing ->trc_reader_special.b.need_qs field. This commit also
atomically loads, stores, and checks the resulting combination of the
reader-checked and need-quiescent-state flags. This will in turn allow
significant simplification of the rcu_tasks_trace_postgp() function
as well as elimination of the trc_n_readers_need_end counter in later
commits. These changes will in turn simplify later elimination of the
RCU Tasks Trace scan of the task list, which will make RCU Tasks Trace
grace periods less CPU-intensive.
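The byte-within-a-word compare-and-exchange can be sketched with C11
atomics. This is a user-space approximation: union sim_special is modeled
on union rcu_special, and atomic_compare_exchange_strong() stands in for
the kernel's cmpxchg() on the 32-bit ->trc_reader_special.s word. In this
sketch, a concurrent change to a different byte also makes the exchange
fail, even if need_qs itself is unchanged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TRC_NEED_QS         0x1  /* Task needs a quiescent state. */
#define TRC_NEED_QS_CHECKED 0x2  /* Task has been checked for needing a QS. */

/* Hypothetical stand-in for union rcu_special: four one-byte fields
 * overlaid on a single 32-bit word. */
union sim_special {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
		uint8_t exp_hint;
		uint8_t need_mb;
	} b;
	uint32_t s;
};

/* Hypothetical task: the whole word is accessed atomically. */
struct sim_task {
	_Atomic uint32_t special;
};

static uint8_t sim_need_qs(struct sim_task *t)
{
	union sim_special trs = { .s = atomic_load(&t->special) };

	return trs.b.need_qs;
}

/*
 * Compare-and-exchange just the need_qs byte by operating on the full
 * 32-bit word, in the spirit of rcu_trc_cmpxchg_need_qs(). Returns the
 * need_qs value that was observed, which equals old on success.
 */
static uint8_t sim_cmpxchg_need_qs(struct sim_task *t, uint8_t old, uint8_t nv)
{
	union sim_special trs_old = { .s = atomic_load(&t->special) };
	union sim_special trs_new = trs_old;

	if (trs_old.b.need_qs != old)
		return trs_old.b.need_qs;
	trs_new.b.need_qs = nv;
	/* On failure, C11 writes the current word back into trs_old.s. */
	atomic_compare_exchange_strong(&t->special, &trs_old.s, trs_new.s);
	return trs_old.b.need_qs;
}
```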

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
include/linux/rcupdate.h | 18 ++++---
include/linux/sched.h | 1 -
kernel/rcu/tasks.h | 103 +++++++++++++++++++++++++++------------
3 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1a32036c918cd..1e728d544fc1e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -169,13 +169,17 @@ void synchronize_rcu_tasks(void);
# endif

# ifdef CONFIG_TASKS_TRACE_RCU
-# define rcu_tasks_trace_qs(t) \
- do { \
- if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
- !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
- smp_store_release(&(t)->trc_reader_checked, true); \
- smp_mb(); /* Readers partitioned by store. */ \
- } \
+// Bits for ->trc_reader_special.b.need_qs field.
+#define TRC_NEED_QS 0x1 // Task needs a quiescent state.
+#define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state.
+
+u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new);
+
+# define rcu_tasks_trace_qs(t) \
+ do { \
+ if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
+ likely(!READ_ONCE((t)->trc_reader_nesting))) \
+ rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
} while (0)
# else
# define rcu_tasks_trace_qs(t) do { } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758f..e6eb5871593e9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -843,7 +843,6 @@ struct task_struct {
int trc_reader_nesting;
int trc_ipi_to_cpu;
union rcu_special trc_reader_special;
- bool trc_reader_checked;
struct list_head trc_holdout_list;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index bd9f2e24f5c73..7bdc62606816b 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1208,6 +1208,39 @@ void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
DEFINE_RCU_TASKS(rcu_tasks_trace, rcu_tasks_wait_gp, call_rcu_tasks_trace,
"RCU Tasks Trace");

+/* Load from ->trc_reader_special.b.need_qs with proper ordering. */
+static u8 rcu_ld_need_qs(struct task_struct *t)
+{
+ smp_mb(); // Enforce full grace-period ordering.
+ return smp_load_acquire(&t->trc_reader_special.b.need_qs);
+}
+
+/* Store to ->trc_reader_special.b.need_qs with proper ordering. */
+static void rcu_st_need_qs(struct task_struct *t, u8 v)
+{
+ smp_store_release(&t->trc_reader_special.b.need_qs, v);
+ smp_mb(); // Enforce full grace-period ordering.
+}
+
+/*
+ * Do a cmpxchg() on ->trc_reader_special.b.need_qs, allowing for
+ * the four-byte operand-size restriction of some platforms.
+ * Returns the old value, which is often ignored.
+ */
+u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
+{
+ union rcu_special ret;
+ union rcu_special trs_old = READ_ONCE(t->trc_reader_special);
+ union rcu_special trs_new = trs_old;
+
+ if (trs_old.b.need_qs != old)
+ return trs_old.b.need_qs;
+ trs_new.b.need_qs = new;
+ ret.s = cmpxchg(&t->trc_reader_special.s, trs_old.s, trs_new.s);
+ return ret.b.need_qs;
+}
+EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs);
+
/*
* This irq_work handler allows rcu_read_unlock_trace() to be invoked
* while the scheduler locks are held.
@@ -1221,16 +1254,20 @@ static DEFINE_IRQ_WORK(rcu_tasks_trace_iw, rcu_read_unlock_iw);
/* If we are the last reader, wake up the grace-period kthread. */
void rcu_read_unlock_trace_special(struct task_struct *t)
{
- int nq = READ_ONCE(t->trc_reader_special.b.need_qs);
+ int nqs = (rcu_ld_need_qs(t) == (TRC_NEED_QS_CHECKED | TRC_NEED_QS));

- if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
- t->trc_reader_special.b.need_mb)
+ if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && t->trc_reader_special.b.need_mb)
smp_mb(); // Pairs with update-side barriers.
// Update .need_qs before ->trc_reader_nesting for irq/NMI handlers.
- if (nq)
- WRITE_ONCE(t->trc_reader_special.b.need_qs, false);
+ if (nqs) {
+ u8 result = rcu_trc_cmpxchg_need_qs(t, TRC_NEED_QS_CHECKED | TRC_NEED_QS,
+ TRC_NEED_QS_CHECKED);
+
+ WARN_ONCE(result != (TRC_NEED_QS_CHECKED | TRC_NEED_QS),
+ "%s: result = %d", __func__, result);
+ }
WRITE_ONCE(t->trc_reader_nesting, 0);
- if (nq && atomic_dec_and_test(&trc_n_readers_need_end))
+ if (nqs && atomic_dec_and_test(&trc_n_readers_need_end))
irq_work_queue(&rcu_tasks_trace_iw);
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);
@@ -1260,27 +1297,24 @@ static void trc_read_check_handler(void *t_in)
struct task_struct *texp = t_in;

// If the task is no longer running on this CPU, leave.
- if (unlikely(texp != t)) {
+ if (unlikely(texp != t))
goto reset_ipi; // Already on holdout list, so will check later.
- }

// If the task is not in a read-side critical section, and
// if this is the last reader, awaken the grace-period kthread.
if (likely(!READ_ONCE(t->trc_reader_nesting))) {
- WRITE_ONCE(t->trc_reader_checked, true);
+ rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
goto reset_ipi;
}
// If we are racing with an rcu_read_unlock_trace(), try again later.
if (unlikely(READ_ONCE(t->trc_reader_nesting) < 0))
goto reset_ipi;
- WRITE_ONCE(t->trc_reader_checked, true);

// Get here if the task is in a read-side critical section. Set
// its state so that it will awaken the grace-period kthread upon
// exit from that critical section.
- atomic_inc(&trc_n_readers_need_end); // One more to wait on.
- WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs));
- WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
+ if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED))
+ atomic_inc(&trc_n_readers_need_end); // One more to wait on.

reset_ipi:
// Allow future IPIs to be sent on CPU and for task.
@@ -1291,8 +1325,9 @@ static void trc_read_check_handler(void *t_in)
}

/* Callback function for scheduler to check locked-down task. */
-static int trc_inspect_reader(struct task_struct *t, void *arg)
+static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
{
+ struct list_head *bhp = bhp_in;
int cpu = task_cpu(t);
int nesting;
bool ofl = cpu_is_offline(cpu);
@@ -1323,16 +1358,19 @@ static int trc_inspect_reader(struct task_struct *t, void *arg)
// If not exiting a read-side critical section, mark as checked
// so that the grace-period kthread will remove it from the
// holdout list.
- t->trc_reader_checked = nesting >= 0;
- if (nesting <= 0)
+ if (nesting <= 0) {
+ if (!nesting)
+ rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
return nesting ? -EINVAL : 0; // If in QS, done, otherwise try again later.
+ }

// The task is in a read-side critical section, so set up its
// state so that it will awaken the grace-period kthread upon exit
// from that critical section.
- atomic_inc(&trc_n_readers_need_end); // One more to wait on.
- WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs));
- WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
+ if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED)) {
+ atomic_inc(&trc_n_readers_need_end); // One more to wait on.
+ trc_add_holdout(t, bhp);
+ }
return 0;
}

@@ -1348,14 +1386,14 @@ static void trc_wait_for_one_reader(struct task_struct *t,

// The current task had better be in a quiescent state.
if (t == current) {
- t->trc_reader_checked = true;
+ rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
return;
}

// Attempt to nail down the task for inspection.
get_task_struct(t);
- if (!task_call_func(t, trc_inspect_reader, NULL)) {
+ if (!task_call_func(t, trc_inspect_reader, bhp)) {
put_task_struct(t);
return;
}
@@ -1419,8 +1457,7 @@ static void rcu_tasks_trace_pertask(struct task_struct *t,
if (unlikely(t == NULL))
return;

- WRITE_ONCE(t->trc_reader_special.b.need_qs, false);
- WRITE_ONCE(t->trc_reader_checked, false);
+ rcu_st_need_qs(t, 0);
t->trc_ipi_to_cpu = -1;
trc_wait_for_one_reader(t, hop);
}
@@ -1442,7 +1479,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
// Wait for late-stage exiting tasks to finish exiting.
// These might have passed the call to exit_tasks_rcu_finish().
synchronize_rcu();
- // Any tasks that exit after this point will set ->trc_reader_checked.
+ // Any tasks that exit after this point will set
+ // TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
}

/* Communicate task state back to the RCU tasks trace stall warning request. */
@@ -1460,7 +1498,7 @@ static int trc_check_slow_task(struct task_struct *t, void *arg)
return false; // It is running, so decline to inspect it.
trc_rdrp->nesting = READ_ONCE(t->trc_reader_nesting);
trc_rdrp->ipi_to_cpu = READ_ONCE(t->trc_ipi_to_cpu);
- trc_rdrp->needqs = READ_ONCE(t->trc_reader_special.b.need_qs);
+ trc_rdrp->needqs = rcu_ld_need_qs(t);
return true;
}

@@ -1514,12 +1552,12 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
list_for_each_entry_safe(t, g, hop, trc_holdout_list) {
// If safe and needed, try to check the current task.
if (READ_ONCE(t->trc_ipi_to_cpu) == -1 &&
- !READ_ONCE(t->trc_reader_checked))
+ !(rcu_ld_need_qs(t) & TRC_NEED_QS_CHECKED))
trc_wait_for_one_reader(t, hop);

// If check succeeded, remove this task from the list.
if (smp_load_acquire(&t->trc_ipi_to_cpu) == -1 &&
- READ_ONCE(t->trc_reader_checked))
+ rcu_ld_need_qs(t) == TRC_NEED_QS_CHECKED)
trc_del_holdout(t);
else if (needreport)
show_stalled_task_trace(t, firstreport);
@@ -1574,12 +1612,12 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
// Stall warning time, so make a list of the offenders.
rcu_read_lock();
for_each_process_thread(g, t)
- if (READ_ONCE(t->trc_reader_special.b.need_qs))
+ if (rcu_ld_need_qs(t) & TRC_NEED_QS)
trc_add_holdout(t, &holdouts);
rcu_read_unlock();
firstreport = true;
list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list) {
- if (READ_ONCE(t->trc_reader_special.b.need_qs))
+ if (rcu_ld_need_qs(t) & TRC_NEED_QS)
show_stalled_task_trace(t, &firstreport);
trc_del_holdout(t); // Release task_struct reference.
}
@@ -1595,11 +1633,12 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
/* Report any needed quiescent state for this exiting task. */
static void exit_tasks_rcu_finish_trace(struct task_struct *t)
{
- WRITE_ONCE(t->trc_reader_checked, true);
+ rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS_CHECKED);
WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
- WRITE_ONCE(t->trc_reader_nesting, 0);
- if (WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs)))
+ if (WARN_ON_ONCE(rcu_ld_need_qs(t) & TRC_NEED_QS))
rcu_read_unlock_trace_special(t);
+ else
+ WRITE_ONCE(t->trc_reader_nesting, 0);
}

/**
--
2.31.1.189.g2e36527f23
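The rcu_trc_cmpxchg_need_qs() helper in the patch above updates a single byte, ->trc_reader_special.b.need_qs, by running cmpxchg() on the enclosing four-byte word, because some platforms only support cmpxchg() on four-byte operands. The following user-space C11 sketch illustrates that technique under stated assumptions. union special_sketch and cmpxchg_need_qs_sketch() are hypothetical names. The byte layout (blocked, need_qs, exp_hint, need_mb) is assumed to mirror union rcu_special. <stdatomic.h> stands in for the kernel's cmpxchg(). Unlike the patch, which makes a single cmpxchg() attempt, this sketch retries when a neighboring byte changed concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Bit values taken from the patch's include/linux/rcupdate.h hunk. */
#define TRC_NEED_QS         0x1 /* Task needs a quiescent state. */
#define TRC_NEED_QS_CHECKED 0x2 /* Task has been checked. */

/* Hypothetical stand-in, assumed to mirror union rcu_special's layout. */
union special_sketch {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
		uint8_t exp_hint;
		uint8_t need_mb;
	} b;        /* Individual one-byte flags. */
	uint32_t s; /* All four flags viewed as one word. */
};

static_assert(sizeof(union special_sketch) == sizeof(uint32_t),
	      "the four flag bytes must fit in one 32-bit word");

/*
 * Compare-and-swap ->b.need_qs by way of a 32-bit compare-and-swap on
 * the whole word, preserving the other three bytes.  Returns the
 * need_qs value seen before the operation, so the swap succeeded
 * if and only if the return value equals @old.
 */
static uint8_t cmpxchg_need_qs_sketch(_Atomic uint32_t *word,
				      uint8_t old, uint8_t new)
{
	union special_sketch cur, next;

	cur.s = atomic_load(word);
	for (;;) {
		if (cur.b.need_qs != old)
			return cur.b.need_qs; /* Mismatch: nothing stored. */
		next = cur;
		next.b.need_qs = new;
		/* On failure, cur.s is reloaded with the word's current value. */
		if (atomic_compare_exchange_strong(word, &cur.s, next.s))
			return old;
		/* Some byte changed underneath us: re-check need_qs and retry. */
	}
}
```

For example, starting from need_qs == 0, a call with old 0 and new TRC_NEED_QS_CHECKED returns 0 and sets the byte, while a later call that again expects 0 returns TRC_NEED_QS_CHECKED and changes nothing. The other three bytes are preserved in both cases.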

2022-06-21 00:17:18

by Paul E. McKenney

Subject: [PATCH rcu 06/32] rcu-tasks: Remove rcu_tasks_trace_postgp() wait for counter

Now that tasks are not removed from the list until they have responded to
any needed request for a quiescent state, it is no longer necessary to
wait for the trc_n_readers_need_end counter to go to zero. This commit
therefore removes that waiting code.

It is therefore also no longer necessary for rcu_tasks_trace_postgp() to
do the final decrement of this counter, so that code is also removed.
This in turn means that the trc_n_readers_need_end counter itself can
be removed, as can the rcu_tasks_trace_iw irq_work structure and the
rcu_read_unlock_iw() function.

[ paulmck: Apply feedback from Zqiang. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: KP Singh <[email protected]>
---
kernel/rcu/tasks.h | 62 +++-------------------------------------------
1 file changed, 3 insertions(+), 59 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7bdc62606816b..561d24f7f73cc 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1192,9 +1192,6 @@ EXPORT_SYMBOL_GPL(rcu_trace_lock_map);

#ifdef CONFIG_TASKS_TRACE_RCU

-static atomic_t trc_n_readers_need_end; // Number of waited-for readers.
-static DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks.
-
// Record outstanding IPIs to each CPU. No point in sending two...
static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);

@@ -1241,16 +1238,6 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new)
}
EXPORT_SYMBOL_GPL(rcu_trc_cmpxchg_need_qs);

-/*
- * This irq_work handler allows rcu_read_unlock_trace() to be invoked
- * while the scheduler locks are held.
- */
-static void rcu_read_unlock_iw(struct irq_work *iwp)
-{
- wake_up(&trc_wait);
-}
-static DEFINE_IRQ_WORK(rcu_tasks_trace_iw, rcu_read_unlock_iw);
-
/* If we are the last reader, wake up the grace-period kthread. */
void rcu_read_unlock_trace_special(struct task_struct *t)
{
@@ -1267,8 +1254,6 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
"%s: result = %d", __func__, result);
}
WRITE_ONCE(t->trc_reader_nesting, 0);
- if (nqs && atomic_dec_and_test(&trc_n_readers_need_end))
- irq_work_queue(&rcu_tasks_trace_iw);
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);

@@ -1313,8 +1298,7 @@ static void trc_read_check_handler(void *t_in)
// Get here if the task is in a read-side critical section. Set
// its state so that it will awaken the grace-period kthread upon
// exit from that critical section.
- if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED))
- atomic_inc(&trc_n_readers_need_end); // One more to wait on.
+ rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED);

reset_ipi:
// Allow future IPIs to be sent on CPU and for task.
@@ -1367,10 +1351,8 @@ static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
// The task is in a read-side critical section, so set up its
// state so that it will awaken the grace-period kthread upon exit
// from that critical section.
- if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED)) {
- atomic_inc(&trc_n_readers_need_end); // One more to wait on.
+ if (!rcu_trc_cmpxchg_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED))
trc_add_holdout(t, bhp);
- }
return 0;
}

@@ -1436,9 +1418,6 @@ static void rcu_tasks_trace_pregp_step(void)
{
int cpu;

- // Allow for fast-acting IPIs.
- atomic_set(&trc_n_readers_need_end, 1);
-
// There shouldn't be any old IPIs, but...
for_each_possible_cpu(cpu)
WARN_ON_ONCE(per_cpu(trc_ipi_to_cpu, cpu));
@@ -1581,10 +1560,6 @@ static void rcu_tasks_trace_empty_fn(void *unused)
static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
{
int cpu;
- bool firstreport;
- struct task_struct *g, *t;
- LIST_HEAD(holdouts);
- long ret;

// Wait for any lingering IPI handlers to complete. Note that
// if a CPU has gone offline or transitioned to userspace in the
@@ -1595,37 +1570,6 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
if (WARN_ON_ONCE(smp_load_acquire(per_cpu_ptr(&trc_ipi_to_cpu, cpu))))
smp_call_function_single(cpu, rcu_tasks_trace_empty_fn, NULL, 1);

- // Remove the safety count.
- smp_mb__before_atomic(); // Order vs. earlier atomics
- atomic_dec(&trc_n_readers_need_end);
- smp_mb__after_atomic(); // Order vs. later atomics
-
- // Wait for readers.
- set_tasks_gp_state(rtp, RTGS_WAIT_READERS);
- for (;;) {
- ret = wait_event_idle_exclusive_timeout(
- trc_wait,
- atomic_read(&trc_n_readers_need_end) == 0,
- READ_ONCE(rcu_task_stall_timeout));
- if (ret)
- break; // Count reached zero.
- // Stall warning time, so make a list of the offenders.
- rcu_read_lock();
- for_each_process_thread(g, t)
- if (rcu_ld_need_qs(t) & TRC_NEED_QS)
- trc_add_holdout(t, &holdouts);
- rcu_read_unlock();
- firstreport = true;
- list_for_each_entry_safe(t, g, &holdouts, trc_holdout_list) {
- if (rcu_ld_need_qs(t) & TRC_NEED_QS)
- show_stalled_task_trace(t, &firstreport);
- trc_del_holdout(t); // Release task_struct reference.
- }
- if (firstreport)
- pr_err("INFO: rcu_tasks_trace detected stalls? (Counter/taskslist mismatch?)\n");
- show_stalled_ipi_trace();
- pr_err("\t%d holdouts\n", atomic_read(&trc_n_readers_need_end));
- }
smp_mb(); // Caller's code must be ordered after wakeup.
// Pairs with pretty much every ordering primitive.
}
@@ -1725,7 +1669,7 @@ void show_rcu_tasks_trace_gp_kthread(void)
{
char buf[64];

- sprintf(buf, "N%d h:%lu/%lu/%lu", atomic_read(&trc_n_readers_need_end),
+ sprintf(buf, "h:%lu/%lu/%lu",
data_race(n_heavy_reader_ofl_updates),
data_race(n_heavy_reader_updates),
data_race(n_heavy_reader_attempts));
--
2.31.1.189.g2e36527f23
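With trc_n_readers_need_end gone, the holdout-list scan itself decides when a grace period may end: a task leaves the list only once no IPI is outstanding for it and its need_qs byte equals TRC_NEED_QS_CHECKED. The single-threaded C sketch below models just those need_qs transitions from the two patches above. struct task_sketch and its helper functions are hypothetical. The model is non-atomic, places every task on the holdout list at grace-period start for simplicity, and ignores IPIs, memory ordering, offline CPUs, and exiting tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRC_NEED_QS         0x1 /* Task needs a quiescent state. */
#define TRC_NEED_QS_CHECKED 0x2 /* Task has been checked. */

/* Hypothetical single-threaded model of a task; not the kernel's struct. */
struct task_sketch {
	int nesting;     /* Models ->trc_reader_nesting. */
	uint8_t need_qs; /* Models ->trc_reader_special.b.need_qs. */
	bool holdout;    /* Models membership on the holdout list. */
};

/* Non-atomic stand-in for rcu_trc_cmpxchg_need_qs(); returns old value. */
static uint8_t cas_need_qs(struct task_sketch *t, uint8_t old, uint8_t new)
{
	uint8_t cur = t->need_qs;

	if (cur == old)
		t->need_qs = new;
	return cur;
}

/* Grace-period start (cf. rcu_tasks_trace_pertask()): reset the state. */
static void gp_start(struct task_sketch *t)
{
	t->need_qs = 0;
	t->holdout = true;
}

/* Inspection (cf. trc_inspect_reader()): classify the task. */
static void inspect(struct task_sketch *t)
{
	if (t->nesting == 0)      /* Already in a quiescent state. */
		cas_need_qs(t, 0, TRC_NEED_QS_CHECKED);
	else if (t->nesting > 0)  /* In a reader: must report at unlock. */
		cas_need_qs(t, 0, TRC_NEED_QS | TRC_NEED_QS_CHECKED);
	/* nesting < 0: racing with an unlock, so try again later. */
}

/* Outermost unlock (cf. rcu_read_unlock_trace_special()). */
static void reader_unlock(struct task_sketch *t)
{
	if (t->need_qs == (TRC_NEED_QS | TRC_NEED_QS_CHECKED)) {
		uint8_t result = cas_need_qs(t, TRC_NEED_QS | TRC_NEED_QS_CHECKED,
					     TRC_NEED_QS_CHECKED);

		/* Mirrors the patch's WARN_ONCE() on an unexpected old value. */
		assert(result == (TRC_NEED_QS | TRC_NEED_QS_CHECKED));
	}
	t->nesting = 0;
}

/* Holdout scan (cf. check_all_holdout_tasks_trace()); true once released. */
static bool scan_holdout(struct task_sketch *t)
{
	if (t->holdout && !(t->need_qs & TRC_NEED_QS_CHECKED))
		inspect(t);
	if (t->holdout && t->need_qs == TRC_NEED_QS_CHECKED)
		t->holdout = false;
	return !t->holdout;
}
```

In this model an idle task (nesting 0) is released on the first scan, whereas a task inside a reader is marked TRC_NEED_QS | TRC_NEED_QS_CHECKED and stays on the list until its outermost unlock clears TRC_NEED_QS. No separate counter is needed to detect that transition.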

2022-09-01 11:09:01

by Sascha Hauer

Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

Hi Paul,

On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> This commit causes synchronous grace periods to be driven from the task
> invoking synchronize_rcu_*(), allowing these functions to be invoked from
> the mid-boot dead zone extending from when the scheduler was initialized
> to the point that the various RCU tasks grace-period kthreads are spawned.
> This change will allow the self-tests to run in a consistent manner.
>
> Reported-by: Matthew Wilcox <[email protected]>
> Reported-by: Zhouyi Zhou <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

This commit (appeared in mainline as 4a8cc433b8bf) breaks booting my
ARMv7 based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
this patch on v6.0-rc3 makes my board boot again. See below for a boot
log. The last message is "Running RCU-tasks wait API self tests", after
that the board hangs. Any idea what goes wrong here?

Sascha

----------------------------8<-----------------------------

[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.19.0-rc3-00004-g4a8cc433b8bf (sha@dude02) (arm-v7a-linux-gnueabihf-gcc (OSELAS.Toolchain-2021.07.0 11-20210703) 11.1.1 20210703, GNU ld (GNU Binutils) 2.36.1) #229 SMP Thu Sep 1 12:00:07 CEST 2022
[ 0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: IDS CU33X
[ 0.000000] earlycon: ec_imx6q0 at MMIO 0x02020000 (options '')
[ 0.000000] printk: bootconsole [ec_imx6q0] enabled
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] cma: Reserved 64 MiB at 0x8c000000
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000080000000-0x000000008fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff]
[ 0.000000] percpu: Embedded 17 pages/cpu s38068 r8192 d23372 u69632
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 65024
[ 0.000000] Kernel command line: console=ttymxc0,115200n8 earlycon ip=dhcp root=/dev/nfs nfsroot=192.168.8.12:/home/sha/nfsroot/cu33x,v3,tcp
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 162600K/262144K available (15360K kernel code, 2146K rwdata, 5472K rodata, 1024K init, 6681K bss, 34008K reserved, 65536K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] trace event string verifier disabled
[ 0.000000] Running RCU self tests
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] rcu: RCU lockdep checking is enabled.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000000] Switching to timer-based delay loop, resolution 41ns
[ 0.000003] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
[ 0.007810] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635851949 ns
[ 0.021748] Console: colour dummy device 80x30
[ 0.023488] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.032009] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.035282] ... MAX_LOCK_DEPTH: 48
[ 0.039445] ... MAX_LOCKDEP_KEYS: 8192
[ 0.043886] ... CLASSHASH_SIZE: 4096
[ 0.048127] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.052552] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.057069] ... CHAINHASH_SIZE: 32768
[ 0.061405] memory used by lock dependency info: 4061 kB
[ 0.066788] memory used for stack traces: 2112 kB
[ 0.071645] per task-struct memory footprint: 1536 bytes
[ 0.077138] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000)
[ 0.087384] pid_max: default: 32768 minimum: 301
[ 0.093527] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.099327] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.116798] CPU: Testing write buffer coherency: ok
[ 0.122036] CPU0: update cpu_capacity 1024
[ 0.123381] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.137390] cblist_init_generic: Setting adjustable number of callback queues.
[ 0.142282] cblist_init_generic: Setting shift to 0 and lim to 1.
[ 0.149333] Running RCU-tasks wait API self tests

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-09-01 18:01:39

by Paul E. McKenney

Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

On Thu, Sep 01, 2022 at 10:27:25AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote:
> > Hi Paul,
> >
> > On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> > > This commit causes synchronous grace periods to be driven from the task
> > > invoking synchronize_rcu_*(), allowing these functions to be invoked from
> > > the mid-boot dead zone extending from when the scheduler was initialized
> > > to the point that the various RCU tasks grace-period kthreads are spawned.
> > > This change will allow the self-tests to run in a consistent manner.
> > >
> > > Reported-by: Matthew Wilcox <[email protected]>
> > > Reported-by: Zhouyi Zhou <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > This commit (appeared in mainline as 4a8cc433b8bf) breaks booting my
> > ARMv7 based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
> > this patch on v6.0-rc3 makes my board boot again. See below for a boot
> > log. The last message is "Running RCU-tasks wait API self tests", after
> > that the board hangs. Any idea what goes wrong here?
>
> New one on me!
>
> Is it possible to get a stack trace of the hang, perhaps via
> one form or another of sysrq-T? Such a stack trace would likely
> include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(),
> rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(),
> rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp().

If there is no chance of sysrq-T, kernel debuggers, kernel crash
dumps, or any other source of the stack trace, please decorate the
code path with printk() or similar and let me know where it goes.
Under normal circumstances, this code path is not sensitive to performance
perturbations of the printk() persuasion.

Thanx, Paul

> At this point in the boot sequence, there is only one online CPU,
> correct?
>
> I have seen recent non-boot hangs within synchronize_rcu_tasks()
> due to some other task getting stuck in do_exit() between its calls
> to exit_tasks_rcu_start() and exit_tasks_rcu_finish(). The symptom of
> this is that the aforementioned stack trace includes synchronize_srcu().
> I would not expect much in the way of exiting tasks that early in the
> boot sequence, but who knows?
>
> Thanx, Paul
>
> > Sascha

2022-09-01 18:06:46

by Paul E. McKenney

Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote:
> Hi Paul,
>
> On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> > This commit causes synchronous grace periods to be driven from the task
> > invoking synchronize_rcu_*(), allowing these functions to be invoked from
> > the mid-boot dead zone extending from when the scheduler was initialized
> > to the point that the various RCU tasks grace-period kthreads are spawned.
> > This change will allow the self-tests to run in a consistent manner.
> >
> > Reported-by: Matthew Wilcox <[email protected]>
> > Reported-by: Zhouyi Zhou <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> This commit (appeared in mainline as 4a8cc433b8bf) breaks booting my
> ARMv7 based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
> this patch on v6.0-rc3 makes my board boot again. See below for a boot
> log. The last message is "Running RCU-tasks wait API self tests", after
> that the board hangs. Any idea what goes wrong here?

New one on me!

Is it possible to get a stack trace of the hang, perhaps via
one form or another of sysrq-T? Such a stack trace would likely
include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(),
rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(),
rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp().

At this point in the boot sequence, there is only one online CPU,
correct?

I have seen recent non-boot hangs within synchronize_rcu_tasks()
due to some other task getting stuck in do_exit() between its calls
to exit_tasks_rcu_start() and exit_tasks_rcu_finish(). The symptom of
this is that the aforementioned stack trace includes synchronize_srcu().
I would not expect much in the way of exiting tasks that early in the
boot sequence, but who knows?

Thanx, Paul

> Sascha

2022-09-02 12:11:17

by Paul E. McKenney

Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

On Fri, Sep 02, 2022 at 01:52:28PM +0200, Sascha Hauer wrote:
> On Thu, Sep 01, 2022 at 10:33:04AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 01, 2022 at 10:27:25AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote:
> > > > Hi Paul,
> > > >
> > > > On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> > > > > This commit causes synchronous grace periods to be driven from the task
> > > > > invoking synchronize_rcu_*(), allowing these functions to be invoked from
> > > > > the mid-boot dead zone extending from when the scheduler was initialized
> > > > > to the point that the various RCU tasks grace-period kthreads are spawned.
> > > > > This change will allow the self-tests to run in a consistent manner.
> > > > >
> > > > > Reported-by: Matthew Wilcox <[email protected]>
> > > > > Reported-by: Zhouyi Zhou <[email protected]>
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > This commit (which appeared in mainline as 4a8cc433b8bf) breaks booting my
> > > > ARMv7-based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
> > > > this patch on v6.0-rc3 makes my board boot again. See below for a boot
> > > > log. The last message is "Running RCU-tasks wait API self tests"; after
> > > > that, the board hangs. Any idea what goes wrong here?
> > >
> > > New one on me!
> > >
> > > Is it possible to get a stack trace of the hang, perhaps via
> > > one form or another of sysrq-T? Such a stack trace would likely
> > > include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> > > synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(),
> > > rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(),
> > > rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp().
> >
> > If there is no chance of sysrq-T, kernel debuggers, kernel crash
> > dumps, or any other source of the stack trace, please decorate the
> > code path with printk() or similar and let me know where it goes.
> > Under normal circumstances, this code path is not sensitive to performance
> > perturbations of the printk() persuasion.
>
> While hunting an unrelated bug, I turned on early console
> output with the "earlycon" parameter. It turned out that when I remove
> this parameter, my board boots fine.
>
> I then realized that even with earlycon enabled, my board boots fine
> when I remove the call to
>
> pr_info("Running RCU-tasks wait API self tests\n");

Ah, there are some printk() fixes in the works. Maybe this is an area
that needs them. Or maybe not.

> Given that, I am not sure how useful it is to add more printk()s, but I did
> so anyway, like this:
>
> > static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> > {
> > 	int needgpcb;
> >
> > 	printk("%s: mutex_lock... midboot: %d\n", __func__, midboot);
> > 	mutex_lock(&rtp->tasks_gp_mutex);
> > 	printk("%s: mutex_locked midboot: %d\n", __func__, midboot);
> >
> > 	// If there were none, wait a bit and start over.
> > 	if (unlikely(midboot)) {
> > 		needgpcb = 0x2;
> > 	} else {
> > 		printk("%s: set_tasks_gp_state(RTGS_WAIT_CBS)...\n", __func__);
> > 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> > 		printk("%s: rcuwait_wait_event...\n", __func__);
> > 		rcuwait_wait_event(&rtp->cbs_wait,
> > 				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
> > 				   TASK_IDLE);
> > 		printk("%s: rcuwait_wait_event done\n", __func__);
> > 	}
> >
>
> What I see then is:
>
> [ 0.156362] synchronize_rcu_tasks_generic: rcu_tasks_one_gp....
> [ 0.162087] rcu_tasks_one_gp: mutex_lock... midboot: 1

So one task gets stuck either in mutex_lock() or the printk() above
and some other task below moves ahead? Or might some printk()s have
been lost?
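
Purely to illustrate the first possibility, the midboot caller stuck in mutex_lock() because the grace-period kthread took ->tasks_gp_mutex first and is now sleeping in rcuwait_wait_event() while still holding it, here is a hypothetical user-space pthreads analogue. It is not kernel code and not a diagnosis of the hang, which may equally well lie in printk(). The names gp_mutex, cb_lock, cbs_cond, cbs_pending, gp_kthread(), and run_demo() are made up for this sketch. pthread_mutex_trylock() stands in for the blocking mutex_lock() so that the sketch terminates.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins, not kernel APIs: gp_mutex plays the role of
 * rtp->tasks_gp_mutex, and cbs_cond/cbs_pending play the role of
 * rtp->cbs_wait and rcu_tasks_need_gpcb().
 */
static pthread_mutex_t gp_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cbs_cond = PTHREAD_COND_INITIALIZER;
static bool cbs_pending;	/* a "callback" has been queued */
static bool kthread_asleep;	/* the "kthread" is about to wait for callbacks */

struct demo_result {
	int try_while_waiting;	/* trylock result while the kthread sleeps holding gp_mutex */
	int try_after_callbacks;	/* trylock result after a callback was queued */
};

/* Models the midboot == 0 path: take gp_mutex, then sleep until a
 * callback shows up, without dropping gp_mutex. */
static void *gp_kthread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_mutex);
	pthread_mutex_lock(&cb_lock);
	kthread_asleep = true;
	pthread_cond_broadcast(&cbs_cond);	/* let run_demo() know we are waiting */
	while (!cbs_pending)
		pthread_cond_wait(&cbs_cond, &cb_lock);
	pthread_mutex_unlock(&cb_lock);
	pthread_mutex_unlock(&gp_mutex);
	return NULL;
}

struct demo_result run_demo(void)
{
	struct demo_result r;
	pthread_t kt;
	int rc = pthread_create(&kt, NULL, gp_kthread, NULL);

	assert(rc == 0);
	(void)rc;

	/* Wait until the "kthread" holds gp_mutex and sleeps on cbs_cond. */
	pthread_mutex_lock(&cb_lock);
	while (!kthread_asleep)
		pthread_cond_wait(&cbs_cond, &cb_lock);
	pthread_mutex_unlock(&cb_lock);

	/* Models the midboot == 1 caller: gp_mutex is unavailable. A real
	 * mutex_lock() would block here; trylock reports EBUSY instead. */
	r.try_while_waiting = pthread_mutex_trylock(&gp_mutex);

	/* Only a queued "callback" lets the kthread release gp_mutex. */
	pthread_mutex_lock(&cb_lock);
	cbs_pending = true;
	pthread_cond_broadcast(&cbs_cond);
	pthread_mutex_unlock(&cb_lock);
	pthread_join(kt, NULL);

	r.try_after_callbacks = pthread_mutex_trylock(&gp_mutex);
	if (r.try_after_callbacks == 0)
		pthread_mutex_unlock(&gp_mutex);
	return r;
}
```

Built with `cc -pthread`, a single call to run_demo() should report EBUSY for the first trylock and 0 for the second. In this analogue, nothing but a queued callback gets the mutex released, so a synchronous caller that never queues one would wait forever.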

> [ 0.167386] rcu_tasks_one_gp: mutex_lock... midboot: 0
> [ 0.172489] rcu_tasks_one_gp: mutex_locked midboot: 0
> [ 0.177535] rcu_tasks_one_gp: set_tasks_gp_state(RTGS_WAIT_CBS)...
> [ 0.183525] rcu_tasks_one_gp: rcuwait_wait_event...

Given that everything works with printk()s turned off, my current
suspicion is a printk() issue.

> Here the board hangs. After some time I get:
>
> [ 254.493010] random: crng init done

This looks unrelated.

> But that's it.
>
> >
> > > At this point in the boot sequence, there is only one online CPU,
> > > correct?
>
> Yes, it's a single core system.

OK, then we should be able to rule out SMP issues. ;-)

Thanx, Paul

> Sascha

2022-09-02 12:22:10

by Sascha Hauer

Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods from calling task

On Thu, Sep 01, 2022 at 10:33:04AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 01, 2022 at 10:27:25AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote:
> > > Hi Paul,
> > >
> > > On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> > > > This commit causes synchronous grace periods to be driven from the task
> > > > invoking synchronize_rcu_*(), allowing these functions to be invoked from
> > > > the mid-boot dead zone extending from when the scheduler was initialized
> > > > to the point that the various RCU tasks grace-period kthreads are spawned.
> > > > This change will allow the self-tests to run in a consistent manner.
> > > >
> > > > Reported-by: Matthew Wilcox <[email protected]>
> > > > Reported-by: Zhouyi Zhou <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > This commit (which appeared in mainline as 4a8cc433b8bf) breaks booting my
> > > ARMv7-based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
> > > this patch on v6.0-rc3 makes my board boot again. See below for a boot
> > > log. The last message is "Running RCU-tasks wait API self tests"; after
> > > that, the board hangs. Any idea what goes wrong here?
> >
> > New one on me!
> >
> > Is it possible to get a stack trace of the hang, perhaps via
> > one form or another of sysrq-T? Such a stack trace would likely
> > include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> > synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(),
> > rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(),
> > rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp().
>
> If there is no chance of sysrq-T, kernel debuggers, kernel crash
> dumps, or any other source of the stack trace, please decorate the
> code path with printk() or similar and let me know where it goes.
> Under normal circumstances, this code path is not sensitive to performance
> perturbations of the printk() persuasion.

While hunting an unrelated bug, I turned on early console
output with the "earlycon" parameter. It turned out that when I remove
this parameter, my board boots fine.

I then realized that even with earlycon enabled, my board boots fine
when I remove the call to

pr_info("Running RCU-tasks wait API self tests\n");

Given that, I am not sure how useful it is to add more printk()s, but I did
so anyway, like this:

> static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> {
> 	int needgpcb;
>
> 	printk("%s: mutex_lock... midboot: %d\n", __func__, midboot);
> 	mutex_lock(&rtp->tasks_gp_mutex);
> 	printk("%s: mutex_locked midboot: %d\n", __func__, midboot);
>
> 	// If there were none, wait a bit and start over.
> 	if (unlikely(midboot)) {
> 		needgpcb = 0x2;
> 	} else {
> 		printk("%s: set_tasks_gp_state(RTGS_WAIT_CBS)...\n", __func__);
> 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> 		printk("%s: rcuwait_wait_event...\n", __func__);
> 		rcuwait_wait_event(&rtp->cbs_wait,
> 				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
> 				   TASK_IDLE);
> 		printk("%s: rcuwait_wait_event done\n", __func__);
> 	}
>

What I see then is:

[ 0.156362] synchronize_rcu_tasks_generic: rcu_tasks_one_gp....
[ 0.162087] rcu_tasks_one_gp: mutex_lock... midboot: 1
[ 0.167386] rcu_tasks_one_gp: mutex_lock... midboot: 0
[ 0.172489] rcu_tasks_one_gp: mutex_locked midboot: 0
[ 0.177535] rcu_tasks_one_gp: set_tasks_gp_state(RTGS_WAIT_CBS)...
[ 0.183525] rcu_tasks_one_gp: rcuwait_wait_event...

Here the board hangs. After some time I get:

[ 254.493010] random: crng init done

But that's it.

>
> > At this point in the boot sequence, there is only one online CPU,
> > correct?

Yes, it's a single core system.

Sascha
