Hi,
This series contains the fixes of RCU tasks for v6.9. You can also find
the series at:
git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
Changes since v1:
* Update with Paul's rework on "Eliminate deadlocks involving
do_exit() and RCU task"
The detailed list of changes:
Paul E. McKenney (6):
rcu-tasks: Repair RCU Tasks Trace quiescence check
rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
include/linux/rcupdate.h | 4 +-
include/linux/sched.h | 2 +
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/rcu/tasks.h | 110 ++++++++++++++++++++++++++++++---------
5 files changed, 90 insertions(+), 28 deletions(-)
--
2.43.0
From: "Paul E. McKenney" <[email protected]>
Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock. This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch. However, such deadlocks are becoming
more frequent. In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.
In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
just as well take advantage of that fact.
This commit therefore initializes the data structures that will be needed
to rely on these quiescent states and to eliminate these deadlocks.
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Chen Zhongjin <[email protected]>
Reported-by: Yang Jihong <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Yang Jihong <[email protected]>
Tested-by: Chen Zhongjin <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/rcu/tasks.h | 2 ++
3 files changed, 4 insertions(+)
diff --git a/init/init_task.c b/init/init_task.c
index 7ecb458eb3da..4daee6d761c8 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
.rcu_tasks_holdout = false,
.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
.rcu_tasks_idle_cpu = -1,
+ .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
#endif
#ifdef CONFIG_TASKS_TRACE_RCU
.trc_reader_nesting = 0,
diff --git a/kernel/fork.c b/kernel/fork.c
index 0d944e92a43f..af7203be1d2d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
p->rcu_tasks_holdout = false;
INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
p->rcu_tasks_idle_cpu = -1;
+ INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_TRACE_RCU
p->trc_reader_nesting = 0;
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index b7d5f2757053..4a5d562e3189 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
rtpcp->rtpp = rtp;
if (!rtpcp->rtp_blkd_tasks.next)
INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
+ if (!rtpcp->rtp_exit_list.next)
+ INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
}
pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
--
2.43.0
On Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> Holding a mutex across synchronize_rcu_tasks() and acquiring
> that same mutex in code called from do_exit() after its call to
> exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> results in deadlock. This is by design, because tasks that are far
> enough into do_exit() are no longer present on the tasks list, making
> it a bit difficult for RCU Tasks to find them, let alone wait on them
> to do a voluntary context switch. However, such deadlocks are becoming
> more frequent. In addition, lockdep currently does not detect such
> deadlocks and they can be difficult to reproduce.
>
> In addition, if a task voluntarily context switches during that time
> (for example, if it blocks acquiring a mutex), then this task is in an
> RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
> just as well take advantage of that fact.
>
> This commit therefore initializes the data structures that will be needed
> to rely on these quiescent states and to eliminate these deadlocks.
>
> Link: https://lore.kernel.org/all/[email protected]/
>
> Reported-by: Chen Zhongjin <[email protected]>
> Reported-by: Yang Jihong <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Tested-by: Yang Jihong <[email protected]>
> Tested-by: Chen Zhongjin <[email protected]>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> init/init_task.c | 1 +
> kernel/fork.c | 1 +
> kernel/rcu/tasks.h | 2 ++
> 3 files changed, 4 insertions(+)
>
> diff --git a/init/init_task.c b/init/init_task.c
> index 7ecb458eb3da..4daee6d761c8 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> .rcu_tasks_holdout = false,
> .rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
> .rcu_tasks_idle_cpu = -1,
> + .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
> #endif
> #ifdef CONFIG_TASKS_TRACE_RCU
> .trc_reader_nesting = 0,
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 0d944e92a43f..af7203be1d2d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
> p->rcu_tasks_holdout = false;
> INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> p->rcu_tasks_idle_cpu = -1;
> + INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
> #endif /* #ifdef CONFIG_TASKS_RCU */
> #ifdef CONFIG_TASKS_TRACE_RCU
> p->trc_reader_nesting = 0;
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index b7d5f2757053..4a5d562e3189 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
> rtpcp->rtpp = rtp;
> if (!rtpcp->rtp_blkd_tasks.next)
> INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
> + if (!rtpcp->rtp_exit_list.next)
I assume there can't be an exiting task concurrently at this point on
boot. Because kthreadd just got created and workqueues as well but that's it,
right? Or workqueues can die that early? Probably not.
> + INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
Because if tasks can exit concurrently, then we are in trouble :-)
Thanks.
> }
>
> pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
> --
> 2.43.0
>
On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> Hi,
>
> This series contains the fixes of RCU tasks for v6.9. You can also find
> the series at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
>
> Changes since v1:
>
> * Update with Paul's rework on "Eliminate deadlocks involving
> do_exit() and RCU task"
>
> The detailed list of changes:
>
> Paul E. McKenney (6):
> rcu-tasks: Repair RCU Tasks Trace quiescence check
> rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
Food for later thoughts and further improvements: would it make sense to
call exit_rcu_tasks_start() on fork() instead and rely solely on
each CPUs' rtp_exit_list instead of the tasklist?
Thanks.
> rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
>
> include/linux/rcupdate.h | 4 +-
> include/linux/sched.h | 2 +
> init/init_task.c | 1 +
> kernel/fork.c | 1 +
> kernel/rcu/tasks.h | 110 ++++++++++++++++++++++++++++++---------
> 5 files changed, 90 insertions(+), 28 deletions(-)
>
> --
> 2.43.0
>
>
On Thu, Feb 22, 2024 at 05:21:03PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > Holding a mutex across synchronize_rcu_tasks() and acquiring
> > that same mutex in code called from do_exit() after its call to
> > exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> > results in deadlock. This is by design, because tasks that are far
> > enough into do_exit() are no longer present on the tasks list, making
> > it a bit difficult for RCU Tasks to find them, let alone wait on them
> > to do a voluntary context switch. However, such deadlocks are becoming
> > more frequent. In addition, lockdep currently does not detect such
> > deadlocks and they can be difficult to reproduce.
> >
> > In addition, if a task voluntarily context switches during that time
> > (for example, if it blocks acquiring a mutex), then this task is in an
> > RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
> > just as well take advantage of that fact.
> >
> > This commit therefore initializes the data structures that will be needed
> > to rely on these quiescent states and to eliminate these deadlocks.
> >
> > Link: https://lore.kernel.org/all/[email protected]/
> >
> > Reported-by: Chen Zhongjin <[email protected]>
> > Reported-by: Yang Jihong <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Tested-by: Yang Jihong <[email protected]>
> > Tested-by: Chen Zhongjin <[email protected]>
> > Signed-off-by: Boqun Feng <[email protected]>
> > ---
> > init/init_task.c | 1 +
> > kernel/fork.c | 1 +
> > kernel/rcu/tasks.h | 2 ++
> > 3 files changed, 4 insertions(+)
> >
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 7ecb458eb3da..4daee6d761c8 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> > .rcu_tasks_holdout = false,
> > .rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
> > .rcu_tasks_idle_cpu = -1,
> > + .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
> > #endif
> > #ifdef CONFIG_TASKS_TRACE_RCU
> > .trc_reader_nesting = 0,
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 0d944e92a43f..af7203be1d2d 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
> > p->rcu_tasks_holdout = false;
> > INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> > p->rcu_tasks_idle_cpu = -1;
> > + INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
> > #endif /* #ifdef CONFIG_TASKS_RCU */
> > #ifdef CONFIG_TASKS_TRACE_RCU
> > p->trc_reader_nesting = 0;
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index b7d5f2757053..4a5d562e3189 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
> > rtpcp->rtpp = rtp;
> > if (!rtpcp->rtp_blkd_tasks.next)
> > INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
> > + if (!rtpcp->rtp_exit_list.next)
>
> I assume there can't be an exiting task concurrently at this point on
> boot. Because kthreadd just got created and workqueues as well but that's it,
> right? Or workqueues can die that early? Probably not.
>
> > + INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
>
> Because if tasks can exit concurrently, then we are in trouble :-)
Tasks exiting at that point might be unconventional, but I don't see
anything that prevents them from doing so.
So excellent catch, and thank you very much!!!
My thought is to add the following patch to precede this one, which
initializes those lists at rcu_init() time. Would that work?
Thanx, Paul
------------------------------------------------------------------------
commit 9a876aac8064dfd46c840e4bb6177e65f7964bb4
Author: Paul E. McKenney <[email protected]>
Date: Thu Feb 22 12:29:54 2024 -0800
rcu-tasks: Initialize callback lists at rcu_init() time
In order for RCU Tasks to reliably maintain per-CPU lists of exiting
tasks, those lists must be initialized before it is possible for tasks
to exit, especially given that the boot CPU is not necessarily CPU 0
(an example being powerpc kexec() kernels). And at the time that
rcu_init_tasks_generic() is called, a task could potentially exit,
unconventional though that sort of thing might be.
This commit therefore moves the calls to cblist_init_generic() from
functions called from rcu_init_tasks_generic() to a new function named
tasks_cblist_init_generic() that is invoked from rcu_init().
This constituted a bug in a commit that never went to mainline, so
there is no need for any backporting to -stable.
Reported-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4e65a92e528e5..86fce206560e8 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -528,6 +528,12 @@ struct task_struct *get_rcu_tasks_gp_kthread(void);
struct task_struct *get_rcu_tasks_rude_gp_kthread(void);
#endif // # ifdef CONFIG_TASKS_RUDE_RCU
+#ifdef CONFIG_TASKS_RCU_GENERIC
+void tasks_cblist_init_generic(void);
+#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
+static inline void tasks_cblist_init_generic(void) { }
+#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
+
#define RCU_SCHEDULER_INACTIVE 0
#define RCU_SCHEDULER_INIT 1
#define RCU_SCHEDULER_RUNNING 2
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 866743e0796f4..e06e388e7c7e6 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -240,7 +240,6 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
static void cblist_init_generic(struct rcu_tasks *rtp)
{
int cpu;
- unsigned long flags;
int lim;
int shift;
@@ -266,10 +265,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
WARN_ON_ONCE(!rtpcp);
if (cpu)
raw_spin_lock_init(&ACCESS_PRIVATE(rtpcp, lock));
- local_irq_save(flags); // serialize initialization
if (rcu_segcblist_empty(&rtpcp->cblist))
rcu_segcblist_init(&rtpcp->cblist);
- local_irq_restore(flags);
INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
rtpcp->cpu = cpu;
rtpcp->rtpp = rtp;
@@ -1153,7 +1150,6 @@ module_param(rcu_tasks_lazy_ms, int, 0444);
static int __init rcu_spawn_tasks_kthread(void)
{
- cblist_init_generic(&rcu_tasks);
rcu_tasks.gp_sleep = HZ / 10;
rcu_tasks.init_fract = HZ / 10;
if (rcu_tasks_lazy_ms >= 0)
@@ -1340,7 +1336,6 @@ module_param(rcu_tasks_rude_lazy_ms, int, 0444);
static int __init rcu_spawn_tasks_rude_kthread(void)
{
- cblist_init_generic(&rcu_tasks_rude);
rcu_tasks_rude.gp_sleep = HZ / 10;
if (rcu_tasks_rude_lazy_ms >= 0)
rcu_tasks_rude.lazy_jiffies = msecs_to_jiffies(rcu_tasks_rude_lazy_ms);
@@ -1972,7 +1967,6 @@ module_param(rcu_tasks_trace_lazy_ms, int, 0444);
static int __init rcu_spawn_tasks_trace_kthread(void)
{
- cblist_init_generic(&rcu_tasks_trace);
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
rcu_tasks_trace.gp_sleep = HZ / 10;
rcu_tasks_trace.init_fract = HZ / 10;
@@ -2144,6 +2138,24 @@ late_initcall(rcu_tasks_verify_schedule_work);
static void rcu_tasks_initiate_self_tests(void) { }
#endif /* #else #ifdef CONFIG_PROVE_RCU */
+void __init tasks_cblist_init_generic(void)
+{
+ lockdep_assert_irqs_disabled();
+ WARN_ON(num_online_cpus() > 1);
+
+#ifdef CONFIG_TASKS_RCU
+ cblist_init_generic(&rcu_tasks);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+ cblist_init_generic(&rcu_tasks_rude);
+#endif
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+ cblist_init_generic(&rcu_tasks_trace);
+#endif
+}
+
void __init rcu_init_tasks_generic(void)
{
#ifdef CONFIG_TASKS_RCU
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index fec804b790803..705c0d16850aa 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -261,4 +261,5 @@ void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
rcu_early_boot_tests();
+ tasks_cblist_init_generic();
}
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 31f3a61f9c384..4f4aec64039f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5601,6 +5601,8 @@ void __init rcu_init(void)
(void)start_poll_synchronize_rcu_expedited();
rcu_test_sync_prims();
+
+ tasks_cblist_init_generic();
}
#include "tree_stall.h"
On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> > Hi,
> >
> > This series contains the fixes of RCU tasks for v6.9. You can also find
> > the series at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> >
> > Changes since v1:
> >
> > * Update with Paul's rework on "Eliminate deadlocks involving
> > do_exit() and RCU task"
> >
> > The detailed list of changes:
> >
> > Paul E. McKenney (6):
> > rcu-tasks: Repair RCU Tasks Trace quiescence check
> > rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
>
> Food for later thoughts and further improvements: would it make sense to
> call exit_rcu_tasks_start() on fork() instead and rely solely on
> each CPUs' rtp_exit_list instead of the tasklist?
It might well.
One big advantage of doing that is the ability to incrementally traverse
the tasks. But is there some good way of doing that to the full task
lists? If so, everyone could benefit.
Thanx, Paul
> Thanks.
>
> > rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> >
> > include/linux/rcupdate.h | 4 +-
> > include/linux/sched.h | 2 +
> > init/init_task.c | 1 +
> > kernel/fork.c | 1 +
> > kernel/rcu/tasks.h | 110 ++++++++++++++++++++++++++++++---------
> > 5 files changed, 90 insertions(+), 28 deletions(-)
> >
> > --
> > 2.43.0
> >
> >
>
On Thu, Feb 22, 2024 at 12:41:55PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 05:21:03PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng wrote:
> > > From: "Paul E. McKenney" <[email protected]>
> > >
> > > Holding a mutex across synchronize_rcu_tasks() and acquiring
> > > that same mutex in code called from do_exit() after its call to
> > > exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> > > results in deadlock. This is by design, because tasks that are far
> > > enough into do_exit() are no longer present on the tasks list, making
> > > it a bit difficult for RCU Tasks to find them, let alone wait on them
> > > to do a voluntary context switch. However, such deadlocks are becoming
> > > more frequent. In addition, lockdep currently does not detect such
> > > deadlocks and they can be difficult to reproduce.
> > >
> > > In addition, if a task voluntarily context switches during that time
> > > (for example, if it blocks acquiring a mutex), then this task is in an
> > > RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
> > > just as well take advantage of that fact.
> > >
> > > This commit therefore initializes the data structures that will be needed
> > > to rely on these quiescent states and to eliminate these deadlocks.
> > >
> > > Link: https://lore.kernel.org/all/[email protected]/
> > >
> > > Reported-by: Chen Zhongjin <[email protected]>
> > > Reported-by: Yang Jihong <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > Tested-by: Yang Jihong <[email protected]>
> > > Tested-by: Chen Zhongjin <[email protected]>
> > > Signed-off-by: Boqun Feng <[email protected]>
Looks good, thanks!
Reviewed-by: Frederic Weisbecker <[email protected]>
On Thu, Feb 22, 2024 at 02:09:17PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> > > Hi,
> > >
> > > This series contains the fixes of RCU tasks for v6.9. You can also find
> > > the series at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > >
> > > Changes since v1:
> > >
> > > * Update with Paul's rework on "Eliminate deadlocks involving
> > > do_exit() and RCU task"
> > >
> > > The detailed list of changes:
> > >
> > > Paul E. McKenney (6):
> > > rcu-tasks: Repair RCU Tasks Trace quiescence check
> > > rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > > rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > > rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > > rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> >
> > Food for later thoughts and further improvements: would it make sense to
> > call exit_rcu_tasks_start() on fork() instead and rely solely on
> > each CPUs' rtp_exit_list instead of the tasklist?
>
> It might well.
>
> One big advantage of doing that is the ability to incrementally traverse
> the tasks. But is there some good way of doing that to the full task
> lists? If so, everyone could benefit.
What do you mean by incrementally? You mean being able to cond_resched()
in the middle of the tasks iteration? Yeah not sure that's possible...
Thanks.
>
> Thanx, Paul
>
> > Thanks.
> >
> > > rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > >
> > > include/linux/rcupdate.h | 4 +-
> > > include/linux/sched.h | 2 +
> > > init/init_task.c | 1 +
> > > kernel/fork.c | 1 +
> > > kernel/rcu/tasks.h | 110 ++++++++++++++++++++++++++++++---------
> > > 5 files changed, 90 insertions(+), 28 deletions(-)
> > >
> > > --
> > > 2.43.0
> > >
> > >
> >
On Fri, Feb 23, 2024 at 01:25:06PM +0100, Frederic Weisbecker wrote:
> On Thu, Feb 22, 2024 at 02:09:17PM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> > > On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> > > > Hi,
> > > >
> > > > This series contains the fixes of RCU tasks for v6.9. You can also find
> > > > the series at:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > > >
> > > > Changes since v1:
> > > >
> > > > * Update with Paul's rework on "Eliminate deadlocks involving
> > > > do_exit() and RCU task"
> > > >
> > > > The detailed list of changes:
> > > >
> > > > Paul E. McKenney (6):
> > > > rcu-tasks: Repair RCU Tasks Trace quiescence check
> > > > rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > > > rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > > > rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > > > rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> > >
> > > Food for later thoughts and further improvements: would it make sense to
> > > call exit_rcu_tasks_start() on fork() instead and rely solely on
> > > each CPUs' rtp_exit_list instead of the tasklist?
> >
> > It might well.
> >
> > One big advantage of doing that is the ability to incrementally traverse
> > the tasks. But is there some good way of doing that to the full task
> > lists? If so, everyone could benefit.
>
> What do you mean by incrementally? You mean being able to cond_resched()
> in the middle of the tasks iteration? Yeah not sure that's possible...
I do indeed mean doing cond_resched() mid-stream.
One way to make this happen would be to do something like this:
struct task_struct_anchor {
struct list_head tsa_list;
struct list_head tsa_adjust_list;
atomic_t tsa_ref; // Or use an appropriate API.
bool tsa_is_anchor;
};
Each task structure would contain one of these, though there are a
number of ways to conserve space if needed.
These anchors would be placed perhaps every 1,000 tasks or so. When a
traversal encountered one, it could atomic_inc_not_zero() the reference
count, and if that succeeded, exit the RCU read-side critical section and
do a cond_resched(). It could then enter a new RCU read-side critical
section, drop the reference, and continue.
A traversal might container_of() its way from ->tsa_list to the
task_struct_anchor structure, then if ->tsa_is_anchor is false,
container_of() its way to the enclosing task structure.
How to maintain proper spacing of the anchors?
One way is to make the traversals do the checking. If the space between a
pair of anchors was too large or too small, it could add the first of the
pair to a list to be adjusted. This list could periodically be processed,
perhaps with more urgency if a huge gap had opened up.
Freeing an anchor requires decrementing the reference count, waiting for
it to go to zero, removing the anchor, waiting for a grace period (perhaps
asynchronously), and only then freeing the anchor.
Anchors cannot be moved, only added or removed.
So it is possible. But is it reasonable? ;-)
Thanx, Paul
> > > Thanks.
> > >
> > > > rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > > >
> > > > include/linux/rcupdate.h | 4 +-
> > > > include/linux/sched.h | 2 +
> > > > init/init_task.c | 1 +
> > > > kernel/fork.c | 1 +
> > > > kernel/rcu/tasks.h | 110 ++++++++++++++++++++++++++++++---------
> > > > 5 files changed, 90 insertions(+), 28 deletions(-)
> > > >
> > > > --
> > > > 2.43.0
> > > >
> > > >
> > >
On Fri, Feb 23, 2024 at 04:43:04PM -0800, Paul E. McKenney wrote:
> I do indeed mean doing cond_resched() mid-stream.
>
> One way to make this happen would be to do something like this:
>
> struct task_struct_anchor {
> struct list_head tsa_list;
> struct list_head tsa_adjust_list;
> atomic_t tsa_ref; // Or use an appropriate API.
> bool tsa_is_anchor;
> };
>
> Each task structure would contain one of these, though there are a
> number of ways to conserve space if needed.
>
> These anchors would be placed perhaps every 1,000 tasks or so. When a
> traversal encountered one, it could atomic_inc_not_zero() the reference
> count, and if that succeeded, exit the RCU read-side critical section and
> do a cond_resched(). It could then enter a new RCU read-side critical
> section, drop the reference, and continue.
>
> A traversal might container_of() its way from ->tsa_list to the
> task_struct_anchor structure, then if ->tsa_is_anchor is false,
> container_of() its way to the enclosing task structure.
>
> How to maintain proper spacing of the anchors?
>
> One way is to make the traversals do the checking. If the space between a
> pair of anchors was too large or too small, it could add the first of the
> pair to a list to be adjusted. This list could periodically be processed,
> perhaps with more urgency if a huge gap had opened up.
>
> Freeing an anchor requires decrementing the reference count, waiting for
> it to go to zero, removing the anchor, waiting for a grace period (perhaps
> asynchronously), and only then freeing the anchor.
>
> Anchors cannot be moved, only added or removed.
>
> So it is possible. But is it reasonable? ;-)
Wow! And this will need to be done both for process leaders (p->tasks)
and for threads (p->thread_node) :-)
On Mon, Feb 26, 2024 at 02:56:06PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 23, 2024 at 04:43:04PM -0800, Paul E. McKenney wrote:
> > I do indeed mean doing cond_resched() mid-stream.
> >
> > One way to make this happen would be to do something like this:
> >
> > struct task_struct_anchor {
> > struct list_head tsa_list;
> > struct list_head tsa_adjust_list;
> > atomic_t tsa_ref; // Or use an appropriate API.
> > bool tsa_is_anchor;
> > };
> >
> > Each task structure would contain one of these, though there are a
> > number of ways to conserve space if needed.
> >
> > These anchors would be placed perhaps every 1,000 tasks or so. When a
> > traversal encountered one, it could atomic_inc_not_zero() the reference
> > count, and if that succeeded, exit the RCU read-side critical section and
> > do a cond_resched(). It could then enter a new RCU read-side critical
> > section, drop the reference, and continue.
> >
> > A traversal might container_of() its way from ->tsa_list to the
> > task_struct_anchor structure, then if ->tsa_is_anchor is false,
> > container_of() its way to the enclosing task structure.
> >
> > How to maintain proper spacing of the anchors?
> >
> > One way is to make the traversals do the checking. If the space between a
> > pair of anchors was too large or too small, it could add the first of the
> > pair to a list to be adjusted. This list could periodically be processed,
> > perhaps with more urgency if a huge gap had opened up.
> >
> > Freeing an anchor requires decrementing the reference count, waiting for
> > it to go to zero, removing the anchor, waiting for a grace period (perhaps
> > asynchronously), and only then freeing the anchor.
> >
> > Anchors cannot be moved, only added or removed.
> >
> > So it is possible. But is it reasonable? ;-)
>
> Wow! And this will need to be done both for process leaders (p->tasks)
> and for threads (p->thread_node) :-)
True enough! Your point being? ;-)
Thanx, Paul