2023-10-24 21:46:41

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/4] rcu: Fix PF_IDLE related issues v2

The modification of PF_IDLE semantics lately to fix a bug in rcutiny
eventually introduced new bugs in RCU-tasks. In this v2, this series
propose to fix these issues without reverting:

cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")

Frederic Weisbecker (4):
rcu: Introduce rcu_cpu_online()
rcu/tasks: Handle new PF_IDLE semantics
rcu/tasks-trace: Handle new PF_IDLE semantics
sched: Exclude CPU boot code from PF_IDLE area

include/linux/sched.h | 2 +-
kernel/cpu.c | 4 ++++
kernel/rcu/rcu.h | 2 ++
kernel/rcu/tasks.h | 33 ++++++++++++++++++++++++++++++---
kernel/rcu/tree.c | 7 +++++++
kernel/sched/idle.c | 1 -
6 files changed, 44 insertions(+), 5 deletions(-)

--
2.41.0


2023-10-24 21:46:50

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/4] rcu: Introduce rcu_cpu_online()

Export the RCU point of view as to when a CPU is considered offline
(ie: when does RCU consider that a CPU is sufficiently down in the
hotplug process to not feature any possible read side).

This will be used by RCU-tasks whose vision of an offline CPU should
reasonably match the one of RCU core.

Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/rcu.h | 2 ++
kernel/rcu/tree.c | 7 +++++++
2 files changed, 9 insertions(+)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 0d866eaa4cc8..b531c33e9545 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -500,6 +500,7 @@ static inline void rcu_expedite_gp(void) { }
static inline void rcu_unexpedite_gp(void) { }
static inline void rcu_async_hurry(void) { }
static inline void rcu_async_relax(void) { }
+static inline bool rcu_cpu_online(int cpu) { return true; }
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_gp_is_normal(void); /* Internal RCU use. */
bool rcu_gp_is_expedited(void); /* Internal RCU use. */
@@ -509,6 +510,7 @@ void rcu_unexpedite_gp(void);
void rcu_async_hurry(void);
void rcu_async_relax(void);
void rcupdate_announce_bootup_oddness(void);
+bool rcu_cpu_online(int cpu);
#ifdef CONFIG_TASKS_RCU_GENERIC
void show_rcu_tasks_gp_kthreads(void);
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 700524726079..fd21c1506092 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4202,6 +4202,13 @@ static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
}

+bool rcu_cpu_online(int cpu)
+{
+ struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
+
+ return rcu_rdp_cpu_online(rdp);
+}
+
#if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)

/*
--
2.41.0

2023-10-24 21:46:55

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

The commit:

cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")

has changed the semantics of what is to be considered an idle task in
such a way that CPU boot code preceding the actual idle loop is excluded
from it.

This has however introduced new potential RCU-tasks stalls when either:

1) Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.

2) Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until the
CPU ever boots up.

3) Similar to 1) but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It stays
then stuck in the holdout list until that CPU ever schedules. The
effect is mitigated here by the hotplug AP thread that must run to
bring the CPU up.

Fix this with handling the new semantics of PF_IDLE, keeping in mind
that it may or may not be set on an idle task. Take advantage of that to
strengthen the coverage of an RCU-tasks quiescent state within an idle
task, excluding the CPU boot code from it. Only the code running within
the idle loop is now a quiescent state, along with offline CPUs.

Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <[email protected]>
Suggested-by: Paul E . McKenney" <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/tasks.h | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index bf5f178fe723..acf81efe5eff 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -895,10 +895,37 @@ static void rcu_tasks_pregp_step(struct list_head *hop)
synchronize_rcu();
}

+/* Check for quiescent states since the pregp's synchronize_rcu() */
+static bool rcu_tasks_is_holdout(struct task_struct *t)
+{
+ int cpu;
+
+ /* Has the task been seen voluntarily sleeping? */
+ if (!READ_ONCE(t->on_rq))
+ return false;
+
+ cpu = task_cpu(t);
+
+ /*
+ * Idle tasks within the idle loop or offline CPUs are RCU-tasks
+ * quiescent states. But CPU boot code performed by the idle task
+ * isn't a quiescent state.
+ */
+ if (t == idle_task(cpu)) {
+ if (is_idle_task(t))
+ return false;
+
+ if (!rcu_cpu_online(cpu))
+ return false;
+ }
+
+ return true;
+}
+
/* Per-task initial processing. */
static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
{
- if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
+ if (t != current && rcu_tasks_is_holdout(t)) {
get_task_struct(t);
t->rcu_tasks_nvcsw = READ_ONCE(t->nvcsw);
WRITE_ONCE(t->rcu_tasks_holdout, true);
@@ -947,7 +974,7 @@ static void check_holdout_task(struct task_struct *t,

if (!READ_ONCE(t->rcu_tasks_holdout) ||
t->rcu_tasks_nvcsw != READ_ONCE(t->nvcsw) ||
- !READ_ONCE(t->on_rq) ||
+ !rcu_tasks_is_holdout(t) ||
(IS_ENABLED(CONFIG_NO_HZ_FULL) &&
!is_idle_task(t) && READ_ONCE(t->rcu_tasks_idle_cpu) >= 0)) {
WRITE_ONCE(t->rcu_tasks_holdout, false);
--
2.41.0

2023-10-24 21:47:04

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/4] rcu/tasks-trace: Handle new PF_IDLE semantics

The commit:

cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")

has changed the semantics of what is to be considered an idle task in
such a way that the idle task of an offline CPU may not carry the
PF_IDLE flag anymore.

However RCU-tasks-trace tests the opposite assertion, still assuming
that idle tasks carry the PF_IDLE flag during their whole lifecycle.

Remove this assumption to avoid spurious warnings but keep the initial
test verifying that the idle task is the current task on any offline
CPU.

Reported-by: Naresh Kamboju <[email protected]>
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <[email protected]>
Suggested-by: Paul E . McKenney" <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/tasks.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index acf81efe5eff..4dd70f2af4af 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1552,7 +1552,7 @@ static int trc_inspect_reader(struct task_struct *t, void *bhp_in)
} else {
// The task is not running, so C-language access is safe.
nesting = t->trc_reader_nesting;
- WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));
+ WARN_ON_ONCE(ofl && task_curr(t) && (t != idle_task(task_cpu(t))));
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && ofl)
n_heavy_reader_ofl_updates++;
}
--
2.41.0

2023-10-24 21:47:08

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/4] sched: Exclude CPU boot code from PF_IDLE area

The commit:

cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")

has changed the semantics of what is to be considered an idle task in
such a way that only the actual idle loop is accounted as PF_IDLE. The
intent is to exclude the CPU boot code from that coverage.

However this doesn't clear the flag when the CPU goes down. Therefore
when the CPU goes up again, its boot code is part of the PF_IDLE zone.

Make sure this flag behave consistently and clear the flag when a CPU
exits from the idle loop. If anything, RCU-tasks relies on it to exclude
CPU boot code from its quiescent states.

Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/sched.h | 2 +-
kernel/cpu.c | 4 ++++
kernel/sched/idle.c | 1 -
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8885be2c143e..ad18962b921d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1945,7 +1945,7 @@ extern struct task_struct *idle_task(int cpu);
*/
static __always_inline bool is_idle_task(const struct task_struct *p)
{
- return !!(p->flags & PF_IDLE);
+ return !!(READ_ONCE(p->flags) & PF_IDLE);
}

extern struct task_struct *curr_task(int cpu);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3b9d5c7eb4a2..3a1991010f4e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void)
{
struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);

+ WRITE_ONCE(current->flags, current->flags & ~PF_IDLE);
BUG_ON(st->state != CPUHP_AP_OFFLINE);
+
rcutree_report_cpu_dead();
st->state = CPUHP_AP_IDLE_DEAD;
/*
@@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state)
{
struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);

+ WRITE_ONCE(current->flags, current->flags | PF_IDLE);
+
/* Happens for the boot cpu */
if (state != CPUHP_AP_ONLINE_IDLE)
return;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 5007b25c5bc6..342f58a329f5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,7 +373,6 @@ EXPORT_SYMBOL_GPL(play_idle_precise);

void cpu_startup_entry(enum cpuhp_state state)
{
- current->flags |= PF_IDLE;
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)
--
2.41.0

2023-10-25 02:31:24

by Zqiang

[permalink] [raw]
Subject: Re: [PATCH 1/4] rcu: Introduce rcu_cpu_online()

>
> Export the RCU point of view as to when a CPU is considered offline
> (ie: when does RCU consider that a CPU is sufficiently down in the
> hotplug process to not feature any possible read side).
>
> This will be used by RCU-tasks whose vision of an offline CPU should
> reasonably match the one of RCU core.
>
> Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/rcu/rcu.h | 2 ++
> kernel/rcu/tree.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 0d866eaa4cc8..b531c33e9545 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -500,6 +500,7 @@ static inline void rcu_expedite_gp(void) { }
> static inline void rcu_unexpedite_gp(void) { }
> static inline void rcu_async_hurry(void) { }
> static inline void rcu_async_relax(void) { }
> +static inline bool rcu_cpu_online(int cpu) { return true; }
> #else /* #ifdef CONFIG_TINY_RCU */
> bool rcu_gp_is_normal(void); /* Internal RCU use. */
> bool rcu_gp_is_expedited(void); /* Internal RCU use. */
> @@ -509,6 +510,7 @@ void rcu_unexpedite_gp(void);
> void rcu_async_hurry(void);
> void rcu_async_relax(void);
> void rcupdate_announce_bootup_oddness(void);
> +bool rcu_cpu_online(int cpu);
> #ifdef CONFIG_TASKS_RCU_GENERIC
> void show_rcu_tasks_gp_kthreads(void);
> #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 700524726079..fd21c1506092 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4202,6 +4202,13 @@ static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
> return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
> }
>
> +bool rcu_cpu_online(int cpu)
> +{
> + struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>

Should 'struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu)' be used ?

Thanks
Zqiang


>
> +
> + return rcu_rdp_cpu_online(rdp);
> +}
> +
> #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
>
> /*
> --
> 2.41.0
>

2023-10-25 08:41:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote:

> +/* Check for quiescent states since the pregp's synchronize_rcu() */
> +static bool rcu_tasks_is_holdout(struct task_struct *t)
> +{
> + int cpu;
> +
> + /* Has the task been seen voluntarily sleeping? */
> + if (!READ_ONCE(t->on_rq))
> + return false;
> +
> + cpu = task_cpu(t);
> +
> + /*
> + * Idle tasks within the idle loop or offline CPUs are RCU-tasks
> + * quiescent states. But CPU boot code performed by the idle task
> + * isn't a quiescent state.
> + */
> + if (t == idle_task(cpu)) {
> + if (is_idle_task(t))
> + return false;
> +
> + if (!rcu_cpu_online(cpu))
> + return false;
> + }

Hmm, why is this guarded by t == idle_task() ?

Notably, there is the idle-injection thing that uses FIFO tasks to run
'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on
tasks that are not idle_task().

> +
> + return true;
> +}
> +
> /* Per-task initial processing. */
> static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
> {
> - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
> + if (t != current && rcu_tasks_is_holdout(t)) {


2023-10-25 08:49:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/4] sched: Exclude CPU boot code from PF_IDLE area

On Tue, Oct 24, 2023 at 11:46:25PM +0200, Frederic Weisbecker wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8885be2c143e..ad18962b921d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1945,7 +1945,7 @@ extern struct task_struct *idle_task(int cpu);
> */
> static __always_inline bool is_idle_task(const struct task_struct *p)
> {
> - return !!(p->flags & PF_IDLE);
> + return !!(READ_ONCE(p->flags) & PF_IDLE);
> }
>
> extern struct task_struct *curr_task(int cpu);
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 3b9d5c7eb4a2..3a1991010f4e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void)
> {
> struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>
> + WRITE_ONCE(current->flags, current->flags & ~PF_IDLE);
> BUG_ON(st->state != CPUHP_AP_OFFLINE);
> +
> rcutree_report_cpu_dead();
> st->state = CPUHP_AP_IDLE_DEAD;
> /*
> @@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state)
> {
> struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>
> + WRITE_ONCE(current->flags, current->flags | PF_IDLE);
> +
> /* Happens for the boot cpu */
> if (state != CPUHP_AP_ONLINE_IDLE)
> return;

Without changing *ALL* ->flags stores to WRITE_ONCE() I don't see the
point of this. Also, since we only care about a single bit, how does
store tearing affect things?

Not to mention if we're really paranoid, what are the SMP ordering
considerations :-)

[ also, PF_ is used for Protocol Family, Page Flag and Process Flag,
grepping is a pain in the arse :-( ]

2023-10-25 10:31:39

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a ?crit :
> On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote:
>
> > +/* Check for quiescent states since the pregp's synchronize_rcu() */
> > +static bool rcu_tasks_is_holdout(struct task_struct *t)
> > +{
> > + int cpu;
> > +
> > + /* Has the task been seen voluntarily sleeping? */
> > + if (!READ_ONCE(t->on_rq))
> > + return false;
> > +
> > + cpu = task_cpu(t);
> > +
> > + /*
> > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks
> > + * quiescent states. But CPU boot code performed by the idle task
> > + * isn't a quiescent state.
> > + */
> > + if (t == idle_task(cpu)) {
> > + if (is_idle_task(t))
> > + return false;
> > +
> > + if (!rcu_cpu_online(cpu))
> > + return false;
> > + }
>
> Hmm, why is this guarded by t == idle_task() ?
>
> Notably, there is the idle-injection thing that uses FIFO tasks to run
> 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on
> tasks that are not idle_task().

Ah good point. So indeed the is_idle_task() test doesn't musn't be
guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise
if it's not an idle task, there is a risk that the task gets migrated out
by the time we observe the old CPU offline.

Thanks.

>
> > +
> > + return true;
> > +}
> > +
> > /* Per-task initial processing. */
> > static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
> > {
> > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
> > + if (t != current && rcu_tasks_is_holdout(t)) {
>
>

2023-10-25 10:32:23

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/4] rcu: Introduce rcu_cpu_online()

Le Wed, Oct 25, 2023 at 10:29:46AM +0800, Z qiang a ?crit :
> >
> > Export the RCU point of view as to when a CPU is considered offline
> > (ie: when does RCU consider that a CPU is sufficiently down in the
> > hotplug process to not feature any possible read side).
> >
> > This will be used by RCU-tasks whose vision of an offline CPU should
> > reasonably match the one of RCU core.
> >
> > Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > ---
> > kernel/rcu/rcu.h | 2 ++
> > kernel/rcu/tree.c | 7 +++++++
> > 2 files changed, 9 insertions(+)
> >
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 0d866eaa4cc8..b531c33e9545 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -500,6 +500,7 @@ static inline void rcu_expedite_gp(void) { }
> > static inline void rcu_unexpedite_gp(void) { }
> > static inline void rcu_async_hurry(void) { }
> > static inline void rcu_async_relax(void) { }
> > +static inline bool rcu_cpu_online(int cpu) { return true; }
> > #else /* #ifdef CONFIG_TINY_RCU */
> > bool rcu_gp_is_normal(void); /* Internal RCU use. */
> > bool rcu_gp_is_expedited(void); /* Internal RCU use. */
> > @@ -509,6 +510,7 @@ void rcu_unexpedite_gp(void);
> > void rcu_async_hurry(void);
> > void rcu_async_relax(void);
> > void rcupdate_announce_bootup_oddness(void);
> > +bool rcu_cpu_online(int cpu);
> > #ifdef CONFIG_TASKS_RCU_GENERIC
> > void show_rcu_tasks_gp_kthreads(void);
> > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 700524726079..fd21c1506092 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -4202,6 +4202,13 @@ static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
> > return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
> > }
> >
> > +bool rcu_cpu_online(int cpu)
> > +{
> > + struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >
> >
>
> Should 'struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu)' be used ?

Oh! Good catch!

>
> Thanks
> Zqiang
>
>
> >
> > +
> > + return rcu_rdp_cpu_online(rdp);
> > +}
> > +
> > #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
> >
> > /*
> > --
> > 2.41.0
> >

2023-10-25 11:26:04

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/4] sched: Exclude CPU boot code from PF_IDLE area

Le Wed, Oct 25, 2023 at 10:48:33AM +0200, Peter Zijlstra a ?crit :
> On Tue, Oct 24, 2023 at 11:46:25PM +0200, Frederic Weisbecker wrote:
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 8885be2c143e..ad18962b921d 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1945,7 +1945,7 @@ extern struct task_struct *idle_task(int cpu);
> > */
> > static __always_inline bool is_idle_task(const struct task_struct *p)
> > {
> > - return !!(p->flags & PF_IDLE);
> > + return !!(READ_ONCE(p->flags) & PF_IDLE);
> > }
> >
> > extern struct task_struct *curr_task(int cpu);
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 3b9d5c7eb4a2..3a1991010f4e 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1394,7 +1394,9 @@ void cpuhp_report_idle_dead(void)
> > {
> > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> >
> > + WRITE_ONCE(current->flags, current->flags & ~PF_IDLE);
> > BUG_ON(st->state != CPUHP_AP_OFFLINE);
> > +
> > rcutree_report_cpu_dead();
> > st->state = CPUHP_AP_IDLE_DEAD;
> > /*
> > @@ -1642,6 +1644,8 @@ void cpuhp_online_idle(enum cpuhp_state state)
> > {
> > struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> >
> > + WRITE_ONCE(current->flags, current->flags | PF_IDLE);
> > +
> > /* Happens for the boot cpu */
> > if (state != CPUHP_AP_ONLINE_IDLE)
> > return;
>
> Without changing *ALL* ->flags stores to WRITE_ONCE() I don't see the
> point of this. Also, since we only care about a single bit, how does
> store tearing affect things?
>
> Not to mention if we're really paranoid, what are the SMP ordering
> considerations :-)
>
> [ also, PF_ is used for Protocol Family, Page Flag and Process Flag,
> grepping is a pain in the arse :-( ]

Indeed. Also cpuhp_online_idle() is called with preemption disabled
and cpuhp_report_idle_dead() with interrupts disabled. As for idle
injection in play_idle_precise(), the flag is set and cleared with
preemption disabled.

This means that all writes are in an RCU read side critical section
that RCU-tasks pre-gp's synchronize_rcu() waits for. So I don't think
we need those WRITE_ONCE/READ_ONCE.

Paul are you ok with that?

Thanks.

2023-10-26 12:16:11

by Zqiang

[permalink] [raw]
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

>
> Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a écrit :
> > On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote:
> >
> > > +/* Check for quiescent states since the pregp's synchronize_rcu() */
> > > +static bool rcu_tasks_is_holdout(struct task_struct *t)
> > > +{
> > > + int cpu;
> > > +
> > > + /* Has the task been seen voluntarily sleeping? */
> > > + if (!READ_ONCE(t->on_rq))
> > > + return false;
> > > +
> > > + cpu = task_cpu(t);
> > > +
> > > + /*
> > > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks
> > > + * quiescent states. But CPU boot code performed by the idle task
> > > + * isn't a quiescent state.
> > > + */
> > > + if (t == idle_task(cpu)) {
> > > + if (is_idle_task(t))
> > > + return false;
> > > +
> > > + if (!rcu_cpu_online(cpu))
> > > + return false;
> > > + }
> >
> > Hmm, why is this guarded by t == idle_task() ?
> >
> > Notably, there is the idle-injection thing that uses FIFO tasks to run
> > 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on
> > tasks that are not idle_task().
>
> Ah good point. So indeed the is_idle_task() test doesn't musn't be
> guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise
> if it's not an idle task, there is a risk that the task gets migrated out
> by the time we observe the old CPU offline.
>

If a fifo-tasks use play_idle_precise() to run idle and invoke
do_idle(), may cause
rcu-tasks to falsely report a rcu-tasks QS , when rcu_is_cpu_rrupt_from_idle()
return true in rcu_sched_clock_irq(), so should we also add a check for
"current == idle_task(task_cpu(current))" in the rcu_is_cpu_rrupt_from_idle() ?

Thanks
Zqiang


> Thanks.
>
> >
> > > +
> > > + return true;
> > > +}
> > > +
> > > /* Per-task initial processing. */
> > > static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
> > > {
> > > - if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
> > > + if (t != current && rcu_tasks_is_holdout(t)) {
> >
> >

2023-10-26 14:34:51

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

On Thu, Oct 26, 2023 at 08:15:33PM +0800, Z qiang wrote:
> >
> > Le Wed, Oct 25, 2023 at 10:40:08AM +0200, Peter Zijlstra a ?crit :
> > > On Tue, Oct 24, 2023 at 11:46:23PM +0200, Frederic Weisbecker wrote:
> > >
> > > > +/* Check for quiescent states since the pregp's synchronize_rcu() */
> > > > +static bool rcu_tasks_is_holdout(struct task_struct *t)
> > > > +{
> > > > + int cpu;
> > > > +
> > > > + /* Has the task been seen voluntarily sleeping? */
> > > > + if (!READ_ONCE(t->on_rq))
> > > > + return false;
> > > > +
> > > > + cpu = task_cpu(t);
> > > > +
> > > > + /*
> > > > + * Idle tasks within the idle loop or offline CPUs are RCU-tasks
> > > > + * quiescent states. But CPU boot code performed by the idle task
> > > > + * isn't a quiescent state.
> > > > + */
> > > > + if (t == idle_task(cpu)) {
> > > > + if (is_idle_task(t))
> > > > + return false;
> > > > +
> > > > + if (!rcu_cpu_online(cpu))
> > > > + return false;
> > > > + }
> > >
> > > Hmm, why is this guarded by t == idle_task() ?
> > >
> > > Notably, there is the idle-injection thing that uses FIFO tasks to run
> > > 'idle', see play_idle_precise(). This will (temporarily) get PF_IDLE on
> > > tasks that are not idle_task().
> >
> > Ah good point. So indeed the is_idle_task() test doesn't musn't be
> > guarded by t == idle_task(cpu). But rcu_cpu_online() has to, otherwise
> > if it's not an idle task, there is a risk that the task gets migrated out
> > by the time we observe the old CPU offline.
> >
>
> If a fifo-tasks use play_idle_precise() to run idle and invoke
> do_idle(), may cause
> rcu-tasks to falsely report a rcu-tasks QS

Well, there can be a debate here: should we consider an idle injector as a real
task that we must wait for a voluntary schedule or should we treat it just like
an idle task?

Having that whole idle task quiescent state in RCU-tasks is quite a strange
semantic anyway. And in the long run, the purpose is to unify RCU-tasks and
RCU-tasks-RUDE with relying on ct_dynticks for idle quiescent states.

> , when rcu_is_cpu_rrupt_from_idle()
> return true in rcu_sched_clock_irq(), so should we also add a check for
> "current == idle_task(task_cpu(current))" in the rcu_is_cpu_rrupt_from_idle()
> ?

That looks fine OTOH. Whether idle injection or real idle,
rcu_is_cpu_rrupt_from_idle() is always a quiescent state in real RCU. Because
we know we have no RCU reader between ct_idle_enter() and ct_idle_exit().

Thanks.