2019-08-22 00:04:10

by Crystal Wood

Subject: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

Without this, rcu_note_context_switch() will complain if an RCU read
lock is held when migrate_enable() calls stop_one_cpu().

Signed-off-by: Scott Wood <[email protected]>
---
v2: Added comment.

If my migrate disable changes aren't taken, then pin_current_cpu()
will also need to use sleeping_lock_inc() because calling
__read_rt_lock() bypasses the usual place it's done.

 include/linux/sched.h    | 4 ++--
 kernel/rcu/tree_plugin.h | 2 +-
 kernel/sched/core.c      | 8 ++++++++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7e892e727f12..1ebc97f28009 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -673,7 +673,7 @@ struct task_struct {
 	int migrate_disable_atomic;
 # endif
 #endif
-#ifdef CONFIG_PREEMPT_RT_FULL
+#ifdef CONFIG_PREEMPT_RT_BASE
 	int sleeping_lock;
 #endif

@@ -1881,7 +1881,7 @@ static __always_inline bool need_resched(void)
 	return unlikely(tif_need_resched());
 }

-#ifdef CONFIG_PREEMPT_RT_FULL
+#ifdef CONFIG_PREEMPT_RT_BASE
 static inline void sleeping_lock_inc(void)
 {
 	current->sleeping_lock++;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 23a54e4b649c..7a3aa085ce2c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -292,7 +292,7 @@ void rcu_note_context_switch(bool preempt)
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	trace_rcu_utilization(TPS("Start context switch"));
 	lockdep_assert_irqs_disabled();
-#if defined(CONFIG_PREEMPT_RT_FULL)
+#if defined(CONFIG_PREEMPT_RT_BASE)
 	sleeping_l = t->sleeping_lock;
 #endif
 	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1bdd7f9be05..0758ee85634e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7405,7 +7405,15 @@ void migrate_enable(void)
 			unpin_current_cpu();
 			preempt_lazy_enable();
 			preempt_enable();
+
+			/*
+			 * sleeping_lock_inc suppresses a debug check for
+			 * sleeping inside an RCU read side critical section
+			 */
+			sleeping_lock_inc();
 			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
+			sleeping_lock_dec();
+
 			return;
 		}
 	}
--
1.8.3.1


2019-08-22 00:07:41

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, Aug 21, 2019 at 06:19:05PM -0500, Scott Wood wrote:
> Without this, rcu_note_context_switch() will complain if an RCU read
> lock is held when migrate_enable() calls stop_one_cpu().
>
> Signed-off-by: Scott Wood <[email protected]>

I have to ask... Both sleeping_lock_inc() and sleeping_lock_dec() are
no-ops if not CONFIG_PREEMPT_RT_BASE?

Thanx, Paul

> ---
> v2: Added comment.
>
> If my migrate disable changes aren't taken, then pin_current_cpu()
> will also need to use sleeping_lock_inc() because calling
> __read_rt_lock() bypasses the usual place it's done.
>
> include/linux/sched.h | 4 ++--
> kernel/rcu/tree_plugin.h | 2 +-
> kernel/sched/core.c | 8 ++++++++
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7e892e727f12..1ebc97f28009 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -673,7 +673,7 @@ struct task_struct {
> int migrate_disable_atomic;
> # endif
> #endif
> -#ifdef CONFIG_PREEMPT_RT_FULL
> +#ifdef CONFIG_PREEMPT_RT_BASE
> int sleeping_lock;
> #endif
>
> @@ -1881,7 +1881,7 @@ static __always_inline bool need_resched(void)
> return unlikely(tif_need_resched());
> }
>
> -#ifdef CONFIG_PREEMPT_RT_FULL
> +#ifdef CONFIG_PREEMPT_RT_BASE
> static inline void sleeping_lock_inc(void)
> {
> current->sleeping_lock++;
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 23a54e4b649c..7a3aa085ce2c 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -292,7 +292,7 @@ void rcu_note_context_switch(bool preempt)
> barrier(); /* Avoid RCU read-side critical sections leaking down. */
> trace_rcu_utilization(TPS("Start context switch"));
> lockdep_assert_irqs_disabled();
> -#if defined(CONFIG_PREEMPT_RT_FULL)
> +#if defined(CONFIG_PREEMPT_RT_BASE)
> sleeping_l = t->sleeping_lock;
> #endif
> WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e1bdd7f9be05..0758ee85634e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7405,7 +7405,15 @@ void migrate_enable(void)
> unpin_current_cpu();
> preempt_lazy_enable();
> preempt_enable();
> +
> + /*
> + * sleeping_lock_inc suppresses a debug check for
> + * sleeping inside an RCU read side critical section
> + */
> + sleeping_lock_inc();
> stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> + sleeping_lock_dec();
> +
> return;
> }
> }
> --
> 1.8.3.1
>

2019-08-23 15:54:22

by Crystal Wood

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, 2019-08-21 at 16:35 -0700, Paul E. McKenney wrote:
> On Wed, Aug 21, 2019 at 06:19:05PM -0500, Scott Wood wrote:
> > Without this, rcu_note_context_switch() will complain if an RCU read
> > lock is held when migrate_enable() calls stop_one_cpu().
> >
> > Signed-off-by: Scott Wood <[email protected]>
>
> I have to ask... Both sleeping_lock_inc() and sleeping_lock_dec() are
> no-ops if not CONFIG_PREEMPT_RT_BASE?

Yes.

-Scott


Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On 2019-08-21 18:19:05 [-0500], Scott Wood wrote:
> Without this, rcu_note_context_switch() will complain if an RCU read
> lock is held when migrate_enable() calls stop_one_cpu().
>
> Signed-off-by: Scott Wood <[email protected]>
> ---
> v2: Added comment.
>
> If my migrate disable changes aren't taken, then pin_current_cpu()
> will also need to use sleeping_lock_inc() because calling
> __read_rt_lock() bypasses the usual place it's done.
>
> include/linux/sched.h | 4 ++--
> kernel/rcu/tree_plugin.h | 2 +-
> kernel/sched/core.c | 8 ++++++++
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7405,7 +7405,15 @@ void migrate_enable(void)
> unpin_current_cpu();
> preempt_lazy_enable();
> preempt_enable();
> +
> + /*
> + * sleeping_lock_inc suppresses a debug check for
> + * sleeping inside an RCU read side critical section
> + */
> + sleeping_lock_inc();
> stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> + sleeping_lock_dec();

this looks like an ugly hack. This sleeping_lock_inc() is used where we
actually hold a sleeping lock and schedule() which is okay. But this
would mean we hold a RCU lock and schedule() anyway. Is that okay?

> +
> return;
> }
> }

Sebastian

2019-08-23 23:37:03

by Crystal Wood

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-21 18:19:05 [-0500], Scott Wood wrote:
> > Without this, rcu_note_context_switch() will complain if an RCU read
> > lock is held when migrate_enable() calls stop_one_cpu().
> >
> > Signed-off-by: Scott Wood <[email protected]>
> > ---
> > v2: Added comment.
> >
> > If my migrate disable changes aren't taken, then pin_current_cpu()
> > will also need to use sleeping_lock_inc() because calling
> > __read_rt_lock() bypasses the usual place it's done.
> >
> > include/linux/sched.h | 4 ++--
> > kernel/rcu/tree_plugin.h | 2 +-
> > kernel/sched/core.c | 8 ++++++++
> > 3 files changed, 11 insertions(+), 3 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7405,7 +7405,15 @@ void migrate_enable(void)
> > unpin_current_cpu();
> > preempt_lazy_enable();
> > preempt_enable();
> > +
> > + /*
> > + * sleeping_lock_inc suppresses a debug check for
> > + * sleeping inside an RCU read side critical section
> > + */
> > + sleeping_lock_inc();
> > stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> > + sleeping_lock_dec();
>
> this looks like an ugly hack. This sleeping_lock_inc() is used where we
> actually hold a sleeping lock and schedule() which is okay. But this
> would mean we hold a RCU lock and schedule() anyway. Is that okay?

Perhaps the name should be changed, but the concept is the same -- RT-
specific sleeping which should be considered involuntary for the purpose of
debug checks. Voluntary sleeping is not allowed in an RCU critical section
because it will break the critical section on certain flavors of RCU, but
that doesn't apply to the flavor used on RT. Sleeping for a long time in an
RCU critical section would also be a bad thing, but that also doesn't apply
here.

-Scott


2019-08-24 03:12:38

by Joel Fernandes

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-08-21 18:19:05 [-0500], Scott Wood wrote:
> > > Without this, rcu_note_context_switch() will complain if an RCU read
> > > lock is held when migrate_enable() calls stop_one_cpu().
> > >
> > > Signed-off-by: Scott Wood <[email protected]>
> > > ---
> > > v2: Added comment.
> > >
> > > If my migrate disable changes aren't taken, then pin_current_cpu()
> > > will also need to use sleeping_lock_inc() because calling
> > > __read_rt_lock() bypasses the usual place it's done.
> > >
> > > include/linux/sched.h | 4 ++--
> > > kernel/rcu/tree_plugin.h | 2 +-
> > > kernel/sched/core.c | 8 ++++++++
> > > 3 files changed, 11 insertions(+), 3 deletions(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -7405,7 +7405,15 @@ void migrate_enable(void)
> > > unpin_current_cpu();
> > > preempt_lazy_enable();
> > > preempt_enable();
> > > +
> > > + /*
> > > + * sleeping_lock_inc suppresses a debug check for
> > > + * sleeping inside an RCU read side critical section
> > > + */
> > > + sleeping_lock_inc();
> > > stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> > > + sleeping_lock_dec();
> >
> > this looks like an ugly hack. This sleeping_lock_inc() is used where we
> > actually hold a sleeping lock and schedule() which is okay. But this
> > would mean we hold a RCU lock and schedule() anyway. Is that okay?
>
> Perhaps the name should be changed, but the concept is the same -- RT-
> specific sleeping which should be considered involuntary for the purpose of
> debug checks. Voluntary sleeping is not allowed in an RCU critical section
> because it will break the critical section on certain flavors of RCU, but
> that doesn't apply to the flavor used on RT. Sleeping for a long time in an
> RCU critical section would also be a bad thing, but that also doesn't apply
> here.

I think the name should definitely be changed. At best, it is super confusing to
call it "sleeping_lock" for this scenario. In fact here, you are not even
blocking on a lock.

Maybe "sleeping_allowed" or some such.

thanks,

- Joel

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On 2019-08-23 23:10:14 [-0400], Joel Fernandes wrote:
> On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> > On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > >
> > > this looks like an ugly hack. This sleeping_lock_inc() is used where we
> > > actually hold a sleeping lock and schedule() which is okay. But this
> > > would mean we hold a RCU lock and schedule() anyway. Is that okay?
> >
> > Perhaps the name should be changed, but the concept is the same -- RT-
> > specific sleeping which should be considered involuntary for the purpose of
> > debug checks. Voluntary sleeping is not allowed in an RCU critical section
> > because it will break the critical section on certain flavors of RCU, but
> > that doesn't apply to the flavor used on RT. Sleeping for a long time in an
> > RCU critical section would also be a bad thing, but that also doesn't apply
> > here.
>
> I think the name should definitely be changed. At best, it is super confusing to
> call it "sleeping_lock" for this scenario. In fact here, you are not even
> blocking on a lock.
>
> Maybe "sleeping_allowed" or some such.

The mechanism that is used here may change in future. I just wanted to
make sure that from RCU's side it is okay to schedule here.

> thanks,
>
> - Joel

Sebastian

2019-08-26 16:54:46

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Mon, Aug 26, 2019 at 05:25:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-23 23:10:14 [-0400], Joel Fernandes wrote:
> > On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> > > On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > > >
> > > > this looks like an ugly hack. This sleeping_lock_inc() is used where we
> > > > actually hold a sleeping lock and schedule() which is okay. But this
> > > > would mean we hold a RCU lock and schedule() anyway. Is that okay?
> > >
> > > Perhaps the name should be changed, but the concept is the same -- RT-
> > > specific sleeping which should be considered involuntary for the purpose of
> > > debug checks. Voluntary sleeping is not allowed in an RCU critical section
> > > because it will break the critical section on certain flavors of RCU, but
> > > that doesn't apply to the flavor used on RT. Sleeping for a long time in an
> > > RCU critical section would also be a bad thing, but that also doesn't apply
> > > here.
> >
> > I think the name should definitely be changed. At best, it is super confusing to
> > call it "sleeping_lock" for this scenario. In fact here, you are not even
> > blocking on a lock.
> >
> > Maybe "sleeping_allowed" or some such.
>
> The mechanism that is used here may change in future. I just wanted to
> make sure that from RCU's side it is okay to schedule here.

Good point.

The effect from RCU's viewpoint will be to split any non-rcu_read_lock()
RCU read-side critical section at this point. This already happens in a
few places, for example, rcu_note_context_switch() constitutes an RCU
quiescent state despite being invoked with interrupts disabled (as is
required!). The __schedule() function just needs to understand (and does
understand) that the RCU read-side critical section that would otherwise
span that call to rcu_note_context_switch() is split in two by that call.

However, if this was instead an rcu_read_lock() critical section within
a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
RCU would consider that critical section to be preempted. This means
that any RCU grace period that is blocked by this RCU read-side critical
section would remain blocked until stop_one_cpu() resumed, returned,
and so on until the matching rcu_read_unlock() was reached. In other
words, RCU would consider that RCU read-side critical section to span
the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().

On the other hand, within a PREEMPT=n kernel, the call to schedule()
would split even an rcu_read_lock() critical section. Which is why I
asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
complaints in that case.

Does that help, or am I missing the point?

Thanx, Paul

2019-08-26 18:49:12

by Crystal Wood

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Mon, 2019-08-26 at 09:29 -0700, Paul E. McKenney wrote:
> On Mon, Aug 26, 2019 at 05:25:23PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-08-23 23:10:14 [-0400], Joel Fernandes wrote:
> > > On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> > > > On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > > > > this looks like an ugly hack. This sleeping_lock_inc() is used
> > > > > where we
> > > > > actually hold a sleeping lock and schedule() which is okay. But
> > > > > this
> > > > > would mean we hold a RCU lock and schedule() anyway. Is that okay?
> > > >
> > > > Perhaps the name should be changed, but the concept is the same --
> > > > RT-
> > > > specific sleeping which should be considered involuntary for the
> > > > purpose of
> > > > debug checks. Voluntary sleeping is not allowed in an RCU critical
> > > > section
> > > > because it will break the critical section on certain flavors of
> > > > RCU, but
> > > > that doesn't apply to the flavor used on RT. Sleeping for a long
> > > > time in an
> > > > RCU critical section would also be a bad thing, but that also
> > > > doesn't apply
> > > > here.
> > >
> > > I think the name should definitely be changed. At best, it is super
> > > confusing to
> > > call it "sleeping_lock" for this scenario. In fact here, you are not
> > > even
> > > blocking on a lock.
> > >
> > > Maybe "sleeping_allowed" or some such.
> >
> > The mechanism that is used here may change in future. I just wanted to
> > make sure that from RCU's side it is okay to schedule here.
>
> Good point.
>
> The effect from RCU's viewpoint will be to split any non-rcu_read_lock()
> RCU read-side critical section at this point. This already happens in a
> few places, for example, rcu_note_context_switch() constitutes an RCU
> quiescent state despite being invoked with interrupts disabled (as is
> required!). The __schedule() function just needs to understand (and does
> understand) that the RCU read-side critical section that would otherwise
> span that call to rcu_note_context_switch() is split in two by that call.
>
> However, if this was instead an rcu_read_lock() critical section within
> a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> RCU would consider that critical section to be preempted. This means
> that any RCU grace period that is blocked by this RCU read-side critical
> section would remain blocked until stop_one_cpu() resumed, returned,
> and so on until the matching rcu_read_unlock() was reached. In other
> words, RCU would consider that RCU read-side critical section to span
> the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
>
> On the other hand, within a PREEMPT=n kernel, the call to schedule()
> would split even an rcu_read_lock() critical section. Which is why I
> asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> complaints in that case.

migrate_enable() is PREEMPT_RT_BASE-specific -- this code won't execute at
all with PREEMPT=n.

-Scott


2019-08-26 19:02:17

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Mon, Aug 26, 2019 at 12:49:22PM -0500, Scott Wood wrote:
> On Mon, 2019-08-26 at 09:29 -0700, Paul E. McKenney wrote:
> > On Mon, Aug 26, 2019 at 05:25:23PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-08-23 23:10:14 [-0400], Joel Fernandes wrote:
> > > > On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> > > > > On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > > > > > this looks like an ugly hack. This sleeping_lock_inc() is used
> > > > > > where we
> > > > > > actually hold a sleeping lock and schedule() which is okay. But
> > > > > > this
> > > > > > would mean we hold a RCU lock and schedule() anyway. Is that okay?
> > > > >
> > > > > Perhaps the name should be changed, but the concept is the same --
> > > > > RT-
> > > > > specific sleeping which should be considered involuntary for the
> > > > > purpose of
> > > > > debug checks. Voluntary sleeping is not allowed in an RCU critical
> > > > > section
> > > > > because it will break the critical section on certain flavors of
> > > > > RCU, but
> > > > > that doesn't apply to the flavor used on RT. Sleeping for a long
> > > > > time in an
> > > > > RCU critical section would also be a bad thing, but that also
> > > > > doesn't apply
> > > > > here.
> > > >
> > > > I think the name should definitely be changed. At best, it is super
> > > > confusing to
> > > > call it "sleeping_lock" for this scenario. In fact here, you are not
> > > > even
> > > > blocking on a lock.
> > > >
> > > > Maybe "sleeping_allowed" or some such.
> > >
> > > The mechanism that is used here may change in future. I just wanted to
> > > make sure that from RCU's side it is okay to schedule here.
> >
> > Good point.
> >
> > The effect from RCU's viewpoint will be to split any non-rcu_read_lock()
> > RCU read-side critical section at this point. This already happens in a
> > few places, for example, rcu_note_context_switch() constitutes an RCU
> > quiescent state despite being invoked with interrupts disabled (as is
> > required!). The __schedule() function just needs to understand (and does
> > understand) that the RCU read-side critical section that would otherwise
> > span that call to rcu_note_context_switch() is split in two by that call.
> >
> > However, if this was instead an rcu_read_lock() critical section within
> > a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> > RCU would consider that critical section to be preempted. This means
> > that any RCU grace period that is blocked by this RCU read-side critical
> > section would remain blocked until stop_one_cpu() resumed, returned,
> > and so on until the matching rcu_read_unlock() was reached. In other
> > words, RCU would consider that RCU read-side critical section to span
> > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
> >
> > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > would split even an rcu_read_lock() critical section. Which is why I
> > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > complaints in that case.
>
> migrate_enable() is PREEMPT_RT_BASE-specific -- this code won't execute at
> all with PREEMPT=n.

Understood! And yes, that was your answer to my question. Me, I was
just answering Sebastian's question. ;-)

Thanx, Paul

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On 2019-08-26 09:29:45 [-0700], Paul E. McKenney wrote:
> > The mechanism that is used here may change in future. I just wanted to
> > make sure that from RCU's side it is okay to schedule here.
>
> Good point.
>
> The effect from RCU's viewpoint will be to split any non-rcu_read_lock()
> RCU read-side critical section at this point. This already happens in a
> few places, for example, rcu_note_context_switch() constitutes an RCU
> quiescent state despite being invoked with interrupts disabled (as is
> required!). The __schedule() function just needs to understand (and does
> understand) that the RCU read-side critical section that would otherwise
> span that call to rcu_note_context_switch() is split in two by that call.

Okay. So I read this as invoking schedule() at this point is okay.
Looking at this again, this could also happen on a PREEMPT=y kernel if
the kernel decides to preempt a task within a rcu_read_lock() section
and put it back later on another CPU.

> However, if this was instead an rcu_read_lock() critical section within
> a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> RCU would consider that critical section to be preempted. This means
> that any RCU grace period that is blocked by this RCU read-side critical
> section would remain blocked until stop_one_cpu() resumed, returned,
> and so on until the matching rcu_read_unlock() was reached. In other
> words, RCU would consider that RCU read-side critical section to span
> the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().

Isn't that my example from above and what we do in RT? My understanding
is that this is the reason why we need BOOST on RT otherwise the RCU
critical section could remain blocked for some time.

> On the other hand, within a PREEMPT=n kernel, the call to schedule()
> would split even an rcu_read_lock() critical section. Which is why I
> asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> complaints in that case.

sleeping_lock_inc() +dec() is only RT specific. It is part of RT's
spin_lock() implementation and used by RCU (rcu_note_context_switch())
to not complain if invoked within a critical section.

> Does that help, or am I missing the point?
>
> Thanx, Paul
Sebastian

2019-08-27 13:10:48

by Joel Fernandes

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Tue, Aug 27, 2019 at 11:23:33AM +0200, Sebastian Andrzej Siewior wrote:
[snip]
> > However, if this was instead an rcu_read_lock() critical section within
> > a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> > RCU would consider that critical section to be preempted. This means
> > that any RCU grace period that is blocked by this RCU read-side critical
> > section would remain blocked until stop_one_cpu() resumed, returned,
> > and so on until the matching rcu_read_unlock() was reached. In other
> > words, RCU would consider that RCU read-side critical section to span
> > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
>
> Isn't that my example from above and what we do in RT? My understanding
> is that this is the reason why we need BOOST on RT otherwise the RCU
> critical section could remain blocked for some time.

Not just for boost, it is needed to block the grace period itself on
PREEMPT=y. On PREEMPT=y, if rcu_note_context_switch() happens in the middle of a
rcu_read_lock() reader section, then the task is added to a blocked list
(rcu_preempt_ctxt_queue). Then just after that, the CPU reports a QS state
(rcu_qs()) as you can see in the PREEMPT=y implementation of
rcu_note_context_switch(). Even though the CPU has reported a QS, the grace
period will not end because the preempted (or block as could be in -rt) task
is still blocking the grace period. This is fundamental to the function of
Preemptible-RCU where there is the concept of tasks blocking a grace period,
not just CPUs.

I think what Paul is trying to explain AIUI (Paul please let me know if I
missed something):

(1) Anyone calling rcu_note_context_switch() and expecting it to respect
RCU-readers that are readers as a result of interrupt disabled regions, has
incorrect expectations. So calling rcu_note_context_switch() has to be done
carefully.

(2) Disabling interrupts is "generally" implied as an RCU-sched flavor
reader. However, invoking rcu_note_context_switch() from a disabled interrupt
region is *required* for rcu_note_context_switch() to work correctly.

(3) On PREEMPT=y kernels, invoking rcu_note_context_switch() from an
interrupt disabled region does not mean that the task will be added to a
blocked list (unless it is also in an RCU-preempt reader) so
rcu_note_context_switch() may immediately report a quiescent state and
nothing blocks the grace period.
So callers of rcu_note_context_switch() must be aware of this behavior.

(4) On PREEMPT=n, unlike PREEMPT=y, there is no blocked list handling and so
nothing will block the grace period once rcu_note_context_switch() is called.
So any path calling rcu_note_context_switch() on a PREEMPT=n kernel, in the
middle of something that is expected to be an RCU reader would be really bad
from an RCU view point.

Probably, we should add this all to documentation somewhere.

thanks!

- Joel


> > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > would split even an rcu_read_lock() critical section. Which is why I
> > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > complaints in that case.
>
> sleeping_lock_inc() +dec() is only RT specific. It is part of RT's
> spin_lock() implementation and used by RCU (rcu_note_context_switch())
> to not complain if invoked within a critical section.
>
> > Does that help, or am I missing the point?
> >
> > Thanx, Paul
> Sebastian

2019-08-27 15:55:46

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Tue, Aug 27, 2019 at 11:23:33AM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-26 09:29:45 [-0700], Paul E. McKenney wrote:
> > > The mechanism that is used here may change in future. I just wanted to
> > > make sure that from RCU's side it is okay to schedule here.
> >
> > Good point.
> >
> > The effect from RCU's viewpoint will be to split any non-rcu_read_lock()
> > RCU read-side critical section at this point. This already happens in a
> > few places, for example, rcu_note_context_switch() constitutes an RCU
> > quiescent state despite being invoked with interrupts disabled (as is
> > required!). The __schedule() function just needs to understand (and does
> > understand) that the RCU read-side critical section that would otherwise
> > span that call to rcu_note_context_switch() is split in two by that call.
>
> Okay. So I read this as invoking schedule() at this point is okay.

As long as no one is relying on a non-rcu_read_lock() RCU
read-side critical section (local_bh_disable(), preempt_disable(),
local_irq_disable(), ...) spanning this call. But that depends on the
calling code and on other code it interacts with it, not on any specific
need on the part of RCU itself.

> Looking at this again, this could also happen on a PREEMPT=y kernel if
> the kernel decides to preempt a task within a rcu_read_lock() section
> and put it back later on another CPU.

This is an rcu_read_lock() critical section, so yes, on a PREEMPT=y
kernel, executing schedule() will cause the corresponding RCU read-side
critical section to persist, following the preempted tasks. Give or
take lockdep complaints.

On a PREEMPT=n kernel, schedule() within an RCU read-side critical
section instead results in that critical section being split in two.
And this will also result in lockdep complaints.

> > However, if this was instead an rcu_read_lock() critical section within
> > a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> > RCU would consider that critical section to be preempted. This means
> > that any RCU grace period that is blocked by this RCU read-side critical
> > section would remain blocked until stop_one_cpu() resumed, returned,
> > and so on until the matching rcu_read_unlock() was reached. In other
> > words, RCU would consider that RCU read-side critical section to span
> > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
>
> Isn't that my example from above and what we do in RT? My understanding
> is that this is the reason why we need BOOST on RT otherwise the RCU
> critical section could remain blocked for some time.

At this point, I must confess that I have lost track of whose example
it is. It was first reported in 2006, if I remember correctly. ;-)

But yes, you are correct, the point of RCU priority boosting is to
cause tasks that have been preempted while within RCU read-side critical
sections to be scheduled so that they can reach their rcu_read_unlock()
calls, thus allowing the current grace period to end.

> > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > would split even an rcu_read_lock() critical section. Which is why I
> > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > complaints in that case.
>
> sleeping_lock_inc() and sleeping_lock_dec() are RT-specific. They are part
> of RT's spin_lock() implementation and are used by RCU
> (rcu_note_context_switch()) to not complain if invoked within a critical
> section.

Then this is being called when we have something like this, correct?

DEFINE_SPINLOCK(mylock); // As opposed to DEFINE_RAW_SPINLOCK().

...

rcu_read_lock();
do_something();
spin_lock(&mylock); // Can block in -rt, thus needs sleeping_lock_inc()
...
rcu_read_unlock();

Without sleeping_lock_inc(), lockdep would complain about a voluntary
schedule within an RCU read-side critical section. But in -rt, voluntary
schedules due to sleeping on a "spinlock" are OK.

Am I understanding this correctly?
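
As a sanity check on my own understanding, the suppression can be
caricatured in userspace C (mock types and globals; the real state is in
task_struct and the check lives in rcu_note_context_switch()):

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the two relevant task_struct fields -- not the kernel's layout. */
struct task {
	int rcu_read_lock_nesting;
	int sleeping_lock;
};

static struct task current_task;

/* -rt's spin_lock() bumps the counter before sleeping on the lock. */
static void sleeping_lock_inc(void) { current_task.sleeping_lock++; }
static void sleeping_lock_dec(void) { current_task.sleeping_lock--; }

/*
 * Caricature of the condition behind the WARN_ON_ONCE() in
 * rcu_note_context_switch(): complain only about a voluntary schedule
 * inside an RCU reader that is NOT due to sleeping on an -rt "spinlock".
 */
static bool should_warn(bool preempt)
{
	return !preempt &&
	       current_task.rcu_read_lock_nesting > 0 &&
	       !current_task.sleeping_lock;
}
```

A nonzero counter marks the current voluntary schedule as sleeping on an
-rt spinlock, so the check stays armed for every other kind of block.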

Thanx, Paul

2019-08-27 15:59:49

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Tue, Aug 27, 2019 at 09:08:53AM -0400, Joel Fernandes wrote:
> On Tue, Aug 27, 2019 at 11:23:33AM +0200, Sebastian Andrzej Siewior wrote:
> [snip]
> > > However, if this was instead an rcu_read_lock() critical section within
> > > a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> > > RCU would consider that critical section to be preempted. This means
> > > that any RCU grace period that is blocked by this RCU read-side critical
> > > section would remain blocked until stop_one_cpu() resumed, returned,
> > > and so on until the matching rcu_read_unlock() was reached. In other
> > > words, RCU would consider that RCU read-side critical section to span
> > > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
> >
> > Isn't that my example from above and what we do in RT? My understanding
> > is that this is the reason why we need BOOST on RT otherwise the RCU
> > critical section could remain blocked for some time.
>
> Not just for boost, it is needed to block the grace period itself on
> PREEMPT=y. On PREEMPT=y, if rcu_note_context_switch() happens in the middle
> of a rcu_read_lock() reader section, then the task is added to a blocked
> list (rcu_preempt_ctxt_queue). Then just after that, the CPU reports a
> quiescent state (rcu_qs()), as you can see in the PREEMPT=y implementation
> of rcu_note_context_switch(). Even though the CPU has reported a QS, the
> grace period will not end because the preempted (or blocked, as it could be
> in -rt) task is still blocking the grace period. This is fundamental to the
> function of Preemptible-RCU, where there is the concept of tasks blocking a
> grace period, not just CPUs.
>
> I think what Paul is trying to explain AIUI (Paul, please let me know if I
> missed something):
>
> (1) Anyone calling rcu_note_context_switch() and expecting it to respect
> RCU readers that exist as a result of interrupt-disabled regions has
> incorrect expectations. So calling rcu_note_context_switch() has to be done
> carefully.
>
> (2) Disabling interrupts "generally" implies an RCU-sched flavor reader.
> However, invoking rcu_note_context_switch() from an interrupt-disabled
> region is *required* for rcu_note_context_switch() to work correctly.
>
> (3) On PREEMPT=y kernels, invoking rcu_note_context_switch() from an
> interrupt-disabled region does not mean that the task will be added to a
> blocked list (unless it is also in an RCU-preempt reader), so
> rcu_note_context_switch() may immediately report a quiescent state and
> nothing blocks the grace period. So callers of rcu_note_context_switch()
> must be aware of this behavior.
>
> (4) On PREEMPT=n, unlike PREEMPT=y, there is no blocked-list handling, so
> nothing will block the grace period once rcu_note_context_switch() is
> called. So any path calling rcu_note_context_switch() on a PREEMPT=n
> kernel, in the middle of something that is expected to be an RCU reader,
> would be really bad from an RCU viewpoint.
>
> Probably, we should add this all to documentation somewhere.
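
The queue-then-report ordering described in the quoted text can be sketched
as a toy single-CPU model in plain C (mock state only; the real
rcu_preempt_ctxt_queue() and rcu_qs() are far more involved):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model of a preemptible-RCU grace period. */
static bool cpu_reported_qs;
static int  tasks_blocking_gp;	/* stand-in for the blocked-task list */

/* Caricature of PREEMPT=y rcu_note_context_switch(). */
static void note_context_switch(int rcu_read_lock_nesting)
{
	/* A task switched out inside a reader is queued as blocking the GP... */
	if (rcu_read_lock_nesting > 0)
		tasks_blocking_gp++;
	/* ...and the CPU reports its quiescent state regardless (rcu_qs()). */
	cpu_reported_qs = true;
}

/* rcu_read_unlock() of a queued task eventually dequeues it. */
static void reader_unlocks(void)
{
	tasks_blocking_gp--;
}

/* The grace period needs the CPU's QS *and* an empty blocked list. */
static bool grace_period_can_end(void)
{
	return cpu_reported_qs && tasks_blocking_gp == 0;
}
```

This is why the CPU reporting a QS right after queuing the task does not end
the grace period: the queued task still blocks it until its rcu_read_unlock().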

I think that Sebastian understands this and was using the example of RCU
priority boosting to confirm his understanding. But documentation would
be good. Extremely difficult to keep current, but good. I believe that
the requirements document does cover this.

Thanx, Paul

> thanks!
>
> - Joel
>
>
> > > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > > would split even an rcu_read_lock() critical section. Which is why I
> > > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > > complaints in that case.
> >
> > sleeping_lock_inc() and sleeping_lock_dec() are RT-specific. They are part
> > of RT's spin_lock() implementation and are used by RCU
> > (rcu_note_context_switch()) to not complain if invoked within a critical
> > section.
> >
> > > Does that help, or am I missing the point?
> > >
> > > Thanx, Paul
> > Sebastian

2019-08-27 16:07:53

by Joel Fernandes

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Tue, Aug 27, 2019 at 08:58:13AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 27, 2019 at 09:08:53AM -0400, Joel Fernandes wrote:
> > On Tue, Aug 27, 2019 at 11:23:33AM +0200, Sebastian Andrzej Siewior wrote:
> > [snip]
> > > > However, if this was instead an rcu_read_lock() critical section within
> > > > a PREEMPT=y kernel, then if a schedule() occurred within stop_one_cpu(),
> > > > RCU would consider that critical section to be preempted. This means
> > > > that any RCU grace period that is blocked by this RCU read-side critical
> > > > section would remain blocked until stop_one_cpu() resumed, returned,
> > > > and so on until the matching rcu_read_unlock() was reached. In other
> > > > words, RCU would consider that RCU read-side critical section to span
> > > > the call to stop_one_cpu() even if stop_one_cpu() invoked schedule().
> > >
> > > Isn't that my example from above and what we do in RT? My understanding
> > > is that this is the reason why we need BOOST on RT otherwise the RCU
> > > critical section could remain blocked for some time.
> >
> > Not just for boost, it is needed to block the grace period itself on
> > PREEMPT=y. On PREEMPT=y, if rcu_note_context_switch() happens in the middle
> > of a rcu_read_lock() reader section, then the task is added to a blocked
> > list (rcu_preempt_ctxt_queue). Then just after that, the CPU reports a
> > quiescent state (rcu_qs()), as you can see in the PREEMPT=y implementation
> > of rcu_note_context_switch(). Even though the CPU has reported a QS, the
> > grace period will not end because the preempted (or blocked, as it could be
> > in -rt) task is still blocking the grace period. This is fundamental to the
> > function of Preemptible-RCU, where there is the concept of tasks blocking a
> > grace period, not just CPUs.
> >
> > I think what Paul is trying to explain AIUI (Paul, please let me know if I
> > missed something):
> >
> > (1) Anyone calling rcu_note_context_switch() and expecting it to respect
> > RCU readers that exist as a result of interrupt-disabled regions has
> > incorrect expectations. So calling rcu_note_context_switch() has to be done
> > carefully.
> >
> > (2) Disabling interrupts "generally" implies an RCU-sched flavor reader.
> > However, invoking rcu_note_context_switch() from an interrupt-disabled
> > region is *required* for rcu_note_context_switch() to work correctly.
> >
> > (3) On PREEMPT=y kernels, invoking rcu_note_context_switch() from an
> > interrupt-disabled region does not mean that the task will be added to a
> > blocked list (unless it is also in an RCU-preempt reader), so
> > rcu_note_context_switch() may immediately report a quiescent state and
> > nothing blocks the grace period. So callers of rcu_note_context_switch()
> > must be aware of this behavior.
> >
> > (4) On PREEMPT=n, unlike PREEMPT=y, there is no blocked-list handling, so
> > nothing will block the grace period once rcu_note_context_switch() is
> > called. So any path calling rcu_note_context_switch() on a PREEMPT=n
> > kernel, in the middle of something that is expected to be an RCU reader,
> > would be really bad from an RCU viewpoint.
> >
> > Probably, we should add this all to documentation somewhere.
>
> I think that Sebastian understands this and was using the example of RCU
> priority boosting to confirm his understanding. But documentation would
> be good. Extremely difficult to keep current, but good. I believe that
> the requirements document does cover this.

Oh ok, got it. Sorry about the noise then!

(In a way, I was just thinking out loud since this is a slightly confusing
topic :-P and an archive link to this discussion serves a great purpose in
my notes :-D :-)).

thanks!

- Joel

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > > would split even an rcu_read_lock() critical section. Which is why I
> > > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > > complaints in that case.
> >
> > sleeping_lock_inc() and sleeping_lock_dec() are RT-specific. They are part
> > of RT's spin_lock() implementation and are used by RCU
> > (rcu_note_context_switch()) to not complain if invoked within a critical
> > section.
>
> Then this is being called when we have something like this, correct?
>
> DEFINE_SPINLOCK(mylock); // As opposed to DEFINE_RAW_SPINLOCK().
>
> ...
>
> rcu_read_lock();
> do_something();
> spin_lock(&mylock); // Can block in -rt, thus needs sleeping_lock_inc()
> ...
> rcu_read_unlock();
>
> Without sleeping_lock_inc(), lockdep would complain about a voluntary
> schedule within an RCU read-side critical section. But in -rt, voluntary
> schedules due to sleeping on a "spinlock" are OK.
>
> Am I understanding this correctly?

Everything perfect except that it is not lockdep complaining but the
WARN_ON_ONCE() in rcu_note_context_switch().

>
> Thanx, Paul

Sebastian

2019-08-28 12:56:08

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, Aug 28, 2019 at 11:27:39AM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > > On the other hand, within a PREEMPT=n kernel, the call to schedule()
> > > > would split even an rcu_read_lock() critical section. Which is why I
> > > > asked earlier if sleeping_lock_inc() and sleeping_lock_dec() are no-ops
> > > > in !PREEMPT_RT_BASE kernels. We would after all want the usual lockdep
> > > > complaints in that case.
> > >
> > > sleeping_lock_inc() and sleeping_lock_dec() are RT-specific. They are part
> > > of RT's spin_lock() implementation and are used by RCU
> > > (rcu_note_context_switch()) to not complain if invoked within a critical
> > > section.
> >
> > Then this is being called when we have something like this, correct?
> >
> > DEFINE_SPINLOCK(mylock); // As opposed to DEFINE_RAW_SPINLOCK().
> >
> > ...
> >
> > rcu_read_lock();
> > do_something();
> > spin_lock(&mylock); // Can block in -rt, thus needs sleeping_lock_inc()
> > ...
> > rcu_read_unlock();
> >
> > Without sleeping_lock_inc(), lockdep would complain about a voluntary
> > schedule within an RCU read-side critical section. But in -rt, voluntary
> > schedules due to sleeping on a "spinlock" are OK.
> >
> > Am I understanding this correctly?
>
> Everything perfect except that it is not lockdep complaining but the
> WARN_ON_ONCE() in rcu_note_context_switch().

This one, right?

WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);

Another approach would be to change that WARN_ON_ONCE(). This fix might
be too extreme, as it would suppress other issues:

WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);

But maybe what is happening under the covers is that preempt is being
set when sleeping on a spinlock. Is that the case?

Thanx, Paul

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On 2019-08-28 05:54:26 [-0700], Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 11:27:39AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > Am I understanding this correctly?
> >
> > Everything perfect except that it is not lockdep complaining but the
> > WARN_ON_ONCE() in rcu_note_context_switch().
>
> This one, right?
>
> WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
>
> Another approach would be to change that WARN_ON_ONCE(). This fix might
> be too extreme, as it would suppress other issues:
>
> WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);
>
> But maybe what is happening under the covers is that preempt is being
> set when sleeping on a spinlock. Is that the case?

I would like to keep that check and that is why we have:

| #if defined(CONFIG_PREEMPT_RT_FULL)
| sleeping_l = t->sleeping_lock;
| #endif
| WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);

in -RT, and ->sleeping_lock is the counter that is incremented in
spin_lock(). And the only reason why sleeping_lock_inc() was used in the
patch was to disable _this_ warning.

> Thanx, Paul

Sebastian

2019-08-28 14:01:05

by Joel Fernandes

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, Aug 28, 2019 at 03:14:33PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-28 05:54:26 [-0700], Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 11:27:39AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > > Am I understanding this correctly?
> > >
> > > Everything perfect except that it is not lockdep complaining but the
> > > WARN_ON_ONCE() in rcu_note_context_switch().
> >
> > This one, right?
> >
> > WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
> >
> > Another approach would be to change that WARN_ON_ONCE(). This fix might
> > be too extreme, as it would suppress other issues:
> >
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);
> >
> > But maybe what is happening under the covers is that preempt is being
> > set when sleeping on a spinlock. Is that the case?
>
> I would like to keep that check and that is why we have:
>
> | #if defined(CONFIG_PREEMPT_RT_FULL)
> | sleeping_l = t->sleeping_lock;
> | #endif
> | WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
>
> in -RT, and ->sleeping_lock is the counter that is incremented in
> spin_lock(). And the only reason why sleeping_lock_inc() was used in the
> patch was to disable _this_ warning.

Makes sense, Sebastian.

Paul, you meant "!" in front of the IS_ENABLED in your code snippet, right?

The other issue with:
WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);

.. could be that the warning will be disabled for -rt entirely, not just for
sleeping locks. And we probably do want to keep this warning for the cases in
-rt where we are blocking but not on a sleeping lock. Right?
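
To spell out that difference, here are the two candidate conditions side by
side in plain C, with booleans standing in for the config option and the
counter (the function names are made up for the comparison):

```c
#include <assert.h>
#include <stdbool.h>

/* !IS_ENABLED(CONFIG_PREEMPT_RT_BASE) variant: silent on ALL of -rt. */
static bool warn_config_based(bool is_rt, bool preempt, int nesting,
			      int sleeping_lock)
{
	(void)sleeping_lock;	/* the counter is never consulted */
	return !is_rt && !preempt && nesting > 0;
}

/* sleeping_lock variant: silent only while sleeping on an -rt spinlock. */
static bool warn_counter_based(bool is_rt, bool preempt, int nesting,
			       int sleeping_lock)
{
	(void)is_rt;		/* the counter is simply zero on !RT */
	return !preempt && nesting > 0 && !sleeping_lock;
}
```

On -rt, a voluntary block inside a reader that is *not* a sleeping lock
trips only the counter-based check, which is the case we'd still want to
warn about.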

thanks,

- Joel


2019-08-28 15:52:18

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, Aug 28, 2019 at 03:14:33PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-08-28 05:54:26 [-0700], Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 11:27:39AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > > Am I understanding this correctly?
> > >
> > > Everything perfect except that it is not lockdep complaining but the
> > > WARN_ON_ONCE() in rcu_note_context_switch().
> >
> > This one, right?
> >
> > WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
> >
> > Another approach would be to change that WARN_ON_ONCE(). This fix might
> > be too extreme, as it would suppress other issues:
> >
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);
> >
> > But maybe what is happening under the covers is that preempt is being
> > set when sleeping on a spinlock. Is that the case?
>
> I would like to keep that check and that is why we have:
>
> | #if defined(CONFIG_PREEMPT_RT_FULL)
> | sleeping_l = t->sleeping_lock;
> | #endif
> | WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
>
> in -RT, and ->sleeping_lock is the counter that is incremented in
> spin_lock(). And the only reason why sleeping_lock_inc() was used in the
> patch was to disable _this_ warning.

Very good! I would prefer a (one-line) access function to keep the
number of #if uses down to a dull roar, but other than that, looks good!

(Yeah, I know, this is tree_plugin.h, but still...)
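
Something like this, perhaps (hypothetical helper name, sketched standalone;
the #define at the top just selects the -rt branch so the sketch compiles on
its own):

```c
#include <assert.h>

#define CONFIG_PREEMPT_RT_BASE 1	/* pretend we are on -rt for this sketch */

/* Stand-in for the relevant part of task_struct. */
struct task {
#ifdef CONFIG_PREEMPT_RT_BASE
	int sleeping_lock;
#endif
};

/*
 * Hypothetical one-line accessor: confines the #ifdef to one place so
 * rcu_note_context_switch() could use it unconditionally, e.g.
 * WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 &&
 *              !task_sleeping_lock(t));
 */
static inline int task_sleeping_lock(const struct task *t)
{
#ifdef CONFIG_PREEMPT_RT_BASE
	return t->sleeping_lock;
#else
	return 0;
#endif
}
```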

Thanx, Paul

2019-08-28 15:54:18

by Paul E. McKenney

Subject: Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

On Wed, Aug 28, 2019 at 09:59:38AM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 03:14:33PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-08-28 05:54:26 [-0700], Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2019 at 11:27:39AM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2019-08-27 08:53:06 [-0700], Paul E. McKenney wrote:
> > > > > Am I understanding this correctly?
> > > >
> > > > Everything perfect except that it is not lockdep complaining but the
> > > > WARN_ON_ONCE() in rcu_note_context_switch().
> > >
> > > This one, right?
> > >
> > > WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
> > >
> > > Another approach would be to change that WARN_ON_ONCE(). This fix might
> > > be too extreme, as it would suppress other issues:
> > >
> > > WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);
> > >
> > > But maybe what is happening under the covers is that preempt is being
> > > set when sleeping on a spinlock. Is that the case?
> >
> > I would like to keep that check and that is why we have:
> >
> > | #if defined(CONFIG_PREEMPT_RT_FULL)
> > | sleeping_l = t->sleeping_lock;
> > | #endif
> > | WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
> >
> > in -RT, and ->sleeping_lock is the counter that is incremented in
> > spin_lock(). And the only reason why sleeping_lock_inc() was used in the
> > patch was to disable _this_ warning.
>
> Makes sense, Sebastian.
>
> Paul, you meant "!" in front of the IS_ENABLED in your code snippet, right?
>
> The other issue with:
> WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && !preempt && t->rcu_read_lock_nesting > 0);
>
> .. could be that the warning will be disabled for -rt entirely, not just for
> sleeping locks. And we probably do want to keep this warning for the cases in
> -rt where we are blocking but not on a sleeping lock. Right?

Yes, my code was missing a "!", but I prefer Sebastian's and Scott's
approach to mine anyway. ;-)

Thanx, Paul