2022-08-03 01:24:55

by Liu Song

[permalink] [raw]
Subject: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning

From: Liu Song <[email protected]>

If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
128-core machine, the rcu stall warning will be triggered. Moreover,
sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
and no rcu stall warning will appear after adjustment.

Signed-off-by: Liu Song <[email protected]>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5555e49..82c117e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8879,11 +8879,11 @@ void show_state_filter(unsigned int state_filter)
sched_show_task(p);
}

+ rcu_read_unlock();
#ifdef CONFIG_SCHED_DEBUG
if (!state_filter)
sysrq_sched_debug_show();
#endif
- rcu_read_unlock();
/*
* Only show locks if all tasks are dumped:
*/
--
1.8.3.1



2022-08-03 08:47:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


* Liu Song <[email protected]> wrote:

> From: Liu Song <[email protected]>
>
> If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> 128-core machine, the rcu stall warning will be triggered. Moreover,
> sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> and no rcu stall warning will appear after adjustment.
>

That doesn't mean it doesn't have to be protected by *any* lock - which
your patch implements AFAICS.

There's a couple of lines such as:

for_each_online_cpu(cpu) {

... which need to be protected against CPU hotplug events.

I haven't checked any of the deeper code to see what RCU or other
protection it may need, but clearly you didn't either ...

Thanks,

Ingo

2022-08-03 09:24:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


* Liu Song <[email protected]> wrote:

>
> > * Liu Song <[email protected]> wrote:
> >
> > > From: Liu Song <[email protected]>
> > >
> > > If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> > > a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> > > 128-core machine, the rcu stall warning will be triggered. Moreover,
> > > sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > and no rcu stall warning will appear after adjustment.
> > >
> > That doesn't mean it doesn't have to be protected by *any* lock - which
> > your patch implements AFAICS.
> >
> > There's a couple of lines such as:
> >
> > for_each_online_cpu(cpu) {
>
> Hi,
>
> Here I refer to the implementation of "sysrq_timer_list_show", and I don't
> see any lock.
>
> Maybe there is a problem with the implementation of "sysrq_timer_list_show".

But we are talking about sysrq_sched_debug_show(), which your patch tries
to relax the RCU locking of.

Thanks,

Ingo

2022-08-03 09:24:53

by Liu Song

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


> * Liu Song <[email protected]> wrote:
>
>> From: Liu Song <[email protected]>
>>
>> If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
>> a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
>> 128-core machine, the rcu stall warning will be triggered. Moreover,
>> sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> and no rcu stall warning will appear after adjustment.
>>
> That doesn't mean it doesn't have to be protected by *any* lock - which
> your patch implements AFAICS.
>
> There's a couple of lines such as:
>
> for_each_online_cpu(cpu) {

Hi,

Here I refer to the implementation of "sysrq_timer_list_show", and I
don't see any lock.

Maybe there is a problem with the implementation of "sysrq_timer_list_show".


Thanks

>
> ... which need to be protected against CPU hotplug events.
>
> I haven't checked any of the deeper code to see what RCU or other
> protection it may need, but clearly you didn't either ...
>
> Thanks,
>
> Ingo

2022-08-03 09:49:54

by Liu Song

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning

> * Liu Song <[email protected]> wrote:
>
>>> * Liu Song <[email protected]> wrote:
>>>
>>>> From: Liu Song <[email protected]>
>>>>
>>>> If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
>>>> a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
>>>> 128-core machine, the rcu stall warning will be triggered. Moreover,
>>>> sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> and no rcu stall warning will appear after adjustment.
>>>>
>>> That doesn't mean it doesn't have to be protected by *any* lock - which
>>> your patch implements AFAICS.
>>>
>>> There's a couple of lines such as:
>>>
>>> for_each_online_cpu(cpu) {
>> Hi,
>>
>> Here I refer to the implementation of "sysrq_timer_list_show", and I don't
>> see any lock.
>>
>> Maybe there is a problem with the implementation of "sysrq_timer_list_show".
> But we are talking about sysrq_sched_debug_show(), which your patch tries
> to relax the RCU locking of.

Hi,

I'm not sure for_each_online_cpu && print_cpu must need a lock to
protect, so I refer to other codes

under kernel that reference the implementation. It looks like some
places use "get_online_cpus" to prevent

cpu hotplug, but many places don't have obvious protection, so I'm also
confused if protection is necessarily

required.


Thanks

>
> Thanks,
>
> Ingo

2022-08-03 13:00:04

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


[ Adding Paul ]

On Wed, 3 Aug 2022 09:18:45 +0800
Liu Song <[email protected]> wrote:

> From: Liu Song <[email protected]>
>
> If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> 128-core machine, the rcu stall warning will be triggered. Moreover,
> sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> and no rcu stall warning will appear after adjustment.
>
> Signed-off-by: Liu Song <[email protected]>
> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5555e49..82c117e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8879,11 +8879,11 @@ void show_state_filter(unsigned int state_filter)
> sched_show_task(p);
> }
>
> + rcu_read_unlock();
> #ifdef CONFIG_SCHED_DEBUG
> if (!state_filter)
> sysrq_sched_debug_show();

If this is just because sysrq_sched_debug_show() is very slow, does RCU
have a way to "touch" it? Like the watchdogs have? That is, to tell RCU
"Yes I know I'm taking a long time, but I'm still making forward progress,
don't complain about me". Then the sysrq_sched_debug_show() could have:

for_each_online_cpu(cpu) {
/*
* Need to reset softlockup watchdogs on all CPUs, because
* another CPU might be blocked waiting for us to process
* an IPI or stop_machine.
*/
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();
+ touch_rcu();
print_cpu(NULL, cpu);
}

??

-- Steve

> #endif
> - rcu_read_unlock();
> /*
> * Only show locks if all tasks are dumped:
> */


2022-08-03 14:24:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning

On Wed, Aug 03, 2022 at 08:42:35AM -0400, Steven Rostedt wrote:
>
> [ Adding Paul ]
>
> On Wed, 3 Aug 2022 09:18:45 +0800
> Liu Song <[email protected]> wrote:
>
> > From: Liu Song <[email protected]>
> >
> > If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> > a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> > 128-core machine, the rcu stall warning will be triggered. Moreover,
> > sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> > and no rcu stall warning will appear after adjustment.
> >
> > Signed-off-by: Liu Song <[email protected]>
> > ---
> > kernel/sched/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5555e49..82c117e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8879,11 +8879,11 @@ void show_state_filter(unsigned int state_filter)
> > sched_show_task(p);
> > }
> >
> > + rcu_read_unlock();
> > #ifdef CONFIG_SCHED_DEBUG
> > if (!state_filter)
> > sysrq_sched_debug_show();
>
> If this is just because sysrq_sched_debug_show() is very slow, does RCU
> have a way to "touch" it? Like the watchdogs have? That is, to tell RCU
> "Yes I know I'm taking a long time, but I'm still making forward progress,
> don't complain about me". Then the sysrq_sched_debug_show() could have:
>
> for_each_online_cpu(cpu) {
> /*
> * Need to reset softlockup watchdogs on all CPUs, because
> * another CPU might be blocked waiting for us to process
> * an IPI or stop_machine.
> */
> touch_nmi_watchdog();
> touch_all_softlockup_watchdogs();
> + touch_rcu();
> print_cpu(NULL, cpu);
> }
>
> ??

There is an rcu_sysrq_start() and rcu_sysrq_end() to suppress this. These
are invoked by __handle_sysrq(). The value of rcu_cpu_stall_suppress
should be non-zero during the sysrq execution, and this should prevent
RCU CPU stall warnings from being printed.

That said, the code currently does not support overlapping calls to the
various functions that suppress RCU CPU stall warnings. Except that
the only other use in current mainline is rcu_panic(), which never
unsuppresses.

So could you please check the value of rcu_cpu_stall_suppress?
Just in case some other form of suppression was added somewhere
that I missed?

Thanx, Paul

> -- Steve
>
> > #endif
> > - rcu_read_unlock();
> > /*
> > * Only show locks if all tasks are dumped:
> > */
>

2022-08-03 17:45:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


* Paul E. McKenney <[email protected]> wrote:

> There is an rcu_sysrq_start() and rcu_sysrq_end() to suppress this.
> These are invoked by __handle_sysrq(). The value of
> rcu_cpu_stall_suppress should be non-zero during the sysrq execution, and
> this should prevent RCU CPU stall warnings from being printed.
>
> That said, the code currently does not support overlapping calls to the
> various functions that suppress RCU CPU stall warnings. Except that the
> only other use in current mainline is rcu_panic(), which never
> unsuppresses.
>
> So could you please check the value of rcu_cpu_stall_suppress? Just in
> case some other form of suppression was added somewhere that I missed?

So instead of supressing the (justified!) RCU stall messages, I'd much
rather we apply only the minimal locking necessary for this debug printout.

That should also solve the stall warnings as a side effect.

Thanks,

Ingo

2022-08-03 17:49:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning


* Steven Rostedt <[email protected]> wrote:

>
> [ Adding Paul ]
>
> On Wed, 3 Aug 2022 09:18:45 +0800
> Liu Song <[email protected]> wrote:
>
> > From: Liu Song <[email protected]>
> >
> > If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> > a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> > 128-core machine, the rcu stall warning will be triggered. Moreover,
> > sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> > and no rcu stall warning will appear after adjustment.
> >
> > Signed-off-by: Liu Song <[email protected]>
> > ---
> > kernel/sched/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5555e49..82c117e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8879,11 +8879,11 @@ void show_state_filter(unsigned int state_filter)
> > sched_show_task(p);
> > }
> >
> > + rcu_read_unlock();
> > #ifdef CONFIG_SCHED_DEBUG
> > if (!state_filter)
> > sysrq_sched_debug_show();
>
> If this is just because sysrq_sched_debug_show() is very slow, does RCU
> have a way to "touch" it? Like the watchdogs have? That is, to tell RCU
> "Yes I know I'm taking a long time, but I'm still making forward progress,
> don't complain about me". Then the sysrq_sched_debug_show() could have:
>
> for_each_online_cpu(cpu) {
> /*
> * Need to reset softlockup watchdogs on all CPUs, because
> * another CPU might be blocked waiting for us to process
> * an IPI or stop_machine.
> */
> touch_nmi_watchdog();
> touch_all_softlockup_watchdogs();
> + touch_rcu();
> print_cpu(NULL, cpu);
> }

I'd much rather we use the specific exclusion primitive suitable for that
sequence - in that case it should be cpus_read_lock()/unlock() I suspect.

But the entire code sequence should be reviewed - do we anywhere walk task
lists that need RCU protection?

My main complaint was that we cannot just randomly drop the RCU lock with
no inspection of the underlying code.

Ingo

2022-08-03 17:57:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu stall warning

On Wed, Aug 03, 2022 at 07:14:52PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > There is an rcu_sysrq_start() and rcu_sysrq_end() to suppress this.
> > These are invoked by __handle_sysrq(). The value of
> > rcu_cpu_stall_suppress should be non-zero during the sysrq execution, and
> > this should prevent RCU CPU stall warnings from being printed.
> >
> > That said, the code currently does not support overlapping calls to the
> > various functions that suppress RCU CPU stall warnings. Except that the
> > only other use in current mainline is rcu_panic(), which never
> > unsuppresses.
> >
> > So could you please check the value of rcu_cpu_stall_suppress? Just in
> > case some other form of suppression was added somewhere that I missed?
>
> So instead of supressing the (justified!) RCU stall messages, I'd much
> rather we apply only the minimal locking necessary for this debug printout.
>
> That should also solve the stall warnings as a side effect.

I am certainly with you in spirit! If I recall correctly, the issue
that led to the current state was that there was no way to walk the
task list locklessly except under an RCU read-side critical section.
Yes, you can use get_task_struct(), but that only prevents that task
structure from being freed, not from being removed from the list.

Here is hoping that there is a better way to nail down a task while
RCU-pausing a task-list traversal. Thoughts?

Thanx, Paul