2019-12-26 08:39:21

by liwei (GF)

Subject: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t

The lengthy output of sysrq-t can take a long time to print on a slow
serial console when there are many processes and CPUs.

So we need to reset the NMI watchdog to avoid spurious lockup messages,
and we also need to reset the softlockup watchdogs on all other CPUs,
since another CPU might be blocked waiting for us to process an IPI or
stop_machine.

Add these resets to sysrq_sched_debug_show(), as is already done in
show_state_filter().

Signed-off-by: Wei Li <[email protected]>
---
kernel/sched/debug.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f7e4579e746c..879d3ccf3806 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
 	int cpu;
 
 	sched_debug_header(NULL);
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		/*
+		 * Need to reset softlockup watchdogs on all CPUs, because
+		 * another CPU might be blocked waiting for us to process
+		 * an IPI or stop_machine.
+		 */
+		touch_nmi_watchdog();
+		touch_all_softlockup_watchdogs();
 		print_cpu(NULL, cpu);
-
+	}
 }
 
 /*
--
2.17.1


2020-01-02 19:46:53

by Steven Rostedt

Subject: Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t

On Thu, 26 Dec 2019 16:52:24 +0800
Wei Li <[email protected]> wrote:

> The lengthy output of sysrq-t can take a long time to print on a slow
> serial console when there are many processes and CPUs.
>
> So we need to reset the NMI watchdog to avoid spurious lockup messages,
> and we also need to reset the softlockup watchdogs on all other CPUs,
> since another CPU might be blocked waiting for us to process an IPI or
> stop_machine.

Have you had this triggered?

>
> Add these resets to sysrq_sched_debug_show(), as is already done in
> show_state_filter().
>
> Signed-off-by: Wei Li <[email protected]>
> ---
> kernel/sched/debug.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index f7e4579e746c..879d3ccf3806 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
>  	int cpu;
>
>  	sched_debug_header(NULL);
> -	for_each_online_cpu(cpu)
> +	for_each_online_cpu(cpu) {
> +		/*
> +		 * Need to reset softlockup watchdogs on all CPUs, because
> +		 * another CPU might be blocked waiting for us to process
> +		 * an IPI or stop_machine.
> +		 */
> +		touch_nmi_watchdog();
> +		touch_all_softlockup_watchdogs();

Adding this doesn't seem to hurt, thus:

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

>  		print_cpu(NULL, cpu);
> -
> +	}
>  }
>
>  /*

2020-01-03 01:55:59

by liwei (GF)

Subject: Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t

Hi Steven,
Yes, it can be triggered on the Hi1620 system (128 cores) as follows:
stress-ng -c 50 &
stress-ng -m 50 &
stress-ng -i 20 &
echo 7 > /proc/sys/kernel/printk
echo t > /proc/sysrq-trigger

Then a soft lockup is reported on a migration thread:
[39636.303531] watchdog: BUG: soft lockup - CPU#67 stuck for 23s! [migration/67:348]
The migration thread is waiting for the CPU handling sysrq-t to process stop_two_cpus().

Thanks,
Wei

On 2020/1/3 3:45, Steven Rostedt wrote:
> On Thu, 26 Dec 2019 16:52:24 +0800
> Wei Li <[email protected]> wrote:
>
>> The lengthy output of sysrq-t can take a long time to print on a slow
>> serial console when there are many processes and CPUs.
>>
>> So we need to reset the NMI watchdog to avoid spurious lockup messages,
>> and we also need to reset the softlockup watchdogs on all other CPUs,
>> since another CPU might be blocked waiting for us to process an IPI or
>> stop_machine.
>
> Have you had this triggered?
>
>>
>> Add these resets to sysrq_sched_debug_show(), as is already done in
>> show_state_filter().
>>
>> Signed-off-by: Wei Li <[email protected]>
>> ---
>> kernel/sched/debug.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
>> index f7e4579e746c..879d3ccf3806 100644
>> --- a/kernel/sched/debug.c
>> +++ b/kernel/sched/debug.c
>> @@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
>>  	int cpu;
>>
>>  	sched_debug_header(NULL);
>> -	for_each_online_cpu(cpu)
>> +	for_each_online_cpu(cpu) {
>> +		/*
>> +		 * Need to reset softlockup watchdogs on all CPUs, because
>> +		 * another CPU might be blocked waiting for us to process
>> +		 * an IPI or stop_machine.
>> +		 */
>> +		touch_nmi_watchdog();
>> +		touch_all_softlockup_watchdogs();
>
> Adding this doesn't seem to hurt, thus:
>
> Reviewed-by: Steven Rostedt (VMware) <[email protected]>
>
> -- Steve
>
>>  		print_cpu(NULL, cpu);
>> -
>> +	}
>>  }
>>
>>  /*

2020-01-07 09:34:50

by Peter Zijlstra

Subject: Re: [PATCH] sched/debug: Reset watchdog on all CPUs while processing sysrq-t

On Thu, Jan 02, 2020 at 02:45:14PM -0500, Steven Rostedt wrote:
> On Thu, 26 Dec 2019 16:52:24 +0800
> Wei Li <[email protected]> wrote:
>
> > The lengthy output of sysrq-t can take a long time to print on a slow
> > serial console when there are many processes and CPUs.
> >
> > So we need to reset the NMI watchdog to avoid spurious lockup messages,
> > and we also need to reset the softlockup watchdogs on all other CPUs,
> > since another CPU might be blocked waiting for us to process an IPI or
> > stop_machine.
>
> Have you had this triggered?
>
> >
> > Add these resets to sysrq_sched_debug_show(), as is already done in
> > show_state_filter().
> >
> > Signed-off-by: Wei Li <[email protected]>
> > ---
> > kernel/sched/debug.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > index f7e4579e746c..879d3ccf3806 100644
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
> >  	int cpu;
> >
> >  	sched_debug_header(NULL);
> > -	for_each_online_cpu(cpu)
> > +	for_each_online_cpu(cpu) {
> > +		/*
> > +		 * Need to reset softlockup watchdogs on all CPUs, because
> > +		 * another CPU might be blocked waiting for us to process
> > +		 * an IPI or stop_machine.
> > +		 */
> > +		touch_nmi_watchdog();
> > +		touch_all_softlockup_watchdogs();
>
> Adding this doesn't seem to hurt, thus:
>
> Reviewed-by: Steven Rostedt (VMware) <[email protected]>

Thanks!

Subject: [tip: sched/core] sched/debug: Reset watchdog on all CPUs while processing sysrq-t

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 02d4ac5885a18d326b500b94808f0956dcce2832
Gitweb: https://git.kernel.org/tip/02d4ac5885a18d326b500b94808f0956dcce2832
Author: Wei Li <[email protected]>
AuthorDate: Thu, 26 Dec 2019 16:52:24 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 17 Jan 2020 10:19:20 +01:00

sched/debug: Reset watchdog on all CPUs while processing sysrq-t

The lengthy output of sysrq-t can take a long time to print on a slow
serial console when there are many processes and CPUs.

So we need to reset the NMI watchdog to avoid spurious lockup messages,
and we also need to reset the softlockup watchdogs on all other CPUs,
since another CPU might be blocked waiting for us to process an IPI or
stop_machine.

Add these resets to sysrq_sched_debug_show(), as is already done in
show_state_filter().

Signed-off-by: Wei Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/debug.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f7e4579..879d3cc 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -751,9 +751,16 @@ void sysrq_sched_debug_show(void)
 	int cpu;
 
 	sched_debug_header(NULL);
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
+		/*
+		 * Need to reset softlockup watchdogs on all CPUs, because
+		 * another CPU might be blocked waiting for us to process
+		 * an IPI or stop_machine.
+		 */
+		touch_nmi_watchdog();
+		touch_all_softlockup_watchdogs();
 		print_cpu(NULL, cpu);
-
+	}
 }
 
 /*