2022-02-08 13:23:14

by Paul Menzel

Subject: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT

Dear Linux folks,


On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+
built with

    $ grep HZ /boot/config-5.17.0-rc1+
    CONFIG_NO_HZ_COMMON=y
    # CONFIG_HZ_PERIODIC is not set
    CONFIG_NO_HZ_IDLE=y
    # CONFIG_NO_HZ_FULL is not set
    CONFIG_NO_HZ=y
    # CONFIG_HZ_100 is not set
    CONFIG_HZ_250=y
    # CONFIG_HZ_300 is not set
    # CONFIG_HZ_1000 is not set
    CONFIG_HZ=250

once warned about a NOHZ tick-stop error, when I executed `sudo
/usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).

```
$ dmesg
[ 0.000000] Linux version 5.17.0-rc1+ ([email protected]) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
[…]
[271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
```
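
For anyone who wants to pull these warnings out of a longer `dmesg` dump, here is a minimal sketch. It is not part of the original report: the function name `parse_nohz_warnings` and the `sample` string are made up for illustration, and the pattern only assumes the message text exactly as it appears in the log above. It also tolerates lines that a mail client has hard-wrapped.

```python
import re

# Matches the warning as it appears in the log above. The timestamp is
# seconds since boot; the handler field is captured as raw text.
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NOHZ tick-stop error:\s+"
    r"Non-RCU local softirq work is\s+pending,\s+handler #(?P<mask>[0-9a-fA-F]+)!!!"
)


def parse_nohz_warnings(log_text):
    """Return (timestamp, handler_text) pairs for each NOHZ tick-stop warning."""
    # Rejoin hard-wrapped lines: a newline that is not followed by '['
    # is treated as a wrap inside one log record.
    joined = re.sub(r"\n(?!\[)", " ", log_text)
    return [(float(m["ts"]), m["mask"]) for m in WARNING_RE.finditer(joined)]


# Hypothetical input: two records copied from the log, wrapped as in the mail.
sample = (
    "[271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is\n"
    "pending, handler #20!!!\n"
    "[271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is\n"
    "pending, handler #20!!!\n"
)

events = parse_nohz_warnings(sample)
for ts, mask in events:
    print(f"{ts:.6f} handler={mask}")
```

Applied to the full excerpt above, this would yield ten records spanning roughly 5.2 seconds, all with the same handler field.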


Kind regards,

Paul


2022-02-08 14:14:05

by Paul A. Clarke

Subject: Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT

On Tue, Feb 08, 2022 at 02:17:03PM +0100, Frederic Weisbecker wrote:
> On Tue, Feb 08, 2022 at 08:32:37AM +0100, Paul Menzel wrote:
> > once warned about a NOHZ tick-stop error, when I executed `sudo
> > /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).
>
> I see, so I assume this sets some CPUs offline, right?

ppc64_cpu --smt=off sets all but the first CPU per core offline.
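
To make that concrete, here is a small sketch of which logical CPUs stay online and which go offline. It is illustrative only: the function names are hypothetical, the core count in the example is made up, and it assumes the hardware threads of one core have consecutive CPU numbers, so that the first thread of core `c` is `c * threads_per_core`.

```python
def cpus_online_after_smt_off(num_cores, threads_per_core):
    """CPU ids that stay online when only the first thread of each core is kept.

    Assumes threads of a core are numbered consecutively.
    """
    return [core * threads_per_core for core in range(num_cores)]


def cpus_taken_offline(num_cores, threads_per_core):
    """CPU ids that are taken offline under the same numbering assumption."""
    online = set(cpus_online_after_smt_off(num_cores, threads_per_core))
    total = num_cores * threads_per_core
    return [cpu for cpu in range(total) if cpu not in online]


# Hypothetical example: 2 cores with 8 hardware threads each (SMT8).
print(cpus_online_after_smt_off(2, 8))
print(cpus_taken_offline(2, 8))
```

Every CPU in the second list goes through the CPU hotplug offline path, which is where the warning in this thread shows up.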

PC

2022-02-09 10:10:48

by Frederic Weisbecker

Subject: Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT

On Tue, Feb 08, 2022 at 08:32:37AM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+ built
> with
>
> $ grep HZ /boot/config-5.17.0-rc1+
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
>
> once warned about a NOHZ tick-stop error, when I executed `sudo
> /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).

I see, so I assume this sets some CPUs offline, right?

>
> ```
> $ dmesg
> [ 0.000000] Linux version 5.17.0-rc1+ ([email protected]) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> […]
> [271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> ```

That's IRQ_POLL_SOFTIRQ. The problem here is probably that some of these
softirqs are pending even though ksoftirqd has been parked.
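
How `#20` maps to IRQ_POLL_SOFTIRQ can be sketched as follows. This assumes the handler field is the CPU's pending-softirq bitmask printed in hexadecimal, which is consistent with the identification above: 0x20 is bit 5, and IRQ_POLL_SOFTIRQ is entry 5 of the softirq enum in include/linux/interrupt.h. The function name `decode_softirq_mask` is made up for illustration.

```python
# Softirq vectors in the order of the enum in include/linux/interrupt.h;
# the list index is the bit position in the pending mask.
SOFTIRQ_NAMES = [
    "HI_SOFTIRQ",
    "TIMER_SOFTIRQ",
    "NET_TX_SOFTIRQ",
    "NET_RX_SOFTIRQ",
    "BLOCK_SOFTIRQ",
    "IRQ_POLL_SOFTIRQ",
    "TASKLET_SOFTIRQ",
    "SCHED_SOFTIRQ",
    "HRTIMER_SOFTIRQ",
    "RCU_SOFTIRQ",
]


def decode_softirq_mask(handler_field):
    """Return the names of the softirqs whose bits are set in the mask.

    Assumption: handler_field is the text after '#' in the warning,
    interpreted as a hexadecimal bitmask.
    """
    mask = int(handler_field, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


print(decode_softirq_mask("20"))
```

Under that assumption, a field with several bits set would decode to several names, for example `0a` to TIMER_SOFTIRQ and NET_RX_SOFTIRQ.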

I see there is irq_poll_cpu_dead(), which migrates the pending queue once
the CPU is finally dead, so the pending work itself is handled correctly.

I'm preparing a patch to fix the warning.

Thanks.

2022-02-09 13:25:53

by Paul Menzel

Subject: Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT

[attach output of `dmesg`]

On 08.02.22 at 08:32, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+
> built with
>
>     $ grep HZ /boot/config-5.17.0-rc1+
>     CONFIG_NO_HZ_COMMON=y
>     # CONFIG_HZ_PERIODIC is not set
>     CONFIG_NO_HZ_IDLE=y
>     # CONFIG_NO_HZ_FULL is not set
>     CONFIG_NO_HZ=y
>     # CONFIG_HZ_100 is not set
>     CONFIG_HZ_250=y
>     # CONFIG_HZ_300 is not set
>     # CONFIG_HZ_1000 is not set
>     CONFIG_HZ=250
>
> once warned about a NOHZ tick-stop error, when I executed `sudo
> /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).
>
> ```
> $ dmesg
> [    0.000000] Linux version 5.17.0-rc1+
> ([email protected]) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> […]
> [271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> [271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
> ```
>
>
> Kind regards,
>
> Paul


Attachments:
linux-5.17-rc1+-nohz-tick-stop-error.txt (176.72 kB)