2023-06-23 01:11:02

by Bagas Sanjaya

[permalink] [raw]
Subject: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> kernel process "kworker/events_power_efficient" uses a lot of cpu power (100% on ESXI 6.7, ~30% on ESXI 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.
>
> dmesg log:
> [ 2430.973102] </TASK>
> [ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
> [ 2430.973241] NMI backtrace for cpu 0
> [ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
> [ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
> [ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
> [ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
> [ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
> [ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
> [ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
> [ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
> [ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
> [ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
> [ 2430.973316] FS: 0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
> [ 2430.973320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
> [ 2430.973374] Call Trace:
> [ 2430.973388] <TASK>
> [ 2430.973390] __local_bh_enable_ip+0x32/0x70
> [ 2430.973413] htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
> [ 2430.973428] htable_gc+0xf/0x30 [xt_hashlimit]
> [ 2430.973440] process_one_work+0x1d4/0x360
> [ 2430.973459] ? process_one_work+0x360/0x360
> [ 2430.973467] worker_thread+0x25/0x3b0
> [ 2430.973476] ? process_one_work+0x360/0x360
> [ 2430.973483] kthread+0xe1/0x110
> [ 2430.973499] ? kthread_complete_and_exit+0x20/0x20
> [ 2430.973507] ret_from_fork+0x1f/0x30
> [ 2430.973526] </TASK>

See Bugzilla for the full thread and perf output.

Anyway, I'm tracking it in regzbot so that it doesn't fall through
cracks unnoticed:

#regzbot introduced: v5.17.3..v5.19.17 https://bugzilla.kernel.org/show_bug.cgi?id=217586
#regzbot title: kworker/events_power_efficient utilizes full CPU power after kernel upgrade

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217586

--
An old man doll... just what I always wanted! - Clara


2023-06-23 02:12:06

by Steven Rostedt

[permalink] [raw]
Subject: Re: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On Fri, 23 Jun 2023 07:58:51 +0700
Bagas Sanjaya <[email protected]> wrote:

> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > kernel process "kworker/events_power_efficient" uses a lot of cpu power (100% on ESXI 6.7, ~30% on ESXI 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.
> >
> > dmesg log:
> > [ 2430.973102] </TASK>
> > [ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
> > [ 2430.973241] NMI backtrace for cpu 0
> > [ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
> > [ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
> > [ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
> > [ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
> > [ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
> > [ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
> > [ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
> > [ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
> > [ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
> > [ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
> > [ 2430.973316] FS: 0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
> > [ 2430.973320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
> > [ 2430.973374] Call Trace:
> > [ 2430.973388] <TASK>
> > [ 2430.973390] __local_bh_enable_ip+0x32/0x70
> > [ 2430.973413] htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
> > [ 2430.973428] htable_gc+0xf/0x30 [xt_hashlimit]

I take it that the "gc" in "htable_gc" means "garbage collection". It may
not have anything to do with changes to this thread. It could very well be
something is feeding it too much garbage!

-- Steve


> > [ 2430.973440] process_one_work+0x1d4/0x360
> > [ 2430.973459] ? process_one_work+0x360/0x360
> > [ 2430.973467] worker_thread+0x25/0x3b0
> > [ 2430.973476] ? process_one_work+0x360/0x360
> > [ 2430.973483] kthread+0xe1/0x110
> > [ 2430.973499] ? kthread_complete_and_exit+0x20/0x20
> > [ 2430.973507] ret_from_fork+0x1f/0x30
> > [ 2430.973526] </TASK>
>
> See Bugzilla for the full thread and perf output.
>
> Anyway, I'm tracking it in regzbot so that it doesn't fall through
> cracks unnoticed:
>
> #regzbot introduced: v5.17.3..v5.19.17 https://bugzilla.kernel.org/show_bug.cgi?id=217586
> #regzbot title: kworker/events_power_efficient utilizes full CPU power after kernel upgrade
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217586
>


2023-06-27 02:57:02

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On Fri, Jun 23, 2023 at 07:58:51AM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > kernel process "kworker/events_power_efficient" uses a lot of cpu power (100% on ESXI 6.7, ~30% on ESXI 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.
> >
> > dmesg log:
> > [ 2430.973102] </TASK>
> > [ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
> > [ 2430.973241] NMI backtrace for cpu 0
> > [ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
> > [ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
> > [ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
> > [ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
> > [ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
> > [ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
> > [ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
> > [ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
> > [ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
> > [ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
> > [ 2430.973316] FS: 0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
> > [ 2430.973320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
> > [ 2430.973374] Call Trace:
> > [ 2430.973388] <TASK>
> > [ 2430.973390] __local_bh_enable_ip+0x32/0x70
> > [ 2430.973413] htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
> > [ 2430.973428] htable_gc+0xf/0x30 [xt_hashlimit]
> > [ 2430.973440] process_one_work+0x1d4/0x360
> > [ 2430.973459] ? process_one_work+0x360/0x360
> > [ 2430.973467] worker_thread+0x25/0x3b0
> > [ 2430.973476] ? process_one_work+0x360/0x360
> > [ 2430.973483] kthread+0xe1/0x110
> > [ 2430.973499] ? kthread_complete_and_exit+0x20/0x20
> > [ 2430.973507] ret_from_fork+0x1f/0x30
> > [ 2430.973526] </TASK>
>
> See Bugzilla for the full thread and perf output.
>
> Anyway, I'm tracking it in regzbot so that it doesn't fall through
> cracks unnoticed:
>
> #regzbot introduced: v5.17.3..v5.19.17 https://bugzilla.kernel.org/show_bug.cgi?id=217586
> #regzbot title: kworker/events_power_efficient utilizes full CPU power after kernel upgrade
>

The reporter had found the culprit (see Bugzilla for more information), thus
telling regzbot:

#regzbot introduced: 6ad0ad2bf8a67e
#regzbot title: retbleed reporting causes high cpu utilization due to kworker/events_power_efficient
#regzbot link: https://lore.kernel.org/all/PH0PR05MB8448A203A909959FAC754B7AAF439@PH0PR05MB8448.namprd05.prod.outlook.com/
#regzbot link: 6ad0ad2bf8a67e

Thanks.

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (3.12 kB)
signature.asc (235.00 B)
Download all attachments

2023-06-27 07:51:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On Tue, Jun 27, 2023 at 09:29:36AM +0700, Bagas Sanjaya wrote:
> On Fri, Jun 23, 2023 at 07:58:51AM +0700, Bagas Sanjaya wrote:
> > Hi,
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> > > kernel process "kworker/events_power_efficient" uses a lot of cpu power (100% on ESXI 6.7, ~30% on ESXI 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.
> > >
> > > dmesg log:
> > > [ 2430.973102] </TASK>
> > > [ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
> > > [ 2430.973241] NMI backtrace for cpu 0
> > > [ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
> > > [ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > > [ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
> > > [ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
> > > [ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
> > > [ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
> > > [ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
> > > [ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
> > > [ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
> > > [ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
> > > [ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
> > > [ 2430.973316] FS: 0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
> > > [ 2430.973320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
> > > [ 2430.973374] Call Trace:
> > > [ 2430.973388] <TASK>
> > > [ 2430.973390] __local_bh_enable_ip+0x32/0x70
> > > [ 2430.973413] htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
> > > [ 2430.973428] htable_gc+0xf/0x30 [xt_hashlimit]
> > > [ 2430.973440] process_one_work+0x1d4/0x360
> > > [ 2430.973459] ? process_one_work+0x360/0x360
> > > [ 2430.973467] worker_thread+0x25/0x3b0
> > > [ 2430.973476] ? process_one_work+0x360/0x360
> > > [ 2430.973483] kthread+0xe1/0x110
> > > [ 2430.973499] ? kthread_complete_and_exit+0x20/0x20
> > > [ 2430.973507] ret_from_fork+0x1f/0x30
> > > [ 2430.973526] </TASK>
> >
> > See Bugzilla for the full thread and perf output.
> >
> > Anyway, I'm tracking it in regzbot so that it doesn't fall through
> > cracks unnoticed:
> >
> > #regzbot introduced: v5.17.3..v5.19.17 https://bugzilla.kernel.org/show_bug.cgi?id=217586
> > #regzbot title: kworker/events_power_efficient utilizes full CPU power after kernel upgrade
> >
>
> The reporter had found the culprit (see Bugzilla for more information), thus
> telling regzbot:
>
> #regzbot introduced: 6ad0ad2bf8a67e
> #regzbot title: retbleed reporting causes high cpu utilization due to kworker/events_power_efficient
> #regzbot link: https://lore.kernel.org/all/PH0PR05MB8448A203A909959FAC754B7AAF439@PH0PR05MB8448.namprd05.prod.outlook.com/
> #regzbot link: 6ad0ad2bf8a67e

Is there a problem other than that IBRS is slow and slower still on
shitty virt?

Does booting with: "spectre_v2=retpoline retbleed=off" make it go away?

I can't tell from this. Also, please don't use bugzilla.

2023-06-27 10:43:03

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On 6/27/23 14:30, Peter Zijlstra wrote:
> I can't tell from this. Also, please don't use bugzilla.

Why not BZ? I'm confused too...

--
An old man doll... just what I always wanted! - Clara


2023-06-27 11:03:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On Tue, Jun 27, 2023 at 05:15:42PM +0700, Bagas Sanjaya wrote:
> On 6/27/23 14:30, Peter Zijlstra wrote:
> > I can't tell from this. Also, please don't use bugzilla.
>
> Why not BZ? I'm confused too...

Because we have email; and why would I want to touch a browser for this?

2023-06-29 14:29:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

On Thu, Jun 29 2023 at 12:05, Vivek Anand wrote:
> I've tried booting with "spectre_v2=retpoline retbleed=off".
> This change didn't work. Still CPU is 100%

This does not make sense.

retbleed=off has the same effect as CONFIG_X86_IBRS_ENTRY=n.

The only difference is that with CONFIG_X86_IBRS_ENTRY=y and
retbleed=off there is one extra jump in the low level entry code
(syscall, interrupts, exceptions) and one extra jump on the exit side.

But those extra jumps are completely irrelevant for the kworker threads.

Can you please provide dmesg and the content of the files in

/sys/devices/system/cpu/vulnerabilities/

on a kernel booted with "spectre_v2=retpoline retbleed=off" ?

Thanks,

tglx

2023-10-20 10:56:12

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: Fwd: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later

[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 23.06.23 02:58, Bagas Sanjaya wrote:
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
>> kernel process "kworker/events_power_efficient" uses a lot of cpu power (100% on ESXI 6.7, ~30% on ESXI 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.
> [...]
> #regzbot introduced: v5.17.3..v5.19.17 https://bugzilla.kernel.org/show_bug.cgi?id=217586
> #regzbot title: kworker/events_power_efficient utilizes full CPU power after kernel upgrade
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217586

#regzbot resolve: afaics: misconfiguration became more problematic due
to new mitigations
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.