2024-02-27 08:22:26

by Bagas Sanjaya

Subject: Fwd: Continuous ACPI errors resulting in high CPU usage by journald

Hi,

On Bugzilla, [email protected] reported a stable-specific ACPI error
regression that led to high CPU temperature [1]. He wrote:

> Overview:
>
> After updating from lts v6.6.14-2 to lts v6.6.17-1 noticed high CPU temperature and lag. After running htop noticed that journald was using 30-60% of CPU. Afterwards, tried switching to stable, or lts v6.6.18-1, but encountered the same issue.
>
> Running journalctl -f gives these lines over and over again:
>
> Feb 19 21:09:12 danirybe kernel: ACPI Error: Could not disable RealTimeClock events (20230628/evxfevnt-243)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 08, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0A, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0B, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PM_Timer (0), disabling (20230628/evevent-255)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20230628/evevent-255)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - SleepButton (3), disabling (20230628/evevent-255)
>
> My system info:
>
> Laptop model: ASUS VivoBook D540NV-GQ065T
> OS: Arch Linux x86_64
> Kernel: 6.6.14-2-lts
> WM: sway
> CPU: Intel Pentium N420 (4) @ 2.500GHz
> GPU1: Intel Apollo Lake [HD Graphics 505]
> GPU2: NVIDIA GeForce 920MX
>
> I've pinned down the commit after which the problem occurs:
>
> 847e1eb30e269a094da046c08273abe3f3361cf2 is the first bad commit
> commit 847e1eb30e269a094da046c08273abe3f3361cf2
> Author: Shin'ichiro Kawasaki <[email protected]>
> Date: Mon Jan 8 15:20:58 2024 +0900
>
> platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
>
> commit 5913320eb0b3ec88158cfcb0fa5e996bf4ef681b upstream.
>
> <snipped>...

See Bugzilla for the full thread.
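To gauge how badly the journal is being flooded, the repeated lines quoted above can be tallied per ACPICA source location. The following is only a minimal sketch: the function name `tally_acpi_errors`, the regular expression, and the `SAMPLE` list (copied from the report) are illustrative assumptions, not part of any kernel or systemd tooling.

```python
import re
from collections import Counter

# Matches "ACPI Error: <message> (<ACPICA version>/<source file>-<line>)"
# at the end of a kernel log line. The pattern is an assumption derived
# from the lines quoted in the report.
ACPI_ERROR = re.compile(
    r"ACPI Error: (?P<msg>.+?) \((?P<ver>\d+)/(?P<src>[\w-]+)\)\s*$"
)


def tally_acpi_errors(lines):
    """Count ACPI error lines, keyed by (source location, message)."""
    counts = Counter()
    for line in lines:
        match = ACPI_ERROR.search(line)
        if match:
            counts[(match.group("src"), match.group("msg"))] += 1
    return counts


# Lines copied from the bug report; a real run would iterate over
# journal output instead.
SAMPLE = [
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: Could not disable RealTimeClock events (20230628/evxfevnt-243)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 08, disabling event (20230628/evgpe-839)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0A, disabling event (20230628/evgpe-839)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0B, disabling event (20230628/evgpe-839)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PM_Timer (0), disabling (20230628/evevent-255)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20230628/evevent-255)",
    "Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - SleepButton (3), disabling (20230628/evevent-255)",
]

if __name__ == "__main__":
    for (src, msg), count in tally_acpi_errors(SAMPLE).most_common():
        print(f"{count:6d}  {src:14s}  {msg}")
```

Because the function only iterates over lines, it should work equally well on an open file or on `sys.stdin` fed by `journalctl -k`.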

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218531

--
An old man doll... just what I always wanted! - Clara



2024-02-27 09:58:53

by Shinichiro Kawasaki

Subject: Re: Fwd: Continuous ACPI errors resulting in high CPU usage by journald

On Feb 27, 2024 / 15:22, Bagas Sanjaya wrote:
> Hi,
>
> On Bugzilla, [email protected] reported stable-specific, ACPI error
> regression that led into high CPU temperature [1]. He wrote:

Thanks for the report, and sorry for the trouble.

>
> > Overview:
> >
> > After updating from lts v6.6.14-2 to lts v6.6.17-1 noticed high CPU temperature and lag. After running htop noticed that journald was using 30-60% of CPU. Afterwards, tried switching to stable, or lts v6.6.18-1, but encountered the same issue.
> >
> > Running journalctl -f gives these lines over and over again:
> >
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: Could not disable RealTimeClock events (20230628/evxfevnt-243)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 08, disabling event (20230628/evgpe-839)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0A, disabling event (20230628/evgpe-839)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0B, disabling event (20230628/evgpe-839)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PM_Timer (0), disabling (20230628/evevent-255)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20230628/evevent-255)
> > Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - SleepButton (3), disabling (20230628/evevent-255)
> >
> > My system info:
> >
> > Laptop model: ASUS VivoBook D540NV-GQ065T
> > OS: Arch Linux x86_64
> > Kernel: 6.6.14-2-lts
> > WM: sway
> > CPU: Intel Pentium N420 (4) @ 2.500GHz

I think this CPU belongs to the Goldmont microarchitecture group, which is
handled in a somewhat unique way in drivers/platform/x86/p2sb.c. I guess the
commit affected the handling of the P2SB resource on machines with that
microarchitecture.

> > GPU1: Intel Apollo Lake [HD Graphics 505]
> > GPU2: NVIDIA GeForce 920MX
> >
> > I've pinned down the commit after which the problem occurs:
> >
> > 847e1eb30e269a094da046c08273abe3f3361cf2 is the first bad commit
> > commit 847e1eb30e269a094da046c08273abe3f3361cf2
> > Author: Shin'ichiro Kawasaki <[email protected]>
> > Date: Mon Jan 8 15:20:58 2024 +0900
> >
> > platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
> >
> > commit 5913320eb0b3ec88158cfcb0fa5e996bf4ef681b upstream.
> >
> > <snipped>...
>
> See Bugzilla for the full thread.
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=218531

I do not have access to the hardware. As I commented on the Bugzilla link above,
I would like to ask for help with debugging.

2024-02-29

by Thorsten Leemhuis

Subject: Re: Fwd: Continuous ACPI errors resulting in high CPU usage by journald

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 27.02.24 09:22, Bagas Sanjaya wrote:

> On Bugzilla, [email protected] reported stable-specific, ACPI error
> regression that led into high CPU temperature [1]. He wrote:
> [...]

#regzbot ^introduced 847e1eb30e269a094da046c08273abe3f3361cf2
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218531
#regzbot title platform/x86: p2sb: Continuous ACPI errors resulting in
high CPU usage by journald
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2024-02-29 16:17:32

by Andy Shevchenko

Subject: Re: Fwd: Continuous ACPI errors resulting in high CPU usage by journald

On Tue, Feb 27, 2024 at 09:57:28AM +0000, Shinichiro Kawasaki wrote:
> On Feb 27, 2024 / 15:22, Bagas Sanjaya wrote:
> >
> > On Bugzilla, [email protected] reported stable-specific, ACPI error
> > regression that led into high CPU temperature [1]. He wrote:
>
> Thanks for the report, and sorry for the trouble.

Heads up: the problem seems to be in the caching algorithm, which includes
function 0 in the scan. Investigation and development of a fix are in progress.

--
With Best Regards,
Andy Shevchenko



2024-03-07 21:33:14

by Salvatore Bonaccorso

Subject: Re: Fwd: Continuous ACPI errors resulting in high CPU usage by journald

Hi,

On Thu, Feb 29, 2024 at 09:49:08AM +0100, Linux regression tracking #adding (Thorsten Leemhuis) wrote:
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
>
> On 27.02.24 09:22, Bagas Sanjaya wrote:
>
> > On Bugzilla, [email protected] reported stable-specific, ACPI error
> > regression that led into high CPU temperature [1]. He wrote:
> > [...]
>
> #regzbot ^introduced 847e1eb30e269a094da046c08273abe3f3361cf2
> #regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218531
> #regzbot title platform/x86: p2sb: Continuous ACPI errors resulting in
> high CPU usage by journald
> #regzbot ignore-activity
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.

The fix for this issue seems to have landed in mainline:

aec7d25b497c ("platform/x86: p2sb: On Goldmont only cache P2SB and SPI devfn BAR")

Regards,
Salvatore
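
For readers unfamiliar with the term "devfn" in the fix title: the kernel packs a PCI device (slot) number and a function number into a single byte. The sketch below mirrors that encoding in Python and contrasts a scan of every function in a slot with caching only two specific functions. It is an illustration under stated assumptions, not the driver's code: the Goldmont locations used here (P2SB at device 13 function 0, SPI at device 13 function 2) are recalled from drivers/platform/x86/p2sb.c and should be checked against the source, and the "scan all functions" behaviour is inferred from Andy's remark and the fix title rather than confirmed in this thread. All function names are hypothetical.

```python
def pci_devfn(slot, func):
    """Pack slot (5 bits) and function (3 bits) like the kernel's PCI_DEVFN()."""
    return ((slot & 0x1F) << 3) | (func & 0x07)


def pci_slot(devfn):
    """Extract the slot (device) number, like the kernel's PCI_SLOT()."""
    return (devfn >> 3) & 0x1F


def pci_func(devfn):
    """Extract the function number, like the kernel's PCI_FUNC()."""
    return devfn & 0x07


# Assumed Goldmont locations (verify against drivers/platform/x86/p2sb.c).
P2SB_DEVFN_GOLDMONT = pci_devfn(13, 0)
SPI_DEVFN_GOLDMONT = pci_devfn(13, 2)


def devfns_scan_all(devfn_p2sb):
    """All eight functions in the P2SB slot: the broad scan assumed to
    have caused trouble on Goldmont."""
    slot = pci_slot(devfn_p2sb)
    return [pci_devfn(slot, func) for func in range(8)]


def devfns_goldmont_only():
    """Only the two functions named in the fix title."""
    return [P2SB_DEVFN_GOLDMONT, SPI_DEVFN_GOLDMONT]


if __name__ == "__main__":
    for label, devfns in (
        ("scan all", devfns_scan_all(P2SB_DEVFN_GOLDMONT)),
        ("P2SB and SPI only", devfns_goldmont_only()),
    ):
        pretty = ", ".join(f"{pci_slot(d):02x}.{pci_func(d)}" for d in devfns)
        print(f"{label}: {pretty}")
```

The narrower list touches two functions instead of eight, which is the general idea the fix title describes; the actual patch should be consulted for the real logic.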