2022-01-10 11:30:53

by Paul Menzel

[permalink] [raw]
Subject: 100 ms boot time increase regression in acpi_init()/acpi_scan_bus()

#regzbot introduced: v5.13..v5.14-rc1
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215419


Dear Linux folks,


On the Intel T4500 laptop Acer TravelMate 5735Z with Debian
sid/unstable, there is a 100 ms introduced between Linux 5.10.46 and
5.13.9, and is still present in Linux 5.15.5.

[ 0.000000] microcode: microcode updated early to revision
0xa0b, date = 2010-09-28
[ 0.000000] Linux version 5.15.0-2-amd64
([email protected]) (gcc-11 (Debian 11.2.0-13) 11.2.0, GNU
ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.5-2 (2021-12-18)
[ 0.000000] Command line:
BOOT_IMAGE=/boot/vmlinuz-5.15.0-2-amd64
root=UUID=e17cec4f-d2b8-4cc3-bd39-39a10ed422f4 ro quiet noisapnp
cryptomgr.notests random.trust_cpu=on initcall_debug log_buf_len=4M
[…]
[ 0.262243] calling acpi_init+0x0/0x487 @ 1
[…]
[ 0.281655] ACPI: Enabled 15 GPEs in block 00 to 3F
[ 0.394855] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[…]
[ 0.570908] initcall acpi_init+0x0/0x487 returned 0 after 300781
usecs

I attached all the log files to the Kernel.org Bugzilla bug report
#215419 [1].

Unfortunately, I am unable to bisect the issue, as it’s not my machine,
and I do not have a lot of access to it.

Using ftrace, unfortunately, I didn’t save all of them, I think the path is

acpi_init() → acpi_scan_init() → acpi_bus_scan(ACPI_ROOT_OBJECT)

But this path hasn’t changed as far as I can see. Anyway, from that
path, somehow

acpi_bus_check_add_1() → acpi_bus_check_add() → … →
acpi_bus_check_add() → acpi_add_single_object() → acpi_bus_get_status()

is called, and the `acpi_bus_get_status()` call takes 100 ms on the
system – also the cause for bug #208705 [2] –, but that code path wasn’t
taken before.

Do you know from the top of your head, what changed? I am going to have
short access to the system every two weeks or so, so debugging is
unfortunately quite hard.

What is already on my to-do list:

1. Use dynamic debug `drivers/acpi/scan.c`
2. Trace older Linux kernel (5.10.46) to see the differences
3. Booting some GNU/Linux system to test 5.11 (Ubuntu 20.10) and 5.12
4. Unrelated to the regression, but trace `acpi_bus_get_status()` to
understand the 100 ms delay to solve bug #208705 [2]


Kind regards,

Paul


PS: Do you know of GNU/Linux live systems that are available for all
Linux kernel releases and have an initrd, that just stores/uploads the
output of `dmesg`?


[1]: https://bugzilla.kernel.org/show_bug.cgi?id=215419
"100 ms regression in boottime before `ACPI: PCI Root Bridge [PCI0]"
[2]: https://bugzilla.kernel.org/show_bug.cgi?id=208705
"boot performance: 100 ms delay in PCI initialization - Acer
TravelMate 5735Z"


Subject: Re: 100 ms boot time increase regression in acpi_init()/acpi_scan_bus()

Hi, this is your Linux kernel regression tracker speaking.

On 10.01.22 12:29, Paul Menzel wrote:
> #regzbot introduced: v5.13..v5.14-rc1
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215419

Thx for getting regzbot involved!

Nothing happened since you reported the issue three weeks ago; sure,
it's not a pressing issue, but I wonder what the status is.

@pm people: isn't this at least worth a reply?
@paul: did you perform any additional checks?

Or did anything happen somewhere else and I just missed it?

#regzbot poke

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.

> On the Intel T4500 laptop Acer TravelMate 5735Z with Debian
> sid/unstable, there is a 100 ms introduced between Linux 5.10.46 and
> 5.13.9, and is still present in Linux 5.15.5.
>
>     [    0.000000] microcode: microcode updated early to revision 0xa0b,
> date = 2010-09-28
>     [    0.000000] Linux version 5.15.0-2-amd64
> ([email protected]) (gcc-11 (Debian 11.2.0-13) 11.2.0, GNU
> ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.5-2 (2021-12-18)
>     [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-2-amd64
> root=UUID=e17cec4f-d2b8-4cc3-bd39-39a10ed422f4 ro quiet noisapnp
> cryptomgr.notests random.trust_cpu=on initcall_debug log_buf_len=4M
>     […]
>     [    0.262243] calling  acpi_init+0x0/0x487 @ 1
>     […]
>     [    0.281655] ACPI: Enabled 15 GPEs in block 00 to 3F
>     [    0.394855] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>     […]
>     [    0.570908] initcall acpi_init+0x0/0x487 returned 0 after 300781
> usecs
>
> I attached all the log files to the Kernel.org Bugzilla bug report
> #215419 [1].
>
> Unfortunately, I am unable to bisect the issue, as it’s not my machine,
> and I do not have a lot of access to it.
>
> Using ftrace, unfortunately, I didn’t save all of them, I think the path is
>
>     acpi_init() → acpi_scan_init() → acpi_bus_scan(ACPI_ROOT_OBJECT)
>
> But this path hasn’t changed as far as I can see. Anyway, from that
> path, somehow
>
>     acpi_bus_check_add_1() → acpi_bus_check_add() → … →
> acpi_bus_check_add() → acpi_add_single_object() → acpi_bus_get_status()
>
> is called, and the `acpi_bus_get_status()` call takes 100 ms on the
> system – also the cause for bug #208705 [2] –, but that code path wasn’t
> taken before.
>
> Do you know from the top of your head, what changed? I am going to have
> short access to the system every two weeks or so, so debugging is
> unfortunately quite hard.
>
> What is already on my to-do list:
>
> 1.  Use dynamic debug `drivers/acpi/scan.c`
> 2.  Trace older Linux kernel (5.10.46) to see the differences
> 3.  Booting some GNU/Linux system to test 5.11 (Ubuntu 20.10) and 5.12
> 4.  Unrelated to the regression, but trace `acpi_bus_get_status()` to
> understand the 100 ms delay to solve bug #208705 [2]
>
>
> Kind regards,
>
> Paul
>
>
> PS: Do you know of GNU/Linux live systems that are available for all
> Linux kernel releases and have an initrd, that just stores/uploads the
> output of `dmesg`?
>
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=215419
>      "100 ms regression in boottime before `ACPI: PCI Root Bridge [PCI0]"
> [2]: https://bugzilla.kernel.org/show_bug.cgi?id=208705
>      "boot performance: 100 ms delay in PCI initialization - Acer
> TravelMate 5735Z"
>

2022-02-01 20:51:49

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: 100 ms boot time increase regression in acpi_init()/acpi_scan_bus()

On Mon, Jan 10, 2022 at 12:29:44PM +0100, Paul Menzel wrote:
> PS: Do you know of GNU/Linux live systems that are available for all Linux
> kernel releases and have an initrd, that just stores/uploads the output of
> `dmesg`?

That would be pretty slick.

There is a collection of dmesg logs at https://github.com/linuxhw/Dmesg,
mentioned earlier:
https://lore.kernel.org/r/[email protected]

Subject: Re: 100 ms boot time increase regression in acpi_init()/acpi_scan_bus()

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easy accessible to everyone.

@Rafael or any other @acpi/@pm developer: what's the status here?
Neither in this thread nor in the bug ticket anything happened afaics.
Or is a 100ms boot time increase considered "not a regression"?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

#regzbot poke

On 10.01.22 12:29, Paul Menzel wrote:
> #regzbot introduced: v5.13..v5.14-rc1
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215419
>
>
> Dear Linux folks,
>
>
> On the Intel T4500 laptop Acer TravelMate 5735Z with Debian
> sid/unstable, there is a 100 ms introduced between Linux 5.10.46 and
> 5.13.9, and is still present in Linux 5.15.5.
>
>     [    0.000000] microcode: microcode updated early to revision 0xa0b,
> date = 2010-09-28
>     [    0.000000] Linux version 5.15.0-2-amd64
> ([email protected]) (gcc-11 (Debian 11.2.0-13) 11.2.0, GNU
> ld (GNU Binutils for Debian) 2.37) #1 SMP Debian 5.15.5-2 (2021-12-18)
>     [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-2-amd64
> root=UUID=e17cec4f-d2b8-4cc3-bd39-39a10ed422f4 ro quiet noisapnp
> cryptomgr.notests random.trust_cpu=on initcall_debug log_buf_len=4M
>     […]
>     [    0.262243] calling  acpi_init+0x0/0x487 @ 1
>     […]
>     [    0.281655] ACPI: Enabled 15 GPEs in block 00 to 3F
>     [    0.394855] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>     […]
>     [    0.570908] initcall acpi_init+0x0/0x487 returned 0 after 300781
> usecs
>
> I attached all the log files to the Kernel.org Bugzilla bug report
> #215419 [1].
>
> Unfortunately, I am unable to bisect the issue, as it’s not my machine,
> and I do not have a lot of access to it.
>
> Using ftrace, unfortunately, I didn’t save all of them, I think the path is
>
>     acpi_init() → acpi_scan_init() → acpi_bus_scan(ACPI_ROOT_OBJECT)
>
> But this path hasn’t changed as far as I can see. Anyway, from that
> path, somehow
>
>     acpi_bus_check_add_1() → acpi_bus_check_add() → … →
> acpi_bus_check_add() → acpi_add_single_object() → acpi_bus_get_status()
>
> is called, and the `acpi_bus_get_status()` call takes 100 ms on the
> system – also the cause for bug #208705 [2] –, but that code path wasn’t
> taken before.
>
> Do you know from the top of your head, what changed? I am going to have
> short access to the system every two weeks or so, so debugging is
> unfortunately quite hard.
>
> What is already on my to-do list:
>
> 1.  Use dynamic debug `drivers/acpi/scan.c`
> 2.  Trace older Linux kernel (5.10.46) to see the differences
> 3.  Booting some GNU/Linux system to test 5.11 (Ubuntu 20.10) and 5.12
> 4.  Unrelated to the regression, but trace `acpi_bus_get_status()` to
> understand the 100 ms delay to solve bug #208705 [2]
>
>
> Kind regards,
>
> Paul
>
>
> PS: Do you know of GNU/Linux live systems that are available for all
> Linux kernel releases and have an initrd, that just stores/uploads the
> output of `dmesg`?
>
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=215419
>      "100 ms regression in boottime before `ACPI: PCI Root Bridge [PCI0]"
> [2]: https://bugzilla.kernel.org/show_bug.cgi?id=208705
>      "boot performance: 100 ms delay in PCI initialization - Acer
> TravelMate 5735Z"
>
>

2022-02-09 12:22:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 100 ms boot time increase regression in acpi_init()/acpi_scan_bus()

On Tue, Feb 8, 2022 at 1:03 PM Thorsten Leemhuis
<[email protected]> wrote:
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> Top-posting for once, to make this easy accessible to everyone.
>
> @Rafael or any other @acpi/@pm developer: what's the status here?
> Neither in this thread nor in the bug ticket anything happened afaics.
> Or is a 100ms boot time increase considered "not a regression"?

It is on my list of things to look into, because it used to work
faster, so depending on what change made it work slower, it may be
possible to restore the previous performance without sacrificing too
much.

I honestly don't think that it is a noticeable issue for users in the
field, though.