I'm experiencing a delay or hang in powering off my computer after `halt
-d -f -i -p' and I've bisected it to this commit in the mainline tree:
commit 6d25be5782e482eb93e3de0c94d0a517879377d0
Author: Thomas Gleixner <[email protected]>
Date: Wed Mar 13 17:55:48 2019 +0100
sched/core, workqueues: Distangle worker accounting from rq lock
The delays have varied in length from ~20 seconds to seemingly
indefinitely long (the longest I've waited before pressing the start
button on the machine is 6 minutes). With kernels prior to this commit,
my machine powers off within 4 seconds after the halt invocation.
I first noticed this problem with the stable kernel 5.5.9 that I built
with Linux From Scratch (LFS). My previous LFS system had stable kernel
4.20.12 and there was no such delay. I also tried stable kernel 5.6.4
and the delay/hang also happens with it. After consulting with LFS
developers I cloned the mainline kernel repository and determined that
the problem happens with v5.2 (0ecfebd2) but not with v5.1
(e93c9c99), then ran the bisection with those boundaries.
With mainline kernels (I only tried with v5.3 and then commits between
v5.1 and v5.2 during the bisection) there are no kernel messages during
the delay/hang, but with 5.5.9 and 5.6.4, when the delay is longer than
two minutes, a message like the following is displayed:
sr 5:0:0:0: tag#21 timing out command, waited 120s
repeating every 2 minutes with a different tag#. This made me think the
problem was due to the cdrom device in the machine, but when I compiled
5.3.0 without /dev/cdrom (by not setting CONFIG_BLK_DEV_SR, which also
left CONFIG_CDROM unset), the delay/hang still happened.
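(For what it's worth, disabling that option simply leaves a line like

# CONFIG_BLK_DEV_SR is not set

in the .config.)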
The length of the delay seems to depend not only on the kernel but also
on the number or kind of programs used. The 20 second delay has happened
only when I invoke `shutdown -h now' (which runs `halt -d -f -i -p' in
LFS) immediately after booting. When I ran `more' in one tty and
`shutdown -h now' in another, the machine powered off in less than two
minutes. But after running emacs in a tty (and then killing it), the
delay was seemingly indefinite, as it is with my usual workflow with X,
emacs, firefox, etc. running (but I always kill them before invoking
`shutdown -h now' from a tty). During the bisection I tested each
kernel build by running startx, emacs, firefox, killing these, then
`shutdown -h now'. I've attached the git log of the bisection augmented
with power-off timing notes. (FWIW, when I reboot the machine with
`shutdown -r now' there is no delay with the problematic kernels.)
One piece of advice I got from the LFS mailing list was to try and
revert the above commit pinpointed by the bisection in a recent mainline
or stable kernel. But it fails to revert cleanly (using `patch -R') in
such kernels and I am too unfamiliar with the code to try and revert it
manually.
Another pointer I got was to the recent commit 62849a96, which fixes a
bug due to the commit that causes my problem. I applied 62849a96 to
5.6.4, but it did not prevent the delay/hang.
I've found nothing on the web about this problem, so it seems to be
rare. That's why I'm seeking your help (on the advice of LFS
developers). In addition to the bisection log I've also attached the
kernel 5.1.0 config file (I used that during the bisection, accepting
all defaults of later options, as with the configs of 5.5.9 and 5.6.4)
and the contents of /proc/cpuinfo. Please let me know if there's any
other information I can provide or anything else I can do to help with
debugging this issue. I hope you can fix it.
Thanks,
Steve Berman
On 2020-05-01 17:46:48 [+0200], Stephen Berman wrote:
> I'm experiencing a delay or hang in powering off my computer after `halt
> -d -f -i -p' and I've bisected it to this commit in the mainline tree:
You refer to a normal "poweroff" or is this some kind of "shutdown now"
kind of thing? Unless I'm mistaken, the `halt' command above will turn
off the machine without stopping user tasks (as in "reaching shutdown
level").
…
> The delays have varied in length from ~20 seconds to seemingly
> indefinitely long (the longest I've waited before pressing the start
> button on the machine is 6 minutes). With kernels prior to this commit,
> my machine powers off within 4 seconds after the halt invocation.
So you say that normally the machine shuts down immediately but now it
takes 20secs+ if it shuts down at all?
…
> sr 5:0:0:0: tag#21 timing out command, waited 120s
The CD-drive is polled regularly so it notices when a CD is inserted /
removed. It seems that one of the requests wasn't answered and it ended
up in timeout.
…
> minutes. But after running emacs in a tty (and then killing it), the
> delay was seemingly indefinite, as it is with my usual workflow with X,
> emacs, firefox, etc. running (but I always kill them before invoking
> `shutdown -h now' from a tty). During the bisection I tested each
> kernel build by running startx, emacs, firefox, killing these, then
> `shutdown -h now'. I've attached the git log of the bisection augmented
> with power-off timing notes. (FWIW, when I reboot the machine with
> `shutdown -r now' there is no delay with the problematic kernels.)
So reboot is not affected, just the halt. Could you please check if the
"poweroff" variant is also affected?
…
> Another pointer I got was to the recent commit 62849a96, which fixes a
> bug due to the commit that causes my problem. I applied 62849a96 to
> 5.6.4, but it did not prevent the delay/hang.
You should see a warning if you were facing the problem described in the
commit.
…
> Please let me know if there's any
> other information I can provide or anything else I can do to help with
> debugging this issue. I hope you can fix it.
Can you send dmesg output of the system / lspci -k? I'm mainly
interested in what drivers are bound to storage devices (you can send it
offlist if you want).
Can you log the output on the serial console?
If the commit you cited is really the problem then it would mean that a
worker isn't scheduled for some reason. Could you please enable
CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
isn't making progress?
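Something like this in your .config should be enough (the check interval
can also be changed via the workqueue.watchdog_thresh= boot parameter,
30 seconds by default):

# workqueue stall watchdog; prints "BUG: workqueue lockup - pool ..." on a stall
CONFIG_WQ_WATCHDOG=y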
> Thanks,
> Steve Berman
Sebastian
On Wed, 6 May 2020 23:57:13 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-05-01 17:46:48 [+0200], Stephen Berman wrote:
>> I'm experiencing a delay or hang in powering off my computer after `halt
>> -d -f -i -p' and I've bisected it to this commit in the mainline tree:
>
> You refer to a normal "poweroff" or is this some kind of "shutdown now"
> kind of thing? Unless I'm mistaken, the `halt' command above will turn
> off the machine without stopping user tasks (as in "reaching shutdown
> level").
What I actually do is `shutdown -h now' but my understanding is that (at
least in LFS) the last thing that happens when `shutdown -h now' runs is
effectively the same as `halt -d -f -i -p'. The reason I wrote that
above is because I did not start timing immediately when I entered
`shutdown -h now', since it displays messages of the actions it takes,
like unmounting filesystems and so on. I start timing after the last
message before powering off (or before the hang/delay), which in LFS is
"Bringing down the loopback interface". When powering off succeeds
normally, it takes ~4 seconds after that message appears. I've since
tested directly entering `halt -d -f -i -p' with kernel 5.1.0: there
were no messages like with `shutdown -h now' and after ~4 seconds, the
machine powered off, which seems to confirm my understanding. I also
tested `halt -d -f -i -p' with kernel 5.6.4, and here, too, there were
no messages like with `shutdown -h now' but after ~7 seconds this
appeared in the tty:
sd 4:0:0:0: [sda] tag#13 timing out command, waited 7s
and after two minutes, this:
sr 5:0:0:0: tag#14 timing out command, waited 120s
and then I pressed the reset button.
> …
>> The delays have varied in length from ~20 seconds to seemingly
>> indefinitely long (the longest I've waited before pressing the start
>> button on the machine is 6 minutes). With kernels prior to this commit,
>> my machine powers off within 4 seconds after the halt invocation.
>
> So you say that normally the machine shuts down immediately but now it
> takes 20secs+ if it shuts down at all?
Yes.
> …
>> sr 5:0:0:0: tag#21 timing out command, waited 120s
>
> The CD-drive is polled regularly so it notices when a CD is inserted /
> removed. It seems that one of the requests wasn't answered and it ended
> up in timeout.
But the drive is never in use and is empty when I run `shutdown -h
now' (in fact I rarely use that drive at all). And the message I
referred to above, "sd 4:0:0:0: [sda] tag#13 timing out command, waited
7s", seems to indicate that other polling requests also aren't answered.
> …
>> minutes. But after running emacs in a tty (and then killing it), the
>> delay was seemingly indefinite, as it is with my usual workflow with X,
>> emacs, firefox, etc. running (but I always kill them before invoking
>> `shutdown -h now' from a tty). During the bisection I tested each
>> kernel build by running startx, emacs, firefox, killing these, then
>> `shutdown -h now'. I've attached the git log of the bisection augmented
>> with power-off timing notes. (FWIW, when I reboot the machine with
>> `shutdown -r now' there is no delay with the problematic kernels.)
>
> So reboot is not affected, just the halt.
Yes.
> Could you please check if the
> "poweroff" variant is also affected?
I have tried `poweroff' and also `shutdown -hP now' and these make no
difference: with kernel 5.1.0 and earlier, the machine powers off
promptly, with 5.2.0 and later it doesn't.
> …
>> Another pointer I got was to the recent commit 62849a96, which fixes a
>> bug due to the commit that causes my problem. I applied 62849a96 to
>> 5.6.4, but it did not prevent the delay/hang.
>
> You should see a warning if you were facing the problem described in the
> commit.
I didn't see any warning, so that doesn't seem related to my issue.
> …
>> Please let me know if there's any
>> other information I can provide or anything else I can do to help with
>> debugging this issue. I hope you can fix it.
>
> Can you send dmesg output of the system / lspci -k? I'm mainly
> interested in what drivers are bound to storage devices (you can send it
> offlist if you want).
I've attached them (I included dmesg both from the "good" kernel 5.1.0
and the "bad" kernel 5.6.4; they differ somewhat but I don't know if the
differences are significant).
> Can you log the output on the serial console?
How do I do that?
> If the commit you cited is really the problem then it would mean that a
> worker isn't scheduled for some reason. Could you please enable
> CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
> isn't making progress?
How will I know if that happens, is there a specific message in the tty?
Thanks for your reply.
Steve Berman
On 2020-05-08 23:30:45 [+0200], Stephen Berman wrote:
> > Can you log the output on the serial console?
>
> How do I do that?
The spec for your mainboard says "serial port header". You would need to
connect a cable there to another computer and log its output.
The alternative would be to delay the output on the console and use a
camera.
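If you go the serial route, something along these lines should work (the
device names and baud rate are only examples, they depend on your adapter):

# kernel command line on the test machine:
console=ttyS0,115200n8 console=tty0
# on the machine doing the logging, e.g.:
screen -L /dev/ttyUSB0 115200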
> > If the commit you cited is really the problem then it would mean that a
> > worker isn't scheduled for some reason. Could you please enable
> > CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
> > isn't making progress?
>
> How will I know if that happens, is there a specific message in the tty?
On the tty console where you see the "timing out command, waited"
message, there should be something starting with
|BUG: workqueue lockup - pool
followed by the information about the pool that got stuck. That code checks the
workqueues every 30secs by default. So if you waited >= 60secs then the
system is not detecting a stall.
As far as I can tell, there is nothing special on your system. The CD
and disk drives are served by the AHCI controller. There is no special
SCSI/SATA/SAS controller.
Right now I have no idea how the workqueues fit in the picture. Could
you please check if the stall-detector says something?
Is it possible to show me output when the timeout message comes? My
guess is that the system is going down and, before unmounting / remounting
the filesystem read-only, it flushes its last data. But this is done before issuing
the "halt-syscall".
> Thanks for your reply.
>
> Steve Berman
Sebastian
On Thu, 14 May 2020 00:04:28 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-05-08 23:30:45 [+0200], Stephen Berman wrote:
>> > Can you log the output on the serial console?
>>
>> How do I do that?
>
> The spec for your mainboard says "serial port header". You would need to
> connect a cable there to another computer and log its output.
> The alternative would be to delay the output on the console and use a
> camera.
It's easiest for me to take a picture, since there isn't much output and
in any case the delay happens on its own ;-). I'm sending you the
image (from kernel 5.6.4) off-list since even after reducing it it's 1.2
MB large.
>> > If the commit you cited is really the problem then it would mean that a
>> > worker isn't scheduled for some reason. Could you please enable
>> > CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
>> > isn't making progress?
I enabled that and also CONFIG_SOFTLOCKUP_DETECTOR,
CONFIG_HARDLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK, which had all
been unset previously.
>> How will I know if that happens, is there a specific message in the tty?
>
> On the tty console where you see the "timing out command, waited"
> message, there should be something starting with
> |BUG: workqueue lockup - pool
>
> followed by the information about the pool that got stuck. That code checks the
> workqueues every 30secs by default. So if you waited >= 60secs then the
> system is not detecting a stall.
As you can see in the photo, there was no message about a workqueue
lockup, only "task halt:5320 blocked for more than <XXX> seconds" every
two minutes. I suppose that comes from one of the other options I
enabled. Does it reveal anything about the problem?
> As far as I can tell, there is nothing special on your system. The CD
> and disk drives are served by the AHCI controller. There is no special
> SCSI/SATA/SAS controller.
> Right now I have no idea how the workqueues fit in the picture. Could
> you please check if the stall-detector says something?
Is that the message I repeated above or do you mean the workqueue?
> Is it possible to show me output when the timeout message comes? My
> guess is that the system is going down and, before unmounting / remounting
> the filesystem read-only, it flushes its last data. But this is done before issuing
> the "halt-syscall".
The entire output from `shutdown -h now' is in the picture; after the
fourth "timing out command" message, I pressed the reset button.
Steve Berman
Sorry for the late reply.
On 2020-05-14 23:39:40 [+0200], Stephen Berman wrote:
> >> How will I know if that happens, is there a specific message in the tty?
> >
> > On the tty console where you see the "timing out command, waited"
> > message, there should be something starting with
> > |BUG: workqueue lockup - pool
> >
> > followed by the information about the pool that got stuck. That code checks the
> > workqueues every 30secs by default. So if you waited >= 60secs then the
> > system is not detecting a stall.
>
> As you can see in the photo, there was no message about a workqueue
> lockup, only "task halt:5320 blocked for more than <XXX> seconds" every
> two minutes. I suppose that comes from one of the other options I
> enabled. Does it reveal anything about the problem?
From the picture, you are on your way to level 0, which would issue the
final shutdown command, but you are not quite there yet.
If you add a printk() to the reboot syscall, then I wouldn't expect you
to see it. (something like that):
diff --git a/kernel/reboot.c b/kernel/reboot.c
index c4d472b7f1b42..19bc35bc0cda0 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -314,6 +314,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
char buffer[256];
int ret = 0;
+ pr_err("%s(%d)CMD: %lx\n", __func__, __LINE__, cmd);
/* We only trust the superuser with rebooting the system. */
if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
return -EPERM;
If you add "ignore_loglevel initcall_debug" to the command line then you
should see the init callbacks of each driver. But there will be nothing
on your shutdown (as I expect it).
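In a hand-edited grub.cfg the kernel line would then look something like
this (image name and root device are just placeholders):

linux /boot/vmlinuz-5.6.4 root=/dev/nvme0n1p2 ro ignore_loglevel initcall_debug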
The "task X blocked for more than 120 secs" is part of the hung task
detector. With the "ignore_loglevel" above you should be able to see the
callchain of the task. I suspect that the task poked the cd-drive which
isn't answering. So from the detector's point of view, the task issued a
system call which appears to hang and makes no progress.
> > Could
> > you please check if the stall-detector says something?
>
> Is that the message I repeated above or do you mean the workqueue?
The hung message is not workqueue related. It is the task `halt' that
makes no progress. There is no stall of the workqueue as far as the
system can tell.
The two boot options I suggested above may reveal additional
information that is printed but suppressed due to the loglevel.
My guess now is that maybe shutting down wifi also paused the AHCI
controller which makes no progress now. So booting without cdrom/disk
should not cause any problems.
Could you please:
- try booting with "ignore_loglevel initcall_debug" and see if
additional information is printed on the console.
- Remove cd / ATA-disk to check if anything else causes a stall. In
your report you only mentioned those two (and if I see it correctly,
your rootFS is on nvme so removing the disk might be doable).
- Could you remove the Wifi (just the driver, not the physical hw) to see
if it makes any difference?
> Steve Berman
Sebastian
After the last message here, the "echo", Call Trace, RIP and register
messages were repeated exactly as above. At this point I did a hard
reboot. A web search found something similar to the above output
("System hangs (bad RIP value) when disk used in pool is removed"), but
didn't otherwise seem to be related to the problem I'm having.
Do you still want me to check whether removing the iwlwifi driver makes
a difference? And with the CDROM still detached, or does that not
matter?
Steve Berman
On 2020-06-09 12:06:23 [+0200], Stephen Berman wrote:
> I recompiled kernel 5.6.4 with the printk() call you suggested, then
> booted the kernel with "ignore_loglevel initcall_debug" (but leaving the
> CDROM and wifi intact for now). After working as I normally do, I
> called `shutdown -h now', again as usual. After the "Bringing down the
> loopback interface" message there were these two messages:
>
> reboot: __do_sys_reboot(317)CMD: 89abcdef
> reboot: __do_sys_reboot(317)CMD: 4321fedc
>
> Then nothing more for two minutes, then, as previously:
>
> sr 5:0:0:0: tag#10 timing out command, waited 120 seconds.
>
> Then I did a hard reboot.
>
> This morning I detached the cables to the CDROM (but left the disk in
> the box) and again booted 5.6.4 with "ignore_loglevel initcall_debug".
> After working for a bit, I called `shutdown -h now', and now there were
> quite a few more messages, but again the machine did not power off.
> Here is the output, which I transcribed carefully, hopefully without
> error (I omitted the fractional parts of the timestamps and some of the
> usb1-portX messages):
If it helps you could delay printks on shutdown (via
/proc/sys/kernel/printk_delay) and record a video clip. I could make
storage available for an upload.
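For example (the value is in milliseconds per printed line, 0 turns the
delay off again):

# delay each console line by half a second so it can be filmed
echo 500 > /proc/sys/kernel/printk_delay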
| * Bringing down the loopback interface...
| [1123.***] reboot: __do_sys_reboot(317)CMD: 89abcdef
| [1123.***] reboot: __do_sys_reboot(317)CMD: 4321fedc
This happens quicker than I expected. Initially I assumed that you are
not there yet and userland is still poking at your cdrom. Accordin
…
| [1124.***] sd 4:0:0:0: shutdown
| [1124.***] sd 4:0:0:0: shutdown [sda] Synchronizing SCSI cache
| [1124.***] sd 4:0:0:0: shutdown [sda] Stopping disk
After this, there shouldn't be any outstanding disk requests.
…
| [1125.***] ahci 0000:00:17.0: shutdown
This disables the ahci controller which means disk/cdrom requests won't
be answered by the hardware.
…
| [1125.***] ACPI: Preparing to enter system sleep state S5
| [1352.***] INFO: task halt:5187 blocked for more than 122 seconds.
| [1352.***] Not tainted 5.6.4 #4
| [1352.***] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
| [1352.***] halt D13856 5187 5186 0x00004000
| [1352.***] Call Trace:
| [1352.***] ? __schedule+0x272/0x5b0
| [1352.***] schedule+0x45/0xb0
| [1352.***] schedule_timeout+0x204/0x2f0
| [1352.***] ? acpi_os_release_object+0x5/0x10
| [1352.***] ? acpi_ut_update_object_reference+0x14e/0x1d2
| [1352.***] wait_for_completion+0xa3/0x100
| [1352.***] ? wake_up_q+0x90/0x90
| [1352.***] flush_workqueue+0x130/0x420
I have no idea where this flush_workqueue() is coming from. The command
scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70
should reveal that.
The cdrom is polled by the kernel every two seconds. I *think* the
kernel is blocked for some reason and then the cdrom is polled. This can't
complete and you see the srX warning. The window is quite small but not
impossible.
I managed to fabricate this case but after
|[ 137.581613] sr 3:0:0:0: tag#1 timing out command, waited 120s
The system shuts down. So you are facing something different. I am
surprised that the workqueue stall detector did not yell here. The patch
at the bottom should get rid of the sr warnings.
| [1352.***] kernel_power_off+0x3d/0x70
| [1352.***] __do_sys_reboot+0x140/0x220
…
> After the last message here, the "echo", Call Trace, RIP and register
> messages were repeated exactly as above. At this point I did a hard
> reboot. A web search found something similar to the above output
> ("System hangs (bad RIP value) when disk used in pool is removed"), but
> didn't otherwise seem to be related to the problem I'm having.
>
> Do you still want me to check whether removing the iwlwifi driver makes
> a difference? And with the CDROM still detached, or does that not
> matter?
I assumed the wifi driver shuts the AHCI port for some reason. But
according to this log it does not happen, the ahci port is shutdown
properly. The patch at the bottom should get rid of the sr warning. Then
could you please try the other patch so we see which workqueue is
blocked? I am curious to see why the system is blocked.
> Steve Berman
Sebastian
From: Sebastian Andrzej Siewior <[email protected]>
Date: Tue, 9 Jun 2020 22:13:46 +0200
Subject: [PATCH] SCSI: Disable CD-ROM poll on shutdown
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/scsi/sr.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index d2fe3fa470f95..2484e594af283 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -85,6 +85,13 @@ static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt);
static int sr_done(struct scsi_cmnd *);
static int sr_runtime_suspend(struct device *dev);
+static void sr_shutdown(struct device *dev)
+{
+ struct scsi_cd *cd = dev_get_drvdata(dev);
+
+ disk_block_events(cd->disk);
+}
+
static const struct dev_pm_ops sr_pm_ops = {
.runtime_suspend = sr_runtime_suspend,
};
@@ -95,6 +102,7 @@ static struct scsi_driver sr_template = {
.owner = THIS_MODULE,
.probe = sr_probe,
.remove = sr_remove,
+ .shutdown = sr_shutdown,
.pm = &sr_pm_ops,
},
.init_command = sr_init_command,
--
2.27.0
On Tue, 09 Jun 2020 12:06:23 +0200 Stephen Berman <[email protected]> wrote:
> Do you still want me to check whether removing the iwlwifi driver makes
> a difference? And with the CDROM still detached, or does that not
> matter?
I'm not actually sure just what you mean by removing the wifi driver,
but I just now tried moving the iwlwifi firmware out of /lib/firmware,
then booted 5.6.4 as before (with "ignore_loglevel initcall_debug" and
with the CDROM still detached), and of course there was now no wireless
interface. I started X, Emacs, Firefox (of course with no internet),
VLC, used them a bit then closed them all, then called `shutdown -h now'
and the message output was essentially identical to what I posted
previously (but of course without "iwlwifi 0000:02:00.0: shutdown") and
the machine did not power off. So if this was the test of removing the
wifi driver, then it made no difference.
Steve Berman
(GMX put your email into my spam folder, so I didn't see it before I
sent my followup about removing the wifi firmware.)
On Tue, 9 Jun 2020 22:23:39 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-09 12:06:23 [+0200], Stephen Berman wrote:
>> I recompiled kernel 5.6.4 with the printk() call you suggested, then
>> booted the kernel with "ignore_loglevel initcall_debug" (but leaving the
>> CDROM and wifi intact for now). After working as I normally do, I
>> called `shutdown -h now', again as usual. After the "Bringing down the
>> loopback interface" message there were these two messages:
>>
>> reboot: __do_sys_reboot(317)CMD: 89abcdef
>> reboot: __do_sys_reboot(317)CMD: 4321fedc
>>
>> Then nothing more for two minutes, then, as previously:
>>
>> sr 5:0:0:0: tag#10 timing out command, waited 120 seconds.
>>
>> Then I did a hard reboot.
>>
>> This morning I detached the cables to the CDROM (but left the disk in
>> the box) and again booted 5.6.4 with "ignore_loglevel initcall_debug".
>> After working for a bit, I called `shutdown -h now', and now there were
>> quite a few more messages, but again the machine did not power off.
>> Here is the output, which I transcribed carefully, hopefully without
>> error (I omitted the fractional parts of the timestamps and some of the
>> usb1-portX messages):
>
> If it helps you could delay printks on shutdown (via
> /proc/sys/kernel/printk_delay) and record a video clip. I could make
> storage available for an upload.
So far this hasn't been necessary, since the new output until the first
two-minute pause fit on one screen and I could photograph it.
> | * Bringing down the loopback interface...
> | [1123.***] reboot: __do_sys_reboot(317)CMD: 89abcdef
> | [1123.***] reboot: __do_sys_reboot(317)CMD: 4321fedc
> This happens quicker than I expected. Initially I assumed that you are
> not there yet and userland is still poking at your cdrom. Accordin
> …
> | [1124.***] sd 4:0:0:0: shutdown
> | [1124.***] sd 4:0:0:0: shutdown [sda] Synchronizing SCSI cache
> | [1124.***] sd 4:0:0:0: shutdown [sda] Stopping disk
>
> After this, there shouldn't be any outstanding disk requests.
> …
> | [1125.***] ahci 0000:00:17.0: shutdown
>
> This disables the ahci controller which means disk/cdrom requests won't
> be answered by the hardware.
> …
> | [1125.***] ACPI: Preparing to enter system sleep state S5
> | [1352.***] INFO: task halt:5187 blocked for more than 122 seconds.
> | [1352.***] Not tainted 5.6.4 #4
> | [1352.***] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
> | [1352.***] halt D13856 5187 5186 0x00004000
> | [1352.***] Call Trace:
> | [1352.***] ? __schedule+0x272/0x5b0
> | [1352.***] schedule+0x45/0xb0
> | [1352.***] schedule_timeout+0x204/0x2f0
> | [1352.***] ? acpi_os_release_object+0x5/0x10
> | [1352.***] ? acpi_ut_update_object_reference+0x14e/0x1d2
> | [1352.***] wait_for_completion+0xa3/0x100
> | [1352.***] ? wake_up_q+0x90/0x90
> | [1352.***] flush_workqueue+0x130/0x420
>
> I have no idea where this flush_workqueue() is coming from. The command
> scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70
>
> should reveal that.
I saved the call trace lines to a file and did this (with `$(pwd)/' the
script did not run):
steve [ /sources/linux-5.6.4 ]$ ./scripts/decode_stacktrace.sh vmlinux . kernel_power_off+0x3d/0x70 < /home/steve/temp/call_trace
[1352.***] Call Trace:
[1352.***] ? __schedule (core.c:?)
[1352.***] schedule (??:?)
[1352.***] schedule_timeout (??:?)
[1352.***] ? acpi_os_release_object (??:?)
[1352.***] ? acpi_ut_update_object_reference (??:?)
[1352.***] wait_for_completion (??:?)
[1352.***] ? wake_up_q (??:?)
[1352.***] flush_workqueue (??:?)
[1352.***] kernel_power_off (??:?)
[1352.***] __do_sys_reboot (reboot.c:?)
[1352.***] ? do_send_sig_info (??:?)
[1352.***] ? kill_pid_info (??:?)
[1352.***] ? kill_something_info (signal.c:?)
[1352.***] ? __x64_sys_kill (??:?)
[1352.***] do_syscall_64 (??:?)
[1352.***] entry_SYSCALL_64_after_hwframe (??:?)
[1352.***] RIP: 0033:0x7f95dd6992c3
[1352.***] Code: Bad RIP value.
objdump: '/tmp/tmp.BlpNDdSQpG.o': No such file
Did I do something wrong or do I need to pass certain compiler flags
when building the kernel (when I build Emacs I pass CFLAGS='-Og -g3')?
> The cdrom is polled by the kernel every two seconds. I *think* the
> kernel is blocked for some reason and then the cdrom is polled. This can't
> complete and you see the srX warning. The window is quite small but not
> impossible.
> I managed to fabricate this case but after
> |[ 137.581613] sr 3:0:0:0: tag#1 timing out command, waited 120s
>
> The system shuts down. So you are facing something different. I am
> surprised that the workqueue stall detector did not yell here. The patch
> at the bottom should get rid of the sr warnings.
But the cdrom seems to have nothing to do with power-off failing, since
it happens even when the cdrom is detached (and hence there were no sr
warnings in the log output), or am I misunderstanding you?
> | [1352.***] kernel_power_off+0x3d/0x70
> | [1352.***] __do_sys_reboot+0x140/0x220
> …
>
>> After the last message here, the "echo", Call Trace, RIP and register
>> messages were repeated exactly as above. At this point I did a hard
>> reboot. A web search found something similar to the above output
>> ("System hangs (bad RIP value) when disk used in pool is removed"), but
>> didn't otherwise seem to be related to the problem I'm having.
>>
>> Do you still want me to check whether removing the iwlwifi driver makes
>> a difference? And with the CDROM still detached, or does that not
>> matter?
>
> I assumed the wifi driver shuts the AHCI port for some reason. But
> according to this log it does not happen, the ahci port is shutdown
> properly. The patch at the bottom should get rid of the sr warning. Then
> could you please try the other patch so we see which workqueue is
> blocked? I am curious to see why the system is blocked.
By the other patch do you mean the following? (This email was also put
into my spam by GMX and AFAICT has not yet shown up on the vger list.)
From: Hillf Danton <[email protected]>
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)
To: Stephen Berman <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>,
Thomas Gleixner <[email protected]>,
Peter Zijlstra <[email protected]>,
Markus Elfring <[email protected]>,
[email protected]
Date: Tue, 9 Jun 2020 23:06:07 +0800
[...]
Curious what the workqueue it is blocking halt.
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2826,7 +2826,14 @@ void flush_workqueue(struct workqueue_st
mutex_unlock(&wq->mutex);
- wait_for_completion(&this_flusher.done);
+ if (system_state == SYSTEM_POWER_OFF) {
+ /*
+ * save data before running at warned risk in any form
+ */
+ wait_for_completion_timeout(&this_flusher.done, 2*HZ);
+ pr_info("workqueue %s 2s since SYSTEM_POWER_OFF\n", wq->name);
+ } else
+ wait_for_completion(&this_flusher.done);
/*
* Wake-up-and-cascade phase
Steve Berman
On 2020-06-10 10:21:29 [+0200], Stephen Berman wrote:
> (GMX put your email into my spam folder, so I didn't see it before I
> sent my followup about removing the wifi firmware.)
okay.
> On Tue, 9 Jun 2020 22:23:39 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> > scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70
> >
> > should reveal that.
>
> I saved the call trace lines to a file and did this (with `$(pwd)/' the
> script did not run):
>
> steve [ /sources/linux-5.6.4 ]$ ./scripts/decode_stacktrace.sh vmlinux . kernel_power_off+0x3d/0x70 < /home/steve/temp/call_trace
> [1352.***] Call Trace:
> [1352.***] ? __schedule (core.c:?)
> [1352.***] schedule (??:?)
> [1352.***] schedule_timeout (??:?)
> [1352.***] ? acpi_os_release_object (??:?)
> [1352.***] ? acpi_ut_update_object_reference (??:?)
> [1352.***] wait_for_completion (??:?)
> [1352.***] ? wake_up_q (??:?)
> [1352.***] flush_workqueue (??:?)
> [1352.***] kernel_power_off (??:?)
> [1352.***] __do_sys_reboot (reboot.c:?)
> [1352.***] ? do_send_sig_info (??:?)
> [1352.***] ? kill_pid_info (??:?)
> [1352.***] ? kill_something_info (signal.c:?)
> [1352.***] ? __x64_sys_kill (??:?)
> [1352.***] do_syscall_64 (??:?)
> [1352.***] entry_SYSCALL_64_after_hwframe (??:?)
> [1352.***] RIP: 0033:0x7f95dd6992c3
> [1352.***] Code: Bad RIP value.
> objdump: '/tmp/tmp.BlpNDdSQpG.o': No such file
>
> Did I do something wrong or do I need to pass certain compiler flags
> when building the kernel (when I build Emacs I pass CFLAGS='-Og -g3')?
The last argument should not be kernel_power_off+0x3d/0x70. It
should have been entered after "enter", but you can also pipe the whole
file as you did.
If it gives you the ?? then you might need
CONFIG_DEBUG_INFO
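i.e. set CONFIG_DEBUG_INFO=y, rebuild, and then feed the saved trace
through the script again, something like:

./scripts/decode_stacktrace.sh vmlinux . < /home/steve/temp/call_trace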
> > The system shuts down. So you are facing something different. I am
> > surprised that the workqueue stall detector did not yell here. The patch
> > at the bottom should get rid of the sr warnings.
>
> But the cdrom seems to have nothing to do with power-off failing, since
> it happens even when the cdrom is detached (and hence there were no sr
> warnings in the log output), or am I misunderstanding you?
Yes. I think it has nothing to do with it, it is just something
irrelevant that I do understand now. Before that trace it looked like
AHCI was failing to continue.
> By the other patch do you mean the following? (This email was also put
> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
Yes. But now that I played a little with it, I made this:
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4e01c448b4b48..e09fc41e99fc3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2777,6 +2777,8 @@ void flush_workqueue(struct workqueue_struct *wq)
lock_map_acquire(&wq->lockdep_map);
lock_map_release(&wq->lockdep_map);
+ if (system_state > SYSTEM_RUNNING)
+ pr_err("Flushing %s\n", wq->name);
mutex_lock(&wq->mutex);
/*
@@ -2826,7 +2828,18 @@ void flush_workqueue(struct workqueue_struct *wq)
mutex_unlock(&wq->mutex);
- wait_for_completion(&this_flusher.done);
+ if (system_state > SYSTEM_RUNNING) {
+ int ret;
+try_again:
+ ret = wait_for_completion_timeout(&this_flusher.done, 3 * HZ);
+ pr_err("Flushing %s over, %d\n", wq->name, ret);
+ if (!ret) {
+ show_workqueue_state();
+ goto try_again;
+ }
+ } else {
+ wait_for_completion(&this_flusher.done);
+ }
/*
* Wake-up-and-cascade phase
@@ -3039,11 +3052,18 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
lock_map_release(&work->lockdep_map);
}
+ if (system_state > SYSTEM_RUNNING)
+ pr_err("Flush %pS\n", work->func);
+
if (start_flush_work(work, &barr, from_cancel)) {
wait_for_completion(&barr.done);
destroy_work_on_stack(&barr.work);
+ if (system_state > SYSTEM_RUNNING)
+ pr_err("Flush %pS done\n", work->func);
return true;
} else {
+ if (system_state > SYSTEM_RUNNING)
+ pr_err("Flush %pS done\n", work->func);
return false;
}
}
So you should have output like:
|[ 14.131696] Flushing events_freezable_power_
the flush that hangs
|[ 17.203246] Flushing events_freezable_power_ over, 0
It has been noticed (0 at the end)
|[ 17.203849] Showing busy workqueues and worker pools:
|[ 17.204400] workqueue events_freezable_power_: flags=0x84
|[ 17.205009] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
|[ 17.205774] in-flight: 2147:disk_events_workfn
|[ 17.206268] workqueue kblockd: flags=0x18
|[ 17.206682] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
|[ 17.207428] pending: blk_mq_requeue_work
|[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle: 535 42
A dump of all pending workqueue items.
|[ 20.275239] Flushing events_freezable_power_ over, 0
|[ 20.275812] Showing busy workqueues and worker pools:
|[ 20.276385] workqueue events_freezable_power_: flags=0x84
|[ 20.276972] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
|[ 20.277712] in-flight: 2147:disk_events_workfn
|[ 20.278245] workqueue kblockd: flags=0x18
|[ 20.278685] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
|[ 20.279471] pending: blk_mq_requeue_work
|[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle: 535 42
Another attempt. As you see, `blk_mq_requeue_work' is referenced in both
outputs but it makes progress (it is scheduled multiple times). The item
`disk_events_workfn' is stuck.
Sebastian
[...]
>> By the other patch do you mean the following? (This email was also put
>> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
>
> Yes. But now that I played a little with it, I made this:
[...]
> So you should have output like:
> |[ 14.131696] Flushing events_freezable_power_
> the flush that hangs
>
> |[ 17.203246] Flushing events_freezable_power_ over, 0
> It has been noticed (0 at the end)
>
> |[ 17.203849] Showing busy workqueues and worker pools:
> |[ 17.204400] workqueue events_freezable_power_: flags=0x84
> |[ 17.205009] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |[ 17.205774] in-flight: 2147:disk_events_workfn
> |[ 17.206268] workqueue kblockd: flags=0x18
> |[ 17.206682] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> |[ 17.207428] pending: blk_mq_requeue_work
> |[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle:
> | 535 42
>
> A dump of all pending workqueue items.
>
> |[ 20.275239] Flushing events_freezable_power_ over, 0
> |[ 20.275812] Showing busy workqueues and worker pools:
> |[ 20.276385] workqueue events_freezable_power_: flags=0x84
> |[ 20.276972] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |[ 20.277712] in-flight: 2147:disk_events_workfn
> |[ 20.278245] workqueue kblockd: flags=0x18
> |[ 20.278685] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> |[ 20.279471] pending: blk_mq_requeue_work
> |[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle:
> | 535 42
>
> Another attempt. As you see, `blk_mq_requeue_work' is referenced in both
> outputs but it makes progress (it is scheduled multiple times). The item
> `disk_events_workfn' is stuck.
I rebuilt kernel 5.6.4 with that patch, but before calling `shutdown -h
now' I guess I should have set printk_delay, because this time there was
a lot of output and it went by too fast. I'll try to delay it the next
time. After the initial output (which went by too fast for me to
photograph) there were endlessly repeated screenfuls of output like this:
[ 2578.402398] , acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred
and at the bottom of the screen this:
[ 2578.427680] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 1331
After this line there was a brief pause (so I was able to photograph it)
and then it repeated, always the same. In the last line pool, cpus,
node, flags, nice and hung were always 0, and workers was always 4.
Only idle changed, but seemed to cycle through a few values. After
watching this repetition for several minutes, I finally did a hard
reboot.
Steve Berman
On Thu, 11 Jun 2020 00:49:26 +0200 Stephen Berman <[email protected]> wrote:
> On Wed, 10 Jun 2020 12:25:14 +0200 Sebastian Andrzej Siewior
> <[email protected]> wrote:
> [...]
>>> By the other patch do you mean the following? (This email was also put
>>> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
>>
>> Yes. But now that I played a little with it, I made this:
> [...]
>> So you should have output like:
>> |[ 14.131696] Flushing events_freezable_power_
>> the flush that hangs
>>
>> |[ 17.203246] Flushing events_freezable_power_ over, 0
>> It has been noticed (0 at the end)
>>
>> |[ 17.203849] Showing busy workqueues and worker pools:
>> |[ 17.204400] workqueue events_freezable_power_: flags=0x84
>> |[ 17.205009] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[ 17.205774] in-flight: 2147:disk_events_workfn
>> |[ 17.206268] workqueue kblockd: flags=0x18
>> |[ 17.206682] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[ 17.207428] pending: blk_mq_requeue_work
>> |[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle:
>> | 535 42
>>
>> A dump of all pending workqueue items.
>>
>> |[ 20.275239] Flushing events_freezable_power_ over, 0
>> |[ 20.275812] Showing busy workqueues and worker pools:
>> |[ 20.276385] workqueue events_freezable_power_: flags=0x84
>> |[ 20.276972] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[ 20.277712] in-flight: 2147:disk_events_workfn
>> |[ 20.278245] workqueue kblockd: flags=0x18
>> |[ 20.278685] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[ 20.279471] pending: blk_mq_requeue_work
>> |[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle:
>> | 535 42
>>
>> Another attempt. As you see, `blk_mq_requeue_work' is referenced in both
>> outputs but it makes progress (it is scheduled multiple times). The item
>> `disk_events_workfn' is stuck.
>
> I rebuilt kernel 5.6.4 with that patch, but before calling `shutdown -h
> now' I guess I should have set printk_delay, because this time there was
> a lot of output and it went by too fast. I'll try to delay it the next
> time.
I've done that now. I've sent you screenshots offlist. Here's a brief
summary: The initial shutdown log output is essentially the same as the
transcription I posted upthread, except that it is interspersed with
messages from your patch like "Flush hid_reset+0x0/0x80". Then comes:
ACPI: Preparing to enter system sleep state S5
Flushing kacpid
Flushing kacpid_notify
Flushing kacpid_notify over, 0
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: drm_fb_helper_dirty_work
workqueue kacpid_notify: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 refcnt=4185
in-flight: 1578:acpi_os_execute_deferred
delayed: acpi_os_execute_deferred, acpi_os_execute_deferred, ...
The "acpi_os_execute_deferred" messages were repeated many times in the
above line, then every 20-30 seconds again for several minutes. Then
suddenly a call trace appeared which was similar but not identical to
the one I posted upthread, and each line of the trace was followed by
the line ", acpi_os_execute_deferred". This went by quite quickly even
with the printk_delay I added, and I was unable to photograph the start
of it and couldn't get all of the subsequent output, but the screenshots
show some of it. After several minutes of this output I again did a
hard reboot.
Steve Berman
+ ACPI in case the ACPI folks see something obvious.
On 2020-06-11 17:39:40 [+0200], Stephen Berman wrote:
>
> I've done that now. I've sent you screenshots offlist. Here's a brief
> summary: The initial shutdown log output is essentially the same as the
> transcription I posted upthread, except that it is interspersed with
> messages from your patch like "Flush hid_reset+0x0/0x80". Then comes:
>
> ACPI: Preparing to enter system sleep state S5
> Flushing kacpid
> Flushing kacpid_notify
based on the screenshots and the backtrace, you are stuck in:
- __do_sys_reboot()
- acpi_power_off_prepare()
- acpi_os_wait_events_complete()
- flush_workqueue(kacpi_notify_wq);
> Flushing kacpid_notify over, 0
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
> pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> pending: drm_fb_helper_dirty_work
> workqueue kacpid_notify: flags=0x0
> pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 refcnt=4185
> in-flight: 1578:acpi_os_execute_deferred
> delayed: acpi_os_execute_deferred, acpi_os_execute_deferred, ...
based on this, the kacpid_notify workqueue has quite a few work items piled
up.
> The "acpi_os_execute_deferred" messages were repeated many times in the
> above line, then every 20-30 seconds again for several minutes. Then
> suddenly a call trace appeared which was similar but not identical to
> the one I posted upthread, and each line of the trace was followed by
> the line ", acpi_os_execute_deferred". This went by quite quickly even
> with the printk_delay I added, and I was unable to photograph the start
> of it and couldn't get all of the subsequent output, but the screenshots
> show some of it. After several minutes of this output I again did a
> hard reboot.
All good. I thought that you would have one worker that is blocking but
you have a lot of them. It appears that one is active and stuck and more
are waiting.
Could you look at acpi in /proc/interrupts 10 secs apart to see if it
increments?
grep -E 'acpi|smbus' /proc/interrupts
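For example:

# sample the counters twice, roughly 10 seconds apart
grep -E 'acpi|smbus' /proc/interrupts; sleep 10; grep -E 'acpi|smbus' /proc/interrupts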
You could also do "echo t > /proc/sysrq-trigger" which gives you a lot
of task state information, but at the end you will also see "Showing busy
workqueues and worker pools:" regarding the workqueue state. I am
curious to see if you already have worker stuck in kacpid_notify with
acpi_os_execute_deferred.
Now that we know that acpi_os_execute_deferred() is stuck,
let's shed some light on what it is trying to do. The patch at the end
will dump this information into the console buffer (The `dmesg' command
will print the whole kernel buffer). I have no idea if this starts
printing while the system is running or during shutdown. I would expect
to see the Start line in acpi_os_execute_deferred() but not the End one.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 41168c027a5a4..0e983c558bcb5 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -840,7 +840,9 @@ static void acpi_os_execute_deferred(struct work_struct *work)
{
struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work);
+ pr_err("%s(%d) Start %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
dpc->function(dpc->context);
+ pr_err("%s(%d) End %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
kfree(dpc);
}
@@ -1096,6 +1098,8 @@ acpi_status acpi_os_execute(acpi_execute_type type,
*/
if (type == OSL_NOTIFY_HANDLER) {
queue = kacpi_notify_wq;
+ pr_err("%s(%d) Adding %pS %px\n", __func__, __LINE__, function,
+ context);
INIT_WORK(&dpc->work, acpi_os_execute_deferred);
} else if (type == OSL_GPE_HANDLER) {
queue = kacpid_wq;
> Steve Berman
Sebastian
On Fri, 12 Jun 2020 13:01:22 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> + ACPI in case the ACPI folks see something obvious.
[...]
>> The "acpi_os_execute_deferred" messages were repeated many times in the
>> above line, then every 20-30 seconds again for several minutes. Then
>> suddenly a call trace appeared which was similar but not identical to
>> the one I posted upthread, and each line of the trace was followed by
>> the line ", acpi_os_execute_deferred". This went by quite quickly even
>> with the printk_delay I added, and I was unable to photograph the start
>> of it and couldn't get all of the subsequent output, but the screenshots
>> show some of it. After several minutes of this output I again did a
>> hard reboot.
>
> All good. I thought that you would have one worker that is blocking but
> you have a lot of them. It appears that one is active and stuck and more
> are waiting.
>
> Could you look at acpi in /proc/interrupts 10 secs apart to see if it
> increments?
>
> grep -E 'acpi|smbus' /proc/interrupts
I tried this several times, 10 seconds apart and longer, but saw no
change, it was always this:
steve [ ~ ]$ grep -E 'acpi|smbus' /proc/interrupts
9: 0 5 0 0 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
16: 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus
> You could also do "echo t > /proc/sysrq-trigger" which gives you a lot
> of task state information, but at the end you will also see "Showing busy
> workqueues and worker pools:" regarding the workqueue state. I am
> curious to see if you already have worker stuck in kacpid_notify with
> acpi_os_execute_deferred.
What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both
before and after doing that I get an error trying to open the file:
root [ ~ ]# cat /proc/sysrq-trigger
cat: /proc/sysrq-trigger: Input/output error
> Now that we know that acpi_os_execute_deferred() is stuck,
> let's shed some light on what it is trying to do. The patch at the end
> will dump this information into the console buffer (The `dmesg' command
> will print the whole kernel buffer). I have no idea if this starts
> printing while the system is running or during shutdown. I would expect
> to see the Start line in acpi_os_execute_deferred() but not the End one.
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 41168c027a5a4..0e983c558bcb5 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -840,7 +840,9 @@ static void acpi_os_execute_deferred(struct work_struct *work)
> {
> struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work);
>
> + pr_err("%s(%d) Start %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> dpc->function(dpc->context);
> + pr_err("%s(%d) End %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> kfree(dpc);
> }
>
> @@ -1096,6 +1098,8 @@ acpi_status acpi_os_execute(acpi_execute_type type,
> */
> if (type == OSL_NOTIFY_HANDLER) {
> queue = kacpi_notify_wq;
> + pr_err("%s(%d) Adding %pS %px\n", __func__, __LINE__, function,
> + context);
> INIT_WORK(&dpc->work, acpi_os_execute_deferred);
> } else if (type == OSL_GPE_HANDLER) {
> queue = kacpid_wq;
I applied this patch to 5.6.4 and recompiled, and on the next boot with
that kernel, the kernel buffer (and kernel and system logs) began to get
flooded with these messages:
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987] acpi_os_execute_deferred(843) Start ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute_deferred(845) End ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105] acpi_os_execute_deferred(843) Start ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124] acpi_os_execute_deferred(845) End ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288] acpi_os_execute_deferred(843) Start ffff8fb82c7b6540 000000003edf1e05(ffff8fb82c492990)
Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
and so on without stopping. I could start X and avoid seeing the
messages, but was afraid the logs would fill up the root partition if I
let it keep going, so I rebooted with another kernel.
Was this message flood because I booted with "ignore_loglevel
initcall_debug"? In the logs there are also lots of messages like this:
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237628] initcall i915_init+0x0/0x66 returned 0 after 288657 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237644] calling cn_proc_init+0x0/0x35 @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237646] initcall cn_proc_init+0x0/0x35 returned 0 after 0 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237648] calling _nvm_misc_init+0x0/0xc @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237687] initcall _nvm_misc_init+0x0/0xc returned 0 after 36 usecs
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237690] calling topology_sysfs_init+0x0/0x30 @ 1
Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237720] initcall topology_sysfs_init+0x0/0x30 returned 0 after 28 usecs
and so on.
Steve Berman
On 2020-06-14 14:12:18 [+0200], Stephen Berman wrote:
> On Fri, 12 Jun 2020 13:01:22 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
>
> steve [ ~ ]$ grep -E 'acpi|smbus' /proc/interrupts
> 9: 0 5 0 0 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
> 16: 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus
okay, so it is not increasing at runtime.
> > You could also do "echo t > /proc/sysrq-trigger" which gives you a lot
> > of task state information, but at the end you will also see "Showing busy
> > workqueues and worker pools:" regarding the workqueue state. I am
> > curious to see if you already have worker stuck in kacpid_notify with
> > acpi_os_execute_deferred.
>
> What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both
> before and after doing that I get an error trying to open the file:
>
> root [ ~ ]# cat /proc/sysrq-trigger
> cat: /proc/sysrq-trigger: Input/output error
echo "t > /proc/sysrq-trigger"
not cat.
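The dump goes into the kernel ring buffer rather than to the terminal, so
afterwards read it back with something like:

echo t > /proc/sysrq-trigger
dmesg | tail -n 100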
> > Now that we know that acpi_os_execute_deferred() is stuck,
> > let's shed some light on what it is trying to do. The patch at the end
> > will dump this information into the console buffer (The `dmesg' command
> > will print the whole kernel buffer). I have no idea if this starts
> > printing while the system is running or during shutdown. I would expect
> > to see the Start line in acpi_os_execute_deferred() but not the End one.
> >
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > index 41168c027a5a4..0e983c558bcb5 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -840,7 +840,9 @@ static void acpi_os_execute_deferred(struct work_struct *work)
> > {
> > struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work);
> >
> > + pr_err("%s(%d) Start %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> > dpc->function(dpc->context);
> > + pr_err("%s(%d) End %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context);
> > kfree(dpc);
> > }
argh, this should have been %pS not %pF. Sorry for that.
> > @@ -1096,6 +1098,8 @@ acpi_status acpi_os_execute(acpi_execute_type type,
> > */
> > if (type == OSL_NOTIFY_HANDLER) {
> > queue = kacpi_notify_wq;
> > + pr_err("%s(%d) Adding %pS %px\n", __func__, __LINE__, function,
> > + context);
> > INIT_WORK(&dpc->work, acpi_os_execute_deferred);
> > } else if (type == OSL_GPE_HANDLER) {
> > queue = kacpid_wq;
>
> I applied this patch to 5.6.4 and recompiled, and on the next boot with
> that kernel, the kernel buffer (and kernel and system logs) began to get
> flooded with these messages:
>
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987] acpi_os_execute_deferred(843) Start ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute_deferred(845) End ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990)
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105] acpi_os_execute_deferred(843) Start ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124] acpi_os_execute_deferred(845) End ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990)
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288] acpi_os_execute_deferred(843) Start ffff8fb82c7b6540 000000003edf1e05(ffff8fb82c492990)
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
>
> and so on without stopping. I could start X and avoid seeing the
> messages, but was afraid the logs would fill up the root partition if I
> let it keep going, so I rebooted with another kernel.
>
> Was this message flood because I booted with "ignore_loglevel
> initcall_debug"? In the logs there are also lots of messages like this:
Is there an acpi_os_execute_* flood? The *few* at what appears to be system
startup might be normal. If *many* more appear and are
constantly printing (check with dmesg) then we might be on to something.
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237628] initcall i915_init+0x0/0x66 returned 0 after 288657 usecs
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237644] calling cn_proc_init+0x0/0x35 @ 1
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237646] initcall cn_proc_init+0x0/0x35 returned 0 after 0 usecs
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237648] calling _nvm_misc_init+0x0/0xc @ 1
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237687] initcall _nvm_misc_init+0x0/0xc returned 0 after 36 usecs
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237690] calling topology_sysfs_init+0x0/0x30 @ 1
> Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237720] initcall topology_sysfs_init+0x0/0x30 returned 0 after 28 usecs
>
> and so on.
You have initcall messages here, which is due to "initcall_debug". The i915*
message means that the i915 module was loaded.
"initcall_debug" prints the start and completion of each initcall (for
built-in code and for loaded modules) as well as the shutdown callbacks
which are invoked at system shutdown. "ignore_loglevel" prints all
messages to the console, ignoring the current loglevel; otherwise
messages with a "minor" loglevel would be skipped. With this we were
able to see the shutdown callbacks (of the ahci module, for instance).
I attached the updated acpi patch. It limits the prints to the
kacpi_notify_wq queue, which appears to be stuck at shutdown.
The interesting part is to see whether there is an acpi_os_execute() adding
a specific event multiple times which does not complete. Maybe at runtime,
maybe at shutdown time. If that is the case then ignoring this specific
event might fix the shutdown problem. With all the information so far,
I don't see a relation between this problem and the commit…
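As a rough way to check this, the per-event messages could be counted from
the kernel ring buffer, e.g. (just a sketch; the exact grep patterns depend
on the debug patch above):

dmesg | grep -c 'acpi_os_execute(.*) Adding'
dmesg | grep -c 'acpi_os_execute_deferred.*Start'
dmesg | grep -c 'acpi_os_execute_deferred.*End'

If the "Adding" count keeps growing well beyond the "End" count, that would
point at queued events which never complete.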
> Steve Berman
Sebastian
On Sun, 14 Jun 2020 19:10:05 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-14 14:12:18 [+0200], Stephen Berman wrote:
[...]
>> What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both
>> before and after doing that I get an error trying to open the file:
>>
>> root [ ~ ]# cat /proc/sysrq-trigger
>> cat: /proc/sysrq-trigger: Input/output error
>
> echo "t > /proc/sysrq-trigger"
>
> not cat.
Ok, sorry, I had misunderstood, but now I've looked at the
documentation. I had in fact already done `echo t >
/proc/sysrq-trigger' in an xterm (as root) and there was no output.
Later, after booting kernel 5.1.0 because of the message flooding with
5.6.4, I did `echo t > /proc/sysrq-trigger' in a virtual tty (also as
root) and the only output was: `[ <timestamp>] sysrq: Show State'. Is
this expected? (In /proc/sys/kernel/sysrq there is `1'.) I couldn't
try it in a 5.6.4 virtual tty because of the message flooding (see
below).
[...]
>> I applied this patch to 5.6.4 and recompiled, and on the next boot with
>> that kernel, the kernel buffer (and kernel and system logs) began to get
>> flooded with these messages:
>>
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987]
>> acpi_os_execute_deferred(843) Start ffff8fb82c7b6500
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101)
>> Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104]
>> acpi_os_execute_deferred(845) End ffff8fb82c7b6500
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105]
>> acpi_os_execute_deferred(843) Start ffff8fb82b844800
>> 000000002ba560ea(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124]
>> acpi_os_execute_deferred(845) End ffff8fb82b844800
>> 000000002ba560ea(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288]
>> acpi_os_execute_deferred(843) Start ffff8fb82c7b6540
>> 000000003edf1e05(ffff8fb82c492990)
>> Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101)
>> Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990
>>
>> and so on without stopping. I could start X and avoid seeing the
>> messages, but was afraid the logs would fill up the root partition if I
>> let it keep going, so I rebooted with another kernel.
>>
>> Was this message flood because I booted with "ignore_loglevel
>> initcall_debug"? In the logs there are also lots of messages like this:
>
> Is there an acpi_os_execute_* flood? The *few* at what appears to be system
> startup might be normal. If *many* more appear and are constantly
> being printed (check with dmesg) then we might be on to something.
The kernel log shows 305 of these messages in the 4 minutes and 17
seconds between the start of klogd and when I rebooted.
[...]
> I attached the updated acpi patch. It limits the prints to the
> kacpi_notify_wq queue, which appears to be stuck at shutdown.
I applied this patch and rebuilt the kernel. After booting 5.6.4 this
morning there was again message flooding and this time I was unable to
log in because of it, so I had to do a hard reboot and booted with 5.1.0.
The kernel log shows 207 acpi_os_execute_deferred_notify messages in 3
minutes and 6 seconds; here are the first:
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943848] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943851] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.943865] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944106] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944115] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944406] acpi_os_execute(1109) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8d7aaba18990 <ffff8d7aab157fc0>
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944408] acpi_os_execute_deferred_notify(851) Start ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
Jun 15 08:55:22 strobe-jhalfs kernel: [ 5.944417] acpi_os_execute_deferred_notify(853) End ffff8d7aab157fc0 acpi_ev_asynch_enable_gpe+0x0/0x2f(ffff8d7aaba18990)
> The interesting part is to see if there is a acpi_os_execute() adding a
> specific event multiple times which does not complete. Maybe at runtime,
> maybe at shutdown time. If that is the case then ignoring this specific
> event might fix the shutdown problem. With all this information so far,
> I don't see a relation with this problem and the commit…
In those 3 minutes and 8 seconds there were 5 "Adding
acpi_ev_asynch_enable_gpe" messages at the beginning, then 185 "Adding
acpi_ev_notify_dispatch" messages, which kept coming until I rebooted.
Steve Berman
On 2020-06-15 09:58:00 [+0200], Stephen Berman wrote:
> Ok, sorry, I had misunderstood, but now I've looked at the
> documentation. I had in fact already done `echo t >
> /proc/sysrq-trigger' in an xterm (as root) and there was no output.
> Later, after booting kernel 5.1.0 because of the message flooding with
> 5.6.4, I did `echo t > /proc/sysrq-trigger' in a virtual tty (also as
> root) and the only output was: `[ <timestamp>] sysrq: Show State'. Is
> this expected? (In /proc/sys/kernel/sysrq there is `1'.) I couldn't
> try it in a 5.6.4 virtual tty because of the message flooding (see
> below).
If you do this "t" then there should be a lot of output on your console.
If you do this from an xterm then you can see the output after typing
"dmesg". The output should appear also in your system log.
> The kernel log shows 305 of these messages in the 4 minutes and 17
> seconds between the start of klogd and when I rebooted.
Okay, this is a lot.
> > The interesting part is to see if there is a acpi_os_execute() adding a
> > specific event multiple times which does not complete. Maybe at runtime,
> > maybe at shutdown time. If that is the case then ignoring this specific
> > event might fix the shutdown problem. With all this information so far,
> > I don't see a relation with this problem and the commit…
>
> In those 3 minutes and 8 seconds there were 5 "Adding
> acpi_ev_asynch_enable_gpe" messages at the beginning, then 185 "Adding
> acpi_ev_notify_dispatch" messages, which kept coming until I rebooted.
5 acpi_ev_asynch_enable_gpe() in three minutes since boot isn't much.
185 of "Adding acpi_ev_notify_dispatch" is a lot.
But to be clear: are there only
"acpi_os_execute(1109) Adding acpi_ev_notify_dispatch" messages, and no
"acpi_os_execute_deferred_notify() Start/End acpi_ev_notify_dispatch" ones?
The syslog should have captured the log on disk.
I attached a modified acpi_dbg.patch. Please enable:
- CONFIG_ACPI_DEBUG=y
Looking at your 5.1 config, you have tracing enabled (I hope it still is).
The attached patch will dump the data into the tracing buffer, so your
console should remain "clean". Once it records 300 of those "requests
for acpi_ev_notify_dispatch" it will stop recording.
After 4-5 minutes please do
cat /sys/kernel/debug/tracing/trace > file.txt
compress it and send it.
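Put together, that is roughly (a sketch; this assumes tracefs is reachable
under /sys/kernel/debug/tracing and that gzip is used for the compression):

cat /sys/kernel/debug/tracing/trace > file.txt   # after 4-5 minutes of uptime
gzip file.txt                                    # attach the resulting file.txt.gz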
> Steve Berman
Sebastian
On Mon, 15 Jun 2020 16:51:30 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-15 09:58:00 [+0200], Stephen Berman wrote:
>> Ok, sorry, I had misunderstood, but now I've looked at the
>> documentation. I had in fact already done `echo t >
>> /proc/sysrq-trigger' in an xterm (as root) and there was no output.
>> Later, after booting kernel 5.1.0 because of the message flooding with
>> 5.6.4, I did `echo t > /proc/sysrq-trigger' in a virtual tty (also as
>> root) and the only output was: `[ <timestamp>] sysrq: Show State'. Is
>> this expected? (In /proc/sys/kernel/sysrq there is `1'.) I couldn't
>> try it in a 5.6.4 virtual tty because of the message flooding (see
>> below).
>
> If you do this "t" then there should be a lot of output on your console.
> If you do this from an xterm then you can see the output after typing
> "dmesg". The output should appear also in your system log.
Ah, ok, I do see it in the log; it looks basically the same as the call
trace I posted upthread. I wonder why there was no such output on the
console; could there be some setting that suppresses it even though
/proc/sys/kernel/sysrq contains `1'?
>> The kernel log shows 305 of these messages in the 4 minutes and 17
>> seconds between the start of klogd and when I rebooted.
>
> Okay, this is a lot.
>
>> > The interesting part is to see if there is a acpi_os_execute() adding a
>> > specific event multiple times which does not complete. Maybe at runtime,
>> > maybe at shutdown time. If that is the case then ignoring this specific
>> > event might fix the shutdown problem. With all this information so far,
>> > I don't see a relation with this problem and the commit…
>>
>> In those 3 minutes and 8 seconds there were 5 "Adding
>> acpi_ev_asynch_enable_gpe" messages at the beginning, then 185 "Adding
>> acpi_ev_notify_dispatch" messages, which kept coming until I rebooted.
>
> 5 acpi_ev_asynch_enable_gpe() in three minutes since boot isn't much.
> 185 of "Adding acpi_ev_notify_dispatch" is a lot.
>
> But to be clear, only
> acpi_os_execute(1109) Adding acpi_ev_notify_dispatch" messages?
> No
> "acpi_os_execute_deferred_notify() Start/End acpi_ev_notify_dispatch"?
>
> The syslog should have captured the log on disk.
There were in fact 99 "Start .* acpi_ev_notify_dispatch" messages and 98
"End .* acpi_ev_notify_dispatch" messages. Here are the last of them
before I rebooted:
Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.315014] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a66e0 <ffff8d7aabb38480>
Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.319929] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758d00 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7eb0)
Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.321242] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758a80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
Jun 15 08:58:26 strobe-jhalfs kernel: [ 194.339017] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6730 <ffff8d7aabb384c0>
Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.363005] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6780 <ffff8d7aabb38500>
Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.367978] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758a80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.369234] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758240 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7a00)
Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.387017] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a68c0 <ffff8d7aabb38540>
Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.388508] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758240 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7a00)
Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.390128] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758b80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7460)
Jun 15 08:58:29 strobe-jhalfs kernel: [ 197.208714] wlan1: deauthenticating from 7c:ff:4d:08:df:22 by local choice (Reason: 3=DEAUTH_LEAVING)
Jun 15 08:58:29 strobe-jhalfs kernel: [ 197.232214] ip (4614) used greatest stack depth: 11272 bytes left
Jun 15 08:58:29 strobe-jhalfs kernel: Kernel logging (proc) stopped.
Jun 15 08:58:29 strobe-jhalfs kernel: Kernel log daemon terminating.
> I attached a modified acpi_dbg.patch. Please enable:
> - CONFIG_ACPI_DEBUG=y
>
> Looking at your 5.1 you have tracing enabled (hope it still is).
>
> The attached patch will dump the date into the tracing buffer, so you
> console should remain "clean". Once it records 300 of those "requests
> for acpi_ev_notify_dispatch" it will stop recording.
> After 4-5 minutes please do
> cat /sys/kernel/debug/tracing/trace > file.txt
>
> compress it and send it.
Will do as soon as I can, hopefully later today or tomorrow. Thanks.
Steve Berman
On 2020-06-15 17:41:06 [+0200], Stephen Berman wrote:
> > If you do this "t" then there should be a lot of output on your console.
> > If you do this from an xterm then you can see the output after typing
> > "dmesg". The output should appear also in your system log.
>
> Ah, ok, I do see it in the log, it looks basically the same as the call
> trace I posted upthread. I wonder why there was no such output in the
> console, could there be some setting that suppresses it even though
> /proc/sys/kernel/sysrq has `1'?
You have ignore_loglevel, so everything should appear on your console.
That is true for your tty (ctrl+alt+f1) but not for your xterm shell.
> There were in fact 99 "Start .* acpi_ev_notify_dispatch" messages and 98
> "End .* acpi_ev_notify_dispatch" messages. Here's the last of them
> before I rebooted:
>
> Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.315014] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a66e0 <ffff8d7aabb38480>
> Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.319929] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758d00 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7eb0)
> Jun 15 08:58:25 strobe-jhalfs kernel: [ 193.321242] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758a80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
> Jun 15 08:58:26 strobe-jhalfs kernel: [ 194.339017] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6730 <ffff8d7aabb384c0>
> Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.363005] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6780 <ffff8d7aabb38500>
> Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.367978] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758a80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
> Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.369234] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758240 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7a00)
> Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.387017] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a68c0 <ffff8d7aabb38540>
> Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.388508] acpi_os_execute_deferred_notify(853) End ffff8d7aa8758240 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7a00)
> Jun 15 08:58:28 strobe-jhalfs kernel: [ 196.390128] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758b80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e7460)
> Jun 15 08:58:29 strobe-jhalfs kernel: [ 197.208714] wlan1: deauthenticating from 7c:ff:4d:08:df:22 by local choice (Reason: 3=DEAUTH_LEAVING)
> Jun 15 08:58:29 strobe-jhalfs kernel: [ 197.232214] ip (4614) used greatest stack depth: 11272 bytes left
> Jun 15 08:58:29 strobe-jhalfs kernel: Kernel logging (proc) stopped.
> Jun 15 08:58:29 strobe-jhalfs kernel: Kernel log daemon terminating.
Okay. So there is "Adding" of four events/work items in total, one
almost every second.
Processing of one work item took two seconds, the other one took a
second. These events are "old" so I don't see the "adding" line for the
Start/End.
You could look in the log for ffff8d7aa8758a80 to see when it was
enqueued, but it feels like ACPI enqueues more events than it is capable
of processing.
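For example (the log file name is only a guess; use whatever file your
syslog daemon writes kernel messages to):

grep ffff8d7aa8758a80 /var/log/kern.log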
> Steve Berman
Sebastian
On Mon, 15 Jun 2020 17:58:46 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-15 17:41:06 [+0200], Stephen Berman wrote:
>> > If you do this "t" then there should be a lot of output on your console.
>> > If you do this from an xterm then you can see the output after typing
>> > "dmesg". The output should appear also in your system log.
>>
>> Ah, ok, I do see it in the log, it looks basically the same as the call
>> trace I posted upthread. I wonder why there was no such output in the
>> console, could there be some setting that suppresses it even though
>> /proc/sys/kernel/sysrq has `1'?
>
> You have ignore_loglevel, so everything should appear on your console.
> That is true for your tty (ctrl+alt+f1) but not for your xterm shell.
Yet that appears not to be the case here, or I'm still doing something
wrong?
>> There were in fact 99 "Start .* acpi_ev_notify_dispatch" messages and 98
>> "End .* acpi_ev_notify_dispatch" messages. Here's the last of them
>> before I rebooted:
[...]
>> acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758a80
>> acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
>> Jun 15 08:58:26 strobe-jhalfs kernel: [ 194.339017] acpi_os_execute(1109)
>> Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6730 <ffff8d7aabb384c0>
>> Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.363005] acpi_os_execute(1109)
>> Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa99a6780 <ffff8d7aabb38500>
>> Jun 15 08:58:27 strobe-jhalfs kernel: [ 195.367978]
>> acpi_os_execute_deferred_notify(853) End ffff8d7aa8758a80
[...]
> Okay. So there is "Adding" of four events/work items in total, each
> almost every second.
> Processing of one work-item took two seconds, the other one took a
> second. These events are "old" so I don't see the "adding" line for the
> Start/End.
> You could look in the log for ffff8d7aa8758a80 to seen when it was
> enqueued but it feels like ACPI enqueues more events then it is capable
> of processing.
It's there, 196 lines above the Start ffff8d7aa8758a80 line:
Jun 15 08:56:58 strobe-jhalfs kernel: [ 106.275356] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa84e70a0 <ffff8d7aa8758a80>
Steve Berman
On 2020-06-15 18:19:18 [+0200], Stephen Berman wrote:
> It's there, 196 lines above the Start ffff8d7aa8758a80 line:
>
> Jun 15 08:56:58 strobe-jhalfs kernel: [ 106.275356] acpi_os_execute(1109) Adding acpi_ev_notify_dispatch+0x0/0x55 ffff8d7aa84e70a0 <ffff8d7aa8758a80>
So compared with
| [ 193.321242] acpi_os_execute_deferred_notify(851) Start ffff8d7aa8758a80 acpi_ev_notify_dispatch+0x0/0x55(ffff8d7aa84e70a0)
it took ~87 seconds for the item to start since it was enqueued.
Assuming that the workqueue still makes progress on shutdown, it may just
take a long time to flush all workers on shutdown. I still have no idea
how the commit in question is responsible for that behaviour…
> Steve Berman
Sebastian
On Mon, 15 Jun 2020 16:51:30 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> I attached a modified acpi_dbg.patch. Please enable:
> - CONFIG_ACPI_DEBUG=y
>
> Looking at your 5.1 you have tracing enabled (hope it still is).
>
> The attached patch will dump the date into the tracing buffer, so you
> console should remain "clean". Once it records 300 of those "requests
> for acpi_ev_notify_dispatch" it will stop recording.
> After 4-5 minutes please do
> cat /sys/kernel/debug/tracing/trace > file.txt
>
> compress it and send it.
I set CONFIG_ACPI_DEBUG=y, applied the patch and rebuilt 5.6.4, but:
$ cat /sys/kernel/debug/tracing/trace > trace.txt
cat: /sys/kernel/debug/tracing/trace: No such file or directory
Here are all the 5.6.4 config options with "TRACING" or "TRACE"; do I
need to set any that aren't set?
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_TRACEPOINTS=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_DMA_FENCE_TRACE is not set
CONFIG_IWLWIFI_DEVICE_TRACING=y
# CONFIG_TRACE_SINK is not set
# HW tracing support
# end of HW tracing support
CONFIG_STACKTRACE=y
CONFIG_RCU_TRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACE_CLOCK=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_HWLAT_TRACER is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_BACKTRACE_SELF_TEST is not set
Steve Berman
On 2020-06-16 09:14:37 [+0200], Stephen Berman wrote:
>
> I set CONFIG_ACPI_DEBUG=y, applied the patch and rebuilt 5.6.4, but:
>
> $ cat /sys/kernel/debug/tracing/trace > trace.txt
> cat: /sys/kernel/debug/tracing/trace: No such file or directory
>
> Here are all the 5.6.4 config options with "TRACING" or "TRACE"; do I
> need to set any that aren't set?
I see that "Kernel hacking ---> Tracers" is enabled. You should have
one tracer enabled for that trace_printk() to work. You have context
switch tracer so it should be enough.
You might need CONFIG_DEBUG_FS (but it is set in your 5.1 config) or it
is just not mounted. So I have here:
| debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
| tracefs on /sys/kernel/debug/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
Does it work for you if you mount it?
> Steve Berman
Sebastian
On Tue, 16 Jun 2020 09:38:27 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-16 09:14:37 [+0200], Stephen Berman wrote:
>>
>> I set CONFIG_ACPI_DEBUG=y, applied the patch and rebuilt 5.6.4, but:
>>
>> $ cat /sys/kernel/debug/tracing/trace > trace.txt
>> cat: /sys/kernel/debug/tracing/trace: No such file or directory
>>
>> Here are all the 5.6.4 config options with "TRACING" or "TRACE"; do I
>> need to set any that aren't set?
>
> I see that "Kernel hacking ---> Tracers" is enabled. You should have
> one tracer enabled for that trace_printk() to work. You have context
> switch tracer so it should be enough.
>
> You might need CONFIG_DEBUG_FS (but it is set in your 5.1 config) or it
> is just not mounted. So I have here:
>
> | debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> | tracefs on /sys/kernel/debug/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
>
> Does it work for you if you mount it?
# mount -t debugfs debugfs /sys/kernel/debug
# mount -t tracefs tracefs /sys/kernel/debug/tracing
# mount
...
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
Yes, thanks, that did it. Trace attached.
Steve Berman
On 2020-06-16 10:13:27 [+0200], Stephen Berman wrote:
> Yes, thanks, that did it. Trace attached.
So TZ10 is a temperature sensor of some kind on your motherboard. In
your v5.6 dmesg there is:
| thermal LNXTHERM:00: registered as thermal_zone0
| ACPI: Thermal Zone [TZ10] (17 C)
So. In /sys/class/thermal/thermal_zone0/device/path you should also see
TZ10. And /sys/class/thermal/thermal_zone0/temp should show the actual
value.
This comes from the "thermal" module.
Looking at the trace, the temperature might be queried every second, which
somehow results in "Dispatching Notify on". I don't understand how it
gets from reading the temperature to the notify part; maybe it is
part of the ACPI…
However. Could you please make sure that the thermal module is not
loaded at system startup? Adding
thermal.off=1
to the kernel commandline should do the trick. And you should see
thermal control disabled
in dmesg. That means your thermal_zone0 with TZ10 does not show up in
/sys and nothing should schedule the work-items. This in turn should
allow you to shut down your system without the delay.
If this works, could you please try to load the module with tzp=300?
If you add this
thermal.tzp=300
to the kernel commandline then it should do the trick. You can verify it
by
cat /sys/module/thermal/parameters/tzp
This should change the polling interval from what ACPI says to 30 seconds.
This should ensure that you don't have so many workers waiting, so you
should also be able to shut down the system.
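To verify the two variants afterwards (a sketch; the tzp value is in
tenths of a second):

cat /proc/cmdline                        # should contain thermal.off=1 or thermal.tzp=300
dmesg | grep -i 'thermal control'        # thermal.off=1 case: "thermal control disabled"
cat /sys/module/thermal/parameters/tzp   # thermal.tzp=300 case: expect 300, i.e. 30 seconds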
> Steve Berman
Sebastian
On Tue, 16 Jun 2020 17:55:01 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-16 10:13:27 [+0200], Stephen Berman wrote:
>> Yes, thanks, that did it. Trace attached.
>
> So TZ10 is a temperature sensor of some kind on your motherboard. In
> your v5.6 dmesg there is:
> | thermal LNXTHERM:00: registered as thermal_zone0
> | ACPI: Thermal Zone [TZ10] (17 C)
>
> So. In /sys/class/thermal/thermal_zone0/device/path you should also see
> TZ10. And /sys/class/thermal/thermal_zone0/temp should show the actual
> value.
> This comes from the "thermal" module.
Yes, TZ10 was in the thermal_zone0/device/path and the value in
thermal_zone0/temp was 16800.
> Looking at the trace, might query the temperature every second which
> somehow results in "Dispatching Notify on". I don't understand how it
> gets from reading of the temperature to the notify part, maybe it is
> part of the ACPI…
>
> However. Could you please make sure that the thermal module is not
> loaded at system startup? Adding
> thermal.off=1
>
> to the kernel commandline should do the trick. And you should see
> thermal control disabled
>
> in dmesg.
Confirmed. And the value in thermal_zone0/temp was now 33000.
> That means your thermal_zone0 with TZ10 does not show up in
> /sys and nothing should schedule the work-items. This in turn should
> allow you to shutdown your system without the delay.
It did!
> If this works, could you please try to load the module with tzp=300?
> If you add this
> thermal.tzp=300
>
> to the kernel commandline then it should do the trick. You can verify it
> by
> cat /sys/module/thermal/parameters/tzp
>
> This should change the polling interval from what ACPI says to 30secs.
> This should ensure that you don't have so many worker waiting. So you
> should also be able to shutdown the system.
Your assessment and predictions are right on the mark!
I'm fine with the thermal.tzp=300 workaround, but it would be good to
find out why this problem started with commit 6d25be57, if my git
bisection was correct, or if it wasn't, then at least somewhere between
5.1.0 and 5.2.0. Or can you already deduce why? If not, I'd be more
than happy to continue applying any patches or trying any suggestions
you have, if you want to continue debugging this issue. In any case,
thanks for pursuing it to this point.
Steve Berman
On 2020-06-16 22:28:43 [+0200], Stephen Berman wrote:
> Your assessment and predictions are right on the mark!
perfect.
> I'm fine with the thermal.tzp=300 workaround, but it would be good to
> find out why this problem started with commit 6d25be57, if my git
> bisection was correct, or if it wasn't, then at least somewhere between
> 5.1.0 and 5.2.0. Or can you already deduce why? If not, I'd be more
> than happy to continue applying any patches or trying any suggestions
> you have, if you want to continue debugging this issue. In any case,
> thanks for pursuing it to this point.
I have no idea why the commit in question should make any difference.
Could you please apply the tracing patch on v5.1 and send the trace?
> Steve Berman
Sebastian
On Wed, 17 Jun 2020 16:27:34 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-16 22:28:43 [+0200], Stephen Berman wrote:
>> Your assessment and predictions are right on the mark!
> perfect.
>
>> I'm fine with the thermal.tzp=300 workaround, but it would be good to
>> find out why this problem started with commit 6d25be57, if my git
>> bisection was correct, or if it wasn't, then at least somewhere between
>> 5.1.0 and 5.2.0. Or can you already deduce why? If not, I'd be more
>> than happy to continue applying any patches or trying any suggestions
>> you have, if you want to continue debugging this issue. In any case,
>> thanks for pursuing it to this point.
>
> I have no idea why the commit in question should make any difference.
> Could please apply the tracing patch on v5.1 and send the trace?
Attached.
Steve Berman
Hi Sebastian,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on mkp-scsi/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Sebastian-Andrzej-Siewior/SCSI-Disable-CD-ROM-poll-on-shutdown/20200610-042458
base: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
config: x86_64-rhel-7.6 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "disk_block_events" [drivers/scsi/sr_mod.ko] undefined!
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On 2020-06-17 23:09:44 [+0200], Stephen Berman wrote:
>
> Attached.
I did not forget about this, but I have had little time recently to look
into it.
> Steve Berman
Sebastian
On Wed, 24 Jun 2020 22:11:56 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-06-17 23:09:44 [+0200], Stephen Berman wrote:
>>
>> Attached.
>
> I did not forget about this but had recently little time to look into
> this.
No problem!
Steve Berman
On 2020-06-24 23:49:52 [+0200], Stephen Berman wrote:
Let me summarize the thread here:
On Stephen's system, ACPI informs the thermal zone driver to poll the
temperature every second and the driver does so.
The driver queries the temperature by invoking acpi_evaluate_integer()
which invokes (at some point) acpi_ev_queue_notify_request().
This then invokes acpi_os_execute_deferred() via
queue_work_on(, kacpi_notify_wq, )
acpi_os_execute_deferred() invokes acpi_ev_notify_dispatch() and this is
no longer synchronised with the initial acpi_evaluate_integer() request.
Before commit
6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
that function took on average 1.023993 seconds to complete. The
interval at which the thermal driver invokes acpi_evaluate_integer() isn't
exactly 1 second, but almost (it is not an absolute timer, after all).
Still, it looks as if one function slowly overtakes the other. After 5
minutes of uptime there is:
| kworker/0:1-12 [000] 312.315565: acpi_ev_queue_notify_request: Dispatching Notify on [TZ10] (Thermal) Value 0x81 (Information Change) Node ffff9935ab20f7a8
| kworker/0:1-12 [000] 312.315567: acpi_os_execute: Adding acpi_ev_notify_dispatch+0x0/0x5a ffff9935a6c64050 <ffff9935a7eb5e00>
Enqueue worker with job ffff9935a7eb5e00
| kworker/0:1-12 [000] 312.315596: acpi_os_execute_deferred_notify: End ffff9935a7eb5c80 acpi_ev_notify_dispatch+0x0/0x5a(ffff9935a6c64c80)
| kworker/0:1-12 [000] 312.315607: acpi_os_execute_deferred_notify: Start ffff9935a7eb5d80 acpi_ev_notify_dispatch+0x0/0x5a(ffff9935a6c64dc0)
previous worker completed, another (already enqueued) started.
| kworker/0:1-12 [000] 313.339564: acpi_ev_queue_notify_request: Dispatching Notify on [TZ10] (Thermal) Value 0x81 (Information Change) Node ffff9935ab20f7a8
| kworker/0:1-12 [000] 313.339566: acpi_os_execute: Adding acpi_ev_notify_dispatch+0x0/0x5a ffff9935a6c64f00 <ffff9935a7eb5c80>
another one enqueued.
| kworker/0:1-12 [000] 313.339595: acpi_os_execute_deferred_notify: End ffff9935a7eb5d80 acpi_ev_notify_dispatch+0x0/0x5a(ffff9935a6c64dc0)
| kworker/0:1-12 [000] 313.339597: acpi_os_execute_deferred_notify: Start ffff9935a7eb5e00 acpi_ev_notify_dispatch+0x0/0x5a(ffff9935a6c64050)
finally, job ffff9935a7eb5e00 started (one second after enqueue).
| kworker/0:1-12 [000] 314.363571: acpi_ev_queue_notify_request: Dispatching Notify on [TZ10] (Thermal) Value 0x81 (Information Change) Node ffff9935ab20f7a8
| kworker/0:1-12 [000] 314.363574: acpi_os_execute: Adding acpi_ev_notify_dispatch+0x0/0x5a ffff9935a6c646e0 <ffff9935a7eb5d80>
| kworker/0:1-12 [000] 314.363608: acpi_os_execute_deferred_notify: End ffff9935a7eb5e00 acpi_ev_notify_dispatch+0x0/0x5a(ffff9935a6c64050)
and ended, two seconds after enqueue. Before it ended, the system
enqueued two more jobs. And this within just 5 minutes of uptime. The
workers slowly pile up.
According to Stephen, after the previously mentioned commit the situation
gets worse. According to tracing, the execution time of
acpi_ev_notify_dispatch() varies between 1.023990 (like before) and
2.048005 seconds. While I don't have an explanation for the changed
behaviour (or see anything wrong with the commit in question), the
pile-up of workers increased.
Stephen noticed this because system shutdown flushes kacpi_notify_wq
and this takes quite some time (with that many workers pending).
I tried a few test cases to reproduce that behaviour but failed.
However, even before the commit in question the situation is far from
perfect:
- There is a lack of synchronisation between the thermal driver's
requests and their tail end (the acpi_ev_notify_dispatch() part).
- Do we have to respect the ACPI interval of one second? Is the hardware
really so accurate that the reading can change noticeably in one second?
- The requests are already back to back, which keeps the CPU busy (maybe
sched-switch would reveal that the CPU is idle most of the time).
So...
Is there a simple way to synchronize the ACPI part? The obvious thing
would be
flush_workqueue(kacpi_notify_wq);
or
acpi_os_wait_events_complete()
in thermal_get_temp().
Would it make sense to ensure that the polling interval is no less than
10 seconds?
> Steve Berman
Sebastian
On Tue, Jul 14, 2020 at 3:44 PM Sebastian Andrzej Siewior
<[email protected]> wrote:>
> On 2020-06-24 23:49:52 [+0200], Stephen Berman wrote:
>
> Let me summarize the thread here:
>
> On Stephen's system, ACPI informs the thermal zone driver to poll the
> temperature every second and the driver does so.
> The driver queries the temperature by invoking acpi_evaluate_integer()
> which invokes (at some point) acpi_ev_queue_notify_request().
Well, I don't quite see how acpi_ev_queue_notify_request() can be
invoked from the acpi_evaluate_integer() code path.
Do you have a call trace showing that?
> This then invokes acpi_os_execute_deferred() via
> queue_work_on(, kacpi_notify_wq, )
>
> acpi_os_execute_deferred() invokes acpi_ev_notify_dispatch() and this is
> no longer synchronised with the initial acpi_evaluate_integer() request.
That indeed would be the case, but I first need to understand what's
going on with the acpi_evaluate_integer() causing a Notify () to be
queued up.
Cheers!
On 2020-07-14 15:54:57 [+0200], Rafael J. Wysocki wrote:
> On Tue, Jul 14, 2020 at 3:44 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:>
> > On 2020-06-24 23:49:52 [+0200], Stephen Berman wrote:
> >
> > Let me summarize the thread here:
> >
> > On Stephen's system, ACPI informs the thermal zone driver to poll the
> > temperature every second and the driver does so.
> > The driver queries the temperature by invoking acpi_evaluate_integer()
> > which invokes (at some point) acpi_ev_queue_notify_request().
>
> Well, I don't quite see how acpi_ev_queue_notify_request() can be
> invoked from the acpi_evaluate_integer() code path.
>
> Do you have a call trace showing that?
So the trace in
https://lore.kernel.org/linux-acpi/[email protected]/
shows the pattern, and we nailed down that it comes from
thermal_get_temp(). I assumed acpi_ex_opcode_2A_0T_0R() since the other
candidate was acpi_ev_asynch_execute_gpe_method().
Stephen, the attached patch adds a WARN_ON() statement which will
produce a stack trace (4 or so). Could you please run 'dmesg' after a
while and send it back. There should be a
"WARNING in drivers/acpi/acpica/evmisc.c"
statement or something along those lines.
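One way to pull just those traces out of the log (the pattern and context
size are guesses based on the usual WARN_ON() output format):

dmesg | grep -B 2 -A 30 'WARNING.*evmisc' > evmisc-warnings.txt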
Rafael, are you also interested in an ACPI dump?
> Cheers!
Sebastian
On Tue, Jul 14, 2020 at 4:11 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2020-07-14 15:54:57 [+0200], Rafael J. Wysocki wrote:
> > On Tue, Jul 14, 2020 at 3:44 PM Sebastian Andrzej Siewior
> > <[email protected]> wrote:>
> > > On 2020-06-24 23:49:52 [+0200], Stephen Berman wrote:
> > >
> > > Let me summarize the thread here:
> > >
> > > On Stephen's system, ACPI informs the thermal zone driver to poll the
> > > temperature every second and the driver does so.
> > > The driver queries the temperature by invoking acpi_evaluate_integer()
> > > which invokes (at some point) acpi_ev_queue_notify_request().
> >
> > Well, I don't quite see how acpi_ev_queue_notify_request() can be
> > invoked from the acpi_evaluate_integer() code path.
> >
> > Do you have a call trace showing that?
>
> So the trace in
> https://lore.kernel.org/linux-acpi/[email protected]/
>
> shows the pattern and we nailed it down that it comes from
> thermal_get_temp().
acpi_evaluate_integer() doesn't show up in the trace, though, AFAICS.
> I assumed acpi_ex_opcode_2A_0T_0R() since the other
> candidate was acpi_ev_asynch_execute_gpe_method().
Which probably is the case. Specifically
acpi_ev_asynch_execute_gpe_method: Evaluate _L66
is likely to cause the Notify() to be dispatched.
> Stephen, the patch attached adds a WARN_ON() statement which will
> produce a stack trace (4 or so). Could please run 'dmesg' after a while
> and send it back. There should be a
> "WARNING in drivers/acpi/acpica/evmisc.c"
>
> statement or something along the lines.
>
> Rafael, are you also interested in an ACPI dump?
That might help a bit.
So what probably happens is that poking at the TZ causes a GPE to
trigger and a Notify() to be dispatched which then goes into the
workqueue for execution.
Now, I'm not sure what happens to those Notify() items, though. They
each should cause a handler (in the thermal driver) to be executed,
but does that happen?
On 2020-07-14 17:53:15 [+0200], Rafael J. Wysocki wrote:
> > shows the pattern and we nailed it down that it comes from
> > thermal_get_temp().
>
> acpi_evaluate_integer() doesn't show up in the trace, though, AFAICS.
I deduced it: acpi_thermal_get_temperature() -> acpi_evaluate_integer(),
and the dmesg showed
| ACPI: Invalid passive threshold
| thermal LNXTHERM:00: registered as thermal_zone0
| ACPI: Thermal Zone [TZ10] (17 C)
| thermal LNXTHERM:01: registered as thermal_zone1
| ACPI: Thermal Zone [TZ00] (28 C)
and overriding thermal.tzp to a higher value made the issue go away.
There is a complete dmesg
https://lore.kernel.org/lkml/[email protected]/
> > I assumed acpi_ex_opcode_2A_0T_0R() since the other
> > candidate was acpi_ev_asynch_execute_gpe_method().
>
> Which probably is the case. Specifically
>
> acpi_ev_asynch_execute_gpe_method: Evaluate _L66
>
> is likely to cause the Notify() to be dispatched.
Okay. I hope Stephen replies later with a backtrace.
> > Stephen, the patch attached adds a WARN_ON() statement which will
> > produce a stack trace (4 or so). Could please run 'dmesg' after a while
> > and send it back. There should be a
> > "WARNING in drivers/acpi/acpica/evmisc.c"
> >
> > statement or something along the lines.
> >
> > Rafael, are you also interested in an ACPI dump?
>
> That might help a bit.
>
> So what probably happens is that poking at the TZ causes a GPE to
> trigger and a Notify() to be dispatched which then goes into the
> workqueue for execution.
>
> Now, I'm not sure what happens to those Notify() items, though. They
> each should cause a handler (in the thermal driver) to be executed,
> but does that happen?
Are you asking for a trace point in acpi_ev_notify_dispatch() to see
what it actually calls as in info->notify.global->handler /
handler_obj->notify.handler ?
There is no handler in the thermal driver or do I just not recognize it?
Sebastian
On Tue, 14 Jul 2020 16:11:35 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-07-14 15:54:57 [+0200], Rafael J. Wysocki wrote:
>> On Tue, Jul 14, 2020 at 3:44 PM Sebastian Andrzej Siewior
>> <[email protected]> wrote:>
>> > On 2020-06-24 23:49:52 [+0200], Stephen Berman wrote:
>> >
>> > Let me summarize the thread here:
>> >
>> > On Stephen's system, ACPI informs the thermal zone driver to poll the
>> > temperature every second and the driver does so.
>> > The driver queries the temperature by invoking acpi_evaluate_integer()
>> > which invokes (at some point) acpi_ev_queue_notify_request().
>>
>> Well, I don't quite see how acpi_ev_queue_notify_request() can be
>> invoked from the acpi_evaluate_integer() code path.
>>
>> Do you have a call trace showing that?
>
> So the trace in
> https://lore.kernel.org/linux-acpi/[email protected]/
>
> shows the pattern and we nailed it down that it comes from
> thermal_get_temp(). I assumed acpi_ex_opcode_2A_0T_0R() since the other
> candidate was acpi_ev_asynch_execute_gpe_method().
>
> Stephen, the patch attached adds a WARN_ON() statement which will
> produce a stack trace (4 or so). Could please run 'dmesg' after a while
> and send it back. There should be a
> "WARNING in drivers/acpi/acpica/evmisc.c"
>
> statement or something along the lines.
I've now rebuilt 5.6.4 with the new patch; sorry for the delay. Output
of dmesg attached.
Steve Berman
On 2020-07-14 17:53:15 [+0200], Rafael J. Wysocki wrote:
> acpi_evaluate_integer() doesn't show up in the trace, though, AFAICS.
>
> > I assumed acpi_ex_opcode_2A_0T_0R() since the other
> > candidate was acpi_ev_asynch_execute_gpe_method().
>
> Which probably is the case. Specifically
>
> acpi_ev_asynch_execute_gpe_method: Evaluate _L66
>
> is likely to cause the Notify() to be dispatched.
…
> > Rafael, are you also interested in an ACPI dump?
>
> That might help a bit.
>
> So what probably happens is that poking at the TZ causes a GPE to
> trigger and a Notify() to be dispatched which then goes into the
> workqueue for execution.
>
> Now, I'm not sure what happens to those Notify() items, though. They
> each should cause a handler (in the thermal driver) to be executed,
> but does that happen?
Stephen's trace contains a few backtraces, all of them look like this:
| Call Trace:
| acpi_ex_opcode_2A_0T_0R+0x93/0xdf
| acpi_ds_exec_end_op+0x10d/0x701
| acpi_ps_parse_loop+0x7f2/0x8c3
| acpi_ps_parse_aml+0x1a5/0x540
| acpi_ps_execute_method+0x1fe/0x2ba
| acpi_ns_evaluate+0x345/0x4e2
| acpi_evaluate_object+0x177/0x39f
| acpi_evaluate_integer+0x4f/0x110
| acpi_thermal_get_temperature.part.0+0x45/0xc4
| thermal_get_temp.cold+0xc/0x2e
| thermal_zone_get_temp+0x4c/0x70
| thermal_zone_device_update.part.0+0x2a/0x110
| acpi_thermal_notify+0xcf/0x140
| acpi_ev_notify_dispatch+0x45/0x5a
| acpi_os_execute_deferred_notify+0x34/0x60
| process_one_work+0x1d2/0x3a0
| worker_thread+0x45/0x3c0
| kthread+0xf6/0x130
| ret_from_fork+0x35/0x40
so there is no GPE; the notify is queued while parsing the ACPI table, from within the notify callback.
Any ideas? I guess acpi_ex_opcode_2A_0T_0R() uses the workqueue because
it may sleep and it might be invoked from non-preemptible context.
Sebastian
On Sun, 19 Jul 2020 12:07:14 +0200 Stephen Berman <[email protected]> wrote:
> On Tue, 14 Jul 2020 16:11:35 +0200 Sebastian Andrzej Siewior
> <[email protected]> wrote:
>
[...]
>> Stephen, the patch attached adds a WARN_ON() statement which will
>> produce a stack trace (4 or so). Could please run 'dmesg' after a while
>> and send it back. There should be a
>> "WARNING in drivers/acpi/acpica/evmisc.c"
>>
>> statement or something along the lines.
>
> I've now rebuilt 5.6.4 with the new patch; sorry for the delay. Output
> of dmesg attached.
FWIW, while you were away I got an additional datapoint: someone who saw
my original post about this problem to one of the Linux From Scratch
mailing lists wrote me that he had the same problem, also getting
messages about the cdrom timing out ("sr 3:0:0:0" in his case); I told
him about your workaround of adding 'thermal.tzp=300' to the kernel
commandline, and he replied that this works for him too. And it turns
out we have similar motherboards: I have a Gigabyte Z390 M Gaming
Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
Steve Berman
On 2020-08-11 13:58:39 [+0200], Stephen Berman wrote:
> him about your workaround of adding 'thermal.tzp=300' to the kernel
> commandline, and he replied that this works for him too. And it turns
> out we have similar motherboards: I have a Gigabyte Z390 M Gaming
> Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
Yes. Based on the latest dmesg, the ACPI tables contain code which schedules
the worker and takes so long. It is possible / likely that his board
contains the same tables, which leads to the same effect. After all, those
two boards are very similar judging by their names :)
Would you mind dumping the ACPI tables and sending them? There might be
some hints.
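In case it helps, the usual way is something like this (a sketch, assuming
the ACPICA userspace tools acpidump and iasl are installed):

acpidump -b      # as root; writes one .dat file per ACPI table
iasl -d *.dat    # disassembles the tables into their .dsl sources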
It might be possible that a BIOS update fixes the problem, but I would
much prefer to fix this in the kernel to ensure that such a BIOS does
not lead to this problem again.
> Steve Berman
Sebastian
On Tue, Aug 11, 2020 at 12:27 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2020-07-14 17:53:15 [+0200], Rafael J. Wysocki wrote:
> > acpi_evaluate_integer() doesn't show up in the trace, though, AFAICS.
> >
> > > I assumed acpi_ex_opcode_2A_0T_0R() since the other
> > > candidate was acpi_ev_asynch_execute_gpe_method().
> >
> > Which probably is the case. Specifically
> >
> > acpi_ev_asynch_execute_gpe_method: Evaluate _L66
> >
> > is likely to cause the Notify() to be dispatched.
> …
> > > Rafael, are you also interested in an ACPI dump?
> >
> > That might help a bit.
> >
> > So what probably happens is that poking at the TZ causes a GPE to
> > trigger and a Notify() to be dispatched which then goes into the
> > workqueue for execution.
> >
> > Now, I'm not sure what happens to those Notify() items, though. They
> > each should cause a handler (in the thermal driver) to be executed,
> > but does that happen?
>
> Stephen's trace contains a few backtraces, all of them look like this:
>
> | Call Trace:
> | acpi_ex_opcode_2A_0T_0R+0x93/0xdf
> | acpi_ds_exec_end_op+0x10d/0x701
> | acpi_ps_parse_loop+0x7f2/0x8c3
> | acpi_ps_parse_aml+0x1a5/0x540
> | acpi_ps_execute_method+0x1fe/0x2ba
> | acpi_ns_evaluate+0x345/0x4e2
> | acpi_evaluate_object+0x177/0x39f
> | acpi_evaluate_integer+0x4f/0x110
> | acpi_thermal_get_temperature.part.0+0x45/0xc4
> | thermal_get_temp.cold+0xc/0x2e
> | thermal_zone_get_temp+0x4c/0x70
> | thermal_zone_device_update.part.0+0x2a/0x110
> | acpi_thermal_notify+0xcf/0x140
> | acpi_ev_notify_dispatch+0x45/0x5a
> | acpi_os_execute_deferred_notify+0x34/0x60
This is Notify () processing.
The handler is acpi_thermal_notify() which calls acpi_thermal_check()
which is just a wrapper around thermal_zone_device_update() doing
update_temperature() and that calls thermal_zone_get_temp() (among
other things) which invokes the ->get_temp() callback for the target
thermal zone.
In the ACPI thermal driver the ->get_temp callback is
thermal_get_temp() which basically calls
acpi_thermal_get_temperature() and that evaluates _TMP (for the target
thermal zone).
It looks like on the machine in question the _TMP method contains
another Notify () targeting the same thermal zone which gets queued up
via the acpi_ex_opcode_2A_0T_0R() at the top. Obviously that Notify
() (when it gets executed) will cause acpi_thermal_notify() to be
executed again and so on, ad infinitum unless the Notify () in _TMP is
conditional on something.
> | process_one_work+0x1d2/0x3a0
> | worker_thread+0x45/0x3c0
> | kthread+0xf6/0x130
> | ret_from_fork+0x35/0x40
>
> so no GPE and it comes the notify callback while parsing the ACPI table.
Right.
> Any ideas? I guess acpi_ex_opcode_2A_0T_0R() uses the workqueue because
> it may sleep and it might be invoked from non-preemptible context.
No, it uses the workqueue because it queues up a Notify () which
always goes through a workqueue (kacpi_notify_wq to be precise) and
basically in order to avoid deadlocks (it runs under locks which may
need to be acquired again to handle the notification).
On Tue, Aug 11, 2020 at 3:29 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2020-08-11 13:58:39 [+0200], Stephen Berman wrote:
> > him about your workaround of adding 'thermal.tzp=300' to the kernel
> > commandline, and he replied that this works for him too. And it turns
> > out we have similar motherboards: I have a Gigabyte Z390 M Gaming
> > Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
>
> Yes. Based on latest dmesg, the ACPI tables contain code which schedules
> the worker and takes so long. It is possible / likely that his board
> contains the same tables which leads to the same effect. After all those
> two boards are very similar from the naming part :)
> Would you mind to dump the ACPI tables and send them? There might be
> some hints.
Do we have a BZ for this? It would be useful to open one if not.
> It might be possible that a BIOS update fixes the problem but I would
> prefer very much to fix this in kernel to ensure that such a BIOS does
> not lead to this problem again.
I agree.
It looks like one way to address this issue might be to add a rate
limit for thermal notifications on a given zone.
On 2020-08-11 16:34:09 [+0200], Rafael J. Wysocki wrote:
> On Tue, Aug 11, 2020 at 3:29 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> >
> > On 2020-08-11 13:58:39 [+0200], Stephen Berman wrote:
> > > him about your workaround of adding 'thermal.tzp=300' to the kernel
> > > commandline, and he replied that this works for him too. And it turns
> > > out we have similar motherboards: I have a Gigabyte Z390 M Gaming
> > > Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
> >
> > Yes. Based on latest dmesg, the ACPI tables contain code which schedules
> > the worker and takes so long. It is possible / likely that his board
> > contains the same tables which leads to the same effect. After all those
> > two boards are very similar from the naming part :)
> > Would you mind to dump the ACPI tables and send them? There might be
> > some hints.
>
> Do we have a BZ for this? It would be useful to open one if not.
no, it came via lkml and I looked at it since it was bisected to a
workqueue commit with my signoff…
Stephen, can you open a bug on https://bugzilla.kernel.org/?
> > It might be possible that a BIOS update fixes the problem but I would
> > prefer very much to fix this in kernel to ensure that such a BIOS does
> > not lead to this problem again.
>
> I agree.
>
> It looks like one way to address this issue might be to add a rate
> limit for thermal notifications on a given zone.
So one thing is that ACPI says to poll every second and the driver is doing
it. This could be increased to something like 15 or 30 seconds as a sane
lower bound. I don't think there is much value in polling this sensor
every second. As a workaround, Stephen is using `thermal.tzp=300' now.
Would it make sense to flush the workqueue before checking the
temperature? I have no idea what the ACPI is doing there, but there is no
upper limit on how long it may take, right? Doing this inline (and
avoiding the worker) is probably causing other trouble, right?
Sebastian
On Tue, 11 Aug 2020 17:25:51 +0200 Sebastian Andrzej Siewior <[email protected]> wrote:
> On 2020-08-11 16:34:09 [+0200], Rafael J. Wysocki wrote:
>> On Tue, Aug 11, 2020 at 3:29 PM Sebastian Andrzej Siewior
>> <[email protected]> wrote:
>> >
>> > On 2020-08-11 13:58:39 [+0200], Stephen Berman wrote:
>> > > him about your workaround of adding 'thermal.tzp=300' to the kernel
>> > > commandline, and he replied that this works for him too. And it turns
>> > > out we have similar motherboards: I have a Gigabyte Z390 M Gaming
>> > > Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
>> >
>> > Yes. Based on latest dmesg, the ACPI tables contain code which schedules
>> > the worker and takes so long. It is possible / likely that his board
>> > contains the same tables which leads to the same effect. After all those
>> > two boards are very similar from the naming part :)
>> > Would you mind to dump the ACPI tables and send them? There might be
>> > some hints.
Attached.
>> Do we have a BZ for this? It would be useful to open one if not.
>
> no, it came via lkml and I looked at it since it was bisected to a
> workqueue commit with my signoff…
> Stephen, can you open a bug on https://bugzilla.kernel.org/?
Done as Bug 208877.
Steve Berman
On 2020-08-11 19:22:19 [+0200], Stephen Berman wrote:
> Attached.
ssdt6.dsl:
| ThermalZone (TZ10)
| {
…
| Method (_TSP, 0, Serialized) // _TSP: Thermal Sampling Period
| {
| Return (0x0A)
| }
|
| Method (_TZP, 0, Serialized) // _TZP: Thermal Zone Polling
| {
| Return (0x0A)
| }
|
| Method (_TMP, 0, Serialized) // _TMP: Temperature
| {
| \GSA1.ZRC3 ()
| Sleep (0x03E8)
| Notify (\_TZ.TZ10, 0x81) // Thermal Trip Point Change
| Return (0x0B54)
| }
So if I read this correctly then TZ10 should be polled every second
(check: _TZP returns 0x0A, in tenths of a second), and the temperature
method contains a Sleep (0x03E8), i.e. a one-second sleep (which explains
the mysterious delay), followed by the Notify() (which schedules the
worker).
Now I'm curious whether Windows has the same ACPI tables and, if so, how
it behaves here. And what the actual plan here was. And where the border
is beyond which one is actually allowed to make fun of someone. So many
questions.
> Done as Bug 208877.
Thank you.
> Steve Berman
Sebastian
On 2020-08-11 20:49:05 [+0200], To Stephen Berman wrote:
> On 2020-08-11 19:22:19 [+0200], Stephen Berman wrote:
> > Attached.
>
> ssdt6.dsl:
> | ThermalZone (TZ10)
> | {
> …
> | Method (_TSP, 0, Serialized) // _TSP: Thermal Sampling Period
> | {
> | Return (0x0A)
> | }
> |
> | Method (_TZP, 0, Serialized) // _TZP: Thermal Zone Polling
> | {
> | Return (0x0A)
> | }
> |
> | Method (_TMP, 0, Serialized) // _TMP: Temperature
> | {
> | \GSA1.ZRC3 ()
> | Sleep (0x03E8)
> | Notify (\_TZ.TZ10, 0x81) // Thermal Trip Point Change
> | Return (0x0B54)
> | }
>
> So if I read this correctly then TZ10 should be polled every second
> (check) and the temperature function contains a `sleep(1 second)' (which
> explains the mysteries delay) followed by the Notify() (which schedules
> the worker).
>
> > Done as Bug 208877.
Rafael, do you have any suggestions?
Sebastian
On 10/6/2020 11:49 PM, Sebastian Andrzej Siewior wrote:
> On 2020-08-11 20:49:05 [+0200], To Stephen Berman wrote:
>> On 2020-08-11 19:22:19 [+0200], Stephen Berman wrote:
>>> Attached.
>> ssdt6.dsl:
>> | ThermalZone (TZ10)
>> | {
>> …
>> | Method (_TSP, 0, Serialized) // _TSP: Thermal Sampling Period
>> | {
>> | Return (0x0A)
>> | }
>> |
>> | Method (_TZP, 0, Serialized) // _TZP: Thermal Zone Polling
>> | {
>> | Return (0x0A)
>> | }
>> |
>> | Method (_TMP, 0, Serialized) // _TMP: Temperature
>> | {
>> | \GSA1.ZRC3 ()
>> | Sleep (0x03E8)
>> | Notify (\_TZ.TZ10, 0x81) // Thermal Trip Point Change
>> | Return (0x0B54)
>> | }
>>
>> So if I read this correctly then TZ10 should be polled every second
>> (check) and the temperature function contains a `sleep(1 second)' (which
>> explains the mysteries delay) followed by the Notify() (which schedules
>> the worker).
>>
>>> Done as Bug 208877.
> Rafael, do you have any suggestions?
>
I've lost track of this, sorry.
I have ideas, let me get back to this next week.
Cheers!
On 2020-10-07 18:18:03 [+0200], Rafael J. Wysocki wrote:
> On 10/6/2020 11:49 PM, Sebastian Andrzej Siewior wrote:
> > On 2020-08-11 20:49:05 [+0200], To Stephen Berman wrote:
> > > On 2020-08-11 19:22:19 [+0200], Stephen Berman wrote:
> > > > Attached.
> > > ssdt6.dsl:
> > > | ThermalZone (TZ10)
> > > | {
> > > …
> > > | Method (_TSP, 0, Serialized) // _TSP: Thermal Sampling Period
> > > | {
> > > | Return (0x0A)
> > > | }
> > > |
> > > | Method (_TZP, 0, Serialized) // _TZP: Thermal Zone Polling
> > > | {
> > > | Return (0x0A)
> > > | }
> > > |
> > > | Method (_TMP, 0, Serialized) // _TMP: Temperature
> > > | {
> > > | \GSA1.ZRC3 ()
> > > | Sleep (0x03E8)
> > > | Notify (\_TZ.TZ10, 0x81) // Thermal Trip Point Change
> > > | Return (0x0B54)
> > > | }
> > >
> > > So if I read this correctly then TZ10 should be polled every second
> > > (check) and the temperature function contains a `sleep(1 second)' (which
> > > explains the mysterious delay) followed by the Notify() (which schedules
> > > the worker).
> > >
> > > > Done as Bug 208877.
> > Rafael, do you have any suggestions?
>
> I've lost track of this sorry.
>
> I have ideas, let me get back to this next week.
:)
> Cheers!
Sebastian
On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > Done as Bug 208877.
> > > Rafael, do you have any suggestions?
> >
> > I've lost track of this sorry.
> >
> > I have ideas, let me get back to this next week.
>
> :)
Rafael, any update? If you outline an idea then I may be able to form a
patch out of it. Otherwise I have no idea how to fix this - other than
telling the driver not to poll at intervals shorter than 30 secs.
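
For completeness, a hypothetical sketch of that last-resort clamp (not a
real patch, just the shape of it, assuming the period is kept in tenths
of a second as drivers/acpi/thermal.c does):

#define FAKE_MIN_POLL_DECISECONDS	300	/* 30 seconds */

/* Clamp a firmware-provided polling period to at least 30 seconds. */
static unsigned long fake_clamp_polling(unsigned long deciseconds)
{
	if (deciseconds && deciseconds < FAKE_MIN_POLL_DECISECONDS)
		return FAKE_MIN_POLL_DECISECONDS;

	return deciseconds;
}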
> > Cheers!
>
Sebastian
On Wed, Dec 2, 2020 at 7:03 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > > Done as Bug 208877.
> > > > Rafael, do you have any suggestions?
> > >
> > > I've lost track of this sorry.
> > >
> > > I have ideas, let me get back to this next week.
> >
> > :)
>
> Rafael, any update? If you outline an idea or so then I may be able to
> form a patch out of it. Otherwise I have no idea how to fix this - other
> than telling the driver to not poll in smaller intervals than
> 30secs.
The idea, roughly speaking, is to limit the number of outstanding work
items in the queue (basically, if there's a notification occurring
before the previous one can be handled, there is no need to queue up
another work item for it).
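
For illustration, a minimal sketch of that guard (hypothetical fake_*
names; the concrete driver-side version appears in the patch further
down this thread): if a previously queued check is still pending, a new
notification does not queue a second one.

#include <linux/workqueue.h>

struct fake_dev {
	struct work_struct check_work;
};

static struct workqueue_struct *fake_wq;

/* Queue at most one outstanding check per device. */
static void fake_queue_check(struct fake_dev *dev)
{
	if (!work_pending(&dev->check_work))
		queue_work(fake_wq, &dev->check_work);
}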
On Wed, Dec 2, 2020 at 7:31 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Wed, Dec 2, 2020 at 7:03 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> >
> > On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > > > Done as Bug 208877.
> > > > > Rafael, do you have any suggestions?
> > > >
> > > > I've lost track of this sorry.
> > > >
> > > > I have ideas, let me get back to this next week.
> > >
> > > :)
> >
> > Rafael, any update? If you outline an idea or so then I may be able to
> > form a patch out of it. Otherwise I have no idea how to fix this - other
> > than telling the driver to not poll in smaller intervals than
> > 30secs.
>
> The idea, roughly speaking, is to limit the number of outstanding work
> items in the queue (basically, if there's a notification occurring
> before the previous one can be handled, there is no need to queue up
> another work item for it).
That's easier said than done, though, because of the way the work item
queue-up is hooked up into the ACPICA code.
On Wednesday, December 2, 2020 8:13:38 PM CET Rafael J. Wysocki wrote:
> On Wed, Dec 2, 2020 at 7:31 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Wed, Dec 2, 2020 at 7:03 PM Sebastian Andrzej Siewior
> > <[email protected]> wrote:
> > >
> > > On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > > > > Done as Bug 208877.
> > > > > > Rafael, do you have any suggestions?
> > > > >
> > > > > I've lost track of this sorry.
> > > > >
> > > > > I have ideas, let me get back to this next week.
> > > >
> > > > :)
> > >
> > > Rafael, any update? If you outline an idea or so then I may be able to
> > > form a patch out of it. Otherwise I have no idea how to fix this - other
> > > than telling the driver to not poll in smaller intervals than
> > > 30secs.
> >
> > The idea, roughly speaking, is to limit the number of outstanding work
> > items in the queue (basically, if there's a notification occurring
> > before the previous one can be handled, there is no need to queue up
> > another work item for it).
>
> That's easier said than done, though, because of the way the work item
> queue-up is hooked up into the ACPICA code.
So scratch this; it wouldn't work in general anyway AFAICS.
ATM, I'm tempted to do something like the patch below (with the rationale
that it shouldn't be necessary to read the temperature right after updating
the trip points if polling is in use, because the next update through polling
will cause it to be read anyway and it will trigger trip point actions as
needed).
Stephen, can you give it a go, please?
---
drivers/acpi/thermal.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
Index: linux-pm/drivers/acpi/thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/thermal.c
+++ linux-pm/drivers/acpi/thermal.c
@@ -911,24 +911,25 @@ static void acpi_thermal_notify(struct a
switch (event) {
case ACPI_THERMAL_NOTIFY_TEMPERATURE:
acpi_thermal_check(tz);
- break;
+ return;
case ACPI_THERMAL_NOTIFY_THRESHOLDS:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
- acpi_thermal_check(tz);
- acpi_bus_generate_netlink_event(device->pnp.device_class,
- dev_name(&device->dev), event, 0);
break;
case ACPI_THERMAL_NOTIFY_DEVICES:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES);
- acpi_thermal_check(tz);
- acpi_bus_generate_netlink_event(device->pnp.device_class,
- dev_name(&device->dev), event, 0);
break;
default:
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Unsupported event [0x%x]\n", event));
- break;
+ return;
}
+
+ /* Trigger an update of the thermal zone unless polling is in use. */
+ if (!tz->polling_frequency)
+ acpi_thermal_check(tz);
+
+ acpi_bus_generate_netlink_event(device->pnp.device_class,
+ dev_name(&device->dev), event, 0);
}
/*
On Thursday, December 31, 2020 9:46:11 PM CET Rafael J. Wysocki wrote:
> On Wednesday, December 2, 2020 8:13:38 PM CET Rafael J. Wysocki wrote:
> > On Wed, Dec 2, 2020 at 7:31 PM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Wed, Dec 2, 2020 at 7:03 PM Sebastian Andrzej Siewior
> > > <[email protected]> wrote:
> > > >
> > > > On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > > > > > Done as Bug 208877.
> > > > > > > Rafael, do you have any suggestions?
> > > > > >
> > > > > > I've lost track of this sorry.
> > > > > >
> > > > > > I have ideas, let me get back to this next week.
> > > > >
> > > > > :)
> > > >
> > > > Rafael, any update? If you outline an idea or so then I may be able to
> > > > form a patch out of it. Otherwise I have no idea how to fix this - other
> > > > than telling the driver to not poll in smaller intervals than
> > > > 30secs.
> > >
> > > The idea, roughly speaking, is to limit the number of outstanding work
> > > items in the queue (basically, if there's a notification occurring
> > > before the previous one can be handled, there is no need to queue up
> > > another work item for it).
> >
> > That's easier said than done, though, because of the way the work item
> > queue-up is hooked up into the ACPICA code.
>
> So scratch this and it wouldn't work in general anyway AFAICS.
>
> ATM, I'm tempted to do something like the patch below (with the rationale
> that it shouldn't be necessary to read the temperature right after updating
> the trip points if polling is in use, because the next update through polling
> will cause it to be read anyway and it will trigger trip point actions as
> needed).
There is one more way to address this, probably better: instead of checking
the temperature right away in acpi_thermal_notify(), queue that check on
acpi_thermal_pm_queue, and do so only if another thermal check is not pending.
This way there will be at most one temperature check coming from
acpi_thermal_notify() queued up at any time, which should prevent the
build-up of work items from taking place.
So something like this:
---
drivers/acpi/thermal.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
Index: linux-pm/drivers/acpi/thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/thermal.c
+++ linux-pm/drivers/acpi/thermal.c
@@ -900,6 +900,12 @@ static void acpi_thermal_unregister_ther
Driver Interface
-------------------------------------------------------------------------- */
+static void acpi_queue_thermal_check(struct acpi_thermal *tz)
+{
+ if (!work_pending(&tz->thermal_check_work))
+ queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+}
+
static void acpi_thermal_notify(struct acpi_device *device, u32 event)
{
struct acpi_thermal *tz = acpi_driver_data(device);
@@ -910,17 +916,17 @@ static void acpi_thermal_notify(struct a
switch (event) {
case ACPI_THERMAL_NOTIFY_TEMPERATURE:
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
break;
case ACPI_THERMAL_NOTIFY_THRESHOLDS:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
case ACPI_THERMAL_NOTIFY_DEVICES:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
@@ -1117,7 +1123,7 @@ static int acpi_thermal_resume(struct de
tz->state.active |= tz->trips.active[i].flags.enabled;
}
- queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+ acpi_queue_thermal_check(tz);
return AE_OK;
}
On Thu, 31 Dec 2020 21:46:11 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> ATM, I'm tempted to do something like the patch below (with the rationale
> that it shouldn't be necessary to read the temperature right after updating
> the trip points if polling is in use, because the next update through polling
> will cause it to be read anyway and it will trigger trip point actions as
> needed).
>
> Stephen, can you give it a go, please?
On Sat, 02 Jan 2021 12:03:17 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> There is one more way to address this, probably better: instead of checking the
> temperature right away in acpi_thermal_notify(), queue that on acpi_thermal_pm_queue
> and do so only if another thermal check is not pending.
>
> This way there will be at most one temperature check coming from
> acpi_thermal_notify() queued up at any time which should prevent the
> build-up of work items from taking place.
>
> So something like this:
Thanks for the patches. I'll try them as soon as I can.
Steve Berman
On Mon, 04 Jan 2021 16:38:43 +0100 Stephen Berman <[email protected]> wrote:
> On Thu, 31 Dec 2020 21:46:11 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
>
>> ATM, I'm tempted to do something like the patch below (with the rationale
>> that it shouldn't be necessary to read the temperature right after updating
>> the trip points if polling is in use, because the next update through polling
>> will cause it to be read anyway and it will trigger trip point actions as
>> needed).
>>
>> Stephen, can you give it a go, please?
>
> On Sat, 02 Jan 2021 12:03:17 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
>
>> There is one more way to address this, probably better: instead of checking the
>> temperature right away in acpi_thermal_notify(), queue that on
>> acpi_thermal_pm_queue
>> and do so only if another thermal check is not pending.
>>
>> This way there will be at most one temperature check coming from
>> acpi_thermal_notify() queued up at any time which should prevent the
>> build-up of work items from taking place.
>>
>> So something like this:
>
> Thanks for the patches. I'll try them as soon as I can.
FTR, since this is the thread I started for this bug, I've confirmed in
https://lore.kernel.org/lkml/[email protected]/T/#t that the latest
patch fixes the bug.
Steve Berman
On Sun, Jan 24, 2021 at 2:49 PM Stephen Berman <[email protected]> wrote:
>
> On Mon, 04 Jan 2021 16:38:43 +0100 Stephen Berman <[email protected]> wrote:
>
> > On Thu, 31 Dec 2020 21:46:11 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> >
> >> ATM, I'm tempted to do something like the patch below (with the rationale
> >> that it shouldn't be necessary to read the temperature right after updating
> >> the trip points if polling is in use, because the next update through polling
> >> will cause it to be read anyway and it will trigger trip point actions as
> >> needed).
> >>
> >> Stephen, can you give it a go, please?
> >
> > On Sat, 02 Jan 2021 12:03:17 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> >
> >> There is one more way to address this, probably better: instead of checking the
> >> temperature right away in acpi_thermal_notify(), queue that on
> >> acpi_thermal_pm_queue
> >> and do so only if another thermal check is not pending.
> >>
> >> This way there will be at most one temperature check coming from
> >> acpi_thermal_notify() queued up at any time which should prevent the
> >> build-up of work items from taking place.
> >>
> >> So something like this:
> >
> > Thanks for the patches. I'll try them as soon as I can.
>
> FTR, since this is the thread I started for this bug, I've confirmed in
> https://lore.kernel.org/lkml/[email protected]/T/#t that the latest
> patch fixes the bug.
OK, thanks!
The patch has been applied as 5.11-rc material.