Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp818483ybg; Tue, 9 Jun 2020 13:43:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzMnWbRtC/5gTXIfjLmS7z6vgW4bV7yfM/m6sEmCGFB2ZirtXqPQLapLkcag6PtTK5ampxf X-Received: by 2002:a05:6402:144e:: with SMTP id d14mr28309916edx.14.1591735432384; Tue, 09 Jun 2020 13:43:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591735432; cv=none; d=google.com; s=arc-20160816; b=XZfFoVUiKJF/BQyXqAO53yYoV6GQvcZ+Zbn+t0ojdyppOclXKqg0IQ6vftzm0tKhZZ g7iCO3LL/gcwynYE4HL4DWUpDwwmcIEUjNqpSRPdyz5IO7ts+K+4iTpntI6UscFZA03d fBhj6Skh4d0GaCXbaTYwnb8SSU24qA6J4j1L8iyGOaMiOU1lW4WhY1xpjdNwMqpEv3qZ 11tLvc15at6+sg/FC3drR2ogXC7NDH58H7sBTjoHS4Que/u0paOBWGqcPk4XS44LyXSy KOJU4wNyRtTsfQNfJaTPKxuWyO1DCJEijgeRqt5z0OQCMwO0AC4Dxzk+wsQGTM38vbIL l5qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=JeqxoE7j9L0VxceTsVMujtCaL/EFJjiMI8qyh22xFes=; b=cI6EXfd+yW+9Ymm3td2nQT54pg4qviPatmCqiNHB97jY/skf6bV0EbSrDP22KTIlz2 GsPU037wx5fdF4xJb8DkJYWiHd0mp39Bp4PvDEuWarXH1q+69J3315fzOUZC4Qw/rDUt IxjaA/vwLAS2EYejxm7G5Z2BQDVjy9hp1ETqEFc6J6SglHnv0lz+XFFMiSZvyUDKveKl J6l3PepkD9ppZ75mdBRo3OqkpoZElhIqYDzT5mY7cPRvJwPpu1TUkyqxRZPFKha/Swo+ fgd1kZEwRonH0BX72hZeg+JrA1iZphE+B7U/wRyVmXUMAT1XPY+SFDUyn4v7RCKeyWBG b4aw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z10si11113492ejb.517.2020.06.09.13.43.29; Tue, 09 Jun 2020 13:43:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733092AbgFIUXm convert rfc822-to-8bit (ORCPT + 99 others); Tue, 9 Jun 2020 16:23:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728272AbgFIUXm (ORCPT ); Tue, 9 Jun 2020 16:23:42 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD49BC05BD1E for ; Tue, 9 Jun 2020 13:23:41 -0700 (PDT) Received: from bigeasy by Galois.linutronix.de with local (Exim 4.80) (envelope-from ) id 1jikmZ-0000v8-FR; Tue, 09 Jun 2020 22:23:39 +0200 Date: Tue, 9 Jun 2020 22:23:39 +0200 From: Sebastian Andrzej Siewior To: Stephen Berman Cc: Thomas Gleixner , Peter Zijlstra , linux-kernel@vger.kernel.org Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline) Message-ID: <20200609202339.cgy57twm2zdtjhje@linutronix.de> References: <87bln7ves7.fsf@gmx.net> <20200506215713.qoo4enq32ckcjmz7@linutronix.de> <87v9l65d2y.fsf@gmx.net> <20200513220428.4nksinis2qs5dtmh@linutronix.de> <87mu6aurfn.fsf@gmx.net> <20200522164012.ynyvrjompv42jtmx@linutronix.de> <87y2owwo2o.fsf@rub.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8BIT In-Reply-To: <87y2owwo2o.fsf@rub.de> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020-06-09 12:06:23 [+0200], Stephen Berman wrote: > I recompiled kernel 5.6.4 with the printk() call you suggested, then > booted the kernel with "ignore_loglevel initcall_debug" (but leaving the > CDROM and wifi intact for now). After working as I normally do, I > called `shutdown -h now', again as usual. After the "Bringing down the > loopback interface" message there were these two messages: > > reboot: __do_sys_reboot(317)CMD: 89abcdef > reboot: __do_sys_reboot(317)CMD: 4321fedc > > Then nothing more for two minutes, then, as previously: > > sr 5:0:0:0: tag#10 timing out command, waited 120 seconds. > > Then I did a hard reboot. > > This morning I detached the cables to the CDROM (but left the disk in > the box) and again booted 5.6.4 with "ignore_loglevel initcall_debug". > After working for a bit, I called `shutdown -h now', and now there were > quite a few more messages, but again the machine did not power off. > Here is the ouput, which I transcribed carefully, hopefully without > error (I omitted the fractional parts of the timestamps and some of the > usb1-portX messages): If it helps you could delay printks on shutdown (via /proc/sys/kernel/printk_delay) and record a video clip. I could make storage available for an upload. | * Bringing down the loopback interface... | [1123.***] reboot: __do_sys_reboot(317)CMD: 89abcdef | [1123.***] reboot: __do_sys_reboot(317)CMD: 4321fedc This happens quicker than I expected. Initially I assumed that you are not there yet and userland is still poking at your cdrom. Accordin … | [1124.***] sd 4:0:0:0: shutdown | [1124.***] sd 4:0:0:0: shutdown [sda] Synchronizing SCSI cache | [1124.***] sd 4:0:0:0: shutdown [sda] Stopping disk After this, there shouldn't be any outstanding disk requests. … | [1125.***] ahci 0000:00:17.0: shutdown This disables the ahci controller which means disk/cdrom requests won't be answered by the hardware. … | [1125.***] ACPI: Preparing to enter system sleep state S5 | [1352.***] INFO: task halt:5187 blocked for more than 122 seconds. | [1352.***] Not tainted 5.6.4 #4 | [1352.***] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message | [1352.***] halt D13856 5187 5186 0x00004000 | [1352.***] Call Trace: | [1352.***] ? __schedule+0x272/0x5b0 | [1352.***] schedule+0x45/0xb0 | [1352.***] schedule_timeout+0x204/0x2f0 | [1352.***] ? acpi_os_release_object+0x5/0x10 | [1352.***] ? acpi_ut_update_object_reference+0x14e/0x1d2 | [1352.***] wait_for_completion+0xa3/0x100 | [1352.***] ? wake_up_q+0x90/0x90 | [1352.***] flush_workqueue+0x130/0x420 I have no idea where this flush_workqueue() is coming from. The command scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70 should reveal that. The cdrom is polled by the kernel every two seconds. I *think* the kernel is blocked for some reason and then the cdrom polled. This can't complete and you see the srX warning. The window is quite small but not impossible. I managed to fabricate this case but after |[ 137.581613] sr 3:0:0:0: tag#1 timing out command, waited 120s The system shutdowns. So you are facing something different. I am surprised that the workqueue stall detector did not yell here. The patch at the bottom should get rid of the sr warnings. | [1352.***] kernel_power_off+0x3d/0x70 | [1352.***] __do_sys_reboot+0x140/0x220 … > After the last message here, the "echo", Call Trace, RIP and register > messages were repeated exactly as above. At this point I did a hard > reboot. A web search found something similar to the above output > ("System hangs (bad RIP value) when disk used in pool is removed"), but > didn't otherwise seem to be related to the problem I'm having. > > Do you still want me to check whether removing the iwlwifi driver makes > a differece? And with the CDROM still detached, or does that not > matter? I assumed the wifi driver shuts the AHCI port for some reason. But according to this log it does not happen, the ahci port is shutdown properly. The patch at the bottom should get rid of the sr warning. Then could you please try the other patch so we see which workqueue is blocked? I am curious to see why the system is blocked. > Steve Berman Sebastian From: Sebastian Andrzej Siewior Date: Tue, 9 Jun 2020 22:13:46 +0200 Subject: [PATCH] SCSI: Disable CD-ROM poll on shutdown Signed-off-by: Sebastian Andrzej Siewior --- drivers/scsi/sr.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index d2fe3fa470f95..2484e594af283 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -85,6 +85,13 @@ static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt); static int sr_done(struct scsi_cmnd *); static int sr_runtime_suspend(struct device *dev); +static void sr_shutdown(struct device *dev) +{ + struct scsi_cd *cd = dev_get_drvdata(dev); + + disk_block_events(cd->disk); +} + static const struct dev_pm_ops sr_pm_ops = { .runtime_suspend = sr_runtime_suspend, }; @@ -95,6 +102,7 @@ static struct scsi_driver sr_template = { .owner = THIS_MODULE, .probe = sr_probe, .remove = sr_remove, + .shutdown = sr_shutdown, .pm = &sr_pm_ops, }, .init_command = sr_init_command, -- 2.27.0