Date: Fri, 22 May 2020 18:40:12 +0200
From: Sebastian Andrzej Siewior
To: Stephen Berman
Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel@vger.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)
Message-ID: <20200522164012.ynyvrjompv42jtmx@linutronix.de>
References: <87bln7ves7.fsf@gmx.net> <20200506215713.qoo4enq32ckcjmz7@linutronix.de> <87v9l65d2y.fsf@gmx.net> <20200513220428.4nksinis2qs5dtmh@linutronix.de> <87mu6aurfn.fsf@gmx.net>
In-Reply-To: <87mu6aurfn.fsf@gmx.net>

Sorry for the late reply.

On 2020-05-14 23:39:40 [+0200], Stephen Berman wrote:
> >> How will I know if that happens, is there a specific message in the tty?
> >
> > On the tty console where you see the "timing out command, waited"
> > message, there should be something starting with
> > |BUG: workqueue lockup - pool
> >
> > following with the pool information that got stuck. That code checks the
> > workqueues every 30secs by default. So if you waited >= 60secs then
> > system is not detecting a stall.
>
> As you can see in the photo, there was no message about a workqueue
> lockup, only "task halt:5320 blocked for more than seconds" every
> two minutes. I suppose that comes from one of the other options I
> enabled. Does it reveal anything about the problem?

From the picture, you are on your way to level 0, which would issue the
final shutdown command, but you are not quite there yet. If you add a
printk() to the reboot syscall, then I wouldn't expect you to see it
(something like this):

diff --git a/kernel/reboot.c b/kernel/reboot.c
index c4d472b7f1b42..19bc35bc0cda0 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -314,6 +314,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	char buffer[256];
 	int ret = 0;
 
+	pr_err("%s(%d)CMD: %x\n", __func__, __LINE__, cmd);
 	/* We only trust the superuser with rebooting the system. */
 	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
 		return -EPERM;

If you add "ignore_loglevel initcall_debug" to the command line then you
should see the init callbacks of each driver. But there will be nothing
on your shutdown (as I expect it).

The "task X blocked for more than 120 secs" is part of the hung task
detector. With the "ignore_loglevel" above you should be able to see the
callchain of the task. I suspect that the task poked the cd-drive, which
isn't answering. So from the detector's point of view, the task issued a
system call which appears to hang and makes no progress.

> > Could
> > you please check if the stall-detector says something?
>
> Is that the message I repeated above or do you mean the workqueue?

The hung message is not workqueue related. It is the task `halt' that
makes no progress. There is no stall of the workqueue as far as the
system can tell.
The two boot options I suggested above may reveal additional
information that is printed but suppressed due to the loglevel.

My guess now is that maybe shutting down wifi also paused the AHCI
controller, which makes no progress now. So booting without cdrom/disk
should not cause any problems.

Could you please:
- try booting with "ignore_loglevel initcall_debug" and see if
  additional information is printed on the console.
- Remove cd / ATA-disk to check if anything else causes a stall. In
  your report you only mentioned those two (and if I see it correctly,
  your rootFS is on nvme so removing the disk might be doable).
- Could you remove the Wifi (just the driver, not the physical hw) to
  see if it makes any difference?

> Steve Berman

Sebastian
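
PS: if you manage to capture the console output as text instead of a
photo, the two detectors discussed above can be told apart by their
message text. A minimal userspace sketch, assuming the log lines contain
the substrings quoted in this thread; classify_line() and enum
stall_kind are hypothetical names for illustration, not kernel API:

```c
#include <string.h>

/* Which detector, if any, a console line appears to come from. */
enum stall_kind {
	STALL_NONE,
	STALL_HUNG_TASK,	/* "task X blocked for more than ..." */
	STALL_WQ_LOCKUP,	/* "BUG: workqueue lockup - pool ..." */
};

/*
 * Hypothetical helper: matches only on the substrings quoted in this
 * thread, not on the complete kernel message format.
 */
enum stall_kind classify_line(const char *line)
{
	if (strstr(line, "BUG: workqueue lockup"))
		return STALL_WQ_LOCKUP;
	if (strstr(line, "blocked for more than"))
		return STALL_HUNG_TASK;
	return STALL_NONE;
}
```

Feeding each captured line through classify_line() would show whether
only the hung task detector fired (as in the photo) or whether the
workqueue watchdog reported something as well.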