Received: by 2002:ab2:6309:0:b0:1fb:d597:ff75 with SMTP id s9csp240649lqt; Thu, 6 Jun 2024 02:04:11 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXO8iPtDTiF6o7ibO8L7n6XkYvzIn5nJbcCEepuPJnQ0dF5cLPU2sjLO0kS8iTyqgW/N9BTbY+B0RmTU6HAUHB870n3q8dH7tegMfBpWA== X-Google-Smtp-Source: AGHT+IF9VDOd3fB//rzgtq7KwA3TusvoMf49sMw04sikMAyMiIbqtWB/u9IB/KRYG/fSd0xQShqD X-Received: by 2002:a17:906:2587:b0:a68:a2e7:118b with SMTP id a640c23a62f3a-a699f5619b5mr289777266b.19.1717664651186; Thu, 06 Jun 2024 02:04:11 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717664651; cv=pass; d=google.com; s=arc-20160816; b=LvYMsHJS6EZ2LF8i7GYU8eTN/mxAimfFz971mvvW5ETcLG23WX7YIalfTcXAvzJbPf Dx6OL6UK62zHuCKyqyKO1zV0RcGR08Ry5fIvxKK/a8b//w9ZbKZ0Z0MP/95Dgvj86tFk 6U9kDDLVs/quRi6hh55TsQBTAGah+ovVVGYCJPVyl/lGs2G5mC/aLC7E3E8eCfx0PyYw fPL7HQehW6iC5OWDUFcCdfK2fCzJVW7IFNL5X0B1FSCatYn80PIe+UpxR+ex1RkEcOaf cQfvtYe/m1zaWIyv4u6JFYW9nfyH03rysTGzrA213R6oC34uIq3Gl6V+H+eqM0eo5pzg Vayw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:references:message-id:subject:cc:to:from:date :dkim-signature; bh=ni9uQ13h09lqU+RZIfOfm3dcCubUKEXNFbzKL+EbuK4=; fh=uqOURwuvU2TBZqdmeBNCdEJ9ct7TXQ4U6iqbZ/OFQ+w=; b=im+/MGlFAA0qaaWgpYaoaasXnqOy0TIdZuZ5wlIh+vm1TaIFc8comejI0YkKWalcyT dqYrA6h0IhxerzRGBvC2BtPhPs5vrnLkQitR3aLs667UgnzyP1DTXKQeLnB5a+jt3h0/ liW9KQJlWNXd0KblHsInFXsVc8rcvD4VfjU1M36PRqdgYKWY6eJWbbXFsjMKQgC9qvUK Cnc+nathcj2T6/LKjSqjebWkp3IQLZNdcw3lHt/DzrFV5BP3IBpHF04vtRylrQNWCav2 oe6M8KCsmEmzbQGtic+ZKq0IVZqd+gm+IF0QY0GttMjrGS5Vnw1kazzRpZcwWEgsyrnl 7xhA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@sony.com header.s=s1jp header.b=Z8hd+uOZ; arc=pass (i=1 spf=pass spfdomain=sony.com dkim=pass dkdomain=sony.com dmarc=pass fromdomain=sony.com); spf=pass (google.com: domain of linux-kernel+bounces-203917-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-203917-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sony.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id a640c23a62f3a-a6c80708178si48125966b.546.2024.06.06.02.04.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Jun 2024 02:04:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-203917-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@sony.com header.s=s1jp header.b=Z8hd+uOZ; arc=pass (i=1 spf=pass spfdomain=sony.com dkim=pass dkdomain=sony.com dmarc=pass fromdomain=sony.com); spf=pass (google.com: domain of linux-kernel+bounces-203917-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-203917-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sony.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id A70C61F27B6E for ; Thu, 6 Jun 2024 09:04:10 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 5E69F13F45A; Thu, 6 Jun 2024 08:58:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=sony.com header.i=@sony.com header.b="Z8hd+uOZ" Received: from jpms-ob02.noc.sony.co.jp (jpms-ob02.noc.sony.co.jp [211.125.140.165]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBAE813C674 for ; Thu, 6 Jun 2024 08:58:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=211.125.140.165 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717664310; cv=none; b=DkNafW/KqR7Ef5fqYWRB8iK+9IZyQPdMkKVu5KFX3E3N+Fiuci21lVjzMBYp3NTwt8e/9YVsliHWX7US8fQecOSsSGIEeVFqR8e8sHn9y1epyHuYxwFpfrwoC84I9ae7O7yT9YPIYmoCgXcx6L1R2cavL40aGB+oGAjy+Uo34nM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717664310; c=relaxed/simple; bh=eOFXaEud5GYAPjxJmRNmgm08j2Nx2v0h+8hTKBzMh/M=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=obQMssaeijo2GkPsALkShSfRicm4BVqGSfmOu9sxnzX8qYRiIgEUIU+ENUxPF5+anFj+OskR+PezGjscmijvX0iqhomN0vwM1YXGYkdTIKDYcbE4wnGkFjbL8KpyYkMSIqa8H5hqIMqgX/tW7Ra/YQmNAc4xWh3vrEYHgS0alUQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=sony.com; spf=pass smtp.mailfrom=sony.com; dkim=pass (2048-bit key) header.d=sony.com header.i=@sony.com header.b=Z8hd+uOZ; arc=none smtp.client-ip=211.125.140.165 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=sony.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=sony.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sony.com; s=s1jp; t=1717664308; x=1749200308; h=date:from:to:cc:subject:message-id:references: mime-version:content-transfer-encoding:in-reply-to; bh=ni9uQ13h09lqU+RZIfOfm3dcCubUKEXNFbzKL+EbuK4=; b=Z8hd+uOZ9azjJWxicd/hdMNf8jgbV0O2kcQEk6Rh6ElomXN2ENA2zGYm Up1aduApDmOMaNLtdSMhkQ0aRyyqszEBpPMpneG3kbVHpCpXpTn8CCYxR Sq+5VsjSp+QQlxmthfAHkCD/Gn6eVMSUtc7wPmVDFeG1hnUjZx/8RUSbI 3RE2DrQZuliqPUqyjDNDfsCQEDWL57Yx0seLyQDliIWEKKPPDLVC5b8gH FImgOKvEFIZLjwhMAnwxMwFgwpePx/79l8f+j84SasWB2RfC7lzyfN2X4 k+daHtVKFwGFZZPMpw0+WxYey/XnW8tP4TdJR9lWoH18VprVyNDGhOa0T A==; Received: from unknown (HELO jpmta-ob1.noc.sony.co.jp) ([IPv6:2001:cf8:0:6e7::6]) by jpms-ob02.noc.sony.co.jp with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jun 2024 17:48:18 +0900 X-IronPort-AV: E=Sophos;i="6.08,218,1712588400"; d="scan'208";a="420659674" Received: from unknown (HELO LXJ00013166) ([IPv6:2001:cf8:2:f100:2ef0:5dff:fe04:1f0f]) by jpmta-ob1.noc.sony.co.jp with ESMTP; 06 Jun 2024 17:48:18 +0900 Date: Thu, 6 Jun 2024 08:48:17 +0000 From: Khasnis Soumya To: Greg KH Cc: rafael@kernel.org, linux-kernel@vger.kernel.org, daniel.lezcano@linaro.org, festevam@denx.de, lee@kernel.org, benjamin.bara@skidata.com, dmitry.osipenko@collabora.com, ldmldm05@gmail.com, soumya.khasnis@sony.com, srinavasa.nagaraju@sony.com, Madhusudan.Bobbili@sony.com, shingo.takeuchi@sony.com, keita.aihara@sony.com, masaya.takahashi@sony.com Subject: Re: [PATCH v2] reboot: Add timeout for device shutdown Message-ID: <20240606084817.GA118817@sony.com> References: <20240529110049.GA73111@sony.com> <2024052927-traffic-lazy-e3ad@gregkh> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <2024052927-traffic-lazy-e3ad@gregkh> User-Agent: Mutt/1.9.4 (2018-02-28) On Wed, May 29, 2024 at 02:51:48PM +0200, Greg KH wrote: > On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote: > > The device shutdown callbacks invoked during shutdown/reboot > > are prone to errors depending on the device state or mishandling > > by one or more driver. > > Why not fix those drivers? A release callback should not stall, and if > it does, that's a bug that should be fixed there. Yes, the drivers must be fixed. But currently there is no good mechanism which detects such stalls and there by resulting in a system hang. This serves as a fail-safe mechanism to prevent such stalls at reboot/shutdown. > > Or use a watchdog and just reboot if that triggers at shutdown time. The hard or software watchdog enabled by config_lockup_detector couldn’t detect the cases when stalled on IO wait (wait_for_completion/io) > > > In order to prevent a device hang in such > > scenarios, we bail out after a timeout while dumping a meaningful > > call trace of the shutdown callback which blocks the shutdown or > > reboot process. > > Dump it where? Dump to kernel log and there by the console. Will update the commit message > > > > > > Signed-off-by: Soumya Khasnis > > Signed-off-by: Srinavasa Nagaraju > > --- > > drivers/base/Kconfig | 15 +++++++++++++++ > > kernel/reboot.c | 46 +++++++++++++++++++++++++++++++++++++++++++- > > 2 files changed, 60 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > > index 2b8fd6bb7da0..d06e379b6281 100644 > > --- a/drivers/base/Kconfig > > +++ b/drivers/base/Kconfig > > @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT > > work on. > > > > endmenu > > + > > +config DEVICE_SHUTDOWN_TIMEOUT > > + bool "device shutdown timeout" > > + default n > > That is the default, so no need for this. Was mindful of the “slow” devices in the system with long IO wait times. Will have to consider increasing DEVICE_SHUTDOWN_TIMEOUT_SEC and make this a default y > > > > + help > > + Enable timeout for device shutdown. Helps in case device shutdown > > + is hung during shoutdonw and reboot. > > + > > + > > +config DEVICE_SHUTDOWN_TIMEOUT_SEC > > + int "device shutdown timeout in seconds" > > + default 5 > > + depends on DEVICE_SHUTDOWN_TIMEOUT > > + help > > + sets time for device shutdown timeout in seconds > > You need much more help text for all of these. Will update the commit with more help text > > And why are these in the drivers/base/Kconfig file? It has nothing to > do with "devices", or the driver core, it's all core kernel reboot > logic. This is handling only device shutdown, not complete reboot part, so we want to keep it here. > > > > diff --git a/kernel/reboot.c b/kernel/reboot.c > > index 22c16e2564cc..8460bd24563b 100644 > > --- a/kernel/reboot.c > > +++ b/kernel/reboot.c > > @@ -18,7 +18,7 @@ > > #include > > #include > > #include > > - > > +#include > > Why remove the blank line? Will fix it. > > > /* > > * this indicates whether you can reboot with ctrl-alt-del: the default is yes > > */ > > @@ -48,6 +48,14 @@ int reboot_cpu; > > enum reboot_type reboot_type = BOOT_ACPI; > > int reboot_force; > > > > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT > > +struct device_shutdown_timeout { > > + struct timer_list timer; > > + struct task_struct *task; > > +} devs_shutdown; > > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC > > +#endif > > #ifdefs should not be in .c files, please put this in a .h file where it > belongs. Same for the other #ifdefs. Will fix it > > > > > + > > struct sys_off_handler { > > struct notifier_block nb; > > int (*sys_off_cb)(struct sys_off_data *data); > > @@ -88,12 +96,46 @@ void emergency_restart(void) > > } > > EXPORT_SYMBOL_GPL(emergency_restart); > > > > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT > > +static void device_shutdown_timeout_handler(struct timer_list *t) > > +{ > > + pr_emerg("**** device shutdown timeout ****\n"); > > What does this have to do with "devices"? This is a whole-system issue, > or really a "broken driver" issue. Shutdown/reboot procedure involves quite a few routines. We are specifically dealing with a broken driver or device malfunction. > > > + show_stack(devs_shutdown.task, NULL, KERN_EMERG); > > How do you know this is the 'device shutdown' stack? What is a "device > shutdown"? The timeout handler is invoked in the context of reboot/shutdown process. We are interested in the device shutdown callback which was stuck and preventing the reboot/shutdown. By device shutdown, we refer to device class/bus/driver shutdown calls. > > > + if (system_state == SYSTEM_RESTART) > > + emergency_restart(); > > + else > > + machine_power_off(); > > +} > > + > > +static void device_shutdown_timer_set(void) > > +{ > > + devs_shutdown.task = current; > > It's just the normal shutdown stack/process, why call it a device? We are specifically dealing with a broken driver or device malfunction, So to indicate same, we want to name it as device_shutdown stack. > > > + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0); > > + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ; > > + add_timer(&devs_shutdown.timer); > > +} > > + > > +static void device_shutdown_timer_clr(void) > > +{ > > + del_timer(&devs_shutdown.timer); > > +} > > +#else > > +static inline void device_shutdown_timer_set(void) > > +{ > > +} > > +static inline void device_shutdown_timer_clr(void) > > +{ > > +} > > +#endif > > + > > void kernel_restart_prepare(char *cmd) > > { > > blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); > > system_state = SYSTEM_RESTART; > > usermodehelper_disable(); > > + device_shutdown_timer_set(); > > device_shutdown(); > > + device_shutdown_timer_clr(); > > Why isn't all of this done in device_shutdown() if you think it is a > device issue? Why put it in reboot.c? Will consider moving to device_shutdown().. > > thanks, > > greg k-h Thanks and Regards, Soumya.