Message-ID: <84950aa0d194f28389f3bb209154ddc26e96c6de.camel@gmail.com>
Subject: Re: [RFC PATCH 1/4] watchdog: hpwdt: Don't disable watchdog on NMI
From: Ivan Mironov
To: Jerry.Hoemann@hpe.com
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org, Wim Van Sebroeck, Guenter Roeck
Date: Sat, 02 Feb 2019 09:55:29 +0500
In-Reply-To: <20190116022731.GD18342@anatevka>
References: <20190114023617.10656-1-mironov.ivan@gmail.com> <20190114023617.10656-2-mironov.ivan@gmail.com> <20190116022731.GD18342@anatevka>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2019-01-15 at 19:27 -0700, Jerry Hoemann wrote:
> On Mon, Jan 14, 2019 at 07:36:14AM +0500, Ivan Mironov wrote:
> > Existing code disables watchdog on NMI right before completely hanging
> > the system.
> >
> > There are two problems here:
> >
> > * First, watchdog is expected to reset the system in a case of such
> >   failure, no matter what.
>
> Documentation/watchdog/watchdog-api.txt
>
> explicitly allows for pretimeout NMI and generation of kernel crash dumps.
>
> By removing hpwdt_stop the system will likely fail to crash dump
> as there is only 9 seconds between receipt of a NMI and the iLO
> resetting the system.
>
> Unfortunately, kdump is not without issues and can also be difficult
> to properly configure either of which can result in failure to dump
> and reset.
>
> Customers who value availability over kdump collection, the pretimeout
> NMI can be disabled and hardware will not issue the pretimeout NMI
> and will only do reset.
>
> A middle ground for those who want tombstones but not kdump, would
> be to leave the pretimeout NMI enabled and add "panic=N" to the
> Linux command line. That way after the panic, the tombstone is
> printed and the system resets after N seconds.
>
>

Somehow I missed the whole pretimeout thing when reading about the
watchdog API. Thanks for the clarification; the code makes much more
sense now =).

Still, I do not really understand the point of enabling kdump support
in the hpwdt driver by default when kdump itself is not enabled by
default. Also, the existing code may call hpwdt_stop() (and thus break
the watchdog) even if pretimeout is disabled.

Also, the "panic=N" option does not provide a way to *not* panic on an
NMI unrelated to iLO. This could be circumvented by blacklisting the
hpwdt module entirely, but normal watchdog functionality would then be
lost.

It is possible to rebuild the kernel without HPWDT_NMI_DECODING (which
is enabled in Fedora, for example). But it is nearly impossible to
arrive at this solution without examining the source code, because the
description of this option does not mention that it is really about
pretimeout support and panics, and not about something else...

I would say that the current default behavior of hpwdt is slightly
confusing in several different ways.

> >
> > * Second, this code has no effect if there are more than one watchdog.
>
> That is correct.
> Hpwdt will not turn off any other WDT.
>
> I don't see a current method of notifying other watchdogs
> that a given watchdog is going to take the system down.
>
> The closest I hook see is watchdog_notify_pretimeout, but I don't
> see that notifying other WDT. Its not clear to me that it should.
> (e.g. the second WDT could be of longer duration and protect against
> kdump hanging. This would need to be thought through.)
>
> >
> > Signed-off-by: Ivan Mironov
> > ---
> >  drivers/watchdog/hpwdt.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index ef30c7e9728d..2467e6bc25c2 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -170,8 +170,6 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> >  	if (ilo5 && !pretimeout && !mynmi)
> >  		return NMI_DONE;
> >
> > -	hpwdt_stop();
> > -
> >  	hex_byte_pack(panic_msg, mynmi);
> >  	nmi_panic(regs, panic_msg);
> >
> > --
> > 2.20.1
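
To make the point about hpwdt_stop() and a disabled pretimeout concrete,
here is a small userspace model of the hunk quoted above. It is only a
sketch under stated assumptions: enum nmi_action, model_pretimeout() and
the stop_on_nmi flag are hypothetical names that do not exist in
drivers/watchdog/hpwdt.c, mynmi is reduced to a boolean, and any checks
that precede the hunk in the real hpwdt_pretimeout() are not visible in
the diff and are not modelled. Only the "ilo5 && !pretimeout && !mynmi"
condition is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the modelled part of the handler. */
enum nmi_action {
	ACT_IGNORE,		/* handler returns NMI_DONE */
	ACT_PANIC,		/* panic, watchdog left running */
	ACT_STOP_AND_PANIC	/* watchdog stopped first, then panic */
};

/*
 * Models only the hunk quoted in the patch. stop_on_nmi == true stands
 * for the code before the patch (hpwdt_stop() is called), false for
 * the code after it.
 */
static enum nmi_action model_pretimeout(bool ilo5, bool pretimeout,
					bool mynmi, bool stop_on_nmi)
{
	/* Early exit is taken only on iLO5 with pretimeout disabled. */
	if (ilo5 && !pretimeout && !mynmi)
		return ACT_IGNORE;

	return stop_on_nmi ? ACT_STOP_AND_PANIC : ACT_PANIC;
}
```

In this model, model_pretimeout(false, false, false, true) yields
ACT_STOP_AND_PANIC: without iLO5, the pre-patch path stops the watchdog
even though pretimeout is disabled. With stop_on_nmi set to false the
same inputs yield ACT_PANIC, so the hardware watchdog stays armed.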