Date: Tue, 15 Jan 2019 19:22:42 -0700
From: Jerry Hoemann
To: Ivan Mironov
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org, Wim Van Sebroeck, Guenter Roeck
Subject: Re: [RFC PATCH 0/4] watchdog: hpwdt: Fix NMI-related behaviour when CONFIG_HPWDT_NMI_DECODING is enabled
Message-ID: <20190116022242.GC18342@anatevka>
Reply-To: Jerry.Hoemann@hpe.com
References: <20190114023617.10656-1-mironov.ivan@gmail.com>
In-Reply-To: <20190114023617.10656-1-mironov.ivan@gmail.com>

On Mon, Jan 14, 2019 at 07:36:13AM +0500, Ivan Mironov wrote:
> Hi,
>
> I found out that hpwdt alters NMI behaviour unexpectedly if compiled
> with enabled CONFIG_HPWDT_NMI_DECODING:
>
>  * System starts to panic on any NMI with misleading message.

hpwdt doesn't start to panic on any NMI. It starts to panic on:

	1) NMI_SERR      associated with NMI
	2) NMI_IO_CHECK  associated with IO errors
	3) NMI_UNKNOWN   NMIs unclaimed by all local handlers.

On Gen10 going forward we plan to restrict to just iLO generated NMIs.

There is a long history on HP/HPE ProLiant systems where hpwdt was the
handler of general IO errors (at least the ones that would cause an NMI
to be generated), and we chose to panic in these situations as the
errors were generally quite serious.

Yes, this has caused some problems in the past, as Linux has overloaded
NMI and some subsystems didn't claim the NMIs that they generated
(think profiling). But I haven't seen these types of problems for
several years now.

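As a rough illustration of the policy described above, here is a small
Python model (not kernel code). The enum members mirror the kernel's
NMI_* names used in this mail; should_panic() and its two boolean
inputs are hypothetical and exist only for this sketch. The real driver
derives that information from hardware state.

```python
from enum import Enum, auto


class NmiType(Enum):
    """NMI classes, named after the kernel constants mentioned above."""
    LOCAL = auto()
    UNKNOWN = auto()
    SERR = auto()
    IO_CHECK = auto()


# Classes this mail says hpwdt panics on for legacy platforms.
LEGACY_PANIC_TYPES = {NmiType.SERR, NmiType.IO_CHECK, NmiType.UNKNOWN}


def should_panic(nmi_type, gen10_or_later, ilo_generated):
    """Return True if this simplified model of hpwdt would panic.

    gen10_or_later and ilo_generated are illustrative inputs
    (assumptions of this sketch, not driver fields).
    """
    if gen10_or_later:
        # Planned behaviour: react only to NMIs that iLO itself raised.
        return ilo_generated
    # Legacy behaviour: claim the three classes regardless of source.
    return nmi_type in LEGACY_PANIC_TYPES
```

In this model an unclaimed NMI on a legacy platform still panics, while
the same NMI on Gen10 or later is ignored unless iLO generated it.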
The more modern platforms have more robust error handling built into
them and into Linux, so going forward we'll restrict hpwdt to a more
traditional WDT role. But we're retaining the more conservative
approach for legacy platforms.

How would you suggest that the message be enhanced?

>  * Watchdog provided by hpwdt is not working after such panic.
>
> Here are the patches that should fix this.
>
> This is an RFC patch series because I am not sure that patches are
> correct. Questions:
>
>  * Are "mynmi" flags always set on all supported iLO versions when iLO
>    is the source of NMI?

Unfortunately no. hpwdt is a dual purpose driver. It handles the iLO
watchdog timer and the "Generate NMI to System" button. These are
closely related hardware wise.

However, some platforms generate an NMI for the "Generate NMI to
System" button without signalling it via the iLO registers. These show
up as NMI_UNKNOWN, which is why hpwdt still claims these.

There are also some systems that do not set the nmistat bits correctly.

So as to not break legacy platforms, the use of the nmistat bits for
control will be for Gen10 going forward.

>  * Is it safe to reset "mynmi" flags to zero if code decides to not panic?

The reading of the registers is itself destructive (it sets them to
zero), but the real issue is that some ProLiant systems lack the
ability to acknowledge the NMI, so only one can ever be received.
Returning is therefore not advisable, as no further NMI will be
generated via this path. A reset through firmware is required to
restore the feature.
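
To make the "destructive read" point concrete, here is a minimal Python
model of a read-to-clear status register. The class, the latch() helper
and the MYNMI_MASK value are all hypothetical and chosen only for
illustration; they are not taken from the driver or from iLO
documentation.

```python
class ReadToClearRegister:
    """Model of a status register whose read returns the bits and zeroes them."""

    def __init__(self, value=0):
        self._value = value

    def latch(self, bits):
        # Hardware side: OR new status bits into the register.
        self._value |= bits

    def read(self):
        # Driver side: the read itself is destructive.
        value, self._value = self._value, 0
        return value


MYNMI_MASK = 0x3  # hypothetical mask, for illustration only

nmistat = ReadToClearRegister()
nmistat.latch(0x1)        # pretend iLO flagged the NMI as its own

first = nmistat.read()    # sees the flag
second = nmistat.read()   # flag is already gone
```

A handler therefore has to keep the value from its first read; there is
nothing left to "reset" afterwards. The sketch does not model the
second problem described above, namely that some systems cannot
acknowledge the NMI at all.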
>
> Ivan Mironov (4):
>   watchdog: hpwdt: Don't disable watchdog on NMI
>   watchdog: hpwdt: Don't panic on foreign NMI
>   watchdog: hpwdt: Add more information into message
>   watchdog: hpwdt: Make panic behaviour configurable
>
>  drivers/watchdog/hpwdt.c | 45 ++++++++++++++++++++++------------------
>  1 file changed, 25 insertions(+), 20 deletions(-)
>
> --
> 2.20.1

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------