Date: Tue, 15 Jan 2019 19:27:31 -0700
From: Jerry Hoemann
To: Ivan Mironov
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org,
 Wim Van Sebroeck, Guenter Roeck
Subject: Re: [RFC PATCH 1/4] watchdog: hpwdt: Don't disable watchdog on NMI
Message-ID: <20190116022731.GD18342@anatevka>
Reply-To: Jerry.Hoemann@hpe.com
References:
 <20190114023617.10656-1-mironov.ivan@gmail.com>
 <20190114023617.10656-2-mironov.ivan@gmail.com>
In-Reply-To: <20190114023617.10656-2-mironov.ivan@gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 14, 2019 at 07:36:14AM +0500, Ivan Mironov wrote:
> Existing code disables watchdog on NMI right before completely hanging
> the system.
>
> There are two problems here:
>
> * First, watchdog is expected to reset the system in a case of such
>   failure, no matter what.

Documentation/watchdog/watchdog-api.txt explicitly allows for a
pretimeout NMI and the generation of kernel crash dumps.

With hpwdt_stop removed, the system will likely fail to crash dump, as
there are only 9 seconds between receipt of an NMI and the iLO
resetting the system.

Unfortunately, kdump is not without issues and can also be difficult to
configure properly. Either can result in a failure to dump and reset.

For customers who value availability over kdump collection, the
pretimeout NMI can be disabled. The hardware will then not issue the
pretimeout NMI and will only do the reset.

A middle ground, for those who want tombstones but not kdump, would be
to leave the pretimeout NMI enabled and add "panic=N" to the Linux
command line. That way, after the panic, the tombstone is printed and
the system resets after N seconds.

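To make the trade-off above concrete, here is a rough model of the
outcomes after the pretimeout NMI. This is only a sketch, not driver
code: the names (struct scenario, after_nmi, the outcome values) are
made up for illustration, the 9 second figure is the one quoted above,
and the model assumes kdump either completes in a known time or never
completes.

```c
#include <stdbool.h>

/* Hypothetical model of what happens after the pretimeout NMI.
 * Nothing here is hpwdt code; all names are illustrative. */

/* Seconds between the pretimeout NMI and the iLO reset (figure from this thread). */
#define ILO_RESET_DELAY_SEC 9

enum outcome {
	DUMP_THEN_RESET,	/* crash dump collected, system comes back */
	RESET_WITHOUT_DUMP,	/* system comes back, no crash dump */
	HUNG			/* system stays down */
};

struct scenario {
	bool wdt_stopped_on_nmi;	/* handler calls hpwdt_stop() before panicking */
	bool kdump_loaded;		/* a crash kernel is configured */
	bool kdump_succeeds;		/* assumed: kdump completes if given enough time */
	unsigned int kdump_sec;		/* assumed time kdump needs, in seconds */
	unsigned int panic_reboot_sec;	/* "panic=N" value, 0 means no reboot on panic */
};

enum outcome after_nmi(struct scenario s)
{
	if (s.kdump_loaded) {
		if (s.wdt_stopped_on_nmi)
			/* No hardware deadline: a broken kdump leaves the box down. */
			return s.kdump_succeeds ? DUMP_THEN_RESET : HUNG;

		/* Watchdog still armed: kdump has to beat the iLO reset. */
		if (s.kdump_succeeds && s.kdump_sec < ILO_RESET_DELAY_SEC)
			return DUMP_THEN_RESET;
		return RESET_WITHOUT_DUMP;
	}

	/* No kdump: the panic only prints the tombstone. */
	if (s.wdt_stopped_on_nmi && s.panic_reboot_sec == 0)
		return HUNG;

	/* Either the panic timer or the still-armed watchdog resets the system. */
	return RESET_WITHOUT_DUMP;
}
```

For example, a working kdump that needs 60 seconds gives DUMP_THEN_RESET
when the handler stops the watchdog, but RESET_WITHOUT_DUMP when the
watchdog is left armed. With no kdump and the watchdog stopped, a
non-zero panic=N is what turns HUNG into RESET_WITHOUT_DUMP.
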
> * Second, this code has no effect if there are more than one watchdog.

That is correct. Hpwdt will not turn off any other WDT.

I don't see a current method of notifying other watchdogs that a given
watchdog is going to take the system down. The closest hook I see is
watchdog_notify_pretimeout, but I don't see that notifying other WDTs.

It's not clear to me that it should. (For example, the second WDT could
be of longer duration and protect against kdump hanging. This would
need to be thought through.)

>
> Signed-off-by: Ivan Mironov
> ---
>  drivers/watchdog/hpwdt.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index ef30c7e9728d..2467e6bc25c2 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -170,8 +170,6 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
>  	if (ilo5 && !pretimeout && !mynmi)
>  		return NMI_DONE;
>
> -	hpwdt_stop();
> -
>  	hex_byte_pack(panic_msg, mynmi);
>  	nmi_panic(regs, panic_msg);
>
> --
> 2.20.1

-- 
-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------