Date: Tue, 17 May 2022 11:41:54 -0700
From: Ricardo Neri
To: Thomas Gleixner
Cc: x86@kernel.org, Tony Luck, Andi Kleen, Stephane Eranian,
    Andrew Morton, Joerg Roedel, Suravee Suthikulpanit, David Woodhouse,
    Lu Baolu, Nicholas Piggin, "Ravi V. Shankar", Ricardo Neri,
    iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
Message-ID: <20220517184154.GA6711@ranerica-svr.sc.intel.com>
References: <20220506000008.30892-1-ricardo.neri-calderon@linux.intel.com>
    <20220506000008.30892-22-ricardo.neri-calderon@linux.intel.com>
    <87a6bqrelv.ffs@tglx>
In-Reply-To: <87a6bqrelv.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> > is to be used with the HPET-based hardlockup detector. This detector
> > does not have a direct way of checking if the HPET timer is the source of
> > the NMI. Instead, it indirectly estimates it using the time-stamp counter.
> >
> > Therefore, we may have false-positives in case another NMI occurs within
> > the estimated time window. For this reason, we want the handler of the
> > detector to be called after all the NMI_LOCAL handlers. A simple way
> > of achieving this with a new NMI handler category.
> >
> > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
> >  	}
> >  	raw_spin_unlock(&nmi_reason_lock);
> >  
> > +	handled = nmi_handle(NMI_WATCHDOG, regs);
> > +	if (handled == NMI_HANDLED)
> > +		goto out;
> > +
>
> How is this supposed to work reliably?
>
> If perf is active and the HPET NMI and the perf NMI come in around the
> same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
> won't be checked. Because MSI is strictly edge and the message is only
> sent once, this can result in a stale watchdog, no?

This is true. Instead, at the end of each NMI I should _also_ check if the
TSC is within the expected value of the HPET NMI watchdog. In this way,
unrelated NMIs (e.g., perf NMI) are handled and we don't miss the NMI from
the HPET channel.

Thanks and BR,
Ricardo
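
To make the difference between the two orderings concrete, here is a minimal userspace sketch of the idea, not kernel code and not the actual patch. Everything in it is an assumption made for illustration: the struct hpet_wd_sketch type, its tsc_next and tsc_error fields, the wd_policy enum, and the tsc_in_hpet_window() and nmi_sketch() helpers are hypothetical names. It only models the decision "does the watchdog get inspected on this NMI?" under the two policies discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical watchdog state (illustration only): the TSC value at which
 * the next HPET NMI is expected, and the tolerated error around it.
 */
struct hpet_wd_sketch {
	uint64_t tsc_next;	/* expected TSC at the next HPET NMI */
	uint64_t tsc_error;	/* allowed deviation, in TSC cycles */
	unsigned int checks;	/* how many times the watchdog was inspected */
};

/* The two orderings discussed in the thread (names are assumptions). */
enum wd_policy {
	WD_ONLY_IF_UNCLAIMED,	/* skip the watchdog once a local handler claims the NMI */
	WD_AT_END_OF_EVERY_NMI,	/* always compare the TSC against the window */
};

/* True if tsc lies within [tsc_next - tsc_error, tsc_next + tsc_error]. */
static bool tsc_in_hpet_window(const struct hpet_wd_sketch *wd, uint64_t tsc)
{
	uint64_t delta;

	assert(wd != NULL);
	delta = tsc > wd->tsc_next ? tsc - wd->tsc_next : wd->tsc_next - tsc;
	return delta <= wd->tsc_error;
}

/*
 * Models one NMI. local_handled says whether an unrelated handler
 * (e.g. perf) already claimed it. Returns true if the watchdog was
 * inspected for this NMI.
 */
static bool nmi_sketch(struct hpet_wd_sketch *wd, enum wd_policy policy,
		       uint64_t tsc, bool local_handled)
{
	/* Under the first policy a claimed NMI never reaches the watchdog. */
	if (local_handled && policy == WD_ONLY_IF_UNCLAIMED)
		return false;

	/* Outside the estimated window the NMI is not attributed to HPET. */
	if (!tsc_in_hpet_window(wd, tsc))
		return false;

	wd->checks++;
	return true;
}
```

With tsc_next = 1000 and tsc_error = 50, an NMI at TSC 1010 that perf has already claimed is skipped under WD_ONLY_IF_UNCLAIMED but still inspected under WD_AT_END_OF_EVERY_NMI. The cost is the one already noted in the patch description: any unrelated NMI that lands inside the window is also attributed to the HPET channel.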