Date: Thu, 12 Oct 2023 03:45:15 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, clm@fb.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com
Subject: Re: [PATCH RFC x86/nmi] Fix out-of-order nesting checks
Reply-To: paulmck@kernel.org
References: <0cbff831-6e3d-431c-9830-ee65ee7787ff@paulmck-laptop>

On Thu, Oct 12, 2023 at 08:37:25AM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney wrote:
>
> > The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
> > ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
> > the exit-time check of the bottom bit of ->idt_seq after the
> > this_cpu_dec_return() that re-enables NMI nesting.  This can result in
> > the following sequence of events on a given CPU in kernels built with
> > CONFIG_NMI_CHECK_CPU=y:
> >
> > o   An NMI arrives, and ->idt_seq is incremented to an odd number.
> >     In addition, nmi_state is set to NMI_EXECUTING==1.
> >
> > o   The NMI is processed.
> >
> > o   The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
> >     NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".
> >
> > o   Another NMI arrives and ->idt_seq is incremented to an even
> >     number, triggering the warning.  But all is just fine, at least
> >     assuming we don't get so many closely spaced NMIs that the stack
> >     overflows or some such.
> >
> > Experience on the fleet indicates that the MTBF of this false positive
> > is about 70 years.  Or, for those who are not quite that patient, the
> > MTBF appears to be about one per week per 4,000 systems.
> >
> > Fix this false-positive warning by moving the "nmi_restart" label before
> > the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
> > to follow the final ->idt_seq increment/check.  This way, all nested NMIs
> > that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
> > And if they don't get past that check, they will set nmi_state to
> > NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
> > to restart.
>
> This looks like a sensible fix: the warning should obviously be atomic wrt.
> the no-nesting region. I've applied your fix to tip:x86/irq, as it doesn't
> seem urgent enough with a MTBF of 70 years to warrant tip:x86/urgent handling. ;-)

Works for me!  ;-)

And thank you!

                                                        Thanx, Paul