Received: by 2002:ac0:a874:0:0:0:0:0 with SMTP id c49csp353146ima; Fri, 15 Mar 2019 04:25:43 -0700 (PDT) X-Google-Smtp-Source: APXvYqx5w8UsF9zj6WKUX74zLM0ABoH3O1GinFa7lwcje6DuRFxJDrJ4YciMxPcrOCpbUeeDL8DW X-Received: by 2002:a65:5c0b:: with SMTP id u11mr2859300pgr.352.1552649142986; Fri, 15 Mar 2019 04:25:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1552649142; cv=none; d=google.com; s=arc-20160816; b=AVER2U0aVBNsSYAnn3RbrbGLvEuyyO4cspCDOX4P8vLRZtfy6cFQqJYFEzBa5qf+tt W1HP/DTJNTjL8Y95pwBzesxNnY9pSliQna3p3mONeYdmzksfNJIAne3kFRQIS29+GDHZ Kvp/3+/8VWK9XIEcOrp+REVI90Vf1iltHbn8eESNiQuE5OjoXqIHtwdfFf+GwPEwGXdP jAJHHqeQSPQqmj8qJb2IVt+p9t/5c9dF//g3ABi8IZlRF0m/uHJ26PptU/uZZGALE4TY Y9MKfGacBQ/hgiwgzZfGScPavlA4HC/Y9g9bKM2YwnATZoioPG3IrBv7voW31Eexw6Ji XkwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Qy1jr0AkgXYq/ljGe2OnfsKJt7HMkUdQp8p34xIwKAc=; b=E2jZdQQ9rzB1vtCQksXX2JhmTZvf63Hwv3PdTDcIBcX96X0BaJOEq99FqMsaYJzUVO UiIzmkC2teN+k/xR8c/TFHizNLqt6Y4JIuoiYbrqdNNsS3r858HAvEeryZNcdflakV4r OPIoUcfFvmR9ZyZRkKs5tKENXWX3BYuEPV5LWXErdU37S3grur/7uleaP22hdVyvOmFj Ue033lxXi9QJGTRzf6SAAQMqsc4ZiS9g9GLNDquAwwxBw+M0aT/oOHVmkxRQspd25gFV hxI04+iinjywD1FxmYkjBtUqrhHH+sH8NzwfturQ1LLDaXjhAlcszrkbEJGzaebB9sbd JBKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=Co32Sdu+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u11si1546433pgp.205.2019.03.15.04.25.27; Fri, 15 Mar 2019 04:25:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=Co32Sdu+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728938AbfCOKuM (ORCPT + 99 others); Fri, 15 Mar 2019 06:50:12 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:38770 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727553AbfCOKuL (ORCPT ); Fri, 15 Mar 2019 06:50:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=Qy1jr0AkgXYq/ljGe2OnfsKJt7HMkUdQp8p34xIwKAc=; b=Co32Sdu++BIbk2/vZ7GHprVPr gEVyEnlyKscI1qEkhxz8iESEv4WLAk7JbZ7zUla3FCr204PJTc9zuPEuI1Hpk9vEuJNKcnsYdMLn1 9XK1Okd2bEnD4reffCmcNUIwnMLpGPRz+vPk25TdLxds7yg0q8kFqr4eOsgpU3rNjrTrXuAcR7IZ8 GRBaNfWJ3rY1br/7Lg+MvxwUV4gief3ZI3iAyUR9KfI7lDdgiEH02n2dbU/CjYMSJA/3hF9S3wiXV 7za/3QDK7Tnsl0mGnzlwIQmgIxsCmYfJK6lfXZJh1yKZjnozIDxN9jugdkgWFOi+5mkC6TfZo6PEp HUqzgZ7yQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by bombadil.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1h4kPe-0003tR-OZ; Fri, 15 Mar 2019 10:50:07 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 419802142294E; Fri, 15 Mar 2019 11:50:04 +0100 (CET) Date: Fri, 15 Mar 2019 11:50:04 +0100 From: Peter Zijlstra To: "Lendacky, Thomas" Cc: "x86@kernel.org" , "linux-kernel@vger.kernel.org" , Arnaldo Carvalho de Melo , Alexander Shishkin , Ingo Molnar , Borislav Petkov , Namhyung Kim , Thomas Gleixner , Jiri Olsa Subject: Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC Message-ID: <20190315105004.GW5996@hirez.programming.kicks-ass.net> References: <155232291547.21417.2499429555505085131.stgit@tlendack-t1.amdoffice.net> <155232292270.21417.18139649076000959940.stgit@tlendack-t1.amdoffice.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <155232292270.21417.18139649076000959940.stgit@tlendack-t1.amdoffice.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote: > On AMD processors, the detection of an overflowed counter in the NMI > handler relies on the current value of the counter. So, for example, to > check for overflow on a 48 bit counter, bit 47 is checked to see if it > is 1 (not overflowed) or 0 (overflowed). > > There is currently a race condition present when disabling and then > updating the PMC. Increased NMI latency in newer AMD processors makes this > race condition more pronounced. Increased NMI latency also makes the results less useful :/ What amount of skid are we talking about, and is there anything AMD is going to do about this? > If the counter value has overflowed, it is > possible to update the PMC value before the NMI handler can run. Arguably the WRMSR should sync against the PMI. That is the beahviour one would expect. Isn't that something you can fix in ucode? And could you very please tell the hardware people this is disguisting? > The updated PMC value is not an overflowed value, so when the perf NMI > handler does run, it will not find an overflowed counter. This may > appear as an unknown NMI resulting in either a panic or a series of > messages, depending on how the kernel is configured. > > To eliminate this race condition, the PMC value must be checked after > disabling the counter in x86_pmu_disable_all(), and, if overflowed, must > wait for the NMI handler to reset the value before continuing. Add a new, > optional, callable function that can be used to test for and resolve this > condition. > > Cc: # 4.14.x- > Signed-off-by: Tom Lendacky > +static void amd_pmu_wait_on_overflow(int idx, u64 config) > +{ > + unsigned int i; > + u64 counter; > + > + /* > + * We shouldn't be calling this from NMI context, but add a safeguard > + * here to return, since if we're in NMI context we can't wait for an > + * NMI to reset an overflowed counter value. > + */ > + if (in_nmi()) > + return; > + > + /* > + * If the interrupt isn't enabled then we won't get the NMI that will > + * reset the overflow condition, so return. > + */ > + if (!(config & ARCH_PERFMON_EVENTSEL_INT)) > + return; > + > + /* > + * Wait for the counter to be reset if it has overflowed. This loop > + * should exit very, very quickly, but just in case, don't wait > + * forever... > + */ > + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) { > + rdmsrl(x86_pmu_event_addr(idx), counter); > + if (counter & (1ULL << (x86_pmu.cntval_bits - 1))) > + break; > + > + /* Might be in IRQ context, so can't sleep */ > + udelay(1); > + } > +} Argh.. that's horrible, as I'm sure you fully appreciate :/ > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > index b684f0294f35..f1d2f70000cd 100644 > --- a/arch/x86/events/core.c > +++ b/arch/x86/events/core.c > @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void) > continue; > val &= ~ARCH_PERFMON_EVENTSEL_ENABLE; > wrmsrl(x86_pmu_config_addr(idx), val); > + > + if (x86_pmu.wait_on_overflow) > + x86_pmu.wait_on_overflow(idx, val); > } > } One alternative is adding amd_pmu_disable_all() to amd/core.c and using that. Then you can also change the loop to do the wait after all the WRMSRs, if that helps with latency.