Date: Thu, 3 Mar 2016 11:00:58 +0100
From: Peter Zijlstra
To: kan.liang@intel.com
Cc: linux-kernel@vger.kernel.org, ak@linux.intel.com, eranian@google.com,
	alexander.shishkin@linux.intel.com, vincent.weaver@maine.edu,
	Ingo Molnar
Subject: Re: [PATCH V2 1/1] perf, x86: fix pebs warning by only restore active pmu in pmi
Message-ID: <20160303100058.GJ6356@twins.programming.kicks-ass.net>
In-Reply-To: <1456956233-3148-1-git-send-email-kan.liang@intel.com>

On Wed, Mar 02, 2016 at 05:03:53PM -0500, kan.liang@intel.com wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index a7ec685..6e95da0 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -1929,7 +1929,9 @@ again:
>  	goto again;
>  
>  done:
> -	__intel_pmu_enable_all(0, true);
> +	/* Only restore pmu when it's active. */
> +	if (cpuc->enabled != 0)
> +		__intel_pmu_enable_all(0, true);
>  	/*
>  	 * Only unmask the NMI after the overflow counters
>  	 * have been reset. This avoids spurious NMIs on

Very good catch, that! It might actually explain some of the weirdness I've seen from the fuzzer but was too sporadic to really pin down yet.
So that got 'introduced' by:

  3fb2b8ddcc6a ("perf, x86, Do not user perf_disable from NMI context")

which has a shit Changelog for not explaining the actual race; whoever let me merge that.. :-)

I suspect the race is the NMI hitting between incrementing pmu->pmu_disable_count and setting the cpuc state in x86_pmu_disable(). A nested perf_pmu_disable() call would then see a !0 disable_count, not call x86_pmu_disable(), and therefore not initialize the cpuc state.

Now, your fix relies on a nested __intel_pmu_disable_all() not changing state, which I think is true. The intel_bts_disable_local() call does change state, but a nested call should not matter. This very much needs a comment though.

There is also the fun issue of the PMI hitting after cpuc->enabled = 0 but before x86_pmu.disable_all(). I suppose this too is harmless, but again, it very much needs a comment.

Looking through the PMI handlers, the KNC driver suffers the same problem (not to mention however many !x86 drivers copied this; arm-xscale for example seems affected, as does the MIPS driver, assuming they have actual NMIs).

So could you respin with comments at:

 - __intel_pmu_disable_all(): describing the requirement that consecutive
   calls must work, and why.

 - x86_pmu_disable(): a comment about the PMI landing after enabled = 0.

And fix the KNC driver too.