Date: Thu, 22 Dec 2016 14:25:23 +0100
From: Peter Zijlstra
To: Stephane Eranian
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, ak@linux.intel.com, vincent.weaver@maine.edu
Subject: Re: [PATCH] perf/x86/pebs: fix handling of PEBS buffer overflows
Message-ID: <20161222132523.GM3107@twins.programming.kicks-ass.net>
References: <1482395366-8992-1-git-send-email-eranian@google.com>
In-Reply-To: <1482395366-8992-1-git-send-email-eranian@google.com>

On Thu, Dec 22, 2016 at 12:29:26AM -0800, Stephane Eranian wrote:
> This patch solves a race condition between PEBS and the PMU handler.
>
> In case multiple PEBS events are sampled at the same time,
> it is possible to have GLOBAL_STATUS bit 62 set indicating
> PEBS buffer overflow and also seeing at most 3 PEBS counters
> having their bits set in the status register. This is a sign
> that there was at least one PEBS record pending at the time
> of the PMU interrupt. PEBS counters must only be processed
> via the drain_pebs() calls, and not via the regular sample
> processing loop that comes after it in the function, otherwise
> phony regular samples may be generated in the sampling buffer
> not marked with the EXACT tag.
>
> Another possibility is to have one PEBS event and at least
> one non-PEBS event which overflows while PEBS is armed. In this
> case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
> status bit for the PEBS counter will be set on Skylake.
>
> To avoid this problem, we systematically ignore the PEBS-enabled
> counters from the GLOBAL_STATUS mask and we always process PEBS
> events via drain_pebs().
>
> The problem manifested itself by having non-exact samples when
> sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
> not have the EXACT flag set.
>
> Note that this problem is only present on Skylake processors.
> This fix is harmless on older processors.
>
> Reported-by: Peter Zijlstra
> Signed-off-by: Stephane Eranian
> ---

Thanks!
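
For readers following the thread, the masking rule in the quoted description can be sketched in plain C. This is a standalone illustration, not the kernel code or the patch itself: handle_pmi(), struct pmi_result and GLOBAL_STATUS_PEBS_OVF are hypothetical names, pebs_enabled is assumed to hold only counter bits, and every status bit below 62 is treated as a counter overflow bit for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 62 of GLOBAL_STATUS signals a PEBS buffer overflow (per the description). */
#define GLOBAL_STATUS_PEBS_OVF (1ULL << 62)

/* Hypothetical result record, used only by this sketch. */
struct pmi_result {
    int pebs_drains;       /* how many times the PEBS drain path ran */
    uint64_t regular_mask; /* counters handled by the regular sample loop */
};

/*
 * Sketch of the described rule: PEBS-enabled counters are removed from the
 * status mask unconditionally, and PEBS records are only ever consumed by
 * the drain path, whether or not bit 62 is set.
 */
static struct pmi_result handle_pmi(uint64_t status, uint64_t pebs_enabled)
{
    struct pmi_result r = { 0, 0 };

    /* Ignore overflow bits that belong to PEBS-enabled counters. */
    status &= ~pebs_enabled;

    /* Always drain when a PEBS event is active (stand-in for drain_pebs()). */
    if (pebs_enabled)
        r.pebs_drains++;
    status &= ~GLOBAL_STATUS_PEBS_OVF;

    /* Regular sample loop: only non-PEBS counters may reach this point. */
    for (int bit = 0; bit < 62; bit++) {
        if (status & (1ULL << bit)) {
            assert(!(pebs_enabled & (1ULL << bit)));
            r.regular_mask |= 1ULL << bit;
        }
    }
    return r;
}
```

With status = bit 0 | bit 2 and pebs_enabled = bit 0 (the second scenario in the description, where bit 62 is clear), the sketch drains PEBS once and passes only counter 2 to the regular loop, so no non-exact sample is produced for the PEBS counter.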