Date: Thu, 16 Apr 2015 14:53:42 +0200
From: Peter Zijlstra
To: Kan Liang
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, acme@infradead.org, eranian@google.com, andi@firstfloor.org
Subject: Re: [PATCH V6 4/6] perf, x86: handle multiple records in PEBS buffer
Message-ID: <20150416125342.GZ23123@twins.programming.kicks-ass.net>
References: <1428597466-8154-1-git-send-email-kan.liang@intel.com> <1428597466-8154-5-git-send-email-kan.liang@intel.com>
In-Reply-To: <1428597466-8154-5-git-send-email-kan.liang@intel.com>

On Thu, Apr 09, 2015 at 12:37:44PM -0400, Kan Liang wrote:
> From: Yan, Zheng
>
> When PEBS interrupt threshold is larger than one, the PEBS buffer
> may include multiple records for each PEBS event. This patch makes
> the code first count how many records each PEBS event has, then
> output the samples in batch.
>
> One corner case needs to mention is that the PEBS hardware doesn't
> deal well with collisions. The records for the events can be collapsed
> into a single one, and it's not possible to reconstruct all events that
> caused the PEBS record.
> Here are some cases which can be called collisions.
> - PEBS events happen near to each other, so the hardware merges them.
> - PEBS events happen near to each other, but they are not merged.
>   The GLOBAL_STATUS for first counter is clear before generating event
>   for next counter.
>   Only the first record can be treated as collisions.
> - Same as case2, but the first counter isn't clear before generating
>   event for next counter. All the records are treated as collision
>   until a record with only one bit set for PEBS event.
>
> GLOBAL_STATUS could be set by both PEBS and non-PEBS events. Multiple
> non-PEBS bit set doesn't count as collisions.
>
> In practice collisions are extremely rare, as long as different PEBS
> events are used. The periods are typically very large, so any collision
> is unlikely. When collision happens, we drop the PEBS record.
> The only way you can get a lot of collision is when you count the same
> thing multiple times. But it is not a useful configuration.
>
> Here are some numbers about collisions.
> Four frequently occurring events
> (cycles:p,instructions:p,branches:p,mem-stores:p) are tested
>
> Test events which are sampled together                 collision rate
> cycles:p,instructions:p                                0.25%
> cycles:p,instructions:p,branches:p                     0.30%
> cycles:p,instructions:p,branches:p,mem-stores:p        0.35%
>
> cycles:p,cycles:p                                      98.52%

*sigh* you're really going to make me write this :-(

The sad part is that I had already written a large part of it for you in
a previous email (
lkml.kernel.org/r/20150330200710.GO27490@worktop.programming.kicks-ass.net
).

And yes, writing a good Changelog takes a lot of time and effort,
sometimes more than the actual patch, and that is OK.

There's a *PLEASE CLARIFY* in the below, please do that. Also the below
talks about a PERF_RECORD_SAMPLES_LOST, please also implement that.

---

When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible scenarios
to demux.

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
  - the corresponding bit in GLOBAL_STATUS gets set
  - we start arming the hardware assist

  < some unspecified amount of time later --
    this could cover multiple events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
  - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
  - if we auto-reload we (re)set CTRn
  - we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

  A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though it's not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close' --
*PLEASE CLARIFY* either the very same instruction or retired in the same
cycle?

For instance, consider this chain of events:

  A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists (the
instruction matches both criteria), we will generate but a single
record, but again with both counters listed in the status field.

This time the record pertains to both events.

Note that these two cases are different but indistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.
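
To make the ambiguity concrete, the A/B/C cycle and the two chains of
events above can be modelled in a few lines of user-space C. This is
only a sketch under stated assumptions: struct pebs_model, the step_*()
helpers, the scenario_*() functions, demux_record() and the limit of
four counters are all hypothetical names invented for illustration; none
of this is the kernel's code, and the model captures only the behaviour
described in this mail.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_COUNTERS 4	/* assumption: four PEBS-capable counters */

/* Hypothetical model of the state described above; not kernel code. */
struct pebs_model {
	uint64_t global_status;	/* stands in for the GLOBAL_STATUS MSR */
	uint64_t armed;		/* counters whose hardware assist is armed */
};

/* Step A: CTRn reaches 0 and its GLOBAL_STATUS bit gets set. */
static void step_overflow(struct pebs_model *m, int n)
{
	assert(n >= 0 && n < MODEL_MAX_COUNTERS);
	m->global_status |= 1ULL << n;
}

/* Step B: the hardware assist for counter n is now armed. */
static void step_arm(struct pebs_model *m, int n)
{
	assert(n >= 0 && n < MODEL_MAX_COUNTERS);
	m->armed |= 1ULL << n;
}

/*
 * Step C: an event matching the counters in 'match' happens.
 * Returns the status field of the single record that is generated
 * (a copy of GLOBAL_STATUS at this moment), or 0 if no armed assist
 * matched. The bits of the counters that fired are cleared afterwards.
 */
static uint64_t step_event(struct pebs_model *m, uint64_t match)
{
	uint64_t fired = m->armed & match;
	uint64_t status;

	if (!fired)
		return 0;

	status = m->global_status;
	m->global_status &= ~fired;
	m->armed &= ~fired;
	return status;
}

/* Chain A0, B0, A1, C0: counter 1 is unrelated to the record. */
static uint64_t scenario_unrelated(void)
{
	struct pebs_model m = { 0, 0 };

	step_overflow(&m, 0);
	step_arm(&m, 0);
	step_overflow(&m, 1);
	return step_event(&m, 1ULL << 0);
}

/* Chain A0, B0, A1, B1, C01: the record pertains to both counters. */
static uint64_t scenario_merged(void)
{
	struct pebs_model m = { 0, 0 };

	step_overflow(&m, 0);
	step_arm(&m, 0);
	step_overflow(&m, 1);
	step_arm(&m, 1);
	return step_event(&m, (1ULL << 0) | (1ULL << 1));
}

/*
 * Demux one record: ignore !PEBS bits, return the counter index if
 * exactly one PEBS bit is set, or -1 if the record must be discarded.
 */
static int demux_record(uint64_t status, uint64_t pebs_mask)
{
	uint64_t pebs = status & pebs_mask;
	int n;

	if (!pebs || (pebs & (pebs - 1)))
		return -1;	/* no PEBS bit, or more than one */

	for (n = 0; !(pebs & 1); n++)
		pebs >>= 1;
	return n;
}
```

In this model both scenario_unrelated() and scenario_merged() return a
status of 0x3, which is why demux_record() can do nothing better than
return -1 for it, while a status with a single PEBS bit (plus any number
of !PEBS bits) still maps to exactly one counter.
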

The assumption/hope is that such discards will be rare, and to make sure
the user is not left in the dark about this we'll emit a
PERF_RECORD_SAMPLES_LOST record with the number of possible discards.