Date: Mon, 22 Jun 2009 14:00:52 +0200
From: Ingo Molnar
To: eranian@gmail.com
Cc: LKML, Andrew Morton, Thomas Gleixner, Robert Richter,
    Peter Zijlstra, Paul Mackerras, Andi Kleen, Maynard Johnson,
    Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
    perfmon2-devel
Subject: Re: IV.4 - Intel PEBS
Message-ID: <20090622120052.GS24366@elte.hu>
In-Reply-To: <7c86c4470906161042p7fefdb59y10f8ef4275793f0e@mail.gmail.com>

> 4/ Intel PEBS
>
> Since Netburst-based processors, Intel PMUs support a hardware
> sampling buffer mechanism called PEBS.
>
> PEBS really became useful with Nehalem.
>
> Not all events support PEBS. Up until Nehalem, only one counter
> supported PEBS (PMC0). The format of the hardware buffer has
> changed between Core and Nehalem. It is not yet architected, thus
> it can still evolve with future PMU models.
>
> On Nehalem, there is a new PEBS-based feature called Load Latency
> Filtering which captures where data cache misses occur (similar to
> Itanium D-EAR). Activating this feature requires setting a latency
> threshold hosted in a separate PMU MSR.
>
> On Nehalem, given that all 4 generic counters support PEBS, the
> sampling buffer may contain samples generated by any of the 4
> counters. The buffer includes a bitmask of registers to determine
> the source of the samples. Multiple bits may be set in the
> bitmask.
>
> How PEBS will be supported for this new API?

Note, the relevance of PEBS (or IBS) should not be over-stated: for
example it fundamentally cannot do precise call-chain recording (it
only records the RIP, not any of the return frames), which detracts
from its utility. Another limitation is that only a few basic
hardware event types are supported by PEBS.

Having said that, PEBS is a hardware sampling feature that is
definitely saner than AMD's IBS. There are two immediate incremental
uses of it in perfcounters:

 - it makes flat sampling lower overhead by avoiding an NMI for
   all sample points.

 - it makes flat sampled data more precise. (I.e. it can avoid the
   1-2 instructions 'skidding' of a sample position, for a handful
   of PEBS-capable events.)

As such its primary support form would be 'transparent enablement':
i.e. on those (relatively few) events that are PEBS supported it
would be enabled automatically, and would result in more precise
(and possibly, cheaper) samples.

No separate APIs are needed really - the kernel can abstract it away
and can provide the user what the user wants: good and fast samples.

Regarding demultiplexing on Nehalem: PEBS goes into the DS (Debug
Store), and indeed on Nehalem all PEBS counters 'mix' their PEBS
records in the same stream of data. One possible model to support
them is to set the PEBS threshold to one, and hence generate an
interrupt for each PEBS record.
At offset 0x90 of the PEBS record we have a snapshot of the global
status register:

    0x90 IA32_PERF_GLOBAL_STATUS

This tells us which counter overflowed, relative to the previous PEBS
record in the DS. If this were not reliable, we could still poll all
active counters for overflows and get an occasionally imprecise but
still statistically meaningful demultiplexing.

As to enabling PEBS with the (CPU-)global latency recording filters,
we can do this transparently for every PEBS supported event, or can
mandate PEBS scheduling when a PEBS-only feature like load latency
is requested.

This means that for most purposes PEBS will be transparent.
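
To make the demultiplexing step concrete, here is a rough user-space sketch in C of reading the status snapshot at offset 0x90 and mapping its set bits back to counters. It is only an illustration, not the perfcounters implementation: the names pebs_record_status() and pebs_demux() are made up, the record is treated as an opaque byte buffer, and it assumes that the four generic counters PMC0..PMC3 are reported in bits 0..3 of the IA32_PERF_GLOBAL_STATUS snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 generic counters (PMC0..PMC3), reported in bits 0..3
 * of the IA32_PERF_GLOBAL_STATUS snapshot. */
#define NHM_NUM_GENERIC_COUNTERS 4

/* Offset of the IA32_PERF_GLOBAL_STATUS snapshot inside a PEBS record,
 * as given in the mail above. */
#define PEBS_OFF_GLOBAL_STATUS 0x90

/* Hypothetical helper: fetch the 64-bit status snapshot from a raw PEBS
 * record. memcpy() avoids unaligned or type-punned loads. */
uint64_t pebs_record_status(const unsigned char *record)
{
	uint64_t status;

	assert(record != NULL);
	memcpy(&status, record + PEBS_OFF_GLOBAL_STATUS, sizeof(status));
	return status;
}

/* Hypothetical helper: store the indices of all generic counters whose
 * overflow bit is set in 'status' into 'counters' and return how many
 * were found. Bits above the generic counters are ignored here. */
int pebs_demux(uint64_t status, int counters[NHM_NUM_GENERIC_COUNTERS])
{
	int n = 0;

	for (int i = 0; i < NHM_NUM_GENERIC_COUNTERS; i++) {
		if (status & (1ULL << i))
			counters[n++] = i;
	}
	return n;
}
```

With a PEBS threshold of one, an interrupt handler would run something like pebs_demux(pebs_record_status(rec), idx) on the single new record and credit the sample to each counter listed in idx. Since several bits may be set at once, the sketch returns a list of counters instead of a single index.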