Date: Mon, 2 Aug 2010 20:35:08 +0200
From: Frederic Weisbecker
To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian, Markus Metzger, Robert Richter
Cc: LKML
Subject: [RFC] BTS based perf user callchains
Message-ID: <20100802183506.GA8962@nowhere>

Hi,

As you may know, there is an issue with user stacktraces: they require userspace apps to be built with frame pointers.

One thing we could try is to dump a piece of the top of the user stack page each time an event hits, and let the tools deal with it later using the DWARF information. But before trying that, which might require heavy copies, I would like to try something based on BTS.

The idea is to look at the branch buffer and only pick the addresses of branches that originated from "call" instructions.

So we want BTS activated only in the user ring, and without interrupts when we reach the end of the buffer: we can just run in a kind of live mode and read the buffer on demand. This could be a secondary perf event that has no mmap buffer.
It would only be used internally by the kernel, on behalf of the other, real perf events in a given context. The primary perf events can then read this BTS buffer whenever they want.

Now there are two ways:

- record the whole branch buffer each time another perf event overflows, and let userspace post-processing do the "call" instruction filtering to build the stacktrace on top of the branch trace.

- do the "call" filtering at record time. That requires inspecting each recorded branch and looking at the instruction content from the fast path.

I don't know which solution would be faster. I'm not even sure this will work.

Also, while looking at the BTS implementation in perf, I see we have one BTS buffer per cpu. That doesn't look right, as the code flow is linear per task, not per cpu. Hence I suspect we need one BTS buffer per task. But maybe someone tried that and ran into a problem?

Tell me what you think.

Thanks.
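For illustration, here is a minimal C sketch of what the userspace post-processing in the first option might look like. Everything in it is an assumption, not existing perf code: struct bts_record only mirrors the from/to/flags words of a 64-bit BTS record, and classify_fn is a hypothetical hook standing in for a decoder that would disassemble the instruction at each branch source in the traced binary (a toy lookup table, table_classify, replaces it here). Keeping only "call" branches yields the call history rather than the current stack, so this sketch also pops a frame on each "ret".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout: mirrors the from/to/flags words of a 64-bit BTS record. */
struct bts_record {
	uint64_t from;	/* address of the branch instruction */
	uint64_t to;	/* branch target */
	uint64_t flags;	/* unused in this sketch */
};

enum branch_kind { BRANCH_OTHER, BRANCH_CALL, BRANCH_RET };

/* Hypothetical decoder hook: a real tool would disassemble the
 * instruction at 'from' using the traced binary's text. */
typedef enum branch_kind (*classify_fn)(uint64_t from, void *ctx);

/* Toy decoder for the example: a fixed table of branch sources. */
struct kind_entry { uint64_t ip; enum branch_kind kind; };
struct kind_table { const struct kind_entry *e; size_t n; };

static enum branch_kind table_classify(uint64_t ip, void *ctx)
{
	const struct kind_table *t = ctx;

	for (size_t i = 0; i < t->n; i++)
		if (t->e[i].ip == ip)
			return t->e[i].kind;
	return BRANCH_OTHER;
}

#define MAX_DEPTH 64

/*
 * Replay the records oldest first, pushing the call site on each call and
 * popping on each return. Writes up to 'max' stored call sites to 'chain',
 * innermost stored frame first, and returns how many were written.
 */
static size_t bts_build_callchain(const struct bts_record *recs, size_t nr,
				  classify_fn classify, void *ctx,
				  uint64_t *chain, size_t max)
{
	uint64_t stack[MAX_DEPTH];
	size_t depth = 0, overflow = 0, n = 0;

	assert(classify != NULL);
	for (size_t i = 0; i < nr; i++) {
		switch (classify(recs[i].from, ctx)) {
		case BRANCH_CALL:
			if (depth < MAX_DEPTH)
				stack[depth++] = recs[i].from;
			else
				overflow++;	/* too deep: count the frame but don't store it */
			break;
		case BRANCH_RET:
			if (overflow)
				overflow--;
			else if (depth)
				depth--;
			/* else: returns from a frame entered before the oldest record */
			break;
		default:
			break;	/* jumps and conditional branches don't change the stack */
		}
	}
	while (n < max && n < depth) {
		chain[n] = stack[depth - 1 - n];
		n++;
	}
	return n;
}
```

With records for call@0x1000, call@0x2000, ret@0x3010, a plain jump at 0x2050 and call@0x2100, the returned chain is {0x2100, 0x1000}. Only frames entered inside the window covered by the branch buffer can be recovered this way. The same loop could in principle run at record time (the second option), at the cost of decoding instructions in the fast path.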