Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp540647ybl; Fri, 9 Aug 2019 09:44:20 -0700 (PDT) X-Google-Smtp-Source: APXvYqx4MT/QCj+v75tkwrYFGc3w4zCzjF7GPjCcr5hVEEZS/8vOM3zDxxM3o2+Gg/MvvHJw8HsP X-Received: by 2002:a65:6216:: with SMTP id d22mr17020614pgv.404.1565369060444; Fri, 09 Aug 2019 09:44:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565369060; cv=none; d=google.com; s=arc-20160816; b=LBXcnzOlx8zK24YhzWFM/i1zTCj8RvHhLO1d+M9JPEX4t5hzUo+hLo0ifewzxL8sXd 1sFRppwLqUbws1aF13xlmivOIbwaUXzy60XUmNFkswnWf2e8MfJ7nBd6BW2zW4QUQgaf odfn1AA1kbpbRb2nVIfbm5WEv4wdViXKGP3HHlRkMcW2NNOjcdJoRsQFlkH9o3ulIvwH AK/f5kJ4GPkCgUnOqT2YchzvzaGqUn+Ew3AgFDBYBv9PqRJflUZYTbgF6w1fmZ10o8kv JPWkcgpo2mUt8YJPkExJljsBjwqsx1H62DOWaV+ZAbEsT19Be77B41AEBmW5O+PpSWAj Ipyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:mime-version:user-agent:date:message-id:cc:to :organization:subject:from; bh=4tQ9zjoL8Zvk7X4pSTN02D6rHCubmLlm9UXwto1rI9w=; b=NSeYBr5+LGN1vX4JY6QkHIs6dwPI4NTAtbHwgkvq+RPDJiqjazXINDjWNGjeiy2LZy Tz/0Bgn/y35vgp/6L6iNXiaxtIzIY65jQnyk2gKy+x2Ew8g/MvU2EfqcUB6tXKFM7HXU 300Sx+y+kmsvNdLbG8S29CULLdCuWUIVokl7XVhlgkkruCn4+70zj3jPAqdbmZuE3NBM PZb1w0JJiLvejAhl1tfbGHk2rDVR5J0DTgfLXj2TL4a4txOKNy8vd7GG2RA+sBLYWi4l fTJEjqDEkcv8pF0syT6DpBqVmVCKt+tVq/pN+ObLJbZ44Bbo4GVvDUlGl+1MDN6U89jB jgpQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r30si10067658pgl.69.2019.08.09.09.44.04; Fri, 09 Aug 2019 09:44:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2407146AbfHIPQI (ORCPT + 99 others); Fri, 9 Aug 2019 11:16:08 -0400 Received: from mga09.intel.com ([134.134.136.24]:2361 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726037AbfHIPQI (ORCPT ); Fri, 9 Aug 2019 11:16:08 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Aug 2019 08:16:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,364,1559545200"; d="scan'208";a="193407232" Received: from linux.intel.com ([10.54.29.200]) by fmsmga001.fm.intel.com with ESMTP; 09 Aug 2019 08:16:07 -0700 Received: from [10.252.0.91] (abudanko-mobl.ccr.corp.intel.com [10.252.0.91]) by linux.intel.com (Postfix) with ESMTP id 302325801AB; Fri, 9 Aug 2019 08:16:03 -0700 (PDT) From: Alexey Budankov Subject: [PATCH v1 0/3] collect LBR callstack together with thread stack data Organization: Intel Corp. To: Arnaldo Carvalho de Melo , linux-kernel Cc: Jiri Olsa , Namhyung Kim , Alexander Shishkin , Peter Zijlstra , Ingo Molnar , Andi Kleen , Kan Liang , "Jin, Yao" Message-ID: Date: Fri, 9 Aug 2019 18:16:02 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The patch set unblocks collection of LBR call stack data simultaneously with raw thread stack data by --call-graph dwarf,SIZE option: $perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test Collected LBR call stack can be used to augment dwarf call stack calculated from the raw thread stack data and to provide more comprehensive call stack information for cases when collected SIZE is not enough to cover complete thread stack. Such cases are typical for workloads that allocate large arrays of data on its threads stacks or the possible SIZE to collect can't be large enough due to workload nature or system configuration and this is where hardware captured LBR call stacks can provide missing stack frames. Possible dwarf plus LBR call stacks consolidation algorithm description follows. With this patch set perf report command UI currently ignores collected LBR call stack data and still provides dwarf based call stacks information. =========================================================================== Overview: Legend: THS - thread stack CTX - thread register context SWS - software stack SSF - skipped stack frames PSS - Perf sample stack ip,sp,bp - HW registers values d - allocated stack regions kip - ip address in the kernel space K - captured thread stack size THS ----- | |<-stack bottom ... |---| |ip4| |---| PSS = SWS(THS(K)) | | --> | | | |d3 | user/ | |---| user PSS kernel PSS | |ip3| ------ ------ | |---| |SSF | |SSF | | | | .... .... | | | ------ ------ | |d2 | | -1 | | -1 | |---| user ------ ------ K |ip2| CTX |ip3 | |ip3 | |---| |----| |----| | |d1 | ... |ip2 | , |ip2 | | |---| |---| |----| |----| | |ip1| |bp0| |ip1 | |ip1 | | |---| |---| |----| |----| | | | |ip0|->|ip0 | |ip0 |<-user stack top | | | |---| ------ ------ | | |<-|sp0|<-stack |kip0|<-kernel stack bottom --> ----- ----- top |----| |kip1| |----| |kip2| |----| .... | |<-kernel stack top ------ Algorithm details: Legend: HWS - hardware stack K-SWS - kernel software stack BRANCH TABLE HWS ip ip from to ------ ----------- |ip7`| |ip7`| | |----| |----|----| |ip6`| |ip6`| | user PSS |----| |----|----| |ip5`| |ip5`| | ------ |----| |----|----| | -1 | |ip4`| |ip4`| | ------ |----| |----|----| |ip3 |~~~|ip3`| |ip3`| | |----| |----| |----|----| |ip2 |~~~|ip2`| |ip2`| | |----| |----| |----|----| |ip1 |~~~|ip1`| |ip1`|ip0`| |----| |----| ----------- |ip0 |~~~|ip0`|<---------' ------ ------ 1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS 2. ipj` , j=4-7 ===> user PSS Augmented PSS = A_SWS(SWS(THS(K)), HWS): user/ user PSS kernel PSS ------ ------ |ip7`| |ip7`|<-user PSS bottom |----| |----| |ip6`| |ip6`| |----| |----| HWS |ip5`| |ip5`| |----| |----| |ip4`| |ip4`| ------ ------ |ip3 | |ip3 | |----| |----| SWS |ip2 | |ip2 | |----| |----| |ip1 | |ip1 | |----| |----| |ip0 | |ip0 |<-user PSS top ------ ------ |kip0|<-kernel PSS bottom |----| |kip1| K-SWS |----| |kip2| |----| |kip3|<-kernel PSS top ------ APSS =========================================================================== --- Alexey Budankov (3): perf record: enable LBR callstack capture jointly with thread stack perf report: dump LBR callstack data by -D jointly with thread stack perf report: prefer dwarf callstacks to LBR ones when captured both tools/perf/builtin-report.c | 2 ++ tools/perf/util/parse-branch-options.c | 1 + tools/perf/util/session.c | 31 ++++++++++++++++---------- 3 files changed, 22 insertions(+), 12 deletions(-) -- 2.20.1