From: Jin Yao
To: acme@kernel.org, jolsa@kernel.org, peterz@infradead.org, mingo@redhat.com, alexander.shishkin@linux.intel.com
Cc: Linux-kernel@vger.kernel.org, ak@linux.intel.com, kan.liang@intel.com, yao.jin@intel.com, Jin Yao
Subject: [PATCH v7 0/7] perf: Stream comparison
Date: Mon, 21 Sep 2020 11:33:55 +0800
Message-Id:
 <20200921033402.25129-1-yao.jin@linux.intel.com>
X-Mailer: git-send-email 2.17.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Sometimes a small change in a hot function reduces the cycles of that
function, but the overall workload doesn't get faster. It is interesting
to see where the cycles have moved to.

What we would like is to diff the before/after streams. A stream is the
branch history aggregated from the branch records in perf samples, for
example the callchains aggregated from the branch records. By browsing
the hot streams, we can understand the hot code path. By comparing the
cycles variation of the same streams between old perf data and new perf
data, we can understand whether the cycles have moved to other code.

The before stream is the stream in perf.data.old. The after stream is
the stream in perf.data.

Diffing before/after streams compares the top N hottest streams between
two perf data files.

If all entries of one stream in perf.data.old fully match all entries of
another stream in perf.data, we consider the two streams matched;
otherwise they are not matched.

For example,

   cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
   --------------------------              --------------------------
             main div.c:39                           main div.c:39
             main div.c:44                           main div.c:44

The above streams are matched. For the same streams, the cycles (1) are
equal and the callchain hit percentages differ slightly (26.80% vs.
27.30%). That's expected.

Now let's look at an example.

 perf record -b ...      Generate perf.data.old with branch data
 perf record -b ...      Generate perf.data with branch data
 perf diff --stream

[ Matched hot streams ]

hot chain pair 1:
            cycles: 1, hits: 27.77%                  cycles: 1, hits: 9.24%
        ---------------------------              --------------------------
                      main div.c:39                           main div.c:39
                      main div.c:44                           main div.c:44

hot chain pair 2:
           cycles: 34, hits: 20.06%                cycles: 27, hits: 16.98%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380
          __random_r random_r.c:357               __random_r random_r.c:357
              __random random.c:293                   __random random.c:293
              __random random.c:293                   __random random.c:293
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:288                   __random random.c:288
                     rand rand.c:27                          rand rand.c:27
                     rand rand.c:26                          rand rand.c:26
                           rand@plt                                rand@plt
                           rand@plt                                rand@plt
              compute_flag div.c:25                   compute_flag div.c:25
              compute_flag div.c:22                   compute_flag div.c:22
                      main div.c:40                           main div.c:40
                      main div.c:40                           main div.c:40
                      main div.c:39                           main div.c:39

hot chain pair 3:
             cycles: 9, hits: 4.48%                  cycles: 6, hits: 4.51%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380

[ Hot streams in old perf data only ]

hot chain 1:
            cycles: 18, hits: 6.75%
         --------------------------
          __random_r random_r.c:360
          __random_r random_r.c:388
          __random_r random_r.c:388
          __random_r random_r.c:380
          __random_r random_r.c:357
              __random random.c:293
              __random random.c:293
              __random random.c:291
              __random random.c:291
              __random random.c:291
              __random random.c:288
                     rand rand.c:27
                     rand rand.c:26
                           rand@plt
                           rand@plt
              compute_flag div.c:25
              compute_flag div.c:22
                      main div.c:40

hot chain 2:
            cycles: 29, hits: 2.78%
         --------------------------
              compute_flag div.c:22
                      main div.c:40
                      main div.c:40
                      main div.c:39

[ Hot streams in new perf data only ]

hot chain 1:
             cycles: 4, hits: 4.54%
         --------------------------
                      main div.c:42
              compute_flag div.c:28

hot chain 2:
             cycles: 5, hits: 3.51%
         --------------------------
                      main div.c:39
                      main div.c:44
                      main div.c:42
              compute_flag div.c:28

 v7:
 ---
 Create a new struct evlist_streams which contains ev_streams and
 nr_evsel, so we don't need to pass nr_evsel to stream-related
 functions.

 Rename functions for better coding style.

 v6:
 ---
 Rebase to perf/core

 v5:
 ---
 1. Remove enum stream_type
 2. Rebase to perf/core

 v4:
 ---
 The previous version was too big and very hard to review.

 1. v4 removes the code that supports the source line mapping table
    and removes the source-line-based comparison. Now we only support
    the basic functionality of stream comparison.

 2. Refactor the code in a generic way.

 v3:
 ---
 v2 has 14 patches, which is hard to review.
 v3 is only 7 patches for basic stream comparison.

Jin Yao (7):
  perf util: Create streams
  perf util: Get the evsel_streams by evsel_idx
  perf util: Compare two streams
  perf util: Link stream pair
  perf util: Calculate the sum of total streams hits
  perf util: Report hot streams
  perf diff: Support hot streams comparison

 tools/perf/Documentation/perf-diff.txt |   4 +
 tools/perf/builtin-diff.c              | 119 ++++++++-
 tools/perf/util/Build                  |   1 +
 tools/perf/util/callchain.c            |  99 +++++++
 tools/perf/util/callchain.h            |   9 +
 tools/perf/util/stream.c               | 342 +++++++++++++++++++++++++
 tools/perf/util/stream.h               |  41 +++
 7 files changed, 602 insertions(+), 13 deletions(-)
 create mode 100644 tools/perf/util/stream.c
 create mode 100644 tools/perf/util/stream.h

-- 
2.17.1
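
For readers who want to experiment with the matching rule described in
this cover letter (two streams match only if all of their entries
match), here is a minimal Python sketch. It is not the code from this
series, which is written in C under tools/perf/util/stream.c. The names
Stream and match_streams are hypothetical and chosen only for
illustration, and the sample values are copied from the example output
in this letter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stream:
    """Hypothetical stand-in for one hot stream (not a perf data structure)."""
    entries: Tuple[str, ...]  # callchain entries, e.g. "main div.c:39"
    cycles: int
    hits: float  # callchain hit percentage


def match_streams(old: List[Stream], new: List[Stream]):
    """Pair streams whose entry lists are identical.

    Returns (pairs, old_only, new_only). Each new stream is used in at
    most one pair.
    """
    remaining = list(new)
    pairs, old_only = [], []
    for o in old:
        # Find the first unused new stream with exactly the same entries.
        idx = next(
            (i for i, n in enumerate(remaining) if n.entries == o.entries),
            None,
        )
        if idx is None:
            old_only.append(o)
        else:
            pairs.append((o, remaining.pop(idx)))
    return pairs, old_only, remaining


# Sample data taken from the example output in this letter.
old = [
    Stream(("main div.c:39", "main div.c:44"), cycles=1, hits=27.77),
    Stream(("compute_flag div.c:22", "main div.c:40",
            "main div.c:40", "main div.c:39"), cycles=29, hits=2.78),
]
new = [
    Stream(("main div.c:39", "main div.c:44"), cycles=1, hits=9.24),
    Stream(("main div.c:42", "compute_flag div.c:28"), cycles=4, hits=4.54),
]

pairs, old_only, new_only = match_streams(old, new)
for o, n in pairs:
    print(f"matched: cycles {o.cycles} -> {n.cycles}, "
          f"hits {o.hits}% -> {n.hits}%")
```

With this sample data the first stream of each list forms a matched
pair, while the remaining streams land in the old-only and new-only
groups, mirroring the three sections of the perf diff --stream output.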