From: kan.liang@linux.intel.com
To: peterz@infradead.org, acme@redhat.com, mingo@kernel.org,
	linux-kernel@vger.kernel.org
Cc: jolsa@kernel.org, namhyung@kernel.org, vitaly.slobodskoy@intel.com,
	pavel.gerasimov@intel.com, ak@linux.intel.com, eranian@google.com,
	mpe@ellerman.id.au, Kan Liang
Subject: [RFC PATCH V4 13/13] perf hist: Add fast path for duplicate entries check approach
Date: Tue, 19 Nov 2019 06:34:11 -0800
Message-Id: <20191119143411.3482-14-kan.liang@linux.intel.com>
In-Reply-To: <20191119143411.3482-1-kan.liang@linux.intel.com>
References: <20191119143411.3482-1-kan.liang@linux.intel.com>

From: Kan Liang

Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with a deep call stack.
Almost 50% of the elapsed time of perf report is spent on the check when
the call stack always has a depth of 32.

hist_entry__cmp() is used to compare the new entry with the old entries.
It goes through all the available sort keys in the sort_list and calls
the specific cmp of each sort key, which is very slow.

In most cases there are no duplicate entries in the callchain, because
the symbols are usually different. It is much faster to do a quick check
of the symbols first, and only do the full cmp when the symbols are
exactly the same. The quick check only compares the symbols, not the dso.

Export _sort__sym_cmp() for the quick check.

  $ perf record --call-graph lbr ./tchain_edit_64

Without the patch:

  $ time perf report --stdio
  real	0m21.142s
  user	0m21.110s
  sys	0m0.033s

With the patch:

  $ time perf report --stdio
  real	0m10.977s
  user	0m10.948s
  sys	0m0.027s

Signed-off-by: Kan Liang
Cc: Namhyung Kim
---
 tools/perf/util/hist.c | 23 +++++++++++++++++++++++
 tools/perf/util/sort.c |  2 +-
 tools/perf/util/sort.h |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0a8d72ae93ca..6eb35dde3905 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1057,6 +1057,20 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
+static bool
+hist_entry__fast__sym_diff(struct hist_entry *left,
+			   struct hist_entry *right)
+{
+	struct symbol *sym_l = left->ms.sym;
+	struct symbol *sym_r = right->ms.sym;
+
+	if (!sym_l && !sym_r)
+		return left->ip != right->ip;
+
+	return !!_sort__sym_cmp(sym_l, sym_r);
+}
+
+
 static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
@@ -1083,6 +1097,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	};
 	int i;
 	struct callchain_cursor cursor;
+	bool fast = hists__has(he_tmp.hists, sym);
 
 	callchain_cursor_snapshot(&cursor, &callchain_cursor);
 
@@ -1093,6 +1108,14 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	 * It's possible that it has cycles or recursive calls.
 	 */
 	for (i = 0; i < iter->curr; i++) {
+		/*
+		 * For most cases, there are no duplicate entries in callchain.
+		 * The symbols are usually different. Do a quick check for
+		 * symbols first.
+		 */
+		if (fast && hist_entry__fast__sym_diff(he_cache[i], &he_tmp))
+			continue;
+
 		if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
 			/* to avoid calling callback function */
 			iter->he = NULL;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 6b626e6b111e..afa1ac233760 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -234,7 +234,7 @@ static int64_t _sort__addr_cmp(u64 left_ip, u64 right_ip)
 	return (int64_t)(right_ip - left_ip);
 }
 
-static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
+int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 5aff9542d9b7..d608b8a28a92 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -307,5 +307,7 @@ int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t
 sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t
+_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
 char *hist_entry__srcline(struct hist_entry *he);
 #endif /* __PERF_SORT_H */
-- 
2.17.1
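
For readers outside the perf tree, here is a minimal standalone sketch of the idea above, assuming toy types and names: reject non-duplicates with a cheap symbol comparison and fall back to the full multi-key compare only when the symbols may match. Everything here (toy_entry, toy_sym_cmp(), toy_full_cmp(), toy_fast_sym_diff(), toy_is_duplicate()) is invented for illustration and is not a perf API, and the patch's additional gating on the sym sort key (the "fast" flag) is omitted.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_symbol {
	uint64_t start;			/* stands in for a symbol's start address */
};

struct toy_entry {
	struct toy_symbol *sym;
	uint64_t ip;
	int cpu;
};

/*
 * Loosely modeled on _sort__sym_cmp(): NULL-aware, then compare by start
 * address (the real function also handles inlined symbols and end addresses).
 */
static int64_t toy_sym_cmp(struct toy_symbol *l, struct toy_symbol *r)
{
	if (!l || !r)
		return !!r - !!l;
	return (int64_t)(r->start - l->start);
}

/* Stands in for hist_entry__cmp(): walks every sort key, hence slow. */
static int64_t toy_full_cmp(struct toy_entry *l, struct toy_entry *r)
{
	int64_t cmp = toy_sym_cmp(l->sym, r->sym);

	if (cmp)
		return cmp;
	if (l->ip != r->ip)
		return (int64_t)(r->ip - l->ip);
	return r->cpu - l->cpu;		/* ...and so on for the other keys */
}

/* Fast path: true when the entries certainly differ (by symbol, or by ip). */
static bool toy_fast_sym_diff(struct toy_entry *l, struct toy_entry *r)
{
	if (!l->sym && !r->sym)
		return l->ip != r->ip;
	return toy_sym_cmp(l->sym, r->sym) != 0;
}

/* Duplicate check over a small cache, shaped like the callchain loop. */
static bool toy_is_duplicate(struct toy_entry *cache, int n, struct toy_entry *e)
{
	for (int i = 0; i < n; i++) {
		if (toy_fast_sym_diff(&cache[i], e))
			continue;	/* cheap reject, skip the full cmp */
		if (toy_full_cmp(&cache[i], e) == 0)
			return true;
	}
	return false;
}

int main(void)
{
	struct toy_symbol a = { .start = 0x1000 }, b = { .start = 0x2000 };
	struct toy_entry cache[] = {
		{ .sym = &a, .ip = 0x1004 },
		{ .sym = &b, .ip = 0x2008 },
	};
	struct toy_entry same  = { .sym = &a, .ip = 0x1004 };
	struct toy_entry other = { .sym = &b, .ip = 0x2010 };

	printf("same  -> duplicate? %d\n", toy_is_duplicate(cache, 2, &same));	/* 1 */
	printf("other -> duplicate? %d\n", toy_is_duplicate(cache, 2, &other));	/* 0 */
	return 0;
}

The important property is that the fast check only ever answers "definitely different"; whenever it cannot prove a difference it falls through to the full comparison, so the result matches the unpatched behaviour and only the common, non-duplicate case gets cheaper.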