Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758128AbcLBVXe (ORCPT ); Fri, 2 Dec 2016 16:23:34 -0500 Received: from mga05.intel.com ([192.55.52.43]:26610 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750903AbcLBVUj (ORCPT ); Fri, 2 Dec 2016 16:20:39 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,288,1477983600"; d="scan'208";a="1093847482" From: kan.liang@intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org Cc: alexander.shishkin@linux.intel.com, tglx@linutronix.de, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, wangnan0@huawei.com, mark.rutland@arm.com, andi@firstfloor.org, Kan Liang Subject: [PATCH V2 00/13] export perf overheads information Date: Fri, 2 Dec 2016 16:19:08 -0500 Message-Id: <1480713561-6617-1-git-send-email-kan.liang@intel.com> X-Mailer: git-send-email 2.5.5 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5535 Lines: 128 From: Kan Liang Profiling brings additional overhead. High overhead may impacts the behavior of the profiling object, impacts the accuracy of the profiling result, and even hang the system. Currently, perf has dynamic interrupt throttle mechanism to lower the sample rate and overhead. But it has limitations. - The mechanism only focus in the sampling overhead. However, there are other parts which also bring big overhead. E.g, multiplexing. - The hint from the mechanism doesn't work on fixed period. - The system changes which caused by the mechanism are not recorded in the perf.data. Users have no idea about the overhead and its impact. Acctually, any passive ways like dynamic interrupt throttle mechanism are only palliative. The best way is to export overhead information, provide more hints, and help the users design more proper perf command. Both kernel and user tool can bring overhead. For kernel, three parts can bring obvious overhead. - sample overhead: For x86, it's the time cost in perf NMI handler. - multiplexing overhead: The time cost spends on rotate context. - side-band events overhead: The time cost spends on iterating all events that need to receive side-band events. All the time cost of those parts are recorded in new perf record type PERF_RECORD_OVERHEAD. For user, the perf record CPU time and elapsed time are recorded in new perf record type PERF_RECORD_USER_OVERHEAD. User can apply --show-profiling-cost in perf report to dump the overhead details in the head of the output. The result is sorted by CPU, if CPU is available. Otherwise, the accumulated result is printed. perf report automatically checks the overhead information. If the kerenl profiling cost is > 10% or perf record profiling cost > 50%. A warning is printed. The kerenl profiling cost is from max of each CPU overhead / perf monotonic wall time. The perf record profiling cost is from perf CPU time / perf monotonic wall time. Logging overhead information is default on. User can disable it by applying --no-profile-cost-info in perf record. User can use the overhead information to refine their perf command and get accurate profiling result. For example, if there is high overhead warning, user may reduce the number of events/increase the period/decrease the frequency. Developer can also use the overhead information to find bugs. Here is an example. $ perf report --show-profiling-cost Warning: Perf kernel profiling cost is high! The cost rate is 15.30% Please consider reducing the number of events, or increasing the period, or decrease the frequency. # To display the perf.data header info, please use --header/--header-only options. # # ======== # Elapsed time: 120345432522(ns) # Perf record cpu time: 505470324(ns) # # CPU SAM SAM cost(ns) MUX MUX cost(ns) SB SB cost(ns) # -1 2247030 1705370307 111354 3111550168 1237358 13591276893 # ======== Changes since V1: - fix u32 holes and remove duplicate CPU information - configurable overhead logging - Introduce the concept of PMU specific overhead and common core overhead. Rename NMI overhead to PMU sample overhead - Add log_overhead in perf_event_context to indicate the logging of overhead - Refine the output of overhead information - Use perf CPU time to replace perf write data overhead - Refine the formula of overhead evaluation - Refine perf script Kan Liang (13): perf/core: Introduce PERF_RECORD_OVERHEAD perf/core: output overhead when sched out from context perf/x86: output sampling overhead perf/core: output multiplexing overhead perf/core: output side-band events overhead perf tools: option to disable overhead collection perf tools: handle PERF_RECORD_OVERHEAD record type perf tools: show kernel overhead perf script: show kernel overhead perf tools: add time related functions perf tools: introduce PERF_RECORD_USER_OVERHEAD perf tools: record user space profiling cost perf report: warn on high overhead arch/x86/events/core.c | 17 +++- arch/x86/events/perf_event.h | 2 + include/linux/perf_event.h | 15 +++ include/uapi/linux/perf_event.h | 42 ++++++++- kernel/events/core.c | 92 ++++++++++++++++++- tools/include/uapi/linux/perf_event.h | 42 ++++++++- tools/perf/Documentation/perf-record.txt | 6 ++ tools/perf/Documentation/perf-report.txt | 10 ++ tools/perf/Documentation/perf-script.txt | 3 + tools/perf/builtin-record.c | 63 ++++++++++++- tools/perf/builtin-report.c | 16 +++- tools/perf/builtin-sched.c | 2 +- tools/perf/builtin-script.c | 32 +++++++ tools/perf/builtin.h | 2 + tools/perf/perf.h | 1 + tools/perf/util/event.c | 39 ++++++++ tools/perf/util/event.h | 31 +++++++ tools/perf/util/evlist.c | 6 ++ tools/perf/util/evlist.h | 1 + tools/perf/util/evsel.c | 1 + tools/perf/util/machine.c | 37 ++++++++ tools/perf/util/machine.h | 3 + tools/perf/util/session.c | 152 +++++++++++++++++++++++++++++++ tools/perf/util/session.h | 3 + tools/perf/util/symbol.h | 3 +- tools/perf/util/tool.h | 1 + 26 files changed, 613 insertions(+), 9 deletions(-) -- 2.5.5