Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp967207pxb; Wed, 6 Apr 2022 05:33:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxo78RWHMGFyTv1b8vw59E/NMoI7b2PnyYZRzTIIeXnL2PX9cHjPnx8/TZbWG8Q59meDsNB X-Received: by 2002:a17:903:110f:b0:154:c7a4:9371 with SMTP id n15-20020a170903110f00b00154c7a49371mr8479580plh.75.1649248423653; Wed, 06 Apr 2022 05:33:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649248423; cv=none; d=google.com; s=arc-20160816; b=ZsxjSysaJ7QLB0IHIsQb/GY1n4LmPex/lUDzHz3WerSzbVNC+SBw8MOURSUbW5J+xR IeXJpK9O3Ve7CHiTP/v94NApWE1zYdprKbkNQY2FIPIfFyVdvMfrJgh1M8ONxaTKxlJh 3znCX9ZmQvGuyNnDJp57gZtQd/cgoq66Zzwf40W2MKyNtS5PQPYX4/xXZxO3HNBh4W4r umbGz8RCtK23vSPlpkOcmCPXKZJY8XnS4DzXvNVlMR9yIL2lyOCSii2W8tvKJg3ImOpi HV6Yskk0Zgyt6gRruHAZT1fs7CxjM98qM0HIbJqZBkxLToXyrVEqSs3b5fsScSLCiG+7 mApA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=BNyaGcZWmZNYz232pj/Z958zh8Ev8R+UW0C8NcHGpDQ=; b=b5qOazT/vJgx/ynqcLgQLGgF7Yak2okGu+vuOKm2ZEfGrS7k+08avvBhHnyPuWaYM2 Gttoxw7hpnzBZIL/WSOBhBin1WifvaT+b24fHN1o/4/Mhnd3cCmjllrYGbCxAF/Wv7I0 ARy2m+z9EQCUyxcns5BjLMHRWF5MEz3mPNB/YyoXDOAw4I7B8Xa41CNKVOqvH/742yL2 ovzgHUP9YnKzMT7FgjHePzkrlKhcGGTzOuOR9SWXFdr/LIpUtDVOsaY1x3HSqx4ClaED lyQQDIpjpQ210nnGlEW79+bHZBA1HohRlqhLMQ8EB+inf+Kk5yAcd5ROPUYmh9VVoaaq Tmcw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=r6l88UZM; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id s9-20020a17090a5d0900b001bf0de27d60si4414810pji.77.2022.04.06.05.33.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Apr 2022 05:33:43 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=r6l88UZM; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3211959AAE0; Wed, 6 Apr 2022 02:13:40 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1453041AbiDEWcj (ORCPT + 99 others); Tue, 5 Apr 2022 18:32:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1573701AbiDEToL (ORCPT ); Tue, 5 Apr 2022 15:44:11 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 948D5208; Tue, 5 Apr 2022 12:42:12 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 2CB6D61899; Tue, 5 Apr 2022 19:42:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 360DAC385A0; Tue, 5 Apr 2022 19:42:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1649187731; bh=QRne+Azt1yudtzqNCxOaZsJ8yGOCmzV842MNHsPauuw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=r6l88UZMcm23sPnyz/AaiznGnSohV7AuktoJeNpiGTGk3QPWRKPX40zKBdMato0qr vdPEFea/5HSqiwxv0akT27f80emds88gz+BHZ3UP05yFF9nsXR8ifCdpVV67q036l7 vVF2cdyeksivBV5KIzzzaq6AjR9egDLZT+Ia8qJdZQQVeaDmAzcJhSTVoVKfJdeXb7 Eas/gd4eEzpAuYsAEnbY2pQKn70A8HLs2QIgAcebPGG1sw0XCSgVo0SyoWyXutF6Kg RU/Y+yxbqqSoJZFZSaSxv5U93Mc3rSy7F79UA6PzqWXozHhhVc03bh7pU8qBzE7jV+ JiLAl2Uey4Tag== Received: by quaco.ghostprotocols.net (Postfix, from userid 1000) id 096AE40407; Tue, 5 Apr 2022 16:42:06 -0300 (-03) Date: Tue, 5 Apr 2022 16:42:06 -0300 From: Arnaldo Carvalho de Melo To: Florian Fischer Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Peter Zijlstra , Ingo Molnar , Florian Schmaus Subject: Re: [PATCH] perf stat: add rusage utime and stime events Message-ID: References: <20220405174028.1675025-1-florian.fischer@muhq.space> <20220405174028.1675025-2-florian.fischer@muhq.space> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220405174028.1675025-2-florian.fischer@muhq.space> X-Url: http://acmel.wordpress.com X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Em Tue, Apr 05, 2022 at 07:40:28PM +0200, Florian Fischer escreveu: > It bothered me that during benchmarking using perf stat (to collect > for example CPU cache events) I could not simultaneously retrieve the > times spend in user or kernel mode in a machine readable format. > > When running perf stat the output for humans contains the times > reported by rusage and wait4. > > $ perf stat -e cache-misses:u -- true > > Performance counter stats for 'true': > > 4,206 cache-misses:u > > 0.001113619 seconds time elapsed > > 0.001175000 seconds user > 0.000000000 seconds sys > > But using perf stats' machine-readable format does not provide this > information. > > $ perf stat -x, -e cache-misses:u -- true > 4282,,cache-misses:u,492859,100.00,, > > I found no way to retrieve this information using the available events > while using machine-readable output. > > To solve this I added two new tool internal events 'rusage_user_time' > and 'rusage_system_time' as well as their aliases 'ru_utime' and > 'ru_stime', similarly to the already present 'duration_time' event. > > Both events use the already collected rusage information obtained by wait4. > > Examples presenting cache-misses and rusage information in both human and > machine-readable form: > > $ ./perf stat -e duration_time,ru_utime,ru_stime,cache-misses -- grep -q -r duration_time . > > Performance counter stats for 'grep -q -r duration_time .': > > 67,422,542 ns duration_time:u > 50,517 us ru_utime:u > 16,839 us ru_stime:u > 30,937 cache-misses:u > > 0.067422542 seconds time elapsed > > 0.050517000 seconds user > 0.016839000 seconds sys So, it looks useful indeed! I suggest you break this patch into a series, to reduce its size and help in reviewing. For instance, you introduce evsel__tool_names, please do it in a separate patch, initially only with the current "duration_time", then you go on adding one new metric, then the other one. More below. > $ ./perf stat -x, -e duration_time,ru_utime,ru_stime,cache-misses -- grep -q -r duration_time . > 72134524,ns,duration_time:u,72134524,100.00,, > 65225,us,ru_utime:u,65225,100.00,, > 6865,us,ru_stime:u,6865,100.00,, > 38705,,cache-misses:u,71189328,100.00,, > > Signed-off-by: Florian Fischer > --- > tools/perf/builtin-stat.c | 41 ++++++++++++++++++++++++------- > tools/perf/util/evsel.c | 19 +++++++++------ > tools/perf/util/evsel.h | 4 ++++ > tools/perf/util/parse-events.c | 44 +++++++++++++++++++++++++++++----- > tools/perf/util/parse-events.l | 2 ++ > tools/perf/util/stat-shadow.c | 2 ++ > tools/perf/util/stat.h | 19 +++++++++++++++ > 7 files changed, 109 insertions(+), 22 deletions(-) > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > index a96f106dc93a..c73afc8f6da5 100644 > --- a/tools/perf/builtin-stat.c > +++ b/tools/perf/builtin-stat.c > @@ -216,6 +216,7 @@ static struct perf_stat_config stat_config = { > .run_count = 1, > .metric_only_len = METRIC_ONLY_LEN, > .walltime_nsecs_stats = &walltime_nsecs_stats, > + .ru_stats = &ru_stats, > .big_num = true, > .ctl_fd = -1, > .ctl_fd_ack = -1, > @@ -341,15 +342,35 @@ static int evsel__write_stat_event(struct evsel *counter, int cpu_map_idx, u32 t > static int read_single_counter(struct evsel *counter, int cpu_map_idx, > int thread, struct timespec *rs) > { > - if (counter->tool_event == PERF_TOOL_DURATION_TIME) { > - u64 val = rs->tv_nsec + rs->tv_sec*1000000000ULL; > - struct perf_counts_values *count = > - perf_counts(counter->counts, cpu_map_idx, thread); > - count->ena = count->run = val; > - count->val = val; > - return 0; > + switch(counter->tool_event) { > + case PERF_TOOL_DURATION_TIME: { > + u64 val = rs->tv_nsec + rs->tv_sec*1000000000ULL; > + struct perf_counts_values *count = > + perf_counts(counter->counts, cpu_map_idx, thread); > + count->ena = count->run = val; > + count->val = val; > + return 0; > + } > + case PERF_TOOL_RU_UTIME: > + case PERF_TOOL_RU_STIME: { > + u64 val; > + struct perf_counts_values *count = > + perf_counts(counter->counts, cpu_map_idx, thread); > + if (counter->tool_event == PERF_TOOL_RU_UTIME) > + val = ru_stats.ru_utime_usec_stat.mean; > + else > + val = ru_stats.ru_stime_usec_stat.mean; > + count->ena = count->run = val; > + count->val = val; > + return 0; > + } > + default: > + case PERF_TOOL_NONE: > + return evsel__read_counter(counter, cpu_map_idx, thread); > + case PERF_TOOL_LAST: > + /* this case should never be reached */ > + return 0; > } > - return evsel__read_counter(counter, cpu_map_idx, thread); > } > > /* > @@ -1010,8 +1031,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) > evlist__reset_prev_raw_counts(evsel_list); > runtime_stat_reset(&stat_config); > perf_stat__reset_shadow_per_stat(&rt_stat); > - } else > + } else { > update_stats(&walltime_nsecs_stats, t1 - t0); > + update_rusage_stats(&ru_stats, &stat_config.ru_data); > + } > > /* > * Closing a group leader splits the group, and as we only disable > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c > index 2a1729e7aee4..26cf7e76698b 100644 > --- a/tools/perf/util/evsel.c > +++ b/tools/perf/util/evsel.c > @@ -597,6 +597,17 @@ static int evsel__sw_name(struct evsel *evsel, char *bf, size_t size) > return r + evsel__add_modifiers(evsel, bf + r, size - r); > } > > +const char *evsel__tool_names[PERF_TOOL_LAST] = { > + "duration_time", > + "rusage_user_time", > + "rusage_system_time", > +}; > + > +static int evsel__tool_name(enum perf_tool_event ev, char *bf, size_t size) > +{ > + return scnprintf(bf, size, "%s", evsel__tool_names[ev]); > +} > + > static int __evsel__bp_name(char *bf, size_t size, u64 addr, u64 type) > { > int r; > @@ -723,12 +734,6 @@ static int evsel__raw_name(struct evsel *evsel, char *bf, size_t size) > return ret + evsel__add_modifiers(evsel, bf + ret, size - ret); > } > > -static int evsel__tool_name(char *bf, size_t size) > -{ > - int ret = scnprintf(bf, size, "duration_time"); > - return ret; > -} > - > const char *evsel__name(struct evsel *evsel) > { > char bf[128]; > @@ -754,7 +759,7 @@ const char *evsel__name(struct evsel *evsel) > > case PERF_TYPE_SOFTWARE: > if (evsel->tool_event) > - evsel__tool_name(bf, sizeof(bf)); > + evsel__tool_name(evsel->tool_event, bf, sizeof(bf)); > else > evsel__sw_name(evsel, bf, sizeof(bf)); > break; > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h > index 041b42d33bf5..2adb97383952 100644 > --- a/tools/perf/util/evsel.h > +++ b/tools/perf/util/evsel.h > @@ -30,6 +30,9 @@ typedef int (evsel__sb_cb_t)(union perf_event *event, void *data); > enum perf_tool_event { > PERF_TOOL_NONE = 0, > PERF_TOOL_DURATION_TIME = 1, > + PERF_TOOL_RU_UTIME = 2, > + PERF_TOOL_RU_STIME = 3, > + PERF_TOOL_LAST > }; > > /** struct evsel - event selector > @@ -258,6 +261,7 @@ extern const char *evsel__hw_cache_op[PERF_COUNT_HW_CACHE_OP_MAX][EVSEL__MAX_ALI > extern const char *evsel__hw_cache_result[PERF_COUNT_HW_CACHE_RESULT_MAX][EVSEL__MAX_ALIASES]; > extern const char *evsel__hw_names[PERF_COUNT_HW_MAX]; > extern const char *evsel__sw_names[PERF_COUNT_SW_MAX]; > +extern const char *evsel__tool_names[PERF_TOOL_LAST]; > extern char *evsel__bpf_counter_events; > bool evsel__match_bpf_counter_events(const char *name); > > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c > index 24997925ae00..13db69b71187 100644 > --- a/tools/perf/util/parse-events.c > +++ b/tools/perf/util/parse-events.c > @@ -154,6 +154,21 @@ struct event_symbol event_symbols_sw[PERF_COUNT_SW_MAX] = { > }, > }; > > +struct event_symbol event_symbols_tool[PERF_TOOL_LAST] = { > + [PERF_TOOL_DURATION_TIME] = { > + .symbol = "duration_time", > + .alias = "", > + }, > + [PERF_TOOL_RU_UTIME] = { > + .symbol = "rusage_user_time", > + .alias = "ru_utime", > + }, > + [PERF_TOOL_RU_STIME] = { > + .symbol = "rusage_system_time", > + .alias = "ru_stime", > + }, > +}; > + > #define __PERF_EVENT_FIELD(config, name) \ > ((config & PERF_EVENT_##name##_MASK) >> PERF_EVENT_##name##_SHIFT) > > @@ -406,6 +421,10 @@ static int add_event_tool(struct list_head *list, int *idx, > free((char *)evsel->unit); > evsel->unit = strdup("ns"); > } > + else if (tool_event == PERF_TOOL_RU_UTIME || tool_event == PERF_TOOL_RU_STIME) { > + free((char *)evsel->unit); > + evsel->unit = strdup("us"); > + } > return 0; > } > > @@ -3055,21 +3074,34 @@ int print_hwcache_events(const char *event_glob, bool name_only) > return evt_num; > } > > -static void print_tool_event(const char *name, const char *event_glob, > +static void print_tool_event(const struct event_symbol *syms, const char *event_glob, > bool name_only) > { > - if (event_glob && !strglobmatch(name, event_glob)) > + if (syms->symbol == NULL) > + return; > + > + if (event_glob && !(strglobmatch(syms->symbol, event_glob) || > + (syms->alias && strglobmatch(syms->alias, event_glob)))) > return; > + > if (name_only) > - printf("%s ", name); > - else > + printf("%s ", syms->symbol); > + else { > + char name[MAX_NAME_LEN]; > + if (syms->alias) > + snprintf(name, MAX_NAME_LEN, "%s OR %s", syms->symbol, syms->alias); > + else > + strlcpy(name, syms->symbol, MAX_NAME_LEN); > printf(" %-50s [%s]\n", name, "Tool event"); > - > + } > } > > void print_tool_events(const char *event_glob, bool name_only) > { > - print_tool_event("duration_time", event_glob, name_only); > + // Start at 1 because the first enum entry symbols no tool event > + for (int i = 0; i < PERF_TOOL_LAST; ++i) { > + print_tool_event(event_symbols_tool + i, event_glob, name_only); > + } > if (pager_in_use()) > printf("\n"); > } > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l > index 5b6e4b5249cf..3c7227b8035c 100644 > --- a/tools/perf/util/parse-events.l > +++ b/tools/perf/util/parse-events.l > @@ -353,6 +353,8 @@ alignment-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_AL > emulation-faults { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); } > dummy { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); } > duration_time { return tool(yyscanner, PERF_TOOL_DURATION_TIME); } > +rusage_user_time|ru_utime { return tool(yyscanner, PERF_TOOL_RU_UTIME); } > +rusage_system_time|ru_stime { return tool(yyscanner, PERF_TOOL_RU_STIME); } > bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); } > cgroup-switches { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); } > > diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c > index 10af7804e482..ea4c35e4f1da 100644 > --- a/tools/perf/util/stat-shadow.c > +++ b/tools/perf/util/stat-shadow.c > @@ -26,6 +26,7 @@ > > struct runtime_stat rt_stat; > struct stats walltime_nsecs_stats; > +struct rusage_stats ru_stats; > > struct saved_value { > struct rb_node rb_node; > @@ -199,6 +200,7 @@ void perf_stat__reset_shadow_stats(void) > { > reset_stat(&rt_stat); > memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats)); > + memset(&ru_stats, 0, sizeof(ru_stats)); > } > > void perf_stat__reset_shadow_per_stat(struct runtime_stat *st) > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h > index 335d19cc3063..b14a45d8af63 100644 > --- a/tools/perf/util/stat.h > +++ b/tools/perf/util/stat.h > @@ -108,6 +108,11 @@ struct runtime_stat { > struct rblist value_list; > }; > > +struct rusage_stats { > + struct stats ru_utime_usec_stat; > + struct stats ru_stime_usec_stat; > +}; > + This could be added in a separate patch that introduces this struct, the global variable and the init/update rusage_stats. Then you add one of the new metrics, then the other one. Doing it in a granular way helps with reviewing and also when using 'git bisect' to find problems. Thanks for working on this! - Arnaldo > typedef struct aggr_cpu_id (*aggr_get_id_t)(struct perf_stat_config *config, struct perf_cpu cpu); > > struct perf_stat_config { > @@ -148,6 +153,7 @@ struct perf_stat_config { > const char *csv_sep; > struct stats *walltime_nsecs_stats; > struct rusage ru_data; > + struct rusage_stats *ru_stats; > struct cpu_aggr_map *aggr_map; > aggr_get_id_t aggr_get_id; > struct cpu_aggr_map *cpus_aggr_map; > @@ -177,6 +183,18 @@ static inline void init_stats(struct stats *stats) > stats->max = 0; > } > > +static inline void init_rusage_stats(struct rusage_stats *ru_stats) { > + init_stats(&ru_stats->ru_utime_usec_stat); > + init_stats(&ru_stats->ru_stime_usec_stat); > +} > + > +static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rusage* rusage) { > + update_stats(&ru_stats->ru_utime_usec_stat, > + (rusage->ru_utime.tv_usec + rusage->ru_utime.tv_sec * 1000000ULL)); > + update_stats(&ru_stats->ru_stime_usec_stat, > + (rusage->ru_stime.tv_usec + rusage->ru_stime.tv_sec * 1000000ULL)); > +} > + > struct evsel; > struct evlist; > > @@ -196,6 +214,7 @@ bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id); > > extern struct runtime_stat rt_stat; > extern struct stats walltime_nsecs_stats; > +extern struct rusage_stats ru_stats; > > typedef void (*print_metric_t)(struct perf_stat_config *config, > void *ctx, const char *color, const char *unit, > -- > 2.35.1 -- - Arnaldo