Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754081AbbHaTjG (ORCPT ); Mon, 31 Aug 2015 15:39:06 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35269 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751610AbbHaTjD (ORCPT ); Mon, 31 Aug 2015 15:39:03 -0400 Date: Mon, 31 Aug 2015 16:38:58 -0300 From: Arnaldo Carvalho de Melo To: Wang Nan Cc: mingo@kernel.org, ast@plumgrid.com, linux-kernel@vger.kernel.org, lizefan@huawei.com, pi3orama@163.com, Brendan Gregg , Daniel Borkmann , David Ahern , He Kuang , Jiri Olsa , Kaixu Xia , Masami Hiramatsu , Namhyung Kim , Peter Zijlstra Subject: Re: [PATCH 03/31] perf tools: Introduce dummy evsel Message-ID: <20150831193858.GB2443@redhat.com> References: <1440822125-52691-1-git-send-email-wangnan0@huawei.com> <1440822125-52691-4-git-send-email-wangnan0@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1440822125-52691-4-git-send-email-wangnan0@huawei.com> X-Url: http://acmel.wordpress.com User-Agent: Mutt/1.5.20 (2009-12-10) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11572 Lines: 318 Em Sat, Aug 29, 2015 at 04:21:37AM +0000, Wang Nan escreveu: > This patch allows linking dummy evsel onto evlist as a placeholder. It > is for following patch which allows passing BPF object using '--event > object.o'. Summary: this patch ended up adding too many subtle clever tricks to achieve what it needs to accomplish, please try to clearly describe the problem, then describe how you implemented it. > Doesn't link other event selectors, if passing a BPF object file to > '--event', nothing is linked onto evlist. -ENOPARSE, can you rewrite the above sentence? You mean you will segregate the events that related to eBPF to process them in a separate step? Like, for instance, putting them in a separate evlist or perhaps flipping a bit like: evsel->process_me_later = true and then avoid those and process them at some later stage? > Instead, events described in BPF object file are probed and linked in > a delayed manner because we want do all probing work together. > Therefore, evsel for events in BPF object would be linked at the end > of evlist. Which causes a small problem that, if passing '--filter' > setting after object file, the filter option won't be correctly > applied to those events. > This patch links dummy onto evlist, so following --filter can be > collected by the dummy evsel. For this reason dummy evsels are set to > PERF_TYPE_TRACEPOINT. Looks like a roundabout way of applying the --filter to the eBPF, but I really need to read the patch then... See more below. > Due to the possibility of existance of dummy evsel, > perf_evlist__purge_dummy() must be called right after parse_options(). > This patch adds it to record, top, trace and stat builtin commands. > Further patch moves it down after real BPF events are processed with. > Signed-off-by: Wang Nan > Cc: Arnaldo Carvalho de Melo > Cc: Alexei Starovoitov > Cc: Brendan Gregg > Cc: Daniel Borkmann > Cc: David Ahern > Cc: He Kuang > Cc: Jiri Olsa > Cc: Kaixu Xia > Cc: Masami Hiramatsu > Cc: Namhyung Kim > Cc: Peter Zijlstra > Cc: Zefan Li > Cc: pi3orama@163.com > Link: http://lkml.kernel.org/r/1440742821-44548-4-git-send-email-wangnan0@huawei.com > --- > tools/perf/builtin-record.c | 2 ++ > tools/perf/builtin-stat.c | 1 + > tools/perf/builtin-top.c | 1 + > tools/perf/builtin-trace.c | 1 + > tools/perf/util/evlist.c | 19 +++++++++++++++++++ > tools/perf/util/evlist.h | 1 + > tools/perf/util/evsel.c | 32 ++++++++++++++++++++++++++++++++ > tools/perf/util/evsel.h | 6 ++++++ > tools/perf/util/parse-events.c | 25 +++++++++++++++++++++---- > 9 files changed, 84 insertions(+), 4 deletions(-) > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index a660022..81829de 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -1112,6 +1112,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) > > argc = parse_options(argc, argv, record_options, record_usage, > PARSE_OPT_STOP_AT_NON_OPTION); > + perf_evlist__purge_dummy(rec->evlist); > + > if (!argc && target__none(&rec->opts.target)) > usage_with_options(record_usage, record_options); > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > index 7aa039b..99b62f1 100644 > --- a/tools/perf/builtin-stat.c > +++ b/tools/perf/builtin-stat.c > @@ -1208,6 +1208,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused) > > argc = parse_options(argc, argv, options, stat_usage, > PARSE_OPT_STOP_AT_NON_OPTION); > + perf_evlist__purge_dummy(evsel_list); > > interval = stat_config.interval; > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c > index 8c465c8..246203b 100644 > --- a/tools/perf/builtin-top.c > +++ b/tools/perf/builtin-top.c > @@ -1198,6 +1198,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused) > perf_config(perf_top_config, &top); > > argc = parse_options(argc, argv, options, top_usage, 0); > + perf_evlist__purge_dummy(top.evlist); > if (argc) > usage_with_options(top_usage, options); > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c > index 4e3abba..57712b9 100644 > --- a/tools/perf/builtin-trace.c > +++ b/tools/perf/builtin-trace.c > @@ -3099,6 +3099,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) > > argc = parse_options_subcommand(argc, argv, trace_options, trace_subcommands, > trace_usage, PARSE_OPT_STOP_AT_NON_OPTION); > + perf_evlist__purge_dummy(trace.evlist); > > if (trace.trace_pgfaults) { > trace.opts.sample_address = true; > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c > index 8d00039..8a4e64d 100644 > --- a/tools/perf/util/evlist.c > +++ b/tools/perf/util/evlist.c > @@ -1696,3 +1696,22 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist, > > tracking_evsel->tracking = true; > } > + > +void perf_evlist__purge_dummy(struct perf_evlist *evlist) If it remove more than one event, then it should be named accordingly, either: perf_evlist__purge_dummies() or perf_evlist__purge_dummy_events(). I would prefer the former. But we already have a "dummy" event: [root@zoo linux]# perf stat -e dummy -a sleep 10s Performance counter stats for 'system wide': 0 dummy 10.003173114 seconds time elapsed [root@zoo linux]# It has some specific purpose, but then now, with your patch, we need to figure out which dummy is which, so I think this needs rethinking. > +{ > + struct perf_evsel *pos, *n; > + > + /* > + * Remove all dummy events. > + * During linking, we don't touch anything except link > + * it into evlist. As a result, we don't > + * need to adjust evlist->nr_entries during removal. > + */ > + > + evlist__for_each_safe(evlist, n, pos) { > + if (perf_evsel__is_dummy(pos)) { > + list_del_init(&pos->node); > + perf_evsel__delete(pos); > + } > + } > +} > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h > index b39a619..7f15727 100644 > --- a/tools/perf/util/evlist.h > +++ b/tools/perf/util/evlist.h > @@ -181,6 +181,7 @@ bool perf_evlist__valid_read_format(struct perf_evlist *evlist); > void perf_evlist__splice_list_tail(struct perf_evlist *evlist, > struct list_head *list, > int nr_entries); > +void perf_evlist__purge_dummy(struct perf_evlist *evlist); > > static inline struct perf_evsel *perf_evlist__first(struct perf_evlist *evlist) > { > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c > index bac25f4..01267f4 100644 > --- a/tools/perf/util/evsel.c > +++ b/tools/perf/util/evsel.c > @@ -213,6 +213,7 @@ void perf_evsel__init(struct perf_evsel *evsel, > evsel->sample_size = __perf_evsel__sample_size(attr->sample_type); > perf_evsel__calc_id_pos(evsel); > evsel->cmdline_group_boundary = false; > + evsel->is_dummy = false; > } > > struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx) > @@ -225,6 +226,37 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx) > return evsel; > } > > +struct perf_evsel *perf_evsel__new_dummy(const char *name) > +{ > + struct perf_evsel *evsel = zalloc(perf_evsel__object.size); > + > + if (!evsel) > + return NULL; > + > + /* > + * Don't need call perf_evsel__init() for dummy evsel. > + * Keep it simple. > + */ > + evsel->name = strdup(name); > + if (!evsel->name) > + goto out_free; > + > + INIT_LIST_HEAD(&evsel->node); > + INIT_LIST_HEAD(&evsel->config_terms); > + > + evsel->cmdline_group_boundary = false; > + /* > + * Set dummy evsel as TRACEPOINT event so it can collect filter > + * options. > + */ > + evsel->attr.type = PERF_TYPE_TRACEPOINT; > + evsel->is_dummy = true; > + return evsel; > +out_free: > + free(evsel); > + return NULL; > +} > + > struct perf_evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx) > { > struct perf_evsel *evsel = zalloc(perf_evsel__object.size); > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h > index 298e6bb..0b8e47d 100644 > --- a/tools/perf/util/evsel.h > +++ b/tools/perf/util/evsel.h > @@ -118,6 +118,7 @@ struct perf_evsel { > struct perf_evsel *leader; > char *group_name; > bool cmdline_group_boundary; > + bool is_dummy; > struct list_head config_terms; > }; > > @@ -153,6 +154,11 @@ int perf_evsel__object_config(size_t object_size, > void (*fini)(struct perf_evsel *evsel)); > > struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx); > +struct perf_evsel *perf_evsel__new_dummy(const char *name); > +static inline bool perf_evsel__is_dummy(struct perf_evsel *evsel) > +{ > + return evsel->is_dummy; > +} > > static inline struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr) > { > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c > index 14cd7e3..71d91fb 100644 > --- a/tools/perf/util/parse-events.c > +++ b/tools/perf/util/parse-events.c > @@ -1141,7 +1141,7 @@ int parse_events(struct perf_evlist *evlist, const char *str, > perf_pmu__parse_cleanup(); > if (!ret) { > int entries = data.idx - evlist->nr_entries; > - struct perf_evsel *last; > + struct perf_evsel *last = NULL; > > if (!list_empty(&data.list)) { > last = list_entry(data.list.prev, > @@ -1149,8 +1149,25 @@ int parse_events(struct perf_evlist *evlist, const char *str, > last->cmdline_group_boundary = true; > } > > - perf_evlist__splice_list_tail(evlist, &data.list, entries); > - evlist->nr_groups += data.nr_groups; > + if (last && perf_evsel__is_dummy(last)) { > + if (!list_is_singular(&data.list)) { > + parse_events_evlist_error(&data, 0, > + "Dummy evsel error: not on a singular list"); > + return -1; > + } > + /* > + * We are introducing a dummy event. Don't touch > + * anything, just link it. What is the advantage of "just linking it"? What will we achieve by that, you told what you want to avoid, i.e. "alerting" evlist->nr_entries, but why is that important and what is the part you want to reuse? > + * Don't use perf_evlist__splice_list_tail() since > + * it alerts evlist->nr_entries, which affect header > + * of resulting perf.data. > + */ > + list_splice_tail(&data.list, &evlist->entries); > + } else { > + perf_evlist__splice_list_tail(evlist, &data.list, entries); > + evlist->nr_groups += data.nr_groups; > + } > > return 0; > } > @@ -1256,7 +1273,7 @@ foreach_evsel_in_last_glob(struct perf_evlist *evlist, > struct perf_evsel *last = NULL; > int err; > > - if (evlist->nr_entries > 0) > + if (!list_empty(&evlist->entries)) So here is part of that clever trick, i.e. evlist->nr_entries, that so far we could rely on being the number of evsels in evlist->entries, can't be trusted for that, argh :-\ > last = perf_evlist__last(evlist); > > do { > -- > 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/