This patch allows passing perf's own PID to '--filter' by using
'$PERFPID'. This is useful when capturing tracepoint events
system-wide.
Before this patch, when doing something like:
# perf record -a -e syscalls:sys_enter_write <cmd>
One could easily get a result like this:
# perf report --stdio
...
# Overhead Command Shared Object Symbol
# ........ ....... .................. ....................
#
99.99% perf libpthread-2.18.so [.] __write_nocancel
0.01% ls libc-2.18.so [.] write
0.01% sshd libc-2.18.so [.] write
...
Where most events are generated by perf itself.
A shell trick can be done to filter perf itself out:
# cat << EOF > ./tmp
> #!/bin/sh
> exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
> EOF
# chmod a+x ./tmp
# ./tmp
However, doing so is not user-friendly.
This patch introduces a '$PERFPID' placeholder to perf's filter. Now
the user can do the above with:
# perf record -e ... --filter='common_pid != $PERFPID' -a sleep 10
This patch adds the variable replacement code to perf_evsel__apply_filter(),
before the PERF_EVENT_IOC_SET_FILTER ioctl, so not only 'perf record' but all
subcommands that use filters can utilize $PERFPID.
Andi Kleen sent a similar patch in 2014, but it wasn't applied; the
reason is not clear.
Signed-off-by: Wang Nan <[email protected]>
---
This patch is based on Arnaldo Carvalho de Melo's git tree:
https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core
---
tools/perf/Documentation/perf-record.txt | 5 +-
tools/perf/util/evsel.c | 110 ++++++++++++++++++++++++++++++-
2 files changed, 111 insertions(+), 4 deletions(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 9b9d9d0..9c67482 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -61,7 +61,10 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
- Event filter.
+ Event filter. $PERFPID can be used to represent perf's own PID.
+ Note that '$' has a special meaning to the shell. Remember to use
+ single quotes ('') or a '\' escape when using '$PERFPID' on the
+ command line.
-a::
--all-cpus::
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 83c0803..7f2a1a5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -816,12 +816,116 @@ static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ncpus, int nthrea
return 0;
}
+static int
+perf_evsel__append_filter_token(const char *key, char *new_filter,
+ ssize_t *pspace)
+{
+ if (strcmp(key, "PERFPID") == 0) {
+ char pid_buf[32];
+ pid_t self_pid = getpid();
+
+ snprintf(pid_buf, sizeof(pid_buf), "%d", self_pid);
+ strncat(new_filter, pid_buf, *pspace);
+ *pspace -= strlen(pid_buf);
+ if (*pspace < 0)
+ return -1;
+ return 0;
+ }
+
+ return -1;
+}
+
+static const char *
+perf_evsel__postproc_filter(const char *filter)
+{
+ char *dollar = NULL, *sep = NULL, *p;
+ char *old_filter = NULL, *new_filter = NULL;
+ ssize_t space;
+
+ if (!filter)
+ return NULL;
+
+ dollar = strchr(filter, '$');
+ if (!dollar)
+ return filter;
+
+ p = old_filter = strdup(filter);
+ if (!old_filter) {
+ pr_warning("Can't alloc memory when postprocessing filter '%s'\n",
+ filter);
+ return filter;
+ }
+
+ dollar = old_filter + (dollar - filter);
+
+ /*
+ * See perf_event_set_filter(). Length of a valid filter is
+ * limited by page_size.
+ */
+ new_filter = malloc(page_size);
+ if (!new_filter) {
+ pr_warning("Can't alloc memory when postprocessing filter '%s'\n",
+ filter);
+ goto errout;
+ }
+
+ *new_filter = '\0';
+ space = page_size - 1;
+
+ while (1) {
+ if (dollar)
+ *dollar = '\0';
+ strncat(new_filter, p, space);
+ space -= strlen(p);
+ if (space < 0)
+ goto errout;
+ if (!dollar)
+ break;
+
+ sep = strchr(dollar + 1, ' ');
+ if (sep)
+ *sep = '\0';
+
+ if (perf_evsel__append_filter_token(dollar + 1, new_filter,
+ &space)) {
+ pr_warning("Unknown variable or filter too long: '%s'\n", filter);
+ goto errout;
+ }
+
+ if (!sep)
+ break;
+
+ p = sep;
+ *p = ' ';
+ dollar = strchr(p, '$');
+ }
+
+ free(old_filter);
+ return new_filter;
+
+errout:
+ free(old_filter);
+ free(new_filter);
+ return filter;
+}
+
int perf_evsel__apply_filter(struct perf_evsel *evsel, int ncpus, int nthreads,
const char *filter)
{
- return perf_evsel__run_ioctl(evsel, ncpus, nthreads,
- PERF_EVENT_IOC_SET_FILTER,
- (void *)filter);
+ const char *real_filter;
+ int err;
+
+ real_filter = perf_evsel__postproc_filter(filter);
+ if (!real_filter)
+ real_filter = filter;
+
+ pr_debug("set filter: '%s'\n", real_filter);
+ err = perf_evsel__run_ioctl(evsel, ncpus, nthreads,
+ PERF_EVENT_IOC_SET_FILTER,
+ (void *)real_filter);
+ if (real_filter != filter)
+ free((void *)real_filter);
+ return err;
}
int perf_evsel__set_filter(struct perf_evsel *evsel, const char *filter)
--
1.8.3.4
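For readers who want to try the substitution outside the perf tree, here is a minimal standalone sketch of the idea. It is not the patch's code: the helper name expand_perfpid() and its signature are hypothetical, and unlike perf_evsel__postproc_filter() it replaces '$PERFPID' wherever it appears rather than only when the token ends at a space or at the end of the string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (an assumption, not part of the patch): copy
 * 'filter' into 'out', replacing every "$PERFPID" with the decimal
 * value of 'pid'. Returns 0 on success, -1 if 'out' is too small.
 */
static int expand_perfpid(const char *filter, pid_t pid,
			  char *out, size_t outsz)
{
	static const char key[] = "$PERFPID";
	const size_t keylen = sizeof(key) - 1;
	size_t used = 0;

	assert(filter && out && outsz > 0);

	while (*filter) {
		int n;

		if (strncmp(filter, key, keylen) == 0) {
			/* Placeholder found: emit the PID in its place. */
			n = snprintf(out + used, outsz - used, "%d", (int)pid);
			filter += keylen;
		} else {
			/* Ordinary character: copy it through unchanged. */
			n = snprintf(out + used, outsz - used, "%c", *filter);
			filter++;
		}
		/* snprintf() signals truncation by returning >= the space left. */
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	out[used] = '\0';
	return 0;
}
```

A caller would pass getpid() as 'pid' and hand the expanded string to the PERF_EVENT_IOC_SET_FILTER ioctl. Failing outright on overflow, instead of falling back to the unexpanded filter, is a design choice of this sketch only.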
Hi,
On Tue, Jul 7, 2015 at 12:38 PM, Wang Nan <[email protected]> wrote:
> This patch introduces '$PERFPID' placeholder to perf's filter. Now
> user is allowed to do the above work with:
>
> # perf record -e ... --filter='common_pid != $PERFPID' -a sleep 10
Instead, what about adding an option to do the same thing, like
--exclude-perf or something?
Thanks,
Namhyung
Em Tue, Jul 07, 2015 at 09:33:02PM +0900, Namhyung Kim escreveu:
> On Tue, Jul 7, 2015 at 12:38 PM, Wang Nan <[email protected]> wrote:
> > This patch introduces '$PERFPID' placeholder to perf's filter. Now
> > user is allowed to do the above work with:
> > # perf record -e ... --filter='common_pid != $PERFPID' -a sleep 10
> Instead, what about adding an option to do the same thing, like
> --exclude-perf or something?
Good idea, that would be more compact, yes.
- Arnaldo
On 2015/7/7 20:33, Namhyung Kim wrote:
> Hi,
>
> On Tue, Jul 7, 2015 at 12:38 PM, Wang Nan <[email protected]> wrote:
>> This patch introduces '$PERFPID' placeholder to perf's filter. Now
>> user is allowed to do the above work with:
>>
>> # perf record -e ... --filter='common_pid != $PERFPID' -a sleep 10
> Instead, what about adding an option to do the same thing, like
> --exclude-perf or something?
I considered this idea. Please see one of my previous emails:
http://lkml.kernel.org/r/[email protected]
This patch gives users full control of their filters. They can create
filters like 'common_pid == $PERFPID' to profile perf itself, or use it
in more complex expressions.
But if most of you prefer adding a new option, I can post a v3 with
'--exclude-perf' added. Yes, its code can be much simpler than
this patch.
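To illustrate why an option could be simpler, here is a rough sketch of what '--exclude-perf' would have to compute: a fixed 'common_pid != <pid>' clause, AND-ed with whatever filter the user already gave. The helper name build_exclude_filter() and its signature are assumptions made for this example, not existing perf code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (an assumption, not perf code): write into 'out'
 * a filter that excludes 'pid', AND-ed with the user's filter 'user'
 * when one is given (NULL or "" means no user filter).
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int build_exclude_filter(const char *user, pid_t pid,
				char *out, size_t outsz)
{
	int n;

	assert(out && outsz > 0);

	if (user && strlen(user) > 0)
		/* Parenthesize both sides so operator precedence cannot bite. */
		n = snprintf(out, outsz, "(%s) && (common_pid != %d)",
			     user, (int)pid);
	else
		n = snprintf(out, outsz, "common_pid != %d", (int)pid);

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

No string scanning is needed here, which is where the simplification comes from. The cost is that the user can no longer write the opposite test, 'common_pid == $PERFPID'.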
Em Tue, Jul 07, 2015 at 10:19:12PM +0800, Wangnan (F) escreveu:
>
>
> On 2015/7/7 20:33, Namhyung Kim wrote:
> >Hi,
> >
> >On Tue, Jul 7, 2015 at 12:38 PM, Wang Nan <[email protected]> wrote:
> >>This patch introduces '$PERFPID' placeholder to perf's filter. Now
> >>user is allowed to do the above work with:
> >>
> >> # perf record -e ... --filter='common_pid != $PERFPID' -a sleep 10
> >Instead, what about adding an option to do the same thing, like
> >--exclude-perf or something?
>
> I thought this idea. Please see one of my previous email:
>
> http://lkml.kernel.org/r/[email protected]
>
> This patch gives user full control of their filters. They can create filters
> like 'common_pid == $PERFPID' to profile perf itself, or use it in some
> complex expressions.
>
> But if most of you like adding new option, I can post a v3 with
> '--exclude-perf' added. Yes, the code of it can be much simpler than this
> patch.
One other thing I thought of is that --filter is currently applied
per tracepoint, i.e. parse_filter() is called as we parse the command
line and the filter gets applied to the last --event parsed. So if we
want to apply the same filter to all tracepoints in a more complex
command line, we must repeat that expression for each tracepoint, no?
That provides more flexibility, as does being able to ask for a
specific tracepoint to be recorded only for the perf tool itself.
So for the problem you have at hand, i.e. excluding the perf tool from
all tracepoints in the command line, --exclude-perf seems more
compact, but since you already wrote the other patch to support
$PERFPID, probably we should apply both?
- Arnaldo
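The "all tracepoints in the command line" case discussed above could be sketched roughly as below. Everything here is hypothetical: 'struct event' is a simplified stand-in for perf's event selector, and exclude_pid_everywhere() is a name invented for this example. It only shows the shape of the loop: AND the exclusion into each tracepoint's own filter and skip everything else.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define FILTER_MAX 256

/* Hypothetical stand-in for an event selector; not perf's struct perf_evsel. */
struct event {
	const char *name;
	int is_tracepoint;
	char filter[FILTER_MAX];	/* "" means no filter set */
};

/*
 * AND "common_pid != <pid>" into the filter of every tracepoint event,
 * leaving non-tracepoint events untouched. Returns 0 on success, -1 if
 * a combined filter would not fit in FILTER_MAX bytes.
 */
static int exclude_pid_everywhere(struct event *evs, size_t nr, pid_t pid)
{
	char tmp[FILTER_MAX];
	size_t i;
	int n;

	assert(evs || nr == 0);

	for (i = 0; i < nr; i++) {
		if (!evs[i].is_tracepoint)
			continue;	/* only tracepoints are filtered in this sketch */
		if (evs[i].filter[0])
			n = snprintf(tmp, sizeof(tmp),
				     "(%s) && (common_pid != %d)",
				     evs[i].filter, (int)pid);
		else
			n = snprintf(tmp, sizeof(tmp),
				     "common_pid != %d", (int)pid);
		if (n < 0 || (size_t)n >= sizeof(tmp))
			return -1;
		strcpy(evs[i].filter, tmp);	/* fits: n < FILTER_MAX */
	}
	return 0;
}
```

Per-event filters given with --filter are preserved and only narrowed, so both mechanisms can coexist in one command line.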