LinuxLists.cc - [PATCH] perf record: fix priv level with branch sampling for paranoid=2

2019-09-04 06:27:29

Subject: [PATCH] perf record: fix priv level with branch sampling for paranoid=2

Now that the default perf_events paranoid level is set to 2, a regular user
cannot monitor kernel level activity anymore. As such, with the following
cmdline:

$ perf record -e cycles date

The perf tool first tries cycles:uk but then falls back to cycles:u
as can be seen in the perf report --header-only output:

cmdline : /export/hda3/tmp/perf.tip record -e cycles ls
event : name = cycles:u, , id = { 436186, ... }

This is okay as long as there is a way to learn the priv level was changed
internally by the tool.

But consider a similar example:

$ perf record -b -e cycles date
Error:
You may not have permission to collect stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
...

Why is that treated differently given that the branch sampling inherits the
priv level of the first event in this case, i.e., cycles:u? It turns out
that the branch sampling code is more picky and also checks exclude_hv.

In the fallback path, perf record is setting exclude_kernel = 1, but it
does not change exclude_hv. This does not seem to match the restriction
imposed by paranoid = 2.

This patch fixes the problem by forcing exclude_hv = 1 in the fallback
for paranoid=2. With this in place:

$ perf record -b -e cycles date
cmdline : /export/hda3/tmp/perf.tip record -b -e cycles ls
event : name = cycles:u, , id = { 436847, ... }

And the command succeeds as expected.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/evsel.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 85825384f9e8..3cbe06fdf7f7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2811,9 +2811,11 @@ bool perf_evsel__fallback(struct evsel *evsel, int err,
 		if (evsel->name)
 			free(evsel->name);
 		evsel->name = new_name;
-		scnprintf(msg, msgsize,
-"kernel.perf_event_paranoid=%d, trying to fall back to excluding kernel samples", paranoid);
+		scnprintf(msg, msgsize, "kernel.perf_event_paranoid=%d, trying "
+			  "to fall back to excluding kernel and hypervisor "
+			  " samples", paranoid);
 		evsel->core.attr.exclude_kernel = 1;
+		evsel->core.attr.exclude_hv = 1;
 
 		return true;
 	}
--
2.23.0.187.g17f5b7556c-goog

2019-09-13 10:08:43

by Stephane Eranian

Subject: Re: [PATCH] perf record: fix priv level with branch sampling for paranoid=2

On Tue, Sep 3, 2019 at 11:26 PM Stephane Eranian <[email protected]> wrote:
>
> Now that the default perf_events paranoid level is set to 2, a regular user
> cannot monitor kernel level activity anymore. As such, with the following
> cmdline:
>
> $ perf record -e cycles date
>
> The perf tool first tries cycles:uk but then falls back to cycles:u
> as can be seen in the perf report --header-only output:
>
> cmdline : /export/hda3/tmp/perf.tip record -e cycles ls
> event : name = cycles:u, , id = { 436186, ... }
>
> This is okay as long as there is a way to learn the priv level was changed
> internally by the tool.
>
> But consider a similar example:
>
> $ perf record -b -e cycles date
> Error:
> You may not have permission to collect stats.
>
> Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> which controls use of the performance events system by
> unprivileged users (without CAP_SYS_ADMIN).
> ...
>
> Why is that treated differently given that the branch sampling inherits the
> priv level of the first event in this case, i.e., cycles:u? It turns out
> that the branch sampling code is more picky and also checks exclude_hv.
>
> In the fallback path, perf record is setting exclude_kernel = 1, but it
> does not change exclude_hv. This does not seem to match the restriction
> imposed by paranoid = 2.
>
> This patch fixes the problem by forcing exclude_hv = 1 in the fallback
> for paranoid=2. With this in place:
>
> $ perf record -b -e cycles date
> cmdline : /export/hda3/tmp/perf.tip record -b -e cycles ls
> event : name = cycles:u, , id = { 436847, ... }
>
> And the command succeeds as expected.
>
Any comment on this patch?

> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> tools/perf/util/evsel.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 85825384f9e8..3cbe06fdf7f7 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2811,9 +2811,11 @@ bool perf_evsel__fallback(struct evsel *evsel, int err,
> if (evsel->name)
> free(evsel->name);
> evsel->name = new_name;
> - scnprintf(msg, msgsize,
> -"kernel.perf_event_paranoid=%d, trying to fall back to excluding kernel samples", paranoid);
> + scnprintf(msg, msgsize, "kernel.perf_event_paranoid=%d, trying "
> + "to fall back to excluding kernel and hypervisor "
> + " samples", paranoid);
> evsel->core.attr.exclude_kernel = 1;
> + evsel->core.attr.exclude_hv = 1;
>
> return true;
> }
> --
> 2.23.0.187.g17f5b7556c-goog
>

2019-09-23 09:05:55

by Jiri Olsa

Subject: Re: [PATCH] perf record: fix priv level with branch sampling for paranoid=2

On Tue, Sep 03, 2019 at 11:26:03PM -0700, Stephane Eranian wrote:
> Now that the default perf_events paranoid level is set to 2, a regular user
> cannot monitor kernel level activity anymore. As such, with the following
> cmdline:
>
> $ perf record -e cycles date
>
> The perf tool first tries cycles:uk but then falls back to cycles:u
> as can be seen in the perf report --header-only output:
>
> cmdline : /export/hda3/tmp/perf.tip record -e cycles ls
> event : name = cycles:u, , id = { 436186, ... }
>
> This is okay as long as there is a way to learn the priv level was changed
> internally by the tool.
>
> But consider a similar example:
>
> $ perf record -b -e cycles date
> Error:
> You may not have permission to collect stats.
>
> Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> which controls use of the performance events system by
> unprivileged users (without CAP_SYS_ADMIN).
> ...
>
> Why is that treated differently given that the branch sampling inherits the
> priv level of the first event in this case, i.e., cycles:u? It turns out
> that the branch sampling code is more picky and also checks exclude_hv.
>
> In the fallback path, perf record is setting exclude_kernel = 1, but it
> does not change exclude_hv. This does not seem to match the restriction
> imposed by paranoid = 2.
>
> This patch fixes the problem by forcing exclude_hv = 1 in the fallback
> for paranoid=2. With this in place:
>
> $ perf record -b -e cycles date
> cmdline : /export/hda3/tmp/perf.tip record -b -e cycles ls
> event : name = cycles:u, , id = { 436847, ... }
>
> And the command succeeds as expected.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> tools/perf/util/evsel.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 85825384f9e8..3cbe06fdf7f7 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2811,9 +2811,11 @@ bool perf_evsel__fallback(struct evsel *evsel, int err,
> if (evsel->name)
> free(evsel->name);
> evsel->name = new_name;
> - scnprintf(msg, msgsize,
> -"kernel.perf_event_paranoid=%d, trying to fall back to excluding kernel samples", paranoid);
> + scnprintf(msg, msgsize, "kernel.perf_event_paranoid=%d, trying "
> + "to fall back to excluding kernel and hypervisor "
> + " samples", paranoid);

extra space in here ^

Warning:
kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples

other than that it looks good to me

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

2019-09-23 12:34:09

by Stephane Eranian

Subject: Re: [PATCH] perf record: fix priv level with branch sampling for paranoid=2

On Fri, Sep 20, 2019 at 12:12 PM Jiri Olsa <[email protected]> wrote:
>
> On Tue, Sep 03, 2019 at 11:26:03PM -0700, Stephane Eranian wrote:
> > Now that the default perf_events paranoid level is set to 2, a regular user
> > cannot monitor kernel level activity anymore. As such, with the following
> > cmdline:
> >
> > $ perf record -e cycles date
> >
> > The perf tool first tries cycles:uk but then falls back to cycles:u
> > as can be seen in the perf report --header-only output:
> >
> > cmdline : /export/hda3/tmp/perf.tip record -e cycles ls
> > event : name = cycles:u, , id = { 436186, ... }
> >
> > This is okay as long as there is a way to learn the priv level was changed
> > internally by the tool.
> >
> > But consider a similar example:
> >
> > $ perf record -b -e cycles date
> > Error:
> > You may not have permission to collect stats.
> >
> > Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> > which controls use of the performance events system by
> > unprivileged users (without CAP_SYS_ADMIN).
> > ...
> >
> > Why is that treated differently given that the branch sampling inherits the
> > priv level of the first event in this case, i.e., cycles:u? It turns out
> > that the branch sampling code is more picky and also checks exclude_hv.
> >
> > In the fallback path, perf record is setting exclude_kernel = 1, but it
> > does not change exclude_hv. This does not seem to match the restriction
> > imposed by paranoid = 2.
> >
> > This patch fixes the problem by forcing exclude_hv = 1 in the fallback
> > for paranoid=2. With this in place:
> >
> > $ perf record -b -e cycles date
> > cmdline : /export/hda3/tmp/perf.tip record -b -e cycles ls
> > event : name = cycles:u, , id = { 436847, ... }
> >
> > And the command succeeds as expected.
> >
> > Signed-off-by: Stephane Eranian <[email protected]>
> > ---
> > tools/perf/util/evsel.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 85825384f9e8..3cbe06fdf7f7 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -2811,9 +2811,11 @@ bool perf_evsel__fallback(struct evsel *evsel, int err,
> > if (evsel->name)
> > free(evsel->name);
> > evsel->name = new_name;
> > - scnprintf(msg, msgsize,
> > -"kernel.perf_event_paranoid=%d, trying to fall back to excluding kernel samples", paranoid);
> > + scnprintf(msg, msgsize, "kernel.perf_event_paranoid=%d, trying "
> > + "to fall back to excluding kernel and hypervisor "
> > + " samples", paranoid);
>
> extra space in here ^
>
> Warning:
> kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
>
> other than that it looks good to me
>
Fixed in v2.

> Acked-by: Jiri Olsa <[email protected]>
>
> thanks,
> jirka