2021-06-10 14:38:57

by John Garry

Subject: [PATCH 0/2] perf metricgroups: A couple of fixes

This series fixes a couple of subtle issues.

The first fixes a segfault seen on my x86 Broadwell when running the
'stat' command with a particular order of metrics.

As mentioned at [0], there is a pre-existing issue here which still needs
fixing, as metric ordering still does not work properly; however, I think
that is a bigger job, and getting rid of the segfault is the best I can do
for the moment.

The second fixes an issue of an uninitialized variable. As noted in the
commit message, gcc does not seem to do a good job of picking up on this.

[0] https://lore.kernel.org/lkml/[email protected]/

John Garry (2):
  perf metricgroup: Fix find_evsel_group() event selector
  perf metricgroup: Return error code from
    metricgroup__add_metric_sys_event_iter()

 tools/perf/util/metricgroup.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--
2.26.2


2021-06-10 14:39:32

by John Garry

Subject: [PATCH 2/2] perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()

The error code is not set at all in the sys event iter function.

This may lead to an uninitialized value of "ret" in
metricgroup__add_metric() when no CPU metric is added.

Fix by properly setting the error code.

It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
if we have no CPU or sys event metric matching, then "has_match" should
be 0 and "ret" is set to -EINVAL.

However gcc cannot detect that it may not have been set after the
map_for_each_metric() loop for CPU metrics, which is strange.

Fixes: be335ec28efa8 ("perf metricgroup: Support adding metrics for system PMUs")
Signed-off-by: John Garry <[email protected]>
---
tools/perf/util/metricgroup.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index c456fdeae06a..d3cf2dee36c8 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1073,16 +1073,18 @@ static int metricgroup__add_metric_sys_event_iter(struct pmu_event *pe,

 	ret = add_metric(d->metric_list, pe, d->metric_no_group, &m, NULL, d->ids);
 	if (ret)
-		return ret;
+		goto out;

 	ret = resolve_metric(d->metric_no_group,
 				     d->metric_list, NULL, d->ids);
 	if (ret)
-		return ret;
+		goto out;

 	*(d->has_match) = true;

-	return *d->ret;
+out:
+	*(d->ret) = ret;
+	return ret;
 }

static int metricgroup__add_metric(const char *metric, bool metric_no_group,
--
2.26.2
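
For readers following the thread, the error-propagation pattern this patch describes can be sketched in isolation. This is only an illustration under stated assumptions, not the perf code: struct iter_data, add_item(), iter_cb() and add_all() are hypothetical stand-ins for the iterator data, add_metric(), the sys event iterator and metricgroup__add_metric() respectively. The point it shows is that once every exit path of the callback stores its result through the caller's pointer, the caller's "ret" no longer needs an initializer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the iterator's private data. */
struct iter_data {
	bool *has_match;	/* set once at least one item was added */
	int *ret;		/* caller's error code, written on every call */
};

/* Hypothetical worker: fails for negative items, succeeds otherwise. */
static int add_item(int item)
{
	return item < 0 ? -ENOMEM : 0;
}

/* Callback: every exit path stores the result through d->ret. */
static int iter_cb(int item, void *data)
{
	struct iter_data *d = data;
	int ret;

	ret = add_item(item);
	if (ret)
		goto out;

	*(d->has_match) = true;
out:
	*(d->ret) = ret;
	return ret;
}

/* Caller: "ret" is deliberately left without an initializer. */
static int add_all(const int *items, size_t n)
{
	bool has_match = false;
	int ret;
	struct iter_data d = { .has_match = &has_match, .ret = &ret };

	for (size_t i = 0; i < n; i++) {
		if (iter_cb(items[i], &d))
			return ret;	/* stored by the callback */
	}

	/* No callback succeeded, so "ret" may never have been written. */
	if (!has_match)
		return -EINVAL;

	return ret;	/* has_match implies a callback stored 0 */
}
```

With an empty input the loop never runs, has_match stays false and the function returns -EINVAL without ever reading "ret", which mirrors the reasoning given in the commit message.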

2021-06-10 14:41:14

by John Garry

Subject: [PATCH 1/2] perf metricgroup: Fix find_evsel_group() event selector

The following command segfaults on my x86 Broadwell:

$ ./perf stat -M frontend_bound,retiring,backend_bound,bad_speculation sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { raw 0x10e }
anon group { raw 0x10e }
perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' failed.
Aborted (core dumped)

The issue shows itself as a use-after-free in evlist__check_cpu_maps(),
whereby the leader of an event selector (evsel) has been deleted, yet we
still attempt to verify the evsel against it.

Fundamentally the problem comes from metricgroup__setup_events() ->
find_evsel_group(), and has developed from the previous fix attempt in
commit 9c880c24cb0d ("perf metricgroup: Fix for metrics containing
duration_time").

The problem now is that the logic for checking whether an evsel is in the
same group is subtly broken for the "cycles" event. For the "cycles" event,
the pmu_name is NULL; however, the logic in find_evsel_group() may set an
event matched against "cycles" as used, when it should not be.

This leads to a condition where an evsel is set, yet its leader is not.

Fix the check for evsel pmu_name by not matching evsels when either has a
NULL pmu_name.

There is still a pre-existing metric issue whereby the ordering of the
metrics may break the 'stat' function, as discussed at:
https://lore.kernel.org/lkml/[email protected]/

Fixes: 9c880c24cb0d ("perf metricgroup: Fix for metrics containing duration_time")
Signed-off-by: John Garry <[email protected]>
---
tools/perf/util/metricgroup.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 8336dd8e8098..c456fdeae06a 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -162,10 +162,10 @@ static bool contains_event(struct evsel **metric_events, int num_events,
 	return false;
 }

-static bool evsel_same_pmu(struct evsel *ev1, struct evsel *ev2)
+static bool evsel_same_pmu_or_none(struct evsel *ev1, struct evsel *ev2)
 {
 	if (!ev1->pmu_name || !ev2->pmu_name)
-		return false;
+		return true;

 	return !strcmp(ev1->pmu_name, ev2->pmu_name);
}
@@ -288,7 +288,7 @@ static struct evsel *find_evsel_group(struct evlist *perf_evlist,
*/
if (!has_constraint &&
ev->leader != metric_events[i]->leader &&
- evsel_same_pmu(ev->leader, metric_events[i]->leader))
+ evsel_same_pmu_or_none(ev->leader, metric_events[i]->leader))
break;
if (!strcmp(metric_events[i]->name, ev->name)) {
set_bit(ev->idx, evlist_used);
--
2.26.2
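
The changed helper can be exercised on its own to see what the new return value means. The following is a minimal sketch, assuming a cut-down stand-in type: struct fake_evsel and same_pmu_or_none() are hypothetical names, and only the pmu_name field is modelled. It mirrors the logic in the diff above, where a NULL pmu_name on either side now yields true, so the "different leader" check in find_evsel_group() takes the break path instead of going on to mark the event as used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for perf's struct evsel. */
struct fake_evsel {
	const char *pmu_name;	/* NULL for events such as "cycles" */
};

/* Mirrors the patched helper: a missing PMU name counts as "same". */
static bool same_pmu_or_none(const struct fake_evsel *a,
			     const struct fake_evsel *b)
{
	if (!a->pmu_name || !b->pmu_name)
		return true;

	return !strcmp(a->pmu_name, b->pmu_name);
}
```

Before the patch the NULL case returned false, which is the case the commit message identifies as letting an evsel be set while its leader was not.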

2021-06-10 18:48:43

by Ian Rogers

Subject: Re: [PATCH 2/2] perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()

On Thu, Jun 10, 2021 at 7:37 AM John Garry <[email protected]> wrote:
>
> The error code is not set at all in the sys event iter function.
>
> This may lead to an uninitialized value of "ret" in
> metricgroup__add_metric() when no CPU metric is added.
>
> Fix by properly setting the error code.
>
> It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
> if we have no CPU or sys event metric matching, then "has_match" should
> be 0 and "ret" is set to -EINVAL.
>
> However gcc cannot detect that it may not have been set after the
> map_for_each_metric() loop for CPU metrics, which is strange.
>
> Fixes: be335ec28efa8 ("perf metricgroup: Support adding metrics for system PMUs")
> Signed-off-by: John Garry <[email protected]>

Acked-by: Ian Rogers <[email protected]>

Thanks,
Ian


2021-06-14 14:58:20

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 2/2] perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()

On Thu, Jun 10, 2021 at 11:45:17AM -0700, Ian Rogers wrote:
> On Thu, Jun 10, 2021 at 7:37 AM John Garry <[email protected]> wrote:
> >
> > The error code is not set at all in the sys event iter function.
> >
> > This may lead to an uninitialized value of "ret" in
> > metricgroup__add_metric() when no CPU metric is added.
> >
> > Fix by properly setting the error code.
> >
> > It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
> > if we have no CPU or sys event metric matching, then "has_match" should
> > be 0 and "ret" is set to -EINVAL.
> >
> > However gcc cannot detect that it may not have been set after the
> > map_for_each_metric() loop for CPU metrics, which is strange.
> >
> > Fixes: be335ec28efa8 ("perf metricgroup: Support adding metrics for system PMUs")
> > Signed-off-by: John Garry <[email protected]>
>
> Acked-by: Ian Rogers <[email protected]>

Does your Acked-by apply to both patches, or just 2/2? I reproduced the
problem fixed by 1/2 on a Thinkpad T450S (Broadwell), and after applying
the patch it no longer segfaults.

Please clarify,

- Arnaldo


--

- Arnaldo

2021-06-15 17:52:52

by Ian Rogers

Subject: Re: [PATCH 2/2] perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()

On Mon, Jun 14, 2021 at 7:56 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Thu, Jun 10, 2021 at 11:45:17AM -0700, Ian Rogers wrote:
> > On Thu, Jun 10, 2021 at 7:37 AM John Garry <[email protected]> wrote:
> > >
> > > The error code is not set at all in the sys event iter function.
> > >
> > > This may lead to an uninitialized value of "ret" in
> > > metricgroup__add_metric() when no CPU metric is added.
> > >
> > > Fix by properly setting the error code.
> > >
> > > It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
> > > if we have no CPU or sys event metric matching, then "has_match" should
> > > be 0 and "ret" is set to -EINVAL.
> > >
> > > However gcc cannot detect that it may not have been set after the
> > > map_for_each_metric() loop for CPU metrics, which is strange.
> > >
> > > Fixes: be335ec28efa8 ("perf metricgroup: Support adding metrics for system PMUs")
> > > Signed-off-by: John Garry <[email protected]>
> >
> > Acked-by: Ian Rogers <[email protected]>
>
> Does your Acked-by apply to both patches, or just 2/2? I reproduced the
> problem fixed by 1/2 on a Thinkpad T450S (Broadwell), and after applying
> the patch it no longer segfaults.

IIRC I needed to look at what is going on with the names in patch 1/2
and didn't have a repro. I don't mind acking it, given that you've
reproduced the problem and confirmed the fix. In general this logic
isn't working well (especially for --metric-no-group), so I plan to
take a stab at reorganizing it.

Thanks,
Ian
