2022-09-02 15:42:07

by Adrian Hunter

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

On 2/09/22 17:46, Tomáš Trnka wrote:
> Hello,
>
> A bug in perf v5.19 and newer completely breaks monitoring multithreaded
> processes using "perf top -p". The tool fails to start with "Failed to mmap
> with 22 (Invalid argument)". It still seems to work fine on single-threaded
> processes. "perf record" is also unaffected.

It has been reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=216441


2022-09-02 19:53:54

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

Em Fri, Sep 02, 2022 at 05:50:22PM +0300, Adrian Hunter escreveu:
> On 2/09/22 17:46, Tomáš Trnka wrote:
> > Hello,
> >
> > A bug in perf v5.19 and newer completely breaks monitoring multithreaded
> > processes using "perf top -p". The tool fails to start with "Failed to mmap
> > with 22 (Invalid argument)". It still seems to work fine on single-threaded
> > processes. "perf record" is also unaffected.
>
> It has been reported here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216441

If I do:

⬢[acme@toolbox perf-urgent]$ git log -2
commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri Sep 2 16:15:39 2022 -0300

Revert "libperf evlist: Check nr_mmaps is correct"

This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri Sep 2 16:13:41 2022 -0300

Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"

This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
⬢[acme@toolbox perf-urgent]$

It works again, Tomáš can you please try doing this to see if this works
for you?

- Arnaldo

2022-09-03 07:35:25

by Adrian Hunter

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

On 2/09/22 22:17, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 02, 2022 at 05:50:22PM +0300, Adrian Hunter escreveu:
>> On 2/09/22 17:46, Tomáš Trnka wrote:
>>> Hello,
>>>
>>> A bug in perf v5.19 and newer completely breaks monitoring multithreaded
>>> processes using "perf top -p". The tool fails to start with "Failed to mmap
>>> with 22 (Invalid argument)". It still seems to work fine on single-threaded
>>> processes. "perf record" is also unaffected.
>>
>> It has been reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=216441
>
> If I do:
>
> ⬢[acme@toolbox perf-urgent]$ git log -2
> commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
> Author: Arnaldo Carvalho de Melo <[email protected]>
> Date: Fri Sep 2 16:15:39 2022 -0300
>
> Revert "libperf evlist: Check nr_mmaps is correct"
>
> This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.
>
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>
> commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
> Author: Arnaldo Carvalho de Melo <[email protected]>
> Date: Fri Sep 2 16:13:41 2022 -0300
>
> Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"
>
> This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.
>
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ⬢[acme@toolbox perf-urgent]$
>
> It works again, Tomáš can you please try doing this to see if this works
> for you?
>

This is the fix I have so far. I would like to test it some more though.

From: Adrian Hunter <[email protected]>
Date: Sat, 3 Sep 2022 10:05:08 +0300
Subject: [PATCH] libperf evlist: Fix per-thread mmaps for multi-threaded
targets

Offending commit did not consider the different set-output rules for
per-thread mmaps i.e. in the per-thread case set-output is used for
mmaps of the same thread not the same cpu.

This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.

Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Signed-off-by: Adrian Hunter <[email protected]>
---
tools/lib/perf/evlist.c | 49 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index e6c98a6e3908e..24280c887520c 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
if (ops->idx)
ops->idx(evlist, evsel, mp, idx);

+ pr_debug("idx %d: mmapping fd %d\n", idx, *output);
if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
return -1;

@@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
if (!idx)
perf_evlist__set_mmap_first(evlist, map, overwrite);
} else {
+ pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
return -1;

@@ -519,6 +521,47 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
return 0;
}

+static int
+mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
+ struct perf_mmap_param *mp)
+{
+ int nr_threads = perf_thread_map__nr(evlist->threads);
+ int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
+ int cpu, thread, idx = 0;
+ int nr_mmaps = 0;
+
+ pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n", __func__, nr_cpus, nr_threads);
+
+ /* per-thread mmaps */
+ for (thread = 0; thread < nr_threads; thread++, idx++) {
+ int output = -1;
+ int output_overwrite = -1;
+
+ if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
+ &output_overwrite, &nr_mmaps))
+ goto out_unmap;
+ }
+
+ /* system-wide mmaps i.e. per-cpu */
+ for (cpu = 1; cpu < nr_cpus; cpu++, idx ++) {
+ int output = -1;
+ int output_overwrite = -1;
+
+ if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
+ &output_overwrite, &nr_mmaps))
+ goto out_unmap;
+ }
+
+ if (nr_mmaps != evlist->nr_mmaps)
+ pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
+
+ return 0;
+
+out_unmap:
+ perf_evlist__munmap(evlist);
+ return -1;
+}
+
static int
mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
struct perf_mmap_param *mp)
@@ -528,6 +571,8 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
int nr_mmaps = 0;
int cpu, thread;

+ pr_debug("%s: nr cpu values %d nr threads %d\n", __func__, nr_cpus, nr_threads);
+
for (cpu = 0; cpu < nr_cpus; cpu++) {
int output = -1;
int output_overwrite = -1;
@@ -569,6 +614,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
struct perf_evlist_mmap_ops *ops,
struct perf_mmap_param *mp)
{
+ const struct perf_cpu_map *cpus = evlist->all_cpus;
struct perf_evsel *evsel;

if (!ops || !ops->get || !ops->mmap)
@@ -588,6 +634,9 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
return -ENOMEM;

+ if (perf_cpu_map__empty(cpus))
+ return mmap_per_thread(evlist, ops, mp);
+
return mmap_per_cpu(evlist, ops, mp);
}

--
2.34.1



2022-09-03 15:13:57

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

Em Sat, Sep 03, 2022 at 10:14:25AM +0300, Adrian Hunter escreveu:
> On 2/09/22 22:17, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 02, 2022 at 05:50:22PM +0300, Adrian Hunter escreveu:
> >> On 2/09/22 17:46, Tomáš Trnka wrote:
> >>> Hello,
> >>>
> >>> A bug in perf v5.19 and newer completely breaks monitoring multithreaded
> >>> processes using "perf top -p". The tool fails to start with "Failed to mmap
> >>> with 22 (Invalid argument)". It still seems to work fine on single-threaded
> >>> processes. "perf record" is also unaffected.
> >>
> >> It has been reported here:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=216441
> >
> > If I do:
> >
> > ⬢[acme@toolbox perf-urgent]$ git log -2
> > commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
> > Author: Arnaldo Carvalho de Melo <[email protected]>
> > Date: Fri Sep 2 16:15:39 2022 -0300
> >
> > Revert "libperf evlist: Check nr_mmaps is correct"
> >
> > This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> >
> > commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
> > Author: Arnaldo Carvalho de Melo <[email protected]>
> > Date: Fri Sep 2 16:13:41 2022 -0300
> >
> > Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"
> >
> > This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > ⬢[acme@toolbox perf-urgent]$
> >
> > It works again, Tomáš can you please try doing this to see if this works
> > for you?
> >
>
> This is the fix I have so far. I would like to test it some more though.

Ok, so I'll leave it for the next pull req, possibly after Linux
Plumbers.

What do you think about reverting those two patches for v6.0 and then
add this for v6.1?

- Arnaldo

> From: Adrian Hunter <[email protected]>
> Date: Sat, 3 Sep 2022 10:05:08 +0300
> Subject: [PATCH] libperf evlist: Fix per-thread mmaps for multi-threaded
> targets
>
> Offending commit did not consider the different set-output rules for
> per-thread mmaps i.e. in the per-thread case set-output is used for
> mmaps of the same thread not the same cpu.
>
> This was not immediately noticed because it only happens with
> multi-threaded targets and we do not have a test for that yet.
>
> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> tools/lib/perf/evlist.c | 49 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index e6c98a6e3908e..24280c887520c 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> if (ops->idx)
> ops->idx(evlist, evsel, mp, idx);
>
> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
> return -1;
>
> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> if (!idx)
> perf_evlist__set_mmap_first(evlist, map, overwrite);
> } else {
> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> return -1;
>
> @@ -519,6 +521,47 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> return 0;
> }
>
> +static int
> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> + struct perf_mmap_param *mp)
> +{
> + int nr_threads = perf_thread_map__nr(evlist->threads);
> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
> + int cpu, thread, idx = 0;
> + int nr_mmaps = 0;
> +
> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n", __func__, nr_cpus, nr_threads);
> +
> + /* per-thread mmaps */
> + for (thread = 0; thread < nr_threads; thread++, idx++) {
> + int output = -1;
> + int output_overwrite = -1;
> +
> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
> + &output_overwrite, &nr_mmaps))
> + goto out_unmap;
> + }
> +
> + /* system-wide mmaps i.e. per-cpu */
> + for (cpu = 1; cpu < nr_cpus; cpu++, idx ++) {
> + int output = -1;
> + int output_overwrite = -1;
> +
> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
> + &output_overwrite, &nr_mmaps))
> + goto out_unmap;
> + }
> +
> + if (nr_mmaps != evlist->nr_mmaps)
> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
> +
> + return 0;
> +
> +out_unmap:
> + perf_evlist__munmap(evlist);
> + return -1;
> +}
> +
> static int
> mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> struct perf_mmap_param *mp)
> @@ -528,6 +571,8 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> int nr_mmaps = 0;
> int cpu, thread;
>
> + pr_debug("%s: nr cpu values %d nr threads %d\n", __func__, nr_cpus, nr_threads);
> +
> for (cpu = 0; cpu < nr_cpus; cpu++) {
> int output = -1;
> int output_overwrite = -1;
> @@ -569,6 +614,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> struct perf_evlist_mmap_ops *ops,
> struct perf_mmap_param *mp)
> {
> + const struct perf_cpu_map *cpus = evlist->all_cpus;
> struct perf_evsel *evsel;
>
> if (!ops || !ops->get || !ops->mmap)
> @@ -588,6 +634,9 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
> return -ENOMEM;
>
> + if (perf_cpu_map__empty(cpus))
> + return mmap_per_thread(evlist, ops, mp);
> +
> return mmap_per_cpu(evlist, ops, mp);
> }
>
> --
> 2.34.1
>
>

--

- Arnaldo

2022-09-03 17:38:31

by Adrian Hunter

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

On 3/09/22 17:08, Arnaldo Carvalho de Melo wrote:
> Em Sat, Sep 03, 2022 at 10:14:25AM +0300, Adrian Hunter escreveu:
>> On 2/09/22 22:17, Arnaldo Carvalho de Melo wrote:
>>> Em Fri, Sep 02, 2022 at 05:50:22PM +0300, Adrian Hunter escreveu:
>>>> On 2/09/22 17:46, Tomáš Trnka wrote:
>>>>> Hello,
>>>>>
>>>>> A bug in perf v5.19 and newer completely breaks monitoring multithreaded
>>>>> processes using "perf top -p". The tool fails to start with "Failed to mmap
>>>>> with 22 (Invalid argument)". It still seems to work fine on single-threaded
>>>>> processes. "perf record" is also unaffected.
>>>>
>>>> It has been reported here:
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216441
>>>
>>> If I do:
>>>
>>> ⬢[acme@toolbox perf-urgent]$ git log -2
>>> commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
>>> Author: Arnaldo Carvalho de Melo <[email protected]>
>>> Date: Fri Sep 2 16:15:39 2022 -0300
>>>
>>> Revert "libperf evlist: Check nr_mmaps is correct"
>>>
>>> This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.
>>>
>>> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>>>
>>> commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
>>> Author: Arnaldo Carvalho de Melo <[email protected]>
>>> Date: Fri Sep 2 16:13:41 2022 -0300
>>>
>>> Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"
>>>
>>> This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.
>>>
>>> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>>> ⬢[acme@toolbox perf-urgent]$
>>>
>>> It works again, Tomáš can you please try doing this to see if this works
>>> for you?
>>>
>>
>> This is the fix I have so far. I would like to test it some more though.
>
> Ok, so I'll leave it for the next pull req, possibly after Linux
> Plumbers.
>
> What do you think about reverting those two patches for v6.0 and then
> add this for v6.1?

That would break sideband collection with selected CPUs, so I would
prefer to avoid that.

The fix is a effectively a partial revert of "libperf evlist: Allow
mixing per-thread and per-cpu mmaps" anyway. I just need a few days
to do more testing.

2022-09-05 11:28:15

by Jiri Olsa

[permalink] [raw]
Subject: Re: perf top -p broken for multithreaded processes since 5.19

On Sat, Sep 03, 2022 at 10:14:25AM +0300, Adrian Hunter wrote:
> On 2/09/22 22:17, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 02, 2022 at 05:50:22PM +0300, Adrian Hunter escreveu:
> >> On 2/09/22 17:46, Tomáš Trnka wrote:
> >>> Hello,
> >>>
> >>> A bug in perf v5.19 and newer completely breaks monitoring multithreaded
> >>> processes using "perf top -p". The tool fails to start with "Failed to mmap
> >>> with 22 (Invalid argument)". It still seems to work fine on single-threaded
> >>> processes. "perf record" is also unaffected.
> >>
> >> It has been reported here:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=216441
> >
> > If I do:
> >
> > ⬢[acme@toolbox perf-urgent]$ git log -2
> > commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
> > Author: Arnaldo Carvalho de Melo <[email protected]>
> > Date: Fri Sep 2 16:15:39 2022 -0300
> >
> > Revert "libperf evlist: Check nr_mmaps is correct"
> >
> > This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> >
> > commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
> > Author: Arnaldo Carvalho de Melo <[email protected]>
> > Date: Fri Sep 2 16:13:41 2022 -0300
> >
> > Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"
> >
> > This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > ⬢[acme@toolbox perf-urgent]$
> >
> > It works again, Tomáš can you please try doing this to see if this works
> > for you?
> >
>
> This is the fix I have so far. I would like to test it some more though.
>
> From: Adrian Hunter <[email protected]>
> Date: Sat, 3 Sep 2022 10:05:08 +0300
> Subject: [PATCH] libperf evlist: Fix per-thread mmaps for multi-threaded
> targets
>
> Offending commit did not consider the different set-output rules for
> per-thread mmaps i.e. in the per-thread case set-output is used for
> mmaps of the same thread not the same cpu.
>
> This was not immediately noticed because it only happens with
> multi-threaded targets and we do not have a test for that yet.
>
> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> Signed-off-by: Adrian Hunter <[email protected]>

SNIP

> static int
> mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> struct perf_mmap_param *mp)
> @@ -528,6 +571,8 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> int nr_mmaps = 0;
> int cpu, thread;
>
> + pr_debug("%s: nr cpu values %d nr threads %d\n", __func__, nr_cpus, nr_threads);
> +
> for (cpu = 0; cpu < nr_cpus; cpu++) {
> int output = -1;
> int output_overwrite = -1;
> @@ -569,6 +614,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> struct perf_evlist_mmap_ops *ops,
> struct perf_mmap_param *mp)
> {
> + const struct perf_cpu_map *cpus = evlist->all_cpus;
> struct perf_evsel *evsel;
>
> if (!ops || !ops->get || !ops->mmap)
> @@ -588,6 +634,9 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
> return -ENOMEM;
>
> + if (perf_cpu_map__empty(cpus))
> + return mmap_per_thread(evlist, ops, mp);
> +

could we just enable per-cpu mapping in top? I'd think it should be
enabled anyway because we need ring buffers on all cpus for sampling

jirka


---
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index e89208b4ad4b..d6be82315dd8 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1422,6 +1422,7 @@ int cmd_top(int argc, const char **argv)
.freq = 4000, /* 4 KHz */
.target = {
.uses_mmap = true,
+ .default_per_cpu = true,
},
/*
* FIXME: This will lose PERF_RECORD_MMAP and other metadata