2022-09-05 12:27:25

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

The offending commit removed mmap_per_thread(), which did not consider
the different set-output rules for per-thread mmaps i.e. in the per-thread
case set-output is used for file descriptors of the same thread not the
same cpu.

This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.

Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
events i.e. to continue to allow the mixing of per-thread and per-cpu
mmaps.

Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.

In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.

In the per-thread case (cpu == -1) file descriptors for the same thread are
set-output to the first file descriptor for that thread.

Example (process 17489 has 2 threads):

Before (but with new debug prints):

$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
failed to mmap with 22 (Invalid argument)

After:

$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 1: mmapping fd 6
<SNIP>
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]

Per-cpu example (process 20341 has 2 threads, same as above):

$ perf record --no-bpf-event -vvv -p 20341
<SNIP>
sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
<SNIP>
libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
libperf: idx 1: mmapping fd 7
libperf: idx 1: set output fd 8 -> 7
libperf: idx 2: mmapping fd 9
libperf: idx 2: set output fd 10 -> 9
libperf: idx 3: mmapping fd 11
libperf: idx 3: set output fd 12 -> 11
libperf: idx 4: mmapping fd 13
libperf: idx 4: set output fd 14 -> 13
libperf: idx 5: mmapping fd 15
libperf: idx 5: set output fd 16 -> 15
libperf: idx 6: mmapping fd 17
libperf: idx 6: set output fd 18 -> 17
libperf: idx 7: mmapping fd 19
libperf: idx 7: set output fd 20 -> 19
<SNIP>
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]

Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Signed-off-by: Adrian Hunter <[email protected]>
---


Changes in V2:

Expand commit message.
White-space fixups.


tools/lib/perf/evlist.c | 50 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index e6c98a6e3908..6b1bafe267a4 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
if (ops->idx)
ops->idx(evlist, evsel, mp, idx);

+ pr_debug("idx %d: mmapping fd %d\n", idx, *output);
if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
return -1;

@@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
if (!idx)
perf_evlist__set_mmap_first(evlist, map, overwrite);
} else {
+ pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
return -1;

@@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
return 0;
}

+static int
+mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
+ struct perf_mmap_param *mp)
+{
+ int nr_threads = perf_thread_map__nr(evlist->threads);
+ int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
+ int cpu, thread, idx = 0;
+ int nr_mmaps = 0;
+
+ pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
+ __func__, nr_cpus, nr_threads);
+
+ /* per-thread mmaps */
+ for (thread = 0; thread < nr_threads; thread++, idx++) {
+ int output = -1;
+ int output_overwrite = -1;
+
+ if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
+ &output_overwrite, &nr_mmaps))
+ goto out_unmap;
+ }
+
+ /* system-wide mmaps i.e. per-cpu */
+ for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
+ int output = -1;
+ int output_overwrite = -1;
+
+ if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
+ &output_overwrite, &nr_mmaps))
+ goto out_unmap;
+ }
+
+ if (nr_mmaps != evlist->nr_mmaps)
+ pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
+
+ return 0;
+
+out_unmap:
+ perf_evlist__munmap(evlist);
+ return -1;
+}
+
static int
mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
struct perf_mmap_param *mp)
@@ -528,6 +572,8 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
int nr_mmaps = 0;
int cpu, thread;

+ pr_debug("%s: nr cpu values %d nr threads %d\n", __func__, nr_cpus, nr_threads);
+
for (cpu = 0; cpu < nr_cpus; cpu++) {
int output = -1;
int output_overwrite = -1;
@@ -569,6 +615,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
struct perf_evlist_mmap_ops *ops,
struct perf_mmap_param *mp)
{
+ const struct perf_cpu_map *cpus = evlist->all_cpus;
struct perf_evsel *evsel;

if (!ops || !ops->get || !ops->mmap)
@@ -588,6 +635,9 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
return -ENOMEM;

+ if (perf_cpu_map__empty(cpus))
+ return mmap_per_thread(evlist, ops, mp);
+
return mmap_per_cpu(evlist, ops, mp);
}

--
2.25.1


2022-09-06 13:23:26

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:

SNIP

> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index e6c98a6e3908..6b1bafe267a4 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> if (ops->idx)
> ops->idx(evlist, evsel, mp, idx);
>
> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
> return -1;
>
> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> if (!idx)
> perf_evlist__set_mmap_first(evlist, map, overwrite);
> } else {
> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> return -1;
>
> @@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> return 0;
> }
>
> +static int
> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> + struct perf_mmap_param *mp)
> +{
> + int nr_threads = perf_thread_map__nr(evlist->threads);
> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
> + int cpu, thread, idx = 0;
> + int nr_mmaps = 0;
> +
> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
> + __func__, nr_cpus, nr_threads);

-1 as cpu value is only for 'empty' perf_cpu_map, right?

> +
> + /* per-thread mmaps */
> + for (thread = 0; thread < nr_threads; thread++, idx++) {
> + int output = -1;
> + int output_overwrite = -1;
> +
> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
> + &output_overwrite, &nr_mmaps))
> + goto out_unmap;
> + }
> +
> + /* system-wide mmaps i.e. per-cpu */
> + for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
> + int output = -1;
> + int output_overwrite = -1;
> +
> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
> + &output_overwrite, &nr_mmaps))
> + goto out_unmap;
> + }

will this loop be executed? we are in here because all_cpus is empty, right?

thanks,
jirka

> +
> + if (nr_mmaps != evlist->nr_mmaps)
> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
> +
> + return 0;
> +
> +out_unmap:
> + perf_evlist__munmap(evlist);
> + return -1;
> +}

SNIP

2022-09-06 13:30:30

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

Em Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter escreveu:
> The offending commit removed mmap_per_thread(), which did not consider
> the different set-output rules for per-thread mmaps i.e. in the per-thread
> case set-output is used for file descriptors of the same thread not the
> same cpu.
>
> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> Signed-off-by: Adrian Hunter <[email protected]>

I added these:

Reported-by: Tomáš Trnka <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441

Jiri made a comment, can you please reply?

Applied locally for further tests,

Thanks,

- Arnaldo

2022-09-06 15:21:01

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On 6/09/22 15:59, Jiri Olsa wrote:
> On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
>
> SNIP
>
>> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
>> index e6c98a6e3908..6b1bafe267a4 100644
>> --- a/tools/lib/perf/evlist.c
>> +++ b/tools/lib/perf/evlist.c
>> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>> if (ops->idx)
>> ops->idx(evlist, evsel, mp, idx);
>>
>> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
>> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
>> return -1;
>>
>> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>> if (!idx)
>> perf_evlist__set_mmap_first(evlist, map, overwrite);
>> } else {
>> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
>> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
>> return -1;
>>
>> @@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>> return 0;
>> }
>>
>> +static int
>> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>> + struct perf_mmap_param *mp)
>> +{
>> + int nr_threads = perf_thread_map__nr(evlist->threads);
>> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
>> + int cpu, thread, idx = 0;
>> + int nr_mmaps = 0;
>> +
>> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
>> + __func__, nr_cpus, nr_threads);
>
> -1 as cpu value is only for 'empty' perf_cpu_map, right?

The cpu map is a map of valid 3rd arguments to perf_event_open, so -1
means all CPUs which is per-thread by necessity.

>
>> +
>> + /* per-thread mmaps */
>> + for (thread = 0; thread < nr_threads; thread++, idx++) {
>> + int output = -1;
>> + int output_overwrite = -1;
>> +
>> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
>> + &output_overwrite, &nr_mmaps))
>> + goto out_unmap;
>> + }
>> +
>> + /* system-wide mmaps i.e. per-cpu */
>> + for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
>> + int output = -1;
>> + int output_overwrite = -1;
>> +
>> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
>> + &output_overwrite, &nr_mmaps))
>> + goto out_unmap;
>> + }
>
> will this loop be executed? we are in here because all_cpus is empty, right?

Yes it is executed. I put back the code that was there before ae4f8ae16a07
("libperf evlist: Allow mixing per-thread and per-cpu mmaps"), which uses
perf_cpu_map__empty() which only checks the first entry is -1:

bool perf_cpu_map__empty(const struct perf_cpu_map *map)
{
return map ? map->map[0].cpu == -1 : true;
}

But there can be more CPUs in the map, so perf_cpu_map__empty()
returns true for the per-thread case, as desired, even if there
are also system-wide CPUs.

I guess perf_cpu_map__empty() needs renaming.

>
> thanks,
> jirka
>
>> +
>> + if (nr_mmaps != evlist->nr_mmaps)
>> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
>> +
>> + return 0;
>> +
>> +out_unmap:
>> + perf_evlist__munmap(evlist);
>> + return -1;
>> +}
>
> SNIP

2022-09-06 18:11:09

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Tue, Sep 6, 2022 at 7:04 AM Adrian Hunter <[email protected]> wrote:
>
> On 6/09/22 15:59, Jiri Olsa wrote:
> > On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
> >
> > SNIP
> >
> >> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> >> index e6c98a6e3908..6b1bafe267a4 100644
> >> --- a/tools/lib/perf/evlist.c
> >> +++ b/tools/lib/perf/evlist.c
> >> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> if (ops->idx)
> >> ops->idx(evlist, evsel, mp, idx);
> >>
> >> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
> >> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
> >> return -1;
> >>
> >> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> if (!idx)
> >> perf_evlist__set_mmap_first(evlist, map, overwrite);
> >> } else {
> >> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
> >> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> >> return -1;
> >>
> >> @@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> return 0;
> >> }
> >>
> >> +static int
> >> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> + struct perf_mmap_param *mp)
> >> +{
> >> + int nr_threads = perf_thread_map__nr(evlist->threads);
> >> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
> >> + int cpu, thread, idx = 0;
> >> + int nr_mmaps = 0;
> >> +
> >> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
> >> + __func__, nr_cpus, nr_threads);
> >
> > -1 as cpu value is only for 'empty' perf_cpu_map, right?
>
> The cpu map is a map of valid 3rd arguments to perf_event_open, so -1
> means all CPUs which is per-thread by necessity.
>
> >
> >> +
> >> + /* per-thread mmaps */
> >> + for (thread = 0; thread < nr_threads; thread++, idx++) {
> >> + int output = -1;
> >> + int output_overwrite = -1;
> >> +
> >> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
> >> + &output_overwrite, &nr_mmaps))
> >> + goto out_unmap;
> >> + }
> >> +
> >> + /* system-wide mmaps i.e. per-cpu */
> >> + for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
> >> + int output = -1;
> >> + int output_overwrite = -1;
> >> +
> >> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
> >> + &output_overwrite, &nr_mmaps))
> >> + goto out_unmap;
> >> + }
> >
> > will this loop be executed? we are in here because all_cpus is empty, right?
>
> Yes it is executed. I put back the code that was there before ae4f8ae16a07
> ("libperf evlist: Allow mixing per-thread and per-cpu mmaps"), which uses
> perf_cpu_map__empty() which only checks the first entry is -1:
>
> bool perf_cpu_map__empty(const struct perf_cpu_map *map)
> {
> return map ? map->map[0].cpu == -1 : true;
> }
>
> But there can be more CPUs in the map, so perf_cpu_map__empty()
> returns true for the per-thread case, as desired, even if there
> are also system-wide CPUs.
>
> I guess perf_cpu_map__empty() needs renaming.

Let me nitpick :-) -1 means any CPU, as per the perf_event_open man
page, all CPUs would be a CPU map with every CPU on the system in it -
all CPUs can have some ambiguity in whether it includes offline CPUs,
which isn't true for any. I've not sent out a patch to clean up the
libperf CPU map code as I'd like to propose we start a libperf2 (with
LPC coming up it'd be a good place to discuss this, there's also the
office hours on Thursday). What I hope for libperf2 is that we can
make its license the same as libbpf, so the code can be more widely
included in projects. As such it wouldn't be valid to copy paste
libperf's CPU map into libperf2, but a larger clean up and refactor
could be put into libperf2 with libperf and perf depending upon it.
There are other areas that'd benefit from cleanup such as how a dash
is handled by parse events. It makes sense (imo) for this all to
become libperf2 work, and once we're happy with libperf2 we can
replace libperf completely.

Thanks,
Ian

> >
> > thanks,
> > jirka
> >
> >> +
> >> + if (nr_mmaps != evlist->nr_mmaps)
> >> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
> >> +
> >> + return 0;
> >> +
> >> +out_unmap:
> >> + perf_evlist__munmap(evlist);
> >> + return -1;
> >> +}
> >
> > SNIP
>

2022-09-06 18:15:47

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Mon, Sep 5, 2022 at 4:42 AM Adrian Hunter <[email protected]> wrote:
>
> The offending commit removed mmap_per_thread(), which did not consider
> the different set-output rules for per-thread mmaps i.e. in the per-thread
> case set-output is used for file descriptors of the same thread not the
> same cpu.
>
> This was not immediately noticed because it only happens with
> multi-threaded targets and we do not have a test for that yet.

Yeah, this is unfortunate. I feel like I need to spend some time on it.

>
> Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
> events i.e. to continue to allow the mixing of per-thread and per-cpu
> mmaps.
>
> Debug messages (with -vv) show the file descriptors that are opened with
> sys_perf_event_open. New debug messages are added (needs -vvv) that show
> also which file descriptors are mmapped and which are redirected with
> set-output.
>
> In the per-cpu case (cpu != -1) file descriptors for the same CPU are
> set-output to the first file descriptor for that CPU.
>
> In the per-thread case (cpu == -1) file descriptors for the same thread are
> set-output to the first file descriptor for that thread.
>
> Example (process 17489 has 2 threads):
>
> Before (but with new debug prints):
>
> $ perf record --no-bpf-event -vvv --per-thread -p 17489
> <SNIP>
> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
> <SNIP>
> libperf: idx 0: mmapping fd 5
> libperf: idx 0: set output fd 6 -> 5
> failed to mmap with 22 (Invalid argument)
>
> After:
>
> $ perf record --no-bpf-event -vvv --per-thread -p 17489
> <SNIP>
> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
> <SNIP>
> libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
> libperf: idx 0: mmapping fd 5
> libperf: idx 1: mmapping fd 6
> <SNIP>
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]

It'd be nice if the example had 2 events so that it could check the
set-output rule actually worked.

Thanks,
Namhyung

>
> Per-cpu example (process 20341 has 2 threads, same as above):
>
> $ perf record --no-bpf-event -vvv -p 20341
> <SNIP>
> sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
> sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
> sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
> sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
> sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
> sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
> sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
> sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
> sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
> sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
> sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
> sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
> sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
> sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
> sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
> <SNIP>
> libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
> libperf: idx 0: mmapping fd 5
> libperf: idx 0: set output fd 6 -> 5
> libperf: idx 1: mmapping fd 7
> libperf: idx 1: set output fd 8 -> 7
> libperf: idx 2: mmapping fd 9
> libperf: idx 2: set output fd 10 -> 9
> libperf: idx 3: mmapping fd 11
> libperf: idx 3: set output fd 12 -> 11
> libperf: idx 4: mmapping fd 13
> libperf: idx 4: set output fd 14 -> 13
> libperf: idx 5: mmapping fd 15
> libperf: idx 5: set output fd 16 -> 15
> libperf: idx 6: mmapping fd 17
> libperf: idx 6: set output fd 18 -> 17
> libperf: idx 7: mmapping fd 19
> libperf: idx 7: set output fd 20 -> 19
> <SNIP>
> [ perf record: Woken up 7 times to write data ]
> [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
>
> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> Signed-off-by: Adrian Hunter <[email protected]>

2022-09-06 19:50:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Tue, Sep 06, 2022 at 05:04:45PM +0300, Adrian Hunter wrote:
> On 6/09/22 15:59, Jiri Olsa wrote:
> > On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
> >
> > SNIP
> >
> >> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> >> index e6c98a6e3908..6b1bafe267a4 100644
> >> --- a/tools/lib/perf/evlist.c
> >> +++ b/tools/lib/perf/evlist.c
> >> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> if (ops->idx)
> >> ops->idx(evlist, evsel, mp, idx);
> >>
> >> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
> >> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
> >> return -1;
> >>
> >> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> if (!idx)
> >> perf_evlist__set_mmap_first(evlist, map, overwrite);
> >> } else {
> >> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
> >> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> >> return -1;
> >>
> >> @@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> return 0;
> >> }
> >>
> >> +static int
> >> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >> + struct perf_mmap_param *mp)
> >> +{
> >> + int nr_threads = perf_thread_map__nr(evlist->threads);
> >> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
> >> + int cpu, thread, idx = 0;
> >> + int nr_mmaps = 0;
> >> +
> >> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
> >> + __func__, nr_cpus, nr_threads);
> >
> > -1 as cpu value is only for 'empty' perf_cpu_map, right?
>
> The cpu map is a map of valid 3rd arguments to perf_event_open, so -1
> means all CPUs which is per-thread by necessity.
>
> >
> >> +
> >> + /* per-thread mmaps */
> >> + for (thread = 0; thread < nr_threads; thread++, idx++) {
> >> + int output = -1;
> >> + int output_overwrite = -1;
> >> +
> >> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
> >> + &output_overwrite, &nr_mmaps))
> >> + goto out_unmap;
> >> + }
> >> +
> >> + /* system-wide mmaps i.e. per-cpu */
> >> + for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
> >> + int output = -1;
> >> + int output_overwrite = -1;
> >> +
> >> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
> >> + &output_overwrite, &nr_mmaps))
> >> + goto out_unmap;
> >> + }
> >
> > will this loop be executed? we are in here because all_cpus is empty, right?
>
> Yes it is executed. I put back the code that was there before ae4f8ae16a07
> ("libperf evlist: Allow mixing per-thread and per-cpu mmaps"), which uses

hm, but commit ae4f8ae16a07 does not have similar cpu loop

> perf_cpu_map__empty() which only checks the first entry is -1:
>
> bool perf_cpu_map__empty(const struct perf_cpu_map *map)
> {
> return map ? map->map[0].cpu == -1 : true;
> }
>
> But there can be more CPUs in the map, so perf_cpu_map__empty()
> returns true for the per-thread case, as desired, even if there
> are also system-wide CPUs.

I don't see how, if I'd see -1 together with other cpu values in
perf_cpu_map I'd think it's a bug, but I might be missing some
auxtrace usage,

I thought we use -1 just for empty cpu map, so in per-thread case
-1 is properly passed to perf_event_open syscall

jirka

>
> I guess perf_cpu_map__empty() needs renaming.
>
> >
> > thanks,
> > jirka
> >
> >> +
> >> + if (nr_mmaps != evlist->nr_mmaps)
> >> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
> >> +
> >> + return 0;
> >> +
> >> +out_unmap:
> >> + perf_evlist__munmap(evlist);
> >> + return -1;
> >> +}
> >
> > SNIP
>

2022-09-06 21:32:44

by Vince Weaver

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Tue, 6 Sep 2022, Ian Rogers wrote:

> > >> perf_evlist__set_mmap_first(evlist, map, overwrite);
> > >> } else {
> libperf CPU map code as I'd like to propose we start a libperf2 (with
> LPC coming up it'd be a good place to discuss this, there's also the
> office hours on Thursday). What I hope for libperf2 is that we can
> make its license the same as libbpf, so the code can be more widely
> included in projects.

If you did plan to do this, it might be nice to include some
representatives from groups that would be likely to use such a library.
This might include the PAPI library developers, and there are various
other tools (especially in the HPC area) who are coding to
perf_event_open() directly but probably would like to use a library
instead if possible.

Vince Weaver
[email protected]

2022-09-07 05:01:23

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On 6/09/22 20:50, Namhyung Kim wrote:
> On Mon, Sep 5, 2022 at 4:42 AM Adrian Hunter <[email protected]> wrote:
>>
>> The offending commit removed mmap_per_thread(), which did not consider
>> the different set-output rules for per-thread mmaps i.e. in the per-thread
>> case set-output is used for file descriptors of the same thread not the
>> same cpu.
>>
>> This was not immediately noticed because it only happens with
>> multi-threaded targets and we do not have a test for that yet.
>
> Yeah, this is unfortunate. I feel like I need to spend some time on it.
>
>>
>> Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
>> events i.e. to continue to allow the mixing of per-thread and per-cpu
>> mmaps.
>>
>> Debug messages (with -vv) show the file descriptors that are opened with
>> sys_perf_event_open. New debug messages are added (needs -vvv) that show
>> also which file descriptors are mmapped and which are redirected with
>> set-output.
>>
>> In the per-cpu case (cpu != -1) file descriptors for the same CPU are
>> set-output to the first file descriptor for that CPU.
>>
>> In the per-thread case (cpu == -1) file descriptors for the same thread are
>> set-output to the first file descriptor for that thread.
>>
>> Example (process 17489 has 2 threads):
>>
>> Before (but with new debug prints):
>>
>> $ perf record --no-bpf-event -vvv --per-thread -p 17489
>> <SNIP>
>> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
>> <SNIP>
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 0: set output fd 6 -> 5
>> failed to mmap with 22 (Invalid argument)
>>
>> After:
>>
>> $ perf record --no-bpf-event -vvv --per-thread -p 17489
>> <SNIP>
>> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
>> <SNIP>
>> libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 1: mmapping fd 6
>> <SNIP>
>> [ perf record: Woken up 2 times to write data ]
>> [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
>
> It'd be nice if the example had 2 events so that it could check the
> set-output rule actually worked.

Here is an Intel PT example

$ perf record -vvv --per-thread -p 1521 -e intel_pt//
Using CPUID GenuineIntel-6-8C-1
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
DEBUGINFOD_URLS=
nr_cblocks: 0
affinity: SYS
mmap flush: 1
comp level: 0
------------------------------------------------------------
perf_event_attr:
type 9
size 128
config 0x300e601
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
sample_id_all 1
exclude_guest 1
aux_watermark 1048576
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 8
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|IDENTIFIER
read_format ID
exclude_kernel 1
exclude_hv 1
sample_id_all 1
exclude_guest 1
ksymbol 1
text_poke 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 16
mmap size 528384B
AUX area mmap length 4194304
libperf: mmap_per_thread: nr cpu values (may include -1) 9 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 7 -> 5
libperf: idx 1: mmapping fd 6
libperf: idx 1: set output fd 8 -> 6
libperf: idx 2: mmapping fd 9
libperf: idx 3: mmapping fd 10
libperf: idx 4: mmapping fd 11
libperf: idx 5: mmapping fd 12
libperf: idx 6: mmapping fd 13
libperf: idx 7: mmapping fd 14
libperf: idx 8: mmapping fd 15
libperf: idx 9: mmapping fd 16
Control descriptor is not initialized
thread_data[0x5572a73ec2f0]: nr_mmaps=10, maps=0x5572a73ec380, ow_maps=(nil)
thread_data[0x5572a73ec2f0]: cpu-1: maps[0] -> mmap[0]
thread_data[0x5572a73ec2f0]: cpu0: maps[1] -> mmap[1]
thread_data[0x5572a73ec2f0]: cpu1: maps[2] -> mmap[2]
thread_data[0x5572a73ec2f0]: cpu2: maps[3] -> mmap[3]
thread_data[0x5572a73ec2f0]: cpu3: maps[4] -> mmap[4]
thread_data[0x5572a73ec2f0]: cpu4: maps[5] -> mmap[5]
thread_data[0x5572a73ec2f0]: cpu5: maps[6] -> mmap[6]
thread_data[0x5572a73ec2f0]: cpu6: maps[7] -> mmap[7]
thread_data[0x5572a73ec2f0]: cpu7: maps[8] -> mmap[8]
thread_data[0x5572a73ec2f0]: cpu-1: maps[9] -> mmap[9]
thread_data[0x5572a73ec2f0]: pollfd[0] <- event_fd=5
thread_data[0x5572a73ec2f0]: pollfd[1] <- event_fd=7
thread_data[0x5572a73ec2f0]: pollfd[2] <- event_fd=6
thread_data[0x5572a73ec2f0]: pollfd[3] <- event_fd=8
thread_data[0x5572a73ec2f0]: pollfd[4] <- non_perf_event fd=4
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
watermark 1
sample_id_all 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 18
mmap size 528384B
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 17
libperf: idx 1: mmapping fd 18
Synthesizing TSC conversion information
Synthesizing id index
Synthesizing auxtrace information
auxtrace idx 1 old 0 head 0x2c0 diff 0x2c0
auxtrace idx 0 old 0 head 0x2060 diff 0x2060
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.029 MB perf.data ]




>
> Thanks,
> Namhyung
>
>>
>> Per-cpu example (process 20341 has 2 threads, same as above):
>>
>> $ perf record --no-bpf-event -vvv -p 20341
>> <SNIP>
>> sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
>> sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
>> sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
>> sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
>> sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
>> sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
>> sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
>> sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
>> sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
>> sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
>> sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
>> sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
>> sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
>> sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
>> sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
>> <SNIP>
>> libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 0: set output fd 6 -> 5
>> libperf: idx 1: mmapping fd 7
>> libperf: idx 1: set output fd 8 -> 7
>> libperf: idx 2: mmapping fd 9
>> libperf: idx 2: set output fd 10 -> 9
>> libperf: idx 3: mmapping fd 11
>> libperf: idx 3: set output fd 12 -> 11
>> libperf: idx 4: mmapping fd 13
>> libperf: idx 4: set output fd 14 -> 13
>> libperf: idx 5: mmapping fd 15
>> libperf: idx 5: set output fd 16 -> 15
>> libperf: idx 6: mmapping fd 17
>> libperf: idx 6: set output fd 18 -> 17
>> libperf: idx 7: mmapping fd 19
>> libperf: idx 7: set output fd 20 -> 19
>> <SNIP>
>> [ perf record: Woken up 7 times to write data ]
>> [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
>>
>> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
>> Signed-off-by: Adrian Hunter <[email protected]>

2022-09-07 05:53:16

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On 6/09/22 22:45, Jiri Olsa wrote:
> On Tue, Sep 06, 2022 at 05:04:45PM +0300, Adrian Hunter wrote:
>> On 6/09/22 15:59, Jiri Olsa wrote:
>>> On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
>>>
>>> SNIP
>>>
>>>> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
>>>> index e6c98a6e3908..6b1bafe267a4 100644
>>>> --- a/tools/lib/perf/evlist.c
>>>> +++ b/tools/lib/perf/evlist.c
>>>> @@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>>>> if (ops->idx)
>>>> ops->idx(evlist, evsel, mp, idx);
>>>>
>>>> + pr_debug("idx %d: mmapping fd %d\n", idx, *output);
>>>> if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
>>>> return -1;
>>>>
>>>> @@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>>>> if (!idx)
>>>> perf_evlist__set_mmap_first(evlist, map, overwrite);
>>>> } else {
>>>> + pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
>>>> if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
>>>> return -1;
>>>>
>>>> @@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>>>> return 0;
>>>> }
>>>>
>>>> +static int
>>>> +mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>>>> + struct perf_mmap_param *mp)
>>>> +{
>>>> + int nr_threads = perf_thread_map__nr(evlist->threads);
>>>> + int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
>>>> + int cpu, thread, idx = 0;
>>>> + int nr_mmaps = 0;
>>>> +
>>>> + pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
>>>> + __func__, nr_cpus, nr_threads);
>>>
>>> -1 as cpu value is only for 'empty' perf_cpu_map, right?
>>
>> The cpu map is a map of valid 3rd arguments to perf_event_open, so -1
>> means all CPUs which is per-thread by necessity.
>>
>>>
>>>> +
>>>> + /* per-thread mmaps */
>>>> + for (thread = 0; thread < nr_threads; thread++, idx++) {
>>>> + int output = -1;
>>>> + int output_overwrite = -1;
>>>> +
>>>> + if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
>>>> + &output_overwrite, &nr_mmaps))
>>>> + goto out_unmap;
>>>> + }
>>>> +
>>>> + /* system-wide mmaps i.e. per-cpu */
>>>> + for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
>>>> + int output = -1;
>>>> + int output_overwrite = -1;
>>>> +
>>>> + if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
>>>> + &output_overwrite, &nr_mmaps))
>>>> + goto out_unmap;
>>>> + }
>>>
>>> will this loop be executed? we are in here because all_cpus is empty, right?
>>
>> Yes it is executed. I put back the code that was there before ae4f8ae16a07
>> ("libperf evlist: Allow mixing per-thread and per-cpu mmaps"), which uses
>
> hm, but commit ae4f8ae16a07 does not have similar cpu loop

It was calling mmap_per_cpu() for that case.

The 2 cases: mmap_per_cpu() and mmap_per_thread() could still be
combined into a single function.

>
>> perf_cpu_map__empty() which only checks the first entry is -1:
>>
>> bool perf_cpu_map__empty(const struct perf_cpu_map *map)
>> {
>> return map ? map->map[0].cpu == -1 : true;
>> }
>>
>> But there can be more CPUs in the map, so perf_cpu_map__empty()
>> returns true for the per-thread case, as desired, even if there
>> are also system-wide CPUs.
>
> I don't see how, if I'd see -1 together with other cpu values in
> perf_cpu_map I'd think it's a bug, but I might be missing some
> auxtrace usage,

Yes, it is for system-wide collection of events that can affect
every CPU. Currently text_poke is always system-wide - see the
Intel PT example.

>
> I thought we use -1 just for empty cpu map, so in per-thread case
> -1 is properly passed to perf_event_open syscall

Yes, but it does not need to be limited to that case.

>
> jirka
>
>>
>> I guess perf_cpu_map__empty() needs renaming.
>>
>>>
>>> thanks,
>>> jirka
>>>
>>>> +
>>>> + if (nr_mmaps != evlist->nr_mmaps)
>>>> + pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
>>>> +
>>>> + return 0;
>>>> +
>>>> +out_unmap:
>>>> + perf_evlist__munmap(evlist);
>>>> + return -1;
>>>> +}
>>>
>>> SNIP
>>

2022-09-07 14:43:13

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
> The offending commit removed mmap_per_thread(), which did not consider
> the different set-output rules for per-thread mmaps i.e. in the per-thread
> case set-output is used for file descriptors of the same thread not the
> same cpu.
>
> This was not immediately noticed because it only happens with
> multi-threaded targets and we do not have a test for that yet.
>
> Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
> events i.e. to continue to allow the mixing of per-thread and per-cpu
> mmaps.
>
> Debug messages (with -vv) show the file descriptors that are opened with
> sys_perf_event_open. New debug messages are added (needs -vvv) that show
> also which file descriptors are mmapped and which are redirected with
> set-output.
>
> In the per-cpu case (cpu != -1) file descriptors for the same CPU are
> set-output to the first file descriptor for that CPU.
>
> In the per-thread case (cpu == -1) file descriptors for the same thread are
> set-output to the first file descriptor for that thread.
>
> Example (process 17489 has 2 threads):
>
> Before (but with new debug prints):
>
> $ perf record --no-bpf-event -vvv --per-thread -p 17489
> <SNIP>
> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
> <SNIP>
> libperf: idx 0: mmapping fd 5
> libperf: idx 0: set output fd 6 -> 5
> failed to mmap with 22 (Invalid argument)
>
> After:
>
> $ perf record --no-bpf-event -vvv --per-thread -p 17489
> <SNIP>
> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
> <SNIP>
> libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
> libperf: idx 0: mmapping fd 5
> libperf: idx 1: mmapping fd 6
> <SNIP>
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
>
> Per-cpu example (process 20341 has 2 threads, same as above):
>
> $ perf record --no-bpf-event -vvv -p 20341
> <SNIP>
> sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
> sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
> sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
> sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
> sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
> sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
> sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
> sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
> sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
> sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
> sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
> sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
> sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
> sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
> sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
> sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
> <SNIP>
> libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
> libperf: idx 0: mmapping fd 5
> libperf: idx 0: set output fd 6 -> 5
> libperf: idx 1: mmapping fd 7
> libperf: idx 1: set output fd 8 -> 7
> libperf: idx 2: mmapping fd 9
> libperf: idx 2: set output fd 10 -> 9
> libperf: idx 3: mmapping fd 11
> libperf: idx 3: set output fd 12 -> 11
> libperf: idx 4: mmapping fd 13
> libperf: idx 4: set output fd 14 -> 13
> libperf: idx 5: mmapping fd 15
> libperf: idx 5: set output fd 16 -> 15
> libperf: idx 6: mmapping fd 17
> libperf: idx 6: set output fd 18 -> 17
> libperf: idx 7: mmapping fd 19
> libperf: idx 7: set output fd 20 -> 19
> <SNIP>
> [ perf record: Woken up 7 times to write data ]
> [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
>
> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> Signed-off-by: Adrian Hunter <[email protected]>

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

2022-09-08 15:21:35

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

Em Wed, Sep 07, 2022 at 04:20:59PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 05, 2022 at 02:42:09PM +0300, Adrian Hunter wrote:
> > Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
> > Signed-off-by: Adrian Hunter <[email protected]>
>
> Acked-by: Jiri Olsa <[email protected]>

Thanks, added,

- Arnaldo