2023-12-12 19:39:29

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

From: Kan Liang <[email protected]>

perf top errors out on a hybrid machine
$perf top

Error:
The cycles:P event is not supported.

The perf top expects that the "cycles" is collected on all CPUs in the
system. But for hybrid there is no single "cycles" event which can cover
all CPUs. Perf has to split it into two cycles events, e.g.,
cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
If a event is opened on the unsupported CPU. The open fails. That's the
reason of the above error out.

Perf should only open the cycles event on the corresponding CPU. The
commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
core PMU maps") intersect the requested CPU map with the CPU map of the
PMU. Use the evsel's cpus to replace user_requested_cpus.

The evlist's threads are also propagated to the evsel's threads in
__perf_evlist__propagate_maps(). For a system-wide event, perf appends
a dummy event and assign it to the evsel's threads. For a per-thread
event, the evlist's thread_map is assigned to the evsel's threads. The
same as the other tools, e.g., perf record, using the evsel's threads
when opening an event.

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---

Changes since V1:
- Update the description
- Add Reviewed-by from Ian

tools/perf/builtin-top.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ea8c7eca5eee..cce9350177e2 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)

evlist__for_each_entry(evlist, counter) {
try_again:
- if (evsel__open(counter, top->evlist->core.user_requested_cpus,
- top->evlist->core.threads) < 0) {
+ if (evsel__open(counter, counter->core.cpus,
+ counter->core.threads) < 0) {

/*
* Specially handle overwrite fall back.
--
2.35.1


2023-12-12 20:37:33

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> perf top errors out on a hybrid machine
> $perf top
>
> Error:
> The cycles:P event is not supported.
>
> The perf top expects that the "cycles" is collected on all CPUs in the
> system. But for hybrid there is no single "cycles" event which can cover
> all CPUs. Perf has to split it into two cycles events, e.g.,
> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
> If a event is opened on the unsupported CPU. The open fails. That's the
> reason of the above error out.
>
> Perf should only open the cycles event on the corresponding CPU. The
> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
> core PMU maps") intersect the requested CPU map with the CPU map of the
> PMU. Use the evsel's cpus to replace user_requested_cpus.
>
> The evlist's threads are also propagated to the evsel's threads in
> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
> a dummy event and assign it to the evsel's threads. For a per-thread
> event, the evlist's thread_map is assigned to the evsel's threads. The
> same as the other tools, e.g., perf record, using the evsel's threads
> when opening an event.
>
> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
> Reviewed-by: Ian Rogers <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
>
> Changes since V1:
> - Update the description
> - Add Reviewed-by from Ian

Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
select between the cycles event on cpu_atom and cpu_core? I'm
wondering if there is some kind of missing "hybrid-merge"
functionality like we have for perf stat.

Thanks,
Ian

> tools/perf/builtin-top.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index ea8c7eca5eee..cce9350177e2 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
>
> evlist__for_each_entry(evlist, counter) {
> try_again:
> - if (evsel__open(counter, top->evlist->core.user_requested_cpus,
> - top->evlist->core.threads) < 0) {
> + if (evsel__open(counter, counter->core.cpus,
> + counter->core.threads) < 0) {
>
> /*
> * Specially handle overwrite fall back.
> --
> 2.35.1
>

2023-12-12 21:26:37

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus



On 2023-12-12 3:37 p.m., Ian Rogers wrote:
> On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
>>
>> From: Kan Liang <[email protected]>
>>
>> perf top errors out on a hybrid machine
>> $perf top
>>
>> Error:
>> The cycles:P event is not supported.
>>
>> The perf top expects that the "cycles" is collected on all CPUs in the
>> system. But for hybrid there is no single "cycles" event which can cover
>> all CPUs. Perf has to split it into two cycles events, e.g.,
>> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
>> If a event is opened on the unsupported CPU. The open fails. That's the
>> reason of the above error out.
>>
>> Perf should only open the cycles event on the corresponding CPU. The
>> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
>> core PMU maps") intersect the requested CPU map with the CPU map of the
>> PMU. Use the evsel's cpus to replace user_requested_cpus.
>>
>> The evlist's threads are also propagated to the evsel's threads in
>> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
>> a dummy event and assign it to the evsel's threads. For a per-thread
>> event, the evlist's thread_map is assigned to the evsel's threads. The
>> same as the other tools, e.g., perf record, using the evsel's threads
>> when opening an event.
>>
>> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
>> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
>> Reviewed-by: Ian Rogers <[email protected]>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>>
>> Changes since V1:
>> - Update the description
>> - Add Reviewed-by from Ian
>
> Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
> select between the cycles event on cpu_atom and cpu_core?

Yes, but the event doesn't include the PMU information.
We probably need a follow up patch to append the PMU name.

Available samples
385 cycles:P

903 cycles:P

Thanks,
Kan

> I'm
> wondering if there is some kind of missing "hybrid-merge"
> functionality like we have for perf stat.
>
> Thanks,
> Ian
>
>> tools/perf/builtin-top.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
>> index ea8c7eca5eee..cce9350177e2 100644
>> --- a/tools/perf/builtin-top.c
>> +++ b/tools/perf/builtin-top.c
>> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
>>
>> evlist__for_each_entry(evlist, counter) {
>> try_again:
>> - if (evsel__open(counter, top->evlist->core.user_requested_cpus,
>> - top->evlist->core.threads) < 0) {
>> + if (evsel__open(counter, counter->core.cpus,
>> + counter->core.threads) < 0) {
>>
>> /*
>> * Specially handle overwrite fall back.
>> --
>> 2.35.1
>>

2023-12-12 22:13:11

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-12-12 3:37 p.m., Ian Rogers wrote:
> > On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
> >>
> >> From: Kan Liang <[email protected]>
> >>
> >> perf top errors out on a hybrid machine
> >> $perf top
> >>
> >> Error:
> >> The cycles:P event is not supported.
> >>
> >> The perf top expects that the "cycles" is collected on all CPUs in the
> >> system. But for hybrid there is no single "cycles" event which can cover
> >> all CPUs. Perf has to split it into two cycles events, e.g.,
> >> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
> >> If a event is opened on the unsupported CPU. The open fails. That's the
> >> reason of the above error out.
> >>
> >> Perf should only open the cycles event on the corresponding CPU. The
> >> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
> >> core PMU maps") intersect the requested CPU map with the CPU map of the
> >> PMU. Use the evsel's cpus to replace user_requested_cpus.
> >>
> >> The evlist's threads are also propagated to the evsel's threads in
> >> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
> >> a dummy event and assign it to the evsel's threads. For a per-thread
> >> event, the evlist's thread_map is assigned to the evsel's threads. The
> >> same as the other tools, e.g., perf record, using the evsel's threads
> >> when opening an event.
> >>
> >> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
> >> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
> >> Reviewed-by: Ian Rogers <[email protected]>
> >> Signed-off-by: Kan Liang <[email protected]>
> >> ---
> >>
> >> Changes since V1:
> >> - Update the description
> >> - Add Reviewed-by from Ian
> >
> > Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
> > select between the cycles event on cpu_atom and cpu_core?
>
> Yes, but the event doesn't include the PMU information.
> We probably need a follow up patch to append the PMU name.
>
> Available samples
> 385 cycles:P
>
> 903 cycles:P

Thanks and agreed, it isn't possible to tell which is which PMU/CPU
type at the moment. I tried the patch with perf top --stdio, there
wasn't a choice of event and I can't tell what counter is being
displayed. When I quit I also see:
```
exiting.
corrupted double-linked list
Aborted (core dumped)
```
but I wasn't able to repro this on a debuggable binary/system.

If my memory serves there was a patch where perf top was showing >1
event. It would be nice here to do some kind of hybrid merging rather
than having to view each PMU's top separately.

Thanks,
Ian


> Thanks,
> Kan
>
> > I'm
> > wondering if there is some kind of missing "hybrid-merge"
> > functionality like we have for perf stat.
> >
> > Thanks,
> > Ian
> >
> >> tools/perf/builtin-top.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> >> index ea8c7eca5eee..cce9350177e2 100644
> >> --- a/tools/perf/builtin-top.c
> >> +++ b/tools/perf/builtin-top.c
> >> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
> >>
> >> evlist__for_each_entry(evlist, counter) {
> >> try_again:
> >> - if (evsel__open(counter, top->evlist->core.user_requested_cpus,
> >> - top->evlist->core.threads) < 0) {
> >> + if (evsel__open(counter, counter->core.cpus,
> >> + counter->core.threads) < 0) {
> >>
> >> /*
> >> * Specially handle overwrite fall back.
> >> --
> >> 2.35.1
> >>

2023-12-13 01:07:07

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <[email protected]> wrote:
>
> On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <[email protected]> wrote:
> >
> >
> >
> > On 2023-12-12 3:37 p.m., Ian Rogers wrote:
> > > On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
> > >>
> > >> From: Kan Liang <[email protected]>
> > >>
> > >> perf top errors out on a hybrid machine
> > >> $perf top
> > >>
> > >> Error:
> > >> The cycles:P event is not supported.
> > >>
> > >> The perf top expects that the "cycles" is collected on all CPUs in the
> > >> system. But for hybrid there is no single "cycles" event which can cover
> > >> all CPUs. Perf has to split it into two cycles events, e.g.,
> > >> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
> > >> If a event is opened on the unsupported CPU. The open fails. That's the
> > >> reason of the above error out.
> > >>
> > >> Perf should only open the cycles event on the corresponding CPU. The
> > >> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
> > >> core PMU maps") intersect the requested CPU map with the CPU map of the
> > >> PMU. Use the evsel's cpus to replace user_requested_cpus.
> > >>
> > >> The evlist's threads are also propagated to the evsel's threads in
> > >> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
> > >> a dummy event and assign it to the evsel's threads. For a per-thread
> > >> event, the evlist's thread_map is assigned to the evsel's threads. The
> > >> same as the other tools, e.g., perf record, using the evsel's threads
> > >> when opening an event.
> > >>
> > >> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
> > >> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
> > >> Reviewed-by: Ian Rogers <[email protected]>
> > >> Signed-off-by: Kan Liang <[email protected]>
> > >> ---
> > >>
> > >> Changes since V1:
> > >> - Update the description
> > >> - Add Reviewed-by from Ian
> > >
> > > Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
> > > select between the cycles event on cpu_atom and cpu_core?
> >
> > Yes, but the event doesn't include the PMU information.
> > We probably need a follow up patch to append the PMU name.
> >
> > Available samples
> > 385 cycles:P
> >
> > 903 cycles:P
>
> Thanks and agreed, it isn't possible to tell which is which PMU/CPU
> type at the moment. I tried the patch with perf top --stdio, there
> wasn't a choice of event and I can't tell what counter is being
> displayed. When I quit I also see:
> ```
> exiting.
> corrupted double-linked list
> Aborted (core dumped)
> ```
> but I wasn't able to repro this on a debuggable binary/system.
>
> If my memory serves there was a patch where perf top was showing >1
> event. It would be nice here to do some kind of hybrid merging rather
> than having to view each PMU's top separately.

Using event groups, but I noticed you removed the --group option.
Maybe perf top can just use `{ ... }` notation for explicit grouping,
but it might be implicit like in the hybrid case.

Thanks,
Namhyung

2023-12-13 15:46:57

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus



On 2023-12-12 8:06 p.m., Namhyung Kim wrote:
> On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <[email protected]> wrote:
>>
>> On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <[email protected]> wrote:
>>>
>>>
>>>
>>> On 2023-12-12 3:37 p.m., Ian Rogers wrote:
>>>> On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
>>>>>
>>>>> From: Kan Liang <[email protected]>
>>>>>
>>>>> perf top errors out on a hybrid machine
>>>>> $perf top
>>>>>
>>>>> Error:
>>>>> The cycles:P event is not supported.
>>>>>
>>>>> The perf top expects that the "cycles" is collected on all CPUs in the
>>>>> system. But for hybrid there is no single "cycles" event which can cover
>>>>> all CPUs. Perf has to split it into two cycles events, e.g.,
>>>>> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
>>>>> If a event is opened on the unsupported CPU. The open fails. That's the
>>>>> reason of the above error out.
>>>>>
>>>>> Perf should only open the cycles event on the corresponding CPU. The
>>>>> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
>>>>> core PMU maps") intersect the requested CPU map with the CPU map of the
>>>>> PMU. Use the evsel's cpus to replace user_requested_cpus.
>>>>>
>>>>> The evlist's threads are also propagated to the evsel's threads in
>>>>> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
>>>>> a dummy event and assign it to the evsel's threads. For a per-thread
>>>>> event, the evlist's thread_map is assigned to the evsel's threads. The
>>>>> same as the other tools, e.g., perf record, using the evsel's threads
>>>>> when opening an event.
>>>>>
>>>>> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
>>>>> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
>>>>> Reviewed-by: Ian Rogers <[email protected]>
>>>>> Signed-off-by: Kan Liang <[email protected]>
>>>>> ---
>>>>>
>>>>> Changes since V1:
>>>>> - Update the description
>>>>> - Add Reviewed-by from Ian
>>>>
>>>> Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
>>>> select between the cycles event on cpu_atom and cpu_core?
>>>
>>> Yes, but the event doesn't include the PMU information.
>>> We probably need a follow up patch to append the PMU name.
>>>
>>> Available samples
>>> 385 cycles:P
>>>
>>> 903 cycles:P
>>
>> Thanks and agreed, it isn't possible to tell which is which PMU/CPU
>> type at the moment. I tried the patch with perf top --stdio, there
>> wasn't a choice of event

The perf top --stdio uses a dedicated display function, see
perf_top__header_snprintf() in util/top.c

It only shows one event at a time. "E" is used to switch the event.

>> and I can't tell what counter is being
>> displayed.

For the hybrid case, I think we may display both PMU name and event
name. For example,

Available samples
656 cycles:P cpu_atom

701 cycles:P cpu_core

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index f4812b226818..afc7a1d54fe4 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3433,8 +3433,10 @@ static void perf_evsel_menu__write(struct
ui_browser *browser,
}

nr_events = convert_unit(nr_events, &unit);
- printed = scnprintf(bf, sizeof(bf), "%lu%c%s%s", nr_events,
- unit, unit == ' ' ? "" : " ", ev_name);
+ printed = scnprintf(bf, sizeof(bf), "%lu%c%s%s %s", nr_events,
+ unit, unit == ' ' ? "" : " ", ev_name,
+ evsel->pmu ? evsel->pmu_name : "");
+
ui_browser__printf(browser, "%s", bf);

nr_events = evsel->evlist->stats.nr_events[PERF_RECORD_LOST];


>> When I quit I also see:
>> ```
>> exiting.
>> corrupted double-linked list
>> Aborted (core dumped)
>> ```
>> but I wasn't able to repro this on a debuggable binary/system.

I haven't see the issue yet.

>>
>> If my memory serves there was a patch where perf top was showing >1
>> event. It would be nice here to do some kind of hybrid merging rather
>> than having to view each PMU's top separately.

The current perf top doesn't merge when there are >1 event.
sudo ./perf top -e "cycles,instructions"

Available samples
2K cycles

2K instructions

For now, I think we may just append a PMU name to distinguish the hybrid
case.

We may implement the merging feature which impacts both hybrid and
non-hybrid cases later separately.

>
> Using event groups, but I noticed you removed the --group option.
> Maybe perf top can just use `{ ... }` notation for explicit grouping,
> but it might be implicit like in the hybrid case.
>

Yes, if the events are from different PMUs, the perf tool will
implicitly de-group the hybrid events. "{ ... }" may not help here.

Thanks,
Kan

2023-12-13 17:43:22

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

On Wed, Dec 13, 2023 at 7:45 AM Liang, Kan <[email protected]> wrote:
>
>
>
> On 2023-12-12 8:06 p.m., Namhyung Kim wrote:
> > On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <[email protected]> wrote:
> >>
> >> On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>> On 2023-12-12 3:37 p.m., Ian Rogers wrote:
> >>>> On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
> >>>>>
> >>>>> From: Kan Liang <[email protected]>
> >>>>>
> >>>>> perf top errors out on a hybrid machine
> >>>>> $perf top
> >>>>>
> >>>>> Error:
> >>>>> The cycles:P event is not supported.
> >>>>>
> >>>>> The perf top expects that the "cycles" is collected on all CPUs in the
> >>>>> system. But for hybrid there is no single "cycles" event which can cover
> >>>>> all CPUs. Perf has to split it into two cycles events, e.g.,
> >>>>> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
> >>>>> If a event is opened on the unsupported CPU. The open fails. That's the
> >>>>> reason of the above error out.
> >>>>>
> >>>>> Perf should only open the cycles event on the corresponding CPU. The
> >>>>> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
> >>>>> core PMU maps") intersect the requested CPU map with the CPU map of the
> >>>>> PMU. Use the evsel's cpus to replace user_requested_cpus.
> >>>>>
> >>>>> The evlist's threads are also propagated to the evsel's threads in
> >>>>> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
> >>>>> a dummy event and assign it to the evsel's threads. For a per-thread
> >>>>> event, the evlist's thread_map is assigned to the evsel's threads. The
> >>>>> same as the other tools, e.g., perf record, using the evsel's threads
> >>>>> when opening an event.
> >>>>>
> >>>>> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
> >>>>> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
> >>>>> Reviewed-by: Ian Rogers <[email protected]>
> >>>>> Signed-off-by: Kan Liang <[email protected]>
> >>>>> ---
> >>>>>
> >>>>> Changes since V1:
> >>>>> - Update the description
> >>>>> - Add Reviewed-by from Ian
> >>>>
> >>>> Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
> >>>> select between the cycles event on cpu_atom and cpu_core?
> >>>
> >>> Yes, but the event doesn't include the PMU information.
> >>> We probably need a follow up patch to append the PMU name.
> >>>
> >>> Available samples
> >>> 385 cycles:P
> >>>
> >>> 903 cycles:P
> >>
> >> Thanks and agreed, it isn't possible to tell which is which PMU/CPU
> >> type at the moment. I tried the patch with perf top --stdio, there
> >> wasn't a choice of event
>
> The perf top --stdio uses a dedicated display function, see
> perf_top__header_snprintf() in util/top.c
>
> It only shows one event at a time. "E" is used to switch the event.
>
> >> and I can't tell what counter is being
> >> displayed.
>
> For the hybrid case, I think we may display both PMU name and event
> name. For example,
>
> Available samples
> 656 cycles:P cpu_atom
>
> 701 cycles:P cpu_core

I think there would be more uniformity if we could do:
cpu/cycles/P
I'm just reminded of the stat output where sometimes you can get things like:
cpu/cycles/
or sometimes:
cycles [cpu]
It seems the slash style approach is the most uniform given it looks
like what is written on the command line. Perhaps we need some kind of
helper function to do this as reformatting the modifier is a pain.

> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index f4812b226818..afc7a1d54fe4 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -3433,8 +3433,10 @@ static void perf_evsel_menu__write(struct
> ui_browser *browser,
> }
>
> nr_events = convert_unit(nr_events, &unit);
> - printed = scnprintf(bf, sizeof(bf), "%lu%c%s%s", nr_events,
> - unit, unit == ' ' ? "" : " ", ev_name);
> + printed = scnprintf(bf, sizeof(bf), "%lu%c%s%s %s", nr_events,
> + unit, unit == ' ' ? "" : " ", ev_name,
> + evsel->pmu ? evsel->pmu_name : "");
> +
> ui_browser__printf(browser, "%s", bf);
>
> nr_events = evsel->evlist->stats.nr_events[PERF_RECORD_LOST];
>
>
> >> When I quit I also see:
> >> ```
> >> exiting.
> >> corrupted double-linked list
> >> Aborted (core dumped)
> >> ```
> >> but I wasn't able to repro this on a debuggable binary/system.
>
> I haven't see the issue yet.
>
> >>
> >> If my memory serves there was a patch where perf top was showing >1
> >> event. It would be nice here to do some kind of hybrid merging rather
> >> than having to view each PMU's top separately.
>
> The current perf top doesn't merge when there are >1 event.
> sudo ./perf top -e "cycles,instructions"
>
> Available samples
> 2K cycles
>
> 2K instructions
>
> For now, I think we may just append a PMU name to distinguish the hybrid
> case.
>
> We may implement the merging feature which impacts both hybrid and
> non-hybrid cases later separately.
>
> >
> > Using event groups, but I noticed you removed the --group option.
> > Maybe perf top can just use `{ ... }` notation for explicit grouping,
> > but it might be implicit like in the hybrid case.
> >
>
> Yes, if the events are from different PMUs, the perf tool will
> implicitly de-group the hybrid events. "{ ... }" may not help here.

I think grouping may have been the situation where I saw >1 event
displayed but I agree with Kan, we implicitly de-group events on
different PMUs so it won't help here.

Thanks,
Ian

> Thanks,
> Kan

2023-12-13 21:19:25

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus



On 2023-12-13 12:42 p.m., Ian Rogers wrote:
> On Wed, Dec 13, 2023 at 7:45 AM Liang, Kan <[email protected]> wrote:
>>
>>
>> On 2023-12-12 8:06 p.m., Namhyung Kim wrote:
>>> On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <[email protected]> wrote:
>>>> On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 2023-12-12 3:37 p.m., Ian Rogers wrote:
>>>>>> On Tue, Dec 12, 2023 at 11:39 AM <[email protected]> wrote:
>>>>>>> From: Kan Liang <[email protected]>
>>>>>>>
>>>>>>> perf top errors out on a hybrid machine
>>>>>>> $perf top
>>>>>>>
>>>>>>> Error:
>>>>>>> The cycles:P event is not supported.
>>>>>>>
>>>>>>> The perf top expects that the "cycles" is collected on all CPUs in the
>>>>>>> system. But for hybrid there is no single "cycles" event which can cover
>>>>>>> all CPUs. Perf has to split it into two cycles events, e.g.,
>>>>>>> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
>>>>>>> If a event is opened on the unsupported CPU. The open fails. That's the
>>>>>>> reason of the above error out.
>>>>>>>
>>>>>>> Perf should only open the cycles event on the corresponding CPU. The
>>>>>>> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
>>>>>>> core PMU maps") intersect the requested CPU map with the CPU map of the
>>>>>>> PMU. Use the evsel's cpus to replace user_requested_cpus.
>>>>>>>
>>>>>>> The evlist's threads are also propagated to the evsel's threads in
>>>>>>> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
>>>>>>> a dummy event and assign it to the evsel's threads. For a per-thread
>>>>>>> event, the evlist's thread_map is assigned to the evsel's threads. The
>>>>>>> same as the other tools, e.g., perf record, using the evsel's threads
>>>>>>> when opening an event.
>>>>>>>
>>>>>>> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
>>>>>>> Closes: https://lore.kernel.org/linux-perf-users/[email protected]/
>>>>>>> Reviewed-by: Ian Rogers <[email protected]>
>>>>>>> Signed-off-by: Kan Liang <[email protected]>
>>>>>>> ---
>>>>>>>
>>>>>>> Changes since V1:
>>>>>>> - Update the description
>>>>>>> - Add Reviewed-by from Ian
>>>>>> Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
>>>>>> select between the cycles event on cpu_atom and cpu_core?
>>>>> Yes, but the event doesn't include the PMU information.
>>>>> We probably need a follow up patch to append the PMU name.
>>>>>
>>>>> Available samples
>>>>> 385 cycles:P
>>>>>
>>>>> 903 cycles:P
>>>> Thanks and agreed, it isn't possible to tell which is which PMU/CPU
>>>> type at the moment. I tried the patch with perf top --stdio, there
>>>> wasn't a choice of event
>> The perf top --stdio uses a dedicated display function, see
>> perf_top__header_snprintf() in util/top.c
>>
>> It only shows one event at a time. "E" is used to switch the event.
>>
>>>> and I can't tell what counter is being
>>>> displayed.
>> For the hybrid case, I think we may display both PMU name and event
>> name. For example,
>>
>> Available samples
>> 656 cycles:P cpu_atom
>>
>> 701 cycles:P cpu_core
> I think there would be more uniformity if we could do:
> cpu/cycles/P
> I'm just reminded of the stat output where sometimes you can get things like:
> cpu/cycles/
> or sometimes:
> cycles [cpu]
> It seems the slash style approach is the most uniform given it looks
> like what is written on the command line. Perhaps we need some kind of
> helper function to do this as reformatting the modifier is a pain.

Actually, we already have a helper in the perf record,
record__uniquify_name().
I can move it to the util/record.c and let's perf top invoke it before
the open as well. Then we can get this in the perf top.

Available samples
148 cpu_atom/cycles:P/

1K cpu_core/cycles:P/

It should be good enough to distinguish the events.
I will send a patch to support it.

Thanks,
Kan

2023-12-13 21:55:08

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

On Wed, Dec 13, 2023 at 9:42 AM Ian Rogers <[email protected]> wrote:
>
> On Wed, Dec 13, 2023 at 7:45 AM Liang, Kan <[email protected]> wrote:
> >
> >
> >
> > On 2023-12-12 8:06 p.m., Namhyung Kim wrote:
> > > On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <[email protected]> wrote:

> > >> If my memory serves there was a patch where perf top was showing >1
> > >> event. It would be nice here to do some kind of hybrid merging rather
> > >> than having to view each PMU's top separately.
> >
> > The current perf top doesn't merge when there are >1 event.
> > sudo ./perf top -e "cycles,instructions"
> >
> > Available samples
> > 2K cycles
> >
> > 2K instructions
> >
> > For now, I think we may just append a PMU name to distinguish the hybrid
> > case.
> >
> > We may implement the merging feature which impacts both hybrid and
> > non-hybrid cases later separately.
> >
> > >
> > > Using event groups, but I noticed you removed the --group option.
> > > Maybe perf top can just use `{ ... }` notation for explicit grouping,
> > > but it might be implicit like in the hybrid case.
> > >
> >
> > Yes, if the events are from different PMUs, the perf tool will
> > implicitly de-group the hybrid events. "{ ... }" may not help here.
>
> I think grouping may have been the situation where I saw >1 event
> displayed but I agree with Kan, we implicitly de-group events on
> different PMUs so it won't help here.

The --group option was implemented in perf report which means
it doesn't matter if events are in different PMUs. It's just to display
the results together.

Thanks,
Namhyung