When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
try to open events on those CPUs. ARM doesn't remove offline CPUs
meaning taking a CPU offline will cause perf commands to fail unless a
CPU map is passed on the command line.
More context in:
https://lore.kernel.org/lkml/[email protected]/
Reported-by: Yicong Yang <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/arm/util/pmu.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
index 8b7cb68ba1a8..6b544edbd3f6 100644
--- a/tools/perf/arch/arm/util/pmu.c
+++ b/tools/perf/arch/arm/util/pmu.c
@@ -11,12 +11,15 @@
#include "arm-spe.h"
#include "hisi-ptt.h"
+#include "../../../util/cpumap.h"
#include "../../../util/pmu.h"
#include "../../../util/cs-etm.h"
#include "../../arm64/util/mem-events.h"
-void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
+void perf_pmu__arch_init(struct perf_pmu *pmu)
{
+ struct perf_cpu_map *intersect;
+
#ifdef HAVE_AUXTRACE_SUPPORT
if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
/* add ETM default config here */
@@ -33,6 +36,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
pmu->selectable = true;
#endif
}
-
#endif
+ /* Workaround some ARM PMU's failing to correctly set CPU maps for online processors. */
+ intersect = perf_cpu_map__intersect(cpu_map__online(), pmu->cpus);
+ perf_cpu_map__put(pmu->cpus);
+ pmu->cpus = intersect;
}
--
2.45.2.505.gda0bf45e8d-goog
Hi Ian,
On 6/7/24 07:53, Ian Rogers wrote:
[...]
> diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
> index 8b7cb68ba1a8..6b544edbd3f6 100644
> --- a/tools/perf/arch/arm/util/pmu.c
> +++ b/tools/perf/arch/arm/util/pmu.c
> @@ -11,12 +11,15 @@
>
> #include "arm-spe.h"
> #include "hisi-ptt.h"
> +#include "../../../util/cpumap.h"
> #include "../../../util/pmu.h"
> #include "../../../util/cs-etm.h"
> #include "../../arm64/util/mem-events.h"
>
> -void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> +void perf_pmu__arch_init(struct perf_pmu *pmu)
> {
> + struct perf_cpu_map *intersect;
> +
> #ifdef HAVE_AUXTRACE_SUPPORT
> if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
> /* add ETM default config here */
> @@ -33,6 +36,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> pmu->selectable = true;
> #endif
> }
> -
> #endif
> + /* Workaround some ARM PMU's failing to correctly set CPU maps for online processors. */
> + intersect = perf_cpu_map__intersect(cpu_map__online(), pmu->cpus);
> + perf_cpu_map__put(pmu->cpus);
> + pmu->cpus = intersect;
I did a test for this patch, it works well for me.
Tested-by: Leo Yan <[email protected]>
Just a nitpick and I think it is not an issue caused by this patch.
After hotplug off one CPU and then if specify the CPU with option '-C',
the 'perf stat' command still continues to run. This is inconsistent
with the 'perf record' which exits with failures immediately.
Maybe consider to send an extra patch to address this issue?
The test steps are:
# echo 0 > /sys/devices/system/cpu0/online
# perf stat -C 0 -e armv9_cortex_a520/cycles/ -- sleep 1
WARNING: A requested CPU in '0' is not supported by PMU
'armv9_cortex_a520' (CPUs 5) for event 'armv9_cortex_a520/cycles/'
Performance counter stats for 'CPU(s) 0':
<not supported> armv9_cortex_a520/cycles/
1.001223060 seconds time elapsed
Thanks,
Leo
> }
> --
> 2.45.2.505.gda0bf45e8d-goog
>
>
On Mon, Jun 10, 2024 at 6:30 AM Leo Yan <[email protected]> wrote:
[snip]
> I did a test for this patch, it works well for me.
>
> Tested-by: Leo Yan <[email protected]>
>
> Just a nitpick and I think it is not an issue caused by this patch.
> After hotplug off one CPU and then if specify the CPU with option '-C',
> the 'perf stat' command still continues to run. This is inconsistent
> with the 'perf record' which exits with failures immediately.
>
> Maybe consider to send an extra patch to address this issue?
As you say, this doesn't relate to the problem fixed here. I don't
have a problem with the command line behavior change but I think my
getting shouted at budget is fairly well exhausted. How about we trade
issues, if the following get fixed:
Renaming of the cycles event in arm_dsu_pmu.c - I'd say this is a top
priority issue right now.
Fixing ARM hybrid default perf stat (it is crazy we can accept this
being broken), opening events on both BIG.little PMUs as done here:
https://lore.kernel.org/lkml/[email protected]/
I'll address this.
Thanks,
Ian
Hi Ian,
On 2024/6/7 14:53, Ian Rogers wrote:
> When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
> try to open events on those CPUs. ARM doesn't remove offline CPUs
> meaning taking a CPU offline will cause perf commands to fail unless a
> CPU map is passed on the command line.
>
> More context in:
> https://lore.kernel.org/lkml/[email protected]/
>
> Reported-by: Yicong Yang <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Ian Rogers <[email protected]>
Tested worked for this case:
[root@localhost tmp]# echo 0 > /sys/devices/system/cpu/cpu1/online
[root@localhost tmp]# /home/yang/perf_static stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (armv8_pmuv3_0/cpu_cycles/).
/bin/dmesg | grep -i perf may provide additional information.
[root@localhost tmp]# /tmp/perf_Ian stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
Performance counter stats for 'system wide':
54994604 armv8_pmuv3_0/cpu_cycles/
0.176079800 seconds time elapsed
So:
Tested-by: Yicong Yang <[email protected]>
But still wondering why isn't it better to move this into pmu_cpumask() as does in
the previous patch? Yes currently this is a arm specific issue, but we cannot handle
the case if later PMU doesn't update the cpus/cpumask either :)
> ---
> tools/perf/arch/arm/util/pmu.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
> index 8b7cb68ba1a8..6b544edbd3f6 100644
> --- a/tools/perf/arch/arm/util/pmu.c
> +++ b/tools/perf/arch/arm/util/pmu.c
> @@ -11,12 +11,15 @@
>
> #include "arm-spe.h"
> #include "hisi-ptt.h"
> +#include "../../../util/cpumap.h"
> #include "../../../util/pmu.h"
> #include "../../../util/cs-etm.h"
> #include "../../arm64/util/mem-events.h"
>
> -void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> +void perf_pmu__arch_init(struct perf_pmu *pmu)
> {
> + struct perf_cpu_map *intersect;
> +
> #ifdef HAVE_AUXTRACE_SUPPORT
> if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
> /* add ETM default config here */
> @@ -33,6 +36,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> pmu->selectable = true;
> #endif
> }
> -
> #endif
> + /* Workaround some ARM PMU's failing to correctly set CPU maps for online processors. */
> + intersect = perf_cpu_map__intersect(cpu_map__online(), pmu->cpus);
> + perf_cpu_map__put(pmu->cpus);
> + pmu->cpus = intersect;
> }
>
On Wed, Jun 12, 2024 at 3:16 AM Yicong Yang <[email protected]> wrote:
>
> Hi Ian,
>
> On 2024/6/7 14:53, Ian Rogers wrote:
> > When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
> > try to open events on those CPUs. ARM doesn't remove offline CPUs
> > meaning taking a CPU offline will cause perf commands to fail unless a
> > CPU map is passed on the command line.
> >
> > More context in:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Reported-by: Yicong Yang <[email protected]>
> > Closes: https://lore.kernel.org/lkml/[email protected]/
> > Signed-off-by: Ian Rogers <[email protected]>
>
> Tested worked for this case:
>
> [root@localhost tmp]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [root@localhost tmp]# /home/yang/perf_static stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (armv8_pmuv3_0/cpu_cycles/).
> /bin/dmesg | grep -i perf may provide additional information.
>
> [root@localhost tmp]# /tmp/perf_Ian stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
>
> Performance counter stats for 'system wide':
>
> 54994604 armv8_pmuv3_0/cpu_cycles/
>
> 0.176079800 seconds time elapsed
>
> So:
> Tested-by: Yicong Yang <[email protected]>
>
> But still wondering why isn't it better to move this into pmu_cpumask() as does in
> the previous patch? Yes currently this is a arm specific issue, but we cannot handle
> the case if later PMU doesn't update the cpus/cpumask either :)
Thanks Yicong. To the greatest extent possible I'm trying to avoid
unnecessary code during event parsing. On x86 there isn't a PMU that
has the buggy behavior of showing offline CPUs in the cpumask, so
computing the online CPUs and doing the intersection there is pure
overhead. The online CPUs calculation is either going to read a file
or probe via sysconf. The fallback sysconf probing may fail and can't
handle arbitrary holes in the CPU map (like in your example). So I'm
concerned about the overhead of doing this in pmu_cpumask and the
ability for it to break non-ARM platforms. I think the right
conservative thing to do is to just have the buggy cpumask fix for ARM
and work toward fixing the PMU drivers so potentially in the future we
can remove the fix there. Something I'd like to see is greater PMU
driver testing and detecting bugs, like:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu.c?h=perf-tools-next#n285
Thanks,
Ian
> > ---
> > tools/perf/arch/arm/util/pmu.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
> > index 8b7cb68ba1a8..6b544edbd3f6 100644
> > --- a/tools/perf/arch/arm/util/pmu.c
> > +++ b/tools/perf/arch/arm/util/pmu.c
> > @@ -11,12 +11,15 @@
> >
> > #include "arm-spe.h"
> > #include "hisi-ptt.h"
> > +#include "../../../util/cpumap.h"
> > #include "../../../util/pmu.h"
> > #include "../../../util/cs-etm.h"
> > #include "../../arm64/util/mem-events.h"
> >
> > -void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> > +void perf_pmu__arch_init(struct perf_pmu *pmu)
> > {
> > + struct perf_cpu_map *intersect;
> > +
> > #ifdef HAVE_AUXTRACE_SUPPORT
> > if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
> > /* add ETM default config here */
> > @@ -33,6 +36,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> > pmu->selectable = true;
> > #endif
> > }
> > -
> > #endif
> > + /* Workaround some ARM PMU's failing to correctly set CPU maps for online processors. */
> > + intersect = perf_cpu_map__intersect(cpu_map__online(), pmu->cpus);
> > + perf_cpu_map__put(pmu->cpus);
> > + pmu->cpus = intersect;
> > }
> >
Hi Ian,
On 6/10/24 18:33, Ian Rogers wrote:
[...]
>> Just a nitpick and I think it is not an issue caused by this patch.
>> After hotplug off one CPU and then if specify the CPU with option '-C',
>> the 'perf stat' command still continues to run. This is inconsistent
>> with the 'perf record' which exits with failures immediately.
>>
>> Maybe consider to send an extra patch to address this issue?
>
> As you say, this doesn't relate to the problem fixed here. I don't
> have a problem with the command line behavior change but I think my
> getting shouted at budget is fairly well exhausted.
I understand you put much efforts on fixing issues, on Arm platforms and
other platforms. This is also why I want to contribute a bit for testing
the patches.
> How about we trade issues, if the following get fixed:
>
> Renaming of the cycles event in arm_dsu_pmu.c - I'd say this is a top
> priority issue right now.
I cannot promise this. The main reason is that I still believe the
'cycles' event (or, generally speaking, all events) should be managed by
the tool rather than by the uncore PMU drivers. Additionally, the perf
tool currently has handled these symbolic events effectively.
> Fixing ARM hybrid default perf stat (it is crazy we can accept this
> being broken), opening events on both BIG.little PMUs as done here.
Yeah, James is working on this.
> I'll address this.
Anyway, I appreciate your work.
Thanks,
Leo
On Wed, Jun 12, 2024 at 1:32 PM Leo Yan <[email protected]> wrote:
[...]
> >
> > Renaming of the cycles event in arm_dsu_pmu.c - I'd say this is a top
> > priority issue right now.
>
> I cannot promise this. The main reason is that I still believe the
> 'cycles' event (or, generally speaking, all events) should be managed by
> the tool rather than by the uncore PMU drivers. Additionally, the perf
> tool currently has handled these symbolic events effectively.
I don't understand this.
1) the PMU advertises an event called 'cycles' - nothing to do with the tool
2) perf without a PMU matches events on all PMUs. I don't know people
involved in the perf tool development who think cycles shouldn't match
on an uncore PMU, which was Linus' stand point and reason for
rejecting my fix in favor of a revert.
3) matching all PMUs wasn't the case for legacy events, like cycles,
but now we want it to be to fix the Apple M? PMU issues where legacy
events fail to work. The whole reason we changed the priority of
legacy events as ARM requested. It also makes things more uniform with
BIG.little/hybrid.
We can update perf record to warn (not fail) about not opening events.
ARM should do this as (1) and (3) were caused by ARM - I have a WIP
patch but I'm in no mood to finish/send it. Even with this there will
be a warning for "perf record -e cycles .." and fixing the event name
is the only way to clear that up without special case code or making
warnings verbose only - something that makes me and others
uncomfortable.
Thanks,
Ian